Re: [Xen-devel] Question about high CPU load during iperf ethernet testing
On Wed, 24 Sep 2014, Iurii Konovalenko wrote:
> Hi, Stefano!
> Thank you for your reply!
>
> On Tue, Sep 23, 2014 at 7:41 PM, Stefano Stabellini
> <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> > On Mon, 22 Sep 2014, Iurii Konovalenko wrote:
> >> Hello, all!
> >>
> >> I am running iperf ethernet tests on DRA7XX_EVM board (OMAP5).
> >> Xen version is 4.4.
> >> I run only Linux (kernel 3.8) as Dom0, with no other active domains (for clean
> >> test results I decided not to start a DomU).
> >> iperf server is started on host, iperf client is started on board with
> >> command line "iperf -c 192.168.2.10 -w 256k -m
> >> -f M -d -t 60".
> >
> > Just to double check: you are running the iperf test in Dom0, correct?
>
> Yes, iperf is running in Dom0.
>
> >> During the test I studied CPU load with the top tool in Dom0 and saw that one
> >> VCPU is fully loaded, spending about 50% of its time in
> >> software IRQs and 50% in system.
> >> Running the same test on bare Linux without Xen, I saw that CPU load is
> >> about 2-4%.
> >>
> >> I decided to debug a bit, so I used "({register uint64_t _r; asm
> >> volatile("mrrc " "p15, 0, %0, %H0, c14" ";" : "=r"
> >> (_r)); _r; })" to read the timer counter before and after the operations I
> >> want to measure.
> >>
> >> In this way I found that most of the CPU time is spent in the functions
> >> enable_irq/disable_irq_nosync and
> >> spin_lock_irqsave/spin_unlock_irqrestore (mostly in "mrs %0, cpsr @
> >> arch_local_irq_save"/"msr cpsr_c, %0 @
> >> local_irq_restore"). When running without Xen this should not take so much
> >> time.
> >
> > There is nothing Xen specific in the Linux ARM implementation of
> > spin_lock_irqsave/spin_unlock_irqrestore and
> > enable_irq/disable_irq_nosync.
> >
>
> That is strange, because my measurements show a lot of time is spent
> there; for example, spin_unlock_irqrestore (mostly the mrs
> instruction) accounts for about 20% when running in Dom0.
Unless you are doing something wrong in your measurements: if you have really
narrowed it down to one instruction, then I would try the same test on a
SoC from a different vendor to see whether it is actually a hardware issue.
> >> So, could anyone clarify a couple of questions for me:
> >> 1. Is it normal behaviour?
> >
> > No, it is not normal.
> > Assuming that you assign all the memory to Dom0 and as many vcpus as
> > physical cpus on your platform, you should get the same numbers as
> > native.
>
> OK, so I might do something wrong.
>
> >> 2. Does the hypervisor trap the cpsr register? I suppose that the hypervisor
> >> traps access to the cpsr register, which leads to
> >> additional overhead, but I can't find the place in the sources where it happens.
> >
> > We don't trap cpsr.
>
> That is strange, because that was my only assumption about where the time
> could be spent.
> So could you please advise where to look to understand the reason for
> such a high VCPU load?
I don't know. When you say that arch_local_irq_save is the one taking
all the time, do you actually have something like:
time1 = read CNTVCT;
arch_local_irq_save();
time2 = read CNTVCT;
printk(time2-time1);
in your code?
> Best regards.
>
> Iurii Konovalenko | Senior Software Engineer
> GlobalLogic
>
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel