Re: [Xen-devel] issues with PLE and/or scheduler.
On 21/12/2011 10:34, "Zhang, Xiantao" <xiantao.zhang@xxxxxxxxx> wrote:

> diff -r 381ab77db71a xen/arch/x86/hvm/vpt.c
> --- a/xen/arch/x86/hvm/vpt.c Mon Apr 18 10:10:02 2011 +0100
> +++ b/xen/arch/x86/hvm/vpt.c Thu Dec 22 05:54:54 2011 +0800
> @@ -129,7 +129,7 @@
>      if ( missed_ticks <= 0 )
>          return;
>
> -    missed_ticks = missed_ticks / (s_time_t) pt->period + 1;
> +    missed_ticks = missed_ticks / (s_time_t) pt->period;
>      if ( mode_is(pt->vcpu->domain, no_missed_ticks_pending) )
>          pt->do_not_freeze = !pt->pending_intr_nr;
>      else
>
> Can anyone explain the above "plus one" logic? Why assume at least one
> tick is missed in pt_process_missed_ticks?

missed_ticks = now - pt->scheduled

pt->scheduled was the deadline for the next tick. Hence the number of missed
ticks is the total number of periods that have passed since pt->scheduled,
plus one. If we had not missed at least one tick, we would have returned
early, from the if-statement at the top of your patch fragment, above.

What's the guest timer_mode? If there was at least one missed tick, I would
have expected a timer interrupt to be injected straight away -- except
perhaps for timer_mode=2=no_missed_ticks_pending. I don't understand that
timer mode, and perhaps there is a bad interaction with its specific
interrupt-holdoff logic.

 -- Keir

> In the guest kernel, the ioapic's check_timer logic decides how to set up
> IRQ0; it uses mdelay to wait for 10 ticks in total. If the kernel receives
> 4+ ticks during the delay, it deems IRQ0 correctly routed through the
> ioapic.
> Unfortunately, mdelay is implemented as a tight pause loop, and when PLE
> is enabled that tight loop triggers PLE vmexits. In the PLE vmexit
> handler, the scheduler yields the CPU, the yield triggers the guest's
> time save/restore logic, and eventually pt_process_missed_ticks gets
> called. Each time pt_process_missed_ticks runs, pt->scheduled is advanced
> by one extra pt->period due to the above "plus one" logic.
> By default, ple_window is 4096, so every 4096 cycles of the guest's
> mdelay trigger one PLE vmexit, and each vmexit delays the vpt timer by
> one pt->period, so the vpt timer may never fire during the guest's delay.
> This is why jiffies does not advance during the 10-tick mdelay.
>
> Thanks!
> Xiantao
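[To make the arithmetic above concrete, here is a standalone sketch, not Xen
source, that simulates the accounting Keir describes under the conditions
Xiantao describes. The numbers are assumptions for illustration: a 100Hz
guest (10ms vpt period) and one yield/reschedule every 0.5ms, standing in
for the stream of PLE exits produced by a tight mdelay() pause loop.]

    #include <stdio.h>

    typedef long long s_time_t;              /* nanoseconds, as in Xen */

    int main(void)
    {
        const s_time_t period = 10000000;    /* assumed 10ms tick period  */
        s_time_t scheduled = period;         /* deadline of the next tick */
        int absorbed = 0;

        /* One reschedule every 0.5ms, as back-to-back PLE exits force. */
        for (s_time_t now = 0; now < 10 * period; now += 500000) {
            /* The pre-patch accounting from pt_process_missed_ticks(): */
            s_time_t overrun = now - scheduled;
            if (overrun <= 0)
                continue;
            s_time_t missed = overrun / period + 1;  /* the "+1" at issue */
            scheduled += missed * period;            /* deadline recedes  */
            absorbed += (int)missed;   /* timer_mode=2 drops these instead
                                          of queueing them in
                                          pending_intr_nr */
            printf("t=%2lldms: overran deadline by %3lldus, "
                   "pushed out to t=%lldms\n",
                   now / 1000000, overrun / 1000, scheduled / 1000000);
        }
        printf("%d tick(s) absorbed; the guest saw none of them\n",
               absorbed);
        return 0;
    }

[Each time the deadline is overrun by even a fraction of a period, the "+1"
pushes pt->scheduled a full period past now, and in no_missed_ticks_pending
mode the absorbed tick is dropped rather than queued. Without the "+1", a
sub-period overrun computes missed == 0, the deadline stays put, and the
timer can still fire, which is what Xiantao's patch restores.]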
> -----Original Message-----
> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
> [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Shan, Haitao
> Sent: Wednesday, December 21, 2011 9:28 AM
> To: Konrad Rzeszutek Wilk; xen-devel@xxxxxxxxxxxxxxxxxxx;
> konrad.wilk@xxxxxxxxxx; George.Dunlap@xxxxxxxxxxxxx; keir@xxxxxxx;
> andrew.thomas@xxxxxxxxxx
> Subject: Re: [Xen-devel] issues with PLE and/or scheduler.
>
> We have reproduced your problem locally and are looking into this issue.
> It seems "PLE with timer mode 2" triggers it. We will post our findings
> as soon as possible.
>
> Shan Haitao
>
>> -----Original Message-----
>> From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-devel-
>> bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Konrad Rzeszutek Wilk
>> Sent: Wednesday, December 21, 2011 4:42 AM
>> To: xen-devel@xxxxxxxxxxxxxxxxxxx; konrad.wilk@xxxxxxxxxx;
>> George.Dunlap@xxxxxxxxxxxxx; keir@xxxxxxx; andrew.thomas@xxxxxxxxxx
>> Subject: Re: [Xen-devel] issues with PLE and/or scheduler.
>>
>> On Tue, Dec 20, 2011 at 04:41:07PM -0400, Konrad Rzeszutek Wilk wrote:
>>> Hey folks,
>>>
>>> I am sending this on behalf of Andrew, since our internal email system
>>> is dropping all xen-devel mailing lists :-(
>>
>> <hits his head> And I forgot to CC Andrew on it. Added here.
>>>
>>> Anyhow:
>>>
>>> This is with xen-4.1-testing cs 23201:1c89f7d29fbb and using the
>>> default "credit" scheduler.
>>>
>>> I've run into an interesting issue with HVM guests which make use of
>>> Pause Loop Exiting (i.e. on Westmere systems, and also on Romley
>>> systems): after yielding the CPU, guests don't seem to receive timer
>>> interrupts correctly.
>>>
>>> Some background: for historical reasons (i.e. old templates) we boot
>>> OL/RHEL guests with the following settings:
>>>
>>>   kernel parameters: clock=pit nohpet nopmtimer
>>>   vm.cfg: timer_mode = 2
>>>
>>> With PLE enabled, 2.6.32 guests will crash early on with:
>>>
>>>   ..MP-BIOS bug: 8254 timer not connected to IO-APIC
>>>   # a few lines omitted..
>>>   Kernel panic - not syncing: IO-APIC + timer doesn't work!
>>>   Boot with apic=debug
>>>
>>> While 2.6.18-238 (i.e. OL/RHEL5u6) will fail to find the timer, but
>>> continue and then lock up in the serial line initialization:
>>>
>>>   ..MP-BIOS bug: 8254 timer not connected to IO-APIC
>>>   # continues until lock up here:
>>>   Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing
>>>   enabled
>>>
>>> Instrumenting the 2.6.32 code (i.e. timer_irq_works()) shows that
>>> jiffies isn't advancing (or only 1 or 2 ticks are being received,
>>> which is insufficient for "working"). This is on a "quiet" system
>>> with no other activity.
>>>
>>> So, even though the guest has voluntarily yielded the CPU (through
>>> PLE), I would still expect it to receive every clock tick (even with
>>> timer_mode=2), as there is no other work to do on the system.
>>>
>>> Disabling PLE allows both 2.6.18 and 2.6.32 guests to boot. [As an
>>> aside, so does setting ple_gap to 41 (i.e. prior to
>>> 21355:727ccaaa6cce) -- the perf counters show no exits happening, so
>>> this is equivalent to disabling PLE.]
>>>
>>> I'm hoping someone who knows the scheduler well will be able to
>>> quickly decide whether this is a bug or a feature...
>>>
>>> Andrew
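[As a footnote to the thread: the check Konrad instrumented, and the
4-ticks-in-10 criterion Xiantao mentions, is the guest kernel's
timer_irq_works(). Paraphrased from the 2.6.32-era
arch/x86/kernel/apic/io_apic.c (details approximate), it shows why the
whole test rides on mdelay()'s pause loop:]

    #include <linux/jiffies.h>   /* jiffies, time_after() */
    #include <linux/delay.h>     /* mdelay() */
    #include <linux/irqflags.h>  /* local_save_flags(), local_irq_*() */

    /* Paraphrase (details approximate) of the 2.6.32-era check that
     * prints "MP-BIOS bug: 8254 timer not connected to IO-APIC" on
     * failure. */
    static int __init timer_irq_works(void)
    {
        unsigned long t1 = jiffies;
        unsigned long flags;

        local_save_flags(flags);
        local_irq_enable();

        /* Let ~10 ticks' worth of time pass.  mdelay() busy-waits in
         * a pause loop, which is exactly the pattern PLE latches onto. */
        mdelay((10 * 1000) / HZ);

        local_irq_restore(flags);

        /* Expect at least a handful of ticks; one or two may be lost
         * to delays or a cached ExtINT interrupt. */
        if (time_after(jiffies, t1 + 4))
            return 1;   /* IRQ0 is wired up and ticking */

        return 0;
    }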
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel