[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] RE: [Xen-devel] million cycle interrupt
>From: Dan Magenheimer [mailto:dan.magenheimer@xxxxxxxxxx] >Sent: 2009年4月14日 10:34 > >> handler, why not take one more step to measure every handler > >Great idea! I added a max_cycles field to irq_desc_t >and check/update it at every interrupt in do_IRQ, >then print max_cycles in "xm debug-key i", >including a "max max". > >I'm not entirely sure how to interpret the output >from dump_irqs() but the only two IRQ's with >type==PCI-MSI have a "large" >max (450044 cycles and 730972 cycles). The third is >Vec240, an IO-APIC-edge interrupt that maxes at 1047500 >cycles. No other interrupt has a max exceeding 10000. > >Here's the relevant output. How do I map this to >something meaningful? One immediate trick is to print handler address for problematic vector, and then search its name in your dumpped Xen symbol file. Thanks, Kevin > >(XEN) Vec 49 IRQ -1: type=PCI-MSI status=00000010 >max_cycles=450044 in-flight=0 domain-list=0:254(----), > >(XEN) Vec208 IRQ -1: type=PCI-MSI status=00000010 >max_cycles=730972 in-flight=0 domain-list=0:255(----), Above two look like owned by dom0. > >(XEN) Vec240 IRQ 0: type=IO-APIC-edge status=00000000 >max_cycles=1047500 mapped, unbound looks like PIT, but not quite sure. > >(XEN) max_max_cycles = 1047500 > > > > >> -----Original Message----- >> From: Tian, Kevin [mailto:kevin.tian@xxxxxxxxx] >> Sent: Monday, April 13, 2009 5:19 PM >> To: Dan Magenheimer; Keir Fraser; Xen-Devel (E-mail) >> Subject: RE: [Xen-devel] million cycle interrupt >> >> >> >From: Dan Magenheimer >> >Sent: 2009年4月14日 5:15 >> > >> >> You can instrument irq_enter() and irq_exit() to read TSC >> > >> >Rather than do this generically and ensure I get all the macros >> >correct (e.g. per_cpu, nesting) I manually instrumented three >> >likely suspect irq_enter/exit pairs, two in do_IRQ() and one >> >in smp_call_function(). ALL of them show an issue with max >> >readings in the 300K-1M range... with smp_call_function showing >> >the lowest max and the second in do_IRQ (the non-guest one) >> >showing readings over 1M (and the guest one at about 800K). >> >> Since you already reach this step around calling actual action's >> handler, why not take one more step to measure every handler >> (serial, apic, vtd, ...)? You can first simply print which handlers >> are registered or invoked on your platform. If only one handler >> is experienced with abnormal high latency, it's possibly one >> specific point. Or else you can suspect on some common code >> shared by all handlers, or ... as Keir said, it could be SMM. :-) >> >> Thanks, >> Kevin >> >> > >> >Interestingly, I get no readings at all over 60K when I >> >recompile with max_phys_cpus=4 (and with nosmp) on my >> >quad-core-by-two-thread machine. This is versus several >> >readings over 60K nearly every second when max_phys_cpus=8. >> > >> >> Otherwise who knows, it could even be system management mode >> > >> >I suppose measuring irq_enter/exist pairs still don't rule >> >this out. But the "large" interrupts don't seem to happen >> >(at least not nearly as frequently) with fewer physical >> >processors enabled, so sys mgmt mode seems unlikely. >> > >> >Anyway, still a probable problem, still mostly a mystery >> >as to what is actually happening. And, repeat, this has >> >nothing to do with tmem... I'm just observing it using >> >tmem as a convenient measurement tool. >> > >> >> -----Original Message----- >> >> From: Keir Fraser [mailto:keir.fraser@xxxxxxxxxxxxx] >> >> Sent: Monday, April 13, 2009 2:24 AM >> >> To: Dan Magenheimer; Xen-Devel (E-mail) >> >> Subject: Re: [Xen-devel] million cycle interrupt >> >> >> >> >> >> On 12/04/2009 21:16, "Dan Magenheimer" >> >> <dan.magenheimer@xxxxxxxxxx> wrote: >> >> >> >> > Is a million cycles in an interrupt handler bad? Any idea what >> >> > might be consuming this? The evidence might imply more cpus >> >> > means longer interrupt, which bodes poorly for larger machines. >> >> > I tried disabling the timer rendezvous code (not positive I >> >> > was successful), but still got large measurements, and >> >> > eventually the machine froze up (but not before I observed >> >> > the stime skew climbing quickly to the millisecond-plus >> >> > range). >> >> >> >> You can instrument irq_enter() and irq_exit() to read TSC and >> >> find out the >> >> distribution of irq handling times for interruptions that Xen >> >> knows about. >> >> Otherwise who knows, it could even be system management >> mode on that >> >> particular box. >> >> >> >> -- Keir >> >> >> >> >> >> >> > >> >_______________________________________________ >> >Xen-devel mailing list >> >Xen-devel@xxxxxxxxxxxxxxxxxxx >> >http://lists.xensource.com/xen-devel >> > > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |