RE: [Xen-devel] More network tests with xenoprofile this time
William and Andrew,

Sorry for the delay in replying. I have been traveling and did not have
email access while away.

> Hi Renato,
>
> The article was an interesting application of xenoprof.
>
> It seems like it would be useful to also have data collected using the
> cycle counts (GLOBAL_POWER_EVENTS on P4) to give some indication of
> areas with high-overhead operations. There may be some areas with a
> few very expensive instructions. Calling attention to those areas
> would help improve performance.

Yes, you are right. We did in fact collect GLOBAL_POWER_EVENTS, but did
not include the data in the paper due to space limitations. I have
attached oprofile results for our ttcp-like benchmark (receive side) for
the case with 1 NIC (both cycle counts and instructions). As you can
see, there are some functions with very expensive instructions. For
example, "hypercall" adds only 0.6% additional instructions, but these
consume 3.0% more clock cycles; "unmask_IO_APIC_irq" adds 0.25%
instructions but consumes 5% more cycles. It would be interesting to
investigate these and see if we can optimize them.

> The increases in I-TLB and D-TLB events for Xen-domain0 shown in
> Figure 4 are surprising. Why would the working sets be that much
> larger for Xen-domain0 than regular Linux, particularly for code? Is
> there a table similar to Table 3 for I-TLB event sample locations?

Yes, we were also surprised by these results. I have attached the
complete I-TLB and D-TLB oprofile results (for the 3-NIC case). Note
that these are from a different type of machine than the other two
attached oprofile results.

Aravind instrumented the macros in xen/include/asm-x86/flushtlb.h. I am
not sure if he used PERFCOUNTER_CPU or added his own instrumentation.
With this instrumentation we did not observe any TLB flushes, but I
suppose we could have missed TLB flushes that did not use the macro... I
think it would be a good idea to investigate this further to confirm
that TLB flushes are not happening.
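For concreteness, instrumenting the flush macro with one of the per-CPU
perf counters would look roughly like the sketch below. This is only a
sketch: the counter name "tlb_flush_count" is made up, and the macro
body is simplified, but it shows the PERFCOUNTER_CPU/perfc_incrc pattern
mentioned above.

    /* In perfc_defn.h: declare a per-CPU counter.
     * The name "tlb_flush_count" is illustrative only. */
    PERFCOUNTER_CPU( tlb_flush_count, "TLB flushes via flushtlb.h" )

    /* In xen/include/asm-x86/flushtlb.h: bump the counter inside the
     * flush macro (body simplified here), so that every flush going
     * through the macro is counted. */
    #define local_flush_tlb()                                          \
        do {                                                           \
            unsigned long cr3;                                         \
            perfc_incrc(tlb_flush_count);    /* count this flush */    \
            __asm__ __volatile__ ( "mov %%cr3, %0; mov %0, %%cr3"      \
                                   : "=r" (cr3) : : "memory" );        \
        } while ( 0 )

With perf counters compiled in, dumping the counters before and after a
benchmark run would show whether flushes through the macro are actually
happening.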
One additional observation is that, in general, the number of misses is
NOT proportional to the size of the working set. It is possible for a
small increase in the working set to significantly increase the number
of misses. Therefore it is possible that the increase in TLB misses is
in fact due to a larger working set. But I agree we have to investigate
this further to get confirmation...

> Can't the VMM use a 4-MB page, and the Xen-domain0 kernel shouldn't
> be that much larger than the regular Linux kernel? How were TLB
> flushes ruled out as a cause? Could the PERFCOUNTER_CPU counters in
> perfc_defn.h be used to see if the VMM is doing a lot of TLB flushes?
>
> Also, how much of the I-TLB and D-TLB events are due to the P4
> architecture? Are the results as dramatic for Athlon or AMD64
> processors?

We did not try this on any other architecture. Right now xenoprof is
only supported on the P4. Support for other architectures is not at the
top of our priority list.

Regards,
Renato

> -Will

Attachment: time_func_xen0.prof
Attachment: instr_func_xen0.prof
Attachment: dtlb_3nic.prof
Attachment: itlb_3nic.prof

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel