Re: [Xen-ia64-devel] [PATCH][RFC] performance tuning TAKE 5
Hi.

I posted performance tuning patches. Their effect must be evaluated before they are committed (or the patches should be improved based on the analysis, or discarded). I analyzed the patches' behaviour with the performance counters to see whether they work as I intended. The figures at the end of this mail come from the performance counters: I reset the counters with the 'P' keyhandler, ran hdparm or wget 3 times, and then dumped the counters with the 'p' keyhandler. I chose hdparm and wget just because they are handy.

- p2m exposure
  Exposing the p2m table to a domain effectively eliminates p2m conversion hypercalls, but m2p conversion hypercalls remain (502 in the hdparm run, 163589 in the wget run). m2p conversion is done by in_swiotlb_aperture() of swiotlb to determine whether a given dma address belongs to swiotlb or not, so it might be meaningful to reduce the m2p conversion hypercall overhead.
  However, this isn't as easy as in the p2m case because of the virtual frame table. The virtual frame table allocates the m2p table sparsely, so simply exposing the m2p table to a domain incurs page faults, which result in a ring crossing and a Xen p2m traversal. On the other hand, the page fault handler for the m2p table area is very efficiently implemented in hand-coded assembler, so the benefit of exposing the m2p table is doubtful.
  Another way is to introduce a fast hypercall like Linux/IA64's fsys mode: a domain jumps into a page Xen provides, which issues epc, looks up Xen's m2p table and returns without saving context. I think this idea can also be applied to the fast path of hyperprivops, and it might reduce hyperprivop overhead.

- tlb tracking
  TLB tracking effectively eliminates full vTLB flushes. domain_flush_vtlb_all() is called very few times (3 and 19 times in the runs below), while domain_flush_vtlb_track_entry() is called many times (851 and 210230 times).

- preregistering skbuff of netback and page of netfront
  This effectively eliminates full vTLB flushes. I thought that some kind of deferred page freeing was needed, but judging from these figures, deferred page freeing might not be necessary; just batching may be sufficient.
  The implementation of netfront page preregistering is very hacky. Whether to adopt it should be discussed.

- deferred page freeing
  Currently this is actually a batched TLB flush; it doesn't defer page freeing. Judging from the counts of dqueue_flush_and_free_tlb_track_entries and dfree_queue_tlb_track_entry, it works, but I'm not sure it actually reduces overhead. Analysis based on profiling is probably needed.

- tlb flush clock
  This tries to reduce VHPT and mTLB flushes on vcpu context switch and on TLB entry flush. It works well on vcpu context switch, but not on TLB entry flush.

- per vcpu vhpt
  From these performance counter figures, I can't interpret the effect of per-vcpu VHPT.
  When the VP model is adopted, the Xen VHPT size is reduced from 16MB to 64KB to reduce the overhead of domain_flush_vtlb_all(). 64KB was chosen just because it was the smallest size Xen/IA64 accepted. (I tried 32KB, the smallest VHPT size, but it didn't boot; I didn't track down why.) An appropriate VHPT size must be determined at some point. We might want to increase it, which affects the overhead of both domain_flush_vtlb_all() and vcpu context switch.
  One factor we should take into account is scalability. When the number of physical cpus is large (e.g. 64 or 128) but the number of vcpus of a domain is not (e.g. 4 or 8), per-vcpu VHPT reduces the cost of domain_flush_vtlb_all(). Without per-vcpu VHPT, something like dirty-pcpu tracking is necessary to get a similar reduction.
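To make the sparse m2p problem under "p2m exposure" concrete, here is a minimal, hypothetical C model (not the Xen or Linux code). The m2p table is split into chunks that are only allocated where machine memory exists, so a lookup into a hole returns INVALID_PFN where a directly exposed table would take a page fault. in_aperture() only stands in for the kind of range check in_swiotlb_aperture() needs an m2p conversion for; CHUNK_SHIFT, NR_CHUNKS, m2p_set(), m2p_lookup() and in_aperture() are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define CHUNK_SHIFT 10                      /* hypothetical: 1024 MFNs per chunk */
#define CHUNK_SIZE  (1UL << CHUNK_SHIFT)
#define NR_CHUNKS   64                      /* hypothetical machine size */
#define INVALID_PFN (~0UL)

/* Sparse m2p table: a NULL chunk models an unpopulated hole that would
 * fault if the table were simply mapped into the guest. */
static unsigned long *m2p_chunk[NR_CHUNKS];

static void m2p_set(unsigned long mfn, unsigned long pfn)
{
    unsigned long c = mfn >> CHUNK_SHIFT;

    assert(c < NR_CHUNKS);
    if (m2p_chunk[c] == NULL) {
        /* Populate the chunk on first use, with every entry invalid. */
        m2p_chunk[c] = malloc(CHUNK_SIZE * sizeof(unsigned long));
        assert(m2p_chunk[c] != NULL);
        for (unsigned long i = 0; i < CHUNK_SIZE; i++)
            m2p_chunk[c][i] = INVALID_PFN;
    }
    m2p_chunk[c][mfn & (CHUNK_SIZE - 1)] = pfn;
}

/* Lookup that checks for holes explicitly instead of faulting. */
static unsigned long m2p_lookup(unsigned long mfn)
{
    unsigned long c = mfn >> CHUNK_SHIFT;

    if (c >= NR_CHUNKS || m2p_chunk[c] == NULL)
        return INVALID_PFN;
    return m2p_chunk[c][mfn & (CHUNK_SIZE - 1)];
}

/* Does machine frame mfn back a pseudo-physical frame inside
 * [start_pfn, start_pfn + nr_pfns)?  The unsigned subtraction wraps
 * for pfn < start_pfn, so that case is rejected too. */
static int in_aperture(unsigned long mfn, unsigned long start_pfn,
                       unsigned long nr_pfns)
{
    unsigned long pfn = m2p_lookup(mfn);

    return pfn != INVALID_PFN && pfn - start_pfn < nr_pfns;
}
```

Every in_aperture() call costs one m2p conversion; in a PV domain that conversion is the dom0vp_machtophys hypercall counted below.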
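The "deferred page freeing" item is described as a batched TLB flush. Below is a minimal, hypothetical C sketch of such batching, assuming a fixed-size queue that is flushed once when it fills up or when the caller drains it. struct dfree_queue, DFREE_QUEUE_LEN, flush_tlb_entries() and the two counters are illustrative names, not the patch's; the counters only play the roles of dfree_queue_tlb_track_entry and dqueue_flush_and_free_tlb_track_entries.

```c
#include <assert.h>
#include <stddef.h>

#define DFREE_QUEUE_LEN 8                    /* hypothetical batch size */

struct tlb_entry { unsigned long vaddr; };   /* hypothetical stand-in */

struct dfree_queue {
    struct tlb_entry *entries[DFREE_QUEUE_LEN];
    size_t count;                /* entries waiting for the next flush */
    unsigned long nr_queued;     /* role of dfree_queue_tlb_track_entry */
    unsigned long nr_flushes;    /* role of dqueue_flush_and_free_tlb_track_entries */
};

/* Hypothetical stand-in for the expensive flush being batched. */
static void flush_tlb_entries(struct tlb_entry **e, size_t n)
{
    (void)e;
    (void)n;
}

/* One flush covers every queued entry; the entries could be freed here. */
static void dfree_queue_flush(struct dfree_queue *q)
{
    if (q->count == 0)
        return;
    flush_tlb_entries(q->entries, q->count);
    q->count = 0;
    q->nr_flushes++;
}

/* Queue an entry and flush only when the batch is full. */
static void dfree_queue_add(struct dfree_queue *q, struct tlb_entry *e)
{
    assert(q->count < DFREE_QUEUE_LEN);
    q->entries[q->count++] = e;
    q->nr_queued++;
    if (q->count == DFREE_QUEUE_LEN)
        dfree_queue_flush(q);
}
```

The benefit depends on how many entries share one flush. In the figures below, the wget run queued 210350 entries in 125912 batched flushes (about 1.7 entries per flush) and the hdparm run queued 896 in 469 (about 1.9).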
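For the "tlb flush clock" item, here is a minimal sketch of the general tlbflush-clock idea, under stated assumptions: a global clock bumped on every flush, a per-pcpu record of when it was last flushed, and a stamp taken whenever a vcpu context or tlb entry is last used. A flush is skipped if the pcpu has already been flushed since that stamp. NR_PCPUS, tlbflush_now(), need_flush() and flush_if_needed() are hypothetical names, and clock wraparound is ignored.

```c
#include <assert.h>
#include <stdint.h>

#define NR_PCPUS 4                           /* hypothetical */

static uint32_t tlbflush_clock;              /* global clock, bumped by every flush */
static uint32_t tlbflush_time[NR_PCPUS];     /* clock value of each pcpu's last flush */

/* Stamp recorded when a vcpu context or tlb entry was last used. */
static uint32_t tlbflush_now(void)
{
    return tlbflush_clock;
}

static void flush_pcpu(int cpu)
{
    assert(cpu >= 0 && cpu < NR_PCPUS);
    /* Real code would purge this pcpu's VHPT and mTLB here. */
    tlbflush_time[cpu] = ++tlbflush_clock;
}

/* A flush is needed only if the pcpu has not been flushed since
 * lastuse_stamp.  Wraparound of the clock is ignored in this sketch. */
static int need_flush(int cpu, uint32_t lastuse_stamp)
{
    assert(cpu >= 0 && cpu < NR_PCPUS);
    return tlbflush_time[cpu] <= lastuse_stamp;
}

/* Returns 1 if a flush was done (a "purge"), 0 if it was skipped. */
static int flush_if_needed(int cpu, uint32_t lastuse_stamp)
{
    if (!need_flush(cpu, lastuse_stamp))
        return 0;
    flush_pcpu(cpu);
    return 1;
}
```

In the figures below, tlbflush_clock_cswitch_skip is 1533 (hdparm) and 1186 (wget) against 11821 and 8408 purges, while tlbflush_clock_tlb_track_skip is 0 in both runs; that matches the observation that the clock only helps on context switch.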
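The scalability argument under "per vcpu vhpt" can be illustrated with a small, hypothetical cost model that counts only VHPT purges per full-domain flush and ignores mTLB purges. It compares per-pcpu VHPTs without tracking (every pcpu), per-pcpu VHPTs with a dirty-pcpu mask (pcpus the domain ran on since the last flush), and per-vcpu VHPTs (one per vcpu). struct domain_model and all function names are assumptions, and the 64-bit mask limits the model to 64 pcpus; 128 pcpus would need a wider bitmap.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model for a machine with at most 64 pcpus (one bit each). */
struct domain_model {
    unsigned int nr_vcpus;
    uint64_t dirty_pcpus;   /* pcpus whose VHPT may hold this domain's entries */
};

/* Record that one of the domain's vcpus ran on pcpu. */
static void vcpu_ran_on(struct domain_model *d, unsigned int pcpu)
{
    assert(pcpu < 64);
    d->dirty_pcpus |= UINT64_C(1) << pcpu;
}

static unsigned int popcount64(uint64_t x)
{
    unsigned int n = 0;

    for (; x != 0; x &= x - 1)   /* clear the lowest set bit */
        n++;
    return n;
}

/* Per-pcpu VHPT, no tracking: every pcpu's VHPT must be purged. */
static unsigned int purges_per_pcpu_untracked(unsigned int nr_pcpus)
{
    return nr_pcpus;
}

/* Per-pcpu VHPT with dirty-pcpu tracking: purge dirty pcpus, then clear. */
static unsigned int purges_per_pcpu_tracked(struct domain_model *d)
{
    unsigned int n = popcount64(d->dirty_pcpus);

    d->dirty_pcpus = 0;
    return n;
}

/* Per-vcpu VHPT: only the domain's own VHPTs hold its entries. */
static unsigned int purges_per_vcpu(const struct domain_model *d)
{
    return d->nr_vcpus;
}
```

With the example sizes from the text (64 pcpus, a 4-vcpu domain), the untracked scheme purges 64 VHPTs per full flush against 4 with per-vcpu VHPTs; dirty-pcpu tracking lands in between, depending on how widely the vcpus migrated since the last flush.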
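As a rough feel for what the VHPT size change means, the arithmetic below assumes IA-64 long-format VHPT entries of 32 bytes and a full purge that touches every entry. It only counts entries; it says nothing about actual flush latency. vhpt_entries() and VHPT_MIN_SIZE_LOG2 are illustrative names, with the 32KB minimum taken from the text.

```c
#include <assert.h>

#define VHPT_ENTRY_SIZE    32UL   /* IA-64 long-format VHPT entry: four 8-byte words */
#define VHPT_MIN_SIZE_LOG2 15     /* 32KB, the smallest VHPT size mentioned above */

/* Number of entries in a VHPT of 2^size_log2 bytes. */
static unsigned long vhpt_entries(unsigned int size_log2)
{
    assert(size_log2 >= VHPT_MIN_SIZE_LOG2 &&
           size_log2 < 8 * sizeof(unsigned long));
    return (1UL << size_log2) / VHPT_ENTRY_SIZE;
}
```

Under these assumptions the 16MB VHPT (2^24 bytes) has 524288 entries and the 64KB one (2^16 bytes) has 2048, so a full purge walks 256 times fewer entries.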
*items: rough description

dom0vp_phystomach
    number of p2m conversion hypercalls
dom0vp_machtophys
    number of m2p conversion hypercalls
create_grant_host_mapping
    number of grant table page mappings
destroy_grant_host_mapping
    number of grant table page unmappings
steal_page
    number of grant table page transfers
domain_flush_vtlb_all
    number of calls to domain_flush_vtlb_all().
    This function flushes the whole VHPT and the mTLB of all vcpus of a domain.
domain_flush_vtlb_track_entry
    number of calls to domain_flush_vtlb_track_entry().
    This function flushes only one VHPT entry and the mTLB of the dirtied vcpus of a domain.
tlb_track_sar
    number of pages zapped from a domain
tlb_track_sar_not_tracked
    The virtual address of the page isn't tracked, so a whole vTLB flush is needed,
    i.e. domain_flush_vtlb_all() is called.
tlb_track_sar_not_found
    The TLB is tracked, but no TLB entry was inserted,
    so no TLB flush is needed when the page is zapped.
tlb_track_sar_found
    The TLB is tracked and a TLB insert of only one virtual address was issued,
    so a vTLB flush of this virtual address is needed,
    i.e. domain_flush_vtlb_track_entry() is called.
tlb_track_sar_many
    The TLB is tracked and TLB inserts of more than one virtual address were issued,
    so a whole vTLB flush is necessary, i.e. domain_flush_vtlb_all() is called.
dom0vp_tlb_track_page
    number of tlb track registrations
dom0vp_tlb_untrack_page
    number of tlb track unregistrations
dqueue_flush_and_free_tlb_track_entries
    number of batched flushes of tlb track entries
dfree_queue_tlb_track_entry
    number of tlb track entries queued
tlbflush_clock_cswitch_purge
    number of TLB flushes on context switch
tlbflush_clock_cswitch_skip
    number of TLB flushes skipped on context switch thanks to the tlbflush clock
tlbflush_clock_tlb_track_purge
    number of TLB flushes when a tlb track entry is flushed
tlbflush_clock_tlb_track_skip
    number of TLB flushes skipped when a tlb track entry is flushed
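The four tlb_track_sar_* cases above amount to a small decision table. Below is a hypothetical C restatement of it; the enum names, classify() and action_for() are illustrative and are not the patch's code. nr_inserted stands for the number of distinct virtual addresses inserted for the page while it was tracked.

```c
#include <assert.h>

/* Hypothetical classification mirroring the tlb_track_sar_* counters. */
enum sar_result {
    SAR_NOT_TRACKED,   /* address not tracked */
    SAR_NOT_FOUND,     /* tracked, no TLB entry inserted */
    SAR_FOUND,         /* tracked, exactly one virtual address inserted */
    SAR_MANY,          /* tracked, more than one virtual address inserted */
};

enum flush_action { FLUSH_NONE, FLUSH_ONE_ENTRY, FLUSH_ALL };

static enum sar_result classify(int tracked, unsigned int nr_inserted)
{
    if (!tracked)
        return SAR_NOT_TRACKED;
    if (nr_inserted == 0)
        return SAR_NOT_FOUND;
    if (nr_inserted == 1)
        return SAR_FOUND;
    return SAR_MANY;
}

/* What zapping the page costs in each case. */
static enum flush_action action_for(enum sar_result r)
{
    switch (r) {
    case SAR_NOT_FOUND:
        return FLUSH_NONE;
    case SAR_FOUND:
        return FLUSH_ONE_ENTRY;   /* domain_flush_vtlb_track_entry() */
    case SAR_NOT_TRACKED:
    case SAR_MANY:
        return FLUSH_ALL;         /* domain_flush_vtlb_all() */
    }
    assert(!"unknown sar_result");
    return FLUSH_ALL;
}
```

In the hdparm run below, most zapped pages fall into tlb_track_sar_not_found (131350 of 132245) and need no flush at all; in the wget run, most fall into tlb_track_sar_found (210348 of 229738) and need only a single-entry flush.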
*hdparm -t /dev/hda6 x3
(XEN) dom0vp_phystomach TOTAL[ 0]
(XEN) dom0vp_machtophys TOTAL[ 502]
(XEN) create_grant_host_mapping TOTAL[ 131359]
(XEN) destroy_grant_host_mapping TOTAL[ 131359]
(XEN) steal_page_refcount TOTAL[ 429]
(XEN) steal_page TOTAL[ 430]
(XEN) domain_flush_vtlb_all TOTAL[ 3]
(XEN) domain_flush_vtlb_track_entry TOTAL[ 851]
(XEN) domain_page_flush_and_put TOTAL[ 132243]
(XEN) tlb_track_sar TOTAL[ 132245]
(XEN) tlb_track_sar_not_tracked TOTAL[ 3]
(XEN) tlb_track_sar_not_found TOTAL[ 131350]
(XEN) tlb_track_sar_found TOTAL[ 893]
(XEN) tlb_track_sar_many TOTAL[ 0]
(XEN) dom0vp_tlb_track_page TOTAL[ 2]
(XEN) dom0vp_tlb_untrack_page TOTAL[ 2]
(XEN) dqueue_flush_and_free_tlb_track_entries TOTAL[ 469]
(XEN) dfree_queue_tlb_track_entry TOTAL[ 896]
(XEN) tlbflush_clock_cswitch_purge TOTAL[ 11821]
(XEN) tlbflush_clock_cswitch_skip TOTAL[ 1533]
(XEN) tlbflush_clock_tlb_track_purge TOTAL[ 902]
(XEN) tlbflush_clock_tlb_track_skip TOTAL[ 0]

*wget kernel source x3
(XEN) dom0vp_phystomach TOTAL[ 0]
(XEN) dom0vp_machtophys TOTAL[ 163589]
(XEN) create_grant_host_mapping TOTAL[ 57390]
(XEN) destroy_grant_host_mapping TOTAL[ 57390]
(XEN) steal_page_refcount TOTAL[ 86153]
(XEN) steal_page TOTAL[ 86153]
(XEN) domain_flush_vtlb_all TOTAL[ 19]
(XEN) domain_flush_vtlb_track_entry TOTAL[ 210230]
(XEN) domain_page_flush_and_put TOTAL[ 229734]
(XEN) tlb_track_sar TOTAL[ 229738]
(XEN) tlb_track_sar_not_tracked TOTAL[ 70]
(XEN) tlb_track_sar_not_found TOTAL[ 19321]
(XEN) tlb_track_sar_found TOTAL[ 210348]
(XEN) tlb_track_sar_many TOTAL[ 0]
(XEN) dom0vp_tlb_track_page TOTAL[ 15]
(XEN) dom0vp_tlb_untrack_page TOTAL[ 9]
(XEN) dqueue_flush_and_free_tlb_track_entries TOTAL[ 125912]
(XEN) dfree_queue_tlb_track_entry TOTAL[ 210350]
(XEN) tlbflush_clock_cswitch_purge TOTAL[ 8408]
(XEN) tlbflush_clock_cswitch_skip TOTAL[ 1186]
(XEN) tlbflush_clock_tlb_track_purge TOTAL[ 210284]
(XEN) tlbflush_clock_tlb_track_skip TOTAL[ 0]

-- 
yamahata

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel