
Re: [Xen-ia64-devel] [PATCH][RFC] performance tuning TAKE 5



Hi. I posted performance tuning patches.
Their effect must be evaluated before commit
(or they should be improved based on the analysis, or discarded).
I analyzed the behaviour of the patches using the performance counters
to see whether they work as I intended.

The figures at the end of this mail are from the performance counters.
I reset the performance counters with the 'P' keyhandler, executed
hdparm or wget 3 times, and then dumped the counters with the 'p' keyhandler.
I chose hdparm and wget just because they are handy.

- p2m exposure to a domain effectively eliminates p2m conversion hypercalls,
  but m2p conversion hypercalls remain.
  m2p conversion is done by in_swiotlb_aperture() of swiotlb to determine
  whether a given dma address belongs to swiotlb or not.
  So it might be meaningful to reduce the m2p conversion hypercall overhead.
  However this isn't as easy as the p2m case because of the virtual frame
  table. The virtual frame table allocates the m2p table sparsely, so simply
  exposing the m2p table to a domain incurs page faults.
  Each fault results in a ring crossing and a xen p2m traverse.
  On the other hand, the page fault handler for the m2p table area is
  implemented very efficiently in hand-coded assembler.
  So the benefit of exposing the m2p table is doubtful.
  
  Another way is to introduce a fast hypercall like Linux/IA64's fsys mode.
  A domain jumps into a page that xen provides,
  issues epc, looks up xen's m2p table and returns without a context save.
  I think this idea can also be applied to the fastpath of hyperprivop.
  It might be able to reduce hyperprivop overhead.


- tlb tracking effectively eliminates full vTLB flushes.
  The number of calls to domain_flush_vtlb_all() is very small
  (3 for hdparm, 19 for wget), while domain_flush_vtlb_track_entry()
  is called many times (851 for hdparm, 210230 for wget).


- preregistering the skbuffs of netback and the pages of netfront effectively
  eliminates full vTLB flushes.
  I thought that some kind of deferred page freeing was needed.
  But judging from these figures, deferred page freeing might not be
  necessary; batching alone may be sufficient.

  The implementation of preregistering netfront pages is very hacky.
  Whether to adopt it should be discussed.


- deferred page freeing.
  Currently this is actually a batched tlb flush; it doesn't defer page
  freeing.
  Judging from the counts of dqueue_flush_and_free_tlb_track_entries and
  dfree_queue_tlb_track_entry, it works. But I'm not sure it actually
  reduces overhead.
  Analysis based on profiling is probably needed.
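
  As a rough check on how much batching actually happens, the two counters
  named above can be divided to get the average number of tlb track entries
  handled per batched flush. This is only arithmetic on the figures at the
  end of this mail; the function name is made up for illustration.

```python
def entries_per_batch(queued, batched_flushes):
    """Average number of queued tlb track entries per batched flush."""
    return queued / batched_flushes

# dfree_queue_tlb_track_entry / dqueue_flush_and_free_tlb_track_entries
hdparm = entries_per_batch(896, 469)
wget = entries_per_batch(210350, 125912)

print(f"hdparm: {hdparm:.2f} entries per batched flush")
print(f"wget:   {wget:.2f} entries per batched flush")
```

  Both averages come out below 2, so the batches are small in these runs.
  That may be one reason the overhead reduction is hard to judge from the
  counters alone.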


- tlb flush clock
  This tries to reduce flushes of the vhpt and mTLB on vcpu context switch
  and on tlb entry flush.
  It works well on vcpu context switch, but it doesn't work well
  on tlb entry flush (tlbflush_clock_tlb_track_skip is 0 in both runs).
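
  This mail does not show how the tlb flush clock is implemented. The
  following is a toy model under my own assumptions, not the patch's code:
  a global logical clock ticks on every flush, each pcpu remembers the time
  of its last flush, and a flush is skipped when the pcpu has already been
  flushed after a given timestamp. The class and method names are
  hypothetical.

```python
class FlushClock:
    """Toy model of a tlb flush clock (hypothetical, not the patch's code)."""

    def __init__(self, n_pcpus):
        self.now = 0                     # global logical clock
        self.last_flush = [0] * n_pcpus  # clock value of each pcpu's last flush
        self.purged = 0                  # flushes actually performed
        self.skipped = 0                 # flushes avoided thanks to the clock

    def stamp(self):
        """Timestamp to remember, e.g. when a vcpu stops running on a pcpu."""
        return self.now

    def flush(self, pcpu):
        """Flush one pcpu's vhpt/mTLB and record when it happened."""
        self.now += 1
        self.last_flush[pcpu] = self.now

    def flush_if_needed(self, pcpu, stamp):
        """Flush only if the pcpu has not been flushed since `stamp`."""
        if self.last_flush[pcpu] > stamp:
            self.skipped += 1  # a later flush already removed stale entries
            return False
        self.flush(pcpu)
        self.purged += 1
        return True


clock = FlushClock(n_pcpus=2)
s = clock.stamp()                     # entries may have been inserted up to here
first = clock.flush_if_needed(0, s)   # no flush since the stamp: must purge
second = clock.flush_if_needed(0, s)  # already flushed after the stamp: skip
print(first, second, clock.purged, clock.skipped)
```

  In this model a skip is only possible when some other flush of the same
  pcpu happened after the timestamp was taken.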


- per vcpu vhpt
  From these performance counter figures, I can't interpret the effect of
  per vcpu vhpt.
  When the VP model is adopted, the xen VHPT size is reduced from 16MB to 64KB
  to reduce the overhead of domain_flush_vtlb_all().
  64KB was chosen just because it was the smallest size Xen/IA64 accepted.
  (I tried 32KB, the smallest VHPT size, but it didn't boot.
   I didn't track it down.)

  An appropriate VHPT size must be determined at some point.
  We might want to increase its size, which affects the overhead of
  domain_flush_vtlb_all() and vcpu context switch.
  One factor we should take into account is scalability.
  When the number of physical cpus is large (e.g. 64 or 128), but
  the number of vcpus of a domain is not so large (e.g. 4 or 8),
  per vcpu vhpt reduces the cost of domain_flush_vtlb_all().
  Without per vcpu vhpt, something like tracking dirty pcpus is necessary
  to get a similar overhead reduction.
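
  The scalability argument can be put into rough numbers. The sketch
  assumes that, without per vcpu vhpt and without dirty pcpu tracking, a
  full flush has to touch one VHPT per physical cpu, while with per vcpu
  vhpt it touches one VHPT per vcpu of the domain. The helper name is
  illustrative only, and real flush cost depends on more than table size.

```python
VHPT_SIZE_KB = 64  # xen VHPT size mentioned above

def vhpt_kb_to_flush(n_tables, vhpt_size_kb=VHPT_SIZE_KB):
    """Total VHPT memory (KB) touched by one full flush over n_tables VHPTs."""
    return n_tables * vhpt_size_kb

# (physical cpus, vcpus of the domain) pairs taken from the examples above
for pcpus, vcpus in [(64, 4), (128, 8)]:
    per_pcpu = vhpt_kb_to_flush(pcpus)  # assumed: one VHPT per physical cpu
    per_vcpu = vhpt_kb_to_flush(vcpus)  # one VHPT per vcpu of the domain
    print(f"{pcpus} pcpus / {vcpus} vcpus: "
          f"{per_pcpu} KB vs {per_vcpu} KB per full flush")
```

  If the VHPT size is increased later, both columns scale by the same
  factor, so the ratio between them stays the pcpu/vcpu ratio.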


*items                            rough description
dom0vp_phystomach                 number of p2m conversion hypercall
dom0vp_machtophys                 number of m2p conversion hypercall

create_grant_host_mapping         number of grant table page mapping
destroy_grant_host_mapping        number of grant table page unmapping
steal_page                        number of grant table page transfer

domain_flush_vtlb_all             number of calls to domain_flush_vtlb_all().
                                  This function flushes the vhpt and mTLB of
                                  all vcpus of a domain.
domain_flush_vtlb_track_entry     number of calls to
                                  domain_flush_vtlb_track_entry().
                                  This function flushes only one vhpt entry
                                  and the mTLB of the dirtied vcpus of a domain.

tlb_track_sar                     number of page zapping from a domain
tlb_track_sar_not_tracked         the virtual address of a page isn't tracked.
                                  whole vTLB flush is needed.
                                  i.e. domain_flush_vtlb_all() is called.
tlb_track_sar_not_found           tlb is tracked, but no tlb is inserted
                                  So when page is zapped, no tlb flush is
                                  needed.
tlb_track_sar_found               tlb is tracked and tlb insert of
                                  one virtual address is issued.
                                  So vTLB flush of this virtual address is
                                  needed.
                                  i.e. domain_flush_vtlb_track_entry() is called.
tlb_track_sar_many                tlb is tracked and tlb insert of
                                  more than one virtual address is issued.
                                  So whole vTLB flush is necessary.
                                  i.e. domain_flush_vtlb_all() is called.
dom0vp_tlb_track_page             number of tlb track register
dom0vp_tlb_untrack_page           number of tlb track unregister

dqueue_flush_and_free_tlb_track_entries
                                  number of batched flush of tlb track entry
dfree_queue_tlb_track_entry       number of tlb track entry queueing

tlbflush_clock_cswitch_purge      number of tlb flush when context switch
tlbflush_clock_cswitch_skip       number of skipping tlb flush when context
                                  switch due to tlbflush clock
tlbflush_clock_tlb_track_purge    number of tlb flush when tlb track entry
                                  is flushed
tlbflush_clock_tlb_track_skip     number of skipping tlb flush when tlb track
                                  entry is flushed.
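
The four tlb_track_sar_* cases described above amount to a small decision
table. The sketch below only restates those descriptions; the function and
its arguments are hypothetical and are not taken from the patch.

```python
def tlb_track_sar_action(tracked, n_inserted_vaddrs):
    """Map a page being zapped to (counter name, flush function or None).

    tracked:           whether the page is registered for tlb tracking
    n_inserted_vaddrs: number of distinct virtual addresses for which a
                       tlb insert was recorded for this page
    """
    if not tracked:
        # Nothing is known about the page: the whole vTLB must be flushed.
        return "tlb_track_sar_not_tracked", "domain_flush_vtlb_all"
    if n_inserted_vaddrs == 0:
        # Tracked but never inserted: no flush is needed.
        return "tlb_track_sar_not_found", None
    if n_inserted_vaddrs == 1:
        # Exactly one virtual address: flush just that entry.
        return "tlb_track_sar_found", "domain_flush_vtlb_track_entry"
    # More than one virtual address: fall back to a whole vTLB flush.
    return "tlb_track_sar_many", "domain_flush_vtlb_all"


for args in [(False, 0), (True, 0), (True, 1), (True, 3)]:
    print(args, "->", tlb_track_sar_action(*args))
```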


*hdparm -t /dev/hda6 x3
(XEN) dom0vp_phystomach                 TOTAL[         0]
(XEN) dom0vp_machtophys                 TOTAL[       502]

(XEN) create_grant_host_mapping         TOTAL[    131359]
(XEN) destroy_grant_host_mapping        TOTAL[    131359]
(XEN) steal_page_refcount               TOTAL[       429]
(XEN) steal_page                        TOTAL[       430]

(XEN) domain_flush_vtlb_all             TOTAL[         3]
(XEN) domain_flush_vtlb_track_entry     TOTAL[       851]
(XEN) domain_page_flush_and_put         TOTAL[    132243]
(XEN) tlb_track_sar                     TOTAL[    132245]
(XEN) tlb_track_sar_not_tracked         TOTAL[         3]
(XEN) tlb_track_sar_not_found           TOTAL[    131350]
(XEN) tlb_track_sar_found               TOTAL[       893]
(XEN) tlb_track_sar_many                TOTAL[         0]
(XEN) dom0vp_tlb_track_page             TOTAL[         2]
(XEN) dom0vp_tlb_untrack_page           TOTAL[         2]


(XEN) dqueue_flush_and_free_tlb_track_entries
                                        TOTAL[       469]
(XEN) dfree_queue_tlb_track_entry       TOTAL[       896]

(XEN) tlbflush_clock_cswitch_purge      TOTAL[     11821]
(XEN) tlbflush_clock_cswitch_skip       TOTAL[      1533]
(XEN) tlbflush_clock_tlb_track_purge    TOTAL[       902]
(XEN) tlbflush_clock_tlb_track_skip     TOTAL[         0]


*wget kernel source x3
(XEN) dom0vp_phystomach                 TOTAL[         0]
(XEN) dom0vp_machtophys                 TOTAL[    163589]

(XEN) create_grant_host_mapping         TOTAL[     57390]
(XEN) destroy_grant_host_mapping        TOTAL[     57390]
(XEN) steal_page_refcount               TOTAL[     86153]
(XEN) steal_page                        TOTAL[     86153]

(XEN) domain_flush_vtlb_all             TOTAL[        19] 
(XEN) domain_flush_vtlb_track_entry     TOTAL[    210230]
(XEN) domain_page_flush_and_put         TOTAL[    229734]
(XEN) tlb_track_sar                     TOTAL[    229738]
(XEN) tlb_track_sar_not_tracked         TOTAL[        70]
(XEN) tlb_track_sar_not_found           TOTAL[     19321]
(XEN) tlb_track_sar_found               TOTAL[    210348]
(XEN) tlb_track_sar_many                TOTAL[         0]
(XEN) dom0vp_tlb_track_page             TOTAL[        15]
(XEN) dom0vp_tlb_untrack_page           TOTAL[         9]

(XEN) dqueue_flush_and_free_tlb_track_entries
                                        TOTAL[    125912]
(XEN) dfree_queue_tlb_track_entry       TOTAL[    210350]


(XEN) tlbflush_clock_cswitch_purge      TOTAL[      8408]
(XEN) tlbflush_clock_cswitch_skip       TOTAL[      1186]
(XEN) tlbflush_clock_tlb_track_purge    TOTAL[    210284]
(XEN) tlbflush_clock_tlb_track_skip     TOTAL[         0]
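
For anyone who wants to redo this kind of analysis, a dump in the format
above can be parsed with a few lines of script. This is a minimal sketch,
not a Xen tool: the parser and the derived ratios are my own illustration,
and the sample is a subset of the hdparm figures above.

```python
import re

HDPARM_DUMP = """\
(XEN) domain_flush_vtlb_all             TOTAL[         3]
(XEN) domain_flush_vtlb_track_entry     TOTAL[       851]
(XEN) dqueue_flush_and_free_tlb_track_entries
                                        TOTAL[       469]
(XEN) tlbflush_clock_cswitch_purge      TOTAL[     11821]
(XEN) tlbflush_clock_cswitch_skip       TOTAL[      1533]
"""

def parse_counters(dump):
    """Parse '(XEN) name TOTAL[ n]' lines into a dict of name -> int."""
    counters = {}
    pending = None  # name whose TOTAL[...] wrapped onto the next line
    for line in dump.splitlines():
        name = re.match(r"\(XEN\)\s+(\w+)", line)
        total = re.search(r"TOTAL\[\s*(\d+)\]", line)
        if name and total:
            counters[name.group(1)] = int(total.group(1))
            pending = None
        elif name:
            pending = name.group(1)
        elif total and pending:
            counters[pending] = int(total.group(1))
            pending = None
    return counters

c = parse_counters(HDPARM_DUMP)

# Share of vTLB flushes that were full flushes rather than single entries.
full = c["domain_flush_vtlb_all"]
entry = c["domain_flush_vtlb_track_entry"]
full_share = full / (full + entry)

# Share of context switches where the tlb flush clock skipped the flush.
purge = c["tlbflush_clock_cswitch_purge"]
skip = c["tlbflush_clock_cswitch_skip"]
skip_share = skip / (purge + skip)

print(f"full flush share:   {full_share:.4f}")
print(f"cswitch skip share: {skip_share:.4f}")
```

The parser also handles the wrapped line where the counter name is too long
to fit next to its TOTAL[...] value.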

-- 
yamahata

_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-ia64-devel


 

