Re: [Xen-devel] [PATCH RFC V4 0/5] kvm : Paravirt-spinlock support for KVM guests
On 01/18/2012 12:06 AM, Raghavendra K T wrote:
> On 01/17/2012 11:09 PM, Alexander Graf wrote:
[...]
>>> A. pre-3.2.0 with CONFIG_PARAVIRT_SPINLOCKS = n
>>> B. pre-3.2.0 + Jeremy's above patches with CONFIG_PARAVIRT_SPINLOCKS = n
>>> C. pre-3.2.0 + Jeremy's above patches with CONFIG_PARAVIRT_SPINLOCKS = y
>>> D. pre-3.2.0 + Jeremy's above patches + V5 patches with CONFIG_PARAVIRT_SPINLOCKS = n
>>> E. pre-3.2.0 + Jeremy's above patches + V5 patches with CONFIG_PARAVIRT_SPINLOCKS = y
[...]
>> Maybe it'd be a good idea to create a small in-kernel microbenchmark
>> with a couple of threads that take spinlocks, then do work for a
>> specified number of cycles, then release them again and start anew.
>> At the end of it, we can check how long the whole thing took for n
>> runs. That would enable us to measure the worst case scenario.
>
> It was a quick test: two iterations of kernbench (= 6 runs), and I had
> ensured the cache was cleared:
>
>   echo "1" > /proc/sys/vm/drop_caches
>   ccache -C
>
> Yes, maybe I can run a test as you mentioned.

Sorry for the late reply. I was trying to do more performance analysis.

I measured the worst case scenario with a spinlock stress driver
[ attached below ]. I think S1 (below) is what you were looking for.

Two types of scenarios:

S1:
    lock()
    increment counter
    unlock()

S2:
    do_somework()
    lock()
    do_conditional_work()   /* this is to give variable spinlock hold time */
    unlock()

Setup:
Machine: IBM xSeries with Intel(R) Xeon(R) x5570 2.93GHz CPU, 8 cores,
64GB RAM, 16 online cpus.

The results below are taken across a total of 18 runs of:

    insmod spinlock_thread.ko nr_spinlock_threads=4 loop_count=4000000

Results:

scenario S1: plain counter
==========================
      total Mega cycles taken for completion (std)
A.    12343.833333 (1254.664021)
B.    12817.111111 (917.791606)
C.    13426.555556 (844.882978)

%improvement w.r.t BASE (A):   -8.77

scenario S2: counter with variable work inside lock + do_work_outside_lock
===========================================================================
      total Mega cycles taken for completion (std)
A.    25077.888889 (1349.471703)
B.    24906.777778 (1447.853874)
C.    21287.000000 (2731.643644)

%improvement w.r.t BASE (A):   15.12

So it seems we have a worst case overhead of around 8%, but we see an
improvement of at least 15% once a little more time is spent in the
critical section.

>>> Guest Run
>>> =========
>>>   case A              case B              %improvement   case C             %improvement
>>>   166.999 (15.7613)   161.876 (14.4874)   3.06768        161.24 (12.6497)   3.44852
>>
>> Is this the same machine? Why is the guest 3x slower?
>
> Yes, non-PLE machine, but with all 16 cpus online.
>
> By 3x slower, did you mean case A (pre-3.2.0 with
> CONFIG_PARAVIRT_SPINLOCKS = n) is slower?

Got your point; there were multiple reasons. The guest was 32 bit and
had only 8 vcpus, and its RAM was only 1GB (max 4GB); when I increased
it to 4GB, it came down to around just 127 seconds.

There is happy news: I created a new 64 bit guest and ran with 16GB RAM
and 16 VCPUs. Kernbench with pv spinlocks (case E) took just around
42 sec (against 57 sec on the host), an improvement of around 26% over
the host. So it is much faster, rather than 3x slower.
Attachment: spinlock_thread.c
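(The attachment itself is not reproduced in this archive. Purely as an
illustration, and not the actual spinlock_thread.c, a minimal sketch of a
stress module implementing the S1 plain-counter scenario could look like
the following. It reuses the nr_spinlock_threads and loop_count parameter
names from the insmod line above; everything else, including the file,
function, and variable names, is hypothetical.)

/*
 * spinlock_stress.c - hypothetical sketch, NOT the original spinlock_thread.c
 *
 * Spawns nr_spinlock_threads kernel threads; each performs the S1 scenario
 * (lock, increment a shared counter, unlock) loop_count times and reports
 * how many cycles its loop took.
 */
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/spinlock.h>
#include <linux/sched.h>
#include <linux/timex.h>    /* get_cycles() */
#include <linux/err.h>

#define MAX_THREADS 64

static int nr_spinlock_threads = 4;
module_param(nr_spinlock_threads, int, 0444);

static int loop_count = 4000000;
module_param(loop_count, int, 0444);

static DEFINE_SPINLOCK(stress_lock);
static unsigned long shared_counter;
static struct task_struct *tasks[MAX_THREADS];

static int stress_thread(void *data)
{
    long id = (long)data;
    cycles_t start = get_cycles();
    int i;

    for (i = 0; i < loop_count; i++) {
        spin_lock(&stress_lock);
        shared_counter++;       /* S1: trivial critical section */
        spin_unlock(&stress_lock);
    }

    pr_info("spinstress/%ld: %llu cycles for %d iterations\n",
            id, (unsigned long long)(get_cycles() - start), loop_count);

    /* Park until module unload so kthread_stop() always has a live task. */
    while (!kthread_should_stop())
        schedule_timeout_interruptible(HZ);
    return 0;
}

static int __init stress_init(void)
{
    int i;

    if (nr_spinlock_threads > MAX_THREADS)
        nr_spinlock_threads = MAX_THREADS;

    for (i = 0; i < nr_spinlock_threads; i++) {
        tasks[i] = kthread_run(stress_thread, (void *)(long)i,
                               "spinstress/%d", i);
        if (IS_ERR(tasks[i])) {
            tasks[i] = NULL;
            break;
        }
    }
    return 0;
}

static void __exit stress_exit(void)
{
    int i;

    for (i = 0; i < nr_spinlock_threads; i++)
        if (tasks[i])
            kthread_stop(tasks[i]);

    pr_info("spinlock stress: final counter = %lu\n", shared_counter);
}

module_init(stress_init);
module_exit(stress_exit);
MODULE_LICENSE("GPL");

Each thread prints its cycle count to dmesg when its loop finishes; summing
those across the threads and dividing by 10^6 would give a "Mega cycles"
figure comparable in spirit (though not necessarily in methodology) to the
numbers reported above.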