
Re: [Xen-devel] Poor SMP performance pv_ops domU



On 05/18/2010 10:34 AM, John Morrison wrote:
> Hi,
>
> Over the last year we have tried many times to get acceptable performance 
> from pv_ops kernels.
>
> Tests were run with 1, 2, 4 and 8 cores. The more cores, the lower the score.
>
> Inside the domU all cores are visible, and top -s shows all of them in use.
> xentop in dom0 never shows over 99% CPU.
>
> The 2.6.18.8-xenU kernel shows over 700% CPU, and its scores are about 8x the
> pv_ops scores.
>
> Any ideas?
>

Well, I guess some kind of bad serialization is going on in there, and
it should be fairly obvious with a bit of examination.

Have you tried building your own pvops domU kernels?  Does enabling PV
spinlocks make any difference?  Enabling some of the lock
debugging/profiling/contention-monitoring options may also give useful
results.
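
As a rough sketch of how to check what a given kernel was built with: the
script below looks for two config options in the kernel config file. It
assumes the options of interest are CONFIG_PARAVIRT_SPINLOCKS and
CONFIG_LOCK_STAT, and that the distro installs the config as
/boot/config-<release>; both are assumptions about the setup, and the
helper names are only illustrative.

```python
import os
import platform

# Options assumed relevant here; adjust the list for your kernel version.
OPTIONS = ("CONFIG_PARAVIRT_SPINLOCKS", "CONFIG_LOCK_STAT")


def parse_kconfig(text):
    """Return {option: value} for every 'CONFIG_X=value' line."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        # Disabled options appear as '# CONFIG_X is not set' and are skipped.
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            opts[name] = value
    return opts


def report(text, wanted=OPTIONS):
    """Map each wanted option to its value, or 'not set' if absent."""
    opts = parse_kconfig(text)
    return {name: opts.get(name, "not set") for name in wanted}


if __name__ == "__main__":
    # Assumed location of the running kernel's config file.
    path = "/boot/config-" + platform.release()
    if os.path.exists(path):
        with open(path) as f:
            for name, value in report(f.read()).items():
                print(name, "=", value)
    else:
        print("no config found at", path)
```

If CONFIG_LOCK_STAT is enabled, the contention statistics should then be
readable from /proc/lock_stat inside the domU.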

Can you post the corresponding 2.6.18 results?  Are there specific
sub-tests which show the effect more strongly than the others?
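
One way to look for such sub-tests, as a minimal sketch: take the INDEX
column of two runs and rank the tests by the ratio between them. The values
below are copied from the quoted 1-core and 8-core tables; the function name
is only illustrative.

```python
# INDEX values copied from the quoted 1-core and 8-core tables.
INDEX_1_CORE = {
    "Dhrystone 2 using register variables": 237.6,
    "Double-Precision Whetstone": 253.2,
    "Execl Throughput": 83.3,
    "File Copy 1024 bufsize 2000 maxblocks": 240.3,
    "File Copy 256 bufsize 500 maxblocks": 165.1,
    "File Read 4096 bufsize 8000 maxblocks": 418.5,
    "Pipe-based Context Switching": 55.3,
    "Pipe Throughput": 42.8,
    "Process Creation": 58.5,
    "Shell Scripts (8 concurrent)": 85.0,
    "System Call Overhead": 43.6,
}
INDEX_8_CORE = {
    "Dhrystone 2 using register variables": 264.7,
    "Double-Precision Whetstone": 90.9,
    "Execl Throughput": 84.2,
    "File Copy 1024 bufsize 2000 maxblocks": 220.7,
    "File Copy 256 bufsize 500 maxblocks": 157.0,
    "File Read 4096 bufsize 8000 maxblocks": 362.6,
    "Pipe-based Context Switching": 52.3,
    "Pipe Throughput": 40.3,
    "Process Creation": 51.8,
    "Shell Scripts (8 concurrent)": 84.4,
    "System Call Overhead": 47.0,
}


def rank_by_slowdown(before, after):
    """Return (test, after/before) pairs, largest regression first."""
    ratios = [(name, after[name] / before[name]) for name in before]
    return sorted(ratios, key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, ratio in rank_by_slowdown(INDEX_1_CORE, INDEX_8_CORE):
        print(f"{name:45s} {ratio:5.2f}")
```

On the quoted figures, Double-Precision Whetstone is the only sub-test that
drops substantially between the 1-core and 8-core runs; the rest stay within
roughly 15% either way.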

How does the 2.6.32 kernel fare when booted native?

Thanks,
    J

>
> John
>
>
> 1-core
>
> BYTE UNIX Benchmarks (Version 4.1-wht.2)
> System -- Linux test 2.6.32-21-server #32-Ubuntu SMP Fri Apr 16 09:17:34 UTC 
> 2010 x86_64 GNU/Linux
> /dev/xvda1           141110136   1066476 132875660   1% /
>
> Start Benchmark Run: Tue May 18 13:54:54 BST 2010
>  13:54:54 up 0 min,  1 user,  load average: 0.00, 0.00, 0.00
>
> End Benchmark Run: Tue May 18 14:06:12 BST 2010
>  14:06:12 up 11 min,  2 users,  load average: 11.48, 5.20, 2.43
>
>
>                      INDEX VALUES
> TEST                                        BASELINE     RESULT      INDEX
>
> Dhrystone 2 using register variables        376783.7  8950813.0      237.6
> Double-Precision Whetstone                      83.1     2103.7      253.2
> Execl Throughput                               188.3     1568.4       83.3
> File Copy 1024 bufsize 2000 maxblocks         2672.0    64198.0      240.3
> File Copy 256 bufsize 500 maxblocks           1077.0    17781.0      165.1
> File Read 4096 bufsize 8000 maxblocks        15382.0   643717.0      418.5
> Pipe-based Context Switching                 15448.6    85379.4       55.3
> Pipe Throughput                             111814.6   478490.1       42.8
> Process Creation                               569.3     3329.6       58.5
> Shell Scripts (8 concurrent)                    44.8      380.7       85.0
> System Call Overhead                        114433.5   498712.3       43.6
>                                                                  =========
>      FINAL SCORE                                                     114.1
>
> 2-cores
>
> ==============================================================
> BYTE UNIX Benchmarks (Version 4.1-wht.2)
> System -- Linux test 2.6.32-21-server #32-Ubuntu SMP Fri Apr 16 09:17:34 UTC 
> 2010 x86_64 GNU/Linux
> /dev/xvda1           141110136   1066548 132875588   1% /
>
> Start Benchmark Run: Tue May 18 14:07:27 BST 2010
>  14:07:27 up 0 min,  1 user,  load average: 0.00, 0.00, 0.00
>
> End Benchmark Run: Tue May 18 14:18:04 BST 2010
>  14:18:04 up 10 min,  1 user,  load average: 12.78, 5.53, 2.49
>
>
>                      INDEX VALUES
> TEST                                        BASELINE     RESULT      INDEX
>
> Dhrystone 2 using register variables        376783.7 10124838.6      268.7
> Double-Precision Whetstone                      83.1     1188.7      143.0
> Execl Throughput                               188.3     1596.2       84.8
> File Copy 1024 bufsize 2000 maxblocks         2672.0    58323.0      218.3
> File Copy 256 bufsize 500 maxblocks           1077.0    17776.0      165.1
> File Read 4096 bufsize 8000 maxblocks        15382.0   568217.0      369.4
> Pipe-based Context Switching                 15448.6    86111.3       55.7
> Pipe Throughput                             111814.6   469957.8       42.0
> Process Creation                               569.3     3298.1       57.9
> Shell Scripts (8 concurrent)                    44.8      378.9       84.6
> System Call Overhead                        114433.5   532828.4       46.6
>                                                                  =========
>      FINAL SCORE                                                     107.9
>
> 4-cores
>
> ==============================================================
> BYTE UNIX Benchmarks (Version 4.1-wht.2)
> System -- Linux test 2.6.32-21-server #32-Ubuntu SMP Fri Apr 16 09:17:34 UTC 
> 2010 x86_64 GNU/Linux
> /dev/xvda1           141110136   1066628 132875508   1% /
>
> Start Benchmark Run: Tue May 18 14:19:17 BST 2010
>  14:19:17 up 0 min,  1 user,  load average: 0.00, 0.00, 0.00
>
> End Benchmark Run: Tue May 18 14:29:53 BST 2010
>  14:29:53 up 10 min,  1 user,  load average: 13.59, 6.35, 2.97
>
>
>                      INDEX VALUES
> TEST                                        BASELINE     RESULT      INDEX
>
> Dhrystone 2 using register variables        376783.7 10185429.8      270.3
> Double-Precision Whetstone                      83.1      759.8       91.4
> Execl Throughput                               188.3     1386.2       73.6
> File Copy 1024 bufsize 2000 maxblocks         2672.0    62331.0      233.3
> File Copy 256 bufsize 500 maxblocks           1077.0    16492.0      153.1
> File Read 4096 bufsize 8000 maxblocks        15382.0   563402.0      366.3
> Pipe-based Context Switching                 15448.6    87176.0       56.4
> Pipe Throughput                             111814.6   481068.1       43.0
> Process Creation                               569.3     3128.9       55.0
> Shell Scripts (8 concurrent)                    44.8      394.9       88.1
> System Call Overhead                        114433.5   539996.1       47.2
>                                                                  =========
>      FINAL SCORE                                                     102.6
>
> 8-cores
>
> ==============================================================
> BYTE UNIX Benchmarks (Version 4.1-wht.2, 8 threads)
> System -- Linux test 2.6.32-21-server #32-Ubuntu SMP Fri Apr 16 09:17:34 UTC 
> 2010 x86_64 GNU/Linux
> /dev/xvda1           141110136   1066680 132875456   1% /
>
> Start Benchmark Run: Tue May 18 14:30:59 BST 2010
>  14:30:59 up 0 min,  1 user,  load average: 0.07, 0.02, 0.00
>
> End Benchmark Run: Tue May 18 14:42:52 BST 2010
>  14:42:52 up 12 min,  1 user,  load average: 25.56, 10.84, 4.96
>
>
>                      INDEX VALUES
> TEST                                        BASELINE     RESULT      INDEX
>
> Dhrystone 2 using register variables        376783.7  9972130.3      264.7
> Double-Precision Whetstone                      83.1      755.2       90.9
> Execl Throughput                               188.3     1584.7       84.2
> File Copy 1024 bufsize 2000 maxblocks         2672.0    58981.0      220.7
> File Copy 256 bufsize 500 maxblocks           1077.0    16904.0      157.0
> File Read 4096 bufsize 8000 maxblocks        15382.0   557735.0      362.6
> Pipe-based Context Switching                 15448.6    80738.2       52.3
> Pipe Throughput                             111814.6   450891.2       40.3
> Process Creation                               569.3     2948.5       51.8
> Shell Scripts (8 concurrent)                    44.8      378.1       84.4
> System Call Overhead                        114433.5   537443.2       47.0
>                                                                  =========
>      FINAL SCORE                                                     100.9
>
>
>
> --
> Professional hosting without compromise
> www.clustered.net
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel
>
>   




 

