[Xen-devel] Re: domU using linux-2.6.37-xen-next pvops kernel with CONFIG_PARAVIRT_SPINLOCKS disabled results in 150% performance improvement (updated)
On 12/20/2010 05:03 PM, Dante Cinco wrote:
> (Sorry, I accidentally sent the previous post before finishing the
> summary table)
> For a couple of months now, we've been trying to track down the slow
> I/O performance in pvops domU. Our system has 16 Fibre Channel
> devices, all PCI-passthrough to domU. We were previously using a
> 2.6.32 (Ubuntu version) HVM kernel and were getting 511k IOPS. We
> switched to pvops with Konrad's xen-pcifront-0.8.2 kernel and were
> disappointed to see the performance degrade to 11k IOPS. After
> disabling some kernel debug options including KMEMLEAK, the
> performance jumped to 186k IOPS but still well below what we were
> getting with the HVM kernel. We tried disabling spinlock debugging in
> the kernel but it actually resulted in a drop in performance to 70k IOPS.
> Last week we switched to linux-2.6.37-xen-next and with the same
> kernel debug options disabled, the I/O performance was slightly better
> at 211k IOPS. We tried disabling spinlock debugging again and saw a
> similar drop in performance to 58k IOPS. We searched around for any
> performance-related posts regarding pvops and found two references to
> CONFIG_PARAVIRT_SPINLOCKS (one from Jeremy and one from Konrad):
> Both posts recommended (Konrad strongly) enabling PARAVIRT_SPINLOCKS
> when running under Xen. Since it's enabled by default, we decided to
> see what would happen if we disabled CONFIG_PARAVIRT_SPINLOCKS. With
> the spinlock debugging enabled, we were getting 205k IOPS but with
> spinlock debugging disabled, the performance leaped to 522k IOPS !!!
> I'm assuming that this behavior is unexpected.
Yeah, that would be one way to put it.
> Here's a summary of the kernels, config changes and performance (in
>                        pcifront    linux
>                        0.8.2       2.6.37-xen-next
>                        pvops       pvops
> debugging enabled,     186k        205k
> debugging disabled,    70k         58k
> debugging disabled,    247k        522k
Spinlock debugging ends up bypassing all the paths that
PARAVIRT_SPINLOCKS affects, so that's consistent with the problem being
the paravirt locking code.
Basically, there are 3 reasons paravirt spinlocks could slow things down, plus a catch-all:
1. the overhead of calling into the pv lock code is costing a lot
(very hard to imagine how it would cause this degree of slowdown)
2. you're hitting the spinlock slowpath very often, and end up making
lots of hypercalls
3. your system and/or workload gets a very strong benefit from the
ticket lock's FIFO properties
4. (something else entirely)
When you're running with PARAVIRT_SPINLOCKS=y, are you getting a lot of
counts on the per-cpu spinlock irqs?
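A quick way to total those counts, as a rough sketch only: it assumes the per-cpu spinlock interrupts show up in /proc/interrupts with names beginning with "spinlock" (e.g. spinlock0), which may not hold for every kernel version. The function name is made up for illustration.

```python
def spinlock_irq_counts(interrupts_text, prefix="spinlock"):
    """Return {irq_name: total_count} for /proc/interrupts lines whose
    last field starts with `prefix` (assumed naming, e.g. "spinlock0")."""
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) < 2 or not fields[-1].startswith(prefix):
            continue
        # fields[0] is the IRQ number ("74:"), followed by one counter per CPU
        per_cpu = [int(f) for f in fields[1:1 + ncpus] if f.isdigit()]
        counts[fields[-1]] = sum(per_cpu)
    return counts

# Typical use inside the domU:
#   with open("/proc/interrupts") as f:
#       print(spinlock_irq_counts(f.read()))
```

Sampling this before and after a benchmark run and comparing the totals gives a feel for how often the slowpath is being kicked.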
What happens if you raise the "timeout" threshold? If you have
XEN_DEBUG_FS enabled, you can do that on the fly by writing it to
/sys/kernel/debug/xen/spinlocks/timeout, or adjust TIMEOUT in
arch/x86/xen/spinlock.c. In theory, if you set it very large it should
have the same effect as just disabling PARAVIRT_SPINLOCKS (except still
using byte locks). That should help isolate which of the possibilities
above are coming into play.
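For the on-the-fly route, something like the following could be used. It is a sketch under assumptions: debugfs is mounted at /sys/kernel/debug, the script runs as root, and the file accepts and returns a plain decimal integer. The helper name is hypothetical.

```python
import os

# Path as given above; only present when XEN_DEBUG_FS is enabled.
TIMEOUT_PATH = "/sys/kernel/debug/xen/spinlocks/timeout"

def set_spin_timeout(value, path=TIMEOUT_PATH):
    """Write a new timeout value and return what the file reads back."""
    with open(path, "w") as f:
        f.write(str(int(value)))
    with open(path) as f:
        return int(f.read().strip())

# Example (as root in the domU): set_spin_timeout(1 << 30)
```

Reading the value back is just a sanity check that the write took effect before rerunning the I/O test.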
The other data in /sys/kernel/debug/xen/spinlocks could be helpful in
working out what's going on as well.
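To capture that data around a benchmark run, a small snapshot helper along these lines might do. It assumes every entry in the directory is a small text file, and makes no assumption about which file names exist; the function name is made up.

```python
import os

def read_debugfs_dir(path="/sys/kernel/debug/xen/spinlocks"):
    """Return {file_name: contents} for each regular file under `path`."""
    stats = {}
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if not os.path.isfile(full):
            continue
        try:
            with open(full) as f:
                stats[name] = f.read().strip()
        except OSError:
            stats[name] = None  # unreadable entry; keep the name for reference
    return stats

# Example: before = read_debugfs_dir(); run the workload; after = read_debugfs_dir()
```

Comparing the two snapshots shows which counters moved during the run.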
Do you know if there are specific spinlocks being particularly pounded
on by your workload? I'm guessing some specific to your hardware.