
Re: [Xen-devel] an issue with 'xm save'



On 2012-09-27 19:59, Konrad Rzeszutek Wilk wrote:
On Thu, Sep 27, 2012 at 01:58:19PM +0800, Zhenzhong Duan wrote:

On 2012-09-26 20:35, Konrad Rzeszutek Wilk wrote:
On Wed, Sep 26, 2012 at 04:48:42PM +0800, Zhenzhong Duan wrote:
Konrad Rzeszutek Wilk wrote:
On Fri, Sep 21, 2012 at 05:41:27PM +0800, Zhenzhong Duan wrote:
Hi maintainers,

I found an issue when doing 'xm save' of a PVM guest. See below:

When I did a save followed by a restore once, CPU(%) in xentop showed around 99%.
When I did that a second time, CPU(%) showed 199%.

top in dom0 showed:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
20946 root      18  -2 10984 1284  964 S 19.8  0.3   0:48.93 block
 4939 root      18  -2 10984 1288  964 S 19.5  0.3   1:34.68 block

I could kill the block processes, and then everything looked normal again.
What is the 'block' process? If you attach 'perf' to it, do you get an idea
of what it is spinning on?
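For example, something along these lines would show where it is spinning (a
sketch; assumes perf is installed in dom0, and uses the first PID from the
top output above):

perf top -p 20946                  # sample the spinning process live
# or record ~10 seconds and inspect afterwards:
perf record -p 20946 -- sleep 10
perf report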
It's /etc/xen/scripts/block.
I added 'set -x' to /etc/xen/scripts/block and found it blocked in claim_lock.
When the domU was created the first time, claim_lock/release_lock finished quickly;
when 'xm save' was called, claim_lock spun in its own while loop.
I can confirm that no other domU create/save/etc. happened while I was testing.
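For reference, the flock-based locking follows roughly this open/flock/re-check
pattern (a simplified sketch with an illustrative path and fd number, not the
exact locking.sh code):

lockfile="/var/run/xen-hotplug/block.lock"   # illustrative path
lockfd=200                                   # illustrative fd number

claim_lock() {
    while true; do
        eval "exec $lockfd>>\"$lockfile\""   # open the lock file, creating it if needed
        flock -x "$lockfd"                   # block until the exclusive lock is ours
        # If another holder removed and re-created the file while we were
        # blocked, our fd refers to a stale inode; loop and try again.
        [ "$lockfile" -ef "/proc/$$/fd/$lockfd" ] && break
    done
}

release_lock() {
    rm -f "$lockfile"    # the file is removed, but the flock on the fd is never dropped
}

The while loop in claim_lock is where the block process spins.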
OK, so how come you have two block processes? Is it because you have two
disks attached to the guest? There are multiple claim_lock calls in the shell
script - do you know where each of the two processes is spinning? Are they
spinning in the same function?
In the above test, I ran save/restore twice, hence the two block processes.
In another test, where I ran save/restore once, there was only one block process.
After doing 'xm save', I see the block process spinning at line 328:
321   remove)
322     case $t in
323       phy)
324         exit 0
325         ;;
326
327       file)
328         claim_lock "block"
329         node=$(xenstore_read "$XENBUS_PATH/node")
330         losetup -d "$node"
331         release_lock "block"
332         exit 0
333         ;;
So with the patches in OVM - do they have this fixed? Can they be upstreamed,
or are they dependent on some magic OVM sauce?
After replacing locking.sh with OVM's, it worked.
But the xen tools have since moved to flock-based locking, so we can't simply revert.
It seems changeset 25595:497e2fe49455 introduced the issue.
Finally, I came up with a small patch that works around it.

diff -r d364becfb083 tools/hotplug/Linux/locking.sh
--- a/tools/hotplug/Linux/locking.sh    Thu Sep 20 13:31:19 2012 +0200
+++ b/tools/hotplug/Linux/locking.sh    Fri Sep 28 18:27:31 2012 +0800
@@ -66,6 +66,7 @@
 release_lock()
 {
     _setlockfd $1
+    flock -u $_lockfd
     rm "$_lockfile"
 }
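
The reason the explicit unlock matters: an flock is held on the inode, not on
the file name, so rm alone does not release it while any process still has the
fd open. A minimal standalone demonstration (hypothetical file name):

exec 9>>/tmp/demo.lock    # open (or create) the lock file on fd 9
flock -x 9                # take the exclusive lock
rm /tmp/demo.lock         # unlink the name; the lock on the inode remains
# Anyone who opened /tmp/demo.lock before the rm still blocks on the old
# inode until the fd is closed or the lock is dropped explicitly:
flock -u 9                # the explicit unlock the patch adds before rm
exec 9>&-                 # close the fd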

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel