[Xen-devel] Probable Xen bug triggered by localhost migration
Once again I have had a test fail during "10 migrations of a PV domain to localhost", with an apparent Xen or dom0 lockup or other serious problem.

Failure modes include:

* dom0 reporting soft lockup BUGs (showing xl stuck in a privcmd ioctl, apparently in a hypercall)
* dom0 disk controller failure due to apparent lost/stuck interrupt (dom0 decides disk not working, tries unsuccessfully to reset)
* apparent dom0 lockup or networking failure

Problems occur with both XCP 2.6.27 and pvops 2.6.32 kernels. Problems seem only to happen with xl but that's likely to be because it's due to a race; xl and xend will make various calls in different orders and with different timing.

Having added some machinery to request Xen debug keys, I now have some more information:

http://www.chiark.greenend.org.uk/~xensrcts/logs/5639/test-amd64-i386-xl-credit2/info.html

The most relevant files there are these:

http://www.chiark.greenend.org.uk/~xensrcts/logs/5639/test-amd64-i386-xl-credit2/14.ts-guest-localmigrate.log

That shows the failure. The test harness ssh's to the dom0 to run "xl migrate" and gets "No route to host", which typically means it has stopped responding to arp requests. In this particular case the failure happened after an apparently-successful previous migration, but the more common failure mode is that "xl migrate" prints the 0% progress message and then nothing else gets through.

http://www.chiark.greenend.org.uk/~xensrcts/logs/5639/test-amd64-i386-xl-credit2/serial-woodlouse.log

Serial log. Scroll to around "Feb 4 03:30:35" (timestamps, and the messages about clients connecting and disconnecting, are from the serial concentrator).
You'll see a series of debug key outputs, which you can correlate with the test harness's requests, listed with timestamps here:

http://www.chiark.greenend.org.uk/~xensrcts/logs/5639/test-amd64-i386-xl-credit2/15.ts-logs-capture.log

After the Xen debug keys have been run through, the test harness sends the "q" guest debug key, which also produces the output you can see in the serial log.

Then the test harness switches the serial back to dom0 and sends RET and we can see dom0 produce a new login prompt. So dom0 is not entirely dead.

However, later entries in the "ts-logs-capture" log show that it still isn't responding to the network, and eventually the test harness decides to power cycle the host and collect what remains from the dom0 filesystem. So that's why you see a pile of boot messages at the end of the test log - these should be disregarded.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
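
For anyone wanting to try reproducing this by hand, the failing step (ssh to dom0, run "xl migrate" to localhost, repeat ten times) can be approximated roughly as below. This is only a sketch and not the actual test harness code: the function names, the `root@` login and the host and guest names in the usage comment are assumptions made for illustration, and the real harness does considerably more (log capture, debug keys, power cycling).

```python
import subprocess
from typing import Callable, List


def migrate_command(dom0: str, guest: str) -> List[str]:
    """Build the ssh command asking dom0 to migrate `guest` to localhost.

    Logging in as root is an assumption; adjust for the host in question.
    """
    return ["ssh", f"root@{dom0}", "xl", "migrate", guest, "localhost"]


def run_localhost_migrations(
    dom0: str,
    guest: str,
    count: int = 10,
    runner: Callable[[List[str]], int] = subprocess.call,
) -> int:
    """Attempt `count` localhost migrations, stopping at the first failure.

    Returns the number of migrations that completed with exit status 0.
    A nonzero status covers both xl failing on dom0 and ssh being unable
    to connect at all (ssh itself exits with 255 on a connection error
    such as "No route to host").
    """
    for done in range(count):
        status = runner(migrate_command(dom0, guest))
        if status != 0:
            return done
    return count


# Hypothetical usage (host and guest names are placeholders):
#   completed = run_localhost_migrations("dom0-host", "guest-name")
#   print(f"{completed} of 10 migrations succeeded")
```

Since the problem looks like a race, a run that completes all ten migrations does not show the bug is absent; several runs may be needed before it triggers.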