Re: [Xen-devel] State of GPLPV tests - 28.11.11
On 29.11.2011 00:16, James Harper wrote:
>> I am still running tests 7 days a week on two test systems. Results
>> are quite discouraging though. After experiencing crash after crash I
>> wanted to test whether the configuration I called "stable" (Xen 4.0.1,
>> GPLPV 0.11.0.213, dom0 kernel 2.6.32.18-pvops0-ak3) was indeed stable.
>> But even that config crashed when running my torture test. It is
>> stable on our production systems - running other workloads, of course.
>
> What crash are you getting these days? Is it the same one as you used
> to get?

Yes, still exactly the same crashes.

Good news: I think I have found the bug. Since I am not really a Xen or
Windows kernel developer I cannot say for sure, but here is what I found:

When the domU hung, I ran xentop and found that the number of vbd read
requests was a number like 0x7FFFzzzz in hex, which led me to a
hypothesis: GPLPV crashes as soon as the number of disk requests reaches
2^32. On my hardware, which sustains about 5000 IOPS, that point is
reached after

    2^32 / 5000 IOPS / 3600 sec-per-hour / 24 hours-per-day = 9.94 days

And there we go: those are the 9-10 days I was always seeing.

I studied the source code of blkback/blktap/aio and found nothing. But
in GPLPV and its use of the ring macros I found suspicious code, present
in every version of GPLPV I have ever used:

    while (more_to_do)
    {
        rp = xvdd->ring.sring->rsp_prod;
        KeMemoryBarrier();
        for (i = xvdd->ring.rsp_cons; i < rp; i++)
        {
            rep = XenVbd_GetResponse(xvdd, i);

If rp is now 10, for example, and xvdd->ring.rsp_cons is 0xFFFFFFF7,
then the for loop is skipped, responses are not delivered, and we see
the hang.

Regards
Andreas

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel