Re: [Xen-devel] State of GPLPV tests - 28.11.11


  • To: "Andreas Kinzler" <ml-xen-devel@xxxxxx>
  • From: "James Harper" <james.harper@xxxxxxxxxxxxxxxx>
  • Date: Wed, 30 Nov 2011 09:39:32 +1100
  • Cc: xen-devel@xxxxxxxxxxxxxxxxxxx
  • Delivery-date: Tue, 29 Nov 2011 22:40:44 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>
  • Thread-index: AcyuuRQ0qh1MgRFaSEehkp8pCHodrQALndJg
  • Thread-topic: [Xen-devel] State of GPLPV tests - 28.11.11

> 
> On 29.11.2011 00:16, James Harper wrote:
> >> I am still running tests 7 days a week on two test systems. Results
> >> are quite discouraging though. After experiencing crash after crash I
> >> wanted to test if the configuration I called "stable" (Xen 4.0.1,
> >> GPLPV 0.11.0.213, dom0 kernel 2.6.32.18-pvops0-ak3) was stable indeed.
> >> But even that config crashed when running my torture test. It is
> >> stable on our production systems (running other workloads, of course).
> > What crash are you getting these days? Is it the same one as you used
> > to get?
> 
> Yes, still exactly the same crashes.
> 
> Good news: I think I have found the bug. Since I am not really a Xen or
> Windows kernel developer I cannot say for sure, but here is what I found:
> 
> When the domU hung I ran xentop and found that the number of vbd read
> requests was a number like 0x7FFFzzzz in hex, which led me to a thesis:
> GPLPV crashes as soon as the number of disk requests reaches 2^32. On my
> hardware with 5000 IOPS this is reached in
> 2^32 / 5000 IOPS / 3600 sec-per-hour / 24 hours-per-day = 9.94 days.
> And there we go: there are the 9-10 days I was always seeing.
> 
> I studied the source code of blkback/blktap/aio and found nothing. But in
> GPLPV and its use of the ring macros I found suspicious code in every
> version of GPLPV I ever used:
> 
>    while (more_to_do)
>    {
>      rp = xvdd->ring.sring->rsp_prod;
>      KeMemoryBarrier();
>      for (i = xvdd->ring.rsp_cons; i < rp; i++)
>      {
>        rep = XenVbd_GetResponse(xvdd, i);
> 
> If now rp is 10, for example, and xvdd->ring.rsp_cons is 0xFFFFFFF7, then
> the for loop is skipped, responses are not delivered and we see the hang.
> 

Good work! I'm impressed :)

I'll get straight on that... I must have gone wrong somewhere very early
on in development.

James

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel