
Re: [Xen-devel] Re: [PATCH] blkfront: Move blkif_interrupt into a tasklet.



 On 09/24/2010 12:14 AM, Andrew Jones wrote:
> On 09/23/2010 08:36 PM, Jeremy Fitzhardinge wrote:
>>  On 09/23/2010 09:38 AM, Paolo Bonzini wrote:
>>> On 09/23/2010 06:23 PM, Jeremy Fitzhardinge wrote:
>>>>> Any developments with this? I've got a report of the exact same
>>>>> warnings on a RHEL6 guest. See
>>>>>
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=632802
>>>>>
>>>>> RHEL6 doesn't have the 'Move blkif_interrupt into a tasklet' patch, so
>>>>> that can be ruled out. Unfortunately I don't have this reproducing on a
>>>>> test machine, so it's difficult to debug.  The report I have showed
>>>>> that in at least one case it occurred on boot up, right after
>>>>> initializing the block device. I'm trying to get confirmation if
>>>>> that's always the case.
>>>>>
>>>>> Thanks in advance for any pointers you might have.
>>>> Yes, I see it even after reverting that change as well.  However I only
>>>> see it on my domain with an XFS filesystem, but I haven't dug any deeper
>>>> to see if that's relevant.
>>>>
>>>> Do you know when this appeared?  Is it recent?  What changes are in the
>>>> rhel6 kernel in question?
>>> It's got pretty much everything in stable-2.6.32.x, up to the 16 patch
>>> blkfront series you posted last July.  There are some RHEL-specific
>>> workarounds for PV-on-HVM, but for PV domains everything matches
>>> upstream.
>> Have you tried bisecting to see when this particular problem appeared? 
>> It looks to me like something is accidentally re-enabling interrupts -
>> perhaps a stack overrun is corrupting the "flags" argument between a
>> spin_lock_irqsave()/restore pair. 
>>
> Unfortunately I don't have a test machine where I can do a bisection
> (yet). I'm looking for one. I only have this one report so far, and it's
> on a production machine.

The report says that it's repeatedly killing the machine though?  In my
testing, it seems to hit the warning once at boot, but is OK after that
(not that I'm doing anything very stressful on the domain).

>> Is it only on 32-bit kernels?
>>
> This one report I have is a 32b guest on a 64b host.

Is it using XFS by any chance?  So far I've traced the re-enable to
xfs_buf_bio_end_io().  However, my suspicion is that it might be related
to the barrier changes we did.
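
To make the "flags" hypothesis quoted above concrete, here is a small
userspace sketch. It is only a model: model_irqsave(), model_irqrestore()
and irqs_on_after_inner_pair() are hypothetical names invented for this
illustration, not kernel functions, and the assignment to "inner" merely
stands in for whatever stray write (a stack overrun, say) might clobber
the saved value. It is not a diagnosis of the actual bug.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the CPU interrupt-enable state. */
static bool irqs_enabled = true;

/* Model of an irqsave: remember the current state, then disable. */
static void model_irqsave(unsigned long *flags)
{
    *flags = irqs_enabled ? 1UL : 0UL;
    irqs_enabled = false;
}

/* Model of an irqrestore: blindly reinstate whatever was saved. */
static void model_irqrestore(unsigned long flags)
{
    irqs_enabled = (flags != 0);
}

/*
 * Run an inner save/restore pair inside an outer section that already
 * has interrupts off. Returns the interrupt state the outer section
 * sees once the inner pair has finished.
 */
static bool irqs_on_after_inner_pair(bool corrupt_flags)
{
    unsigned long outer, inner;
    bool seen_by_outer;

    irqs_enabled = true;        /* start from a known state */
    model_irqsave(&outer);      /* outer section: interrupts now off */

    model_irqsave(&inner);      /* inner pair saves "off" */
    if (corrupt_flags)
        inner = 1UL;            /* assumed stray write to the saved flags */
    model_irqrestore(inner);    /* restores whatever 'inner' now holds */

    seen_by_outer = irqs_enabled;
    model_irqrestore(outer);
    return seen_by_outer;
}

/* Sanity check of the model itself. */
void model_self_check(void)
{
    assert(!irqs_on_after_inner_pair(false)); /* intact flags: still off */
    assert(irqs_on_after_inner_pair(true));   /* clobbered flags: back on */
}
```

The point of the model is that the restore step trusts the saved value
completely, so a single bad write between the save and the restore is
enough to leave interrupts enabled inside a section that expects them
off, which is the kind of state the warning complains about.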

    J

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 

