
Re: [Xen-devel] Xen 4.1 + 3ware 9690SA = rejecting I/O to offline device



On 27/09/2011 19:13, Christopher S. Aker wrote:
> On 10/11/10 5:44 PM, Christopher S. Aker wrote:
>> In an effort to fix the problem described in my previous xen-devel post
>> ("New CPUS, now get: NETDEV WATCHDOG: eth0: transmit timed out"), we've
>> come across another problem. 3ware 9690SA cards do not behave under Xen
>> 4.1 (as of cs 22155).
>>
>> We have a simple Xen thrash test suite which fires up domUs that do
>> different workloads (some swap thrash, some kernel build, some spin
>> CPUs, some cycle rebooting, etc). Almost immediately after launching the
>> suite we can get the 3ware 9690SA card to fail with something like the
>> following:
>>
>> sd 0:0:0:0: WARNING: (0x06:0x002C): Command (0x28) timed out, resetting card.
>> sd 0:0:0:0: WARNING: (0x06:0x002C): Command (0x0) timed out, resetting card.
>> sd 0:0:0:0: rejecting I/O to offline device
>> sd 0:0:0:0: rejecting I/O to offline device
>>
>> Under a 2.6.32 dom0 it sometimes also triggers Xenwatch like so:
>>
>> http://theshore.net/~caker/xen/BUGS/9690SA/xenwatch.txt
>>
>> Results matrix:
>>
>> +---------------------------------------------------------------+
>> | Xen           | Dom0                | 9550SXU | 9690SA | 9750 |
>> +---------------------------------------------------------------+
>> | 3.4.1         | 2.6.18.8-931-2      | OK      | OK     | OK   |
>> | 3.4.4-rc1-pre | 2.6.18.8-931-2      | OK      | OK     | OK   |
>> | 3.4.4-rc1-pre | 2.6.32.23-g41a85de5 | OK      | OK     | OK   |
>> | 4.1 @ 22155   | 2.6.18.8-931-2      | OK      | FAIL   | OK   |
>> | 4.1 @ 22155   | 2.6.32.23-g41a85de5 | OK      | FAIL   | OK   |
>> +---------------------------------------------------------------+
>>
>> The failures were verified on at least 2 machines of identical
>> specification.
>>
>> The same dom0 kernels that produce a stable 9690SA under Xen 3.4 bomb
>> under Xen 4.1.
> I'm back at this, and the problem still exists with a 4.1.1/3.0.4 stack.
>
> Konrad, in the "offline raid" thread you asked for the following debug 
> information:
>
> http://www.theshore.net/~caker/xen/BUGS/offline-raid/
>
> The sysrq-t.txt and triple-a-star.txt outputs are after I got the raid 
> card to hang up (but before it timed out and started spewing to the 
> console).
>
> Oddly, lspci shows three devices assigned IRQ 16; however,
> /proc/interrupts only lists two of them.  Side effect of MSI?
>
> Also, the problem still happens even with MSI disabled (pci=nomsi).
>
> Thanks,
> -Chris

This is almost certainly the bug involving not ack'ing a migrating
line-level interrupt, which I fixed in c/s 23145:1092a143ef9d.  Try applying
that patch, or just running from the tip of
http://xenbits.xen.org/hg/xen-4.1-testing.hg/

~Andrew

>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxxxxxxxx
> http://lists.xensource.com/xen-devel

