
Re: [Xen-devel] [BUG] Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1163



On 17/01/2016 23:07, Håkon Alstadheim wrote:
> On 17 Jan 2016 17:30, Håkon Alstadheim wrote:
>> On 17 Jan 2016 16:16, Andrew Cooper wrote:
>>> On 17/01/16 14:50, Håkon Alstadheim wrote:
>>>> On 15 Jan 2016 12:05, Andrew Cooper wrote:
>>>>> On 15/01/16 10:58, Håkon Alstadheim wrote:
>>>>>> CPUINFO:
>>>>>> vendor_id    : GenuineIntel
>>>>>> cpu family    : 6
>>>>>> model        : 63
>>>>>> model name    : Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
>>>>>>
>>>>>> # smbios-sys-info
>>>>>> Libsmbios version:      2.2.28
>>>>>> Product Name:           Z10PE-D8 WS
>>>>>> Vendor:                 ASUSTeK COMPUTER INC.
>>>>>> BIOS Version:           3101
>>>>>>
>>>>>>
>>>>>> I have been experiencing issues with domains that have passed-through
>>>>>> PCIe devices since I first installed Xen. I was then at version 4.5.x;
>>>>>> I'm now at 4.6.0 with Gentoo patches. The crashes SEEM mostly related
>>>>>> to PCI passthrough and interrupts (USB cards, sound cards).
>>>>>>
>>>>>> Recently the system has been more stable; I do not know whether that
>>>>>> is because I pass through as few devices as possible or because of
>>>>>> improvements in Xen. I have also taken to building with debug enabled,
>>>>>> which leads to more abrupt but less mysterious failures. Earlier
>>>>>> (without debug, under Xen 4.5), things would just gradually stop
>>>>>> working and end in a total hang of everything. So, hey, things are
>>>>>> improving :-b
>>>>> This isn't the first time we have seen this on Haswell processors. Do
>>>>> you have microcode loading set up?
>>>>>
>>>>> ~Andrew
>>>>>
>>>> Still happening with kernel-genkernel-x86_64-4.1.15-gentoo and updated
>>>> CPU microcode (the 20151106 microcode release).
>>> Ok - I previously investigated this issue, but my repro evaporated from
>>> under my feet with a firmware update, and I never got to the bottom of it.
>>>
>>> Please can you start with the following patch which will dump some more
>>> information on crash.
>>>
>>> ---8<---
>>> diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
>>> index 1228568..588b562 100644
>>> --- a/xen/arch/x86/irq.c
>>> +++ b/xen/arch/x86/irq.c
>>> @@ -1165,6 +1165,13 @@ static void __do_IRQ_guest(int irq)
>>>      if ( action->ack_type == ACKTYPE_EOI )
>>>      {
>>>          sp = pending_eoi_sp(peoi);
>>> +        if ( unlikely(!((sp == 0) || (peoi[sp-1].vector < vector))) )
>>> +        {
>>> +            int p;
>>> +            for ( p = sp; p > 0; --p )
>>> +                printk("**peoi[%d] = {%d, 0x%02x, %d}\n",
>>> +                       p-1, peoi[p-1].irq, peoi[p-1].vector,
>>> peoi[p-1].ready);
>>> +        }
>>>          ASSERT((sp == 0) || (peoi[sp-1].vector < vector));
>>>          ASSERT(sp < (NR_DYNAMIC_VECTORS-1));
>>>          peoi[sp].irq = irq;
>>>
>>>
>> Will do. Building now.
>> It seems a line was accidentally folded: "peoi[p-1].ready);" belongs at
>> the end of the preceding line, I presume?
>>
> There we go :-/. Log attached from boot to assertion failure with
> loglvl=all guest_loglvl=all. Some of the log output might be a bit
> cryptic; those are notes to myself from local boot scripts, which
> basically fire up my router/name server/DHCP server and wait until
> services are ready before continuing.

Would you mind running with the second patch I sent? It gathers more
information.
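
For readers following the thread: the failing check is an ordering invariant on the per-CPU stack of interrupts still awaiting EOI. An entry may only be pushed when the stack is empty or the new vector is strictly greater than the vector on top, presumably because the local APIC only delivers a higher-priority (higher-numbered) vector while a lower one is still in service. The sketch below is a standalone, hypothetical model of that check plus a dump in the spirit of the debug patch. `struct peoi_stack`, `STACK_MAX`, `peoi_can_push()`, `peoi_push()` and `peoi_dump()` are illustrative names, not Xen's code, and the bitfield widths are assumptions.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of one pending-EOI entry; the field names follow
 * the debug patch, the bitfield widths are assumptions. */
struct pending_eoi {
    unsigned int ready:1;
    unsigned int irq:23;
    unsigned int vector:8;
};

/* Hypothetical capacity; Xen sizes its real stack from NR_DYNAMIC_VECTORS. */
#define STACK_MAX 16

struct peoi_stack {
    struct pending_eoi e[STACK_MAX];
    int sp; /* number of entries; e[sp-1] is the top */
};

/* The ordering rule asserted at irq.c:1163: push only onto an empty
 * stack or on top of a strictly lower vector. */
static bool peoi_can_push(const struct peoi_stack *s, unsigned int vector)
{
    return s->sp == 0 || s->e[s->sp - 1].vector < vector;
}

/* Print the stack from top to bottom, with the vector in hex. */
static void peoi_dump(const struct peoi_stack *s)
{
    for (int p = s->sp; p > 0; --p)
        printf("**peoi[%d] = {%u, 0x%02x, %u}\n", p - 1,
               (unsigned int)s->e[p - 1].irq,
               (unsigned int)s->e[p - 1].vector,
               (unsigned int)s->e[p - 1].ready);
}

/* Push an entry; instead of asserting, dump the stack and return false
 * when the ordering rule or the capacity would be violated. */
static bool peoi_push(struct peoi_stack *s, unsigned int irq,
                      unsigned int vector)
{
    if (s->sp >= STACK_MAX || !peoi_can_push(s, vector)) {
        peoi_dump(s);
        return false;
    }
    s->e[s->sp].irq = irq;
    s->e[s->sp].vector = vector;
    s->e[s->sp].ready = 0;
    s->sp++;
    return true;
}
```

In this model, pushing vectors 0x30 and then 0x48 succeeds, while a subsequent push of 0x40 is rejected and the stack is dumped; that out-of-order arrival is the situation the ASSERT in irq.c turns into a crash.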

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

