
Re: [Xen-devel] cpuidle and un-eoid interrupts at the local apic



Thimo E. wrote on 2013-09-05:
> Hello again,
> 
> for the last two weeks there was no crash with dom0_vcpus_pin and
> dom0 restricted to 1 CPU. But yesterday it crashed again, so I changed
> the command line again to:
> 
> iommu=no-intremap noirqbalance com1=115200,8n1,0xe050,0 
> console=com1,vga mem=1024G dom0_max_vcpus=4 dom0_mem=752M,max:752M 
> watchdog_timeout=300 lowmem_emergency_pool=1M crashkernel=64M@32M 
> cpuid_mask_xsave_eax=0
> 
> And today the server crashed again and produced a lot of debugging 
> messages, see attached. The "..." in the log files means that the 
> message above the dots was repeated many times.
> 
> My summary so far:
> - With only 1 CPU attached to dom0 the server was stable for 2 weeks;
>   the crash there did not really show any IRQ problems, see crash20130903.txt
>     You can find Andrew's ideas on this in
>     http://forums.citrix.com/thread.jspa?messageID=1760771#1760771
> - With more than 1 CPU and irqbalance the server produced the crashes I've
>   already posted before
> - Without irqbalance it crashed with some other fancy output, see
>   crash20130904.txt
> 
> Next step is to change the network card.
> 
> Zhang, any update from your side ? Or do the others have any idea ?
Our hardware guys said they aren't aware of any such issue with this CPU. We are 
now trying to find the same platform to reproduce it.

> Could "ioapic_ack=old" help somewhere ?
> 
> Best regards
>    Thimo
> On 27.08.2013 03:03, Zhang, Yang Z wrote:
>> Zhang, Yang Z wrote on 2013-08-23:
>>> Thimo Eichstädt wrote on 2013-08-23:
>>>> Hello Yang,
>>>> 
>>>> any update from your side ? Did your expert have any idea ?
>>>> Possible Hardware problem ?
>>> Sorry, no update on this. I am still waiting for an answer from the hardware team.
>> Hi Thimo,
>> 
>> I recall that the CPU is always in an idle state when this issue happens.
>> Could you try disabling the C-states in Xen to see if that helps?
>> 
>>>> Best regards
>>>>     Thimo
>>>> On 20.08.2013 10:50, Zhang, Yang Z wrote:
>>>>> Jan Beulich wrote on 2013-08-20:
>>>>>>>>> On 20.08.13 at 07:43, Thimo Eichstädt <thimoe@xxxxxxxxxx> wrote:
>>>>>>> (XEN) **Pending EOI error
>>>>>>> (XEN)   irq 29, vector 0x21
>>>>>>> (XEN) s[0] irq 30, vec 0x31, ready 0, ISR 00000001, TMR 00000000, IRR 00000000
>>>>>>> (XEN) All LAPIC state:
>>>>>>> (XEN) [vector]      ISR TMR IRR
>>>>>>> (XEN) [1f:00] 00000000 00000000 00000000
>>>>>>> (XEN) [3f:20] 00020002 00000000 00000000
>>>>>> It ought to be plain impossible to receive an interrupt at vector
>>>>>> 0x21 while the ISR bit for vector 0x31 is still set.
>>>>>> 
>>>>>> Intel folks - any input on this?
>>>>> I have no idea about this, but I will forward the information to 
>>>>> some experts internally for help.
>>>>> 
>>>>>> Jan
>>>>> Best regards,
>>>>> Yang
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> Xen-devel mailing list
>>>>> Xen-devel@xxxxxxxxxxxxx
>>>>> http://lists.xen.org/xen-devel
>>> 
>>> Best regards,
>>> Yang
>>> 
>> 
>> Best regards,
>> Yang
>> 
>> 


Best regards,
Yang
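
As an aside on the quoted LAPIC dump: the sketch below (Python, not Xen code; all function names are hypothetical) decodes the ISR word from the `[3f:20]` row and applies the local APIC priority rule as I understand it from the Intel SDM, namely that an interrupt is only dispatched if its priority class (vector bits 7:4) is above both the class of the highest in-service vector and the class of the TPR. It assumes that bit n of the `[3f:20]` row corresponds to vector 0x20 + n and that the TPR is 0.

```python
def set_vectors(base, word):
    """Return the vectors whose bits are set in one 32-bit LAPIC register word.

    Assumption: bit n of the word maps to vector base + n.
    """
    return [base + bit for bit in range(32) if (word >> bit) & 1]


def priority_class(vector):
    """Priority class of a vector: its upper four bits (bits 7:4)."""
    return vector >> 4


def deliverable(vector, in_service, tpr=0):
    """True if `vector` may be dispatched given the in-service vectors.

    The processor priority class is the larger of the TPR class and the
    class of the highest in-service vector; a new interrupt must be in a
    strictly higher class.
    """
    highest_in_service = max(in_service, default=0)
    ppr_class = max(priority_class(tpr), priority_class(highest_in_service))
    return priority_class(vector) > ppr_class


# ISR column of the "[3f:20]" row from the quoted dump.
isr = set_vectors(0x20, 0x00020002)
print([hex(v) for v in isr])        # vectors currently marked in service
print(deliverable(0x21, [0x31]))    # could 0x21 arrive while 0x31 is in service?
```

Under those assumptions the ISR word 00020002 decodes to vectors 0x21 and 0x31 both being in service, and the check reports that 0x21 should not have been deliverable while 0x31 was still in service, which is the point Jan makes in the quoted text.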

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

