
Re: [Xen-devel] cpuidle and un-eoid interrupts at the local apic



Hello again,

For the last two weeks there was no crash with dom0_vcpus_pin set and dom0 restricted to 1 CPU. But yesterday it crashed again, so I changed the command line again to:

iommu=no-intremap noirqbalance com1=115200,8n1,0xe050,0 console=com1,vga mem=1024G dom0_max_vcpus=4 dom0_mem=752M,max:752M watchdog_timeout=300 lowmem_emergency_pool=1M crashkernel=64M@32M cpuid_mask_xsave_eax=0

And today the server crashed again and produced a lot of debugging messages; see the attached files. A "..." in the log files means that the message above it was repeated many times.

My summary so far:
- With only 1 CPU attached to dom0, the server was stable for 2 weeks. The crash that then occurred did not really show any IRQ problems, see crash20130903.txt. You can find Andrew's ideas on this at http://forums.citrix.com/thread.jspa?messageID=1760771#1760771
- With more than 1 CPU and irqbalance, the server produced the crashes I've already posted before.
- Without irqbalance, it crashed with some other fancy output, see crash20130904.txt

Next step is to change the network card.

Zhang, any update from your side? Or do the others have any ideas?
Could "ioapic_ack=old" help somewhere?

Best regards
  Thimo

On 27.08.2013 03:03, Zhang, Yang Z wrote:
Zhang, Yang Z wrote on 2013-08-23:
Thimo Eichstädt wrote on 2013-08-23:
Hello Yang,

any update from your side? Did your expert have any idea? Possibly a
hardware problem?
Sorry, no update on this. I am still waiting for an answer from the hardware team.
Hi Thimo,

I remember that the CPU is always in an idle state when this issue happens. So can
you try disabling C-states in Xen to see if it helps?

Best regards
    Thimo
On 20.08.2013 10:50, Zhang, Yang Z wrote:
Jan Beulich wrote on 2013-08-20:
On 20.08.13 at 07:43, Thimo Eichstädt <thimoe@xxxxxxxxxx> wrote:
(XEN) **Pending EOI error
(XEN)   irq 29, vector 0x21
(XEN) s[0] irq 30, vec 0x31, ready 0, ISR 00000001, TMR 00000000, IRR 00000000
(XEN) All LAPIC state:
(XEN) [vector]      ISR TMR IRR
(XEN) [1f:00] 00000000 00000000 00000000
(XEN) [3f:20] 00020002 00000000 00000000
It ought to be plain impossible to receive an interrupt at vector
0x21 while the ISR bit for vector 0x31 is still set.
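
To make the reasoning above concrete, here is a minimal sketch (not Xen code) that decodes the "[3f:20]" ISR word from the dump and applies the usual local APIC priority rule: an interrupt is only dispatched if its priority class (vector bits 7:4) is higher than that of every in-service vector and of the task priority. The helper names set_vectors, priority_class and can_dispatch are hypothetical and exist only for this illustration.

```python
def set_vectors(word, base):
    """List the vectors whose bits are set in one 32-bit LAPIC register word.

    `base` is the lowest vector covered by the word (0x20 for the "[3f:20]" row).
    """
    return [base + bit for bit in range(32) if word & (1 << bit)]


def priority_class(vector):
    """Priority class of a vector: its upper four bits."""
    return vector >> 4


def can_dispatch(vector, in_service, tpr=0):
    """Hypothetical model of the dispatch check, assuming the standard rule
    that the processor priority class is the maximum of the TPR class and
    the highest in-service class."""
    ppr_class = max([priority_class(v) for v in in_service] + [tpr >> 4])
    return priority_class(vector) > ppr_class


# ISR column of the "[3f:20]" row in the dump above.
in_service = set_vectors(0x00020002, 0x20)
print([hex(v) for v in in_service])    # vectors currently marked in service
print(can_dispatch(0x21, [0x31]))      # may 0x21 arrive while 0x31 is in service?
```

Under this model both 0x21 and 0x31 show up as in service, although 0x21 (class 2) should not have been dispatched while 0x31 (class 3) was still pending EOI, which is why the dump looks inconsistent.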

Intel folks - any input on this?
I have no idea about this. But I will forward the information to
some experts internally for help.

Jan
Best regards,
Yang


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


Attachment: crash20130904.txt
Description: Text document

Attachment: crash20130903.txt
Description: Text document


 

