
Re: Xen 4.18rc/ARM64 on Raspberry Pi 4B: Traffic in DomU crashing Dom0 when using VLANs



Hi Paul,

On 01/03/2024 19:37, Paul Leiber wrote:
Stopping xen-watchdog prevents the reboot. However, when triggering traffic on the VLAN, Dom0 and DomU become completely unresponsive. No error or kernel message is printed in the serial console.

Thanks for providing some logs. See some comments below. How long did you wait before concluding that dom0 is stuck?

IIRC, Linux may print some RCU stall logs after a few minutes.


Switching to Xen console works. Pressing '0' produces the following output:

(XEN) '0' pressed -> dumping Dom0's registers
(XEN) *** Dumping Dom0 vcpu#0 state: ***
(XEN) ----[ Xen-4.19-unstable  arm64  debug=y  Tainted:   C    ]----
(XEN) CPU:    0
(XEN) PC:     ffff800008027e50
(XEN) LR:     ffff800008027e44
(XEN) SP_EL0: ffff800009c78f80
(XEN) SP_EL1: ffff800008003b60
(XEN) CPSR:   00000000000003c5 MODE:64-bit EL1h (Guest Kernel, handler)

[...]

(XEN) *** Dumping Dom0 vcpu#1 state: ***
(XEN) ----[ Xen-4.19-unstable  arm64  debug=y  Tainted:   C    ]----
(XEN) CPU:    0
(XEN) PC:     ffff800008c5dc80
(XEN) LR:     ffff800008c5dc88
(XEN) SP_EL0: ffff000042272080
(XEN) SP_EL1: ffff80000800b0e0
(XEN) CPSR:   0000000080000305 MODE:64-bit EL1h (Guest Kernel, handler)

[...]

(XEN) *** Dumping Dom0 vcpu#2 state: ***
(XEN) ----[ Xen-4.19-unstable  arm64  debug=y  Tainted:   C    ]----
(XEN) CPU:    0
(XEN) PC:     ffff800008027e50
(XEN) LR:     ffff800008027e44
(XEN) SP_EL0: ffff000042271040
(XEN) SP_EL1: ffff800009fcbf20
(XEN) CPSR:   00000000000003c5 MODE:64-bit EL1h (Guest Kernel, handler)

[...]

(XEN) *** Dumping Dom0 vcpu#3 state: ***
(XEN) ----[ Xen-4.19-unstable  arm64  debug=y  Tainted:   C    ]----
(XEN) CPU:    0
(XEN) PC:     ffff800008027e50
(XEN) LR:     ffff800008027e44
(XEN) SP_EL0: ffff0000422730c0
(XEN) SP_EL1: ffff800009fd3f20
(XEN) CPSR:   00000000000003c5 MODE:64-bit EL1h (Guest Kernel, handler)

All the PCs but one (vcpu#1) are the same.
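
To compare the vCPUs beyond the PC, the CPSR values can be decoded as well. The following is only a rough sketch, assuming the standard AArch64 saved-PSTATE layout (mode in bits [3:0], DAIF mask bits in [9:6], NZCV flags in [31:28]). The helper name `decode_cpsr` is made up for illustration and is not part of Xen.

```python
# Sketch: decode the CPSR values printed in the register dump above.
# Assumes the architectural AArch64 SPSR layout; decode_cpsr is a
# hypothetical helper, not a Xen function.

MODES = {
    0b0000: "EL0t",
    0b0100: "EL1t",
    0b0101: "EL1h",
    0b1000: "EL2t",
    0b1001: "EL2h",
}


def decode_cpsr(value):
    """Return the mode, the masked exception classes and the condition flags."""
    masked = "".join(
        name
        for name, bit in (("D", 9), ("A", 8), ("I", 7), ("F", 6))
        if value & (1 << bit)
    )
    flags = "".join(
        name
        for name, bit in (("N", 31), ("Z", 30), ("C", 29), ("V", 28))
        if value & (1 << bit)
    )
    return {
        "mode": MODES.get(value & 0xF, "unknown"),
        "aarch32": bool(value & (1 << 4)),
        "masked": masked,
        "flags": flags,
    }


# CPSR values copied from the dump (vcpu#0..#3).
for vcpu, cpsr in ((0, 0x3C5), (1, 0x80000305), (2, 0x3C5), (3, 0x3C5)):
    print("vcpu#%d: %s" % (vcpu, decode_cpsr(cpsr)))
```

If this decoding is right, vcpu#0, #2 and #3 have D, A, I and F all masked, whereas vcpu#1 only has D and A set, so it can still take IRQs.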

(XEN) 'q' pressed -> dumping domain info (now = 727929105981)
(XEN) General information for domain 0:
(XEN)     refcnt=3 dying=0 pause_count=0
(XEN)     nr_pages=262144 xenheap_pages=2 dirty_cpus={} max_pages=262144
(XEN)     handle=00000000-0000-0000-0000-000000000000 vm_assist=00000020
(XEN) p2m mappings for domain 0 (vmid 1):
(XEN)   1G mappings: 0 (shattered 1)
(XEN)   2M mappings: 422 (shattered 90)
(XEN)   4K mappings: 45372
(XEN) Rangesets belonging to domain 0:
(XEN)     Interrupts { 32-152, 154-255 }
(XEN)     I/O Memory { 0-fe200, fe203-ff841, ff849-ffffffffffffffff }
(XEN) NODE affinity for domain 0: [0]
(XEN) VCPU information and callbacks for domain 0:
(XEN)   UNIT0 affinities: hard={0-3} soft={0-3}
(XEN)     VCPU0: CPU3 [has=F] poll=0 upcall_pend=01 upcall_mask=01
(XEN)     pause_count=0 pause_flags=1

The vCPU is blocked. But...

(XEN) GICH_LRs (vcpu 0) mask=f
(XEN)    VCPU_LR[0]=2a000002
(XEN)    VCPU_LR[1]=1a00001b
(XEN)    VCPU_LR[2]=1a000001
(XEN)    VCPU_LR[3]=1a000010

... it looks like multiple IRQs are in flight. LR0 (holding IRQ2) is active, but the others are pending. The same is true for vCPU #2 and #3. vCPU #1 still seems to "work".
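
For reference, here is a small sketch of how the list register values can be decoded by hand. It assumes the GICv2 GICH_LR layout (VirtualID in bits [9:0], PhysicalID in [19:10], Priority in [27:23], State in [29:28], Grp1 in bit 30, HW in bit 31), which should apply to the GIC-400 on the Raspberry Pi 4. The helper name `decode_lr` is invented for this example.

```python
# Sketch: decode GICv2 list registers (GICH_LRn) as printed by Xen.
# Assumes the GICv2 field layout; decode_lr is a hypothetical helper.

STATES = {0: "invalid", 1: "pending", 2: "active", 3: "pending+active"}


def decode_lr(value):
    """Split a 32-bit GICH_LR value into its fields."""
    return {
        "virq": value & 0x3FF,            # VirtualID, bits [9:0]
        "pirq": (value >> 10) & 0x3FF,    # PhysicalID, bits [19:10]
        "priority": (value >> 23) & 0x1F,  # Priority, bits [27:23]
        "state": STATES[(value >> 28) & 0x3],
        "grp1": bool(value & (1 << 30)),
        "hw": bool(value & (1 << 31)),
    }


# Values copied from the vcpu 0 dump above.
for index, lr in enumerate((0x2A000002, 0x1A00001B, 0x1A000001, 0x1A000010)):
    fields = decode_lr(lr)
    print("LR%d: virq=%d state=%s priority=%d"
          % (index, fields["virq"], fields["state"], fields["priority"]))
```

With these inputs, LR0 decodes to virtual IRQ 2 in the active state, and LR1 to LR3 decode to virtual IRQs 27, 1 and 16, all pending.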

AFAICT, Linux is using IRQ2 for the IPI CPU_STOP. So it sounds like dom0 may have panicked.

Looking at the initial logs you posted, I see some messages from Xen but none at all from dom0 (including boot). Can you check if you have console=hvc0 on the Linux command line?

If not, please add it and retry.
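
If dom0 is still reachable before the hang (e.g. over SSH), one quick way to check is to look at /proc/cmdline. Below is a minimal sketch. The function names are made up for illustration, and the example command lines in the comments are hypothetical, not taken from your setup.

```python
# Sketch: check whether a Linux command line selects hvc0 as a console.
# has_hvc0 and read_cmdline are hypothetical helpers.


def console_args(cmdline):
    """Return the values of all console= parameters, in order."""
    return [
        token.split("=", 1)[1]
        for token in cmdline.split()
        if token.startswith("console=")
    ]


def has_hvc0(cmdline):
    """True if any console= parameter names hvc0 (options after ',' ignored)."""
    return any(arg.split(",", 1)[0] == "hvc0" for arg in console_args(cmdline))


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as handle:
        return handle.read().strip()


# Usage inside dom0:
#     print(has_hvc0(read_cmdline()))
#
# Hypothetical examples:
#     has_hvc0("console=hvc0 root=/dev/mmcblk0p2 rw")      -> True
#     has_hvc0("console=ttyS0,115200 root=/dev/mmcblk0p2") -> False
```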

Cheers,

--
Julien Grall



 

