Re: [Xen-devel] Guest start issue on ARM (maybe related to Credit2) [Was: Re: [xen-unstable test] 113807: regressions - FAIL]
Hi Dario,

On 09/26/2017 08:33 AM, Dario Faggioli wrote:
> On Mon, 2017-09-25 at 17:23 +0100, Julien Grall wrote:
>> On 09/25/2017 03:07 PM, Dario Faggioli wrote:
>>> I don't see much in the logs, TBH, but both `xl vcpu-list' and the 'r'
>>> debug key seem to suggest that vCPU 0 is running, while the other vCPUs
>>> have never run... like it was an issue with secondary (v)CPU bringup.
>>>
>>> It indeed shows up with Credit2, as it were _specific_ to it, but I'm
>>> not 100% sure. In fact, it indeed seems to never show up here:
>>> http://logs.test-lab.xenproject.org/osstest/results/history/test-armhf-armhf-xl/xen-unstable
>>>
>> Most of the time guest-start/debian.repeat fails, vCPU 0 is in
>> data/prefetch abort state. My guess is a latent cache bug that credit2
>> appears to expose.
>>
> So, forgive my ARM ignorance, but how do you tell that the vCPU(s)
> is(are) in that particular state?

I was looking at the guest state dumped:

Sep 24 15:10:43.275221 (XEN) *** Dumping CPU1 guest state (d3v0): ***
Sep 24 15:10:43.279352 (XEN) ----[ Xen-4.10-unstable arm32 debug=y Not tainted ]----
Sep 24 15:10:43.285242 (XEN) CPU: 1
Sep 24 15:10:43.286597 (XEN) PC: 0000000c
Sep 24 15:10:43.288743 (XEN) CPSR: 800001d7 MODE:32-bit Guest ABT
Sep 24 15:10:43.292741 (XEN) R0: 00400000 R1: ffffffff R2: 48c24000 R3: 80000000
Sep 24 15:10:43.298241 (XEN) R4: 410aa758 R5: 410aacf8 R6: 00000080 R7: c2c2c2c2
Sep 24 15:10:43.303850 (XEN) R8: 40000000 R9: 410fc074 R10:40b7923c R11:10101105 R12:ffffffff
Sep 24 15:10:43.310457 (XEN) USR: SP: 00000000 LR: 00000000
Sep 24 15:10:43.313714 (XEN) SVC: SP: 4199fb70 LR: 40208060 SPSR:400001d3
Sep 24 15:10:43.318334 (XEN) ABT: SP: 00000000 LR: 0000000c SPSR:800001d7
Sep 24 15:10:43.322863 (XEN) UND: SP: 00000000 LR: 00000000 SPSR:00000000
Sep 24 15:10:43.327361 (XEN) IRQ: SP: 00000000 LR: 00000000 SPSR:00000000
Sep 24 15:10:43.331855 (XEN) FIQ: SP: 00000000 LR: c1318ae4 SPSR:00000000
Sep 24 15:10:43.336349 (XEN) FIQ: R8: 00000000 R9: 00000000 R10:00000000 R11:00000000 R12:00000000

"MODE:..." is the current mode of the vCPU. In this case ABT means it
received an abort (e.g. a data/prefetch abort). There are other modes,
such as:
    - USR: User mode
    - SVC: Kernel mode

>
> I'm asking because I now wonder whether this same issue could also be
> the cause of these other failures, which we see from time to time:
>
> flight 113816 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/113816/
>
> [...]
>
> Tests which did not succeed, but are not blocking:
> test-armhf-armhf-xl-rtds 16 guest-start/debian.repeat fail blocked in 113387
>
> Here's the logs:
> http://logs.test-lab.xenproject.org/osstest/logs/113816/test-armhf-armhf-xl-rtds/info.html

It does not seem to be similar. In the credit2 case the kernel is stuck
at very early boot. Here it seems to be running (there are grants set
up). This seems to be confirmed by the guest console log, where I can
see the prompt.

Interestingly, when the guest job fails, it has been waiting for a long
time on the disk and hvc0, although it does not time out.

I am actually quite surprised that we start a 4-vCPU guest on a 2-pCPU
platform. The total number of vCPUs is 6 (2 DOM0 + 4 DOMU). The
processors on that platform are not the greatest for testing. So I was
wondering whether we end up with too many vCPUs running on the platform,
making the test unreliable?

Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
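
For readers wanting to check the "MODE:" field of the dump by hand: the mode is held in the low five bits of the CPSR (and of each banked SPSR). The snippet below is a minimal sketch, assuming the standard ARMv7-A mode encodings; the table and the `decode_mode` helper are hypothetical names for illustration and are not part of Xen.

```python
# Mode field (bits [4:0] of CPSR/SPSR) encodings from the ARMv7-A architecture.
# The dictionary and function names are illustrative, not Xen code.
ARM32_MODES = {
    0x10: "USR",  # User
    0x11: "FIQ",
    0x12: "IRQ",
    0x13: "SVC",  # Supervisor (kernel)
    0x16: "MON",  # Monitor
    0x17: "ABT",  # Abort (data/prefetch abort)
    0x1A: "HYP",
    0x1B: "UND",  # Undefined instruction
    0x1F: "SYS",
}


def decode_mode(psr: int) -> str:
    """Return the mode name encoded in the low 5 bits of a CPSR/SPSR value."""
    mode = psr & 0x1F
    return ARM32_MODES.get(mode, f"unknown (0x{mode:02x})")


# Values taken from the guest state dump in this message.
for name, value in (("CPSR", 0x800001D7), ("SVC SPSR", 0x400001D3)):
    print(f"{name} {value:08x} -> {decode_mode(value)}")
```

With the values from the dump, the CPSR decodes to ABT, which matches the "MODE:32-bit Guest ABT" text Xen prints, and the SVC-banked SPSR decodes to SVC.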