
Re: [Xen-devel] [xen-unstable test] 145796: tolerable FAIL - PUSHED



On Wed, 8 Jan 2020 at 21:40, osstest service owner
<osstest-admin@xxxxxxxxxxxxxx> wrote:
>
> flight 145796 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/145796/
>
> Failures :-/ but no regressions.
>
> Tests which are failing intermittently (not blocking):
>  test-amd64-amd64-xl-rtds    15 guest-saverestore fail in 145773 pass in 145796
>  test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 16 guest-start/debianhvm.repeat fail in 145773 pass in 145796
>  test-armhf-armhf-xl-rtds     12 guest-start      fail in 145773 pass in 145796

It looks like test-armhf-armhf-xl-rtds has been failing for a while (although not reliably).
I looked at a few flights, and the cause seems to be the same:

Jan  8 15:02:14.700784 (XEN) Assertion '!unit_on_replq(svc)' failed at sched_rt.c:586
Jan  8 15:02:26.715030 (XEN) ----[ Xen-4.14-unstable  arm32  debug=y  Not tainted ]----
Jan  8 15:02:26.720756 (XEN) CPU:    1
Jan  8 15:02:26.722158 (XEN) PC:     0023a750 common/sched_rt.c#replq_insert+0x7c/0xcc
Jan  8 15:02:26.727851 (XEN) CPSR:   200300da MODE:Hypervisor
Jan  8 15:02:26.731334 (XEN)      R0: 002a51a4 R1: 400614a0 R2: 3d64b900 R3: 40061338
Jan  8 15:02:26.736830 (XEN)      R4: 400614a0 R5: 002a51a4 R6: 3cf1cbf0 R7: 000001cb
Jan  8 15:02:26.742600 (XEN)      R8: 4003d1b0 R9: 400614a8 R10:4003d1b0 R11:400ffe54 R12:400ffde4
Jan  8 15:02:26.749119 (XEN) HYP: SP: 400ffe2c LR: 0023b6e8
Jan  8 15:02:26.752296 (XEN)
Jan  8 15:02:26.753036 (XEN)   VTCR_EL2: 80003558
Jan  8 15:02:26.755479 (XEN)  VTTBR_EL2: 00020000bbff4000
Jan  8 15:02:26.758757 (XEN)
Jan  8 15:02:26.759366 (XEN)  SCTLR_EL2: 30cd187f
Jan  8 15:02:26.761755 (XEN)    HCR_EL2: 0078663f
Jan  8 15:02:26.764250 (XEN)  TTBR0_EL2: 00000000bc029000
Jan  8 15:02:26.767364 (XEN)
Jan  8 15:02:26.767980 (XEN)    ESR_EL2: 00000000
Jan  8 15:02:26.770485 (XEN)  HPFAR_EL2: 00030010
Jan  8 15:02:26.772795 (XEN)      HDFAR: e0800f00
Jan  8 15:02:26.775272 (XEN)      HIFAR: c0605744
Jan  8 15:02:26.777748 (XEN)
Jan  8 15:02:26.778505 (XEN) Xen stack trace from sp=400ffe2c:
Jan  8 15:02:26.781910 (XEN)    00000000 3cf1cbf0 400614a0 002a51a4 3cf1cbf0 000001cb 4003d1b0 6003005a
Jan  8 15:02:26.788991 (XEN)    400613f8 400ffe7c 0023b6e8 002f9300 4004c000 400613f8 3cf1cbf0 000001cb
Jan  8 15:02:26.796093 (XEN)    4003d1b0 6003005a 400613f8 400ffeac 00242988 4004c000 002425ac 40058000
Jan  8 15:02:26.803237 (XEN)    4004c000 4004f000 10f45000 10f45008 4004b080 40058000 60030013 400ffebc
Jan  8 15:02:26.810360 (XEN)    00209984 00000002 4004f000 400ffedc 0020eddc 0020caf8 db097cd4 00000020
Jan  8 15:02:26.817504 (XEN)    c13afbec 00000000 db15fd68 400ffee4 0020c9dc 400fff34 0020d5e8 4004e000
Jan  8 15:02:26.824615 (XEN)    00000000 400fff44 400fff44 00000002 00000000 4004e8fa 4004e8f4 400fff1c
Jan  8 15:02:26.831737 (XEN)    400fff1c 6003005a 0020caf8 400fff58 00000020 c13afbec 00000000 db15fd68
Jan  8 15:02:26.838798 (XEN)    60030013 400fff54 0026c150 c1204d08 c13afbec 00000000 00000000 00000000
Jan  8 15:02:26.845877 (XEN)    00000002 400fff58 002753b0 00000009 db097cd4 db173008 00000002 c1204d08
Jan  8 15:02:26.852986 (XEN)    00000000 00000002 c13afbec 00000000 db15fd68 60030013 db15fd3c 00000020
Jan  8 15:02:26.860044 (XEN)    ffffffff b6cdccb3 c0107ed0 a0030093 4a000ea1 be951568 c136edc0 c010d3a0
Jan  8 15:02:26.867171 (XEN)    db097cd0 c056c7f8 c136edcc c010d720 c136edd8 c010d7e0 00000000 00000000
Jan  8 15:02:26.874526 (XEN)    00000000 00000000 00000000 c136ede4 c136ede4 00030030 60070193 80030093
Jan  8 15:02:26.881450 (XEN)    60030193 00000000 00000000 00000000 00000001
Jan  8 15:02:26.886519 (XEN) Xen call trace:
Jan  8 15:02:26.888168 (XEN)    [<0023a750>] common/sched_rt.c#replq_insert+0x7c/0xcc (PC)
Jan  8 15:02:26.894240 (XEN)    [<0023b6e8>] common/sched_rt.c#rt_unit_wake+0xf4/0x274 (LR)
Jan  8 15:02:26.900246 (XEN)    [<0023b6e8>] common/sched_rt.c#rt_unit_wake+0xf4/0x274
Jan  8 15:02:26.905775 (XEN)    [<00242988>] vcpu_wake+0x1e4/0x688
Jan  8 15:02:26.909743 (XEN)    [<00209984>] domain_unpause+0x64/0x84
Jan  8 15:02:26.913956 (XEN)    [<0020eddc>] common/event_fifo.c#evtchn_fifo_unmask+0xd8/0xf0
Jan  8 15:02:26.920167 (XEN)    [<0020c9dc>] evtchn_unmask+0x7c/0xc0
Jan  8 15:02:26.924173 (XEN)    [<0020d5e8>] do_event_channel_op+0xaf0/0xdac
Jan  8 15:02:26.928922 (XEN)    [<0026c150>] do_trap_guest_sync+0x350/0x4d0
Jan  8 15:02:26.933647 (XEN)    [<002753b0>] entry.o#return_from_trap+0/0x4
Jan  8 15:02:26.938299 (XEN)
Jan  8 15:02:26.939039 (XEN)
Jan  8 15:02:26.939668 (XEN) ****************************************
Jan  8 15:02:26.943794 (XEN) Panic on CPU 1:
Jan  8 15:02:26.945872 (XEN) Assertion '!unit_on_replq(svc)' failed at sched_rt.c:586
Jan  8 15:02:26.951492 (XEN) ****************************************
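
For context, the assertion that fires is at the top of replq_insert() in
xen/common/sched_rt.c. A paraphrased sketch (the exact body in 4.14-unstable
may differ; I have simplified everything apart from the ASSERT itself):

static void replq_insert(const struct scheduler *ops, struct rt_unit *svc)
{
    struct list_head *replq = rt_replq(ops);

    /* A unit must not already be on the replenishment queue when it
     * is (re-)inserted; this is the check firing at sched_rt.c:586. */
    ASSERT( !unit_on_replq(svc) );

    /* Insert the unit, keeping the queue sorted by replenishment time
     * (simplified; the real code goes through a deadline queue helper). */
    deadline_replq_insert(svc, &svc->replq_elem, replq);
}

So by the time rt_unit_wake() calls this, the unit is apparently still
sitting on the replenishment queue.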

I believe the domain_unpause() is coming from guest_clear_bit(). This
would mean the guest atomic did not succeed and fell back to pausing
the domain. That is consistent with the boot log:

 CPU1: Guest atomics will try 1 times before pausing the domain
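
For reference, this is how I understand the Arm guest atomics fallback
(paraphrased from xen/include/asm-arm/guest_atomics.h; the retry helper
name below is made up for the sketch, the pause/unpause calls are real):

/* Simplified sketch of guest_clear_bit() on Arm. */
static inline void guest_clear_bit_sketch(struct domain *d, int nr,
                                          volatile void *p)
{
    /* Try the atomic a bounded number of times -- per the boot log
     * above, just once on this box. */
    if ( clear_bit_with_bounded_retries(nr, p) )   /* simplified name */
        return;

    /* Fallback: pause the domain (without waiting for its vCPUs to be
     * descheduled), perform the operation, then unpause it again. */
    domain_pause_nosync(d);
    clear_bit(nr, p);
    domain_unpause(d);   /* the domain_unpause() seen in the trace */
}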

I am under the impression that the crash could be reproduced with just
(note domain_pause_nosync() takes a struct domain *, hence current->domain):

domain_pause_nosync(current->domain);
domain_unpause(current->domain);
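
My reasoning: both pause variants go through the same helper and differ
only in whether they wait for each vCPU to actually stop running.
Roughly (paraphrasing xen/common/domain.c; names may differ slightly in
the tree under test):

static void do_domain_pause(struct domain *d,
                            void (*sleep_fn)(struct vcpu *v))
{
    struct vcpu *v;

    atomic_inc(&d->pause_count);

    for_each_vcpu ( d, v )
        sleep_fn(v);   /* vcpu_sleep_sync() vs vcpu_sleep_nosync() */
}

void domain_pause(struct domain *d)
{
    ASSERT(d != current->domain);   /* cannot sync-pause yourself... */
    do_domain_pause(d, vcpu_sleep_sync);
}

void domain_pause_nosync(struct domain *d)
{
    do_domain_pause(d, vcpu_sleep_nosync);
}

Presumably guest_clear_bit() has to use the nosync variant precisely
because it pauses the current domain. But with nosync, the vCPU may not
have been descheduled yet when domain_unpause() calls vcpu_wake(), so
RTDS could see a unit that never left the replenishment queue, which
would explain the ASSERT. I may well be missing something, though.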

Any insight into what's wrong? I am happy to try to reproduce it tomorrow morning.

Cheers,
