Re: [Xen-devel] [xen-unstable test] 58330: regressions - FAIL
On Wed, 2015-06-10 at 19:15 +0000, osstest service user wrote:
> flight 58330 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/58330/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-armhf-armhf-xl-credit2 15 guest-start/debian.repeat fail REGR. vs.
> 58264
>
Ok, let's look at the logs. This should be the one:

http://logs.test-lab.xenproject.org/osstest/logs/58330/test-armhf-armhf-xl-credit2/serial-cubietruck-braque.log

Jun 10 14:17:12.292428 (XEN) NOW=0x00000147D949B2B5
Jun 10 14:17:12.293377 (XEN) Online Cpus: 0-1
Jun 10 14:17:12.368382 (XEN) Cpupool 0:
Jun 10 14:17:12.369884 (XEN) Cpus: 0-1
Jun 10 14:17:12.371388 (XEN) Scheduler: SMP Credit Scheduler rev2 (credit2)
Jun 10 14:17:12.376023 (XEN) Active queues: 1
Jun 10 14:17:12.378132 (XEN) default-weight = 256
Jun 10 14:17:12.380970 (XEN) Runqueue 0:
Jun 10 14:17:12.382630 (XEN) ncpus = 2
Jun 10 14:17:12.385260 (XEN) cpus = 0-1
Jun 10 14:17:12.388131 (XEN) max_weight = 256
Jun 10 14:17:12.391003 (XEN) instload = 1
Jun 10 14:17:12.393765 (XEN) aveload = 100

100% average load. This could mean that one of the two pCPUs in the
runqueue (i.e., one of the two pCPUs in the system, since there is only
one runqueue) is fully loaded. In theory, it could also mean that each
pCPU is 50% loaded, but I think that, in this case, it's the former.

Jun 10 14:17:12.396629 (XEN) idlers: 00000000,00000000,00000000,00000001
Jun 10 14:17:12.401129 (XEN) tickled: 00000000,00000000,00000000,00000000

And it would be pCPU #1 that is running at 100%, as pCPU #0 is idle.

Jun 10 14:17:12.405635 (XEN) Domain info:
Jun 10 14:17:12.407379 (XEN) Domain: 0 w 256 v 2
Jun 10 14:17:12.409888 (XEN) 1: [0.0] flags=0 cpu=0 credit=10500000 [w=256]
Jun 10 14:17:12.414999 (XEN) 2: [0.1] flags=0 cpu=1 credit=10500000 [w=256]
Jun 10 14:17:12.419761 (XEN) Domain: 10 w 256 v 4
Jun 10 14:17:12.422381 (XEN) 3: [10.0] flags=2 cpu=1 credit=9097256 [w=256]
Jun 10 14:17:12.427387 (XEN) 4: [10.1] flags=0 cpu=0 credit=10500000 [w=256]
Jun 10 14:17:12.432388 (XEN) 5: [10.2] flags=0 cpu=1 credit=10500000 [w=256]
Jun 10 14:17:12.437393 (XEN) 6: [10.3] flags=0 cpu=0 credit=10500000 [w=256]
Jun 10 14:17:12.442434 (XEN) CPU[00] sibling=00000000,00000000,00000000,00000001, core=00000000,00000000,00000000,00000001
Jun 10 14:17:12.451247 (XEN) run: [32767.0] flags=0 cpu=0 credit=-1073741824 [w=0]
Jun 10 14:17:12.456634 (XEN) CPU[01] sibling=00000000,00000000,00000000,00000002, core=00000000,00000000,00000000,00000002
Jun 10 14:17:12.465504 (XEN) run: [10.0] flags=2 cpu=1 credit=9097256 [w=256]

Here it is: d10v0, caught running on pCPU #1. flags=2 is
__CSFLAG_scheduled, which confirms the above. That flag is set when the
vcpu is scheduled, and cleared in the context switch tail, i.e., in
csched2_context_saved().

If this is a Xen issue, it looks like something is preventing
context_saved() from ever being called, which, on ARM, should happen in
schedule_tail(). I guess it could also be an issue in the guest kernel.
Some kind of livelock comes to mind, but it feels weird that it would
involve only one vcpu...
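For reference, this is roughly how I read that flags field. It's only a
standalone sketch: the flag names are the ones from sched_credit2.c, but
the exact bit positions below are from memory, so treat them as
assumptions and double-check against the real source.

/* Decode the Credit2 per-vcpu flags value as dumped above.
 * Bit positions are assumptions (scheduled == bit 1, i.e. value 2;
 * delayed_runq_add == bit 2); verify against sched_credit2.c. */
#include <stdio.h>

#define CSFLAG_scheduled        (1u << 1) /* vcpu currently scheduled on a pCPU */
#define CSFLAG_delayed_runq_add (1u << 2) /* runqueue insertion deferred to context saved */

int main(void)
{
    unsigned int flags = 2; /* the value dumped for d10v0 */

    printf("scheduled        : %s\n", (flags & CSFLAG_scheduled) ? "yes" : "no");
    printf("delayed_runq_add : %s\n", (flags & CSFLAG_delayed_runq_add) ? "yes" : "no");
    return 0;
}

So, if the bit positions are right, the only thing set for d10v0 is the
scheduled bit itself, nothing else pending.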
Maybe someone with more ARM experience than me (i.e., more than 0 ARM
experience!) can help decipher this, and try to figure out what the vCPU
is actually doing:

Jun 10 14:17:00.215131 (XEN) *** Dumping CPU0 guest state (d10v0): ***
Jun 10 14:17:00.219052 (XEN) ----[ Xen-4.6-unstable arm32 debug=y Not tainted ]----
Jun 10 14:17:00.224663 (XEN) CPU: 0
Jun 10 14:17:00.226157 (XEN) PC: c0413bb8
Jun 10 14:17:00.228279 (XEN) CPSR: 20000153 MODE:32-bit Guest SVC
Jun 10 14:17:00.232288 (XEN) R0: 00000022 R1: ffffffff R2: 00000004 R3: c0413b88
Jun 10 14:17:00.237796 (XEN) R4: 07a36158 R5: 00000001 R6: 0000002f R7: c0bbb708
Jun 10 14:17:00.243401 (XEN) R8: 07a36090 R9: c0b68000 R10:c0c49870 R11:600001d3 R12:00000000
Jun 10 14:17:00.250028 (XEN) USR: SP: 00000000 LR: 00000000
Jun 10 14:17:00.253292 (XEN) SVC: SP: c0b69cb8 LR: c080c6e4 SPSR:000001d3
Jun 10 14:17:00.257806 (XEN) ABT: SP: c0c4700c LR: c02126a0 SPSR:800001d3
Jun 10 14:17:00.262281 (XEN) UND: SP: c0c47018 LR: c0c47018 SPSR:00000000
Jun 10 14:17:00.266800 (XEN) IRQ: SP: c0c47000 LR: c0c47000 SPSR:00000000
Jun 10 14:17:00.271426 (XEN) FIQ: SP: 00000000 LR: 00000000 SPSR:00000000
Jun 10 14:17:00.275907 (XEN) FIQ: R8: 00000000 R9: 00000000 R10:00000000 R11:00000000 R12:00000000
Jun 10 14:17:00.282522 (XEN)
Jun 10 14:17:00.283175 (XEN) SCTLR: 10c5387d
Jun 10 14:17:00.285659 (XEN) TCR: 00000000
Jun 10 14:17:00.288046 (XEN) TTBR0: 000000004020406a
Jun 10 14:17:00.291174 (XEN) TTBR1: 000000004020406a
Jun 10 14:17:00.294318 (XEN) IFAR: 00000000, IFSR: 00000000
Jun 10 14:17:00.298194 (XEN) DFAR: 9bfd9000, DFSR: 00002805
Jun 10 14:17:00.301903 (XEN)
Jun 10 14:17:00.302648 (XEN) VTCR_EL2: 80003558
Jun 10 14:17:00.305035 (XEN) VTTBR_EL2: 00020000bfe7e000
Jun 10 14:17:00.308153 (XEN)
Jun 10 14:17:00.308901 (XEN) SCTLR_EL2: 30cd187f
Jun 10 14:17:00.311276 (XEN) HCR_EL2: 000000000038643f
Jun 10 14:17:00.314407 (XEN) TTBR0_EL2: 00000000bfef0000
Jun 10 14:17:00.317527 (XEN)
Jun 10 14:17:00.318276 (XEN) ESR_EL2: 07e00000
Jun 10 14:17:00.320651 (XEN) HPFAR_EL2: 000000000001c810
Jun 10 14:17:00.323780 (XEN) HDFAR: e0800f00
Jun 10 14:17:00.326153 (XEN) HIFAR: f265a116
Jun 10 14:17:00.328650 (XEN)
Jun 10 14:17:00.329280 (XEN) Guest stack trace from sp=c0b69cb8:
Jun 10 14:17:00.333027 (XEN) 00000000 c0b69ce4 00000008 c0b73a60 0000000b c0b68000 c0b74b30 0000000b
Jun 10 14:17:00.340168 (XEN) 00000000 c024bf68 c09b4cb4 c0b69cf0 00200000 00000000 00000001 00000000
Jun 10 14:17:00.347281 (XEN) 600001d3 c080cb50 c09b4a60 c0b69d1c c0b69ee8 c0c47244 c0b69ee8 c0b68000
Jun 10 14:17:00.354404 (XEN) c0b74b30 0000000b 00000001 00000000 600001d3 c0211ff8 c0b68248 0000000b
Jun 10 14:17:00.361534 (XEN) c0c4cd2a c0b69d53 c021ed58 c0b69ee8 00000000 00000008 653b36fb 31303830
Jun 10 14:17:00.368530 (XEN) 20313030 32343265 31303033 63316520 30303030 35662033 30666637 28206634
Jun 10 14:17:00.375654 (XEN) 37306565 65336630 c0002029 c080cb50 c0a21b74 00000000 9bfd9000 00002805
Jun 10 14:17:00.382780 (XEN) c0b69ee8 c0b73a60 00002805 c0200000 c0c00000 c080bf40 9bfd9000 c021c1e8
Jun 10 14:17:00.389906 (XEN) 00000000 00000001 00000000 5bbd7800 c0c75070 c0bb5d18 00000000 5bbd9000
Jun 10 14:17:00.397032 (XEN) 00000006 c0c7507c 00000003 c02e2ce4 5bfff000 5bbd9000 c0bb5d18 00000003
Jun 10 14:17:00.404158 (XEN) 00000000 c02e3af8 00000000 00000000 ffffffc0 00000000 00000040 5bbd7800
Jun 10 14:17:00.411160 (XEN) 00001800 00001800 ffffffff 00002805 c0b758a0 9bfd9000 c0b69ee8 c0bb5d00
Jun 10 14:17:00.418285 (XEN) 00040200 c0200000 c0c00000 c0208430 00000040 00000000 5bbd7800 c0af1de4
Jun 10 14:17:00.425419 (XEN) 00000000 00000000 c0c44880 00001800 c0c44880 c0b51f00 00020000 00020000
Jun 10 14:17:00.432531 (XEN) 00000007 dc7c9000 00000007 00000000 00060000 c0b24c50 00040000 c0c44880
Jun 10 14:17:00.439657 (XEN) 00000000 0001fc00 c0b51f00 00020000 600001d3 c02c8dd8 c0c45480 c0aef3c0
Jun 10 14:17:00.446787 (XEN) 00000000 00001000 c0b69f08 c0b69ef8 00060000 00000000 5bfd9000 c021ed54
Jun 10 14:17:00.453907 (XEN) 800001d3 ffffffff c0b69f1c c02126d8 9bfd9000 9bfda000 00000040 0000003f
Jun 10 14:17:00.460908 (XEN) 00000000 dbfd9000 c0c473f4 c0bb5b68 c0bb5d00 00040200 c0200000 c0c00000
Jun 10 14:17:00.468036 (XEN) c0b69f08 c0b69f30 c0acbd9c c021ed54 800001d3 ffffffff 00040c00 c0b26fa8

I've looked at the rest of the debug output, and found nothing relevant,
or at least nothing relevant enough to me (again, I'm very unfamiliar
with ARM traces). The timer status, perhaps, but it does not add much to
the sketched analysis above:

Jun 10 14:16:50.246933 (XEN) Dumping timer queues:
Jun 10 14:16:56.157817 (XEN) CPU00:
Jun 10 14:16:56.160771 (XEN) ex= 1983us timer=002f8378 cb=s_timer_fn(00000000)
Jun 10 14:16:56.164744 (XEN) ex= 488252us timer=40025958 cb=virt_timer_expired(40025950)
Jun 10 14:16:56.171105 (XEN) CPU01:
Jun 10 14:16:56.172333 (XEN) ex= 185us timer=40014078 cb=s_timer_fn(00000000)
Jun 10 14:16:56.178054 (XEN) ex= 598248us timer=40027958 cb=virt_timer_expired(40027950)

The timeout of s_timer_fn on pCPU #0 is ~2ms, which basically means the
pCPU is idle, and it's calling the scheduler at CSCHED2_MAX_TIMER
intervals. OTOH, on pCPU #1 it's smaller than 500us, which tells us that
the last time the timer was armed, it was set to fire after
CSCHED2_MIN_TIMER, the smallest time interval between two calls to the
scheduler that Credit2 allows.
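To make the min/max timer reasoning a bit more concrete, here is a tiny
sketch of the clamping I'm describing. The two constants are what I
recall from sched_credit2.c (and they are consistent with the ~2ms and
500us figures in the dump), but take the exact values and the helper
below as illustrative assumptions, not the real scheduler code.

/* Sketch of how the next scheduler invocation gets clamped between a
 * minimum and a maximum interval.  Values are assumptions consistent
 * with the dump above (min 500us, max 2ms), in nanoseconds. */
#include <stdio.h>
#include <stdint.h>

#define MICROSECS(us) ((int64_t)(us) * 1000)
#define MILLISECS(ms) ((int64_t)(ms) * 1000000)

#define CSCHED2_MIN_TIMER MICROSECS(500)
#define CSCHED2_MAX_TIMER MILLISECS(2)

static int64_t clamp_timeslice(int64_t time)
{
    if (time < CSCHED2_MIN_TIMER)
        return CSCHED2_MIN_TIMER;
    if (time > CSCHED2_MAX_TIMER)
        return CSCHED2_MAX_TIMER;
    return time;
}

int main(void)
{
    /* Idle pCPU: nothing urgent coming up, so s_timer_fn is (re)armed
     * at the maximum, ~2ms -- what we see on pCPU #0. */
    printf("idle pCPU timer: %lld ns\n", (long long)clamp_timeslice(INT64_MAX));
    /* Busy pCPU: credits say "reschedule soon", but never sooner than
     * the minimum, 500us -- what pCPU #1's timer was armed to. */
    printf("busy pCPU timer: %lld ns\n", (long long)clamp_timeslice(100000));
    return 0;
}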
I'll keep looking into this tomorrow... Help from ARM people, much
appreciated. :-)

Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)