[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] stubdom migration failure on merlot* XSM related (Was: [adhoc test] 65682: tolerable FAIL])
On Fri, 2015-12-11 at 14:05 +0000, Ian Campbell wrote: So returning to the logs for 65682 (the first repro, which exhibited the fail with the usual timeout). http://osstest.test-lab.xenproject.org/~osstest/pub/logs/65682/test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm/info.html First thing I noticed was the numa config http://osstest.test-lab.xenproject.org/~osstest/pub/logs/65682/test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm/merlot1-output-xl_vcpu-list dom0 has 32 vcpus, allowed hard/soft on all processors. the domU has 2 vcpus, with a soft preference for pcpu 16-23, while the stub dom has a soft preference for 0-7. I think that's normal NUMA placement doing what we want and while I've not thought about the wiseness (or otherwise) of having the stubdom on another node I don't see why that should cause a 10s+ delay to the network device forwarding. I've confirmed in http://osstest.test-lab.xenproject.org/~osstest/pub/logs/65682/test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm/merlot1--etc-network-interfaces.new that STP is disabled. The hotplug log is empty. The migration was of dom5 + stubdom6 into dom7 + stubdom8. Time line (from 13.ts-guest-localmigrate.log): 2015-12-10 20:10:30 Z migration begins 2015-12-10 20:13:43 Z migration completes ("execution took 193 seconds[<=2x400]" 2015-12-10 20:13:43 Z ssh to guest `date` begins 2015-12-10 20:13:53 Z ssh guest timesout 2015-12-10-20:13:58 Z ssh produces: "Thu Dec 10 20:13:58 UTC 2015." http://osstest.test-lab.xenproject.org/~osstest/pub/logs/65682/test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm/serial-merlot1.log (very abridged) Dec 10 20:13:36.441014 (XEN) HVM5 save: CPU [...] Dec 10 20:13:36.641048 (XEN) HVM7 restore: CPU 0 [...] Dec 10 20:13:36.705029 [ 1391.363224] device vif8.0 entered promiscuous mode Dec 10 20:13:38.753001 [ 1391.369957] IPv6: ADDRCONF(NETDEV_UP): vif8.0: link is not ready Dec 10 20:13:38.760973 (d8) Bootstrapping... Dec 10 20:13:38.985061 (d8) backend at /local/domain/0/backend/vif/8/0 [...] Dec 10 20:13:38.993010 (d8) mac is 5a:36:0e:92:00:02 Dec 10 20:13:38.993038 [ 1391.467483] IPv6: ADDRCONF(NETDEV_CHANGE): vif8.0: link becomes ready Dec 10 20:13:39.001015 [ 1391.467574] xenbr0: port 4(vif8.0) entered forwarding state Dec 10 20:13:39.009015 [ 1391.467608] xenbr0: port 4(vif8.0) entered forwarding state [...] Dec 10 20:13:39.273017 (d8) xs_read_watch() -> /local/domain/0/backend/vbd/8/5632/params hdc [...] Dec 10 20:13:39.281011 [ 1391.850153] device vif7.0 entered promiscuous mode Dec 10 20:13:39.281047 [ 1391.856999] IPv6: ADDRCONF(NETDEV_UP): vif7.0: link is not ready Dec 10 20:13:39.289000 (XEN) irq.c:385: Dom7 callback via changed to Direct Vector 0xf3 [...] Dec 10 20:13:39.369050 (d8) close(3) Dec 10 20:13:39.377010 (d8) close network: backend at /local/domain/0/backend/vif/8/0 Dec 10 20:13:39.377045 [ 1391.996438] xen_netback:set_backend_state: backend/vif/8/0: prepare for reconnect Dec 10 20:13:39.385011 [ 1392.034769] xen-blkback:ring-ref 9, event-channel 24, protocol 1 (x86_64-abi) Dec 10 20:13:39.424993 [ 1392.125011] xenbr0: port 2(vif6.0) entered disabled state Dec 10 20:13:39.513026 [ 1392.125835] device vif6.0 left promiscuous mode Dec 10 20:13:39.513062 [ 1392.125868] xenbr0: port 2(vif6.0) entered disabled state Dec 10 20:13:39.520999 [ 1392.185454] xenbr0: port 3(vif5.0) entered disabled state Dec 10 20:13:39.569013 [ 1392.186308] device vif5.0 left promiscuous mode Dec 10 20:13:39.577026 [ 1392.186393] xenbr0: port 3(vif5.0) entered disabled state Dec 10 20:13:39.584976 (XEN) avc:ÂÂdeniedÂÂ{ pcilevel } for domid=8 target=7 scontext=system_u:system_r:dm_dom_t tcontext=system_u:system_r:domU_t_target tclass=hvm Dec 10 20:13:39.793039 [ 1392.468784] xenbr0: port 4(vif8.0) entered disabled state Dec 10 20:13:39.856990 [ 1409.562306] IPv6: ADDRCONF(NETDEV_CHANGE): vif7.0: link becomes ready Dec 10 20:13:56.953048 [ 1409.562434] xenbr0: port 5(vif7.0) entered forwarding state Dec 10 20:13:56.953097 [ 1409.562450] xenbr0: port 5(vif7.0) entered forwarding state (the end, until ts-log-capture sends some keys) I think what we can see at Dec 10 20:13:39.289000 is the PVHVM stuff initialising direct vector injection of event channels and the stubdom8 network close at Dec 10 20:13:39.377010 is the guest unplugging the emulated devices, then vif8.0 (the corresponding stubdom vif) goes away and vif7.0 (the guest's vif) becomes ready. All of that is well within the window of the "xl migrate". However as Wei noticed there is then a 17 second delay until vif7.0 enters the forwarding state. The source stubdom log (which includes the kernel serial log) is http://osstest.test-lab.xenproject.org/~osstest/pub/logs/65682/test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm/merlot1---var-log-xen-qemu-dm-debianhvm.guest.osstest.log and the destination is http://osstest.test-lab.xenproject.org/~osstest/pub/logs/65682/test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm/merlot1---var-log-xen-qemu-dm-debianhvm.guest.osstest--incoming.log Both lack timestamps so it's not easy to discern anything. In the non-XSM case 2015-12-11 12:58:54 Z migration finishes 2015-12-11 12:58:54 Z ssh guest `date` starts 2015-12-11 12:58:55 Z date output from guest: Fri Dec 11 12:58:55 UTC 2015 and in the host serial logs: Dec 11 12:58:50.573034 [ 1007.966381] IPv6: ADDRCONF(NETDEV_CHANGE): vif7.0: link becomes ready Dec 11 12:58:54.133079 [ 1007.966487] xenbr0: port 5(vif7.0) entered forwarding state Dec 11 12:58:54.133149 [ 1007.966502] xenbr0: port 5(vif7.0) entered forwarding state Note that there is a 4s (not 16s as with XSM) delay there. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |