Re: [Xen-devel] netback BUG_ON when using copy_skb=1
On 2013/10/28 15:43, Jan Beulich wrote:
>>>> On 26.10.13 at 10:32, jerry <jerry.lilijun@xxxxxxxxxx> wrote:
>> The reason why the vif net-device isn't released after shutting down
>> the VM was found with copy_skb disabled.
>> Suppose that VM1 (vif1.0) sends packets to VM2 (vif2.0) through a
>> virtual switch.
>>
>> 1) VM2's OS is Windows 2003 and was previously shut down for some
>> unexpected reason.
>> After being created, VM2 stopped its startup at the prompt window
>> named "Shutdown Event Tracker".
>> It is waiting for the user to enter an explanation of why the
>> computer shut down unexpectedly.
>>
>> 2) VM2 already has vif2.0 created. Then I added a new vif net-device
>> using virsh commands.
>> The new vif2.1 was not completely created (it has no interrupts), but
>> its state is running and its TX queue is started by default.
>> The function connect() in xenbus.c hasn't been called for vif2.1. The
>> related information in xenstore is as follows:
>> linux-szRoyS:/ # xenstore-ls -f | grep 2 | grep state
>> /local/domain/0/device-model/2/state = "running"
>> /local/domain/0/backend/vbd/2/51712/state = "4"
>> /local/domain/0/backend/vbd/2/51760/state = "4"
>> /local/domain/0/backend/vif/2/0/state = "4"
>> /local/domain/0/backend/vif/2/1/state = "2"
>> /local/domain/0/backend/console/2/0/state = "1"
>> /local/domain/2/control/uvp/vm_state = "running"
>> /local/domain/2/device/vbd/51712/state = "4"
>> /local/domain/2/device/vbd/51760/state = "4"
>> /local/domain/2/device/vif/0/state = "4"
>> /local/domain/2/device/vif/1/state = "1"
>>
>> 3) The KOBJ_ONLINE message was generated in function
>> backend_create_netif(), called from netback_probe().
>> This event invokes the network script named "vif-bridge", which adds
>> vif2.1 to the virtual switch.
>> Then packets from vif1.0 (VM1) will be forwarded or flooded to vif2.1
>> by the virtual switch.
>> vif2.1 dropped these packets because it is not netif_schedulable() in
>> function netif_be_start_xmit().
>>
>> 4) After setting vif2.1 down and then up, the TX queue can't be
>> started in net_open() because the carrier is off.
>> So its qdisc became fifo_qdic and the TX queue state became stopped.
>> In this case, the packets are held in the qdisc queue and can't be
>> dequeued in function dequeue_skb() because vif2.1's TX queue is
>> stopped.
>>
>> 5) If VM1 is destroyed, the packets from vif1.0 can't be released and
>> vif1.0 can't be disconnected.
>> vif1.0 will remain unreleased until vif2.1 is set down.
>>
>> This problem arises mainly because vif2.1 was not created successfully
>> and got into a strange state: running, but with its TX queue stopped.
>> The function backend_create_netif() is called in two places,
>> netback_probe() and frontend_changed(). I think we can remove the
>> backend_create_netif() call in netback_probe().
>> That way we can make sure the vif net-device is created completely
>> after the front-end changes to XenbusStateConnected.
>>
>> The patch is as follows:
>> --- drivers/xen/netback/xenbus.c.old 2013-10-26 16:23:07.000000000 +0800
>> +++ drivers/xen/netback/xenbus.c 2013-10-26 16:23:31.000000000 +0800
>> @@ -156,9 +156,6 @@
>>          if (err)
>>                  goto fail;
>>
>> -        /* This kicks hotplug scripts, so do it immediately. */
>> -        backend_create_netif(be);
>> -
>>          return 0;
>>
>>  abort_transaction:
>>
>> Do you have any ideas?
>
> No, not really. Would be helpful if this could be matched up to
> behavior (and eventual changes thereto) of the upstream driver.

Hi Wei and Jan,

Thanks for your reply.
My VM is running with the SUSE11 SP2 netback drivers, so the upstream
xen-netback driver has not been tested in this situation.

The earlier patch may introduce some problems when migrating VMs, so I
have a new solution to fix my problem.
The patch is as follows:
--- drivers/xen/netback/interface.c.old 2013-10-29 11:46:36.000000000 +0800
+++ drivers/xen/netback/interface.c 2013-10-29 11:46:47.000000000 +0800
@@ -111,8 +111,8 @@
         netif_t *netif = netdev_priv(dev);
         if (netback_carrier_ok(netif)) {
                 __netif_up(netif);
-                netif_start_queue(dev);
         }
+        netif_start_queue(dev);
         return 0;
 }

After this modification, when the vif is not connected to the front-end,
the Qdisc continues to hand skbs to the vif, which then drops them.
I mean that the Qdisc queue shouldn't cache SKBs while the vif is not
completely created.

Any ideas?

Jerry

>
> Jan
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
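
To illustrate the difference between the two net_open() variants
discussed above, here is a small user-space toy model. It is only a
sketch and not kernel code: every name in it (toy_vif, toy_open,
toy_enqueue, toy_simulate) is hypothetical, and the real qdisc and
netback paths are far more involved. It assumes a vif whose carrier is
off (front-end not connected), and shows that packets pile up in the
backlog when the TX queue is only started with carrier on, but are
drained and dropped when the queue is always started.

```c
#include <stdbool.h>

/* Toy model only: none of these names are kernel or netback APIs. */
struct toy_vif {
    bool carrier_ok;    /* front-end connected? (stand-in for netback_carrier_ok) */
    bool queue_started; /* TX queue state (stand-in for netif_start_queue) */
    int  backlog;       /* packets held in the qdisc, not yet released */
    int  dropped;       /* packets dropped (and thus released) by the xmit path */
    int  delivered;     /* packets handed on to the front-end */
};

/* Bring the device up. start_always == false models the original
 * net_open(); start_always == true models the proposed change. */
static void toy_open(struct toy_vif *vif, bool start_always)
{
    vif->queue_started = false; /* after down/up the queue begins stopped */
    if (vif->carrier_ok || start_always)
        vif->queue_started = true;
}

/* Enqueue one packet, then drain the backlog while the TX queue runs. */
static void toy_enqueue(struct toy_vif *vif)
{
    vif->backlog++;
    while (vif->queue_started && vif->backlog > 0) {
        vif->backlog--;
        if (vif->carrier_ok)
            vif->delivered++;
        else
            vif->dropped++; /* not schedulable: drop instead of holding */
    }
}

/* Send npackets to a vif whose front-end never connected. */
static struct toy_vif toy_simulate(bool start_always, int npackets)
{
    struct toy_vif vif = { .carrier_ok = false };

    toy_open(&vif, start_always);
    for (int i = 0; i < npackets; i++)
        toy_enqueue(&vif);
    return vif;
}
```

In this model, toy_simulate(false, 5) ends with backlog == 5 and
dropped == 0 (the packets stay queued, as in step 4 of the report),
while toy_simulate(true, 5) ends with backlog == 0 and dropped == 5.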