Re: [Xen-devel] netback Oops then xenwatch stuck in D state
>>> On 13.02.13 at 03:51, "Christopher S. Aker" <caker@xxxxxxxxxxxx> wrote:
> Feb 12 20:34:12: vif vif-21-0 vif21.0: Frag is bigger than frame.
> Feb 12 20:34:12: vif vif-21-0 vif21.0: fatal error; disabling device
> <--------------
> Feb 12 20:34:12: BUG: unable to handle kernel NULL pointer dereference at
> 00000000000008b8
>...
> Feb 12 20:34:12: Call Trace:
> Feb 12 20:34:12: [<ffffffff817605da>] _raw_spin_lock_irqsave+0x2a/0x40
> Feb 12 20:34:12: [<ffffffff8154446f>] xen_netbk_schedule_xenvif+0x8f/0x100
> Feb 12 20:34:12: [<ffffffff81544505>] xen_netbk_check_rx_xenvif+0x25/0x60
> Feb 12 20:34:12: [<ffffffff815445eb>] netbk_tx_err+0x5b/0x70
> Feb 12 20:34:12: [<ffffffff8154518c>] xen_netbk_tx_build_gops+0xb8c/0xbc0
> Feb 12 20:34:12: [<ffffffff81012880>] ? __switch_to+0x160/0x4f0
> Feb 12 20:34:12: [<ffffffff810891b8>] ? idle_balance+0xf8/0x150
> Feb 12 20:34:12: [<ffffffff81080150>] ? finish_task_switch+0x60/0xd0
> Feb 12 20:34:12: [<ffffffff8175f7b4>] ? __schedule+0x394/0x750
> Feb 12 20:34:12: [<ffffffff815452af>] xen_netbk_kthread+0xef/0x9d0
> Feb 12 20:34:12: [<ffffffff81080150>] ? finish_task_switch+0x60/0xd0
> Feb 12 20:34:12: [<ffffffff810720c0>] ? wake_up_bit+0x40/0x40
> Feb 12 20:34:12: [<ffffffff815451c0>] ? xen_netbk_tx_build_gops+0xbc0/0xbc0
> Feb 12 20:34:12: [<ffffffff81071a06>] kthread+0xc6/0xd0
> Feb 12 20:34:12: [<ffffffff810037b9>] ? xen_end_context_switch+0x19/0x20
> Feb 12 20:34:12: [<ffffffff81071940>] ?
> kthread_freezable_should_stop+0x70/0x70
> Feb 12 20:34:12: [<ffffffff8176847c>] ret_from_fork+0x7c/0xb0
> Feb 12 20:34:12: [<ffffffff81071940>] ?
> kthread_freezable_should_stop+0x70/0x70

I think the root cause is the same as for the problem reported on the "classic" kernels - we should not blindly shut down everything on a fatal error. Instead, I think we ought to set a flag on the xenvif and disassociate it from the netbk in a more controlled manner.
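To make the "flag now, disassociate later" idea concrete, here is a minimal single-threaded userspace sketch in C. It is not netback code: `struct backend_sim`, `struct vif_sim`, `vif_mark_fatal()`, `vif_tx_pass()`, `reap_fatal_vifs()` and `backend_loop_once()` are all hypothetical names, and there is no locking because the sketch assumes a single worker thread (as with the pv-ops kthread). The error path only sets a flag, so code further along the same call chain can still dereference the backend pointer (the trace above shows a NULL pointer dereference reached via netbk_tx_err() and xen_netbk_schedule_xenvif()). The actual teardown happens at the bottom of the loop, where no per-vif work is in flight. A classic-tree variant would additionally need to synchronize with both tasklets, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; NOT the real
 * netback/xenvif structures or functions. */
struct backend_sim {
    int work_done;   /* requests processed by this worker */
    int errors;      /* fatal errors seen on attached vifs */
};

struct vif_sim {
    struct backend_sim *backend; /* NULL once disassociated */
    bool fatal_err;              /* set by the error path, acted on later */
    bool carrier_on;
    bool bad_frame;              /* simulates a malformed guest request */
};

/* Error path: only record the failure. The vif stays attached. */
static void vif_mark_fatal(struct vif_sim *vif)
{
    vif->fatal_err = true;
}

/* One tx pass over a single vif; flagged or detached vifs are skipped. */
static void vif_tx_pass(struct vif_sim *vif)
{
    if (vif->fatal_err || vif->backend == NULL)
        return;

    if (vif->bad_frame) {
        vif_mark_fatal(vif);
        /* Code further along the error path may still touch the backend;
         * this is safe because nothing has been torn down yet. */
        assert(vif->backend != NULL);
        vif->backend->errors++;
        return;
    }
    vif->backend->work_done++;
}

/* Bottom of the worker's main loop: no per-vif work is in flight, so
 * flagged vifs can be disassociated in a controlled manner. */
static int reap_fatal_vifs(struct vif_sim **vifs, size_t n)
{
    int reaped = 0;

    for (size_t i = 0; i < n; i++) {
        struct vif_sim *vif = vifs[i];

        if (vif->fatal_err && vif->backend != NULL) {
            vif->carrier_on = false; /* stop the stack from using it */
            vif->backend = NULL;     /* then detach from the worker */
            reaped++;
        }
    }
    return reaped;
}

/* One iteration of the simulated worker loop; returns vifs detached. */
static int backend_loop_once(struct vif_sim **vifs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        vif_tx_pass(vifs[i]);
    return reap_fatal_vifs(vifs, n);
}
```

Running `backend_loop_once()` twice over one healthy and one misbehaving vif processes the healthy vif on both passes, detaches the bad vif once at the end of the first pass, and skips it afterwards. A per-vif flag scanned at the end of the loop is only one way to "identify the busted xenvif(s)"; a separate list of pending teardowns would work equally well.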
On the pv-ops tree, that would likely be just at the bottom of the main loop in xen_netbk_kthread(), with the caveat that there needs to be a way to identify the busted xenvif(s). On the classic tree, this apparently could be done directly in net_tx_action() (and hence in netbk_fatal_tx_err(), in place of the call to xenvif_carrier_off()), but the scheduled piece of code would then need to sync with both tasklets.

Of course there's nothing preventing the pv-ops solution from being similar to this (that would make it easier to add back tasklet support, which - as I already told you elsewhere - appears to address throughput and/or CPU utilization problems people reported to us with the kthreads approach).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel