
Re: [Xen-devel] kernel BUG at drivers/net/xen-netfront.c:473!



Hi,

The BUG_ON condition looks like this:

struct page *page = skb_frag_page(frag);
len = skb_frag_size(frag);
offset = frag->page_offset;
/* Data must not cross a page boundary. */
BUG_ON(len + offset > PAGE_SIZE<<compound_order(page));
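In other words, a frag's data must fit within the (possibly compound) page backing it: the limit is PAGE_SIZE shifted left by the page's order. As a minimal user-space sketch of that arithmetic, assuming 4 KiB x86-64 pages (the split of len and offset below is illustrative only, though RSI=0x1040 vs RDI=0x1000 at the faulting compare in the quoted report appears consistent with len + offset = 4160 bytes checked against an order-0 page):

#include <stdio.h>

#define PAGE_SIZE 4096UL	/* assumption: 4 KiB pages, as on x86-64 */

int main(void)
{
	unsigned long len = 0x1000, offset = 0x40;	/* illustrative values */
	unsigned int order = 0;				/* order-0: a single page */
	unsigned long limit = PAGE_SIZE << order;	/* 4096 */

	/* Mirrors the BUG_ON predicate: data must not cross the page. */
	printf("len + offset = %lu, limit = %lu -> %s\n",
	       len + offset, len + offset > limit ? "BUG" : "ok");
	return 0;
}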

That seems to be a pretty seriously screwed-up skb frag. The stack trace suggests a packet arrived on a vif and the TCP stack then either sent it back out or generated a response to it. Could you reproduce the problem with some debug printouts? The BUG_ON line should be replaced with this:

if (len + offset > PAGE_SIZE << compound_order(page)) {
	netdev_err(dev,
		   "len %d offset %d order %d PageHead %d i %d nr_frags %d\n",
		   len, offset, compound_order(page), PageHead(page),
		   i, skb_shinfo(skb)->nr_frags);
	BUG();
}

This should provide some insight into what exactly is wrong with this packet.
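If more context would help, a possible extension (only a sketch; the helper name dump_skb_frags is made up here, but every accessor it uses exists in 3.15-era kernels) is to dump the whole skb layout as well:

static void dump_skb_frags(struct net_device *dev, struct sk_buff *skb)
{
	int j;

	/* Overall skb geometry: total length, paged length, linear length. */
	netdev_err(dev, "skb len %u data_len %u headlen %u gso_size %u\n",
		   skb->len, skb->data_len, skb_headlen(skb),
		   skb_shinfo(skb)->gso_size);

	/* Per-frag size, offset and backing page order. */
	for (j = 0; j < skb_shinfo(skb)->nr_frags; j++) {
		const skb_frag_t *f = &skb_shinfo(skb)->frags[j];

		netdev_err(dev, "frag %d: size %u offset %u order %d\n",
			   j, skb_frag_size(f), f->page_offset,
			   compound_order(skb_frag_page(f)));
	}
}

Calling this just before BUG() would show whether only a single frag is corrupted or the whole frag list is off.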

Regards,

Zoltan

On 24/10/14 18:12, Christopher S. Aker wrote:
Xen: 4.4.1-pre++ (xenbits @ 28414:b2a1758e87a8) + xsa100.patch
Dom0: 3.10.40-2 + futex patcheset
DomU: 3.15.4-x86_64 (straight up kernel.org)

The guest kernel binary and other stuff are available here: 
<http://vin.fo/~caker/xen/bugs/xen-netfront.c:473/>

The host's networking consists of 4x 10G links, bonded, in a bridge, and then a 
single vif per guest on the bridge.

We have a user who is able to reliably (although painfully) reproduce the 
following guest kernel crash.  The guest is using HAProxy as a load balancer 
for a handful of backends - so the network was being used heavily(?).


kernel BUG at drivers/net/xen-netfront.c:473!
invalid opcode: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.15.4-x86_64-linode45 #1
task: ffffffff81c18450 ti: ffffffff81c00000 task.ti: ffffffff81c00000
RIP: e030:[<ffffffff81568e41>]  [<ffffffff81568e41>] 
xennet_make_frags+0x247/0x40b
RSP: e02b:ffff88007fa037a8  EFLAGS: 00010002
RAX: ffffea0001dfcb40 RBX: ffff880079ee0740 RCX: 0000000000000000
RDX: ffff880079ed1a9c RSI: 0000000000001040 RDI: 0000000000001000
RBP: ffff880079bee6e8 R08: 00000000000005a8 R09: 00000000000000a6
R10: ffffffff81742dc9 R11: ffff88007978a000 R12: 0000000000000f82
R13: 00000000000000be R14: 0000000000000027 R15: ffffea0001df2300
FS:  0000000000000000(0000) GS:ffff88007fa00000(0000) knlGS:ffff8800ff300000
CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000001240000 CR3: 00000000775c3000 CR4: 0000000000042660
Stack:
  00000000000005a8 00000000000000dd ffff880079ed1000 00005ade816e5a45
  0000000000000020 0000000277db1000 ffff880079bee7cc 0000001d815de4b4
  ffff880079ee1030 0000000400054803 ffff880079bee6e8 ffff880079ee0740
Call Trace:
  <IRQ>
  [<ffffffff8156a577>] ? xennet_start_xmit+0x3a9/0x4a7
  [<ffffffff815ebc85>] ? dev_hard_start_xmit+0x319/0x410
  [<ffffffff816050d6>] ? sch_direct_xmit+0x6a/0x191
  [<ffffffff815ebf9e>] ? __dev_queue_xmit+0x222/0x444
  [<ffffffff8169afe8>] ? ip_options_echo+0x2f0/0x2f0
  [<ffffffff8169dd0d>] ? ip_finish_output_gso+0x329/0x40a
  [<ffffffff8169ddee>] ? ip_finish_output_gso+0x40a/0x40a
  [<ffffffff8169de41>] ? ip_finish_output+0x53/0x3c4
  [<ffffffff8169d51e>] ? ip_queue_xmit+0x2be/0x2e9
  [<ffffffff816af12c>] ? tcp_transmit_skb+0x74e/0x791
  [<ffffffff816acb33>] ? tcp_clean_rtx_queue+0x5c1/0x6b2
  [<ffffffff816b1c6e>] ? tcp_write_xmit+0x3eb/0x542
  [<ffffffff816b1e1a>] ? __tcp_push_pending_frames+0x24/0x7f
  [<ffffffff816adc88>] ? tcp_rcv_established+0x115/0x5a1
  [<ffffffff816df148>] ? ipv4_confirm+0xbf/0xc9
  [<ffffffff816b4715>] ? tcp_v4_do_rcv+0xa3/0x1f5
  [<ffffffff816b4c2b>] ? tcp_v4_rcv+0x3c4/0x715
  [<ffffffff816341d1>] ? nf_hook_slow+0x72/0x107
  [<ffffffff816988c4>] ? ip_rcv+0x317/0x317
  [<ffffffff816989d6>] ? ip_local_deliver_finish+0x112/0x1cd
  [<ffffffff815e72a5>] ? __netif_receive_skb_core+0x4e8/0x520
  [<ffffffff815e7564>] ? netif_receive_skb_internal+0x71/0x77
  [<ffffffff815eb44d>] ? napi_gro_receive+0xa7/0xe5
  [<ffffffff8156aec2>] ? handle_incoming_queue+0xe1/0x138
  [<ffffffff8156b41b>] ? xennet_poll+0x502/0x5cc
  [<ffffffff815e6252>] ? __napi_schedule+0x4c/0x4e
  [<ffffffff815e7773>] ? net_rx_action+0xa7/0x1f6
  [<ffffffff810a68cf>] ? __do_softirq+0xd1/0x1db
  [<ffffffff810a6a5e>] ? irq_exit+0x40/0x87
  [<ffffffff814e49c9>] ? xen_evtchn_do_upcall+0x2f/0x3a
  [<ffffffff817b96fe>] ? xen_do_hypervisor_callback+0x1e/0x30
  <EOI>
  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
  [<ffffffff81007124>] ? xen_safe_halt+0xc/0x15
  [<ffffffff8101287a>] ? default_idle+0x5/0x8
  [<ffffffff810d34bc>] ? cpuidle_idle_call+0x3a/0x7f
  [<ffffffff810d3585>] ? cpu_idle_loop+0x84/0xab
  [<ffffffff81caff44>] ? start_kernel+0x308/0x30e
  [<ffffffff81cafa76>] ? repair_env_string+0x58/0x58
  [<ffffffff810071f1>] ? xen_setup_runstate_info+0x27/0x34
  [<ffffffff81cb2dc5>] ? xen_start_kernel+0x400/0x405
Code: 01 44 8b 69 0c 44 8b 61 08 48 8b 30 31 c9 f7 c6 00 40 00 00 74 03 8b 48 68 43 
8d 74 25 00 bf 00 10 00 00 48 d3 e7 48 39 fe 76 04 <0f> 0b eb fe 45 89 e7 41 81 
e4 ff 0f 00 00 41 c1 ef 0c 45 89 ff
RIP  [<ffffffff81568e41>] xennet_make_frags+0x247/0x40b
  RSP <ffff88007fa037a8>
---[ end trace e681a3f19fa83070 ]---
Kernel panic - not syncing: Fatal exception in interrupt

Thanks,
-Chris


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel



 

