
[Xen-users] Re: [Xen-devel] kernel oops/IRQ exception when networking between many domUs



On Monday, 06.06.2005 at 09:23 +0100, Keir Fraser wrote:
> On 5 Jun 2005, at 17:57, Birger Toedtmann wrote:
> 
> > Apparently it is happening somewhere here:
> >
> > [...]
> > 0xc028cbe5 <net_rx_action+1135>:        test   %eax,%eax
> > 0xc028cbe7 <net_rx_action+1137>:        je     0xc028ca82 <net_rx_action+780>
> > 0xc028cbed <net_rx_action+1143>:        mov    %esi,%eax
> > 0xc028cbef <net_rx_action+1145>:        shr    $0xc,%eax
> > 0xc028cbf2 <net_rx_action+1148>:        mov    %eax,(%esp)
> > 0xc028cbf5 <net_rx_action+1151>:        call   0xc028c4c4 <free_mfn>
> > 0xc028cbfa <net_rx_action+1156>:        mov    $0xffffffff,%ecx
> > ^^^^^^^^^^
> 
> Most likely the driver has tried to send a bogus page to a domU. 
> Because it's bogus the transfer fails. The driver then tries to free 
> the page back to Xen, but that also fails because the page is bogus. 
> This confuses the driver, which then BUG()s out.
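
The failure path described above can be modelled as a toy sketch. This is not the netback source: `transfer_page`, `deliver`, `VALID_FRAMES` and `DriverBug` are hypothetical stand-ins, and the `free_mfn` here is only a Python placeholder. The only details taken from the listing are the `>> 12` shift (the `shr $0xc`) and the fact that a function named free_mfn is called with the shifted value.

```python
# Toy model of the failure path; hypothetical names, not the real driver code.
PAGE_SHIFT = 12  # matches the `shr $0xc` in the disassembly


class DriverBug(Exception):
    """Stands in for the kernel's BUG()."""


# Hypothetical set of frame numbers the hypervisor would accept.
VALID_FRAMES = {0x1234, 0x1235}


def transfer_page(frame, receiver_ok=True):
    """Pretend to hand a frame to a domU; returns True on success."""
    return receiver_ok and frame in VALID_FRAMES


def free_mfn(frame):
    """Pretend to give a frame back to Xen; a bogus frame cannot be freed."""
    if frame not in VALID_FRAMES:
        raise DriverBug(f"cannot free bogus frame {frame:#x}")


def deliver(addr, receiver_ok=True):
    """Send the page containing addr; on failure, return the page to Xen."""
    frame = addr >> PAGE_SHIFT
    if transfer_page(frame, receiver_ok):
        return "ok"
    # Transfer failed: give the page back. If the frame is bogus this
    # second step fails as well, which is the case that ends in BUG().
    free_mfn(frame)
    return "error"
```

In this model a valid page with a failing receiver is freed cleanly and yields "error", while a bogus address fails both steps and raises `DriverBug`.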

I commented out the free_mfn() and status= lines. The kernel now reports
the following after configuring the 10th domU and roughly the 80th vif,
with approx. 20-25 bridges up.  Just an idea: the number of vifs + bridges
is somewhere around the magic 128 (the NR_IRQS problem in 2.0.x!) when the
crash happens. Could this hint at something?
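
As a rough back-of-the-envelope check of that idea, the counts quoted above can be compared against 128. Whether each vif or bridge really consumes one slot counted against NR_IRQS is an assumption here, not something established in this thread, and `headroom` is a hypothetical helper.

```python
NR_IRQS_LIMIT = 128  # the "magic 128" mentioned above


def headroom(vifs, bridges, limit=NR_IRQS_LIMIT):
    """Slots left if every vif and every bridge took one slot (an assumption)."""
    return limit - (vifs + bridges)


for bridges in (20, 25):
    print(f"80 vifs + {bridges} bridges -> {headroom(80, bridges)} slots left")
```

Under that assumption 23 to 28 slots would remain, before counting anything else that might use one.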


[...]
Jun  6 10:12:14 lomin kernel: 10.2.23.8: port 2(vif10.3) entering forwarding state
Jun  6 10:12:14 lomin kernel: 10.2.35.16: topology change detected, propagating
Jun  6 10:12:14 lomin kernel: 10.2.35.16: port 2(vif10.4) entering forwarding state
Jun  6 10:12:14 lomin kernel: 10.2.35.20: topology change detected, propagating
Jun  6 10:12:14 lomin kernel: 10.2.35.20: port 2(vif10.5) entering forwarding state
Jun  6 10:12:20 lomin kernel: c014cea4
Jun  6 10:12:20 lomin kernel:  [do_page_fault+643/1665] do_page_fault+0x469/0x738
Jun  6 10:12:20 lomin kernel:  [<c0115720>] do_page_fault+0x469/0x738
Jun  6 10:12:20 lomin kernel:  [fixup_4gb_segment+2/12] page_fault+0x2e/0x34
Jun  6 10:12:20 lomin kernel:  [<c0109a7e>] page_fault+0x2e/0x34
Jun  6 10:12:20 lomin kernel:  [do_page_fault+49/1665] do_page_fault+0x217/0x738
Jun  6 10:12:20 lomin kernel:  [<c01154ce>] do_page_fault+0x217/0x738
Jun  6 10:12:20 lomin kernel:  [fixup_4gb_segment+2/12] page_fault+0x2e/0x34
Jun  6 10:12:20 lomin kernel:  [<c0109a7e>] page_fault+0x2e/0x34
Jun  6 10:12:20 lomin kernel: PREEMPT
Jun  6 10:12:20 lomin kernel: Modules linked in: dm_snapshot pcmcia bridge ipt_REJECT ipt_state iptable_filter ipt_MASQUERADE iptable_nat ip_conntrack ip_tables autofs4 snd_seq snd_seq_device evdev usbhid rfcomm l2cap bluetooth dm_mod cryptoloop snd_pcm_oss snd_mixer_oss snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd soundcore snd_page_alloc tun uhci_hcd usb_storage usbcore irtty_sir sir_dev ircomm_tty ircomm irda yenta_socket rsrc_nonstatic pcmcia_core 3c59x
Jun  6 10:12:20 lomin kernel: CPU:    0
Jun  6 10:12:20 lomin kernel: EIP:    0061:[do_wp_page+622/1175]    Not tainted VLI
Jun  6 10:12:20 lomin kernel: EIP:    0061:[<c014cea4>]    Not tainted VLI
Jun  6 10:12:20 lomin kernel: EFLAGS: 00010206   (2.6.11.11-xen0)
Jun  6 10:12:20 lomin kernel: EIP is at handle_mm_fault+0x5d/0x222
Jun  6 10:12:20 lomin kernel: eax: 15555b18   ebx: d8788000   ecx: 00000b18   edx: 15555b18
Jun  6 10:12:20 lomin kernel: esi: dcfc3b4c   edi: dcaf5580   ebp: d8789ee4   esp: d8789ebc
Jun  6 10:12:20 lomin kernel: ds: 0069   es: 0069   ss: 0069
Jun  6 10:12:20 lomin kernel: Process python (pid: 4670, threadinfo=d8788000 task=de1a1520)
Jun  6 10:12:20 lomin kernel: Stack: 00000040 00000001 d40e687c d40e6874 00000006 d40e685c d8789f14 dcaf5580
Jun  6 10:12:20 lomin kernel:        dcaf55ac d40e6b1c d8789fbc c01154ce dcaf5580 d40e6b1c b4ec6ff0 00000001
Jun  6 10:12:20 lomin kernel:        00000001 de1a1520 b4ec6ff0 00000006 d8789fc4 d8789fc4 c03405b0 00000006
Jun  6 10:12:20 lomin kernel: Call Trace:
Jun  6 10:12:20 lomin kernel:  [dump_stack+16/32] show_stack+0x80/0x96
Jun  6 10:12:20 lomin kernel:  [<c0109c51>] show_stack+0x80/0x96
Jun  6 10:12:20 lomin kernel:  [show_registers+384/457] show_registers+0x15a/0x1d1
Jun  6 10:12:20 lomin kernel:  [<c0109de1>] show_registers+0x15a/0x1d1
Jun  6 10:12:20 lomin kernel:  [die+301/458] die+0x106/0x1c4
Jun  6 10:12:20 lomin kernel:  [<c010a001>] die+0x106/0x1c4
Jun  6 10:12:20 lomin kernel:  [do_page_fault+675/1665] do_page_fault+0x489/0x738
Jun  6 10:12:20 lomin kernel:  [<c0115740>] do_page_fault+0x489/0x738
Jun  6 10:12:20 lomin kernel:  [fixup_4gb_segment+2/12] page_fault+0x2e/0x34
Jun  6 10:12:20 lomin kernel:  [<c0109a7e>] page_fault+0x2e/0x34
Jun  6 10:12:20 lomin kernel:  [do_page_fault+49/1665] do_page_fault+0x217/0x738
Jun  6 10:12:20 lomin kernel:  [<c01154ce>] do_page_fault+0x217/0x738
Jun  6 10:12:20 lomin kernel:  [fixup_4gb_segment+2/12] page_fault+0x2e/0x34
Jun  6 10:12:20 lomin kernel:  [<c0109a7e>] page_fault+0x2e/0x34
Jun  6 10:12:20 lomin kernel: Code: 8b 47 1c c1 ea 16 83 43 14 01 8d 34 90 85 f6 0f 84 52 01 00 00 89 f2 8b 4d 10 89 f8 e8 4a d1 ff ff 85 c0 89 c2 0f 84 3c 01 00 00 <8b> 00 a8 81 75 3d 85 c0 0f 84 01 01 00 00 a8 40 0f 84 a4 00 00
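
For picking the interesting values out of an oops like the one above, a small helper along these lines may be convenient. It is only a sketch: `parse_registers` and `faulting_byte` are hypothetical names, and the sample lines are copied from the log. Joining the lines with spaces before matching lets it cope with mail-wrapped output.

```python
import re

REG_RE = re.compile(r"\b(eax|ebx|ecx|edx|esi|edi|ebp|esp):\s+([0-9a-f]{8})\b")


def parse_registers(lines):
    """Collect 'name: hexvalue' pairs such as 'eax: 15555b18' into a dict."""
    text = " ".join(line.strip() for line in lines)
    return {name: int(value, 16) for name, value in REG_RE.findall(text)}


def faulting_byte(lines):
    """Return the byte the Code: line marks with <..>, or None if absent."""
    text = " ".join(line.strip() for line in lines)
    match = re.search(r"<([0-9a-f]{2})>", text)
    return match.group(1) if match else None


sample = [
    "Jun  6 10:12:20 lomin kernel: eax: 15555b18   ebx: d8788000   ecx:",
    "00000b18   edx: 15555b18",
    "Jun  6 10:12:20 lomin kernel: esi: dcfc3b4c   edi: dcaf5580   ebp:",
    "d8789ee4   esp: d8789ebc",
    "c2 0f 84 3c 01 00 00 <8b> 00 a8 81 75 3d 85 c0 0f 84 01 01 00 00 a8 40",
]

regs = parse_registers(sample)
print(hex(regs["eax"]), faulting_byte(sample))
```

The `<8b>` marker is the byte at EIP; `8b 00` decodes as `mov (%eax),%eax`, which would be consistent with the fault happening on a load through the address held in eax.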


> 
> It's not at all clear where the bogus address comes from: the driver 
> basically just reads the address out of an skbuff, and converts it from 
> virtual to physical address. But something is obviously going wrong, 
> perhaps under memory pressure. :-(
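
The address conversion mentioned above can be sketched roughly as follows. This is a simplification, not the driver's code: it assumes the default 3G/1G split of 32-bit x86 Linux (`PAGE_OFFSET` of 0xC0000000) and 4 KiB pages, ignores highmem, and leaves out the extra pseudo-physical to machine frame lookup that a Xen guest needs. The function names are hypothetical.

```python
PAGE_OFFSET = 0xC0000000  # assumed: default 3G/1G split on 32-bit x86 Linux
PAGE_SHIFT = 12           # 4 KiB pages, as in the `shr $0xc` of the listing


def virt_to_phys(vaddr):
    """Directly mapped kernel address -> (pseudo-)physical address."""
    if vaddr < PAGE_OFFSET:
        raise ValueError(f"{vaddr:#x} is not a directly mapped kernel address")
    return vaddr - PAGE_OFFSET


def frame_number(paddr):
    """Physical address -> page frame number."""
    return paddr >> PAGE_SHIFT


paddr = virt_to_phys(0xC1234ABC)
print(hex(paddr), hex(frame_number(paddr)))
```

A check like the one in `virt_to_phys` is the kind of sanity test that would flag an address that cannot have come from the direct mapping.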

Where, within the domUs or dom0?  The latter has plenty of memory at hand,
whereas the domUs are quite short of memory.  I'll try to find out...


Regards,
-- 
Birger Tödtmann
Technik der Rechnernetze, Institut für Experimentelle Mathematik
Universität Duisburg-Essen, Campus Essen email:btoedtmann@xxxxxxxxxxxxxx
skype:birger.toedtmann pgp:0x6FB166C9



 

