
Re: [Xen-devel] [Xen-users] kernel 3.9.2 - xen 4.2.2/4.3rc1 => BUG unable to handle kernel paging request netif_poll+0x49c/0xe8



Yes, this is the OpenSUSE 12.3 kernel (both Dom0 and DomU run kernel xen-3.9.2-8.1.g04040b9.x86_64) from http://download.opensuse.org/repositories/Kernel:/HEAD/standard

 

 

First VM:

 

template:/home/local # iperf -s

------------------------------------------------------------

Server listening on TCP port 5001

TCP window size: 85.3 KByte (default)

------------------------------------------------------------

[ 4] local 10.251.2.202 port 5001 connected with 10.251.2.201 port 38196

 

 

##

#(after iperf -c 10.251.2.202 -i 2 -f m from the second VM)

##

 

[ 38.447860] BUG: unable to handle kernel paging request at ffff88007928b000

[ 38.447898] IP: [<ffffffffa001a75c>] netif_poll+0x49c/0xe80 [xennet]

[ 38.447927] PGD a83067 PUD a93067 PMD 7fc28067 PTE 801000007928b065

[ 38.447955] Oops: 0003 [#1] SMP

[ 38.447970] Modules linked in: af_packet hwmon domctl crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul joydev autofs4 scsi_dh_emc scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh xenblk cdrom xennet ata_generic ata_piix

[ 38.448091] CPU 0

[ 38.448100] Pid: 0, comm: swapper/0 Not tainted 3.9.2-4.756ee56-xen #1

[ 38.448125] RIP: e030:[<ffffffffa001a75c>] [<ffffffffa001a75c>] netif_poll+0x49c/0xe80 [xennet]

[ 38.448158] RSP: e02b:ffff88007b403d18 EFLAGS: 00010286

[ 38.448176] RAX: ffff88007da68cd0 RBX: ffff88007928aec0 RCX: ffff88007928b000
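
For reference, the "Oops: 0003" error code and the PTE value in the trace above can be decoded by hand. The following is a minimal sketch, assuming the standard x86-64 page-fault error-code bits and 4K PTE flag layout; the helper names are made up for illustration and are not part of any kernel or Xen tool:

```python
# Hypothetical helpers: decode an x86 page-fault error code and PTE flag bits.
# Bit meanings follow the standard x86-64 layout (an assumption of this sketch).

# (bit, meaning when set, meaning when clear or None)
PF_BITS = [
    (0, "protection violation", "page not present"),
    (1, "write access", "read access"),
    (2, "user mode", "kernel mode"),
    (3, "reserved bit set", None),
    (4, "instruction fetch", None),
]

PTE_FLAGS = {
    0: "present",
    1: "writable",
    2: "user",
    3: "write-through",
    4: "cache-disabled",
    5: "accessed",
    6: "dirty",
    8: "global",
    63: "no-execute",
}


def decode_error_code(code):
    """Return a list describing each relevant bit of the Oops error code."""
    out = []
    for bit, set_desc, clear_desc in PF_BITS:
        if code & (1 << bit):
            out.append(set_desc)
        elif clear_desc is not None:
            out.append(clear_desc)
    return out


def decode_pte_flags(pte):
    """Return the names of the flag bits that are set in a PTE value."""
    return [name for bit, name in sorted(PTE_FLAGS.items()) if pte & (1 << bit)]


if __name__ == "__main__":
    print(decode_error_code(0x0003))
    print(decode_pte_flags(0x801000007928B065))
```

Under these assumptions, error code 0003 decodes to a write access that hit a protection violation in kernel mode, and the PTE flags show a present page without the writable bit, which is consistent with a write to a read-only mapping.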

 

 

 

 

This trace is visible only via xl console; the DomUs had no records in their logs. You are right, maybe this is a Dom0 trace.

 

Here is xl-test1.log from Dom0:

 

libxl: debug: libxl_event.c:503:watchfd_callback: watch w=0x1944130 wpath=@releaseDomain token=3/0: event epath=@releaseDomain

libxl: debug: libxl.c:998:domain_death_xswatch_callback: [evg=0x19435e0:2] from domid=2 nentries=1 rc=1

libxl: debug: libxl.c:1009:domain_death_xswatch_callback: [evg=0x19435e0:2] got=domaininfos[0] got->domain=2

libxl: debug: libxl.c:1036:domain_death_xswatch_callback: exists shutdown_reported=0 dominf.flags=30004

libxl: debug: libxl.c:1048:domain_death_xswatch_callback: shutdown reporting

libxl: debug: libxl.c:1002:domain_death_xswatch_callback: [evg=0] all reported

libxl: debug: libxl.c:1066:domain_death_xswatch_callback: domain death search done

Domain 2 has shut down, reason code 3 0x3

Action for shutdown reason code 3 is restart

Domain 2 needs to be cleaned up: destroying the domain

libxl: debug: libxl.c:1250:libxl_domain_destroy: ao 0x19438d0: create: how=(nil) callback=(nil) poller=0x19435a0

libxl: debug: libxl_dm.c:1266:libxl__destroy_device_model: Device Model signaled

 

 

--

Best regards,

Eugene Istomin


On Friday, May 17, 2013 09:59:23 AM Wei Liu wrote:

> Moving discussion to Xen-devel

>

> On Thu, May 16, 2013 at 10:29:56PM +0300, Eugene Istomin wrote:

> > Hello,

> >

> > I tried the 3.9.2 kernel with Xen 4.2.2 and 4.3rc1, and both variants

> > hit this error under network-intensive load (such as iperf, or 100

> > parallel nginx requests for 1M files):

> It would be more helpful if you can provide info on your configurations

> (Dom0 and DomU), your workload, how to reproduce the bug.

>

> I run iperf and NFS to test Xen networking, but have never seen a crash

> like this myself.

>

> > BUG: unable to handle kernel paging request at ffff8800795a3000

> > [ 60.246945] IP: [<ffffffffa001a75c>] netif_poll+0x49c/0xe80 [xennet]

> > [ 60.246975] PGD a8a067 PUD a9a067 PMD 7fc27067 PTE

> > 80100000795a3065

> > [ 60.247004] Oops: 0003 [#1] SMP

> > [ 60.247020] Modules linked in: af_packet hwmon domctl crc32_pclmul

> > crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw

> > aes_x86_64 joydev xts gf128mul autofs4 scsi_dh_emc scsi_dh_alua

> > scsi_dh_rdac scsi_dh_hp_sw scsi_dh xenblk cdrom xennet ata_generic

> > ata_piix

> > [ 60.247144] CPU 0

> > [ 60.247154] Pid: 0, comm: swapper/0 Not tainted 3.9.2-1.g04040b9-xen

> > #1

> > [ 60.247179] RIP: e030:[<ffffffffa001a75c>] [<ffffffffa001a75c>]

> > netif_poll+0x49c/0xe80 [xennet]

> > ...

>

> Could you provide the full stack trace? AFAICT there is no netif_poll in Xen

> netfront/back.

>

> Presumably this is Dom0 log? (from the domctl module)

>

> > We have a couple of production hypervisors on 3.4 kernels with a

> > high-throughput internal network (VM-to-VM within one Dom0), and iperf

> > works well on them:

> > [ 3] 0.0- 2.0 sec 3357 MBytes 14080 Mbits/sec

> > [ 3] 2.0- 4.0 sec 2880 MBytes 12077 Mbits/sec

> > [ 3] 4.0- 6.0 sec 2909 MBytes 12202 Mbits/sec

> > [ 3] 6.0- 8.0 sec 2552 MBytes 10702 Mbits/sec

> > [ 3] 8.0-10.0 sec 3616 MBytes 15166 Mbits/sec

> > [ 3] 10.0-12.0 sec 3415 MBytes 14324 Mbits/sec

> >

> >

> > This seems like a kernel bug. Is it related to one of these fixes in

> > linux-next, or do I need to file a new bug report?

> >

> > 1) https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=1aaf6d3d3d1e95f4be07e32dd84aa1c93855fbbd

> > 2) https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=9ecd1a75d977e2e8c48139c7d3efed183f898d94

> > 3) https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=2810e5b9a7731ca5fce22bfbe12c96e16ac44b6f

> > 4) https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=03393fd5cc2b6cdeec32b704ecba64dbb0feae3c

> > 5) https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=59ccb4ebbc35e36a3c143f2d1355deb75c2e628f

> At first glance, I don't think these patches will fix your problem.

>

>

> Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

