>> Yes, this is the openSUSE 12.3 kernel (both Dom0 and DomU run
>> kernel-xen-3.9.2-8.1.g04040b9.x86_64) from
>> http://download.opensuse.org/repositories/Kernel:/HEAD/standard
Sorry, I mistyped the version.
I tested both the kernel-xen-3.9.2-8.1.g04040b9.x86_64 and kernel-xen-3.9.2-4.756ee56.x86_64 kernels, and the results are the same.
--
Best regards,
Eugene Istomin
On Friday, May 17, 2013 12:37:26 PM Eugene Istomin wrote:
Yes, this is the openSUSE 12.3 kernel (both Dom0 and DomU run kernel-xen-3.9.2-8.1.g04040b9.x86_64) from http://download.opensuse.org/repositories/Kernel:/HEAD/standard
First VM:
template:/home/local # iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 10.251.2.202 port 5001 connected with 10.251.2.201 port 38196
##
#(after iperf -c 10.251.2.202 -i 2 -f m from the second VM)
##
[ 38.447860] BUG: unable to handle kernel paging request at ffff88007928b000
[ 38.447898] IP: [<ffffffffa001a75c>] netif_poll+0x49c/0xe80 [xennet]
[ 38.447927] PGD a83067 PUD a93067 PMD 7fc28067 PTE 801000007928b065
[ 38.447955] Oops: 0003 [#1] SMP
[ 38.447970] Modules linked in: af_packet hwmon domctl crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 xts gf128mul joydev autofs4 scsi_dh_emc scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh xenblk cdrom xennet ata_generic ata_piix
[ 38.448091] CPU 0
[ 38.448100] Pid: 0, comm: swapper/0 Not tainted 3.9.2-4.756ee56-xen #1
[ 38.448125] RIP: e030:[<ffffffffa001a75c>] [<ffffffffa001a75c>] netif_poll+0x49c/0xe80 [xennet]
[ 38.448158] RSP: e02b:ffff88007b403d18 EFLAGS: 00010286
[ 38.448176] RAX: ffff88007da68cd0 RBX: ffff88007928aec0 RCX: ffff88007928b000
This trace is visible only through xl console; the DomUs had no records of it in their logs. You are right, maybe this is a Dom0 trace.
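
For reading the trace above, the "Oops: 0003" code and the flag bits of the PTE value can be decoded with a short script. This is a minimal sketch that assumes only the architectural x86-64 bit layout for page-fault error codes and page-table entries; the function names are made up for illustration, and software-defined PTE bits (such as bit 52 in this value) are deliberately ignored.

```python
# Architectural x86 page-fault error-code bits (low five bits only).
# Each entry: (mask, meaning when set, meaning when clear or None).
PF_BITS = [
    (0x01, "protection violation (page was present)", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in a paging entry", None),
    (0x10, "instruction fetch", None),
]

# Architectural x86-64 PTE flag bits; other bits are not interpreted here.
PTE_FLAGS = {
    0: "present",
    1: "writable",
    2: "user",
    5: "accessed",
    6: "dirty",
    63: "no-execute",
}


def decode_oops_code(code):
    """Return a description for each relevant bit of a page-fault error code."""
    meanings = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            meanings.append(if_set)
        elif if_clear is not None:
            meanings.append(if_clear)
    return meanings


def decode_pte_flags(pte):
    """Return the names of the architectural flag bits set in a PTE value."""
    return [name for bit, name in PTE_FLAGS.items() if (pte >> bit) & 1]


if __name__ == "__main__":
    print(decode_oops_code(0x0003))
    print(decode_pte_flags(0x801000007928B065))
```

With the values from the trace, the error code decodes as a kernel-mode write to a present page, and the PTE does not have the writable bit set. That is consistent with a write to a page mapped read-only, but it does not by itself say why the page was read-only.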
Here is xl-test1.log from Dom0:
libxl: debug: libxl_event.c:503:watchfd_callback: watch w=0x1944130 wpath=@releaseDomain token=3/0: event epath=@releaseDomain
libxl: debug: libxl.c:998:domain_death_xswatch_callback: [evg=0x19435e0:2] from domid=2 nentries=1 rc=1
libxl: debug: libxl.c:1009:domain_death_xswatch_callback: [evg=0x19435e0:2] got=domaininfos[0] got->domain=2
libxl: debug: libxl.c:1036:domain_death_xswatch_callback: exists shutdown_reported=0 dominf.flags=30004
libxl: debug: libxl.c:1048:domain_death_xswatch_callback: shutdown reporting
libxl: debug: libxl.c:1002:domain_death_xswatch_callback: [evg=0] all reported
libxl: debug: libxl.c:1066:domain_death_xswatch_callback: domain death search done
Domain 2 has shut down, reason code 3 0x3
Action for shutdown reason code 3 is restart
Domain 2 needs to be cleaned up: destroying the domain
libxl: debug: libxl.c:1250:libxl_domain_destroy: ao 0x19438d0: create: how=(nil) callback=(nil) poller=0x19435a0
libxl: debug: libxl_dm.c:1266:libxl__destroy_device_model: Device Model signaled
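
The "reason code 3" and "dominf.flags=30004" values in the log above can be cross-checked against each other. The sketch below assumes that libxl prints the flags in hexadecimal and uses the shutdown-reason numbering and flag layout from the Xen public headers (shutdown bit at 1<<2, reason code in bits 16 to 23); the Python names are illustrative only.

```python
# Shutdown reason codes as numbered in the Xen public headers (sched.h).
SHUTDOWN_REASONS = {
    0: "poweroff",
    1: "reboot",
    2: "suspend",
    3: "crash",
    4: "watchdog",
}

# Assumed layout of the domain info flags word (Xen domctl.h).
DOMINF_SHUTDOWN = 1 << 2       # guest OS has shut down
DOMINF_SHUTDOWN_SHIFT = 16     # position of the guest-supplied reason code
DOMINF_SHUTDOWN_MASK = 0xFF


def decode_dominf_flags(flags):
    """Return (is_shutdown, reason_name) for a domain info flags word."""
    is_shutdown = bool(flags & DOMINF_SHUTDOWN)
    code = (flags >> DOMINF_SHUTDOWN_SHIFT) & DOMINF_SHUTDOWN_MASK
    return is_shutdown, SHUTDOWN_REASONS.get(code, "unknown")


if __name__ == "__main__":
    # Value taken from the log line "dominf.flags=30004", read as hex.
    print(decode_dominf_flags(0x30004))
```

Under these assumptions both values say the same thing: the guest reported a crash, and xl applied the configured action for a crash, which here is restart.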
--
Best regards,
Eugene Istomin
On Friday, May 17, 2013 09:59:23 AM Wei Liu wrote:
> Moving discussion to Xen-devel
>
> On Thu, May 16, 2013 at 10:29:56PM +0300, Eugene Istomin wrote:
> > Hello,
> >
> > I tried to use the 3.9.2 kernel with Xen 4.2.2 and 4.3rc1, and both
> > variants lead to this error under network-intensive load (such as
> > iperf, 100 parallel nginx requests for 1M files, and so on):
> It would be more helpful if you could provide info on your configurations
> (Dom0 and DomU), your workload, and how to reproduce the bug.
>
> I run iperf and NFS to test Xen networking, but I have never seen a crash
> like this myself.
>
> > BUG: unable to handle kernel paging request at ffff8800795a3000
> > [ 60.246945] IP: [<ffffffffa001a75c>] netif_poll+0x49c/0xe80 [xennet]
> > [ 60.246975] PGD a8a067 PUD a9a067 PMD 7fc27067 PTE 80100000795a3065
> > [ 60.247004] Oops: 0003 [#1] SMP
> > [ 60.247020] Modules linked in: af_packet hwmon domctl crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 joydev xts gf128mul autofs4 scsi_dh_emc scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh xenblk cdrom xennet ata_generic ata_piix
> > [ 60.247144] CPU 0
> > [ 60.247154] Pid: 0, comm: swapper/0 Not tainted 3.9.2-1.g04040b9-xen #1
> > [ 60.247179] RIP: e030:[<ffffffffa001a75c>] [<ffffffffa001a75c>] netif_poll+0x49c/0xe80 [xennet]
> > ...
>
> Could you provide the full stack trace? AFAICT there is no netif_poll in
> Xen netfront/back.
>
> Presumably this is the Dom0 log (judging by the domctl module)?
>
> > We have a couple of production hypervisors on 3.4 kernels with a
> > high-throughput internal network (VM-to-VM within one Dom0), and iperf
> > works well on them:
> > [ 3] 0.0- 2.0 sec 3357 MBytes 14080 Mbits/sec
> > [ 3] 2.0- 4.0 sec 2880 MBytes 12077 Mbits/sec
> > [ 3] 4.0- 6.0 sec 2909 MBytes 12202 Mbits/sec
> > [ 3] 6.0- 8.0 sec 2552 MBytes 10702 Mbits/sec
> > [ 3] 8.0-10.0 sec 3616 MBytes 15166 Mbits/sec
> > [ 3] 10.0-12.0 sec 3415 MBytes 14324 Mbits/sec
> >
> >
> > This seems like a kernel bug. Is it related to one of these fixes in
> > linux-next, or do I need to create a new bug report?
> >
> > 1) https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=1aaf6d3d3d1e95f4be07e32dd84aa1c93855fbbd
> > 2) https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=9ecd1a75d977e2e8c48139c7d3efed183f898d94
> > 3) https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=2810e5b9a7731ca5fce22bfbe12c96e16ac44b6f
> > 4) https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=03393fd5cc2b6cdeec32b704ecba64dbb0feae3c
> > 5) https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=59ccb4ebbc35e36a3c143f2d1355deb75c2e628f
> At first glance, I don't think these patches will fix your problem.
>
>
> Wei.