Re: [Xen-devel] [BUG] kernel BUG at drivers/block/xen-blkfront.c:1711
On 08/10/2016 10:54 PM, Evgenii Shatokhin wrote:
> On 10.08.2016 15:49, Bob Liu wrote:
>>
>> On 08/10/2016 08:33 PM, Evgenii Shatokhin wrote:
>>> On 14.07.2016 15:04, Bob Liu wrote:
>>>>
>>>> On 07/14/2016 07:49 PM, Evgenii Shatokhin wrote:
>>>>> On 11.07.2016 15:04, Bob Liu wrote:
>>>>>>
>>>>>> On 07/11/2016 04:50 PM, Evgenii Shatokhin wrote:
>>>>>>> On 06.06.2016 11:42, Dario Faggioli wrote:
>>>>>>>> Just Cc-ing some Linux, block, and Xen on CentOS people...
>>>>>>>>
>>>>>>>
>>>>>>> Ping.
>>>>>>>
>>>>>>> Any suggestions how to debug this or what might cause the problem?
>>>>>>>
>>>>>>> Obviously, we cannot control Xen on Amazon's servers. But perhaps
>>>>>>> there is something we can do on the kernel side, isn't there?
>>>>>>>
>>>>>>>> On Mon, 2016-06-06 at 11:24 +0300, Evgenii Shatokhin wrote:
>>>>>>>>> (Resending this bug report because the message I sent last week did
>>>>>>>>> not make it to the mailing list somehow.)
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> One of our users gets kernel panics from time to time when he tries
>>>>>>>>> to use his Amazon EC2 instance with CentOS7 x64 in it [1]. Kernel panic
>>>>>>>>> happens within minutes from the moment the instance starts. The
>>>>>>>>> problem does not show up every time, however.
>>>>>>>>>
>>>>>>>>> The user first observed the problem with a custom kernel, but it was
>>>>>>>>> found later that the stock kernel 3.10.0-327.18.2.el7.x86_64 from
>>>>>>>>> CentOS7 was affected as well.
>>>>>>
>>>>>> Please try this patch:
>>>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7b0767502b5db11cb1f0daef2d01f6d71b1192dc
>>>>>>
>>>>>> Regards,
>>>>>> Bob
>>>>>>
>>>>>
>>>>> Unfortunately, it did not help. The same BUG_ON() in
>>>>> blkfront_setup_indirect() still triggers in our kernel based on RHEL's
>>>>> 3.10.0-327.18.2, where I added the patch.
>>>>>
>>>>> As far as I can see, the patch makes sure the indirect pages are added to
>>>>> the list only if (!info->feature_persistent) holds. I suppose it holds in
>>>>> our case and the pages are added to the list because the triggered
>>>>> BUG_ON() is here:
>>>>>
>>>>> if (!info->feature_persistent && info->max_indirect_segments) {
>>>>>     <...>
>>>>>     BUG_ON(!list_empty(&info->indirect_pages));
>>>>>     <...>
>>>>> }
>>>>>
>>>>
>>>> That's odd.
>>>> Could you please try to reproduce this issue with a recent upstream kernel?
>>>>
>>>> Thanks,
>>>> Bob
>>>
>>> No luck with the upstream kernel 4.7.0 so far due to unrelated issues (bad
>>> initrd, I suppose, so the system does not even boot).
>>>
>>> However, the problem reproduced with the stable upstream kernel 3.14.74.
>>> After the system booted the second time with this kernel, that BUG_ON
>>> triggered:
>>> kernel BUG at drivers/block/xen-blkfront.c:1701
>>>
>>
>> Could you please provide more detail on how to reproduce this bug? I'd like
>> to have a test.
>>
>> Thanks!
>> Bob
>
> As the user says, he uses an Amazon EC2 instance. Namely: HVM CentOS7 AMI on
> a c3.large instance with EBS magnetic storage.
>

Oh, then it would be difficult to debug this issue.
xen-blkfront communicates with xen-blkback (in dom0 or a driver domain), but
that part is a black box when running on Amazon EC2.
We can't see the source code of the backend side!

Can this bug be reproduced in your own environment (Xen + dom0)?

> At least 2 LVM partitions are needed:
> * /, 20-30 GB should be enough, ext4
> * /vz, 5-10 GB should be enough, ext4
>
> Kernel 3.14.74 I was talking about:
> https://www.dropbox.com/s/bhus3mubza87z86/kernel-3.14.74-1.test.x86_64.rpm?dl=1
>
> Not sure if it is relevant, but the user may have installed additional
> packages from
> https://download.openvz.org/virtuozzo/releases/7.0-rtm/x86_64/os/ repository.
> Namely: vzctl, vzmigrate, vzprocps, vztt-lib, vzctcalc, ploop, prlctl,
> centos-7-x86_64-ez.
>
> After the kernel and the other mentioned packages have been installed,
> the user rebooted the instance to run that kernel 3.14.74.
>
> Then - start the instance, wait 5 minutes, stop the instance, repeat. 2-20
> such iterations were usually enough to reproduce the problem. Can be
> automated with the help of Amazon's API.
>
> BTW, before the BUG_ON triggered this time, there was the following in dmesg.
> Not sure if it is related but still:
>

Attaching the full dmesg would be better.

Regards,
Bob

> ----------------------
> [ 2.835034] scsi0 : ata_piix
> [ 2.840317] scsi1 : ata_piix
> [ 2.842267] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc100 irq 14
> [ 2.845861] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc108 irq 15
> [ 2.853840] AVX version of gcm_enc/dec engaged.
> [ 2.859963] xen_netfront: Initialising Xen virtual ethernet driver
> [ 2.867156] alg: No test for __gcm-aes-aesni (__driver-gcm-aes-aesni)
> [ 2.885861] blkfront: xvda: barrier or flush: disabled; persistent grants: disabled; indirect descriptors: enabled;
> [ 2.889046] alg: No test for crc32 (crc32-pclmul)
> [ 2.899290] xvda: xvda1
> [ 2.997751] blkfront: xvdc: flush diskcache: enabled; persistent grants: disabled; indirect descriptors: enabled;
> [ 3.007401] xvdc: unknown partition table
> [ 3.010465] Setting capacity to 31992832
> [ 3.012922] xvdc: detected capacity change from 0 to 16380329984
> [ 3.017408] blkfront: xvdd: flush diskcache: enabled; persistent grants: disabled; indirect descriptors: enabled;
> [ 3.023861] xvdd: unknown partition table
> [ 3.026481] Setting capacity to 31992832
> [ 3.029051] xvdd: detected capacity change from 0 to 16380329984
> [ 3.033320] blkfront: xvdf: barrier or flush: disabled; persistent grants: disabled; indirect descriptors: enabled;
> [ 3.040712] random: nonblocking pool is initialized
> [ 3.057432] xvdf: unknown partition table
> [ 3.060807] Setting capacity to 41943040
> [ 3.063194] xvdf: detected capacity change from 0 to 21474836480
> [ 3.067684] blkfront: xvdb: barrier or flush: disabled; persistent grants: disabled; indirect descriptors: enabled;
> [ 3.076835] xvdb: unknown partition table
> [ 3.079692] Setting capacity to 16777216
> [ 3.082112] xvdb: detected capacity change from 0 to 8589934592
> [ 3.086853] vbd vbd-51712: 16 xlvbd_add at /local/domain/0/backend/vbd/9543/51712
> ----------------------
>
>>
>>>>
>>>>> So the problem is still out there somewhere, it seems.
>>>>>
>>>>> Regards,
>>>>> Evgenii
>>>>>
>>>>>>>>>
>>>>>>>>> The part of the system log he was able to retrieve is attached. Here
>>>>>>>>> is the bug info, for convenience:
>>>>>>>>>
>>>>>>>>> ------------------------------------
>>>>>>>>> [ 2.246912] kernel BUG at drivers/block/xen-blkfront.c:1711!
>>>>>>>>> [ 2.246912] invalid opcode: 0000 [#1] SMP
>>>>>>>>> [ 2.246912] Modules linked in: ata_generic pata_acpi crct10dif_pclmul crct10dif_common crc32_pclmul crc32c_intel ghash_clmulni_intel xen_netfront xen_blkfront(+) aesni_intel lrw ata_piix gf128mul glue_helper ablk_helper cryptd libata serio_raw floppy sunrpc dm_mirror dm_region_hash dm_log dm_mod scsi_transport_iscsi
>>>>>>>>> [ 2.246912] CPU: 1 PID: 50 Comm: xenwatch Not tainted 3.10.0-327.18.2.el7.x86_64 #1
>>>>>>>>> [ 2.246912] Hardware name: Xen HVM domU, BIOS 4.2.amazon 12/07/2015
>>>>>>>>> [ 2.246912] task: ffff8800e9fcb980 ti: ffff8800e98bc000 task.ti: ffff8800e98bc000
>>>>>>>>> [ 2.246912] RIP: 0010:[<ffffffffa015584f>] [<ffffffffa015584f>] blkfront_setup_indirect+0x41f/0x430 [xen_blkfront]
>>>>>>>>> [ 2.246912] RSP: 0018:ffff8800e98bfcd0 EFLAGS: 00010283
>>>>>>>>> [ 2.246912] RAX: ffff8800353e15c0 RBX: ffff8800e98c52c8 RCX: 0000000000000020
>>>>>>>>> [ 2.246912] RDX: ffff8800353e15b0 RSI: ffff8800e98c52b8 RDI: ffff8800353e15d0
>>>>>>>>> [ 2.246912] RBP: ffff8800e98bfd20 R08: ffff8800353e15b0 R09: ffff8800eb403c00
>>>>>>>>> [ 2.246912] R10: ffffffffa0155532 R11: ffffffffffffffe8 R12: ffff8800e98c4000
>>>>>>>>> [ 2.246912] R13: ffff8800e98c52b8 R14: 0000000000000020 R15: ffff8800353e15c0
>>>>>>>>> [ 2.246912] FS: 0000000000000000(0000) GS:ffff8800efc20000(0000) knlGS:0000000000000000
>>>>>>>>> [ 2.246912] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>>> [ 2.246912] CR2: 00007f1b615ef000 CR3: 00000000e2b44000 CR4: 00000000001406e0
>>>>>>>>> [ 2.246912] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>>> [ 2.246912] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>>>>>>> [ 2.246912] Stack:
>>>>>>>>> [ 2.246912] 0000000000000020 0000000000000001 00000020a0157217 00000100e98bfdbc
>>>>>>>>> [ 2.246912] 0000000027efa3ef ffff8800e98bfdbc ffff8800e98ce000 ffff8800e98c4000
>>>>>>>>> [ 2.246912] ffff8800e98ce040 0000000000000001 ffff8800e98bfe08 ffffffffa0155d4c
>>>>>>>>> [ 2.246912] Call Trace:
>>>>>>>>> [ 2.246912] [<ffffffffa0155d4c>] blkback_changed+0x4ec/0xfc8 [xen_blkfront]
>>>>>>>>> [ 2.246912] [<ffffffff813a6fd0>] ? xenbus_gather+0x170/0x190
>>>>>>>>> [ 2.246912] [<ffffffff816322f5>] ? __slab_free+0x10e/0x277
>>>>>>>>> [ 2.246912] [<ffffffff813a805d>] xenbus_otherend_changed+0xad/0x110
>>>>>>>>> [ 2.246912] [<ffffffff813a7257>] ? xenwatch_thread+0x77/0x180
>>>>>>>>> [ 2.246912] [<ffffffff813a9ba3>] backend_changed+0x13/0x20
>>>>>>>>> [ 2.246912] [<ffffffff813a7246>] xenwatch_thread+0x66/0x180
>>>>>>>>> [ 2.246912] [<ffffffff810a6ae0>] ? wake_up_atomic_t+0x30/0x30
>>>>>>>>> [ 2.246912] [<ffffffff813a71e0>] ? unregister_xenbus_watch+0x1f0/0x1f0
>>>>>>>>> [ 2.246912] [<ffffffff810a5aef>] kthread+0xcf/0xe0
>>>>>>>>> [ 2.246912] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
>>>>>>>>> [ 2.246912] [<ffffffff81646118>] ret_from_fork+0x58/0x90
>>>>>>>>> [ 2.246912] [<ffffffff810a5a20>] ? kthread_create_on_node+0x140/0x140
>>>>>>>>> [ 2.246912] Code: e1 48 85 c0 75 ce 49 8d 84 24 40 01 00 00 48 89 45 b8 e9 91 fd ff ff 4c 89 ff e8 8d ae 06 e1 e9 f2 fc ff ff 31 c0 e9 2e fe ff ff <0f> 0b e8 9a 57 f2 e0 0f 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00
>>>>>>>>> [ 2.246912] RIP [<ffffffffa015584f>] blkfront_setup_indirect+0x41f/0x430 [xen_blkfront]
>>>>>>>>> [ 2.246912] RSP <ffff8800e98bfcd0>
>>>>>>>>> [ 2.491574] ---[ end trace 8a9b992812627c71 ]---
>>>>>>>>> [ 2.495618] Kernel panic - not syncing: Fatal exception
>>>>>>>>> ------------------------------------

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel