Re: [Xen-devel] kernel BUG at block/bio.c:1786 -- (xen_blkif_schedule on the stack)
I just tried to provoke the bug, after applying your patch and re-enabling
tmem, but it seems more variables in the equation have to line up to make a
crash happen. Before this week the VM in question would reliably crash/hang
on boot; it had done so for the past month and through several reboots of
the dom0. I have slightly reduced the memory allotment of several VMs,
which might be keeping the bug from triggering. I will not be actively
trying to provoke this any more, but I'll keep you posted if it resurfaces.
In the meantime I'll try to learn more about how my system uses memory
(looking into "grants").

On 09 Feb 2017 18:30, Roger Pau Monné wrote:
> On Mon, Feb 06, 2017 at 12:31:20AM +0100, Håkon Alstadheim wrote:
>> I get the BUG below in dom0 when trying to start a Windows 10 domU (HVM,
>> with some PV drivers installed). Below is "xl info", then comes dmesg
>> output, and finally the domU config attached at the end.
>>
>> This domain is started very rarely, so it may have been broken for some
>> time. All my other domains are Linux. This message is just a data point
>> for whoever is interested, with possibly more data if anybody wants to
>> ask me anything. NOT expecting quick resolution of this :-/ .
>>
>> The domain boots part of the way, the screen resolution gets changed and
>> then it keeps spinning for ~5 seconds before stopping.
> [...]
>> [339809.663061] br0: port 12(vif7.0) entered blocking state
>> [339809.663063] br0: port 12(vif7.0) entered disabled state
>> [339809.663123] device vif7.0 entered promiscuous mode
>> [339809.664885] IPv6: ADDRCONF(NETDEV_UP): vif7.0: link is not ready
>> [339809.742522] br0: port 13(vif7.0-emu) entered blocking state
>> [339809.742523] br0: port 13(vif7.0-emu) entered disabled state
>> [339809.742573] device vif7.0-emu entered promiscuous mode
>> [339809.744386] br0: port 13(vif7.0-emu) entered blocking state
>> [339809.744388] br0: port 13(vif7.0-emu) entered forwarding state
>> [339864.059095] xen-blkback: backend/vbd/7/768: prepare for reconnect
>> [339864.138002] xen-blkback: backend/vbd/7/768: using 1 queues, protocol
>> 1 (x86_64-abi)
>> [339864.241039] xen-blkback: backend/vbd/7/832: prepare for reconnect
>> [339864.337997] xen-blkback: backend/vbd/7/832: using 1 queues, protocol
>> 1 (x86_64-abi)
>> [339875.245306] vif vif-7-0 vif7.0: Guest Rx ready
>> [339875.245345] IPv6: ADDRCONF(NETDEV_CHANGE): vif7.0: link becomes ready
>> [339875.245391] br0: port 12(vif7.0) entered blocking state
>> [339875.245395] br0: port 12(vif7.0) entered forwarding state
>> [339894.122151] ------------[ cut here ]------------
>> [339894.122169] kernel BUG at block/bio.c:1786!
>> [339894.122173] invalid opcode: 0000 [#1] SMP >> [339894.122176] Modules linked in: xt_physdev iptable_filter ip_tables >> x_tables nfsd auth_rpcgss oid_registry nfsv4 dns_resolver nfsv3 nfs_acl >> binfmt_misc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp >> crc32c_intel pcspkr serio_raw i2c_i801 i2c_smbus iTCO_wdt >> iTCO_vendor_support amdgpu drm_kms_helper syscopyarea bcache input_leds >> sysfillrect sysimgblt fb_sys_fops ttm drm uas shpchp ipmi_ssif rtc_cmos >> acpi_power_meter wmi tun snd_hda_codec_realtek snd_hda_codec_generic >> snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd >> usbip_host usbip_core pktcdvd tmem lpc_ich xen_wdt nct6775 hwmon_vid >> dm_zero dm_thin_pool dm_persistent_data dm_bio_prison dm_service_time >> dm_round_robin dm_queue_length dm_multipath dm_log_userspace cn >> virtio_pci virtio_scsi virtio_blk virtio_console virtio_balloon >> [339894.122233] xts gf128mul aes_x86_64 cbc sha512_generic >> sha256_generic sha1_generic libiscsi scsi_transport_iscsi virtio_net >> virtio_ring virtio tg3 libphy e1000 fuse overlay nfs lockd grace sunrpc >> jfs multipath linear raid10 raid1 raid0 dm_raid raid456 >> async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq >> dm_snapshot dm_bufio dm_crypt dm_mirror dm_region_hash dm_log dm_mod >> hid_sunplus hid_sony hid_samsung hid_pl hid_petalynx hid_monterey >> hid_microsoft hid_logitech ff_memless hid_gyration hid_ezkey hid_cypress >> hid_chicony hid_cherry hid_a4tech sl811_hcd xhci_plat_hcd ohci_pci >> ohci_hcd uhci_hcd aic94xx lpfc qla2xxx aacraid sx8 DAC960 hpsa cciss >> 3w_9xxx 3w_xxxx mptsas mptfc scsi_transport_fc mptspi mptscsih mptbase >> atp870u dc395x qla1280 imm parport dmx3191d sym53c8xx gdth initio BusLogic >> [339894.122325] arcmsr aic7xxx aic79xx sg pdc_adma sata_inic162x >> sata_mv sata_qstor sata_vsc sata_uli sata_sis sata_sx4 sata_nv sata_via >> sata_svw sata_sil24 sata_sil sata_promise pata_sis usbhid led_class igb >> ptp dca i2c_algo_bit ehci_pci ehci_hcd xhci_pci megaraid_sas xhci_hcd >> [339894.122350] CPU: 3 PID: 23514 Comm: 7.hda-0 Tainted: G W >> 4.9.8-gentoo #1 >> [339894.122353] Hardware name: ASUSTeK COMPUTER INC. Z10PE-D8 >> WS/Z10PE-D8 WS, BIOS 3304 06/22/2016 >> [339894.122358] task: ffff880244b55b00 task.stack: ffffc90042fcc000 >> [339894.122361] RIP: e030:[<ffffffff813c6af7>] [<ffffffff813c6af7>] >> bio_split+0x9/0x89 >> [339894.122370] RSP: e02b:ffffc90042fcfb18 EFLAGS: 00010246 >> [339894.122373] RAX: 00000000000000a8 RBX: ffff8802433ee900 RCX: >> ffff88023f537080 >> [339894.122377] RDX: 0000000002400000 RSI: 0000000000000000 RDI: >> ffff8801fc8b7890 >> [339894.122380] RBP: ffffc90042fcfba8 R08: 0000000000000000 R09: >> 00000000000052da >> [339894.122383] R10: 0000000000000002 R11: 0005803fffffffff R12: >> ffff8801fc8b7890 >> [339894.122387] R13: 00000000000000a8 R14: ffffc90042fcfbb8 R15: >> 0000000000000000 >> [339894.122394] FS: 0000000000000000(0000) GS:ffff8802498c0000(0000) >> knlGS:ffff8802498c0000 >> [339894.122398] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [339894.122401] CR2: 00007f99b78e3349 CR3: 0000000216d43000 CR4: >> 0000000000042660 >> [339894.122405] Stack: >> [339894.122407] ffffffff813d1bce 0000000000000002 ffffc90042fcfb50 >> ffff88023f537080 >> [339894.122413] 0000000000000002 0000000100000000 0000000000000000 >> 0000000100000000 >> [339894.122419] 0000000000000000 000000000d2ee022 0000000200006fec >> 0000000000000000 >> [339894.122424] Call Trace: >> [339894.122429] [<ffffffff813d1bce>] ? 
blk_queue_split+0x448/0x48b >> [339894.122435] [<ffffffff813cd7f3>] blk_queue_bio+0x44/0x289 >> [339894.122439] [<ffffffff813cc226>] generic_make_request+0xbd/0x160 >> [339894.122443] [<ffffffff813cc3c9>] submit_bio+0x100/0x11d >> [339894.122446] [<ffffffff813d2b8a>] ? next_bio+0x1d/0x40 >> [339894.122450] [<ffffffff813c4d10>] submit_bio_wait+0x4e/0x62 >> [339894.122454] [<ffffffff813d2df3>] blkdev_issue_discard+0x71/0xa9 >> [339894.122459] [<ffffffff81534fd4>] __do_block_io_op+0x4f0/0x579 >> [339894.122463] [<ffffffff81534fd4>] ? __do_block_io_op+0x4f0/0x579 >> [339894.122469] [<ffffffff81770005>] ? sha_transform+0xf47/0x1069 >> [339894.122474] [<ffffffff81535544>] xen_blkif_schedule+0x318/0x63c >> [339894.122478] [<ffffffff81777498>] ? __schedule+0x32e/0x4e8 >> [339894.122484] [<ffffffff81088f9b>] ? wake_up_atomic_t+0x2c/0x2c >> [339894.122488] [<ffffffff8153522c>] ? xen_blkif_be_int+0x2c/0x2c >> [339894.122492] [<ffffffff810742aa>] kthread+0xa6/0xae >> [339894.122496] [<ffffffff81074204>] ? init_completion+0x24/0x24 >> [339894.122501] [<ffffffff8177a335>] ret_from_fork+0x25/0x30 > > Are you using some kind of software RAID or similar backend for the disk > images? It looks like someone (not blkback) is trying to split a discard bio > (or maybe even a discard bio with 0 sectors), and that's causing a BUG to > trigger. TBH, I would expect blkdev_issue_discard to either ignore or reject > such requests, but it doesn't seem to do so (or at least I cannot find it). > > Could you try the below patch and report back what output do you get? > > Thanks, Roger. > > ---8<--- > diff --git a/drivers/block/xen-blkback/blkback.c > b/drivers/block/xen-blkback/blkback.c > index 726c32e..1964e9c 100644 > --- a/drivers/block/xen-blkback/blkback.c > +++ b/drivers/block/xen-blkback/blkback.c > @@ -1027,6 +1027,8 @@ static int dispatch_discard_io(struct xen_blkif_ring > *ring, > (req->u.discard.flag & BLKIF_DISCARD_SECURE)) ? > BLKDEV_DISCARD_SECURE : 0; > > + pr_info("Sending discard, sector %llu nr %llu\n", > + req->u.discard.sector_number, req->u.discard.nr_sectors); > err = blkdev_issue_discard(bdev, req->u.discard.sector_number, > req->u.discard.nr_sectors, > GFP_KERNEL, secure); > > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel
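
A note for readers of this archive page: in a 4.9-era tree, RIP at
bio_split+0x9 lands in the sanity checks at the very top of bio_split().
The sketch below is paraphrased from memory of the 4.9 block/bio.c; the
exact line number and which check corresponds to bio.c:1786 are
assumptions, so verify against your own tree:

    struct bio *bio_split(struct bio *bio, int sectors,
                          gfp_t gfp, struct bio_set *bs)
    {
            struct bio *split;

            /* One of these is likely the BUG at block/bio.c:1786. */
            BUG_ON(sectors <= 0);
            BUG_ON(sectors >= bio_sectors(bio));

            /* ... rest of the split logic elided ... */
    }

In the register dump above, RSI (the second argument, sectors, in the
x86-64 calling convention) is 0, which is consistent with the
BUG_ON(sectors <= 0) check firing: somewhere in blk_queue_split()'s
discard handling the block layer apparently computed a zero-sector split
of the discard bio, exactly the scenario Roger describes.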
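Independent of where the zero-sector split originates, blkback could
defend itself by validating the guest's discard request before calling
into the block layer. Below is a rough, untested sketch of such a guard,
placed just before the blkdev_issue_discard() call in
dispatch_discard_io(); the err variable and fail_response label follow the
shape of the 4.9 function quoted in the patch above, but treat those
details as assumptions:

    /* Hypothetical guard: complete empty discards as a no-op instead of
     * handing them to the block layer, and log them so a misbehaving
     * frontend can be identified.
     */
    if (req->u.discard.nr_sectors == 0) {
            pr_warn("dropping empty discard request from frontend\n");
            err = 0;                /* reported as BLKIF_RSP_OKAY */
            goto fail_response;
    }

This is also the kind of decision Roger's pr_info() patch is meant to
inform: if the logged nr value is 0 (or implausibly large), the problem is
in the request blkback forwards; if the values look sane, the fault lies
further down in the block layer's discard splitting.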