[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] RE: [Xen-devel] BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865]
This is still happening with 2.6.32.18-ge6b9b2c but I'm now not so sure if it's a problem with blkback after all. My script does this: echo source=$source >>/tmp/attach-snapshot.log echo dest=$dest >>/tmp/attach-snapshot.log lvcreate --snapshot --name=$dest --size=1G $source lvscan >>/tmp/attach-snapshot.log xm block-attach $domname phy:$dest $destdev w /tmp/attach-snapshot.log shows "source=" and "dest=" but not a result of "lvscan", which would indicate that it's the "lvcreate" that's hung, not the "xm block-attach". Disk writes are still working so I know that it's not just a missed buffer or anything. That would explain why udevd is hanging too - lvcreate would involve creating a new device while "xm block-attach" wouldn't. I think I'm barking up the wrong tree on xen-devel unless anyone else has any suggestions? Thanks James > -----Original Message----- > From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx [mailto:xen-devel- > bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of James Harper > Sent: Friday, 3 September 2010 10:36 > To: Jeremy Fitzhardinge > Cc: xen-devel@xxxxxxxxxxxxxxxxxxx > Subject: RE: [Xen-devel] BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865] > > > > > On 09/02/2010 05:20 PM, James Harper wrote: > > > I've had "BUG: soft lockup - CPU#1 stuck for 61s! [udevd:4865]" > happen > > > twice now and the server fairly quickly (over the course of 5-10 > > > minutes) becomes unusable after that happens. The call trace appears > to > > > be: > > > > > > Call Trace: > > > [<ffffffff8100e8e9>] ? xen_force_evtchn_callback+0x9/0xa > > > [<ffffffff8100ef72>] ? check_events+0x12/0x20 > > > [<ffffffff8100ef19>] ? xen_irq_enable_direct_end+0x0/0x7 > > > [<ffffffff8102fcc1>] ? do_page_fault+0x9e/0x27b > > > [<ffffffff812b39a5>] ? page_fault+0x25/0x30 > > > [<ffffffff8116f28d>] ? __put_user_4+0x1d/0x30 > > > [<ffffffff81045265>] ? schedule_tail+0x92/0x96 > > > [<ffffffff81011983>] ? ret_from_fork+0x13/0x80 > > > > > > That was under 2.6.32.17-g69a73fa. I've since upgraded to > > > 2.6.32.18-ge6b9b2c but the changelogs don't show anything obviously > > > related to the crash (but as usual there are a lot of them so maybe > I've > > > missed something?) > > > > > > Is this a known problem under 2.6.32.17-g69a73fa that has since been > > > fixed? > > > > Hard to know; nothing springs to mind right now. What else is going > on > > at the time? What's the full softlockup message? Is there any more > > context? > > I just noticed that it coincided pretty much exactly when a backup runs. > I take a snapshot in Dom0 then block-attach it to the linux DomU so it > can be backed up from DomU which makes restores easier etc, rather than > doing the backup in Dom0. I hadn't noticed before because it's the > Windows DomU that people complain about first, not the linux DomU. > Block-attach/detach would explain why udevd is involved too which makes > more sense. The block-attach happens every hour on the hour. The full > cut&paste of the first hang is: > > Sep 3 09:00:42 keechsvr2 kernel: [235893.079566] blkback: ring-ref > 1442, event-channel 13, protocol 1 (x86_64-abi) > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] BUG: soft lockup - > CPU#1 stuck for 61s! [udevd:4865] > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in: > ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun > act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhc ipt_ULO > G xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT > xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp > iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc n > f_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp > nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4 > nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback > xen_blkback > blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd > tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo snd_page_alloc > ipmi_msghandler pcspkr psmouse serio_raw rng_core i5000_edac edac_c > ore i5k_amb button evdev xfs exportfs dm_mirror dm_region_hash dm_log > dm_snapshot dm_mod ide_cd_mod cdrom ata_piix ata_generic libata > usb_storage scsi_mod usbhid hid piix ide_pci_generic bnx2 ehci_hcd id > e_core uhci_hc > Sep 3 09:02:13 keechsvr2 kernel: cciss thermal fan [last unloaded: > scsi_wait_scan] > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] CPU 1: > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in: > ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun > act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhc ipt_ULO > G xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT > xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp > iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc n > f_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp > nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4 > nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback > xen_blkback > blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd > tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo snd_page_alloc > ipmi_msghandler pcspkr psmouse serio_raw rng_core i5000_edac edac_c > ore i5k_amb button evdev xfs exportfs dm_mirror dm_region_hash dm_log > dm_snapshot dm_mod ide_cd_mod cdrom ata_piix ata_generic libata > usb_storage scsi_mod usbhid hid piix ide_pci_generic bnx2 ehci_hcd id > e_core uhci_hc > Sep 3 09:02:13 keechsvr2 kernel: cciss thermal fan [last unloaded: > scsi_wait_scan] > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Pid: 4865, comm: udevd > Not tainted 2.6.32.17-g69a73fa #1 ProLiant ML370 G5 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RIP: > e030:[<ffffffff8100922a>] [<ffffffff8100922a>] > hypercall_page+0x22a/0x1001 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RSP: > e02b:ffff8800114d5dd0 EFLAGS: 00000246 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RAX: 0000000000040000 > RBX: 0000000000000001 RCX: ffffffff8100922a > keechsvr2:/usr/local/src/linux-2.6-xen# head -100 /var/log/kern.log > Sep 3 07:00:44 keechsvr2 kernel: [228694.767691] blkback: ring-ref 311, > event-channel 13, protocol 1 (x86_64-abi) > Sep 3 08:00:42 keechsvr2 kernel: [232293.072248] blkback: ring-ref > 1334, event-channel 13, protocol 1 (x86_64-abi) > Sep 3 09:00:42 keechsvr2 kernel: [235893.079566] blkback: ring-ref > 1442, event-channel 13, protocol 1 (x86_64-abi) > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] BUG: soft lockup - > CPU#1 stuck for 61s! [udevd:4865] > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in: > ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun > act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhc ipt_ULOG > xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT > xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp > iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc > nf_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp > nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4 > nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback > xen_blkback blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm > snd_timer snd tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo > snd_page_alloc ipmi_msghandler pcspkr psmouse serio_raw rng_core > i5000_edac edac_core i5k_amb button evdev xfs exportfs dm_mirror > dm_region_hash dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_piix > ata_generic libata usb_storage scsi_mod usbhid hid piix ide_pci_generic > bnx2 ehci_hcd ide_core uhci_hc > Sep 3 09:02:13 keechsvr2 kernel: cciss thermal fan [last unloaded: > scsi_wait_scan] > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] CPU 1: > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Modules linked in: > ipt_MASQUERADE xt_physdev pppoe pppox ipmi_devintf autofs4 tun > act_police sch_ingress cls_u32 sch_sfq sch_cbq ppp_generic slhc ipt_ULOG > xt_limit ipt_REJECT xt_state xt_dscp iptable_filter ipt_REDIRECT > xt_conntrack iptable_nat xt_TCPMSS xt_tcpmss xt_TCPOPTSTRIP xt_tcpudp > iptable_mangle ip_tables x_tables ipv6 8021q garp bridge stp llc > nf_nat_ftp nf_conntrack_ftp ip_gre nf_nat_pptp nf_conntrack_pptp > nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack_ipv4 > nf_conntrack nf_defrag_ipv4 xen_gntdev xen_evtchn xen_netback > xen_blkback blkback_pagemap loop snd_pcm_oss snd_mixer_oss snd_pcm > snd_timer snd tpm_tis tpm sd_mod soundcore ipmi_si tpm_bios hpilo > snd_page_alloc ipmi_msghandler pcspkr psmouse serio_raw rng_core > i5000_edac edac_core i5k_amb button evdev xfs exportfs dm_mirror > dm_region_hash dm_log dm_snapshot dm_mod ide_cd_mod cdrom ata_piix > ata_generic libata usb_storage scsi_mod usbhid hid piix ide_pci_generic > bnx2 ehci_hcd ide_core uhci_hc > Sep 3 09:02:13 keechsvr2 kernel: cciss thermal fan [last unloaded: > scsi_wait_scan] > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Pid: 4865, comm: udevd > Not tainted 2.6.32.17-g69a73fa #1 ProLiant ML370 G5 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RIP: > e030:[<ffffffff8100922a>] [<ffffffff8100922a>] > hypercall_page+0x22a/0x1001 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RSP: > e02b:ffff8800114d5dd0 EFLAGS: 00000246 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RAX: 0000000000040000 > RBX: 0000000000000001 RCX: ffffffff8100922a > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RDX: ffffffff814293d0 > RSI: 0000000000000000 RDI: 0000000000000000 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] RBP: 00007fbc628a9800 > R08: 0000000000000000 R09: 0000000000000000 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] R10: 0000000000000002 > R11: 0000000000000246 R12: 0000000000000002 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] R13: ffff8800114d5e88 > R14: ffff880017f9cfc0 R15: ffff880017eed500 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] FS: > 00007fbc628a9770(0000) GS:ffff880002de8000(0000) knlGS:0000000000000000 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] CS: e033 DS: 0000 ES: > 0000 CR0: 000000008005003b > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] CR2: 00007fbc628a9800 > CR3: 000000001e2d7000 CR4: 0000000000002660 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] DR0: 0000000000000000 > DR1: 0000000000000000 DR2: 0000000000000000 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] DR3: 0000000000000000 > DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] Call Trace: > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff8100e8e9>] > ? xen_force_evtchn_callback+0x9/0xa > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff8100ef72>] > ? check_events+0x12/0x20 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff8100ef19>] > ? xen_irq_enable_direct_end+0x0/0x7 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff8102fcc1>] > ? do_page_fault+0x9e/0x27b > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff812b39a5>] > ? page_fault+0x25/0x30 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff8116f28d>] > ? __put_user_4+0x1d/0x30 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff81045265>] > ? schedule_tail+0x92/0x96 > Sep 3 09:02:13 keechsvr2 kernel: [235983.760505] [<ffffffff81011983>] > ? ret_from_fork+0x13/0x80 > > There's another hang at 09:03:09 which I can post if you'd find it > useful. > > If the block-attach happened at 09:00:42 and the crash happened at > 09:02:13 then I assume that whatever udevd started doing started at > 09:01:12 (eg 61 seconds earlier) which means that it's more likely to be > the block-detach at the end of the backup causing the hang not the > block-attach. The backup runs pretty quick. > > I'm inclined to turn off the snapshot backup if this isn't a known and > resolved problem... > > Thanks > > James > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@xxxxxxxxxxxxxxxxxxx > http://lists.xensource.com/xen-devel _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |