Xen project Mailing List

RE: [Xen-devel] xm pause causing lockup

To: "Kip Macy" <kip.macy@xxxxxxxxx>

From: "Ian Pratt" <m+Ian.Pratt@xxxxxxxxxxxx>

Date: Fri, 15 Apr 2005 20:29:13 +0100

Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>

Delivery-date: Fri, 15 Apr 2005 19:29:07 +0000

List-id: Xen developer discussion <xen-devel.lists.xensource.com>

Thread-index: AcVB7Q67C1HbBcXHQeSycRFKbPmVKwAAnAHg

Thread-topic: [Xen-devel] xm pause causing lockup

I need to think about this more, but it looks like you have an L2 page
that has a type count of 1 but hasn't been validated. You're then
looping when you try and increment it to 2 thinking that you're racing
someone else.

Does this happen if you boot with 'nosmp'? I don't really believe it's a
race, but might be worth checking.

Also, it's worth adding a printk into this loop just to check that that
is where you're getting caught.

            /* Someone else is updating validation of this page. Wait... */
            while ( (y = page->u.inuse.type_info) == x )
                cpu_relax();
            goto again;

We need to figure out how the type count managed to get to one without
the page being validated.

I presume you're doing a debug=y build of Xen? Do you get any warnings
about illegal mmu_update attempts when you boot FreeBSD?

Ian

> Without the ability to continue and only a very basic
> understanding of the page typing code there is not a whole
> lot to go on. Let me know if there is some other bit of
> information that I can provide you with.
>
> -Kip
>
> Before attaching:
> (XEN) 'd' pressed -> dumping registers
> (XEN) CPU:    1
> (XEN) EIP:    0808:[<fc52d59f>]
> (XEN) EFLAGS: 00000246   CONTEXT: hypervisor
> (XEN) eax: 40000001   ebx: 00000000   ecx: fcfe3740   edx: fcfe3740
> (XEN) esi: 00007ff0   edi: 00000001   ebp: fcffbda0   esp: fcffbd58
> (XEN) ds: 0810   es: 0810   fs: 0810   gs: 0810   ss: 0810   cs: 0808
> (XEN) Stack trace from ESP=fcffbd58:
> (XEN) 80000003 00000001 fcfe3740 fcfe3740 fcfe3740 80000003 80000004 80000003
> (XEN) 00000000 00007ff0 fcffbda0 [fc52bfec] fd494968 fcfe3740 fcffbdc0 40000001
> (XEN) 40000001 40000002 fcffbdd0 [fc52c07b] fd494968 25fe0000 00000000 00000000
> (XEN) 000003d1 00000000 fcffbde0 [fc52bcec] 00000000 fd494968 fcffbe00 [fc52c52e]
> (XEN) 0000630f 25fe0000 fcfe3740 [fc52d100] fffffffc 00000000 fcffe000 00000001
> (XEN) 00000001 ff85b000 fcffbe40 [fc52c889] 0630f061 0000630f fcfe3740 000002ff
> (XEN) 00000001 f0000000 f0000000 00000004 f0000001 f0000000 000002ff ff85b000
> (XEN) 0000630f fcfe3740 fcffbe60 [fc52d0f0] fd494968 000001fa fc5b20c0 [fc53185d]
> (XEN) 40000000 00000002 fcffbeb0 [fc52d771] fd494968 40000000 fcfe3740 fcfe3740
> (XEN) fcfe3740 80000002 80000003 00000004 00000000 f0000000 f0000000 00000004
> (XEN) 40000001 f0000000 fd49497c f0000000 f0000000 40000001 fcffbee0 [fc52c07b]
> (XEN) fd494968 40000000 002ed518 00000000 a089075b 00000001 fcfe3740 00000000
> (XEN) 00007ff0 fd494968 fcffbfb0 [fc52df98] 0000630f 40000000 fcfe3740 00000292
> (XEN) fc5781c0 00000001 0019b901 00000000 00804e95 00000000 a089075b 000000a1
> (XEN) a10955f0 000000a1 00000001 fcfea040 00007ff0 00000001 fcffbf80 00000000
> (XEN) fcfe3740 00000000 fcfe3740 00000000 a10955f0 000000a1 00000000 fcffbf98
> (XEN) c0293bac 0000000c 00000003 [fc515bfc] a08902cd 000000a1 00000002 fcfe3740
> (XEN) fcfea040 fd494968 00000000 40000000 00000001 00000001 00000000 00000000
> (XEN) 00000001 0000630f c018a19b 00000001 fcfea040 00007ff0 c0293bc8 [fc54e923]
> (XEN) c0293bac 00000001 00000000 00007ff0 00000001 c0293bc8 0000001a 00000000
> (XEN) Call Trace from ESP=fcffbd58:
> (XEN) [<fc52bfec>] [<fc52c07b>] [<fc52bcec>] [<fc52c52e>] [<fc52d100>] [<fc52c889>]
> (XEN) [<fc52d0f0>] [<fc53185d>] [<fc52d771>] [<fc52c07b>] [<fc52df98>] [<fc515bfc>]
> (XEN) [<fc54e923>]
> (XEN) Waiting for GDB to attach to XenDBG
>
>
> gdb) bt
> #0 0xfc52d59f in get_page_type (page=0xfd494968, type=0x25fe0000) at mm.c:1235
> #1 0xfc52c07b in get_page_and_type_from_pagenr (page_nr=0x630f, type=0x25fe0000, d=0xfcfe3740) at mm.c:360
> #2 0xfc52c52e in get_page_from_l2e (l2e={l2_lo = 0x630f061}, pfn=0x630f, d=0xfcfe3740, va_idx=0x2ff) at mm.c:495
> #3 0xfc52c889 in alloc_l2_table (page=0xfd494968) at mm.c:679
> #4 0xfc52d0f0 in alloc_page_type (page=0xfd494968, type=0x40000000) at mm.c:1083
> #5 0xfc52d771 in get_page_type (page=0xfd494968, type=0x40000000) at mm.c:1269
> #6 0xfc52c07b in get_page_and_type_from_pagenr (page_nr=0x630f, type=0x40000000, d=0xfcfe3740) at mm.c:360
> #7 0xfc52df98 in do_mmuext_op (uops=0xc0293bac, count=0x1, pdone=0x0, foreigndom=0x7ff0) at mm.c:1499
> #8 0xfc54e923 in test_all_events () at bitops.h:239
> #9 0xc0293bac in ?? ()
>
> (gdb) f 7
> #7 0xfc52df98 in do_mmuext_op (uops=0xc0293bac, count=0x1, pdone=0x0, foreigndom=0x7ff0) at mm.c:1499
> 1499 okay = get_page_and_type_from_pagenr(op.mfn, type, FOREIGNDOM);
> (gdb) p op
> $9 = {
>   cmd = 0x1,
>   {
>     mfn = 0x630f,
>     linear_addr = 0x630f
>   },
>   {
>     nr_ents = 0xc018a19b,
>     cpuset = 0xc018a19b
>   }
> }
> (gdb) p x
> $1 = 0x40000001
> (gdb) x nx
> 0x40000002: Ignoring packet error, continuing...
> Reply contains invalid hex digit 40
> (gdb) p y
> $2 = 0x40000001
> (gdb) p page->u.inuse.type_info
> $3 = 0x40000001
> (gdb) p x
> $4 = 0x40000001
> (gdb) p nx
> $5 = 0x40000002
> (gdb) p y
> $6 = 0x40000001
> (gdb) p x
> $7 = 0x40000001
> (gdb) p sizeof(page->u.inuse.type_info)
> $8 = 0x4
>
>
>
> On 4/15/05, Ian Pratt <m+Ian.Pratt@xxxxxxxxxxxx> wrote:
> > Wild! It really is looping in get_page_type.
> >
> > Any chance you could use the serial debugger to find out what x, nx
> > and y are in the cmpxchg?
> >
> > I've tried to think of duff inputs that could cause it to loop, but
> > I'm not smart enough.
> >
> > Ian
> >
> > > -----Original Message-----
> > > From: Kip Macy [mailto:kip.macy@xxxxxxxxx]
> > > Sent: 15 April 2005 18:13
> > > To: Ian Pratt
> > > Cc: Keir Fraser; xen-devel; ian.pratt@xxxxxxxxxxxx
> > > Subject: Re: [Xen-devel] xm pause causing lockup
> > >
> > > Great, thanks. I'm now running a completely fresh tree from last
> > > night.
> > >
> > > Over the course of several minutes I hit 'd' a number of times. The
> > > addresses I got were:
> > >
> > > 0xfc51c742
> > > 0xfc51c746
> > > 0xfc51c74b
> > > 0xfc51c740
> > >
> > > (gdb) x/i 0xfc51c742
> > > 0xfc51c742 <get_page_type+1218>: mov 0x40(%esp,1),%eax
> > > (gdb) x/i 0xfc51c746
> > > 0xfc51c746 <get_page_type+1222>: mov 0x14(%eax),%ebx
> > > (gdb) x/i 0xfc51c74b
> > > 0xfc51c74b <get_page_type+1227>: je 0xfc51c740 <get_page_type+1216>
> > > (gdb) x/i 0xfc51c740
> > > 0xfc51c740 <get_page_type+1216>: repz nop
> > >
> > >
> > > -Kip
> > >
> > > On 4/14/05, Ian Pratt <m+Ian.Pratt@xxxxxxxxxxxx> wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: xen-devel-bounces@xxxxxxxxxxxxxxxxxxx
> > > > > [mailto:xen-devel-bounces@xxxxxxxxxxxxxxxxxxx] On Behalf Of Kip Macy
> > > > > Sent: 15 April 2005 05:36
> > > > > To: Keir Fraser
> > > > > Cc: xen-devel
> > > > > Subject: Re: [Xen-devel] xm pause causing lockup
> > > > >
> > > > > To further check this I added:
> > > > > printk("%s %d %d %d %d %d\n", __FUNCTION__, op->cmd,
> > > > > op->mfn, count, success_count, domid); to
> > > > > HYPERVISOR_mmuext_op and something similar to mmu_update.
> > > >
> > > > Is your hypothesis that Xen gets stuck in either the mmuext_op or
> > > > mmu_update loops?
> > > > Are you running with watchdog enabled?
> > > >
> > > > It might be good to add a printk at the end so that you can prove this.
> > > >
> > > > Hitting 'd' on the debug console will give us an EIP on CPU 1.
> > > >
> > > > Ian
> > > >
> > >
> >
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
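
The wait loop quoted from get_page_type, together with the suggestion to put a printk inside it, can be sketched in user-space C. This is only an illustration under stated assumptions: the bit positions for the type field, the validated flag and the count are guesses that happen to agree with the values in the gdb session (type=0x40000000 for the L2 request, x=0x40000001), not the definitions from the Xen tree, and page_sketch, type_count, is_validated and spin_until_changed are hypothetical names. The spin limit and printf stand in for the printk, so a stuck loop reports itself instead of hanging.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed bit layout, for illustration only. The real definitions live in
 * the headers of the Xen tree being debugged and may differ. */
#define PGT_TYPE_MASK   (7u << 29)  /* assumed: page type in the top bits */
#define PGT_L2_TABLE    (2u << 29)  /* 0x40000000, the type seen in frame #5 */
#define PGT_VALIDATED   (1u << 27)  /* assumed position of the validated flag */
#define PGT_COUNT_MASK  0xffffu     /* assumed: low bits hold the type count */

/* Hypothetical stand-in for the page structure; only type_info matters here. */
struct page_sketch {
    volatile uint32_t type_info;
};

static unsigned type_count(uint32_t ti)
{
    return ti & PGT_COUNT_MASK;
}

static int is_validated(uint32_t ti)
{
    return (ti & PGT_VALIDATED) != 0;
}

/* Spin while type_info still equals x, as the quoted loop does, but give up
 * and print a diagnostic after `limit` iterations. Returns the number of
 * spins taken before the value changed, or -1 if the limit was reached. */
static long spin_until_changed(struct page_sketch *page, uint32_t x, long limit)
{
    long spins = 0;
    uint32_t y;

    while ((y = page->type_info) == x) {
        if (++spins >= limit) {
            printf("stuck: type_info=%#x count=%u validated=%d after %ld spins\n",
                   (unsigned)y, type_count(y), is_validated(y), spins);
            return -1;
        }
    }
    return spins;
}
```

With type_info and x both set to 0x40000001, as in the gdb output, spin_until_changed reaches the limit and returns -1, because nothing else modifies the word while the caller spins. Under the assumed layout that value decodes as an L2 type with a count of 1 and the validated flag clear.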
