RE: [PATCH] Re: [Xen-devel] Xen crash on dom0 shutdown
After the pirq is unbound for MSI, the vector is in fact freed and the action is freed, but the entry is still not cleared from the domain's pirq_to_vector mapping.

Yunhong Jiang

Shan, Haitao <> wrote:
> Hi, Keir,
>
> While testing my patch today, I found that cs18539 breaks my system. I
> have to revert this changeset to test MSI.
> Here is the output from the serial port.
>
> (XEN) ----[ Xen-3.4-unstable  x86_64  debug=n  Not tainted ]----
> (XEN) CPU:    2
> (XEN) RIP:    e008:[<ffff828c8013c2f9>] pirq_guest_unbind+0xb9/0x2f0
> (XEN) RFLAGS: 0000000000010082   CONTEXT: hypervisor
> (XEN) rax: ffff828c80274d80   rbx: 0000000000000000   rcx: 00000000000000d0
> (XEN) rdx: ffff83007c60fe38   rsi: 00000000000000ff   rdi: ffff83007c80e080
> (XEN) rbp: ffff83007c80e080   rsp: ffff83007c60fe28   r8:  0000000000000282
> (XEN) r9:  ffff828c80274da4   r10: ffff828c8026e580   r11: 0000000000000206
> (XEN) r12: ffff828c80274d80   r13: 00000000000000d0   r14: 00000000000000ff
> (XEN) r15: 00000000000000ff   cr0: 000000008005003b   cr4: 00000000000026b0
> (XEN) cr3: 000000005ea9c000   cr2: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff83007c60fe28:
> (XEN)    0000000000000282 ffff83007c80e080 0000000000000282 fffffffffffffffd
> (XEN)    ffff83007c80e080 00000000000000ff 00000000000000d0 ffff8800204408b0
> (XEN)    00000000000000ff ffff828c80149075 00000008000000ff ffffffff803908d6
> (XEN)    828c801a98a0b948 c390ec90d1ffffff ffff83007c80e1f8 ffff83007c60fed8
> (XEN)    ffff828c8025f900 ffff83007c80d080 000000ff804e7ff0 ffffffffffffffff
> (XEN)    0000000000000000 0000000000000000 000000010000001a 00a0fb000000001a
> (XEN)    00000001801fe568 ffff83007d5e6080 00000000000000ff ffff8800204408b0
> (XEN)    0000000000000000 ffff8800204408b0 ffff880020440000 ffff828c801a9169
>
> The actual call trace is do_physdev_op -> pirq_guest_unbind, which is
> definitely the unmap_pirq hypercall for an MSI interrupt. This happens
> when dom0 first unbinds the guest pirq and then unmaps that pirq. I
> think this little patch will fix the problem, but I am not quite sure
> whether it will break the intention of cs 18539.
>
> diff -r 7a32c2325fdc xen/arch/x86/irq.c
> --- a/xen/arch/x86/irq.c    Thu Sep 25 10:26:08 2008 +0100
> +++ b/xen/arch/x86/irq.c    Thu Sep 25 21:12:03 2008 +0800
> @@ -619,6 +619,8 @@
>      }
>
>      action = (irq_guest_action_t *)desc->action;
> +    if ( !action )
> +        goto out;
>      vector = desc - irq_desc;
>
>      for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ )
>
> Signed-off-by: Shan haitao <haitao.shan@xxxxxxxxx>
>
>
> Best Regards
> Haitao Shan
>
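[Editorial note: to make the failure mode above concrete, here is a minimal standalone sketch, not Xen code, of the pattern the patch adds. The struct and function names are simplified, hypothetical stand-ins for Xen's irq_desc_t, irq_guest_action_t and pirq_guest_unbind(); the point is only that an unbind path which can run after the action has already been freed must check for a NULL action before dereferencing it.]

    /* sketch.c -- illustrative only; simplified stand-ins for Xen types */
    #include <stddef.h>
    #include <stdio.h>

    struct guest_action {
        int nr_guests;               /* number of guests bound to this irq */
    };

    struct irq_desc_sketch {
        struct guest_action *action; /* NULL once the pirq has been unmapped */
    };

    /* Returns 1 if the unbind did real work, 0 if nothing was bound. */
    static int guest_unbind(struct irq_desc_sketch *desc)
    {
        struct guest_action *action = desc->action;

        if (action == NULL)          /* the crash case: action already freed */
            return 0;                /* bail out instead of dereferencing */

        action->nr_guests--;         /* safe: action is known to be valid */
        return 1;
    }

    int main(void)
    {
        struct irq_desc_sketch desc = { .action = NULL }; /* already unmapped */
        printf("unbind did work: %d\n", guest_unbind(&desc));
        return 0;
    }

[Without the NULL check, the second call into the unbind path dereferences a freed/NULL action, which matches the RIP in pirq_guest_unbind seen in the crash dump above.]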
> Keir Fraser wrote:
>> On 24/9/08 10:13, "Keir Fraser" <keir.fraser@xxxxxxxxxxxxx> wrote:
>>
>>>> How about this one? It doesn't exactly follow the path you
>>>> suggested, i.e. it doesn't mess with event channels, but rather marks
>>>> the irq<->vector mapping as invalid, allowing only a subsequent
>>>> event channel unbind (i.e. close) to recover from that state (which
>>>> seems better in terms of requiring proper discipline in the guest,
>>>> as it prevents re-using the irq or vector without properly cleaning
>>>> up).
>>>
>>> Yeah, this is the kind of thing I had in mind. I will work on this a
>>> bit more (e.g., we need to synchronise on d->evtchn_lock to avoid
>>> racing EVTCHNOP_bind_pirq; I'm also afraid of leaking MSI vectors on
>>> domain destruction). Thanks.
>>
>> Changeset 18539. I made the locking quite a bit stricter by getting
>> rid of 'irq_lock' and instead using 'evtchn_lock' (which may now be
>> better renamed to 'event_lock', since it no longer protects just
>> event channels). The pirq-to-vector mapping is now protected by both
>> evtchn_lock and irq_desc->lock. A user of the mapping can protect
>> itself with either lock (whichever is most convenient).
>>
>> Some TODOs:
>>  * There is no management of MSI vectors. They always leak! I didn't
>>    fix that here, since it isn't a mere synchronisation race; the code
>>    just isn't there.
>>  * I decided to WARN_ON(!spin_is_locked(&d->evtchn_lock)) in
>>    pirq_guest_[un]bind(). The reason is that in any case those functions
>>    do not expect to be re-entered -- they really want to be per-domain
>>    serialised. Furthermore, I am pretty certain that the HVM passthrough
>>    code is not synchronising properly with changes to the pirq-to-vector
>>    mapping (it uses domain_irq_to_vector() in many places with no care
>>    for locking), nor is it synchronised with other users of
>>    pirq_guest_bind() --- a reasonable semantics here would be that a
>>    domain pirq can be bound to once, either via an event channel or
>>    through a virtual PIC in HVM emulation context. I therefore think
>>    that careful locking is required -- it may suffice to get rid of (or
>>    at least make less use of) hvm_domain.irq_lock and replace its use
>>    with evtchn_lock (the only consideration being that the latter is not
>>    IRQ-safe). The WARN_ON() is a nice reminder of work to be done here. ;-)
>>
>> Comments?
>>
>>  -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
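[Editorial note: the locking rule Keir describes above, where writers of the pirq-to-vector mapping hold both evtchn_lock and irq_desc->lock so that a reader may take either one, can be illustrated with a small user-space sketch. The pthread mutexes, array, and function names below are hypothetical stand-ins for Xen's spinlocks and per-domain mapping, not hypervisor code.]

    /* locking_sketch.c -- illustrative only; build with -pthread */
    #include <pthread.h>

    static pthread_mutex_t event_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for evtchn_lock */
    static pthread_mutex_t desc_lock  = PTHREAD_MUTEX_INITIALIZER; /* stands in for irq_desc->lock */

    static int pirq_to_vector_map[256];            /* simplified per-domain mapping */

    /* Writers take BOTH locks, in a fixed order, before changing the mapping. */
    void set_pirq_vector(int pirq, int vector)
    {
        pthread_mutex_lock(&event_lock);
        pthread_mutex_lock(&desc_lock);
        pirq_to_vector_map[pirq] = vector;
        pthread_mutex_unlock(&desc_lock);
        pthread_mutex_unlock(&event_lock);
    }

    /* Because writers hold both locks, a reader may hold EITHER lock
     * (whichever is most convenient at its call site) and still see a
     * consistent value. */
    int get_pirq_vector_under_event_lock(int pirq)
    {
        pthread_mutex_lock(&event_lock);
        int vector = pirq_to_vector_map[pirq];
        pthread_mutex_unlock(&event_lock);
        return vector;
    }

[The callers Keir flags as problematic, e.g. the HVM passthrough code using domain_irq_to_vector() with no lock at all, correspond to reading the map while holding neither lock, which this discipline forbids.]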