[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [GIT PULL] Fix lost interrupt race in Xen event channels
On Fri, 2010-08-27 at 04:56 -0400, Jan Beulich wrote: > >>> On 26.08.10 at 18:32, Jeremy Fitzhardinge <jeremy@xxxxxxxx> wrote: > > On 08/25/2010 11:46 PM, Jan Beulich wrote: > >> >>> On 25.08.10 at 19:54, Jeremy Fitzhardinge <jeremy@xxxxxxxx> wrote: > >>> Note that this patch is specifically for upstream Xen, which doesn't > >>> have any pirq support in it at present. > >> I understand that, but saw that you had paralleling changes to the > >> pirq handling in your Dom0 tree. > >> > >>> However, I did consider using fasteoi, but I couldn't see how to make > >>> it work. The problem is that it only does a single call into the > >>> irq_chip for EOI after calling the interrupt handler, but there is no > >>> call beforehand to ack the interrupt (which means clear the event flag > >>> in our case). This leads to a race where an event can be lost after the > >>> interrupt handler has returned, but before the event flag has been > >>> cleared (because Xen won't set pending or call the upcall function if > >>> the event is already set). I guess I could pre-clear the event in the > >>> upcall function, but I'm not sure that's any better. > >> That's precisely what we're doing. > > > > You mean pre-clearing the event? OK. > > > > But aren't you still subject to the bug the switch to handle_edge_irq fixed? > > > > With handle_fasteoi_irq: > > > > cpu A cpu B > > get event > > mask and clear event Argh. Right, I guess that's my fault, I was the one who came up with the PENDING theory, but indeed I failed to see the event masking bits. However, please read on. > > set INPROGRESS > > call action > > : > > : > > <migrate event channel to B> > > : get event > > Cannot happen, event is masked (i.e. all that would happen is > that the event occurrence would be logged evtchn_pending). > > > : INPROGRESS set? -> EOI, return > > : > > action returns > > clear INPROGRESS > > EOI > > unmask event, checking for whether the event got re-bound (and > doing the unmask through a hypercall if necessary), thus re-raising > the event in any case Yes. I agree. So let's come up with a new theory. Right now I'm still looking at xen/next. Correct me if I'm mistaken: mask_ack_pirq will: 1. chip->mask 2. chip->ack Where chip->ack will: 1. move_native_irq 2. clear_evtchn. Now if you look into move_native_irq, it will: 1. chip->mask (gratuitous) 2. move 3. chip->unmask (aiiiiiie). That explains why edge_irq still fixed the problem. Price question is if that's the kind of fix we wanted then. Cheers, Daniel _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |