Re: [PATCH v4 01/10] evtchn: use per-channel lock where possible
On 11.01.2021 12:06, Julien Grall wrote:
> On 11/01/2021 10:14, Jan Beulich wrote:
>> On 08.01.2021 21:32, Julien Grall wrote:
>>> On 05/01/2021 13:09, Jan Beulich wrote:
>>>> Neither evtchn_status() nor domain_dump_evtchn_info() nor
>>>> flask_get_peer_sid() need to hold the per-domain lock - they all only
>>>> read a single channel's state (at a time, in the dump case).
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>>>> ---
>>>> v4: New.
>>>>
>>>> --- a/xen/common/event_channel.c
>>>> +++ b/xen/common/event_channel.c
>>>> @@ -968,15 +968,16 @@ int evtchn_status(evtchn_status_t *statu
>>>>      if ( d == NULL )
>>>>          return -ESRCH;
>>>>
>>>> -    spin_lock(&d->event_lock);
>>>> -
>>>>      if ( !port_is_valid(d, port) )
>>>
>>> There is one issue that is now becoming more apparent. To be clear, the
>>> problem is not in this patch, but I think it is the best place to
>>> discuss it, as d->event_lock may be part of the solution.
>>>
>>> After XSA-344, evtchn_destroy() will end up decrementing
>>> d->valid_evtchns.
>>>
>>> Given that evtchn_status() can work on a non-current domain, it would
>>> be possible to run it concurrently with evtchn_destroy(). As a
>>> consequence, port_is_valid() will be unstable, as a valid event channel
>>> may turn invalid.
>>>
>>> AFAICT, we have been getting away with it so far, as the memory is not
>>> freed until the domain is fully destroyed. However, we re-introduced
>>> XSA-338 in a different way.
>>>
>>> To be clear, this is not the fault of this patch. But I don't think it
>>> is sane to re-introduce a behavior that led us to an XSA.
>>
>> I'm getting confused, I'm afraid, from the varying statements above:
>> Are you suggesting this patch does re-introduce bad behavior or not?
>
> No. I am pointing out that this is widening the bad behavior (again).

Since I'd really like to get some more of this series in before the full
freeze, and hence I want (need) to re-post, I thought I'd reply here
despite (or in light of) your request for input from others not having
been met.

I don't view this as "bad" behaviour, btw. The situation is quite
different from the one that led to the XSA: Here we only deal with the
"illusion" of a port having become invalid. IOW yes, ...

>> Yes, the decrementing of ->valid_evtchns has a similar effect, but
>> I'm not convinced it gets us into XSA territory again. The problem
>> wasn't the reducing of ->max_evtchns as such, but the derived
>> assumptions elsewhere in the code. If there were any such again, I
>> suppose we'd have reason to issue another XSA.
>
> I don't think it gets us into XSA territory yet. However, the
> locking/interaction in the event channel code is quite complex.
>
> To give a concrete example, below is the current implementation of
> free_xen_event_channel():
>
>     if ( !port_is_valid(d, port) )
>     {
>         /*
>          * Make sure ->is_dying is read /after/ ->valid_evtchns, pairing
>          * with the spin_barrier() and BUG_ON() in evtchn_destroy().
>          */
>         smp_rmb();
>         BUG_ON(!d->is_dying);
>         return;
>     }
>
>     evtchn_close(d, port, 0);
>
> It would be fair for a developer to assume that after the check above,
> port_is_valid() would return true. However, this is not the case ...

... there needs to be awareness that putting e.g.

    ASSERT(port_is_valid(d, port));

anywhere past the if() cannot be done without considering domain cleanup
logic.

> I am not aware of any issue so far ... But I am not ready to bet that
> this is not going to be missed. How about you?

There is a risk of this being overlooked, yes.
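To make the window concrete, here is a minimal sketch of the reader side
as it looks after this patch. This is illustrative only, not the
committed code: port_is_valid(), evtchn_from_port() and the per-channel
evtchn_read_lock()/evtchn_read_unlock() helpers are existing tree
functions, but the surrounding shape and the error value are assumed.

    struct evtchn *chn;

    /* e.g. evtchn_status() after this patch: d->event_lock not held. */
    if ( !port_is_valid(d, port) )      /* reads d->valid_evtchns */
        return -EINVAL;                 /* error value assumed */

    /*
     * Window: evtchn_destroy() may decrement d->valid_evtchns right
     * here, so repeating the port_is_valid() check could now yield
     * false.  The struct evtchn itself is not freed until complete
     * domain destruction, so the pointer below remains safe to use -
     * only the "valid" property is unstable.
     */
    chn = evtchn_from_port(d, port);

    evtchn_read_lock(chn);
    /* ... read this one channel's state ... */
    evtchn_read_unlock(chn);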
But I'm unconvinced this absolutely requires measures to be taken
beyond, maybe, the addition of a comment somewhere (a possible shape of
such a comment is sketched below). I do, in particular, not think this
should stand in the way of the locking relaxation done by this patch,
especially since (just to repeat) it merely introduces more instances of
a pattern already found elsewhere.
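A purely illustrative sketch of what such a comment might look like,
e.g. next to port_is_valid() - the wording is a suggestion, not part of
this patch:

    /*
     * Without holding d->event_lock, port validity is not stable for
     * domains other than the current one: evtchn_destroy() may
     * concurrently decrement d->valid_evtchns, so a port which
     * port_is_valid() just reported as valid can subsequently appear
     * invalid.  The underlying struct evtchn is not freed until
     * complete domain destruction, though, so an already obtained
     * channel pointer remains safe to dereference.
     */

Jan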