Xen project Mailing List

Re: [PATCH v4 01/10] evtchn: use per-channel lock where possible

Date: Mon, 11 Jan 2021 11:14:10 +0100

Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Ian Jackson <iwj@xxxxxxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Mon, 11 Jan 2021 10:14:21 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 08.01.2021 21:32, Julien Grall wrote:
> Hi Jan,
>
> On 05/01/2021 13:09, Jan Beulich wrote:
>> Neither evtchn_status() nor domain_dump_evtchn_info() nor
>> flask_get_peer_sid() need to hold the per-domain lock - they all only
>> read a single channel's state (at a time, in the dump case).
>>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> ---
>> v4: New.
>>
>> --- a/xen/common/event_channel.c
>> +++ b/xen/common/event_channel.c
>> @@ -968,15 +968,16 @@ int evtchn_status(evtchn_status_t *statu
>>       if ( d == NULL )
>>           return -ESRCH;
>>
>> -    spin_lock(&d->event_lock);
>> -
>>       if ( !port_is_valid(d, port) )
>
> There is one issue that is now becoming more apparent. To be clear, the
> problem is not in this patch, but I think it is the best place to
> discuss it as d->event_lock may be part of the solution.
>
> After XSA-344, evtchn_destroy() will end up to decrement d->valid_evtchns.
>
> Given that evtchn_status() can work on the non-current domain, it would
> be possible to run it concurrently with evtchn_destroy(). As a
> consequence, port_is_valid() will be unstable as a valid event channel
> may turn invalid.
>
> AFAICT, we are getting away so far, as the memory is not freed until the
> domain is fully destroyed. However, we re-introduced XSA-338 in a
> different way.
>
> To be clear this is not the fault of this patch. But I don't think this
> is sane to re-introduce a behavior that lead us to an XSA.

I'm getting confused, I'm afraid, by the varying statements above:
Are you suggesting this patch does re-introduce bad behavior or not?

Yes, the decrementing of ->valid_evtchns has a similar effect, but
I'm not convinced it gets us into XSA territory again. The problem
wasn't the reducing of ->max_evtchns as such, but the derived
assumptions elsewhere in the code. If there were any such again,
I suppose we'd have reason to issue another XSA.
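To make the concern concrete, here is a deliberately simplified, single-threaded C model. All names (`model_domain`, `model_port_is_valid()`, `model_destroy_step()`) are hypothetical and this is not Xen's code. It assumes, as the thread implies, that port validity is a comparison against a `valid_evtchns` bound which teardown decrements, and that channel storage is only released at final destruction. A check that succeeded can then be invalidated before the caller acts on it, while the memory it guarded stays readable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model; not Xen's real structures. */
#define MODEL_NR_PORTS 8u

struct model_chn {
    bool open;
};

struct model_domain {
    /* Upper bound of valid ports; shrinks while the domain is torn down. */
    atomic_uint valid_evtchns;
    /* Backing storage; in this model it is only released at final
     * destruction, so it stays readable during teardown. */
    struct model_chn chn[MODEL_NR_PORTS];
};

static void model_init(struct model_domain *d)
{
    for (unsigned int i = 0; i < MODEL_NR_PORTS; i++)
        d->chn[i].open = true;
    atomic_init(&d->valid_evtchns, MODEL_NR_PORTS);
}

/* Lock-free validity check: the result is only a snapshot. */
static bool model_port_is_valid(struct model_domain *d, unsigned int port)
{
    return port < atomic_load(&d->valid_evtchns);
}

/* One teardown step: close the highest valid port, then shrink the bound. */
static void model_destroy_step(struct model_domain *d)
{
    unsigned int n = atomic_load(&d->valid_evtchns);

    assert(n <= MODEL_NR_PORTS);
    if (n == 0)
        return;
    d->chn[n - 1].open = false;
    atomic_store(&d->valid_evtchns, n - 1);
}
```

Checking port 7 (true), running `model_destroy_step()`, then checking again (false) mimics a reader being overtaken by teardown between check and use. The later read of `chn[7]` still touches live memory, which is the "unstable but usable" situation discussed in this thread.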

Furthermore there are other paths already using port_is_valid()
without holding the domain's event lock; I've not been able to spot
a problem with this though, so far.

> I can see two solutions:
>    1) Use d->event_lock to protect port_is_valid() when d !=
> current->domain. This would require evtchn_destroy() to grab the lock
> when updating d->valid_evtchns.
>    2) Never decrement d->valid_evtchns and use a different field for
> closing ports
>
> I am not a big fan of 1) because this is muddying the already complex
> locking situation in the event channel code. But I suggested it because
> I wasn't sure whether you would be happy with 2).

I agree 1) wouldn't be very nice, and you're right in assuming I
wouldn't like 2) very much. For the moment I'm not (yet) convinced
we need to do anything at all - as you say yourself, while the
result of port_is_valid() is potentially unstable when a domain is
in the process of being cleaned up, the state guarded by such
checks remains usable in (I think) a race-free manner.

Jan
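The two options Julien lists could be sketched roughly as follows, again on a hypothetical toy model rather than Xen's real structures. A tiny `atomic_flag` spinlock stands in for `d->event_lock`; the `opt1_*`/`opt2_*` functions and the `first_closed` field are invented for illustration only, and the model does not attempt to be a complete concurrent implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model of the two options; all names are hypothetical. */
#define SKETCH_NR_PORTS 8u

/* Minimal spinlock standing in for d->event_lock. */
typedef struct {
    atomic_flag locked;
} toy_spinlock_t;

static void toy_spin_lock(toy_spinlock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ; /* spin until the holder releases the flag */
}

static void toy_spin_unlock(toy_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

struct sketch_chn {
    bool open;
};

struct sketch_domain {
    toy_spinlock_t event_lock;
    atomic_uint valid_evtchns;   /* bound used by the validity check */
    unsigned int first_closed;   /* option 2 only: hypothetical extra field */
    struct sketch_chn chn[SKETCH_NR_PORTS];
};

static void sketch_init(struct sketch_domain *d)
{
    atomic_flag_clear(&d->event_lock.locked);
    atomic_init(&d->valid_evtchns, SKETCH_NR_PORTS);
    d->first_closed = SKETCH_NR_PORTS;
    for (unsigned int i = 0; i < SKETCH_NR_PORTS; i++)
        d->chn[i].open = true;
}

static bool sketch_port_is_valid(struct sketch_domain *d, unsigned int port)
{
    return port < atomic_load(&d->valid_evtchns);
}

/* Option 1: teardown only shrinks the bound while holding event_lock... */
static void opt1_destroy_step(struct sketch_domain *d)
{
    toy_spin_lock(&d->event_lock);
    unsigned int n = atomic_load(&d->valid_evtchns);
    assert(n <= SKETCH_NR_PORTS);
    if (n > 0) {
        d->chn[n - 1].open = false;
        atomic_store(&d->valid_evtchns, n - 1);
    }
    toy_spin_unlock(&d->event_lock);
}

/* ...and a caller acting on a non-current domain holds the same lock
 * across the check and the access, so teardown cannot slip in between. */
static bool opt1_remote_status(struct sketch_domain *d, unsigned int port,
                               bool *is_open)
{
    bool valid;

    toy_spin_lock(&d->event_lock);
    valid = sketch_port_is_valid(d, port);
    if (valid)
        *is_open = d->chn[port].open;
    toy_spin_unlock(&d->event_lock);
    return valid;
}

/* Option 2: valid_evtchns is never decremented, so a successful check stays
 * true; teardown progress lives in a separate field instead. Real code
 * would need to synchronise first_closed as well; the toy ignores that. */
static void opt2_destroy_step(struct sketch_domain *d)
{
    if (d->first_closed == 0)
        return;
    d->first_closed--;
    d->chn[d->first_closed].open = false;
}
```

In this sketch option 1 makes the bound meaningful only under the lock, which is the extra locking complexity both participants are wary of, while option 2 keeps validity monotonic at the cost of a second piece of state tracking teardown.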
