[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v4 01/10] evtchn: use per-channel lock where possible

On 11.01.2021 12:06, Julien Grall wrote:
> On 11/01/2021 10:14, Jan Beulich wrote:
>> On 08.01.2021 21:32, Julien Grall wrote:
>>> On 05/01/2021 13:09, Jan Beulich wrote:
>>>> Neither evtchn_status() nor domain_dump_evtchn_info() nor
>>>> flask_get_peer_sid() need to hold the per-domain lock - they all only
>>>> read a single channel's state (at a time, in the dump case).
>>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>>>> ---
>>>> v4: New.
>>>> --- a/xen/common/event_channel.c
>>>> +++ b/xen/common/event_channel.c
>>>> @@ -968,15 +968,16 @@ int evtchn_status(evtchn_status_t *statu
>>>>        if ( d == NULL )
>>>>            return -ESRCH;
>>>> -    spin_lock(&d->event_lock);
>>>> -
>>>>        if ( !port_is_valid(d, port) )
>>> There is one issue that is now becoming more apparent. To be clear, the
>>> problem is not in this patch, but I think it is the best place to
>>> discuss it as d->event_lock may be part of the solution.
>>> After XSA-344, evtchn_destroy() will end up to decrement d->valid_evtchns.
>>> Given that evtchn_status() can work on the non-current domain, it would
>>> be possible to run it concurrently with evtchn_destroy(). As a
>>> consequence, port_is_valid() will be unstable as a valid event channel
>>> may turn invalid.
>>> AFAICT, we are getting away so far, as the memory is not freed until the
>>> domain is fully destroyed. However, we re-introduced XSA-338 in a
>>> different way.
>>> To be clear this is not the fault of this patch. But I don't think this
>>> is sane to re-introduce a behavior that lead us to an XSA.
>> I'm getting confused, I'm afraid, from the varying statements above:
>> Are you suggesting this patch does re-introduce bad behavior or not?
> No. I am pointing out that this is widening the bad behavior (again).

Since I'd really like to get in some more of this series before
the full freeze, and hence I want (need) to re-post, I thought
I'd reply here despite (or in light of) your request for input
from others not having been met.

I don't view this as "bad" behaviour, btw. The situation is
quite different to that which had led to the XSA: Here we
only deal with the "illusion" of a port having become invalid.
IOW yes, ...

>> Yes, the decrementing of ->valid_evtchns has a similar effect, but
>> I'm not convinced it gets us into XSA territory again. The problem
>> wasn't the reducing of ->max_evtchns as such, but the derived
>> assumptions elsewhere in the code. If there were any such again, I
>> suppose we'd have reason to issue another XSA.
> I don't think it get us to the XSA territory yet. However, the 
> locking/interaction in the event channel code is quite complex.
> To give a concrete example, below the current implementation of 
> free_xen_event_channel():
>      if ( !port_is_valid(d, port) )
>      {
>          /*
>           * Make sure ->is_dying is read /after/ ->valid_evtchns, pairing
>           * with the spin_barrier() and BUG_ON() in evtchn_destroy().
>           */
>          smp_rmb();
>          BUG_ON(!d->is_dying);
>          return;
>      }
>      evtchn_close(d, port, 0);
> It would be fair for a developer to assume that after the check above, 
> port_is_valid() would return true. However, this is not the case...

... there needs to be awareness that putting e.g.

    ASSERT(port_is_valid(d, port));

anywhere past the if() cannot be done without considering domain
cleanup logic.

> I am not aware of any issue so far... But I am not ready to be this is 
> not going to be missed out. How about you?

There is a risk of this being overlooked, yes. But I'm unconvinced
this absolutely requires measures to be taken beyond, maybe, the
addition of a comment somewhere. I do, in particular, not think
this should stand in the way of the locking relaxation done by
this patch, even more so that (just to repeat) it merely introduces
more instances of a pattern found elsewhere already.




Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.