[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v3 5/5] evtchn: don't call Xen consumer callback with per-channel lock held





On 07/12/2020 08:02, Jan Beulich wrote:
On 04.12.2020 16:09, Julien Grall wrote:
On 04/12/2020 12:01, Jan Beulich wrote:
On 04.12.2020 12:51, Julien Grall wrote:
On 04/12/2020 11:48, Jan Beulich wrote:
On 04.12.2020 12:28, Julien Grall wrote:
On 03/12/2020 10:09, Jan Beulich wrote:
On 02.12.2020 22:10, Julien Grall wrote:
So shouldn't we handle this issue properly in VM event?

I suppose that's a question to the VM event folks rather than me?

Yes. From my understanding of Tamas's e-mail, they are relying on the
monitoring software to do the right thing.

I will refrain to comment on this approach. However, given the race is
much wider than the event channel, I would recommend to not add more
code in the event channel to deal with such problem.

Instead, this should be fixed in the VM event code when someone has time
to harden the subsystem.

Are effectively saying I should now undo the addition of the
refcounting, which was added in response to feedback from you?

Please point out where I made the request to use the refcounting...

You didn't ask for this directly, sure, but ...

I pointed out there was an issue with the VM event code.

... this has ultimately led to the decision to use refcounting
(iirc there was one alternative that I had proposed, besides
the option of doing nothing).

One other option that was discussed (maybe only on security@xxxxxxx) is
to move the spinlock outside of the structure so it is always allocated.

Oh, right - forgot about that one, because that's nothing I would
ever have taken on actually carrying out.

This was latter
analysed as a wider issue. The VM event folks doesn't seem to be very
concerned on the race, so I don't see the reason to try to fix it in the
event channel code.

And you won't need the refcount for vpl011 then?

I don't believe we need it for the vpl011 as the spin lock protecting
the code should always be allocated. The problem today is the lock is
initialized too late.

I can certainly
drop it again, but it feels odd to go back to an earlier version
under the circumstances ...

The code introduced doesn't look necessary outside of the VM event code.
So I think it would be wrong to merge it if it is just papering over a
bigger problem.

So to translate this to a clear course of action: You want me to
go back to the earlier version by dropping the refcounting again?

Yes.

(I don't view this as "papering over" btw, but a tiny step towards
a solution.)

This is implying that the refcounting is part of the actual solution. I think you can solve it directly in the VM event code without touching the event channel code.

Furthermore, I see no point to add code in the common code if the maintainers of the affected subsystem think there code is safe (I don't believe it is).

Cheers,

--
Julien Grall



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.