|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v4] gnttab: defer allocation of status frame tracking array
On 05.05.2021 12:49, Andrew Cooper wrote:
> On 04/05/2021 09:42, Jan Beulich wrote:
>> This array can be large when many grant frames are permitted; avoid
>> allocating it when it's not going to be used anyway, by doing this only
>> in gnttab_populate_status_frames(). While the delaying of the respective
>> memory allocation adds possible reasons for failure of the respective
>> enclosing operations, there are other memory allocations there already,
>> so callers can't expect these operations to always succeed anyway.
>>
>> As to the re-ordering at the end of gnttab_unpopulate_status_frames(),
>> this is merely to represent intended order of actions (shrink array
>> bound, then free higher array entries). If there were racing accesses,
>> suitable barriers would need adding in addition.
>>
>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>
> Nack.
>
> The argument you make says that the grant status frames is "large
> enough" to care about not allocating. (I frankly disagree, but that
> isn't relevant to my to my nack).
>
> The change in logic here moves a failure in DOMCTL_createdomain, to
> GNTTABOP_setversion. We know, from the last minute screwups in XSA-226,
> that there are versions of Windows and Linux in the field, used by
> customers, which will BUG()/BSOD when GNTTABOP_setversion doesn't succeed.
So after you did re-state the Linux aspect of this on the call yesterday
I went and looked. In the first phase of Linux supporting v2 at all
(3.3 - 3.16) there indeed was a panic() upon certain kinds of failures.
Only up to 3.13 was there an actual attempt to use v2. Also, while there
was some code there to support actual v2 features (sub-page or
transitive grants), all of this was dead. Hence their claim
/*
* If we've already used version 2 features,
* but then suddenly discover that they're not
* available (e.g. migrating to an older
* version of Xen), almost unbounded badness
* can happen.
*/
was bogus at best. If there was no way at all for set_version to fail
prior to the change, here I could probably accept your position. But as
said before by Julien - there are pre-existing memory allocations (of
typically larger size) there, and hence any guest assuming the call
can't fail is flawed already anyway. And no, I don't view this as a
reason for us to try to eliminate all memory allocations from that
hypercall (which would in particular mean populating status frames
irrespective of whether a guest would ever switch to v2). Guest flaws
would better be addressed in the guests.
Upon re-introduction in 4.15 no such fatal behavior was put back in
place. I notice though that even up-to-date Linux has problematic
behavior around GNTTABOP_setup_table - the call is followed (after an
explicit -ENOSYS check) by "BUG_ON(rc || setup.status)". The amount of
memory needed here (including the status table) is potentially far
higher than for set_version. And what's important: setup_table gets
invoked only after set_version, so any table expansion would happen
here, with - if any table growing is needed at all - a far higher risk
for failure.
This ordering of operations (set_version before setup_table) as well
as the "error checking" after setup_table was also in effect up to
3.13. Therefore the same conclusion can be drawn there: The main risk
for crashing the guest due to memory shortage in Xen is there, not
with the version switch. What you've observed with XSA-226 (or rather,
I assume, with a custom extension of yours at the time, to prohibit use
of v2, which was upstreamed only later) was entirely unrelated to memory
shortage, but was instead a result of v2 suddenly becoming unavailable
altogether.
As to Windows, in the pvdrivers/win/ git repos I haven't been able to
find any use of GrantTableSetVersion(). Of course I can't exclude
other versions or other driver variants making use of set_version in an
inappropriate way. But as long as they don't grow the number of grant
table entries before such a call, the main risk of memory allocation
failure would again be with other hypercalls.
Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |