|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v8 08/16] xen/domain: Add vmtrace_size domain creation parameter
On 01/02/2021 13:18, Jan Beulich wrote:
> On 30.01.2021 03:58, Andrew Cooper wrote:
>> +static int vmtrace_alloc_buffer(struct vcpu *v)
>> +{
>> + struct domain *d = v->domain;
>> + struct page_info *pg;
>> + unsigned int i;
>> +
>> + if ( !d->vmtrace_size )
>> + return 0;
>> +
>> + pg = alloc_domheap_pages(d, get_order_from_bytes(d->vmtrace_size),
>> + MEMF_no_refcount);
>> + if ( !pg )
>> + return -ENOMEM;
>> +
>> + /*
>> + * Getting the reference counting correct here is hard.
>> + *
>> + * All pages are now on the domlist. They, or subranges within, will be
> "domlist" is too imprecise, as there's no list with this name. It's
> extra_page_list in this case (see also below).
>
>> + * freed when their reference count drops to zero, which may any time
>> + * between now and the domain teardown path.
>> + */
>> +
>> + for ( i = 0; i < (d->vmtrace_size >> PAGE_SHIFT); i++ )
>> + if ( unlikely(!get_page_and_type(&pg[i], d, PGT_writable_page)) )
>> + goto refcnt_err;
>> +
>> + /*
>> + * We must only let vmtrace_free_buffer() take any action in the success
>> + * case when we've taken all the refs it intends to drop.
>> + */
>> + v->vmtrace.pg = pg;
>> +
>> + return 0;
>> +
>> + refcnt_err:
>> + /*
>> + * In the failure case, we must drop all the acquired typerefs thus far,
>> + * skip vmtrace_free_buffer(), and leave domain_relinquish_resources()
>> to
>> + * drop the alloc refs on any remaining pages - some pages could already
>> + * have been freed behind our backs.
>> + */
>> + while ( i-- )
>> + put_page_and_type(&pg[i]);
>> +
>> + return -ENODATA;
>> +}
> As said in reply on the other thread, PGC_extra pages don't get
> freed automatically. I too initially thought they would, but
> (re-)learned otherwise when trying to repro your claims on that
> other thread. For all pages you've managed to get the writable
> ref, freeing is easily done by prefixing the loop body above by
> put_page_alloc_ref(). For all other pages best you can do (I
> think; see the debugging patches I had sent on that other
> thread) is to try get_page() - if it succeeds, calling
> put_page_alloc_ref() is allowed. Otherwise we can only leak the
> respective page (unless going to further extents with trying to
> recover from the "impossible"), or assume the failure here was
> because it did get freed already.
Right - I'm going to insist on breaking apart orthogonal issues.
This refcounting issue isn't introduced by this series - this series
uses an established pattern, in which we've found a corner case.
The corner case is theoretical, not practical - it is not possible for a
malicious PV domain to take 2^43 refs on any of the pages in this
allocation. Doing so would require an hours-long SMI, or equivalent,
and even then all malicious activity would be paused after 1s for the
time calibration rendezvous which would livelock the system until the
watchdog kicked in.
I will drop the comments, because in light of this discovery, they're
not correct.
We should fix the corner case, but that should be a separate patch.
Whatever we do needs to start by writing down the the refcounting rules
first because its totally clear that noone understands them, and then a
change to all examples of this pattern adjusting (if necessary).
~Andrew
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |