[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH V11 4/5] p2m: Always use hostp2m when clipping rangesets



>>> On 20.12.18 at 13:59, <george.dunlap@xxxxxxxxxx> wrote:
> On 12/20/18 8:20 AM, Jan Beulich wrote:
>>>>> On 19.12.18 at 18:26, <george.dunlap@xxxxxxxxxx> wrote:
>>> On 12/13/18 10:43 AM, Jan Beulich wrote:
>>>>>>> On 13.12.18 at 11:22, <rcojocaru@xxxxxxxxxxxxxxx> wrote:
>>>>> For my own part, I see no reason why not clipping end should not work
>>>>> when updating the ranges only (as long as start continues to be <=
>>>>> unclipped_end).
>>>>>
>>>>> Would that modification + testing of it help this series continue?
>>>>
>>>> I think so, at least as far as I'm concerned. But I think we really need
>>>> George's opinion as well.
>>>
>>> We are going off into the weeds a little bit here I think.
>>>
>>> If I understand Jan's concern properly, he's concerned about a situation
>>> like this:
>>>
>>> [start] p2m->max_mapped_pfn == 0xfff
>>> 1. change_type_range ram => logdirty, [0x900, 0x1200)
>>>
>>> Obviously the actual p2m entries can only be changed from 0x900 to
>>> 0xfff; but what about the logdirty ranges?  At the moment, the result
>>> will be a rangeset with [0x900, 0xfff].
>>>
>>> Jan is asking whether the rangeset should instead be [0x900, 0x11ff].
>>>
>>> So the time when it would matter would be a situation like the following:
>>>
>>> 2. p2m_set_entry(0x1100, M)
>>>
>>> 3. change_entry_type_global(ram => logdirty)
>>>
>>> 4. change_entry_type_global(logdirty => ram)
>>>
>>> Under the current regime gfn 0x1100 would be have type ram_rw both after
>>> step 2, and after step 4.
>>>
>>> If we used Jan's suggestion, then it would be marked as ram_rw after
>>> step 2, and logdirty after step 4.
>> 
>> Afaict it would be marked logdirty also after step 2, at least
>> effectively (to the outside world), due to ept_get_entry()'s call
>> to p2m_recalc_type().
> 
> That's not what I'm seeing.  Let's consider the ept entry for gfn 0x1100
> at/after the various stages:
> 
> [start]: empty (valid bit clear)
> 1. change_type_range doesn't touch this, so still empty.
> 2. ept_set_entry(M)
>  - Calls recalc_type(). This will walk the ept table down to the
> particular ept entry, resolving the `recalc` bit at each level.
>  - Finally it will set the entry to point to M, with the recalc bit
> clear, and the entry *not* misconfigured.

Well of course - if the caller specified p2m_ram_rw. But this is
wrong for the caller to do for any page inside the logdirty
range. ept_set_entry() is a function which is not itself
implementing policy; it depends on higher levels getting things
right together with the page tables being in proper state.

> Guest writes will not trigger an EPT fault at this point, so the most
> important part of the "outside world" will not effectively see a logdirty.
> 
> What about ept_get_entry() after point 2?  Well, it calls
> p2m_recalc_type() with "recalc || ept->recalc".  The first is
> accumulated by walking down the ept tables; but in this case those bits
> will already have been cleared by the recalc_type() at the top of
> set_entry.  And of course, the gfn's own ept entry will have the recalc
> bit clear.
> 
> So p2m_recalc_type() will be called with `recalc` set to zero.  When
> that's the case, it always returns the type passed to it, without
> checking logdirty.
> 
> Did I miss anything?

I don't think so, except perhaps me having said ...

>> It may well be that there are more bugs
>> here (like ept_set_entry() not honoring this, but then again

... this. IOW what you describe might match the current
situation, but I'm unconvinced it's intended/wanted behavior.

>> this is perhaps something the callers should already take care
>> of), but that's the behavior I'd expect, and why I think the
>> range should not be clipped for the purpose of insertion into
>> the rangeset.
>> 
>>> But of course that's no different than what would happen if
>>> max_mapped_pfn were 0x2000, but gfns 0x1000-11ff just happened to be empty.
>>>
>>> Under normal circumstances, neither of these situations should happen;
>>> and in neither case will catastrophic consequences happen (unless you
>>> were relying on hap_track_dirty_vram for something other than tracking
>>> dirty vram).
>>>
>>> I'm inclined to say that ideally, change_type_range should pass an error
>>> up if end > max_mapped_pfn.
>>>
>>> But of course, it doesn't return an error at the moment, so that's out
>>> of scope for this series.
>>>
>>> I take it, Jan, that in the absence of changing the behavior, you'd like
>>> the comment to look something like this?
>>>
>>> "Always clip the rangeset down to the host p2m.  NB that this means the
>>> logdirty_range will also be clipped, so in the future gfns in
>>> (host_max_pfn, end) range won't be affected by change_entry_type_global.
>>>  We should probably return an error in this case instead, as it's almost
>>> certainly a mistake; but that's left as a clean-up for another time."
>> 
>> Well, not exactly. IMO at least p2m_change_type_range(...,
>> 0, ULONG_MAX) should match p2m_change_entry_type_global(),
>> with the exception of the rangeset modification (which in the
>> p2m_change_entry_type_global() global case is replaced by
>> modifying p2m->global_logdirty).
> 
> That's one sensible interface I considered; but I don't think it's the
> best one.  It has the advantage that from an interface perspective it's
> clean and satisfying.  But I'm having difficulty imagining a situation
> where that behavior would lead to better outcomes.  On the contrary, the
> only time I can imagine this situation happening at the moment is if
> there were a bug in the device model -- in which case, returning an
> error would be a much more helpful thing to do.

I would agree if ->max_mapped_gfn was under qemu's direct
control. But the value depends on guest behavior. It just so
happens that for dirty vram tracking it's perhaps quite unlikely
for the range to live above ->max_mapped_gfn, but I wouldn't
be surprised if things broke when moving the frame buffer
pretty high up in (guest) physical memory, far beyond RAM
(and above 4Gb).

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.