
Re: [Xen-devel] PoD code killing domain before it really gets started



> On 07/08/12 15:40, Andres Lagar-Cavilla wrote:
>>>>>> On 06.08.12 at 18:03, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>
>>>>>> wrote:
>>>> I guess there are two problems with that:
>>>> * As you've seen, apparently dom0 may access these pages before any
>>>> faults happen.
>>>> * If it happens that reclaim_single is below the only zeroed page, the
>>>> guest will crash even when there is reclaim-able memory available.
>>>>
>>>> Three ways we could fix this:
>>>> 1. Remove dom0 accesses (what on earth could be looking at a
>>>> not-yet-created VM?)
>>> I'm told it's a monitoring daemon, and yes, they are intending to
>>> adjust it to first query the GFN's type (and don't do the access
>>> when it's not populated yet). But wait, I didn't check the code
>>> when I recommended this - XEN_DOMCTL_getpageframeinfo{2,3}
>>> also call get_page_from_gfn() with P2M_ALLOC, so would also
>>> trigger the PoD code (in -unstable at least) - Tim, was that really
>>> a correct adjustment in 25355:974ad81bb68b? It looks to be a
>>> 1:1 translation, but is that really necessary? If one wanted to
>>> find out whether a page is PoD to avoid getting it populated,
>>> how would that be done from outside the hypervisor? Would
>>> we need XEN_DOMCTL_getpageframeinfo4 for this?
>>>
>>>> 2. Allocate the PoD cache before populating the p2m table
>>>> 3. Make it so that some accesses fail w/o crashing the guest?  I don't
>>>> see how that's really practical.
>>> What's wrong with telling control tools that a certain page is
>>> unpopulated (from which they will be able to infer that it's all
>>> clear from the guest's PoV)? Even outside of the current problem,
>>> I would think that's more efficient than allocating the page. Of
>>> course, the control tools need to be able to cope with that. And
>>> it may also be necessary to distinguish between read and
>>> read/write mappings being established (and for r/w ones the
>>> option of populating at access time rather than at creation time
>>> would need to be explored).
>> I wouldn't be opposed to some form of getpageframeinfo4. It's not just PoD
>> we are talking about here. Is the page paged out? Is the page shared?
>>
>> Right now we have global per-domain queries (domaininfo). Or individual
>> gfn debug memctl's. A batched interface with richer information would be a
>> blessing for debugging or diagnosis purposes.
>>
>> The first order of business is exposing the type. Do we really want to
>> expose the whole range of p2m_* types or just "really useful" ones like
>> is_shared, is_pod, is_paged, is_normal? An argument for the former is that
>> the mem event interface already pumps the p2m_* type up the stack.
>>
>> The other useful bit of information I can think of is exposing the shared
>> ref count.
> I think just like the gfn_to_mfn() interface, we need an "I care about
> the details" interface and an "I don't care about the details"
> interface.  If a page isn't present, or needs to be un-shared, or is PoD
> and not currently available, then maybe dom0 callers trying to map that
> page should get something like -EAGAIN?  Just something that indicates,
> "This page isn't here at the moment, but may be here soon."  What do you
> think?

Sure.

Right now you get -ENOENT if dom0 tries to mmap a foreign frame that is
paged out, and kernel-level backends get the same from grant mappings. As a
side effect, the hypervisor will have triggered a page-in, so one of the
next retries should succeed.

My opinion is that you should expand the use of -ENOENT to cover this
newly-minted "delayed PoD" case. IIUC, prior to this conversation PoD would
either just succeed or crash the guest.

That way, no new code is needed in the upper layers (neither libxc nor
kernel backends) to deal with delayed PoD allocations: the retry loops that
paging needs are already in place, and PoD simply leverages them.
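
To make that concrete, here is a minimal sketch of the kind of retry loop
I mean, built around xc_map_foreign_bulk() and its per-page error array.
It is illustrative only: the retry count, the back-off and the single-gfn
batch are made up, and the real loops live in libxc and the kernel
backends.

#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>
#include <xenctrl.h>

#define MAX_RETRIES 16                     /* illustrative value */

static void *map_gfn_with_retry(xc_interface *xch, uint32_t domid,
                                xen_pfn_t gfn)
{
    int i, err;

    for ( i = 0; i < MAX_RETRIES; i++ )
    {
        /*
         * err receives a per-page error code; -ENOENT means "not here
         * right now" (paged out today, "delayed PoD" under the proposal),
         * and the hypervisor has already kicked off whatever is needed to
         * bring the page back.
         */
        void *ptr = xc_map_foreign_bulk(xch, domid, PROT_READ,
                                        &gfn, &err, 1);

        if ( ptr == NULL )
            return NULL;                   /* the mapping call itself failed */

        if ( err == 0 )
            return ptr;                    /* page is there, done */

        munmap(ptr, XC_PAGE_SIZE);         /* mapped, but page not backed */

        if ( err != -ENOENT )
            return NULL;                   /* hard error, give up */

        usleep(1000);                      /* let the pager / PoD catch up */
    }

    return NULL;
}

The point being that -ENOENT already means "try again shortly" to
everything above the hypervisor, so a PoD entry that cannot be populated
right now can return it without any new plumbing.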

Sharing can return -ENOMEM when breaking shares. Much like paging, the
hypervisor will have kicked the appropriate helper, so a retry with an
expectation of success could be put in place. (Never mind that there are no
in-tree helpers at the moment: what would one do, balloon dom0 to release
more memory?) I can look into making this uniform with the cases above.

As for the detailed interface, getpageframeinfo4 sounds like a good idea.
Otherwise, you need a new privcmd mmap interface in the kernel ... snore
;)
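
To be concrete about what I picture, here is a purely hypothetical sketch.
None of these names exist; it just mirrors the shape of the existing
getpageframeinfo3 domctl, with a separate output array of coarse types
that Xen would fill in without populating, unsharing or paging anything
in:

/* Hypothetical, for illustration only -- not an existing interface. */
#define XEN_PGT4_none     0   /* not populated at all */
#define XEN_PGT4_normal   1   /* ordinary RAM page */
#define XEN_PGT4_pod      2   /* populate-on-demand, no backing page yet */
#define XEN_PGT4_paged    3   /* paged out by a paging helper */
#define XEN_PGT4_shared   4   /* shared page */

struct xen_domctl_getpageframeinfo4 {
    /* IN */
    uint64_aligned_t num;                      /* GFNs in the batch */
    XEN_GUEST_HANDLE_64(xen_pfn_t) gfn_array;
    /* OUT */
    XEN_GUEST_HANDLE_64(uint32) type_array;    /* one XEN_PGT4_* per GFN */
};

A shared-refcount output array could be added the same way if we decide to
expose it.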

Andres
>
>   -George
>


