Re: [Xen-devel] [PATCH RFC 1/4] x86/mm: Shadow and p2m changes for PV mem_access
>Thanks for your patch. I have to say, this looks a lot less terrifying than I
>thought it would. :)
I am glad to hear that. I too feared the worst :-) On that note, many thanks to
you and Jan for taking the time to give valuable feedback.
>> Shadow mem_access mode
>> ----------------------
>> Add a new shadow mode for mem_access. This should only be enabled by a
>> mem_access listener when it calls
>> XEN_DOMCTL_MEM_EVENT_OP_ACCESS_ENABLE
>> for a PV domain.
>
>Do we need a specific mode for this? If the default behaviour is to allow all
>accesses until told otherwise, then it can be enabled at all times.
You might be right. I will look into getting rid of it.
>> P2M changes
>> -----------
>> Add a new p2m implementation for mem_access. Central to this is the
>> access lookup table. This is an array of mfns. Each mfn is a page
>> allocated from the domain's shadow memory. The pages hold the
>> p2m_access_t values for each guest gmfn. p2m_mem_access_set_entry()
>> sets the access value of the mfn given as input and blows the shadow
>> entries for the mfn. p2m_mem_access_get_entry() returns the access
>> value of the mfn given as input.
>
>I think this array needs to be a trie, at least; it may make sense to reuse the
>p2m_pt implementation and just restrict/ignore the types and MFNs.
>That way you won't have to reinvent the wheel, esp. around avoiding
>long-running operations.
I like the idea you mentioned below of stashing the access type in the
shadow_flags field of struct page_info. If that does not work out, I will look
into reusing the p2m_pt implementation.
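For reference, the array-of-mfns scheme indexes the table in two levels,
roughly as in the sketch below. The macro and helper names are mine for
illustration, not lifted from the patch:

/* Illustrative only: two-level index into the access lookup table. */
#define ACCESS_ENTRIES_PER_PAGE (PAGE_SIZE / sizeof(p2m_access_t))

static void access_table_set(mfn_t *access_lookup_table, unsigned long gfn,
                             p2m_access_t p2ma)
{
    /* Which page of the table holds this gfn's entry... */
    unsigned int table_idx = gfn / ACCESS_ENTRIES_PER_PAGE;
    /* ...and the entry's offset within that page. */
    unsigned int page_idx  = gfn % ACCESS_ENTRIES_PER_PAGE;
    p2m_access_t *access_table_page =
        map_domain_page(mfn_x(access_lookup_table[table_idx]));

    access_table_page[page_idx] = p2ma;
    unmap_domain_page(access_table_page);
}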
>> +/* Convert access restrictions to page table flags */
>> +void p2m_access_to_flags(u32 *flags, u32 gflags, p2m_access_t access)
>> +{
>> +
>> + /* Restrict with access permissions */
>> + switch (access)
>> + {
>> + case p2m_access_r:
>> + *flags &= ~(_PAGE_RW);
>> + *flags |= (_PAGE_NX_BIT|_PAGE_PRESENT);
>
>IIUC this function is called to add further restrictions to an existing set of
>flags.
>In that case, it should never set the _PAGE_PRESENT bit. Rather, it should
>only ever _reduce_ permissions (i.e. set _NX or clear other bits). Then...
>
>> + break;
>> + case p2m_access_rx:
>> + case p2m_access_rx2rw:
>> + *flags &= ~(_PAGE_NX_BIT|_PAGE_RW);
>> + *flags |= _PAGE_PRESENT;
>> + break;
>> + case p2m_access_rw:
>> + *flags |= (_PAGE_NX_BIT|_PAGE_RW|_PAGE_PRESENT);
>> + break;
>> + case p2m_access_rwx:
>> + default:
>> + *flags &= ~(_PAGE_NX_BIT);
>> + *flags |= (_PAGE_RW|_PAGE_PRESENT);
>> + break;
>> + }
>> +
>> + // Allow more restrictive guest flags to be propagated instead of access
>> + // permissions
>> + if ( !(gflags & _PAGE_RW) )
>> + *flags &= ~(_PAGE_RW);
>> + if ( gflags & _PAGE_NX_BIT )
>> + *flags |= _PAGE_NX_BIT;
>
>..you won't need these either.
That makes sense. I will follow your suggestion.
>> +}
>> +
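To check my understanding, a restrict-only version would look something like
the sketch below (dropping the gflags parameter entirely, since we never widen
beyond what the guest asked for):

/*
 * Sketch: never set _PAGE_PRESENT or widen permissions; only clear
 * _PAGE_RW and/or set _PAGE_NX_BIT on top of the guest's own flags.
 */
static void p2m_access_to_flags(u32 *flags, p2m_access_t access)
{
    switch ( access )
    {
    case p2m_access_r:
        *flags &= ~_PAGE_RW;      /* strip write */
        *flags |= _PAGE_NX_BIT;   /* strip execute */
        break;
    case p2m_access_rx:
    case p2m_access_rx2rw:
        *flags &= ~_PAGE_RW;      /* strip write; leave execute as-is */
        break;
    case p2m_access_rw:
        *flags |= _PAGE_NX_BIT;   /* strip execute; leave write as-is */
        break;
    case p2m_access_rwx:
    default:
        break;                    /* no additional restrictions */
    }
}

With that, the trailing gflags checks indeed fall away on their own.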
>> +/*
>> + * Set the page permission of the mfn. This in effect removes all shadow
>> + * mappings of that mfn. The access type of that mfn is stored in the access
>> + * lookup table.
>> + */
>> +static int
>> +p2m_mem_access_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
>> + unsigned int page_order, p2m_type_t p2mt,
>> + p2m_access_t p2ma)
>> +{
>> + struct domain *d = p2m->domain;
>> + mfn_t *access_lookup_table = p2m->access_lookup_table;
>> + unsigned int table_idx;
>> + unsigned int page_idx;
>> + uint8_t *access_table_page;
>> +
>> + ASSERT(shadow_mode_mem_access(d) && access_lookup_table != NULL);
>> +
>> + /* For PV domains we only support r, rw, rx, rx2rw and rwx access permissions */
>> + if ( unlikely(p2ma != p2m_access_r &&
>> + p2ma != p2m_access_rw &&
>> + p2ma != p2m_access_rx &&
>> + p2ma != p2m_access_rwx &&
>> + p2ma != p2m_access_rx2rw) )
>> + return -EINVAL;
>> +
>> + if ( page_get_owner(mfn_to_page(mfn)) != d )
>> + return -ENOENT;
>
>If we're only ever setting access permissions for a guest's own pages (i.e. not
>setting them for grant mappings or foreign mappings) then the access
>permissions lookup table could potentially be shared among all PV VMs. It
>might even be possible to fold the access-type into the shadow_flags field in
>struct page_info and avoid having a lookup table at all. If that's possible, I
>think it will make some things a _lot_ easier.
This idea looks very promising. I remember mentioning this in some of my
earlier emails, but I got sidetracked into thinking that PV memory is
contiguous and that I could use a simple indexed lookup table. So I will
resubmit this patch series after I implement a new version of p2m-ma using the
shadow_flags, or by reusing the p2m-pt implementation.
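To sketch what I have in mind for the shadow_flags route (the bit positions
below are placeholders; the real thing would have to avoid the bits the shadow
code already uses in shadow_flags):

/* Placeholder bits: the p2m_access_t values in use fit in 4 bits. */
#define PGA_access_shift 26
#define PGA_access_mask  (0xfu << PGA_access_shift)

static inline void page_set_access(struct page_info *pg, p2m_access_t a)
{
    /* Replace any previously stored access type for this page. */
    pg->shadow_flags = (pg->shadow_flags & ~PGA_access_mask) |
                       ((unsigned int)a << PGA_access_shift);
}

static inline p2m_access_t page_get_access(const struct page_info *pg)
{
    return (p2m_access_t)((pg->shadow_flags & PGA_access_mask) >>
                          PGA_access_shift);
}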
>> - /* Sanity check the call */
>> - if ( d == current->domain || (d->arch.paging.mode & mode) == mode )
>> + /*
>> + * Sanity check the call.
>> + * Do not allow shadow mem_access mode to be enabled when
>> + * log-dirty or translate mode is enabled.
>
>Why not log-dirty? That would mean you can't use mem-access and live
>migration at the same time.
I was afraid the two would conflict and prevent log-dirty from working
correctly. But I forgot that log-dirty is used during live migration. Let me
see if I can get both modes to coexist.
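If they can indeed coexist, the sanity check presumably only needs to refuse
translate mode, something like the sketch below (PG_mem_access being the new
mode bit this series introduces; the exact form is an assumption on my part):

/* Sketch: permit log-dirty alongside mem_access; refuse only translate. */
if ( d == current->domain ||
     ((mode & PG_mem_access) && paging_mode_translate(d)) )
    return -EINVAL;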
>> + /* Propagate access permissions */
>> + if ( unlikely((level == 1) &&
>> + mem_event_check_ring(&d->mem_event->access) &&
>> + !sh_mfn_is_a_page_table(target_mfn)) )
>> + {
>> + struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> + p2m_access_t a;
>> + p2m_type_t t;
>> + mfn_t mfn;
>> + mfn = p2m->get_entry(p2m, mfn_x(target_mfn), &t, &a, 0,
>> + NULL);
>
>Ah, no. This access lookup ought to happen as part of the original p2m lookup
>that produced target_mfn. Having a second lookup here is unpleasant and
>confusing -- this code keeps the distinction between gfn and mfn pretty
>strictly, so passing an mfn into a p2m lookup is wrong, even though you
>happen to know they're the same namespace in this PV guest.
>
>Of course, if you can hide the access type in the struct page_info, then it's
>correct to recover it here.
OK, I am hoping to do that here.
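That is, something along these lines, assuming the page_get_access() helper
from the shadow_flags sketch earlier in this mail, and with sflags being the
shadow flags variable at that point in _sh_propagate():

/* Sketch: recover the access type from the page itself; no second
 * p2m lookup, and no mfn passed in a gfn slot. */
if ( unlikely((level == 1) &&
              mem_event_check_ring(&d->mem_event->access) &&
              !sh_mfn_is_a_page_table(target_mfn)) )
{
    p2m_access_t a = page_get_access(mfn_to_page(target_mfn));

    p2m_access_to_flags(&sflags, a);
}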
>> + /*
>> + * Do not police writes to guest memory emanating from the Xen
>> + * kernel. Trying to do so will cause the same pagefault to occur
>> + * over and over again with an event being sent to the access
>> + * listener for each fault. If the access listener's vcpu is not
>> + * scheduled during this time, the violation is never resolved and
>> + * will eventually end with the host crashing.
>> + */
>> + if ( (violation && access_w) &&
>> + (regs->eip >= XEN_VIRT_START && regs->eip <= XEN_VIRT_END) )
>> + {
>> + violation = 0;
>> + rc = p2m->set_entry(p2m, gfn_x(gfn), gmfn, PAGE_ORDER_4K,
>> + p2m_ram_rw, p2m_access_rw);
>> + }
>
>That's exploitable, surely: the guest just needs to make a hypercall with a
>chosen buffer address to give itself _rw access to a page.
>(Which I guess is OK for your intended use of rw<->rx but not in
>general.)
I agree, but I could not come up with a better way to get this to work. I have
sent a separate email regarding this issue.
Cheers,
Aravindh
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel