Xen project Mailing List

Re: [Xen-devel] [PATCH RFC v2 1/4] x86/mm: Shadow and p2m changes for PV mem_access

From: "Aravindh Puthiyaparambil (aravindp)" <aravindp@xxxxxxxxx>

Date: Thu, 24 Jul 2014 23:34:32 +0000

Accept-language: en-US

Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Keir Fraser <keir@xxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, Ian Campbell <ian.campbell@xxxxxxxxxx>, Tim Deegan <tim@xxxxxxx>

Delivery-date: Thu, 24 Jul 2014 23:35:04 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

Thread-index: AQHPp0um5fBJUbHNFEG0eCkTCAq31puvzTKw

Thread-topic: [PATCH RFC v2 1/4] x86/mm: Shadow and p2m changes for PV mem_access

>> +/* Get the page permission of the mfn from page_info->shadow_flags */ >> +static mfn_t >> +p2m_mem_access_get_entry(struct p2m_domain *p2m, unsigned long >gfn, >> + p2m_type_t *t, p2m_access_t *a, p2m_query_t q, >> + unsigned int *page_order) >> +{ >> + struct domain *d = p2m->domain; >> + /* For PV guests mfn == gfn */ > >Valid, but why is this not being checked in any way in the earlier >"set" counterpart? I will add that check. >> + mfn_t mfn = _mfn(gfn); >> + struct page_info *page = mfn_to_page(mfn); >> + >> + ASSERT(shadow_mode_enabled(d)); >> + >> + *t = p2m_ram_rw; >> + >> + if ( page_get_owner(page) != d ) > >I think there's a mfn_valid() check missing prior to this re-ref of page. I will add that too. >> @@ -1356,7 +1357,7 @@ void shadow_prealloc(struct domain *d, u32 type, >> unsigned int count) >> >> /* Deliberately free all the memory we can: this will tear down all of >> * this domain's shadows */ >> -static void shadow_blow_tables(struct domain *d) >> +void shadow_blow_tables(struct domain *d) > >This doesn't belong here - the patch doesn't add any new/external >caller of this function. I was trying to bunch shadow changes in to one patch. I will add this to the mem_access patch where the external caller is being added. >> @@ -2435,15 +2436,20 @@ int sh_remove_all_mappings(struct vcpu *v, >mfn_t gmfn) >> /* If that didn't catch the mapping, something is very wrong */ >> if ( !sh_check_page_has_no_refs(page) ) >> { >> - /* Don't complain if we're in HVM and there are some extra mappings: >> + /* >> + * Don't complain if we're in HVM and there are some extra mappings: >> * The qemu helper process has an untyped mapping of this dom's RAM >> >> * and the HVM restore program takes another. >> * Also allow one typed refcount for xenheap pages, to match >> - * share_xen_page_with_guest(). */ >> + * share_xen_page_with_guest(). >> + * PV domains that have a mem_access listener, runs in shadow mode >> + * without refcounts. >> + */ >> if ( !(shadow_mode_external(v->domain) >> && (page->count_info & PGC_count_mask) <= 3 >> && ((page->u.inuse.type_info & PGT_count_mask) >> - == !!is_xen_heap_page(page))) ) >> + == !!is_xen_heap_page(page))) && >> + !mem_event_check_ring(&v->domain->mem_event->access) ) > >To me this doesn't look to be in sync with the comment, as the new >check is being carried out regardless of domain type. Furthermore >this continues to have the problem of also hiding issues unrelated >to mem-access handling. I will add a check for PV domain. This check is wrt to refcouting. From what Tim told me PV domains cannot run in that mode, so I don't think any issues will be hidden if I add the check for PV domain. >> @@ -3009,7 +3020,84 @@ static int sh_page_fault(struct vcpu *v, >> >> /* What mfn is the guest trying to access? */ >> gfn = guest_l1e_get_gfn(gw.l1e); >> - gmfn = get_gfn(d, gfn, &p2mt); >> + if ( likely(!mem_event_check_ring(&d->mem_event->access)) ) >> + gmfn = get_gfn(d, gfn, &p2mt); >> + /* >> + * A mem_access listener is present, so we will first check if a >> violation >> + * has occurred. >> + */ >> + else >> + { >> + struct p2m_domain *p2m = p2m_get_hostp2m(v->domain); >> + p2m_access_t p2ma; >> + >> + gmfn = get_gfn_type_access(p2m, gfn_x(gfn), &p2mt, &p2ma, 0, >NULL); >> + if ( mfn_valid(gmfn) && !sh_mfn_is_a_page_table(gmfn) >> + && regs->error_code & PFEC_page_present >> + && !(regs->error_code & PFEC_reserved_bit) ) >> + { >> + int violation = 0; >> + bool_t access_w = !!(regs->error_code & PFEC_write_access); >> + bool_t access_x = !!(regs->error_code & PFEC_insn_fetch); >> + bool_t access_r = access_x ? 0 : !access_w; > >"violation" looks to be a boolean just like the other three, so wants >to also be bool_t. Done. >> + >> + /* If the access is against the permissions, then send to >> mem_event >*/ >> + switch ( p2ma ) >> + { >> + case p2m_access_r: >> + violation = access_w || access_x; >> + break; >> + case p2m_access_rx: >> + case p2m_access_rx2rw: >> + violation = access_w; >> + break; >> + case p2m_access_rw: >> + violation = access_x; >> + break; >> + case p2m_access_rwx: >> + default: >> + break; >> + } >> + >> + /* >> + * Do not police writes to guest memory from the Xen hypervisor. >> + * This keeps PV mem_access on par with HVM. Turn off CR0.WP >here to >> + * allow the write to go through if the guest has marked the >> page as >> + * writable. Turn it back on in the guest access functions >> + * __copy_to_user / __put_user_size() after the write is >> completed. >> + */ >> + if ( violation && access_w && >> + regs->eip >= XEN_VIRT_START && regs->eip <= XEN_VIRT_END ) > >Definitely < instead of <= on the right side. But - is this safe, the more >that this again doesn't appear to be sitting in a guest kind specific block? >I'd at least expect this to be qualified by a regs->cs and/or >guest_mode() check. I will add the guest kind check. Should I do it for the whole block i.e add is_pv_domain() in addition to mem_event_check_ring()? That will also address next comment below. I will add the guest_mode() in addition to the above check for policing Xen writes to guest memory. >> + { >> + unsigned long cr0 = read_cr0(); >> + >> + violation = 0; >> + if ( cr0 & X86_CR0_WP && >> + guest_l1e_get_flags(gw.l1e) & _PAGE_RW ) >> + { >> + cr0 &= ~X86_CR0_WP; >> + write_cr0(cr0); >> + v->arch.pv_vcpu.need_cr0_wp_set = 1; > >PV field access within a non-PV-only code block? > >> + } > >I wonder how well the window where you're running with CR0.WP clear >is bounded: The flag serves as a kind of security measure, and hence >shouldn't be left off for extended periods of time. I agree. Is there a way I can bound this? >> + } >> + >> + if ( violation ) >> + { >> + paddr_t gpa = (mfn_x(gmfn) << PAGE_SHIFT) + >> + (va & ((1 << PAGE_SHIFT) - 1)); >> + if ( !p2m_mem_access_check(gpa, 1, va, access_r, access_w, >> + access_x, &req_ptr) ) >> + { >> + SHADOW_PRINTK("Page access %c%c%c for gmfn=%"PRI_mfn" >p2ma: %d\n", >> + (access_r ? 'r' : '-'), >> + (access_w ? 'w' : '-'), >> + (access_x ? 'x' : '-'), mfn_x(gmfn), >> p2ma); >> + /* Rights not promoted, vcpu paused, work here is done >> */ >> + goto out_put_gfn; > >Rather than re-using just two lines from the normal exit path and >introducing to it a mem-access specific code block, put the exit >processing here, at once allowing req_ptr to be limited in scope? Good idea. I will do that. >> --- a/xen/common/page_alloc.c >> +++ b/xen/common/page_alloc.c >> @@ -43,6 +43,7 @@ >> #include <asm/page.h> >> #include <asm/numa.h> >> #include <asm/flushtlb.h> >> +#include <asm/shadow.h> > >Breaking ARM? >> #ifdef CONFIG_X86 >> #include <asm/p2m.h> >> #include <asm/setup.h> /* for highmem_start only */ >> @@ -1660,6 +1661,8 @@ int assign_pages( >> page_set_owner(&pg[i], d); >> smp_wmb(); /* Domain pointer must be visible before updating refcnt. >*/ >> pg[i].count_info = PGC_allocated | 1; >> + if ( is_pv_domain(d) ) >> + shadow_set_access(&pg[i], p2m_get_hostp2m(d)->default_access); > >I don't think you should call shadow code from here. Should I add a p2m wrapper for this, so that it is valid only for x86 and a no-op for ARM? >> --- a/xen/include/asm-x86/domain.h >> +++ b/xen/include/asm-x86/domain.h >> @@ -380,6 +380,12 @@ struct pv_vcpu >> /* Deferred VA-based update state. */ >> bool_t need_update_runstate_area; >> struct vcpu_time_info pending_system_time; >> + >> + /* >> + * Flag that tracks if CR0.WP needs to be set after a Xen write to guest >> + * memory when a PV domain has a mem_access listener attached to it. >> + */ >> + bool_t need_cr0_wp_set; >> }; > >The new field would better go above need_update_runstate_area >for better space usage. Done. >> --- a/xen/include/asm-x86/x86_64/uaccess.h >> +++ b/xen/include/asm-x86/x86_64/uaccess.h >> @@ -1,6 +1,8 @@ >> #ifndef __X86_64_UACCESS_H >> #define __X86_64_UACCESS_H >> >> +#include <xen/sched.h> > >This is pretty ugly. You only reference the needed fields in a macro, >so you don't strictly need to include this here as long as all (or at least >most - the rest could be patched up) use sites include it. OK, I will give that a shot. >> @@ -65,6 +67,11 @@ do { > \ >> case 8: __put_user_asm(x,ptr,retval,"q","","ir",errret);break; \ >> default: __put_user_bad(); \ >> } \ >> + if ( unlikely(current->arch.pv_vcpu.need_cr0_wp_set) ) \ >> + { \ >> + write_cr0(read_cr0() | X86_CR0_WP); \ >> + current->arch.pv_vcpu.need_cr0_wp_set = 0; \ >> + } \ >> } while (0) > >Please obey to the original indentation method. I thought tab characters were not allowed for indentation. But I guess you do not want tabs and spaces for indentation purposes to be mixed within a macro? Thanks, Aravindh _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.