[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] RFC: AMD support for paging
On 02/15/2012 05:06 PM, Andres Lagar-Cavilla wrote: On 02/15/2012 04:14 PM, Andres Lagar-Cavilla wrote:On 02/14/2012 08:05 PM, Andres Lagar-Cavilla wrote:We started hashing out some AMD support for mem_paging and mem_access. Right now my VMs boot, page out a bit, and then die on an HVM triple fault. Most importantly, I want to get somebody from AMD to comment/help out on this. It feels like we're inches away from enabling support for this very nice feature. I'm not sure who exactly on AMD monitors the list for these kinds of things. It'd be great to have you on board! For starters, the changes to the p2m code are relatively mild, but it'd be great if somebody spots a red flag. Another issue: comments indicate that bits 59-62 in NPT entries are used for IOMMU flags but effectively bits 61-62 are. Repossessing one bit (59) would give us enough space to support mem_access. Right now we only have 7 bits available for Xen flags and that is not enough for paging and access. Is bit 59 effectively reserved?Hi bit 59 is used by iommu hardware for ATS response. In most cases, for p2m_ram_rw pages, U bit must be 0. But maybe for other page types that are not potentially used by DMA, this bit could be non-zero. I could tested it on my iommu machines if you had some patches that use U bits.Hi Wei, thanks for pitching in! I assume you're talking about table 14 (and fig 9) in http://support.amd.com/us/Processor_TechDocs/48882.pdfYes, indeed.I don't think this will work. The p2m access value is arbitrary, and independent of the p2m type. So bit 59 is out of reach and we're stuck with 7 bits for Xen use. Thanks for the clarification.Where will p2m access bit be stored? Please note that bit 52 - bit 58 for pte must be zero for p2m_ram_rw pages. For iommu pte, only bit 63 and bit 1 - bit 8 are free to use if PR bit = 1.Wei, Why *must* bits 52-58 be zero for p2m_ram_rw pages? Or is it just the current convention? I propose limiting p2m type to bits 52-55, and storing p2m access on bits 56-58. So, p2m_ram_rw pages with a non-zero access would break the "must be zero" requirement/convention. Right now, without any considerations for paging, we're storing the p2m type in bits 52-58 when PR=1. That p2m type can be non zero. p2m_ram_ro, p2m_mmio_dm, p2m_logdirty, p2m_populate_on_demand are all par for the course. Given your statement above, and table 14 in the IOMMU manual, how is this even working? Or is it not working? It works because only p2m_ram_rw (which is 0) pages suppose to be used by DMA. But, indeed, if iommu tries to access pages with non-zero types, like p2m_ram_ro,,it will have io page faults. It seems that we don't have use cases like this. If mem access bits are independent to p2m types, I think we might have a few solutions here: 1) reverse iommu pt sharing for amd iommu totally. 2) disable iommu pt sharing when xen paging enabled. I am open for both of them. Thanks, Wei Would a violation of these rules cause a VMEXIT_SHUTDOWN? Thanks a lot for the info! AndresThanks, WeiAn alternative to enabling mem_access on AMD processors will be to limit the access types to 3 bits. This would force disabling support for two modes. I'm inclined to disable two out of X, W and WX. I don't think they make sense without R permissions. Thanks! AndresThanks, WeiFinally, the triple fault. Maybe I'm missing something obvious. Comments welcome. Patch inline below, thanks! Andres Enable AMD support for paging. Signed-off-by: Andres Lagar-Cavilla<andres@xxxxxxxxxxxxxxxx> Signed-off-by: Adin Scannell<adin@xxxxxxxxxxx> diff -r 25ca78889ed4 -r 10ca4e4293ce xen/arch/x86/mm/mem_event.c --- a/xen/arch/x86/mm/mem_event.c +++ b/xen/arch/x86/mm/mem_event.c @@ -537,10 +537,6 @@ int mem_event_domctl(struct domain *d, x if ( !hap_enabled(d) ) break; - /* Currently only EPT is supported */ - if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ) - break; - rc = -EXDEV; /* Disallow paging in a PoD guest */ if ( p2m->pod.entry_count ) diff -r 25ca78889ed4 -r 10ca4e4293ce xen/arch/x86/mm/p2m-pt.c --- a/xen/arch/x86/mm/p2m-pt.c +++ b/xen/arch/x86/mm/p2m-pt.c @@ -53,6 +53,20 @@ #define P2M_BASE_FLAGS \ (_PAGE_PRESENT | _PAGE_USER | _PAGE_DIRTY | _PAGE_ACCESSED) +#ifdef __x86_64__ +/* l1e_from_pfn is not designed to have INVALID_MFN stored. The 0xff..ff + * value tramples over the higher-order bits used for flags (NX, p2mt, + * etc.) This happens for paging entries. Thus we do this clip/unclip + * juggle for l1 entries only (no paging superpages!) */ +#define EFF_MFN_WIDTH (PADDR_BITS-PAGE_SHIFT) /* 40 bits */ +#define clipped_mfn(mfn) ((mfn)& ((1UL<< EFF_MFN_WIDTH) - 1)) +#define unclip_mfn(mfn) ((clipped_mfn((mfn)) == INVALID_MFN) ? \ + INVALID_MFN : (mfn)) +#else +#define clipped_mfn(mfn) (mfn) +#define unclip_mfn(mfn) (mfn) +#endif /* __x86_64__ */ + static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn) { unsigned long flags; @@ -77,6 +91,9 @@ static unsigned long p2m_type_to_flags(p case p2m_invalid: case p2m_mmio_dm: case p2m_populate_on_demand: + case p2m_ram_paging_out: + case p2m_ram_paged: + case p2m_ram_paging_in: default: return flags; case p2m_ram_ro: @@ -168,7 +185,7 @@ p2m_next_level(struct p2m_domain *p2m, m shift, max)) ) return 0; - /* PoD: Not present doesn't imply empty. */ + /* PoD/paging: Not present doesn't imply empty. */ if ( !l1e_get_flags(*p2m_entry) ) { struct page_info *pg; @@ -384,8 +401,9 @@ p2m_set_entry(struct p2m_domain *p2m, un 0, L1_PAGETABLE_ENTRIES); ASSERT(p2m_entry); - if ( mfn_valid(mfn) || (p2mt == p2m_mmio_direct) ) - entry_content = l1e_from_pfn(mfn_x(mfn), + if ( mfn_valid(mfn) || (p2mt == p2m_mmio_direct) || + (p2mt == p2m_ram_paged) || (p2mt == p2m_ram_paging_in) ) + entry_content = l1e_from_pfn(clipped_mfn(mfn_x(mfn)), p2m_type_to_flags(p2mt, mfn)); else entry_content = l1e_empty(); @@ -393,7 +411,7 @@ p2m_set_entry(struct p2m_domain *p2m, un if ( entry_content.l1 != 0 ) { p2m_add_iommu_flags(&entry_content, 0, iommu_pte_flags); - old_mfn = l1e_get_pfn(*p2m_entry); + old_mfn = unclip_mfn(l1e_get_pfn(*p2m_entry)); } /* level 1 entry */ p2m->write_p2m_entry(p2m, gfn, p2m_entry, table_mfn, entry_content, 1); @@ -615,11 +633,12 @@ pod_retry_l1: sizeof(l1e)); if ( ret == 0 ) { + unsigned long l1e_mfn = unclip_mfn(l1e_get_pfn(l1e)); p2mt = p2m_flags_to_type(l1e_get_flags(l1e)); - ASSERT(l1e_get_pfn(l1e) != INVALID_MFN || !p2m_is_ram(p2mt)); + ASSERT( (l1e_mfn != INVALID_MFN || !p2m_is_ram(p2mt)) || + (l1e_mfn == INVALID_MFN&& p2m_is_paging(p2mt)) ); - if ( p2m_flags_to_type(l1e_get_flags(l1e)) - == p2m_populate_on_demand ) + if ( p2mt == p2m_populate_on_demand ) { /* The read has succeeded, so we know that the mapping * exits at this point. */ @@ -641,7 +660,7 @@ pod_retry_l1: } if ( p2m_is_valid(p2mt) || p2m_is_grant(p2mt) ) - mfn = _mfn(l1e_get_pfn(l1e)); + mfn = _mfn(l1e_mfn); else /* XXX see above */ p2mt = p2m_mmio_dm; @@ -783,18 +802,26 @@ pod_retry_l2: pod_retry_l1: if ( (l1e_get_flags(*l1e)& _PAGE_PRESENT) == 0 ) { + p2m_type_t l1t = p2m_flags_to_type(l1e_get_flags(*l1e)); /* PoD: Try to populate */ - if ( p2m_flags_to_type(l1e_get_flags(*l1e)) == p2m_populate_on_demand ) + if ( l1t == p2m_populate_on_demand ) { if ( q != p2m_query ) { if ( !p2m_pod_demand_populate(p2m, gfn, PAGE_ORDER_4K, q) ) goto pod_retry_l1; } else *t = p2m_populate_on_demand; + } else { + if ( p2m_is_paging(l1t) ) + { + *t = l1t; + /* No need to unclip due to check below */ + mfn = _mfn(l1e_get_pfn(*l1e)); + } } unmap_domain_page(l1e); - return _mfn(INVALID_MFN); + return (l1t == p2m_ram_paging_out) ? mfn : _mfn(INVALID_MFN); } mfn = _mfn(l1e_get_pfn(*l1e)); *t = p2m_flags_to_type(l1e_get_flags(*l1e)); @@ -914,7 +941,7 @@ static void p2m_change_type_global(struc flags = l1e_get_flags(l1e[i1]); if ( p2m_flags_to_type(flags) != ot ) continue; - mfn = l1e_get_pfn(l1e[i1]); + mfn = unclip_mfn(l1e_get_pfn(l1e[i1])); gfn = i1 + (i2 + (i3 #if CONFIG_PAGING_LEVELS>= 4 + (i4 * L3_PAGETABLE_ENTRIES) @@ -923,7 +950,7 @@ static void p2m_change_type_global(struc * L2_PAGETABLE_ENTRIES) * L1_PAGETABLE_ENTRIES; /* create a new 1le entry with the new type */ flags = p2m_type_to_flags(nt, _mfn(mfn)); - l1e_content = l1e_from_pfn(mfn, flags); + l1e_content = l1e_from_pfn(clipped_mfn(mfn), flags); p2m->write_p2m_entry(p2m, gfn,&l1e[i1], l1mfn, l1e_content, 1); } @@ -1073,7 +1100,7 @@ long p2m_pt_audit_p2m(struct p2m_domain entry_count++; continue; } - mfn = l1e_get_pfn(l1e[i1]); + mfn = unclip_mfn(l1e_get_pfn(l1e[i1])); ASSERT(mfn_valid(_mfn(mfn))); m2pfn = get_gpfn_from_mfn(mfn); if ( m2pfn != gfn&& _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxx http://lists.xensource.com/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |