Re: [Xen-devel] [RFC PATCH 4/4] arm/mem_access: Add software guest-page-table walk
Hi Julien,

On 05/02/2017 05:17 PM, Julien Grall wrote:
> Hi Sergej,
>
> On 30/04/17 20:48, Sergej Proskurin wrote:
>> The function p2m_mem_access_check_and_get_page in mem_access.c
>> translates a gva to an ipa by means of the hardware functionality
>> implemented in the function gva_to_ipa. If mem_access is active,
>> hardware-based gva to ipa translation might fail, as gva_to_ipa uses the
>> guest's translation tables, access to which might be restricted by the
>> active VTTBR. To address this issue, we perform the gva to ipa
>> translation in software.
>>
>> Signed-off-by: Sergej Proskurin <proskurin@xxxxxxxxxxxxx>
>> ---
>> Cc: Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>
>> Cc: Tamas K Lengyel <tamas@xxxxxxxxxxxxx>
>> Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>
>> Cc: Julien Grall <julien.grall@xxxxxxx>
>> ---
>>  xen/arch/arm/mem_access.c | 140 +++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 139 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/arm/mem_access.c b/xen/arch/arm/mem_access.c
>> index 04b1506b00..352eb6073f 100644
>> --- a/xen/arch/arm/mem_access.c
>> +++ b/xen/arch/arm/mem_access.c
>> @@ -20,6 +20,7 @@
>>  #include <xen/monitor.h>
>>  #include <xen/sched.h>
>>  #include <xen/vm_event.h>
>> +#include <xen/domain_page.h>
>>  #include <public/vm_event.h>
>>  #include <asm/event.h>
>>
>> @@ -90,6 +91,129 @@ static int __p2m_get_mem_access(struct domain *d, gfn_t gfn,
>>      return 0;
>>  }
>>
>> +static int
>> +p2m_gva_to_ipa(struct p2m_domain *p2m, vaddr_t gva,
>> +               paddr_t *ipa, unsigned int flags)
>
> I don't think this function belongs to mem_access.c. It would be
> better in p2m.c. Also, the name is a bit confusing, a user would not
> know the difference between p2m_gva_to_ipa and gva_to_ipa.

Fair enough, thank you.

>
>> +{
>> +    int level=0, t0_sz, t1_sz;
>
> Coding style: level = 0.
>
> Also, please use unsigned int for level.
>
> t0_sz and t1_sz should stay signed.
>
>> +    unsigned long t0_max, t1_min;
>> +    lpae_t pte, *table;
>> +    mfn_t root_mfn;
>> +    uint64_t ttbr;
>> +    uint32_t sctlr = READ_SYSREG(SCTLR_EL1);
>> +    register_t ttbcr = READ_SYSREG(TCR_EL1);
>
> So you are assuming that the current vCPU is running, right? If so,
> this should be clarified in the commit message and in a comment above
> the function.

I will state that in a comment, thank you.

>
> Also s/ttbcr/tcr/
>
>> +    struct domain *d = p2m->domain;
>> +
>> +    const unsigned int offsets[4] = {
>> +#ifdef CONFIG_ARM_64
>> +        zeroeth_table_offset(gva),
>> +#endif
>> +        first_table_offset(gva),
>> +        second_table_offset(gva),
>> +        third_table_offset(gva)
>> +    };
>
> Offsets are based on the granularity of Xen (currently 4KB). However
> the guests can support 4KB, 16KB, 64KB. Please handle everything
> correctly.

True, ARM is quite flexible. Yet, I have not seen an OS implementation
that is supported by Xen and makes use of page sizes other than 4KB and
their supersets (2/4MB, 1GB) (please let me know if you know of some),
which is why I doubt that we need that. Please let me know why you think
we need that kind of flexibility in software? If you should nevertheless
insist on that functionality, I would include it in the next patch and
try not to blow up the code too much.

>
>> +
>> +    const paddr_t masks[4] = {
>> +#ifdef CONFIG_ARM_64
>> +        ZEROETH_SIZE - 1,
>> +#endif
>> +        FIRST_SIZE - 1,
>> +        SECOND_SIZE - 1,
>> +        THIRD_SIZE - 1
>> +    };
>> +
>> +    /* If the MMU is disabled, there is no need to translate the gva. */
>> +    if ( !(sctlr & SCTLR_M) )
>> +    {
>> +        *ipa = gva;
>> +
>> +        return 0;
>> +    }
>> +
>> +    if ( is_32bit_domain(d) )
>> +    {
>> +        /*
>> +         * XXX: We do not support 32-bit domain translation table walks for
>> +         * domains using the short-descriptor translation table format, yet.
>> +         */
>
> Debian ARM 32bit is using the short-descriptor translation table. So are
> you suggesting that mem_access will not work correctly with Debian guests?
Yes, as stated in the comment, the current implementation does not
support the short-descriptor translation table format. As this is an RFC
patch, I wanted to see your opinion on that functionality in general
before implementing all corner cases of the ARM architecture. As
mentioned in my previous reply (patch 2/4), I would prefer to separate
the long-descriptor from the short-descriptor translation in the future
implementation.

>> +        if ( !(ttbcr & TTBCR_EAE) )
>
> See my comment on patch #2 about the naming convention.
>

Done, thanks.

>> +            return -EFAULT;
>> +
>> +#ifdef CONFIG_ARM_64
>> +        level = 1;
>
> This sounds wrong to me. The first lookup level is determined by the
> value of T0SZ and T1SZ (see B3-1352 in ARM DDI 0406C.c).
>
> For the AArch64 version see D4.2.5 in ARM DDI 0487A.k_iss10775.
>

Thank you for the hint.

>> +#endif
>> +    }
>> +
>> +#ifdef CONFIG_ARM_64
>> +    if ( is_64bit_domain(d) )
>> +    {
>> +        /* Get the max GVA that can be translated by TTBR0. */
>> +        t0_sz = (ttbcr >> TCR_T0SZ_SHIFT) & TCR_SZ_MASK;
>> +        t0_max = (1UL << (64 - t0_sz)) - 1;
>> +
>> +        /* Get the min GVA that can be translated by TTBR1. */
>> +        t1_sz = (ttbcr >> TCR_T1SZ_SHIFT) & TCR_SZ_MASK;
>> +        t1_min = ~0UL - (1UL << (64 - t1_sz)) + 1;
>> +    }
>> +    else
>> +#endif
>> +    {
>> +        /* Get the max GVA that can be translated by TTBR0. */
>> +        t0_sz = (ttbcr >> TCR_T0SZ_SHIFT) & TTBCR_SZ_MASK;
>
> See my comment on patch #2 for the naming convention.
>
>> +        t0_max = (1U << (32 - t0_sz)) - 1;
>> +
>> +        /* Get the min GVA that can be translated by TTBR1. */
>> +        t1_sz = (ttbcr >> TCR_T1SZ_SHIFT) & TTBCR_SZ_MASK;
>> +        t1_min = ~0U - (1U << (32 - t1_sz)) + 1;
>
> 1U << (32 - t1_sz) will not fit in an unsigned long if t1_sz = 0.

An unsigned long long should fix that (however not in the 64-bit case),
thanks.

>
> Also, this code looks wrong to me. Looking at B3.6.4 in DDI 0406C.c,
> you don't handle t0_max and t1_min properly when T0SZ or T1SZ is 0.
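To illustrate what I have in mind for the unsigned long long fix: the
sketch below computes the 32-bit bounds in 64-bit arithmetic, so that
the shift by 32 when T0SZ or T1SZ is 0 no longer overflows. This is a
rough standalone example (plain C outside of Xen, with made-up names);
the architectural special cases for T0SZ/T1SZ == 0 from B3.6.4 would
still need to be handled on top of it.

```c
#include <stdint.h>

/*
 * Sketch only: compute the TTBR0/TTBR1 GVA bounds of a 32-bit guest
 * from T0SZ/T1SZ in 64-bit arithmetic, so that t0_sz == 0 or
 * t1_sz == 0 (a shift by 32) does not overflow a 32-bit type.
 */

/* Highest GVA translated via TTBR0: 2^(32 - T0SZ) - 1. */
static uint64_t ttbr0_max(unsigned int t0_sz)
{
    return (UINT64_C(1) << (32 - t0_sz)) - 1;
}

/* Lowest GVA translated via TTBR1: 2^32 - 2^(32 - T1SZ). */
static uint64_t ttbr1_min(unsigned int t1_sz)
{
    return UINT64_C(0x100000000) - (UINT64_C(1) << (32 - t1_sz));
}
```

With t0_sz == 0 this yields 0xFFFFFFFF for ttbr0_max instead of
undefined behaviour, which matches the intent of the original code.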
>
> I think it would be worth for you to have a look at the pseudo-code in
> the ARM ARM for TranslationTableWalk (see B3.19.3 in ARM DDI 0406C.c
> and J1-5295 in ARM DDI 0487A.k_iss10775) which gives you all the
> details for doing a translation table walk properly.

I will have a deeper look and adopt parts of the given pseudocode in my
implementation. You are right: there might be an off-by-one issue in the
computation above.

>
>> +    }
>> +
>> +    if ( t0_max >= gva )
>> +        /* Use TTBR0 for GVA to IPA translation. */
>> +        ttbr = READ_SYSREG64(TTBR0_EL1);
>> +    else if ( t1_min <= gva )
>> +        /* Use TTBR1 for GVA to IPA translation. */
>> +        ttbr = READ_SYSREG64(TTBR1_EL1);
>> +    else
>> +        /* GVA out of bounds of TTBR(0|1). */
>> +        return -EFAULT;
>> +
>> +    /* Bits [63..48] might be used by an ASID. */
>> +    root_mfn = p2m_lookup(d, _gfn(paddr_to_pfn(ttbr & ((1ULL<<48)-1))), NULL);
>
> Please don't hardcode the mask.
>
>> +
>> +    /* Check, whether TTBR holds a valid address. */
>> +    if ( mfn_eq(root_mfn, INVALID_MFN) )
>> +        return -EFAULT;
>> +
>> +    table = map_domain_page(root_mfn);
>
> Nothing prevents the guest from giving back the page table whilst you
> are browsing it. You may want to have a look at the patch "ARM: introduce
> vgic_access_guest_memory()" [1] to generalize the function and avoid
> re-inventing the wheel.
>

I will have a look at this commit, thanks.

>> +
>> +    for ( ; ; level++ )
>> +    {
>> +        pte = table[offsets[level]];
>
> You likely want to use ACCESS_ONCE here to prevent the compiler from
> reading the pte twice from memory.

I tried that, however the compiler did not seem to like it. I will try
again and let you know if I get that to work.

>
>> +
>> +        if ( level == 3 || !pte.walk.valid || !pte.walk.table )
>> +            break;
>> +
>> +        unmap_domain_page(table);
>> +
>> +        root_mfn = p2m_lookup(d, _gfn(pte.walk.base), NULL);
>
> No check on root_mfn?

Done, thanks.
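On the ACCESS_ONCE point above: if the compiler rejected it, my guess is
that it does not like the non-scalar lpae_t behind the volatile cast. A
possible workaround would be to do the single-copy read on the
underlying 64-bit word and reinterpret it afterwards. Below is a generic
sketch of that idea, not Xen code; desc_t is a made-up stand-in for
lpae_t.

```c
#include <stdint.h>

/* Made-up stand-in for Xen's lpae_t: a 64-bit descriptor union. */
typedef union {
    uint64_t bits;
    struct {
        uint64_t valid:1;
        uint64_t table:1;
        uint64_t pad:62;
    } walk;
} desc_t;

/*
 * Sketch only: read the raw 64-bit word exactly once through a
 * volatile-qualified pointer, so the compiler cannot re-read the
 * entry from memory, then reinterpret it as a descriptor.
 */
static desc_t read_desc_once(const desc_t *p)
{
    desc_t d;

    d.bits = *(const volatile uint64_t *)&p->bits;

    return d;
}
```

The walk would then use read_desc_once(&table[offsets[level]]) instead
of a plain assignment, giving a consistent snapshot of the entry even if
the guest modifies it concurrently.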
>
>> +        table = map_domain_page(root_mfn);
>> +    }
>> +
>> +    unmap_domain_page(table);
>> +
>> +    if ( !pte.walk.valid )
>> +        return -EFAULT;
>> +
>> +    /* Make sure the entry holds the requested access attributes. */
>> +    if ( ((flags & GV2M_WRITE) == GV2M_WRITE) && pte.pt.ro )
>> +        return -EFAULT;
>
> IMHO, this function should return the access attributes of the page and
> let the caller decide what to do. This would make this function more
> generic.
>

Makes sense. This would indeed make the function more generic. Since
there are already multiple gpt-walk stubs throughout the code base, we
could make use of only one.

>> +
>> +    *ipa = pfn_to_paddr(pte.walk.base) | (gva & masks[level]);
>> +
>> +    return 0;
>> +}
>> +
>> +
>>  /*
>>   * If mem_access is in use it might have been the reason why get_page_from_gva
>>   * failed to fetch the page, as it uses the MMU for the permission checking.
>> @@ -109,9 +233,23 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag,
>>      struct page_info *page = NULL;
>>      struct p2m_domain *p2m = &v->domain->arch.p2m;
>>
>> +    ASSERT(p2m->mem_access_enabled);
>> +
>
> Why has this ASSERT been added?

The function p2m_mem_access_check_and_get_page is called only if
get_page_from_gva fails and mem_access is active. I can add a comment
and move this part into an individual commit.

>
>>      rc = gva_to_ipa(gva, &ipa, flag);
>> +
>> +    /*
>> +     * In case mem_access is active, hardware-based gva_to_ipa translation
>> +     * might fail. Since gva_to_ipa uses the guest's translation tables, access
>> +     * to which might be restricted by the active VTTBR, we perform a gva to
>> +     * ipa translation in software.
>> +     */
>>      if ( rc < 0 )
>> -        goto err;
>> +        if ( p2m_gva_to_ipa(p2m, gva, &ipa, flag) < 0 )
>> +            /*
>> +             * The software gva to ipa translation can still fail, if the
>> +             * gva is not mapped or does not hold the requested access rights.
>> +             */
>> +            goto err;
>
> Rather than falling back, why don't we do a software page table walk
> every time?

A software page table walk would (presumably) take more time to
translate the gva. Do we want that? Also, I am not sure what you meant
by "falling back" at this point. Thank you.

>
>>
>>      gfn = _gfn(paddr_to_pfn(ipa));
>>
>>
>
> Cheers,
>
> [1] https://patchwork.kernel.org/patch/9676291/
>

Cheers,
~Sergej

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel