
Re: [Xen-devel] [RFC PATCH 4/4] arm/mem_access: Add software guest-page-table walk



Hi Julien,


On 05/02/2017 05:17 PM, Julien Grall wrote:
> Hi Sergej,
>
> On 30/04/17 20:48, Sergej Proskurin wrote:
>> The function p2m_mem_access_check_and_get_page in mem_access.c
>> translates a gva to an ipa by means of the hardware functionality
>> implemented in the function gva_to_ipa. If mem_access is active,
>> hardware-based gva to ipa translation might fail, as gva_to_ipa uses the
>> guest's translation tables, access to which might be restricted by the
>> active VTTBR. To address this issue, we perform the gva to ipa
>> translation in software.
>>
>> Signed-off-by: Sergej Proskurin <proskurin@xxxxxxxxxxxxx>
>> ---
>> Cc: Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>
>> Cc: Tamas K Lengyel <tamas@xxxxxxxxxxxxx>
>> Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>
>> Cc: Julien Grall <julien.grall@xxxxxxx>
>> ---
>>  xen/arch/arm/mem_access.c | 140 +++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 139 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/arm/mem_access.c b/xen/arch/arm/mem_access.c
>> index 04b1506b00..352eb6073f 100644
>> --- a/xen/arch/arm/mem_access.c
>> +++ b/xen/arch/arm/mem_access.c
>> @@ -20,6 +20,7 @@
>>  #include <xen/monitor.h>
>>  #include <xen/sched.h>
>>  #include <xen/vm_event.h>
>> +#include <xen/domain_page.h>
>>  #include <public/vm_event.h>
>>  #include <asm/event.h>
>>
>> @@ -90,6 +91,129 @@ static int __p2m_get_mem_access(struct domain *d, gfn_t gfn,
>>      return 0;
>>  }
>>
>> +static int
>> +p2m_gva_to_ipa(struct p2m_domain *p2m, vaddr_t gva,
>> +               paddr_t *ipa, unsigned int flags)
>
> I don't think this function belongs in mem_access.c. It would be
> better placed in p2m.c. Also, the name is a bit confusing; a user would
> not know the difference between p2m_gva_to_ipa and gva_to_ipa.

Fair enough, thank you.

>
>> +{
>> +    int level=0, t0_sz, t1_sz;
>
> Coding style level = 0.
>
> Also please use unsigned int for level.
>
> t0_sz and t1_sz should be stay signed.
>
>> +    unsigned long t0_max, t1_min;
>> +    lpae_t pte, *table;
>> +    mfn_t root_mfn;
>> +    uint64_t ttbr;
>> +    uint32_t sctlr = READ_SYSREG(SCTLR_EL1);
>> +    register_t ttbcr = READ_SYSREG(TCR_EL1);
>
> So you are assuming that the current vCPU is running, right? If so,
> this should be clarified in the commit message and in a comment above
> the function.

I will state that in a comment, thank you.
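For instance, something along these lines above the function (a sketch;
the exact wording is of course open to debate):

    /*
     * Translate a GVA to an IPA by performing a software walk of the
     * guest's translation tables.
     *
     * Note: This function assumes that the target domain is currently
     * running on this pCPU, as it reads SCTLR_EL1, TCR_EL1, and
     * TTBR{0,1}_EL1 directly from the hardware registers.
     */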

>
> Also s/ttbcr/tcr/
>
>> +    struct domain *d = p2m->domain;
>> +
>> +    const unsigned int offsets[4] = {
>> +#ifdef CONFIG_ARM_64
>> +        zeroeth_table_offset(gva),
>> +#endif
>> +        first_table_offset(gva),
>> +        second_table_offset(gva),
>> +        third_table_offset(gva)
>> +    };
>
> Offsets are based on the granularity of Xen (currently 4KB). However,
> guests can use 4KB, 16KB, or 64KB granules. Please handle all of them
> correctly.

True, the ARM architecture is quite flexible. Yet, I have not seen an OS
implementation that is supported by Xen and uses page sizes other than
4KB and their supersets (2/4MB, 1GB) (please let me know if you know of
one), which is why I doubt that we need this. Why do you think we need
that kind of flexibility in software?

If you nevertheless insist on this functionality, I will include it in
the next version of the patch and try not to blow up the code too much.
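For reference, detecting the guest's granule from TCR_EL1 would be
fairly cheap. A minimal sketch for the AArch64 case (TG0 encodings taken
from ARM DDI 0487; the helper name and the TCR_TG0_* macros below are
made up, they do not exist in Xen today):

    #define TCR_TG0_SHIFT  14
    #define TCR_TG0_MASK   0x3UL

    /* Return log2(granule size) configured in TCR_EL1.TG0. */
    static unsigned int granule_shift(register_t tcr)
    {
        switch ( (tcr >> TCR_TG0_SHIFT) & TCR_TG0_MASK )
        {
        case 0: return 12;  /* 4KB */
        case 2: return 14;  /* 16KB */
        case 1: return 16;  /* 64KB */
        default: return 12; /* Reserved encoding; assume 4KB. */
        }
    }

The per-level offsets then follow from the granule, as each level
resolves (granule_shift - 3) bits of the input address, so offsets[] and
masks[] would be computed at run time instead of being compile-time
constants.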

>
>> +
>> +    const paddr_t masks[4] = {
>> +#ifdef CONFIG_ARM_64
>> +        ZEROETH_SIZE - 1,
>> +#endif
>> +        FIRST_SIZE - 1,
>> +        SECOND_SIZE - 1,
>> +        THIRD_SIZE - 1
>> +    };
>> +
>> +    /* If the MMU is disabled, there is no need to translate the gva. */
>> +    if ( !(sctlr & SCTLR_M) )
>> +    {
>> +        *ipa = gva;
>> +
>> +        return 0;
>> +    }
>> +
>> +    if ( is_32bit_domain(d) )
>> +    {
>> +        /*
>> +         * XXX: We do not support 32-bit domain translation table walks for
>> +         * domains using the short-descriptor translation table format, yet.
>> +         */
>
> Debian ARM 32-bit is using the short-descriptor translation table
> format. So are you suggesting that memaccess will not work correctly
> with a Debian guest?
>

Yes, as stated in the comment, the current implementation does not
support the short-descriptor translation table format. As this is an RFC
patch, I wanted to get your opinion on this functionality in general
before implementing all corner cases of the ARM architecture.

As mentioned in my previous reply to patch 2/4, I would prefer to
separate the long-descriptor from the short-descriptor translation logic
in a future implementation.

>> +        if ( !(ttbcr & TTBCR_EAE) )
>
> See my comment on patch #2 about the naming convention.
>

Done, thanks.

>> +            return -EFAULT;
>> +
>> +#ifdef CONFIG_ARM_64
>> +        level = 1;
>
> This sounds wrong to me. The first lookup level is determined by the
> value of T0SZ and T1SZ (see B3-1352 in ARM DDI 0406C.c).
>
> For the AArch64 version see D4.2.5 in ARM DDI 0487A.k_iss10775.
>

Thank you for the hint.
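For the AArch64 case, a sketch of the computation (4KB granule only;
DIV_ROUND_UP is the existing macro from xen/lib.h, the helper name is
made up):

    /*
     * Derive the initial lookup level from TxSZ as per ARM DDI 0487,
     * D4.2.5. With a 4KB granule, the page offset covers 12 bits and
     * each table level resolves 9 bits.
     */
    static unsigned int first_level(unsigned int t_sz)
    {
        unsigned int ia_bits = 64 - t_sz;

        return 4 - DIV_ROUND_UP(ia_bits - 12, 9);
    }

E.g. T0SZ = 25 yields a 39-bit input range and a walk starting at level
1, whereas T0SZ = 16 yields 48 bits and level 0, so the hardcoded
level = 1 only happens to be correct for one particular configuration.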

>> +#endif
>> +    }
>> +
>> +#ifdef CONFIG_ARM_64
>> +    if ( is_64bit_domain(d) )
>> +    {
>> +        /* Get the max GVA that can be translated by TTBR0. */
>> +        t0_sz = (ttbcr >> TCR_T0SZ_SHIFT) & TCR_SZ_MASK;
>> +        t0_max = (1UL << (64 - t0_sz)) - 1;
>> +
>> +        /* Get the min GVA that can be translated by TTBR1. */
>> +        t1_sz = (ttbcr >> TCR_T1SZ_SHIFT) & TCR_SZ_MASK;
>> +        t1_min = ~0UL - (1UL << (64 - t1_sz)) + 1;
>> +    }
>> +    else
>> +#endif
>> +    {
>> +        /* Get the max GVA that can be translated by TTBR0. */
>> +        t0_sz = (ttbcr >> TCR_T0SZ_SHIFT) & TTBCR_SZ_MASK;
>
> See my comment on patch #2 for the naming convention.
>
>> +        t0_max = (1U << (32 - t0_sz)) - 1;
>> +
>> +        /* Get the min GVA that can be translated by TTBR1. */
>> +        t1_sz = (ttbcr >> TCR_T1SZ_SHIFT) & TTBCR_SZ_MASK;
>> +        t1_min = ~0U - (1U << (32 - t1_sz)) + 1;
>
> 1U << (32 - t1_sz) will not fit in an unsigned long if t1_sz = 0.

Using unsigned long long arithmetic should fix that (though it does not
help in the 64-bit case), thanks.

>
> Also, this code looks wrong to me. Looking at B3.6.4 in DDI 0406C.c,
> you don't properly handle t0_max and t1_min when T0SZ or T1SZ is 0.
>
> I think it would be worth for you to have a look to the pseudo-code in
> the ARM ARM for TranslationTableWalk (see B3.19.3 in ARM DDI 0406C.c
> and J1-5295 in ARM DDI 0487A.k_iss10775) which gives you all the
> details for doing properly translation table walk.

I will have a deeper look and adopt parts of the given pseudocode in my
implementation. You are right: there might be an off-by-one issue in the
computation above.
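For the AArch32/LPAE path, doing the arithmetic in 64 bits would avoid
the undefined shift (a sketch; the special rules for T0SZ == 0 and
T1SZ == 0 from B3.6.4 would still need to be applied on top of this):

    uint64_t t0_max = (1ULL << (32 - t0_sz)) - 1;
    uint64_t t1_min = (1ULL << 32) - (1ULL << (32 - t1_sz));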

>
>> +    }
>> +
>> +    if ( t0_max >= gva )
>> +        /* Use TTBR0 for GVA to IPA translation. */
>> +        ttbr = READ_SYSREG64(TTBR0_EL1);
>> +    else if ( t1_min <= gva )
>> +        /* Use TTBR1 for GVA to IPA translation. */
>> +        ttbr = READ_SYSREG64(TTBR1_EL1);
>> +    else
>> +        /* GVA out of bounds of TTBR(0|1). */
>> +        return -EFAULT;
>> +
>> +    /* Bits [63..48] might be used by an ASID. */
>> +    root_mfn = p2m_lookup(d, _gfn(paddr_to_pfn(ttbr & ((1ULL<<48)-1))), NULL);
>
> Please don't hardcode the mask.
>
>> +
>> +    /* Check, whether TTBR holds a valid address. */
>> +    if ( mfn_eq(root_mfn, INVALID_MFN) )
>> +        return -EFAULT;
>> +
>> +    table = map_domain_page(root_mfn);
>
> Nothing prevents the guest from giving back the page table whilst you
> are browsing it. You may want to have a look at the patch "ARM:
> introduce vgic_access_guest_memory()" [1] to generalize that function
> and avoid re-inventing the wheel.
>

I will have a look at this commit, thanks.
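Until then, a minimal way of keeping the table page alive during the
walk might look like this (a sketch with the surrounding logic
abbreviated; get_page_from_gfn(), __map_domain_page(), and put_page()
are existing Xen helpers):

    struct page_info *page = get_page_from_gfn(d, gfn_x(gfn), NULL,
                                               P2M_ALLOC);

    if ( !page )
        return -EFAULT;

    /* The reference prevents the guest from freeing the table. */
    table = __map_domain_page(page);
    /* ... read the relevant entry ... */
    unmap_domain_page(table);
    put_page(page);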

>> +
>> +    for ( ; ; level++ )
>> +    {
>> +        pte = table[offsets[level]];
>
> You likely want to use ACCESS_ONCE here to prevent the compiler from
> reading the pte twice from memory.

I tried that; however, the compiler did not seem to like it. I will try
again and let you know whether I get it to work.
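In case the compiler complained about the non-scalar type: ACCESS_ONCE()
generally works only on scalar types, so reading the raw descriptor
value may do the trick (a sketch, assuming lpae_t exposes the raw value
as .bits, as the current Xen definition does):

    /* Read the descriptor exactly once; lpae_t itself is not scalar. */
    pte.bits = ACCESS_ONCE(table[offsets[level]].bits);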

>
>> +
>> +        if ( level == 3 || !pte.walk.valid || !pte.walk.table )
>> +            break;
>> +
>> +        unmap_domain_page(table);
>> +
>> +        root_mfn = p2m_lookup(d, _gfn(pte.walk.base), NULL);
>
> No check on root_mfn?

Done, thanks.

>
>> +        table = map_domain_page(root_mfn);
>> +    }
>> +
>> +    unmap_domain_page(table);
>> +
>> +    if ( !pte.walk.valid )
>> +        return -EFAULT;
>> +
>> +    /* Make sure the entry holds the requested access attributes. */
>> +    if ( ((flags & GV2M_WRITE) == GV2M_WRITE) && pte.pt.ro )
>> +        return -EFAULT;
>
> IMHO, this function should return the access attributes of the page and
> let the caller decide what to do. This would make the function more
> generic.
>

Makes sense, this would indeed make the function more generic. Since
there are already multiple guest-page-table-walk stubs throughout the
code base, we could consolidate them into a single one.
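A possible shape for the generalized interface (all names below are
illustrative only, they do not exist in Xen):

    /* Result of a software guest-page-table walk. */
    struct gpt_walk_result {
        paddr_t ipa;    /* Translated address. */
        bool readonly;  /* Set if the leaf entry has pt.ro set. */
        bool nonexec;   /* Set if the leaf entry has pt.xn set. */
    };

    static int p2m_walk_gpt(struct p2m_domain *p2m, vaddr_t gva,
                            struct gpt_walk_result *res);

The caller in mem_access.c would then keep the policy decision:

    struct gpt_walk_result res;

    if ( p2m_walk_gpt(p2m, gva, &res) < 0 )
        goto err;

    if ( (flag & GV2M_WRITE) && res.readonly )
        goto err;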

>> +
>> +    *ipa = pfn_to_paddr(pte.walk.base) | (gva & masks[level]);
>> +
>> +    return 0;
>> +}
>> +
>> +
>>  /*
>>   * If mem_access is in use it might have been the reason why get_page_from_gva
>>   * failed to fetch the page, as it uses the MMU for the permission checking.
>> @@ -109,9 +233,23 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag,
>>      struct page_info *page = NULL;
>>      struct p2m_domain *p2m = &v->domain->arch.p2m;
>>
>> +    ASSERT(p2m->mem_access_enabled);
>> +
>
> Why has this ASSERT been added?

The function p2m_mem_access_check_and_get_page is called only if
get_page_from_gva fails while mem_access is active. I can add a comment
and move this change into a separate commit.
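Something as simple as this would probably do (a sketch):

    /*
     * This function is reached only from get_page_from_gva's failure
     * path and only while mem_access is in use, so the restriction
     * must be active here.
     */
    ASSERT(p2m->mem_access_enabled);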

>
>>      rc = gva_to_ipa(gva, &ipa, flag);
>> +
>> +    /*
>> +     * In case mem_access is active, hardware-based gva_to_ipa translation
>> +     * might fail. Since gva_to_ipa uses the guest's translation tables,
>> +     * access to which might be restricted by the active VTTBR, we perform
>> +     * a gva to ipa translation in software.
>> +     */
>>      if ( rc < 0 )
>> -        goto err;
>> +        if ( p2m_gva_to_ipa(p2m, gva, &ipa, flag) < 0 )
>> +            /*
>> +             * The software gva to ipa translation can still fail, if
>> +             * the gva is not mapped or does not hold the requested
>> +             * access rights.
>> +             */
>> +            goto err;
>
> Rather than falling back, why don't we do a software page table walk
> every time?

A software page table walk would (presumably) take more time to
translate the gva. Do we want that? Also, I am not sure what you meant
by "falling back" at this point. Thank you.

>
>>
>>      gfn = _gfn(paddr_to_pfn(ipa));
>>
>>
>
> Cheers,
>
> [1] https://patchwork.kernel.org/patch/9676291/
>

Cheers,
~Sergej

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

