
Re: [Xen-devel] [RFC PATCH 4/4] arm/mem_access: Add software guest-page-table walk



On Mon, May 8, 2017 at 5:44 AM, Julien Grall <julien.grall@xxxxxxx> wrote:
> Hi,
>
> On 08/05/17 10:22, Sergej Proskurin wrote:
>>
>> On 05/02/2017 05:17 PM, Julien Grall wrote:
>>>
>>> On 30/04/17 20:48, Sergej Proskurin wrote:
>>> Also s/ttbcr/tcr/
>>>
>>>> +    struct domain *d = p2m->domain;
>>>> +
>>>> +    const unsigned int offsets[4] = {
>>>> +#ifdef CONFIG_ARM_64
>>>> +        zeroeth_table_offset(gva),
>>>> +#endif
>>>> +        first_table_offset(gva),
>>>> +        second_table_offset(gva),
>>>> +        third_table_offset(gva)
>>>> +    };
>>>
>>>
>>> Offsets are based on the granularity of Xen (currently 4KB). However
>>> the guests can support 4KB, 16KB, 64KB. Please handle everything
>>> correctly.
>>
>>
>> True, ARM is quite flexible. Yet, I have not seen an OS implementation
>> that is supported by Xen and makes use of page sizes other than 4KB and
>> their supersets (2/4MB, 1GB) (please let me know if you know of any),
>> which is why I doubt that we need that. Please let me know why you think
>> we need that kind of flexibility in software.
>>
>> If you should nevertheless insist on that functionality, I would include
>> it in the next patch and try not to blow up the code too much.
>
>
> Linux supports 4KB, 16KB and 64KB page granularity for AArch64.
> CentOS and RedHat are shipped with 64KB page granularity only.
>
>
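For reference, the granule the guest actually uses is advertised in its
TCR_EL1 (TG0 for the TTBR0 region), so a software walker could pick the
shift from there instead of assuming Xen's own 4KB granularity. A minimal
sketch -- the helper name is made up, the field encodings are the
architectural ones from the ARMv8 ARM:

    /*
     * Hypothetical helper, not part of this patch: derive the stage-1
     * granule shift from the guest's TCR_EL1.TG0 field (bits [15:14]).
     * Encodings: 0b00 -> 4KB, 0b01 -> 64KB, 0b10 -> 16KB.
     */
    static unsigned int guest_granule_shift(register_t tcr)
    {
        switch ( (tcr >> 14) & 0x3 )
        {
        case 0:  return 12;  /* 4KB granule */
        case 1:  return 16;  /* 64KB granule */
        case 2:  return 14;  /* 16KB granule */
        default: return 12;  /* reserved encoding: treat as 4KB */
        }
    }

The offsets and masks arrays would then have to be computed from that
shift (and the guest's T0SZ) rather than taken from Xen's
ZEROETH/FIRST/SECOND/THIRD macros.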
>>
>>>
>>>> +
>>>> +    const paddr_t masks[4] = {
>>>> +#ifdef CONFIG_ARM_64
>>>> +        ZEROETH_SIZE - 1,
>>>> +#endif
>>>> +        FIRST_SIZE - 1,
>>>> +        SECOND_SIZE - 1,
>>>> +        THIRD_SIZE - 1
>>>> +    };
>>>> +
>>>> +    /* If the MMU is disabled, there is no need to translate the gva. */
>>>> +    if ( !(sctlr & SCTLR_M) )
>>>> +    {
>>>> +        *ipa = gva;
>>>> +
>>>> +        return 0;
>>>> +    }
>>>> +
>>>> +    if ( is_32bit_domain(d) )
>>>> +    {
>>>> +        /*
>>>> +         * XXX: We do not support 32-bit domain translation table walks for
>>>> +         * domains using the short-descriptor translation table format, yet.
>>>> +         */
>>>
>>>
>>> Debian ARM 32-bit is using the short-descriptor translation table
>>> format. So are you suggesting that memaccess will not work correctly
>>> with Debian guests?
>>>
>>
>> Yes, as stated in the comment, the current implementation does not
>> support the short-descriptor translation table format. As this is an RFC
>> patch, I wanted to see your opinion on that functionality in general
>> before implementing all corner cases of the ARM architecture.
>>
>> As mentioned in my previous reply to patch 2/4, I would prefer
>> to separate the long-descriptor from the short-descriptor translation in
>> the future implementation.
>
>
> I agree here. See my answer on patch #2 about how I would like to see the
> implementation.
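For a 32-bit guest the format split could hang off TTBCR.EAE (bit 31),
which selects between the short-descriptor and the long-descriptor/LPAE
format. Something along these lines, purely as a sketch -- the
guest_walk_sd/guest_walk_ld names are placeholders for the two walkers,
not existing functions:

    static int guest_walk_tables(struct p2m_domain *p2m, vaddr_t gva,
                                 paddr_t *ipa, unsigned int flag)
    {
        register_t tcr = READ_SYSREG(TCR_EL1);

        if ( is_32bit_domain(p2m->domain) && !(tcr & (1UL << 31)) )
            /* TTBCR.EAE == 0: short-descriptor format (e.g. Debian ARM 32-bit). */
            return guest_walk_sd(p2m, gva, ipa, flag);

        /* Long-descriptor/LPAE format, which covers all AArch64 guests. */
        return guest_walk_ld(p2m, gva, ipa, flag);
    }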
>
> [...]
>
>>>> +
>>>> +    *ipa = pfn_to_paddr(pte.walk.base) | (gva & masks[level]);
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +
>>>>  /*
>>>>   * If mem_access is in use it might have been the reason why get_page_from_gva
>>>>   * failed to fetch the page, as it uses the MMU for the permission checking.
>>>> @@ -109,9 +233,23 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag,
>>>>      struct page_info *page = NULL;
>>>>      struct p2m_domain *p2m = &v->domain->arch.p2m;
>>>>
>>>> +    ASSERT(p2m->mem_access_enabled);
>>>> +
>>>
>>>
>>> Why this ASSERT has been added?
>>
>>
>> The function p2m_mem_access_check_and_get_page is called only if
>> get_page_from_gva fails and mem_access is active. I can add a comment
>> and move this part into an individual commit.
>
>
> Whilst I agree it is dumb to call this code without mem_access enabled, this
> code is able to cope with it. So why do you need this ASSERT?
>
>
>>
>>>
>>>>      rc = gva_to_ipa(gva, &ipa, flag);
>>>> +
>>>> +    /*
>>>> +     * In case mem_access is active, hardware-based gva_to_ipa translation
>>>> +     * might fail. Since gva_to_ipa uses the guest's translation tables,
>>>> +     * access to which might be restricted by the active VTTBR, we perform
>>>> +     * a gva to ipa translation in software.
>>>> +     */
>>>>      if ( rc < 0 )
>>>> -        goto err;
>>>> +        if ( p2m_gva_to_ipa(p2m, gva, &ipa, flag) < 0 )
>>>> +            /*
>>>> +             * The software gva to ipa translation can still fail, if the
>>>> +             * gva is not mapped or does not hold the requested access rights.
>>>> +             */
>>>> +            goto err;
>>>
>>>
>>> Rather falling back, why don't we do software page table walk every time?
>>
>>
>> A software page table walk would (presumably) take more time to
>> translate the gva. Do we want that? Also, I am not sure what you meant
>> by "falling back" at this point. Thank you.
>
>
> What you currently do is try gva_to_ipa and, if it does not work, you
> call p2m_gva_to_ipa. This sounds a bit pointless to me and a waste of
> time if the underlying memory of the stage-1 page tables is protected.

But we don't know that the stage-1 page table is protected until the
hardware-based lookup fails. I can turn your argument around and say
doing the software-based lookup is pointless and a waste of time when
the stage-1 table is not protected, which is, by the way, what I would
expect to see in most cases.
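Just to spell out the structure being debated (this is only a trimmed-down
restatement of the hunk above): the hardware AT-based gva_to_ipa() is tried
first and the software walk is only used when it fails, e.g. because
mem_access has removed stage-2 read permission from the memory backing the
stage-1 tables:

    rc = gva_to_ipa(gva, &ipa, flag);               /* hardware walk (AT) */
    if ( rc < 0 )
        rc = p2m_gva_to_ipa(p2m, gva, &ipa, flag);  /* software walk */
    if ( rc < 0 )
        goto err;   /* gva not mapped or access rights not present */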

> Before saying this is taking much more time, I would like to see actual
> numbers.

Whether the performance is measurably different is going to be very
usecase-specific. If the TLBs are already loaded with the translation,
then the hardware lookup will be a lot faster. Setting up a test case
for this is just a PITA and I don't really see why you are against
using the hardware lookup to start with.
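If actual numbers are wanted, the cheapest way I can think of is to time
both paths around the same gva with the generic timer's counter
(READ_SYSREG64(CNTPCT_EL0)) -- strictly a throwaway debugging hack in the
same function, not something to commit:

    paddr_t ipa_hw, ipa_sw;
    uint64_t t0, t1, t2;
    int rc_hw, rc_sw;

    t0 = READ_SYSREG64(CNTPCT_EL0);
    rc_hw = gva_to_ipa(gva, &ipa_hw, flag);             /* hardware walk */
    t1 = READ_SYSREG64(CNTPCT_EL0);
    rc_sw = p2m_gva_to_ipa(p2m, gva, &ipa_sw, flag);    /* software walk */
    t2 = READ_SYSREG64(CNTPCT_EL0);

    gprintk(XENLOG_DEBUG,
            "gva_to_ipa hw: %"PRIu64" ticks (rc %d), sw: %"PRIu64" ticks (rc %d)\n",
            t1 - t0, rc_hw, t2 - t1, rc_sw);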

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
