Re: [PATCH] xen/arm: p2m_set_entry duplicate calculation.
Hello, Julien Grall.
Thank you, I agree! It made me think once more about what my patch
could improve.
The patches I sent have been reviewed in various ways. It was a good
opportunity to analyze my patch from various perspectives. :)
I checked the objdump output at -O2 (the default optimization level in the
Xen Makefile) to make sure CSE (common subexpression elimination) works
well with the latest arm64 cross compiler on x86_64 from the Arm GNU Toolchain.
$
~/gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc
-v
...
A-profile Architecture 10.3-2021.07 (arm-10.29)'
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.3.1 20210621 (GNU Toolchain for the A-profile
Architecture 10.3-2021.07 (arm-10.29))
I compared the generated code before and after my patch. This time, instead
of adding a "pages" variable, I reused the existing local variable mask for
the order shift.
I was able to confirm that it executes one less instruction.
(1) Before the clean-up
0000000000001bb4 <p2m_set_entry>:
while ( nr )
1bb4: b40005e2 cbz x2, 1c70 <p2m_set_entry+0xbc>
{
...
if ( rc )
1c1c: 350002e0 cbnz w0, 1c78 <p2m_set_entry+0xc4>
sgfn = gfn_add(sgfn, (1 << order));
1c20: 1ad32373 lsl w19, w27, w19 // <<< CSE works
1c24: 93407e73 sxtw x19, w19 // <<< well!
return _gfn(gfn_x(gfn) + i);
1c28: 8b1302d6 add x22, x22, x19
return _mfn(mfn_x(mfn) + i);
1c2c: 8b130281 add x1, x20, x19
1c30: b100069f cmn x20, #0x1
1c34: 9a941034 csel x20, x1, x20, ne // ne = any
while ( nr )
1c38: eb1302b5 subs x21, x21, x19
1c3c: 540001e0 b.eq 1c78 <p2m_set_entry+0xc4> // b.none
(2) After reusing the mask variable: mask = 1UL << order
The code shows that the sxtw x19, w19 instruction has disappeared: since
(1 << order) is a 32-bit int, its result had to be sign-extended to 64 bits,
while 1UL << order is already a 64-bit value.
0000000000001bb4 <p2m_set_entry>:
while ( nr )
1bb4: b40005c2 cbz x2, 1c6c <p2m_set_entry+0xb8>
{
...
if ( rc )
1c1c: 350002c0 cbnz w0, 1c74 <p2m_set_entry+0xc0>
mask = 1UL << order;
1c20: 9ad32373 lsl x19, x27, x19 // <<< only lsl
return _gfn(gfn_x(gfn) + i);
1c24: 8b1302d6 add x22, x22, x19
return _mfn(mfn_x(mfn) + i);
1c28: 8b130281 add x1, x20, x19
1c2c: b100069f cmn x20, #0x1
1c30: 9a941034 csel x20, x1, x20, ne // ne = any
while ( nr )
1c34: eb1302b5 subs x21, x21, x19
1c38: 540001e0 b.eq 1c74 <p2m_set_entry+0xc0> // b.none
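At the C level, the variant compiled for listing (2) looks roughly like this.
This is only a sketch of the work-in-progress follow-up (hunk context
abbreviated), not the final patch:

--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ ... @@ int p2m_set_entry(struct p2m_domain *p2m,
         if ( rc )
             break;
 
-        sgfn = gfn_add(sgfn, (1 << order));
+        /* Reuse the existing 'mask' local; 1UL keeps the shift 64-bit wide. */
+        mask = 1UL << order;
+
+        sgfn = gfn_add(sgfn, mask);
         if ( !mfn_eq(smfn, INVALID_MFN) )
-            smfn = mfn_add(smfn, (1 << order));
+            smfn = mfn_add(smfn, mask);
 
-        nr -= (1 << order);
+        nr -= mask;

Reassigning mask here should be safe, since it is only used earlier in the
iteration to pick the mapping order.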
I'll send you a follow-up patch I've been working on.
BR
Paran Lee
On 2022-04-25 1:17 AM, Julien Grall wrote:
> Hi,
>
> On 21/04/2022 16:17, Paran Lee wrote:
>> It doesn't seem necessary to do that calculation of order shift again.
>
> I think we need to weigh that against increasing the number of local
> variables that do pretty much the same.
>
> This pretty much comes down to a matter of taste here. IMHO, the original
> version is better but I see Stefano reviewed it so I will not argue
> against it.
>
> That said, given you already sent a few patches, can you explain why you
> are doing this? Is this optimization purpose? Is it clean-up?
>
>>
>> Signed-off-by: Paran Lee <p4ranlee@xxxxxxxxx>
>> ---
>> xen/arch/arm/p2m.c | 9 +++++----
>> 1 file changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>> index 1d1059f7d2..533afc830a 100644
>> --- a/xen/arch/arm/p2m.c
>> +++ b/xen/arch/arm/p2m.c
>> @@ -1092,7 +1092,7 @@ int p2m_set_entry(struct p2m_domain *p2m,
>> while ( nr )
>> {
>> unsigned long mask;
>> - unsigned long order;
>> + unsigned long order, pages;
>> /*
>> * Don't take into account the MFN when removing mapping (i.e
>> @@ -1118,11 +1118,12 @@ int p2m_set_entry(struct p2m_domain *p2m,
>> if ( rc )
>> break;
>> - sgfn = gfn_add(sgfn, (1 << order));
>> + pages = 1 << order;
>
> Please take the opportunity to switch to 1UL.
>
>> + sgfn = gfn_add(sgfn, pages);
>> if ( !mfn_eq(smfn, INVALID_MFN) )
>> - smfn = mfn_add(smfn, (1 << order));
>> + smfn = mfn_add(smfn, pages);
>> - nr -= (1 << order);
>> + nr -= pages;
>> }
>> return rc;
>
> Cheers,
>