Re: [PATCH] xen/arm: p2m_set_entry duplicate calculation.
Hello, Julien Grall.
Thank you, I agree! It made me think once more about what my patch
could improve.
The patches I sent have been reviewed in various ways. It was a good
opportunity to analyze my patch from various perspectives. :)
I checked the objdump output at -O2 (the default optimization level in the
Xen Makefile) to make sure CSE (common subexpression elimination) works
well with the latest arm64 cross compiler on x86_64 from the Arm GNU Toolchain.
$
~/gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu-gcc
-v
...
A-profile Architecture 10.3-2021.07 (arm-10.29)'
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.3.1 20210621 (GNU Toolchain for the A-profile
Architecture 10.3-2021.07 (arm-10.29))
I compared the generated code before and after my patch. This time, instead
of adding a "pages" variable, I reused the existing local variable mask for
the order shift.
I was able to confirm that it executes one less instruction.
(1) Before the clean-up
0000000000001bb4 <p2m_set_entry>:
while ( nr )
1bb4: b40005e2 cbz x2, 1c70 <p2m_set_entry+0xbc>
{
...
if ( rc )
1c1c: 350002e0 cbnz w0, 1c78 <p2m_set_entry+0xc4>
sgfn = gfn_add(sgfn, (1 << order));
1c20: 1ad32373 lsl w19, w27, w19 // <<< CSE works
1c24: 93407e73 sxtw x19, w19 // <<< well!
return _gfn(gfn_x(gfn) + i);
1c28: 8b1302d6 add x22, x22, x19
return _mfn(mfn_x(mfn) + i);
1c2c: 8b130281 add x1, x20, x19
1c30: b100069f cmn x20, #0x1
1c34: 9a941034 csel x20, x1, x20, ne // ne = any
while ( nr )
1c38: eb1302b5 subs x21, x21, x19
1c3c: 540001e0 b.eq 1c78 <p2m_set_entry+0xc4> // b.none
(2) After reusing the mask variable: mask = 1UL << order
The code shows that the sxtw x19, w19 instruction has disappeared: since
(1 << order) is a 32-bit int, its result had to be sign-extended to 64 bits,
while 1UL << order is already a 64-bit value.
0000000000001bb4 <p2m_set_entry>:
while ( nr )
1bb4: b40005c2 cbz x2, 1c6c <p2m_set_entry+0xb8>
{
...
if ( rc )
1c1c: 350002c0 cbnz w0, 1c74 <p2m_set_entry+0xc0>
mask = 1UL << order;
1c20: 9ad32373 lsl x19, x27, x19 // <<< only lsl
return _gfn(gfn_x(gfn) + i);
1c24: 8b1302d6 add x22, x22, x19
return _mfn(mfn_x(mfn) + i);
1c28: 8b130281 add x1, x20, x19
1c2c: b100069f cmn x20, #0x1
1c30: 9a941034 csel x20, x1, x20, ne // ne = any
while ( nr )
1c34: eb1302b5 subs x21, x21, x19
1c38: 540001e0 b.eq 1c74 <p2m_set_entry+0xc0> // b.none
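At the C level, the variant compiled for listing (2) looks roughly like this.
This is only a sketch of the work-in-progress follow-up (hunk context
abbreviated), not the final patch:

--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ ... @@ int p2m_set_entry(struct p2m_domain *p2m,
         if ( rc )
             break;
 
-        sgfn = gfn_add(sgfn, (1 << order));
+        /* Reuse the existing 'mask' local; 1UL keeps the shift 64-bit wide. */
+        mask = 1UL << order;
+
+        sgfn = gfn_add(sgfn, mask);
         if ( !mfn_eq(smfn, INVALID_MFN) )
-            smfn = mfn_add(smfn, (1 << order));
+            smfn = mfn_add(smfn, mask);
 
-        nr -= (1 << order);
+        nr -= mask;

Reassigning mask here should be safe, since it is only used earlier in the
iteration to pick the mapping order.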
I'll send you a follow-up patch I've been working on.
BR
Paran Lee
On 2022-04-25 1:17 AM, Julien Grall wrote:
> Hi,
>
> On 21/04/2022 16:17, Paran Lee wrote:
>> It doesn't seem necessary to do that calculation of order shift again.
>
> I think we need to weigh that against increasing the number of local
> variables that do pretty much the same.
>
> This pretty much comes down to a matter of taste here. IMHO, the original
> version is better but I see Stefano reviewed it so I will not argue
> against it.
>
> That said, given you already sent a few patches, can you explain why you
> are doing this? Is this optimization purpose? Is it clean-up?
>
>>
>> Signed-off-by: Paran Lee <p4ranlee@xxxxxxxxx>
>> ---
>> xen/arch/arm/p2m.c | 9 +++++----
>> 1 file changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>> index 1d1059f7d2..533afc830a 100644
>> --- a/xen/arch/arm/p2m.c
>> +++ b/xen/arch/arm/p2m.c
>> @@ -1092,7 +1092,7 @@ int p2m_set_entry(struct p2m_domain *p2m,
>> while ( nr )
>> {
>> unsigned long mask;
>> - unsigned long order;
>> + unsigned long order, pages;
>> /*
>> * Don't take into account the MFN when removing mapping (i.e
>> @@ -1118,11 +1118,12 @@ int p2m_set_entry(struct p2m_domain *p2m,
>> if ( rc )
>> break;
>> - sgfn = gfn_add(sgfn, (1 << order));
>> + pages = 1 << order;
>
> Please take the opportunity to switch to 1UL.
>
>> + sgfn = gfn_add(sgfn, pages);
>> if ( !mfn_eq(smfn, INVALID_MFN) )
>> - smfn = mfn_add(smfn, (1 << order));
>> + smfn = mfn_add(smfn, pages);
>> - nr -= (1 << order);
>> + nr -= pages;
>> }
>> return rc;
>
> Cheers,
>