Re: [PATCH v6 08/20] xen/riscv: introduce cmpxchg.h
On 15.03.2024 19:06, Oleksii Kurochko wrote:
> The header was taken from Linux kernl 6.4.0-rc1.
>
> Addionally, were updated:
> * add emulation of {cmp}xchg for 1/2 byte types using 32-bit atomic
> access.
> * replace tabs with spaces
> * replace __* variale with *__
> * introduce generic version of xchg_* and cmpxchg_*.
> * drop {cmp}xchg{release,relaxed,acquire} as Xen doesn't use them
With this, ...
> * drop barries and use instruction suffixices instead ( .aq, .rl, .aqrl )
>
> Implementation of 4- and 8-byte cases were updated according to the spec:
> ```
> ....
> Linux Construct RVWMO AMO Mapping
> atomic <op> relaxed amo<op>.{w|d}
> atomic <op> acquire amo<op>.{w|d}.aq
> atomic <op> release amo<op>.{w|d}.rl
> atomic <op> amo<op>.{w|d}.aqrl
> Linux Construct RVWMO LR/SC Mapping
> atomic <op> relaxed loop: lr.{w|d}; <op>; sc.{w|d}; bnez loop
> atomic <op> acquire loop: lr.{w|d}.aq; <op>; sc.{w|d}; bnez loop
> atomic <op> release loop: lr.{w|d}; <op>; sc.{w|d}.aqrl∗ ; bnez loop OR
> fence.tso; loop: lr.{w|d}; <op>; sc.{w|d}∗ ; bnez loop
> atomic <op> loop: lr.{w|d}.aq; <op>; sc.{w|d}.aqrl; bnez loop
>
> Table A.5: Mappings from Linux memory primitives to RISC-V primitives
>
> ```
... I consider quoting this table in full, without any further remarks,
confusing: three of the lines in each half are now inapplicable, aiui.
Further, what are the two * telling us? Quite likely they aren't there just
by accident.
Finally, why sc.{w|d}.aqrl when in principle one would expect just
sc.{w|d}.rl?
> --- /dev/null
> +++ b/xen/arch/riscv/include/asm/cmpxchg.h
> @@ -0,0 +1,209 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright (C) 2014 Regents of the University of California */
> +
> +#ifndef _ASM_RISCV_CMPXCHG_H
> +#define _ASM_RISCV_CMPXCHG_H
> +
> +#include <xen/compiler.h>
> +#include <xen/lib.h>
> +
> +#include <asm/fence.h>
> +#include <asm/io.h>
> +#include <asm/system.h>
> +
> +#define __amoswap_generic(ptr, new, ret, sfx) \
As before / elsewhere: Is there a strong need for two leading underscores
here? Using just one would already be standard compliant afaict.
> +({ \
> + asm volatile ( \
> + " amoswap" sfx " %0, %2, %1" \
> + : "=r" (ret), "+A" (*ptr) \
> + : "r" (new) \
> + : "memory" ); \
> +})
This doesn't need the ({ }) (anymore?):
#define __amoswap_generic(ptr, new, ret, sfx) \
asm volatile ( \
" amoswap" sfx " %0, %2, %1" \
: "=r" (ret), "+A" (*(ptr)) \
: "r" (new) \
: "memory" )
(note also the added parentheses).
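(This way its use sites further down also remain plain statements, e.g.

    __amoswap_generic((volatile uint32_t *)ptr, new, ret, ".w.aqrl");

so nothing depends on the construct yielding a value.)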
> +/*
> + * For LR and SC, the A extension requires that the address held in rs1 be
> + * naturally aligned to the size of the operand (i.e., eight-byte aligned
> + * for 64-bit words and four-byte aligned for 32-bit words).
> + * If the address is not naturally aligned, an address-misaligned exception
> + * or an access-fault exception will be generated.
> + *
> + * Thereby:
> + * - for 1-byte xchg access the containing word by clearing low two bits
> + * - for 2-byte xchg ccess the containing word by clearing bit 1.
Nit: "access"
> + * If resulting 4-byte access is still misalgined, it will fault just as
> + * non-emulated 4-byte access would.
> + */
> +#define emulate_xchg_1_2(ptr, new, lr_sfx, sc_sfx) \
> +({ \
> + uint32_t *aligned_ptr = (uint32_t *)((unsigned long)ptr & ~(0x4 - sizeof(*(ptr)))); \
> + unsigned int new_val_pos = ((unsigned long)(ptr) & (0x4 - sizeof(*(ptr)))) * BITS_PER_BYTE; \
You parenthesize ptr here correctly, but not in the line above.
Instead of "_pos" in the name, maybe better "_bit"?
Finally, here and elsewhere, please limit line length to 80 chars. (Omitting
the 0x here would help a little, but not quite enough. Question is whether
these wouldn't better be sizeof(*aligned_ptr) anyway.)
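Perhaps something along these lines then (rough sketch only, untested, merely
folding in the remarks above):

    uint32_t *aligned_ptr =                                               \
        (uint32_t *)((unsigned long)(ptr) &                               \
                     ~(sizeof(*aligned_ptr) - sizeof(*(ptr))));           \
    unsigned int new_val_bit =                                            \
        ((unsigned long)(ptr) & (sizeof(*aligned_ptr) - sizeof(*(ptr))))  \
        * BITS_PER_BYTE;                                                  \

(sizeof doesn't evaluate its operand, so using aligned_ptr in its own
initializer is fine.)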
> + unsigned long mask = GENMASK(((sizeof(*(ptr))) * BITS_PER_BYTE) - 1, 0) << new_val_pos; \
> + unsigned int new_ = new << new_val_pos; \
Similarly new wants parenthesizing here.
> + unsigned int old; \
> + unsigned int scratch; \
> + \
> + asm volatile ( \
> + "0: lr.w" lr_sfx " %[old], %[aligned_ptr]\n" \
> + " and %[scratch], %[old], %z[nmask]\n" \
> + " or %[scratch], %[scratch], %z[new_]\n" \
> + " sc.w" sc_sfx " %[scratch], %[scratch], %[aligned_ptr]\n" \
> + " bnez %[scratch], 0b\n" \
> + : [old] "=&r" (old), [scratch] "=&r" (scratch), [aligned_ptr] "+A" (*aligned_ptr) \
While aligned_ptr is likely helpful as the variable name, for the operand
name just ptr would certainly do?
> + : [new_] "rJ" (new_), [nmask] "rJ" (~mask) \
Neither mask nor ~mask can be 0. Hence J here and the z modifier above
look pointless. (new_, otoh, can be 0, so allowing x0 to be used in that
case is certainly desirable.)
As to using ~mask here: Now that we look to have settled on requiring
Zbb, you could use andn instead of and, thus allowing the same register
to be used in the asm() and ...
> + : "memory" ); \
> + \
> + (__typeof__(*(ptr)))((old & mask) >> new_val_pos); \
... for this calculation.
> +})
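FWIW, with andn the asm() might then look like this (rough sketch only,
untested, and of course depending on Zbb being required):

        asm volatile (                                                    \
            "0: lr.w" lr_sfx " %[old], %[ptr]\n"                          \
            "   andn %[scratch], %[old], %[mask]\n"                       \
            "   or   %[scratch], %[scratch], %z[new_]\n"                  \
            "   sc.w" sc_sfx " %[scratch], %[scratch], %[ptr]\n"          \
            "   bnez %[scratch], 0b\n"                                    \
            : [old] "=&r" (old), [scratch] "=&r" (scratch),               \
              [ptr] "+A" (*aligned_ptr)                                   \
            : [new_] "rJ" (new_), [mask] "r" (mask)                       \
            : "memory" );                                                 \
                                                                          \
        (__typeof__(*(ptr)))((old & mask) >> new_val_pos);                \

i.e. mask is then the only mask-like value needed, both inside the asm()
and for the final extraction.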
> +
> +static always_inline unsigned long __xchg(volatile void *ptr, unsigned long new, int size)
> +{
> + unsigned long ret;
> +
> + switch ( size )
> + {
> + case 1:
> + ret = emulate_xchg_1_2((volatile uint8_t *)ptr, new, ".aq", ".aqrl");
> + break;
> + case 2:
> + ret = emulate_xchg_1_2((volatile uint16_t *)ptr, new, ".aq", ".aqrl");
> + break;
> + case 4:
> + __amoswap_generic((volatile uint32_t *)ptr, new, ret, ".w.aqrl");
> + break;
> +#ifndef CONFIG_32BIT
There's no 32BIT Kconfig symbol; all we have is a 64BIT one (see the example
further down).
> + case 8:
> + __amoswap_generic((volatile uint64_t *)ptr, new, ret, ".d.aqrl");
> + break;
> +#endif
> + default:
> + STATIC_ASSERT_UNREACHABLE();
> + }
> +
> + return ret;
> +}
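(I.e. presumably simply

    #ifdef CONFIG_64BIT
        case 8:
            __amoswap_generic((volatile uint64_t *)ptr, new, ret, ".d.aqrl");
            break;
    #endif

here.)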
> +
> +#define xchg(ptr, x) \
> +({ \
> + __typeof__(*(ptr)) n_ = (x); \
> + (__typeof__(*(ptr))) \
> + __xchg((ptr), (unsigned long)(n_), sizeof(*(ptr))); \
Nit: While excess parentheses "only" harm readability, they would
nevertheless better be omitted (here: the first argument passed).
> +})
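I.e. just

    #define xchg(ptr, x)                                                \
    ({                                                                  \
        __typeof__(*(ptr)) n_ = (x);                                    \
        (__typeof__(*(ptr)))                                            \
            __xchg(ptr, (unsigned long)(n_), sizeof(*(ptr)));           \
    })

where only the first argument's parentheses go away.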
> +
> +#define __generic_cmpxchg(ptr, old, new, ret, lr_sfx, sc_sfx) \
> + ({ \
> + register unsigned int rc; \
Nit: We don't normally use "register", unless accompanied by asm() tying
a variable to a specific one.
> + __typeof__(*(ptr)) old__ = (__typeof__(*(ptr)))(old); \
> + __typeof__(*(ptr)) new__ = (__typeof__(*(ptr)))(new); \
The casts aren't very nice to have here; I take it they're needed for
cmpxchg_ptr() to compile?
> + asm volatile( \
Nit: Missing blank once again. Would be really nice if you could go
through and sort this uniformly for the series.
> + "0: lr" lr_sfx " %0, %2\n" \
> + " bne %0, %z3, 1f\n" \
> + " sc" sc_sfx " %1, %z4, %2\n" \
> + " bnez %1, 0b\n" \
> + "1:\n" \
> + : "=&r" (ret), "=&r" (rc), "+A" (*ptr) \
> + : "rJ" (old__), "rJ" (new__) \
Please could I talk you into using named operands here, too?
Also ptr here is lacking parentheses again.
> + : "memory"); \
And yet another missing blank.
> + })
At the use site this construct having a normal return value (rather
than ret being passed in) would overall look more natural.
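E.g. (rough sketch only, untested; ret__ and rc are merely illustrative local
names, and the casts are kept for now):

    #define __generic_cmpxchg(ptr, old, new, lr_sfx, sc_sfx)             \
    ({                                                                   \
        unsigned int rc;                                                 \
        __typeof__(*(ptr)) ret__;                                        \
        __typeof__(*(ptr)) old__ = (__typeof__(*(ptr)))(old);            \
        __typeof__(*(ptr)) new__ = (__typeof__(*(ptr)))(new);            \
                                                                         \
        asm volatile (                                                   \
            "0: lr" lr_sfx " %[ret], %[ptr]\n"                           \
            "   bne  %[ret], %z[old], 1f\n"                              \
            "   sc" sc_sfx " %[rc], %z[new], %[ptr]\n"                   \
            "   bnez %[rc], 0b\n"                                        \
            "1:\n"                                                       \
            : [ret] "=&r" (ret__), [rc] "=&r" (rc), [ptr] "+A" (*(ptr))  \
            : [old] "rJ" (old__), [new] "rJ" (new__)                     \
            : "memory" );                                                \
                                                                         \
        ret__;                                                           \
    })

with the caller then simply doing ret = __generic_cmpxchg(...);.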
> +/*
> + * For LR and SC, the A extension requires that the address held in rs1 be
> + * naturally aligned to the size of the operand (i.e., eight-byte aligned
> + * for 64-bit words and four-byte aligned for 32-bit words).
> + * If the address is not naturally aligned, an address-misaligned exception
> + * or an access-fault exception will be generated.
> + *
> + * Thereby:
> + * - for 1-byte xchg access the containing word by clearing low two bits
> + * - for 2-byte xchg ccess the containing word by clearing first bit.
> + *
> + * If resulting 4-byte access is still misalgined, it will fault just as
> + * non-emulated 4-byte access would.
> + *
> + * old_val was casted to unsigned long for cmpxchgptr()
> + */
> +#define emulate_cmpxchg_1_2(ptr, old, new, lr_sfx, sc_sfx) \
> +({ \
> + uint32_t *aligned_ptr = (uint32_t *)((unsigned long)ptr & ~(0x4 - sizeof(*(ptr)))); \
> + uint8_t new_val_pos = ((unsigned long)(ptr) & (0x4 - sizeof(*(ptr)))) * BITS_PER_BYTE; \
> + unsigned long mask = GENMASK(((sizeof(*(ptr))) * BITS_PER_BYTE) - 1, 0) << new_val_pos; \
> + unsigned int old_ = old << new_val_pos; \
> + unsigned int new_ = new << new_val_pos; \
> + unsigned int old_val; \
> + unsigned int scratch; \
> + \
> + __asm__ __volatile__ ( \
> + "0: lr.w" lr_sfx " %[scratch], %[aligned_ptr]\n" \
> + " and %[old_val], %[scratch], %z[mask]\n" \
> + " bne %[old_val], %z[old_], 1f\n" \
> + " xor %[scratch], %[old_val], %[scratch]\n" \
To be honest I was hoping this line would come with a brief comment (see
further down for what I have in mind).
> + " or %[scratch], %[scratch], %z[new_]\n" \
> + " sc.w" sc_sfx " %[scratch], %[scratch], %[aligned_ptr]\n" \
> + " bnez %[scratch], 0b\n" \
> + "1:\n" \
> + : [old_val] "=&r" (old_val), [scratch] "=&r" (scratch), [aligned_ptr] "+A" (*aligned_ptr) \
> + : [old_] "rJ" (old_), [new_] "rJ" (new_), \
> + [mask] "rJ" (mask) \
> + : "memory" ); \
> + \
> + (__typeof__(*(ptr)))((unsigned long)old_val >> new_val_pos); \
> +})
A few of the comments for emulate_xchg_1_2() apply here as well.
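As to the brief comment I was asking for, something along these lines would
do imo (leaving aside the other adjustments):

        "   bne %[old_val], %z[old_], 1f\n"                              \
        /* scratch = the full word with the field of interest cleared */ \
        "   xor %[scratch], %[old_val], %[scratch]\n"                    \
        "   or %[scratch], %[scratch], %z[new_]\n"                       \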
> +/*
> + * Atomic compare and exchange. Compare OLD with MEM, if identical,
> + * store NEW in MEM. Return the initial value in MEM. Success is
> + * indicated by comparing RETURN with OLD.
> + */
> +static always_inline unsigned long __cmpxchg(volatile void *ptr,
> + unsigned long old,
> + unsigned long new,
> + int size)
Nit: Inappropriate indentation.
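(Presumably this wants to be

static always_inline unsigned long __cmpxchg(volatile void *ptr,
                                             unsigned long old,
                                             unsigned long new,
                                             int size)

i.e. the continuation lines aligned with the first parameter.)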
Jan