
Re: [PATCH v10 02/14] xen: introduce generic non-atomic test_*bit()



On Fri, 2024-05-24 at 09:35 +0200, Jan Beulich wrote:
> On 24.05.2024 09:25, Oleksii K. wrote:
> > On Fri, 2024-05-24 at 08:48 +0200, Jan Beulich wrote:
> > > On 23.05.2024 18:40, Oleksii K. wrote:
> > > > On Thu, 2024-05-23 at 15:33 +0100, Julien Grall wrote:
> > > > > On 23/05/2024 15:11, Oleksii K. wrote:
> > > > > > On Thu, 2024-05-23 at 14:00 +0100, Julien Grall wrote:
> > > > > > > On 17/05/2024 14:54, Oleksii Kurochko wrote:
> > > > > > > > diff --git a/xen/arch/arm/arm64/livepatch.c b/xen/arch/arm/arm64/livepatch.c
> > > > > > > > index df2cebedde..4bc8ed9be5 100644
> > > > > > > > --- a/xen/arch/arm/arm64/livepatch.c
> > > > > > > > +++ b/xen/arch/arm/arm64/livepatch.c
> > > > > > > > @@ -10,7 +10,6 @@
> > > > > > > >    #include <xen/mm.h>
> > > > > > > >    #include <xen/vmap.h>
> > > > > > > >    
> > > > > > > > -#include <asm/bitops.h>
> > > > > > > 
> > > > > > > It is a bit unclear how this change is related to the patch.
> > > > > > > Can you explain in the commit message?
> > > > > > It probably isn't needed anymore. I will double check, and if
> > > > > > this change is not needed, I will just drop it in the next
> > > > > > patch version.
> > > > > > 
> > > > > > > 
> > > > > > > >    #include <asm/byteorder.h>
> > > > > > > >    #include <asm/insn.h>
> > > > > > > >    #include <asm/livepatch.h>
> > > > > > > > diff --git a/xen/arch/arm/include/asm/bitops.h b/xen/arch/arm/include/asm/bitops.h
> > > > > > > > index 5104334e48..8e16335e76 100644
> > > > > > > > --- a/xen/arch/arm/include/asm/bitops.h
> > > > > > > > +++ b/xen/arch/arm/include/asm/bitops.h
> > > > > > > > @@ -22,9 +22,6 @@
> > > > > > > >    #define __set_bit(n,p)            set_bit(n,p)
> > > > > > > >    #define __clear_bit(n,p)          clear_bit(n,p)
> > > > > > > >    
> > > > > > > > -#define BITOP_BITS_PER_WORD     32
> > > > > > > > -#define BITOP_MASK(nr)          (1UL << ((nr) % BITOP_BITS_PER_WORD))
> > > > > > > > -#define BITOP_WORD(nr)          ((nr) / BITOP_BITS_PER_WORD)
> > > > > > > >    #define BITS_PER_BYTE           8
> > > > > > > 
> > > > > > > OOI, any reason BITS_PER_BYTE has not been moved as well?
> > > > > > > I don't expect the value to change across arch.
> > > > > > I can move it to the generic header too in the next patch
> > > > > > version.
> > > > > > 
> > > > > > > 
> > > > > > > [...]
> > > > > > > 
> > > > > > > > diff --git a/xen/include/xen/bitops.h b/xen/include/xen/bitops.h
> > > > > > > > index f14ad0d33a..6eeeff0117 100644
> > > > > > > > --- a/xen/include/xen/bitops.h
> > > > > > > > +++ b/xen/include/xen/bitops.h
> > > > > > > > @@ -65,10 +65,141 @@ static inline int generic_flsl(unsigned long x)
> > > > > > > >     * scope
> > > > > > > >     */
> > > > > > > >    
> > > > > > > > +#define BITOP_BITS_PER_WORD 32
> > > > > > > > +typedef uint32_t bitop_uint_t;
> > > > > > > > +
> > > > > > > > +#define BITOP_MASK(nr)  ((bitop_uint_t)1 << ((nr) % BITOP_BITS_PER_WORD))
> > > > > > > > +
> > > > > > > > +#define BITOP_WORD(nr)  ((nr) / BITOP_BITS_PER_WORD)
> > > > > > > > +
> > > > > > > > +extern void __bitop_bad_size(void);
> > > > > > > > +
> > > > > > > > +#define bitop_bad_size(addr) (sizeof(*(addr)) < sizeof(bitop_uint_t))
> > > > > > > > +
> > > > > > > >    /* --------------------- Please tidy above here --------------------- */
> > > > > > > >    
> > > > > > > >    #include <asm/bitops.h>
> > > > > > > >    
> > > > > > > > +/**
> > > > > > > > + * generic__test_and_set_bit - Set a bit and return its old value
> > > > > > > > + * @nr: Bit to set
> > > > > > > > + * @addr: Address to count from
> > > > > > > > + *
> > > > > > > > + * This operation is non-atomic and can be reordered.
> > > > > > > > + * If two examples of this operation race, one can appear to succeed
> > > > > > > > + * but actually fail.  You must protect multiple accesses with a lock.
> > > > > > > > + */
> > > > > > > 
> > > > > > > Sorry for only mentioning this on v10. I think this comment
> > > > > > > should be duplicated (or moved to) on top of test_bit(),
> > > > > > > because this is what everyone will use. This will avoid the
> > > > > > > developer having to follow the function calls and only notice
> > > > > > > the x86 version, which says "This function is atomic and may
> > > > > > > not be reordered." and would be wrong for all the other
> > > > > > > arches.
> > > > > > It makes sense to add this comment on top of test_bit(), but
> > > > > > I am curious whether it is necessary to mention that for x86
> > > > > > arch_test_bit() "is atomic and may not be reordered":
> > > > > 
> > > > > I would say no, because any developer modifying common code
> > > > > can't rely on it.
> > > > > 
> > > > > > 
> > > > > >   * This operation is non-atomic and can be reordered. ( Exception: for
> > > > > >   * x86 arch_test_bit() is atomic and may not be reordered )
> > > > > >   * If two examples of this operation race, one can appear to succeed
> > > > > >   * but actually fail.  You must protect multiple accesses with a lock.
> > > > > >   */
> > > > > > 
> > > > > > > 
> > > > > > > > +static always_inline bool
> > > > > > > > +generic__test_and_set_bit(int nr, volatile void *addr)
> > > > > > > > +{
> > > > > > > > +    bitop_uint_t mask = BITOP_MASK(nr);
> > > > > > > > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
> > > > > > > > +    bitop_uint_t old = *p;
> > > > > > > > +
> > > > > > > > +    *p = old | mask;
> > > > > > > > +    return (old & mask);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +/**
> > > > > > > > + * generic__test_and_clear_bit - Clear a bit and return its old value
> > > > > > > > + * @nr: Bit to clear
> > > > > > > > + * @addr: Address to count from
> > > > > > > > + *
> > > > > > > > + * This operation is non-atomic and can be reordered.
> > > > > > > > + * If two examples of this operation race, one can appear to succeed
> > > > > > > > + * but actually fail.  You must protect multiple accesses with a lock.
> > > > > > > > + */
> > > > > > > 
> > > > > > > Same applies here and ...
> > > > > > > 
> > > > > > > > +static always_inline bool
> > > > > > > > +generic__test_and_clear_bit(int nr, volatile void *addr)
> > > > > > > > +{
> > > > > > > > +    bitop_uint_t mask = BITOP_MASK(nr);
> > > > > > > > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
> > > > > > > > +    bitop_uint_t old = *p;
> > > > > > > > +
> > > > > > > > +    *p = old & ~mask;
> > > > > > > > +    return (old & mask);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +/* WARNING: non atomic and it can be reordered! */
> > > > > > > 
> > > > > > > ... here.
> > > > > > > 
> > > > > > > > +static always_inline bool
> > > > > > > > +generic__test_and_change_bit(int nr, volatile void *addr)
> > > > > > > > +{
> > > > > > > > +    bitop_uint_t mask = BITOP_MASK(nr);
> > > > > > > > +    volatile bitop_uint_t *p = (volatile bitop_uint_t *)addr + BITOP_WORD(nr);
> > > > > > > > +    bitop_uint_t old = *p;
> > > > > > > > +
> > > > > > > > +    *p = old ^ mask;
> > > > > > > > +    return (old & mask);
> > > > > > > > +}
> > > > > > > > +/**
> > > > > > > > + * generic_test_bit - Determine whether a bit is set
> > > > > > > > + * @nr: bit number to test
> > > > > > > > + * @addr: Address to start counting from
> > > > > > > > + */
> > > > > > > > +static always_inline bool generic_test_bit(int nr, const volatile void *addr)
> > > > > > > > +{
> > > > > > > > +    bitop_uint_t mask = BITOP_MASK(nr);
> > > > > > > > +    const volatile bitop_uint_t *p =
> > > > > > > > +                        (const volatile bitop_uint_t *)addr + BITOP_WORD(nr);
> > > > > > > > +
> > > > > > > > +    return (*p & mask);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static always_inline bool
> > > > > > > > +__test_and_set_bit(int nr, volatile void *addr)
> > > > > > > > +{
> > > > > > > > +#ifndef arch__test_and_set_bit
> > > > > > > > +#define arch__test_and_set_bit generic__test_and_set_bit
> > > > > > > > +#endif
> > > > > > > > +
> > > > > > > > +    return arch__test_and_set_bit(nr, addr);
> > > > > > > > +}
> > > > > > > 
> > > > > > > NIT: It is a bit too late to change this one. But I have to
> > > > > > > admit, I don't understand the purpose of the static inline
> > > > > > > when you could have simply called...
> > > > > > > 
> > > > > > > > +#define __test_and_set_bit(nr, addr) ({             \
> > > > > > > > +    if ( bitop_bad_size(addr) ) __bitop_bad_size(); \
> > > > > > > > +    __test_and_set_bit(nr, addr);                   \
> > > > > > > 
> > > > > > > ... __arch__test_and_set_bit here.
> > > > > > I just wanted to be in sync with other non-atomic test*_bit()
> > > > > > operations and Andrew's patch series
> > > > > > ( xen/bitops: Reduce the mess, starting with ffs() ).
> > > > > 
> > > > > I looked at Andrew's series and I can't seem to find an example
> > > > > where he uses both the macro + static inline. He seems to only
> > > > > use a static inline. Do you have any pointer?
> > > > > 
> > > > > But by any chance are you referring to the pattern on x86? If so,
> > > > > I would really like to understand what the benefit is of
> > > > > introducing a one-liner static inline which is "shadowed" by a
> > > > > macro...
> > > > Yes, I was referring to the x86 pattern.
> > > > 
> > > > I tried to find the reason in the commit but couldn't:
> > > > https://lists.xenproject.org/archives/html/xen-changelog/2008-03/msg00097.html
> > > > 
> > > > For some reason, I also couldn't find the mailing list thread for
> > > > this.
> > > > 
> > > > Perhaps Jan could help here; based on the commit message he might
> > > > have a suggestion.
> > > 
> > > There's a lot of apparently unrelated context left, so I'm trying to
> > > guess at what the actual question is, from the old change you're
> > > pointing at: With
> > > 
> > > #define __test_and_set_bit(nr, addr) ({             \
> > >    if ( bitop_bad_size(addr) ) __bitop_bad_size(); \
> > >    __test_and_set_bit(nr, addr);                   \
> > > 
> > > why do we have that wrapping macro? If that's indeed the question,
> > > then may I suggest you consider what would happen to the sizeof() in
> > > bitop_bad_size() if the invocation was moved to the inline function,
> > > taking pointer-to-void?
> > > 
> > > However, I can't relate that old change to the question I think
> > > Julien raised (invoking __test_and_set_bit() vs
> > > arch__test_and_set_bit()), so perhaps I'm missing something.
> > If I understand the question correctly, then the question is why we
> > need the static inline function. Can't we just do the following:
> > 
> > #ifndef arch_test_bit
> > #define arch_test_bit generic_test_bit
> > #endif
> > 
> > #define test_bit(nr, addr) ({                           \
> >     if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
> >     arch_test_bit(nr, addr);                            \
> > })
> 
> I think we can. But then how did that old commit come into play?
The old commit introduced this approach, where a 'static inline
function' is called within a macro, but I only now realized that it was
done solely because of the check "if (bitop_bad_size(addr))
__bitop_bad_size();".
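
To spell that out (a trimmed sketch of how I understand Jan's sizeof()
point, not the exact Xen code): the size check only works at the macro
level, because the macro still sees the caller's pointer type, whereas
inside the inline function the argument has already decayed to a
pointer-to-void. Using test_bit() as the example:

#include <stdbool.h>
#include <stdint.h>

typedef uint32_t bitop_uint_t;

/* Never defined anywhere: if the compiler cannot prove the call dead,
   the build fails at link time, reporting the unsupported operand size. */
extern void __bitop_bad_size(void);

/* Evaluated inside the macro, so sizeof(*(addr)) still reflects the
   type of the caller's operand. */
#define bitop_bad_size(addr) (sizeof(*(addr)) < sizeof(bitop_uint_t))

static inline bool generic_test_bit(int nr, const volatile void *addr)
{
    /* Here the parameter is already 'const volatile void *', so
       sizeof(*addr) can no longer tell us the operand's real size. */
    const volatile bitop_uint_t *p =
        (const volatile bitop_uint_t *)addr + nr / 32;

    return (*p & ((bitop_uint_t)1 << (nr % 32))) != 0;
}

/* GNU C statement expression, as in the patch. */
#define test_bit(nr, addr) ({                           \
    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
    generic_test_bit(nr, addr);                         \
})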

As far as I understand, Julien doesn't insist on dropping the 'static
inline function' and using arch_test_bit() directly in the test_bit()
macro, so I prefer to keep the option implemented in this patch.
However, if anyone objects to this approach and thinks it would be
better to drop the 'static inline functions' and call arch_*() directly
in the macros, I am okay with reworking it today and sending a new
patch version that takes these changes into account.
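
As a side note (my reading of the pattern, not something spelled out in
the thread): the same-named macro and static inline can coexist because
function-like macros are not expanded recursively, so the call inside
the macro body resolves to the inline function rather than to the macro
again. Abridged from the hunks quoted above, with that spelled out in
comments:

/* Dispatch to the arch-specific helper, falling back to the generic one. */
static always_inline bool
__test_and_set_bit(int nr, volatile void *addr)
{
#ifndef arch__test_and_set_bit
#define arch__test_and_set_bit generic__test_and_set_bit
#endif

    return arch__test_and_set_bit(nr, addr);
}

/* Shadows the function of the same name for all callers.  The inner
   __test_and_set_bit() is not re-expanded (function-like macros do not
   recurse), so after the size check it calls the static inline above. */
#define __test_and_set_bit(nr, addr) ({                 \
    if ( bitop_bad_size(addr) ) __bitop_bad_size();     \
    __test_and_set_bit(nr, addr);                       \
})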

Anyway, I will send a new version of this patch today, as Julien asked
me to add comments on top of the functions generic_test_bit() and
generic__test_and_change_bit(). Additionally, "#define BITS_PER_BYTE 8"
will be moved to xen/bitops.h, as it seems to always be the same for
every architecture.

~ Oleksii



 

