[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH v4 12/30] xen/riscv: introduce cmpxchg.h
On 26.02.2024 12:18, Oleksii wrote: > On Mon, 2024-02-26 at 10:45 +0100, Jan Beulich wrote: >> On 23.02.2024 13:23, Oleksii wrote: >>>> >>>>>>> As 1- and 2-byte cases are emulated I decided that is not >>>>>>> to >>>>>>> provide >>>>>>> sfx argument for emulation macros as it will not have to >>>>>>> much >>>>>>> affect on >>>>>>> emulated types and just consume more performance on acquire >>>>>>> and >>>>>>> release >>>>>>> version of sc/ld instructions. >>>>>> >>>>>> Question is whether the common case (4- and 8-byte accesses) >>>>>> shouldn't >>>>>> be valued higher, with 1- and 2-byte emulation being there >>>>>> just >>>>>> to >>>>>> allow things to not break altogether. >>>>> If I understand you correctly, it would make sense to add the >>>>> 'sfx' >>>>> argument for the 1/2-byte access case, ensuring that all >>>>> options >>>>> are >>>>> available for 1/2-byte access case as well. >>>> >>>> That's one of the possibilities. As said, I'm not overly worried >>>> about >>>> the emulated cases. For the initial implementation I'd recommend >>>> going >>>> with what is easiest there, yielding the best possible result for >>>> the >>>> 4- and 8-byte cases. If later it turns out repeated >>>> acquire/release >>>> accesses are a problem in the emulation loop, things can be >>>> changed >>>> to explicit barriers, without touching the 4- and 8-byte cases. >>> I am confused then a little bit if emulated case is not an issue. >>> >>> For 4- and 8-byte cases for xchg .aqrl is used, for relaxed and >>> aqcuire >>> version of xchg barries are used. >>> >>> The similar is done for cmpxchg. >>> >>> If something will be needed to change in emulation loop it won't >>> require to change 4- and 8-byte cases. >> >> I'm afraid I don't understand your reply. > IIUC, emulated cases it is implemented correctly in terms of usage > barriers. And it also OK not to use sfx for lr/sc instructions and use > only barriers. > > For 4- and 8-byte cases are used sfx + barrier depending on the > specific case ( relaxed, acquire, release, generic xchg/cmpxchg ). > What also looks to me correct. But you suggested to provide the best > possible result for 4- and 8-byte cases. > > So I don't understand what the best possible result is as the current > one usage of __{cmp}xchg_generic for each specific case ( relaxed, > acquire, release, generic xchg/cmpxchg ) looks correct to me: > xchg -> (..., ".aqrl", "", "") just suffix .aqrl suffix without > barriers. > xchg_release -> (..., "", RISCV_RELEASE_BARRIER, "" ) use only release > barrier > xchg_acquire -> (..., "", "", RISCV_ACQUIRE_BARRIER ), only acquire > barrier > xchg_relaxed -> (..., "", "", "") - no barries, no sfx So first: While explicit barriers are technically okay, I don't follow why you insist on using them when you can achieve the same by suitably tagging the actual insn doing the exchange. Then second: It's somewhat hard for me to see the final effect on the emulation paths without you actually having done the switch. Maybe no special handling is necessary there anymore then. And as said, it may actually be acceptable for the emulation paths to "only" be correct, but not be ideal in terms of performance. After all, if you use the normal 4-byte primitive in there, more (non-explicit) barriers than needed would occur if the involved loop has to take more than one iteration. Which could (but imo doesn't need to be) avoided by using a more relaxed 4-byte primitive there and an explicit barrier outside of the loop. Jan
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |