Re: [Xen-devel] [PATCHv5 1/3] rwlock: Add per-cpu reader-writer lock infrastructure
On 19/01/16 10:29, Malcolm Crossley wrote:
> On 11/01/16 15:06, Malcolm Crossley wrote:
>> On 22/12/15 11:56, George Dunlap wrote:
>>> On 18/12/15 16:08, Malcolm Crossley wrote:
>>>> <snip>
>>>> +
>>>> +#ifndef NDEBUG
>>>> +#define PERCPU_RW_LOCK_UNLOCKED(owner) { RW_LOCK_UNLOCKED, 0, owner }
>>>> +static inline void _percpu_rwlock_owner_check(percpu_rwlock_t **per_cpudata,
>>>> +                                              percpu_rwlock_t *percpu_rwlock)
>>>> +{
>>>> +    ASSERT(per_cpudata == percpu_rwlock->percpu_owner);
>>>> +}
>>>> +#else
>>>> +#define PERCPU_RW_LOCK_UNLOCKED(owner) { RW_LOCK_UNLOCKED, 0 }
>>>> +#define _percpu_rwlock_owner_check(data, lock) ((void)0)
>>>> +#endif
>>>> +
>>>> +#define DEFINE_PERCPU_RWLOCK_RESOURCE(l, owner) \
>>>> +    percpu_rwlock_t l = PERCPU_RW_LOCK_UNLOCKED(&get_per_cpu_var(owner))
>>>> +#define percpu_rwlock_resource_init(l, owner) \
>>>> +    (*(l) = (percpu_rwlock_t)PERCPU_RW_LOCK_UNLOCKED(&get_per_cpu_var(owner)))
>>>> +
>>>> +static inline void _percpu_read_lock(percpu_rwlock_t **per_cpudata,
>>>> +                                     percpu_rwlock_t *percpu_rwlock)
>>>
>>> Is there a particular reason you chose to only use the "owner" value in
>>> the struct to verify that the "per_cpudata" argument passed matched the
>>> one you expected, rather than just getting rid of the "per_cpudata"
>>> argument altogether and always using the pointer in the struct?
>>
>> Initially I was aiming to add percpu aspects to the rwlock without
>> increasing the size of the rwlock structure itself; this was to keep data
>> cache usage and memory allocations the same.
>> It became clear that having a global writer_activating barrier would cause
>> the read_lock to enter the slow path far too often, so I put the
>> writer_activating variable in the percpu_rwlock_t; as writer_activating is
>> just a bool, the additional data overhead should be small. Always adding an
>> 8-byte pointer may add a lot of overhead to data structures containing
>> multiple rwlocks and thus cause additional allocation overhead.
>>>
>>> (i.e., _percpu_read_lock(percpu_rwlock_t *percpu_rwlock) { ...
>>> per_cpudata = percpu_rwlock->percpu_owner; ... })
>>>
>>> I'm not an expert in this sort of micro-optimization, but it seems like
>>> you're trading off storing a pointer in your rwlock struct for storing a
>>> pointer at every call site. Since you have to read writer_activating
>>> for every lock or unlock anyway,
>>
>> writer_activating is not read on the read_unlock path. As these are
>> rwlocks, I'm assuming the read lock/unlock paths are the more critical ones
>> for performance, so I'd prefer not to read the percpu_rwlock structure when
>> it isn't required (i.e. on the read unlock path).
>> Furthermore, the single byte for the writer_activating variable is likely
>> to have been read into cache by accesses to other parts of the data
>> structure near the percpu_rwlock_t. If we add an additional 8 bytes to the
>> percpu_rwlock_t then this may not happen, and it may also change the cache
>> line alignment as well.
>>
>>> it doesn't seem like you'd actually be
>>> saving that many memory fetches; but having only one copy in the cache,
>>> rather than one copy per call site, would on the whole reduce both the
>>> cache footprint and the total memory used (if only by a few bytes).
>>
>> If you put the owner pointer in the percpu_rwlock_t then wouldn't you have
>> a copy per instance of percpu_rwlock_t? Surely this would use more cache
>> than the handful of call-site references to a global variable.
>>
>>>
>>> It also makes the code cleaner to have only one argument, rather than
>>> two which must match; but since in all the places you use it you end up
>>> using a wrapper to give you a single argument anyway, I don't think that
>>> matters in this case. (i.e., if there's a good reason for having it at
>>> the call site instead of in the struct, I'm fine with this approach.)
>>
>> If you agree with my reasoning that the cache overhead and the performance
>> of the read unlock path are better when passing the percpu_data as an
>> argument, then I propose we keep the patches as is.
>>
> Ping? I believe this is the last point of discussion before the patches can
> go in.

Sorry -- I did skim this, and intended to give it another once-over last
week, but some other stuff came up. I should get a chance to take a look at
it sometime this week.

 -George
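To make the trade-off being discussed concrete, here is a minimal standalone
sketch of the two read-unlock conventions. The struct layout, the per-CPU
stand-in variable this_cpu_slot, and the function names read_unlock_arg /
read_unlock_owner are all hypothetical illustrations, not the actual Xen
patch or its helpers:

    #include <stddef.h>
    #include <stdbool.h>

    /* Stand-in for the lock structure; only the fields relevant to the
     * discussion are shown. percpu_owner is the ~8 extra bytes per lock
     * instance that the owner-pointer alternative would require. */
    typedef struct percpu_rwlock {
        bool writer_activating;
        struct percpu_rwlock **percpu_owner;
    } percpu_rwlock_t;

    /* Stand-in for the per-CPU variable recording which percpu rwlock
     * this CPU currently holds for reading. */
    static percpu_rwlock_t *this_cpu_slot;

    /* Patch's convention: the caller names the per-CPU data, so the
     * read-unlock path never touches the percpu_rwlock_t at all. */
    static inline void read_unlock_arg(percpu_rwlock_t **per_cpudata)
    {
        *per_cpudata = NULL;
    }

    /* Alternative from the review: derive the per-CPU data from an owner
     * pointer stored in the lock, which means reading the lock structure
     * (and pulling in its cache line) on every read unlock. */
    static inline void read_unlock_owner(percpu_rwlock_t *percpu_rwlock)
    {
        *percpu_rwlock->percpu_owner = NULL;
    }

    int main(void)
    {
        percpu_rwlock_t lock = { false, &this_cpu_slot };

        this_cpu_slot = &lock;            /* pretend we took the read lock */
        read_unlock_arg(&this_cpu_slot);  /* argument-passing convention   */

        this_cpu_slot = &lock;
        read_unlock_owner(&lock);         /* owner-pointer alternative     */

        return 0;
    }

In the first variant the unlock path only writes the per-CPU slot; in the
second it must also read percpu_owner out of the percpu_rwlock_t, which is
the extra structure access (and the extra per-instance pointer) the thread
weighs against having one fewer argument at each call site.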