Re: [Xen-devel] [PATCH v5 3/8] qspinlock, x86: Add x86 specific optimization for 2 contending tasks
Peter, I tried to implement the generic queue-code exchange using
cmpxchg, as you suggested. However, when I gathered the performance
data, the code performed worse than I expected at higher contention
levels. Below are the execution times from the benchmark tool that I
sent you:

                                [xchg]       [cmpxchg]
  # of tasks   Ticket lock   Queue lock   Queue lock
  ----------   -----------   ----------   ----------
       1            135          135          135
       2            732         1315         1102
       3           1827         2372         2681
       4           2689         2934         5392
       5           3736         3658         7696
       6           4942         4434         9876
       7           6304         5176        11901
       8           7736         5955        14551

Below is the code that I used:

static inline u32 queue_code_xchg(struct qspinlock *lock, u32 *ocode,
                                  u32 ncode)
{
        while (true) {
                u32 qlcode = atomic_read(&lock->qlcode);

                if (qlcode == 0) {
                        /*
                         * Lock is free, try to get it
                         */
                        if (atomic_cmpxchg(&lock->qlcode, 0,
                                           _QSPINLOCK_LOCKED) == 0)
                                return 1;
                } else if (qlcode & _QSPINLOCK_LOCKED) {
                        /*
                         * Lock is held, try to swap in our queue code
                         * while keeping the lock bit set
                         */
                        *ocode = atomic_cmpxchg(&lock->qlcode, qlcode,
                                                ncode | _QSPINLOCK_LOCKED);
                        if (*ocode == qlcode) {
                                /* Clear lock bit before return */
                                *ocode &= ~_QSPINLOCK_LOCKED;
                                return 0;
                        }
                }
                /*
                 * Wait if atomic_cmpxchg() fails or the lock is
                 * temporarily free.
                 */
                arch_mutex_cpu_relax();
        }
}

My cmpxchg code is not optimal, and I could probably tune it to perform
better. Given the trend I was seeing, however, I think I will keep the
current xchg code, but I will package it in an inline function (a rough
sketch of what I mean is below).

-Longman
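For illustration only, here is a minimal sketch of what packaging the
xchg path as an inline function could look like. This is not the actual
patch code: it assumes an x86 (little-endian) lock-word layout in which
the _QSPINLOCK_LOCKED byte occupies the low 8 bits of qlcode and the
queue code the upper 16 bits, and the union name and field names
(arch_qspinlock, qcode) are made up for the example.

/*
 * Sketch only -- not the actual patch code.  Kernel context is
 * assumed (atomic_t, u8/u16/u32, xchg() from <linux/atomic.h>).
 *
 * Assumed little-endian layout: the lock byte and the 16-bit queue
 * code never share a byte, so the queue code can be moved with a
 * single 16-bit xchg() that leaves the lock byte untouched.
 */
union arch_qspinlock {
        atomic_t qlcode;        /* the whole 32-bit lock word     */
        struct {
                u8  lock;       /* _QSPINLOCK_LOCKED byte         */
                u8  reserved;
                u16 qcode;      /* queue node code (bits 16-31)   */
        };
};

static inline u32 queue_code_xchg(struct qspinlock *lock, u32 *ocode,
                                  u32 ncode)
{
        union arch_qspinlock *qlock = (union arch_qspinlock *)lock;

        /*
         * Unconditionally publish the new queue code and pick up
         * the previous one in a single atomic operation -- no retry
         * loop, and no attempt to grab the lock even if it happens
         * to be free.
         */
        *ocode = (u32)xchg(&qlock->qcode, (u16)(ncode >> 16)) << 16;
        return 0;       /* caller always proceeds to queue */
}

Unlike the cmpxchg loop above, this version never tries to take a
momentarily free lock; it always queues. The benefit is one
unconditional atomic operation regardless of how many CPUs are
fighting over the lock word, which is presumably why the xchg column
scales better from 4 contending tasks onward.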