
Re: [Xen-devel] [PATCH 1/2] rwlock: add per-cpu reader-writer locks



Hi Malcolm,

If you can post the updated patches, that would be great. I think it would be better for me to test with your update.

Thanks again.

On 11/05/2015 10:20 AM, Malcolm Crossley wrote:
On 05/11/15 13:48, Marcos E. Matsunaga wrote:
Hi Malcolm,

I tried your patches against staging yesterday and as soon as I started a 
guest, it panicked. I have
lock_profile enabled and applied your patches against:
I tested with a non-debug version of Xen (because I was analysing the 
performance of Xen) and thus
those ASSERTs were never run.

The ASSERTs can be safely removed. The rwlock behaviour is slightly different 
in that it's possible
for a writer to hold the write lock whilst a reader is progressing through the 
read critical
section. This is safe because the writer waits for the per-cpu variables to 
clear before
actually progressing through its own critical section.

I have an updated version of the patch series which fixes this. Do you want me 
to post it, or are you
happy to remove the ASSERTs yourself (or switch to a non-debug build of Xen)?

Sorry for not catching this before it hit the list.

Malcolm

6f04de658574833688c3f9eab310e7834d56a9c0 x86: cleanup of early cpuid handling



(XEN) HVM1 save: CPU
(XEN) HVM1 save: PIC
(XEN) HVM1 save: IOAPIC
(XEN) HVM1 save: LAPIC
(XEN) HVM1 save: LAPIC_REGS
(XEN) HVM1 save: PCI_IRQ
(XEN) HVM1 save: ISA_IRQ
(XEN) HVM1 save: PCI_LINK
(XEN) HVM1 save: PIT
(XEN) HVM1 save: RTC
(XEN) HVM1 save: HPET
(XEN) HVM1 save: PMTIMER
(XEN) HVM1 save: MTRR
(XEN) HVM1 save: VIRIDIAN_DOMAIN
(XEN) HVM1 save: CPU_XSAVE
(XEN) HVM1 save: VIRIDIAN_VCPU
(XEN) HVM1 save: VMCE_VCPU
(XEN) HVM1 save: TSC_ADJUST
(XEN) HVM1 restore: CPU 0
[  394.163143] loop: module loaded
(XEN) Assertion 'rw_is_locked(&t->lock)' failed at grant_table.c:215
(XEN) ----[ Xen-4.7-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d08011019e>] do_grant_table_op+0x63f/0x2e04
(XEN) RFLAGS: 0000000000010246   CONTEXT: hypervisor (d0v0)
(XEN) rax: 0000000000000000   rbx: ffff83400f9dc9e0   rcx: 0000000000000000
(XEN) rdx: 0000000000000001   rsi: ffff82d080342b10   rdi: ffff83400819b784
(XEN) rbp: ffff8300774ffef8   rsp: ffff8300774ffdf8   r8: 0000000000000002
(XEN) r9:  0000000000000002   r10: 0000000000000002   r11: 00000000ffffffff
(XEN) r12: 0000000000000000   r13: 0000000000000000   r14: ffff83400819b780
(XEN) r15: ffff83400f9d0000   cr0: 0000000080050033   cr4: 00000000001526e0
(XEN) cr3: 000001007f613000   cr2: ffff8800746182b8
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff8300774ffdf8:
(XEN)    ffff8300774ffe08 ffff82d000000000 ffff8300774ffef8 ffff82d08017fc9b
(XEN)    ffff82d080342b28 ffff83400f9d8600 ffff82d080342b10 0000000000000000
(XEN)    ffff83400f9dca20 ffff832100000000 ffff834008188000 0000000000000001
(XEN)    00000001772ee000 ffff8801e98d03e0 ffff8300774ffe88 0000000000000000
(XEN)    0000000000000000 ffff8300774fff18 00000021d0269c10 000000000001001a
(XEN)    ffffffff00000001 0000000000000000 0000000000000246 00007ff7de45a407
(XEN)    0000000000000100 00007ff7de45a407 0000000000000033 ffff8300772ee000
(XEN)    ffff8801eb0e3c00 ffff880004bf57e8 ffff8801e98d03e0 ffff8801eb0a5938
(XEN)    00007cff88b000c7 ffff82d08023d952 ffffffff8100128a 0000000000000014
(XEN)    0000000000000000 0000000000000001 ffff8801f6e18388 ffffffff81d3d740
(XEN)    ffff8801efb7bd40 ffff88000542e780 0000000000000282 0000000000000000
(XEN)    ffff8801e98d03a0 ffff8801efe07000 0000000000000014 ffffffff8100128a
(XEN)    0000000000000001 ffff8801e98d03e0 0000000000000000 0001010000000000
(XEN)    ffffffff8100128a 000000000000e033 0000000000000282 ffff8801efb7bce0
(XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 ffff8300772ee000 0000000000000000
(XEN)    0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82d08011019e>] do_grant_table_op+0x63f/0x2e04
(XEN)    [<ffff82d08023d952>] lstar_enter+0xe2/0x13c
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'rw_is_locked(&t->lock)' failed at grant_table.c:215
(XEN) ****************************************
(XEN)
(XEN) Manual reset required ('noreboot' specified)


Thanks for your help.

On 11/03/2015 12:58 PM, Malcolm Crossley wrote:
Per-cpu read-write locks allow the fast path read case to have low overhead
by only setting/clearing a per-cpu variable to take the read lock.
The per-cpu read fast path also avoids locked compare-and-swap operations, which
can be particularly slow on coherent multi-socket systems, especially if there is
heavy usage of the read lock itself.

The per-cpu reader-writer lock uses a global variable to control the read lock
fast path. This allows a writer to disable the fast path and ensure the readers
use the underlying read-write lock implementation.

Once the writer has taken the write lock and disabled the fast path, it must
poll the per-cpu variable for all CPUs which have entered the critical section
for the specific read-write lock the writer is attempting to take. This design
allows a single per-cpu variable to be used for read/write locks belonging
to separate data structures, as long as multiple per-cpu read locks are not
simultaneously held by one particular CPU. This also means per-cpu
reader-writer locks are not recursion safe.

Slow-path readers, once unblocked, set the per-cpu variable and drop the
read lock. This simplifies the implementation and lets the fairness of the
underlying read-write lock be taken advantage of.

There may be slightly more overhead on the per-cpu write lock path due to
checking each CPU's fast path read variable, but this overhead is likely to be
hidden by the required delay of waiting for readers to exit the critical section.
The loop is optimised to only iterate over the per-cpu data of active readers
of the rwlock.

Signed-off-by: Malcolm Crossley <malcolm.crossley@xxxxxxxxxx>
---
   xen/common/spinlock.c        | 32 ++++++++++++++++++++++++++++++++
   xen/include/asm-arm/percpu.h |  5 +++++
   xen/include/asm-x86/percpu.h |  6 ++++++
   xen/include/xen/percpu.h     |  4 ++++
   xen/include/xen/spinlock.h   | 37 +++++++++++++++++++++++++++++++++++++
   5 files changed, 84 insertions(+)

diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
index 7f89694..a526216 100644
--- a/xen/common/spinlock.c
+++ b/xen/common/spinlock.c
@@ -492,6 +492,38 @@ int _rw_is_write_locked(rwlock_t *lock)
       return (lock->lock == RW_WRITE_FLAG); /* writer in critical section? */
   }
+void percpu_write_lock(rwlock_t **per_cpudata, bool_t *writer_activating,
+        rwlock_t *rwlock)
+{
+    int cpu;
+    cpumask_t active_readers;
+
+    cpumask_copy(&active_readers, &cpu_online_map);
+    /* First take the write lock to protect against other writers */
+    write_lock(rwlock);
+
+    /* Now set the global variable so that readers start using read_lock */
+    *writer_activating = true;
+    smp_mb();
+
+    /* Check if there are any percpu readers in progress on our rwlock */
+    do
+    {
+        for_each_cpu(cpu, &active_readers)
+        {
+            /* Remove any percpu readers not contending
+             * from our check mask */
+            if ( per_cpu_ptr(per_cpudata, cpu) != rwlock )
+                cpumask_clear_cpu(cpu, &active_readers);
+        }
+        /* Check if we've cleared all percpu readers */
+        if ( cpumask_empty(&active_readers) )
+            break;
+        /* Give the coherency fabric a break */
+        cpu_relax();
+    } while ( 1 );
+}
+
   #ifdef LOCK_PROFILE
     struct lock_profile_anc {
diff --git a/xen/include/asm-arm/percpu.h b/xen/include/asm-arm/percpu.h
index 71e7649..c308a56 100644
--- a/xen/include/asm-arm/percpu.h
+++ b/xen/include/asm-arm/percpu.h
@@ -27,6 +27,11 @@ void percpu_init_areas(void);
   #define __get_cpu_var(var) \
       (*RELOC_HIDE(&per_cpu__##var, READ_SYSREG(TPIDR_EL2)))
+#define per_cpu_ptr(var, cpu)  \
+    (*RELOC_HIDE(&var, __per_cpu_offset[cpu]))
+#define __get_cpu_ptr(var) \
+    (*RELOC_HIDE(&var, READ_SYSREG(TPIDR_EL2)))
+
   #define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu__##name
     DECLARE_PER_CPU(unsigned int, cpu_id);
diff --git a/xen/include/asm-x86/percpu.h b/xen/include/asm-x86/percpu.h
index 604ff0d..51562b9 100644
--- a/xen/include/asm-x86/percpu.h
+++ b/xen/include/asm-x86/percpu.h
@@ -20,4 +20,10 @@ void percpu_init_areas(void);
     #define DECLARE_PER_CPU(type, name) extern __typeof__(type) per_cpu__##name
+#define __get_cpu_ptr(var) \
+    (*RELOC_HIDE(var, get_cpu_info()->per_cpu_offset))
+
+#define per_cpu_ptr(var, cpu)  \
+    (*RELOC_HIDE(var, __per_cpu_offset[cpu]))
+
   #endif /* __X86_PERCPU_H__ */
diff --git a/xen/include/xen/percpu.h b/xen/include/xen/percpu.h
index abe0b11..c896863 100644
--- a/xen/include/xen/percpu.h
+++ b/xen/include/xen/percpu.h
@@ -16,6 +16,10 @@
   /* Preferred on Xen. Also see arch-defined per_cpu(). */
   #define this_cpu(var)    __get_cpu_var(var)
+#define this_cpu_ptr(ptr)    __get_cpu_ptr(ptr)
+
+#define get_per_cpu_var(var)  (per_cpu__##var)
+
   /* Linux compatibility. */
   #define get_cpu_var(var) this_cpu(var)
   #define put_cpu_var(var)
diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
index fb0438e..f929f1b 100644
--- a/xen/include/xen/spinlock.h
+++ b/xen/include/xen/spinlock.h
@@ -3,6 +3,7 @@
     #include <asm/system.h>
   #include <asm/spinlock.h>
+#include <xen/stdbool.h>
     #ifndef NDEBUG
   struct lock_debug {
@@ -261,4 +262,40 @@ int _rw_is_write_locked(rwlock_t *lock);
   #define rw_is_locked(l)               _rw_is_locked(l)
   #define rw_is_write_locked(l)         _rw_is_write_locked(l)
+static inline void percpu_read_lock(rwlock_t **per_cpudata, bool_t *writer_activating,
+        rwlock_t *rwlock)
+{
+    /* Indicate this cpu is reading */
+    this_cpu_ptr(per_cpudata) = rwlock;
+    smp_mb();
+    /* Check if a writer is waiting */
+    if ( unlikely(*writer_activating) )
+    {
+        /* Let the waiting writer know we aren't holding the lock */
+        this_cpu_ptr(per_cpudata) = NULL;
+        /* Wait using the read lock to keep the lock fair */
+        read_lock(rwlock);
+        /* Set the per CPU data again and continue */
+        this_cpu_ptr(per_cpudata) = rwlock;
+        /* Drop the read lock because we don't need it anymore */
+        read_unlock(rwlock);
+    }
+}
+
+static inline void percpu_read_unlock(rwlock_t **per_cpudata)
+{
+    this_cpu_ptr(per_cpudata) = NULL;
+    smp_wmb();
+}
+
+/* Don't inline percpu write lock as it's a complex function */
+void percpu_write_lock(rwlock_t **per_cpudata, bool_t *writer_activating,
+        rwlock_t *rwlock);
+
+static inline void percpu_write_unlock(bool_t *writer_activating, rwlock_t *rwlock)
+{
+    *writer_activating = false;
+    write_unlock(rwlock);
+}
+
   #endif /* __SPINLOCK_H__ */

--

Regards,

Marcos Eduardo Matsunaga

Oracle USA
Linux Engineering

“The statements and opinions expressed here are my own and do not
necessarily represent those of Oracle Corporation.”


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

