[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH 0/4] enhance lock debugging
On 07.08.19 20:11, Andrew Cooper wrote: On 07/08/2019 15:31, Juergen Gross wrote:While hunting a locking problem in my core scheduling series I have added some debugging aids to spinlock handling making it easier to find the root cause for the problem. Making use of the already lock profiling and enhancing it a little bit produces some really valuable diagnostic data e.g. when a NMI watchdog is triggering a crash.So I'm not sure where best to report this in the series, so 0/$N will have to do. Here is a sample trace from panic(): (XEN) Xen lock profile info SHOW (now = 49695791886 total = 49695791886) (XEN) Global grant_lock: addr=ffff82d080818640, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global balloon_lock: addr=ffff82d080818630, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global osvw_lock: addr=ffff82d080818610, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global mtrr_mutex: addr=ffff82d0808185f0, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global set_atomicity_lock: addr=ffff82d0808185e0, lockval=80008, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global cmci_discover_lock: addr=ffff82d0808185c0, lockval=80008, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global mce_logout_lock: addr=ffff82d0808185a0, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global processing_lock: addr=ffff82d080818580, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global vpmu_lock: addr=ffff82d080818560, lockval=40004, cpu=4095 (XEN) lock: 4(00000000:0000007D), block: 0(00000000:00000000) (XEN) Global pm_lock: addr=ffff82d080818230, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global rtc_lock: addr=ffff82d080818210, lockval=310031, cpu=4095 (XEN) lock: 48(00000000:0001892E), block: 0(00000000:00000000) (XEN) Global pit_lock: addr=ffff82d080818200, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global platform_timer_lock: addr=ffff82d0808181f0, lockval=6c006c, cpu=4095 (XEN) lock: 92(00000000:00012CAB), block: 0(00000000:00000000) (XEN) Global sync_lock: addr=ffff82d0808181d0, lockval=c188c188, cpu=4095 (XEN) lock: 115080(00000000:0074F130), block: 115079(00000000:0871000E) (XEN) Global lock: addr=ffff82d0808181e0, lockval=10001, cpu=4095 (XEN) lock: 1(00000000:01322165), block: 0(00000000:00000000) (XEN) Global flush_lock: addr=ffff82d0808181c0, lockval=91139111, cpu=3 (XEN) lock: 168195(00000000:159C6F6E), block: 1695(00000000:006B31FD) (XEN) Global xenpf_lock: addr=ffff82d0808181b0, lockval=2f002f, cpu=4095 (XEN) lock: 47(00000000:0001BC02), block: 0(00000000:00000000) (XEN) Global pci_config_lock: addr=ffff82d0808181a0, lockval=374d374d, cpu=4095 (XEN) lock: 5196(00000000:004545AA), block: 0(00000000:00000000) (XEN) Global lapic_nmi_owner_lock: addr=ffff82d080817d70, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global last_lock: addr=ffff82d080817d40, lockval=40034003, cpu=4095 (XEN) lock: 16387(00000000:00047132), block: 0(00000000:00000000) (XEN) Global map_pgdir_lock: addr=ffff82d080817d50, lockval=2c002c, cpu=4095 (XEN) lock: 38(00000000:00000ACF), block: 0(00000000:00000000) (XEN) Global microcode_mutex: addr=ffff82d080817d20, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global microcode_update_lock: addr=ffff82d080817d10, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global microcode_update_lock: addr=ffff82d080817d00, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global vector_lock: addr=ffff82d080817cf0, lockval=440044, cpu=4095 (XEN) lock: 27(00000000:00003B20), block: 0(00000000:00000000) (XEN) Global irq_ratelimit_lock: addr=ffff82d080817ce0, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global lock: addr=ffff82d080817cc0, lockval=100010, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global msix_fixmap_lock: addr=ffff82d080817cb0, lockval=20002, cpu=4095 (XEN) lock: 2(00000000:000000CF), block: 0(00000000:00000000) (XEN) Global ioapic_lock: addr=ffff82d080817c90, lockval=660066, cpu=4095 (XEN) lock: 20(00000000:00026E6C), block: 0(00000000:00000000) (XEN) Global i8259A_lock: addr=ffff82d080817c80, lockval=2d002d, cpu=4095 (XEN) lock: 6(00000000:00001BA0), block: 0(00000000:00000000) (XEN) Global apei_iomaps_lock: addr=ffff82d080817c50, lockval=110011, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global erst_lock: addr=ffff82d080817c40, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global shared_intremap_lock: addr=ffff82d080817bc0, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global iommu_pt_cleanup_lock: addr=ffff82d080817ba0, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global rx_lock: addr=ffff82d080817b70, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global tx_lock: addr=ffff82d080817b60, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global console_lock: addr=ffff82d080817830, lockval=770077, cpu=4095 (XEN) lock: 3447(00000000:07595894), block: 0(00000000:00000000) (XEN) Global ratelimit_lock: addr=ffff82d080817820, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global lock: addr=ffff82d080817800, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global pool_list_lock: addr=ffff82d0808177e0, lockval=10001, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global xenoprof_lock: addr=ffff82d0808177d0, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global pmu_owner_lock: addr=ffff82d0808177c0, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global vm_lock: addr=ffff82d0808177b0, lockval=220022, cpu=4095 (XEN) lock: 17(00000000:00001ED1), block: 0(00000000:00000000) (XEN) Global virtual_region_lock: addr=ffff82d0808177a0, lockval=10001, cpu=4095 (XEN) lock: 1(00000000:00000054), block: 0(00000000:00000000) (XEN) Global lock: addr=ffff82d080817780, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global wc_lock: addr=ffff82d080817770, lockval=c000c, cpu=4095 (XEN) lock: 11(00000000:00000995), block: 0(00000000:00000000) (XEN) Global tasklet_lock: addr=ffff82d080817760, lockval=3f403f4, cpu=4095 (XEN) lock: 426(00000000:00007AFF), block: 0(00000000:00000000) (XEN) Global sysctl_lock: addr=ffff82d080817750, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global symbols_mutex: addr=ffff82d080817740, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global stopmachine_lock: addr=ffff82d080817730, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global call_lock: addr=ffff82d080817710, lockval=4d004d, cpu=4095 (XEN) lock: 67(00000000:00176310), block: 0(00000000:00000000) (XEN) Global heap_lock: addr=ffff82d080817570, lockval=6b286b28, cpu=4095 (XEN) lock: 753071(00000000:018E9B8B), block: 33(00000000:0000916A) (XEN) Global payload_lock: addr=ffff82d080817550, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global crash_notes_lock: addr=ffff82d080817530, lockval=80008, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global lock: addr=ffff82d080817510, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global global_virq_handlers_lock: addr=ffff82d080817500, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global domlist_update_lock: addr=ffff82d0808174e0, lockval=10001, cpu=4095 (XEN) lock: 1(00000000:0000005B), block: 0(00000000:00000000) (XEN) Global domctl_lock: addr=ffff82d0808174d0, lockval=50005, cpu=4095 (XEN) lock: 5(00000000:000008DC), block: 0(00000000:00000000) (XEN) Global cpupool_lock: addr=ffff82d0808174b0, lockval=a000a, cpu=4095 (XEN) lock: 1(00000000:00000176), block: 0(00000000:00000000) (XEN) Global cpu_add_remove_lock: addr=ffff82d0808174a0, lockval=2d002d, cpu=4095 (XEN) lock: 24(00000000:0000175F), block: 0(00000000:00000000) (XEN) Global efi_rs_lock: addr=ffff82d080817030, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Domain 0 event_lock: addr=ffff8308366c50e8, lockval=1d451d45, cpu=4095 (XEN) lock: 7493(00000000:00866052), block: 0(00000000:00000000) (XEN) Domain 0 page_alloc_lock: addr=ffff8308366c5028, lockval=9aa59aa5, cpu=4095 (XEN) lock: 39589(00000000:0055339E), block: 0(00000000:00000000) (XEN) Domain 0 domain_lock: addr=ffff8308366c5018, lockval=150015, cpu=4095 (XEN) lock: 21(00000000:00009D70), block: 0(00000000:00000000) (XEN) Domain 32767 page_alloc_lock: addr=ffff83084ccc5028, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Domain 32767 domain_lock: addr=ffff83084ccc5018, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Domain 32753 page_alloc_lock: addr=ffff83084cccf028, lockval=3ee43ee4, cpu=4095 (XEN) lock: 16100(00000000:0005173F), block: 0(00000000:00000000) (XEN) Domain 32753 domain_lock: addr=ffff83084cccf018, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Domain 32754 page_alloc_lock: addr=ffff83084ccd0028, lockval=64006400, cpu=4095 (XEN) lock: 25600(00000000:000814D9), block: 0(00000000:00000000) (XEN) Domain 32754 domain_lock: addr=ffff83084ccd0018, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) Its not exactly the easiest to dump to follow. First of all - I don't see why the hold/block time are printed like that. It might be a holdover from the 32bit build, pre PRId64 support. They should probably use PRI_stime anyway. Fine with me. The domid rendering is unfortunate. Ideally we'd use %pd but that would involve rearranging the logic to get a struct domain* in hand. Seeing as you're the last person to modify this code, how hard would that be to do? It would completely break the struct type agnostic design. I have a debug patch adding credit2 run-queue lock. It requires to just add 5 lines of code (and this includes moving the spinlock_init() out of an irq-off section). It will produce: (XEN) credit2-runq 0 lock: addr=ffff830413351010, lockval=de00ddf, cpu=6 (XEN) lock: 896287(00000000:00FAA6B2), block: 819(00000000:00009C80) If/when the per-domain locks are rendered more clearly, the "Global " prefix can be dropped. I'm not sure we should do that. An interesting quirk of ticketlocks is that lockval= is redundant with lock_cnt. Could we perhaps fold the two fields? No. See my credit2 example. As soon as we get above 65535 lockings the values are no longer redundant. Distinguishing cpu=4095 as "not locked" would be useful. Yes, will change. We have several global locks called lock: (XEN) Global lock: addr=ffff82d0808181e0, lockval=10001, cpu=4095 (XEN) lock: 1(00000000:01322165), block: 0(00000000:00000000) (XEN) Global lock: addr=ffff82d080817cc0, lockval=100010, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global lock: addr=ffff82d080817800, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global lock: addr=ffff82d080817780, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) (XEN) Global lock: addr=ffff82d080817510, lockval=0000, cpu=4095 (XEN) lock: 0(00000000:00000000), block: 0(00000000:00000000) The second one in particular has corrupt data, seeing has it has been taken and released several times without lock_cnt increasing. The lock might have been taken/released before lock profiling has been initialized. For sanity sake, we should enforce unique naming of any lock registered for profiling. This would be every lock inited via DEFINE_SPINLOCK(). I can do a followup patch for that purpose. Juergen _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |