[Xen-devel] [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2
From: Andrii Anisov <andrii_anisov@xxxxxxxx>

This patch series is an attempt to reduce IRQ latency with the old GIC
implementation (gic-vgic). The patches were originally based on the Xen 4.10
release. The motivation was to improve the benchmark results of a system given
to a customer for evaluation. The series is tailored for GICv2 on RCAR H3.
Several of the most controversial patches (e.g. LRs shadowing) were not shared
with the customer and are included here for comments and discussion.

I hope several of these patches can be upstreamed, some as is, others with
modifications.

There are several simple ideas behind these changes:
 - reduce excessive code (condition checks)
 - drop excessive peripheral register accesses
 - where code cannot be reduced, move it out of spinlocks
 - where register accesses cannot be dropped, move them out of spinlocks

This is v2 of the original RFC series [1]. From that series, patches [2] and
[3] have already reached mainline. In this version a few patches have been
reworked to address comments or split into clearer pieces; other patches are
taken from RFC v1 as is.

The main intention of this version of the RFC series is to reveal the IRQ
latency impact patch by patch. The measurement is performed with TBM [4], so
the use case is trivial: a single IRQ is delivered twice per second. Thus
neither lock contention nor the delivery of more than one interrupt to a guest
at a time is exercised.

The series is based on the current xenbits/staging, commit 7f28661f6a7.

Xen is built without DEBUG and without GICv3 support, both for the staging
HEAD and for each commit. Four runtime configurations are evaluated for each
commit:
 - sched=credit2 vwfi=trap
 - sched=credit2 vwfi=native
 - sched=credit vwfi=trap
 - sched=credit vwfi=native

Each commit is incrementally cherry-picked for the latency evaluation, in the
order the commits appear in the table. The table is also available as a Google
spreadsheet [5].
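For anyone who wants to compare cells of the table that follows, here is a
minimal sketch (not part of the series) that parses one cell and reports the
relative change of a field against the staging HEAD baseline. The helper names
parse_cell and relative_change are hypothetical; the two sample cells are
copied from the sched=credit2 vwfi=native column.

```python
import re


def parse_cell(cell):
    """Turn 'max=9480 warm_max=7200 min=6600 avg=6743' into a dict of ints."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", cell)}


def relative_change(baseline, patched, field="avg"):
    """Percentage change of one field versus the baseline (negative = lower)."""
    return 100.0 * (patched[field] - baseline[field]) / baseline[field]


# Cells copied from the sched=credit2 vwfi=native column of the table.
baseline = parse_cell("max=4680 warm_max=3240 min=3000 avg=3007")  # staging HEAD
patched = parse_cell("max=4440 warm_max=2760 min=2520 avg=2527")   # gic-v2: Write HCR only on change

for field in ("min", "avg"):
    print(f"{field}: {relative_change(baseline, patched, field):+.1f}%")
```

A negative percentage means the value is lower than at the staging HEAD.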
Results per commit, one line per runtime configuration:

7f28661f6a7ce3d82f881b9afedfebca7f2cf116
    sched=credit2 vwfi=trap     max=9480  warm_max=7200 min=6600 avg=6743
    sched=credit2 vwfi=native   max=4680  warm_max=3240 min=3000 avg=3007
    sched=credit  vwfi=trap     max=9480  warm_max=7920 min=6720 avg=7009
    sched=credit  vwfi=native   max=4560  warm_max=3000 min=2880 avg=2979

gic:gic-vgic: separate GIV3 code more thoroughly
    sched=credit2 vwfi=trap     max=9720  warm_max=6960 min=6600 avg=6617
    sched=credit2 vwfi=native   max=5040  warm_max=3840 min=2880 avg=2905
    sched=credit  vwfi=trap     max=9480  warm_max=7200 min=6600 avg=6871
    sched=credit  vwfi=native   max=4560  warm_max=3000 min=2880 avg=2887

gic-vgic:vgic: avoid excessive conversions
    sched=credit2 vwfi=trap     max=9360  warm_max=6720 min=6480 avg=6578
    sched=credit2 vwfi=native   max=4800  warm_max=3120 min=2880 avg=2895
    sched=credit  vwfi=trap     max=9480  warm_max=7080 min=6600 avg=6804
    sched=credit  vwfi=native   max=4800  warm_max=3120 min=2880 avg=2887

gic:vgic:gic-vgic: introduce non-atomic bitops
    sched=credit2 vwfi=trap     max=9120  warm_max=6600 min=6480 avg=6546
    sched=credit2 vwfi=native   max=4920  warm_max=3000 min=2760 avg=2872
    sched=credit  vwfi=trap     max=9120  warm_max=6720 min=6480 avg=6574
    sched=credit  vwfi=native   max=4200  warm_max=3120 min=2760 avg=2798

gic: drop interrupts enabling on interrupts processing
    sched=credit2 vwfi=trap     max=9240  warm_max=7080 min=6360 avg=6492
    sched=credit2 vwfi=native   max=5040  warm_max=3240 min=2760 avg=2767
    sched=credit  vwfi=trap     max=9240  warm_max=6720 min=6480 avg=6491
    sched=credit  vwfi=native   max=4440  warm_max=3000 min=2760 avg=2809

gic-vgic: skip irqs locking in gic_restore_pending_irqs()
    sched=credit2 vwfi=trap     max=9000  warm_max=6720 min=6360 avg=6430
    sched=credit2 vwfi=native   max=4320  warm_max=3120 min=2640 avg=2671
    sched=credit  vwfi=trap     max=9240  warm_max=6720 min=6360 avg=6459
    sched=credit  vwfi=native   max=4440  warm_max=2880 min=2640 avg=2668

vgic: move pause_flags check out of vgic spinlock
    sched=credit2 vwfi=trap     max=9240  warm_max=6720 min=6360 avg=6431
    sched=credit2 vwfi=native   max=4800  warm_max=2880 min=2640 avg=2675
    sched=credit  vwfi=trap     max=9360  warm_max=6600 min=6360 avg=6435
    sched=credit  vwfi=native   max=4440  warm_max=2760 min=2640 avg=2647

vgic: move irq_to_pending out of lock
    sched=credit2 vwfi=trap     max=8520  warm_max=7440 min=6360 avg=6444
    sched=credit2 vwfi=native   max=4680  warm_max=3000 min=2640 avg=2753
    sched=credit  vwfi=trap     max=9480  warm_max=6720 min=6360 avg=6445
    sched=credit  vwfi=native   max=4200  warm_max=3000 min=2640 avg=2667

gic-vgic:vgic: do not keep disabled IRQs in any of queues
    sched=credit2 vwfi=trap     max=9120  warm_max=7920 min=6360 avg=6447
    sched=credit2 vwfi=native   max=4440  warm_max=2760 min=2760 avg=2767
    sched=credit  vwfi=trap     max=10440 warm_max=7560 min=6360 avg=6459
    sched=credit  vwfi=native   max=4440  warm_max=3840 min=2640 avg=2669

xen/arm: Re-enable interrupt later in the trap path
    sched=credit2 vwfi=trap     max=9720  warm_max=9120 min=6360 avg=6441
    sched=credit2 vwfi=native   max=4440  warm_max=2880 min=2760 avg=2767
    sched=credit  vwfi=trap     max=9360  warm_max=6960 min=6360 avg=6451
    sched=credit  vwfi=native   max=4680  warm_max=2880 min=2640 avg=2675

gic-vgic: skip irqs locking in vgic_sync_from_lrs
    sched=credit2 vwfi=trap     max=9240  warm_max=7080 min=6360 avg=6431
    sched=credit2 vwfi=native   max=4920  warm_max=3120 min=2640 avg=2678
    sched=credit  vwfi=trap     max=9480  warm_max=6960 min=6360 avg=6443
    sched=credit  vwfi=native   max=4680  warm_max=2880 min=2640 avg=2667

gic-v2: Write HCR only on change
    sched=credit2 vwfi=trap     max=9840  warm_max=6600 min=6360 avg=6459
    sched=credit2 vwfi=native   max=4440  warm_max=2760 min=2520 avg=2527
    sched=credit  vwfi=trap     max=9480  warm_max=7920 min=6360 avg=6445
    sched=credit  vwfi=native   max=4320  warm_max=2760 min=2520 avg=2527

gic-v2: avoid HCR reading for GICv2
    sched=credit2 vwfi=trap     max=9480  warm_max=7680 min=6360 avg=6443
    sched=credit2 vwfi=native   max=4320  warm_max=2880 min=2520 avg=2527
    sched=credit  vwfi=trap     max=9360  warm_max=7080 min=6720 avg=6750
    sched=credit  vwfi=native   max=3960  warm_max=2640 min=2400 avg=2487

hack: arm/domain: simplify context restore from idle vcpu
    sched=credit2 vwfi=trap     max=9360  warm_max=6720 min=6000 avg=6214
    sched=credit2 vwfi=native   max=5040  warm_max=2640 min=2520 avg=2527
    sched=credit  vwfi=trap     max=9480  warm_max=7080 min=6240 avg=6367
    sched=credit  vwfi=native   max=4080  warm_max=2880 min=2400 avg=2527

hack: move gicv2 LRs reads and writes out of spinlocks
    sched=credit2 vwfi=trap     max=9480  warm_max=6840 min=6600 avg=6612
    sched=credit2 vwfi=native   max=4800  warm_max=2760 min=2640 avg=2739
    sched=credit  vwfi=trap     max=9000  warm_max=7200 min=6600 avg=6636
    sched=credit  vwfi=native   max=4560  warm_max=2760 min=2520 avg=2619

gic: vgic: align frequently accessed data by cache line size
    sched=credit2 vwfi=trap     max=9840  warm_max=6600 min=6240 avg=6288
    sched=credit2 vwfi=native   max=4440  warm_max=2880 min=2640 avg=2682
    sched=credit  vwfi=trap     max=8280  warm_max=6720 min=6360 avg=6488
    sched=credit  vwfi=native   max=4080  warm_max=2880 min=2640 avg=2678

gic: separate ppi processing
    NOT SUITABLE FOR EVALUATION WITH TBM

[1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03328.html
[2] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03291.html
[3] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03285.html
[4] https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg00881.html
[5] https://docs.google.com/spreadsheets/d/1J_u9UDowaonnaKtgiugXqtIFT-c2E4Ss2vxgL6NnbNo/edit?usp=sharing
Andrii Anisov (15):
  gic:gic-vgic: separate GIV3 code more thoroughly
  gic-vgic:vgic: avoid excessive conversions
  gic:vgic:gic-vgic: introduce non-atomic bitops
  gic: drop interrupts enabling on interrupts processing
  gic-vgic: skip irqs locking in gic_restore_pending_irqs()
  vgic: move pause_flags check out of vgic spinlock
  vgic: move irq_to_pending out of lock
  gic-vgic:vgic: do not keep disabled IRQs in any of queues
  gic-vgic: skip irqs locking in vgic_sync_from_lrs
  gic-v2: Write HCR only on change
  gic-v2: avoid HCR reading for GICv2
  hack: arm/domain: simplify context restore from idle vcpu
  hack: move gicv2 LRs reads and writes out of spinlocks
  gic: vgic: align frequently accessed data by cache line size
  gic: separate ppi processing

Julien Grall (1):
  xen/arm: Re-enable interrupt later in the trap path

[11] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03282.html

 xen/arch/arm/arm64/entry.S   | 11 +++---
 xen/arch/arm/domain.c        | 25 +++++++-----
 xen/arch/arm/gic-v2.c        | 82 +++++++++++++++++++++++++-------------
 xen/arch/arm/gic-v3-its.c    |  2 +
 xen/arch/arm/gic-v3-lpi.c    |  2 +
 xen/arch/arm/gic-v3.c        |  4 +-
 xen/arch/arm/gic-vgic.c      | 87 +++++++++++++++++++++++++----------------
 xen/arch/arm/gic.c           | 32 +++++++++++++--
 xen/arch/arm/irq.c           | 32 +++++++++++++++
 xen/arch/arm/traps.c         |  6 +++
 xen/arch/arm/vgic-v3-its.c   |  2 +-
 xen/arch/arm/vgic.c          | 93 +++++++++++++++++++++++++++++++++++++-------
 xen/arch/arm/vgic/vgic.c     |  2 +
 xen/include/asm-arm/config.h |  2 +-
 xen/include/asm-arm/gic.h    | 10 ++---
 xen/include/asm-arm/irq.h    |  3 ++
 xen/include/asm-arm/vgic.h   | 24 ++++++++----
 xen/include/xen/sched.h      |  1 +
 18 files changed, 310 insertions(+), 110 deletions(-)

--
2.7.4

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel