[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Xen-devel] [RFC v2 00/16] Old GIC (gic-vgic) optimizations for GICV2

To: xen-devel@xxxxxxxxxxxxxxxxxxxx
From: Andrii Anisov <andrii.anisov@xxxxxxxxx>
Date: Wed, 26 Dec 2018 13:20:11 +0200
Cc: Andre Przywara <andre.przywara@xxxxxxx>, Julien Grall <julien.grall@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Andrii Anisov <andrii_anisov@xxxxxxxx>
Delivery-date: Wed, 26 Dec 2018 11:20:44 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

From: Andrii Anisov <andrii_anisov@xxxxxxxx>

This patch series is an attempt to reduce IRQ latency with the
old GIC implementation (gic-vgic). These patches originally based
on XEN 4.10 release. The motivation was to improve benchmark
results of a system given to a customer for evaluation.
This patch series is tailored for GICv2 on RCAR H3. Several
of the most controversial patches (i.e. LRs shadowing) were
not shared to the customer, and here are for comments and discussion.
I hope several patches from here could be upstreamed. Some as is,
others with modifications.

There are several simple ideas behind these changes:
    - reduce an excessive code (condition checks)
    - drop an excessive peripheral register accesses
    - if not reduce, then move an excessive code out of spinlocks
    - if not drop, then move an excessive register
      accesses out of spinlocks

This is a v2 of the original RFC series [1]. From that series, patches
[2] and [3] have already reached mainline. Here few patches are reworked
with addressing some comments or separating them into more clear pieces,
more patches are taken from the RFC v1 as is.

The main intention of this version of RFC series is to reveal
patch-by-patch IRQ latency impact.
The measurement is performed with TBM [4], so the use-case is trivial -
passing a single IRQ twice in a second. Thus no lock contentions nor
even passing more than one interrupt to a guest at the time use-cases
are hit.

The series is based on the current xenbits/staging, commit 7f28661f6a7.
XEN is build with no DEBUG and no GICv3 support for the staging HEAD and
each commit. Four runtime configurations are evaluated for each commit:
    - sched=credit2 vwfi=trap
    - sched=credit2 vwfi=native
    - sched=credit vwfi=trap
    - sched=credit vwfi=native

Each commit is incrementally cherry-picked for the latency evaluation in
an order they appear in the table. The table also can be found shared
as a Google spreadsheet here [5].


        sched=credit2 vwfi=trap                         sched=credit2 
vwfi=native                       sched=credit vwfi=trap                        
  sched=credit vwfi=native

7f28661f6a7ce3d82f881b9afedfebca7f2cf116
        max=9480 warm_max=7200 min=6600 avg=6743        max=4680 warm_max=3240 
min=3000 avg=3007        max=9480 warm_max=7920 min=6720 avg=7009        
max=4560 warm_max=3000 min=2880 avg=2979

gic:gic-vgic: separate GIV3 code more thoroughly

        max=9720 warm_max=6960 min=6600 avg=6617        max=5040 warm_max=3840 
min=2880 avg=2905        max=9480 warm_max=7200 min=6600 avg=6871        
max=4560 warm_max=3000 min=2880 avg=2887

gic-vgic:vgic: avoid excessive conversions

        max=9360 warm_max=6720 min=6480 avg=6578        max=4800 warm_max=3120 
min=2880 avg=2895        max=9480 warm_max=7080 min=6600 avg=6804        
max=4800 warm_max=3120 min=2880 avg=2887

gic:vgic:gic-vgic: introduce non-atomic bitops

        max=9120 warm_max=6600 min=6480 avg=6546        max=4920 warm_max=3000 
min=2760 avg=2872        max=9120 warm_max=6720 min=6480 avg=6574        
max=4200 warm_max=3120 min=2760 avg=2798

gic: drop interrupts enabling on interrupts processing
        max=9240 warm_max=7080 min=6360 avg=6492        max=5040 warm_max=3240 
min=2760 avg=2767        max=9240 warm_max=6720 min=6480 avg=6491        
max=4440 warm_max=3000 min=2760 avg=2809

gic-vgic: skip irqs locking in gic_restore_pending_irqs()
        max=9000 warm_max=6720 min=6360 avg=6430        max=4320 warm_max=3120 
min=2640 avg=2671        max=9240 warm_max=6720 min=6360 avg=6459        
max=4440 warm_max=2880 min=2640 avg=2668

vgic: move pause_flags check out of vgic spinlock
        max=9240 warm_max=6720 min=6360 avg=6431        max=4800 warm_max=2880 
min=2640 avg=2675        max=9360 warm_max=6600 min=6360 avg=6435        
max=4440 warm_max=2760 min=2640 avg=2647

vgic: move irq_to_pending out of lock
        max=8520 warm_max=7440 min=6360 avg=6444        max=4680 warm_max=3000 
min=2640 avg=2753        max=9480 warm_max=6720 min=6360 avg=6445        
max=4200 warm_max=3000 min=2640 avg=2667

gic-vgic:vgic: do not keep disabled IRQs in any of queues
        max=9120 warm_max=7920 min=6360 avg=6447        max=4440 warm_max=2760 
min=2760 avg=2767        max=10440 warm_max=7560 min=6360 avg=6459       
max=4440 warm_max=3840 min=2640 avg=2669

xen/arm: Re-enable interrupt later in the trap path
        max=9720 warm_max=9120 min=6360 avg=6441        max=4440 warm_max=2880 
min=2760 avg=2767        max=9360 warm_max=6960 min=6360 avg=6451        
max=4680 warm_max=2880 min=2640 avg=2675

gic-vgic: skip irqs locking in vgic_sync_from_lrs
        max=9240 warm_max=7080 min=6360 avg=6431        max=4920 warm_max=3120 
min=2640 avg=2678        max=9480 warm_max=6960 min=6360 avg=6443        
max=4680 warm_max=2880 min=2640 avg=2667

gic-v2: Write HCR only on change
        max=9840 warm_max=6600 min=6360 avg=6459        max=4440 warm_max=2760 
min=2520 avg=2527        max=9480 warm_max=7920 min=6360 avg=6445        
max=4320 warm_max=2760 min=2520 avg=2527

gic-v2: avoid HCR reading for GICv2
        max=9480 warm_max=7680 min=6360 avg=6443        max=4320 warm_max=2880 
min=2520 avg=2527        max=9360 warm_max=7080 min=6720 avg=6750        
max=3960 warm_max=2640 min=2400 avg=2487

hack: arm/domain: simplify context restore from idle vcpu
        max=9360 warm_max=6720 min=6000 avg=6214        max=5040 warm_max=2640 
min=2520 avg=2527        max=9480 warm_max=7080 min=6240 avg=6367        
max=4080 warm_max=2880 min=2400 avg=2527

hack: move gicv2 LRs reads and writes out of spinlocks
        max=9480 warm_max=6840 min=6600 avg=6612        max=4800 warm_max=2760 
min=2640 avg=2739        max=9000 warm_max=7200 min=6600 avg=6636        
max=4560 warm_max=2760 min=2520 avg=2619

gic: vgic: align frequently accessed data by cache line size
        max=9840 warm_max=6600 min=6240 avg=6288        max=4440 warm_max=2880 
min=2640 avg=2682        max=8280 warm_max=6720 min=6360 avg=6488        
max=4080 warm_max=2880 min=2640 avg=2678

gic: separate ppi processing
        NOT SUITABLE FOR EVALUATION WITH TBM

[1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03328.html
[2] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03291.html
[3] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03285.html
[4] https://lists.xenproject.org/archives/html/xen-devel/2018-12/msg00881.html
[5] 
https://docs.google.com/spreadsheets/d/1J_u9UDowaonnaKtgiugXqtIFT-c2E4Ss2vxgL6NnbNo/edit?usp=sharing

Andrii Anisov (15):
  gic:gic-vgic: separate GIV3 code more thoroughly
  gic-vgic:vgic: avoid excessive conversions
  gic:vgic:gic-vgic: introduce non-atomic bitops
  gic: drop interrupts enabling on interrupts processing
  gic-vgic: skip irqs locking in gic_restore_pending_irqs()
  vgic: move pause_flags check out of vgic spinlock
  vgic: move irq_to_pending out of lock
  gic-vgic:vgic: do not keep disabled IRQs in any of queues
  gic-vgic: skip irqs locking in vgic_sync_from_lrs
  gic-v2: Write HCR only on change
  gic-v2: avoid HCR reading for GICv2
  hack: arm/domain: simplify context restore from idle vcpu
  hack: move gicv2 LRs reads and writes out of spinlocks
  gic: vgic: align frequently accessed data by cache line size
  gic: separate ppi processing

Julien Grall (1):
  xen/arm: Re-enable interrupt later in the trap path

[11] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03282.html

 xen/arch/arm/arm64/entry.S   | 11 +++---
 xen/arch/arm/domain.c        | 25 +++++++-----
 xen/arch/arm/gic-v2.c        | 82 +++++++++++++++++++++++++-------------
 xen/arch/arm/gic-v3-its.c    |  2 +
 xen/arch/arm/gic-v3-lpi.c    |  2 +
 xen/arch/arm/gic-v3.c        |  4 +-
 xen/arch/arm/gic-vgic.c      | 87 +++++++++++++++++++++++++----------------
 xen/arch/arm/gic.c           | 32 +++++++++++++--
 xen/arch/arm/irq.c           | 32 +++++++++++++++
 xen/arch/arm/traps.c         |  6 +++
 xen/arch/arm/vgic-v3-its.c   |  2 +-
 xen/arch/arm/vgic.c          | 93 +++++++++++++++++++++++++++++++++++++-------
 xen/arch/arm/vgic/vgic.c     |  2 +
 xen/include/asm-arm/config.h |  2 +-
 xen/include/asm-arm/gic.h    | 10 ++---
 xen/include/asm-arm/irq.h    |  3 ++
 xen/include/asm-arm/vgic.h   | 24 ++++++++----
 xen/include/xen/sched.h      |  1 +
 18 files changed, 310 insertions(+), 110 deletions(-)

-- 
2.7.4


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

Follow-Ups:
- [Xen-devel] [RFC v2 16/16] gic: separate ppi processing
  - From: Andrii Anisov
- [Xen-devel] [RFC v2 15/16] gic: vgic: align frequently accessed data by cache line size
  - From: Andrii Anisov
- [Xen-devel] [RFC v2 14/16] hack: move gicv2 LRs reads and writes out of spinlocks
  - From: Andrii Anisov
- [Xen-devel] [RFC v2 13/16] hack: arm/domain: simplify context restore from idle vcpu
  - From: Andrii Anisov
- [Xen-devel] [RFC v2 12/16] gic-v2: avoid HCR reading for GICv2
  - From: Andrii Anisov
- [Xen-devel] [RFC v2 11/16] gic-v2: Write HCR only on change
  - From: Andrii Anisov
- [Xen-devel] [RFC v2 10/16] gic-vgic: skip irqs locking in vgic_sync_from_lrs
  - From: Andrii Anisov
- [Xen-devel] [RFC v2 09/16] xen/arm: Re-enable interrupt later in the trap path
  - From: Andrii Anisov
- [Xen-devel] [RFC v2 08/16] gic-vgic:vgic: do not keep disabled IRQs in any of queues
  - From: Andrii Anisov
- [Xen-devel] [RFC v2 07/16] hack:vgic: move irq_to_pending out of lock
  - From: Andrii Anisov
- [Xen-devel] [RFC v2 06/16] vgic: move pause_flags check out of vgic spinlock
  - From: Andrii Anisov
- [Xen-devel] [RFC v2 05/16] gic-vgic: skip irqs locking in gic_restore_pending_irqs()
  - From: Andrii Anisov
- [Xen-devel] [RFC v2 04/16] gic: drop interrupts enabling on interrupts processing
  - From: Andrii Anisov
- [Xen-devel] [RFC v2 03/16] gic:vgic:gic-vgic: introduce non-atomic bitops
  - From: Andrii Anisov
- [Xen-devel] [RFC v2 02/16] gic-vgic:vgic: avoid excessive conversions
  - From: Andrii Anisov
- [Xen-devel] [RFC v2 01/16] gic:gic-vgic: separate GIV3 code more thoroughly
  - From: Andrii Anisov

Prev by Date: [Xen-devel] [RFC v2 01/16] gic:gic-vgic: separate GIV3 code more thoroughly
Next by Date: [Xen-devel] [RFC v2 02/16] gic-vgic:vgic: avoid excessive conversions
Previous by thread: [Xen-devel] [ovmf test] 131582: regressions - FAIL
Next by thread: [Xen-devel] [RFC v2 01/16] gic:gic-vgic: separate GIV3 code more thoroughly
Index(es):
- Date
- Thread

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.