Re: [Xen-devel] [RFC 15/16] xen/arm: Implement Set/Way operations
Hi Stefano,

On 11/5/18 9:10 PM, Stefano Stabellini wrote:
> On Mon, 8 Oct 2018, Julien Grall wrote:
>> Set/Way operations are used to perform maintenance on a given cache.
>> At the moment, Set/Way operations are not trapped and therefore a
>> guest OS will directly act on the local cache. However, a vCPU may
>> migrate to another pCPU in the middle of the operation. This will
>> result in a cache containing stale data (Set/Way operations are not
>> propagated), potentially causing a crash. This may be the cause of
>> the heisenbug noticed in Osstest [1].
>>
>> Furthermore, Set/Way operations are not available on system caches.
>> This means that an OS relying on those operations to fully clean the
>> cache before disabling the MMU, such as 32-bit Linux, may break
>> because data may sit in system caches and not in RAM.
>>
>> For more details about Set/Way, see the talk "The Art of Virtualizing
>> Cache Maintenance" given at Xen Summit 2018 [2].
>>
>> In the context of Xen, we need to trap Set/Way operations and emulate
>> them. From the Arm ARM (B1.14.4 in DDI 0406C.c), Set/Way operations
>> are difficult to virtualize. So we can assume that a guest OS using
>> them will suffer the consequences (i.e. slowness) until developers
>> remove all usage of Set/Way.
>>
>> As software is not allowed to infer the Set/Way to Physical Address
>> mapping, Xen will need to go through the guest P2M and clean &
>> invalidate all the mapped entries. Because Set/Way operations happen
>> in batches (a loop over all Sets/Ways of a cache), Xen would need to
>> go through the P2M for every instruction. This is quite expensive and
>> would severely impact the guest OS. The implementation re-uses the
>> KVM policy to limit the number of flushes:
>>
>>  - If we trap a Set/Way operations, we enable VM trapping (i.e
>>    HCR_EL2.TVM) to detect cache being turned on/off, and do a full
>>    clean.
>
>               ^ remove 'a'
>
> "do a full clean" straight away, right? May I suggest a rewording of
> this item:
>
>  - as soon as we trap a set/way operation, we enable VM trapping
>    (i.e. HCR_EL2.TVM, it'll allow us to detect cache being turned
>    on/off), then we do a full clean

Sure.

>>  - We clean the caches when turning on and off.
>
> "We clean the caches when the guest turns caches on or off"

Sure.

>>  - Once the caches are enabled, we stop trapping VM instructions.
>>
>> [1] https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg03191.html
>> [2] https://fr.slideshare.net/xen_com_mgr/virtualizing-cache
>>
>> Signed-off-by: Julien Grall <julien.grall@xxxxxxx>
>> ---
>>  xen/arch/arm/arm64/vsysreg.c | 27 +++++++++++++++++-
>>  xen/arch/arm/p2m.c           | 68 ++++++++++++++++++++++++++++++++++++++++++++
>>  xen/arch/arm/traps.c         |  2 +-
>>  xen/arch/arm/vcpreg.c        | 23 +++++++++++++++
>>  xen/include/asm-arm/p2m.h    | 16 +++++++++++
>>  5 files changed, 134 insertions(+), 2 deletions(-)
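[Editorial illustration, not part of the patch: the reason Set/Way
operations arrive in long batches is the standard clean-by-set/way
algorithm from the Arm ARM, which loops over every set and way of a
cache level, one instruction per cache line. A minimal sketch follows,
with the field shifts assumed precomputed from CCSIDR_EL1; once
HCR_EL2.TSW is set, every single dc cisw below traps to Xen.]

/* Clean & invalidate one cache level by set/way (illustrative only). */
static void clean_inval_level_by_sw(unsigned int level,     /* 1-based */
                                    unsigned int nr_sets,
                                    unsigned int nr_ways,
                                    unsigned int set_shift, /* log2(line size) */
                                    unsigned int way_shift) /* 32 - log2(ways) */
{
    for ( unsigned int way = 0; way < nr_ways; way++ )
        for ( unsigned int set = 0; set < nr_sets; set++ )
        {
            /* The level field lives in bits [3:1] and holds (level - 1). */
            uint64_t val = ((uint64_t)way << way_shift) |
                           ((uint64_t)set << set_shift) |
                           ((uint64_t)(level - 1) << 1);

            asm volatile ( "dc cisw, %0" : : "r" (val) : "memory" );
        }
}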
>> diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c
>> index 1517879697..43c6c3e30d 100644
>> --- a/xen/arch/arm/arm64/vsysreg.c
>> +++ b/xen/arch/arm/arm64/vsysreg.c
>> @@ -40,7 +40,20 @@ static bool vreg_emulate_##reg(struct cpu_user_regs *regs, \
>>  }
>>
>>  /* Defining helpers for emulating sysreg registers. */
>> -TVM_REG(SCTLR_EL1)
>> +static bool vreg_emulate_SCTLR_EL1(struct cpu_user_regs *regs, uint64_t *r,
>> +                                   bool read)
>> +{
>> +    struct vcpu *v = current;
>> +    bool cache_enabled = vcpu_has_cache_enabled(v);
>> +
>> +    GUEST_BUG_ON(read);
>> +    WRITE_SYSREG(*r, SCTLR_EL1);
>> +
>> +    p2m_toggle_cache(v, cache_enabled);
>> +
>> +    return true;
>> +}
>> +
>>  TVM_REG(TTBR0_EL1)
>>  TVM_REG(TTBR1_EL1)
>>  TVM_REG(TCR_EL1)
>> @@ -84,6 +97,18 @@ void do_sysreg(struct cpu_user_regs *regs,
>>          break;
>>
>>      /*
>> +     * HCR_EL2.TSW
>> +     *
>> +     * ARMv8 (DDI 0487B.b): Table D1-42
>> +     */
>> +    case HSR_SYSREG_DCISW:
>> +    case HSR_SYSREG_DCCSW:
>> +    case HSR_SYSREG_DCCISW:
>> +        if ( hsr.sysreg.read )
>
> Shouldn't it be !hsr.sysreg.read ?

Hmmm yes.

>> +            p2m_set_way_flush(current);
>> +        break;
>> +
>> +    /*
>>       * HCR_EL2.TVM
>>       *
>>       * ARMv8 (DDI 0487B.b): Table D1-37
>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>> index df6b48d73b..a3d4c563b1 100644
>> --- a/xen/arch/arm/p2m.c
>> +++ b/xen/arch/arm/p2m.c
>> @@ -1564,6 +1564,74 @@ int p2m_cache_flush_range(struct domain *d, gfn_t start, gfn_t end)
>>      return 0;
>>  }
>>
>> +static void p2m_flush_vm(struct vcpu *v)
>> +{
>> +    struct p2m_domain *p2m = p2m_get_hostp2m(v->domain);
>> +
>> +    /* XXX: Handle preemption */
>
> Yes, good to have this reminder. Maybe add "we'd want to break the
> operation down when it takes too long".

I am planning to handle this before the series is actually merged. This
is a massive security hole and easy to exploit.
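[Editorial sketch of what a preemptible variant could look like, since
the XXX above is the crux: flush in bounded chunks and have the trapped
instruction retried in between. The chunk size, the per-vCPU
sw_flush_next field and the -ERESTART plumbing are hypothetical, not
part of this series.]

/* Hypothetical: number of GFNs flushed per pass before preempting. */
#define SW_FLUSH_CHUNK 512UL

static int p2m_flush_vm_preemptible(struct vcpu *v)
{
    struct p2m_domain *p2m = p2m_get_hostp2m(v->domain);
    /* sw_flush_next is a made-up field recording where to resume. */
    gfn_t start = gfn_max(v->arch.sw_flush_next, p2m->lowest_mapped_gfn);
    gfn_t end = gfn_min(gfn_add(start, SW_FLUSH_CHUNK),
                        p2m->max_mapped_gfn);

    p2m_cache_flush_range(v->domain, start, end);

    if ( gfn_x(end) < gfn_x(p2m->max_mapped_gfn) )
    {
        /* More to do: record progress and ask for the op to be retried. */
        v->arch.sw_flush_next = end;
        return -ERESTART;
    }

    v->arch.sw_flush_next = _gfn(0);
    return 0;
}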
>> +    p2m_cache_flush_range(v->domain, p2m->lowest_mapped_gfn,
>> +                          p2m->max_mapped_gfn);
>> +}
>> +
>> +/*
>> + * See note at ARMv7 ARM B1.14.4 (DDI 0406C.c) (TL;DR: S/W ops are not
>> + * easily virtualized).
>> + *
>> + * Main problems:
>> + * - S/W ops are local to a CPU (not broadcast)
>> + * - We have line migration behind our back (speculation)
>> + * - System caches don't support S/W at all (damn!)
>> + *
>> + * In the face of the above, the best we can do is to try and convert
>> + * S/W ops to VA ops. Because the guest is not allowed to infer the S/W
>> + * to PA mapping, it can only use S/W to nuke the whole cache, which is
>> + * rather a good thing for us.
>> + *
>> + * Also, it is only used when turning caches on/off ("The expected
>> + * usage of the cache maintenance instructions that operate by set/way
>> + * is associated with the powerdown and powerup of caches, if this is
>> + * required by the implementation.").
>> + *
>> + * We use the following policy:
>> + * - If we trap a S/W operation, we enable VM trapping to detect
>> + *   caches being turned on/off, and do a full clean.
>> + *
>> + * - We flush the caches on both caches being turned on and off.
>> + *
>> + * - Once the caches are enabled, we stop trapping VM ops.
>> + */
>> +void p2m_set_way_flush(struct vcpu *v)
>> +{
>> +    /* This function can only work with the current vCPU. */
>> +    ASSERT(v == current);
>
> NIT: if it can only operate on current, it makes sense to remove the
> struct vcpu* parameter

I prefer to keep struct vcpu *v here. It makes it more straightforward
to know what the function is working on. Furthermore, current is
actually quite expensive to use in some circumstances. For instance, in
the nested case, TPIDR_EL2 will get trapped to the host hypervisor. So
it would be best if we start avoiding current whenever possible.

>> +    if ( !(v->arch.hcr_el2 & HCR_TVM) )
>> +    {
>> +        p2m_flush_vm(v);
>> +        vcpu_hcr_set_flags(v, HCR_TVM);
>> +    }
>> +}
>> +
>> +void p2m_toggle_cache(struct vcpu *v, bool was_enabled)
>> +{
>> +    bool now_enabled = vcpu_has_cache_enabled(v);
>> +
>> +    /* This function can only work with the current vCPU. */
>> +    ASSERT(v == current);
>
> NIT: same about struct vcpu* as parameter when only current can be
> used

>> +    /*
>> +     * If switching the MMU+caches on, need to invalidate the caches.
>> +     * If switching it off, need to clean the caches.
>> +     * Clean + invalidate does the trick always.
>> +     */
>> +    if ( was_enabled != now_enabled )
>> +        p2m_flush_vm(v);
>> +
>> +    /* Caches are now on, stop trapping VM ops (until a S/W op) */
>> +    if ( now_enabled )
>> +        vcpu_hcr_clear_flags(v, HCR_TVM);
>> +}
>> +
>>  mfn_t gfn_to_mfn(struct domain *d, gfn_t gfn)
>>  {
>>      return p2m_lookup(d, gfn, NULL);
>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
>> index 169b57cb6b..cdc10eee5a 100644
>> --- a/xen/arch/arm/traps.c
>> +++ b/xen/arch/arm/traps.c
>> @@ -98,7 +98,7 @@ register_t get_default_hcr_flags(void)
>>  {
>>      return  (HCR_PTW|HCR_BSU_INNER|HCR_AMO|HCR_IMO|HCR_FMO|HCR_VM|
>>               (vwfi != NATIVE ? (HCR_TWI|HCR_TWE) : 0) |
>> -             HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB);
>> +             HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB|HCR_TSW);
>>  }
>>
>>  static enum {
>> diff --git a/xen/arch/arm/vcpreg.c b/xen/arch/arm/vcpreg.c
>> index 49529b97cd..dc46d9d0d7 100644
>> --- a/xen/arch/arm/vcpreg.c
>> +++ b/xen/arch/arm/vcpreg.c
>> @@ -45,9 +45,14 @@
>>  #define TVM_REG(sz, func, reg...)                                   \
>>  static bool func(struct cpu_user_regs *regs, uint##sz##_t *r, bool read) \
>>  {                                                                   \
>> +    struct vcpu *v = current;                                       \
>> +    bool cache_enabled = vcpu_has_cache_enabled(v);                 \
>> +                                                                    \
>>      GUEST_BUG_ON(read);                                             \
>>      WRITE_SYSREG##sz(*r, reg);                                      \
>>                                                                      \
>> +    p2m_toggle_cache(v, cache_enabled);                             \
>
> This will affect all the registers trapped with TVM. Shouldn't we only
> call p2m_toggle_cache when relevant? i.e. when changing SCTLR? I think
> it would be better to only modify the SCTLR emulation handler.

This is done on purpose: it increases the chance of disabling TVM as
soon as possible. If you only rely on SCTLR, you may end up trapping a
lot of registers for a long time. FWIW, as I already wrote in the
commit message, this is based on what KVM does.

>> +                                                                    \
>>      return true;                                                    \
>>  }
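[For reference: assuming TVM_REG32(SCTLR, SCTLR_EL1) wires through
TVM_REG(32, vreg_emulate_SCTLR, SCTLR_EL1), each generated handler now
looks roughly like the expansion below, i.e. it samples the cache state
before the write and lets p2m_toggle_cache() flush and/or drop TVM
afterwards.]

static bool vreg_emulate_SCTLR(struct cpu_user_regs *regs, uint32_t *r,
                               bool read)
{
    struct vcpu *v = current;
    /* Cache state before the guest write takes effect. */
    bool cache_enabled = vcpu_has_cache_enabled(v);

    GUEST_BUG_ON(read);
    WRITE_SYSREG32(*r, SCTLR_EL1);

    /* Flush if SCTLR.{M,C} toggled; stop trapping once caches are on. */
    p2m_toggle_cache(v, cache_enabled);

    return true;
}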
>> @@ -65,6 +70,8 @@ static bool func(struct cpu_user_regs *regs, uint##sz##_t *r, bool read) \
>>  static bool vreg_emulate_##xreg(struct cpu_user_regs *regs, uint32_t *r, \
>>                                  bool read, bool hi)                 \
>>  {                                                                   \
>> +    struct vcpu *v = current;                                       \
>> +    bool cache_enabled = vcpu_has_cache_enabled(v);                 \
>>      register_t reg = READ_SYSREG(xreg);                             \
>>                                                                      \
>>      GUEST_BUG_ON(read);                                             \
>> @@ -80,6 +87,8 @@ static bool vreg_emulate_##xreg(struct cpu_user_regs *regs, uint32_t *r, \
>>      }                                                               \
>>      WRITE_SYSREG(reg, xreg);                                        \
>>                                                                      \
>> +    p2m_toggle_cache(v, cache_enabled);                             \
>> +                                                                    \
>>      return true;                                                    \
>>  }                                                                   \
>>                                                                      \
>> @@ -98,6 +107,7 @@ static bool vreg_emulate_##hireg(struct cpu_user_regs *regs, uint32_t *r, \
>>
>>  /* Defining helpers for emulating co-processor registers. */
>>  TVM_REG32(SCTLR, SCTLR_EL1)
>> +
>
> Spurious change. Should be in a previous patch?

Likely.

>>  /*
>>   * AArch32 provides two way to access TTBR* depending on the access
>>   * size, whilst AArch64 provides one way.
>> @@ -180,6 +190,19 @@ void do_cp15_32(struct cpu_user_regs *regs, const union hsr hsr)
>>          break;
>>
>>      /*
>> +     * HCR_EL2.TSW
>> +     *
>> +     * ARMv7 (DDI 0406C.b): B1.14.6
>> +     * ARMv8 (DDI 0487B.b): Table D1-42
>> +     */
>> +    case HSR_CPREG32(DCISW):
>> +    case HSR_CPREG32(DCCSW):
>> +    case HSR_CPREG32(DCCISW):
>> +        if ( !cp32.read )
>> +            p2m_set_way_flush(current);
>> +        break;
>> +
>> +    /*
>>       * HCR_EL2.TVM
>>       *
>>       * ARMv8 (DDI 0487B.b): Table D1-37
>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
>> index 92213dd1ab..c470f062db 100644
>> --- a/xen/include/asm-arm/p2m.h
>> +++ b/xen/include/asm-arm/p2m.h
>> @@ -231,6 +231,10 @@ int p2m_set_entry(struct p2m_domain *p2m,
>>   */
>>  int p2m_cache_flush_range(struct domain *d, gfn_t start, gfn_t end);
>>
>> +void p2m_set_way_flush(struct vcpu *v);
>> +
>> +void p2m_toggle_cache(struct vcpu *v, bool was_enabled);
>> +
>>  /*
>>   * Map a region in the guest p2m with a specific p2m type.
>>   * The memory attributes will be derived from the p2m type.
>> @@ -358,6 +362,18 @@ static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
>>      return -EOPNOTSUPP;
>>  }
>>
>> +/*
>> + * A vCPU has cache enabled only when the MMU is enabled and data cache
>> + * is enabled.
>> + */
>> +static inline bool vcpu_has_cache_enabled(struct vcpu *v)
>> +{
>> +    /* Only works with the current vCPU */
>> +    ASSERT(current == v);
>
> NIT: same about struct vcpu *v as parameter when only current makes
> sense

>> +    return (READ_SYSREG32(SCTLR_EL1) & (SCTLR_C|SCTLR_M)) == (SCTLR_C|SCTLR_M);
>> +}
>> +
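[Editorial note on the check above: in SCTLR_EL1, M (MMU enable) is
bit 0 and C (data/unified cache enable) is bit 2, so both bits must be
set. An illustrative truth table, not from the patch:]

/*
 * SCTLR_EL1.M  SCTLR_EL1.C  vcpu_has_cache_enabled()
 *      0            0       false (MMU and caches off)
 *      0            1       false (MMU off: data accesses are
 *                                  non-cacheable regardless of C)
 *      1            0       false (MMU on, data cache disabled)
 *      1            1       true
 */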
>>  #endif /* _XEN_P2M_H */
>>
>>  /*

Cheers,

--
Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel