
Re: [Xen-devel] [PATCH] xen: arm: force guest memory accesses to cacheable when MMU is disabled



On Tue, 7 Jan 2014, Ian Campbell wrote:
> On ARM, guest OSes are started with the MMU and caches disabled (as they are
> on native hardware); however, caching is enabled in the domain running the
> domain builder, and therefore we must ensure cache consistency.
> 
> The existing solution to this problem (a0035ecc0d82 "tools: libxc: flush data
> cache after loading images into guest memory") is to flush the caches after
> loading the various blobs into guest RAM. However, this approach has two
> shortcomings:
> 
>  - The cache flush primitives available to userspace on arm32 are not
>    sufficient for our needs.
>  - There is a race between the cache flush and the unmap of the guest page
>    where the processor might speculatively dirty the cache line again.
> 
> (Of these, the second is the more fundamental.)
> 
> This patch makes use of the hardware functionality to force all accesses made
> from guest mode to be cached (HCR.DC, the "default cacheable" bit). This means
> that we don't need to worry about the domain builder's writes being cached,
> because the guest's "uncached" accesses will actually be cached.
> 
> Unfortunately the use of HCR.DC is incompatible with the guest enabling its
> MMU (the SCTLR.M bit). Therefore we must trap accesses to the SCTLR so that we
> can detect when this happens and disable HCR.DC. This is done with the HCR.TVM
> (trap virtual memory controls) bit, which also causes various other registers
> to be trapped, all of which can be passed straight through to the underlying
> register. Once the guest has enabled its MMU we no longer need to trap, so
> there is no ongoing overhead. In my tests Linux makes about half a dozen
> accesses to these registers before the MMU is enabled; I would expect other
> OSes to behave similarly (the sequence of writes needed to set up the MMU is
> pretty obvious).
> 
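
In case it helps review, here is a toy, compilable model of the state machine
this introduces. The HCR/SCTLR bit positions are taken from the ARM ARM;
everything else is a stand-in for illustration, not the patch's code:

    /* Toy model of the HCR.DC/HCR.TVM life cycle (illustrative only;
     * plain variables stand in for READ_SYSREG/WRITE_SYSREG). */
    #include <stdint.h>
    #include <stdio.h>

    #define HCR_DC   (1u << 12)   /* HCR_EL2.DC: default cacheable    */
    #define HCR_TVM  (1u << 26)   /* HCR_EL2.TVM: trap VM controls    */
    #define SCTLR_M  (1u << 0)    /* SCTLR.M: stage-1 MMU enable      */

    static uint32_t hcr;          /* stand-in for the real HCR_EL2 */

    static void domain_start(void)
    {
        /* Guest starts with its MMU off: force cacheable accesses and
         * trap SCTLR and friends so we notice when that changes. */
        hcr |= HCR_DC | HCR_TVM;
    }

    static void trap_sctlr_write(uint32_t v)
    {
        /* The real handler passes the write through first (omitted here). */
        if ( v & SCTLR_M )
            hcr &= ~(HCR_DC | HCR_TVM);   /* MMU on: DC incompatible, trapping no longer needed */
    }

    int main(void)
    {
        domain_start();
        printf("before MMU enable: DC=%d TVM=%d\n", !!(hcr & HCR_DC), !!(hcr & HCR_TVM));
        trap_sctlr_write(SCTLR_M);
        printf("after MMU enable:  DC=%d TVM=%d\n", !!(hcr & HCR_DC), !!(hcr & HCR_TVM));
        return 0;
    }
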
> Apart from this unfortunate need to trap these accesses, this approach is
> incompatible with guests which attempt to do DMA operations with their MMU
> disabled. In practice this means guests with passthrough, which we do not yet
> support. Since a typical guest (including dom0) does not access devices which
> require DMA until after it is fully up and running with paging enabled, the
> main risk is to in-guest firmware which does DMA, i.e. running EFI in a guest
> with a disk passed through and booting from that disk. Since we know that
> dom0 does not use any such firmware and we do not yet support device
> passthrough to guests, we can live with this restriction. Once passthrough is
> implemented this will need to be revisited.
> 
> The patch includes a couple of seemingly unrelated but necessary changes:
> 
>  - HSR_SYSREG_CRN_MASK was incorrectly defined, which happened to be benign
>    with the existing set of system registers we handled, but broke with the
>    new ones introduced here.
>  - The defines used to decode the HSR system register fields were named the
>    same as the registers themselves, which breaks the accessor macros. This
>    had gone unnoticed because the handling of the existing trapped registers
>    did not require accessing the underlying hardware register. Rename those
>    constants with an HSR_SYSREG prefix (in line with HSR_CP32/64 for 32-bit
>    registers).
> 
> This patch has survived thousands of boot loops on a Midway system.
> 
> Signed-off-by: Ian Campbell <ian.campbell@xxxxxxxxxx>
> ---
> My preferred solution here would be for the tools to use an uncached mapping
> of guest memory when building the guest, which requires adding a new privcmd
> ioctl (a relatively straightforward patch) and plumbing a "cached" flag
> through the libxc foreign mapping interfaces (a twisty maze of passages, all
> alike). IMO the libxc side of that approach was not looking suitable for a 4.4
> freeze exception, since it was quite large (because we have 4 or more mapping
> interfaces in libxc, some of which call back into others).
> 
> So I propose this version for 4.4. The uncached mapping solution should be
> revisited for a future release.
> 
> At the risk of appearing to be going mad:
> 
> <speaking hat="submitter">
> 
> This bug results in memory corruption in the guest, which mostly manifests as
> a failure to boot the guest (subsequent bad behaviour is possible but, I
> think, unlikely). The frequency of failures is perhaps 1 in 10 boots. This
> would not constitute an awesome release.
> 
> Although the patch is large, most of it is repetitive and mechanical (made
> explicit through the use of macros in many cases). The biggest risk is that
> one of the registers is not passed through correctly (i.e. the wrong size or
> target register). The ones which Linux uses have been tested and appear to
> function OK. The others might be buggy, but this is mitigated through the use
> of the same set of macros.
> 
> I think the chance of the patch having a bug wrt my understanding of the
> hardware behaviour is pretty low. WRT there being bugs in my understanding of
> the hardware documentation, I would say middle to low, but I have discussed it
> with some folks at ARM and they didn't call me an idiot (in fact pretty much
> the same thing has been proposed for KVM).
> 
> Overall I think the benefits outweigh the risks.
> 
> One thing I'm not sure about is reverting the previous fix in a0035ecc0d82.
> It's reasonably recent, so reverting it takes us back to a pretty well
> understood state in the libraries. The functionality is harmless if
> incomplete. Given the first argument, I would lean towards reverting.
> 
> </speaking>
> 
> <speaking hat="stand in release manager">
> 
> OK.
> 
> </speaking>
> 
> Obviously if you think I'm being too easy on myself please say so.
> 
> Actually, if you think my judgement is right I'd appreciate being told so too.
> ---
>  xen/arch/arm/domain.c           |    5 ++
>  xen/arch/arm/domain_build.c     |    1 +
>  xen/arch/arm/traps.c            |  153 ++++++++++++++++++++++++++++++++++++++-
>  xen/arch/arm/vtimer.c           |    6 +-
>  xen/include/asm-arm/cpregs.h    |    4 +
>  xen/include/asm-arm/processor.h |    2 +-
>  xen/include/asm-arm/sysregs.h   |   19 ++++-
>  7 files changed, 182 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 124cccf..104d228 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -219,6 +219,11 @@ static void ctxt_switch_to(struct vcpu *n)
>      else
>          hcr |= HCR_RW;
>  
> +    if ( n->arch.sctlr & SCTLR_M )
> +        hcr &= ~(HCR_TVM|HCR_DC);
> +    else
> +        hcr |= (HCR_TVM|HCR_DC);
> +
>      WRITE_SYSREG(hcr, HCR_EL2);
>      isb();

Is this actually needed? Shouldn't HCR be already correctly updated by
update_sctlr?


> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index 47b781b..bb31db8 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -1026,6 +1026,7 @@ int construct_dom0(struct domain *d)
>      else
>          WRITE_SYSREG(READ_SYSREG(HCR_EL2) | HCR_RW, HCR_EL2);
>  #endif
> +    WRITE_SYSREG(READ_SYSREG(HCR_EL2) | HCR_DC | HCR_TVM, HCR_EL2);
>  
>      /*
>       * kernel_load will determine the placement of the initrd & fdt in
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index 7c5ab19..d00bba3 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -1279,6 +1279,23 @@ static void advance_pc(struct cpu_user_regs *regs, union hsr hsr)
>      regs->pc += hsr.len ? 4 : 2;
>  }
>  
> +static void update_sctlr(uint32_t v)
> +{
> +    /*
> +     * If MMU (SCTLR_M) is now enabled then we must disable HCR.DC
> +     * because they are incompatible.
> +     *
> +     * Once HCR.DC is disabled then we do not need HCR_TVM either,
> +     * since its only purpose was to catch the MMU being enabled.
> +     *
> +     * Both are set appropriately on context switch but we need to
> +     * clear them now since we may not context switch on return to
> +     * guest.
> +     */
> +    if ( v & SCTLR_M )
> +        WRITE_SYSREG(READ_SYSREG(HCR_EL2) & ~(HCR_DC|HCR_TVM), HCR_EL2);
> +}
> +
>  static void do_cp15_32(struct cpu_user_regs *regs,
>                         union hsr hsr)
>  {
> @@ -1338,6 +1355,89 @@ static void do_cp15_32(struct cpu_user_regs *regs,
>          if ( cp32.read )
>             *r = v->arch.actlr;
>          break;
> +
> +/* Passthru a 32-bit AArch32 register which is also 32-bit under AArch64 */
> +#define CP32_PASSTHRU32(R...) do {              \
> +    if ( cp32.read )                            \
> +        *r = READ_SYSREG32(R);                  \
> +    else                                        \
> +        WRITE_SYSREG32(*r, R);                  \
> +} while(0)
> +
> +/*
> + * Passthru a 32-bit AArch32 register which is 64-bit under AArch64.
> + * Updates the lower 32-bits and clears the upper bits.
> + */
> +#define CP32_PASSTHRU64(R...) do {              \
> +    if ( cp32.read )                            \
> +        *r = (uint32_t)READ_SYSREG64(R);        \
> +    else                                        \
> +        WRITE_SYSREG64((uint64_t)*r, R);        \
> +} while(0)

Can/Should CP32_PASSTHRU64_LO be used instead of this?
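
For what it's worth, here is the observable difference between the two on a
write, pulled out of the macros into a standalone snippet (a plain uint64_t
stands in for the real system register); whether clearing or preserving the
upper half is the right behaviour for these registers is exactly the question:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t reg = 0xaaaaaaaa55555555ULL;   /* pretend 64-bit EL1 register */
        uint32_t r   = 0xdeadbeef;              /* guest's 32-bit write */

        /* CP32_PASSTHRU64: the write zero-extends, clearing bits [63:32] */
        uint64_t as_passthru64 = (uint64_t)r;

        /* CP32_PASSTHRU64_LO: the write updates [31:0], preserving [63:32] */
        uint64_t as_passthru64_lo = (reg & 0xffffffff00000000ULL) | r;

        printf("PASSTHRU64:    %#018llx\n", (unsigned long long)as_passthru64);
        printf("PASSTHRU64_LO: %#018llx\n", (unsigned long long)as_passthru64_lo);
        return 0;
    }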


> +/*
> + * Passthru a 32-bit AArch32 register which is 64-bit under AArch64.
> + * Updates either the HI ([63:32]) or LO ([31:0]) 32-bits preserving
> + * the other half.
> + */
> +#ifdef CONFIG_ARM_64
> +#define CP32_PASSTHRU64_HI(R...) do {                   \
> +    if ( cp32.read )                                    \
> +        *r = (uint32_t)(READ_SYSREG64(R) >> 32);        \
> +    else                                                \
> +    {                                                   \
> +        uint64_t t = READ_SYSREG64(R) & 0xffffffffUL;   \
> +        t |= ((uint64_t)(*r)) << 32;                    \
> +        WRITE_SYSREG64(t, R);                           \
> +    }                                                   \
> +} while(0)
> +#define CP32_PASSTHRU64_LO(R...) do {                           \
> +    if ( cp32.read )                                            \
> +        *r = (uint32_t)(READ_SYSREG64(R) & 0xffffffff);         \
> +    else                                                        \
> +    {                                                           \
> +        uint64_t t = READ_SYSREG64(R) & 0xffffffff00000000UL;   \
> +        t |= *r;                                                \
> +        WRITE_SYSREG64(t, R);                                   \
> +    }                                                           \
> +} while(0)
> +#endif
> +    /* HCR.TVM */
> +    case HSR_CPREG32(SCTLR):
> +        CP32_PASSTHRU32(SCTLR_EL1);
> +        update_sctlr(*r);
> +        break;
> +    case HSR_CPREG32(TTBR0_32):   CP32_PASSTHRU64(TTBR0_EL1);      break;
> +    case HSR_CPREG32(TTBR1_32):   CP32_PASSTHRU64(TTBR1_EL1);      break;
> +    case HSR_CPREG32(TTBCR):      CP32_PASSTHRU32(TCR_EL1);        break;
> +    case HSR_CPREG32(DACR):       CP32_PASSTHRU32(DACR32_EL2);     break;
> +    case HSR_CPREG32(DFSR):       CP32_PASSTHRU32(ESR_EL1);        break;
> +    case HSR_CPREG32(IFSR):       CP32_PASSTHRU32(IFSR32_EL2);     break;
> +    case HSR_CPREG32(ADFSR):      CP32_PASSTHRU32(AFSR0_EL1);      break;
> +    case HSR_CPREG32(AIFSR):      CP32_PASSTHRU32(AFSR1_EL1);      break;
> +    case HSR_CPREG32(CONTEXTIDR): CP32_PASSTHRU32(CONTEXTIDR_EL1); break;
> +
> +#ifdef CONFIG_ARM_64
> +    case HSR_CPREG32(DFAR):       CP32_PASSTHRU64_LO(FAR_EL1);     break;
> +    case HSR_CPREG32(IFAR):       CP32_PASSTHRU64_HI(FAR_EL1);     break;
> +    case HSR_CPREG32(MAIR0):      CP32_PASSTHRU64_LO(MAIR_EL1);    break;
> +    case HSR_CPREG32(MAIR1):      CP32_PASSTHRU64_HI(MAIR_EL1);    break;
> +    case HSR_CPREG32(AMAIR0):     CP32_PASSTHRU64_LO(AMAIR_EL1);   break;
> +    case HSR_CPREG32(AMAIR1):     CP32_PASSTHRU64_HI(AMAIR_EL1);   break;
> +#else
> +    case HSR_CPREG32(DFAR):       CP32_PASSTHRU32(DFAR);           break;
> +    case HSR_CPREG32(IFAR):       CP32_PASSTHRU32(IFAR);           break;
> +    case HSR_CPREG32(MAIR0):      CP32_PASSTHRU32(MAIR0);          break;
> +    case HSR_CPREG32(MAIR1):      CP32_PASSTHRU32(MAIR1);          break;
> +    case HSR_CPREG32(AMAIR0):     CP32_PASSTHRU32(AMAIR0);         break;
> +    case HSR_CPREG32(AMAIR1):     CP32_PASSTHRU32(AMAIR1);         break;
> +#endif
> +
> +#undef CP32_PASSTHRU32
> +#undef CP32_PASSTHRU64
> +#undef CP32_PASSTHRU64_LO
> +#undef CP32_PASSTHRU64_HI
>      default:
>          printk("%s p15, %d, r%d, cr%d, cr%d, %d @ 0x%"PRIregister"\n",
>                 cp32.read ? "mrc" : "mcr",
> @@ -1351,6 +1451,9 @@ static void do_cp15_64(struct cpu_user_regs *regs,
>                         union hsr hsr)
>  {
>      struct hsr_cp64 cp64 = hsr.cp64;
> +    uint32_t *r1 = (uint32_t *)select_user_reg(regs, cp64.reg1);
> +    uint32_t *r2 = (uint32_t *)select_user_reg(regs, cp64.reg2);
> +    uint64_t r;
>  
>      if ( !check_conditional_instr(regs, hsr) )
>      {
> @@ -1368,6 +1471,26 @@ static void do_cp15_64(struct cpu_user_regs *regs,
>              domain_crash_synchronous();
>          }
>          break;
> +
> +#define CP64_PASSTHRU(R...) do {                                  \
> +    if ( cp64.read )                                            \
> +    {                                                           \
> +        r = READ_SYSREG64(R);                                   \
> +        *r1 = r & 0xffffffffUL;                                 \
> +        *r2 = r >> 32;                                          \

It doesn't look like r, r1 and r2 are used anywhere.


> +    }                                                           \
> +    else                                                        \
> +    {                                                           \
> +        r = (*r1) | (((uint64_t)(*r2))<<32);                    \
> +        WRITE_SYSREG64(r, R);                                   \
> +    }                                                           \
> +} while(0)
> +
> +    case HSR_CPREG64(TTBR0): CP64_PASSTHRU(TTBR0_EL1); break;
> +    case HSR_CPREG64(TTBR1): CP64_PASSTHRU(TTBR1_EL1); break;
> +
> +#undef CP64_PASSTHRU
> +
>      default:
>          printk("%s p15, %d, r%d, r%d, cr%d @ 0x%"PRIregister"\n",
>                 cp64.read ? "mrrc" : "mcrr",
> @@ -1382,11 +1505,12 @@ static void do_sysreg(struct cpu_user_regs *regs,
>                        union hsr hsr)
>  {
>      struct hsr_sysreg sysreg = hsr.sysreg;
> +    register_t *x = select_user_reg(regs, sysreg.reg);
>  
>      switch ( hsr.bits & HSR_SYSREG_REGS_MASK )
>      {
> -    case CNTP_CTL_EL0:
> -    case CNTP_TVAL_EL0:
> +    case HSR_SYSREG_CNTP_CTL_EL0:
> +    case HSR_SYSREG_CNTP_TVAL_EL0:
>          if ( !vtimer_emulate(regs, hsr) )
>          {
>              dprintk(XENLOG_ERR,
> @@ -1394,6 +1518,31 @@ static void do_sysreg(struct cpu_user_regs *regs,
>              domain_crash_synchronous();
>          }
>          break;
> +
> +#define SYSREG_PASSTHRU(R...) do {              \
> +    if ( sysreg.read )                          \
> +        *x = READ_SYSREG(R);                    \
> +    else                                        \
> +        WRITE_SYSREG(*x, R);                    \
> +} while(0)
> +
> +    case HSR_SYSREG_SCTLR_EL1:
> +        SYSREG_PASSTHRU(SCTLR_EL1);
> +        update_sctlr(*x);
> +        break;
> +    case HSR_SYSREG_TTBR0_EL1:      SYSREG_PASSTHRU(TTBR0_EL1);      break;
> +    case HSR_SYSREG_TTBR1_EL1:      SYSREG_PASSTHRU(TTBR1_EL1);      break;
> +    case HSR_SYSREG_TCR_EL1:        SYSREG_PASSTHRU(TCR_EL1);        break;
> +    case HSR_SYSREG_ESR_EL1:        SYSREG_PASSTHRU(ESR_EL1);        break;
> +    case HSR_SYSREG_FAR_EL1:        SYSREG_PASSTHRU(FAR_EL1);        break;
> +    case HSR_SYSREG_AFSR0_EL1:      SYSREG_PASSTHRU(AFSR0_EL1);      break;
> +    case HSR_SYSREG_AFSR1_EL1:      SYSREG_PASSTHRU(AFSR1_EL1);      break;
> +    case HSR_SYSREG_MAIR_EL1:       SYSREG_PASSTHRU(MAIR_EL1);       break;
> +    case HSR_SYSREG_AMAIR_EL1:      SYSREG_PASSTHRU(AMAIR_EL1);      break;
> +    case HSR_SYSREG_CONTEXTIDR_EL1: SYSREG_PASSTHRU(CONTEXTIDR_EL1); break;
> +
> +#undef SYSREG_PASSTHRU
> +
>      default:
>          printk("%s %d, %d, c%d, c%d, %d %s x%d @ 0x%"PRIregister"\n",
>                 sysreg.read ? "mrs" : "msr",
> diff --git a/xen/arch/arm/vtimer.c b/xen/arch/arm/vtimer.c
> index 433ad55..e325f78 100644
> --- a/xen/arch/arm/vtimer.c
> +++ b/xen/arch/arm/vtimer.c
> @@ -240,18 +240,18 @@ static int vtimer_emulate_sysreg(struct cpu_user_regs *regs, union hsr hsr)
>  
>      switch ( hsr.bits & HSR_SYSREG_REGS_MASK )
>      {
> -    case CNTP_CTL_EL0:
> +    case HSR_SYSREG_CNTP_CTL_EL0:
>          vtimer_cntp_ctl(regs, &r, sysreg.read);
>          if ( sysreg.read )
>              *x = r;
>          return 1;
> -    case CNTP_TVAL_EL0:
> +    case HSR_SYSREG_CNTP_TVAL_EL0:
>          vtimer_cntp_tval(regs, &r, sysreg.read);
>          if ( sysreg.read )
>              *x = r;
>          return 1;
>  
> -    case HSR_CPREG64(CNTPCT):
> +    case HSR_SYSREG_CNTPCT_EL0:
>          return vtimer_cntpct(regs, x, sysreg.read);
>  
>      default:
> diff --git a/xen/include/asm-arm/cpregs.h b/xen/include/asm-arm/cpregs.h
> index f0f1d53..508467a 100644
> --- a/xen/include/asm-arm/cpregs.h
> +++ b/xen/include/asm-arm/cpregs.h
> @@ -121,6 +121,8 @@
>  #define TTBR0           p15,0,c2        /* Translation Table Base Reg. 0 */
>  #define TTBR1           p15,1,c2        /* Translation Table Base Reg. 1 */
>  #define HTTBR           p15,4,c2        /* Hyp. Translation Table Base Register */
> +#define TTBR0_32        p15,0,c2,c0,0   /* 32-bit access to TTBR0 */
> +#define TTBR1_32        p15,0,c2,c0,1   /* 32-bit access to TTBR1 */
>  #define HTCR            p15,4,c2,c0,2   /* Hyp. Translation Control Register */
>  #define VTCR            p15,4,c2,c1,2   /* Virtualization Translation Control Register */
>  #define VTTBR           p15,6,c2        /* Virtualization Translation Table Base Register */
> @@ -260,7 +262,9 @@
>  #define CPACR_EL1               CPACR
>  #define CSSELR_EL1              CSSELR
>  #define DACR32_EL2              DACR
> +#define ESR_EL1                 DFSR
>  #define ESR_EL2                 HSR
> +#define FAR_EL1                 HIFAR
>  #define FAR_EL2                 HIFAR
>  #define HCR_EL2                 HCR
>  #define HPFAR_EL2               HPFAR
> diff --git a/xen/include/asm-arm/processor.h b/xen/include/asm-arm/processor.h
> index dfe807d..06e638f 100644
> --- a/xen/include/asm-arm/processor.h
> +++ b/xen/include/asm-arm/processor.h
> @@ -342,7 +342,7 @@ union hsr {
>  #define HSR_SYSREG_OP0_SHIFT (20)
>  #define HSR_SYSREG_OP1_MASK (0x0001c000)
>  #define HSR_SYSREG_OP1_SHIFT (14)
> -#define HSR_SYSREG_CRN_MASK (0x00003800)
> +#define HSR_SYSREG_CRN_MASK (0x00003c00)
>  #define HSR_SYSREG_CRN_SHIFT (10)
>  #define HSR_SYSREG_CRM_MASK (0x0000001e)
>  #define HSR_SYSREG_CRM_SHIFT (1)
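
For reviewers of the mask change above: CRn is a four-bit field at ISS bits
[13:10] (per the CRN_SHIFT of 10), so the old three-bit mask 0x3800 dropped
CRn bit 0. Any register with an odd CRn (SCTLR_EL1 is c1, CONTEXTIDR_EL1 is
c13) would then never match its case constant after masking with
HSR_SYSREG_REGS_MASK, assuming that mask is composed from these per-field
masks. A quick standalone check:

    #include <stdint.h>
    #include <stdio.h>

    #define CRN_SHIFT     10
    #define OLD_CRN_MASK  0x00003800u   /* three bits: keeps only CRn[3:1] */
    #define NEW_CRN_MASK  0x00003c00u   /* four bits:  keeps CRn[3:0]      */

    int main(void)
    {
        uint32_t crn_sctlr = 1;                      /* SCTLR_EL1 is CRn == c1 */
        uint32_t iss_bits  = crn_sctlr << CRN_SHIFT;

        printf("old mask keeps CRn = %u\n", (iss_bits & OLD_CRN_MASK) >> CRN_SHIFT); /* 0 */
        printf("new mask keeps CRn = %u\n", (iss_bits & NEW_CRN_MASK) >> CRN_SHIFT); /* 1 */
        return 0;
    }
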
> diff --git a/xen/include/asm-arm/sysregs.h b/xen/include/asm-arm/sysregs.h
> index 48ad07e..0cee0e9 100644
> --- a/xen/include/asm-arm/sysregs.h
> +++ b/xen/include/asm-arm/sysregs.h
> @@ -40,8 +40,23 @@
>      ((__HSR_SYSREG_##crm) << HSR_SYSREG_CRM_SHIFT) | \
>      ((__HSR_SYSREG_##op2) << HSR_SYSREG_OP2_SHIFT)
>  
> -#define CNTP_CTL_EL0  HSR_SYSREG(3,3,c14,c2,1)
> -#define CNTP_TVAL_EL0 HSR_SYSREG(3,3,c14,c2,0)
> +#define HSR_SYSREG_SCTLR_EL1      HSR_SYSREG(3,0,c1, c0,0)
> +#define HSR_SYSREG_TTBR0_EL1      HSR_SYSREG(3,0,c2, c0,0)
> +#define HSR_SYSREG_TTBR1_EL1      HSR_SYSREG(3,0,c2, c0,1)
> +#define HSR_SYSREG_TCR_EL1        HSR_SYSREG(3,0,c2, c0,2)
> +#define HSR_SYSREG_AFSR0_EL1      HSR_SYSREG(3,0,c5, c1,0)
> +#define HSR_SYSREG_AFSR1_EL1      HSR_SYSREG(3,0,c5, c1,1)
> +#define HSR_SYSREG_ESR_EL1        HSR_SYSREG(3,0,c5, c2,0)
> +#define HSR_SYSREG_FAR_EL1        HSR_SYSREG(3,0,c6, c0,0)
> +#define HSR_SYSREG_MAIR_EL1       HSR_SYSREG(3,0,c10,c2,0)
> +#define HSR_SYSREG_AMAIR_EL1      HSR_SYSREG(3,0,c10,c3,0)
> +#define HSR_SYSREG_CONTEXTIDR_EL1 HSR_SYSREG(3,0,c13,c0,1)
> +
> +#define HSR_SYSREG_CNTPCT_EL0     HSR_SYSREG(3,3,c14,c0,0)
> +#define HSR_SYSREG_CNTP_CTL_EL0   HSR_SYSREG(3,3,c14,c2,1)
> +#define HSR_SYSREG_CNTP_TVAL_EL0  HSR_SYSREG(3,3,c14,c2,0)
> +
> +
>  #endif
>  
>  #endif
> -- 
> 1.7.10.4
> 
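
One last note: while checking the new HSR_SYSREG_* values I put together the
little cross-check below. It reimplements the composition locally, so it is
only as good as its assumptions: OP2_SHIFT == 17 and "__HSR_SYSREG_cN expands
to N" are guesses about parts of the headers not quoted here; the other shifts
come from the processor.h hunk and the operand tuples from the sysregs.h hunk.

    #include <stdint.h>
    #include <stdio.h>

    #define OP0_SHIFT 20
    #define OP2_SHIFT 17   /* assumed; not visible in the quoted hunk */
    #define OP1_SHIFT 14
    #define CRN_SHIFT 10
    #define CRM_SHIFT  1

    static uint32_t hsr_sysreg(uint32_t op0, uint32_t op1, uint32_t crn,
                               uint32_t crm, uint32_t op2)
    {
        return (op0 << OP0_SHIFT) | (op1 << OP1_SHIFT) | (op2 << OP2_SHIFT) |
               (crn << CRN_SHIFT) | (crm << CRM_SHIFT);
    }

    int main(void)
    {
        /* HSR_SYSREG_SCTLR_EL1 is HSR_SYSREG(3,0,c1,c0,0) */
        printf("SCTLR_EL1    -> %#x\n", hsr_sysreg(3, 0, 1, 0, 0));
        /* HSR_SYSREG_CNTP_CTL_EL0 is HSR_SYSREG(3,3,c14,c2,1) */
        printf("CNTP_CTL_EL0 -> %#x\n", hsr_sysreg(3, 3, 14, 2, 1));
        return 0;
    }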



 

