Xen project Mailing List

Re: [Xen-devel] [PATCH 6/6] x86/hvm: Implement hvmemul_write() using real mappings

To: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxx>

From: Paul Durrant <Paul.Durrant@xxxxxxxxxx>

Date: Wed, 21 Jun 2017 16:19:00 +0000

Accept-language: en-GB, en-US

Cc: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>, Mihai Donțu <mdontu@xxxxxxxxxxxxxxx>, Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>

Delivery-date: Wed, 21 Jun 2017 16:23:52 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

Thread-index: AQHS6qDT9+05+8d3RU2WKzdb/UuCBKIve7Ow

Thread-topic: [PATCH 6/6] x86/hvm: Implement hvmemul_write() using real mappings

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx]
> Sent: 21 June 2017 16:13
> To: Xen-devel <xen-devel@xxxxxxxxxxxxx>
> Cc: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>; Jan Beulich
> <JBeulich@xxxxxxxx>; Paul Durrant <Paul.Durrant@xxxxxxxxxx>; Razvan
> Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>; Mihai Donțu
> <mdontu@xxxxxxxxxxxxxxx>
> Subject: [PATCH 6/6] x86/hvm: Implement hvmemul_write() using real
> mappings
>
> An access which crosses a page boundary is performed atomically by x86
> hardware, albeit with a severe performance penalty.  An important corner case
> is when a straddled access hits two pages which differ in whether a
> translation exists, or in net access rights.
>
> The use of hvm_copy*() in hvmemul_write() is problematic, because it performs
> a translation then completes the partial write, before moving on to the next
> translation.
>
> If an individual emulated write straddles two pages, the first of which is
> writable, and the second of which is not, the first half of the write will
> complete before #PF is raised from the second half.
>
> This results in guest state corruption as a side effect of emulation, which
> has been observed to cause Windows to crash while under introspection.
>
> Introduce the hvmemul_{,un}map_linear_addr() helpers, which translate the
> entire contents of a linear access, and vmap() the underlying frames to
> provide a contiguous virtual mapping for the emulator to use.  This is the
> same mechanism as used by the shadow emulation code.
>
> This will catch any translation issues and abort the emulation before any
> modifications occur.
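To make the failure mode in the commit message concrete, here is a minimal standalone sketch in C. It is not Xen code: it models two tiny pages with a per-page writable flag and contrasts a copy that checks and writes one page at a time with one that checks every page before touching memory. All names (toy_mem, toy_write_per_page, toy_write_checked, TOY_PAGE_SIZE) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16u  /* Hypothetical tiny page size, for illustration. */
#define TOY_NR_PAGES   2u

struct toy_mem {
    unsigned char bytes[TOY_PAGE_SIZE * TOY_NR_PAGES];
    bool writable[TOY_NR_PAGES];
};

/*
 * Per-page copy: each page is checked and written in turn, so a failure on
 * a later page leaves the earlier bytes already modified.
 */
static bool toy_write_per_page(struct toy_mem *m, size_t addr,
                               const void *src, size_t len)
{
    const unsigned char *p = src;

    while ( len )
    {
        size_t page = addr / TOY_PAGE_SIZE;
        size_t room = TOY_PAGE_SIZE - (addr % TOY_PAGE_SIZE);
        size_t chunk = len < room ? len : room;

        if ( page >= TOY_NR_PAGES || !m->writable[page] )
            return false; /* "Fault", possibly after a partial write. */

        memcpy(&m->bytes[addr], p, chunk);
        addr += chunk;
        p += chunk;
        len -= chunk;
    }

    return true;
}

/* Check every page covered by the access first, then commit in one go. */
static bool toy_write_checked(struct toy_mem *m, size_t addr,
                              const void *src, size_t len)
{
    size_t first, final, page;

    if ( len == 0 || addr >= sizeof(m->bytes) ||
         len > sizeof(m->bytes) - addr )
        return false;

    first = addr / TOY_PAGE_SIZE;
    final = (addr + len - 1) / TOY_PAGE_SIZE;
    assert(final < TOY_NR_PAGES);

    for ( page = first; page <= final; page++ )
        if ( !m->writable[page] )
            return false; /* Nothing has been modified yet. */

    memcpy(&m->bytes[addr], src, len);
    return true;
}
```

With page 0 writable and page 1 read-only, a 4-byte write at offset 14 fails in both variants, but only the per-page variant leaves the last two bytes of page 0 modified.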
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> ---
> CC: Jan Beulich <JBeulich@xxxxxxxx>
> CC: Paul Durrant <paul.durrant@xxxxxxxxxx>
> CC: Razvan Cojocaru <rcojocaru@xxxxxxxxxxxxxxx>
> CC: Mihai Donțu <mdontu@xxxxxxxxxxxxxxx>
>
> While the maximum size of linear mapping is capped at 1 page, the logic in the
> helpers is written to work properly as hvmemul_ctxt->mfn[] gets longer,
> specifically with XSAVE instruction emulation in mind.
>
> This has only had light testing so far.
> ---
>  xen/arch/x86/hvm/emulate.c        | 179 ++++++++++++++++++++++++++++++++++----
>  xen/include/asm-x86/hvm/emulate.h |   7 ++
>  2 files changed, 169 insertions(+), 17 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
> index 384ad0b..804bea5 100644
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -498,6 +498,159 @@ static int hvmemul_do_mmio_addr(paddr_t mmio_gpa,
>  }
>
>  /*
> + * Map the frame(s) covering an individual linear access, for writeable
> + * access.  May return NULL for MMIO, or ERR_PTR(~X86EMUL_*) for other errors
> + * including ERR_PTR(~X86EMUL_OKAY) for write-discard mappings.
> + *
> + * In debug builds, map() checks that each slot in hvmemul_ctxt->mfn[] is
> + * clean before use, and poisons unused slots with INVALID_MFN.
> + */
> +static void *hvmemul_map_linear_addr(
> +    unsigned long linear, unsigned int bytes, uint32_t pfec,
> +    struct hvm_emulate_ctxt *hvmemul_ctxt)
> +{
> +    struct vcpu *curr = current;
> +    void *err, *mapping;
> +
> +    /* First and final gfns which need mapping. */
> +    unsigned long frame = linear >> PAGE_SHIFT, first = frame;
> +    unsigned long final = (linear + bytes - !!bytes) >> PAGE_SHIFT;

Do we need to worry about linear + bytes overflowing here?

Also, is this ever legitimately called with bytes == 0?

> +
> +    /*
> +     * mfn points to the next free slot.  All used slots have a page reference
> +     * held on them.
> +     */
> +    mfn_t *mfn = &hvmemul_ctxt->mfn[0];
> +
> +    /*
> +     * The caller has no legitimate reason for trying a zero-byte write, but
> +     * final is calculated to fail safe in release builds.
> +     *
> +     * The maximum write size depends on the number of adjacent mfns[] which
> +     * can be vmap()'d, accounting for possible misalignment within the region.
> +     * The higher level emulation callers are responsible for ensuring that
> +     * mfns[] is large enough for the requested write size.
> +     */
> +    if ( bytes == 0 ||
> +         final - first > ARRAY_SIZE(hvmemul_ctxt->mfn) - 1 )
> +    {

I guess not, so why the weird-looking calculation for final? Its value will
not be used when bytes == 0.

> +        ASSERT_UNREACHABLE();
> +        goto unhandleable;
> +    }
> +
> +    do {
> +        enum hvm_translation_result res;
> +        struct page_info *page;
> +        pagefault_info_t pfinfo;
> +        p2m_type_t p2mt;
> +
> +        res = hvm_translate_get_page(curr, frame << PAGE_SHIFT, true, pfec,
> +                                     &pfinfo, &page, NULL, &p2mt);
> +
> +        switch ( res )
> +        {
> +        case HVMTRANS_okay:
> +            break;
> +
> +        case HVMTRANS_bad_linear_to_gfn:
> +            x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
> +            err = ERR_PTR(~(long)X86EMUL_EXCEPTION);
> +            goto out;
> +
> +        case HVMTRANS_bad_gfn_to_mfn:
> +            err = NULL;
> +            goto out;
> +
> +        case HVMTRANS_gfn_paged_out:
> +        case HVMTRANS_gfn_shared:
> +            err = ERR_PTR(~(long)X86EMUL_RETRY);
> +            goto out;
> +
> +        default:
> +            goto unhandleable;
> +        }
> +
> +        /* Error checking.  Confirm that the current slot is clean. */
> +        ASSERT(mfn_x(*mfn) == 0);
> +
> +        *mfn++ = _mfn(page_to_mfn(page));
> +        frame++;
> +
> +        if ( p2m_is_discard_write(p2mt) )
> +        {
> +            err = ERR_PTR(~(long)X86EMUL_OKAY);
> +            goto out;
> +        }
> +
> +    } while ( frame < final );
> +
> +    /* Entire access within a single frame? */
> +    if ( first == final )
> +        mapping = map_domain_page(hvmemul_ctxt->mfn[0]) + (linear & ~PAGE_MASK);
> +    /* Multiple frames? Need to vmap(). */
> +    else if ( (mapping = vmap(hvmemul_ctxt->mfn,
> +                              mfn - hvmemul_ctxt->mfn)) == NULL )
> +        goto unhandleable;
> +
> +#ifndef NDEBUG /* Poison unused mfn[]s with INVALID_MFN. */
> +    while ( mfn < hvmemul_ctxt->mfn + ARRAY_SIZE(hvmemul_ctxt->mfn) )
> +    {
> +        ASSERT(mfn_x(*mfn) == 0);
> +        *mfn++ = INVALID_MFN;
> +    }
> +#endif
> +
> +    return mapping;
> +
> + unhandleable:
> +    err = ERR_PTR(~(long)X86EMUL_UNHANDLEABLE);
> +
> + out:
> +    /* Drop all held references. */
> +    while ( mfn > hvmemul_ctxt->mfn )
> +        put_page(mfn_to_page(mfn_x(*mfn--)));
> +
> +    return err;
> +}
> +
> +static void hvmemul_unmap_linear_addr(
> +    void *mapping, unsigned long linear, unsigned int bytes,
> +    struct hvm_emulate_ctxt *hvmemul_ctxt)
> +{
> +    struct domain *currd = current->domain;
> +    unsigned long frame = linear >> PAGE_SHIFT;
> +    unsigned long final = (linear + bytes - !!bytes) >> PAGE_SHIFT;
> +    mfn_t *mfn = &hvmemul_ctxt->mfn[0];
> +
> +    ASSERT(bytes > 0);

Why not return if bytes == 0? I know it's not a legitimate call, but in a
non-debug build it would result in unmap_domain_page() being called below.

  Paul

> +
> +    if ( frame == final )
> +        unmap_domain_page(mapping);
> +    else
> +        vunmap(mapping);
> +
> +    do
> +    {
> +        ASSERT(mfn_valid(*mfn));
> +        paging_mark_dirty(currd, *mfn);
> +        put_page(mfn_to_page(mfn_x(*mfn)));
> +
> +        frame++;
> +        *mfn++ = _mfn(0); /* Clean slot for map()'s error checking. */
> +
> +    } while ( frame < final );
> +
> +
> +#ifndef NDEBUG /* Check (and clean) all unused mfns. */
> +    while ( mfn < hvmemul_ctxt->mfn + ARRAY_SIZE(hvmemul_ctxt->mfn) )
> +    {
> +        ASSERT(mfn_eq(*mfn, INVALID_MFN));
> +        *mfn++ = _mfn(0);
> +    }
> +#endif
> +}
> +
> +/*
>   * Convert addr from linear to physical form, valid over the range
>   * [addr, addr + *reps * bytes_per_rep]. *reps is adjusted according to
>   * the valid computed range. It is always >0 when X86EMUL_OKAY is returned.
> @@ -987,11 +1140,11 @@ static int hvmemul_write(
>      struct hvm_emulate_ctxt *hvmemul_ctxt =
>          container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
>      struct vcpu *curr = current;
> -    pagefault_info_t pfinfo;
>      unsigned long addr, reps = 1;
>      uint32_t pfec = PFEC_page_present | PFEC_write_access;
>      struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
>      int rc;
> +    void *mapping;
>
>      if ( is_x86_system_segment(seg) )
>          pfec |= PFEC_implicit;
> @@ -1007,23 +1160,15 @@ static int hvmemul_write(
>           (vio->mmio_gla == (addr & PAGE_MASK)) )
>          return hvmemul_linear_mmio_write(addr, bytes, p_data, pfec, hvmemul_ctxt, 1);
>
> -    rc = hvm_copy_to_guest_linear(addr, p_data, bytes, pfec, &pfinfo);
> -
> -    switch ( rc )
> -    {
> -    case HVMTRANS_okay:
> -        break;
> -    case HVMTRANS_bad_linear_to_gfn:
> -        x86_emul_pagefault(pfinfo.ec, pfinfo.linear, &hvmemul_ctxt->ctxt);
> -        return X86EMUL_EXCEPTION;
> -    case HVMTRANS_bad_gfn_to_mfn:
> +    mapping = hvmemul_map_linear_addr(addr, bytes, pfec, hvmemul_ctxt);
> +    if ( IS_ERR(mapping) )
> +        return ~PTR_ERR(mapping);
> +    else if ( !mapping )
>          return hvmemul_linear_mmio_write(addr, bytes, p_data, pfec, hvmemul_ctxt, 0);
> -    case HVMTRANS_gfn_paged_out:
> -    case HVMTRANS_gfn_shared:
> -        return X86EMUL_RETRY;
> -    default:
> -        return X86EMUL_UNHANDLEABLE;
> -    }
> +
> +    memcpy(mapping, p_data, bytes);
> +
> +    hvmemul_unmap_linear_addr(mapping, addr, bytes, hvmemul_ctxt);
>
>      return X86EMUL_OKAY;
>  }
> diff --git a/xen/include/asm-x86/hvm/emulate.h b/xen/include/asm-x86/hvm/emulate.h
> index 8864775..65efd4e 100644
> --- a/xen/include/asm-x86/hvm/emulate.h
> +++ b/xen/include/asm-x86/hvm/emulate.h
> @@ -37,6 +37,13 @@ struct hvm_emulate_ctxt {
>      unsigned long seg_reg_accessed;
>      unsigned long seg_reg_dirty;
>
> +    /*
> +     * MFNs behind temporary mappings in the write callback.  The length is
> +     * arbitrary, and can be increased if writes longer than PAGE_SIZE are
> +     * needed.
> +     */
> +    mfn_t mfn[2];
> +
>      uint32_t intr_shadow;
>
>      bool_t set_context;
> --
> 2.1.4

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
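The questions raised above about the calculation of final (possible wrap of linear + bytes, and the bytes == 0 case) can be explored with a small standalone sketch in C. It is not the patch's code: sketch_final_as_in_patch() simply reproduces the quoted expression, while sketch_frame_span() is a hypothetical alternative that rejects a zero-length or wrapping range up front. SKETCH_PAGE_SHIFT and both function names are assumptions made for this example, and whether wrapping values can actually reach the real helper is not settled in this message.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SKETCH_PAGE_SHIFT 12 /* Assumed 4k pages, for illustration only. */

/*
 * The expression quoted in the patch: "- !!bytes" subtracts 1 only when
 * bytes is non-zero, so a zero-byte access yields the frame of linear
 * itself rather than the frame of linear - 1.
 */
static unsigned long sketch_final_as_in_patch(unsigned long linear,
                                              unsigned int bytes)
{
    return (linear + bytes - !!bytes) >> SKETCH_PAGE_SHIFT;
}

/*
 * Hypothetical alternative: refuse zero-length and wrapping ranges before
 * computing the first and final frames.
 */
static bool sketch_frame_span(unsigned long linear, unsigned int bytes,
                              unsigned long *first, unsigned long *final)
{
    if ( bytes == 0 )
        return false;

    /* linear + (bytes - 1) would wrap past ULONG_MAX. */
    if ( linear > ULONG_MAX - (bytes - 1) )
        return false;

    *first = linear >> SKETCH_PAGE_SHIFT;
    *final = (linear + bytes - 1) >> SKETCH_PAGE_SHIFT;
    assert(*first <= *final);

    return true;
}
```

For a 4-byte access at 0x1ffe both forms give a final frame of 2. For bytes == 0 the quoted expression gives the same frame as first, which is consistent with the "fail safe" comment in the patch, although the bytes == 0 check means the value is never consumed.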
