
[Xen-devel] Xen Security Advisory 154 (CVE-2016-2270) - x86: inconsistent cachability flags on guest mappings



> x86: enforce consistent cachability of MMIO mappings
> 
> We've been told by Intel that inconsistent cachability between
> multiple mappings of the same page can affect system stability only
> when the affected page is an MMIO one. Since the stale data issue is
> of no relevance to the hypervisor (since all guest memory accesses go
> through proper accessors and validation), handling of RAM pages
> remains unchanged here. Any MMIO mapped by domains however needs to be
> done consistently (all cachable mappings or all uncachable ones), in
> order to avoid Machine Check exceptions. Since converting existing
> cachable mappings to uncachable (at the time an uncachable mapping
> gets established) would in the PV case require tracking all mappings,
> allow MMIO to only get mapped uncachable (UC, UC-, or WC).
> 
> This also implies that in the PV case we mustn't use the L1 PTE update
> fast path when cachability flags get altered.
> 
> Since in the HVM case at least for now we want to continue honoring
> pinned cachability attributes for pages not mapped by the hypervisor,
> special case handling of r/o MMIO pages (forcing UC) gets added there.
> Arguably the counterpart change to p2m-pt.c may not be necessary, since
> UC- (which already gets enforced there) is probably strict enough.
> 
> Note that the shadow code changes include fixing the write protection
> of r/o MMIO ranges: shadow_l1e_remove_flags() and its siblings, other
> than l1e_remove_flags() and alike, return the new PTE (and hence
> ignoring their return values makes them no-ops).
> 
> This is CVE-2016-2270 / XSA-154.
> 
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> Acked-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

...

> --- a/xen/arch/x86/hvm/mtrr.c
> +++ b/xen/arch/x86/hvm/mtrr.c
> @@ -770,8 +770,17 @@ int epte_get_entry_emt(struct domain *d,
>      if ( v->domain != d )
>          v = d->vcpu ? d->vcpu[0] : NULL;
>  
> -    if ( !mfn_valid(mfn_x(mfn)) )
> +    if ( !mfn_valid(mfn_x(mfn)) ||
> +         rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
> +                                 mfn_x(mfn) + (1UL << order) - 1) )
> +    {
> +        *ipat = 1;
>          return MTRR_TYPE_UNCACHABLE;
> +    }
> +
> +    if ( rangeset_overlaps_range(mmio_ro_ranges, mfn_x(mfn),
> +                                 mfn_x(mfn) + (1UL << order) - 1) )
> +        return -1;
>  
>      switch ( hvm_get_mem_pinned_cacheattr(d, gfn, order, &type) )
>      {

This doesn't look right. That second 'if(rangeset_overlaps_range(…))'
is tautologically false, because if it *is* true, the first if()
statement happens first and it's never reached.

The reason I'm looking is that the first if() statement is happening
for MMIO regions where it probably shouldn't¹. This means that guests
are mapping MMIO BARs of assigned devices and getting *forced* UC
(because *ipat=1) instead of taking the if(direct_mmio) path slightly
further down — which wouldn't set the 'ignore PAT' bit, and would thus
allow them to enable WC through their PAT.
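
To make the WC-via-PAT point concrete, this is how I understand the
effective-type calculation when the 'ignore PAT' bit is set vs. clear.
It's purely my own illustration of the UC/WB cases (not real Xen code
or SDM pseudocode), so the details may well be off:

/* Illustration only: with IPAT set, the EPT memory type wins outright,
 * so UC + IPAT means the guest's PAT is powerless.  With IPAT clear,
 * the EPT type combines with the guest PAT type much as an MTRR type
 * would, so EPT UC plus guest PAT WC yields WC. */
enum memtype { MT_UC = 0, MT_WC = 1, MT_WB = 6 };

static enum memtype effective_type(enum memtype ept_mt, int ipat,
                                   enum memtype guest_pat)
{
    if ( ipat )
        return ept_mt;               /* forced: guest PAT is ignored */
    if ( ept_mt == MT_UC )
        return guest_pat == MT_WC ? MT_WC : MT_UC;
    return guest_pat;                /* EPT WB: guest PAT decides */
}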

It makes me wonder if the first was actually intended to be
'!mfn_valid() && rangeset_contains_range(…)' — with logical && rather
than ||. That would make some sense because it's then explicitly
refusing to map pages which are in mmio_ro_ranges *and* mfn_valid().

And then there's a 'if (direct_mmio) return UC;' further down which
looks like it'd do the right thing for the use case I'm actually
testing. I may see if I can construct a straw man patch, but I'm kind
of unfamiliar with this code so it should be taken with a large pinch
of salt...
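
Something like the following is what I have in mind: literally just
s/||/&&/ as mused above, leaving the direct_mmio handling further down
to cope with ordinary MMIO BARs. It's not even compile-tested and the
hunk context is hand-made, so it really is only a straw man:

--- a/xen/arch/x86/hvm/mtrr.c
+++ b/xen/arch/x86/hvm/mtrr.c
@@ ... @@ int epte_get_entry_emt(struct domain *d,
-    if ( !mfn_valid(mfn_x(mfn)) ||
+    if ( !mfn_valid(mfn_x(mfn)) &&
          rangeset_contains_range(mmio_ro_ranges, mfn_x(mfn),
                                  mfn_x(mfn) + (1UL << order) - 1) )
     {
         *ipat = 1;
         return MTRR_TYPE_UNCACHABLE;
     }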

There is a separate question of whether it's actually safe to let the
guest map an MMIO page with both UC and WC simultaneously. Empirically,
it seems to be OK — I hacked a guest kernel not to enforce the mutual
exclusion, mapped the BAR with both UC and WC and ran two kernel
threads, reading and writing the whole BAR in a number of iterations.
The WC thread went a lot faster than the UC one, so it will have often
been touching the same locations as the UC thread as it 'overtook' it,
and nothing bad happened. This seems reasonable, as the dire warnings
and machine checks are more about *cached* vs. uncached mappings, not
WC vs. UC. But it would be good to have a definitive answer from Intel
and AMD about whether it's safe.
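
For the record, the guest-side test looked roughly like the module
below. It's a from-memory reconstruction rather than the exact code I
ran, BAR_PHYS/BAR_SIZE are placeholders, and it only works at all
because the kernel was hacked to skip the PAT memtype conflict check
which would normally refuse the second, conflicting ioremap:

/*
 * Map the same BAR both UC and WC and hammer it from two kthreads, so
 * the faster WC writer keeps overtaking the UC reader.  Error handling
 * is minimal; needs the memtype conflict check disabled as described.
 */
#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/io.h>

#define BAR_PHYS 0xf0000000UL   /* placeholder: assigned device BAR */
#define BAR_SIZE 0x100000UL     /* placeholder: 1MiB */

static void __iomem *bar_uc, *bar_wc;
static struct task_struct *t_uc, *t_wc;

static int uc_reader(void *arg)
{
        while (!kthread_should_stop()) {
                size_t off;

                for (off = 0; off < BAR_SIZE; off += 4)
                        (void)readl(bar_uc + off);
                cond_resched();
        }
        return 0;
}

static int wc_writer(void *arg)
{
        while (!kthread_should_stop()) {
                size_t off;

                for (off = 0; off < BAR_SIZE; off += 4)
                        writel(0xdeadbeef, bar_wc + off);
                cond_resched();
        }
        return 0;
}

static int __init ucwc_init(void)
{
        bar_uc = ioremap_uc(BAR_PHYS, BAR_SIZE);
        bar_wc = ioremap_wc(BAR_PHYS, BAR_SIZE); /* normally refused */
        if (!bar_uc || !bar_wc)
                return -ENOMEM;

        t_uc = kthread_run(uc_reader, NULL, "ucwc-uc");
        t_wc = kthread_run(wc_writer, NULL, "ucwc-wc");
        return 0;
}

static void __exit ucwc_exit(void)
{
        kthread_stop(t_uc);
        kthread_stop(t_wc);
        iounmap(bar_uc);
        iounmap(bar_wc);
}

module_init(ucwc_init);
module_exit(ucwc_exit);
MODULE_LICENSE("GPL");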

-- 
dwmw2

¹ Or is that the problem — is mfn_valid() supposed to be true for MMIO
BARs, and this is actually an SR-IOV problem with resource assignment
failing to do that for VFs?



 

