Xen project Mailing List

Re: [PATCH v3 1/2] x86/mm: add API for marking only part of a MMIO page read only

From: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>

Date: Wed, 22 May 2024 12:36:51 +0200

Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx

Delivery-date: Wed, 22 May 2024 10:37:03 +0000

Feedback-id: i1568416f:Fastmail

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, May 22, 2024 at 09:52:44AM +0200, Jan Beulich wrote:
> On 21.05.2024 04:54, Marek Marczykowski-Górecki wrote:
> > --- a/xen/arch/x86/hvm/hvm.c
> > +++ b/xen/arch/x86/hvm/hvm.c
> > @@ -2009,6 +2009,14 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long gla,
> >          goto out_put_gfn;
> >      }
> >
> > +    if ( (p2mt == p2m_mmio_direct) && npfec.write_access && npfec.present &&
> > +         subpage_mmio_write_accept(mfn, gla) &&
>
> Afaics subpage_mmio_write_accept() is unreachable then when CONFIG_HVM=n?

Right, the PV path hits mmio_ro_emulated_write() without my changes already.
Do you suggest to make subpage_mmio_write_accept() under #ifdef CONFIG_HVM?

> > +         (hvm_emulate_one_mmio(mfn_x(mfn), gla) == X86EMUL_OKAY) )
> > +    {
> > +        rc = 1;
> > +        goto out_put_gfn;
> > +    }
>
> Overall this new if() is pretty similar to the immediate preceding one.
> So similar that I wonder whether the two shouldn't be folded.

I can do that if you prefer.

> In fact
> it looks as if the new one is needed only for the case where you'd pass
> through (to a DomU) a device partially used by Xen. That could certainly
> do with mentioning explicitly.

Well, the change in mmio_ro_emulated_write() is relevant to both dom0 and
domU. It simply wasn't reachable (in this case) for HVM domU before (but
was for PV already).

> > +static void __iomem *subpage_mmio_get_page(struct subpage_ro_range *entry)
>
> Considering what the function does and what it returns, perhaps better
> s/get/map/? The "get_page" part of the name generally has a different
> meaning in Xen's memory management.

Ok.

> > +{
> > +    void __iomem *mapped_page;
> > +
> > +    if ( entry->mapped )
> > +        return entry->mapped;
> > +
> > +    mapped_page = ioremap(mfn_x(entry->mfn) << PAGE_SHIFT, PAGE_SIZE);
> > +
> > +    spin_lock(&subpage_ro_lock);
> > +    /* Re-check under the lock */
> > +    if ( entry->mapped )
> > +    {
> > +        spin_unlock(&subpage_ro_lock);
> > +        iounmap(mapped_page);
>
> The only unmap is on an error path here and on another error path elsewhere.
> IOW it looks as if devices with such marked pages are meant to never be hot
> unplugged. I can see that being intentional for the XHCI console, but imo
> such a restriction also needs prominently calling out in a comment next to
> e.g. the function declaration.

The v1 included subpage_mmio_ro_remove() function (which would need to be
used in case of hot-unplug of such device, if desirable), but since this
series doesn't introduce any use of it (as you say, it isn't desirable
for XHCI console specifically), you asked me to remove it...

Should I add an explicit comment about the limitation, instead of having
it implicit by not having subpage_mmio_ro_remove() there?

> > +        return entry->mapped;
> > +    }
> > +
> > +    entry->mapped = mapped_page;
> > +    spin_unlock(&subpage_ro_lock);
> > +    return entry->mapped;
> > +}
> > +
> > +static void subpage_mmio_write_emulate(
> > +    mfn_t mfn,
> > +    unsigned int offset,
> > +    const void *data,
> > +    unsigned int len)
> > +{
> > +    struct subpage_ro_range *entry;
> > +    void __iomem *addr;
>
> Wouldn't this better be pointer-to-volatile, with ...
>
> > +    list_for_each_entry(entry, &subpage_ro_ranges, list)
> > +    {
> > +        if ( mfn_eq(entry->mfn, mfn) )
> > +        {
> > +            if ( test_bit(offset / SUBPAGE_MMIO_RO_ALIGN, entry->ro_qwords) )
> > +            {
> > + write_ignored:
> > +                gprintk(XENLOG_WARNING,
> > +                        "ignoring write to R/O MMIO 0x%"PRI_mfn"%03x len %u\n",
> > +                        mfn_x(mfn), offset, len);
> > +                return;
> > +            }
> > +
> > +            addr = subpage_mmio_get_page(entry);
> > +            if ( !addr )
> > +            {
> > +                gprintk(XENLOG_ERR,
> > +                        "Failed to map page for MMIO write at 0x%"PRI_mfn"%03x\n",
> > +                        mfn_x(mfn), offset);
> > +                return;
> > +            }
> > +
> > +            switch ( len )
> > +            {
> > +            case 1:
> > +                writeb(*(const uint8_t*)data, addr);
> > +                break;
> > +            case 2:
> > +                writew(*(const uint16_t*)data, addr);
> > +                break;
> > +            case 4:
> > +                writel(*(const uint32_t*)data, addr);
> > +                break;
> > +            case 8:
> > +                writeq(*(const uint64_t*)data, addr);
> > +                break;
>
> ... this being how it's written? (If so, volatile suitably carried through to
> other places as well.)
>
> > +            default:
> > +                /* mmio_ro_emulated_write() already validated the size */
> > +                ASSERT_UNREACHABLE();
> > +                goto write_ignored;
> > +            }
> > +            return;
> > +        }
> > +    }
> > +    /* Do not print message for pages without any writable parts. */
> > +}
> > +
> > +bool subpage_mmio_write_accept(mfn_t mfn, unsigned long gla)
> > +{
> > +    unsigned int offset = PAGE_OFFSET(gla);
> > +    const struct subpage_ro_range *entry;
> > +
> > +    list_for_each_entry_rcu(entry, &subpage_ro_ranges, list)
>
> Considering the other remark about respective devices impossible to go
> away, is the RCU form here really needed? Its use gives the (false)
> impression of entry removal being possible.

Right, I forgot to change this one.

> > +        if ( mfn_eq(entry->mfn, mfn) &&
> > +             !test_bit(offset / SUBPAGE_MMIO_RO_ALIGN, entry->ro_qwords) )
>
> Btw, "qwords" in the field name is kind of odd when SUBPAGE_MMIO_RO_ALIGN
> in principle suggests that changing granularity ought to be possible by
> simply adjusting that #define. Maybe "->ro_elems"?

Makes sense.

> > --- a/xen/arch/x86/pv/ro-page-fault.c
> > +++ b/xen/arch/x86/pv/ro-page-fault.c
> > @@ -330,6 +330,7 @@ static int mmio_ro_do_page_fault(struct x86_emulate_ctxt *ctxt,
> >          return X86EMUL_UNHANDLEABLE;
> >      }
> >
> > +    mmio_ro_ctxt.mfn = mfn;
> >      ctxt->data = &mmio_ro_ctxt;
> >      if ( pci_ro_mmcfg_decode(mfn_x(mfn), &mmio_ro_ctxt.seg, &mmio_ro_ctxt.bdf) )
> >          return x86_emulate(ctxt, &mmcfg_intercept_ops);
>
> Wouldn't you better set .mfn only on the "else" path, just out of context?
> Suggesting that the new field in the struct could actually overlay the
> (seg,bdf) tuple (being of relevance only to MMCFG intercept handling).
> This would be more for documentation purposes than to actually save space.
> (If so, perhaps the "else" itself would also better be dropped while making
> the adjustment.)

I can do that if you prefer. But personally, I find such use of a
union risky (without some means for a compiler to actually enforce its
proper use) - while for correct code it may save some space, it makes
the impact of a type confusion bug potentially worse - now that the
unexpected value would be potentially attacker controlled.
For a documentation purpose I can simply add a comment.

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
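
The union concern raised in the last reply can be illustrated with a small user-space sketch. This is not Xen's actual context structure: the struct names, the field types, and both helper functions below are hypothetical and exist only for this example. It contrasts a layout with separate fields against one where `mfn` overlays the `(seg, bdf)` tuple, and shows what a reader of the wrong member would see in each case.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout A: separate fields (roughly what the patch does). */
struct ctxt_separate {
    unsigned long cr2;
    uint32_t seg, bdf;   /* meaningful only on the MMCFG intercept path */
    uint64_t mfn;        /* meaningful only on the sub-page R/O path */
};

/* Hypothetical layout B: mfn overlays the (seg, bdf) tuple. */
struct ctxt_overlay {
    unsigned long cr2;
    union {
        struct { uint32_t seg, bdf; } mmcfg;
        uint64_t mfn;
    } u;
};

/*
 * Simulate a type confusion bug: the mfn path filled in the context, but
 * the MMCFG path consumes it. With the overlay, seg/bdf are the bytes of
 * the mfn value.
 */
static void overlay_confused_read(uint64_t mfn, uint32_t *seg, uint32_t *bdf)
{
    struct ctxt_overlay c = { .cr2 = 0 };

    assert(seg && bdf);
    c.u.mfn = mfn;
    *seg = c.u.mmcfg.seg;   /* reads storage last written as mfn */
    *bdf = c.u.mmcfg.bdf;
}

/* Same bug with separate fields: seg/bdf keep their zero-initialised value. */
static void separate_confused_read(uint64_t mfn, uint32_t *seg, uint32_t *bdf)
{
    struct ctxt_separate c = { .cr2 = 0 };

    assert(seg && bdf);
    c.mfn = mfn;
    *seg = c.seg;
    *bdf = c.bdf;
}
```

With the overlay, the two halves of the stored `mfn` come back as `seg` and `bdf` (which half lands where depends on endianness); with separate fields they stay zero. Nothing in the language stops either read, which is the "no compiler enforcement" point made above.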

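The sub-page check quoted in the review, `test_bit(offset / SUBPAGE_MMIO_RO_ALIGN, entry->ro_qwords)`, can be sketched in user-space C as follows. This is a minimal illustration and not the Xen implementation: it assumes 4 KiB pages and an 8-byte granule, and `GRANULE`, `struct ro_page`, `mark_ro()` and `write_accepted()` are hypothetical names. One bit per granule records whether that part of the page is read-only.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES 4096u
#define GRANULE         8u   /* assumed stand-in for SUBPAGE_MMIO_RO_ALIGN */
#define NR_ELEMS        (PAGE_SIZE_BYTES / GRANULE)
#define BITS_PER_WORD   (sizeof(unsigned long) * CHAR_BIT)
#define NR_WORDS        ((NR_ELEMS + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct ro_page {
    /* One bit per granule; a set bit means "read-only". */
    unsigned long ro_elems[NR_WORDS];
};

static bool elem_is_ro(const struct ro_page *p, unsigned int elem)
{
    return (p->ro_elems[elem / BITS_PER_WORD] >> (elem % BITS_PER_WORD)) & 1ul;
}

/* Mark [start, start + size) read-only; both must be granule-aligned. */
static void mark_ro(struct ro_page *p, unsigned int start, unsigned int size)
{
    assert(start % GRANULE == 0 && size % GRANULE == 0);
    assert(start <= PAGE_SIZE_BYTES && size <= PAGE_SIZE_BYTES - start);

    for ( unsigned int e = start / GRANULE; e < (start + size) / GRANULE; e++ )
        p->ro_elems[e / BITS_PER_WORD] |= 1ul << (e % BITS_PER_WORD);
}

/* A write is accepted only if the granule it starts in is not read-only. */
static bool write_accepted(const struct ro_page *p, unsigned int offset)
{
    return offset < PAGE_SIZE_BYTES && !elem_is_ro(p, offset / GRANULE);
}
```

Because the bitmap is sized from `GRANULE`, changing that one define changes the protection granularity, which is why a neutral field name such as `ro_elems` fits better than `ro_qwords`.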
Attachment: signature.asc
Description: PGP signature
