Re: [PATCH] [RFC] vpci: allow BAR write while mapped
On 3/13/25 11:14, Alejandro Vallejo wrote:
> On Wed Mar 12, 2025 at 7:50 PM GMT, Stewart Hildebrand wrote:
>> Xen vPCI refuses BAR writes if the BAR is mapped in p2m. If firmware
>> initialized the BAR to a bad address, Linux will try to write a new
>> address to the BAR without disabling memory decoding. Allow the write
>> by updating p2m right away in the vPCI BAR write handler.
>>
>> Resolves: https://gitlab.com/xen-project/xen/-/issues/197
>> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@xxxxxxx>
>> ---
>> RFC: Currently the deferred mapping machinery supports only map or
>> unmap, not both. It might be better to rework the mapping machinery
>> to support unmap-then-map operations, but please let me know your
>> thoughts.
>> RFC: This patch has not yet made an attempt to distinguish between
>> 32-bit and 64-bit writes. It probably should.
>> ---
>>  xen/drivers/vpci/header.c | 65 +++++++++++++++++++++++++++++++--------
>>  1 file changed, 53 insertions(+), 12 deletions(-)
>>
>> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
>> index ef6c965c081c..66adb2183cfe 100644
>> --- a/xen/drivers/vpci/header.c
>> +++ b/xen/drivers/vpci/header.c
>> @@ -173,7 +173,7 @@ static void modify_decoding(const struct pci_dev *pdev, uint16_t cmd,
>>          ASSERT_UNREACHABLE();
>>  }
>>
>> -bool vpci_process_pending(struct vcpu *v)
>> +static bool process_pending(struct vcpu *v, bool need_lock)
>>  {
>>      struct pci_dev *pdev = v->vpci.pdev;
>>      struct vpci_header *header = NULL;
>> @@ -182,12 +182,14 @@ bool vpci_process_pending(struct vcpu *v)
>>      if ( !pdev )
>>          return false;
>>
>> -    read_lock(&v->domain->pci_lock);
>> +    if ( need_lock )
>> +        read_lock(&v->domain->pci_lock);
>>
>>      if ( !pdev->vpci || (v->domain != pdev->domain) )
>>      {
>>          v->vpci.pdev = NULL;
>> -        read_unlock(&v->domain->pci_lock);
>> +        if ( need_lock )
>> +            read_unlock(&v->domain->pci_lock);
>>          return false;
>>      }
>>
>> @@ -209,17 +211,20 @@ bool vpci_process_pending(struct vcpu *v)
>>
>>      if ( rc == -ERESTART )
>>      {
>> -        read_unlock(&v->domain->pci_lock);
>> +        if ( need_lock )
>> +            read_unlock(&v->domain->pci_lock);
>>          return true;
>>      }
>>
>>      if ( rc )
>>      {
>> -        spin_lock(&pdev->vpci->lock);
>> +        if ( need_lock )
>> +            spin_lock(&pdev->vpci->lock);
>>          /* Disable memory decoding unconditionally on failure. */
>>          modify_decoding(pdev, v->vpci.cmd & ~PCI_COMMAND_MEMORY, false);
>> -        spin_unlock(&pdev->vpci->lock);
>> +        if ( need_lock )
>> +            spin_unlock(&pdev->vpci->lock);
>>
>>          /* Clean all the rangesets */
>>          for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
>> @@ -228,7 +233,8 @@ bool vpci_process_pending(struct vcpu *v)
>>
>>          v->vpci.pdev = NULL;
>>
>> -        read_unlock(&v->domain->pci_lock);
>> +        if ( need_lock )
>> +            read_unlock(&v->domain->pci_lock);
>>
>>          if ( !is_hardware_domain(v->domain) )
>>              domain_crash(v->domain);
>> @@ -238,15 +244,23 @@
>>      }
>>      v->vpci.pdev = NULL;
>>
>> -    spin_lock(&pdev->vpci->lock);
>> +    if ( need_lock )
>> +        spin_lock(&pdev->vpci->lock);
>>      modify_decoding(pdev, v->vpci.cmd, v->vpci.rom_only);
>> -    spin_unlock(&pdev->vpci->lock);
>> +    if ( need_lock )
>> +        spin_unlock(&pdev->vpci->lock);
>>
>> -    read_unlock(&v->domain->pci_lock);
>> +    if ( need_lock )
>> +        read_unlock(&v->domain->pci_lock);
>>
>>      return false;
>>  }
>>
>> +bool vpci_process_pending(struct vcpu *v)
>> +{
>> +    return process_pending(v, true);
>> +}
>> +
>>  static int __init apply_map(struct domain *d, const struct pci_dev *pdev,
>>                              uint16_t cmd)
>>  {
>> @@ -565,6 +579,8 @@ static void cf_check bar_write(
>>  {
>>      struct vpci_bar *bar = data;
>>      bool hi = false;
>> +    bool reenable = false;
>> +    uint32_t cmd = 0;
>>
>>      ASSERT(is_hardware_domain(pdev->domain));
>>
>> @@ -585,10 +601,31 @@ static void cf_check bar_write(
>>      {
>>          /* If the value written is the current one avoid printing a warning. */
>>          if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) )
>> +        {
>>              gprintk(XENLOG_WARNING,
>> -                    "%pp: ignored BAR %zu write while mapped\n",
>> +                    "%pp: allowing BAR %zu write while mapped\n",
>>                      &pdev->sbdf, bar - pdev->vpci->header.bars + hi);
>> -        return;
>> +            ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
>> +            ASSERT(spin_is_locked(&pdev->vpci->lock));
>> +            reenable = true;
>> +            cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
>> +            /*
>> +             * Write-while-mapped: unmap the old BAR in p2m. We want this to
>> +             * finish right away since the deferral machinery only supports
>> +             * unmap OR map, not unmap-then-remap. Ultimately, it probably
>> +             * would be better to defer the write-while-mapped case just like
>> +             * regular BAR writes (but still only allow it for 32-bit BAR
>> +             * writes).
>> +             */
>> +            /* Disable memory decoding */
>> +            modify_bars(pdev, cmd & ~PCI_COMMAND_MEMORY, false);
>> +            /* Call process pending here to ensure P2M operations are done */
>> +            while ( process_pending(current, false) )
>> +            {
>> +                /* Pre-empted, try again */
>
> This seems a tad dangerous. There may be a non-negligible amount of work
> queued up. I also wonder whether the guest can induce spinning by increasing
> contention on the p2m (e.g. via ballooning) or by inducing work to be queued
> up.
>
> I don't quite understand the logic, but I suspect you could
> raise_softirq(SCHEDULE_SOFTIRQ), decrease the IP so the instruction is
> replayed, release the locks, and simply exit the hypervisor. The system ought
> to naturally split the operation into several continuations, each of which
> does either unmapping or mapping, if it couldn't be done in a single one.
> Replaying the instruction after decoding is disabled ought to be benign.
>
> I haven't tried any of what I just wrote, so take it with several tons of
> salt though.

The idea was that the unmap-then-map operation would appear atomic from the
guest's point of view. I've only queued up the unmap operation at this point
in the new logic. Due to the mentioned limitation in the BAR mapping deferral
machinery, I wanted to make sure *this BAR* was unmapped before queuing up the
map operation (see below). Waiting for *all* pending operations to finish here
is likely not appropriate. I think this just reinforces the need to rework the
BAR mapping machinery.

> Do you know if Linux intentionally skips disabling decode? Or is it a bug?

I think it's intentional. See
https://gitlab.com/xen-project/xen/-/issues/197

>> +            }
>> +        }
>> +        else
>> +            return;
>>      }
>>
>>
>> @@ -610,6 +647,10 @@ static void cf_check bar_write(
>>      }
>>
>>      pci_conf_write32(pdev->sbdf, reg, val);
>> +
>> +    if ( reenable )
>> +        /* Write-while-mapped: map the new BAR in p2m. OK to defer. */
>> +        modify_bars(pdev, cmd, false);

This call to modify_bars() will raise a softirq for the map operation.

>>  }
>>
>>  static void cf_check guest_mem_bar_write(const struct pci_dev *pdev,
>>
>> base-commit: 8e60d47cf0112c145b6b0e454d102b04c857db8c
>
> Cheers,
> Alejandro