Xen project Mailing List

Re: [Xen-devel] [PATCH v3 5/9] xen/vpci: add handlers to map the BARs

On Fri, May 19, 2017 at 09:21:56AM -0600, Jan Beulich wrote: > >>> On 27.04.17 at 16:35, <roger.pau@xxxxxxxxxx> wrote: > > +static int vpci_modify_bars(struct pci_dev *pdev, const bool map) > > +{ > > + struct vpci_header *header = &pdev->vpci->header; > > + unsigned int i; > > + int rc = 0; > > + > > + for ( i = 0; i < ARRAY_SIZE(header->bars); i++ ) > > + { > > + paddr_t gaddr = map ? header->bars[i].gaddr > > + : header->bars[i].mapped_addr; > > + paddr_t paddr = header->bars[i].paddr; > > + > > + if ( header->bars[i].type != VPCI_BAR_MEM && > > + header->bars[i].type != VPCI_BAR_MEM64_LO ) > > + continue; > > + > > + rc = modify_mmio(pdev->domain, _gfn(PFN_DOWN(gaddr)), > > + _mfn(PFN_DOWN(paddr)), > > PFN_UP(header->bars[i].size), > > The PFN_UP() indicates a problem: For sub-page BARs you can't > blindly map/unmap them without taking into consideration other > devices sharing the same page. I'm not sure I follow, the start address of BARs is always aligned to a 4KB boundary, so there's no chance of the same page being used by two different BARs at the same time. The size is indeed not aligned to 4KB, but I don't see how this can cause collisions with other BARs unless the domain is actively trying to make the BARs overlap, in which case there's not much Xen can do. > > + map); > > + if ( rc ) > > + break; > > + > > + header->bars[i].mapped_addr = map ? gaddr : 0; > > + } > > + > > + return rc; > > +} > > Shouldn't this function somewhere honor the unset flags? Right, I've added a check to make sure the BAR is positioned before trying to map it into the domain p2m. > > +static int vpci_cmd_read(struct pci_dev *pdev, unsigned int reg, > > + union vpci_val *val, void *data) > > +{ > > + struct vpci_header *header = data; > > + > > + val->word = header->command; > > Rather than reading back and storing the value in the write handler, > I'd recommending doing an actual read here. OK. > > +static int vpci_cmd_write(struct pci_dev *pdev, unsigned int reg, > > + union vpci_val val, void *data) > > +{ > > + struct vpci_header *header = data; > > + uint16_t new_cmd, saved_cmd; > > + uint8_t seg = pdev->seg, bus = pdev->bus; > > + uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn); > > + int rc; > > + > > + new_cmd = val.word; > > + saved_cmd = header->command; > > + > > + if ( !((new_cmd ^ saved_cmd) & PCI_COMMAND_MEMORY) ) > > + goto out; > > + > > + /* Memory space access change. */ > > + rc = vpci_modify_bars(pdev, new_cmd & PCI_COMMAND_MEMORY); > > + if ( rc ) > > + { > > + dprintk(XENLOG_ERR, > > + "%04x:%02x:%02x.%u:unable to %smap BARs: %d\n", > > + seg, bus, slot, func, > > + new_cmd & PCI_COMMAND_MEMORY ? "" : "un", rc); > > + return rc; > > I guess you can guess the question already: What is the bare > hardware equivalent of this failure return? Yes, this is already fixed since write handlers simply return void. The hw equivalent would be to ignore the write AFAICT (ie: memory decoding will not be enabled). Are you fine with the dprintk or would you also like me to remove that? (IMHO it's helpful for debugging). > > + } > > + > > + out: > > Please try to avoid goto-s and labels for other than error handling > (and even then only when code would otherwise end up pretty > convoluted). Done. > > +static int vpci_bar_read(struct pci_dev *pdev, unsigned int reg, > > + union vpci_val *val, void *data) > > +{ > > + struct vpci_bar *bar = data; > > const > > > + bool hi = false; > > + > > + ASSERT(bar->type == VPCI_BAR_MEM || bar->type == VPCI_BAR_MEM64_LO || > > + bar->type == VPCI_BAR_MEM64_HI); > > + > > + if ( bar->type == VPCI_BAR_MEM64_HI ) > > + { > > + ASSERT(reg - PCI_BASE_ADDRESS_0 > 0); > > reg > PCI_BASE_ADDRESS_0 Fixed. > > + bar--; > > + hi = true; > > + } > > + > > + if ( bar->sizing ) > > + val->double_word = ~(bar->size - 1) >> (hi ? 32 : 0); > > There's also a comment further down - this is producing undefined > behavior on 32-bits arches. I've changed size to be a uint64_t. > > +static int vpci_bar_write(struct pci_dev *pdev, unsigned int reg, > > + union vpci_val val, void *data) > > +{ > > + struct vpci_bar *bar = data; > > + uint32_t wdata = val.double_word; > > + bool hi = false, unset = false; > > + > > + ASSERT(bar->type == VPCI_BAR_MEM || bar->type == VPCI_BAR_MEM64_LO || > > + bar->type == VPCI_BAR_MEM64_HI); > > + > > + if ( wdata == GENMASK(31, 0) ) > > I'm afraid this again doesn't match real hardware behavior: As the > low bits are r/o, writes with them having any value, but all other > bits being 1 should have the same effect. I notice that while I had > fixed this for the ROM BAR in Linux'es pciback, I should have also > fixed this for ordinary ones. I've changed this to: switch ( bar->type ) { case VPCI_BAR_MEM: size_mask = GENMASK(31, 12); break; case VPCI_BAR_MEM64_LO: size_mask = GENMASK(31, 26); break; case VPCI_BAR_MEM64_HI: size_mask = GENMASK(31, 0); break; default: ASSERT_UNREACHABLE(); break; } if ( (wdata & size_mask) == size_mask ) { ... And removed the ASSERT just above (since it's now handled in the switch itself). > > + { > > + /* Next reads from this register are going to return the BAR size. > > */ > > + bar->sizing = true; > > + return 0; > > + } > > + > > + /* End previous sizing cycle if any. */ > > + bar->sizing = false; > > + > > + unset = bar->unset; > > + if ( unset ) > > + bar->unset = false; > > + > > + if ( bar->type == VPCI_BAR_MEM64_HI ) > > + { > > + ASSERT(reg - PCI_BASE_ADDRESS_0 > 0); > > + bar--; > > + hi = true; > > + } > > + > > + /* Update the relevant part of the BAR address. */ > > + bar->gaddr &= hi ? ~GENMASK(63, 32) : ~GENMASK(31, 0); > > + wdata &= hi ? GENMASK(31, 0) : PCI_BASE_ADDRESS_MEM_MASK; > > Perhaps easier to grok as > > if ( hi ) > wdata &= PCI_BASE_ADDRESS_MEM_MASK; I've done that (with the condition reversed). > However, considering the dual use below, I'd prefer if you wrote > back the value you read to the low 4 bits. They're _supposed_ to > be r/o, yes, but anyway. Done. > > > + bar->gaddr |= (uint64_t)wdata << (hi ? 32 : 0); > > + > > + if ( unset ) > > + { > > + bar->paddr = bar->gaddr; > > So this deals with first time setting of the BAR by Dom0. If Dom0 > later decides to move BARs around, how do you guarantee things > to continue to work fine if you allow paddr and gaddr to go out of > sync? Often the reason to do re-assignments is because the OS > recognized address conflicts. Or it needs to make room for SR-IOV > BARs. I've removed the unset check, so that every BAR position change done by Dom0 is also applied to the hardware, instead of just changing Dom0's p2m. > > + pci_conf_write16(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn), > > + PCI_FUNC(pdev->devfn), reg, wdata); > > pci_conf_write32() Ups, thanks. > > > + } > > + > > + ASSERT(IS_ALIGNED(bar->gaddr, PAGE_SIZE)); > > Urgh. Removed. > > +static int vpci_init_bars(struct pci_dev *pdev) > > +{ > > + uint8_t seg = pdev->seg, bus = pdev->bus; > > + uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn); > > + uint8_t header_type; > > + unsigned int i, num_bars; > > + struct vpci_header *header = &pdev->vpci->header; > > + struct vpci_bar *bars = header->bars; > > + int rc; > > + > > + header_type = pci_conf_read8(seg, bus, slot, func, PCI_HEADER_TYPE) & > > 0x7f; > > + if ( header_type == PCI_HEADER_TYPE_NORMAL ) > > + num_bars = 6; > > + else if ( header_type == PCI_HEADER_TYPE_BRIDGE ) > > + num_bars = 2; > > + else > > + return -ENOSYS; > > -EOPNOTSUPP > > > + /* Setup a handler for the control register. */ > > + header->command = pci_conf_read16(seg, bus, slot, func, PCI_COMMAND); > > As the code says, the register is the Command Register, so your > comment shouldn't say "control". My mistake. > > + rc = xen_vpci_add_register(pdev, vpci_cmd_read, vpci_cmd_write, > > + PCI_COMMAND, 2, header); > > + if ( rc ) > > + { > > + dprintk(XENLOG_ERR, > > + "%04x:%02x:%02x.%u: failed to add handler register %#x: > > %d\n", > > + seg, bus, slot, func, PCI_COMMAND, rc); > > + return rc; > > + } > > + > > + for ( i = 0; i < num_bars; i++ ) > > + { > > + uint8_t reg = PCI_BASE_ADDRESS_0 + i * 4; > > + uint32_t val = pci_conf_read32(seg, bus, slot, func, reg); > > + uint64_t addr, size; > > + unsigned int index; > > + > > + if ( i && bars[i - 1].type == VPCI_BAR_MEM64_LO ) > > + { > > + bars[i].type = VPCI_BAR_MEM64_HI; > > + bars[i].unset = bars[i - 1].unset; > > + continue; > > Neither here nor below you install a handler for this upper half. Ugh, good catch. > > + } > > + else if ( (val & PCI_BASE_ADDRESS_SPACE) == > > PCI_BASE_ADDRESS_SPACE_IO ) > > + { > > + bars[i].type = VPCI_BAR_IO; > > + continue; > > + } > > + else if ( (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == > > Pointless "else" (twice). Removed. > > + PCI_BASE_ADDRESS_MEM_TYPE_64 ) > > + bars[i].type = VPCI_BAR_MEM64_LO; > > + else > > + bars[i].type = VPCI_BAR_MEM; > > + > > + /* Size the BAR and map it. */ > > + index = i; > > + rc = pci_size_bar(seg, bus, slot, func, PCI_BASE_ADDRESS_0, > > num_bars, > > + &index, &addr, &size); > > + if ( rc ) > > + { > > + dprintk(XENLOG_ERR, > > + "%04x:%02x:%02x.%u: unable to size BAR#%u: %d\n", > > + seg, bus, slot, func, i, rc); > > + return rc; > > + } > > + > > + if ( size == 0 ) > > + { > > + bars[i].type = VPCI_BAR_EMPTY; > > + continue; > > + } > > + > > + if ( (bars[i].type == VPCI_BAR_MEM && addr == GENMASK(31, 12)) || > > + addr == GENMASK(63, 26) ) > > Where is this 26 coming from? > > Perhaps > > if ( addr == GENMASK(bars[i].type == VPCI_BAR_MEM ? 31 : 63, 12) ) I'm checking the memory decode bit here instead in order to figure out if the BAR is not positioned. > ? Albeit I'm unconvinced GENMASK() is useful to be used here anyway > (see also below). Right, regardless of the specific usage above, what would you recommend regarding the usage of GENMASK? Julien suggested introducing GENMASK_ULL. Should I go that route, or introduce something locally for vPCI? > > + { > > + /* BAR is not positioned. */ > > I can't find anything in the standard saying that all-ones upper > address bits indicate an unassigned BAR. As long as the memory > decode bit is off, all BARs are to be considered unassigned afaik. > Furthermore you can't possibly read e.g. 0xfffff000 from a > 32-bit BAR covering more than 4k. OK, so I've now changed this to mark the BAR as unset if the memory decode bit in the command register is not set. > > + bars[i].unset = true; > > + ASSERT(is_hardware_domain(pdev->domain)); > > + ASSERT(!(header->command & PCI_COMMAND_MEMORY)); > > You're asserting guest controlled state here (even if it's Dom0). > > > + } > > + > > + ASSERT(IS_ALIGNED(addr, PAGE_SIZE)); > > Urgh (again). Removed both of the above. > > --- a/xen/include/xen/vpci.h > > +++ b/xen/include/xen/vpci.h > > @@ -50,6 +50,34 @@ int xen_vpci_write(unsigned int seg, unsigned int bus, > > unsigned int devfn, > > struct vpci { > > /* Root pointer for the tree of vPCI handlers. */ > > struct rb_root handlers; > > + > > +#ifdef __XEN__ > > + /* Hide the rest of the vpci struct from the user-space test harness. > > */ > > + struct vpci_header { > > + /* Cached value of the command register. */ > > + uint16_t command; > > + /* Information about the PCI BARs of this device. */ > > + struct vpci_bar { > > + enum { > > + VPCI_BAR_EMPTY, > > + VPCI_BAR_IO, > > + VPCI_BAR_MEM, > > MEM32? Changed. > > + VPCI_BAR_MEM64_LO, > > + VPCI_BAR_MEM64_HI, > > + } type; > > + /* Hardware address. */ > > + paddr_t paddr; > > + /* Guest address where the BAR should be mapped. */ > > + paddr_t gaddr; > > + /* Current guest address where the BAR is mapped. */ > > + paddr_t mapped_addr; > > Why do you need to track both "should be" and "is" addresses? Also > I think all three would more naturally be frame numbers. I think I can use a single field to store the address. > > + size_t size; > > Is this enough for e.g. ARM32 (remember this is a common > header)? No, I've changed it to uint64_t. > > + unsigned int attributes:4; > > ??? Changed this to "bool prefetchable" instead. > > + bool sizing; > > + bool unset; > > Isn't this redundant with e.g. gaddr (or as per above gfn) being > INVALID_PADDR (INVALID_GFN)? Yes, now removed. > > + } bars[6]; > > What about the ROM and SR-IOV ones? I've implemented support for the expansion ROM BAR (which I still need to figure out how to test), but I would like to defer SR-IOV for later because it involves a non-trivial amount of work, and with this series one can already boot a PVH Dom0 (minus SR-IOV of course). Thanks, Roger. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.