[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 6/9] xen/vpci: add handlers to map the BARs



On Fri, Jul 14, 2017 at 09:11:29AM -0600, Jan Beulich wrote:
> >>> On 30.06.17 at 17:01, <roger.pau@xxxxxxxxxx> wrote:
> > Introduce a set of handlers that trap accesses to the PCI BARs and the 
> > command
> > register, in order to emulate BAR sizing and BAR relocation.
> 
> I don't think "emulate" is the right term here - you really don't mean to
> change anything, you only want to snoop Dom0 writes.

Right, changed emulate to snoop.

> > --- /dev/null
> > +++ b/xen/drivers/vpci/header.c
> > @@ -0,0 +1,473 @@
> > +/*
> > + * Generic functionality for handling accesses to the PCI header from the
> > + * configuration space.
> > + *
> > + * Copyright (C) 2017 Citrix Systems R&D
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms and conditions of the GNU General Public
> > + * License, version 2, as published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public
> > + * License along with this program; If not, see 
> > <http://www.gnu.org/licenses/>.
> > + */
> > +
> > +#include <xen/sched.h>
> > +#include <xen/vpci.h>
> > +#include <xen/p2m-common.h>
> > +
> > +#define MAPPABLE_BAR(x)                                                 \
> > +    (((x)->type == VPCI_BAR_MEM32 || (x)->type == VPCI_BAR_MEM64_LO ||  \
> > +     ((x)->type == VPCI_BAR_ROM && (x)->enabled)) &&                    \
> > +     (x)->addr != INVALID_PADDR)
> > +
> > +static struct rangeset *vpci_get_bar_memory(const struct domain *d,
> > +                                            const struct vpci_bar *map)
> > +{
> > +    const struct pci_dev *pdev;
> > +    struct rangeset *mem = rangeset_new(NULL, NULL, 0);
> > +    int rc;
> > +
> > +    if ( !mem )
> > +        return ERR_PTR(-ENOMEM);
> > +
> > +    /*
> > +     * Create a rangeset that represents the current BAR memory region
> > +     * and compare it against all the currently active BAR memory regions.
> > +     * If an overlap is found, subtract it from the region to be
> > +     * mapped/unmapped.
> > +     *
> > +     * NB: the rangeset uses frames, and if start and end addresses are
> > +     * equal it means only one frame is used, that's why PFN_DOWN is used
> > +     * to calculate the end of the rangeset.
> > +     */
> 
> That explanation doesn't seem to fit: Did you perhaps mean to
> point out that rangeset ranges are inclusive ones?

Yes, that's probably better.

> > +    rc = rangeset_add_range(mem, PFN_DOWN(map->addr),
> > +                            PFN_DOWN(map->addr + map->size));
> 
> Don't you need to subtract 1 here (and elsewhere below)?

Indeed.

> > +    if ( rc )
> > +    {
> > +        rangeset_destroy(mem);
> > +        return ERR_PTR(rc);
> > +    }
> > +
> > +    list_for_each_entry(pdev, &d->arch.pdev_list, domain_list)
> > +    {
> > +        uint16_t cmd = pci_conf_read16(pdev->seg, pdev->bus,
> > +                                       PCI_SLOT(pdev->devfn),
> > +                                       PCI_FUNC(pdev->devfn),
> > +                                       PCI_COMMAND);
> 
> This is quite a lot of overhead - a loop over all devices plus a config
> space read on each one. What state the memory decode bit is in
> could be recorded in the ->enabled flag, couldn't it? And devices on
> different sub-branches of the topology can't possibly have
> overlapping entries that we need to worry about, as the bridge
> windows would suppress actual accesses.

Oh, so Xen only needs to care about devices that share the same
bridge, because that is the only case where the same page can be
shared by multiple devices?

In any case, the Dom0 is free to wrongly position the BARs anywhere it
wants, thus possibly placing them outside of the bridge windows, in
with case I think we should better check all assigned devices.

> > +        unsigned int i;
> > +
> > +        /* Check if memory decoding is enabled. */
> > +        if ( !(cmd & PCI_COMMAND_MEMORY) )
> > +            continue;
> > +
> > +        for ( i = 0; i < ARRAY_SIZE(pdev->vpci->header.bars); i++ )
> > +        {
> > +            const struct vpci_bar *bar = &pdev->vpci->header.bars[i];
> > +
> > +            if ( bar == map || !MAPPABLE_BAR(bar) ||
> > +                 !rangeset_overlaps_range(mem, PFN_DOWN(bar->addr),
> > +                                          PFN_DOWN(bar->addr + bar->size)) 
> > )
> > +                continue;
> > +
> > +            rc = rangeset_remove_range(mem, PFN_DOWN(bar->addr),
> > +                                       PFN_DOWN(bar->addr + bar->size));
> 
> I'm struggling to convince myself of the correctness of this approach
> (including other code further down which is also involved). I think you
> should have taken the time to add a few words on the approach
> chosen to the description.

Will do.

> For example, it doesn't look like things will
> go right if the device being dealt with has two BARs both using part
> of the same page.

Right, because the BAR won't reflect it's actual state (due to the
memory decoding being global per-device). AFAICT this will be solved
by your suggestion above of using ->enabled and keeping it updated for
BARs also.

> > +static int vpci_modify_bar(struct domain *d, const struct vpci_bar *bar,
> > +                           const bool map)
> > +{
> > +    struct rangeset *mem;
> > +    struct map_data data = { .d = d, .map = map };
> > +    int rc;
> > +
> > +    ASSERT(MAPPABLE_BAR(bar));
> > +
> > +    mem = vpci_get_bar_memory(d, bar);
> > +    if ( IS_ERR(mem) )
> > +        return -PTR_ERR(mem);
> 
> The negation looks wrong to me.

OK, this is already returning -<ERROR>, so the negation is not needed.

> > +static void vpci_cmd_write(struct pci_dev *pdev, unsigned int reg,
> > +                           union vpci_val val, void *data)
> > +{
> > +    uint16_t cmd = val.u16, current_cmd;
> > +    uint8_t seg = pdev->seg, bus = pdev->bus;
> > +    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> > +    int rc;
> > +
> > +    current_cmd = pci_conf_read16(seg, bus, slot, func, reg);
> > +
> > +    if ( !((cmd ^ current_cmd) & PCI_COMMAND_MEMORY) )
> > +    {
> > +        /*
> > +         * Let the guest play with all the bits directly except for the
> > +         * memory decoding one.
> > +         */
> > +        pci_conf_write16(seg, bus, slot, func, reg, cmd);
> > +        return;
> 
> Please invert the condition and have both cases use the same write
> at the end of the function.

Done.

> > +static void vpci_bar_write(struct pci_dev *pdev, unsigned int reg,
> > +                           union vpci_val val, void *data)
> > +{
> > +    struct vpci_bar *bar = data;
> > +    uint8_t seg = pdev->seg, bus = pdev->bus;
> > +    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> > +    uint32_t wdata = val.u32, size_mask;
> > +    bool hi = false;
> > +
> > +    switch ( bar->type )
> > +    {
> > +    case VPCI_BAR_MEM32:
> > +    case VPCI_BAR_MEM64_LO:
> > +        size_mask = (uint32_t)PCI_BASE_ADDRESS_MEM_MASK;
> > +        break;
> > +    case VPCI_BAR_MEM64_HI:
> > +        size_mask = ~0u;
> > +        break;
> > +    default:
> > +        ASSERT_UNREACHABLE();
> > +        return;
> > +    }
> > +
> > +    if ( (wdata & size_mask) == size_mask )
> > +    {
> > +        /* Next reads from this register are going to return the BAR size. 
> > */
> > +        bar->sizing = true;
> > +        return;
> 
> I think the comment needs extending to explain why the written
> sizing value can't possibly be an address. This is particularly
> relevant because I'm not sure that assumption would hold on e.g.
> ARM (which I don't think has guaranteed ROM right below 4Gb).

Hm, right. Maybe it would be best to detect sizing by checking that
the address when performing a read is ~0 on the high bits and ~0 &
PCI_BASE_ADDRESS_MEM_MASK on the lower ones, instead of doing this
kind of partial guessing as done here, it's certainly not very robust.

> > +    }
> > +
> > +    /* End previous sizing cycle if any. */
> > +    bar->sizing = false;
> > +
> > +    /*
> > +     * Ignore attempts to change the position of the BAR if memory 
> > decoding is
> > +     * active.
> > +     */
> > +    if ( pci_conf_read16(seg, bus, slot, func, PCI_COMMAND) &
> > +         PCI_COMMAND_MEMORY )
> > +        return;
> 
> Especially as long as this code supports only Dom0 I think we want
> a warning here.

Done, I've added:

%04x:%02x:%02x.%u: ignored BAR write with memory decoding enabled

> > +static void vpci_rom_write(struct pci_dev *pdev, unsigned int reg,
> > +                           union vpci_val val, void *data)
> > +{
> > +    struct vpci_bar *rom = data;
> > +    uint8_t seg = pdev->seg, bus = pdev->bus;
> > +    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> > +    const uint32_t wdata = val.u32;
> > +
> > +    if ( (wdata & PCI_ROM_ADDRESS_MASK) == PCI_ROM_ADDRESS_MASK )
> > +    {
> > +        /* Next reads from this register are going to return the BAR size. 
> > */
> > +        rom->sizing = true;
> > +        return;
> > +    }
> > +
> > +    /* End previous sizing cycle if any. */
> > +    rom->sizing = false;
> > +
> > +    rom->addr = wdata & PCI_ROM_ADDRESS_MASK;
> > +
> > +    /* Check if memory decoding is enabled. */
> > +    if ( pci_conf_read16(seg, bus, slot, func, PCI_COMMAND) &
> > +         PCI_COMMAND_MEMORY &&
> > +         (rom->enabled ^ (wdata & PCI_ROM_ADDRESS_ENABLE)) )
> 
> Just like you parenthesize the operands of ^, please also do so for
> the ones of &. Also the ^-expression relies on the particular value
> of PCI_ROM_ADDRESS_ENABLE, which I'd prefer if you avoided.

Changed it to: rom->enabled != !!(wadata & PCI_ROM_ADDRESS_ENABLE)

> > +static int vpci_init_bars(struct pci_dev *pdev)
> > +{
> > +    uint8_t seg = pdev->seg, bus = pdev->bus;
> > +    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> > +    uint8_t header_type;
> > +    uint16_t cmd;
> > +    uint32_t rom_val;
> > +    uint64_t addr, size;
> > +    unsigned int i, num_bars, rom_reg;
> > +    struct vpci_header *header = &pdev->vpci->header;
> > +    struct vpci_bar *bars = header->bars;
> > +    int rc;
> > +
> > +    header_type = pci_conf_read8(seg, bus, slot, func, PCI_HEADER_TYPE) & 
> > 0x7f;
> > +    switch ( header_type )
> 
> I'd prefer if you didn't introduce variables used just once.

OK, I find it cumbersome to place it as the switch expression, but it
fits in a single line so it's not that bad.

> > +    if ( cmd & PCI_COMMAND_MEMORY )
> > +        pci_conf_write16(seg, bus, slot, func, PCI_COMMAND,
> > +                         cmd & ~PCI_COMMAND_MEMORY);
> > +
> > +    for ( i = 0; i < num_bars; i++ )
> > +    {
> > +        uint8_t reg = PCI_BASE_ADDRESS_0 + i * 4;
> > +        uint32_t val = pci_conf_read32(seg, bus, slot, func, reg);
> > +
> > +        if ( i && bars[i - 1].type == VPCI_BAR_MEM64_LO )
> > +        {
> > +            bars[i].type = VPCI_BAR_MEM64_HI;
> > +            rc = vpci_add_register(pdev, vpci_bar_read, vpci_bar_write, 
> > reg, 4,
> > +                                   &bars[i]);
> > +            if ( rc )
> > +                return rc;
> > +
> > +            continue;
> > +        }
> > +        if ( (val & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO )
> > +        {
> > +            bars[i].type = VPCI_BAR_IO;
> > +            continue;
> > +        }
> > +        if ( (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
> > +             PCI_BASE_ADDRESS_MEM_TYPE_64 )
> > +            bars[i].type = VPCI_BAR_MEM64_LO;
> > +        else
> > +            bars[i].type = VPCI_BAR_MEM32;
> 
> Perhaps ignore the 64-bit indicator if it appears in the last BAR?

Hm, pci_size_mem_bar is going to complain anyway and Xen won't be able
to size the BAR.

> > +        /* Size the BAR and map it. */
> > +        rc = pci_size_mem_bar(seg, bus, slot, func, reg, i == num_bars - 1,
> > +                              &addr, &size);
> > +        if ( rc < 0 )
> > +            return rc;
> > +
> > +        if ( size == 0 )
> > +        {
> > +            bars[i].type = VPCI_BAR_EMPTY;
> > +            continue;
> > +        }
> > +
> > +        bars[i].addr = (cmd & PCI_COMMAND_MEMORY) ? addr : INVALID_PADDR;
> 
> This doesn't match up with logic further up: When the memory decode
> bit gets cleared, you don't zap the addresses, so I think you'd better
> store it here too. Use INVALID_PADDR only when the value read has
> all address bits set (same caveat as pointed out earlier).

OK, note that .addr can only possibly be INVALID_PADDR at
initialization time, once the user has written something to the BAR
.addr will be different than INVALID_PADDR.

> > +        bars[i].size = size;
> > +        bars[i].prefetchable = val & PCI_BASE_ADDRESS_MEM_PREFETCH;
> > +
> > +        rc = vpci_add_register(pdev, vpci_bar_read, vpci_bar_write, reg, 4,
> > +                               &bars[i]);
> > +        if ( rc )
> > +            return rc;
> > +    }
> > +
> > +    /* Check expansion ROM. */
> > +    rom_val = pci_conf_read32(seg, bus, slot, func, rom_reg);
> > +    if ( rom_val & PCI_ROM_ADDRESS_ENABLE )
> > +        pci_conf_write32(seg, bus, slot, func, rom_reg,
> > +                         rom_val & ~PCI_ROM_ADDRESS_ENABLE);
> 
> Do you really need to do this when you've cleared the memory
> decode bit already?

Oh right, this is not needed. Both bits need to be enabled for the ROM
to be mapped.

> > +    rc = pci_size_mem_bar(seg, bus, slot, func, rom_reg, true, &addr, 
> > &size);
> 
> You can't use this function here without first making it capable of
> dealing with ROM BARs - it expects the low bits to be different
> than what we have here (see the early ASSERT() that's there).
> 
> > +    if ( rc < 0 )
> > +        return rc;
> 
> Perhaps I didn't pay attention elsewhere, but here it is quite obvious
> that in the error case you return with the device in a state other than
> on input.

Yes, there are several error paths here that will return with memory
decoding disabled. I can fix that by writing back the original command
value to the register.

> > +    if ( size )
> > +    {
> > +        struct vpci_bar *rom = &header->bars[num_bars];
> > +
> > +        rom->type = VPCI_BAR_ROM;
> > +        rom->size = size;
> > +        rom->enabled = rom_val & PCI_ROM_ADDRESS_ENABLE;
> > +        if ( rom->enabled )
> > +            rom->addr = addr;
> > +        else
> > +            rom->addr = INVALID_PADDR;
> 
> Same remark as further up.

Fixed.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.