
Re: [Xen-devel] [PATCH v4 2/9] x86/mmcfg: add handlers for the PVH Dom0 MMCFG areas



On Thu, Jul 13, 2017 at 02:15:26PM -0600, Jan Beulich wrote:
> >>> Roger Pau Monne <roger.pau@xxxxxxxxxx> 06/30/17 5:02 PM >>>
> > @@ -1041,6 +1043,24 @@ static int __init pvh_setup_acpi(struct domain *d, 
> > paddr_t start_info)
> >      return 0;
> >  }
> >  
> > +int __init pvh_setup_mmcfg(struct domain *d)
> 
> Didn't I point out that __init can't be correct here, and instead this
> needs to be __hwdom_init? I can see that the only current caller is
> __init, but that merely suggests there is a second call missing.

Most likely, and I failed to update it.

AFAIK it's not possible to build a late PVH hwdom (or I don't see
how), so I guess that missing call should be added if we ever support
that.

> > +{
> > +    unsigned int i;
> > +    int rc;
> > +
> > +    for ( i = 0; i < pci_mmcfg_config_num; i++ )
> > +    {
> > +        rc = register_vpci_mmcfg_handler(d, pci_mmcfg_config[i].address,
> > +                                         pci_mmcfg_config[i].start_bus_number,
> > +                                         pci_mmcfg_config[i].end_bus_number,
> > +                                         pci_mmcfg_config[i].pci_segment);
> > +        if ( rc )
> > +            return rc;
> 
> I would make this a best effort thing, i.e. issue a log message upon
> failure but continue the loop. There's a good chance Dom0 will still
> be able to come up.

It's certainly worth a try.
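
Something along these lines I guess (just a sketch, the log message
wording is a placeholder of mine):

    int __hwdom_init pvh_setup_mmcfg(struct domain *d)
    {
        unsigned int i;
        int rc;

        for ( i = 0; i < pci_mmcfg_config_num; i++ )
        {
            rc = register_vpci_mmcfg_handler(d, pci_mmcfg_config[i].address,
                                             pci_mmcfg_config[i].start_bus_number,
                                             pci_mmcfg_config[i].end_bus_number,
                                             pci_mmcfg_config[i].pci_segment);
            if ( rc )
                /* Best effort: log the failure and keep going. */
                printk("Unable to setup MMCFG handler at %#" PRIx64
                       " for segment %u\n", pci_mmcfg_config[i].address,
                       pci_mmcfg_config[i].pci_segment);
        }

        return 0;
    }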

> > --- a/xen/arch/x86/hvm/io.c
> > +++ b/xen/arch/x86/hvm/io.c
> > @@ -261,11 +261,11 @@ void register_g2m_portio_handler(struct domain *d)
> >  static int vpci_access_check(unsigned int reg, unsigned int len)
> >  {
> >      /* Check access size. */
> > -    if ( len != 1 && len != 2 && len != 4 )
> > +    if ( len != 1 && len != 2 && len != 4 && len != 8 )
> >          return -EINVAL;
> >  
> > -    /* Check if access crosses a double-word boundary. */
> > -    if ( (reg & 3) + len > 4 )
> > +    /* Check if access crosses a double-word boundary or it's not aligned. */
> > +    if ( (len <= 4 && (reg & 3) + len > 4) || (len == 8 && (reg & 3) != 0) )
> >          return -EINVAL;
> 
> For one I suppose you mean "& 7" in the 8-byte case.

I cannot find anything in the PCIe 3.1A specification that says that
8B accesses should be aligned. AFAICT it only mentions that accesses
should not cross double-word (4B) boundaries, because it's not
mandatory for the root complex to support such accesses.

> And then I don't
> understand why you permit mis-aligned 2-byte writes, but not mis-aligned
> 4-byte ones as long as they fall withing a quad-word. Any such asymmetry
> needs at least a comment.

IIRC I read something like that in the Mindshare PCI book, but I
don't have it at hand. Will check on Monday. Anyway, I cannot seem to
find any specific set of restrictions in the PCI/PCIe specifications,
apart from the one that accesses must not cross a double-word
boundary.

I'm fine with only allowing accesses aligned to their respective
sizes, but I think I should add a comment somewhere noting where this
requirement comes from. Do you have any references from the AMD/Intel
SDMs maybe?
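
That would turn the check into something like the following (a
sketch, assuming we settle on natural alignment):

    static int vpci_access_check(unsigned int reg, unsigned int len)
    {
        /* Check access size. */
        if ( len != 1 && len != 2 && len != 4 && len != 8 )
            return -EINVAL;

        /* Only allow accesses aligned to their size. */
        if ( reg & (len - 1) )
            return -EINVAL;

        return 0;
    }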

> > @@ -398,6 +398,188 @@ void register_vpci_portio_handler(struct domain *d)
> >      handler->ops = &vpci_portio_ops;
> >  }
> >  
> > +struct hvm_mmcfg {
> > +    paddr_t addr;
> > +    size_t size;
> 
> paddr_t and size_t don't really fit together, most notably on 32-bit.
> As I don't think any individual range can possibly be 4Gb or larger, I
> think unsigned int would suffice here.
> 
> > +    unsigned int bus;
> > +    unsigned int segment;
> 
> Depending on how many instances of this structure we expect, it may be
> worthwhile to limit these two to 8 and 16 bits respectively.

Hm, so far the boxes I've tested on only had 1 MCFG area, but it's
probably best to change the types and the order, so that there's no
padding.
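
Something like this maybe (a sketch with the field widths you
suggest):

    struct hvm_mmcfg {
        paddr_t addr;
        unsigned int size;  /* No single range can be 4GiB or larger. */
        uint16_t segment;
        uint8_t bus;        /* Only tail padding is left. */
    };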

> > +/* Handlers to trap PCI ECAM config accesses. */
> 
> An "ECAM" did survive here.

Shame, I should have grepped the patch.

> > +static const struct hvm_mmcfg *vpci_mmcfg_find(struct domain *d,
> > +                                               unsigned long addr)
> 
> paddr_t (to match the structure field)
> 
> > +static void vpci_mmcfg_decode_addr(const struct hvm_mmcfg *mmcfg,
> > +                                   unsigned long addr, unsigned int *bus,
> 
> Same here (and it seems more below). Also, just like in patch 1,
> perhaps return the register by value rather than via indirection.
> 
> > +                                   unsigned int *slot, unsigned int *func,
> > +                                   unsigned int *reg)
> > +{
> > +    addr -= mmcfg->addr;
> > +    *bus = ((addr >> 20) & 0xff) + mmcfg->bus;
> > +    *slot = (addr >> 15) & 0x1f;
> > +    *func = (addr >> 12) & 0x7;
> > +    *reg = addr & 0xfff;
> 
> Iirc there already was a comment to use manifest constants or macros
> here.

Yes, going to fix that.
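
Something like the below, together with the paddr_t change (the macro
names are made up by me):

    #define MMCFG_BUS(addr)  (((addr) >> 20) & 0xff)
    #define MMCFG_SLOT(addr) (((addr) >> 15) & 0x1f)
    #define MMCFG_FUNC(addr) (((addr) >> 12) & 0x7)
    #define MMCFG_REG(addr)  ((addr) & 0xfff)

    static void vpci_mmcfg_decode_addr(const struct hvm_mmcfg *mmcfg,
                                       paddr_t addr, unsigned int *bus,
                                       unsigned int *slot,
                                       unsigned int *func, unsigned int *reg)
    {
        addr -= mmcfg->addr;
        *bus = MMCFG_BUS(addr) + mmcfg->bus;
        *slot = MMCFG_SLOT(addr);
        *func = MMCFG_FUNC(addr);
        *reg = MMCFG_REG(addr);
    }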

> > +static int vpci_mmcfg_accept(struct vcpu *v, unsigned long addr)
> > +{
> > +    struct domain *d = v->domain;
> > +    bool found;
> > +
> > +    vpci_lock(d);
> > +    found = vpci_mmcfg_find(d, addr);
> > +    vpci_unlock(d);
> 
> The latest here I wonder whether the lock wouldn't better be an r/w one.

TBH, my first implementation was using a rw lock, but then I thought
it was not worth it and switched to a spinlock. I don't mind making it
a rw lock, but then the argument passed to the read handlers should be
constified for safety IMHO.

Also note that due to the usage of the pcidevs lock, whether this is a
rw lock or a spinlock doesn't make much of a difference.
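
With a rw lock the accept handler would then be (just a sketch;
vpci_rlock/vpci_runlock are hypothetical read-side wrappers around
the current vpci_lock):

    static int vpci_mmcfg_accept(struct vcpu *v, unsigned long addr)
    {
        struct domain *d = v->domain;
        bool found;

        vpci_rlock(d);
        found = vpci_mmcfg_find(d, addr);
        vpci_runlock(d);

        return found;
    }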

> > +static int vpci_mmcfg_read(struct vcpu *v, unsigned long addr,
> > +                           unsigned int len, unsigned long *data)
> 
> uint64_t * (to be 32-bit compatible)

Will this work properly on 32bit builds?

The hvm_mmio_{read,write}_t types expect an unsigned long, not a
uint64_t. I'm confused about how this worked before with a 32bit
hypervisor and a 64bit guest: how were movq accesses handled?

> > +{
> > +    struct domain *d = v->domain;
> > +    const struct hvm_mmcfg *mmcfg;
> > +    unsigned int bus, slot, func, reg;
> > +
> > +    vpci_lock(d);
> > +    mmcfg = vpci_mmcfg_find(d, addr);
> > +    if ( !mmcfg )
> > +    {
> > +        vpci_unlock(d);
> > +        return X86EMUL_OKAY;
> > +    }
> > +
> > +    vpci_mmcfg_decode_addr(mmcfg, addr, &bus, &slot, &func, &reg);
> > +
> > +    if ( vpci_access_check(reg, len) )
> > +    {
> > +        vpci_unlock(d);
> > +        return X86EMUL_OKAY;
> > +    }
> > +
> > +    pcidevs_lock();
> > +    if ( len == 8 )
> > +    {
> > +        /*
> > +         * According to the PCIe 3.1A specification:
> > +         *  - Configuration Reads and Writes must usually be DWORD or smaller
> > +         *    in size.
> > +         *  - Because Root Complex implementations are not required to support
> > +         *    accesses to a RCRB that cross DW boundaries [...] software
> > +         *    should take care not to cause the generation of such accesses
> > +         *    when accessing a RCRB unless the Root Complex will support the
> > +         *    access.
> > +         *  Xen however supports 8byte accesses by splitting them into two
> > +         *  4byte accesses.
> > +         */
> > +        *data = vpci_read(mmcfg->segment, bus, slot, func, reg, 4);
> > +        *data |= (uint64_t)vpci_read(mmcfg->segment, bus, slot, func,
> > +                                     reg + 4, 4) << 32;
> > +    }
> > +    else
> > +        *data = vpci_read(mmcfg->segment, bus, slot, func, reg, len);
> 
> I think it would be preferable to avoid the else, by merging this and
> the first of the other two reads.

Ack.
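
I.e. something like (sketch):

    *data = vpci_read(mmcfg->segment, bus, slot, func, reg, min(4u, len));
    if ( len == 8 )
        *data |= (uint64_t)vpci_read(mmcfg->segment, bus, slot, func,
                                     reg + 4, 4) << 32;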

> > +    pcidevs_unlock();
> > +    vpci_unlock(d);
> 
> Question on lock order (should have gone into the patch 1 reply already,
> but I had thought of this only after sending): Is it really a good idea
> to nest this way?

I saw no other way to make sure the pdev is not removed while poking
at it.

> The pcidevs lock is covering quite large regions at
> times, so the risk of a lock order violation seems non-negligible even
> if there may be none right now. Furthermore the new uses of the pcidevs
> lock you introduce would seem to make it quite desirable to make that
> one an r/w one too. Otoh that's a recursive one, so it'll be non-trivial
> to convert ...

I can try, but as you say it doesn't seem trivial at all.

> > +int register_vpci_mmcfg_handler(struct domain *d, paddr_t addr,
> 
> __hwdom_init

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 

