[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v8 07/11] vpci/bars: add handlers to map the BARs



On Tue, Feb 27, 2018 at 03:01:44AM -0700, Jan Beulich wrote:
> >>> On 27.02.18 at 10:21, <roger.pau@xxxxxxxxxx> wrote:
> > On Tue, Feb 27, 2018 at 01:32:24AM -0700, Jan Beulich wrote:
> >> >>> On 26.02.18 at 19:00, <roger.pau@xxxxxxxxxx> wrote:
> >> > On Mon, Feb 26, 2018 at 04:20:17AM -0700, Jan Beulich wrote:
> >> >> >>> On 23.01.18 at 16:07, <roger.pau@xxxxxxxxxx> wrote:
> >> >> > +static void maybe_defer_map(struct domain *d, const struct pci_dev 
> >> >> > *pdev,
> >> >> > +                            struct rangeset *mem, bool map, bool rom)
> >> >> > +{
> >> >> > +    struct vcpu *curr = current;
> >> >> > +
> >> >> > +    if ( is_idle_vcpu(curr) )
> >> >> > +    {
> >> >> > +        struct map_data data = { .d = d, .map = true };
> >> >> > +
> >> >> > +        /*
> >> >> > +         * Only used for domain construction in order to map the BARs
> >> >> > +         * of devices with memory decoding enabled.
> >> >> > +         */
> >> >> > +        ASSERT(map && !rom);
> >> >> > +        rangeset_consume_ranges(mem, map_range, &data);
> >> >> 
> >> >> What if this produces -ENOMEM? And despite having looked at
> >> >> several revisions of this, I can't make the connection to why this
> >> >> is in an is_idle_vcpu() conditional (neither the direct caller nor the
> >> >> next level up make this obvious to me). There's clearly a need
> >> >> for extending the comment.
> >> > 
> >> > I thought the comment above that mentions domain construction would be
> >> > enough. I can try to expand this, maybe like:
> >> > 
> >> > "This function will only be called from the idle vCPU while building
> >> > the domain, in which case it's not possible to defer the operation
> >> > (like done in the else branch). Call rangeset_consume_ranges in order
> >> > to establish the mappings right away."
> >> 
> >> And what again is the connection between is_idle_domain() and
> >> domain construction? I think the comment belongs ahead of the
> >> if(), and it needs to make that connection.
> > 
> > Oh, domain constructions runs on the idle vCPU, that's the relation.
> > 
> > "This function will only be called from the idle vCPU while building
> > the domain (because Dom0 building runs on the idle vCPU), in which
> > case it's not possible to defer the operation (like done in the else
> > branch). Call rangeset_consume_ranges in order to establish the
> > mappings right away."
> > 
> > Does that seem clearer if placed ahead of the if?
> 
> Can be shorter imo - the thing I didn't pay attention to is that
> this is all about _dom0_ building, not general _domain_ building.
> Hence perhaps:
> 
> "Dom0 building runs on the idle vCPU, in which case it's not possible
>  to defer the operation (like done in the else branch). Call
>  rangeset_consume_ranges in order to establish the mappings right
>  away."

LGTM. Thanks for re-writing it.

> >> >> > +static void modify_bars(const struct pci_dev *pdev, bool map, bool 
> >> >> > rom)
> >> >> > +{
> >> >> > +    struct vpci_header *header = &pdev->vpci->header;
> >> >> > +    struct rangeset *mem = rangeset_new(NULL, NULL, 0);
> >> >> > +    const struct pci_dev *tmp;
> >> >> > +    unsigned int i;
> >> >> > +    int rc;
> >> >> > +
> >> >> > +    if ( !map )
> >> >> > +        modify_decoding(pdev, false, rom);
> >> >> > +
> >> >> > +    if ( !mem )
> >> >> > +        return;
> >> >> 
> >> >> Similarly here - why is it okay (or what effect will it have) to return
> >> >> here when we're out of memory, but the caller won't know?
> >> > 
> >> > The behaviour here depends on the change to the memory decoding bit:
> >> > 
> >> >  - Clearing: memory decoding on device will be disabled, BARs won't be
> >> >    unmapped.
> >> >  - Setting: no change to device memory decoding bit, BARs won't be
> >> >    mapped.
> >> > 
> >> > Do you think this is suitable? IMO it's fine to disable the memory
> >> > decoding bit on the device and leave the memory regions mapped.
> >> 
> >> As long as subsequent changes to the decoding bit can't leave
> >> stale mappings. Plus there needs to be a comment to explain this.
> > 
> > With the current approach in the unmap case there will be stale
> > mappings left behind.
> > 
> > I guess it's better then to not modify the memory decoding bit at all
> > until the operation finishes. That also rises the question of whether
> > the memory decoding bit should be modified if p2m mapping/unmapping
> > reports an error.
> > 
> > Should we also attempt to rollback failed map/unmap operations? What
> > happens if the rollback also fails?
> 
> It is in particular this last question why I don't think rollback makes
> sense. If there's any failure, I think the decode bit should be sticky
> clear; we may want (need?) to invent some magical mechanism to
> get a device back out of this state later on. But that way stale
> mappings are not an immediate problem (I think).

The only problem I see with this approach is that if an error happens
in the unmap case we will disable memory decoding and leave some stale
p2m mappings. Then the guest might change the position of the BARs,
and those stale mappings would be completely forgotten.

I don't think this is a problem for Dom0, but DomUs need to be handled
differently (possibly along the lines of the diagram below).

> > What about the following:
> > 
> >     +---------+   SUCCESS  +---------------------------------+
> >     |map/unmap+------------>Change decoding or ROM enable bit|
> >     +----+----+            +---------------------------------+
> >          |
> >          |FAILURE
> >          |
> > +--------v-------+ SUCCESS +---------------------------------------+
> > |revert operation+--------->No change to decoding or ROM enable bit|
> > +--------+-------+         +---------------------------------------+
> >          |
> >          |FAILURE
> >          |
> >    +-----v-----+
> >    |Kill domain|
> >    +-----------+
> 
> Possibly. Killing Dom0 is a bad thing though, if just some random
> device has a problem.

So for Dom0 the diagram would be the following:

    +---------+   SUCCESS  +---------------------------------+
    |map/unmap+------------>Change decoding or ROM enable bit|
    +----+----+            +---------------------------------+
         |
         |FAILURE
         |
+--------v-----------------------+
|Clear decoding or ROM enable bit|
+--------------------------------+

> >> >> > +    for ( i = 0; i < ARRAY_SIZE(header->bars); i++ )
> >> >> > +    {
> >> >> > +        const struct vpci_bar *bar = &header->bars[i];
> >> >> > +
> >> >> > +        if ( !MAPPABLE_BAR(bar) ||
> >> >> > +             (rom ? bar->type != VPCI_BAR_ROM
> >> >> > +                  : (bar->type == VPCI_BAR_ROM && 
> >> >> > !header->rom_enabled)) )
> >> >> > +            continue;
> >> >> 
> >> >> Why does rom_enabled matter for the !rom case rather than for
> >> >> the rom one? I.e.
> >> >> 
> >> >>         if ( !MAPPABLE_BAR(bar) ||
> >> >>              (rom ? bar->type != VPCI_BAR_ROM || !header->rom_enabled
> >> >>                   : bar->type == VPCI_BAR_ROM) )
> >> >>             continue;
> >> >> 
> >> >> ?
> >> > 
> >> > No, for the ROM case we only want to map/unmap the ROM, so that's the
> >> > only thing added to the rangeset. For the non-ROM case Xen will also
> >> > map/unmap the ROM if the enable bit is set.
> >> > 
> >> > Your proposed code would always map/unmap the ROM into the p2m when
> >> > the memory decoding bit is changed even if it's not enabled.
> >> 
> >> I don't understand. Taking apart the conditional I've suggested,
> >> and converting to human language:
> >> - if the BAR is no mappable, continue
> >> - if we want to deal with ROM (rom=true), if the BAR isn't ROM
> >>   or isn't enabled, continue
> > 
> > With the current flow rom_enabled is set after the ROM BAR mappings
> > are established, which means that when modify_bars is called with
> > rom=true rom_enabled is not yet set, and using the logic above the
> > mappings won't be created.
> > 
> >> - if we want to deal with non-ROM (rom=false), if the BAR is ROM,
> >>   continue
> > 
> > Xen also has to deal with the ROM if it's enabled when the memory
> > decoding bit is toggled, hence in rom=false case the ROM also needs to
> > be mapped/unampped if it's enabled.
> > 
> > This dependency between the memory decoding bit and the ROM enable bit
> > is quite convoluted TBH.
> > 
> >> To me this is in line with the 2nd paragraph of your reply. It's not
> >> in line with the first, which makes me wonder whether "rom" is
> >> misnamed and wants to be "rom_only". Still, the question would
> >> remain of why rom_enabled doesn't matter when the variable is
> >> true.
> > 
> > Yes, rom means rom_only (ie: ROM enable bit has been toggled and
> > memory decoding bit is enabled). I assume you would prefer to change
> > the name to rom_only.
> 
> Yes, and perhaps provide an abridged version of your explanation
> above in a comment next to the conditional.

OK, I can do that.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.