
Re: [Xen-devel] Xen/arm: Virtual ITS command queue handling



On Wed, 2015-05-13 at 15:26 +0100, Julien Grall wrote:
> >>>   on that vits;
> >>> * On receipt of an interrupt notification arising from Xen's own use
> >>>   of `INT`; (see discussion under Completion)
> >>> * On any interrupt injection arising from a guests use of the `INT`
> >>>   command; (XXX perhaps, see discussion under Completion)
> >>
> >> With all the solutions suggested, it is very likely that we will try
> >> to execute multiple scheduling passes at the same time.
> >>
> >> One way is to wait until the previous pass has finished. But that would
> >> mean that the scheduler would be executed very often.
> >>
> >> Or maybe you plan to offload the scheduler to a softirq?
> > 
> > Good point.
> > 
> > A softirq might be one solution, but it is problematic during emulation
> > of `CREADR`, when we would like to do a pass immediately to complete any
> > operations outstanding for the domain doing the read.
> > 
> > Or just using spin_trylock and not bothering if one is already in
> > progress might be another, but that has similar problems.
> > 
> > Or we could defer only scheduling from `INT` (either guest or Xen's own)
> > to a softirq but do the ones from `CREADR` emulation synchronously? The
> > softirq would be run on return from the interrupt handler, but multiple
> > such raisings would be coalesced, I think?
> 
> I think we could defer the scheduling to a softirq for CREADR too, if
> the guest is using:
>       - INT completion: vits.creadr would have been correctly updated when
> receiving the INT in Xen.
>       - polling completion: the guest will loop on CREADR. It will likely get
> the info on the next read. The drawback is the guest may lose a few
> instruction cycles.
> 
> Overall, I don't think it's necessary to have an accurate CREADR.

Yes, deferring the update by one exit+enter might be tolerable. I added
after this list:
        This may result in lots of contention on the scheduler
        locking. Therefore we consider that in each case all that happens is
        the triggering of a softirq which will be processed on return to the
        guest, and just once even for multiple events. This is considered OK
        for the `CREADR` case because at worst the value read will be one
        cycle out of date.
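
A rough sketch of that softirq deferral, purely for illustration:
VITS_SCHEDULE_SOFTIRQ, vits_run_scheduling_passes() and vits_kick() are
invented names (the softirq number would need adding to the enum in
xen/include/xen/softirq.h), while open_softirq() and raise_softirq()
are the existing Xen interfaces:

    #include <xen/softirq.h>

    /* Hypothetical: runs one scheduling pass per relevant pITS. */
    static void vits_run_scheduling_passes(void);

    static void vits_softirq_handler(void)
    {
        /*
         * One pass services however many events were raised since the
         * last return to guest: multiple raise_softirq() calls on the
         * same CPU before then coalesce into a single invocation of
         * this handler.
         */
        vits_run_scheduling_passes();
    }

    static void __init vits_softirq_init(void)
    {
        open_softirq(VITS_SCHEDULE_SOFTIRQ, vits_softirq_handler);
    }

    /*
     * Each trigger point (CREADR emulation, receipt of Xen's own INT,
     * space becoming available on the physical queue) then only does:
     */
    static void vits_kick(void)
    {
        raise_softirq(VITS_SCHEDULE_SOFTIRQ);
    }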
        


> 
> [..]
> 
> >> AFAIU the process suggested, Xen will inject small batches as long as
> >> the physical command queue is not full.
> > 
> >> Let's take a simple case: only a single domain is using the vITS on the
> >> platform. If it injects a huge number of commands, Xen will split them
> >> into lots of small batches. All batches will be injected in the same
> >> pass as long as they fit in the physical command queue. Am I correct?
> > 
> > That's how it is currently written, yes. With the "possible
> > simplification" above the answer is no, only one batch at a time would
> > be written for each guest.
> > 
> > BTW, it doesn't have to be a single guest; the sum total of the
> > injections across all guests could also take a similar amount of time.
> > Is that a concern?
> 
> Yes, but the example with only one guest was easier to explain.

So as well as limiting the number of commands in each domain's batch we
also want to limit the total number of batches?

> >> I think we have to restrict the total number of batches (i.e. for all
> >> the domains) injected in the same scheduling pass.
> >>
> >> I would even tend to allow only one in-flight batch per domain. That
> >> would limit the possible problem I pointed out.
> > 
> > This is the "possible simplification", I think. Since it simplifies
> > other things as well as addressing this issue I think it might be a
> > good idea.
> 
> With the limit on the number of commands sent per batch, would the
> fairness you were talking about in the design doc still be required?

I think we still want to schedule the guests in a strict round-robin
manner, to avoid one guest monopolising things.
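
A rough sketch of such a pass, strict round-robin with at most one batch
in flight per domain, just to illustrate the shape of it (struct vits,
struct vits_sched and all the helpers are invented for the example, and
the batch size is arbitrary):

    #define VITS_BATCH_MAX 16          /* arbitrary per-batch command limit */

    struct vits;                       /* per-vITS state, details elided */

    struct vits_sched {                /* per-pITS scheduler state */
        unsigned int free_slots;       /* space left on the physical queue */
        /* round-robin list of vITSs with pending commands, elided */
    };

    /* Helpers assumed to exist for the purposes of the sketch. */
    unsigned int vits_sched_nr_active(struct vits_sched *s);
    struct vits *vits_sched_next(struct vits_sched *s); /* rotates the RR list */
    unsigned int vits_pending(struct vits *v);
    int vits_batch_in_flight(struct vits *v);
    void vits_inject_batch(struct vits *v, unsigned int n);

    static void vits_schedule_pass(struct vits_sched *s)
    {
        unsigned int i, nr = vits_sched_nr_active(s);

        /*
         * Visit each vITS with pending commands at most once, in strict
         * round-robin order, issuing at most one batch per vITS per pass
         * and stopping when the physical queue is full.
         */
        for ( i = 0; i < nr && s->free_slots; i++ )
        {
            struct vits *v = vits_sched_next(s);
            unsigned int batch = vits_pending(v);

            if ( batch > VITS_BATCH_MAX )
                batch = VITS_BATCH_MAX;

            if ( !batch || vits_batch_in_flight(v) || batch > s->free_slots )
                continue;

            vits_inject_batch(v, batch);  /* translate + copy onto pITS queue */
            s->free_slots -= batch;
        }
    }

Nothing is requeued within a pass; a vITS which cannot make progress
simply waits for a later pass, which keeps the fairness reasoning
simple.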

> >>> Therefore it is proposed that the restriction that a single vITS maps
> >>> to one pITS be retained. If a guest requires access to devices
> >>> associated with multiple pITSs then multiple vITS should be
> >>> configured.
> >>
> >> Having multiple vITSs per domain brings other issues:
> >>    - How do you know the number of ITSs to describe in the device tree
> >> at boot?
> > 
> > I'm not sure. I don't think 1 vs N is very different from the question
> > of 0 vs 1 though; somehow the tools need to know about the pITS setup.
> 
> I don't see why the tools would need to know the pITS setup.

Even with only a single vITS the tools need to know if the system has 0,
1, or more pITSs, to know whether to create a vITS at all or not.

> >>    - How do you tell the guest that the PCI device is mapped to a
> >> specific vITS?
> > 
> > Device Tree or IORT, just like on native and just like we'd have to tell
> > the guest about that mapping even if there was a single vITS.
> 
> Right, although a root controller can only be attached to one ITS.
> 
> It will be necessary to have multiple root controllers in the guest if
> we passthrough devices using different ITSs.
> 
> Is pci-back able to expose multiple root controllers?

In principle the xenstore protocol supports it, but AFAIK all toolstacks
have only ever used "bus" 0, so I wouldn't be surprised if there were
bugs lurking.

But we could fix those; I don't think it is a requirement that this
stuff suddenly springs into life on ARM even with existing kernels.

> > I think the complexity of having one vITS target multiple pITSs is going
> > to be quite high in terms of data structures and the amount of
> > thinking/tracking the scheduler code will have to do, mostly down to out
> > of order completion of things put in the pITS queue.
> 
> I understand the complexity, but exposing one vITS per pITS means that
> we are exposing the underlying hardware to the guest.

Some aspects of it, yes, but it is still a virtual ITS.

> That brings a lot of complexity to the guest layout, which is right now
> static. How do you decide the number of vITSs/root controllers exposed
> (think about PCI hotplug)?
> 
> Given that PCI passthrough doesn't allow migration, maybe we could use
> the layout of the hardware.

That's an option.

> If we are going to expose multiple vITSs to the guest, we should only
> use a vITS for guests using PCI passthrough. This is because migration
> won't be compatible with it.

It would be possible to support one s/w-only vITS for migration, i.e.
the evtchn thing at the end, but for the general case that is correct.
On x86 I believe that if you hot-unplug all passthrough devices you can
migrate and then plug in other devices at the other end.

Anyway, more generally there are certainly problems with multiple vITS.
However there are also problems with a single vITS feeding multiple
pITSs:

      * What to do with global commands? Inject to all pITSs and then
        synchronise on them all finishing.
      * Handling of out of order completion of commands queued with
        different pITSs, since the vITS must appear to complete in
        order. Apart from the book-keeping question it makes scheduling
        more interesting:
              * What if you have a pITS with slots available, and the
                guest command queue contains commands which could go to
                that pITS, but behind ones which are targeting another
                pITS which has no slots?
              * What if one pITS is very busy and another is mostly idle
                and a guest submits one command to the busy one
                (contending with other guests) followed by a load of
                commands targeting the idle one? Those commands would be
                held up in this situation.
              * Reasoning about fairness may be harder.

I've put both your list and mine into the next revision of the document.
I think this remains an important open question.
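
To illustrate the second point, a single vITS spanning several pITSs
would need to remember, for every guest command issued, at least
something like the following (names invented purely for the example):

    /*
     * Illustrative only: per-command tracking a single vITS feeding
     * multiple pITSs would need so that completion can be reported to
     * the guest in order even though the pITSs may finish out of order.
     */
    struct vits_cmd_track {
        struct list_head entry;       /* position in the vITS in-flight list */
        uint64_t         guest_slot;  /* offset of the command in the guest queue */
        struct pits      *pits;       /* which physical ITS it was sent to */
        bool             completed;   /* completion observed on that pITS */
    };

    /*
     * The virtual CREADR may only advance past a command once it and
     * every earlier command have completed, i.e. on each completion
     * notification:
     *
     *     while the in-flight list is non-empty and its head is marked
     *     completed: advance the virtual CREADR past the head and drop
     *     the entry.
     *
     * None of this is needed if a vITS maps to exactly one pITS, since
     * that pITS completes the vITS's commands in the order they were
     * issued.
     */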




 

