Re: IOREQ completions for MMIO writes
On 29/08/2024 5:08 pm, Jason Andryuk wrote:
> Hi Everyone,
>
> I've been looking at ioreq latency and pausing of vCPUs. Specifically
> for MMIO (IOREQ_TYPE_COPY) writes, they still need completions:
>
> static inline bool ioreq_needs_completion(const ioreq_t *ioreq)
> {
>     return ioreq->state == STATE_IOREQ_READY &&
>            !ioreq->data_is_ptr &&
>            (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
> }
>
> state == STATE_IOREQ_READY
> data_is_ptr == 0
> type == IOREQ_TYPE_COPY
> dir == IOREQ_WRITE
>
> So a completion is needed. The vCPU remains paused with
> _VPF_blocked_in_xen set in paused_flags until the ioreq server
> notifies of the completion.
>
> At least for the case I'm looking at, a single write to an MMIO
> register, it doesn't seem like the vCPU needs to be blocked. The write
> has been sent, and subsequent emulation should not depend on it.
>
> I feel like I am missing something, but I can't think of a specific
> example where a write needs to be blocking. Maybe it simplifies the
> implementation, so a subsequent instruction will always have an ioreq
> slot available?
>
> Any insights are appreciated.
This is a thorny issue.
On x86, MMIO writes are typically posted, but that doesn't mean that the
underlying layers can stop tracking the write completely.
In your scenario, consider what happens when the same vCPU hits a second
MMIO write a few instructions later. You've now got two IOREQs' worth of
pending state, only one slot in the "ring", and a wait of unknown length
for qemu to process the first.
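For reference, each vCPU gets exactly one slot in the shared IOREQ page.
Abridged from memory (see xen/include/public/hvm/ioreq.h for the real
thing), the layout is roughly:

    /* Trimmed paraphrase, not verbatim. */
    struct ioreq {
        uint64_t addr;          /* guest physical address of the access   */
        uint64_t data;          /* data, or guest paddr if data_is_ptr    */
        uint32_t count;         /* repeat count for rep prefixes          */
        uint32_t size;          /* access size in bytes                   */
        uint32_t vp_eport;      /* evtchn to notify the device model      */
        uint16_t _pad0;
        uint8_t  state:4;       /* STATE_IOREQ_* handshake                */
        uint8_t  data_is_ptr:1;
        uint8_t  dir:1;         /* IOREQ_READ / IOREQ_WRITE               */
        uint8_t  df:1;
        uint8_t  _pad1:1;
        uint8_t  type;          /* IOREQ_TYPE_PIO / IOREQ_TYPE_COPY / ... */
    };                          /* 32 bytes */

    struct shared_iopage {
        struct ioreq vcpu_ioreq[1];   /* one per vCPU, indexed by vcpu_id */
    };

A second outstanding request from the same vCPU would have to overwrite
that slot before qemu has consumed the first one.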
More generally, by not blocking you're violating memory ordering.
Consider vCPU0 doing an MMIO write, and vCPU1 doing an MMIO read, and
qemu happening to process vCPU1 first.
You now have a case where the VM can observe vCPU0 "completing" before
vCPU1 starts, yet vCPU1 observes the old value.
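Spelled out with an ordinary RAM flag as the synchronisation (a
hypothetical sequence, just to make the hazard concrete):

    /*
     * Hypothetical interleaving if vCPU0 is not blocked on its MMIO write:
     *
     *   vCPU0: write REG        -> IOREQ queued, vCPU0 keeps running
     *   vCPU0: store flag = 1   -> ordinary RAM store, visible at once
     *   vCPU1: sees flag == 1   -> assumes the REG write has happened
     *   vCPU1: read REG         -> qemu services this IOREQ first and
     *                              returns the old register value
     *
     * The guest observes vCPU0's write as complete (via the flag) while
     * vCPU1 still reads stale data.
     */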
Other scenarios include e.g. a subsequent IO hitting STDVGA buffering
and being put into the bufioreq ring, or the vCPU being able to continue
while the "please unplug my emulated disk/network" request is still
pending.
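For contrast, the bufioreq ring is fire-and-forget; paraphrasing the same
header from memory (so treat the field widths as approximate), there is
no state handshake at all, hence nothing for the vCPU to wait on:

    /* Paraphrased from memory; field widths approximate. */
    struct buf_ioreq {
        uint8_t  type;
        uint8_t  pad:1;
        uint8_t  dir:1;         /* IOREQ_READ / IOREQ_WRITE       */
        uint8_t  size:2;        /* 1, 2, 4 or 8 bytes             */
        uint32_t addr:20;       /* low-memory address (VGA range) */
        uint32_t data;
    };                          /* 8 bytes, no state field        */

    struct buffered_iopage {
        uint32_t read_pointer;       /* consumer: qemu */
        uint32_t write_pointer;      /* producer: Xen  */
        struct buf_ioreq buf_ioreq[IOREQ_BUFFER_SLOT_NUM];
    };

With no completion state there, ordering against a still-pending
synchronous slot is entirely down to when qemu drains each of them.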
In terms of what to do about latency, this is one area where Xen does
suffer vs KVM.
With KVM, this type of emulation is handled synchronously by an entity
on the same logical processor. With Xen, one LP says "I'm now blocked,
schedule something else" without any idea when the IO will even be
processed.
One crazy idea I had was to look into not de-scheduling the HVM vCPU,
and instead going idle by MONITOR-ing the IOREQ slot.
This way, Qemu can "resume" the HVM vCPU by simply writing the
completion status (and observing some kind of new "I don't need an
evtchn" signal). For a sufficiently quick turnaround, you're also not
thrashing the cache by scheduling another vCPU in the meantime.
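Very roughly, and with entirely made-up helper names, the wait loop in
the hypervisor could look something like:

    /* Sketch only -- not existing Xen code.  Assumes the vCPU's ioreq_t
     * sits in its own cacheline (see below) so MONITOR doesn't get woken
     * by writes to a neighbouring vCPU's slot. */

    static inline void cpu_monitor(const volatile void *addr)
    {
        /* MONITOR: address in RAX, extensions in ECX, hints in EDX. */
        asm volatile ( "monitor" :: "a" (addr), "c" (0), "d" (0) );
    }

    static inline void cpu_mwait(void)
    {
        /* MWAIT: hints in EAX, extensions in ECX. */
        asm volatile ( "mwait" :: "a" (0), "c" (0) );
    }

    static void hvm_wait_for_ioresp(volatile ioreq_t *slot)
    {
        while ( slot->state != STATE_IORESP_READY )
        {
            cpu_monitor(slot);

            /* Re-check after arming the monitor, to close the race with
             * qemu writing the response in the gap. */
            if ( slot->state == STATE_IORESP_READY )
                break;

            cpu_mwait();    /* idle until the monitored line is written,
                             * or an interrupt arrives */
        }
        /* Consume the response and re-enter the guest, no reschedule. */
    }

Interrupt delivery, C-state selection and the scheduler's idea of "this
LP is busy" are all conveniently ignored here.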
It's definitely more complicated. For one, you'd need to double the
size of an IOREQ slot (currently 32 bytes) to avoid sharing a cacheline
with an adjacent vCPU.
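Purely as an illustration of the shape (hypothetical, not a proposed
ABI):

    /* Illustrative only: pad each slot to a cacheline so MONITOR on one
     * vCPU's slot isn't triggered by qemu updating the adjacent one. */
    union padded_ioreq {
        struct ioreq ioreq;     /* 32 bytes today     */
        uint8_t      pad[64];   /* one x86 cacheline  */
    };

    struct shared_iopage_v2 {   /* hypothetical layout */
        union padded_ioreq vcpu_ioreq[1];
    };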
I also have no idea if it would be an improvement in practice, but on
paper it does look like it warrants some further experimentation.
~Andrew