Re: IOREQ completions for MMIO writes
On 29/08/2024 5:08 pm, Jason Andryuk wrote:
> Hi Everyone,
>
> I've been looking at ioreq latency and pausing of vCPUs.  Specifically
> for MMIO (IOREQ_TYPE_COPY) writes, they still need completions:
>
> static inline bool ioreq_needs_completion(const ioreq_t *ioreq)
> {
>     return ioreq->state == STATE_IOREQ_READY &&
>            !ioreq->data_is_ptr &&
>            (ioreq->type != IOREQ_TYPE_PIO || ioreq->dir != IOREQ_WRITE);
> }
>
> state == STATE_IOREQ_READY
> data_is_ptr == 0
> type == IOREQ_TYPE_COPY
> dir == IOREQ_WRITE
>
> To a completion is needed.  The vCPU remains paused with
> _VPF_blocked_in_xen set in paused_flags until the ioreq server
> notifies of the completion.
>
> At least for the case I'm looking, a single write to a mmio register,
> it doesn't seem like the vCPU needs to be blocked.  The write has been
> sent and subsequent emulation should not depend on it.
>
> I feel like I am missing something, but I can't think of a specific
> example where a write needs to be blocking.  Maybe it simplifies the
> implementation, so a subsequent instruction will always have a ioreq
> slot available?
>
> Any insights are appreciated.

This is a thorny issue.

In x86, MMIO writes are typically posted, but that doesn't mean that the
underlying layers can stop tracking the write completely.

In your scenario, consider what happens when the same vCPU hits a second
MMIO write a few instructions later.  You've now got two IOREQs worth of
pending state, only one slot in the "ring", and a wait of an unknown
period of time for qemu to process the first.

More generally, by not blocking you're violating memory ordering.

Consider vCPU0 doing an MMIO write, and vCPU1 doing an MMIO read, and
qemu happening to process vCPU1 first.

You now have a case where the VM can observe vCPU0 "completing" before
vCPU1 starts, yet vCPU1 observing the old value.

Other scenarios which exist would be e.g. a subsequent IO hitting STDVGA
buffering and being put into the bufioreq ring.
Or the vCPU being able to continue when the "please unplug my emulated
disk/network" request is still pending.

In terms of what to do about latency, this is one area where Xen does
suffer vs KVM.

With KVM, this type of emulation is handled synchronously by an entity
on the same logical processor.  With Xen, one LP says "I'm now blocked,
schedule something else" without any idea when the IO will even be
processed.

One crazy idea I had was to look into not de-scheduling the HVM vCPU,
and instead going idle by MONITOR-ing the IOREQ slot.

This way, Qemu can "resume" the HVM vCPU by simply writing the
completion status (and observing some kind of new "I don't need an
evtchn" signal).  For a sufficiently quick turnaround, you're also not
thrashing the cache by scheduling another vCPU in the meantime.

It's definitely more complicated.  For one, you'd need to double the
size of an IOREQ slot (currently 32 bytes) to avoid sharing a cacheline
with an adjacent vCPU.

I also have no idea if it would be an improvement in practice, but on
paper it does look like it warrants some further experimentation.

~Andrew
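
For illustration only, the slot-size point above can be sketched as
follows.  This is not Xen code: struct slot32 is a simplified 32-byte
stand-in for an ioreq slot, struct slot64 and the two helper functions
are hypothetical names, and the 64-byte cacheline is an assumption
(typical for x86, but not stated in the mail).

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cachelines. */
#define CACHELINE_BYTES 64u

/* Simplified 32-byte stand-in for one per-vCPU ioreq slot. */
struct slot32 {
    uint64_t addr;
    uint64_t data;
    uint32_t count;
    uint32_t size;
    uint32_t vp_eport;
    uint16_t pad0;
    uint8_t  flags;   /* state, data_is_ptr, dir, df collapsed into a byte */
    uint8_t  type;
};

/* Hypothetical padded variant: exactly one slot per cacheline. */
struct slot64 {
    alignas(CACHELINE_BYTES) struct slot32 req;
};

static_assert(sizeof(struct slot32) == 32, "stand-in slot must be 32 bytes");
static_assert(sizeof(struct slot64) == 64, "padded slot must fill a cacheline");

/* Cacheline index of a vCPU's slot in a cacheline-aligned array of slots. */
static inline size_t slot_cacheline(size_t vcpu, size_t slot_bytes)
{
    return (vcpu * slot_bytes) / CACHELINE_BYTES;
}

/* Non-zero if the slots of vCPUs a and b land on the same cacheline. */
static inline int slots_share_cacheline(size_t a, size_t b, size_t slot_bytes)
{
    return slot_cacheline(a, slot_bytes) == slot_cacheline(b, slot_bytes);
}
```

With 32-byte slots, vCPU0 and vCPU1 share a line, so a write to either
slot would wake a MONITOR armed on the other.  With the padded layout no
two slots share a line, at the cost of halving how many slots fit in a
given amount of memory.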