Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
On 04/06/2020 08:08, Jan Beulich wrote:
> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
>> Then, we get the main issue:
>>
>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
>> (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
>> (XEN) domain_crash called from io.c:178
>>
>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its stubdom
>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.
> I'd guess an issue with the shutdown deferral logic. Did you / can you
> check whether XEN_DMOP_remote_shutdown managed to pause all CPUs (I
> assume it didn't, since once they're paused there shouldn't be any I/O
> there anymore, and hence no I/O emulation)?

The vcpu in question is talking to Qemu, so it will have v->defer_shutdown intermittently set, and will skip the pause in domain_shutdown().

I presume this lack of pause is to allow the vcpu in question to still be scheduled to consume the IOREQ reply? (It's fairly opaque logic with zero clarifying details.)

What *should* happen is that, after consuming the reply, the vcpu notices the pending shutdown and pauses itself, at which point it yields to the scheduler. This is the purpose of vcpu_{start,end}_shutdown_deferral().

Evidently, this is not happening.

Marek: can you add a BUG() after the weird PIO printing? That should confirm whether we're getting into handle_pio() via the handle_hvm_io_completion() path, or via the vmexit path (in which case, we're fully re-entering the guest).

I suspect you can drop the debugging of XEN_DOMCTL_destroydomain - I think it's just noise at the moment. However, it would be very helpful to see the vcpus which fall into domain_shutdown()'s "else if ( v->defer_shutdown ) continue;" path.

> Another question though: In 4.13 the log message next to the
> domain_crash() I assume you're hitting is "Weird HVM ioemulation
> status", not "Weird PIO status", and the debugging patch you referenced
> doesn't have any change there. Andrew's recent change to master, otoh,
> doesn't use the word "weird" anymore. I can therefore only guess that
> the value logged is still hvmemul_do_pio_buffer()'s return value,
> i.e. X86EMUL_UNHANDLEABLE. Please confirm.

It's the first draft of the patch which I did, before submitting to xen-devel.

We do have X86EMUL_UNHANDLEABLE at this point, but it's not terribly helpful - there are loads of paths which fail silently with this error.

~Andrew
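The shutdown-deferral handshake discussed in this thread can be pictured with a small model. What follows is a hypothetical, single-threaded sketch: the `model_*` types and functions are invented for illustration and only echo the names of Xen's domain_shutdown() and vcpu_{start,end}_shutdown_deferral(). It omits the locking, memory barriers, shutdown finalisation and real scheduler that the hypervisor needs, so it is not Xen's implementation. It shows the intended behaviour: a vcpu waiting on an IOREQ reply is skipped by the shutdown loop, and is expected to pause itself once it ends the deferral.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only: names echo Xen's, but there are no locks,
 * barriers or scheduler here, and "paused" is just a flag. */
#define MODEL_NR_VCPUS 2

struct model_domain;

struct model_vcpu {
    struct model_domain *domain;
    bool defer_shutdown; /* set while an IOREQ is outstanding to the device model */
    bool paused;         /* paused for shutdown; would not be scheduled again */
};

struct model_domain {
    bool is_shutting_down;
    struct model_vcpu vcpu[MODEL_NR_VCPUS];
};

static void model_domain_init(struct model_domain *d)
{
    d->is_shutting_down = false;
    for (size_t i = 0; i < MODEL_NR_VCPUS; i++) {
        d->vcpu[i].domain = d;
        d->vcpu[i].defer_shutdown = false;
        d->vcpu[i].paused = false;
    }
}

/* If the domain is shutting down, pause v and drop any deferral. */
static void model_vcpu_check_shutdown(struct model_vcpu *v)
{
    if (v->domain->is_shutting_down) {
        v->paused = true;
        v->defer_shutdown = false;
    }
}

/* Pause every vcpu, except ones currently deferring shutdown, which are
 * expected to pause themselves later (the "continue" path). */
static void model_domain_shutdown(struct model_domain *d)
{
    d->is_shutting_down = true;
    for (size_t i = 0; i < MODEL_NR_VCPUS; i++) {
        struct model_vcpu *v = &d->vcpu[i];
        if (v->defer_shutdown)
            continue;
        v->paused = true;
    }
}

/* Before issuing an IOREQ. Returns false if the vcpu was paused instead,
 * meaning no I/O should be emulated for it. */
static bool model_vcpu_start_shutdown_deferral(struct model_vcpu *v)
{
    if (v->defer_shutdown)
        return true;
    v->defer_shutdown = true;
    if (v->domain->is_shutting_down)
        model_vcpu_check_shutdown(v);
    return v->defer_shutdown;
}

/* After consuming the IOREQ reply: clear the deferral and, if a shutdown
 * started meanwhile, pause this vcpu. */
static void model_vcpu_end_shutdown_deferral(struct model_vcpu *v)
{
    v->defer_shutdown = false;
    if (v->domain->is_shutting_down)
        model_vcpu_check_shutdown(v);
}
```

In this model, if vcpu 0 starts a deferral and the shutdown then arrives, vcpu 1 is paused immediately while vcpu 0 is skipped. Vcpu 0 only becomes paused when it calls model_vcpu_end_shutdown_deferral(), and any later attempt to start a new deferral returns false. The crash in the log above suggests that, in the real code, a vcpu reached I/O emulation again without taking that self-pause step.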