Xen project Mailing List

RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom

To: "'Jan Beulich'" <jbeulich@xxxxxxxx>, 'Marek Marczykowski-Górecki' <marmarek@xxxxxxxxxxxxxxxxxxxxxx>

From: Paul Durrant <xadimgnik@xxxxxxxxx>

Date: Fri, 5 Jun 2020 16:39:56 +0100

Cc: 'Andrew Cooper' <andrew.cooper3@xxxxxxxxxx>, 'xen-devel' <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Fri, 05 Jun 2020 15:40:10 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Thread-index: AQGjLdCXHS+ot6864srdbAQDyQsvoACJ30jDAXuEodcCsFgQ5AHivRJwAUZxxQAClbGaTgE+0ENsAeL9TS8CQ5Vrh6ixcARw

> -----Original Message----- > From: Jan Beulich <jbeulich@xxxxxxxx> > Sent: 05 June 2020 14:57 > To: paul@xxxxxxx; 'Marek Marczykowski-Górecki' > <marmarek@xxxxxxxxxxxxxxxxxxxxxx> > Cc: 'Andrew Cooper' <andrew.cooper3@xxxxxxxxxx>; 'xen-devel' > <xen-devel@xxxxxxxxxxxxxxxxxxxx> > Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in > stubdom > > On 05.06.2020 15:37, Paul Durrant wrote: > >> -----Original Message----- > >> From: Jan Beulich <jbeulich@xxxxxxxx> > >> Sent: 05 June 2020 14:32 > >> To: paul@xxxxxxx > >> Cc: 'Marek Marczykowski-Górecki' <marmarek@xxxxxxxxxxxxxxxxxxxxxx>; > >> 'Andrew Cooper' > >> <andrew.cooper3@xxxxxxxxxx>; 'xen-devel' <xen-devel@xxxxxxxxxxxxxxxxxxxx> > >> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in > >> stubdom > >> > >> On 05.06.2020 13:05, Paul Durrant wrote: > >>> Sorry, only just catching up with this... > >>> > >>>> -----Original Message----- > >>>> From: Jan Beulich <jbeulich@xxxxxxxx> > >>>> Sent: 05 June 2020 10:09 > >>>> To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx> > >>>> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>; xen-devel > >>>> <xen-devel@xxxxxxxxxxxxxxxxxxxx>; Paul > >>>> Durrant <paul@xxxxxxx> > >>>> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 > >>>> in stubdom > >>>> > >>>> On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote: > >>>>> On Thu, Jun 04, 2020 at 02:36:26PM +0200, Jan Beulich wrote: > >>>>>> On 04.06.2020 13:13, Andrew Cooper wrote: > >>>>>>> On 04/06/2020 08:08, Jan Beulich wrote: > >>>>>>>> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote: > >>>>>>>>> Then, we get the main issue: > >>>>>>>>> > >>>>>>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000 > >>>>>>>>> (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff > >>>>>>>>> (XEN) domain_crash called from io.c:178 > >>>>>>>>> > >>>>>>>>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its > >>>>>>>>> stubdom > >>>>>>>>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already. > >>>>>>>> I'd guess an issue with the shutdown deferral logic. Did you / can > >>>>>>>> you check whether XEN_DMOP_remote_shutdown managed to pause all > >>>>>>>> CPUs (I assume it didn't, since once they're paused there shouldn't > >>>>>>>> be any I/O there anymore, and hence no I/O emulation)? > >>>>>>> > >>>>>>> The vcpu in question is talking to Qemu, so will have > >>>>>>> v->defer_shutdown > >>>>>>> intermittently set, and skip the pause in domain_shutdown() > >>>>>>> > >>>>>>> I presume this lack of pause is to allow the vcpu in question to still > >>>>>>> be scheduled to consume the IOREQ reply? (Its fairly opaque logic > >>>>>>> with > >>>>>>> 0 clarifying details). > >>>>>>> > >>>>>>> What *should* happen is that, after consuming the reply, the vcpu > >>>>>>> should > >>>>>>> notice and pause itself, at which point it would yield to the > >>>>>>> scheduler. This is the purpose of > >>>>>>> vcpu_{start,end}_shutdown_deferral(). > >>>>>>> > >>>>>>> Evidentially, this is not happening. > >>>>>> > >>>>>> We can't tell yet, until ... > >>>>>> > >>>>>>> Marek: can you add a BUG() after the weird PIO printing? That should > >>>>>>> confirm whether we're getting into handle_pio() via the > >>>>>>> handle_hvm_io_completion() path, or via the vmexit path (at which > >>>>>>> case, > >>>>>>> we're fully re-entering the guest). > >>>>>> > >>>>>> ... we know this. handle_pio() gets called from > >>>>>> handle_hvm_io_completion() > >>>>>> after having called hvm_wait_for_io() -> hvm_io_assist() -> > >>>>>> vcpu_end_shutdown_deferral(), so the issue may be that we shouldn't > >>>>>> call > >>>>>> handle_pio() (etc) at all anymore in this state. IOW perhaps > >>>>>> hvm_wait_for_io() should return "!sv->vcpu->domain->is_shutting_down" > >>>>>> instead of plain "true"? > >>>>>> > >>>>>> Adding Paul to Cc, as being the maintainer here. > >>>>> > >>>>> Got it, by sticking BUG() just before that domain_crash() in > >>>>> handle_pio(). And also vcpu 0 of both HVM domains do have > >>>>> v->defer_shutdown. > >>>> > >>>> As per the log they did get it set. I'd be curious of the flag's > >>>> value (as well as v->paused_for_shutdown's) at the point of the > >>>> problematic handle_pio() invocation (see below). It may be > >>>> worthwhile to instrument vcpu_check_shutdown() (inside its if()) > >>>> - before exiting to guest context (in order to then come back > >>>> and call handle_pio()) the vCPU ought to be getting through > >>>> there. No indication of it doing so would be a sign that there's > >>>> a code path bypassing the call to vcpu_end_shutdown_deferral(). > >>>> > >>>>> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off. > >>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000 > >>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000 > >>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x0001 > >>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x2001 > >>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0 > >>>>> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1 > >>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done > >>>>> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off. > >>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000 > >>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000 > >>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x0001 > >>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x2001 > >>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0 > >>>>> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1 > >>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done > >>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6 > >>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6 > >>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000 > >>>> > >>>> Perhaps in this message could you also log > >>>> v->domain->is_shutting_down, v->defer_shutdown, and > >>>> v->paused_for_shutdown? (Would be nice if, after having made > >>>> changes to your debugging patch, you could point again at the > >>>> precise version you've used for the log provided.) > >>>> > >>>>> (XEN) d3v0 Unexpected PIO status 1, port 0xb004 read 0xffff > >>>>> (XEN) Xen BUG at io.c:178 > >>>> > >>>> Btw, instead of BUG(), WARN() or dump_execution_state() would > >>>> likely also do, keeping Xen alive. > >>>> > >>> > >>> A shutdown deferral problem would result in X86EMUL_RETRY wouldn't it? > >> > >> Where would this originate? > > > > I was referring to the 'if ( unlikely(!vcpu_start_shutdown_deferral(curr)) > > )' at the top of > hvm_send_ioreq(). > > Ah yes. But this is just one way of things possibly going wrong. Plus > the function will return true when ->defer_shutdown is already or > (wrongly) still set. > > >>> That would mean we wouldn't be seeing the "Unexpected PIO" message. From > >>> that message this clearly > >> X86EMUL_UNHANDLEABLE which suggests a race with ioreq server teardown, > >> possibly due to selecting a > >> server but then not finding a vcpu match in ioreq_vcpu_list. > >> > >> I was suspecting such, but at least the tearing down of all servers > >> happens only from relinquish-resources, which gets started only > >> after ->is_shut_down got set (unless the tool stack invoked > >> XEN_DOMCTL_destroydomain without having observed XEN_DOMINF_shutdown > >> set for the domain). > >> > >> For individually unregistered servers - yes, if qemu did so, this > >> would be a problem. They need to remain registered until all vCPU-s > >> in the domain got paused. > > > > It shouldn't be a problem should it? Destroying an individual server is > > only done with the domain > paused, so no vcpus can be running at the time. > > Consider the case of one getting destroyed after it has already > returned data, but the originating vCPU didn't consume that data > yet. Once that vCPU gets unpaused, handle_hvm_io_completion() > won't find the matching server anymore, and hence the chain > hvm_wait_for_io() -> hvm_io_assist() -> > vcpu_end_shutdown_deferral() would be skipped. handle_pio() > would then still correctly consume the result. True, and skipping hvm_io_assist() means the vcpu internal ioreq state will be left set to IOREQ_READY and *that* explains why we would then exit hvmemul_do_io() with X86EMUL_UNHANDLEABLE (from the first switch). > > Marek - to verify this doesn't happen (sorry, my qemu knowledge > is rather limited, and hence I don't know whether this can > happen at all), could you also log hvm_destroy_ioreq_server() > invocations? > > Jan

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.