RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> -----Original Message-----
> From: Paul Durrant <xadimgnik@xxxxxxxxx>
> Sent: 05 June 2020 14:49
> To: 'Jan Beulich' <jbeulich@xxxxxxxx>
> Cc: 'Marek Marczykowski-Górecki' <marmarek@xxxxxxxxxxxxxxxxxxxxxx>; 'Andrew Cooper'
> <andrew.cooper3@xxxxxxxxxx>; 'xen-devel' <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> Subject: RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
>
> > -----Original Message-----
> > From: Jan Beulich <jbeulich@xxxxxxxx>
> > Sent: 05 June 2020 14:47
> > To: paul@xxxxxxx
> > Cc: 'Marek Marczykowski-Górecki' <marmarek@xxxxxxxxxxxxxxxxxxxxxx>; 'Andrew Cooper'
> > <andrew.cooper3@xxxxxxxxxx>; 'xen-devel' <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> > Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> >
> > On 05.06.2020 15:43, Paul Durrant wrote:
> > >> -----Original Message-----
> > >> From: Jan Beulich <jbeulich@xxxxxxxx>
> > >> Sent: 05 June 2020 14:37
> > >> To: paul@xxxxxxx
> > >> Cc: 'Marek Marczykowski-Górecki' <marmarek@xxxxxxxxxxxxxxxxxxxxxx>; 'Andrew Cooper'
> > >> <andrew.cooper3@xxxxxxxxxx>; 'xen-devel' <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> > >> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> > >>
> > >> On 05.06.2020 13:25, Paul Durrant wrote:
> > >>>> -----Original Message-----
> > >>>> From: Paul Durrant <xadimgnik@xxxxxxxxx>
> > >>>> Sent: 05 June 2020 12:06
> > >>>> To: 'Jan Beulich' <jbeulich@xxxxxxxx>; 'Marek Marczykowski-Górecki'
> > >>>> <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> > >>>> Cc: 'Andrew Cooper' <andrew.cooper3@xxxxxxxxxx>; 'xen-devel'
> > >>>> <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> > >>>> Subject: RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> > >>>>
> > >>>> Sorry, only just catching up with this...
> > >>>>
> > >>>>> -----Original Message-----
> > >>>>> From: Jan Beulich <jbeulich@xxxxxxxx>
> > >>>>> Sent: 05 June 2020 10:09
> > >>>>> To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> > >>>>> Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>; xen-devel
> > >>>>> <xen-devel@xxxxxxxxxxxxxxxxxxxx>; Paul Durrant <paul@xxxxxxx>
> > >>>>> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> > >>>>>
> > >>>>> On 04.06.2020 16:25, Marek Marczykowski-Górecki wrote:
> > >>>>>> On Thu, Jun 04, 2020 at 02:36:26PM +0200, Jan Beulich wrote:
> > >>>>>>> On 04.06.2020 13:13, Andrew Cooper wrote:
> > >>>>>>>> On 04/06/2020 08:08, Jan Beulich wrote:
> > >>>>>>>>> On 04.06.2020 03:46, Marek Marczykowski-Górecki wrote:
> > >>>>>>>>>> Then, we get the main issue:
> > >>>>>>>>>>
> > >>>>>>>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > >>>>>>>>>> (XEN) d3v0 Weird PIO status 1, port 0xb004 read 0xffff
> > >>>>>>>>>> (XEN) domain_crash called from io.c:178
> > >>>>>>>>>>
> > >>>>>>>>>> Note, there was no XEN_DOMCTL_destroydomain for domain 3 nor its stubdom
> > >>>>>>>>>> yet. But XEN_DMOP_remote_shutdown for domain 3 was called already.
> > >>>>>>>>> I'd guess an issue with the shutdown deferral logic. Did you / can
> > >>>>>>>>> you check whether XEN_DMOP_remote_shutdown managed to pause all
> > >>>>>>>>> CPUs (I assume it didn't, since once they're paused there shouldn't
> > >>>>>>>>> be any I/O there anymore, and hence no I/O emulation)?
> > >>>>>>>>
> > >>>>>>>> The vcpu in question is talking to Qemu, so will have v->defer_shutdown
> > >>>>>>>> intermittently set, and skip the pause in domain_shutdown().
> > >>>>>>>>
> > >>>>>>>> I presume this lack of pause is to allow the vcpu in question to still
> > >>>>>>>> be scheduled to consume the IOREQ reply? (It's fairly opaque logic with
> > >>>>>>>> 0 clarifying details.)
> > >>>>>>>>
> > >>>>>>>> What *should* happen is that, after consuming the reply, the vcpu should
> > >>>>>>>> notice and pause itself, at which point it would yield to the
> > >>>>>>>> scheduler. This is the purpose of vcpu_{start,end}_shutdown_deferral().
> > >>>>>>>>
> > >>>>>>>> Evidently, this is not happening.
> > >>>>>>>
> > >>>>>>> We can't tell yet, until ...
> > >>>>>>>
> > >>>>>>>> Marek: can you add a BUG() after the weird PIO printing? That should
> > >>>>>>>> confirm whether we're getting into handle_pio() via the
> > >>>>>>>> handle_hvm_io_completion() path, or via the vmexit path (in which case
> > >>>>>>>> we're fully re-entering the guest).
> > >>>>>>>
> > >>>>>>> ... we know this. handle_pio() gets called from handle_hvm_io_completion()
> > >>>>>>> after having called hvm_wait_for_io() -> hvm_io_assist() ->
> > >>>>>>> vcpu_end_shutdown_deferral(), so the issue may be that we shouldn't call
> > >>>>>>> handle_pio() (etc) at all anymore in this state. IOW perhaps
> > >>>>>>> hvm_wait_for_io() should return "!sv->vcpu->domain->is_shutting_down"
> > >>>>>>> instead of plain "true"?
> > >>>>>>>
> > >>>>>>> Adding Paul to Cc, as being the maintainer here.
> > >>>>>>
> > >>>>>> Got it, by sticking BUG() just before that domain_crash() in
> > >>>>>> handle_pio(). And also vcpu 0 of both HVM domains does have
> > >>>>>> v->defer_shutdown set.
> > >>>>>
> > >>>>> As per the log they did get it set. I'd be curious about the flag's
> > >>>>> value (as well as v->paused_for_shutdown's) at the point of the
> > >>>>> problematic handle_pio() invocation (see below). It may be
> > >>>>> worthwhile to instrument vcpu_check_shutdown() (inside its if()) -
> > >>>>> before exiting to guest context (in order to then come back and
> > >>>>> call handle_pio()) the vCPU ought to be getting through there.
> > >>>>> No indication of it doing so would be a sign that there's a
> > >>>>> code path bypassing the call to vcpu_end_shutdown_deferral().
> > >>>>>
> > >>>>>> (XEN) hvm.c:1620:d6v0 All CPUs offline -- powering off.
> > >>>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > >>>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > >>>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x0001
> > >>>>>> (XEN) d3v0 handle_pio port 0xb004 write 0x2001
> > >>>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 reason 0
> > >>>>>> (XEN) d4v0 domain 3 domain_shutdown vcpu_id 0 defer_shutdown 1
> > >>>>>> (XEN) d4v0 XEN_DMOP_remote_shutdown domain 3 done
> > >>>>>> (XEN) hvm.c:1620:d5v0 All CPUs offline -- powering off.
> > >>>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> > >>>>>> (XEN) d1v0 handle_pio port 0xb004 read 0x0000
> > >>>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x0001
> > >>>>>> (XEN) d1v0 handle_pio port 0xb004 write 0x2001
> > >>>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 reason 0
> > >>>>>> (XEN) d2v0 domain 1 domain_shutdown vcpu_id 0 defer_shutdown 1
> > >>>>>> (XEN) d2v0 XEN_DMOP_remote_shutdown domain 1 done
> > >>>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x3 ref 0x11d flags 0x2 d6
> > >>>>>> (XEN) grant_table.c:3702:d0v0 Grant release 0x4 ref 0x11e flags 0x2 d6
> > >>>>>> (XEN) d3v0 handle_pio port 0xb004 read 0x0000
> > >>>>>
> > >>>>> Perhaps in this message could you also log
> > >>>>> v->domain->is_shutting_down, v->defer_shutdown, and
> > >>>>> v->paused_for_shutdown? (Would be nice if, after having made
> > >>>>> changes to your debugging patch, you could point again at the
> > >>>>> precise version you've used for the log provided.)
> > >>>>>
> > >>>>>> (XEN) d3v0 Unexpected PIO status 1, port 0xb004 read 0xffff
> > >>>>>> (XEN) Xen BUG at io.c:178
> > >>>>>
> > >>>>> Btw, instead of BUG(), WARN() or dump_execution_state() would
> > >>>>> likely also do, keeping Xen alive.
> > >>>>>
> > >>>>
> > >>>> A shutdown deferral problem would result in X86EMUL_RETRY, wouldn't it?
> > >>>> That would mean we wouldn't be seeing the "Unexpected PIO" message.
> > >>>> From that message this is clearly X86EMUL_UNHANDLEABLE, which suggests
> > >>>> a race with ioreq server teardown, possibly due to selecting a server
> > >>>> but then not finding a vcpu match in ioreq_vcpu_list.
> > >>>
> > >>> Actually, looking at remote_shutdown... the test of ( reason ==
> > >>> SHUTDOWN_crash ) and then clearing defer_shutdown looks a bit odd...
> > >>> Just because the domain shutdown code has been set that way doesn't
> > >>> mean that a vcpu is not deferred in emulation; SCHEDOP_shutdown_code
> > >>> could easily be called from one vcpu whilst another has emulation
> > >>> pending.
> > >>
> > >> I'm confused: the deferral is of shutting down the domain, not of
> > >> a specific instance of emulation.
> > >
> > > Now I'm confused... defer_shutdown is per-vcpu.
> >
> > Right - each vCPU can individually defer shutting down of the domain
> > as a whole.
> >
>
> Ok, I think we're only going to make more progress if we know exactly
> where the X86EMUL_UNHANDLEABLE is coming from.
>

This (untested) patch might help:

diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index c55c4bc4bc..8aa8779ba2 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -109,12 +109,7 @@ static void hvm_io_assist(struct hvm_ioreq_vcpu *sv, uint64_t data)
     ioreq_t *ioreq = &v->arch.hvm.hvm_io.io_req;
 
     if ( hvm_ioreq_needs_completion(ioreq) )
-    {
-        ioreq->state = STATE_IORESP_READY;
         ioreq->data = data;
-    }
-    else
-        ioreq->state = STATE_IOREQ_NONE;
 
     msix_write_completion(v);
     vcpu_end_shutdown_deferral(v);
@@ -209,6 +204,9 @@ bool handle_hvm_io_completion(struct vcpu *v)
         }
     }
 
+    ioreq->state = hvm_ioreq_needs_completion(&vio->ioreq) ?
+        STATE_IORESP_READY : STATE_IOREQ_NONE;
+
     io_completion = vio->io_completion;
     vio->io_completion = HVMIO_no_completion;
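
For context, the shutdown-deferral handshake Andrew describes above works
roughly as follows. This is a paraphrased sketch, not the verbatim source:
the names follow xen/common/domain.c, but the bodies and return types are
simplified. An emulation path brackets its wait for qemu with these two
calls.

    bool vcpu_start_shutdown_deferral(struct vcpu *v)
    {
        if ( v->defer_shutdown )
            return true;

        v->defer_shutdown = 1;
        smp_mb();                   /* set flag /then/ check for shutdown */
        if ( unlikely(v->domain->is_shutting_down) )
            vcpu_check_shutdown(v); /* shutdown already begun: pause here */

        return v->defer_shutdown;
    }

    void vcpu_end_shutdown_deferral(struct vcpu *v)
    {
        v->defer_shutdown = 0;
        smp_mb();                   /* clear flag /then/ check for shutdown */
        if ( unlikely(v->domain->is_shutting_down) )
            vcpu_check_shutdown(v); /* domain shut down meanwhile: pause now */
    }

If a vCPU gets back to work after vcpu_end_shutdown_deferral() without
passing through vcpu_check_shutdown(), it can re-enter handle_pio() against
an ioreq server that is being torn down, which is the window under
discussion.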
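The extra logging Jan asks for could be folded into the existing debug
printk in Marek's patch along these lines. This is a hypothetical sketch:
it assumes the usual "struct vcpu *curr = current;" local and the "port"
parameter available in handle_pio(), and the exact placement would depend
on the debug patch actually in use.

    /* Log the three flags Jan asks about alongside the port access. */
    printk(XENLOG_G_WARNING
           "%pv handle_pio port %#x: is_shutting_down %d defer_shutdown %d paused_for_shutdown %d\n",
           curr, port, curr->domain->is_shutting_down,
           curr->defer_shutdown, curr->paused_for_shutdown);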
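For comparison, the alternative Jan floated earlier in the thread - having
hvm_wait_for_io() stop reporting success once the domain has begun shutting
down, so handle_hvm_io_completion() never reaches handle_pio() in that
state - would be a one-line change of this shape. Untested, and the
surrounding context lines are reconstructed for illustration rather than
quoted from the tree:

    --- a/xen/arch/x86/hvm/ioreq.c
    +++ b/xen/arch/x86/hvm/ioreq.c
    @@ static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
             }
         }
     
    -    return true;
    +    /* Don't carry on with I/O completion work for a dying domain. */
    +    return !sv->vcpu->domain->is_shutting_down;
     }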