RE: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> -----Original Message-----
> From: Jürgen Groß <jgross@xxxxxxxx>
> Sent: 08 June 2020 10:25
> To: paul@xxxxxxx; 'Jan Beulich' <jbeulich@xxxxxxxx>; 'Marek Marczykowski-Górecki' <marmarek@xxxxxxxxxxxxxxxxxxxxxx>; 'Ian Jackson' <ian.jackson@xxxxxxxxxxxxx>
> Cc: 'Andrew Cooper' <andrew.cooper3@xxxxxxxxxx>; 'xen-devel' <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
>
> On 08.06.20 11:15, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@xxxxxxxx>
> >> Sent: 08 June 2020 09:14
> >> To: 'Marek Marczykowski-Górecki' <marmarek@xxxxxxxxxxxxxxxxxxxxxx>; paul@xxxxxxx
> >> Cc: 'Andrew Cooper' <andrew.cooper3@xxxxxxxxxx>; 'xen-devel' <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> >> Subject: Re: handle_pio looping during domain shutdown, with qemu 4.2.0 in stubdom
> >>
> >> On 05.06.2020 18:18, 'Marek Marczykowski-Górecki' wrote:
> >>> On Fri, Jun 05, 2020 at 04:39:56PM +0100, Paul Durrant wrote:
> >>>>> From: Jan Beulich <jbeulich@xxxxxxxx>
> >>>>> Sent: 05 June 2020 14:57
> >>>>>
> >>>>> On 05.06.2020 15:37, Paul Durrant wrote:
> >>>>>>> From: Jan Beulich <jbeulich@xxxxxxxx>
> >>>>>>> Sent: 05 June 2020 14:32
> >>>>>>>
> >>>>>>> On 05.06.2020 13:05, Paul Durrant wrote:
> >>>>>>>> That would mean we wouldn't be seeing the "Unexpected PIO" message.
> >>>>>>>> From that message this is clearly X86EMUL_UNHANDLEABLE, which
> >>>>>>>> suggests a race with ioreq server teardown, possibly due to
> >>>>>>>> selecting a server but then not finding a vcpu match in
> >>>>>>>> ioreq_vcpu_list.
> >>>>>>>
> >>>>>>> I was suspecting such, but at least the tearing down of all servers
> >>>>>>> happens only from relinquish-resources, which gets started only
> >>>>>>> after ->is_shut_down got set (unless the tool stack invoked
> >>>>>>> XEN_DOMCTL_destroydomain without having observed XEN_DOMINF_shutdown
> >>>>>>> set for the domain).
> >>>>>>>
> >>>>>>> For individually unregistered servers - yes, if qemu did so, this
> >>>>>>> would be a problem. They need to remain registered until all vCPU-s
> >>>>>>> in the domain got paused.
> >>>>>>
> >>>>>> It shouldn't be a problem, should it? Destroying an individual server
> >>>>>> is only done with the domain paused, so no vcpus can be running at
> >>>>>> the time.
> >>>>>
> >>>>> Consider the case of one getting destroyed after it has already
> >>>>> returned data, but the originating vCPU didn't consume that data
> >>>>> yet. Once that vCPU gets unpaused, handle_hvm_io_completion()
> >>>>> won't find the matching server anymore, and hence the chain
> >>>>> hvm_wait_for_io() -> hvm_io_assist() ->
> >>>>> vcpu_end_shutdown_deferral() would be skipped. handle_pio()
> >>>>> would then still correctly consume the result.
> >>>>
> >>>> True, and skipping hvm_io_assist() means the vcpu internal ioreq state
> >>>> will be left set to IOREQ_READY, and *that* explains why we would then
> >>>> exit hvmemul_do_io() with X86EMUL_UNHANDLEABLE (from the first switch).
> >>>
> >>> I can confirm X86EMUL_UNHANDLEABLE indeed comes from the first switch in
> >>> hvmemul_do_io(). And it happens shortly after the ioreq server is destroyed:
> >>>
> >>> (XEN) d12v0 XEN_DMOP_remote_shutdown domain 11 reason 0
> >>> (XEN) d12v0 domain 11 domain_shutdown vcpu_id 0 defer_shutdown 1
> >>> (XEN) d12v0 XEN_DMOP_remote_shutdown domain 11 done
> >>> (XEN) d12v0 hvm_destroy_ioreq_server called for 11, id 0
> >>
> >> Can either of you tell why this is? As said before, qemu shouldn't
> >> start tearing down ioreq servers until the domain has made it out
> >> of all shutdown deferrals, and all its vCPU-s have been paused.
> >> For the moment I think the proposed changes, while necessary, will
> >> mask another issue elsewhere. The @releaseDomain xenstore watch,
> >> being the trigger I would consider relevant here, will trigger
> >> only once XEN_DOMINF_shutdown is reported set for a domain, which
> >> gets derived from d->is_shut_down (i.e. not mistakenly
> >> d->is_shutting_down).
> >
> > I can't find anything that actually calls xendevicemodel_shutdown(). It was
> > added by:
>
> destroy_hvm_domain() in qemu does.
>

Ah ok, thanks. So it looks like this should only normally be called when the
guest has written to the PIIX to request shutdown. Presumably the
hvm_destroy_ioreq_server call we see afterwards is QEMU then exiting. There is
one other circumstance when destroy_hvm_domain() would be called and that is
if the ioreq state is not STATE_IOREQ_INPROCESS... in which case there should
be an accompanying error message in the qemu log.

  Paul

>
> Juergen
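For illustration, here is a minimal stand-alone model of the race described in
the thread above. It is a sketch, not the actual Xen code: the names below
(vcpu_model, io_completion(), do_io() and the enum values) are invented for
this example. It just shows the shape of the failure: the ioreq server
disappears before the vCPU consumes its response, so the completion step that
would normally reset the vCPU-internal ioreq state is skipped, and the next
emulation attempt reports an unhandleable result, which is what handle_pio()
then spins on.

/*
 * Minimal stand-alone model of the race discussed in this thread -- NOT the
 * real Xen code. All names are invented for illustration.
 */
#include <stdbool.h>
#include <stdio.h>

enum ioreq_state { IOREQ_STATE_NONE, IOREQ_STATE_READY, IORESP_STATE_READY };
enum emul_rc { EMUL_OKAY, EMUL_UNHANDLEABLE };

struct vcpu_model {
    enum ioreq_state state;     /* vCPU-internal ioreq state */
    bool server_registered;     /* is the ioreq server still around? */
};

/* Analogue of the completion path run when the vCPU is unpaused. */
static void io_completion(struct vcpu_model *v)
{
    if ( !v->server_registered )
        return;                      /* no matching server: "io assist" skipped */
    v->state = IOREQ_STATE_NONE;     /* normal path: consume response, reset */
}

/* Analogue of the first state check in the emulation path. */
static enum emul_rc do_io(const struct vcpu_model *v)
{
    switch ( v->state )
    {
    case IOREQ_STATE_NONE:
    case IORESP_STATE_READY:
        return EMUL_OKAY;            /* can make progress */
    default:
        return EMUL_UNHANDLEABLE;    /* stale IOREQ_STATE_READY: give up */
    }
}

int main(void)
{
    /* A vCPU with an I/O outstanding; its internal state is "ready". */
    struct vcpu_model v = { .state = IOREQ_STATE_READY, .server_registered = true };

    /* The ioreq server is destroyed before the vCPU is unpaused ... */
    v.server_registered = false;

    /* ... so completion finds no server and the state is never reset ... */
    io_completion(&v);

    /* ... and the next emulation attempt fails, matching the symptom above. */
    if ( do_io(&v) == EMUL_UNHANDLEABLE )
        printf("emulation fails -> \"Unexpected PIO\", handle_pio() loops\n");

    return 0;
}

Under that reading, the ordering constraint Jan states above is the key
invariant: individual ioreq servers need to stay registered until every vCPU
in the domain has been paused and any pending response consumed, otherwise the
completion/reset step can be skipped exactly as modelled here.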