Xen Project Mailing List

Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time

To: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>, Wen Congyang <wency@xxxxxxxxxxxxxx>, Yang Hongyang <yanghy@xxxxxxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxx>

From: Paul Durrant <Paul.Durrant@xxxxxxxxxx>

Date: Wed, 10 Jun 2015 10:35:02 +0000

Accept-language: en-GB, en-US

Cc: Wei Liu <wei.liu2@xxxxxxxxxx>, Ian Campbell <Ian.Campbell@xxxxxxxxxx>, "guijianfeng@xxxxxxxxxxxxxx" <guijianfeng@xxxxxxxxxxxxxx>, "yunhong.jiang@xxxxxxxxx" <yunhong.jiang@xxxxxxxxx>, Eddie Dong <eddie.dong@xxxxxxxxx>, "rshriram@xxxxxxxxx" <rshriram@xxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxx>

Delivery-date: Wed, 10 Jun 2015 10:35:08 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

Thread-index: AQHQodA50QXfGUHZAUa1cH58uzZqCJ2iPkSAgAAEsICAAPcbgIAAbRGAgAFvmICAACa+gIAAFsaAgAARRoCAACjF4A==

Thread-topic: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time

> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx]
> Sent: 10 June 2015 11:08
> To: Wen Congyang; Yang Hongyang; xen-devel@xxxxxxxxxxxxx; Paul Durrant
> Cc: Wei Liu; Ian Campbell; yunhong.jiang@xxxxxxxxx; Eddie Dong;
> guijianfeng@xxxxxxxxxxxxxx; rshriram@xxxxxxxxx; Ian Jackson
> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
> page only one time
>
> On 10/06/15 10:06, Wen Congyang wrote:
> > Cc: Paul Durrant
> >
> > On 06/10/2015 03:44 PM, Andrew Cooper wrote:
> >> On 10/06/2015 06:26, Yang Hongyang wrote:
> >>>
> >>> On 06/09/2015 03:30 PM, Andrew Cooper wrote:
> >>>> On 09/06/2015 01:59, Yang Hongyang wrote:
> >>>>>
> >>>>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
> >>>>>> On 08/06/15 10:58, Yang Hongyang wrote:
> >>>>>>>
> >>>>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
> >>>>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
> >>>>>>>>> ioreq page contains evtchn which will be set when we resume the
> >>>>>>>>> secondary vm the first time. The hypervisor will check if the
> >>>>>>>>> evtchn is corrupted, so we cannot zero the ioreq page more
> >>>>>>>>> than one time.
> >>>>>>>>>
> >>>>>>>>> The ioreq->state is always STATE_IOREQ_NONE after the vm is
> >>>>>>>>> suspended, so it is OK if we only zero it one time.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>
> >>>>>>>>> Signed-off-by: Wen congyang <wency@xxxxxxxxxxxxxx>
> >>>>>>>>> CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> >>>>>>>> The issue here is that we are running the restore algorithm over a
> >>>>>>>> domain which has already been running in Xen for a while. This is a
> >>>>>>>> brand new usecase, as far as I am aware.
> >>>>>>> Exactly.
> >>>>>>>
> >>>>>>>> Does the qemu process associated with this domain get frozen while the
> >>>>>>>> secondary is being reset, or does the process get destroyed and
> >>>>>>>> recreated.
> >>>>>>> What do you mean by reset? do you mean secondary is suspended at
> >>>>>>> checkpoint?
> >>>>>> Well - at the point that the buffered records are being processed, we
> >>>>>> are in the process of resetting the state of the secondary to match the
> >>>>>> primary.
> >>>>> Yes, at this point, the qemu process associated with this domain is frozen.
> >>>>> the suspend callback will call libxl__qmp_stop(vm_stop() in qemu) to pause
> >>>>> qemu. After we processed all records, qemu will be restored with the received
> >>>>> state, that's why we add a libxl__qmp_restore(qemu_load_vmstate() in qemu)
> >>>>> api to restore qemu with received state. Currently in libxl, qemu only start
> >>>>> with the received state, there's no api to load received state while qemu is
> >>>>> running for a while.
> >>>> Now I consider this more, it is absolutely wrong to not zero the page
> >>>> here. The event channel in the page is not guaranteed to be the same
> >>>> between the primary and secondary,
> >>> That's why we don't zero it on secondary.
> >> I think you missed my point. Apologies for the double negative. It
> >> must, under all circumstances, be zeroed at this point, for safety reasons.
> >>
> >> The page in question is subject to logdirty just like any other guest
> >> pages, which means that if the guest writes to it naturally (i.e. not a
> >> Xen or Qemu write, both of whom have magic mappings which are not
> >> subject to logdirty), it will be transmitted in the stream. As the
> >> event channel could be different, the lack of zeroing it at this point
> >> means that the event channel would be wrong as opposed to simply
> >> missing. This is a worse position to be in.
> > The guest should not access this page. I am not sure if the guest can
> > access the ioreq page.
>
> "should not" and "can't" are two very different things. We have had
> XSAs covering the fact that the guest can write to these pages in the past.
>
> In practice, a guest can't actually query the appropriate hvmparam, but
> it can rely on the fact that the domain builder is incredibly
> predictable in this regard.
>
> >
> > But in the exceptional case, the ioreq page is dirtied, and is copied to
> > the secondary vm. The ioreq page will contain a wrong event channel, the
> > hypervisor will check it: if the event channel is wrong, the guest will
> > be crashed.
>
> This is my point. It is completely legitimate for the event channels to
> be different between the primary and secondary, which means that we
> should be capable of dealing cleanly with the fallout when the bufioreq
> page does appear as dirty update.
>
> >
> >>>> and we don't want to unexpectedly
> >>>> find a pending/in-flight ioreq.
> >>> ioreq->state is always STATE_IOREQ_NONE after the vm is suspended, there
> >>> should be no pending/in-flight ioreq at checkpoint.
> >> In the common case perhaps, but we must consider the exceptional case.
> >> The exceptional case here is some corruption which happens to appear as
> >> an in-flight ioreq.
> > If the state is STATE_IOREQ_NONE, it may be hypervisor's bug. If the hypervisor
> > has a bug, anything can happen. I think we should trust the hypervisor.
>
> In the worst case, the contents of the pages can be completely
> arbitrary. Zeroing of the pages is to cover the case where there is
> junk present, so Xen doesn't crash the guest due to a bad ioreq state.
>
> I think Xen's behaviour is legitimate here. If it observes wonky ioreq
> state, all bets are off.
>
> >
> >>>> Either qemu needs to take care of re-initialising the event channels
> >>>> back to appropriate values, or Xen should tolerate the channels
> >>>> disappearing.
> >> I still stand by this statement. I believe it is the only safe way of
> >> solving the issue you have discovered.
> > Add a new qemu monitor command to update ioreq page?
>
> Who/what actually complains about the event channel? I can't see any
> event channels in the ABI for the pages.
>

QEMU only samples the event channels from the shared ioreq page on startup as it does not expect them to change in its lifetime.

Paul

> ~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.