Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time
> -----Original Message-----
> From: Wen Congyang [mailto:wency@xxxxxxxxxxxxxx]
> Sent: 11 June 2015 02:14
> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@xxxxxxxxxxxxx
> Cc: Wei Liu; Ian Campbell; guijianfeng@xxxxxxxxxxxxxx;
> yunhong.jiang@xxxxxxxxx; Eddie Dong; rshriram@xxxxxxxxx; Ian Jackson
> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
> page only one time
>
> On 06/10/2015 07:47 PM, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: xen-devel-bounces@xxxxxxxxxxxxx [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of Wen Congyang
> >> Sent: 10 June 2015 12:38
> >> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@xxxxxxxxxxxxx
> >> Cc: Wei Liu; Ian Campbell; guijianfeng@xxxxxxxxxxxxxx;
> >> yunhong.jiang@xxxxxxxxx; Eddie Dong; rshriram@xxxxxxxxx; Ian Jackson
> >> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
> >> page only one time
> >>
> >> On 06/10/2015 06:58 PM, Paul Durrant wrote:
> >>>> -----Original Message-----
> >>>> From: Wen Congyang [mailto:wency@xxxxxxxxxxxxxx]
> >>>> Sent: 10 June 2015 11:55
> >>>> To: Paul Durrant; Andrew Cooper; Yang Hongyang; xen-devel@xxxxxxxxxxxxx
> >>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@xxxxxxxxx; Eddie Dong;
> >>>> guijianfeng@xxxxxxxxxxxxxx; rshriram@xxxxxxxxx; Ian Jackson
> >>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
> >>>> page only one time
> >>>>
> >>>> On 06/10/2015 06:40 PM, Paul Durrant wrote:
> >>>>>> -----Original Message-----
> >>>>>> From: Wen Congyang [mailto:wency@xxxxxxxxxxxxxx]
> >>>>>> Sent: 10 June 2015 10:06
> >>>>>> To: Andrew Cooper; Yang Hongyang; xen-devel@xxxxxxxxxxxxx; Paul Durrant
> >>>>>> Cc: Wei Liu; Ian Campbell; yunhong.jiang@xxxxxxxxx; Eddie Dong;
> >>>>>> guijianfeng@xxxxxxxxxxxxxx; rshriram@xxxxxxxxx; Ian Jackson
> >>>>>> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
> >>>>>> page only one time
> >>>>>>
> >>>>>> Cc: Paul Durrant
> >>>>>>
> >>>>>> On 06/10/2015 03:44 PM, Andrew Cooper wrote:
> >>>>>>> On 10/06/2015 06:26, Yang Hongyang wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 06/09/2015 03:30 PM, Andrew Cooper wrote:
> >>>>>>>>> On 09/06/2015 01:59, Yang Hongyang wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
> >>>>>>>>>>> On 08/06/15 10:58, Yang Hongyang wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
> >>>>>>>>>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
> >>>>>>>>>>>>>> ioreq page contains evtchn which will be set when we resume the
> >>>>>>>>>>>>>> secondary vm the first time. The hypervisor will check if the
> >>>>>>>>>>>>>> evtchn is corrupted, so we cannot zero the ioreq page more
> >>>>>>>>>>>>>> than one time.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The ioreq->state is always STATE_IOREQ_NONE after the vm is
> >>>>>>>>>>>>>> suspended, so it is OK if we only zero it one time.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Signed-off-by: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>
> >>>>>>>>>>>>>> Signed-off-by: Wen congyang <wency@xxxxxxxxxxxxxx>
> >>>>>>>>>>>>>> CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The issue here is that we are running the restore algorithm over a
> >>>>>>>>>>>>> domain which has already been running in Xen for a while. This is a
> >>>>>>>>>>>>> brand new usecase, as far as I am aware.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Exactly.
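For reference, the page being argued about is the shared ioreq page, whose
per-vcpu slots carry both the request state and the event channel port used to
notify the device model. The layout below is an abridged sketch based on the
public header xen/include/public/hvm/ioreq.h; field names and padding may
differ slightly between Xen versions, so treat the real header as
authoritative.

    /* Abridged sketch of a shared ioreq slot (see
     * xen/include/public/hvm/ioreq.h in your Xen tree for the
     * authoritative definition). */
    #include <stdint.h>

    #define STATE_IOREQ_NONE        0
    #define STATE_IOREQ_READY       1
    #define STATE_IOREQ_INPROCESS   2
    #define STATE_IORESP_READY      3

    struct ioreq {
        uint64_t addr;        /* physical address of the access */
        uint64_t data;        /* data (or guest paddr of the data) */
        uint32_t count;       /* repeat count for rep prefixes */
        uint32_t size;        /* access size in bytes */
        uint32_t vp_eport;    /* event channel used to notify the device
                               * model; this is the value that must not be
                               * wiped once the secondary has been resumed */
        uint16_t _pad0;
        uint8_t  state:4;     /* STATE_IOREQ_NONE while the VM is suspended */
        uint8_t  data_is_ptr:1;
        uint8_t  dir:1;       /* 1 = read, 0 = write */
        uint8_t  df:1;
        uint8_t  _pad1:1;
        uint8_t  type;        /* PIO / COPY / PCI_CONFIG / ... */
    };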
> >>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Does the qemu process associated with this domain get frozen
> >>>>>>>>>>>>> while the secondary is being reset, or does the process get
> >>>>>>>>>>>>> destroyed and recreated.
> >>>>>>>>>>>>
> >>>>>>>>>>>> What do you mean by reset? do you mean secondary is suspended at
> >>>>>>>>>>>> checkpoint?
> >>>>>>>>>>>
> >>>>>>>>>>> Well - at the point that the buffered records are being processed, we
> >>>>>>>>>>> are in the process of resetting the state of the secondary to match
> >>>>>>>>>>> the primary.
> >>>>>>>>>>
> >>>>>>>>>> Yes, at this point, the qemu process associated with this domain is
> >>>>>>>>>> frozen. the suspend callback will call libxl__qmp_stop(vm_stop() in
> >>>>>>>>>> qemu) to pause qemu. After we processed all records, qemu will be
> >>>>>>>>>> restored with the received state, that's why we add a
> >>>>>>>>>> libxl__qmp_restore(qemu_load_vmstate() in qemu) api to restore qemu
> >>>>>>>>>> with received state. Currently in libxl, qemu only start with the
> >>>>>>>>>> received state, there's no api to load received state while qemu is
> >>>>>>>>>> running for a while.
> >>>>>>>>>
> >>>>>>>>> Now I consider this more, it is absolutely wrong to not zero the page
> >>>>>>>>> here. The event channel in the page is not guaranteed to be the same
> >>>>>>>>> between the primary and secondary,
> >>>>>>>>
> >>>>>>>> That's why we don't zero it on secondary.
> >>>>>>>
> >>>>>>> I think you missed my point. Apologies for the double negative. It
> >>>>>>> must, under all circumstances, be zeroed at this point, for safety
> >>>>>>> reasons.
> >>>>>>>
> >>>>>>> The page in question is subject to logdirty just like any other guest
> >>>>>>> pages, which means that if the guest writes to it naturally (i.e. not a
> >>>>>>> Xen or Qemu write, both of whom have magic mappings which are not
> >>>>>>> subject to logdirty), it will be transmitted in the stream. As the
> >>>>>>> event channel could be different, the lack of zeroing it at this point
> >>>>>>> means that the event channel would be wrong as opposed to simply
> >>>>>>> missing. This is a worse position to be in.
> >>>>>>
> >>>>>> The guest should not access this page. I am not sure if the guest can
> >>>>>> access the ioreq page.
> >>>>>>
> >>>>>> But in the exceptional case, the ioreq page is dirtied, and is copied to
> >>>>>> the secondary vm. The ioreq page will contain a wrong event channel, the
> >>>>>> hypervisor will check it: if the event channel is wrong, the guest will
> >>>>>> be crashed.
> >>>>>>>
> >>>>>>>>
> >>>>>>>>> and we don't want to unexpectedly
> >>>>>>>>> find a pending/in-flight ioreq.
> >>>>>>>>
> >>>>>>>> ioreq->state is always STATE_IOREQ_NONE after the vm is suspended, there
> >>>>>>>> should be no pending/in-flight ioreq at checkpoint.
> >>>>>>>
> >>>>>>> In the common case perhaps, but we must consider the exceptional case.
> >>>>>>> The exceptional case here is some corruption which happens to appear as
> >>>>>>> an in-flight ioreq.
> >>>>>>
> >>>>>> If the state is STATE_IOREQ_NONE, it may be hypervisor's bug. If the
> >>>>>> hypervisor has a bug, anything can happen. I think we should trust the
> >>>>>> hypervisor.
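To make the "zero it only one time" behaviour concrete, the restore-side
handling being argued about amounts to something like the following sketch. It
is illustrative only, not the actual COLO patch; the names restore_ctx,
ioreq_page_zeroed and handle_ioreq_page are invented.

    /* Illustrative sketch only; not the actual COLO patch. */
    #include <stdbool.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    struct restore_ctx {
        bool ioreq_page_zeroed;     /* set after the first checkpoint */
    };

    /*
     * Clear the ioreq page only before the secondary VM is resumed for the
     * first time.  After that first resume the page holds live event channel
     * ports which Xen validates, so wiping it again on a later checkpoint
     * would corrupt or lose them.  Leaving the page alone afterwards is safe
     * because ioreq->state is already STATE_IOREQ_NONE once the guest has
     * been suspended, i.e. there is no in-flight request left to clear.
     */
    static void handle_ioreq_page(struct restore_ctx *ctx, void *ioreq_page)
    {
        if ( ctx->ioreq_page_zeroed )
            return;

        memset(ioreq_page, 0, PAGE_SIZE);
        ctx->ioreq_page_zeroed = true;
    }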
> >>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Either qemu needs to take care of re-initialising the event channels
> >>>>>>>>> back to appropriate values, or Xen should tolerate the channels
> >>>>>>>>> disappearing.
> >>>>>>>
> >>>>>>> I still stand by this statement. I believe it is the only safe way of
> >>>>>>> solving the issue you have discovered.
> >>>>>>
> >>>>>> Add a new qemu monitor command to update ioreq page?
> >>>>>>
> >>>>>
> >>>>> If you're attaching to a 'new' VM (i.e. one with an updated image) then I
> >>>>> suspect you're going to have to destroy and re-create the ioreq server so
> >>>>> that the shared page gets re-populated with the correct event channels.
> >>>>> Either that or you're going to have to ensure that the page is not part of
> >>>>> the restored image and sample the new one that Xen should have set up.
> >>>>
> >>>>
> >>>> I agree with it. I will try to add a new qemu monitor command (or do it
> >>>> when updating qemu's state) to destroy and re-create it.
> >>>
> >>> The slightly tricky part of that is that you're going to have to cache and
> >>> replay all the registrations that were done on the old instance, but you
> >>> need to do that in any case as it's not state that is transferred in the
> >>> VM save record.
> >>
> >> Why do we have to cache and replay all the registrations that were done on
> >> the old instance?
> >
> > Do you not have device models that you need to continue to function? When
> > the ioreq server is torn down then all MMIO, port IO and PCI config ranges
> > that were mapped to it will disappear.
>
> Yes, I don't know what needs to be done until I implement and test it.
>
> I have some questions about it:
> 1. Can the guest access the ioreq page? If the page is modified by the guest
>    unexpectedly, what will happen?

No, the guest cannot modify the pages once a non-default ioreq server is
active. The pages are removed from the guest P2M when it is activated, which
is one of the reasons for modifying QEMU to not behave as a legacy default
server.

> 2. If the ioreq page is dirtied by the guest, it will be transferred from
>    primary to secondary during checkpoint. The evtchn is then invalid; I think
>    the best behavior is to crash the guest, not let it continue to run.

As I said, the pages are not in the P2M if the server is active so they will
not be transferred as part of the VM state. However, this presents a problem;
at the far end, the emulator will not be able to hook into the guest. So, when
the source domain is paused, the ioreq server needs to be torn down (so that
its pages are re-inserted into the P2M and marked dirty for transfer). This is
what happens in a normal migration. One extra problem you have is that the
source domain is not then killed, it is resumed along with the emulator. Thus,
on resume, the emulator needs to create a new ioreq server and re-register all
its device models with that new server. I don't know the detail of what you do
at the far end, but if you always start a new emulator instance using the QEMU
save record then you should be fine (just like with a normal migration).

  Paul
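In terms of the libxc ioreq-server interface introduced alongside this work,
the tear-down / re-create / replay sequence described above would look roughly
like the sketch below. It is only an outline: the xc_hvm_*_ioreq_server
signatures follow the calls as added around Xen 4.5 and may differ slightly in
other releases, the cached_range bookkeeping is invented for illustration, and
a real emulator would rebuild its registrations (including PCI devices, via
xc_hvm_map_pcidev_to_ioreq_server) from its own device-model state.

    /* Outline only; not taken from QEMU or the COLO series. */
    #include <stdint.h>
    #include <xenctrl.h>

    struct cached_range {
        int is_mmio;              /* 1 = MMIO range, 0 = port I/O range */
        uint64_t start, end;
    };

    static int recreate_ioreq_server(xc_interface *xch, domid_t domid,
                                     ioservid_t *id,
                                     const struct cached_range *ranges,
                                     unsigned int nr_ranges)
    {
        unsigned int i;
        int rc;

        /* Tear down the old server: its pages go back into the P2M (and are
         * marked dirty), and every MMIO/port-I/O/PCI-config claim it held
         * disappears with it. */
        rc = xc_hvm_destroy_ioreq_server(xch, domid, *id);
        if ( rc < 0 )
            return rc;

        /* Create a fresh server: Xen sets up new ioreq pages and event
         * channels for it. */
        rc = xc_hvm_create_ioreq_server(xch, domid,
                                        1 /* handle buffered ioreqs */, id);
        if ( rc < 0 )
            return rc;

        /* Replay the registrations that were made on the old instance.  This
         * state is not part of the VM save record, so the emulator has to
         * re-assert it itself. */
        for ( i = 0; i < nr_ranges; i++ )
        {
            rc = xc_hvm_map_io_range_to_ioreq_server(xch, domid, *id,
                                                     ranges[i].is_mmio,
                                                     ranges[i].start,
                                                     ranges[i].end);
            if ( rc < 0 )
                return rc;
        }

        /* Enable the new server so its pages are removed from the guest P2M
         * again and I/O is steered to the emulator. */
        return xc_hvm_set_ioreq_server_state(xch, domid, *id, 1);
    }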
>
> Thanks
> Wen Congyang
>
> >
> > Paul
> >
> >> We will set the guest to a new state; the old state should be dropped.
> >>
> >> Thanks
> >> Wen Congyang
> >>
> >>>
> >>> Paul
> >>>
> >>>>
> >>>> Thanks
> >>>> Wen Congyang
> >>>>
> >>>>>
> >>>>> Paul
> >>>>>
> >>>>>
> >>>>>> Thanks
> >>>>>> Wen Congyang
> >>>>>>
> >>>>>>>
> >>>>>>> ~Andrew
> >>>>>>> .
> >>>>>>>
> >>>>>
> >>>>> .
> >>>>>
> >>>
> >>> .
> >>>
> >>
> >>
> >> _______________________________________________
> >> Xen-devel mailing list
> >> Xen-devel@xxxxxxxxxxxxx
> >> http://lists.xen.org/xen-devel
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@xxxxxxxxxxxxx
> > http://lists.xen.org/xen-devel
>
> .
>

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel