[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq page only one time



> -----Original Message-----
> From: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx]
> Sent: 10 June 2015 11:08
> To: Wen Congyang; Yang Hongyang; xen-devel@xxxxxxxxxxxxx; Paul Durrant
> Cc: Wei Liu; Ian Campbell; yunhong.jiang@xxxxxxxxx; Eddie Dong;
> guijianfeng@xxxxxxxxxxxxxx; rshriram@xxxxxxxxx; Ian Jackson
> Subject: Re: [Xen-devel] [PATCH v2 COLOPre 03/13] libxc/restore: zero ioreq
> page only one time
> 
> On 10/06/15 10:06, Wen Congyang wrote:
> > Cc: Paul Durrant
> >
> > On 06/10/2015 03:44 PM, Andrew Cooper wrote:
> >> On 10/06/2015 06:26, Yang Hongyang wrote:
> >>>
> >>> On 06/09/2015 03:30 PM, Andrew Cooper wrote:
> >>>> On 09/06/2015 01:59, Yang Hongyang wrote:
> >>>>>
> >>>>> On 06/08/2015 06:15 PM, Andrew Cooper wrote:
> >>>>>> On 08/06/15 10:58, Yang Hongyang wrote:
> >>>>>>>
> >>>>>>> On 06/08/2015 05:46 PM, Andrew Cooper wrote:
> >>>>>>>> On 08/06/15 04:43, Yang Hongyang wrote:
> >>>>>>>>> ioreq page contains evtchn which will be set when we resume
> the
> >>>>>>>>> secondary vm the first time. The hypervisor will check if the
> >>>>>>>>> evtchn is corrupted, so we cannot zero the ioreq page more
> >>>>>>>>> than one time.
> >>>>>>>>>
> >>>>>>>>> The ioreq->state is always STATE_IOREQ_NONE after the vm is
> >>>>>>>>> suspended, so it is OK if we only zero it one time.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Yang Hongyang <yanghy@xxxxxxxxxxxxxx>
> >>>>>>>>> Signed-off-by: Wen congyang <wency@xxxxxxxxxxxxxx>
> >>>>>>>>> CC: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> >>>>>>>> The issue here is that we are running the restore algorithm over a
> >>>>>>>> domain which has already been running in Xen for a while.  This is
> a
> >>>>>>>> brand new usecase, as far as I am aware.
> >>>>>>> Exactly.
> >>>>>>>
> >>>>>>>> Does the qemu process associated with this domain get frozen
> >>>>>>>> while the
> >>>>>>>> secondary is being reset, or does the process get destroyed and
> >>>>>>>> recreated.
> >>>>>>> What do you mean by reset? do you mean secondary is suspended
> at
> >>>>>>> checkpoint?
> >>>>>> Well - at the point that the buffered records are being processed,
> we
> >>>>>> are in the process of resetting the state of the secondary to match
> >>>>>> the
> >>>>>> primary.
> >>>>> Yes, at this point, the qemu process associated with this domain is
> >>>>> frozen.
> >>>>> the suspend callback will call libxl__qmp_stop(vm_stop() in qemu) to
> >>>>> pause
> >>>>> qemu. After we processed all records, qemu will be restored with the
> >>>>> received
> >>>>> state, that's why we add a libxl__qmp_restore(qemu_load_vmstate()
> in
> >>>>> qemu)
> >>>>> api to restore qemu with received state. Currently in libxl, qemu only
> >>>>> start
> >>>>> with the received state, there's no api to load received state while
> >>>>> qemu is
> >>>>> running for a while.
> >>>> Now I consider this more, it is absolutely wrong to not zero the page
> >>>> here.  The event channel in the page is not guaranteed to be the same
> >>>> between the primary and secondary,
> >>> That's why we don't zero it on secondary.
> >> I think you missed my point.  Apologies for the double negative.   It
> >> must, under all circumstances, be zeroed at this point, for safety reasons.
> >>
> >> The page in question is subject to logdirty just like any other guest
> >> pages, which means that if the guest writes to it naturally (i.e. not a
> >> Xen or Qemu write, both of whom have magic mappings which are not
> >> subject to logdirty), it will be transmitted in the stream.  As the
> >> event channel could be different, the lack of zeroing it at this point
> >> means that the event channel would be wrong as opposed to simply
> >> missing.  This is a worse position to be in.
> > The guest should not access this page. I am not sure if the guest can
> > access the ioreq page.
> 
> "should not" and "can't" are two very different things.  We have had
> XSAs covering the fact that the guest can write to these pages in the past.
> 
> In practice, a guest can't actually query the appropriate hvmparam, but
> it can rely on the fact that the domain builder is incredibly
> predictable in this regard.
> 
> >
> > But in the exceptional case, the ioreq page is dirtied, and is copied to
> > the secondary vm. The ioreq page will contain a wrong event channel, the
> > hypervisor will check it: if the event channel is wrong, the guest will
> > be crashed.
> 
> This is my point.  It is completely legitimate for the event channels to
> be different between the primary and secondary, which means that we
> should be capable of dealing cleanly with the fallout when the bufioreq
> page does appear as dirty update.
> 
> >
> >>>> and we don't want to unexpectedly
> >>>> find a pending/in-flight ioreq.
> >>> ioreq->state is always STATE_IOREQ_NONE after the vm is suspended,
> there
> >>> should be no pending/in-flight ioreq at checkpoint.
> >> In the common case perhaps, but we must consider the exceptional case.
> >> The exceptional case here is some corruption which happens to appear as
> >> an in-flight ioreq.
> > If the state is STATE_IOREQ_NONE, it may be hypervisor's bug. If the
> hypervisor
> > has a bug, anything can happen. I think we should trust the hypervisor.
> 
> In the worst case, the contents of the pages can be completely
> arbitrary.  Zeroing of the pages is to cover the case where there is
> junk present, so Xen doesn't crash the guest due to a bad ioreq state.
> 
> I think Xen's behaviour is legitimate here.  If it observes wonky ioreq
> state, all bets are off.
> 
> >
> >>>> Either qemu needs to take care of re-initialising the event channels
> >>>> back to appropriate values, or Xen should tolerate the channels
> >>>> disappearing.
> >> I still stand by this statement.  I believe it is the only safe way of
> >> solving the issue you have discovered.
> > Add a new qemu monitor command to update ioreq page?
> 
> Who/what actually complains about the event channel?  I can't see any
> event channels in the ABI for the pages.
> 

QEMU only samples the event channels from the shared ioreq page on startup as 
it does not expect them to change in its lifetime.

  Paul

> ~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.