
Re: [Xen-devel] [RFC v2] xSplice design



On Fri, Jun 12, 2015 at 12:09:24PM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Jun 12, 2015 at 04:31:12PM +0200, Martin Pohlack wrote:
> > On 12.06.2015 16:03, Konrad Rzeszutek Wilk wrote:
> > > On Fri, Jun 12, 2015 at 01:39:05PM +0200, Martin Pohlack wrote:
> > >> On 15.05.2015 21:44, Konrad Rzeszutek Wilk wrote:
> > >> [...]
> > >>> ## Hypercalls
> > >>>
> > >>> We will employ the sub-operations of the system management hypercall
> > >>> (sysctl).
> > >>> There will be four sub-operations:
> > >>>
> > >>>  * upload the payloads.
> > >>>  * list the summaries of the uploaded payloads and their state.
> > >>>  * get a particular payload's summary and its state.
> > >>>  * command to apply, delete, or revert a payload.
> > >>>
> > >>> Patching is asynchronous, therefore the caller is responsible
> > >>> for verifying that it has been applied properly by retrieving the
> > >>> payload's summary and checking that there are no error codes
> > >>> associated with it.
> > >>>
> > >>> We **MUST** make it asynchronous due to the nature of patching: it
> > >>> requires every physical CPU to be in lock-step with every other.
> > >>> The patching mechanism, while an implementation detail, is not a
> > >>> short operation, and as such the design **MUST** assume it will be
> > >>> a long-running operation.
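
For concreteness, the four sub-operations above could be carried by a
single sysctl sub-command along these lines. This is only a sketch with
made-up names (XEN_SYSCTL_XSPLICE_*, xen_sysctl_xsplice_op, etc.), not a
proposed ABI:

/*
 * Sketch only: one way the four sub-operations might be numbered and
 * carried by the sysctl hypercall.  All names are illustrative.
 */
#include <stdint.h>

#define XEN_SYSCTL_XSPLICE_UPLOAD   0   /* upload a payload            */
#define XEN_SYSCTL_XSPLICE_LIST     1   /* list payload summaries      */
#define XEN_SYSCTL_XSPLICE_GET      2   /* get one payload's summary   */
#define XEN_SYSCTL_XSPLICE_ACTION   3   /* apply / revert / delete     */

#define XSPLICE_ACTION_APPLY        1
#define XSPLICE_ACTION_REVERT       2
#define XSPLICE_ACTION_DELETE       3

struct xen_sysctl_xsplice_op {
    uint32_t cmd;                   /* one of XEN_SYSCTL_XSPLICE_*     */
    char     name[40];              /* payload identifier              */
    union {
        struct {
            uint64_t payload_addr;  /* virtual address of the blob     */
            uint32_t payload_size;
        } upload;
        struct {
            uint32_t action;        /* one of XSPLICE_ACTION_*         */
        } action;
        struct {
            uint32_t state;         /* filled in by the hypervisor     */
            int32_t  rc;            /* 0 or -errno of the last action  */
        } status;
    } u;
};

Because the apply/revert action would be asynchronous, the caller would
keep polling the GET sub-operation until state and rc show the outcome.
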
> > >>
> > >> I am not convinced yet that you need an asynchronous approach here.
> > >>
> > >> The experience from our prototype suggests that hotpatching itself is
> > >> not an expensive operation.  It can usually be completed well below 1ms
> > >> with the most expensive part being getting the hypervisor to a quiet 
> > >> state.
> > >>
> > >> If we go for a barrier at hypervisor exit, combined with forcing all
> > >> other CPUs through the hypervisor with IPIs, the typical case is very 
> > >> quick.
> > >>
> > >> The only reason that would take some time is if another CPU is
> > >> already executing a lengthy operation in the hypervisor.  In that case,
> > >> you probably don't want to block the whole machine waiting for that
> > >> single CPU to join anyway, and should instead re-try later, for example,
> > >> using a timeout on the barrier.  That could be signaled to the user-land
> > >> process (EAGAIN) so that it could re-attempt hotpatching after a few
> > >> seconds.
> > > 
> > > Which is also an asynchronous operation.
> > 
> > Right, but in userland.  My main aim is to have as little complicated
> > code as possible in the hypervisor for obvious reasons.  This approach
> > would not require any further tracking of state in the hypervisor.
> 
> True.
> > 
> > > The experience with previous preemption XSAs has left me quite afraid of
> > > long-running operations - which is why I was thinking of having this
> > > baked in from the start.
> > > 
> > > Either way - EAGAIN or doing a _GET_STATUS - would provide a mechanism for
> > > the VCPU to do other work instead of being tied up.
> > 
> > If I understood your proposal correctly, there is a difference.  With
> > EAGAIN, all activity is dropped and the machine remains fully available
> > to whatever guests are running at the time.
> 
> Correct.
> > 
> > With _GET_STATUS, you would continue to try to bring the hypervisor to a
> > quiet state in the background but return to userland to let this one
> > thread continue.  Behind the scenes though, you would still need to
> 
> <nods>
> > capture all CPUs at one point and all captured CPUs would have to wait
> > for the last straggler.  That would lead to noticeable dead-time for
> > guests running on top.
> 
> Potentially. Using the time calibration routine to do the patching guarantees
> that we will have a sync-up every second on the machine - so there will always
> be that possibility.
> > 
> > I might have misunderstood your proposal though.
> 
> You got it right.

As I was going over v3 of this design, I reread this exchange and realized
that in fact I was not explaining it properly.

The 'dead-time' you refer to is, I presume, the time during which the guests
don't get to run while we are waiting (spinning) on the other CPUs.

That is not the case here - the 'capture all CPUs at one point and all
captured CPUs would have to wait for the last straggler' is an
implementation detail. One can implement it that way (which is
what I think you have with time-out barriers at hypervisor exits),
or we can piggyback on the time rendezvous code which syncs all CPUs.

Either way - we will always have an opportunity to have the CPUs
all synced up, and we can utilize both mechanisms if we need to.
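
To make the two options concrete, here is a rough sketch (illustrative
C11, not actual Xen code - names such as xsplice_rendezvous() and
TIMEOUT_SPINS are made up, and state reset between retries is omitted)
of a stop_machine-like barrier with a timeout that returns -EAGAIN when
a straggler does not show up in time:

/* Sketch only: a stop_machine-like barrier with a timeout. */
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TIMEOUT_SPINS 1000000UL  /* arbitrary spin budget standing in for ~1ms */

static atomic_uint cpus_in;           /* CPUs that have reached the barrier */
static atomic_bool rendezvous_done;   /* set once CPU0 has patched or given up */

/* Invoked on every CPU, e.g. from an IPI handler or the time rendezvous. */
static int xsplice_rendezvous(unsigned int cpu, unsigned int nr_cpus,
                              void (*do_patching)(void))
{
    atomic_fetch_add(&cpus_in, 1);

    if (cpu == 0) {
        /* CPU0 waits for the stragglers, but only up to the timeout. */
        unsigned long spins = 0;

        while (atomic_load(&cpus_in) != nr_cpus) {
            if (++spins > TIMEOUT_SPINS) {
                atomic_store(&rendezvous_done, true); /* release the others */
                return -EAGAIN;       /* caller or userland retries later */
            }
        }
        do_patching();                /* every CPU is quiesced: patch now */
        atomic_store(&rendezvous_done, true);
        return 0;
    }

    /* All other CPUs spin until CPU0 has patched or given up. */
    while (!atomic_load(&rendezvous_done))
        ;
    return 0;
}

Piggybacking on the time calibration rendezvous would simply mean driving
something like the above from that code path (which already brings all
CPUs together once a second) instead of sending dedicated IPIs.
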

> > 
> > > The EAGAIN approach mandates that 'bringing the CPUs together' must be
> > > done in under 1ms and that there must be code to enforce a timeout on the
> > > barrier.
> > 
> > The 1ms is just a random number.  I would actually suggest allowing a
> > sysadmin or hotpatch management tooling to specify, as part of the
> > relevant hypercall, how long one is willing to potentially block the
> > whole machine when waiting for a stop_machine-like barrier.  You could
> > imagine userland starting out with 1ms and slowly working its way up
> > whenever it retries.
> > 
> > > The _GET_STATUS approach does not enforce this and can take longer,
> > > giving us more breathing room - but also unbounded time - which means
> > > that if we were to try to cancel it (say it had run for an hour and
> > > still could not patch) we would have to add some hairy code to
> > > deal with cancelling asynchronous operations.

.. Which I am not sure is really needed. I think if we baked in a rule
that if the patching cannot be done within a certain time (say 2 seconds),
regardless of the implementation, it would report an error in
the `status' returned by the _GET_STATUS.
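
A minimal sketch of what I mean, assuming a hypothetical per-payload
status record (the field, constant, and function names below are only
illustrative, not a proposed ABI):

/*
 * Sketch only: record a failure to quiesce within the deadline in the
 * per-payload state, so that a later _GET_STATUS query reports the error
 * instead of the apply operation blocking forever.
 */
#include <errno.h>
#include <stdint.h>

#define XSPLICE_STATE_LOADED   1
#define XSPLICE_STATE_APPLIED  2
#define PATCH_DEADLINE_NS      2000000000ULL  /* the 2 seconds mentioned above */

struct xsplice_payload_status {
    uint32_t state;   /* LOADED, APPLIED, ... */
    int32_t  rc;      /* 0 on success, -errno on failure */
};

/* Record the outcome of one patching attempt. */
static void record_patch_result(struct xsplice_payload_status *st,
                                uint64_t elapsed_ns, int patch_rc)
{
    if (patch_rc == 0 && elapsed_ns > PATCH_DEADLINE_NS)
        patch_rc = -ETIMEDOUT;    /* could not quiesce the CPUs in time */

    st->rc = patch_rc;
    if (patch_rc == 0)
        st->state = XSPLICE_STATE_APPLIED;
}
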

