
Re: [Xen-devel] [RFC v2] xSplice design



On Fri, Jun 12, 2015 at 01:39:05PM +0200, Martin Pohlack wrote:
> On 15.05.2015 21:44, Konrad Rzeszutek Wilk wrote:
> [...]
> > ## Hypercalls
> > 
> > We will employ the sub operations of the system management hypercall 
> > (sysctl).
> > There are to be four sub-operations:
> > 
> >  * upload the payloads.
> >  * listing of payloads summary uploaded and their state.
> >  * getting a particular payload summary and its state.
> >  * command to apply, delete, or revert the payload.
> > 
> > The patching is asynchronous therefore the caller is responsible
> > to verify that it has been applied properly by retrieving the summary of it
> > and verifying that there are no error codes associated with the payload.
> > 
> > We **MUST** make it asynchronous due to the nature of patching: it requires
> > every physical CPU to be in lock-step with each other. The patching mechanism,
> > while an implementation detail, is not a short operation and as such
> > the design **MUST** assume it will be a long-running operation.
> 
> I am not yet convinced that you need an asynchronous approach here.
> 
> The experience from our prototype suggests that hotpatching itself is
> not an expensive operation.  It can usually be completed well below 1ms
> with the most expensive part being getting the hypervisor to a quiet state.
> 
> If we go for a barrier at hypervisor exit, combined with forcing all
> other CPUs through the hypervisor with IPIs, the typical case is very quick.
> 
> The only reason why that would take some time is if another CPU is
> executing a lengthy operation in the hypervisor already.  In that case,
> you probably don't want to block the whole machine waiting for the
> joining of that single CPU anyway and instead re-try later, for example,
> using a timeout on the barrier.  That could be signaled to the user-land
> process (EAGAIN) so that he could re-attempt hotpatching after some seconds.

Which is also an asynchronous operation.

The experience with previous preemption XSAs has left me quite afraid of
long-running operations, which is why I was thinking of having this
baked in from the start.

Both ways, EAGAIN or doing a _GET_STATUS, would provide a mechanism for
the VCPU to do other work instead of being tied up.

The EAGAIN approach mandates that 'bringing the CPUs together' must be done
in under 1ms and that there must be code to enforce a timeout on the barrier.
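
To make the EAGAIN variant concrete, here is a rough user-space model of a
barrier with a timeout. It is only a sketch: try_rendezvous(), the
busy_polls array and the poll budget are names I made up for illustration,
and the budget stands in for the real time limit. Nothing here is existing
Xen code.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Model of a rendezvous with a timeout (hypothetical, not Xen code).
 *
 * busy_polls[i] is the number of poll steps CPU i still needs before it
 * leaves its current hypervisor operation and reaches the barrier.
 * poll_budget is the number of poll steps we are willing to wait.
 *
 * Returns 0 once every CPU has arrived within the budget, or -EAGAIN if
 * at least one CPU is still busy when the budget runs out.
 */
int try_rendezvous(const unsigned int *busy_polls, size_t ncpus,
                   unsigned int poll_budget)
{
    for (unsigned int now = 0; now < poll_budget; now++) {
        size_t arrived = 0;

        for (size_t i = 0; i < ncpus; i++)
            if (busy_polls[i] <= now)
                arrived++;

        if (arrived == ncpus)
            return 0;       /* all CPUs quiesced, safe to patch */
    }

    return -EAGAIN;         /* give up, let the caller retry later */
}
```

With a budget of 10 polls, three CPUs that need 0, 1 and 2 polls
rendezvous fine, while a single CPU stuck for 500 polls makes the whole
attempt return -EAGAIN without blocking the machine.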

The _GET_STATUS approach does not enforce this and can take longer, giving us
more breathing room. But the time is also unbounded, which means that if
we were to try to cancel it (say it had run for an hour and still
could not patch), we would have to add some hairy code to
deal with cancelling an asynchronous operation.
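
For comparison, the _GET_STATUS variant seen from the caller could look
roughly like the sketch below. The state names, struct payload_status,
sim_get_status() and wait_for_apply() are all hypothetical; the simulated
payload only stands in for the hypervisor so the loop can be run in user
space.

```c
#include <errno.h>

/* Hypothetical payload states; the design does not name them yet. */
enum payload_state {
    PAYLOAD_APPLYING,   /* request accepted, patching still in progress */
    PAYLOAD_APPLIED,    /* patch is live */
    PAYLOAD_FAILED      /* patching stopped, rc holds the error */
};

/* Hypothetical summary returned by a _GET_STATUS style query. */
struct payload_status {
    enum payload_state state;
    int rc;             /* 0, or a negative errno value on failure */
};

/* Stand-in for the hypervisor: finishes after work_left queries. */
struct sim_payload {
    unsigned int work_left;
    int final_rc;
};

void sim_get_status(struct sim_payload *p, struct payload_status *out)
{
    if (p->work_left > 0) {
        p->work_left--;
        out->state = PAYLOAD_APPLYING;
        out->rc = 0;
        return;
    }
    out->state = p->final_rc ? PAYLOAD_FAILED : PAYLOAD_APPLIED;
    out->rc = p->final_rc;
}

/* Caller side: poll the status until it is final or we lose patience. */
int wait_for_apply(struct sim_payload *p, unsigned int max_polls)
{
    struct payload_status st;

    for (unsigned int i = 0; i < max_polls; i++) {
        sim_get_status(p, &st);
        if (st.state == PAYLOAD_APPLIED)
            return 0;
        if (st.state == PAYLOAD_FAILED)
            return st.rc;
        /* Still applying: a real caller would do other work here. */
    }

    /* The caller stopped waiting, but the operation itself is still pending. */
    return -ETIMEDOUT;
}
```

Note the last return: the caller giving up does not stop the operation,
which is exactly where the cancellation code would have to come in.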

Your way is simpler - but I would advocate expanding the -EAGAIN to _all_
the xSplice hypercalls. Thoughts?

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
