Re: [Xen-devel] [RFC v2] xSplice design
On Fri, Jun 12, 2015 at 01:39:05PM +0200, Martin Pohlack wrote:
> On 15.05.2015 21:44, Konrad Rzeszutek Wilk wrote:
> [...]
> > ## Hypercalls
> >
> > We will employ the sub operations of the system management hypercall
> > (sysctl).
> > There are to be four sub-operations:
> >
> > * upload the payloads.
> > * listing a summary of the uploaded payloads and their state.
> > * getting a particular payload summary and its state.
> > * command to apply, delete, or revert the payload.
> >
> > The patching is asynchronous, therefore the caller is responsible
> > for verifying that it has been applied properly by retrieving its summary
> > and checking that there are no error codes associated with the payload.
> >
> > We **MUST** make it asynchronous due to the nature of patching: it requires
> > every physical CPU to be in lock-step with each other. The patching
> > mechanism, while an implementation detail, is not a short operation and
> > as such the design **MUST** assume it will be a long-running operation.
>
> I am not yet convinced that you need an asynchronous approach here.
>
> The experience from our prototype suggests that hotpatching itself is
> not an expensive operation. It can usually be completed well below 1ms,
> with the most expensive part being getting the hypervisor to a quiet state.
>
> If we go for a barrier at hypervisor exit, combined with forcing all
> other CPUs through the hypervisor with IPIs, the typical case is very quick.
>
> The only reason this would take some time is if another CPU is
> executing a lengthy operation in the hypervisor already. In that case,
> you probably don't want to block the whole machine waiting for that
> single CPU to join anyway, and would instead retry later, for example
> using a timeout on the barrier. That could be signaled to the user-land
> process (EAGAIN) so that it could re-attempt hotpatching after some seconds.

Which is also an asynchronous operation.
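To make the barrier-with-timeout model concrete, here is a minimal single-threaded C sketch. It assumes a simulation in which each CPU's delay in reaching the rendezvous point is known up front (for example, one CPU still finishing a lengthy hypercall). All names (struct patch_attempt, barrier_with_timeout, xsplice_apply, apply_with_retry, patch_applied) are hypothetical and are not part of Xen's sysctl interface; the 1000 us timeout simply mirrors the "well below 1ms" figure from the prototype.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one apply attempt: arrival_us[i] is how long CPU i
 * needs to reach the rendezvous point after being sent an IPI. */
struct patch_attempt {
    const unsigned *arrival_us;
    size_t ncpus;
};

/* Set once the (simulated) patch has been written. */
static bool patch_applied;

/* Rendezvous with a timeout: succeeds only if every CPU arrives in time.
 * A late CPU makes the whole attempt fail instead of stalling the machine. */
static int barrier_with_timeout(const struct patch_attempt *a,
                                unsigned timeout_us)
{
    assert(a->ncpus > 0);
    for (size_t i = 0; i < a->ncpus; i++)
        if (a->arrival_us[i] > timeout_us)
            return -EAGAIN;
    return 0;
}

/* Hypothetical apply sub-operation: quiesce all CPUs, then patch. */
static int xsplice_apply(const struct patch_attempt *a, unsigned timeout_us)
{
    int rc = barrier_with_timeout(a, timeout_us);

    if (rc)
        return rc;          /* nothing modified yet, so a retry is safe */
    patch_applied = true;   /* all CPUs quiescent: patching would happen here */
    return 0;
}

/* Toolstack side: re-issue the call on -EAGAIN, up to max_tries times.
 * attempts[n] describes the system state seen by the n-th try and must
 * have at least max_tries entries. */
static int apply_with_retry(const struct patch_attempt *attempts,
                            size_t max_tries, unsigned timeout_us,
                            size_t *tries_used)
{
    int rc = -EAGAIN;
    size_t n;

    for (n = 0; n < max_tries && rc == -EAGAIN; n++) {
        /* A real tool would sleep for a few seconds between tries;
         * that delay is omitted from this simulation. */
        rc = xsplice_apply(&attempts[n], timeout_us);
    }
    *tries_used = n;
    return rc;
}
```

Because a timed-out barrier returns before any code is modified, -EAGAIN leaves nothing to cancel and the caller can simply retry; a _GET_STATUS design would instead need explicit cancellation logic for an operation that never completes.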
The experience with previous preemption XSAs has left me quite afraid of long-running operations, which is why I wanted to have this baked in from the start.

Both approaches, returning -EAGAIN or polling with a _GET_STATUS call, would provide a mechanism for the VCPU to do other work instead of being tied up.

The -EAGAIN approach mandates that bringing the CPUs together must complete in under 1ms, and that there must be code to enforce a timeout on the barrier.

The _GET_STATUS approach does not enforce this and can take longer, which gives us more breathing room but also unbounded time. That means that if we wanted to cancel an operation (say it had run for an hour and still could not apply the patch), we would have to add some hairy code to deal with cancelling asynchronous work.

Your way is simpler, but I would advocate extending -EAGAIN to _all_ of the xSplice hypercalls.

Thoughts?

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel