
Re: [Xen-devel] [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages]



Ian Campbell writes ("Re: [PATCH 4 of 5 V3] tools/libxl: Control network 
buffering in remus callbacks [and 1 more messages]"):
> Regardless of the answer here, would it make sense to do some/all of the
> checkpoint processing in the helper subprocess anyway and only signal
> the eventual failover up to the libxl process?

It might do.  Mind you, the code in libxc is tangled enough as it is
and is due for a rewrite.  Perhaps this could be done in the helper
executable, although there isn't currently an easy way to intersperse
extra code there.
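
Purely for illustration (the names below are made up, not the real
libxl <-> helper protocol): since the helper already talks to libxl
over a pipe, signalling failover could be one more message type on
that channel, along these lines:

    /* Illustrative sketch only: MSG_* and report_failover() are
     * made-up names, not part of the real libxl <-> save/restore
     * helper protocol. */
    #include <errno.h>
    #include <unistd.h>

    enum helper_msg { MSG_CHECKPOINT_OK = 1, MSG_FAILOVER = 2 };

    /* Helper side: the checkpoint loop runs in the helper process,
     * and only the outcome is reported up to the libxl event loop. */
    static void report_failover(int to_libxl_fd)
    {
        unsigned char msg = MSG_FAILOVER;
        ssize_t r;
        do {
            r = write(to_libxl_fd, &msg, sizeof(msg));
        } while (r == -1 && errno == EINTR);
    }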

> This async op is potentially quite long running I think compared to a
> normal one i.e. if the guest doesn't die it is expected that the ao
> lives "forever". Since the associated gc's persist until the ao ends
> this might end up accumulating lots of allocations? Ian had a similar
> concern about Roger's hotplug daemon series and suggested creating a per
> iteration gc or something.

Yes, this is indeed a problem.  Well spotted.

Which of the xc_domain_save (and _restore) callbacks are called on
each Remus iteration?

I think introducing a per-iteration gc here is going to involve taking
some care, since we need to be sure not to put
per-iteration-gc-allocated objects into data structures which are used
by subsequent iterations.
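
As a standalone sketch of the idea (none of this is libxl code;
libxl's real gc machinery lives in libxl_internal.h): each iteration
allocates against its own arena and frees the whole arena at the end,
so anything which has to survive into the next iteration must first be
copied out to a longer-lived allocation:

    /* Standalone sketch of a per-iteration arena; illustrative only,
     * not libxl's actual gc implementation. */
    #include <stdlib.h>

    struct iter_gc {
        void **ptrs;
        size_t n, cap;
    };

    static void *iter_alloc(struct iter_gc *gc, size_t sz)
    {
        void *p = malloc(sz);
        if (!p) abort();
        if (gc->n == gc->cap) {
            gc->cap = gc->cap ? gc->cap * 2 : 16;
            gc->ptrs = realloc(gc->ptrs, gc->cap * sizeof(*gc->ptrs));
            if (!gc->ptrs) abort();
        }
        gc->ptrs[gc->n++] = p;
        return p;
    }

    /* Called at the end of every checkpoint iteration, so allocations
     * cannot accumulate for the lifetime of the ao.  Nothing allocated
     * here may be left reachable from longer-lived structures. */
    static void iter_free_all(struct iter_gc *gc)
    {
        for (size_t i = 0; i < gc->n; i++)
            free(gc->ptrs[i]);
        free(gc->ptrs);
        gc->ptrs = NULL;
        gc->n = gc->cap = 0;
    }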

Shriram writes:
> Fair enough. My question is: what is the overhead of setting up,
> firing and tearing down a timeout event using the event gen framework,
> if I wish to checkpoint the VM, say, every 20ms?

The ultimate cost of going back into the event loop to wait for a
timeout will depend on what else the process is doing.  If the process
is doing nothing else, it's about two calls to gettimeofday and one to
poll.  Plus a bit of in-process computation, but that's going to be
swamped by system call overhead.
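
Concretely, one idle trip round such a loop looks roughly like this
(not libxl's actual event loop, just the shape of the system calls
involved):

    /* Sketch of the system calls in one idle event-loop iteration:
     * read the clock, poll until the checkpoint deadline, read the
     * clock again to see which timeout fired.  Not libxl code. */
    #include <poll.h>
    #include <sys/time.h>

    static void wait_one_checkpoint_interval(int interval_ms)
    {
        struct timeval before, after;

        gettimeofday(&before, NULL);   /* first gettimeofday */
        poll(NULL, 0, interval_ms);    /* one poll, timeout only */
        gettimeofday(&after, NULL);    /* second gettimeofday */
        /* ... run the timeout callback, i.e. take the checkpoint ... */
    }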

Having said that, libxl is not performance-optimised.  Indeed the
callback mechanism involves context switching, and IPC, between the
save/restore helper and libxl proper.  That's probably not too much to
be doing every 20ms for a single domain, but if you have a lot of
domains it's going to end up taking a lot of dom0 CPU, etc.

I assume you're not doing this for HVM domains, which would involve
saving the qemu state each time too.

Ian.
