Re: [Xen-devel] [PATCH 4 of 5 V3] tools/libxl: Control network buffering in remus callbacks [and 1 more messages]
Shriram Rajagopalan writes ("Re: [PATCH 4 of 5 V3] tools/libxl: Control network
buffering in remus callbacks [and 1 more messages]"):
> On Mon, Nov 4, 2013 at 10:06 AM, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>
> wrote:
> > Which of the xc_domain_save (and _restore) callbacks are called each
> > remus iteration ?
>
> Almost all of them on the xc_domain_save side. (suspend, resume,
> save_qemu state, checkpoint).
Right.
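(For readers following along: the callbacks in question are the ones
xc_domain_save takes via its save_callbacks argument.  A minimal
stand-alone sketch of the per-iteration shape is below; the struct and
field names are paraphrased for illustration and may not match
tools/libxc exactly.)

    /* Rough sketch of the per-checkpoint callback flow in xc_domain_save
     * under Remus.  Illustrative only: the real definition lives in
     * tools/libxc (struct save_callbacks) and the exact names/signatures
     * may differ. */
    #include <stdio.h>

    struct save_callbacks_sketch {
        int (*suspend)(void *data);     /* pause the guest for this epoch */
        int (*postcopy)(void *data);    /* resume guest + device model while
                                           the dirty-page buffer is flushed */
        int (*checkpoint)(void *data);  /* wait for ack, release buffered
                                           network output; return 1 to loop */
        void *data;                     /* passed back to every callback */
    };

    static int cb_suspend(void *data)    { (void)data; puts("suspend");    return 1; }
    static int cb_postcopy(void *data)   { (void)data; puts("resume");     return 1; }
    static int cb_checkpoint(void *data) { (void)data; puts("checkpoint"); return 0; }

    int main(void)
    {
        struct save_callbacks_sketch cb = {
            .suspend = cb_suspend, .postcopy = cb_postcopy,
            .checkpoint = cb_checkpoint, .data = NULL,
        };

        /* Each Remus epoch: suspend -> copy dirty pages -> resume ->
         * flush/ack -> decide whether to take another checkpoint. */
        do {
            cb.suspend(cb.data);
            /* ... dirty pages for this epoch are copied out here ... */
            cb.postcopy(cb.data);
        } while (cb.checkpoint(cb.data) == 1);

        return 0;
    }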
> xc_domain_restore doesn't have any
> callbacks AFAIK. And remus as of now does not have a component on
> the restore side. It piggybacks on live migration's restore
> framework.
But the libxl memory management in the restore code is currently
written to assume a finite lifetime for the ao. So I think this needs
to be improved.
Perhaps all the suspend/restore callbacks should each get one of the
nested ao type things that Roger needs for his driver domain daemon.
> FWIW, the remus related code that executes per iteration does not
> allocate anything. All allocations happen only during setup and I
> was under the impression that no other allocations are taking place
> every time xc_domain_save calls back into libxl.
If this is true, then good, because we don't need to do anything, but
there is a lot of code there and I would want to check.
> However, it may be possible that other parts of the AO machinery
> (and there are a lot of them) are allocating stuff per
> iteration. And if that is the case, it could easily lead to OOMs
> since Remus technically runs as long as the domain lives.
The ao and event machinery doesn't do much allocation itself.
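The lifetime concern is easiest to see with a toy model: memory taken
from an ao's gc (libxl__zalloc(gc, ...) and friends) is only released
when the whole operation is torn down, so anything a per-checkpoint
callback allocated that way would stay live for as long as Remus runs.
A stand-alone illustration of that behaviour (not libxl code) follows:

    /* Toy model of an operation-scoped allocation arena: nothing is freed
     * until the whole operation completes.  If a callback that runs every
     * checkpoint allocates from such an arena, usage grows linearly with
     * the number of epochs -- and Remus runs until the domain dies.
     * This is NOT libxl code, just an illustration of the lifetime rule. */
    #include <stdio.h>
    #include <stdlib.h>

    struct arena {
        void **ptrs;
        size_t n, cap;
    };

    static void *arena_alloc(struct arena *a, size_t sz)
    {
        if (a->n == a->cap) {
            a->cap = a->cap ? a->cap * 2 : 64;
            a->ptrs = realloc(a->ptrs, a->cap * sizeof(*a->ptrs));
        }
        return a->ptrs[a->n++] = calloc(1, sz);
    }

    static void arena_dispose(struct arena *a)  /* only runs when the op ends */
    {
        for (size_t i = 0; i < a->n; i++)
            free(a->ptrs[i]);
        free(a->ptrs);
    }

    int main(void)
    {
        struct arena op_scope = { 0 };

        /* 50 checkpoints/s (a 20ms epoch) for one hour of protection: */
        for (long epoch = 0; epoch < 50L * 3600; epoch++)
            arena_alloc(&op_scope, 128);  /* pretend the callback allocates 128 bytes */

        printf("%zu allocations still live after an hour\n", op_scope.n);
        arena_dispose(&op_scope);
        return 0;
    }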
> > Having said that, libxl is not performance-optimised. Indeed the
> > callback mechanism involves context switching, and IPC, between the
> > save/restore helper and libxl proper. Probably not too much to be
> > doing every 20ms for a single domain, but if you have a lot of these
> > it's going to end up taking a lot of dom0 cpu etc.
>
> Yes and that is a problem. Xend+Remus avoided this by linking
> the libcheckpoint library that interfaced with both the python & libxc code.
Have you observed whether the performance is acceptable with your V3
patches ?
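(For a rough sense of scale, taking the 20ms figure above and assuming
roughly three libxl callbacks per epoch, per the list earlier in the
thread:

    1000ms / 20ms epoch        = 50 checkpoints per second per protected domain
    50 epochs/s x ~3 callbacks = ~150 helper<->libxl round trips per second per domain

which scales linearly with the number of protected domains.)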
> > I assume you're not doing this for HVM domains, which involve saving
> > the qemu state each time too.
...
> It includes HVM domains too. Although in that case, xenstore based suspend
> takes about 5ms. So the checkpoint interval is typically 50ms or so.
Right.
> If there is a latency sensitive task running inside
> the VM, lower checkpoint interval leads to better performance.
Yes.
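(Putting the HVM numbers above together:

    1000ms / 50ms epoch      = 20 checkpoints per second for an HVM domain
    5ms suspend / 50ms epoch = ~10% of each epoch spent in the xenstore-based suspend

so shrinking the interval helps latency-sensitive workloads, but the
fraction of each epoch spent suspended grows proportionally.)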
Ian.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel