Re: [Xen-devel] [RFC] use event channel to improve suspend speed
On Friday, 11 May 2007 at 07:55, Keir Fraser wrote:
> On 11/5/07 00:00, "Daniel P. Berrange" <berrange@xxxxxxxxxx> wrote:
> > It would be interesting to know what aspect of the xenstore interaction
> > is responsible for the slowdown. In particular, whether it is a
> > fundamental architectural constraint, or whether it is merely due to the
> > poor performance of the current impl. We already know from previous tests
> > that XenD's impl of transactions absolutely kills performance of various
> > XenD operations due to the vast amount of unnecessary I/O it does.
> >
> > If fixing the xenstored transaction code were to help suspend performance
> > too, it might be a better option than re-writing all code which touches
> > xenstore. A quick test of putting /var/lib/xenstored on a ramdisk would
> > be a way of testing whether it's the I/O which is hurting suspend time.
>
> Yes. We could go either way -- it wouldn't be too bad to add support via
> dynamic VIRQ_DOM_EXC, for example, or add other things to get xenstore off
> the critical path for save/restore. But if the problem is that xenstored
> sucks, it probably is worth investing a bit of time to tackle the problem
> directly and see where the time is going. We could end up with
> optimisations which have benefits beyond just save/restore.

I'm sure xenstore could be made significantly faster, but barring a redesign, perhaps it's better to use it only for low-frequency transactions with fairly loose latency expectations. Routing the suspend notification through xenstore to xend and finally back to xc_save (as the current code does) seems convoluted, and is bound to create opportunities for bad scheduling compared to notifying xc_save directly.

In case there's interest, I'm attaching the two patches I use to speed up checkpointing (and live migration downtime). As I mentioned earlier, the first patch should be semantically equivalent to the existing code, and cuts downtime to about 30-35 ms.
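The point about scheduling can be sketched abstractly. This is a toy illustration, not Xen code -- the path lists and hop counting are invented for the example; only the process names come from the discussion above.

```python
# Toy model of the two suspend-notification paths. Each hop is a
# handoff between processes, and every handoff is a point where the
# dom0 scheduler can delay delivery of the notification.

# Current code: the guest writes to xenstore, xenstored notifies xend,
# and xend finally tells xc_save that the domain is suspended.
XENSTORE_PATH = ["guest", "xenstored", "xend", "xc_save"]

# Patched code: the guest signals xc_save directly over an event channel.
EVTCHN_PATH = ["guest", "xc_save"]

def hops(path):
    """Number of inter-process handoffs a notification makes."""
    return len(path) - 1

print(hops(XENSTORE_PATH))  # 3 handoffs on the current path
print(hops(EVTCHN_PATH))    # 1 handoff with a direct event channel
```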
The second patch notifies xend asynchronously that the domain has been suspended, so that final-round memory copying may begin before stage 2 of device migration. This is a semantic change, but I can't think of a concrete drawback. It's a little rough-and-ready -- suggestions for improvement are welcome.

Here are some stats on final-round time (100 runs):

xen 3.1:
  avg: 93.40 ms, min: 72.59, max: 432.46, median: 85.10
patch 1 (trigger suspend via event channel):
  avg: 43.69 ms, min: 35.21, max: 409.50, median: 37.21
patch 1, /var/lib/xenstored on tmpfs:
  avg: 33.88 ms, min: 27.01, max: 369.21, median: 28.34
patch 2 (receive suspended notification via event channel):
  avg: 4.95 ms, min: 3.46, max: 14.73, median: 4.63
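For reference, summary lines in the format above can be produced with a small helper like the following. The sample values here are made up for illustration; the real figures above came from timing 100 runs each.

```python
import statistics

def summarize(samples_ms):
    """Summarize a list of final-round times (in ms) as (avg, min, max, median)."""
    return (
        sum(samples_ms) / len(samples_ms),
        min(samples_ms),
        max(samples_ms),
        statistics.median(samples_ms),
    )

# Hypothetical samples; a real harness would collect one per suspend/resume run.
samples = [4.1, 3.9, 5.2, 4.6, 14.7]
avg, lo, hi, med = summarize(samples)
print("avg: %.2f ms, min: %.2f, max: %.2f, median: %.2f" % (avg, lo, hi, med))
```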
Attachment: suspend-evtchn.patch
Attachment: subscribe-suspend.patch

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel