Re: [Xen-devel] [PATCH RFC 00/20] Add postcopy live migration support
On 27/03/2017 10:06, Joshua Otto wrote:
> Hi,
>
> We're a team of three fourth-year undergraduate software engineering
> students at the University of Waterloo in Canada.  In late 2015 we posted
> on the list [1] to ask for a project to undertake for our program's
> capstone design project, and Andrew Cooper pointed us in the direction of
> the live migration implementation as an area that could use some
> attention.  We were particularly interested in post-copy live migration
> (as evaluated by [2] and discussed on the list at [3]), and have been
> working on an implementation of this on-and-off since then.
>
> We now have a working implementation of this scheme, and are submitting
> it for comment.  The changes are also available as the 'postcopy' branch
> of the GitHub repository at [4].
>
> As a brief overview of our approach:
> - We introduce a mechanism by which libxl can indicate to the libxc
>   stream helper process that the iterative migration precopy loop should
>   be terminated and postcopy should begin.
> - At this point, we suspend the domain, collect the final set of dirty
>   pfns and write these pfns (and _not_ their contents) into the stream.
> - At the destination, the xc restore logic registers itself as a pager
>   for the migrating domain, 'evicts' all of the pfns indicated by the
>   sender as outstanding, and then resumes the domain at the destination.
> - As the domain executes, the migration sender continues to push the
>   remaining outstanding pages to the receiver in the background.  The
>   receiver monitors both the stream for incoming page data and the paging
>   ring event channel for page faults triggered by the guest.  Page faults
>   are forwarded on the back-channel migration stream to the migration
>   sender, which prioritizes these pages for transmission.
>
> By leveraging the existing paging API, we are able to implement the
> postcopy scheme without any hypervisor modifications - all of our changes
> are confined to the userspace toolstack.  However, we inherit from the
> paging API the requirement that the domains be HVM and that the host have
> HAP/EPT support.

Wow.  Considering that the paging API has had no in-tree consumers (and
its out-of-tree consumer folded), I am astounded that it hasn't
bitrotten.

>
> We haven't yet had the opportunity to perform a quantitative evaluation
> of the performance trade-offs between the traditional pre-copy and our
> post-copy strategies, but intend to.  Informally, we've been testing our
> implementation by migrating a domain running the x86 memtest program
> (which is obviously a tremendously write-heavy workload), and have
> observed a substantial reduction in total time required for migration
> completion (at the expense of a visually obvious 'slowdown' in the
> execution of the program).

Do you have any numbers, even for this informal testing?

> We've also noticed that, when performing a postcopy without any leading
> precopy iterations, the time required at the destination to 'evict' all
> of the outstanding pages is substantial - possibly because there is no
> batching mechanism by which pages can be evicted - so this area in
> particular might require further attention.
>
> We're really interested in any feedback you might have!

Do you have a design document for this?  The spec modifications and code
comments are great, but there is no substitute (as far as understanding
goes) for a description in terms of the algorithm and design choices.
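To make the transmission-ordering idea in the overview concrete, here is a
toy sketch of a demand queue consulted ahead of the background sweep of
outstanding pages.  Every name in it (struct postcopy_sender, record_fault(),
next_pfn_to_send(), DEMAND_RING_SIZE) is invented for illustration and is not
the structure actually used in the series:

    /* Toy model of the sender's transmission order during postcopy: pfns
     * that the receiver has reported as faulted are sent before the
     * background sweep of the remaining outstanding set.  All names are
     * hypothetical. */
    #include <stddef.h>
    #include <stdint.h>

    #define INVALID_PFN      (~0ULL)
    #define DEMAND_RING_SIZE 1024u

    struct postcopy_sender {
        /* FIFO of pfns the receiver has faulted on, fed by the back-channel.
         * A real implementation would bound and deduplicate this; the toy
         * version doesn't. */
        uint64_t demand[DEMAND_RING_SIZE];
        unsigned int demand_head, demand_tail;

        /* Ordered list of pages still to be pushed in the background. */
        const uint64_t *outstanding;
        size_t nr_outstanding, background_cursor;
    };

    /* Called when a fault notification arrives on the back-channel stream. */
    static void record_fault(struct postcopy_sender *s, uint64_t pfn)
    {
        s->demand[s->demand_tail++ % DEMAND_RING_SIZE] = pfn;
    }

    /* Pick the next pfn to transmit: demand-faulted pages win over the
     * background push, so the guest vcpu blocked on the fault is unblocked
     * as soon as possible. */
    static uint64_t next_pfn_to_send(struct postcopy_sender *s)
    {
        if ( s->demand_head != s->demand_tail )
            return s->demand[s->demand_head++ % DEMAND_RING_SIZE];

        if ( s->background_cursor < s->nr_outstanding )
            return s->outstanding[s->background_cursor++];

        return INVALID_PFN; /* nothing left - the postcopy phase is done */
    }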
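The eviction cost noted above is easier to see with the shape of the libxc
paging calls in front of you: each outstanding gfn has to be nominated and
evicted individually.  A rough sketch only, assuming paging has already been
enabled on the domain, with hypothetical evict_outstanding()/outstanding_pfns
names - check xenctrl.h for the exact xc_mem_paging_* signatures:

    /* Rough sketch: evict the sender's outstanding pfns one at a time
     * through the libxc paging interface.  Assumes xc_mem_paging_enable()
     * has already registered the restore process as the domain's pager;
     * the real restore-side code in this series will differ. */
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <xenctrl.h>

    static void evict_outstanding(xc_interface *xch, uint32_t domid,
                                  const uint64_t *outstanding_pfns,
                                  size_t nr_outstanding)
    {
        size_t i, evicted = 0;

        for ( i = 0; i < nr_outstanding; i++ )
        {
            uint64_t gfn = outstanding_pfns[i];

            /* Each page is nominated and evicted individually - there is
             * no batched variant of these libxc calls, which is why the
             * eviction time grows with the size of the outstanding set. */
            if ( xc_mem_paging_nominate(xch, domid, gfn) ||
                 xc_mem_paging_evict(xch, domid, gfn) )
                continue;   /* e.g. the page is already absent; skip it */

            evicted++;
        }

        fprintf(stderr, "evicted %zu of %zu outstanding pages\n",
                evicted, nr_outstanding);
    }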
~Andrew