Xen project Mailing List

RE: [Xen-API] cross-pool migrate, with any kind of storage (shared or local)

To: Dave Scott <Dave.Scott@xxxxxxxxxxxxx>

From: Daniel Stodden <daniel.stodden@xxxxxxxxxx>

Date: Mon, 18 Jul 2011 11:43:24 -0700

Cc: "xen-api@xxxxxxxxxxxxxxxxxxx" <xen-api@xxxxxxxxxxxxxxxxxxx>

Delivery-date: Mon, 18 Jul 2011 11:43:40 -0700

List-id: Discussion of API issues surrounding Xen <xen-api.lists.xensource.com>

On Mon, 2011-07-18 at 06:25 -0400, Dave Scott wrote:
> Hi Daniel,
>
> Thanks for your thoughts. I think I'll add a wiki page later to describe the
> DRBD-based design idea and we can list some pros and cons of each perhaps.
>
> I'm still not a DRBD expert but I've now read the manual and configured it a
> few times (where 'few' = 'about 3') :)
>
> Daniel wrote:
> > If only FS integrity matters, you can run a coarser series of updates,
> > for asynchronous mirroring. I suspect DRBD to do at least sth like that
> > (I'm not a DRBD expert either). I'm not sure if the asynchronous mode I
> > see on the feature list allows for conclusions on DRBD's idea of HA in
> > any way. It may just limit HA to being synchronous mode. Does anyone
> > know?
>
> It seems that DRBD can operate in 3 different synchronization modes:
>
> 1. fully synchronous: writes are ACK'ed only when written to both disks
> 2. asynchronous: writes are ACK'ed when written to the primary disk (data is
>    somewhere in-flight to the secondary)
> 3. semi-synchronous: writes are ACK'ed when written to the primary disk and
>    in the memory (not disk) of the secondary
>
> Apparently most people run it in fully synchronous mode over a fast
> LAN. Provided we could get DRBD to flush outstanding updates and
> guarantee that the two block devices are identical during the
> migration downtime when the domain is shutdown, I guess we could use
> any of these methods. Although if fully synchronous is the most common
> option, we may want to stick with that?

Are we still talking about storage migration, or mirroring applications?

Well, these semantics only really make sense for a disk pair starting
from mirrored state. So what are the semantics for a pair which has just
been created? Does DRBD default to anything while performing an initial
sync at start of day? Ideally, it would stay asynchronous and wait for
WWS convergence.
Synchronous mode in that state doesn't buy you anything; it's just going
to produce seek overhead. Durability is only useful if you actually have
something consistent to switch over to in the failure case.

The normal way of doing linear passes through a bitmap (what memory
migration does) makes a lot of sense for storage, because it's naturally
elevating through the block list. In terms of DRBD consistency
guarantees, that's fully asynchronous.

Until stop/copy is reached, the question is whether you're converging
smoothly. With a sane network, sane guest and local storage, it typically
will.

A migration smoke test usually comprises a diabolic workload to prove
correctness under worst-case scenarios. Does DRBD use a transfer block
size above sector size? Almost certainly yes. I'd suggest two cases,
random and linear writes, small sizes at a stride which equals block
size, and see what happens.

It will likely need to throttle guest throughput. In tapdisk, we've got
some work on rate limiting on trunk now; it might fit in.

If you don't want to deal with enforcing eventual termination, yeah, I
guess a mkfs, as George suggests, is a decent scenario too.

Daniel

> > Anyway, it's not exactly a rainy weekend project, so if you want
> > consistent mirroring, there doesn't seem to be anything better than
> > DRBD around the corner.
>
> It did rain this weekend :) So I've half-written a python module for
> configuring and controlling DRBD:
>
> https://github.com/djs55/drbd-manager
>
> It'll be interesting to see how this performs in practice. For some realistic
> workloads I'd quite like to measure
> 1. total migration time
> 2. total migration downtime
> 3. ...
>    effect on the guest during migration (somehow)
>
> For (3) I would expect that continuous replication would slow down guest I/O
> more during the migrate than explicit snapshot/copy (as if every I/O
> performed a "mini snapshot/copy") but it would probably improve the downtime
> (2), since there would be no final disk copy.
>
> What would you recommend for workloads / measurements?
>
> > In summary, my point is that it's probably better to focus on migration
> > only - it's one flat dirty log index and works in-situ at the block
> > level. Beyond, I think it's perfectly legal to implement mirroring
> > independently -- the math is very similar, but the difference make for
> > huge impact on performance, I/O overhead, space to be set aside, and
> > robustness.
>
> Thanks,
> Dave
>
> >
> > Cheers,
> > Daniel
> >
> > [PS: comments/corrections welcome, indeed].
> >
> > > 3. use the VM metadata export/import to move the VM metadata between
> > >    pools
> > >
> > > I'd also like to
> > > * make the migration code unit-testable (so I can test the failure
> > >   paths easily)
> > > * make the code more robust to host failures by host heartbeating
> > > * make migrate properly cancellable
> > >
> > > I've started making a prototype-- so far I've written a simple python
> > > wrapper around the iscsi target daemon:
> > >
> > > https://github.com/djs55/iscsi-target-manager
> > >
> > > _______________________________________________
> > > xen-api mailing list
> > > xen-api@xxxxxxxxxxxxxxxxxxx
> > > http://lists.xensource.com/mailman/listinfo/xen-api
> >
>

_______________________________________________
xen-api mailing list
xen-api@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/mailman/listinfo/xen-api
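
The linear bitmap passes and the strided smoke test described in the
reply above can be sketched as a small simulation. This is a toy model
under stated assumptions, not DRBD, tapdisk or xapi code: the names
precopy, linear_writer and random_writer are hypothetical, the disk is
assumed to start fully unsynchronised, and guest speed is modelled as a
fixed number of guest writes per copied block (guest_writes_per_copy).
Each guest write is small, but it dirties the whole transfer block it
lands in, which is why a stride equal to the block size is the worst
case.

```python
import random


def linear_writer(num_sectors, stride):
    """Hypothetical guest: one small write every `stride` sectors, wrapping."""
    offset = 0

    def next_sector():
        nonlocal offset
        sector = offset
        offset = (offset + stride) % num_sectors
        return sector

    return next_sector


def random_writer(num_sectors, stride, seed=0):
    """Hypothetical guest: small writes at random stride-aligned offsets."""
    rng = random.Random(seed)
    slots = num_sectors // stride
    return lambda: rng.randrange(slots) * stride


def precopy(num_blocks, sectors_per_block, next_sector,
            guest_writes_per_copy, stop_threshold=8, max_passes=30):
    """Repeat linear passes over a dirty bitmap until few blocks remain.

    Returns (history, converged): history holds the dirty-block count seen
    at the start of each pass; converged is True once that count is at or
    below stop_threshold, i.e. the point where stop/copy could begin.
    """
    # Assumption: a freshly created pair, so every block needs copying.
    dirty = bytearray([1]) * num_blocks
    history = []
    for _ in range(max_passes):
        remaining = sum(dirty)
        history.append(remaining)
        if remaining <= stop_threshold:
            return history, True
        credit = 0.0
        for block in range(num_blocks):      # one linear pass
            if not dirty[block]:
                continue
            dirty[block] = 0                 # block copied to the mirror
            credit += guest_writes_per_copy  # guest keeps running meanwhile
            while credit >= 1.0:
                # A small write re-dirties the whole transfer block.
                dirty[next_sector() // sectors_per_block] = 1
                credit -= 1.0
    return history, False


if __name__ == "__main__":
    blocks, spb = 1024, 8                    # stride == block size below
    for name, writer, rate in [
        ("linear, slow guest", linear_writer(blocks * spb, spb), 0.5),
        ("random, fast guest", random_writer(blocks * spb, spb), 2.0),
    ]:
        history, ok = precopy(blocks, spb, writer, rate, max_passes=10)
        print(name, "converged" if ok else "did not converge", history)
```

In this model, lowering guest_writes_per_copy plays the role of
throttling guest throughput: a guest that dirties blocks faster than
they can be copied never reaches the stop/copy threshold, so termination
has to be enforced some other way.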
