
Re: [Xen-users] How to list the drives a DomU is using, and their types -- with or without a partition -- from the Dom0?


  • To: xen-users@xxxxxxxxxxxxx
  • From: "Austin S. Hemmelgarn" <ahferroin7@xxxxxxxxx>
  • Date: Wed, 17 Feb 2016 10:32:48 -0500
  • Delivery-date: Wed, 17 Feb 2016 15:34:59 +0000
  • List-id: Xen user discussion <xen-users.lists.xen.org>

On 2016-02-16 20:08, Adam Goryachev wrote:
On 17/02/16 02:06, Simon Hobson wrote:
billb@xxxxxxx wrote:

I'd thought that Xen 'figured out' what kind of disk structure
it's mounting.
The "disk structure" as far as Xen is concerned is ... a "disk".
Just a collection of blocks.

I thought that snaps had to remain 'in' the VG, and they can get
kind of large.
A snapshot is simply a virtual image of the volume at a point in
time. In terms of what you see when you look into it, it is exactly
the same size as the volume - though the underlying representation
of it will grow over time (starting from near enough zero space).
So if you wanted to back up that snapshot, you'd treat it like a
disk and image it - what you do with the image is then up to you.
When you are done, remember to delete the snapshot or it will slow
 performance* and eat disk space*.

A couple of caveats though. If you snapshot the volume, what you
see is the same as if you unceremoniously yanked the power cord on
a physical machine - anything still held in the guest's dirty cache
will not be there. You can minimise this by triggering the guest to
sync its cache, and of course using a journaled filesystem - but
you cannot completely eliminate it.
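As a rough sketch of one way to do that quiescing (a Linux guest
reachable over ssh is assumed, and the names 'guest1' and
/dev/vg0/guest1-disk are made up for illustration):

# inside the guest: flush dirty data and freeze the root filesystem
ssh root@guest1 'sync && fsfreeze -f /'
# on the dom0: take the snapshot while the guest FS is quiesced
lvcreate -s -n guest1-snap -L 2G /dev/vg0/guest1-disk
# unfreeze the guest as soon as the snapshot exists
ssh root@guest1 'fsfreeze -u /'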

The way some virtualisation & backup systems handle this is (to
simplify "a bit") to make the combination of hypervisor, guest OS,
and backup system intimately aware of each other so that the backup
system can copy what the guest has, not what it has already written
to disk.

Personally, I work the other way - I do backups from within the
guest using rsync to maintain a clone of the filesystem as seen by
the guest on one central device. From there I generate multiple
generations with file de-dupe etc as a separate off-line task. One
area where if you ask 10 people you'll get 11 opinions :-)
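For illustration, that in-guest approach boils down to something like
the following (the host name 'backuphost' and the target path are
made-up examples):

# run inside the guest; push a clone of the root filesystem to a
# central backup host; --one-file-system keeps /proc, /sys, /dev and
# other pseudo-filesystems out of the copy (separately mounted
# filesystems need their own rsync runs)
rsync -aAXH --delete --one-file-system / backuphost:/srv/backups/guest1/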

* It's worth considering the way snapshots work. AIUI in LVM, when
you make a snapshot, what is on-disk is kept as the "before image".
Any writes then go into a separate "after image" area which starts
at zero and grows. Anything accessing the snapshot gets the before
image data, anything accessing the live volume gets the data from
the after image if there is a corresponding block, or the before
image data if it's not been altered. Over time the after-image data
set grows - and performance will be affected as the data gets
fragmented. When you destroy the snapshot, any data in the after
image is then copied into the main volume and after that the after
image is deleted.

Actually, I'm pretty sure LVM simply discards the "before image"
blocks that have been modified in the "after image", it doesn't
actually relocate the after image blocks back to the original
location. (At least, discarding a snapshot is too quick for it to be
doing this, so maybe if the origin LV is written, it copies the
"before image" to a new location, and then modifies the before image
location with the new data).
In LVM, snapshots are implemented via copy-on-write with some
indirection for the changed data. The origin volume is not changed at
all by the snapshot process other than being tagged as an origin: all
of its blocks stay exactly where they are, and are treated almost the
same as if it weren't a snapshot origin. As changes are made to the
origin volume, the versions of the changed blocks from the time the
snapshot was created get written to space in the snapshot volume, and
an indirection is added in the mapping so that access to the snapshot
references those blocks instead of the ones in the origin volume, thus
preserving the content of the volume as it was when it was snapshotted.
The same is done when a block in the snapshot is written to (assuming
it's a writable snapshot), except that the origin's copy is copied to
the snapshot and then updated there.

As an example, assume a simple setup with an origin LV 'origin' with 4
physical extents (we'll call them e1 through e4) which just had a
snapshot created called 'snap'. Prior to making any changes to the
origin volume, all four extents are referenced directly by the snapshot.
The mapping might be represented something like this:

origin: e1 e2 e3 e4
snap: ^^ ^^ ^^ ^^

Assume we then try to write something to extent e1 in the origin. This
will cause the device-mapper to copy the contents of e1 from the origin
to the snapshot, update the snapshot to point at its own copy of e1,
and then write the changes on the origin volume. After such a change,
the mapping could be represented like so:

origin: e1' e2 e3 e4
snap: e1 ^^ ^^ ^^

If we then decide to change e4 in the snapshot, this will cause the
contents of e4 in the origin to be copied to the snapshot, the snapshot
mapping to be updated to point at its own copy of e4, and then the
changes will get written. After this, the mapping would be:

origin: e1' e2 e3 e4
snap: e1 ^^ ^^ e4'

If we then decide to modify e4 in the origin, or modify e1 in the snapshot, they get modified in place after a simple check that detects that the other copy has already diverged. This check is performed only once for the snapshot (checking against the origin), but once per snapshot for the origin (checking against each snapshot). As a result, write performance on the origin volume degrades in proportion to the number of snapshots of that volume. The design does, however, mean that creating and deleting snapshots are relatively inexpensive operations, which can be very important on big systems.

Secondly, this also shows why snapshots can start out small, and why the practical upper bound on their size is around 105% of the size of the origin (the extra 5% is for the metadata that stores the mappings).
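As an illustrative aside (the names 'vg0' and 'origin' are made up),
this is also why a snapshot can be created much smaller than its
origin and grown as its copy-on-write area fills:

# allocate a 1G COW area for a snapshot of a much larger origin LV
lvcreate -s -n snap -L 1G vg0/origin
# the Data% column of 'lvs' shows how full the COW area is
lvs vg0
# extend it before it fills completely, or the snapshot gets invalidated
lvextend -L +1G vg0/snap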

It is possible to merge a snapshot back into an origin volume, but this
gets dangerous if both the origin and snapshot have been modified since the snapshot was taken. There is no way to determine which side is correct in such a case, so there is significant risk of ending up with a horribly broken filesystem.
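For reference, the merge itself is done with lvconvert (the names below
are made-up examples); if either volume is in use, the merge is
deferred until they are next activated:

# roll the origin back to the state captured in the snapshot
lvconvert --merge vg0/snap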
I would not have thought that you could safely mount a snapshot. If
 you try mounting it read-only then you have a dirty filesystem. If
you mount it read-write then fsck will try and fix it, which then
means you have two different systems trying to write to a volume. I
don't know if LVM will handle this and prevent the host access from
corrupting the guest's filesystem.
You can mount a snapshot RW, and LVM will handle keeping any writes
separate from the "origin image" copy.
This is correct, and is sometimes used for creating stable filesystem images for backups. In such usage, the typical workflow is:
1. Take a snapshot of a mounted, in-use LV.
2. Run fsck on the snapshot to make sure the FS is clean.
3. Mount the snapshot read-only, and run whatever backup software against this instead of against the origin volume.
4. Unmount and delete the snapshot.
Some backup software actually has built-in functionality to do this automatically, and for some filesystems (like XFS), you don't even need the fsck. This is how I used to manage backups on my systems before I switched to BTRFS (which has the snapshot functionality built in, so I don't need LVM for it), and it works well provided you can handle the small performance degradation while running the backup.
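A minimal sketch of that workflow as shell commands (the names 'vg0'
and 'data' and the paths are made-up examples, and step 2 assumes an
ext4 filesystem):

# 1. snapshot the mounted, in-use LV
lvcreate -s -n data-snap -L 2G vg0/data
# 2. the snapshot looks like a crashed filesystem, so let fsck replay
#    the journal and fix anything minor
fsck.ext4 -p /dev/vg0/data-snap
# 3. mount read-only and point the backup job at the mount point
mount -o ro /dev/vg0/data-snap /mnt/backup-src
rsync -a /mnt/backup-src/ backuphost:/srv/backups/data/
# 4. clean up
umount /mnt/backup-src
lvremove -f vg0/data-snap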

This is also sometimes used for multiple instantiation of VMs from a base image, by creating a snapshot for each VM which is then used as the backing device for that VM, so the base image doesn't get modified. I know a couple of people who use this for the root filesystem for driver domains on Xen.
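A hedged illustration of that pattern (all names invented): each VM
gets its own snapshot of a shared, read-only base image, and the domU
config points at the snapshot rather than the base LV:

# per-VM copy-on-write view of the shared base image
lvcreate -s -n driverdom1-root -L 4G vg0/base-root
# in the domU config (e.g. /etc/xen/driverdom1.cfg):
#   disk = [ 'phy:/dev/vg0/driverdom1-root,xvda,w' ]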

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users


 

