Re: [Xen-devel] LVM Snapshot Troubles
On Tue, Sep 28, 2004 at 04:43:25PM +0100, Ian Pratt wrote:
> There's nothing in slabinfo that looks crazy. I wonder where all
> your memory is gone? BTW: how big is your dom0?
>
> It's possible that dm-io or kcopyd is chewing up pages (which
> won't show up in the slab allocator). I'm surprised they're not
> just transient, though.

When I've run into memory trouble with snapshots, I've always seen a stack backtrace that points me at kcopyd_client_create. Following the code: when a snapshot is created, a new kcopyd client is created with 256 pages (SNAPSHOT_PAGES in dm-snap.c) dedicated to that snapshot, which is 1 MB with 4 KB pages. I think I managed to dig up the logs from one of the failures I've seen; I've attached them to this message.

The problem seems to be made worse by the fact that all 256 pages are allocated in a fairly short span of time, and (at least this is my guess) the allocation fails even when the kernel could free up the necessary memory with a bit more work. (I've been able to create many more snapshots before running into trouble if I make sure the kernel has a bit of extra free memory before each lvcreate call--using dd to create a file of several megabytes, then deleting it to free up that space in the page cache.)

As has been noted, LVM doesn't have a very graceful failure mode when this memory allocation problem is hit--I lose access to all the snapshots when that happens.

I have also found that I can use dmsetup to create the COW devices myself, which did at least (if I'm remembering correctly--this was a while ago) have the benefit that if one snapshot failed, the others were still available. Basically, I used the same setup that LVM normally would, except that I didn't create a snapshot-origin device layered over the original device (this is what intercepts writes to the source device and propagates a copy of the original data to each snapshot, if needed). Without that layer, the origin has to be treated as read-only. Doing this manually isn't ideal, however.
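
To make the numbers and the manual setup above a bit more concrete, here is a rough Python sketch. It is only an illustration under stated assumptions, not the commands actually used: the 4 KB page size, the device paths, the sector count, and the chunk size of 16 sectors are all hypothetical. The table line follows the device-mapper snapshot target format (`snapshot <origin> <COW device> <persistent?> <chunksize>`), and no snapshot-origin device is involved, so the origin must stay read-only.

```python
# Sketch only: estimates kcopyd memory reserved by snapshots and builds a
# device-mapper table line for a snapshot target (no snapshot-origin layer).

PAGE_SIZE = 4096       # bytes; assumption: 4 KB pages
SNAPSHOT_PAGES = 256   # pages per kcopyd client, per dm-snap.c as cited above


def kcopyd_reserved_bytes(num_snapshots, pages=SNAPSHOT_PAGES, page_size=PAGE_SIZE):
    """Memory dedicated to kcopyd clients for the given number of snapshots."""
    return num_snapshots * pages * page_size


def snapshot_table(origin_dev, cow_dev, origin_sectors,
                   persistent=True, chunk_sectors=16):
    """Build one table line: <start> <length> snapshot <origin> <cow> <P|N> <chunk>.

    origin_sectors is the origin size in 512-byte sectors; chunk_sectors is a
    hypothetical default chosen for illustration.
    """
    flag = "P" if persistent else "N"
    return f"0 {origin_sectors} snapshot {origin_dev} {cow_dev} {flag} {chunk_sectors}"


if __name__ == "__main__":
    # Ten snapshots reserve ten times the per-snapshot amount.
    print(kcopyd_reserved_bytes(10), "bytes reserved for 10 snapshots")
    # Hypothetical devices: a 1 GB origin (2097152 sectors) and one COW volume.
    print(snapshot_table("/dev/vg0/base", "/dev/vg0/cow1", 2097152))
```

A line like the one printed could then be handed to `dmsetup create <name> --table "<line>"`, once per snapshot, so that a failure to set up one snapshot does not take the others down with it.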
Improvements that I think could be made:

- Change the dm-snapshot driver in the kernel to (optionally?) allocate less memory for each snapshot, and fail more gracefully if unable to allocate the memory.
- Adjust the LVM userspace tool to fail more gracefully if the device mapper driver gives an out-of-memory error.
- Add an option to LVM for snapshots with a read-only origin (as I was doing manually with dmsetup).

--Michael Vrable

Attachment:
lvm_error_log