
Re: [Xen-devel] Making snapshot of logical volumes handling HVM domU causes OOPS and instability



On Sunday 12 September 2010 20:48:09 Scott Garron wrote:
> On 9/12/2010 5:41 AM, J. Roeleveld wrote:
> > I also use LVMs extensively and do similar steps for backups.
> > 1) umount in domU
> > 2) block-detach
> > 3) lvcreate snapshot
> > 4) block-attach
> > 5) mount in domU
> 
>       I think the biggest difference, here, is that you unmount and
> detach the source volumes before creating the snapshot whereas I just
> leave them active and mounted in the guest.  I don't know if that will
> end up being the difference between stability and instability on my
> system, but it's an observation and probably worth experimentation.

I tend to umount first to ensure the filesystem is consistent and no writes are 
still left in the write-buffer on the guest.
Filesystem recoveries are fine, but why rely on them when it's not necessary? 
:)
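For reference, the umount-first order (the five steps quoted above) could be scripted roughly like this. It is a minimal sketch, not my actual script. It assumes Python 3 in dom0, an SSH login into the guest, and the `xm` toolstack. The domain name, guest host, volume group, LV, device name, mount point and snapshot size are hypothetical placeholders. The exact `xm block-detach`/`block-attach` argument forms may differ between Xen versions, so check `xm help` before trusting it. By default it only prints the commands.

```python
import shlex
import subprocess


def backup_plan(domain, guest_host, vg, lv, frontend_dev, mountpoint,
                snap_size="1G"):
    """Return the five backup steps, in order, as argv lists.

    All arguments are hypothetical placeholders supplied by the caller.
    """
    origin = f"/dev/{vg}/{lv}"
    guest_dev = f"/dev/{frontend_dev}"
    return [
        # 1) unmount in the domU so the guest flushes its write buffers
        ["ssh", guest_host, f"umount {shlex.quote(mountpoint)}"],
        # 2) detach the virtual disk from the running domain
        ["xm", "block-detach", domain, frontend_dev],
        # 3) create the snapshot while the guest is not using the LV
        ["lvcreate", "-s", "-L", snap_size, "-n", f"{lv}-snap", origin],
        # 4) re-attach the original LV read-write under the same device name
        ["xm", "block-attach", domain, f"phy:{origin}", frontend_dev, "w"],
        # 5) mount it again inside the domU
        ["ssh", guest_host,
         f"mount {shlex.quote(guest_dev)} {shlex.quote(mountpoint)}"],
    ]


def run_plan(plan, dry_run=True):
    """Print each step; only execute the commands when dry_run is False."""
    for argv in plan:
        print(" ".join(shlex.quote(arg) for arg in argv))
        if not dry_run:
            # check=True stops at the first failing step instead of carrying on
            subprocess.run(argv, check=True)
```

Example: `run_plan(backup_plan("mydomu", "root@mydomu", "vg0", "data", "xvdb", "/srv/data"))` prints the five commands. Keeping the plan separate from execution makes it easy to inspect before pointing it at a live domain.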

> > I, however, have no need for HVM and only use PV guests.
> 
>       It turns out that it doesn't seem isolated to HVM guests on my
> system any longer.  That was just coincidental during the first few
> crashes that I observed.

OK, then I suspect the issue might be related to the LVM stack and the way Xen 
holds the devices locked while they are mounted and attached.

> > Are you certain the snapshots are large enough to hold all possible
> > changes that might occur on the LV during the existence of the
> > snapshot?
> 
>       Certainly.  The most recent one to cause a crash has existed
> through the crash and for 3 days now, and is only using 2.65% of its COW
> space.  They usually don't get a chance to go above even 0.3% before the
> rsync on them is finished and they are unmounted and removed by the
> backup script.

Ok, guess that's not the cause :)
That said, I get the "unable to remove active" error both when 0% is used and 
when over 20% is used, so there is no clear indication (to me) of what causes it.
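On the question of whether the snapshots are large enough: COW usage can be checked from dom0 with `lvs`. A small sketch follows. It assumes Python 3.7+ and that your LVM version supports the `snap_percent` report field (check `lvs -o help`); the output layout may vary between versions. `read_snap_usage`, `parse_snap_usage` and `snapshots_over` are hypothetical helpers, and the volume group name and threshold are up to you.

```python
import subprocess


def parse_snap_usage(lvs_output):
    """Map snapshot LV name -> percentage of its COW space in use."""
    usage = {}
    for line in lvs_output.splitlines():
        fields = line.split()
        # Non-snapshot LVs leave the snap_percent column empty,
        # so they split into a single field and are skipped here.
        if len(fields) == 2:
            name, pct = fields
            usage[name] = float(pct)
    return usage


def snapshots_over(usage, threshold):
    """Return the names of snapshots at or above threshold percent, sorted."""
    return sorted(name for name, pct in usage.items() if pct >= threshold)


def read_snap_usage(vg):
    """Run lvs for one volume group and parse its snapshot usage."""
    out = subprocess.run(
        ["lvs", "--noheadings", "-o", "lv_name,snap_percent", vg],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_snap_usage(out)
```

Logging this before each `lvremove` would at least show whether the removal errors line up with COW usage.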

> > Another thing I notice, which might be of help to people who
> > understand this better than I do: in my backup script, step "5"
> > sometimes fails because the domU hasn't noticed that the device is
> > attached again when I try to mount it. The domU commands are run
> > over SSH connections.
> 
>       That probably just has to do with variations in how long it takes
> the guest kernel to poll or be notified of device changes, and how long
> it takes for its udev to create the device files and whatnot.
> Introducing some sanity checks or just a longer delay in your backup
> script would likely get around that problem.  (I could be wrong, though)

I do need to add some sanity checks to the script at some point, but 
currently I start these manually and 'fix' the leftovers myself.
The mount issue is a simple one, and I notice it within 30-40 seconds of the 
script starting.
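A sanity check along the lines suggested above could poll for the device node in the domU before mounting, instead of sleeping for a fixed time. This is a minimal sketch, assuming Python 3 is available in the guest and the check is run there (for example over the same SSH connection). The device path and the 30-second timeout are hypothetical values; `wait_for` and `wait_for_device` are illustrative helpers.

```python
import os
import time


def wait_for(predicate, timeout=30.0, interval=0.5):
    """Poll predicate() until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def wait_for_device(path, timeout=30.0, interval=0.5):
    """Wait until udev has created the device node (hypothetical helper)."""
    return wait_for(lambda: os.path.exists(path), timeout, interval)
```

For example, `if not wait_for_device("/dev/xvdb"): raise SystemExit("device did not appear")` before running `mount`. A non-zero exit here is easier to spot in a backup log than a failed mount.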

--
Joost

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel

