
Re: [Xen-API] sharing NFS SRs


  • To: xen-api@xxxxxxxxxxxxx
  • From: George Shuklin <george.shuklin@xxxxxxxxx>
  • Date: Sat, 26 May 2012 13:50:53 +0400
  • Delivery-date: Sat, 26 May 2012 09:50:55 +0000
  • List-id: User and development list for XCP and XAPI <xen-api.lists.xen.org>


On 26.05.2012 12:57, Dave Scott wrote:
Hi,

IMHO one of the weaknesses of the current NFS SR backend in XCP is that a 
single SR cannot be shared between pools. This is because the backend relies on 
the xapi pool framework to prevent:

1. multiple hosts from coalescing the same vhds.

2. the same vhd being attached to two VMs at the same time.

3. a vhd being read on one node even after it has been coalesced and deleted 
on another.

If multiple pools could safely share the same NFS SR then a cross-pool migrate 
(which is possible with the current code) wouldn't have to actually mirror the 
disks.

With this in mind I've been looking into NFS locking again. I realize this is 
a... tricky thing to get right... and google turns up lots of horror stories. 
Anyway, here's what I was thinking:

For handling (1) and (2), we would only need one lock file (really a "lease 
file") per vhd. In the event of a network interruption we already know that running 
VMs are likely to fail after 90s or so -- the maximum time (IIRC) a Windows VM will allow 
a page file write to take. So we could

* explicitly tell tapdisk to shut down after this long (since the VM will 
probably have blue-screened anyway)

* periodically refresh our leases, setting them to expire well after the 
tapdisks are guaranteed to have shut down

So if a host leaves the network, all disks become unlocked a few minutes later and the 
VMs (and coalesce jobs) can safely be restarted on another pool. This could then be used 
as the foundation for a new "HA" feature, where only VMs whose I/Os have failed 
are shut down and restarted.
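
To make the timing concrete, here is a rough, untested sketch (not a real 
design -- all names and constants are illustrative) of how a host could claim 
and refresh a per-vhd lease file over NFS, using the classic hard-link trick, 
since link() is atomic over NFS while O_EXCL is not reliable on older clients:

    import os
    import socket
    import time

    LEASE_LIFETIME = 300    # expiry set well after tapdisks (told to shut
                            # down after ~90s) are guaranteed to be dead

    def try_acquire_lease(lease_path):
        # Write a uniquely-named claim file containing our expiry time...
        claim = "%s.%s.%d" % (lease_path, socket.gethostname(), os.getpid())
        with open(claim, "w") as f:
            f.write("%f\n" % (time.time() + LEASE_LIFETIME))
        try:
            # ...then hard-link it to the well-known lease name.  NFS can
            # report a spurious error after a retransmit, so trust the
            # resulting link count rather than the exception.
            try:
                os.link(claim, lease_path)
            except OSError:
                pass
            return os.stat(claim).st_nlink == 2
        finally:
            os.unlink(claim)

    def refresh_lease(lease_path):
        # Run periodically (e.g. once a minute) while we hold the lease,
        # pushing the expiry another LEASE_LIFETIME seconds into the future.
        with open(lease_path, "w") as f:
            f.write("%f\n" % (time.time() + LEASE_LIFETIME))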

From an implementation point of view, this Python library looks pretty good:

http://bazaar.launchpad.net/~barry/flufl.lock/trunk/view/head:/flufl/lock/_lockfile.py
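
For what it's worth, with that library the per-vhd lease lifecycle would look 
roughly like this (the lock path is made up, and the lifetime is just the 
"well past tapdisk shutdown" value discussed above):

    from datetime import timedelta
    from flufl.lock import Lock

    # One lease ("lock") per vhd; the lifetime must comfortably outlast
    # the ~90s window in which tapdisk is told to shut itself down.
    lease = Lock('/var/run/sr-mount/SR_UUID/VHD_UUID.lease',
                 lifetime=timedelta(minutes=5))
    lease.lock()
    try:
        # ... attach the vhd and start tapdisk ...
        lease.refresh()   # repeat periodically, e.g. once a minute
    finally:
        lease.unlock()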

I'm not totally sure how to handle (3): would it be sufficient to periodically 
reopen the vhd chain in tapdisk, or to just handle the error when a read fails 
and reopen the chain then?
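
The second option would amount to something like this in the datapath (pure 
pseudologic, not tapdisk code -- tapdisk is C, and read/reopen here are 
invented names):

    def read_sectors(chain, offset, length):
        try:
            return chain.read(offset, length)
        except IOError:
            # A parent vhd may have been coalesced and deleted under us:
            # reopen and revalidate the whole chain, then retry once.
            chain.reopen()
            return chain.read(offset, length)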


I'm somewhat afraid of the idea of 'leasing' operations (and of periodic open/close operations).

Here are some scenarios to think about:

1) Temporary loss of the host's SAN connectivity. NFS on the host goes into an interruptible sleep and will continue IO as soon as connectivity comes back. Meanwhile we have already killed tapdisk, removed the lease, and restarted the VM on another host -- and then networking suddenly revives, and a pending NFS write operation goes straight into the middle of a 'mission critical' database, carrying stale, 'week past its expiration date' data. Possibly weeks after the 'issue' with the VM restart.

2) SR live migration is still a very important feature that I very much hope to see.

3) Those leases will create additional IO. For example, if we have ~20k VMs (not a really large number for clouds of the new age) and the lease period is 10 minutes, that creates ~33 IOPS -- the equivalent of about 60-70 VMs, according to statistics from our cloud (the arithmetic is sketched below).

4) How do you plan to guarantee the tapdisk shutdown? This is NFS: if the server is down, or there are connectivity issues, there is no way to shut down a process stuck in IO.

5) I think 30s is not a very good number. The Linux kernel starts throwing IO errors after 120 seconds of IO wait.

6) About this library: "you also need to make sure that your clocks are properly synchronized." I think this must add a requirement on coexisting hosts: do not allow an NFS SR to be plugged until the clock is synced with the master. (Same for cross-pool migration -- reject the migration if the clock is out of sync, but allow shooting yourself in the foot with --force.)
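For reference, the arithmetic behind the ~33 IOPS figure in (3), assuming one 
refresh write per VM per lease period:

    vms = 20000
    lease_period = 10 * 60                    # seconds
    refresh_iops = vms / float(lease_period)
    print(refresh_iops)                       # ~33.3 extra write IOPS just for leases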


_______________________________________________
Xen-api mailing list
Xen-api@xxxxxxxxxxxxx
http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api


 

