
Re: [Xen-API] XCP 1.6 Beta: upgrade of RAID 1 installation

I'd like to report some progress here on the networking issues that stopped me.  I modified my approach in two ways.

a) for the new slave, instead of starting the management network on one of the NICs of the future bond, I used eth0, which is unbonded in its final configuration.  This avoided losing connectivity upon joining the pool.  The management network can be moved later.
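For anyone following the same path, moving the management interface later can be done from the CLI.  A rough sketch (the host name label and the uuid are placeholders, so check against your own pool first):

```shell
# Hedged sketch: move the management interface to another PIF after joining.
# "slave1" and the uuid below are placeholders, not values from my pool.
xe pif-list host-name-label=slave1 params=uuid,device,network-name-label
xe host-management-reconfigure pif-uuid=<uuid-of-target-pif>
```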

b) I stated above that 'the bonds are created but are connected to the wrong xapi network'.  I now look at it the other way round: the xapi network is fine, and the joining process creates bonds whose numbering on the slave may differ from the numbering on the master.  The Citrix documentation says that the network configuration on the master is replicated on the slave when it joins the pool; I now understand that this does not strictly apply to the bond device id.  What muddled the issue in my case was that I was also using the bond device id (bondx) as the label of the network to which the bond PIF was attached.  As soon as I accepted that bond device ids may differ between hosts in the pool, even when they correspond to the same pair of NICs, the situation became clear.  In my case, it is now:

NIC      master   slave    xapi network
eth0     no bond  no bond
eth1/2   bond0    bond2    xapi3
eth3/4   bond1    bond0    xapi9
eth5/6   bond2    bond1    xapi11
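To illustrate the point: since bond device ids are host-local, the xapi network label is the stable key for matching bonds across hosts.  A small shell sketch, with the sample data copied from the table above (on a live host you would derive the mappings from xe bond-list and xe pif-list instead):

```shell
# Bond-device-to-network mappings for master and slave (from the table above)
master='bond0 xapi3
bond1 xapi9
bond2 xapi11'
slave='bond2 xapi3
bond0 xapi9
bond1 xapi11'

# Join the two mappings on the network label (field 2) to pair up the bonds
m=$(mktemp); s=$(mktemp)
echo "$master" | sort -k2 > "$m"
echo "$slave"  | sort -k2 > "$s"
mapping=$(join -1 2 -2 2 "$m" "$s")
rm -f "$m" "$s"
echo "$mapping" | awk '{printf "%s: master=%s slave=%s\n", $1, $2, $3}'
```

The join shows that, e.g., the pair of NICs on network xapi3 is bond0 on the master but bond2 on the slave, which is exactly the renumbering described above.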

The slave can fully participate in the pool as far as I can determine.

Unfortunately I cannot go back to the original topic as the slave on the old version was ejected or forgotten from the pool at some stage.  I apologise for the off-topic diversion.

On Wed, Dec 19, 2012 at 4:42 PM, Black Bird <blackbird1758@xxxxxxxxx> wrote:
I haven't updated this thread for a while, although I've been busy moving from one problem to the next.

a) the problem I reported above needs some clarification.  The upgrade of 1.6Beta2 to 1.6Final had worked on an MBR-partitioned master on a cleanly installed host.

However, the same upgrade on a 1.6Beta2 GPT-partitioned master had failed on a host which had been de-mirrored (using a filesystem copy from a mirrored to a non-mirrored disk).  Although the non-mirrored host was bootable and completely functional, the 1.6Final installer failed to recognise the existing installation.  I realised that the same situation exists when a mirrored host is moved to a non-mirrored MBR-partitioned master (bootable, fully functional, and not seen by the installer).

b) I was able to do a normal clean installation of 1.6Beta2 on an unmirrored host, do an xe-pool-restore-database, and then upgrade to 1.6Final.  This is appropriate on the master, but not on slaves.  The host can then be re-converted to mirrored configuration.
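Step (b) as commands, with a hypothetical file name; the dump must of course be taken on the old master before reinstalling:

```shell
# On the old master, before reinstalling (file name is a placeholder):
xe pool-dump-database file-name=/root/pool-backup.db
# After a clean 1.6Beta2 install on the unmirrored disk:
xe pool-restore-database file-name=/root/pool-backup.db dry-run=true   # sanity check first
xe pool-restore-database file-name=/root/pool-backup.db
# ...then upgrade to 1.6Final and re-convert to the mirrored configuration.
```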

c) on the slaves, xe-pool-restore-database is not appropriate as discussed.  As mentioned above, the installer is unable to recognise a working de-mirrored installation.  So I reverted to the process outlined by George previously.

I did a clean installation of 1.6Beta2 on a single disk.  I entered single-user mode, set the ethernet devices manually using /etc/sysconfig/network-scripts/interface-rename-data/static-rules.conf, copied across /etc/xensource* and /var/xapi from the mirrored disks, and rebooted.  This was a checkpoint intended to verify a de-mirrored slave prior to a subsequent upgrade.  Unfortunately, as the slave boots up, it loses connectivity with the master.  I repeated this process (i.e. new installation, copy config files across, reboot) a number of times to try to find a pattern, but on occasion some network cards did not come up, or xapi complained about not reaching the master.  I've been trying different combinations of this process, trying to force the network interface names before and after installation, to no avail.
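For reference, the static-rules.conf entries I mean look roughly like this (the MAC addresses are placeholders, and the exact syntax here is from memory, so verify it against a working host before relying on it):

```
eth0:mac="00:16:3e:xx:xx:50"
eth1:mac="00:16:3e:xx:xx:51"
```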

At one stage, in desperation I reset the network with xe-reset-networking, but then the slave joined the pool with completely wrong eth values and bonded interfaces.  I've been unable to resolve this.

Giving up on this approach, I've also tried to add a new slave to the pool, intending to get a fresh host to which I can subsequently add the local storage.  To my disappointment, the network configuration on the new host also does not reflect the one expected.  I had expected the network configuration of the master
eth0 no bond
eth 1/2 bond 0
eth 3/4 bond 1
eth 5/6 bond 2
to be reflected in the slave.  However what has happened is that on joining the pool, the bonds are created but are connected to the wrong xapi network.

At this stage I have no way forward.  For me the main point of having a pool is to share configuration detail between hosts and automate the installation process when a new slave enters the pool (there are other advantages too), adopting things like common network configuration, common storage networking etc.  If this process is not reliable then I question the validity of having a pool.  At this stage I'm considering reverting to single host masters and doing additional configuration manually or using an external cloud orchestration facility such as OpenStack.

On Mon, Dec 3, 2012 at 2:01 PM, Black Bird <blackbird1758@xxxxxxxxx> wrote:
Hmmm.  I've managed to upgrade from 1.6Beta2 to 1.6Final from an MBR-partitioned master, but not a GPT-partitioned master.  In the latter case, the installer does not recognise an existing installation and proceeds to ask for a root password, at which stage I stop.

Should a GPT-partitioned master be able to be recognised and upgraded?

On Fri, Nov 30, 2012 at 3:07 PM, Black Bird <blackbird1758@xxxxxxxxx> wrote:
I have thought of possibly a better procedure (albeit still a workaround) than the one I proposed in the first email.

(a) make a pool-dump-database and store safely
(b) extract one hard disk and store as a recovery strategy (exist_device2), leaving the other (exist_device1)
(c) create 2 new partitions on a separate temporary device (temp_device), same sized as those live
(d) copy the contents of the existing two partitions (currently /dev/md0 and /dev/md1) onto the new partitions on temp_device.  (Incidentally, I think it is of no use to mirror the 2nd partition, as it is only used during installation as a backup, which can only use raw partitions, but that is a separate story)
(e) reboot
(f) enter the BIOS screen and configure temp_device as the default boot device
(g) continue with boot and confirm that XCP host is working as normal
(h) insert the upgrade ISO media (CD/USB)
(i) reboot
(j) enter the BIOS screen and configure device containing the ISO media as the default boot device
(k) during the installer stage, select temp_device.  
(l) The installer should now recognise an existing installation, and any backups.  Proceed with upgrade as normal
(m) reboot at end, removing installer media
(n) enter the BIOS screen and configure temp_device as the default boot device
(o) on bootup, xsconsole will show that the local SR is unavailable.  Some more steps are needed just to re-set up the md device for local storage.  The SR configuration should still be there.  If no local SR exists, then skip to (t)
(p) mdadm --examine --brief --scan --config=partitions >> /etc/mdadm.conf (this restores the mdadm configuration from the md metadata on the partitions in exist_device1)
(q) mdadm --assemble /dev/md2 (restart the md device containing the LVM volumes used by the local SR)
(r) xe pbd-plug (attach the storage to the SR)
(s) at this stage you should be able to test that any VMs needing VDIs on the local SR can be started
(t) copy the contents of the filesystems on the temp_device back onto /dev/md0 and /dev/md1
(u) reboot while removing temp_device (or rather shutdown, remove temp_device, start host)
(v) in BIOS screen configure exist_device1 as the boot device
(w) verify that XCP host is running normally
(x) insert 2nd disk exist_device2
(y) mdadm /dev/md<x> --re-add /dev/<exist_device2><partition>
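Steps (p) to (r) as a command sketch; /dev/md2 matches the procedure above, while the SR name label and the uuids are placeholders for your own values:

```shell
# (p) recover the array configuration from the md metadata on exist_device1
mdadm --examine --brief --scan --config=partitions >> /etc/mdadm.conf
# (q) restart the md device backing the local SR
mdadm --assemble /dev/md2
# (r) find the SR's PBD and re-attach the storage
xe sr-list name-label="Local storage" params=uuid
xe pbd-list sr-uuid=<sr-uuid> params=uuid
xe pbd-plug uuid=<pbd-uuid>
```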

This procedure avoids a completely new installation, while retaining a fallback.  It should also work for a slave.

So far I've tested parts of the above, but not as a complete procedure.  That's my next step.  I'll keep you posted.

On Fri, Nov 30, 2012 at 10:13 AM, George Shuklin <george.shuklin@xxxxxxxxx> wrote:

I'd also love to see at least some procedure that takes such installations into consideration, as we're also using XCP/XS on software RAID1, and every upgrade is in fact a reinstallation, a very suboptimal procedure. Perhaps, given that XCP doesn't have to be tied to a 'supported configuration' as XS does, we could have mdraid support in XCP for installation/reinstallation, since so many people use it?

Well, I'd gladly do this, but the main problem is the open-source part. xen-api is pure open source and the source is available on GitHub.

The XCP/XenServer installer is not. I mean, there is no published way to do something like a 'make xcp-iso' command. The internals of the installer are half Python, but there is no information about xen-api's expectations for file placement in an older installation, or about the proper way to do this. We internally simply hack the original installer ISO to help us with the installation procedure over md RAID1. It looks kind of ugly and is definitely not for 'public' use. And I'd really like to do it properly...

Xen-api mailing list
