
Re: [Xen-users] Strange failures of Xen 4.3.1, PVHVM storage VM, iSCSI and Windows+GPLPV VM combination


  • To: Kuba <kuba.0000@xxxxx>, xen-users <xen-users@xxxxxxxxxxxxx>
  • From: James Harper <james.harper@xxxxxxxxxxxxxxxx>
  • Date: Fri, 31 Jan 2014 01:35:12 +0000
  • Accept-language: en-AU, en-US
  • Delivery-date: Fri, 31 Jan 2014 01:36:37 +0000
  • List-id: Xen user discussion <xen-users.lists.xen.org>

> 
> I am trying to set up the following configuration:
> 1. very simple Linux-based dom0 (Debian 7.3) with Xen 4.3.1 compiled
> from sources,
> 2. one storage VM (FreeBSD 10, HVM+PV) with SATA controller attached
> using VT-d, exporting block devices via iSCSI to other VMs and physical
> machines,
> 3. one Windows 7 SP1 64 VM (HVM+GPLPV) with GPU passthrough (Quadro
> 4000) installed on a block device exported from the storage VM (target
> on the storage VM, initiator on dom0).
> 
> Everything works perfectly (including PCI & GPU passthrough) until I
> install GPLPV drivers on the Windows VM. After driver installation,
> Windows needs to reboot, boots fine, displays a message that PV SCSI

(a)

> drivers were installed and needs to reboot again, and then cannot boot.
> Sometimes it gets stuck at "booting from harddrive" in SeaBIOS,
> sometimes BSODs with "unmountable boot volume" message. All of the
> following I tried without GPU passthrough to narrow down the problem.
> 
> The intriguing part is this:
> 
> 1. If the storage VM's OS is Linux - it fails with the above symptoms.
> 2. If the block devices for the storage VM come directly from dom0 (not
> via pci-passthrough) - it fails.
> 3. If the storage VM is an HVM without PV drivers (e.g. FreeBSD
> 9.2-GENERIC) - it all works.
> 4. If the storage VM's OS is Linux with kernel compiled without Xen
> guest support - it works, but is unstable (see below).
> 5. If the iSCSI target is on a different physical machine - it all works.
> 6. If the iSCSI target is on dom0 itself - it works.
> 7. If I attach the AHCI controller to the Windows VM and install
> directly on the hard drive - it works.
> 8. If the block device for Windows VM is a disk, partition, file, LVM
> volume or even a ZoL zvol (and it comes from dom0 itself, without
> iSCSI) - it works.
> 
> If I install Windows and the GPLPV drivers on a hard drive attached to
> dom0, Windows + GPLPV work perfectly. If I then give the same hard drive
> as a block device to the storage VM and re-export it through iSCSI,

(b)

> Windows usually boots fine, but is unstable. By unstable I mean
> random read/write errors, programs that sometimes won't start, ntdll.dll
> crashes, and, after a couple of reboots, Windows failing to boot (just as
> described above).
> 
> The configuration I would like to achieve makes sense only with PV
> drivers on both the storage and Windows VMs. All of the "components" seem
> to work perfectly until they are all put together, so I am not really sure
> where the problem is.
> 
> I would be very grateful for any suggestions or ideas that could
> possibly help to narrow down the problem. Maybe I am just doing
> something wrong (I hope so). Or maybe there is a bug that shows itself
> only in such a particular configuration (hope not)?
> 

I'm curious about Windows prompting for the pvscsi drivers to be installed. Is
this definitely what it is asking for? Pvscsi was removed from the latest
versions of GPLPV and suffered varying degrees of bitrot in earlier versions.
If you have the iSCSI initiator in dom0, then exporting a block device to
Windows via the normal vbd channel should be just fine.

You've gone to great lengths to explain the various things you've tried, but I
think I'm a little confused about where the iSCSI initiator is in the "doesn't
work" scenarios. I'm having a bit of an off day today so it's probably just me,
but above I have marked two scenarios as (a) and (b)... could you fill me in on
a few things:

At (a) and (b), is the iSCSI initiator in dom0, or are you actually booting
Windows directly via iSCSI?

At (b), with the latest debug build of GPLPV, can you run DebugView from
sysinternals.com and see if any interesting messages are displayed before
things fall in a heap?

Do any strange log messages show up in the Windows DomU, Dom0, or the storage DomU?

How big are your disks?

Can you reproduce with only one vcpu?

What bridge are you using? Open vSwitch or a traditional Linux bridge?

What MTU are you using on your storage network? If you are using Jumbo frames 
can you go back to 1500 (or at least <= 4000)?

Can you turn off scatter-gather, Large Send Offload (LSO/GSO), and IP checksum
offload on all the iSCSI endpoints?
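
As a rough sketch of the MTU and offload changes asked about above, assuming a
Linux endpoint with the standard `ip` and `ethtool` tools installed. The helper
name `tuning_commands` and the interface name `eth0` are placeholders, not
anything from this thread, and a FreeBSD or Windows endpoint would need its own
equivalent settings. The script only prints the commands so they can be
reviewed before being run as root:

```python
# Offload features to switch off. These are standard `ethtool -K` feature
# names: scatter-gather, TCP/generic segmentation offload, generic receive
# offload, and TX/RX checksum offload.
OFFLOADS = ("sg", "tso", "gso", "gro", "tx", "rx")


def tuning_commands(iface, mtu=1500):
    """Return argv lists that set the MTU and disable offloads on iface."""
    # Drop back to a standard MTU to rule out jumbo-frame problems.
    cmds = [["ip", "link", "set", "dev", iface, "mtu", str(mtu)]]

    # Build one ethtool call that turns every listed feature off.
    offload_args = []
    for feature in OFFLOADS:
        offload_args += [feature, "off"]
    cmds.append(["ethtool", "-K", iface] + offload_args)
    return cmds


if __name__ == "__main__":
    # "eth0" is a placeholder; substitute the interface carrying iSCSI traffic.
    for argv in tuning_commands("eth0"):
        print(" ".join(argv))
```

Some drivers do not expose every feature, in which case ethtool reports which
ones it could not change; `ethtool -k <iface>` shows the resulting state.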

Can you turn on data digest/checksum on iSCSI? If all endpoints support it then
this would provide additional verification that none of the network packets are
getting corrupted.
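
To illustrate what the data digest buys you: iSCSI digests are CRC32C
checksums computed over the payload, so a bit flipped anywhere between
initiator and target changes the checksum and the PDU is rejected instead of
being written to disk. The following is a minimal pure-Python sketch of
CRC32C, written for illustration only (real initiators and targets compute it
in the kernel or in hardware); the function names are made up for this
example:

```python
def _make_table():
    """Build the byte-wise lookup table for CRC32C (Castagnoli)."""
    poly = 0x82F63B78  # bit-reflected form of the CRC32C polynomial
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    """Return the CRC32C checksum of data as an unsigned 32-bit integer."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    block = bytes(512)                # a sector of zeroes as sample payload
    damaged = bytearray(block)
    damaged[100] ^= 0x01              # flip a single bit "in transit"
    print(hex(crc32c(block)), hex(crc32c(bytes(damaged))))
```

The two printed checksums differ, which is how a receiver with data digest
enabled would notice the corruption.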
 
Would a driver domain work in your scenario? Then the disk could be attached
directly from your storage DomU without incurring all the iSCSI overhead. I'm
not up to date on the status of HVM, vbd, and driver domains so I don't know if
this is possible.

More questions than answers. Sorry :)

James




 

