
Re: [Xen-users] Recommendations for Virtualization Hardware


  • To: xen-users@xxxxxxxxxxxxx
  • From: ShadesOfGrey <shades_of_grey@xxxxxxxxxxxxx>
  • Date: Sun, 23 Sep 2012 23:45:18 -0400
  • Delivery-date: Mon, 24 Sep 2012 03:46:10 +0000
  • List-id: Xen user discussion <xen-users.lists.xen.org>

Sorry for the late response; I've had a lot to digest.

On 09/21/2012 11:22 AM, Robin Axelsson wrote:
If you want to be able to use PCI and VGA passthrough, you basically need to make sure that your hardware supports either AMD-Vi (formerly known as AMD-IOMMU) or Intel VT-d extensions. In the Intel case this limits your choice of motherboard (it must be supported in the BIOS) and CPU; in the AMD case it limits only your choice of motherboard. A good start is to check out one of these pages:

http://wiki.xensource.com/xenwiki/VTdHowTo
http://wiki.xen.org/wiki/VTd_HowTo

A word of warning: parts of the documentation are somewhat dated. You can also contact e.g. Gigabyte, Asus or ASRock customer support and ask whether a particular motherboard supports these extensions. Most motherboards also have downloadable user manuals; if the BIOS settings documented there show options to enable/disable VT-d or AMD-Vi/IOMMU extensions, you will be OK with that motherboard.
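
For what it's worth, once you have the board you can also sanity-check from a Linux dom0 whether the IOMMU actually came up. A rough sketch, assuming the xl toolstack (the exact messages vary between kernel and Xen versions):

  # Intel VT-d leaves a DMAR ACPI table and remapping messages in the kernel log
  dmesg | grep -i -e DMAR -e IOMMU

  # the hypervisor's own log should report that I/O virtualisation is enabled
  xl dmesg | grep -i -e "I/O virtualisation" -e IOMMU

If neither turns up anything, the usual suspects are a missing BIOS option or a board/BIOS revision that simply doesn't wire it up.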

The lack of current information about Xen (and KVM) online has been frustrating, especially the many proof-of-concept videos that demonstrate possibilities but offer no real specifics. Looking for specifics, I turned to gaming and enthusiast sites, figuring confirmation of VT-d and AMD-Vi support would be more likely there. That often wasn't the case. I did determine that ASRock motherboards seem the most likely to support VT-d, and ASUS the least likely (unless equipped with an Intel 'sanctioned' VT-d chipset). I had narrowed my choices to two motherboards that appear to offer VT-d support and was intending to contact the manufacturers before purchase. Both are a bit pricey, though, and I've been reconsidering whether I should look at other motherboards to reduce costs.
The other thing is the choice of GPU for VGA passthrough; it is preferable that the GPU supports FLR, or Function Level Reset as it is called. The thing is that the hardware needs to be reset somehow when it is passed through, and this is best done with FLR. nVidia is known to supply firmware patches with this support for some of their GeForce cards, and it is said to be supported by default on their Quadro cards. FLR is not the only way to reset a PCI device; a reset can also be triggered through the ACPI power management framework by temporarily cutting power to the affected PCI slot. These reset methods are called d3d0 and bus reset. The question, however, is whether this works on PCI cards that draw auxiliary power directly from the PSU. There is a PDF document on the VMware website (http://www.vmware.com/files/pdf/techpaper/vsp_4_vmdirectpath_host.pdf) about this:

-----------------------
Reset Method

Possible values for the reset method include flr, d3d0, link, bridge, or default.

The default setting is described as follows. If a device supports function level reset (FLR), ESX always uses FLR. If the device does not support FLR, ESX next defaults to link reset and bus reset in that order. Link reset and bus reset might prevent some devices from being assigned to different virtual machines, or from being assigned between the VMkernel and virtual machines. In the absence of FLR, it is possible to use PCI Power Management capability (D3 to D0 transitions) to trigger a reset. Most of the Intel NICs and various other HBAs support this mode.
-----------------------


There are indications from people that d3d0 also works with PCI cards that take power from auxiliary inputs. I suggest that you take a look at the following YouTube clip and read the comments there:

http://www.youtube.com/watch?v=Gtmwnx-k2qg

So it seems that it works, although it may be a bit quirky. It doesn't hurt to take that discussion up (particularly about FLR support) with nVidia and/or AMD.
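
If you want to check a particular card yourself before taking it up with the vendor, lspci will tell you whether the function advertises FLR, and recent Linux kernels expose a reset hook in sysfs when they know how to reset the device by one of the above methods. A sketch, assuming the card sits at 01:00.0 (adjust to your system):

  # "FLReset+" in the DevCap line means the function supports FLR,
  # "FLReset-" means it does not
  lspci -vv -s 01:00.0 | grep -i flreset

  # if the kernel has *some* reset method for the device (FLR, D3->D0,
  # secondary bus reset), a 'reset' attribute appears in sysfs
  ls /sys/bus/pci/devices/0000:01:00.0/reset
  echo 1 > /sys/bus/pci/devices/0000:01:00.0/reset   # as root, with the device not in use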

This is precisely the kind of information I was looking for from the threads I started on Ars Technica. It's just unfortunate that FLR and D3/D0 support aren't often found in the tech specs of most expansion hardware. However, now that I know what to ask, I'll try contacting hardware manufacturers prior to purchasing any expansion hardware. Thank you!
When it comes to virtualization, the technology has come very far, but it is still lacking considerably when it comes to sharing GPUs, and to some degree when it comes to sharing I/O devices (especially if you intend to run many virtual machines on a single system). The GPU today consists of three types of components: the processing unit, the graphics memory, and the video output unit/adapter, and it is not clear how to share these components seamlessly between the host and virtual machines with minimal overhead. Whereas the VT-x extensions let you share CPU cores between VMs and the host pretty seamlessly, there is currently nothing equivalent for the graphics processing unit. It is also not clear how the hardware could assist with sharing TV/monitor screen estate between machines with all 3D effects (such as Aero on Win7) enabled for every machine, especially considering the dynamics of plugging and unplugging monitors on multiport/Eyefinity graphics cards and changing screen resolutions.

Things are improving for sure, and a lot of research is likely going into this. I don't know what's happening on the GPU front line, but I do know that the next thing in passthrough is SR-IOV, which allows a PCI device to present several virtual instances of itself to several virtual machines. It's a cool thing, and I recommend further reading about it here:

http://www.intel.com/content/www/us/en/pci-express/pci-sig-sr-iov-primer-sr-iov-technology-paper.html
http://blog.scottlowe.org/2009/12/02/what-is-sr-iov/
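
To give a feel for what SR-IOV looks like from the OS side: on an SR-IOV capable NIC the virtual functions show up as ordinary PCI functions that can each be passed to a different VM. A sketch, assuming an Intel igb/ixgbe-class adapter at 01:00.0 and a reasonably new Linux kernel (older drivers take a max_vfs module parameter instead of using sysfs):

  # how many virtual functions the card can expose
  cat /sys/bus/pci/devices/0000:01:00.0/sriov_totalvfs

  # create four of them
  echo 4 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs

  # they now appear as separate PCI devices, ready for passthrough
  lspci | grep -i "virtual function"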

That is fascinating. Extending virtualization to expansion hardware via SR-IOV sure would make the kind of setup I'm attempting a lot easier. However, if I can replicate what I've seen in proof-of-concept videos (namely Casey DeLorme's), I think that will meet my needs for now. As it stands, I initially intend to reserve any discrete GPU(s) for Windows and rely on an integrated GPU for all other VMs, using PV drivers wherever possible. Afterward, I want to experiment with reassigning whatever discrete GPU(s) I have for GPGPU work under a Linux VM whenever the GPU is not being used for gaming (if that's possible at all).
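
From what I've pieced together from the Xen wiki, the domU side of that plan would boil down to something like the following guest config. This is only a sketch; I haven't tested it, and the PCI addresses, disk path and sizes are placeholders for whatever hardware I end up with:

  # win7-gaming.cfg -- untested sketch of an HVM guest with the discrete
  # GPU (and its HDMI audio function) handed through to Windows
  builder      = "hvm"
  name         = "win7-gaming"
  memory       = 8192
  vcpus        = 4
  disk         = [ 'phy:/dev/vg0/win7,hda,w' ]
  pci          = [ '01:00.0', '01:00.1' ]
  gfx_passthru = 1
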
It is likely to take a few years before something useful comes out of it. In the meanwhile, unless you want to use several GPUs (which might not be a bad thing, as a lot of monitors these days have several inputs), you can resort to using a remote desktop client to integrate one machine with another. VirtualBox, for example, uses RDP, through which you can interact with your virtual machine. In a similar manner you can set up a VNC server on your Linux host and connect to it from your Windows VM. You will not get full 3D functionality (such as Aero) through the client, although support for it is growing through the VirtualGL extensions that are coming to VNC and perhaps the SPICE protocol. Some clients even allow for a seamless mode that lets you mix Linux and Windows windows on the same desktop, like this for example:

http://i.techrepublic.com.com/blogs/seamless.png
http://www.youtube.com/watch?v=eQr8iI0yZH4


Just keep in mind that this is still a little bit of uncharted territory, so there may be a few bumps along the way and it may not work as smoothly as you would like.
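
As a rough example of the VNC route, assuming the Linux side is running an X server and you are willing to experiment with VirtualGL (package names differ per distro, so treat this as a sketch):

  # export the existing X display over VNC from the Linux machine
  x11vnc -display :0 -forever -usepw

  # run OpenGL applications through VirtualGL so they render server-side
  vglrun glxgears

  # any VNC viewer on the Windows side pointed at port 5900 of the Linux
  # machine then shows that desktop in a window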


From everything I've read, solutions that rely on any form of remote display protocol would be limited to a subset of Direct3D functions. Furthermore, these would vary from one implementation to another, making them far less attractive for gaming than VGA passthrough... Well, in my opinion anyway.

VirtualBox's seamless mode is pretty nifty. But it's a Type 2 hypervisor and relies on paravirtualized drivers that suffer from the same limitations as remote display protocols. It's great for most things, but gaming is not one of them, and I'm speaking from personal experience. Though I haven't used them myself, the same would seem to hold true of Parallels' and VMware's 'Workstation' offerings. At least, as far as I've gathered.

FYI, the Type 1 hypervisors from Parallels and VMware* are priced waaayyy outside my budget.


*I only found out about VMware's 'free' vSphere after I'd written this response.

I see that your demands are somewhat multifaceted. I believe you also want to run different services, such as using your machine as a file server, possibly with a filesystem such as ZFS. If you do, you should be careful with your selection of hardware for those particular purposes. If you want the full protection against data corruption that ZFS can offer, your choice of hard drives, host bus adapter and network controller gets rather limited.

The most stable implementations of ZFS are found in the Illumos-based operating systems (such as OpenIndiana, SmartOS, OmniOS, Belenix etc.) or in Solaris if you choose to download it from Oracle's website. With these operating systems you will most likely want hardware that has certified drivers, so you are less likely to run into problems later on. In practice that means Intel-based network adapters and LSI-based SAS controllers. There should be _no_ hardware RAID functionality in the SAS controller; it should simply be run in IT mode (Initiator-Target mode), which in most cases requires the LSI controller to be flashed with IT firmware. The objective is to make sure that _all_ errors that might occur on the hard drives are reported all the way up to the software level, and that nothing is concealed or obfuscated by internal error handling in the hardware.

It is therefore recommended to use SAS hard drives instead of S-ATA (S-ATA drives are also fully compatible with SAS controllers, though). SAS drives are not much more expensive than comparable SATA drives and you get higher reliability out of them. It is also recommended to have at least two-drive redundancy, simply because when one drive dies and you swap it, it is not uncommon for another drive to die during the rebuild (or 'resilvering', as it is called in Solaris terms) because of the added strain the rebuild puts on the drives.

Of course, the system should talk directly to the hard drive hardware and not through some virtual abstraction layer in between, which means you either run ZFS on the metal or use PCI passthrough of the SAS (and perhaps also network) adapters. It is also highly recommended to use ECC RAM for such applications, and it doesn't hurt to dedicate a few gigs of it to ZFS, since RAM is used as cache. The good news is that most motherboards with good chipsets support ECC RAM even if the user manual doesn't say anything about it.
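
To make the redundancy point concrete: with ZFS, two-drive redundancy is just a raidz2 vdev. A sketch with placeholder device names (Solaris-style here; on Linux/FreeBSD you would use the /dev/disk/by-id names instead):

  # six-disk pool that survives two simultaneous drive failures
  zpool create tank raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0

  # scrub regularly so silent corruption is found and repaired from parity
  zpool scrub tank
  zpool status -v tank
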
Again, thanks for the thorough explanation. This gives me a great deal to think about. The more I learn about ZFS, the less appealing it becomes; by that I mean the confusion over which version of ZFS is in which OS, and just how well maintained the OSes supporting ZFS are. Now I have additional hardware considerations to keep in mind that may (or may not) make the cost of a ZFS RAID-Z pool comparable to a hardware RAID5/6 solution anyway. Do you have any suggestions as to which LSI HBAs I should be considering? I haven't found an HCL for ZFS in my searches.

Out of curiosity, and if you happen to know, do you think what you suggest about the HBA and SAS drives for ZFS also applies to Btrfs? I'm assuming it would, but I'd appreciate some confirmation.
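
(For comparison, the Btrfs setup I've been reading about looks deceptively simple; a sketch with placeholder device names, and as far as I can tell parity RAID for Btrfs isn't ready yet, so mirroring it is:)

  # data and metadata mirrored across two drives
  mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc
  mount /dev/sdb /srv/pool

  # scrub to detect and repair checksum errors, analogous to a zpool scrub
  btrfs scrub start /srv/pool
  btrfs scrub status /srv/pool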

It's funny how the "I" in RAID never really seems to apply... Especially since it looks more and more like using ZFS or Btrfs will require me to commit, from the start, to one or the other and to a discrete HBA. Transitioning later from the integrated SATA controller(s) and mdadm seems rather impractical, if I understand what's involved correctly. It may turn out that anything other than mdadm is price-prohibitive.
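
(The mdadm route, at least, I already understand; for reference, this is the sort of thing I would otherwise set up on the integrated SATA ports, device names being placeholders:)

  # software RAID6 across four drives on the motherboard's SATA controller
  mdadm --create /dev/md0 --level=6 --raid-devices=4 /dev/sdb /dev/sdc /dev/sdd /dev/sde
  mkfs.ext4 /dev/md0

  # periodic consistency check; unlike ZFS/Btrfs there are no end-to-end
  # checksums, so this only catches parity mismatches
  echo check > /sys/block/md0/md/sync_action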

I admire your persistence with pursuing this undertaking and wish you the best of luck with it!
Robin.


Thanks. I've invested too much time in research not to at least make the attempt. Besides, if all else fails, I can fall back to a two-box solution. That is, if I can get my hypothetical virtualization box to fit within my budget envelope...


_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users


 

