
Re: [Xen-users] Cheap IOMMU hardware and ECC support importance


  • To: xen-users@xxxxxxxxxxxxx
  • From: Gordan Bobic <gordan@xxxxxxxxxx>
  • Date: Fri, 27 Jun 2014 14:03:41 +0100
  • Delivery-date: Fri, 27 Jun 2014 13:04:46 +0000
  • List-id: Xen user discussion <xen-users.lists.xen.org>

On 2014-06-26 18:36, lee wrote:
Gordan Bobic <gordan@xxxxxxxxxx> writes:

On 2014-06-26 17:12, lee wrote:
Mihail Ivanov <mihail.ivanov93@xxxxxxxxx> writes:

So the next thing I've read about is RAID, and I am thinking of
raiding 2 x WD Black 2 TB. (Should I do software raid or hardware
raid?)

Software raid can mean quite a slowdown compared to hardware raid.

The only situation where hardware RAID helps is if you have
a _large_ battery backed write cache, and then it only helps
on small bursty writes. A recent x86 CPU can do the RAID
checksumming orders of magnitude faster than most RAID card
ASICs, and hardware RAID cache is completely useless since
anything that is likely to be caught in it will also be in
the OS page cache.

The CPU may be able to handle the raid faster, and there may be lots
of RAM available for caching.  Using both CPU and RAM for that draws
on resources that may otherwise be occupied.

A typical caching hardware RAID controller has maybe 3% of the RAM
of a typical server. And I'm pretty sure that for the price of one
you could easily get more than an extra 3% of CPU and RAM.

The time when hardware RAID was worthwhile has passed.
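
If you want numbers for the checksumming claim, the kernel prints its
own parity benchmarks at boot (assuming the md/raid6 modules are
loaded); the exact figures will obviously vary per machine:

    # per-core RAID6 parity and XOR throughput as measured by the kernel
    dmesg | grep -iE 'raid6|xor'
    # current resync/check speed of any running md arrays
    cat /proc/mdstat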

I'm not sure what you consider "recent".  I have an AMD Phenom 965, and
I do notice the slowdowns due to software raid compared to hardware
raid, on the very same machine.

I can believe that if you have a battery-backed cache module
and your workload includes a lot of synchronous writes. But
for that workload you would probably be better off, in terms of
total cost, performance and reliability, getting an SSD and
putting the ZFS ZIL on it.
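
Adding the log device is a one-liner with ZFS; the pool name and
device path below are just placeholders:

    # put the ZFS intent log on a fast SSD partition
    zpool add tank log /dev/disk/by-id/ata-SOME_SSD-part1
    # or mirror it if you worry about losing in-flight sync writes:
    #   zpool add tank log mirror /dev/.../ssd1-part1 /dev/.../ssd2-part1
    zpool status tank    # should now show a "logs" section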

If you are forcing write-cache to on without a BBU, then
you might as well just LD_PRELOAD=libeatmydata.so in terms
of data safety in case of a crash or power outage.
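
To spell that out: without a BBU the only safe setting is
write-through, and the "fast" alternative is exactly the libeatmydata
trick. Device and program names below are examples:

    # check / toggle the on-disk write cache
    hdparm -W /dev/sda     # shows "write-caching = 1 (on)" or 0 (off)
    hdparm -W0 /dev/sda    # turn it off (safe, slower for sync writes)
    # the no-BBU write-back equivalent: fsync() silently becomes a no-op
    LD_PRELOAD=libeatmydata.so some_program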

Besides, try to find a board that has more than only six SATA ports or
that can do both SAS and SATA.  There are few, and they are the more
expensive ones.  Perhaps the lack of ports is not so much of a problem
with the available disk capacities nowadays; however, it is what made me
get a hardware raid controller.

Hardware RAID is, IMO, far too much of a liability with
modern disks. Latent sector errors happen a lot more
often than most people realize, and there are error
situations that hardware RAID cannot meaningfully handle.
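
Whatever the RAID layer, the only real mitigation is to scrub
regularly so latent errors are found while there is still good
redundancy to repair from; roughly ("tank" and md0 are placeholders):

    # ZFS: read and verify every block, repair from redundancy
    zpool scrub tank
    zpool status -v tank     # scrub progress and any checksum errors
    # Linux md: trigger a verification pass
    echo check > /sys/block/md0/md/sync_action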

Also I will be using ZFS and my Dom0 will be Fedora.

Fedora for a dom0 is a rather bad choice.  Fedora is an experimental
testing distribution with a very limited lifetime and prone to
experience lots of unexpected or undesirable changes.
[...]

True, and very unfortunate. Doubly so because my preferred distro (EL)
is based on Fedora. The problem is that the quality (or lack thereof)
trickles down, even after a lot of polishing.

I can say that the quality of Debian has been declining quite a lot over
the years and can't say that about Fedora.  I haven't used Fedora that
long, and it's working quite well.

Depends on what your standards and requirements are, I suppose.
I have long bailed on Fedora other than for experimental testing
purposes to get an idea of what to expect in the next EL. And
enough bugs filter down to EL despite the lengthy stabilization
stage that it's becoming quite depressing.

The question I am still pondering is whether I should get an E3 Xeon
(no E3's with IGP are sold in my country), a 6-core E5 Xeon or an AMD
FX 8***,

Is power consumption an issue you need to consider?


As someone suggested, it might be a good idea to go for certified
hardware. My server is going down about every 24 hours with a flood of
messages in dom0 like "aacraid 0000:04:00.0: swiotlb buffer is full
(sz:
4096 bytes)", and it's actual server hardware.  I made a bug report a
while ago; nobody cares and I keep pressing the reset button.  You
probably don't want to end up like that.

Hardware RAID is just downright evil.

I don't think that the problem is due to the raid controller.  The
driver for it is considered very stable and mature, and there's a
theory that this problem might have to do with some sort of memory
misalignment which results in the block layer being expected to write
out data via DMA from addresses it cannot really access.  If that is
true, every dom0 under xen is prone to the same problem, regardless of
whether software or hardware raid is used.

I've looked at the code --- it seems that the relevant part of the
kernel hands the write request back to the xen part, telling it that it
failed and expecting the problem to be handled somewhere else. I didn't
trace it any further because there's no point: I won't be able to fix
this anyway.

I suspect that the problem occurs only under certain circumstances, like
depending on the number of VMs and on how they are set up.
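
If it really is swiotlb exhaustion, one thing that may be worth trying
(I have not hit that particular error myself) is giving dom0 a larger
bounce buffer on its kernel command line, something like:

    # on the dom0 kernel line in grub (not the xen.gz line); the value
    # is the number of swiotlb slabs, 2 KiB each, so 131072 ~ 256 MiB
    swiotlb=131072
    # check what was actually allocated after a reboot:
    dmesg | grep -i swiotlb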

I find that on my motherboard most RAID controllers don't work
at all with IOMMU enabled. Something about the transparent bridges
connecting the native PCI-X RAID ASICs to PCIe makes things not work.

Cheap SAS cards, OTOH, work just fine, and at a fraction of
the cost.

I use plain old SATA with on-board and cheap add-in controllers and
find that to be by far the least problematic combination.

I haven't found any cheap SATA controller that looked like it would
work with Linux and like it was a decent piece of hardware.  Looking
at those that do seem decent, a used SAS/SATA RAID controller is
cheaper and much more capable than the SATA ones.

As I said, I had far more problems with SAS RAID cards than SATA
controllers, and I use PMPs on top of those SATA controllers. I
might look at alternatives if I was running on pure solid state,
but for spinning rust SATA+PMP+FIS+NCQ yields results that a
hardware RAID controller wouldn't likely improve on.
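
If anyone wants to check what their setup actually negotiated, the
hints are in dmesg and sysfs (device names are examples, and I may be
misremembering the exact flag name):

    # AHCI controllers capable of FIS-based switching list "fbs" here
    dmesg | grep -i ahci | grep -i flags
    # NCQ depth per disk: 31 means NCQ is active, 1 means it is not
    cat /sys/block/sda/device/queue_depth
    # PMP-attached disks show up as extra ataX.NN links
    dmesg | grep -i 'SATA link up'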

And after spending quite a bunch of money on hardware, you might want
to use something other than xen.  Give it a try and set up a couple of
VMs on a testing machine so you get an idea of what you're getting
into, and reconsider.

Alternatives aren't better, IMO. Having tried Xen, VMware and KVM,
Xen was the only one I managed to (eventually) get working in the
way I originally envisaged.

Hm, I find that surprising.  I haven't tried VMware and thought that as
a commercial product, it would make it easy to set up some VMs and to
run them reliably.

It's fine as long as you don't have quirky hardware.
Unfortunately, most hardware is buggy to some degree,
in which case things like PCI passthrough are likely
to not work at all.

With Xen there is always the source that can be modified
to work around at least the more workaroundable problems.
And unlike on the KVM development lists, Xen developers
actually respond to questions about working around such
hardware bugs.
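
The passthrough plumbing itself is at least easy to experiment with on
Xen; the device address and domain name below are only examples, and
this assumes xen-pciback is available in dom0:

    # detach the device from dom0 and mark it assignable
    xl pci-assignable-add 0000:04:00.0
    xl pci-assignable-list
    # then either list it in the domU config:
    #   pci = [ '04:00.0' ]
    # or hot-plug it into a running guest:
    xl pci-attach mydomu 0000:04:00.0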

KVM/QEMU I tried years ago, and it seemed much more
straightforward than xen does now, which appears to be very chaotic.

Now try using it without virt-manager.

After all, I'm not convinced that virtualization as it's done with xen
and the like is the right way to go.  It has advantages and solves some
problems while creating disadvantages and other problems.  It's like
going back to mainframes because the hardware has become too powerful,
using software to turn this very hardware into "multiframes" --- and
then finding out that it doesn't work so well because the hardware,
though powerful enough, never was designed for it.  It's like using an
axe on the hardware to cut it into pieces and expecting such pieces to
be particularly useful.

I am not a fan of virtualization for most workloads, but sometimes
it is convenient, not least in order to work around deficiencies of
other OS-es you might want to run. For example, I don't want to
maintain 3 separate systems - partitioning up one big system is
much more convenient. And I can run Windows gaming VMs while
still having the advantages of easy full system rollbacks by
having my domU disks backed by ZFS volumes. It's not for HPC
workloads, but for some things it is the least unsuitable solution.
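
Concretely, the rollback workflow is nothing more than this (pool,
volume and config names are placeholders):

    # the domU disk is a ZFS volume, e.g. in the guest config:
    #   disk = [ 'phy:/dev/zvol/tank/vm-disk,xvda,w' ]
    zfs snapshot tank/vm-disk@before-patching   # cheap, instant
    zfs rollback tank/vm-disk@before-patching   # with the domU shut down
    zfs list -t snapshot -r tank/vm-disk        # see what is available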

Gordan

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users


 

