
Re: [Xen-users] Cheap IOMMU hardware and ECC support importance



Gordan Bobic <gordan@xxxxxxxxxx> writes:

> On 06/28/2014 12:25 PM, lee wrote:
>> Kuba <kuba.0000@xxxxx> writes:

>> SSD caching
>> means two extra disks for the cache (or what happens when the cache disk
>> fails?),
>
> For ZIL (write caching), yes, you can use a mirrored device. For read
> caching it obviously doesn't matter.

That's not so obvious --- when the read cache fails, ZFS would have to
fall back to the disks automatically.
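
I suppose the two kinds of cache device would be added with something
like this (pool and device names made up):

    zpool add tank log mirror /dev/sdb /dev/sdc   # ZIL/SLOG, mirrored
    zpool add tank cache /dev/sdd                 # L2ARC read cache

and losing the cache device would only cost you the cached reads, while
an unmirrored log device could lose the last few seconds of synchronous
writes?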

>> and ZFS doesn't increase the number of SAS/SATA ports you have.
>
> No, but it does deprecate the RAID and caching parts of a controller,

Why does it deprecate them?

> so you might as well just use an HBA (cheaper). Covering the whole
> stack, ZFS can also make much better use of on-disk caches (my 4TB
> HGSTs have 64MB of RAM each. If you have 20 of them on a 4-port SATA
> card with a 5-port multiplier on each port,

There are multipliers for SATA ports?  Can you connect SAS disks to them
as well?  Do the disks show up individually or bundled when you use one?
Aren't they getting in each other's way, filling up the bandwidth of
the port?
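
Back of the envelope: a 6Gbit/s SATA port is good for roughly 600MB/s
after encoding overhead, so five disks behind one multiplier would get
about 120MB/s each when they're all busy, which is less than one of
those HGSTs can stream on its own, if I calculate that right.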

> that's 1280MB of cache - more than any comparably priced caching
> controller. Being aware of FS level operations, ZFS can be much
> cleverer about exactly when to flush what data to what disk. A caching
> controller, in contrast, being unaware of what is actually going on at
> file system level, cannot leverage the on-disk cache for
> write-caching, it has to rely on its own on-board cache for
> write-caching, thus effectively wasting those 1280MB of disk cache.

That's a very good point.  Even if you don't have 20 disks, every bit of
cache wasted is a bit too much.
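
(For what it's worth, "hdparm -I /dev/sdX" prints the drive's
cache/buffer size if the drive reports it, so it's easy to check what
you actually have per disk; the device name is just a placeholder.)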

>> How does it do the checksumming?
>
> Every block is checksummed, and this is stored and checked on every
> read of that block. In addition, every block (including its checksum)
> is encoded for any extra redundancy specified (e.g. mirroring or n+1,
> n+2 or n+3). So if you read the block, you also read the checksum
> stored with it, and if it checks out, you hand the data to the app
> with nothing else to be done. If the checksum doesn't match the data
> (silent corruption), or the read of one of the disks containing a
> piece of the block fails (non-silent corruption, failed sector), ZFS
> will go and

And? Correct the error?

So it's like RAID built into the file system?  What about all the CPU
overhead?
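
I gather you would see that happening with something like (pool name
made up):

    zpool scrub tank        # re-read every block and verify its checksum
    zpool status -v tank    # per-device READ/WRITE/CKSUM error counters

and the CPU cost is just the checksumming and parity work a RAID
controller would otherwise have done?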

>> Read everything after it's been written to verify?
>
> No, just written with a checksum on the block and encoded for extra
> redundancy.

That means you don't really know whether the data has been written as
expected before it's read.

> If you have Seagate disks that support the feature you can
> enable Write-Read-Verify at disk level. I wrote a patch for hdparm for
> toggling the feature.
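
Presumably that is something like "hdparm --write-read-verify 1
/dev/sdX" on a build that carries your patch (the device name is just
an example)?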

Only 4 small SAS disks are Seagates (I only put two of them in), the
rest are WD SATAs --- and I'm starting to suspect that the RAID
controller in the server doesn't like the WD disks at all, which causes
the crashes.  Those disks weren't made for this application at all.

>> I'll consider using it next time I need to create a file system.
>
> ZFS is one of those things that once you start using them you soon
> afterwards have no idea how you ever managed without them. And when
> you have to make do without them, it feels like you're trying to read
> braille with hooks.

Maybe, if it's simple to use.
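
If it really is just something like (names made up)

    # double parity, created and mounted at /tank in one step:
    zpool create tank raidz2 sdb sdc sdd sde sdf sdg

for a whole redundant, mounted file system, then fair enough.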


-- 
Knowledge is volatile and fluid.  Software is power.
