Re: [Xen-devel] block backend issues



On 14.06.2012 14:27, Marek Marczykowski wrote:
> On 08.06.2012 15:11, Marek Marczykowski wrote:
>> Hey,
>>
>> I've run into a strange problem with block devices. When I read a file
>> (from a read-only ext3 filesystem), everything looks fine, except that the
>> file content is corrupted! This may be a coincidence, though (the "failed"
>> reads may simply not hit filesystem metadata).
>> fsck in dom0 on the filesystem image reports no errors.
>> fsck (with -nf flags) in domU on the device causes the kernel to output
>> "blkfront: flush disk cache: empty write xvdd op failed" and
>> "blkfront: xvdd: barrier or flush: disable", and reports no filesystem
>> errors. From that point on, file reads return correct content. In most
>> cases, dropping the block cache (echo 3 > /proc/sys/vm/drop_caches) or
>> remounting the device also "fixes" the problem.
>>
>> On a RW device (with a different size, filesystem and content), the domU
>> kernel complains about EXT4 errors.
>> I haven't observed such issues on device-mapper-backed devices.
>>
>> It worked on 3.2.7; the problem shows up with 3.3.5 and 3.4 in dom0,
>> regardless of the domU kernel (tried 3.2.7, 3.3.5, 3.4.0).
>>
>> I suspected feature-flush-cache/feature-barrier, but the problem still
>> occurs after I disabled advertising them in the blkback code.
>>
>> Some details:
>> dom0: 3.4.0-1.pvops.qubes.x86_64 (vanilla 3.4 + Konrad's patches for ACPI S3)
>> domU: 3.3.5-1.pvops.qubes.x86_64 (vanilla 3.3.5 + Konrad's patches for ACPI S3)
> 
> (...)
> Still the case on 3.4.1 with the patches from Konrad's for-jens-3.5 branch
> applied.
> I've compared the file contents, and they differ in (multiples of) 1024
> bytes, the same as the filesystem block size, and only if the block wasn't
> in the dom0 pagecache. When I flush the VM pagecache
> (echo 1 > /proc/.../drop_caches) after trying to read some files (actually
> md5sum -c), but not the dom0 pagecache, the problem vanishes. But if I also
> clear the dom0 pagecache, the problem returns.
> 
> Any clues welcome...

OK, found the cause. It wasn't blkback's fault: even on bare metal, a
loopback-mounted image had the same problem. It was caused by commit "0fc9d104
radix-tree: use iterators in find_get_pages* functions", which went in
somewhere between 3.3 and 3.4. It is already fixed in 3.4.2.
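
For anyone chasing a similar problem, the block-by-block comparison mentioned
in the quoted message can be sketched roughly as below. This is only an
illustrative Python sketch, not the script actually used: the function name
differing_blocks is hypothetical, and the 1024-byte default is simply the
filesystem block size reported above. It compares a known-good copy of a file
with the copy read through the suspect path and lists which blocks differ.

```python
BLOCK_SIZE = 1024  # assumption: the filesystem block size reported in this thread


def differing_blocks(good_path, suspect_path, block_size=BLOCK_SIZE):
    """Return the indices of block_size-sized blocks whose contents differ.

    A trailing partial block is compared as-is, so files of different
    lengths report every block past the shorter file's end as differing.
    """
    bad = []
    with open(good_path, "rb") as good, open(suspect_path, "rb") as suspect:
        index = 0
        while True:
            a = good.read(block_size)
            b = suspect.read(block_size)
            if not a and not b:
                break  # both files exhausted
            if a != b:
                bad.append(index)
            index += 1
    return bad
```

If the returned indices mark whole blocks rather than scattered bytes, that
points at corruption with block granularity, as seen here. Note that the reads
go through the pagecache, so the result depends on what is currently cached.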

-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab

