Re: [Xen-devel] Xen 4.0.0x allows for data corruption in Dom0
On Mon, Mar 08, 2010 at 03:22:32PM -0800, Daniel Stodden wrote:
> On Sun, 2010-03-07 at 11:12 -0500, Pasi Kärkkäinen wrote:
> > On Sun, Mar 07, 2010 at 02:39:09PM +0000, Keir Fraser wrote:
> > > On 07/03/2010 14:36, "Pasi Kärkkäinen" <pasik@xxxxxx> wrote:
> > >
> > > >> Tried a few times and no luck reproducing so far. I hope some other people
> > > >> on the list also will give it a go, since it's so easy to try it out.
> > > >>
> > > >
> > > > I'm able to reproduce this with xen/master 2.6.31.6 dom0 kernel (from 2010-02-20),
> > > > but I'm not able to reproduce it with the current xen/stable 2.6.32.9.
> > > >
> > > > I'll try with the most recent 2.6.31.6 dom0 kernel aswell..
> > >
> > > Thanks Pasi!
> > >
> >
> > It seems to happen with the latest xen/master 2.6.31.6 aswell!
>
> Does this look to you like we're corrupting memory or on-disk storage?
>
> E.g. does a
> $ dd if=/dev/zero bs=1M | hexdump -C
> have the same issue?
>
> I have some initial trouble with the idea that zero.read() in a PV domU
> somehow unlearned to scrub a 1M user buffer.
>

My setup:

Dom0 distro: Fedora 12
Xen hypervisor: 4.0.0-rc5 x86_64
Dom0 kernel: latest xen/master 2.6.31.6 x86_64

Xen hypervisor boot options in grub.conf: dom0_mem=1G loglvl=all guest_loglvl=all
Dom0 kernel boot options in grub.conf: ro root=/dev/mapper/vg_f12test-lv01 SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=fi nomodeset

Steps to reproduce the bug:

1. Reboot the system

2. Start a dummy guest using the domU kernel (rpm) provided in the original bugreport:

   # xm create -c /dev/null memory=400 kernel="vmlinuz-2.6.31.9-1.2.82.xendom0.fc12.x86_64" extra="rootdelay=1000"

3. run in dom0:

   # dd if=/dev/zero of=test bs=1M count=10000 && sync && sync && xxd test | grep -v "0000 0000 0000 0000 0000 0000 0000 0000"
   10000+0 records in
   10000+0 records out
   10485760000 bytes (10 GB) copied, 233.621 s, 44.9 MB/s
   6039000: 1000 0000 0000 0000 c0b0 ffff 0300 0000  ................
   6039010: 1d5e 06ab b502 0000 1eb2 27b5 ff00 0000  .^........'.....
   2dfe9000:3000 0000 0000 0000 a43c 7687 0e00 0000  0........<v.....
   2dfe9010:cfc1 ba64 b902 0000 1eb2 27b5 ff00 0000  ...d......'.....
   50685000:4800 0000 0000 0000 f954 0f6d 1600 0000  H........T.m....
   50685010:5b1d 0230 bc02 0000 1eb2 27b5 ff00 0000  [..0......'.....
   743f9000:6200 0000 0000 0000 e0e2 1ffb 1e00 0000  b...............
   743f9010:acc3 e436 bf02 0000 1eb2 27b5 ff00 0000  ...6......'.....

As you can see, very easy to reproduce.

Now, I "xm destroy" the domU, run "sync" and "echo 3 > /proc/sys/vm/drop_caches" in dom0,
and then re-start the dummy domU, and try the other method as requested by Daniel:

   # dd if=/dev/zero bs=1M | hexdump -C
   00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
   *
   ^C20984+0 records in
   20983+0 records out
   22002270208 bytes (22 GB) copied, 206.353 s, 107 MB/s

So that method didn't show the corruption..

Now immediately after (no domU restart) let's try to reproduce again with the dd + xxd method:

   # dd if=/dev/zero of=test bs=1M count=10000 && sync && sync && xxd test | grep -v "0000 0000 0000 0000 0000 0000 0000 0000"
   10000+0 records in
   10000+0 records out
   10485760000 bytes (10 GB) copied, 258.85 s, 40.5 MB/s
   7dc2000: 5a02 0000 0000 0000 760d d90c c500 0000  Z.......v.......
   7dc2010: 3785 8def 8003 0000 1eb2 27b5 ff00 0000  7.........'.....
   2dc0d000:7802 0000 0000 0000 ec70 d8eb ce00 0000  x........p......
   2dc0d010:6fb9 a66d 8403 0000 1eb2 27b5 ff00 0000  o..m......'.....

So it seems to be related to disk IO in dom0?

-- Pasi

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel
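
For anyone else trying to reproduce this: in the xxd output above, every flagged pair of rows starts at an offset ending in 000 (hex), i.e. on a 4 KiB boundary, and covers the next 32 bytes. Below is a minimal Python sketch that does the same check as the xxd + grep pipeline but reports each non-zero run as (start offset, length) per block, which may make that pattern easier to spot. The function name find_nonzero_spans and the 4096-byte block size are my own assumptions for illustration; nothing here comes from the original bug report.

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumption: 4 KiB blocks, matching the aligned offsets seen above


def find_nonzero_spans(path, block_size=PAGE_SIZE):
    """Return a list of (start_offset, length) for each block holding non-zero bytes.

    For every block that is not entirely zero, the span runs from the first
    non-zero byte to the last non-zero byte inside that block.
    """
    spans = []
    with open(path, "rb") as f:
        offset = 0
        while True:
            block = f.read(block_size)
            if not block:
                break
            if block.count(0) != len(block):
                # Index of the first and one-past-the-last non-zero byte.
                first = len(block) - len(block.lstrip(b"\x00"))
                last = len(block.rstrip(b"\x00"))
                spans.append((offset + first, last - first))
            offset += len(block)
    return spans


if __name__ == "__main__":
    # Self-contained demo on a small scratch file, not on the 10 GB test file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\x00" * (3 * PAGE_SIZE))
        tmp.seek(PAGE_SIZE)
        tmp.write(b"\x10" + b"\x00" * 30 + b"\xff")  # 32 "corrupt" bytes at page 1
        name = tmp.name
    for start, length in find_nonzero_spans(name):
        print(hex(start), length)
    os.remove(name)
```

To run it against the file written by dd in step 3, call find_nonzero_spans("test") in dom0 after the sync.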