
Re: [win-pv-devel] Problems with xenvbd



> -----Original Message-----
> From: Fabio Fantoni [mailto:fabio.fantoni@xxxxxxx]
> Sent: 21 August 2015 14:14
> To: Rafał Wojdyła; Paul Durrant; win-pv-devel@xxxxxxxxxxxxxxxxxxxx
> Subject: Re: [win-pv-devel] Problems with xenvbd
> 
> On 21/08/2015 10:12, Fabio Fantoni wrote:
> > On 21/08/2015 00:03, Rafał Wojdyła wrote:
> >> On 2015-08-19 23:25, Paul Durrant wrote:
> >>>> -----Original Message-----
> >>>> From: win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx
> >>>> [mailto:win-pv-devel-bounces@xxxxxxxxxxxxxxxxxxxx] On Behalf Of Rafal Wojdyla
> >>>> Sent: 18 August 2015 14:33
> >>>> To: win-pv-devel@xxxxxxxxxxxxxxxxxxxx
> >>>> Subject: [win-pv-devel] Problems with xenvbd
> >>>>
> >>>> Hi,
> >>>>
> >>>> I've been testing the current pvdrivers code in preparation for
> >>>> creating upstream patches for my xeniface additions and I noticed
> >>>> that xenvbd seems to be very unstable for me. I'm not sure if it's
> >>>> a problem with xenvbd itself or my code because it seemed to only
> >>>> manifest when the full suite of our guest tools was installed along
> >>>> with xenvbd. In short, most of the time the system crashed with
> >>>> kernel memory corruption in seemingly random processes shortly
> >>>> after start. Driver Verifier didn't seem to catch anything. You can
> >>>> see a log from one such crash in the attachment crash1.txt.
> >>>>
> >>>> Today I tried to perform some more tests but this time without our
> >>>> guest tools (only pvdrivers and our shared libraries were
> >>>> installed). To my surprise now Driver Verifier was crashing the
> >>>> system every time in xenvbd (see crash2.txt). I don't know why it
> >>>> didn't catch that previously... If adding some timeout to the
> >>>> offending wait doesn't break anything I'll try that to see if I can
> >>>> reproduce the previous memory corruptions.
> >>>>
> >>> Those crashes do look odd. I'm on PTO for the next week but I'll have
> >>> a look when I get back to the office. I did run verifier on all the
> >>> drivers a week or so back (while running vbd plug/unplug tests) but
> >>> there have been a couple of changes since then.
> >>>
> >>> Paul
> >>>
> >> No problem. I attached some more logs. The last one was during system
> >> shutdown, after that the OS failed to boot (probably corrupted
> >> filesystem since the BSOD itself seemed to indicate that). I think every
> >> time there is a BLKIF_RSP_ERROR somewhere but I'm not yet familiar with
> >> Xen PV device interfaces so not sure what that means.
> >>
> >> In the meantime I've run more tests on my modified xeniface driver to
> >> make sure it's not contributing to these issues but everything seemed to
> >> be fine there.
> >>
> >>
> >
> > I also had a disk corruption on Windows 10 Pro 64-bit with the PV
> > drivers build of 11 August, but I'm not sure it is related to the
> > winpv drivers: on the same domU I had also started testing snapshots
> > with a qcow2 disk overlay. I don't have useful information for this
> > case because Windows didn't try to boot at all, but if it happens
> > again I'll try to gather more information.
> 
> It happened again, and this time too I was unable to determine the
> exact cause.
> Windows appeared to reboot normally with a clean shutdown, but on the
> subsequent boot SeaBIOS did not find a bootable disk, and the qemu log
> showed nothing useful.
> qemu-img check shows errors:
> > /usr/lib/xen/bin/qemu-img check W10.disk1.cow-sn1
> > ERROR cluster 143 refcount=1 reference=2
> > Leaked cluster 1077 refcount=1 reference=0
> > ERROR cluster 1221 refcount=1 reference=2
> > Leaked cluster 2703 refcount=1 reference=0
> > Leaked cluster 5212 refcount=1 reference=0
> > Leaked cluster 13375 refcount=1 reference=0
> >
> > 2 errors were found on the image.
> > Data may be corrupted, or further writes to the image may corrupt it.
> >
> > 4 leaked clusters were found on the image.
> > This means waste of disk space, but no harm to data.
> > 27853/819200 = 3.40% allocated, 22.65% fragmented, 0.00% compressed
> > clusters
> > Image end offset: 1850736640
> I created the overlay with:
> /usr/lib/xen/bin/qemu-img create -o backing_file=W10.disk1.xm,backing_fmt=raw -f qcow2 W10.disk1.cow-sn1
> and changed the xl domU disk configuration:
> disk=['/mnt/vm2/W10.disk1.cow-sn1,qcow2,hda,rw',...
> Dom0 runs Xen 4.6-rc1 and qemu 2.4.0.
> DomU is Windows 10 Pro 64-bit with the PV drivers build of 11 August.
> 
> How can I determine for sure whether this is a winpv problem, a qemu
> problem, or something else, and what information should I gather to
> report it?
> 
> Thanks for any reply, and sorry for my bad English.
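For reference, the leaked clusters reported by qemu-img check above can be repaired in place, while auto-repairing the refcount ERRORs is riskier and may discard data. A minimal sketch of the workflow, assuming qemu-img is on the PATH; base.img and overlay.qcow2 are hypothetical scratch names, not the W10 images above:

```shell
# Create a raw base image and a qcow2 overlay on top of it,
# mirroring the snapshot setup described above.
qemu-img create -f raw base.img 64M
qemu-img create -f qcow2 -o backing_file=base.img,backing_fmt=raw overlay.qcow2

# Consistency check (read-only).
qemu-img check overlay.qcow2

# Repair only leaked clusters (safe: leaks waste space but carry no data).
qemu-img check -r leaks overlay.qcow2

# Repair everything, including refcount errors (may discard corrupted data).
qemu-img check -r all overlay.qcow2
```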

This sounds very much like a lack of synchronization somewhere. I recall seeing 
other problems of this ilk when someone was messing around with O_DIRECT for 
opening images. I wonder if we are missing a flush operation on shutdown.

  Paul

_______________________________________________
win-pv-devel mailing list
win-pv-devel@xxxxxxxxxxxxxxxxxxxx
http://lists.xenproject.org/cgi-bin/mailman/listinfo/win-pv-devel

 

