
Re: [Xen-devel] BUG: ext3 corruption in domU



On Thu, May 30, 2013 at 2:36 PM, Konrad Rzeszutek Wilk
<konrad.wilk@xxxxxxxxxx> wrote:
> On Wed, May 29, 2013 at 07:53:39AM -0400, Anthony Sheetz wrote:
>> Is there anything else I can get you at this time to help troubleshoot this?
>
> Well, this reminds me of an ext3 bug in the 2.6.32 stable tree for which
> the ext3 maintainer did not want to backport the fix. It was a
> bug that caused corruption.
>
> If I could just remember the email thread about it.
>>
>> On Fri, May 24, 2013 at 10:20 AM, Konrad Rzeszutek Wilk
>> <konrad.wilk@xxxxxxxxxx> wrote:
>> > On Thu, May 23, 2013 at 02:19:50PM -0400, Anthony Sheetz wrote:
>> >> On Wed, May 22, 2013 at 4:10 PM, Konrad Rzeszutek Wilk
>> >> <konrad.wilk@xxxxxxxxxx> wrote:
>> >> > On Mon, Apr 22, 2013 at 01:26:34PM +0100, Ian Campbell wrote:
>> >> >> Konrad is on vacation this week, so it'll probably be next week before
>> >> >> this gets looked at by him.
>> >> >
>> >> > And I finally got to this email in my 'vacation-mbox'
>> >> >>
>> >> >> Ian.
>> >> >>
>> >> >> On Mon, 2013-04-22 at 13:22 +0100, Anthony Sheetz wrote:
>> >> >> > I realize folks are pretty busy, but we're still interested in
>> >> >> > getting this problem solved, and I want to be sure it's not lost
>> >> >> > in the shuffle.
>> >> >> > Any chance of getting some attention for it?
>> >> >> >
>> >> >> > On Wed, Apr 17, 2013 at 9:00 AM, Ian Campbell 
>> >> >> > <Ian.Campbell@xxxxxxxxxx> wrote:
>> >> >> > > On Tue, 2013-04-16 at 18:39 +0100, Anthony Sheetz wrote:
>> >> >> > >> (re-sending, first message seems to have gotten lost)
>> >> >> > >>
>> >> >> > >> I was referred here by Ian Campbell ijc@xxxxxxxxxxxxxx from 
>> >> >> > >> bugs.debian.org.
>> >> >> > >
>> >> >> > > I'm here too (different hat ;-)), thanks for posting it here. I've
>> >> >> > > added some people who know about the block stuff to the CC.
>> >> >> > >
>> >> >> > > Guys, my suspicion is that the issue is that barriers issued by
>> >> >> > > ext3 inside the guest aren't making it all the way down the
>> >> >> > > ext3->blkfront->blkback->lvm->dm-crypt->disk chain, leading the
>> >> >> > > filesystem to eventually corrupt itself.
>> >> >> > >
>> >> >> > > The issue seems to relate to the use of dm-crypt, since
>> >> >> > > ext3->blkfront->blkback->lvm->disk is reported to work fine.
>> >> >> > >
>> >> >> > > However, there is no problem with the local dom0 ext3 root
>> >> >> > > filesystem, which is also in the same lvm VG on the crypt device
>> >> >> > > (i.e. ext3->lvm->dm-crypt->disk), so it's not purely a dm-crypt
>> >> >> > > issue. I figure something is up at the blkfront->blkback link, so
>> >> >> > > that the barriers which blkback is injecting into the block
>> >> >> > > subsystem either don't make it to the dm-crypt layer or do not
>> >> >> > > DTRT once they arrive.
>> >> >> > >
>> >> >> > > I'm not really sure how to proceed (or how to ask Anthony to
>> >> >> > > proceed) with verifying any part of that hypothesis though.
>> >> >> > >
>> >> >> > > ISTR issues with old vs new style barriers, or barriers with no
>> >> >> > > data in them, or something. Could this be related to that? (Or am
>> >> >> > > I thinking of DISCARD?)
>> >> >
>> >> > You are using two different kernel versions. The 2.6.32 domU is only
>> >> > using WRITE_BARRIERs, while in the 3.2 kernels those have been
>> >> > completely eliminated. The mechanism they use is called 'WRITE_FLUSH'.
>> >> > The 3.2 kernel has a patch:
>> >> >
>> >> > commit 29bde093787f3bdf7b9b4270ada6be7c8076e36b
>> >> > Author: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
>> >> > Date:   Mon Oct 10 00:42:22 2011 -0400
>> >> >
>> >> >     xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests.
>> >> >
>> >> > which emulates the barrier request by draining all of the outstanding
>> >> > I/Os and then sending the WRITE_FLUSH.
>> >> >
>> >> > But it looks like you are hitting an issue here. Just to make sure
>> >> > that is the case, what happens if you use the _same_ kernel in both
>> >> > dom0 and domU? Does it work then?
>> >> >
>> >>
>> >> First, thank you so much for getting back to me, it's really appreciated.
>> >> At this point I've forgotten if I did this with Wheezy on Wheezy, and
>> >> what the result was.
>> >> I'll have to test using the 3.2 kernel on the domU Debian Squeeze and
>> >> get back to you. I should be able to do that early next week.
>> >
>> > Thank you. Also, when you do this test, could you also provide the
>> > 'xenstore-ls' output from dom0? And the 'dmesg' output from the guest
>> > (or at least the 'xl console <guest> | tee /tmp/log' output)? That would
>> > give me an idea of whether the frontend/backend have the right
>> > negotiation parameters.
>> >
>> > Have a good weekend!
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@xxxxxxxxxxxxx
>> http://lists.xen.org/xen-devel
>>

Is there anything I can do at this point to help with this bug?
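
In case it is useful when looking at the 'xenstore-ls' output requested
above, here is a rough sketch of how the barrier/flush negotiation keys
could be pulled out of a saved dump. It assumes the usual xenstore-ls
layout (one 'key = "value"' entry per line, one extra space of indentation
per level) and looks only for the 'feature-barrier' and
'feature-flush-cache' keys. The SAMPLE text, the domain id 1 and the device
id 51712 are made up for illustration and are not taken from the affected
system; the function name is hypothetical as well.

```python
import re

# xenstore keys that advertise write-ordering support between
# blkback and blkfront.
ORDERING_KEYS = ("feature-barrier", "feature-flush-cache")

# One xenstore-ls entry: leading indentation, key, quoted value.
LINE_RE = re.compile(r'^(\s*)([^\s=]+)\s*=\s*"(.*)"\s*$')


def ordering_features(xenstore_ls_text):
    """Return {path: value} for each ordering-related key in the dump."""
    found = {}
    stack = []  # (indent, name) pairs describing the current path
    for line in xenstore_ls_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip anything that is not a key/value entry
        indent, key, value = len(m.group(1)), m.group(2), m.group(3)
        # Pop siblings and deeper entries so the stack holds only ancestors.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, key))
        if key in ORDERING_KEYS:
            found["/".join(name for _, name in stack)] = value
    return found


# Made-up dump fragment, for illustration only.
SAMPLE = '''\
local = ""
 domain = ""
  0 = ""
   backend = ""
    vbd = ""
     1 = ""
      51712 = ""
       feature-barrier = "1"
       feature-flush-cache = "1"
       state = "4"
'''

if __name__ == "__main__":
    for path, value in sorted(ordering_features(SAMPLE).items()):
        print(path, "=", value)
```

To run it against a real dump, save the output with something like
'xenstore-ls > /tmp/xenstore.txt' in dom0 and pass the file contents to
ordering_features() instead of SAMPLE. A backend vbd entry that lacks one of
the two keys, or has it set to "0", would be the first thing to look at.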



 

