
RE: [Xen-devel] RE: ext4 BUG in dom0 Kernel 2.6.32.36



Hi:
 
    I finally captured overlapping extents in ext4, but I am still wondering how 
this happens.
 
    I added an overlap check on the last extent in the tree at the very beginning of 
ext4_ext_convert_to_initialized. The attached messages.12 shows the overlap that was found.
 
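Lines 8-10: 3467:[1]15:57921642 and 3468:[0]14:57921643 overlap. Reading each entry as 
logical:[uninit]len:pblock (which matches the filefrag output further down), the first 
extent covers logical blocks 3467-3481, so it runs past the start of the second at 3468.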
 
 
8 Sep 15 08:27:39 xmao kernel:  3331:[0]7:53750025 3338:[0]8:53750033 
3346:[0]1:53848953 3347:[0]7:53848955 3354:[0]1:53848969 3355:[0]7:53848971 
3362:[0]1:53848985 3363:[0]7:56996848 3370:[0]1:57606144 3371:[0]7:57795290 
3378:[0]1:57814407 3379:[0]7:57858606 3386:[0]8:57858620 3394:[0]1:57858629 
3395:[0]8:57858637 3403:[0]7:57858646 3410:[0]1:57858661 3411:[0]8:57858669 
3419:[0]7:57858678 3426:[0]8:57858692 3434:[0]1:57858701 3435:[0]7:57858709 
3442:[0]1:57858717 3443:[0]7:57858725 3450:[0]1:57858733 3451:[0]7:57858741 
3458:[0]1:57858749 3459:[0]7:57858757 3466:[0]1:57921634 3467:[1]15:57921642 
9  Sep 15 08:27:39 xmao kernel: Displaying leaf extents for inode 12339004
10 Sep 15 08:27:39 xmao kernel: 3468:[0]14:57921643 3482:[0]1:57921664 
3483:[0]7:57921666 3490:[0]1:57921680 3491:[0]8:57921682 3499:[0]7:57921691 
3506:[0]8:57921705 3514:[0]1:57921714 3515:[0]7:57921722 3522:[0]41:57916683 
3563:[0]7:58159767 3570:[0]1:58159781 3571:[0]7:58238992 3578:[0]1:58288144 
3579:[0]7:58327750 3586:[0]1:58579969 3587:[0]7:58954838 3594:[0]1:59006641 
3595:[0]7:59006643 3602:[0]1:59006657 3603:[0]7:59006659 3610:[0]8:59006673 
3618:[0]8:59006688 3626:[0]470:58982658 4096:[0]3:58987732 4099:[0]1:58992655 
4100:[0]7:59143253 4107:[0]1:59171840 4108:[0]7:59183878 4115:[0]1:59192886 
4116:[0]8:59593463 4124:[0]8:59669484 4132:[0]7:73086538 4139:[0]1:73352801 
4140:[0]7:73339273 4147:[0]1:73526280 4148:[0]8:78229012 4156:[0]1:78229021 
4157:[0]7:78818388 4164:[0]1:79069383 4165:[0]7:79428616 4172:[0]1:80490925 
4173:[0]7:81439488 4180:[0]1:82854062 4181:[0]7:83462272 4188:[0]1:83656904 
4189:[0]7:89127381 4196:[0]1:89584313 4197:[0]8:91592930 4205:[0]7:91592945 
4212:[0]1:91592953 4213:[0]7:91592961 422

 
I also dumped the file's on-disk extents with filefrag, which shows no overlap and no extent 
3468:[0]14:57921643.
 
 ext logical physical expected length flags
....
 337    3459 57858757 57858749      7
 338    3466 57921634 57858763      1 unwritten
 339    3467 57921642 57921634     15 unwritten
 340    3482 57921664 57921656      1
 341    3483 57921666 57921664      7
.....
 
My assumption is this: after 3468:[0]14:57921643 was successfully inserted, 
some later error occurred. 
At the bottom of ext4_ext_convert_to_initialized, fix_extent_len then restores the 
original ee_len of ex. (I will add an error check there later.)
 
fix_extent_len:
    ex->ee_block = orig_ex.ee_block;
    ex->ee_len   = orig_ex.ee_len;
    ext4_ext_store_pblock(ex, ext_pblock(&orig_ex));
    ext4_ext_mark_uninitialized(ex);
    ext4_ext_dirty(handle, inode, path + depth);
 
 
Any comments?
 
 
However, there is something strange in messages.12.
 
 
messages.12 is from another machine. Its log is printed right before 
BUG_ON(newext->ee_block == nearex->ee_block);
What is strange is that the pblock of 14412:[1]16:9927 is very different from that of 
14411:[0]1:222332613.

 
        if (newext->ee_block == nearex->ee_block) {
            len = (EXT_MAX_EXTENT(eh) - nearex) * sizeof(struct ext4_extent);
            len = len < 0 ? 0 : len;
            printk("old_depth %d depth %d old_path %p path %p next_has_free %d next %llu\n",
                    old_depth, depth, old_path, path, next_has_free,
                    (unsigned long long)next);

            printk("insert %d:%llu:[%d]%d before: nearest 0x%p, "
                    "move %d from 0x%p to 0x%p\n",
                    le32_to_cpu(newext->ee_block),
                    ext_pblock(newext),
                    ext4_ext_is_uninitialized(newext),
                    ext4_ext_get_actual_len(newext),
                    nearex, len, nearex + 1, nearex + 2);
            ext4_ext_show_leaf_xmao(inode, old_path);
            ext4_ext_show_leaf_xmao(inode, path);
        }
        BUG_ON(newext->ee_block == nearex->ee_block);
 
 
 
Sep 13 16:16:35 xmao kernel: 57:[0]31:157254721 12288:[0]54:157503830 
12342:[0]10:157503884 12352:[0]5:157534763 12357:[0]1:157534768 
12358:[0]58:157534769 12416:[0]64:157567168 12480:[0]13:158051261 
12493:[0]73:172263095 12566:[0]24:172265399 12590:[0]71:172521859 
12661:[0]71:172627897 12732:[0]71:172733735 12803:[0]69:172722619 
12872:[0]9:172764859 12881:[0]42:110500028 12923:[0]86:143030061 
13009:[0]86:143119859 13095:[0]48:143173376 13143:[0]16:195333586 
13159:[0]32:197526105 13191:[0]40:198875861 13231:[0]39:198872300 
13270:[0]5:199663576 13275:[0]26:200964192 13301:[0]36:202015708 
13337:[0]47:202221682 13384:[0]9:202221729 13393:[0]58:202624966 
13451:[0]12:202606535 13463:[0]35:212117725 13498:[0]35:212135811 
13533:[0]34:212115513 13567:[0]32:212108608 13599:[0]29:212144185 
13628:[0]50:231280420 13678:[0]38:231645389 13716:[0]13:231645427 
13729:[0]51:231650765 13780:[0]50:231647658 13830:[0]54:231985340 
13884:[0]24:231981259 13908:[0]64:105098731 13972:[0]87:136696745 
14059:[0]45:136700237 14104:[0]61:2
Sep 13 16:16:35 xmao kernel: 3651 14165:[0]69:222042299 14234:[0]68:222044092 
14302:[0]34:222091761 14336:[0]68:222172860 14404:[0]7:222332606 
14411:[0]1:222332613 
Sep 13 16:16:35 xmao kernel: Displaying leaf extents for inode 30685060
Sep 13 16:16:35 xmao kernel: 14412:[1]16:9927 14428:[1]41:13213 
14469:[1]1:13254 14470:[0]67:222673085 
 
 
filefrag also shows that the on-disk extents are fine.
 
 336   14302 222091761 222044159     34
 337   14336 222172860 222091794     68
 338   14404 222332606 222172927      7
 339   14411 222332613              59 unwritten
 340   14470 222673085 222332671     67
 341   14537 222848155 222673151     43
 342   14580 165617358 222848197     56
 343   14636 165777353 165617413     55
 344   14691 165961927 165777407     57
 
It seems that 14412:[1]16:9927, 14428:[1]41:13213 and 14469:[1]1:13254 are unexpected: 
none of them appears in the filefrag output.
 
 
Many thanks.
----------------------------------------
> From: tinnycloud@xxxxxxxxxxx
> To: jeremy@xxxxxxxx
> CC: konrad.wilk@xxxxxxxxxx; linux-ext4@xxxxxxxxxxxxxxx; 
> xen-devel@xxxxxxxxxxxxxxxxxxx
> Subject: RE: [Xen-devel] RE: ext4 BUG in dom0 Kernel 2.6.32.36
> Date: Wed, 7 Sep 2011 10:35:21 +0800
>
>
>
>
> ----------------------------------------
> > Date: Tue, 6 Sep 2011 11:55:02 -0700
> > From: jeremy@xxxxxxxx
> > To: tinnycloud@xxxxxxxxxxx
> > CC: konrad.wilk@xxxxxxxxxx; linux-ext4@xxxxxxxxxxxxxxx; 
> > xen-devel@xxxxxxxxxxxxxxxxxxx
> > Subject: Re: [Xen-devel] RE: ext4 BUG in dom0 Kernel 2.6.32.36
> >
> > On 09/06/2011 08:11 AM, MaoXiaoyun wrote:
> > >
> > > > Date: Tue, 6 Sep 2011 10:53:47 -0400
> > > > From: konrad.wilk@xxxxxxxxxx
> > > > To: tinnycloud@xxxxxxxxxxx
> > > > CC: linux-ext4@xxxxxxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxx;
> > > jeremy@xxxxxxxx
> > > > Subject: Re: ext4 BUG in dom0 Kernel 2.6.32.36
> > > >
> > > > On Tue, Sep 06, 2011 at 03:24:14PM +0800, MaoXiaoyun wrote:
> > > > >
> > > > >
> > > > > Hi:
> > > > >
> > > > > I've met an ext4 Bug in dom0 kernel 2.6.32.36. (See kernel stack
> > > below)
> > > >
> > > > Did you try the 3.0 kernel?
> > > No, I am afraid the change would be too much for our current environment
> > > and may cause other stability issues.
> > > So I want to dig out what is really happening, hopefully.
> >
> > Another question is whether this is a regression compared to earlier
> > versions of 2.6.32? Do you know if this problem exists in a non-Xen
> > environment?
> >
>
> Others have reported this issue in non-Xen environments:
> http://markmail.org/message/ywr4nfgiuvgdcr7y
> http://www.spinics.net/lists/linux-ext4/msg21066.html
>
> The difficulty is that I haven't found an efficient way to reproduce it.
> (Currently it only shows up in our cluster, and redeploying the cluster may take 
> 3 days or more.)
>
>
> > Thanks,
> > J

Attachment: messages.12
Description: Binary data

Attachment: messages.15
Description: Binary data
