
Re: [Xen-devel] PROBLEM: Kernel BUG with raid5 soft + Xen + DRBD - invalid opcode

To: Shaohua Li <shli@xxxxxxxxxx>
From: MasterPrenium <masterprenium.lkml@xxxxxxxxx>
Date: Sat, 13 May 2017 02:06:31 +0200
Cc: linux-raid@xxxxxxxxxxxxxxx, xen-users@xxxxxxxxxxxxx, "MasterPrenium@xxxxxxxxx" <MasterPrenium@xxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx
Delivery-date: Sat, 13 May 2017 00:07:02 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

Hi guys,

My issue still occurs with newer kernels, at least with the latest revision of the 4.10.x branch.

But I found something that may be interesting for the investigation: I've attached another .config file for kernel building. With this configuration I'm not able to reproduce the kernel panic; I get no crash at all with exactly the same procedure.


Tested on the 4.9.16 and 4.10.13 kernels:

- Config_Crash.txt: results in a crash within less than 2 minutes of running fio.
- Config_NoCrash.txt: even after hours of fio, rebuilding arrays, etc., no crash at all, and no warning or anything else in dmesg.

Note: Config_NoCrash comes from another server on which I had set up a similar system and which was not crashing. I tested this kernel on my crashing system, and there are no more crashes...
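One way to narrow this down is to list exactly which options differ between the two attached configs, then bisect over that list. The kernel tree ships scripts/diffconfig for this purpose; below is a minimal stand-alone Python sketch of the same idea (the name cfgdiff.py is hypothetical). It assumes only the standard .config line formats `CONFIG_FOO=value` and `# CONFIG_FOO is not set`.

```python
import re
import sys

# Standard .config line formats: "CONFIG_FOO=value" and "# CONFIG_FOO is not set".
_SET = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_config(text):
    """Return {option: value}; options explicitly 'not set' map to 'n'."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def diff_configs(old_text, new_text):
    """Return sorted (option, old, new) tuples for every option that differs.

    An option absent from one file is reported as None on that side.
    """
    old = parse_config(old_text)
    new = parse_config(new_text)
    changed = []
    for name in sorted(old.keys() | new.keys()):
        a, b = old.get(name), new.get(name)
        if a != b:
            changed.append((name, a, b))
    return changed


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical): python3 cfgdiff.py Config_Crash.txt Config_NoCrash.txt
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        for name, a, b in diff_configs(f1.read(), f2.read()):
            print(f"{name}: {a} -> {b}")
```

Options that exist only on one side (None) usually come from version or dependency differences and are a lower priority for bisecting than options that flip between y, m and n.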


I can't understand how a different config can fix a kernel BUG...

If anyone has an idea...

Bests,


On 09/01/2017 at 23:44, Shaohua Li wrote:

On Sun, Jan 08, 2017 at 02:31:15PM +0100, MasterPrenium wrote:

Hello,

Replies below, plus:
- I don't know if this helps, but after the crash, when the system
reboots, the RAID 5 array re-synchronizes:
[   37.028239] md10: Warning: Device sdc1 is misaligned
[   37.028541] created bitmap (15 pages) for device md10
[   37.030433] md10: bitmap initialized from disk: read 1 pages, set 59 of 29807 bits

- Sometimes the kernel crashes completely (serial and network connections
are lost); sometimes I only get the "BUG" dump but still have network access
(a reboot is impossible though, I need to reset the system).

- You can find the blktrace (captured while running fio) here. I hope it's
complete, since the end of the file is where the kernel crashed: https://goo.gl/X9jZ50

Looks like most are normal full-stripe writes.
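For readers going through the blktrace themselves: a write to a RAID5 array is a full-stripe write when it starts on a stripe boundary and covers whole stripes, where one stripe holds one chunk on each of the (disks - 1) data disks, so parity can be computed without reading old data. The array geometry is not given in this thread, so the chunk size and disk count below are hypothetical parameters. The parser also assumes blkparse's default text output, where a request shows up as `<sector> + <count>`. Note that md makes its own read-modify-write decisions per internal stripe_head, so this is only a rough classification.

```python
import re

# Assumed blkparse default output: "... Q  WS 2048 + 2048 [fio]"
_RANGE = re.compile(r"\s(\d+) \+ (\d+)\b")


def parse_sector_range(line):
    """Return (sector, nr_sectors) from a blkparse line, or None if absent."""
    m = _RANGE.search(line)
    return (int(m.group(1)), int(m.group(2))) if m else None


def stripe_width_sectors(chunk_sectors, nr_disks):
    """Data sectors per RAID5 stripe: one chunk on each of nr_disks - 1 data disks."""
    if nr_disks < 3:
        raise ValueError("RAID5 needs at least 3 member disks")
    return chunk_sectors * (nr_disks - 1)


def is_full_stripe_write(sector, nr_sectors, chunk_sectors, nr_disks):
    """True if the write starts on a stripe boundary and covers whole stripes."""
    width = stripe_width_sectors(chunk_sectors, nr_disks)
    return nr_sectors > 0 and sector % width == 0 and nr_sectors % width == 0
```

For example, with a hypothetical 256 KiB chunk (512 sectors) on 5 disks, a stripe is 2048 sectors, so `2048 + 2048` is a full-stripe write while `2048 + 1024` is not.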

I'm trying to reproduce it, without success so far. So:
ext4->btrfs->raid5, crash
btrfs->raid5, no crash
Right? Does the subvolume matter? When you create the raid5 array, does adding
the '--assume-clean' option change the behavior? I'd like to narrow down the issue.
If you can capture a blktrace of the raid5 array, it would give us a hint about
what kind of IO it is.

Yes, correct.
The subvolume doesn't matter.
--assume-clean doesn't change the behaviour.

So it's not a resync issue.

Don't forget that the system needs to be running on Xen to crash; without
Xen (on a native kernel) it doesn't crash (or at least, I was not able to make
it crash).

Regarding your patch, I can't find it. Is it the one sent by Konstantin
Khlebnikov?

Right.

It doesn't help :(. Maybe the crash just happens a little later.

OK, the patch is unlikely to help, since the IO size isn't very big.

I don't have a good idea yet. My best guess so far is that the virtual machine
introduces extra delay, which might trigger race conditions that aren't seen
natively. I'll check whether I can find something locally.

Thanks,
Shaohua

Attachment: Config_Crash.txt
Description: Text document

Attachment: Config_NoCrash.txt
Description: Text document

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
