[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] mpt3sas bug with Debian jessie kernel only under Xen - "swiotlb buffer is full"



Hi Andrew,

On Sun, Dec 04, 2016 at 03:59:20PM +0000, Andrew Cooper wrote:
> On 04/12/16 08:32, Andy Smith wrote:
> > Under the Debian jessie amd64 kernel (linux-image-3.16.0-4-amd64
> > 3.16.36-1+deb8u2) running under Xen, I cannot put the system's
> > storage under heavy load without receiving a bunch of "swiotlb
> > buffer is full" kernel error messages and severely degraded
> > performance. Sometimes the system panics and reboots itself.

[…]

> Can you try these two patches from the XenServer Patch queue?
> https://github.com/xenserver/linux-3.x.pg/blob/master/master/series#L613-L614

Looking good.

Using those patches I'm ~20 minutes into this now:

Every 2.0s: cat /proc/mdstat                                 Tue Dec  6 
02:16:40 2016

Personalities : [raid1] [raid10]
md5 : active raid10 sdb[0] sda[1]
      1875243008 blocks super 1.2 512K chunks 2 far-copies [2/2] [UU]
      [==>..................]  check = 11.5% (217058176/1875243008) 
finish=133.9min speed=206252K/sec
      bitmap: 0/14 pages [0KB], 65536KB chunk

md4 : active raid10 sdc[0] sdd[1]
      3906886656 blocks super 1.2 512K chunks 2 far-copies [2/2] [UU]
      [>....................]  check =  2.6% (102650880/3906886656) 
finish=674.4min speed=94007K/sec
      bitmap: 0/30 pages [0KB], 65536KB chunk

…where previously it would have given kernel errors within 5
seconds, so I think that fixes it. I will have to perform some more
strenuous testing.

Those two patches did not apply cleanly to source of
linux-image-3.16.0-4-amd64 3.16.36-1+deb8u2. The last bit of each
patch was rejected, so I removed them and put them into a separate
patch file (0003-fixup.patch attached).

I have not done this process in a long time so just for the
archives, my process was as per:

    
https://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official

# mkdir -p /data/debian
# chown andy: /data/debian
# apt-get install build-essential fakeroot
# apt-get build-dep linux
$ cd /data/debian
$ apt-get source linux
$ wget 
https://raw.githubusercontent.com/xenserver/linux-3.x.pg/master/master/0001-dma-add-dma_get_required_mask_from_max_pfn.patch
$ wget 
https://raw.githubusercontent.com/xenserver/linux-3.x.pg/master/master/0002-x86-xen-correct-dma_get_required_mask-for-Xen-PV-gue.patch
$ # remove last parts of each patch file, create 0003-fixup.patch that performs 
equivalent changes
$ cd linux-3.16.36
$ # applying these patches is going to change symbols so changing the abiname
$ # is necessary.
$ # See https://kernel-handbook.alioth.debian.org/ch-versions.html#s-abi-name
$ sed -i -e 's/^abiname: 4/abiname: 4bf/' debian/config/defines
$ fakeroot debian/rules debian/control-real
$ bash debian/bin/test-patches -f amd64 
../0001-dma-add-dma_get_required_mask_from_max_pfn.patch 
../0002-x86-xen-correct-dma_get_required_mask-for-Xen-PV-gue.patch 
../0003-fixup.patch
# dpkg -i ../linux-headers-3.16.0-4bf-amd64_3.16.36-1+deb8u2a~test_amd64.deb 
../linux-image-3.16.0-4bf-amd64_3.16.36-1+deb8u2a~test_amd64.deb

boot into new kernel under Xen

$ uname -a
Linux elephant 3.16.0-4bf-amd64 #1 SMP Debian 3.16.36-1+deb8u2a~test 
(2016-12-05) x86_64 GNU/Linux

I think my next steps should be:

1. Do some more strenuous testing

2. Report bug against source package "linux" in Debian jessie with
   pointer to those two patches.

3. Check if those fixes are already applied in Debian backports
   and/or Debian testing linux package.

> > Dec  4 07:06:00 elephant kernel: [22019.373653] mpt3sas 0000:01:00.0: 
> > swiotlb buffer is full (sz: 57344 bytes)
> > Dec  4 07:06:00 elephant kernel: [22019.374707] mpt3sas 0000:01:00.0: 
> > swiotlb buffer is full
> > Dec  4 07:06:00 elephant kernel: [22019.375754] BUG: unable to handle 
> > kernel NULL pointer dereference at 0000000000000010
> > Dec  4 07:06:00 elephant kernel: [22019.376430] IP: [<ffffffffa004e779>] 
> > _base_build_sg_scmd_ieee+0x1f9/0x2d0 [mpt3sas]
> > Dec  4 07:06:00 elephant kernel: [22019.377122] PGD 0
> 
> This alone is a clear error handling bug in the mpt3sas driver.  It
> hasn't checked the DMA mapping call for a successful mapping before
> following the NULL pointer it got given back.  It is collateral damage
> from the swiotlb buffer being full, but a bug none the less.

Does that require reporting as an upstream linux bug in mpt3sas
then?

Thanks for your help.

Cheers,
Andy

Attachment: 0003-fixup.patch
Description: Text Data

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.