
[Xen-devel] "swiotlb buffer is full" problem with tg3 and kernel 3.16.0-4-686-pae on Xen 4.4.1



Hi,

After upgrading to Debian jessie, and with it to that distribution's
default Linux kernel 3.16.0-4-686-pae and Xen hypervisor 4.4.1-amd64,
I'm having problems with the tg3 network driver under high load.
Unfortunately this affects a production system that I administer.  It
usually happens during a DRBD sync.  Here is one such event:

[ 4765.528635] block drbd0: Began resync as SyncSource (will sync 886784 KB [221696 bits set])
[ 4765.528654] block drbd0: updated sync UUID 09891C136111799E:F7FD1C0A50225596:F7FC1C0A50225596:F7FB1C0A50225596
[ 4768.992280] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4769.400296] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4770.216360] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4771.852283] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4775.120286] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4775.776027] tg3 0000:02:00.0: swiotlb buffer is full (sz: 32768 bytes)
[ 4775.778814] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4775.780995] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4775.783345] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4775.785097] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4775.988290] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4776.396285] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4777.212295] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4778.848298] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4781.664292] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4782.120285] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4788.672288] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4793.776046] drbd base-disk: [drbd_w_base-dis/1734] sock_sendmsg time expired, ko = 6
[ 4794.752314] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4799.776046] drbd base-disk: [drbd_w_base-dis/1734] sock_sendmsg time expired, ko = 5
[ 4801.760290] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4805.776040] drbd base-disk: [drbd_w_base-dis/1734] sock_sendmsg time expired, ko = 4
[ 4811.776040] drbd base-disk: [drbd_w_base-dis/1734] sock_sendmsg time expired, ko = 3
[ 4817.776050] drbd base-disk: [drbd_w_base-dis/1734] sock_sendmsg time expired, ko = 2
[ 4823.776079] drbd base-disk: [drbd_w_base-dis/1734] sock_sendmsg time expired, ko = 1
[ 4827.936300] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4829.776069] drbd base-disk: peer( Secondary -> Unknown ) conn( SyncSource -> Timeout )
[ 4829.776088] block drbd0: drbd_send_block() failed

Sometimes I also see the message "swiotlb_tbl_map_single: 8 callbacks
suppressed" or similar between the "buffer full" messages.
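
To quantify how often the pool runs dry during a sync, the kernel log
can be tallied by mapping size.  What follows is only a minimal sketch
in Python: the function name summarize() is hypothetical, the regular
expressions assume the message formats quoted in this mail, and the
sample lines are copied from the log above except for the "callbacks
suppressed" line, whose position is made up for illustration.

```python
import re
from collections import Counter

# Patterns assume the message formats quoted in this mail.
FULL_RE = re.compile(r"swiotlb buffer is full \(sz: (\d+) bytes\)")
SUPPRESSED_RE = re.compile(r"swiotlb_tbl_map_single: (\d+) callbacks suppressed")

# Sample taken from the log above; the placement of the
# "callbacks suppressed" line is illustrative only.
SAMPLE = """\
[ 4775.120286] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
[ 4775.776027] tg3 0000:02:00.0: swiotlb buffer is full (sz: 32768 bytes)
swiotlb_tbl_map_single: 8 callbacks suppressed
[ 4775.778814] tg3 0000:02:00.0: swiotlb buffer is full (sz: 1448 bytes)
"""


def summarize(lines):
    """Count logged 'buffer is full' events per size and sum suppressed ones."""
    sizes = Counter()
    suppressed = 0
    for line in lines:
        match = FULL_RE.search(line)
        if match:
            sizes[int(match.group(1))] += 1
            continue
        match = SUPPRESSED_RE.search(line)
        if match:
            suppressed += int(match.group(1))
    return sizes, suppressed


sizes, suppressed = summarize(SAMPLE.splitlines())
for size, count in sorted(sizes.items()):
    print(f"{count} logged failure(s) for {size}-byte mappings")
print(f"{suppressed} further message(s) suppressed by rate limiting")
```

On a live system the same function could be fed the output of dmesg
instead of the embedded sample.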

Sometimes the sync finishes, sometimes it stalls and fails completely.

The problem only occurs when running Linux 3.16.0-4-686-pae under Xen
4.4.1.  It does NOT occur when booting the same kernel without Xen, or
when booting the corresponding amd64 kernel (3.16.0-4-amd64) with or
without Xen.  There was no problem in Debian wheezy before the upgrade
(kernel 3.2.0-4-686-pae and Xen Hypervisor 4.1.3-amd64). The problem
also occurs when only dom0 is running (all domU VMs shut down).

I found the thread "tg3 NIC driver bug in 3.14.x under Xen"
(http://www.spinics.net/lists/netdev/msg324124.html) which looks like a
similar issue, but I don't understand exactly what is going on there and
what I could do to fix or debug it further.

Shall I try to build a 3.16.0-4-686-pae kernel with
"CONFIG_NEED_DMA_MAP_STATE=y"?

Shall I try to set the 'iommu' and/or 'swiotlb' kernel parameters? To
what values?
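
For reference on the 'swiotlb' question: as far as the kernel's
parameter documentation and swiotlb source describe it, swiotlb= takes
a number of I/O TLB slabs rather than a byte count, each slab is 2 KiB
(IO_TLB_SHIFT = 11), the count is rounded up to a multiple of 128
slabs, and the default pool is 64 MiB.  A minimal sketch of that
arithmetic in Python follows; the helper name slabs_for_mib() is
hypothetical, and whether a larger pool can be allocated by a 32-bit
PAE dom0, or helps with this problem at all, is untested here.

```python
IO_TLB_SHIFT = 11      # each slab is 1 << 11 = 2048 bytes
IO_TLB_SEGSIZE = 128   # the kernel aligns the slab count to this


def slabs_for_mib(mib):
    """Return the swiotlb= slab count for a bounce pool of `mib` MiB."""
    slabs = (mib << 20) >> IO_TLB_SHIFT
    # Round up to a multiple of IO_TLB_SEGSIZE, as the kernel does.
    return -(-slabs // IO_TLB_SEGSIZE) * IO_TLB_SEGSIZE


# Candidate pool sizes are assumptions chosen for illustration.
for mib in (64, 128, 256):
    print(f"{mib} MiB pool -> swiotlb={slabs_for_mib(mib)}")
```

The resulting value would go on the dom0 kernel command line, not on
the Xen hypervisor command line.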

Any help or hints on how to fix or work around this issue would be very
much appreciated. Hints on how to debug this further are also welcome.

Thanks,
Marco

P.S. Here is some information that might help in figuring out what's
going on:
-------------------------------------------------------------------
kepler:~# ethtool -S eth0 | grep -v ': 0$'
NIC statistics:
     rx_octets: 42531865
     rx_ucast_packets: 582596
     rx_mcast_packets: 127
     rx_bcast_packets: 1
     tx_octets: 8692263469
     tx_ucast_packets: 5755264
     tx_mcast_packets: 10
-------------------------------------------------------------------

-------------------------------------------------------------------
kepler:~# ethtool -i eth0
driver: tg3
version: 3.137
firmware-version: 5722-v3.09, ASFIPMI v6.03
bus-info: 0000:02:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
-------------------------------------------------------------------

-------------------------------------------------------------------
kepler:~# lspci -vvv -s 02:00.0
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5722 Gigabit Ethernet PCI Express
        Subsystem: IBM IBM System x3350 (Machine type 4192)
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 59
        Region 0: Memory at e8200000 (64-bit, non-prefetchable) [size=64K]
        Expansion ROM at <ignored> [disabled]
        Capabilities: [48] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [50] Vital Product Data
                Product Name: Broadcom NetXtreme Gigabit Ethernet Controller
                Read-only fields:
                        [PN] Part number: BCM95722
                        [EC] Engineering changes: 106679-15
                        [SN] Serial number: 0123456789
                        [MN] Manufacture ID: 31 34 65 34
                        [RV] Reserved: checksum good, 28 byte(s) reserved
                Read/write fields:
                        [YA] Asset tag: XYZ01234567
                        [RW] Read-write area: 107 byte(s) free
                End
        Capabilities: [58] Vendor Specific Information: Len=78 <?>
        Capabilities: [e8] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee0200c  Data: 4121
        Capabilities: [d0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag+ PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <4us, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UESvrt: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [13c v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
        Capabilities: [160 v1] Device Serial Number 00-21-5e-ff-fe-4d-2c-13
        Capabilities: [16c v1] Power Budgeting <?>
        Kernel driver in use: tg3
-------------------------------------------------------------------

-------------------------------------------------------------------
kepler:~# brctl show
bridge name     bridge id               STP enabled     interfaces
xenbrext0               8000.00215e4d2c14       no              eth1
xenbrint0               8000.00215e4d2c13       no              eth0

kepler:~# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:21:5e:4d:2c:13
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:582865 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5755690 errors:0 dropped:1153 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:42557655 (40.5 MiB)  TX bytes:8692339211 (8.0 GiB)
          Interrupt:16

kepler:~# ifconfig xenbrint0
xenbrint0 Link encap:Ethernet  HWaddr 00:21:5e:4d:2c:13
          inet addr:192.168.2.100  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: 2001:1620:206b:1::2:1/64 Scope:Global
          inet6 addr: fe80::221:5eff:fe4d:2c13/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:582461 errors:0 dropped:0 overruns:0 frame:0
          TX packets:329904 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:32044143 (30.5 MiB)  TX bytes:8330130321 (7.7 GiB)
-------------------------------------------------------------------

-------------------------------------------------------------------
kepler:~# cat /proc/version
Linux version 3.16.0-4-686-pae (debian-kernel@xxxxxxxxxxxxxxxx) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.7-ckt9-3~deb8u1 (2015-04-24)

kepler:~# grep -e SWIOTLB -e CONFIG_NEED_DMA_MAP_STATE /boot/config-*
/boot/config-3.16.0-4-686-pae:CONFIG_SWIOTLB=y
/boot/config-3.16.0-4-686-pae:CONFIG_SWIOTLB_XEN=y

/boot/config-3.16.0-4-amd64:CONFIG_NEED_DMA_MAP_STATE=y
/boot/config-3.16.0-4-amd64:CONFIG_SWIOTLB=y
/boot/config-3.16.0-4-amd64:CONFIG_SWIOTLB_XEN=y
-------------------------------------------------------------------
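
The difference between the two kernel configs above can also be
extracted mechanically.  This is a minimal sketch in Python, assuming
grep output in the "path:OPTION=value" form shown above; the function
names options_by_file() and missing_options() are hypothetical, and
the embedded text is copied from that grep output.

```python
from collections import defaultdict

# Copied from the grep output above.
GREP_OUTPUT = """\
/boot/config-3.16.0-4-686-pae:CONFIG_SWIOTLB=y
/boot/config-3.16.0-4-686-pae:CONFIG_SWIOTLB_XEN=y
/boot/config-3.16.0-4-amd64:CONFIG_NEED_DMA_MAP_STATE=y
/boot/config-3.16.0-4-amd64:CONFIG_SWIOTLB=y
/boot/config-3.16.0-4-amd64:CONFIG_SWIOTLB_XEN=y
"""


def options_by_file(grep_text):
    """Map each config file path to a dict of {option: value}."""
    result = defaultdict(dict)
    for line in grep_text.splitlines():
        if not line.strip():
            continue
        path, _, setting = line.partition(":")
        name, _, value = setting.partition("=")
        result[path][name] = value
    return result


def missing_options(configs, reference, candidate):
    """List options present in `reference` but absent from `candidate`."""
    return sorted(set(configs[reference]) - set(configs[candidate]))


configs = options_by_file(GREP_OUTPUT)
missing = missing_options(
    configs,
    "/boot/config-3.16.0-4-amd64",
    "/boot/config-3.16.0-4-686-pae",
)
print("Set for amd64 but not for 686-pae:", ", ".join(missing))
```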

-------------------------------------------------------------------
kepler:~# xen info
host                   : kepler
release                : 3.16.0-4-686-pae
version                : #1 SMP Debian 3.16.7-ckt9-3~deb8u1 (2015-04-24)
machine                : i686
nr_cpus                : 2
max_cpu_id             : 1
nr_nodes               : 1
cores_per_socket       : 2
threads_per_core       : 1
cpu_mhz                : 2400
hw_caps                : bfebfbff:20100800:00000000:00000900:0000e39d:00000000:00000001:00000000
virt_caps              :
total_memory           : 8189
free_memory            : 3999
sharing_freed_memory   : 0
sharing_used_memory    : 0
outstanding_claims     : 0
free_cpus              : 0
xen_major              : 4
xen_minor              : 4
xen_extra              : .1
xen_version            : 4.4.1
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xff400000
xen_changeset          :
xen_commandline        : placeholder com1=115200,8n1 console=com1 dom0_mem=4096M,max:4096M
cc_compiler            : gcc (Debian 4.9.2-10) 4.9.2
cc_compile_by          : waldi
cc_compile_domain      : debian.org
cc_compile_date        : Mon Apr  6 19:49:18 UTC 2015
xend_config_format     : 4
-------------------------------------------------------------------



 

