Re: [Xen-devel] Network blocked after sending several packets larger than 128 bytes when using Driver Domain
On 19/03/15 03:40, openlui wrote:

Yes, you are correct: the first slot (at most 128 bytes of it) is
grant copied to a locally allocated skb, whilst the rest is grant mapped
from the guest's memory in this case.
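
As a rough illustration of the split described above, the following Python sketch computes how many bytes of a Tx packet would be grant copied and how many grant mapped. It is not netback code: the names `COPY_LEN` and `split_packet` are made up for this example, and the only fact taken from the thread is the "at most 128 bytes of the first slot" figure.

```python
COPY_LEN = 128  # bytes; the "at most 128 bytes" figure quoted above (name is hypothetical)


def split_packet(packet_len, first_slot_len=None):
    """Return (grant_copied_bytes, grant_mapped_bytes) for one Tx packet.

    Assumption for illustration: the packet is packet_len bytes long and its
    first slot carries first_slot_len of them (defaults to the whole packet).
    At most COPY_LEN bytes of the first slot are copied; the rest is mapped.
    """
    if first_slot_len is None:
        first_slot_len = packet_len
    if not 0 <= first_slot_len <= packet_len:
        raise ValueError("first_slot_len must be between 0 and packet_len")
    copied = min(first_slot_len, COPY_LEN)
    return copied, packet_len - copied


if __name__ == "__main__":
    for size in (64, 128, 129, 1500):
        copied, mapped = split_packet(size)
        print(f"{size:5d}-byte packet: {copied} copied, {mapped} mapped")
```

Under this model, only packets longer than 128 bytes have any grant-mapped part, which would be consistent with the hang appearing only for packets larger than 128 bytes.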
I would try out an upstream kernel; there were some grant mapping changes recently, and maybe that solves your issue.

I am wondering whether the problem is caused by PCI passthrough and DMA operations, or whether something is misconfigured in our environment. How can I continue to debug this problem? I am looking forward to your reply and advice. Thanks.

The environment we used is as follows:
a. Dom0: SUSE 12 (kernel: 3.12.28)
b. XEN: 4.4.1_0602.2 (provided by SUSE 12)
c. DomU: kernel 3.17.4
d. Driver Domain: kernel 3.17.8

Also, have you set the kernel's loglevel to DEBUG? ixgbe also has a module parameter to enable further logging.

e. OVS: 2.1.2
f. Host: Huawei RH2288, CPU Intel Xeon E5645@xxxxxxx, HyperThreading disabled, VT-d enabled
g. pNIC: we tried an Intel 82599 10GE NIC (ixgbe v3.23.2), an Intel 82576 1GE NIC (igb) and a Broadcom NetXtreme II BCM 5709 1GE NIC (bnx2 v2.2.5)
h. Para-virtualization driver: netfront/netback
i. MTU: 1500

The detailed logs in the Driver Domain after the network is blocked are as follows:

1. When using the 82599 10GE NIC, syslog and dmesg include the messages below. The log shows that a Tx Unit Hang is detected and that the driver repeatedly tries to reset the adapter; however, the network remains blocked.

<snip>
ixgbe: 0000:00:04.0 eth10: Detected Tx Unit Hang Tx Queue <0> TDH, TDT <1fd>, <5a> next_to_use <5a> next_to_clean <1fc>
ixgbe: 0000:00:04.0 eth0: tx hang 11 detected on queue 0, resetting adapter
ixgbe: 0000:00:04.0 eth10: Reset adapter
ixgbe: 0000:00:04.0 eth10: PCIe transaction pending bit also did not clear
ixgbe: 0000:00:04.0 master disable timed out
ixgbe: 0000:00:04.0 eth10: detected SFP+: 3
ixgbe: 0000:00:04.0 eth10: NIC Link is Up 10 Gbps, Flow Control: RX/TX
...
</snip>

I have tried removing the "reset adapter" call in the ixgbe driver's ndo_tx_timeout function, and the logs are shown below. The log shows that once the network is blocked, the NIC's "TDH" is no longer incremented.
<snip>
ixgbe 0000:00:04.0 eth3: Detected Tx Unit Hang Tx Queue <0> TDH, TDT <1fd>, <5a> next_to_use <5a> next_to_clean <1fc>
ixgbe 0000:00:04.0 eth3: tx_buffer_info[next_to_clean] time_stamp <1075b74ca> jiffies <1075b791c>
ixgbe 0000:00:04.0 eth3: Fake Tx hang detected with timeout of 5 seconds
ixgbe 0000:00:04.0 eth3: Detected Tx Unit Hang Tx Queue <0> TDH, TDT <1fd>, <5a> next_to_use <5a> next_to_clean <1fc>
ixgbe 0000:00:04.0 eth3: tx_buffer_info[next_to_clean] time_stamp <1075b74ca> jiffies <1075b7b11>
...
</snip>

I have also compared the NIC's PCI status before and after the network hangs, and found that in the "DevSta" field "TransPend-" changed to "TransPend+" after the network was blocked:

<snip>
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend+
</snip>

The network can only be recovered after we reload the ixgbe module in the driver domain.

2. When using the BCM5709 NIC, the result is similar. After the network is blocked, the syslog contains the messages below:

<snip>
bnx2 0000:00:04.0 eth14: <--- start FTQ dump --->
bnx2 0000:00:04.0 eth14: RV2P_PFTQ_CTL 00010000
bnx2 0000:00:04.0 eth14: RV2P_TFTQ_CTL 00020000
...
bnx2 0000:00:04.0 eth14: CP_CPQ_FTQ_CTL 00004000
bnx2 0000:00:04.0 eth14: CPU states:
bnx2 0000:00:04.0 eth14: 045000 mode b84c state 80001000 evt_mask 500 pc 8001280 pc 8001288 instr 8e030000
...
bnx2 0000:00:04.0 eth14: 185000 mode b8cc state 80000000 evt_mask 500 pc 8000ca8 pc 8000920 instr 8ca50020
bnx2 0000:00:04.0 eth14: <--- end FTQ dump --->
bnx2 0000:00:04.0 eth14: <--- start TBDC dump --->
...
</snip>

Comparing the lspci output before and after the network hangs shows that in the Status field "MAbort-" changed to "MAbort+":

<snip>
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
</snip>

The network cannot be recovered even after we reload the bnx2 module in the driver domain.
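
The before/after lspci comparisons above can be automated with a small helper. The Python sketch below is only an illustration: the function names `parse_flags` and `changed_flags` are hypothetical. The "after" lines are the ones quoted in this message. The "before" lines are assumed to be identical except for the one flag reported as changed, since the full "before" output was not posted.

```python
def parse_flags(line):
    """Map each 'Name+' / 'Name-' token of an lspci status line to True/False."""
    _, _, rest = line.partition(":")
    flags = {}
    for token in rest.split():
        # Tokens such as 'DEVSEL=fast' do not end in '+' or '-' and are skipped.
        if len(token) > 1 and token[-1] in "+-":
            flags[token[:-1]] = token[-1] == "+"
    return flags


def changed_flags(before, after):
    """Return {flag: (before_value, after_value)} for flags that differ."""
    b, a = parse_flags(before), parse_flags(after)
    return {name: (b[name], a[name]) for name in b if name in a and b[name] != a[name]}


if __name__ == "__main__":
    # "after" lines are quoted from the report; "before" lines are assumed.
    devsta_after = "DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend+"
    devsta_before = devsta_after.replace("TransPend+", "TransPend-")
    status_after = ("Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast "
                    ">TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-")
    status_before = status_after.replace("<MAbort+", "<MAbort-")
    print(changed_flags(devsta_before, devsta_after))
    print(changed_flags(status_before, status_after))
```

The prefixes `<` and `>` are kept as part of the flag name, so `>TAbort` and `<TAbort` stay distinct entries.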
----------
openlui
Best Regards

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel