
Re: [Xen-devel] Network blocked after sending several packets larger than 128 bytes when using Driver Domain



At 2015-03-20 00:48:01, "Zoltan Kiss" <zoltan.kiss@xxxxxxxxxx> wrote:
>
>
>On 19/03/15 03:40, openlui wrote:
>> Hi, all:
>>
>> I am trying to use an HVM with a PCI pass-through NIC as a network driver domain. However, when I send packets larger than 128 bytes from DomU using the pkt-gen tool, after several seconds the network between the driver domain and the destination host gets blocked.
>>
>> The networking structure when testing is shown below:
>> Pkt-gen (in DomU) <--> Virtual Eth (in DomU) <---> VIF (in Driver Domain) <--> OVS (in Driver Domain) <--> pNIC (passthrough nic in Driver Domain) <---> Another Host
>> 	
>> The summarized results are as follows:
>> 1. When we just ping from DomU to another host, the network seems ok.
>> 2. When sending 64- or 128-byte UDP packets from DomU, the network is not blocked
>> 3. When sending 256-, 1024- or 1400-byte UDP packets from DomU, and the scatter-gather feature of the passthrough NIC in the driver domain is on, the network gets blocked
>> 4. When sending 256-, 1024- or 1400-byte UDP packets from DomU, the network is not blocked only if the scatter-gather feature of the passthrough NIC in the driver domain is off
>>
>> As shown in the detailed syslog below, when the network is blocked it seems that the passthrough NIC's driver enters an error state and the TX queue hangs.
>> As far as I know, when sending 64- or 128-byte packets, the skb generated by netback only contains linearized data, which is stored in a page allocated from the driver domain's memory. But for packets larger than 128 bytes, the skb also has a frag page which is grant-mapped from DomU's memory. And if we disable the scatter-gather feature of the NIC, the skb sent from netback is linearized first, so all of the skb's data ends up in pages allocated from the driver domain rather than from DomU's memory.
>Yes, you are correct: the first slot (at most 128 bytes from it) is 
>grant copied to a locally allocated skb, whilst the rest is grant mapped 
>from the guest's memory in this case.
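(For my own notes: below is a tiny standalone C program that just works out how this 128-byte split falls out for the packet sizes I tested. The threshold and the copied/mapped split are my assumptions taken from this discussion, not the real netback code.)

#include <stdio.h>

/* Assumed copy threshold: the first part of each packet is grant-copied
 * into driver-domain memory, the rest stays grant-mapped from DomU. */
#define TX_COPY_LEN 128u

int main(void)
{
    unsigned int sizes[] = { 64, 128, 256, 1024, 1400 };

    for (unsigned int i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
        unsigned int len    = sizes[i];
        unsigned int copied = len < TX_COPY_LEN ? len : TX_COPY_LEN;
        unsigned int mapped = len - copied;

        /* "copied" bytes live in driver-domain pages; "mapped" bytes stay in
         * DomU pages (as skb frags) unless the skb is linearized, e.g. when
         * scatter-gather is off. */
        printf("%4u-byte packet: %3u bytes grant-copied, %4u bytes grant-mapped\n",
               len, copied, mapped);
    }
    return 0;
}

At least it shows why 64- and 128-byte packets stay entirely in driver-domain memory, while larger ones leave part of their data grant-mapped from DomU.
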
>>
>> I am wondering whether the problem is caused by PCI passthrough and DMA operations, or whether there is some misconfiguration in our environment. How can I continue to debug this problem? I am looking forward to your reply and advice. Thanks.
>>
>> The environment we used are as follows:
>> a. Dom0: SUSE 12 (kernel: 3.12.28)
>> b. XEN: 4.4.1_0602.2 (provided by SUSE 12)
>> c. DomU: kernel 3.17.4
>> d. Driver Domain: kernel 3.17.8
>I would try out an upstream kernel; there were some grant mapping 
>changes recently, maybe that solves your issue.
>Also, have you set the kernel's loglevel to DEBUG?
>ixgbe also has a module parameter to enable further logging.

Thanks for your advice. 
I have tried Xen 4.4.2 and Xen 4.5, and found that under Xen 4.5 the problem is solved. After bisecting, I found that it is commit 203746bc36b41443d0eec78819f153fb59bc68d1 ([1]) which fixes this problem. 
After studying the patch, I see that when the hap/ept page table is not shared with the IOMMU, this commit creates IOMMU mappings not only for pages whose p2m type is p2m_ram_rw but also for pages of other p2m types.
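To make sure I read it correctly, I wrote my understanding down as a small standalone sketch. The enum below is only a mock of the p2m type names mentioned in the thread, not Xen's real definitions, and the before/after behaviour is just my reading of the patch:

#include <stdbool.h>
#include <stdio.h>

/* Mock of a few p2m types named in the thread (illustrative only). */
typedef enum {
    p2m_ram_rw,        /* normal guest RAM */
    p2m_ram_logdirty,  /* RAM tracked for dirty logging */
    p2m_grant_map_rw,  /* pages grant-mapped from another domain (e.g. netback frags) */
    p2m_map_foreign,   /* foreign pages mapped by a privileged/driver domain */
} p2m_type_t;

/* My reading: with non-shared IOMMU tables, only p2m_ram_rw was mapped
 * into the IOMMU before the commit; afterwards grant-mapped and foreign
 * pages are mapped as well, so the NIC can DMA to/from them. */
static bool iommu_maps_type(p2m_type_t t, bool after_commit)
{
    if (t == p2m_ram_rw)
        return true;
    if (after_commit && (t == p2m_grant_map_rw || t == p2m_map_foreign))
        return true;
    return false;
}

int main(void)
{
    printf("p2m_grant_map_rw in IOMMU before commit: %d, after: %d\n",
           iommu_maps_type(p2m_grant_map_rw, false),
           iommu_maps_type(p2m_grant_map_rw, true));
    return 0;
}

If that reading is correct, it would also explain the Tx hangs we saw: the grant-mapped frag pages were not present in the IOMMU tables, so the NIC's DMA to them failed.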

I have some further questions about this commit:
1. Regarding hap_ept page table sharing: does it mean that the page table is shared between EPT and the IOMMU? And does it require hardware support for the "Large Intel VT-d Pages" feature? ([2])
2. What is the meaning of, and the difference between, the p2m types (p2m_ram_**, p2m_grant_map_**, p2m_ram_logdirty and p2m_map_foreign) listed in the commit above? Is there any documentation about them?

Our hosts under test do not support "Intel VT-d Shared EPT tables", which can be confirmed from the "xl dmesg" output in Dom0. I will try to find hosts which support this feature in the next few days, run the same test, and see whether a similar problem occurs with hap_ept_pt_share enabled.

But it seems that, at least for the 4.4.2 release, this problem does exist; perhaps the commit above should be backported to 4.4.2?

>> e. OVS: 2.1.2
>> f. Host: Huawei RH2288, CPU Intel Xeon E5645@xxxxxxx, disabled HyperThread, enabled VT-d
>> g. pNIC: we tried Intel 82599 10GE NIC (ixgbe v3.23.2), Intel 82576 1GE NIC (igb) and Broadcom NetXtreme II BCM 5709 1GE NIC (bnx2 v2.2.5)
>> h. para-virtualization driver: netfront/netback
>> i. MTU: 1500
>>
>> The detailed logs in the Driver Domain after the network is blocked are as follows:
>> 1. When using the 82599 10GE NIC, syslog and dmesg include the info below. The log shows that a Tx Unit Hang is detected and the driver tries to reset the adapter repeatedly; however, the network is still blocked.
>>
>> <snip>
>> ixgbe: 0000:00:04.0 eth10: Detected Tx Unit Hang
>>   Tx Queue <0>
>>   TDH, TDT <1fd>, <5a>
>>   next_to_use <5a>
>>   next_to_clean <1fc>
>> ixgbe: 0000:00:04.0 eth0: tx hang 11 detected on queue 0, resetting adapter
>> ixgbe: 0000:00:04.0 eth10: Reset adapter
>> ixgbe: 0000:00:04.0 eth10: PCIe transaction pending bit also did not clear
>> ixgbe: 0000:00:04.0 master disable timed out
>> ixgbe: 0000:00:04.0 eth10: detected SFP+: 3
>> ixgbe: 0000:00:04.0 eth10: NIC Link is Up 10 Gbps, Flow Control: RX/TX
>> ...
>> </snip>
>>
>> I have tried to remove the "reset adapter" call in the ixgbe driver's ndo_tx_timeout function, and the logs are shown below. The log shows that when the network is blocked, the "TDH" of the NIC cannot be incremented any more.
>>
>> <snip>
>> ixgbe 0000:00:04.0 eth3: Detected Tx Unit Hang
>>   Tx Queue <0>
>>   TDH, TDT <1fd>, <5a>
>>   next_to_use <5a>
>>   next_to_clean <1fc>
>> ixgbe 0000:00:04.0 eth3: tx_buffer_info[next_to_clean]
>>   time_stamp <1075b74ca>
>>   jiffies <1075b791c>
>> ixgbe 0000:00:04.0 eth3: Fake Tx hang detected with timeout of 5 seconds
>> ixgbe 0000:00:04.0 eth3: Detected Tx Unit Hang
>>   Tx Queue <0>
>>   TDH, TDT <1fd>, <5a>
>>   next_to_use <5a>
>>   next_to_clean <1fc>
>> ixgbe 0000:00:04.0 eth3: tx_buffer_info[next_to_clean]
>>   time_stamp <1075b74ca>
>>   jiffies <1075b7b11>
>> ...
>> </snip>
>>
>> I have also compared the NIC's corresponding PCI status before and after the network is hung, and found that the "DevSta" field changed from "TransPend-" to "TransPend+" after the network is blocked:
>>
>> <snip>
>> DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend+
>> </snip>
>>
>> The network can only be recovered after we reload the ixgbe module in the driver domain.
>>
>> 2. When using the BCM5709 NIC, the results are similar. After the network is blocked, the syslog has the info below:
>>
>> <snip>
>> bnx2 0000:00:04.0 eth14: <--- start FTQ dump --->
>> bnx2 0000:00:04.0 eth14: RV2P_PFTQ_CTL 00010000
>> bnx2 0000:00:04.0 eth14: RV2P_TFTQ_CTL 00020000
>> ...
>> bnx2 0000:00:04.0 eth14: CP_CPQ_FTQ_CTL 00004000
>> bnx2 0000:00:04.0 eth14: CPU states:
>> bnx2 0000:00:04.0 eth14: 045000 mode b84c state 80001000 evt_mask 500 pc 8001280 pc 8001288 instr 8e030000
>> ...
>> bnx2 0000:00:04.0 eth14: 185000 mode b8cc state 80000000 evt_mask 500 pc 8000ca8 pc 8000920 instr 8ca50020
>> bnx2 0000:00:04.0 eth14: <--- end FTQ dump --->
>> bnx2 0000:00:04.0 eth14: <--- start TBDC dump --->
>> ...
>> </snip>
>>
>> The difference in the lspci output before and after the network is hung shows that the Status field changed from "MAbort-" to "MAbort+":
>>
>> <snip>
>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
>> </snip>
>>
>> The network cannot be recovered even after we reload the bnx2 module in the driver domain.
>>
>> ----------
>> openlui
>> Best Regards


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 

