
Re: [Xen-devel] bnx2x DMA mapping errors cause iscsi problems




Hi Malcolm,


Thank you for your answer!

We have already tried tuning several parameters to find a workaround for our issue:

- swiotlb size:

We did not increase the swiotlb size. We found a similar case on the Citrix forums ( http://discussions.citrix.com/topic/324343-xenserver-61-bnx2x-sw-iommu/ ): there, using swiotlb=256 did not help, so we did not try it ourselves. Unfortunately, that thread does not mention a solution, only a patch for the bnx2x driver (Driver Disk for Broadcom bnx2x driver v1.74.22 for XenServer 6.1.0 with Hotfix XS61E018), and I still have to verify whether it is related to our problem. I should mention that we see no "Out of SW-IOMMU space" error messages, but that could be due to the verbosity of the driver or the kernel.
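
If we do end up testing a larger bounce buffer, this is roughly how I understand it would be done (a sketch only: the upstream swiotlb= kernel parameter counts 2 KiB slabs, but the unit convention and the GRUB layout below are assumptions about our setup, so please correct me if this is wrong):

# illustrative dom0 boot entry in /boot/grub/grub.cfg
multiboot /boot/xen.gz dom0_mem=4096M
module    /boot/vmlinuz-dom0 root=/dev/sda1 ro swiotlb=131072   # 131072 slabs * 2 KiB = 256 MiB
module    /boot/initrd-dom0.img

# after a reboot, the resulting size should be visible in the kernel log
dmesg | grep -i swiotlb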

- disable_tpa=1

As far as I understand, this is already achieved by disabling LRO (correct?). Here is the output of ethtool; a short sketch of the commands we use is after the output:

root@xen2-pyth:~# ethtool -k eth4
Features for eth4:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: on
        tx-checksum-unneeded: off [fixed]
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: on
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: on
        tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on [fixed]
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: on
loopback: off
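
To be explicit about how we got there, this is roughly what we use (a sketch; I am assuming our bnx2x build exposes the same disable_tpa module parameter as the upstream driver, and the modprobe.d file name is just an example):

# runtime: turn LRO off on the interface
ethtool -K eth4 lro off

# persistent: disable TPA in the driver itself (takes effect at the next driver load)
echo "options bnx2x disable_tpa=1" > /etc/modprobe.d/bnx2x.conf

# check that this driver build actually has the parameter
modinfo bnx2x | grep -i tpa

Please correct me if disabling LRO via ethtool is not equivalent to loading the driver with disable_tpa=1.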

- reducing the number of queues:

We reduced the number of queues to 4 (the default was 11). When the problem happened again this week, we changed the parameter dynamically to num_queues=1 and were then able to carry on without rebooting the hypervisor. No more 'Can't map rx data' messages so far... but for how long? Could setting the number of queues as low as 1 have a long-term effect?
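
For reference, the commands involved are roughly the following (from memory, so treat this as a sketch; ethtool -L only works if the driver build supports the channels interface, and the modprobe.d file name is again just an example):

# runtime reduction of the combined RX/TX queues
ethtool -L eth4 combined 1

# persistent equivalent via the module parameter, applied at the next driver load
echo "options bnx2x num_queues=1" > /etc/modprobe.d/bnx2x.conf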

I've read the draft you wrote to solve the problem. As far as I understand it (this is all quite complex for me), it could describe the root cause of our problem. But how can we monitor the relevant parameters (DMA mappings, SW-IOMMU space, ...) when the problem occurs, in order to validate this assumption?
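
At the moment the only things I know to look at are the logs and the driver statistics, something like the following (a sketch; the message strings I grep for are guesses based on what we have seen so far):

# dom0 kernel log: DMA mapping / bounce buffer messages
dmesg | grep -iE 'swiotlb|dma|map rx data'

# hypervisor log
xl dmesg | grep -i iommu

# per-queue driver statistics, to watch for errors/drops when the problem starts
ethtool -S eth4 | grep -iE 'error|drop'

Is there a more direct way to see how much SW-IOMMU space is actually in use?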

BTW, what is the time frame for implementing the solution proposed in your draft? We run Xen 4.1.4: are there improvements related to this problem in newer versions?

Regards,

Patrick






 

