
Re: [Xen-users] Mellanox SR-IOV IB PCI passthrough in Xen - MSI-X pciback issue


  • To: xen-users@xxxxxxxxxxxxx
  • From: Andrew J Younge <ayounge@xxxxxxx>
  • Date: Wed, 19 Jun 2013 14:21:44 -0400
  • Delivery-date: Wed, 19 Jun 2013 18:22:53 +0000
  • List-id: Xen user discussion <xen-users.lists.xen.org>

Does anybody in the Xen community have any experience with the
xen-pciback driver and MSI-X??

Thanks,

Andrew

On 6/10/13 12:21 PM, Andrew J Younge wrote:
> Greetings Xen user community,
> 
> I am interested in using Mellanox ConnectX cards with SR-IOV capabilities to 
> pass through PCIe Virtual Functions (VFs) to Xen guests. The hope is to allow 
> for the use of InfiniBand directly within virtual machines and thereby enable 
> a plethora of high performance computing applications that already leverage 
> InfiniBand interconnects. However, I have run into some issues with the 
> xen-pciback driver and its initialization of MSI-X, which is required for VFs 
> in Xen. The hardware is Mellanox ConnectX-3 MT27500 VPI PCI Express cards set 
> up in InfiniBand mode in HP blades with Intel Xeon E5-2670 CPUs and 42GB of 
> memory. SR-IOV is enabled in the system BIOS along with VT-x and, of course, 
> VT-d. 
> 
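> (Side note for anyone reproducing this: a quick sanity check that the IOMMU 
> is actually active under Xen looks something like the following sketch; the 
> exact wording of the Xen message may vary between versions.
> 
> # xm dmesg | grep -i 'I/O virtualisation'
> (XEN) I/O virtualisation enabled
> # xm info | grep virt_caps
> virt_caps              : hvm hvm_directio
> 
> The virt_caps value shown here matches the full xm info output further 
> below.)
> 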
> This system is a RHEL/CentOS 6.4 x86_64 Dom0 running a 3.9.3-1 kernel with 
> Xen 4.1.2 installed and intel_iommu enabled in the kernel. The advantage of 
> this kernel is the built-in mlx4_core/en/ib kernel modules, which gained 
> SR-IOV support in version 3.5. The basic OFED drivers provided by Mellanox do 
> not compile against a custom Dom0 kernel (not even the 2.0-beta OFED 
> drivers), so a 3.5 or newer Linux kernel is necessary. I also updated the 
> ConnectX-3 firmware to the version provided by Mellanox (2.11.500) in order 
> to enable SR-IOV in the firmware. Using this setup I am able to enable up to 
> 64 VFs in InfiniBand mode (modprobe mlx4_core num_vfs=8 port_type_array=1,1 
> msi_x=1) within a Xen Dom0 kernel.
> 
> 21:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
>       Subsystem: Hewlett-Packard Company Device 18d6
>       Physical Slot: 4
>       Flags: bus master, fast devsel, latency 0, IRQ 50
>       Memory at fbf00000 (64-bit, non-prefetchable) [size=1M]
>       Memory at fb000000 (64-bit, prefetchable) [size=8M]
>       Capabilities: [40] Power Management version 3
>       Capabilities: [48] Vital Product Data
>       Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
>       Capabilities: [60] Express Endpoint, MSI 00
>       Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
>       Capabilities: [148] Device Serial Number 00-02-c9-03-00-f6-ef-f0
>       Capabilities: [108] Single Root I/O Virtualization (SR-IOV)
>       Capabilities: [154] Advanced Error Reporting
>       Capabilities: [18c] #19
>       Kernel driver in use: mlx4_core
>       Kernel modules: mlx4_core
> 
> 21:00.1 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 
> Virtual Function]
>       Subsystem: Hewlett-Packard Company Device 61b0
>       Physical Slot: 4
>       Flags: fast devsel
>       [virtual] Memory at db000000 (64-bit, prefetchable) [size=8M]
>       Capabilities: [60] Express Endpoint, MSI 00
>       Capabilities: [9c] MSI-X: Enable- Count=4 Masked-
>       Kernel modules: mlx4_core
> … up to as many VFs as enabled (in my case 8). 
> 
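> For anyone reproducing this, the module options can also be made persistent 
> through modprobe.d, along these lines (a sketch; the file name is arbitrary), 
> and the VF count is easy to verify from lspci:
> 
> # cat /etc/modprobe.d/mlx4_core.conf
> options mlx4_core num_vfs=8 port_type_array=1,1 msi_x=1
> 
> # lspci | grep -c 'ConnectX-3 Virtual Function'
> 8
> 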
> I am able to load the xen-pciback kernel module and hide one of the VFs, and 
> then start a CentOS 6.3 HVM VM with PCI passthrough enabled on that VF 
> (pci = [ '21:00.5' ] in the .hvm config file). The VM itself sees the VF, as 
> the xen-pciback module translates it to 00:05.0 in the guest as expected:
> 
> 00:05.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3 
> Virtual Function]
>       Subsystem: Hewlett-Packard Company Device 61b0
>       Physical Slot: 5
>       Flags: fast devsel
>       Memory at f3000000 (64-bit, prefetchable) [size=8M]
>       Capabilities: [60] Express Endpoint, MSI 00
>       Capabilities: [9c] MSI-X: Enable- Count=4 Masked-
>       Kernel modules: mlx4_core
> 
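> For reference, the usual way to hand a VF to xen-pciback via sysfs looks 
> roughly like this (a sketch; the sysfs driver directory typically shows up 
> as 'pciback' even though the module is xen-pciback, and the unbind step is 
> only needed if Dom0 has already bound a driver to the VF):
> 
> # modprobe xen-pciback
> # echo 0000:21:00.5 > /sys/bus/pci/devices/0000:21:00.5/driver/unbind
> # echo 0000:21:00.5 > /sys/bus/pci/drivers/pciback/new_slot
> # echo 0000:21:00.5 > /sys/bus/pci/drivers/pciback/bind
> 
> with the corresponding line in the HVM guest config:
> 
> pci = [ '21:00.5' ]
> 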
> With the VM running a generic 2.6.32 CentOS 6.3 kernel, I installed the MLNX 
> 2.0-beta drivers (they do compile against the standard RHEL kernel). The 
> problem is that when I modprobe mlx4_core, I get the following error in the 
> VM:
> 
> mlx4_core: Mellanox ConnectX core driver v1.1 (Dec, 2011)
> mlx4_core: Initializing 0000:00:05.0
> mlx4_core 0000:00:05.0: Detected virtual function - running in slave mode
> mlx4_core 0000:00:05.0: Sending reset
> mlx4_core 0000:00:05.0: Sending vhcr0
> mlx4_core 0000:00:05.0: HCA minimum page size:512
> mlx4_core 0000:00:05.0: irq 48 for MSI/MSI-X
> mlx4_core 0000:00:05.0: irq 49 for MSI/MSI-X
> mlx4_core 0000:00:05.0: failed execution of VHCR_POST commandopcode 0x31
> mlx4_core 0000:00:05.0: NOP command failed to generate MSI-X interrupt IRQ 
> 49).
> mlx4_core 0000:00:05.0: Trying again without MSI-X.
> mlx4_core: probe of 0000:00:05.0 failed with error -16
> 
> Clearly, the kernel module is not happy with MSI-X. If I instead specify 
> modprobe mlx4_core msi_x=0 (turning MSI-X off for the VF in the VM), I get an 
> error saying VFs aren't supported without MSI-X: 
> 
> mlx4_core: Mellanox ConnectX core driver v1.1 (Dec, 2011)
> mlx4_core: Initializing 0000:00:05.0
> mlx4_core 0000:00:05.0: Detected virtual function - running in slave mode
> mlx4_core 0000:00:05.0: Sending reset
> mlx4_core 0000:00:05.0: Sending vhcr0
> mlx4_core 0000:00:05.0: HCA minimum page size:512
> mlx4_core 0000:00:05.0: INTx is not supported in multi-function mode. 
> aborting.
> 
> Apparently it is necessary to have MSI-X working in order to use the VFs of 
> the Mellanox ConnectX-3 card (not surprising). Looking back at the Dom0 
> dmesg, it seems the lack of MSI-X support actually stems from an error in the 
> xen-pciback module: 
> 
> pciback 0000:21:00.5: seizing device
> pciback 0000:21:00.5: enabling device (0000 -> 0002)
> pciback 0000:21:00.5: MSI-X preparation failed (-38)
> xen-pciback: backend is vpci
> 
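> For what it's worth, error code 38 decodes to ENOSYS ("Function not 
> implemented"), which can be checked with something like:
> 
> # python -c 'import os; print os.strerror(38)'
> Function not implemented
> 
> so the -38 above presumably means pciback's MSI-X preparation hit an 
> operation that is not implemented or not supported somewhere in the stack, 
> rather than a hardware fault.
> 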
> I've explicitly made sure the mlx4_core module on Dom0 has MSI-X enabled on 
> the PF (via its modprobe options) to rule out that potential problem; a quick 
> check is shown below. It seems the main problem is that xen-pciback does not 
> know how to properly set up MSI-X for the Mellanox ConnectX-3 InfiniBand 
> card. To be explicit, I'm running a fairly recent Xen installation (4.1.2) on 
> new Sandy Bridge hardware with a very recent Linux kernel (3.9).
> 
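> A quick way to double-check the PF side on Dom0 (a sketch, assuming the 
> module exposes its msi_x parameter under /sys/module; the lspci line below 
> matches the capability listing shown earlier):
> 
> # cat /sys/module/mlx4_core/parameters/msi_x
> 1
> # lspci -vv -s 21:00.0 | grep MSI-X
>       Capabilities: [9c] MSI-X: Enable+ Count=128 Masked-
> 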
> [root@hp6 xen_tests]# uname -a
> Linux hp6 3.9.3-1.el6xen.x86_64 #1 SMP Tue May 21 11:55:32 EST 2013 x86_64 
> x86_64 x86_64 GNU/Linux
> [root@hp6 xen_tests]# xm info
> host                   : hp6
> release                : 3.9.3-1.el6xen.x86_64
> version                : #1 SMP Tue May 21 11:55:32 EST 2013
> machine                : x86_64
> nr_cpus                : 32
> nr_nodes               : 2
> cores_per_socket       : 8
> threads_per_core       : 2
> cpu_mhz                : 2593
> hw_caps                : 
> bfebfbff:2c000800:00000000:00003f40:13bee3ff:00000000:00000001:00000000
> virt_caps              : hvm hvm_directio
> total_memory           : 49117
> free_memory            : 8306
> free_cpus              : 0
> xen_major              : 4
> xen_minor              : 1
> xen_extra              : .2
> xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 
> hvm-3.0-x86_32p hvm-3.0-x86_64 
> xen_scheduler          : credit
> xen_pagesize           : 4096
> platform_params        : virt_start=0xffff800000000000
> xen_changeset          : unavailable
> xen_commandline        : 
> cc_compiler            : gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC) 
> cc_compile_by          : mockbuild
> cc_compile_domain      : 
> cc_compile_date        : Fri Jun 15 17:40:35 EDT 2012
> xend_config_format     : 4
> [root@hp6 xen_tests]# dmesg | grep "Command line"
> Command line: ro root=/dev/mapper/vg_hp6-lv_root nomodeset rd_NO_LUKS 
> LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto 
> rd_LVM_LV=vg_hp6/lv_swap  KEYBOARDTYPE=pc KEYTABLE=us 
> rd_LVM_LV=vg_hp6/lv_root rd_NO_DM rdblacklist=nouveau nouveau.modeset=0  
> intel_iommu=on
> 
> At this point I am at an impasse in getting SR-IOV InfiniBand working within 
> Xen. Does anyone here in the Xen community have a possible solution to this 
> problem? Is there a patch or custom version of Xen I haven't found but need 
> to try? I've done a whole lot of searching but have turned up nothing that 
> helps so far. Is this an instance where PCI quirks are used (and if so, how), 
> or is that only for PV guests? Does anyone else have a working solution for 
> enabling PCI passthrough of Mellanox IB SR-IOV VFs in Xen VMs? I know this is 
> possible in KVM, but I would obviously like to avoid that route. I hope I am 
> close to getting InfiniBand working with Xen. Any help would be greatly 
> appreciated, as this success could enable a whole new set of use cases for 
> Xen in high performance computing.
> 
> Regards,
> 
> Andrew
> 
> 
> --
> Andrew J. Younge
> Information Sciences Institute 
> University of Southern California
> 

-- 
Andrew J. Younge
Information Sciences Institute
University of Southern California

_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxx
http://lists.xen.org/xen-users


 

