[Xen-users] Quadrified GTX 480 VT-d passthrough. CUDA 5.5 in Linux partial success!
Hi everyone,

after following in the footsteps of this discussion (http://lists.xenproject.org/archives/html/xen-users/2013-09/msg00106.html), I was able to turn my GTX 480 into a Quadro 6000. When I pass it through with VT-d to a Debian jessie VM, it shows up fine and CUDA 5.5 seems to function properly up to a point:

lspci -v:

00:04.0 VGA compatible controller: NVIDIA Corporation GF100GL [Quadro 6000] (rev a3) (prog-if 00 [VGA controller])
        Subsystem: ASUSTeK Computer Inc. Device 075f
        Physical Slot: 4
        Flags: bus master, fast devsel, latency 0, IRQ 32
        Memory at ee000000 (32-bit, non-prefetchable) [size=32M]
        Memory at e0000000 (64-bit, prefetchable) [size=128M]
        Memory at e8000000 (64-bit, prefetchable) [size=64M]
        I/O ports at c100 [size=128]
        Expansion ROM at f1000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [b4] Vendor Specific Information: Len=14 <?>
        Kernel driver in use: nvidia

00:05.0 Audio device: NVIDIA Corporation GF100 High Definition Audio Controller (rev a1)
        Subsystem: ASUSTeK Computer Inc. Device 075f
        Physical Slot: 5
        Flags: bus master, fast devsel, latency 0, IRQ 37
        Memory at f1080000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Kernel driver in use: snd_hda_intel

NVIDIA_CUDA-5.5_Samples/1_Utilities/deviceQuery# ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Quadro 6000"
  CUDA Driver Version / Runtime Version          6.0 / 5.5
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 1536 MBytes (1610285056 bytes)
  (15) Multiprocessors, ( 32) CUDA Cores/MP:     480 CUDA Cores
  GPU Clock rate:                                1401 MHz (1.40 GHz)
  Memory Clock rate:                             1848 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 786432 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (65535, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           0 / 4
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 5.5, NumDevs = 1, Device0 = Quadro 6000
Result = PASS

Unfortunately, if I try to run any CUDA app or even nvidia-smi afterwards, I get the following errors:

NVIDIA_CUDA-5.5_Samples/1_Utilities/deviceQuery# ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 10
-> invalid device ordinal
Result = FAIL

# nvidia-smi
Unable to determine the device handle for GPU 0000:00:04.0: The NVIDIA kernel module detected an issue with GPU interrupts.Consult the "Common Problems" Chapter of the NVIDIA Driver README for details and steps that can be taken to resolve this issue.

If I restart the VM, I can run a single CUDA app again, once. It's still pretty impressive to be able to do that without having to patch Xen or reboot the entire machine =)

It doesn't seem to matter which CUDA app I'm running; here is matrixMul, for example:

matrixMul# ./matrixMul
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "Quadro 6000" with compute capability 2.0

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 227.22 GFlop/s, Time= 0.577 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS

Note: For peak performance, please refer to the matrixMulCUBLAS example.

Anyhoo, does anyone have any idea what I might be able to tweak to avoid this issue? The setup clearly seems to work for the most part.

My domU config:

arch = 'x86_64'
name = "debian-miner"
builder = "hvm"
maxmem = 512
memory = 512
vcpus = 1
maxcpus = 1
boot = "cd"
pae=1
acpi = 1
apic = 1
hap=1
hpet=1
shadow_memory = 32
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "restart"
vnc=1
vncunused=1
vnclisten="0.0.0.0"
vif = [ 'type=netfront,bridge=xenbr0,mac=00:16:3e:12:c3:fa']
device_model_version="qemu-xen-traditional"
gfx_passthru=0
xen_platform_pci=1
pci = [ '01:00.0', '01:00.1' ]
pci_msitranslate = 1
pci_power_mgmt = 1
pci_permissive = 1
xen_extended_power_mgmt = 1
acpi_s3 = 1
acpi_s4 = 1
disk = [ 'phy:/dev/t0vg/debian-testing,xvda,w'];

I'm running Xen 4.3.1 with NVIDIA driver 331.20 x86_64 in the domU.

Thanks and cheers!
Tamas
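
P.S. The lspci output above shows "MSI: Enable-" for both passed-through functions, i.e. the MSI enable bit is clear in the guest. Whether that has anything to do with the nvidia-smi interrupt error is only a guess, but it is easy to compare before and after the first CUDA run. Below is a minimal sketch, assuming Python is available in the domU; the helper name msi_state and the SAMPLE text (trimmed from the lspci output above) are purely illustrative.

```python
import re

# Matches the first line of an lspci device entry, e.g. "00:04.0 VGA ..."
# (an optional PCI domain prefix such as "0000:" is also accepted).
HEADER = re.compile(r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")
# Matches the MSI capability line and captures the enable flag (+ or -).
MSI = re.compile(r"MSI: Enable([+-])")


def msi_state(lspci_text):
    """Map each PCI address in `lspci -v` text to True (MSI on) or False."""
    states = {}
    current = None
    for line in lspci_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        msi = MSI.search(line)
        if msi and current is not None:
            states[current] = msi.group(1) == "+"
    return states


# Trimmed copy of the guest's lspci -v output, for illustration only.
SAMPLE = """\
00:04.0 VGA compatible controller: NVIDIA Corporation GF100GL [Quadro 6000] (rev a3)
\tCapabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
00:05.0 Audio device: NVIDIA Corporation GF100 High Definition Audio Controller (rev a1)
\tCapabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
"""

if __name__ == "__main__":
    for addr, enabled in msi_state(SAMPLE).items():
        print(addr, "MSI enabled" if enabled else "MSI disabled")
```

To use it on a live guest, the text of `lspci -v` can be captured (for example with subprocess.run) and passed to msi_state in place of SAMPLE.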