Xen project Mailing List

Re: [Xen-users] Quadrified GTX 480 VT-d passthrough. CUDA 5.5 in Linux partial success

To: Tamas Lengyel <tamas.lengyel@xxxxxxxxxxxx>

From: Gordan Bobic <gordan@xxxxxxxxxx>

Date: Tue, 19 Nov 2013 13:48:55 +0000

Delivery-date: Tue, 19 Nov 2013 13:49:04 +0000

List-id: Xen user discussion <xen-users.lists.xen.org>

Actually - try something simpler first - just unload and reload the nvidia.ko driver, see if that resets the card back into a CUDA-ble state.
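
A minimal sketch of that reload, assuming root in the domU and that nothing (X server, CUDA app) still holds the device. The run/DRY_RUN helper is just a convention of this sketch, not a modprobe feature; with DRY_RUN=1 it only prints the commands.

```shell
# Sketch: unload and reload the nvidia kernel module, then query the GPU.
# Assumption: run as root inside the domU; DRY_RUN=1 prints instead of executing.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reload_nvidia() {
    run modprobe -r nvidia || return 1   # fails if the module is still in use
    run modprobe nvidia || return 1
    run nvidia-smi                       # should list the GPU again if the reload helped
}
```

Try it with DRY_RUN=1 first (DRY_RUN=1; reload_nvidia) to see what would be run.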

On Tue, 19 Nov 2013 13:47:18 +0000, Gordan Bobic <gordan@xxxxxxxxxx> wrote:

I can't remember how it's all symlinked, but I normally
find it under somewhere like:

/sys/devices/pci0000:00/0000:00:03.0/0000:0b:00.0/0000:0c:02.0/0000:0d:00.0/reset

(the path reflects the PCI bridges along the way - yes, I have a card
behind 3 PCIe bridges on my motherboard (5520->NF200->NF200->GPU) - and
that's not even the GTX690 - that would add at least one more bridge to
the path - madness)
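
On the symlinking: /sys/bus/pci/devices/<address> is a symlink into the bridge-nested directory under /sys/devices, so the long path does not have to be remembered. A rough sketch of looking up the reset node that way; the function name and the SYSFS_ROOT variable are made up for this sketch (SYSFS_ROOT defaults to /sys and only exists so it can be tried on a fake tree).

```shell
# Sketch: resolve a PCI device's real sysfs directory and report its reset node.
# pci_reset_node and SYSFS_ROOT are names invented for this example.
pci_reset_node() {
    bdf=$1                                    # full address, e.g. 0000:0d:00.0
    link="${SYSFS_ROOT:-/sys}/bus/pci/devices/$bdf"
    [ -e "$link" ] || { echo "no such device: $bdf" >&2; return 1; }
    dir=$(readlink -f "$link")                # follows the symlink into .../devices/...
    if [ -e "$dir/reset" ]; then
        echo "$dir/reset"
    else
        echo "no reset node under $dir" >&2
        return 2
    fi
}
```

For example, pci_reset_node 0000:0d:00.0 prints the full path of the reset file if the kernel exposes one for that device.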


If the nvidia driver isn't exposing it, you could try unloading the
nvidia driver, loading the nouveau driver (make sure mode switching is
disabled so it doesn't get bound into a non-unloadable state by the
console), issuing a reset (if that exposes a reset node, which IIRC it
does on Fermi+ GPUs), unloading nouveau, and reloading nvidia.ko. Then
see if it works after that.
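
The sequence above could be sketched roughly like this. Assumptions of the sketch, not confirmed anywhere in this thread: it runs as root in the domU, the GPU is at 0000:00:04.0 there, and "mode switching disabled" is expressed through nouveau's modeset=0 module parameter. DRY_RUN=1 only prints the commands.

```shell
# Sketch: nvidia -> nouveau -> reset -> nvidia, as described above.
# GPU_BDF, RESET_NODE, run and reset_via_nouveau are names chosen for this example.
GPU_BDF=${GPU_BDF:-0000:00:04.0}
RESET_NODE=/sys/bus/pci/devices/$GPU_BDF/reset

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

reset_via_nouveau() {
    run modprobe -r nvidia || return 1
    run modprobe nouveau modeset=0 || return 1     # assumption: modeset=0 keeps the console off it
    if [ "${DRY_RUN:-0}" = "1" ] || [ -e "$RESET_NODE" ]; then
        run sh -c "echo 1 > $RESET_NODE"           # ask the kernel to reset the device
    else
        echo "no reset node at $RESET_NODE; skipping reset" >&2
    fi
    run modprobe -r nouveau || return 1
    run modprobe nvidia
}
```

If the reset node still does not appear with nouveau loaded, the sketch just skips that step and reloads nvidia, which is then no different from a plain module reload.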

Gordan

On Tue, 19 Nov 2013 14:22:48 +0100, Tamas Lengyel
<tamas.lengyel@xxxxxxxxxxxx> wrote:

I don't see reset unfortunately:

ls /sys/module/nvidia/drivers/pci:nvidia/0000:00:04.0
boot_vga  d3cold_allowed  enable  i2c-3  msi_bus  rescan
resource3  subsystem_device
broken_parity_status  device  firmware_node  irq  msi_irqs
resource  resource3_wc  subsystem_vendor
class  dma_mask_bits  i2c-0  local_cpulist  numa_node  resource0
resource5  uevent
config  driver  i2c-1  local_cpus  power  resource1  rom
vendor
consistent_dma_mask_bits  drm  i2c-2  modalias  remove
resource1_wc  subsystem

On Tue, Nov 19, 2013 at 11:32 AM, Gordan Bobic  wrote:
 Does the nvidia binary driver provide a reset handle for the device
via sysfs?
 If you echo 1 into it, does it help or does it crash things?

 On Tue, 19 Nov 2013 10:32:31 +0100, Tamas Lengyel  wrote:

 Hi everyone,
 after following in the footsteps of the following discussion
 (http://lists.xenproject.org/archives/html/xen-users/2013-09/msg00106.html
 [1])
 I have been able to turn my GTX 480 into a Quadro 6000. When I VT-d
 passthrough it to a Debian jessie VM it shows up fine and CUDA 5.5
 seems to function properly up to a point:

 lspci -v:

 00:04.0 VGA compatible controller: NVIDIA Corporation GF100GL [Quadro 6000] (rev a3) (prog-if 00 [VGA controller])
 Subsystem: ASUSTeK Computer Inc. Device 075f
 Physical Slot: 4
 Flags: bus master, fast devsel, latency 0, IRQ 32
 Memory at ee000000 (32-bit, non-prefetchable) [size=32M]
 Memory at e0000000 (64-bit, prefetchable) [size=128M]
 Memory at e8000000 (64-bit, prefetchable) [size=64M]
 I/O ports at c100 [size=128]
 Expansion ROM at f1000000 [disabled] [size=512K]
 Capabilities: [60] Power Management version 3
 Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
 Capabilities: [78] Express Endpoint, MSI 00
 Capabilities: [b4] Vendor Specific Information: Len=14
 Kernel driver in use: nvidia

 00:05.0 Audio device: NVIDIA Corporation GF100 High Definition Audio Controller (rev a1)
 Subsystem: ASUSTeK Computer Inc. Device 075f
 Physical Slot: 5
 Flags: bus master, fast devsel, latency 0, IRQ 37
 Memory at f1080000 (32-bit, non-prefetchable) [size=16K]
 Capabilities: [60] Power Management version 3
 Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
 Capabilities: [78] Express Endpoint, MSI 00
 Kernel driver in use: snd_hda_intel

 NVIDIA_CUDA-5.5_Samples/1_Utilities/deviceQuery# ./deviceQuery
 ./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

 Detected 1 CUDA Capable device(s)

 Device 0: "Quadro 6000"
   CUDA Driver Version / Runtime Version          6.0 / 5.5
   CUDA Capability Major/Minor version number:    2.0
   Total amount of global memory:                 1536 MBytes (1610285056 bytes)
   (15) Multiprocessors, ( 32) CUDA Cores/MP:     480 CUDA Cores
   GPU Clock rate:                                1401 MHz (1.40 GHz)
   Memory Clock rate:                             1848 Mhz
   Memory Bus Width:                              384-bit
   L2 Cache Size:                                 786432 bytes
   Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
   Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
   Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
   Total amount of constant memory:               65536 bytes
   Total amount of shared memory per block:       49152 bytes
   Total number of registers available per block: 32768
   Warp size:                                     32
   Maximum number of threads per multiprocessor:  1536
   Maximum number of threads per block:           1024
   Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
   Max dimension size of a grid size    (x,y,z): (65535, 65535, 65535)
   Maximum memory pitch:                          2147483647 bytes
   Texture alignment:                             512 bytes
   Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
   Run time limit on kernels:                     No
   Integrated GPU sharing Host Memory:            No
   Support host page-locked memory mapping:       Yes
   Alignment requirement for Surfaces:            Yes
   Device has ECC support:                        Disabled
   Device supports Unified Addressing (UVA):      Yes
   Device PCI Bus ID / PCI location ID:           0 / 4
   Compute Mode:
      < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

 deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 5.5, NumDevs = 1, Device0 = Quadro 6000
 Result = PASS

 Unfortunately, if I try to run any CUDA app or even nvidia-smi
 afterwards, I get the following errors:

 NVIDIA_CUDA-5.5_Samples/1_Utilities/deviceQuery# ./deviceQuery
 ./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

 cudaGetDeviceCount returned 10
 -> invalid device ordinal
 Result = FAIL

 # nvidia-smi
 Unable to determine the device handle for GPU 0000:00:04.0: The NVIDIA
 kernel module detected an issue with GPU interrupts. Consult the
 "Common Problems" Chapter of the NVIDIA Driver README for
 details and steps that can be taken to resolve this issue.

 If I restart the VM I can run a single CUDA app again, once. It's
 still pretty impressive to be able to do that without having to patch
 Xen or reboot the entire machine =) It doesn't seem to matter what
 CUDA app I'm running, here is matrixMul for example:

 matrixMul# ./matrixMul
 [Matrix Multiply Using CUDA] - Starting...
 GPU Device 0: "Quadro 6000" with compute capability 2.0

 MatrixA(320,320), MatrixB(640,320)
 Computing result using CUDA Kernel...
 done
 Performance= 227.22 GFlop/s, Time= 0.577 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
 Checking computed result for correctness: Result = PASS

 Note: For peak performance, please refer to the matrixMulCUBLAS
 example.

 Anyhoo, does anyone have any idea what I might be able to tweak so I
 can avoid this issue? The setup clearly seems to work for the most
 part.

 My domU config:

 arch = 'x86_64'
 name = "debian-miner"
 builder = "hvm"
 maxmem = 512
 memory = 512
 vcpus = 1
 maxcpus = 1
 boot = "cd"
 pae=1
 acpi = 1
 apic = 1
 hap=1
 hpet=1
 shadow_memory = 32
 on_poweroff = "destroy"
 on_reboot = "restart"
 on_crash = "restart"
 vnc=1
 vncunused=1
 vnclisten="0.0.0.0"
 vif = [ 'type=netfront,bridge=xenbr0,mac=00:16:3e:12:c3:fa']
 device_model_version="qemu-xen-traditional"
 gfx_passthru=0
 xen_platform_pci=1
 pci = [ '01:00.0', '01:00.1' ]
 pci_msitranslate = 1
 pci_power_mgmt = 1
 pci_permissive = 1
 xen_extended_power_mgmt = 1
 acpi_s3 = 1
 acpi_s4 = 1
 disk = [ 'phy:/dev/t0vg/debian-testing,xvda,w' ];

 And I'm running on Xen 4.3.1 with NVIDIA driver 331.20 x86_64 in the
 domU.

 Thanks and cheers!

 Links:
 ------
 [1] http://lists.xenproject.org/archives/html/xen-users/2013-09/msg00106.html

