
[Xen-devel] pci-passthrough loses msi-x interrupts ability after domain destroy


  • To: xen-devel@xxxxxxxxxxxxxxxxxxxx
  • From: Jérôme Oufella <jerome.oufella@xxxxxxxxxxxxxxxxxxxx>
  • Date: Wed, 20 Sep 2017 15:50:35 -0400 (EDT)
  • Delivery-date: Wed, 20 Sep 2017 19:50:44 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xen.org>
  • Thread-index: Rxt8us53fLhgBgf3LU2G4XGaiHbCNQ==
  • Thread-topic: pci-passthrough loses msi-x interrupts ability after domain destroy

Hi Xen-devel, 

I'm using PCI pass-through to map a PCIe controller (Intel i210) into
an HVM domain. The system uses xen-pciback to hide the appropriate PCI
device from Dom0.

When the HVM domain is created after a hypervisor cold boot, it can
access and use the PCIe controller without problems.

However, if the HVM domain is destroyed and then restarted, it can no
longer use the pass-through PCI device. The device is still visible
and can be mapped, but interrupts are no longer delivered to the HVM
domain (under a Linux guest, the counters in /proc/interrupts stay at
0). A Windows 10 guest behaves the same way.
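
The zero-counter symptom can be checked from inside a Linux guest with
a few lines of Python. This is only a minimal sketch: the interface
name "eth0" and the SAMPLE text are made up for illustration, and the
helper name interrupt_totals is hypothetical.

```python
# Illustrative /proc/interrupts excerpt (invented data, not from the report).
SAMPLE = """\
           CPU0       CPU1
 24:          0          0   PCI-MSI 1572864-edge      eth0
 25:          0          0   PCI-MSI 1572865-edge      eth0-TxRx-0
 26:       1234         56   PCI-MSI 512000-edge      ahci
"""


def interrupt_totals(proc_interrupts_text, name_fragment):
    """Return {irq: total count across CPUs} for lines mentioning name_fragment."""
    lines = proc_interrupts_text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row lists CPU0, CPU1, ...
    totals = {}
    for line in lines[1:]:
        if name_fragment not in line:
            continue
        irq, _, rest = line.partition(":")
        # The first ncpus fields after the colon are the per-CPU counters.
        counts = [int(f) for f in rest.split()[:ncpus] if f.isdigit()]
        totals[irq.strip()] = sum(counts)
    return totals


# In a guest, one would pass the real file instead of SAMPLE:
#   with open("/proc/interrupts") as f:
#       print(interrupt_totals(f.read(), "eth0"))
```

If every total for the device's vectors stays at 0 while traffic is
being generated, the guest is not receiving the interrupts.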

A few interesting hints I noticed: 

- On Dom0, comparing the 'lspci -vv' output for that PCIe device
between the "working" and the "muted interrupts" states, I noted a
difference in the MSI-X capability:

- Capabilities: [70] MSI-X: Enable- Count=5 Masked- <-- IRQs will work if domain started
+ Capabilities: [70] MSI-X: Enable- Count=5 Masked+ <-- IRQs won't work if domain started
                                            ^^^^^^^

- When the HVM OS is Linux, rmmod'ing the i210 (igb) driver from 
inside that domain before destroying the domain provides a way to 
keep the device working during the next destroy/create cycle: in the 
lspci view above, the MSI-X caps will not appear as 'Masked+' if the 
driver was unloaded prior to destroy. 

- However, if the domain was destroyed without that precaution, I 
found no way to bring it back to a working state. 
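
The 'Masked+' flag that lspci prints is the Function Mask bit (bit 14)
of the MSI-X Message Control word, which sits two bytes into the MSI-X
capability (capability ID 0x11). The same state can be read straight
from the device's config space in Dom0. A minimal sketch, assuming the
config file under /sys/bus/pci/devices/0000:07:00.0/ is read as root
(unprivileged reads return only the first 64 bytes); the function
names are hypothetical:

```python
import struct

MSIX_CAP_ID = 0x11  # PCI capability ID for MSI-X


def find_capability(config, cap_id):
    """Walk the capability list of a config-space dump; return offset or None."""
    status = struct.unpack_from("<H", config, 0x06)[0]
    if not status & 0x10:  # Status bit 4: capability list present
        return None
    ptr = config[0x34] & 0xFC  # Capabilities Pointer
    seen = set()
    while ptr and ptr not in seen and ptr + 4 <= len(config):
        seen.add(ptr)
        if config[ptr] == cap_id:
            return ptr
        ptr = config[ptr + 1] & 0xFC  # next-capability pointer
    return None


def decode_msix(config):
    """Decode the MSI-X Message Control word; return None if no MSI-X cap."""
    off = find_capability(config, MSIX_CAP_ID)
    if off is None:
        return None
    ctrl = struct.unpack_from("<H", config, off + 2)[0]
    return {
        "offset": off,
        "enable": bool(ctrl & 0x8000),  # bit 15: MSI-X Enable
        "masked": bool(ctrl & 0x4000),  # bit 14: Function Mask
        "count": (ctrl & 0x07FF) + 1,   # bits 10:0: table size minus one
    }


# Usage in Dom0 (as root):
#   with open("/sys/bus/pci/devices/0000:07:00.0/config", "rb") as f:
#       print(decode_msix(f.read()))
```

For the "muted interrupts" state shown above, this would be expected
to report the capability at offset 0x70 with enable False, masked True
and count 5.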

I tried a few methods without success: 

- Removing the device from the PCI bus in Dom0 and rescanning.
- Running 'echo 1 > reset' in the device's PCI sysfs directory.
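
For reference, the two attempts above map onto the standard Linux PCI
sysfs files roughly as follows. This is a sketch of the usual
interfaces, not a record of the exact commands that were run; the
helper names are hypothetical, and it defaults to a dry run that only
prints the equivalent shell commands.

```python
from pathlib import Path

SYSFS_PCI = Path("/sys/bus/pci")


def recovery_steps(bdf):
    """List (name, sysfs file, value) for remove, rescan and reset of one device."""
    dev = SYSFS_PCI / "devices" / bdf
    return [
        ("remove", dev / "remove", "1"),        # detach the device from the bus
        ("rescan", SYSFS_PCI / "rescan", "1"),  # re-enumerate the bus
        ("reset", dev / "reset", "1"),          # function-level reset if supported
    ]


def run(steps, dry_run=True):
    """Print the equivalent shell commands, or write the values when dry_run=False."""
    for _name, path, value in steps:
        if dry_run:
            print(f"echo {value} > {path}")
        else:
            path.write_text(value)  # requires root in Dom0


# Example: run(recovery_steps("0000:07:00.0"))
```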

Am I missing something, or is there something I can try to 
troubleshoot this? Any hint will be helpful.

Best,
Jerome



Setup uses the following: 

- Xen 4.8.1 
- Linux 4.8 [ xen-pciback.hide=(07:00.0) ]
- IOMMU is enabled on a Core i5-5350U

- The domain config file: 

---snip--- 
builder = 'hvm' 
memory = 4096 
vcpus = 2 
name = "LiveCD" 

disk = [ 'file:/data/ubuntu.iso,xvdc:cdrom,r', 'format=raw, vdev=hdb, access=rw, backendtype=qdisk, target=/dev/sda5' ]
boot = "c" 
acpi = 1 
device_model_version = "qemu-xen" 
sdl = 0 
vnc = 1 
vnclisten = '10.0.0.1:0' 

# i210 pass-through 
pci = ['07:00.0'] 

usb = 1 
usbdevice = ['tablet'] 
-----snap------ 

- xl dmesg (loglvl=debug): 

(XEN) Xen version 4.8.1 (@) (gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3) debug=n Tue Sep 19 17:22:36 UTC 2017
(XEN) Latest ChangeSet: 
(XEN) Bootloader: GRUB 2.00 
(XEN) Command line: loglvl=debug dom0_mem=4096M,max:4096M dom0_max_vcpus=2 
(XEN) Video information: 
(XEN) VGA is text mode 80x25, font 8x16 
(XEN) VBE/DDC methods: V2; EDID transfer time: 1 seconds 
(XEN) Disc information: 
(XEN) Found 2 MBR signatures 
(XEN) Found 3 EDD information structures 
(XEN) Xen-e820 RAM map: 
(XEN) 0000000000000000 - 000000000009d800 (usable) 
(XEN) 000000000009d800 - 00000000000a0000 (reserved) 
(XEN) 00000000000e0000 - 0000000000100000 (reserved) 
(XEN) 0000000000100000 - 00000000d80bb000 (usable) 
(XEN) 00000000d80bb000 - 00000000d83f9000 (reserved) 
(XEN) 00000000d83f9000 - 00000000dc364000 (usable) 
(XEN) 00000000dc364000 - 00000000dc3c4000 (reserved) 
(XEN) 00000000dc3c4000 - 00000000dc5b4000 (usable) 
(XEN) 00000000dc5b4000 - 00000000dcd39000 (ACPI NVS) 
(XEN) 00000000dcd39000 - 00000000dcfff000 (reserved) 
(XEN) 00000000dcfff000 - 00000000dd000000 (usable) 
(XEN) 00000000dd800000 - 00000000e0000000 (reserved) 
(XEN) 00000000f8000000 - 00000000fc000000 (reserved) 
(XEN) 00000000fec00000 - 00000000fec01000 (reserved) 
(XEN) 00000000fed00000 - 00000000fed04000 (reserved) 
(XEN) 00000000fed1c000 - 00000000fed20000 (reserved) 
(XEN) 00000000fee00000 - 00000000fee01000 (reserved) 
(XEN) 00000000ff000000 - 0000000100000000 (reserved) 
(XEN) 0000000100000000 - 000000041e000000 (usable) 
(XEN) ACPI: RSDP 000F0580, 0024 (r2 ALASKA) 
(XEN) ACPI: XSDT DCCFA090, 00A4 (r1 ALASKA A M I 1072009 AMI 10013) 
(XEN) ACPI: FACP DCD10478, 010C (r5 ALASKA A M I 1072009 AMI 10013) 
(XEN) ACPI: DSDT DCCFA1D0, 162A8 (r2 ALASKA A M I 1072009 INTL 20120913) 
(XEN) ACPI: FACS DCD37F80, 0040 
(XEN) ACPI: APIC DCD10588, 0084 (r3 ALASKA A M I 1072009 AMI 10013) 
(XEN) ACPI: FPDT DCD10610, 0044 (r1 ALASKA A M I 1072009 AMI 10013) 
(XEN) ACPI: FIDT DCD10658, 009C (r1 ALASKA A M I 1072009 AMI 10013) 
(XEN) ACPI: MCFG DCD106F8, 003C (r1 ALASKA A M I 1072009 MSFT 97) 
(XEN) ACPI: HPET DCD10738, 0038 (r1 ALASKA A M I 1072009 AMI. 5) 
(XEN) ACPI: SSDT DCD10770, 0315 (r1 SataRe SataTabl 1000 INTL 20120913) 
(XEN) ACPI: UEFI DCD10A88, 0042 (r1 0 0) 
(XEN) ACPI: SSDT DCD10AD0, 08F4 (r2 Ther_R Ther_Rvp 1000 INTL 20120913) 
(XEN) ACPI: ASF! DCD113C8, 00A0 (r32 INTEL HCG 1 TFSM F4240) 
(XEN) ACPI: TCPA DCD11468, 0032 (r2 ALASKA NAPAASF 1 MSFT 1000013) 
(XEN) ACPI: SSDT DCD114A0, 0518 (r2 PmRef Cpu0Ist 3000 INTL 20120913) 
(XEN) ACPI: SSDT DCD119B8, 0B74 (r2 CpuRef CpuSsdt 3000 INTL 20120913) 
(XEN) ACPI: SSDT DCD12530, 5CF6 (r2 SaSsdt SaSsdt 3000 INTL 20120913) 
(XEN) ACPI: DMAR DCD18228, 00F8 (r1 INTEL BDW 1 INTL 1) 
(XEN) ACPI: CSRT DCD18320, 00C4 (r1 INTL BDW-ULT 1 INTL 20100528) 
(XEN) System RAM: 16289MB (16680656kB) 
(XEN) No NUMA configuration found 
(XEN) Faking a node at 0000000000000000-000000041e000000 
(XEN) Domain heap initialised 
(XEN) CPU Vendor: Intel, Family 6 (0x6), Model 61 (0x3d), Stepping 4 (raw 000306d4)
(XEN) found SMP MP-table at 000fd8e0 
(XEN) DMI 2.8 present. 
(XEN) Using APIC driver default 
(XEN) ACPI: PM-Timer IO Port: 0x1808 (32 bits) 
(XEN) ACPI: v5 SLEEP INFO: control[0:0], status[0:0] 
(XEN) ACPI: SLEEP INFO: pm1x_cnt[1:1804,1:0], pm1x_evt[1:1800,1:0] 
(XEN) ACPI: 32/64X FACS address mismatch in FADT - dcd37f80/0000000000000000, using 32
(XEN) ACPI: wakeup_vec[dcd37f8c], vec_size[20] 
(XEN) ACPI: Local APIC address 0xfee00000 
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled) 
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled) 
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] enabled) 
(XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled) 
(XEN) ACPI: LAPIC_NMI (acpi_id[0x01] dfl res lint[0x44]) 
(XEN) ACPI: NMI not connected to LINT 1! 
(XEN) ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0]) 
(XEN) ACPI: NMI not connected to LINT 1! 
(XEN) ACPI: LAPIC_NMI (acpi_id[0x03] low dfl lint[0xc3]) 
(XEN) ACPI: NMI not connected to LINT 1! 
(XEN) ACPI: LAPIC_NMI (acpi_id[0x04] dfl res lint[0x8]) 
(XEN) ACPI: NMI not connected to LINT 1! 
(XEN) ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0]) 
(XEN) IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-39 
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) 
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) 
(XEN) ACPI: IRQ0 used by override. 
(XEN) ACPI: IRQ2 used by override. 
(XEN) ACPI: IRQ9 used by override. 
(XEN) Enabling APIC mode: Flat. Using 1 I/O APICs 
(XEN) ACPI: HPET id: 0x8086a701 base: 0xfed00000 
(XEN) ERST table was not found 
(XEN) Using ACPI (MADT) for SMP configuration information 
(XEN) SMP: Allowing 4 CPUs (0 hotplug CPUs) 
(XEN) IRQ limits: 40 GSI, 744 MSI/MSI-X 
(XEN) Not enabling x2APIC (upon firmware request) 
(XEN) xstate: size: 0x340 and states: 0x7 
(XEN) Thermal monitoring handled by SMI 
(XEN) Intel machine check reporting enabled 
(XEN) Using scheduler: SMP Credit Scheduler (credit) 
(XEN) Platform timer is 14.318MHz HPET 
(XEN) Detected 1795.844 MHz processor. 
(XEN) Initing memory sharing. 
(XEN) alt table ffff82d0802bef60 -> ffff82d0802c06a0 
(XEN) spurious 8259A interrupt: IRQ7. 
(XEN) PCI: MCFG configuration 0: base f8000000 segment 0000 buses 00 - 3f 
(XEN) PCI: MCFG area at f8000000 reserved in E820 
(XEN) PCI: Using MCFG for segment 0000 bus 00-3f 
(XEN) Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB. 
(XEN) Intel VT-d iommu 1 supported page sizes: 4kB, 2MB, 1GB. 
(XEN) Intel VT-d Snoop Control not enabled. 
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled. 
(XEN) Intel VT-d Queued Invalidation enabled. 
(XEN) Intel VT-d Interrupt Remapping enabled. 
(XEN) Intel VT-d Posted Interrupt not enabled. 
(XEN) Intel VT-d Shared EPT tables enabled. 
(XEN) I/O virtualisation enabled 
(XEN) - Dom0 mode: Relaxed 
(XEN) Interrupt remapping enabled 
(XEN) nr_sockets: 1 
(XEN) Enabled directed EOI with ioapic_ack_old on! 
(XEN) ENABLING IO-APIC IRQs 
(XEN) -> Using old ACK method 
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=0 pin2=0 
(XEN) TSC deadline timer enabled 
(XEN) Allocated console ring of 32 KiB. 
(XEN) mwait-idle: MWAIT substates: 0x11142120 
(XEN) mwait-idle: v0.4.1 model 0x3d 
(XEN) mwait-idle: lapic_timer_reliable_states 0xffffffff 
(XEN) mwait-idle: max C-state count of 8 reached 
(XEN) VMX: Supported advanced features: 
(XEN) - APIC MMIO access virtualisation 
(XEN) - APIC TPR shadow 
(XEN) - Extended Page Tables (EPT) 
(XEN) - Virtual-Processor Identifiers (VPID) 
(XEN) - Virtual NMI 
(XEN) - MSR direct-access bitmap 
(XEN) - Unrestricted Guest 
(XEN) - VMCS shadowing 
(XEN) - VM Functions 
(XEN) - Virtualisation Exceptions 
(XEN) HVM: ASIDs enabled. 
(XEN) HVM: VMX enabled 
(XEN) HVM: Hardware Assisted Paging (HAP) detected 
(XEN) HVM: HAP page sizes: 4kB, 2MB, 1GB 
(XEN) [VT-D]INTR-REMAP: Request device [0000:f0:1f.0] fault index 0, iommu reg = ffff82c000203000
(XEN) [VT-D]INTR-REMAP: reason 25 - Blocked a compatibility format interrupt request
(XEN) mwait-idle: max C-state count of 8 reached 
(XEN) mwait-idle: max C-state count of 8 reached 
(XEN) mwait-idle: max C-state count of 8 reached 
(XEN) Brought up 4 CPUs 
(XEN) build-id: e119032a1c69cee07ab82491a5eab6892747eac4 
(XEN) ACPI sleep modes: S3 
(XEN) VPMU: disabled 
(XEN) mcheck_poll: Machine check polling timer started. 
(XEN) Dom0 has maximum 424 PIRQs 
(XEN) NX (Execute Disable) protection active 
(XEN) *** LOADING DOMAIN 0 *** 
(XEN) Xen kernel: 64-bit, lsb, compat32 
(XEN) Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x1c63000 
(XEN) PHYSICAL MEMORY ARRANGEMENT: 
(XEN) Dom0 alloc.: 000000040e000000->0000000410000000 (1040384 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT: 
(XEN) Loaded kernel: ffffffff81000000->ffffffff81c63000 
(XEN) Init. ramdisk: 0000000000000000->0000000000000000 
(XEN) Phys-Mach map: 0000008000000000->0000008000800000 
(XEN) Start info: ffffffff81c63000->ffffffff81c634b4 
(XEN) Page tables: ffffffff81c64000->ffffffff81c77000 
(XEN) Boot stack: ffffffff81c77000->ffffffff81c78000 
(XEN) TOTAL: ffffffff80000000->ffffffff82000000 
(XEN) ENTRY ADDRESS: ffffffff8189c180 
(XEN) Dom0 has maximum 2 VCPUs 
(XEN) Bogus DMIBAR 0xfed18001 on 0000:00:00.0 
(XEN) Scrubbing Free RAM on 1 nodes using 2 CPUs 
(XEN) ..................................................................done. 
(XEN) Initial low memory virq threshold set at 0x4000 pages. 
(XEN) Std. Loglevel: All 
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings) 
(XEN) Xen is relinquishing VGA console. 
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 316kB init memory 
(XEN) Bogus DMIBAR 0xfed18001 on 0000:00:00.0 
(XEN) PCI add device 0000:00:00.0 
(XEN) PCI add device 0000:00:02.0 
(XEN) PCI add device 0000:00:03.0 
(XEN) PCI add device 0000:00:14.0 
(XEN) PCI add device 0000:00:16.0 
(XEN) PCI add device 0000:00:19.0 
(XEN) PCI add device 0000:00:1b.0 
(XEN) PCI add device 0000:00:1c.0 
(XEN) PCI add device 0000:00:1c.1 
(XEN) PCI add device 0000:00:1c.2 
(XEN) PCI add device 0000:00:1c.3 
(XEN) PCI add device 0000:00:1d.0 
(XEN) PCI add device 0000:00:1f.0 
(XEN) PCI add device 0000:00:1f.2 
(XEN) PCI add device 0000:00:1f.3 
(XEN) PCI add device 0000:01:00.0 
(XEN) PCI add device 0000:02:01.0 
(XEN) PCI add device 0000:02:02.0 
(XEN) PCI add device 0000:02:03.0 
(XEN) PCI add device 0000:04:00.0 
(XEN) PCI add device 0000:05:00.0 
(XEN) PCI add device 0000:06:00.0 
(XEN) PCI add device 0000:07:00.0 


- Device's lspci -vv:

07:00.0 Class 0200: Device 8086:1537 (rev 03)
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 18
        Region 0: Memory at f5c00000 (32-bit, non-prefetchable) [disabled] [size=512K]
        Region 2: I/O ports at c000 [disabled] [size=32]
        Region 3: Memory at f5c80000 (32-bit, non-prefetchable) [disabled] [size=16K]
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
                Address: 0000000000000000  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [70] MSI-X: Enable- Count=5 Masked+
                Vector table: BAR=3 offset=00000000
                PBA: BAR=3 offset=00002000
        Capabilities: [a0] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <2us, L1 <16us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [140 v1] Device Serial Number 00-50-d2-ff-ff-10-34-b6
        Capabilities: [1a0 v1] Transaction Processing Hints
                Device specific mode supported
                Steering table in TPH capability structure
        Kernel driver in use: pciback
