Re: PCI pass-through problem for SN570 NVME SSD
On Mon, Jul 04, 2022 at 07:34:47PM +0800, G.R. wrote: > On Mon, Jul 4, 2022 at 5:53 PM Roger Pau Monné <roger.pau@xxxxxxxxxx> wrote: > > > > On Sun, Jul 03, 2022 at 01:43:11AM +0800, G.R. wrote: > > > Hi everybody, > > > > > > I run into problems passing through a SN570 NVME SSD to a HVM guest. > > > So far I have no idea if the problem is with this specific SSD or with > > > the CPU + motherboard combination or the SW stack. > > > Looking for some suggestions on troubleshooting. > > > > > > List of build info: > > > CPU+motherboard: E-2146G + Gigabyte C246N-WU2 > > > XEN version: 4.14.3 > > > > Are you using a debug build of Xen? (if not it would be helpful to do > > so). > It's a release version at this moment. I can switch to a debug version > later when I get my hands free. > BTW, I got a DEBUG build of the xen_pciback driver to see how it plays > with 'xl pci-assignable-xxx' commands. > You can find this in my 2nd email in the chain. > > > > > > Dom0: Linux Kernel 5.10 (built from Debian 11.2 kernel source package) > > > The SN570 SSD sits here in the PCI tree: > > > +-1d.0-[05]----00.0 Sandisk Corp Device 501a > > > > Could be helpful to post the output with -vvv so we can see the > > capabilities of the device. > Sure, please find the -vvv output from the attachment. > This one is just to indicate the connection in the PCI tree. > I.e. 05:00.0 is attached under 00:1d.0. > > > > > > Syndromes observed: > > > With ASPM enabled, pciback has problem seizing the device. > > > > > > Jul 2 00:36:54 gaia kernel: [ 1.648270] pciback 0000:05:00.0: > > > xen_pciback: seizing device > > > ... 
> > > Jul 2 00:36:54 gaia kernel: [ 1.768646] pcieport 0000:00:1d.0: > > > AER: enabled with IRQ 150 > > > Jul 2 00:36:54 gaia kernel: [ 1.768716] pcieport 0000:00:1d.0: > > > DPC: enabled with IRQ 150 > > > Jul 2 00:36:54 gaia kernel: [ 1.768717] pcieport 0000:00:1d.0: > > > DPC: error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ > > > SwTrigger+ RP PIO Log 4, DL_ActiveErr+ > > > > Is there a device reset involved here? It's possible the device > > doesn't reset properly and hence the Uncorrectable Error Status > > Register ends up with inconsistent bits set. > > xen_pciback appears to force a FLR whenever it attempts to seize a > capable device. > As shown in pciback_dbg_xl-pci_assignable_XXX.log attached in my 2nd mail. > [ 323.448115] xen_pciback: wants to seize 0000:05:00.0 > [ 323.448136] pciback 0000:05:00.0: xen_pciback: probing... > [ 323.448137] pciback 0000:05:00.0: xen_pciback: seizing device > [ 323.448162] pciback 0000:05:00.0: xen_pciback: pcistub_device_alloc > [ 323.448162] pciback 0000:05:00.0: xen_pciback: initializing... 
> [ 323.448163] pciback 0000:05:00.0: xen_pciback: initializing config > [ 323.448344] pciback 0000:05:00.0: xen_pciback: enabling device > [ 323.448425] xen: registering gsi 16 triggering 0 polarity 1 > [ 323.448428] Already setup the GSI :16 > [ 323.448497] pciback 0000:05:00.0: xen_pciback: save state of device > [ 323.448642] pciback 0000:05:00.0: xen_pciback: resetting (FLR, D3, > etc) the device > [ 323.448707] pcieport 0000:00:1d.0: DPC: containment event, > status:0x1f11 source:0x0000 > [ 323.448730] pcieport 0000:00:1d.0: DPC: unmasked uncorrectable error > detected > [ 323.448760] pcieport 0000:00:1d.0: PCIe Bus Error: > severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver > ID) > [ 323.448786] pcieport 0000:00:1d.0: device [8086:a330] error > status/mask=00200000/00010000 > [ 323.448813] pcieport 0000:00:1d.0: [21] ACSViol (First) > [ 324.690979] pciback 0000:05:00.0: not ready 1023ms after FLR; > waiting <============ HERE > [ 325.730706] pciback 0000:05:00.0: not ready 2047ms after FLR; waiting > [ 327.997638] pciback 0000:05:00.0: not ready 4095ms after FLR; waiting > [ 332.264251] pciback 0000:05:00.0: not ready 8191ms after FLR; waiting > [ 340.584320] pciback 0000:05:00.0: not ready 16383ms after FLR; > waiting > [ 357.010896] pciback 0000:05:00.0: not ready 32767ms after FLR; waiting > [ 391.143951] pciback 0000:05:00.0: not ready 65535ms after FLR; giving up > [ 392.249252] pciback 0000:05:00.0: xen_pciback: reset device > [ 392.249392] pciback 0000:05:00.0: xen_pciback: > xen_pcibk_error_detected(bus:5,devfn:0) > [ 392.249393] pciback 0000:05:00.0: xen_pciback: device is not found/assigned > [ 392.397074] pciback 0000:05:00.0: xen_pciback: > xen_pcibk_error_resume(bus:5,devfn:0) > [ 392.397080] pciback 0000:05:00.0: xen_pciback: device is not found/assigned > [ 392.397284] pcieport 0000:00:1d.0: AER: device recovery successful > Note, I only see this in FLR action the 1st attempt. 
> And my SATA controller which doesn't support FLR appears to pass > through just fine... > > > > > > ... > > > Jul 2 00:36:54 gaia kernel: [ 1.770039] xen: registering gsi 16 > > > triggering 0 polarity 1 > > > Jul 2 00:36:54 gaia kernel: [ 1.770041] Already setup the GSI :16 > > > Jul 2 00:36:54 gaia kernel: [ 1.770314] pcieport 0000:00:1d.0: > > > DPC: containment event, status:0x1f11 source:0x0000 > > > Jul 2 00:36:54 gaia kernel: [ 1.770315] pcieport 0000:00:1d.0: > > > DPC: unmasked uncorrectable error detected > > > Jul 2 00:36:54 gaia kernel: [ 1.770320] pcieport 0000:00:1d.0: > > > PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction > > > Layer, (Receiver ID) > > > Jul 2 00:36:54 gaia kernel: [ 1.770371] pcieport 0000:00:1d.0: > > > device [8086:a330] error status/mask=00200000/00010000 > > > Jul 2 00:36:54 gaia kernel: [ 1.770413] pcieport 0000:00:1d.0: > > > [21] ACSViol (First) > > > Jul 2 00:36:54 gaia kernel: [ 1.770466] pciback 0000:05:00.0: > > > xen_pciback: device is not found/assigned > > > Jul 2 00:36:54 gaia kernel: [ 1.920195] pciback 0000:05:00.0: > > > xen_pciback: device is not found/assigned > > > Jul 2 00:36:54 gaia kernel: [ 1.920260] pcieport 0000:00:1d.0: > > > AER: device recovery successful > > > Jul 2 00:36:54 gaia kernel: [ 1.920263] pcieport 0000:00:1d.0: > > > DPC: containment event, status:0x1f01 source:0x0000 > > > Jul 2 00:36:54 gaia kernel: [ 1.920264] pcieport 0000:00:1d.0: > > > DPC: unmasked uncorrectable error detected > > > Jul 2 00:36:54 gaia kernel: [ 1.920267] pciback 0000:05:00.0: > > > xen_pciback: device is not found/assigned > > > > That's from a different device (05:00.0). > 00:1d.0 is the bridge port that 05:00.0 attaches to. > > > > > > > > After the 'xl pci-assignable-list' appears to be self-consistent, > > > creating VM with the SSD assigned still leads to a guest crash: > > > From qemu log: > > > [00:06.0] xen_pt_region_update: Error: create new mem mapping failed! 
> > > (err: 1) > > > qemu-system-i386: terminating on signal 1 from pid 1192 (xl) > > > > > > From the 'xl dmesg' output: > > > (XEN) d1: GFN 0xf3078 (0xa2616,0,5,7) -> (0xa2504,0,5,7) not permitted > > > > Seems like QEMU is attempting to remap a p2m_mmio_direct region. > > > > Can you paste the full output of `xl dmesg`? (as that will contain the > > memory map). > Attached. > > > > > Would also be helpful if you could get the RMRR regions from that > > box. Booting with `iommu=verbose` on the Xen command line should print > > those. > Coming in my next reply... > 00:1d.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port > #9 (rev f0) (prog-if 00 [Normal decode]) > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- > Stepping- SERR- FastB2B- DisINTx+ > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- > <MAbort- >SERR- <PERR- INTx- > Latency: 0, Cache Line Size: 64 bytes > Interrupt: pin A routed to IRQ 126 > IOMMU group: 10 > Bus: primary=00, secondary=05, subordinate=05, sec-latency=0 > I/O behind bridge: 0000f000-00000fff [disabled] > Memory behind bridge: a2600000-a26fffff [size=1M] > Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff > [disabled] > Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- > <MAbort+ <SERR- <PERR- > BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B- > PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- > Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00 > DevCap: MaxPayload 256 bytes, PhantFunc 0 > ExtTag- RBE+ > DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+ > RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- > MaxPayload 256 bytes, MaxReadReq 128 bytes > DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ > TransPend- > LnkCap: Port #9, Speed 8GT/s, Width x4, ASPM L0s L1, Exit > Latency L0s <1us, L1 <16us > ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+ > LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- 
CommClk+ > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- > LnkSta: Speed 8GT/s (ok), Width x4 (ok) > TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt- > SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- > Surprise- > Slot #12, PowerLimit 25.000W; Interlock- NoCompl+ > SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- > LinkChg- > Control: AttnInd Unknown, PwrInd Unknown, Power- > Interlock- > SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ > Interlock- > Changed: MRL- PresDet- LinkState+ > RootCap: CRSVisible- > RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- > CRSVisible- > RootSta: PME ReqID 0000, PMEStatus- PMEPending- > DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- > LTR+ > 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- > EETLPPrefix- > EmergencyPowerReduction Not Supported, > EmergencyPowerReductionInit- > FRS- LN System CLS Not Supported, TPHComp- ExtTPHComp- > ARIFwd+ > AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS- > DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ > OBFF Disabled, ARIFwd- > AtomicOpsCtl: ReqEn- EgressBlck- > LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- > 2Retimers- DRS- > LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis- > Transmit Margin: Normal Operating Range, > EnterModifiedCompliance- ComplianceSOS- > Compliance De-emphasis: -6dB > LnkSta2: Current De-emphasis Level: -3.5dB, > EqualizationComplete+ EqualizationPhase1+ > EqualizationPhase2+ EqualizationPhase3+ > LinkEqualizationRequest- > Retimer- 2Retimers- CrosslinkRes: unsupported > Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit- > Address: fee002b8 Data: 0000 > Capabilities: [90] Subsystem: Gigabyte Technology Co., Ltd Cannon Lake > PCH PCI Express Root Port > Capabilities: [a0] Power Management version 3 > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA > PME(D0+,D1-,D2-,D3hot+,D3cold+) > Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- > Capabilities: [100 v1] 
Advanced Error Reporting > UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- > RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ > RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > UESvrt: DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- > RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- > CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- > AdvNonFatalErr- > CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- > AdvNonFatalErr+ > AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- > ECRCChkCap- ECRCChkEn- > MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap- > HeaderLog: 00000000 00000000 00000000 00000000 > RootCmd: CERptEn+ NFERptEn+ FERptEn+ > RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd- > FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0 > ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000 > Capabilities: [140 v1] Access Control Services > ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd- > EgressCtrl- DirectTrans- > ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd- > EgressCtrl- DirectTrans- > Capabilities: [150 v1] Precision Time Measurement > PTMCap: Requester:- Responder:+ Root:+ > PTMClockGranularity: 4ns > PTMControl: Enabled:+ RootSelected:+ > PTMEffectiveGranularity: Unknown > Capabilities: [200 v1] L1 PM Substates > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ > L1_PM_Substates+ > PortCommonModeRestoreTime=40us PortTPowerOnTime=44us > L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > T_CommonMode=40us LTR1.2_Threshold=65536ns > L1SubCtl2: T_PwrOn=44us > Capabilities: [220 v1] Secondary PCI Express > LnkCtl3: LnkEquIntrruptEn- PerformEqu- > LaneErrStat: 0 > Capabilities: [250 v1] Downstream Port Containment > DpcCap: INT Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log > 4, DL_ActiveErr+ > DpcCtl: Trigger:1 Cmpl- INT+ ErrCor- PoisonedTLP- SwTrigger- > DL_ActiveErr- > DpcSta: Trigger- Reason:00 INT- RPBusy- TriggerExt:00 RP PIO > ErrPtr:1f > Source: 0000 > Kernel 
driver in use: pcieport
>
> 05:00.0 Non-Volatile memory controller: Sandisk Corp Device 501a (prog-if 02 [NVM Express])
>     Subsystem: Sandisk Corp Device 501a
>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>     Latency: 0, Cache Line Size: 64 bytes
>     Interrupt: pin A routed to IRQ 16
>     NUMA node: 0
>     IOMMU group: 13
>     Region 0: Memory at a2600000 (64-bit, non-prefetchable) [size=16K]
>     Region 4: Memory at a2604000 (64-bit, non-prefetchable) [size=256]

I think I'm slightly confused, the overlapping happens at:

(XEN) d1: GFN 0xf3078 (0xa2616,0,5,7) -> (0xa2504,0,5,7) not permitted

So it's MFNs 0xa2616 and 0xa2504, yet none of those are in the BAR
ranges of this device.

Can you paste the lspci -vvv output for any other device you are also
passing through to this guest?

Thanks, Roger.
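The BAR check described above can be reproduced with a few lines of arithmetic. The following is only a rough sketch, assuming 4 KiB pages and taking the first field of each tuple in the Xen message as the MFN (as the reply above does). The addresses are copied from the quoted lspci output; the helper names (`page_range`, `bars_covering`, `in_bridge_window`) are made up for illustration and are not part of any Xen or Linux tool.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def page_range(base, size):
    """Return the inclusive (first, last) page frame numbers covering [base, base + size)."""
    return base >> PAGE_SHIFT, (base + size - 1) >> PAGE_SHIFT


# BARs of 05:00.0, taken from the quoted lspci output.
BARS = {
    "05:00.0 Region 0": page_range(0xA2600000, 16 * 1024),
    "05:00.0 Region 4": page_range(0xA2604000, 256),
}

# "Memory behind bridge: a2600000-a26fffff [size=1M]" on root port 00:1d.0.
BRIDGE_WINDOW = page_range(0xA2600000, 0x100000)


def bars_covering(mfn):
    """Names of the BARs whose page range contains this MFN (hypothetical helper)."""
    return [name for name, (lo, hi) in BARS.items() if lo <= mfn <= hi]


def in_bridge_window(mfn):
    """True if the MFN lies inside the root port's memory window (hypothetical helper)."""
    lo, hi = BRIDGE_WINDOW
    return lo <= mfn <= hi


if __name__ == "__main__":
    # The two MFNs from the "not permitted" line in 'xl dmesg'.
    for mfn in (0xA2616, 0xA2504):
        print(hex(mfn), "BARs:", bars_covering(mfn),
              "inside bridge window:", in_bridge_window(mfn))
```

Under these assumptions neither MFN falls in Region 0 (pages 0xa2600 to 0xa2603) or Region 4 (page 0xa2604). 0xa2616 does sit inside the memory window of 00:1d.0, while 0xa2504 is below it, which fits the request to look at the other devices passed through to this guest.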