Re: [Xen-devel] [PATCH][XSA-126] xen: limit guest control of PCI command register
On Mon, Jun 08, 2015 at 09:09:15AM +0100, Malcolm Crossley wrote:
> On 08/06/15 08:42, Jan Beulich wrote:
> >>>> On 07.06.15 at 08:23, <mst@xxxxxxxxxx> wrote:
> >> On Mon, Apr 20, 2015 at 04:32:12PM +0200, Michael S. Tsirkin wrote:
> >>> On Mon, Apr 20, 2015 at 03:08:09PM +0100, Jan Beulich wrote:
> >>>>>>> On 20.04.15 at 15:43, <mst@xxxxxxxxxx> wrote:
> >>>>> On Mon, Apr 13, 2015 at 01:51:06PM +0100, Jan Beulich wrote:
> >>>>>>>>> On 13.04.15 at 14:47, <mst@xxxxxxxxxx> wrote:
> >>>>>>> Can you check device capabilities register, offset 0x4 within
> >>>>>>> pci express capability structure?
> >>>>>>> Bit 15 is Role-Based Error Reporting.
> >>>>>>> Is it set?
> >>>>>>>
> >>>>>>> The spec says:
> >>>>>>>
> >>>>>>> On platforms where robust error handling and PC-compatible
> >>>>>>> Configuration Space probing is required, it is suggested that
> >>>>>>> software or firmware have the Unsupported Request Reporting
> >>>>>>> Enable bit Set for Role-Based Error Reporting Functions, but
> >>>>>>> clear for 1.0a Functions. Software or firmware can distinguish
> >>>>>>> the two classes of Functions by examining the Role-Based Error
> >>>>>>> Reporting bit in the Device Capabilities register.
> >>>>>>
> >>>>>> Yes, that bit is set.
> >>>>>
> >>>>> curiouser and curiouser.
> >>>>>
> >>>>> So with functions that do support Role-Based Error Reporting, we have
> >>>>> this:
> >>>>>
> >>>>> With device Functions implementing Role-Based Error Reporting,
> >>>>> setting the Unsupported Request Reporting Enable bit will not
> >>>>> interfere with PC-compatible Configuration Space probing, assuming
> >>>>> that the severity for UR is left at its default of non-fatal.
> >>>>> However, setting the Unsupported Request Reporting Enable bit
> >>>>> will enable the Function to report UR errors detected with
> >>>>> posted Requests, helping avoid this case for potential silent
> >>>>> data corruption.
> >>>>
> >>>> I still don't see what the PC-compatible config space probing has to
> >>>> do with our issue.
> >>>
> >>> I'm not sure but I think it's listed here because it causes a ton of URs
> >>> when device scan probes unimplemented functions.
> >>>
> >>>>> did firmware reconfigure this device to report URs as fatal errors then?
> >>>>
> >>>> No, the Unsupported Request Error Severity flag is zero.
> >>>
> >>> OK, that's the correct configuration, so how come the box crashes when
> >>> there's a UR then?
> >>
> >> Ping - any update on this?
> >
> > Not really. All we concluded so far is that _maybe_ the bridge, upon
> > seeing the UR, generates a Master Abort, rendering the whole thing
> > fatal. Otoh the respective root port also has
> > - Received Master Abort set in its Secondary Status register (but
> >   that's also already the case in the log that we have before the UR
> >   occurs, i.e. that doesn't mean all that much),
> > - Received System Error set in its Secondary Status register (and
> >   after the UR the sibling endpoint [UR originating from 83:00.0,
> >   sibling being 83:00.1] also shows Signaled System Error set).
> >
>
> Disabling the Memory decode in the command register could also result in a
> completion timeout on the root port issuing a transaction towards the PCI
> device in question.

Can it really? Such a device would violate the PCIE spec, which says:

    If the request is not claimed, then it is handled as an
    Unsupported Request, which is the PCI Express equivalent of
    conventional PCI's Master Abort termination.

> PCIE completion timeouts can be escalated to Fatal AER errors which trigger
> system firmware to inject NMI's into the host.
>
> Unsupported requests can also be escalated to be Fatal AER errors (which
> would again trigger system firmware to inject an NMI).

Only if the system is misconfigured. We found out the system in question
is not configured to do this.

> Here is an example AER setup for a PCIE root port. You can see UnsupReq
> errors are masked and so do not trigger errors. CmpltTO (completion
> timeout) errors are not masked and the errors are treated as Fatal because
> the corresponding bit in the Uncorrectable Severity register is set.
>
> Capabilities: [148 v1] Advanced Error Reporting
>     UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>     UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol+
>     UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>     CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
>     CEMsk:  RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
>     AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
>
> A root port completion timeout will also result in the master abort bit
> being set.

How do you figure this one out? The spec I have says master abort
is the equivalent of UR.

> Typically system firmware clears the error in the AER registers after it's
> processed it. So the operating system may not be able to determine what
> error triggered the NMI in the first place.

At least for debugging, just disable firmware and handle everything
in software.

> >> Can we chalk this up to hardware bugs on a specific box?
> >
> > I have to admit that I'm still very uncertain whether to consider all
> > this correct behavior, a firmware flaw, or a hardware bug.
> I believe the correct behaviour is happening but a PCIE completion timeout
> is occurring instead of an unsupported request.
>
> Malcolm

This guess would be easy to check - just mask out the timeout bit.
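
To make the mask/severity reasoning concrete, here is a minimal sketch that decodes the UEMsk and UESvrt lines from the lspci dump quoted above and reports how a given uncorrectable error would be treated. The helper names (parse_flags, classify, mask_error) are made up for illustration and are not part of any tool. The bit positions are the ones defined for the AER Uncorrectable Error Status/Mask/Severity registers. The sketch only computes values; it does not touch hardware.

```python
# Bit positions in the AER Uncorrectable Error Status/Mask/Severity
# registers, keyed by the abbreviations that lspci prints.
UNCOR_BITS = {
    "DLP": 4, "SDES": 5, "TLP": 12, "FCP": 13, "CmpltTO": 14,
    "CmpltAbrt": 15, "UnxCmplt": 16, "RxOF": 17, "MalfTLP": 18,
    "ECRC": 19, "UnsupReq": 20, "ACSViol": 21,
}


def parse_flags(text):
    """Turn an lspci flag string such as 'DLP- CmpltTO+' into a register value."""
    value = 0
    for token in text.split():
        name, state = token[:-1], token[-1]
        if state == "+":
            value |= 1 << UNCOR_BITS[name]
    return value


def classify(name, mask, severity):
    """Hypothetical helper: how is this error reported with the given settings?"""
    bit = 1 << UNCOR_BITS[name]
    if mask & bit:
        return "masked"
    return "fatal" if severity & bit else "non-fatal"


def mask_error(name, mask):
    """Hypothetical helper: return the mask value with one more error masked."""
    return mask | (1 << UNCOR_BITS[name])


# Values taken from the root port dump quoted in this thread.
UEMSK = parse_flags(
    "DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- "
    "MalfTLP- ECRC- UnsupReq+ ACSViol+"
)
UESVRT = parse_flags(
    "DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ "
    "MalfTLP+ ECRC- UnsupReq- ACSViol-"
)

for name in ("CmpltTO", "UnsupReq"):
    print(name, classify(name, UEMSK, UESVRT))

# Mask value to try when testing the completion timeout theory.
NEW_MASK = mask_error("CmpltTO", UEMSK)
print(hex(UEMSK), "->", hex(NEW_MASK))
```

With the quoted settings this reports CmpltTO as fatal and UnsupReq as masked, which matches the description of the dump. The new mask value would go into the Uncorrectable Error Mask register at offset 0x08 of the AER capability. If the crash goes away with that value, the completion timeout theory gains weight.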
> >
> > Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel