Re: [Xen-devel] [PATCH][XSA-126] xen: limit guest control of PCI command register
>>> On 08.06.15 at 10:09, <malcolm.crossley@xxxxxxxxxx> wrote:
> On 08/06/15 08:42, Jan Beulich wrote:
>> Not really. All we concluded so far is that _maybe_ the bridge, upon
>> seeing the UR, generates a Master Abort, rendering the whole thing
>> fatal. Otoh the respective root port also has
>> - Received Master Abort set in its Secondary Status register (but
>>   that's also already the case in the log that we have before the UR
>>   occurs, i.e. that doesn't mean all that much),
>> - Received System Error set in its Secondary Status register (and
>>   after the UR the sibling endpoint [UR originating from 83:00.0,
>>   sibling being 83:00.1] also shows Signaled System Error set).
>>
>
> Disabling the Memory decode in the command register could also result
> in a completion timeout on the root port issuing a transaction towards
> the PCI device in question. PCIE completion timeouts can be escalated
> to Fatal AER errors which trigger system firmware to inject NMI's into
> the host.

And how does all that play with PC compatibility (where writes into
no-where get dropped, and reads from no-where get all ones returned)?
Remember - we're talking about CPU side accesses here.

> Here is an example AER setup for a PCIE root port. You can see UnsupReq
> errors are masked and so do not trigger errors. CmpltTO ( completion
> timeout) errors are not masked and the errors are treated as Fatal
> because the corresponding bit in the Uncorrectable Severity register
> is set.
>
> Capabilities: [148 v1] Advanced Error Reporting
>     UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>     UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol+
>     UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>     CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
>     CEMsk:  RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
>     AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
>
> A root port completion timeout will also result in the master abort
> bit being set.
>
> Typically system firmware clears the error in the AER registers after
> it's processed it. So the operating system may not be able to
> determine what error triggered the NMI in the first place.

Right, but in the case at hand we have an ITP log available, which
increases the hope that we see a reasonably complete picture.

>>> Do we can chalk this up to hardware bugs on a specific box?
>>
>> I have to admit that I'm still very uncertain whether to consider all
>> this correct behavior, a firmware flaw, or a hardware bug.
> I believe the correct behaviour is happening but a PCIE completion
> timeout is occurring instead of a unsupported request.

Might it be that with the supposedly correct device returning UR the
root port reissues the request to the sibling device, which then fails
it in a more dramatic way (albeit the sibling's Uncorrectable Error
Status Register also has only Unsupported Request Error Status set)?

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
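
Editorial sketch (not part of the original mail): the mask/severity reading of the quoted AER dump can be expressed as a small C helper. The bit positions are those of the PCIe AER Uncorrectable Error registers (the same values Linux exposes as PCI_ERR_UNC_*). The names aer_classify, EXAMPLE_UEMSK and EXAMPLE_UESVRT are hypothetical, and the two example values are simply transcribed from the "+" flags in the dump above. This only models the mask and severity registers; whether a fatal error actually ends up as an NMI also depends on other enables and on firmware, which the sketch does not attempt to cover.

```c
#include <stdint.h>

/* Bit positions in the AER Uncorrectable Error Status/Mask/Severity
 * registers (identical layout in all three registers). */
#define AER_UNC_DLP        (1u << 4)   /* lspci: DLP       */
#define AER_UNC_SURPDN     (1u << 5)   /* lspci: SDES      */
#define AER_UNC_POISON_TLP (1u << 12)  /* lspci: TLP       */
#define AER_UNC_FCP        (1u << 13)  /* lspci: FCP       */
#define AER_UNC_COMP_TIME  (1u << 14)  /* lspci: CmpltTO   */
#define AER_UNC_COMP_ABORT (1u << 15)  /* lspci: CmpltAbrt */
#define AER_UNC_UNX_COMP   (1u << 16)  /* lspci: UnxCmplt  */
#define AER_UNC_RX_OVER    (1u << 17)  /* lspci: RxOF      */
#define AER_UNC_MALF_TLP   (1u << 18)  /* lspci: MalfTLP   */
#define AER_UNC_ECRC       (1u << 19)  /* lspci: ECRC      */
#define AER_UNC_UNSUP      (1u << 20)  /* lspci: UnsupReq  */
#define AER_UNC_ACSV       (1u << 21)  /* lspci: ACSViol   */

enum aer_disposition { AER_MASKED, AER_NONFATAL, AER_FATAL };

/* Hypothetical helper: decide how one uncorrectable error would be
 * treated, looking only at the Mask and Severity registers.
 * A masked error is not reported; an unmasked one is reported as
 * fatal or non-fatal according to its Severity bit. */
static enum aer_disposition aer_classify(uint32_t uemsk, uint32_t uesvrt,
                                         uint32_t err_bit)
{
    if (uemsk & err_bit)
        return AER_MASKED;
    return (uesvrt & err_bit) ? AER_FATAL : AER_NONFATAL;
}

/* Values transcribed from the "+" flags in the quoted lspci dump
 * (assumption: the dump is complete and decoded as listed above). */
#define EXAMPLE_UEMSK  (AER_UNC_COMP_ABORT | AER_UNC_UNX_COMP | \
                        AER_UNC_UNSUP | AER_UNC_ACSV)
#define EXAMPLE_UESVRT (AER_UNC_DLP | AER_UNC_SURPDN | AER_UNC_POISON_TLP | \
                        AER_UNC_FCP | AER_UNC_COMP_TIME | AER_UNC_RX_OVER | \
                        AER_UNC_MALF_TLP)
```

With these values, aer_classify(EXAMPLE_UEMSK, EXAMPLE_UESVRT, AER_UNC_UNSUP) yields AER_MASKED, while the same call with AER_UNC_COMP_TIME yields AER_FATAL, matching the description in the quoted text.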