Re: [Xen-devel] [PATCH][XSA-126] xen: limit guest control of PCI command register
On Mon, Jun 08, 2015 at 10:03:18AM +0100, Jan Beulich wrote:
> >>> On 08.06.15 at 10:09, <malcolm.crossley@xxxxxxxxxx> wrote:
> > On 08/06/15 08:42, Jan Beulich wrote:
> >> Not really. All we concluded so far is that _maybe_ the bridge, upon
> >> seeing the UR, generates a Master Abort, rendering the whole thing
> >> fatal. Otoh the respective root port also has
> >> - Received Master Abort set in its Secondary Status register (but
> >>   that's also already the case in the log that we have before the UR
> >>   occurs, i.e. that doesn't mean all that much),
> >> - Received System Error set in its Secondary Status register (and
> >>   after the UR the sibling endpoint [UR originating from 83:00.0,
> >>   sibling being 83:00.1] also shows Signaled System Error set).
> >>
> >
> > Disabling the Memory decode in the command register could also result
> > in a completion timeout on the root port issuing a transaction towards
> > the PCI device in question. PCIE completion timeouts can be escalated
> > to Fatal AER errors which trigger system firmware to inject NMIs into
> > the host.
>
> And how does all that play with PC compatibility (where writes into
> no-where get dropped, and reads from no-where get all ones
> returned)? Remember - we're talking about CPU side accesses
> here.
>
> > Here is an example AER setup for a PCIE root port. You can see
> > UnsupReq errors are masked and so do not trigger errors. CmpltTO
> > (completion timeout) errors are not masked and the errors are treated
> > as Fatal because the corresponding bit in the Uncorrectable Severity
> > register is set.
> >
> > Capabilities: [148 v1] Advanced Error Reporting
> >   UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
> >   UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol+
> >   UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
> >   CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
> >   CEMsk:  RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
> >   AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
> >
> > A root port completion timeout will also result in the master abort
> > bit being set.
> >
> > Typically system firmware clears the error in the AER registers after
> > it's processed it. So the operating system may not be able to
> > determine what error triggered the NMI in the first place.
>
> Right, but in the case at hand we have an ITP log available, which
> increases the hope that we see a reasonably complete picture.
>
> >>> Can we chalk this up to hardware bugs on a specific box?
> >>
> >> I have to admit that I'm still very uncertain whether to consider all
> >> this correct behavior, a firmware flaw, or a hardware bug.
> > I believe the correct behaviour is happening but a PCIE completion
> > timeout is occurring instead of an unsupported request.
>
> Might it be that with the supposedly correct device returning UR
> the root port reissues the request to the sibling device, which then
> fails it in a more dramatic way (albeit the sibling's Uncorrectable
> Error Status Register also has only Unsupported Request Error
> Status set)?
>
> Jan

Isn't the sibling a function on the same device?
And is the request causing the UR a memory read?
If so, doesn't this use address routing?
What does it mean that the request is "to the sibling device" then?
Does the sibling device have a BAR overlapping the address?
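
For reference, the reading of the quoted AER dump (UnsupReq masked, CmpltTO unmasked and Fatal) can be reproduced mechanically. The following is only a minimal sketch, not part of any Xen or lspci tooling: the helper names `parse_flags` and `classify` are hypothetical, and the two flag strings are copied from the UEMsk and UESvrt lines quoted above. It assumes the usual AER rule that a masked uncorrectable error is not reported, and that an unmasked one is reported as fatal or non-fatal according to its severity bit. Whether a reported fatal error then becomes an NMI depends on further platform and firmware settings that are not modelled here.

```python
def parse_flags(text):
    """Turn lspci-style 'Name+ Name-' tokens into {name: bool}."""
    return {tok[:-1]: tok.endswith("+") for tok in text.split()}


def classify(mask, severity):
    """Label each uncorrectable error as 'masked', 'fatal' or 'non-fatal'."""
    result = {}
    for name, masked in mask.items():
        if masked:
            # Masked errors are not reported, regardless of severity.
            result[name] = "masked"
        elif severity[name]:
            result[name] = "fatal"
        else:
            result[name] = "non-fatal"
    return result


# Flag strings copied from the quoted root port dump.
UEMSK = ("DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- "
         "MalfTLP- ECRC- UnsupReq+ ACSViol+")
UESVRT = ("DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ "
          "MalfTLP+ ECRC- UnsupReq- ACSViol-")

report = classify(parse_flags(UEMSK), parse_flags(UESVRT))
for name in ("UnsupReq", "CmpltTO", "ECRC"):
    print(name, report[name])
```

With the quoted settings this labels UnsupReq as masked and CmpltTO as fatal, which matches the description given with the dump.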
-- 
MST

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel