Re: Segment truncation in multi-segment PCI handling?



On Mon, Jun 10, 2024 at 12:11:58PM +0200, Jan Beulich wrote:
> On 10.06.2024 11:46, Roger Pau Monné wrote:
> > On Mon, Jun 10, 2024 at 10:41:19AM +0200, Jan Beulich wrote:
> >> On 10.06.2024 10:28, Roger Pau Monné wrote:
> >>> On Mon, Jun 10, 2024 at 09:58:11AM +0200, Jan Beulich wrote:
> >>>> On 07.06.2024 21:52, Andrew Cooper wrote:
> >>>>> On 07/06/2024 8:46 pm, Marek Marczykowski-Górecki wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I've got a new system, and it has two PCI segments:
> >>>>>>
> >>>>>>     0000:00:00.0 Host bridge: Intel Corporation Device 7d14 (rev 04)
> >>>>>>     0000:00:02.0 VGA compatible controller: Intel Corporation Meteor 
> >>>>>> Lake-P [Intel Graphics] (rev 08)
> >>>>>>     ...
> >>>>>>     10000:e0:06.0 System peripheral: Intel Corporation RST VMD Managed 
> >>>>>> Controller
> >>>>>>     10000:e0:06.2 PCI bridge: Intel Corporation Device 7ecb (rev 10)
> >>>>>>     10000:e1:00.0 Non-Volatile memory controller: Phison Electronics 
> >>>>>> Corporation PS5021-E21 PCIe4 NVMe Controller (DRAM-less) (rev 01)
> >>>>>>
> >>>>>> But looks like Xen doesn't handle it correctly:
> >>>
> >>> In the meantime you can probably disable VMD from the firmware and the
> >>> NVMe devices should appear on the regular PCI bus.
> >>>
> >>>>>>     (XEN) 0000:e0:06.0: unknown type 0
> >>>>>>     (XEN) 0000:e0:06.2: unknown type 0
> >>>>>>     (XEN) 0000:e1:00.0: unknown type 0
> >>>>>>     ...
> >>>>>>     (XEN) ==== PCI devices ====
> >>>>>>     (XEN) ==== segment 0000 ====
> >>>>>>     (XEN) 0000:e1:00.0 - NULL - node -1 
> >>>>>>     (XEN) 0000:e0:06.2 - NULL - node -1 
> >>>>>>     (XEN) 0000:e0:06.0 - NULL - node -1 
> >>>>>>     (XEN) 0000:2b:00.0 - d0 - node -1  - MSIs < 161 >
> >>>>>>     (XEN) 0000:00:1f.6 - d0 - node -1  - MSIs < 148 >
> >>>>>>     ...
> >>>>>>
> >>>>>> This isn't exactly surprising, since pci_sbdf_t.seg is uint16_t, so
> >>>>>> 0x10000 doesn't fit. OSDev wiki says PCI Express can have 65536 PCI
> >>>>>> Segment Groups, each with up to 256 buses.
> >>>>>>
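
To illustrate the truncation, here is a minimal sketch using a
simplified SBDF layout along the lines of Xen's pci_sbdf_t (not the
exact Xen definition, and it assumes a little-endian layout):

    #include <stdint.h>
    #include <stdio.h>

    /* Simplified SBDF encoding with a 16-bit segment field, roughly
     * like Xen's pci_sbdf_t; the union layout assumes little endian. */
    typedef union {
        uint32_t sbdf;
        struct {
            uint16_t bdf;   /* bus:8 dev:5 fn:3 */
            uint16_t seg;   /* segment: only 16 bits */
        };
    } sbdf_t;

    int main(void)
    {
        unsigned int seg = 0x10000;  /* VMD's software-assigned segment */
        unsigned int bdf = 0xe030;   /* e0:06.0 */
        sbdf_t s = { .sbdf = (seg << 16) | bdf };

        /* seg << 16 wraps modulo 2^32, so the segment truncates to 0
         * and 10000:e0:06.0 comes out as 0000:e0:06.0, matching the
         * log quoted above. */
        printf("%04x:%02x:%02x.%u\n", s.seg, s.bdf >> 8,
               (s.bdf >> 3) & 0x1f, s.bdf & 7);
        return 0;
    }
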
> >>>>>> Fortunately, I don't need this to work, if I disable VMD in the
> >>>>>> firmware, I get a single segment and everything works fine.
> >>>>>>
> >>>>>
> >>>>> This is a known issue.  Work is being done, albeit slowly.
> >>>>
> >>>> Is work being done? After the design session in Prague I put it on my
> >>>> todo list, but at low priority. I'd be happy to take it off there if I
> >>>> knew someone else was looking into this.
> >>>
> >>> We had a design session about VMD?  If so, I'm afraid I missed it.
> >>
> >> In Prague last year, not just now in Lisbon.
> >>
> >>>>> 0x10000 is indeed not a spec-compliant PCI segment.  It's something
> >>>>> model specific the Linux VMD driver is doing.
> >>>>
> >>>> I wouldn't call this "model specific" - this numbering is purely a
> >>>> software one (and would need coordinating between Dom0 and Xen).
> >>>
> >>> Hm, TBH I'm not sure whether Xen needs to be aware of VMD devices.
> >>> The resources used by the VMD devices are all assigned to the VMD
> >>> root.  My current hypothesis is that it might be possible to manage
> >>> such devices without Xen being aware of their existence.
> >>
> >> Well, it may be possible to have things work in Dom0 without Xen
> >> knowing much. Then Dom0 would need to suppress any physdevop calls
> >> with such software-only segment numbers (in order to at least not
> >> confuse Xen). I'd be curious though how e.g. MSI setup would work in
> >> such a scenario.
> > 
> > IIRC from my read of the spec,
> 
> So you have found a spec somewhere? I didn't so far, and I had even asked
> Intel ...
> 
> > VMD devices don't use regular MSI
> > data/address fields, and instead configure an index into the MSI table
> > on the VMD root for the interrupt they want to use.  It's only the VMD
> > root device (which is a normal device on the PCI bus) that has
> > MSI(-X) configured with real vectors, and multiplexes interrupts for
> > all devices behind it.
> > 
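
If I read that right, the multiplexing works roughly like the sketch
below (purely illustrative C; all names are made up, not taken from
any VMD datasheet or driver):

    /* The VMD root endpoint is an ordinary PCI device and owns the
     * only real MSI-X table; devices behind it merely pick a slot. */
    #define VMD_ROOT_VECTORS 32

    static void (*root_handler[VMD_ROOT_VECTORS])(void *);
    static void *root_arg[VMD_ROOT_VECTORS];

    /* A device behind VMD is not programmed with a real MSI
     * address/data pair, only with an index into the root's table. */
    struct vmd_child_irq {
        unsigned int root_index;
    };

    static int vmd_child_request_irq(struct vmd_child_irq *c,
                                     void (*fn)(void *), void *arg)
    {
        static unsigned int next;   /* trivial round-robin allocator */

        c->root_index = next++ % VMD_ROOT_VECTORS;
        root_handler[c->root_index] = fn;
        root_arg[c->root_index] = arg;
        return 0;
    }

    /* Handler behind one of the root's real vectors: demux to
     * whichever child registered that slot. */
    static void vmd_root_isr(unsigned int vector)
    {
        if (root_handler[vector])
            root_handler[vector](root_arg[vector]);
    }
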
> > If we had to pass through VMD devices we might have to intercept writes
> > to the VMD MSI(-X) entries, but since they can only be safely assigned
> > to dom0 I think it's not an issue ATM (see below).
> > 
> >> Plus clearly any passing through of a device behind
> >> the VMD bridge will quite likely need Xen involvement (unless of
> >> course the only way of doing such pass-through was to pass on the
> >> entire hierarchy).
> > 
> > All VMD devices share the Requestor ID of the VMD root, so AFAIK it's
> > not possible to pass them through (unless you pass through the whole
> > VMD root) because they all share the same context entry on the IOMMU.
> 
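
That sharing follows directly from how the DMA remapping lookup is
keyed. A minimal sketch of a VT-d style lookup (heavily simplified,
not Xen's actual IOMMU code):

    #include <stdint.h>

    struct context_entry {
        uint64_t pgtbl_root;    /* second-level page tables etc. */
        uint16_t domain_id;
    };

    struct root_entry {
        struct context_entry *ctx_table;   /* 256 entries, by devfn */
    };

    static struct root_entry root_table[256];  /* indexed by bus */

    /* Remapping is keyed purely on the Requestor ID of a DMA
     * transaction.  Everything behind VMD is tagged with the VMD
     * root's ID, so it all resolves to one and the same context
     * entry, and hence the same translation and domain: */
    static struct context_entry *lookup(uint16_t requestor_id)
    {
        uint8_t bus = requestor_id >> 8;
        uint8_t devfn = requestor_id & 0xff;

        return &root_table[bus].ctx_table[devfn];
    }
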
> While that was my vague understanding too, it seemed too limiting to me
> to be true.

In my case it was a single NVMe disk behind this VMD thing, so passing
through the whole VMD device wouldn't be too bad. I have no idea (nor
really any interest in...) how it behaves with more disks.

From the above discussion I understand the 0x10000 segment is really a
software construct, not anything the hardware expects, so IMO dom0
shouldn't tell Xen anything about it.
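
Concretely, dom0 would need something like the filter below before
issuing PHYSDEVOP_pci_device_add and friends (a sketch only; the
helpers are hypothetical, not actual Linux code):

    #include <linux/pci.h>

    /* Firmware-described segments (_SEG/MCFG) are 16 bit; Linux's
     * VMD driver hands out synthetic domains from 0x10000 upwards,
     * which therefore cannot come from firmware. */
    static bool xen_pci_segment_is_real(int domain_nr)
    {
        return domain_nr >= 0 && domain_nr <= 0xffff;
    }

    /* Hypothetical hook on dom0's device-add path. */
    static int xen_maybe_report_device(struct pci_dev *pdev)
    {
        if (!xen_pci_segment_is_real(pci_domain_nr(pdev->bus)))
            return 0;  /* software-only segment: keep it dom0-internal */

        /* ... issue PHYSDEVOP_pci_device_add as usual ... */
        return 0;
    }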

Since I have the hardware, I can do some more tests if somebody is
interested in the results. But for now I have disabled VMD in the
firmware and everything is fine.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
