
Re: [Xen-devel] [Qemu-devel] [PATCH 2/2] xen:i386:pc_piix: create isa bridge specific to IGD passthrough

On 2014/10/7 15:26, Michael S. Tsirkin wrote:
On Tue, Sep 30, 2014 at 10:43:09AM +0800, Chen, Tiejun wrote:
On 2014/9/29 18:01, Michael S. Tsirkin wrote:
On Sun, Sep 28, 2014 at 10:59:05AM +0800, Chen, Tiejun wrote:
On 2014/9/3 9:40, Kay, Allen M wrote:

-----Original Message-----
From: Chen, Tiejun
Sent: Monday, September 01, 2014 12:50 AM
To: Michael S. Tsirkin
Cc: xen-devel@xxxxxxxxxxxxxxxxxxx; Kay, Allen M; qemu-devel@xxxxxxxxxx;
Konrad Rzeszutek Wilk
Subject: Re: [Qemu-devel] [Xen-devel] [PATCH 2/2] xen:i386:pc_piix: create
isa bridge specific to IGD passthrough

On 2014/9/1 14:05, Michael S. Tsirkin wrote:
On Mon, Sep 01, 2014 at 10:50:37AM +0800, Chen, Tiejun wrote:
On 2014/8/31 16:58, Michael S. Tsirkin wrote:
On Fri, Aug 29, 2014 at 09:28:50AM +0800, Chen, Tiejun wrote:

On 2014/8/28 8:56, Chen, Tiejun wrote:
+     */
+    dev = pci_create_simple(bus, PCI_DEVFN(0x1f, 0),
+                            "xen-igd-passthrough-isa-bridge");
+    if (dev) {
+        r = xen_host_pci_device_get(&hdev, 0, 0, PCI_DEVFN(0x1f, 0), 0);
+        if (!r) {
+            pci_config_set_vendor_id(dev->config, hdev.vendor_id);
+            pci_config_set_device_id(dev->config, hdev.device_id);

Can you, instead, implement the reverse logic, probing the card
and supplying the correct device id for PCH?

What is your so-called reverse logic, as I already asked you
previously? Do you mean I should list all PCHs together with their
vendor/device ids in advance? Then look up
whether we can find a




Could you explain this exactly? Then I can try to follow up on your
idea ASAP if it is necessary and possible.


Could you give us some explanation of your "reverse logic" when
you have a moment?


So future drivers will look at device ID for the card and figure out
how things should work from there.
Old drivers still poke at device id of the chipset for this, but
maybe qemu can do what newer drivers do:
look at the card and figure out what guest should do, then present
the appropriate chipset id.

This is based on what Jesse said:
        Practically speaking, we could probably assume specific GPU/PCH
        combos, as I don't think they're generally mixed across generations,
        though SNB and IVB did have compatible sockets, so there is the
        possibility of mixing CPT and PPT PCHs, but those are register
        identical as far as the graphics driver is concerned, so even that
        should be safe.


Thanks for your explanation.

so the idea is to have a reverse table mapping GPU back to PCH.
Present to guest the ID that will let it assume the correct GPU.

I guess you mean we should create and maintain such a table:

[GPU Card: device_id(s), PCH: device_id]

Then, instead of exposing the real PCH device id directly, qemu can
first get the real GPU device id, look it up in this table, and
present an appropriate PCH device id to the guest.

And it looks like that appropriate PCH device id is not always the
real PCH's device id. Right? If I'm wrong please correct me.

Exactly: we don't really care what the PCH ID is - we only need it for
the guest driver to do the right thing.


I need to ask Allen to check whether a given GPU card device id always
corresponds to one given PCH on both HSW and BDW currently. If yes, I can
do this quickly. Otherwise I need some time to establish that sort
of connection.


Sorry for the delayed response, but please keep in mind we really are working on this.

You know this involves different GPU components, so we had to take some time to
communicate and even discuss internally.

Now I have some draft code; could you take a look at it? I hope the comment
can help us understand things, but if you have any questions, we can
further respond inline.

typedef struct {
     uint16_t gpu_device_id;
     uint16_t pch_device_id;
} XenIGDDeviceIDInfo;

/* In the real world different GPUs should have different PCHs. But in
  * practice the different PCH DIDs likely map to different PCH SKUs. We do
  * the same thing for the GPU. For the PCH, the different SKUs are going to
  * be all the same silicon design and implementation, just with different
  * features turned on and off with fuses. The SW interfaces should be
  * consistent across all SKUs in a given family (e.g. LPT), though the same
  * features may not be supported.
  * Most of these different PCH features probably don't matter to the
  * Gfx driver, but obviously any difference in display port connections
  * will, so it should be fine with any PCH in case of passthrough.
  * So currently use one PCH version, 0x8c4e, to cover all HSW scenarios,
  * 0x9cc3 for BDW.

Please write Haswell and Broadwell in full in the comment.

Are you saying I should list all PCH device ids for both HSW and BDW here
in this format?


I am merely saying don't say "HSW, BDW" in comments. Say "Haswell,

Sorry I'm misunderstanding what you mean.

Broadwell, or if you like HSW (Haswell), BDW (Broadwell).

The latter looks reasonable.

+ * So currently use one PCH version, 0x8c4e, to cover all HSW(Haswell)
+ * scenarios, 0x9cc3 for BDW(Broadwell).
+ */

static const XenIGDDeviceIDInfo xen_igd_combo_id_infos[] = {
     /* HSW Classic */
     {0x0402, 0x8c4e}, /* HSWGT1D, HSWD_w7 */
     {0x0406, 0x8c4e}, /* HSWGT1M, HSWM_w7 */
     {0x0412, 0x8c4e}, /* HSWGT2D, HSWD_w7 */
     {0x0416, 0x8c4e}, /* HSWGT2M, HSWM_w7 */
     {0x041E, 0x8c4e}, /* HSWGT15D, HSWD_w7 */
     /* HSW ULT */
     {0x0A06, 0x8c4e}, /* HSWGT1UT, HSWM_w7 */
     {0x0A16, 0x8c4e}, /* HSWGT2UT, HSWM_w7 */
     {0x0A26, 0x8c4e}, /* HSWGT3UT, HSWM_w7 */
     {0x0A2E, 0x8c4e}, /* HSWGT3UT28W, HSWM_w7 */
     {0x0A1E, 0x8c4e}, /* HSWGT2UX, HSWM_w7 */
     {0x0A0E, 0x8c4e}, /* HSWGT1ULX, HSWM_w7 */
     /* HSW CRW */
     {0x0D26, 0x8c4e}, /* HSWGT3CW, HSWM_w7 */
     {0x0D22, 0x8c4e}, /* HSWGT3CWDT, HSWD_w7 */
     /* HSW Server */
     {0x041A, 0x8c4e}, /* HSWSVGT2, HSWD_w7 */
     /* HSW SRVR */
     {0x040A, 0x8c4e}, /* HSWSVGT1, HSWD_w7 */
     /* BDW */
     {0x1606, 0x9cc3}, /* BDWULTGT1, BDWM_w7 */
     {0x1616, 0x9cc3}, /* BDWULTGT2, BDWM_w7 */
     {0x1626, 0x9cc3}, /* BDWULTGT3, BDWM_w7 */
     {0x160E, 0x9cc3}, /* BDWULXGT1, BDWM_w7 */
     {0x161E, 0x9cc3}, /* BDWULXGT2, BDWM_w7 */
     {0x1602, 0x9cc3}, /* BDWHALOGT1, BDWM_w7 */
     {0x1612, 0x9cc3}, /* BDWHALOGT2, BDWM_w7 */
     {0x1622, 0x9cc3}, /* BDWHALOGT3, BDWM_w7 */
     {0x162B, 0x9cc3}, /* BDWHALO28W, BDWM_w7 */
     {0x162A, 0x9cc3}, /* BDWGT3WRKS, BDWM_w7 */
     {0x162D, 0x9cc3}, /* BDWGT3SRVR, BDWM_w7 */
};

static void xen_igd_passthrough_pciisabridge_get_pci_device_id(Object *obj,
                                                                Visitor *v,
                                                                void *opaque,
                                                                const char *name,
                                                                Error **errp)
{
     uint32_t value = 0;
     XenHostPCIDevice hdev;
     int r = 0, num;

     r = xen_host_pci_device_get(&hdev, 0, 0, 0x02, 0);
     if (!r) {
         value = hdev.device_id;

         num = ARRAY_SIZE(xen_igd_combo_id_infos);
         for (r = 0; r < num; r++)
             if (value == xen_igd_combo_id_infos[r].gpu_device_id)
                 value = xen_igd_combo_id_infos[r].pch_device_id;

that's messy, I would use different variables for ID
of GPU and PCH.

         for (r = 0; r < num; r++)
             if (gpu_id == xen_igd_combo_id_infos[r].gpu_device_id)
                 pch_id = xen_igd_combo_id_infos[r].pch_device_id;



     visit_type_uint32(v, &value, name, errp);

what I was suggesting is to start with the GPU device ID
(you can get it e.g. from the config array).

I think the current code is doing this:

     r = xen_host_pci_device_get(&hdev, 0, 0, 0x02, 0);
     if (!r) {
         gpu_id = hdev.device_id;

Here xen_host_pci_device_get() is a wrapper for reading PCI sysfs.

Do you want the device ID of the GPU?


So please don't poke at it directly in sysfs.
Instead, find the GPU's PCIDevice and
just read the device ID from gpu->config.

Good, but how can we get the gpu as a pci_dev instance here? Then we can call pci_dev->config_read as expected.

Use that to look up the PCH ID in xen_igd_combo_id_infos.
If there, override the PCH ID.

If not there, this is a new device so its driver will
not look at PCH at all, we can make it whatever or
skip it completely.

I think we should return an initialized value, 0xffff, since this often
represents an invalid PCI device.

I'm not sure you really want an invalid PCI device,
would be nicer not to have a PCH device there at all.

Yes, I mean 0xffff indicates that there's no PCI device at all.

This seems to almost do this, however
- Why are you looking at host PCH device ID at all?
- Why don't you look at the GPU device ID?

We still fix this bridge at 1f.0, and our current implementation covers
our requirement and is safe. I mean this bridge should not be used for other
use cases, so even if it's still accessed we needn't take care of those, right?

I was asking about the host bridge device, not guest.

Are you assuming what will happen with a new device?


static void xen_igd_passthrough_isa_bridge_initfn(Object *obj)
{
     object_property_add(obj, "device-id", "int",
                         xen_igd_passthrough_pciisabridge_get_pci_device_id,
                         NULL, NULL, NULL, NULL);
}

OK and what reads this property?

In a subsequent patch I will do something like this:

@@ -464,6 +464,32 @@ static void pc_xen_hvm_init(MachineState *machine)

+static void xen_igd_passthrough_isa_bridge_create(PCIBus *bus)
+{
+    struct PCIDevice *dev;
+    Error *local_err = NULL;
+    uint16_t device_id = 0xffff;
+
+    /* Currently IGD drivers always need to access PCH by 1f.0. */
+    dev = pci_create_simple(bus, PCI_DEVFN(0x1f, 0),
+                            "xen-igd-passthrough-isa-bridge");
+    /* Identify the PCH with its own real vendor/device ids.
+     * Here the vendor id is always PCI_VENDOR_ID_INTEL.
+     */
+    if (dev) {
+        device_id = (uint16_t)object_property_get_int(OBJECT(dev),
+                                                      "device-id",
+                                                      &local_err);

cast is not needed here.

+        if ((!local_err) && (device_id != 0xffff)) {

too many ().

+        device_id = object_property_get_int(OBJECT(dev), "device-id",
+                                            &local_err);
+        if (!local_err && device_id != 0xffff) {

+            pci_config_set_device_id(dev->config, device_id);
+            return;
+        }
+    }
+    fprintf(stderr, "xen set xen-igd-passthrough-isa-bridge failed\n");
+}
+
  static void xen_igd_passthrough_pc_hvm_init(MachineState *machine)
  {
      PCIBus *bus;
@@ -473,6 +499,7 @@ static void xen_igd_passthrough_pc_hvm_init(MachineState
      bus = pci_find_primary_bus();
      if (bus != NULL) {
          pci_create_simple(bus, -1, "xen-platform");
+        xen_igd_passthrough_isa_bridge_create(bus);


By the way, I wonder: would it make sense to add this code in xen_pt_initfn?

You would then do the following there:

        - look up GPU by device/vendor ID
        - if a legacy GPU is found, get the ID, translate
          to host bridge ID, and create the compat bridge.

saves the hassle of using a distinct machine type for passthrough, no?

As I remember, Paolo thought we'd better have a separate machine type specific to IGD passthrough, so I'm not sure if this is fine with him now.



If I understand this correctly, the only difference is that instead of reading the PCH 
DevID/RevID from the host hardware, QEMU inserts those values into the PCH virtual 
device by looking them up in the reverse mapping table it maintains.

I agree the downside of doing this is that the reverse mapping table may be hard to 
maintain.
What is the advantage of doing this instead of having QEMU read it from the 
host?  Is it to test that the reverse mapping method works before it is 
adopted in the new drivers?


the problem with these tables is they are hard to keep up to date

Yeah. But I think currently we can just start from some modern CPUs
such as HSW and BDW; then things could be easier.


Any thoughts on this suggestion?

as new hardware comes out, but as future hardware won't need these
hacks, we shall be fine.




matched PCH? If yes, what benefit do you expect in the
passthrough case? Shouldn't we pass this info to the VM directly in the
passthrough case?


