[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support
On Wed, Jul 02, 2014 at 08:59:38AM +0800, Chen, Tiejun wrote: > On 2014/7/1 20:33, Michael S. Tsirkin wrote: > >On Tue, Jul 01, 2014 at 05:46:58PM +0800, Chen, Tiejun wrote: > >>On 2014/7/1 17:12, Michael S. Tsirkin wrote: > >>>On Tue, Jul 01, 2014 at 10:40:42AM +0800, Chen, Tiejun wrote: > >>>>On 2014/6/30 19:28, Michael S. Tsirkin wrote: > >>>>>On Mon, Jun 30, 2014 at 06:20:22PM +0800, Chen, Tiejun wrote: > >>>>>>On 2014/6/30 17:55, Michael S. Tsirkin wrote: > >>>>>>>On Mon, Jun 30, 2014 at 05:38:21PM +0800, Chen, Tiejun wrote: > >>>>>>>>On 2014/6/30 17:05, Michael S. Tsirkin wrote: > >>>>>>>>>On Mon, Jun 30, 2014 at 03:24:58PM +0800, Chen, Tiejun wrote: > >>>>>>>>>>On 2014/6/30 14:48, Michael S. Tsirkin wrote: > >>>>>>>>>>>On Mon, Jun 30, 2014 at 10:51:49AM +0800, Chen, Tiejun wrote: > >>>>>>>>>>>>On 2014/6/26 18:03, Paolo Bonzini wrote: > >>>>>>>>>>>>>Il 26/06/2014 11:18, Chen, Tiejun ha scritto: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>- offsets 0x0000..0x0fff map to configuration space of the > >>>>>>>>>>>>>>>host MCH > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>Are you saying the config space in the video device? > >>>>>>>>>>>>> > >>>>>>>>>>>>>No, I am saying in a new BAR, or at some magic offset of an > >>>>>>>>>>>>>existing > >>>>>>>>>>>>>MMIO BAR. > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>>As I mentioned previously, the IGD guy told me we have no any > >>>>>>>>>>>>unused a > >>>>>>>>>>>>offset or BAR in the config space. > >>>>>>>>>>>> > >>>>>>>>>>>>And guy who are responsible for the native driver seems not be > >>>>>>>>>>>>accept to > >>>>>>>>>>>>extend some magic offset of an existing MMIO BAR. > >>>>>>>>>>>> > >>>>>>>>>>>>In addition I think in a short time its not possible to migrate > >>>>>>>>>>>>i440fx to > >>>>>>>>>>>>q35 as a PCIe machine of xen. > >>>>>>>>>>> > >>>>>>>>>>>That seems like a weak motivation. I don't see a need to get > >>>>>>>>>>>something > >>>>>>>>>>>merged upstream in a short time: this seems sure to miss 2.1, > >>>>>>>>>>>so you have the time to make it architecturally sound. > >>>>>>>>>>>"Making existing guests work" would be a better motivation. > >>>>>>>>>> > >>>>>>>>>>Yes. > >>>>>>>>> > >>>>>>>>>So focus on this then. Existing guests will probably work > >>>>>>>>>fine on a newer chipset - likely better than on i440fx. > >>>>>>>>>xen management tools need to do some work to support this? > >>>>>>>>>That will just give everyone more long term benefits. > >>>>>>>>> > >>>>>>>>>If instead we create a hack that does not resemble > >>>>>>>>>any existing hardware even remotely, what's the > >>>>>>>>>chance that it will not break with any future > >>>>>>>>>guest modification? > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>>>Isn't this possible with an mch chipset? > >>>>>>>>>> > >>>>>>>>>>If you're saying q35, I mean AFAIK we have no any plan to migrate > >>>>>>>>>>to this > >>>>>>>>>>MCH in xen case. > >>>>>>>>> > >>>>>>>>>q35 or a newer chipset that's closer to what guests expect. > >>>>>>>>> > >>>>>>>>>>Additionally, I think I should follow this feature after > >>>>>>>>>>q35 can work for xen scenario. > >>>>>>>>> > >>>>>>>>>What's stopping you? > >>>>>>>> > >>>>>>>>I mean if you want create an new machine based on q35, actually this > >>>>>>>>is > >>>>>>>>equal to start making xen to migrate to q35 now. Right? I can't image > >>>>>>>>how > >>>>>>>>much effort should be done. > >>>>>>> > >>>>>>>I don't see why you don't try. Sounds like a more robust solution to > >>>>>>>me. > >>>>>> > >>>>>>As I think this is another requirement to us. I'm not sure if I have > >>>>>>enough > >>>>>>time to touch this currently. > >>>>>> > >>>>>>> > >>>>>>>>So this is a reason why I'm saying I'd like to follow this feature > >>>>>>>>after q35 > >>>>>>>>can work with xen completely. > >>>>>>> > >>>>>>>Then we'll end up with more configurations to support, and to what end? > >>>>>>> > >>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>>So could we do this step by step: > >>>>>>>>>>>> > >>>>>>>>>>>>#1 phase: We just cover current qemu-xen implementation based on > >>>>>>>>>>>>i44fx, so > >>>>>>>>>>>>still provide that pseudo ISA bridge at 00:1f.0 as we already did. > >>>>>>>>>>> > >>>>>>>>>>>By the way there is no reason to put it at 00:1f.0 specifically I > >>>>>>>>>>>think. > >>>>>>>>>>>So it seems simple: create a dummy device that gets device and > >>>>>>>>>>>vendor id as properties. If you really like, add an option to get > >>>>>>>>>>>values > >>>>>>>>>> > >>>>>>>>>>Yes, this is just what we did in [Xen-devel] [v5][PATCH 2/5] xen, > >>>>>>>>>>gfx > >>>>>>>>>>passthrough: create pseudo intel isa bridge. There, we fake this > >>>>>>>>>>device just > >>>>>>>>>>at 00:1f.0. > >>>>>>>>>>But you guys don't like this, and shouldn't this be just this point > >>>>>>>>>>we > >>>>>>>>>>discussing now? > >>>>>>>>>> > >>>>>>>>>>If you guys agree that , everything is fine. > >>>>>>>>> > >>>>>>>>>Actually, this isn't what you did. > >>>>>>>>>Don't tie it to xen, and don't tie it to 1f. > >>>>>>>>>Just make it a simple stub pci device. > >>>>>>>>>Whoever wants it, creates it. > >>>>>>>>> > >>>>>>>>>The thing I worry about, is the chance this will break going forward. > >>>>>>>>>So you created a system with 2 isa bridges. > >>>>>>>> > >>>>>>>>I don't set class type to claim this is an ISA bridge, just a normal > >>>>>>>>PCI > >>>>>>>>device. > >>>>>>>>+static int create_pseudo_pch_isa_bridge(PCIBus *bus, XenHostPCIDevice > >>>>>>>>*hdev) > >>>>>>>>+{ > >>>>>>>>+ struct PCIDevice *dev; > >>>>>>>>+ > >>>>>>>>+ char rid; > >>>>>>>>+ > >>>>>>>>+ /* We havt to use a simple PCI device to fake this ISA bridge > >>>>>>>>+ * to avoid making some confusion to BIOS and ACPI. > >>>>>>>>+ */ > >>>>>>>>+ dev = pci_create(bus, PCI_DEVFN(0x1f, 0), > >>>>>>>>"pseudo-intel-pch-isa-bridge"); > >>>>>>>>+ > >>>>>>>>+ qdev_init_nofail(&dev->qdev); > >>>>>>>>+ > >>>>>>>>+ pci_config_set_vendor_id(dev->config, hdev->vendor_id); > >>>>>>>>+ pci_config_set_device_id(dev->config, hdev->device_id); > >>>>>>>>+ > >>>>>>>>+ xen_host_pci_get_block(hdev, PCI_REVISION_ID, (uint8_t *)&rid, > >>>>>>>>1); > >>>>>>>>+ > >>>>>>>>+ pci_config_set_revision(dev->config, rid); > >>>>>>>>+ > >>>>>>>>+ XEN_PT_LOG(dev, "The pseudo Intel PCH ISA bridge created.\n"); > >>>>>>>>+ return 0; > >>>>>>>>+} > >>>>>>> > >>>>>>>Then I don't see how it's supposed to work. > >>>>>>>Doesn't i915 look for an isa bridge? > >>>>>>> > >>>>>>> /* > >>>>>>> * The reason to probe ISA bridge instead of Dev31:Fun0 is to > >>>>>>> * make graphics device passthrough work easy for VMM, that > >>>>>>> only > >>>>>>> * need to expose ISA bridge to let driver know the real > >>>>>>> hardware > >>>>>>> * underneath. This is a requirement from virtualization team. > >>>>>>> * > >>>>>>> * In some virtualized environments (e.g. XEN), there is > >>>>>>> irrelevant > >>>>>>> * ISA bridge in the system. To work reliably, we should scan > >>>>>>> trhough > >>>>>>> * all the ISA bridge devices and check for the first match, > >>>>>>> instead > >>>>>>> * of only checking the first one. > >>>>>>> */ > >>>>>>> while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA << 8, pch))) > >>>>>>> { > >>>>>>> if (pch->vendor == PCI_VENDOR_ID_INTEL) { > >>>>>>> unsigned short id = pch->device & > >>>>>>> INTEL_PCH_DEVICE_ID_MASK; > >>>>>>> dev_priv->pch_id = id; > >>>>>>> > >>>>>>> if (id == INTEL_PCH_IBX_DEVICE_ID_TYPE) { > >>>>>>> dev_priv->pch_type = PCH_IBX; > >>>>>>> > >>>>>>>etc > >>>>>>> > >>>>>> > >>>>>>I already post this to mainline to change as follows: > >>>>>> > >>>>>>- while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA << 8, pch))) { > >>>>>>+ pch = pci_get_bus_and_slot(0, PCI_DEVFN(0x1f, 0)); > >>>>>>+ if (pch) { > >>>>> > >>>>>I see - responded on that mail - but I thought the point is to make > >>>>>existing guests run? In fact you said so explicitly. > >>>>> > >>>>> > >>>>>>Please refer to this, > >>>>>> > >>>>>>[RFC][PATCH] gpu:drm:i915:intel_detect_pch: back to check devfn instead > >>>>>>of > >>>>>>check class type > >>>>>> > >>>>>>Linux Native guys would like to accept this. And actually Windows > >>>>>>always use > >>>>>>devfn to detect this. > >>>>> > >>>>>In fact I see this: > >>>>> > >>>>> linux 2.6.35-3.9 probes the 1st IA bridge > >>>>> > >>>>> no idea how would you fix this. > >>>>> try changing default class for the main bridge? > >>>>> > >>>>> linux 3.10 probes all ISA bridges > >>>>> > >>>>> requires your stub to be the isa bridge? > >>>>> > >>>>> > >>>>>I don't see why are driver guys so willing to do crazy things. They > >>>>>want to match specific device/vendor id pairs, why don't they do just > >>>>>that? Why poke at class, random slot numbers etc etc? > >>>> > >>>>AFAIK what they did is from our previous incorrect requirement as I > >>>>understand. So we need to correct this. > >>>> > >>>>Thanks > >>>>Tiejun > >>> > >>>Since we can't fix existing guests, I would say > >> > >>This shouldn't be a fix existing guest, and this is why I can send this > >>before you guys accept GFX passthrough for qemu ML. > >> > >>I think you can re-read that patch head description. 1f.0 can work under all > >>scenarios including qemu-xen-traditonal. > >> > >>And this is also expected by native Linux organically, and especially > >>Windows always use devfn to detect PCH, this is not like current Linux. So > >>here I just sync this to make sense. > >> > >>Unless you'd like to make Linux specific to this point. > >> > >>Tiejun > > > >Why don't both windows and linux drivers look device up > >by device and vendor id? > > Its easy to understand because the same video device can work under > different PCH, but we need to do different initialization based on different > PCH. > > Thanks > Tiejun Right so why don't they look up the PCH by device and vendor id? For example, on linux use pci_get_device to iterate over all intel devices. I'm sure windows has a similar API as well. This isn't a suggestion to make the change in guests right now! Just a question. > >Same applies to MCH really. > > > > > >>>get things working first, find a clean way for > >>>new driver to work next. > >>> > >>>>> > >>>>> > >>>>> > >>>>>>> > >>>>>>> > >>>>>>>>>This is already not something that exists on real hardware. > >>>>>>>>>So it might break some guests that will get confused. > >>>>>>>>>Maybe we are lucky and most guests see an unfamiliar device > >>>>>>>>>and ignore it. It seems believable. > >>>>>>>>> > >>>>>>>>>But your MCH hack overrides registers in the pci host. > >>>>>>>> > >>>>>>>>We just try to write *one* register we already confirm this is safe > >>>>>>>>enough. > >>>>>>> > >>>>>>>This should go in code in form of comments: > >>>>>>>document what this register does on 440fx > >>>>>>>and why it's safe to override. > >>>>>>>We don't see what you > >>>>>>>confirmed off-line. > >>>>>> > >>>>>>That offset is one specific to IGD usage, not for i440fx common. This > >>>>>>is why > >>>>>>we need to expose something in the host bridge. They're just introduced > >>>>>>to > >>>>>>support IGD. > >>>>>> > >>>>>>> > >>>>>>>>Other register are read-only. > >>>>>>> > >>>>>>>Doesn't matter, need to document these as well. > >>>>>> > >>>>>>I think everything are covered in igd_pci_write()/igd_pci_write(). > >>>>>> > >>>>>>> > >>>>>>>>>Are we lucky and there's nothing in these registers > >>>>>>>>>of interest to guests? This seems much more fragile. > >>>>>>>>>So please poke at the spec, and compile the list > >>>>>>>>>of registers you want to touch, figure out why they are > >>>>>>>>>safe to override, and put this all in code comments. > >>>>>>>>> > >>>>>>>>>And the same thing that applies to the isa bridge > >>>>>>>> > >>>>>>>>We just expose its own vendor/device ids here. We don't access to > >>>>>>>>change > >>>>>>>>anything in the isa bridge. > >>>>>>>> > >>>>>>>>>applies here too. It should work without QEMU touching > >>>>>>>>>hosts' hardware. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>>>from sysfs: device and vendor id are world readable, so just get > >>>>>>>>>>>them > >>>>>>>>>>>directly and not through xen wrappers, this way you can open the > >>>>>>>>>>>files > >>>>>>>>>>>RO and not RW. > >>>>>>>>>>>You seem to poke at revision as well, I don't see > >>>>>>>>>>>driver looking at that - strictly necessary? > >>>>>>>>>>>If yes please patch host kernel to expose that info in sysfs, > >>>>>>>>>>>though we can fall back on pci config if not there. > >>>>>>>>>>> > >>>>>>>>>>>MCH (bridge_dev) hacks in i915 are nastier. > >>>>>>>>>>>To clean them up, we really have to have an explicit driver for > >>>>>>>>>>>this > >>>>>>>>>>>bridge, not a pass-through device. Long term, the right thing to > >>>>>>>>>>>do is > >>>>>>>>>>>likely to extend host driver and expose the necessary information > >>>>>>>>>>>in > >>>>>>>>>>>sysfs on host kernel. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>I'm a bit confused. Any sysfs should be filled by the associated > >>>>>>>>>>PCIe > >>>>>>>>>>device, shouldn't it? So qemu still need to emulate this PCIe device > >>>>>>>>>>firstly, then set properties into sysfs. > >>>>>>>>> > >>>>>>>>>I am talking about getting host properties into qemu. > >>>>>>>>>You don't want to give qemu R/W root access to host sysfs system > >>>>>>>>>of the root bridge, that's not secure. > >>>>>>>> > >>>>>>>>igd_pci_read() > >>>>>>>> | > >>>>>>>> + xen_host_pci_get_block() > >>>>>>>> | > >>>>>>>> + xen_host_pci_config_read(() > >>>>>>>> | > >>>>>>>> + pread() > >>>>>>>> > >>>>>>>>So shouldn't we already expose these information via sysfs? > >>>>>>> > >>>>>>>That's poking at sysfs of a real device, and after opening it RW. > >>>>>> > >>>>>>I don't think we can really write anything to those read-only sysfs > >>>>>>interface even we open them as RW. > >>>>>> > >>>>>>Tiejun > >>>>>> > >>>>>>> > >>>>>>>>>Avoiding read only access to filesystem is a good idea too, so it > >>>>>>>>>should be possible to pass all parameters in as > >>>>>>>>>device properties, and let whoever starts qemu > >>>>>>>>>figure out what are reasonable values. > >>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>>#2 phase: Now, we will choose a capability ID that won't be > >>>>>>>>>>>>conflicting with > >>>>>>>>>>>>others. To do this properly, we need to get one from PCI SIG > >>>>>>>>>>>>group. To have > >>>>>>>>>>>>this workable and consistently validated, this method shouldn't > >>>>>>>>>>>>be virt > >>>>>>>>>>>>specific. Then native driver should use the same method. > >>>>>>>>>>> > >>>>>>>>>>>You mean you will be able to talk sense into hardware guys? > >>>>>>>>>>>I doubt that. If they could be convinced to make e.g. i915 base a > >>>>>>>>>> > >>>>>>>>>>We're negotiating this, so this is just our long term solution we > >>>>>>>>>>figure out > >>>>>>>>>>currently. > >>>>>>>>>> > >>>>>>>>>>>proper BAR, why didn't they? > >>>>>>>>>> > >>>>>>>>>>We already have no any free BAR as we mentioned previously. > >>>>>>>>> > >>>>>>>>>I thought you were talking about modifying hardware? > >>>>>>>> > >>>>>>>>Yes. > >>>>>>>> > >>>>>>>>Tiejun > >>>>>>> > >>>>>>>And they can't figure out how to stick extra info in an existing BAR? > >>>>>>>Oh well, that's hardware for you. > >>>>>>> > >>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>>So when xen work on > >>>>>>>>>>>>q35 PCIe machine, we can walk this way. > >>>>>>>>>>> > >>>>>>>>>>>If you are emulating MCH anyway, pick one that is close > >>>>>>>>>>>to what i915 driver expects. It would then work with existing > >>>>>>>>>> > >>>>>>>>>>Looks you guys prefer we create a new MCH anyway, right? But is it > >>>>>>>>>>necessary > >>>>>>>>>>to create a new based on i440fx just for a little change? > >>>>>>>>>> > >>>>>>>>>>Thanks > >>>>>>>>>>Tiejun > >>>>>>>>> > >>>>>>>>>You can inherit it. Maybe you are lucky and this happens to > >>>>>>>>>work without conflicting with whatever other guests want to do. > >>>>>>>>>But if you ask me, you are really just piling up hacks. > >>>>>>>>>If some guest does not work on i440, you should just work on > >>>>>>>>>emulating whatever it does work on. > >>>>>>>>>That would have real value. > >>>>>>>>> > >>>>>>>>>>>devices, without new capability IDs. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>>Anthony, > >>>>>>>>>>>> > >>>>>>>>>>>>Any comments to address this in xen case? > >>>>>>>>>>>> > >>>>>>>>>>>>Thanks > >>>>>>>>>>>>Tiejun > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>> > >>>>>>>_______________________________________________ > >>>>>>>Xen-devel mailing list > >>>>>>>Xen-devel@xxxxxxxxxxxxx > >>>>>>>http://lists.xen.org/xen-devel > >>>>>>> > >>>>>>> > >>>>> > >>> > >>> > > _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |