[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [XEN PATCH] tools/libs/light/libxl_pci.c: explicitly grant access to Intel IGD opregion





On 4/1/22 9:21 AM, Chuck Zmudzinski wrote:
On 3/30/22 2:45 PM, Jason Andryuk wrote:
On Fri, Mar 18, 2022 at 4:13 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
On 14.03.2022 04:41, Chuck Zmudzinski wrote:
When gfx_passthru is enabled for the Intel IGD, hvmloader maps the IGD
opregion to the guest but libxl does not grant the guest permission to
access the mapped memory region. This results in a crash of the i915.ko
kernel module in a Linux HVM guest when it needs to access the IGD
opregion:

Oct 23 11:36:33 domU kernel: Call Trace:
Oct 23 11:36:33 domU kernel:  ? idr_alloc+0x39/0x70
Oct 23 11:36:33 domU kernel: drm_get_last_vbltimestamp+0xaa/0xc0 [drm]
Oct 23 11:36:33 domU kernel: drm_reset_vblank_timestamp+0x5b/0xd0 [drm]
Oct 23 11:36:33 domU kernel:  drm_crtc_vblank_on+0x7b/0x130 [drm]
Oct 23 11:36:33 domU kernel: intel_modeset_setup_hw_state+0xbd4/0x1900 [i915]
Oct 23 11:36:33 domU kernel:  ? _cond_resched+0x16/0x40
Oct 23 11:36:33 domU kernel:  ? ww_mutex_lock+0x15/0x80
Oct 23 11:36:33 domU kernel: intel_modeset_init_nogem+0x867/0x1d30 [i915]
Oct 23 11:36:33 domU kernel:  ? gen6_write32+0x4b/0x1c0 [i915]
Oct 23 11:36:33 domU kernel:  ? intel_irq_postinstall+0xb9/0x670 [i915]
Oct 23 11:36:33 domU kernel:  i915_driver_probe+0x5c2/0xc90 [i915]
Oct 23 11:36:33 domU kernel:  ? vga_switcheroo_client_probe_defer+0x1f/0x40
Oct 23 11:36:33 domU kernel:  ? i915_pci_probe+0x3f/0x150 [i915]
Oct 23 11:36:33 domU kernel:  local_pci_probe+0x42/0x80
Oct 23 11:36:33 domU kernel:  ? _cond_resched+0x16/0x40
Oct 23 11:36:33 domU kernel:  pci_device_probe+0xfd/0x1b0
Oct 23 11:36:33 domU kernel:  really_probe+0x222/0x480
Oct 23 11:36:33 domU kernel:  driver_probe_device+0xe1/0x150
Oct 23 11:36:33 domU kernel:  device_driver_attach+0xa1/0xb0
Oct 23 11:36:33 domU kernel:  __driver_attach+0x8a/0x150
Oct 23 11:36:33 domU kernel:  ? device_driver_attach+0xb0/0xb0
Oct 23 11:36:33 domU kernel:  ? device_driver_attach+0xb0/0xb0
Oct 23 11:36:33 domU kernel:  bus_for_each_dev+0x78/0xc0
Oct 23 11:36:33 domU kernel:  bus_add_driver+0x12b/0x1e0
Oct 23 11:36:33 domU kernel:  driver_register+0x8b/0xe0
Oct 23 11:36:33 domU kernel:  ? 0xffffffffc06b8000
Oct 23 11:36:33 domU kernel:  i915_init+0x5d/0x70 [i915]
Oct 23 11:36:33 domU kernel:  do_one_initcall+0x44/0x1d0
Oct 23 11:36:33 domU kernel:  ? do_init_module+0x23/0x260
Oct 23 11:36:33 domU kernel:  ? kmem_cache_alloc_trace+0xf5/0x200
Oct 23 11:36:33 domU kernel:  do_init_module+0x5c/0x260
Oct 23 11:36:33 domU kernel: __do_sys_finit_module+0xb1/0x110
Oct 23 11:36:33 domU kernel:  do_syscall_64+0x33/0x80
Oct 23 11:36:33 domU kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
The call trace alone leaves open where exactly the crash occurred.
Looking at 5.17 I notice that the first thing the driver does
after mapping the range it to check the signature (both in
intel_opregion_setup()). As the signature can't possibly match
with no access granted to the underlying mappings, there shouldn't
be any further attempts to use the region in the driver; if there
are, I'd view this as a driver bug.
Yes.  i915_driver_hw_probe does not check the return value of
intel_opregion_setup(dev_priv) and just continues on.

Chuck, the attached patch may help if you want to test it.

Regards,
Jason

I tested the patch - it made no noticeable difference.

Correction (sorry for the confusion):

I didn't know I needed to replace more than just a
re-built i915.ko module to enable the patch
for testing. When I updated the entire Debian kernel
package including all the modules and the kernel
image with the patched kernel package, it made
quite a difference.

With Jason's patch, the three call traces just became a
much shorter error message:

Apr 05 20:46:18 debian kernel: xen: --> pirq=16 -> irq=24 (gsi=24)
Apr 05 20:46:18 debian kernel: i915 0000:00:02.0: [drm] VT-d active for gfx access Apr 05 20:46:18 debian kernel: i915 0000:00:02.0: vgaarb: deactivate vga console Apr 05 20:46:18 debian kernel: Console: switching to colour dummy device 80x25 Apr 05 20:46:18 debian kernel: i915 0000:00:02.0: [drm] DMAR active, disabling use of stolen memory Apr 05 20:46:18 debian kernel: resource sanity check: requesting [mem 0xffffffff-0x100001ffe], which spans more than Reserved [mem 0xfdfff000-0xffffffff] Apr 05 20:46:18 debian kernel: caller memremap+0xeb/0x1c0 mapping multiple BARs Apr 05 20:46:18 debian kernel: i915 0000:00:02.0: Device initialization failed (-22) Apr 05 20:46:18 debian kernel: i915 0000:00:02.0: Please file a bug on drm/i915; see https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs for details. Apr 05 20:46:18 debian kernel: i915: probe of 0000:00:02.0 failed with error -22
--------------------- End of Kernel Error Log ----------------------

So I think the patch does propagate the error up the
stack and bails out before producing the Call traces,

and...

I even had output after booting - the gdm3 Gnome display
manager login page displayed, but when I tried to login to
the Gnome desktop, the screen went dark and I could
not even login to the headless Xen Dom0 control domain
via ssh after that and I just used the reset button on the
machine to reboot it, so the patch causes some trouble
with the Dom0 when the guest cannot access the
opregion. The patch works fine when the guest can
access the opregion and in that case I was able to
login to the Gnome session, but it caused quite a bit of
trouble and apparently crashed the Dom0 or at
least caused networking in the Dom0 to stop working
when I tried to login to the Gnome session in the
guest for the case when the guest cannot access
the opregion. So I would not recommend Jason's
patch as is for the Linux kernel. The main reason
is that it looks like it is working at first with a
login screen displayed, but when a user tries to login,
the whole system crashes.

Regards,

Chuck



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.