
Re: [cache coherency bug] i915 and PAT attributes


  • To: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, "intel-gfx@xxxxxxxxxxxxxxxxxxxxx" <intel-gfx@xxxxxxxxxxxxxxxxxxxxx>
  • From: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
  • Date: Fri, 16 Dec 2022 15:30:13 +0000
  • Cc: xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "Demi M. Obenour" <demi@xxxxxxxxxxxxxxxxxxxxxx>, Jani Nikula <jani.nikula@xxxxxxxxxxxxxxx>, Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>, Rodrigo Vivi <rodrigo.vivi@xxxxxxxxx>, Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx>, Matt Roper <matthew.d.roper@xxxxxxxxx>, Lucas De Marchi <lucas.demarchi@xxxxxxxxx>, José Roberto de Souza <jose.souza@xxxxxxxxx>, Daniel Vetter <daniel@xxxxxxxx>, the arch/x86 maintainers <x86@xxxxxxxxxx>
  • Delivery-date: Fri, 16 Dec 2022 15:30:31 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 08/12/2022 1:55 pm, Marek Marczykowski-Górecki wrote:
> Hi,
>
> There is an issue with i915 on Xen PV (dom0). The end result is a lot of
> glitches, like here: https://openqa.qubes-os.org/tests/54748#step/startup/8
> (this one is on ADL, Linux 6.1-rc7 as a Xen PV dom0). It's using Xorg
> with "modesetting" driver.
>
> After some iterations of debugging, we narrowed it down to how i915
> handles caching. The main difference is that PAT is set up differently
> on Xen PV than on native Linux. Normally, Linux has an appropriate
> abstraction for that, but apparently something related to i915 doesn't
> play well with it. The specific difference is:
> native linux:
> x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WP  UC- WT
> xen pv:
> x86/PAT: Configuration [0-7]: WB  WT  UC- UC  WC  WP  UC  UC
>                                   ~~          ~~      ~~  ~~
>
> The specific impact depends on kernel version and the hardware. The most
> severe issues I see on >=ADL, but some older hardware is affected too -
> sometimes only if composition is disabled in the window manager.
> Some more information is collected at
> https://github.com/QubesOS/qubes-issues/issues/4782 (and a few linked
> duplicates...).
>
> A somewhat related commit is here:
> https://github.com/torvalds/linux/commit/bdd8b6c98239cad ("drm/i915:
> replace X86_FEATURE_PAT with pat_enabled()") - it is the place where
> i915 explicitly checks for PAT support, so I'm cc-ing the people
> mentioned there too.
>
> Any ideas?
>
> The issue can be easily reproduced without Xen too, by adjusting PAT in
> Linux:
> -----8<-----
> diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
> index 66a209f7eb86..319ab60c8d8c 100644
> --- a/arch/x86/mm/pat/memtype.c
> +++ b/arch/x86/mm/pat/memtype.c
> @@ -400,8 +400,8 @@ void pat_init(void)
>                * The reserved slots are unused, but mapped to their
>                * corresponding types in the presence of PAT errata.
>                */
> -             pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
> -                   PAT(4, WB) | PAT(5, WP) | PAT(6, UC_MINUS) | PAT(7, WT);
> +             pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
> +                   PAT(4, WC) | PAT(5, WP) | PAT(6, UC)       | PAT(7, UC);
>       }
>  
>       if (!pat_bp_initialized) {
> -----8<-----
>

Hello, can anyone help please?

Intel's CI has run this reproducer and confirmed the regression:
https://lore.kernel.org/intel-gfx/Y5Hst0bCxQDTN7lK@mail-itl/T/#m4480c15a0d117dce6210562eb542875e757647fb

We're reasonably confident that it is an i915 bug (given that it
reproduces with no Xen in the mix), but we're out of further ideas.
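
For anyone comparing the two PAT layouts quoted above, here is a minimal
sketch in C that builds both values and reports which slots differ. It
assumes the architectural x86 memory-type encodings (UC=0, WC=1, WT=4,
WP=5, WB=6, UC-=7) and mirrors the shape of the PAT() helper used in the
diff. The names PAT_NATIVE, PAT_XEN_PV, pat_slot() and pat_diff_mask()
are hypothetical, chosen for illustration only; this is not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural x86 PAT memory-type encodings (assumed from the SDM). */
enum pat_type {
    PAT_UC = 0,
    PAT_WC = 1,
    PAT_WT = 4,
    PAT_WP = 5,
    PAT_WB = 6,
    PAT_UC_MINUS = 7,
};

/* One byte per PAT slot, same layout as the PAT() helper in the diff. */
#define PAT(slot, type) ((uint64_t)PAT_##type << ((slot) * 8))

/* Layout reported by native Linux in the quoted log. */
#define PAT_NATIVE \
    (PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) | \
     PAT(4, WB) | PAT(5, WP) | PAT(6, UC_MINUS) | PAT(7, WT))

/* Layout reported under Xen PV, also what the reproducer diff sets. */
#define PAT_XEN_PV \
    (PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) | \
     PAT(4, WC) | PAT(5, WP) | PAT(6, UC)       | PAT(7, UC))

/* Extract the 3-bit memory type stored in one of the eight slots. */
static inline unsigned pat_slot(uint64_t pat, unsigned slot)
{
    assert(slot < 8);
    return (unsigned)((pat >> (slot * 8)) & 0x7);
}

/* Bit N of the result is set when slot N differs between a and b. */
static inline unsigned pat_diff_mask(uint64_t a, uint64_t b)
{
    unsigned mask = 0;

    for (unsigned slot = 0; slot < 8; slot++)
        if (pat_slot(a, slot) != pat_slot(b, slot))
            mask |= 1u << slot;
    return mask;
}
```

pat_diff_mask(PAT_NATIVE, PAT_XEN_PV) has bits 1, 4, 6 and 7 set, which
matches the slots marked with ~~ in the quoted output. Any code that
assumes a fixed slot-to-type mapping rather than looking the type up
would be affected by exactly those entries.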

Thanks,

~Andrew

 

