
Re: Xen panic due to xstate mismatch


  • To: Guillaume <thouveng@xxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Sun, 2 Feb 2025 16:10:23 +0000
  • Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Sun, 02 Feb 2025 16:11:25 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Can you also get `xl dmesg`, and attach it?

I think this is a VirtualBox bug, but I'm confused as to why Xen has
decided to turn off AVX.

~Andrew
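
As background on the AVX remark: AVX visibility depends on both CPUID.1:ECX bit 28 and on the YMM state component being enabled in XCR0 (bit 2, the difference between the 0x7 and 0x3 masks quoted further down). The following is a minimal user-space C sketch of how to inspect both, purely illustrative and not taken from the thread:

```c
/*
 * Illustration only: usable AVX requires CPUID.1:ECX bit 28 (AVX) and
 * bit 27 (OSXSAVE) to be set, plus the YMM state component (bit 2)
 * enabled in XCR0, which XGETBV with ECX=0 returns.
 */
#include <stdio.h>
#include <stdint.h>
#include <cpuid.h>

int main(void)
{
    uint32_t eax, ebx, ecx, edx;

    __cpuid(1, eax, ebx, ecx, edx);
    printf("OSXSAVE: %d, AVX: %d\n",
           !!(ecx & (1u << 27)), !!(ecx & (1u << 28)));

    if ( ecx & (1u << 27) )
    {
        uint32_t lo, hi;

        /* XGETBV(0) reads XCR0; executing it requires OSXSAVE to be set. */
        asm volatile ( "xgetbv" : "=a" (lo), "=d" (hi) : "c" (0) );
        printf("XCR0: %#llx (YMM/AVX state %s)\n",
               ((unsigned long long)hi << 32) | lo,
               (lo & (1u << 2)) ? "enabled" : "disabled");
    }

    return 0;
}
```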

On 02/02/2025 4:01 pm, Guillaume wrote:
> Yes, sure, I can collect the output. As you said, the change is good
> enough to start dom0 without errors (at least no apparent errors :).
> ```
> Xen reports there are maximum 120 leaves and 2 MSRs
> Raw policy: 32 leaves, 2 MSRs
>  CPUID:
>   leaf     subleaf  -> eax      ebx      ecx      edx
>   00000000:ffffffff -> 00000016:756e6547:6c65746e:49656e69
>   00000001:ffffffff -> 000806c1:00020800:f6fa3203:178bfbff
>   00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
>   00000004:00000000 -> 04000121:02c0003f:0000003f:00000000
>   00000004:00000001 -> 04000122:01c0003f:0000003f:00000000
>   00000004:00000002 -> 04000143:04c0003f:000003ff:00000000
>   00000004:00000003 -> 04000163:02c0003f:00003fff:00000004
>   00000006:ffffffff -> 00000004:00000000:00000000:00000000
>   00000007:00000000 -> 00000000:208c2569:00000000:30000400
>   0000000b:00000000 -> 00000000:00000001:00000100:00000000
>   0000000b:00000001 -> 00000001:00000002:00000201:00000000
>   0000000d:00000000 -> 00000007:00000000:00000340:00000000
>   0000000d:00000002 -> 00000100:00000240:00000000:00000000
>   80000000:ffffffff -> 80000008:00000000:00000000:00000000
>   80000001:ffffffff -> 00000000:00000000:00000121:28100800
>   80000002:ffffffff -> 68743131:6e654720:746e4920:52286c65
>   80000003:ffffffff -> 6f432029:54286572:6920294d:31312d37
>   80000004:ffffffff -> 37473538:33204020:4730302e:00007a48
>   80000006:ffffffff -> 00000000:00000000:01007040:00000000
>   80000007:ffffffff -> 00000000:00000000:00000000:00000100
>   80000008:ffffffff -> 00003027:00000000:00000000:00000000
>  MSRs:
>   index    -> value
>   000000ce -> 0000000000000000
>   0000010a -> 0000000000000000
> Host policy: 30 leaves, 2 MSRs
>  CPUID:
>   leaf     subleaf  -> eax      ebx      ecx      edx
>   00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
>   00000001:ffffffff -> 000806c1:00020800:c6fa2203:178bfbff
>   00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
>   00000004:00000000 -> 04000121:02c0003f:0000003f:00000000
>   00000004:00000001 -> 04000122:01c0003f:0000003f:00000000
>   00000004:00000002 -> 04000143:04c0003f:000003ff:00000000
>   00000004:00000003 -> 04000163:02c0003f:00003fff:00000004
>   00000007:00000000 -> 00000000:208c2549:00000000:30000400
>   0000000d:00000000 -> 00000003:00000000:00000240:00000000
>   80000000:ffffffff -> 80000008:00000000:00000000:00000000
>   80000001:ffffffff -> 00000000:00000000:00000121:28100800
>   80000002:ffffffff -> 68743131:6e654720:746e4920:52286c65
>   80000003:ffffffff -> 6f432029:54286572:6920294d:31312d37
>   80000004:ffffffff -> 37473538:33204020:4730302e:00007a48
>   80000006:ffffffff -> 00000000:00000000:01007040:00000000
>   80000007:ffffffff -> 00000000:00000000:00000000:00000100
>   80000008:ffffffff -> 00003027:00000000:00000000:00000000
>  MSRs:
>   index    -> value
>   000000ce -> 0000000000000000
>   0000010a -> 0000000000000000
> PV Max policy: 57 leaves, 2 MSRs
>  CPUID:
>   leaf     subleaf  -> eax      ebx      ecx      edx
>   00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
>   00000001:ffffffff -> 000806c1:00020800:c6f82203:1789cbf5
>   00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
>   00000004:00000000 -> 04000121:02c0003f:0000003f:00000000
>   00000004:00000001 -> 04000122:01c0003f:0000003f:00000000
>   00000004:00000002 -> 04000143:04c0003f:000003ff:00000000
>   00000004:00000003 -> 04000163:02c0003f:00003fff:00000004
>   00000007:00000000 -> 00000002:208c0109:00000000:20000400
>   0000000d:00000000 -> 00000003:00000000:00000240:00000000
>   80000000:ffffffff -> 80000021:00000000:00000000:00000000
>   80000001:ffffffff -> 00000000:00000000:00000123:28100800
>   80000002:ffffffff -> 68743131:6e654720:746e4920:52286c65
>   80000003:ffffffff -> 6f432029:54286572:6920294d:31312d37
>   80000004:ffffffff -> 37473538:33204020:4730302e:00007a48
>   80000006:ffffffff -> 00000000:00000000:01007040:00000000
>   80000007:ffffffff -> 00000000:00000000:00000000:00000100
>   80000008:ffffffff -> 00003027:00000000:00000000:00000000
>  MSRs:
>   index    -> value
>   000000ce -> 0000000000000000
>   0000010a -> 0000000010020004
> HVM Max policy: 4 leaves, 2 MSRs
>  CPUID:
>   leaf     subleaf  -> eax      ebx      ecx      edx
>  MSRs:
>   index    -> value
>   000000ce -> 0000000000000000
>   0000010a -> 0000000000000000
> PV Default policy: 30 leaves, 2 MSRs
>  CPUID:
>   leaf     subleaf  -> eax      ebx      ecx      edx
>   00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
>   00000001:ffffffff -> 000806c1:00020800:c6d82203:1789cbf5
>   00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
>   00000004:00000000 -> 04000121:02c0003f:0000003f:00000000
>   00000004:00000001 -> 04000122:01c0003f:0000003f:00000000
>   00000004:00000002 -> 04000143:04c0003f:000003ff:00000000
>   00000004:00000003 -> 04000163:02c0003f:00003fff:00000004
>   00000007:00000000 -> 00000000:208c0109:00000000:20000400
>   0000000d:00000000 -> 00000003:00000000:00000240:00000000
>   80000000:ffffffff -> 80000008:00000000:00000000:00000000
>   80000001:ffffffff -> 00000000:00000000:00000121:28100800
>   80000002:ffffffff -> 68743131:6e654720:746e4920:52286c65
>   80000003:ffffffff -> 6f432029:54286572:6920294d:31312d37
>   80000004:ffffffff -> 37473538:33204020:4730302e:00007a48
>   80000006:ffffffff -> 00000000:00000000:01007040:00000000
>   80000008:ffffffff -> 00003027:00000000:00000000:00000000
>  MSRs:
>   index    -> value
>   000000ce -> 0000000000000000
>   0000010a -> 0000000000000000
> HVM Default policy: 4 leaves, 2 MSRs
>  CPUID:
>   leaf     subleaf  -> eax      ebx      ecx      edx
>  MSRs:
>   index    -> value
>   000000ce -> 0000000000000000
>   0000010a -> 0000000000000000
> ```
>
> Guillaume
>
> On Sun, Feb 2, 2025 at 4:32 PM Andrew Cooper
> <andrew.cooper3@xxxxxxxxxx> wrote:
>
>     This is a sanity check that an algorithm in Xen matches hardware. 
>     It is only compiled into debug builds by default. 
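
As an illustration of the pattern described above, here is a hedged sketch of a cross-check between a software-computed value and what the hardware reports, compiled in only for debug builds; the guard macro and names are stand-ins of mine, not Xen's actual source:

```c
/*
 * Sketch of a debug-only sanity check: a size computed by software is
 * compared against what the hardware reports, and the comparison is
 * compiled out of release builds.  NDEBUG stands in for whichever
 * debug-build guard the real code uses.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static void check_xstate_size(uint32_t hw_size, uint32_t sw_size)
{
#ifndef NDEBUG
    if ( sw_size != hw_size )
    {
        fprintf(stderr, "uncompressed hw size %#x != sw size %#x\n",
                hw_size, sw_size);
        abort();    /* the hypervisor panics here instead */
    }
#endif
}

int main(void)
{
    check_xstate_size(0x340, 0x240);    /* the sizes from the panic quoted below */
    return 0;
}
```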
>
>     Given that you're running under VirtualBox, I have a suspicion as
>     to what's wrong.
>
>     Can you collect the full `xen-cpuid -p` output from within your
>     environment?  I don't believe your suggested code change is
>     correct, but it will be good enough to get these diagnostics.
>
>     ~Andrew
>
>     On Sun, 2 Feb 2025, 15:32 Guillaume, <thouveng@xxxxxxxxx> wrote:
>
>         Hello,
>
>          I'd like to report an issue I encountered when building Xen
>         from source. To give you some context: during the Xen winter
>         meetup in Grenoble a few days ago, there was a discussion
>         about strengthening collaboration between Xen and academia.
>         One issue raised by a professor was that Xen is harder for
>         students to install and experiment with than KVM. In
>         response, it was mentioned that the Debian packages are quite
>         decent. This motivated me to try installing and playing with
>         Xen myself. While I am familiar with Xen (I work on the XAPI
>         toolstack at Vates), I'm not deeply familiar with its
>         internals, so this seemed like a good learning opportunity,
>         and maybe some content for a few blog posts :).
>
>          I set up a Debian testing VM on VirtualBox and installed Xen
>         from packages. Everything worked fine: GRUB was updated, I
>         rebooted, and I had a functional Xen setup with xl running in
>         Dom0.
>          Next, I downloaded the latest version of Xen from xenbits.org
>         <http://xenbits.org>, and built only the hypervisor (no tools,
>         no stubdom), using the same configuration as the Debian
>         package (which is for Xen 4.19). After updating GRUB and
>         rebooting, Xen failed to boot. Fortunately, I was able to
>         capture the following error via `ttyS0`:
>         ```
>         (XEN) [0000000d2c23739a] xstate: size: 0x340 and states: 0x7
>         (XEN) [0000000d2c509c1d]
>         (XEN) [0000000d2c641ffa] ****************************************
>         (XEN) [0000000d2c948e3b] Panic on CPU 0:
>         (XEN) [0000000d2cb349bb] Panic on CPU 0:
>         (XEN) [0000000d2cb349bb] XSTATE 0x0000000000000003, uncompressed hw size 0x340 != xen size 0x240
>         (XEN) [0000000d2cfc5786] ****************************************
>         (XEN) [0000000d2d308c24]
>         ```
>         From my understanding, the hardware xstate size (`hw_size`)
>         represents the maximum memory required for the `XSAVE/XRSTOR`
>         save area, while `xen_size` is computed by summing the space
>         required for the enabled features. In `xen/arch/x86/xstate.c`,
>         if these sizes do not match, Xen panics. However, wouldn’t it
>         be correct for `xen_size` to be **less than or equal to**
>         `hw_size` instead of exactly matching?
>
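
For context, here is a minimal user-space sketch of how the uncompressed layout size can be derived from CPUID leaf 0xD and cross-checked against what the hardware reports; this is my own illustration, not Xen's xstate_uncompressed_size(), and under a hypervisor such as VirtualBox both numbers come from the virtualised CPUID, which a buggy virtual CPUID can leave inconsistent:

```c
/*
 * Illustration only -- not Xen's implementation.  In the non-compacted
 * (XSAVE/XRSTOR) format, components 0 (x87) and 1 (SSE) live in the
 * 512-byte legacy region, followed by the 64-byte XSAVE header; every
 * further component i reports its size (EAX) and offset (EBX) via
 * CPUID.(0xD, i).  The uncompressed size for a set of enabled states is
 * therefore the highest offset + size among them.
 */
#include <stdio.h>
#include <stdint.h>
#include <cpuid.h>

static uint32_t uncompressed_size(uint64_t xstates)
{
    uint32_t size = 512 + 64;   /* legacy area + XSAVE header */

    for ( unsigned int i = 2; i < 63; i++ )
    {
        uint32_t eax, ebx, ecx, edx;

        if ( !(xstates & (1ULL << i)) )
            continue;

        __cpuid_count(0xd, i, eax, ebx, ecx, edx);
        if ( ebx + eax > size )     /* offset + size of component i */
            size = ebx + eax;
    }

    return size;
}

int main(void)
{
    uint32_t eax, ebx, ecx, edx;
    uint64_t supported;

    /*
     * CPUID.(0xD,0): EAX/EDX = XCR0 bits the hardware supports,
     * EBX = size for the currently enabled XCR0,
     * ECX = size if every supported component were enabled.
     */
    __cpuid_count(0xd, 0, eax, ebx, ecx, edx);
    supported = ((uint64_t)edx << 32) | eax;

    printf("hw size for current XCR0: %#x\n", ebx);
    printf("hw max size %#x, computed max size %#x\n",
           ecx, uncompressed_size(supported));
    return 0;
}
```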
>         I tested the following change:
>         ```
>         --- a/xen/arch/x86/xstate.c
>         +++ b/xen/arch/x86/xstate.c
>         @@ -710,7 +710,7 @@ static void __init check_new_xstate(struct xcheck_state *s, uint64_t new)
>              */
>             xen_size = xstate_uncompressed_size(s->states & X86_XCR0_STATES);
>
>         -    if ( xen_size != hw_size )
>         +    if ( xen_size > hw_size )
>                  panic("XSTATE 0x%016"PRIx64", uncompressed hw size %#x != xen size %#x\n",
>                        s->states, hw_size, xen_size);
>         ```
>         With this change, Xen boots correctly, but I may be missing
>         some side effects...
>         Additionally, I am confused as to why this issue does *not*
>         occur with the default Debian Xen package. Even when I rebuild
>         Xen *4.19.1* from source (the same version as the package), I
>         still encounter the issue.
>         So I have two questions:
>         - Is my understanding correct that `xen_size <= hw_size`
>         should be allowed?
>         - Are there any potential side effects of this change?
>         - Bonus: does anyone have an explanation for why the issue
>         does not occur with the packaged version of Xen but does with
>         a self-built version?
>
>         Hope this wasn't too long, and thanks for taking the time to
>         read it.
>         Best regards,
>
>         Guillaume
>


