
Re: Xen panic due to xstate mismatch



Yes, sure, I can collect the output. As you said, the change is good enough to start dom0 without errors (at least no apparent errors :).
```
Xen reports there are maximum 120 leaves and 2 MSRs
Raw policy: 32 leaves, 2 MSRs
 CPUID:
  leaf     subleaf  -> eax      ebx      ecx      edx
  00000000:ffffffff -> 00000016:756e6547:6c65746e:49656e69
  00000001:ffffffff -> 000806c1:00020800:f6fa3203:178bfbff
  00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
  00000004:00000000 -> 04000121:02c0003f:0000003f:00000000
  00000004:00000001 -> 04000122:01c0003f:0000003f:00000000
  00000004:00000002 -> 04000143:04c0003f:000003ff:00000000
  00000004:00000003 -> 04000163:02c0003f:00003fff:00000004
  00000006:ffffffff -> 00000004:00000000:00000000:00000000
  00000007:00000000 -> 00000000:208c2569:00000000:30000400
  0000000b:00000000 -> 00000000:00000001:00000100:00000000
  0000000b:00000001 -> 00000001:00000002:00000201:00000000
  0000000d:00000000 -> 00000007:00000000:00000340:00000000
  0000000d:00000002 -> 00000100:00000240:00000000:00000000
  80000000:ffffffff -> 80000008:00000000:00000000:00000000
  80000001:ffffffff -> 00000000:00000000:00000121:28100800
  80000002:ffffffff -> 68743131:6e654720:746e4920:52286c65
  80000003:ffffffff -> 6f432029:54286572:6920294d:31312d37
  80000004:ffffffff -> 37473538:33204020:4730302e:00007a48
  80000006:ffffffff -> 00000000:00000000:01007040:00000000
  80000007:ffffffff -> 00000000:00000000:00000000:00000100
  80000008:ffffffff -> 00003027:00000000:00000000:00000000
 MSRs:
  index    -> value
  000000ce -> 0000000000000000
  0000010a -> 0000000000000000
Host policy: 30 leaves, 2 MSRs
 CPUID:
  leaf     subleaf  -> eax      ebx      ecx      edx
  00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
  00000001:ffffffff -> 000806c1:00020800:c6fa2203:178bfbff
  00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
  00000004:00000000 -> 04000121:02c0003f:0000003f:00000000
  00000004:00000001 -> 04000122:01c0003f:0000003f:00000000
  00000004:00000002 -> 04000143:04c0003f:000003ff:00000000
  00000004:00000003 -> 04000163:02c0003f:00003fff:00000004
  00000007:00000000 -> 00000000:208c2549:00000000:30000400
  0000000d:00000000 -> 00000003:00000000:00000240:00000000
  80000000:ffffffff -> 80000008:00000000:00000000:00000000
  80000001:ffffffff -> 00000000:00000000:00000121:28100800
  80000002:ffffffff -> 68743131:6e654720:746e4920:52286c65
  80000003:ffffffff -> 6f432029:54286572:6920294d:31312d37
  80000004:ffffffff -> 37473538:33204020:4730302e:00007a48
  80000006:ffffffff -> 00000000:00000000:01007040:00000000
  80000007:ffffffff -> 00000000:00000000:00000000:00000100
  80000008:ffffffff -> 00003027:00000000:00000000:00000000
 MSRs:
  index    -> value
  000000ce -> 0000000000000000
  0000010a -> 0000000000000000
PV Max policy: 57 leaves, 2 MSRs
 CPUID:
  leaf     subleaf  -> eax      ebx      ecx      edx
  00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
  00000001:ffffffff -> 000806c1:00020800:c6f82203:1789cbf5
  00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
  00000004:00000000 -> 04000121:02c0003f:0000003f:00000000
  00000004:00000001 -> 04000122:01c0003f:0000003f:00000000
  00000004:00000002 -> 04000143:04c0003f:000003ff:00000000
  00000004:00000003 -> 04000163:02c0003f:00003fff:00000004
  00000007:00000000 -> 00000002:208c0109:00000000:20000400
  0000000d:00000000 -> 00000003:00000000:00000240:00000000
  80000000:ffffffff -> 80000021:00000000:00000000:00000000
  80000001:ffffffff -> 00000000:00000000:00000123:28100800
  80000002:ffffffff -> 68743131:6e654720:746e4920:52286c65
  80000003:ffffffff -> 6f432029:54286572:6920294d:31312d37
  80000004:ffffffff -> 37473538:33204020:4730302e:00007a48
  80000006:ffffffff -> 00000000:00000000:01007040:00000000
  80000007:ffffffff -> 00000000:00000000:00000000:00000100
  80000008:ffffffff -> 00003027:00000000:00000000:00000000
 MSRs:
  index    -> value
  000000ce -> 0000000000000000
  0000010a -> 0000000010020004
HVM Max policy: 4 leaves, 2 MSRs
 CPUID:
  leaf     subleaf  -> eax      ebx      ecx      edx
 MSRs:
  index    -> value
  000000ce -> 0000000000000000
  0000010a -> 0000000000000000
PV Default policy: 30 leaves, 2 MSRs
 CPUID:
  leaf     subleaf  -> eax      ebx      ecx      edx
  00000000:ffffffff -> 0000000d:756e6547:6c65746e:49656e69
  00000001:ffffffff -> 000806c1:00020800:c6d82203:1789cbf5
  00000002:ffffffff -> 00feff01:000000f0:00000000:00000000
  00000004:00000000 -> 04000121:02c0003f:0000003f:00000000
  00000004:00000001 -> 04000122:01c0003f:0000003f:00000000
  00000004:00000002 -> 04000143:04c0003f:000003ff:00000000
  00000004:00000003 -> 04000163:02c0003f:00003fff:00000004
  00000007:00000000 -> 00000000:208c0109:00000000:20000400
  0000000d:00000000 -> 00000003:00000000:00000240:00000000
  80000000:ffffffff -> 80000008:00000000:00000000:00000000
  80000001:ffffffff -> 00000000:00000000:00000121:28100800
  80000002:ffffffff -> 68743131:6e654720:746e4920:52286c65
  80000003:ffffffff -> 6f432029:54286572:6920294d:31312d37
  80000004:ffffffff -> 37473538:33204020:4730302e:00007a48
  80000006:ffffffff -> 00000000:00000000:01007040:00000000
  80000008:ffffffff -> 00003027:00000000:00000000:00000000
 MSRs:
  index    -> value
  000000ce -> 0000000000000000
  0000010a -> 0000000000000000
HVM Default policy: 4 leaves, 2 MSRs
 CPUID:
  leaf     subleaf  -> eax      ebx      ecx      edx
 MSRs:
  index    -> value
  000000ce -> 0000000000000000
  0000010a -> 0000000000000000
```

Guillaume

On Sun, Feb 2, 2025 at 4:32 PM Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
This is a sanity check that an algorithm in Xen matches hardware.  It is only compiled into debug builds by default. 

Given that you're running under VirtualBox, I have a suspicion as to what's wrong.

Can you collect the full `xen-cpuid -p` output from within your environment?  I don't believe your suggested code change is correct, but it will be good enough to get these diagnostics.

~Andrew

On Sun, 2 Feb 2025, 15:32 Guillaume, <thouveng@xxxxxxxxx> wrote:
Hello,

I'd like to report an issue I encountered when building Xen from source. To give you some context, during the Xen winter meetup in Grenoble a few days ago, there was a discussion about strengthening collaboration between Xen and academia. One issue raised by a professor was that Xen is harder for students to install and experiment with than KVM. In response, it was mentioned that the Debian packages are quite decent. This motivated me to try installing and playing with Xen myself. While I am familiar with Xen (I work on the XAPI toolstack at Vates), I'm not deeply familiar with its internals, so this seemed like a good learning opportunity and maybe some content for a few blog posts :).

I set up a Debian testing VM on VirtualBox and installed Xen from packages. Everything worked fine: GRUB was updated, I rebooted, and I had a functional Xen setup with xl running in Dom0.
Next, I downloaded the latest version of Xen from xenbits.org and built only the hypervisor (no tools, no stubdom), using the same configuration as the Debian package (which is for Xen 4.19). After updating GRUB and rebooting, Xen failed to boot. Fortunately, I was able to capture the following error via `ttyS0`:
```
(XEN) [0000000d2c23739a] xstate: size: 0x340 and states: 0x7
(XEN) [0000000d2c509c1d]
(XEN) [0000000d2c641ffa] ****************************************
(XEN) [0000000d2c948e3b] Panic on CPU 0:
(XEN) [0000000d2cb349bb] XSTATE 0x0000000000000003, uncompressed hw size 0x340 != xen size 0x240
(XEN) [0000000d2cfc5786] ****************************************
(XEN) [0000000d2d308c24]
```
From my understanding, the hardware xstate size (`hw_size`) represents the maximum memory required for the `XSAVE/XRSTOR` save area, while `xen_size` is computed by summing the space required for the enabled features. In `xen/arch/x86/xstate.c`, if these sizes do not match, Xen panics. However, wouldn’t it be correct for `xen_size` to be **less than or equal to** `hw_size` instead of exactly matching?

I tested the following change:
```
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -710,7 +710,7 @@ static void __init check_new_xstate(struct xcheck_state *s, uint64_t new)
      */
     xen_size = xstate_uncompressed_size(s->states & X86_XCR0_STATES);

-    if ( xen_size != hw_size )
+    if ( xen_size > hw_size )
         panic("XSTATE 0x%016"PRIx64", uncompressed hw size %#x != xen size %#x\n",
               s->states, hw_size, xen_size);
```
With this change, Xen boots correctly, but I may be missing some side effects...
Additionally, I am confused as to why this issue does not occur with the default Debian Xen package. Even when I rebuild Xen 4.19.1 from source (the same version as the package), I still encounter the issue.
So I have two questions (plus a bonus):
- Is my understanding correct that xen_size <= hw_size should be allowed?
- Are there any potential side effects of this change?
- Bonus: does anyone have an explanation for why the issue does not occur with the packaged version of Xen but does with a self-built version?

I hope this wasn't too long, and thanks for taking the time to read it.
Best regards,

Guillaume
