Re: [PATCH] xen/x86: Change stub page freeing to fix smt=0
On 2026-05-27 10:14, Jason Andryuk wrote:
> On 2026-05-26 18:03, Andrew Cooper wrote:
>> On 26/05/2026 9:31 pm, Jason Andryuk wrote:
>>> A single stubs page is initialized with 0xcc and re-used, with
>>> multiple CPUs each using a portion of the shared page. In
>>> cpu_smpboot_free(), each stubs area is checked against 0xcc. When
>>> all are set to 0xcc, the page is freed.
>>>
>>> Booting a system with smt=0, CPU0 is initially set up, allocating
>>> the stubs page and initializing it to 0xcc. When more CPUs are
>>> brought up, CPU1 is initialized and then immediately brought offline
>>> as it is the sibling of CPU0. Since the page was initially memset
>>> with 0xcc, cpu_smpboot_free() finds all stubs as 0xcc and frees the
>>> page. However, the page is still assigned to CPU0 and continues to
>>> be assigned to other CPUs.
>>
>> It's more complicated than this.
>>
>> With CONFIG_PV (and !opt_fred in 4.22 which is perhaps newer than
>> you're testing), the LSTAR and CSTAR stubs guarantee that the 0xcc's
>> are overwritten with real instructions.
>>
>> In !CONFIG_PV, the 0xcc's only get overwritten by the exception
>> recovery selftests (CPU0 only, and gated on CONFIG_SELF_TESTS), and
>> "complicated" instructions in the emulator (which in your safety
>> environment, you likely have compiled out).
>>
>> So, in your environment, I think you probably can exclude the stubs
>> entirely and trim even more LoC.
>
> Thanks. Ok, my build was !CONFIG_PV, so 0xcc's were not overwritten.
> The fault happened before the self tests ran.

Correction: It was after the self tests ran and during dom0 construction.

(XEN) Pagetable walk from ffff830842652008:
(XEN)  L4[0x106] = 8000000079c72063 ffffffffffffffff
(XEN)  L3[0x021] = 0000000079ff3063 ffffffffffffffff
(XEN)  L2[0x013] = 000000085680f063 ffffffffffffffff
(XEN)  L1[0x052] = cccccccccccccccc ffffffffffffffff

It looks like the page is reallocated after freeing, so after CPU1 is
down. The re-use would write the page with PTEs. However, when later
CPUs are brought down, their portion of the stubs page is overwritten
with 0xcc. I think that is how the page, as a page table, is corrupted.

Regards,
Jason
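
For illustration only, the sequence described above can be modelled in a few lines of standalone C. This is a toy sketch and not Xen code: the struct, the helper names (stubs_page_init, cpu_down, read_entry) and the 128-byte slice size are assumptions made for the example, and cpu_down() only imitates the content-based check that the thread attributes to cpu_smpboot_free().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes for the model; not taken from the Xen sources. */
#define MODEL_PAGE_SIZE 4096u
#define STUB_BUF_SIZE   128u
#define STUBS_PER_PAGE  (MODEL_PAGE_SIZE / STUB_BUF_SIZE)

struct stubs_page {
    unsigned char bytes[MODEL_PAGE_SIZE];
    bool allocated;            /* false once the model "frees" the page */
};

/* The first CPU allocates the shared page and fills it with 0xcc. */
static void stubs_page_init(struct stubs_page *pg)
{
    memset(pg->bytes, 0xcc, MODEL_PAGE_SIZE);
    pg->allocated = true;
}

/* True if every byte of one CPU's slice is still 0xcc. */
static bool slice_is_poison(const struct stubs_page *pg, unsigned int slot)
{
    const unsigned char *p = pg->bytes + slot * STUB_BUF_SIZE;

    for (size_t i = 0; i < STUB_BUF_SIZE; i++)
        if (p[i] != 0xcc)
            return false;
    return true;
}

/*
 * Content-based teardown, as described in the thread: poison the outgoing
 * CPU's slice, then free the page if every slice reads as 0xcc.
 * Returns true if the page was freed.
 */
static bool cpu_down(struct stubs_page *pg, unsigned int slot)
{
    assert(slot < STUBS_PER_PAGE);
    memset(pg->bytes + slot * STUB_BUF_SIZE, 0xcc, STUB_BUF_SIZE);

    for (unsigned int s = 0; s < STUBS_PER_PAGE; s++)
        if (!slice_is_poison(pg, s))
            return false;

    pg->allocated = false;     /* freed, even if other CPUs still use it */
    return true;
}

/* Read the idx-th 64-bit entry of the page, as if it were a page table. */
static uint64_t read_entry(const struct stubs_page *pg, unsigned int idx)
{
    uint64_t v;

    memcpy(&v, pg->bytes + idx * sizeof(v), sizeof(v));
    return v;
}
```

In this model, calling stubs_page_init() for CPU0 and then cpu_down() for slot 1 frees the page, because no slice was ever overwritten with real instructions (the !CONFIG_PV case). If the page is then zeroed to stand in for re-use as a page table, a later cpu_down() on another slot no longer frees it, but still writes 0xcc over that slice, so the entries there read back as cccccccccccccccc, matching the shape of the L1 entry in the walk above.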