Xen project Mailing List

Re: issue with dom0_pvh on Xen 4.20

To: Manuel Bouyer <bouyer@xxxxxxxxxxxxxxx>

Date: Tue, 2 Sep 2025 14:45:04 +0200

Autocrypt: addr=jbeulich@xxxxxxxx

Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Juergen Gross <jgross@xxxxxxxx>

Delivery-date: Tue, 02 Sep 2025 12:45:14 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 02.09.2025 14:28, Manuel Bouyer wrote:
> On Tue, Sep 02, 2025 at 02:22:29PM +0200, Juergen Gross wrote:
>> On 02.09.25 12:56, Manuel Bouyer wrote:
>>> On Tue, Sep 02, 2025 at 11:44:36AM +0100, Andrew Cooper wrote:
>>>> On 02/09/2025 11:17 am, Manuel Bouyer wrote:
>>>>> Hello,
>>>>> I'm trying to boot a NetBSD PVH dom0 on Xen 4.20.
>>>>> The same NetBSD kernel works fine with Xen 4.18
>>>>>
>>>>> The boot options are:
>>>>> menu=Boot netbsd-current PVH Xen420:dev hd0f:;load /netbsd-PVH
>>>>> console=com0 root=wd0f; multiboot /xen420-debug.gz dom0_mem=1024M
>>>>> console=com1 com1=38400,8n1 loglvl=all guest_loglvl=all
>>>>> gnttab_max_nr_frames=64 sync_console=1 dom0=pvh
>>>>>
>>>>> and the full log from serial console is attached.
>>>>>
>>>>> With 4.20 the boot fails with:
>>>>>
>>>>> (XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
>>>>> (XEN) Freed 664kB init memory
>>>>> (XEN) d0v0 Triple fault - invoking HVM shutdown action 1
>>>>> (XEN) *** Dumping Dom0 vcpu#0 state: ***
>>>>> (XEN) ----[ Xen-4.20.2-pre_20250821nb0 x86_64 debug=y Tainted: C ]----
>>>>> (XEN) CPU: 7
>>>>> (XEN) RIP: 0008:[<000000000020e268>]
>>>>> (XEN) RFLAGS: 0000000000010006 CONTEXT: hvm guest (d0v0)
>>>>> (XEN) rax: 000000002024c003 rbx: 000000000020e260 rcx: 00000000000dfeb7
>>>>> (XEN) rdx: 0000000000100000 rsi: 0000000000103000 rdi: 000000000013e000
>>>>> (XEN) rbp: 0000000080000000 rsp: 00000000014002e4 r8: 0000000000000000
>>>>> (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000000
>>>>> (XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000
>>>>> (XEN) r15: 0000000000000000 cr0: 0000000000000011 cr4: 0000000000000000
>>>>> (XEN) cr3: 0000000000000000 cr2: 0000000000000000
>>>>> (XEN) fsb: 0000000000000000 gsb: 0000000000000000 gss: 0000000000000000
>>>>> (XEN) ds: 0010 es: 0010 fs: 0000 gs: 0000 ss: 0010 cs: 0008
>>>>>
>>>>> because of the triple fault the RIP above doesn't point to the code.
>>>>>
>>>>> I tracked it down to this code:
>>>>>         cmpl    $0,%ecx                 ; /* zero-sized? */ \
>>>>>         je      2f                      ; \
>>>>>         pushl   %ebp                    ; \
>>>>>         movl    RELOC(nox_flag),%ebp    ; \
>>>>> 1:      movl    %ebp,(PDE_SIZE-4)(%ebx) ; /* upper 32 bits: NX */ \
>>>>>         movl    %eax,(%ebx)             ; /* store phys addr */ \
>>>>>         addl    $PDE_SIZE,%ebx          ; /* next PTE/PDE */ \
>>>>>         addl    $PAGE_SIZE,%eax         ; /* next phys page */ \
>>>>>         loop    1b                      ; \
>>>>>         popl    %ebp                    ; \
>>>>> 2:                                      ;
>>>>>
>>>>> there are other pushl/popl before so I don't think that's the problem
>>>>> (in fact the exact same fragment is called just before with different
>>>>> inputs and it doesn't fault). So the culprit is probably the write to
>>>>> (%ebx), which would be 0x20e260
>>>>> This is in the range:
>>>>> (XEN) [0000000000100000, 0000000040068e77] (usable)
>>>>> so I can't see why this would be a problem.
>>>>>
>>>>> Any idea, including how to debug this further, welcome
>>>>
>>>> Even though triple faults are aborts, they're generally accurate under
>>>> virt, so 0x20e268 is most likely where things die.
>>>
>>> but that's the RIP of the last fault, not the first one, right ?
>>> 0x20e268 isn't in the text segment of the kernel, my guess is that the
>>> first fault triggers an exception, but the exception handler isn't set up yet
>>> so we end up jumping to some random value.
>>>
>>
>> What puzzles me is that:
>>
>> - %cr2 is 0, so probably the first fault wasn't a page fault
>
> AFAIK it can't be as we're still in real mode

It's protected mode, but with paging still off.

>> - RIP is %ebx + 8, so maybe the code was just clobbered by the loop?
>>
>> Could it be the code has been moved to this location, or is about to
>> be moved away afterwards?
>
> No. RIP shouldn't end up there in any way. the assembly code is quite simple,
> it's just a loop and I'm quite confident that we did enter the loop with
> sane values

Yet Jürgen has a point - entry point and what is being modified are on
the same page (and despite paging still being off, you writing page
tables here makes pages a relevant unit). Considering
- entry point @ 0x20e4d0
- %ecx = 0xdfeb7
- %ebx = 0x20e260
the loop continuing a little further will overwrite the entry point
code. And with the entry point not at an even (e.g. page-aligned)
address, other code (like the one here) could conceivably live
immediately ahead of it. (Of course this overwriting may be intentional,
but it looks suspicious in this context.)

Jan
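
The overwrite concern raised above can be checked with a bit of arithmetic on the values from the register dump. The sketch below is illustrative only. It assumes PDE_SIZE is 8 bytes (inferred from the quoted macro storing the NX bits at PDE_SIZE-4) and PAGE_SIZE is 4 KiB; neither value is stated in the thread, and the variable names are made up for this example.

```python
# Assumed constants (not stated in the thread):
PDE_SIZE = 8        # bytes per page-table entry, inferred from "(PDE_SIZE-4)"
PAGE_SIZE = 0x1000  # 4 KiB pages

# Values quoted in the thread.
entry_point = 0x20E4D0  # kernel entry point
ebx = 0x20E260          # current write pointer of the fill loop
ecx = 0xDFEB7           # loop iterations still to run
rip = 0x20E268          # RIP reported at the triple fault


def page_of(addr: int) -> int:
    """Return the base address of the page containing addr."""
    return addr & ~(PAGE_SIZE - 1)


# Are the write pointer and the entry point on the same page?
same_page = page_of(ebx) == page_of(entry_point)

# How far is the entry point from the write pointer, in bytes and entries?
bytes_to_entry = entry_point - ebx
iterations_to_entry = bytes_to_entry // PDE_SIZE

# The loop writes PDE_SIZE bytes per iteration for ecx more iterations,
# so it reaches the entry point if that many iterations are still pending.
reaches_entry = 0 <= iterations_to_entry < ecx

print(f"same page:           {same_page}")
print(f"bytes to entry:      {bytes_to_entry:#x}")
print(f"iterations to entry: {iterations_to_entry}")
print(f"loop reaches entry:  {reaches_entry}")
print(f"RIP - %ebx:          {rip - ebx}")
```

Under these assumptions the entry point lies 0x270 bytes (78 entries) past %ebx, far fewer than the iterations remaining in %ecx, which is consistent with the loop overwriting the entry point code if it keeps running.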
