[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 24/35] xen/console: introduce hwdom_crashconsole=



On Thursday, December 12th, 2024 at 4:29 AM, Roger Pau Monné 
<roger.pau@xxxxxxxxxx> wrote:

>
>
> On Thu, Dec 05, 2024 at 08:41:54PM -0800, Denis Mukhin via B4 Relay wrote:
>
> > From: Denis Mukhin dmukhin@xxxxxxxx
> >
> > The new command line switch `hwdom_crashconsole=BOOL` allows to switch 
> > serial
> > console input focus to xen for machine state inspection using keyhandler
> > mechanism after the hardware domain crashes.
> >
> > The new command line switch is aliased via `dom0=...,crashconsole` knob.
> >
> > Such functionality can be useful while debugging dom0 bringup.
> >
> > Signed-off-by: Denis Mukhin dmukhin@xxxxxxxx
> > ---
> > docs/misc/xen-command-line.pandoc | 5 +++++
> > xen/arch/x86/dom0_build.c | 2 ++
> > xen/common/domain.c | 14 +++++++++++++-
> > xen/include/xen/domain.h | 1 +
> > 4 files changed, 21 insertions(+), 1 deletion(-)
> >
> > diff --git a/docs/misc/xen-command-line.pandoc 
> > b/docs/misc/xen-command-line.pandoc
> > index 
> > 293dbc1a957ba6e668fd4d55d58e84f643822126..fb77d7dca1ea517f79d6713aa6909422f31e7724
> >  100644
> > --- a/docs/misc/xen-command-line.pandoc
> > +++ b/docs/misc/xen-command-line.pandoc
> > @@ -806,6 +806,7 @@ Specify the bit width of the DMA heap.
> >
> > ### dom0
> > = List of [ pv | pvh, shadow=<bool>, verbose=<bool>,
> > + crashconsole=<bool>,
> > cpuid-faulting=<bool>, msr-relaxed=<bool> ] (x86)
> >
> > = List of [ sve=<integer> ] (Arm64)
> > @@ -839,6 +840,10 @@ Controls for how dom0 is constructed on x86 systems.
> > information during the dom0 build. It defaults to the compile time choice
> > of `CONFIG_VERBOSE_DEBUG`.
> >
> > +* The `crashconsole` boolean instructs Xen to drop into emergency console
> > + in case of dom0 crash. May be useful for dom0 bringup on a custom
>
>
> I think the 'a' is unneeded -> "on custom hardware."

Yep. Thank you.

>
>
> I think however this would be clearer as:
>
> "The `crashconsole` boolean instructs Xen to switch input console
> focus to the hypervisor when dom0 shuts down and avoid performing
> dom0 domain destruction. Should only be used for debugging
> purposes."

Fixed.

>
> It's IMO not clear what "instructs Xen to drop into emergency console"
> implies for Xen.
>
> > + hardware.
> > +
> > * The `cpuid-faulting` boolean is an interim option, is only applicable to
> > PV dom0, and defaults to true.
> >
> > diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
> > index 
> > e8f5bf5447bc47a6daa3d95787106a4c11e80d31..706aeec0ecbb565a415edbfb33ca2fd72967c560
> >  100644
> > --- a/xen/arch/x86/dom0_build.c
> > +++ b/xen/arch/x86/dom0_build.c
> > @@ -286,6 +286,8 @@ int __init parse_arch_dom0_param(const char *s, const 
> > char *e)
> > opt_dom0_cpuid_faulting = val;
> > else if ( (val = parse_boolean("msr-relaxed", s, e)) >= 0 )
> > opt_dom0_msr_relaxed = val;
> > + else if ( (val = parse_boolean("crashconsole", s, e)) >= 0 )
> > + opt_hwdom_crashconsole = !!val;
> > else
> > return -EINVAL;
> >
> > diff --git a/xen/common/domain.c b/xen/common/domain.c
> > index 
> > aab546c0a8535e4f007cbbc9c5c552bcf66b5807..4fe69f294158dda7b2e0b9d98d49c34e04131cb8
> >  100644
> > --- a/xen/common/domain.c
> > +++ b/xen/common/domain.c
> > @@ -56,6 +56,13 @@ unsigned int xen_processor_pmbits = XEN_PROCESSOR_PM_PX;
> > bool opt_dom0_vcpus_pin;
> > boolean_param("dom0_vcpus_pin", opt_dom0_vcpus_pin);
> >
> > +/*
> > + * Hardware domain crash handler: if true, do not halt machine, but switch 
> > to
> > + * Xen console for debugging.
> > + */
> > +bool opt_hwdom_crashconsole;
>
>
> __ro_after_init.

Done.

>
> > +boolean_param("hwdom_crashconsole", opt_hwdom_crashconsole);
>
>
> This option doesn't seem to be documented at all in
> xen-command-line.pandoc?

Thanks; fixed.

>
> > +
> > /* Protect updates/reads (resp.) of domain_list and domain_hash. */
> > DEFINE_SPINLOCK(domlist_update_lock);
> > DEFINE_RCU_READ_LOCK(domlist_read_lock);
> > @@ -1138,7 +1145,12 @@ int domain_shutdown(struct domain *d, u8 reason)
> > reason = d->shutdown_code;
> >
> > if ( is_hardware_domain(d) )
> > - hwdom_shutdown(reason);
> > + {
> > + if ( opt_hwdom_crashconsole )
> > + console_set_owner(DOMID_XEN);
>
>
> Don't you need to pause all domain vCPUs and return early here to
> avoid executing the rest of the function, that will likely destroy the
> domain?
>
> Maybe there's something I'm missing that prevents the hardware domain
> destruction.

The point of this change is to drop the user to Xen console on crash instead
of unconditional system restart. TBH, I did not plan for "freezing" the domain
state. I also tested w/ non-hyperlaunch PV dom0 only (I used kexec to
start PV dom0 which was crashing).

My understanding, if I followed the code correctly, all is fine.
domain_destroy() is not called because asm_domain_crash_synchronous(), which
handles the crash, just spins around do_softirq():

void asm_domain_crash_synchronous(unsigned long addr)
{
...

    __domain_crash(current->domain); // that will call domain_crash()

    for ( ; ; )
        do_softirq();
}

>
> Thanks, Roger.





 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.