| 
    
 [Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] xen: add support for automatic debug key actions in case of crash
 On 29.10.20 15:22, Jan Beulich wrote: On 22.10.2020 16:39, Juergen Gross wrote:When the host crashes it would sometimes be nice to have additional debug data available which could be produced via debug keys, but halting the server for manual intervention might be impossible due to the need to reboot/kexec rather sooner than later. Add support for automatic debug key actions in case of crashes which can be activated via boot- or runtime-parameter.While I can certainly see this possibly being a useful thing in certain situations, I'm uncertain it's going to be helpful in at least a fair set of cases. What output to request very much depends on the nature of the problem one is running into, and the more keys one adds "just in case", the longer the reboot latency, and the higher the risk (see also below) of the output generation actually causing further problems. The obvious case is a watchdog induced crash: at least 2 sets of dom0 state will help in many cases. IOW I'm neither fully convinced that we want this, nor fully opposed.Depending on the type of crash the desired data might be different, so support different settings for the possible types of crashes. The parameter is "crash-debug" with the following syntax: crash-debug-<type>=<string> with <type> being one of: panic, hwdom, watchdog, kexeccmd, debugkey and <string> a sequence of debug key characters with '.' having the special semantics of a 1 second pause.1 second is a whole lot of time. To get two successive sets of data, a much shorter delay (if any) would normally suffice. Yes, I'd be fine to trade that for a shorter period of time. Also, while '.' may seem like a good choice right now, with the shortage of characters we may want to put a real handler behind it at come point. The one character that clearly won't make much sense to use in this context is 'h', but that's awful as a (kind of) separator. Could we perhaps replace 'h' by '?', freeing up 'h' and allowing '?' to be used for this purpose here? Fine with me. Another possibility would be to add '\' as an escape character with '\.' meaning "debug-key .". --- a/docs/misc/xen-command-line.pandoc +++ b/docs/misc/xen-command-line.pandoc @@ -574,6 +574,29 @@ reduction of features at Xen's disposal to manage guests. ### cpuinfo (x86) > `= <boolean>`+### crash-debug-debugkey+### crash-debug-hwdom +### crash-debug-kexeccmd +### crash-debug-panic +### crash-debug-watchdog +> `= <string>` + +> Can be modified at runtime + +Specify debug-key actions in cases of crashes. Each of the parameters applies +to a different crash reason. The `<string>` is a sequence of debug key +characters, with `.` having the special meaning of a 1 second pause. + +So e.g. `crash-debug-watchdog=0.0r` would dump dom0 state twice with a +second between the two state dumps, followed by the run queues of the +hypervisor, if the system crashes due to a watchdog timeout. + +These parameters should be used carefully, as e.g. specifying +`crash-debug-debugkey=C` would result in an endless loop. Depending on the +reason of the system crash it might happen that triggering some debug key +action will result in a hang instead of dumping data and then doing a +reboot or crash dump.I think it would be useful if the flavors were (briefly) explained: At the very least "debugkey" doesn't directly fit "in cases of crashes", as there's no crash involved. kexec_crash() instead gets invoked without there having been any crash. Yes, and having some additional state generate for this case might help diagnosis. You may also want to point out that this is a best effort thing only - system state at the point of a crash may be such that the attempt of handling one or the debug keys would have further bad effects on the system, including that the actual kexec may then never occur. True. 
 OTOH the runtime parameters are much easier addressable that way. 
 Will fix. + if ( *action == '.' ) + mdelay(1000); + else + handle_keypress(*action, regs); + action++; + } +}I think only diagnostic keys should be permitted here. While I understand that other keys could produce nonsense or do harm, I'm not sure we should really prohibit them. Allowing them would e.g. allow to do just a reboot without kdump for one type of crashes. --- a/xen/include/xen/kexec.h +++ b/xen/include/xen/kexec.h @@ -1,6 +1,8 @@ #ifndef __XEN_KEXEC_H__ #define __XEN_KEXEC_H__+#include <xen/keyhandler.h>Could we go without this, utilizing the gcc extension of forward declared enums? Otoh ...@@ -82,7 +84,11 @@ void vmcoreinfo_append_str(const char *fmt, ...) #define kexecing 0static inline void kexec_early_calculations(void) {} And even being a #define all users of kexec_crash() would need to #include keyhandler.h (and I'm not sure there are any source files including kexec.h which don't use kexec_crash()). Juergen 
 
 
  | 
  
![]()  | 
            
         Lists.xenproject.org is hosted with RackSpace, monitoring our  |