Xen project Mailing List

Re: [PATCH v4 2/4] xen/x86: introduce self modifying code test

To: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>

From: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Date: Wed, 20 Dec 2023 10:08:37 +0100

Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx, Wei Liu <wl@xxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>

Delivery-date: Wed, 20 Dec 2023 09:08:45 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Tue, Dec 19, 2023 at 08:31:29PM +0000, Andrew Cooper wrote: > On 15/12/2023 11:18 am, Roger Pau Monne wrote: > > Introduce a helper to perform checks related to self modifying code, and > > start > > by creating a simple test to check that alternatives have been applied. > > > > Such test is hooked into the boot process and called just after alternatives > > have been applied. In case of failure a message is printed, and the > > hypervisor > > is tainted as not having passed the tests, this does require introducing a > > new > > taint bit (printed as 'T'). > > We've got stub_selftest() in extable.c which currently does an ah-hoc > form of this taint via warning_add(). > > Nothing else comes to mind, but I would suggest breaking out the new > taint into an earlier patch, as this one is complicated enough anyway. I see, so introduce the taint in a previous patch and use it in stub_selftest() failure, > > diff --git a/xen/arch/x86/include/asm/test.h > > b/xen/arch/x86/include/asm/test.h > > new file mode 100644 > > index 000000000000..e96e709c6a52 > > --- /dev/null > > +++ b/xen/arch/x86/include/asm/test.h > > @@ -0,0 +1,30 @@ > > +#ifndef _ASM_X86_TEST_H_ > > +#define _ASM_X86_TEST_H_ > > + > > +#include <xen/types.h> > > + > > +int test_smoc(uint32_t selection, uint32_t *results); > > + > > +static inline void execute_selftests(void) > > IMO run_selftests() would be better, but this is already not all of our > selftests, and this particular function really doesn't warrant being > static inline. > > But I'm also not sure what this is liable to contain other than > test_smoc() so I'm not sure why we want it. This was requested by Jan, he was concerned that more of such tests would appear. It's new in v4 IIRC. > > diff --git a/xen/arch/x86/test/smoc.c b/xen/arch/x86/test/smoc.c > > new file mode 100644 > > index 000000000000..09db5cee9ae2 > > --- /dev/null > > +++ b/xen/arch/x86/test/smoc.c > > @@ -0,0 +1,66 @@ > > +/* SPDX-License-Identifier: GPL-2.0 */ > > + > > +#include <xen/errno.h> > > + > > +#include <asm/alternative.h> > > +#include <asm/cpufeature.h> > > +#include <asm/test.h> > > + > > +static bool cf_check test_insn_replacement(void) > > +{ > > +#define EXPECTED_VALUE 2 > > + unsigned int r = ~EXPECTED_VALUE; > > + > > + alternative_io("", "mov %1, %0", X86_FEATURE_ALWAYS, > > + "+r" (r), "i" (EXPECTED_VALUE)); > > + > > + return r == EXPECTED_VALUE; > > +#undef EXPECTED_VALUE > > +} > > + > > +int test_smoc(uint32_t selection, uint32_t *results) > > +{ > > + struct { > > + unsigned int mask; > > + bool (*test)(void); > > + const char *name; > > + } static const tests[] = { > > + { XEN_SYSCTL_TEST_SMOC_INSN_REPL, &test_insn_replacement, > > + "alternative instruction replacement" }, > > + }; > > Ah. I realise I said "like XTF", but I meant "checking one thing at a > time". > > While this pattern for tests[] is very convenient in XTF, it has one > major downside in Xen, and that's the proliferation of ENDBR's in the > running binary. But for the livepatch case for example it's interesting to patch functions that have the ENDBR prefix. I do like having all the tests in an array, as then adding new ones is trivial. > Also (see below), returning bool isn't ok. In the case of a failure, we > need: > > printk(XENLOG_ERR "%s() Failure: Expected $FOO, got $BAR\n"); There's already a message printed below, that's currently limited to system_state < SYS_STATE_active, but I would be fine with printing unconditionally that prints which test failed in a human readable form: printk(XENLOG_ERR "%s test failed\n", tests[i].name); So that would print: "alternative instruction replacement test failed" on the Xen dmesg. On one of the first versions test functions did return a value, but I ended up switching to this boolean version because I didn't see much value in returning anything that's not success or failure from the tests. I can switch back to returning a value, and then the array of tests will also store the expected returned value. > because that's what a human needs to know in order to fix the issue, not > a generic "something failed". > > > + unsigned int i; > > + > > + if ( selection & ~XEN_SYSCTL_TEST_SMOC_ALL ) > > + return -EINVAL; > > I'm not sure this is sensible. It's a testing hypercall, so why > shouldn't I be able to pass ~0 to mean "test everything the hypervisor > knows about" ? Well, for one livepatch tests will fail if the livepatch hasn't been applied yet. > > + > > + if ( results ) > > + *results = 0; > > + > > + for ( i = 0; i < ARRAY_SIZE(tests); i++ ) > > + { > > + if ( !(selection & tests[i].mask) ) > > + continue; > > + > > + if ( tests[i].test() ) > > + { > > + if ( results ) > > + *results |= tests[i].mask; > > How is results supposed to be used? > > XEN_SYSCTL_TEST_SMOC_INSN_REPL covers about 15 things we want to test, > making this output mask useless. The output mask just maps the input tests into output results. For example given the case you want to execute all tests (~0), and the livepatch replacements haven't been applied yet, the altinstructions test will succeed, but the livepatch ones will fail (as expected), we need a way to report this back to the caller. > > The selftests, like the exception fixup ones, are supposed to be > guarantee pass. Failure is an exceptional case, and is only expected to > be found with new compilers and new SMC development. Livepatch tests (at least the one I have implemented in patch 3) is expected to fail until a livepatch is applied to make it succeed. We do care about checking that it first fails, then we upload the livepatch and it succeeds, and that reverting the livepatch makes it fail again. > I can kind of see how an input mask might be useful, although I wouldn't > have had one myself. With correct diagnostics, running the hypercall > multiple times isn't useful to debugging, and without correct > diagnostics, the feedback provided by this is useless. > > So honestly, I think this "results" output is overengineered and doesn't > help the cases where it is actually going to matter. So for altinstructions it's true that the expectation is for them to always succeed, that's not the case for livepatch ones, where it's useful to explicitly test for failure, hence we need a fine grained way to report failure of specific tests. > > Remember most of all that self-modifying code which are going to cause > failures here have a high chance of crashing Xen outright. And we're > deliberately trying to make this happen in CI and before a breaking > change gets out into releases. > > > + continue; > > + } > > + > > + if ( system_state < SYS_STATE_active ) > > + printk(XENLOG_ERR "%s test failed\n", tests[i].name); > > This is a test hypercall, for the purpose of running testing, in > combination with test livepatches. Eliding the diagnostics isn't ok. > > Logspam concerns aren't an issue. If the user runs `while :; do > xen-test-smc; done` in dom0 then they get to have a full dmesg ring. > > Don't let that get in the way of having a sensible time figuring out > what went wrong. This was requested by Jan, and indeed my original intention was to unconditionally print the messages, as I think they are helpful. Thanks, Roger.

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.