Re: Hypercall fault injection (Was [PATCH 0/3] xen/domain: More structured teardown)
On 22/12/2020 15:47, Tamas K Lengyel wrote:
On 22/12/2020 10:00, Jan Beulich wrote:
> On 21.12.2020 20:36, Andrew Cooper wrote:
>> Hello,
>>
>> We have some very complicated hypercalls (createdomain, with max_vcpus a
>> close second), with immense complexity and very hard-to-test error handling.
>>
>> It is no surprise that the error handling is riddled with bugs.
>>
>> Random failures from core functions are one way, but I'm not sure that
>> will be especially helpful.  In particular, we'd need a way to exclude
>> "dom0 critical" operations so we've got a usable system to run testing on.
>>
>> As an alternative, how about adding a fault_ttl field into the hypercall?
>>
>> The exact paths taken in {domain,vcpu}_create() are sensitive to the
>> hardware, Xen Kconfig, and other parameters passed into the
>> hypercall(s).  The testing logic doesn't really want to care about what
>> failed; simply that the error was handled correctly.
>>
>> So a test for this might look like:
>>
>>     cfg = { ... };
>>     while ( xc_create_domain(xch, cfg) < 0 )
>>         cfg.fault_ttl++;
>>
>> The pros of this approach are that, for a specific build of Xen on a
>> piece of hardware, it ought to check every failure path in
>> domain_create(), until the ttl finally gets higher than the number of
>> fail-able actions required to construct a domain.  Also, the test
>> doesn't need changing as the complexity of domain_create() changes.
>>
>> The main con will most likely be the invasiveness of the code in Xen, but
>> I suppose any fault injection is going to be invasive to a certain extent.
> While I like the idea in principle, the innocent looking
>
>     cfg = { ... };
>
> is quite a bit of a concern here as well: Depending on the precise
> settings, paths taken in the hypervisor may heavily vary, and hence
> such a test will only end up being useful if it covers a wide
> variety of settings.  Even if the number of tests to execute turned
> out to still be manageable today, it may quickly turn out not
> sufficiently scalable as we add new settings controllable right at
> domain creation (which I understand is the plan).
Well - there are two aspects here.

First, 99% of all VMs in practice are one of 3 or 4 configurations.  An
individual configuration is O(n) time complexity to test with fault_ttl,
depending on the size of Xen's logic, and we absolutely want to be able
to test these deterministically and to completion.
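
For concreteness, the deterministic sweep might look something like the
sketch below.  It assumes today's xc_domain_create() calling convention
plus the proposed fault_ttl field (which doesn't exist yet), and elides
the actual config contents, so treat it as illustrative rather than
working code:

    /*
     * Minimal sketch of the deterministic sweep.  fault_ttl is the
     * proposed (not yet existing) field; everything else follows the
     * current libxc interface as best I recall it.
     */
    #include <stdio.h>
    #include <xenctrl.h>

    int main(void)
    {
        xc_interface *xch = xc_interface_open(NULL, NULL, 0);
        struct xen_domctl_createdomain cfg = {
            /* ... one of the handful of configurations we care about ... */
        };
        uint32_t domid;
        unsigned int ttl = 0;

        if ( !xch )
            return 1;

        /*
         * Walk fault_ttl upwards.  Each value fails a different step of
         * domain_create(); the loop terminates once the ttl exceeds the
         * number of fail-able actions and construction finally succeeds.
         */
        for ( ;; ttl++ )
        {
            cfg.fault_ttl = ttl;      /* Proposed field, not yet in Xen. */
            domid = 0;
            if ( xc_domain_create(xch, &domid, &cfg) == 0 )
                break;
        }

        printf("Domain created after %u injected failures\n", ttl);
        xc_domain_destroy(xch, domid);
        xc_interface_close(xch);
        return 0;
    }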
For the plethora of other configurations, I agree that it is infeasible
to test them all.  However, a hypercall like this is easy to wire up
into a fuzzing harness.
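
As an illustration of how little glue the fuzzing side needs, a harness
could be as simple as the sketch below: take the fuzzer's mutated bytes,
overlay them on the createdomain config, and issue the one hypercall.
The file-based input and the raw struct overlay are my assumptions about
how the harness would be wired up, not anything that exists today:

    /*
     * Sketch of a fuzzing harness: the fuzzer mutates a file containing
     * a raw struct xen_domctl_createdomain, and we simply issue the
     * hypercall.  Error handling is deliberately minimal.
     */
    #include <stdio.h>
    #include <string.h>
    #include <xenctrl.h>

    int main(int argc, char **argv)
    {
        struct xen_domctl_createdomain cfg;
        xc_interface *xch = xc_interface_open(NULL, NULL, 0);
        FILE *f = (argc > 1) ? fopen(argv[1], "rb") : NULL;
        uint32_t domid = 0;

        if ( !xch || !f )
            return 1;

        memset(&cfg, 0, sizeof(cfg));
        fread(&cfg, 1, sizeof(cfg), f);   /* Fuzzer-controlled input. */
        fclose(f);

        /*
         * Success or failure is irrelevant; what matters is that Xen
         * survives and handles the error paths correctly.
         */
        if ( xc_domain_create(xch, &domid, &cfg) == 0 )
            xc_domain_destroy(xch, domid);

        xc_interface_close(xch);
        return 0;
    }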
TBH, I was thinking of something like
https://github.com/intel/kernel-fuzzer-for-xen-project with a PVH Xen
and XTF "dom0" poking just this hypercall.  All the other complicated
bits of wiring AFL up appear to have been done.
Perhaps when we exhaust that as a source of bugs, we move on to fuzzing
the L0 Xen, because running on native will give it more paths to
explore.  We'd need some way of reporting path/trace data back to AFL in
dom0, which might require a bit of plumbing.
This is a pretty cool idea, I would be very interested in trying this
out.  If running Xen nested in an HVM domain is possible (my experiments
with nested setups using Xen have only worked on ancient hardware the
last time I tried), then running the fuzzer would be entirely possible
using VM forks.  You don't even need a special "dom0": you could just
add the fuzzer's CPUID harness to Xen's hypercall handler, and the only
thing needed from the nested dom0 would be to trigger the hypercall with
a normal config.  The fuzzer would take it from there and replace the
config with the fuzzed version directly in VM forks.  Defining what to
report as a "crash" to AFL would still need to be done manually for Xen,
as the current sink points are Linux-specific
(https://github.com/intel/kernel-fuzzer-for-xen-project/blob/master/src/sink.h),
but that should be straightforward.
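
For Xen the obvious candidates would be the paths we already consider
fatal.  Something along the lines of the list below - this is the spirit
of sink.h rather than its actual format, and the symbol choice is just a
guess at a sensible starting point:

    /*
     * Illustrative guess at Xen-side sink points: symbols where hitting
     * a breakpoint should be reported to AFL as a "crash".
     */
    static const char *xen_sinks[] = {
        "panic",            /* Unrecoverable error path. */
        "assert_failed",    /* ASSERT() failures in debug builds. */
        "fatal_trap",       /* x86: unhandled fatal exception. */
    };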
Also, running the fuzzer with PVH guests hasn't been tested, but since
all VM forking needs is EPT, it should work.
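
To make the "CPUID harness" idea concrete, the marker could be a magic
CPUID leaf bracketing the code under test, roughly as below.  The leaf
value, the placement around Xen's hypercall handler, and the stub
standing in for do_domctl() are all assumptions made so the snippet is
self-contained; the real fuzzer's convention may well differ:

    /*
     * Standalone sketch of bracketing the code under test with a magic
     * CPUID, which unconditionally vmexits and so can be intercepted by
     * the L0 fuzzer to fork at the start marker and stop at the end one.
     */
    #include <stdio.h>

    #define FUZZ_HARNESS_MAGIC 0x13371337u   /* Hypothetical marker leaf. */

    static inline void fuzz_harness_marker(void)
    {
        unsigned int eax = FUZZ_HARNESS_MAGIC, ebx, ecx, edx;

        asm volatile ( "cpuid"
                       : "+a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
                       :: "memory" );
        (void)eax; (void)ebx; (void)ecx; (void)edx;
    }

    /* Stand-in for the real hypercall handler (do_domctl) being fuzzed. */
    static long stub_domctl(void)
    {
        return 0;
    }

    int main(void)
    {
        long ret;

        fuzz_harness_marker();   /* Fork point: fuzzer swaps in its config. */
        ret = stub_domctl();
        fuzz_harness_marker();   /* End point: report coverage / "crash". */

        printf("domctl returned %ld\n", ret);
        return 0;
    }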
Xen running inside Xen definitely works, and is even supported as far as
PV-Shim goes (i.e. no nested virt).  That would limit testing to just
creation of PV guests at L1, which is plenty to get started with.

Xen nested under Xen does work to a first approximation, and for the
purposes of fuzzing areas other than the nested-virt logic, might even
be ok.  (I use this configuration for a fair chunk of hvm development).
~Andrew