Re: [Xen-devel] [VMI] Possible race-condition in altp2m APIs
Hi Andrew, Tamas,
On Wednesday, May 29, 2019 2:49 AM, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
After a series of tests on 1 or 4 VCPUs, my domain ends up in one of 2 possible states:
- frozen: the mouse doesn't move, so I would guess the VCPUs are blocked.
I'm calling the xc_(un)pause_domain APIs multiple times when I write to the shadow copies,
but it's always synchronous, so I doubt that they interfered and "paused" the domain.
xc_{,un}pause_domain() calls are reference counted. Calling unpause
too many times should be safe (from a refcount point of view), and
should fail the hypercall with -EINVAL.
Also, the log output I have before I detect that Ansible failed to execute shows that the resume succeeded and
Xen is ready to process VMI events.
- BSOD: that's the second possibility; apparently I'm corrupting a critical data structure in the operating system,
and the Windbg analysis is inconclusive, so I can't tell much.
Either way, I can't execute this test sequentially 10,000 times without a crash.
Ok good - this is a far easier place to start debugging from.
-> Could you look at the implementation and tell me if I misused the APIs somewhere?
https://gist.github.com/mtarral/d99ce5524cfcfb5290eaa05702c3e8e7
Some observations.
1) In xen_pause_vm(), you do an xc_domain_getinfo(). First of all,
the API is crazy, so you also need to check "|| info.domid != domid"
in your error condition, but that shouldn't be an issue here as the
domid isn't changing.
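A pause helper with that check might look roughly like this (a minimal sketch against
libxc, not the code from the gist; error handling trimmed):

    #include <xenctrl.h>

    /* Sketch of a pause helper with the checks described above. */
    static int pause_vm(xc_interface *xch, uint32_t domid)
    {
        xc_dominfo_t info;

        /* xc_domain_getinfo() returns info for the first domain whose id is
         * >= domid, so both the return value and info.domid must be checked. */
        if (xc_domain_getinfo(xch, domid, 1, &info) != 1 || info.domid != domid)
            return -1;

        if (info.paused)        /* already paused: nothing to do */
            return 0;

        return xc_domain_pause(xch, domid);
    }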
"The API is crazy"
Could you elaborate on that?
Furthermore, the results of xc_domain_getinfo() are stale before the
hypercall returns, so it is far less useful than it appears.
I looked at libvmi's implementation, and I guess I should have implemented the checks too.
They just didn't make sense to me, as I was sure that my calls were synchronous, one after the other.
I'm afraid that the only safe way to issue pause/unpauses is to know
that you've reference counted your own correctly. All entities in
dom0 with privilege can fight over each others references, because
there is nothing Xen can use to distinguish the requests.
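One way to keep your own count straight is a thin wrapper around the hypercalls,
e.g. (a single-threaded sketch with hypothetical names):

    #include <xenctrl.h>

    /* Hypothetical wrapper that keeps a local pause count, so every pause we
     * issue is matched by exactly one unpause. */
    struct dom_handle {
        xc_interface *xch;
        uint32_t domid;
        unsigned int pause_count;
    };

    static int dom_pause(struct dom_handle *h)
    {
        if (xc_domain_pause(h->xch, h->domid))
            return -1;
        h->pause_count++;
        return 0;
    }

    static int dom_unpause(struct dom_handle *h)
    {
        if (!h->pause_count)    /* never unpause more than we paused */
            return 0;
        if (xc_domain_unpause(h->xch, h->domid))
            return -1;
        h->pause_count--;
        return 0;
    }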
-> I removed my calls to xc_pause/resume.
2) You allocate a libxl context but do nothing with it. That can
all go, along with the linkage against libxl. Also, you don't need
to create a logger like that. Despite being utterly unacceptable
behaviour for a library, it is the default by passing NULL in
xc_interface_open().
I followed drakvuf's xen init function, as I thought I was going to need this at some point.
Same for the xl_logger initialization.
3) A malloc()/memset() pair is more commonly spelt calloc()
True.
And some questions.
1) I'm guessing the drakvuf_inject_trap(drakvuf, 0x293e6a0, 0)
call is specific to the exact windows kernel in use?
Yes, I used a libvmi python script to translate the symbol -> virtual address -> physical address.
Then I replaced that value in my code and recompiled the binary before the test.
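For reference, the same translation can also be done from C with libvmi's helpers,
roughly as below (a sketch; the exact signatures depend on the libvmi version, the
out-parameter form is the newer one):

    #include <libvmi/libvmi.h>

    /* Sketch: kernel symbol -> virtual address -> physical address. */
    static int symbol_to_pa(vmi_instance_t vmi, const char *symbol, addr_t *pa)
    {
        addr_t va = 0;

        if (VMI_SUCCESS != vmi_translate_ksym2v(vmi, symbol, &va))
            return -1;
        if (VMI_SUCCESS != vmi_translate_kv2p(vmi, va, pa))
            return -1;
        return 0;
    }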
2) In vmi_init(), what is the purpose of fmask and zero_page_gfn?
You add one extra gfn to the guest, called zero_page, and fill it
with 1's from fmask.
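i.e. roughly the following pattern (a sketch of the populate-and-fill step, not the
exact code from the gist; the helper name and gfn choice are illustrative):

    #include <string.h>
    #include <sys/mman.h>
    #include <xenctrl.h>

    /* Sketch: populate one extra gfn in the guest and fill it with 1's
     * ("fmask"); gfn is assumed to be chosen past the guest's last gfn. */
    static int add_ff_page(xc_interface *xch, uint32_t domid, xen_pfn_t gfn)
    {
        if (xc_domain_populate_physmap_exact(xch, domid, 1, 0, 0, &gfn))
            return -1;

        uint8_t *page = xc_map_foreign_range(xch, domid, XC_PAGE_SIZE,
                                             PROT_READ | PROT_WRITE, gfn);
        if (!page)
            return -1;

        memset(page, 0xFF, XC_PAGE_SIZE);   /* a page full of 1's */
        munmap(page, XC_PAGE_SIZE);
        return 0;
    }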
3) You create two altp2m's, but both have the same default access.
Is this deliberate, or a bug? If deliberate, why?
Finally, and probably the source of the memory corruption...
4) When injecting a trap, you allocate a new gfn, memcpy() the
contents and insert a 0xcc (so far so good). You then remap the
executable view to point at the new gfn with a breakpoint in (fine),
and remap the readable view to point at the zero_page, which is full
of 1's (uh-oh).
What is this final step trying to achieve? It guarantees that
patch-guard will eventually notice and BSOD your VM for critical
structure corruption. The read-only view needs to point to the
original gfn with only read permissions, so when Windows reads the
gfn back, it sees what it expects. You also need to prohibit writes
to either gfn so you can spot writes (unlikely in this case but
important for general introspection) so you can propagate the change
to both copies.
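In libxc terms, the fix described above would look roughly like the sketch below;
the helper name, view ids and exact access values are illustrative, not code from
drakvuf or the gist:

    #include <xenctrl.h>

    /* Sketch of the remapping described above.  exec_view/read_view are the
     * two altp2m views, orig_gfn is the trapped page, shadow_gfn the copy
     * carrying the 0xcc. */
    static int remap_trap(xc_interface *xch, uint32_t domid,
                          uint16_t exec_view, uint16_t read_view,
                          xen_pfn_t orig_gfn, xen_pfn_t shadow_gfn)
    {
        /* Executable view: the trapped slot points at the shadow copy. */
        if (xc_altp2m_change_gfn(xch, domid, exec_view, orig_gfn, shadow_gfn))
            return -1;

        /* Execute-only there, so reads/writes fault to the monitor instead
         * of exposing the 0xcc to the guest. */
        if (xc_altp2m_set_mem_access(xch, domid, exec_view, orig_gfn,
                                     XENMEM_access_x))
            return -1;

        /* Readable view: keep the ORIGINAL gfn, read-only, so Windows reads
         * back what it expects and writes can be spotted and propagated. */
        if (xc_altp2m_set_mem_access(xch, domid, read_view, orig_gfn,
                                     XENMEM_access_r))
            return -1;

        return 0;
    }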
Yes, I missed that PatchGuard would eventually check those shadow pages anyway.
I was already happy to see that my breakpoints were working, and I proceeded to the tests,
hoping to reproduce the bug quickly.
I implemented a basic mem_access event handler, restricting the original GFN being remapped to execute-only (--X),
and switching to the hostp2m with singlestepping to escape PatchGuard, as sketched below.
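The view-switch-and-singlestep part, expressed as libvmi event callbacks, is roughly
the pattern below (a sketch from memory; exec_view is assumed to be the altp2m view
holding the breakpoints):

    #include <libvmi/libvmi.h>
    #include <libvmi/events.h>

    static uint16_t exec_view;   /* assumed: the altp2m view with the 0xcc's */

    /* mem_access violation: drop back to the host p2m (view 0) and
     * single-step over the faulting access. */
    static event_response_t mem_cb(vmi_instance_t vmi, vmi_event_t *event)
    {
        (void)vmi;
        event->slat_id = 0;
        return VMI_EVENT_RESPONSE_SLAT_ID | VMI_EVENT_RESPONSE_TOGGLE_SINGLESTEP;
    }

    /* After the single step, switch back to the breakpointed view. */
    static event_response_t ss_cb(vmi_instance_t vmi, vmi_event_t *event)
    {
        (void)vmi;
        event->slat_id = exec_view;
        return VMI_EVENT_RESPONSE_SLAT_ID | VMI_EVENT_RESPONSE_TOGGLE_SINGLESTEP;
    }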
It works, but I end up in a situation where Xen fails at some point: after ~90 tests, it cannot populate the ring anymore:
INFO:root:==== test 92 ====
INFO:root:starting drakvuf
INFO:root:starting Ansible
INIT
xen_init_interface
xc_interface_open
create logger
allocating libxc context
init ring page
xc: error: Failed to populate ring pfn
(16 = Device or resource busy): Internal error
fail to enable monitoring: Device or resource busy
fail to init xen interface
CLOSE
Fail to init vmi
What do you think happened?
I have a call to xc_domain_setmaxmem with ~0, so this shouldn't happen, should it?
I used the compat APIs, like Drakvuf does.
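For context, the call I mean just lifts the memory cap before the extra gfns are
populated, roughly:

    /* Lift the domain's memory cap so the extra shadow/ring gfns can be
     * populated (as referenced above; exact placement may differ). */
    if (xc_domain_setmaxmem(xch, domid, ~0) < 0)
        return -1;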
@Tamas, if you could check the traps implementation.
You also have stress-test.py, which is the small test suite that I used, and
the screenshot showing the stdout preceding a test failure,
when Ansible couldn't contact the WinRM service because the domain was frozen.
Note: I stole some code from libvmi to handle page reads/writes in Xen.
PS: in the case where the domain is frozen and I destroy it, a (null) entry remains
in xl list, even though my stress-test.py process is already dead.
I have 4 of these entries in my xl list right now.
That's almost certainly a reference not being dropped on a page.
Can you run `xl debug-keys q` and paste the resulting analysis which
will be visible in `xl dmesg`?
It is probably some missing cleanup in the altp2m code.
I just checked, and I had a few "xen-drakvuf" processes still running in the background.
When the main Python process raised an exception and terminated, they became attached to systemd.
Killing them removed the (null) domain entries.
Thanks,
Mathieu
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel