Xen project Mailing List

Re: [PATCH v1 4/7] x86/vmx: add do_vmtrace_op

To: Roger Pau Monné <roger.pau@xxxxxxxxxx>

From: Michał Leszczyński <michal.leszczynski@xxxxxxx>

Date: Wed, 17 Jun 2020 21:13:05 +0200 (CEST)

Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Wei Liu <wl@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Jan Beulich <jbeulich@xxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>

Delivery-date: Wed, 17 Jun 2020 19:13:59 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

Thread-index: 2F7rUJxrX+qiejqyNWQFiyWrG+s5iA==

Thread-topic: x86/vmx: add do_vmtrace_op

----- 16 cze 2020 o 19:23, Roger Pau Monné roger.pau@xxxxxxxxxx napisał(a): > On Tue, Jun 16, 2020 at 05:22:06PM +0200, Michał Leszczyński wrote: >> Provide an interface for privileged domains to manage >> external IPT monitoring. >> >> Signed-off-by: Michal Leszczynski <michal.leszczynski@xxxxxxx> > > Thanks for the patch! I have some questions below which require your > input. > >> --- >> xen/arch/x86/hvm/hvm.c | 170 ++++++++++++++++++++++++++++++++ >> xen/include/public/hvm/hvm_op.h | 27 +++++ >> 2 files changed, 197 insertions(+) >> >> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c >> index 5bb47583b3..9292caebe0 100644 >> --- a/xen/arch/x86/hvm/hvm.c >> +++ b/xen/arch/x86/hvm/hvm.c >> @@ -4949,6 +4949,172 @@ static int compat_altp2m_op( >> return rc; >> } >> >> +static int do_vmtrace_op( >> + XEN_GUEST_HANDLE_PARAM(void) arg) > > No need for the newline, this can fit on a single line. > >> +{ >> + struct xen_hvm_vmtrace_op a; >> + struct domain *d = NULL; > > I don't think you need to init d to NULL (at least by looking at the > current code below). > >> + int rc = -EFAULT; > > No need to init rc. > >> + int i; > > unsigned since it's used as a loop counter. > >> + struct vcpu *v; >> + void* buf; > > Nit: '*' should be prepended to the variable name. > >> + uint32_t buf_size; > > size_t > >> + uint32_t buf_order; > > Order is generally fine using unsigned int, no need to use a > specifically sized type. > >> + uint64_t buf_mfn; > > Could this use the mfn type? > >> + struct page_info *pg; >> + >> + if ( !hvm_ipt_supported() ) >> + return -EOPNOTSUPP; >> + >> + if ( copy_from_guest(&a, arg, 1) ) >> + return -EFAULT; >> + >> + if ( a.version != HVMOP_VMTRACE_INTERFACE_VERSION ) >> + return -EINVAL; >> + >> + switch ( a.cmd ) >> + { >> + case HVMOP_vmtrace_ipt_enable: >> + case HVMOP_vmtrace_ipt_disable: >> + case HVMOP_vmtrace_ipt_get_buf: >> + case HVMOP_vmtrace_ipt_get_offset: >> + break; >> + >> + default: >> + return -EOPNOTSUPP; >> + } >> + >> + d = rcu_lock_domain_by_any_id(a.domain); >> + >> + if ( d == NULL ) >> + return -ESRCH; >> + >> + if ( !is_hvm_domain(d) ) >> + { >> + rc = -EOPNOTSUPP; >> + goto out; >> + } >> + >> + domain_pause(d); >> + >> + if ( a.vcpu >= d->max_vcpus ) >> + { >> + rc = -EINVAL; >> + goto out; >> + } >> + >> + v = d->vcpu[a.vcpu]; >> + >> + if ( a.cmd == HVMOP_vmtrace_ipt_enable ) > > Please use a switch here, you might even consider re-using the switch > from above and moving the domain checks before actually checking the > command field, so that you don't need to perform two switches against > a.cmd. > >> + { >> + if ( v->arch.hvm.vmx.ipt_state ) { > > Coding style, brace should be on newline (there are more below which > I'm not going to comment on). > >> + // already enabled > > Comments should use /* ... */, there multiple instances of this below > which I'm not going to comment on, please check CODING_STYLE. > > Also, the interface looks racy, I think you are missing a lock to > protect v->arch.hvm.vmx.ipt_state from being freed under your feet if > you issue concurrent calls to the interface. > >> + rc = -EINVAL; >> + goto out; >> + } >> + >> + if ( a.size < PAGE_SIZE || a.size > 1000000 * PAGE_SIZE ) { > > You can use GB(4) which is easier to read. Should the size also be a > multiple of a PAGE_SIZE? > >> + // we don't accept trace buffer size smaller than single page >> + // and the upper bound is defined as 4GB in the specification >> + rc = -EINVAL; >> + goto out; >> + } > > Stray tab. > >> + >> + buf_order = get_order_from_bytes(a.size); >> + >> + if ( (a.size >> PAGE_SHIFT) != (1 << buf_order) ) { > > Oh here is the check. I think you can move this with the checks above > by doing a.size & ~PAGE_MASK. I belive it's more strict than a.size & ~PAGE_MASK. I think that CPU expects that the buffer size is a power of 2, so you can have 64 MB or 128 MB, but not 96 MB buffer. > >> + rc = -EINVAL; >> + goto out; >> + } >> + >> + buf = page_to_virt(alloc_domheap_pages(d, buf_order, >> MEMF_no_refcount)); > > What if alloc_domheap_pages return NULL? > > Since I think you only what the linear address of the page to zero it > I would suggest using clear_domain_page. > Hmm. This was fixed already. Most probably I did something strange with git and this change was not stored. I will correct this with patch v2. >> + buf_size = a.size; >> + >> + if ( !buf ) { >> + rc = -EFAULT; >> + goto out; >> + } >> + >> + memset(buf, 0, buf_size); >> + >> + for ( i = 0; i < (buf_size >> PAGE_SHIFT); i++ ) { >> + share_xen_page_with_privileged_guests(virt_to_page(buf) + i, >> SHARE_ro); > > This line (and some more below) exceeds 80 characters, please split > it. > >> + } >> + >> + v->arch.hvm.vmx.ipt_state = xmalloc(struct ipt_state); > > You should check that xmalloc has succeeds before trying to access > ipt_state. > >> + v->arch.hvm.vmx.ipt_state->output_base = virt_to_mfn(buf) << >> PAGE_SHIFT; >> + v->arch.hvm.vmx.ipt_state->output_mask = buf_size - 1; >> + v->arch.hvm.vmx.ipt_state->status = 0; >> + v->arch.hvm.vmx.ipt_state->ctl = RTIT_CTL_TRACEEN | RTIT_CTL_OS | >> RTIT_CTL_USR | RTIT_CTL_BRANCH_EN; > > Shouldn't the user be able to select what tracing should be enabled? > >> + } >> + else if ( a.cmd == HVMOP_vmtrace_ipt_disable ) >> + { >> + if ( !v->arch.hvm.vmx.ipt_state ) { >> + rc = -EINVAL; >> + goto out; >> + } >> + >> + buf_mfn = v->arch.hvm.vmx.ipt_state->output_base >> PAGE_SHIFT; >> + buf_size = ( v->arch.hvm.vmx.ipt_state->output_mask + 1 ) & >> 0xFFFFFFFFUL; >> + >> + for ( i = 0; i < (buf_size >> PAGE_SHIFT); i++ ) >> + { >> + if ( (mfn_to_page(_mfn(buf_mfn + i))->count_info & >> PGC_count_mask) >> != 1 ) >> + { >> + rc = -EBUSY; >> + goto out; >> + } >> + } >> + >> + xfree(v->arch.hvm.vmx.ipt_state); >> + v->arch.hvm.vmx.ipt_state = NULL; >> + >> + for ( i = 0; i < (buf_size >> PAGE_SHIFT); i++ ) >> + { >> + pg = mfn_to_page(_mfn(buf_mfn + i)); >> + put_page_alloc_ref(pg); >> + if ( !test_and_clear_bit(_PGC_xen_heap, &pg->count_info) ) >> + ASSERT_UNREACHABLE(); >> + pg->u.inuse.type_info = 0; >> + page_set_owner(pg, NULL); >> + free_domheap_page(pg); > > Hm, this seems fairly dangerous, what guarantees that the caller is > not going to map the buffer while you are trying to tear it down? > > You perform a check before freeing ipt_state, but between the check > and the actual tearing down the domain might have setup mappings to > them. > > I wonder, could you expand a bit on why trace buffers are allocated > from domheap memory by Xen? In general, I thought it would be good to account trace buffers for particular DomUs, so it would be easier to troubleshoot the memory usage. > > There are a couple of options here, maybe the caller could provide > it's own buffer, then Xen would take an extra reference to those pages > and setup them to be used as buffers. > > Another alternative would be to use domhep memory but not let the > caller map it directly, and instead introduce a hypercall to copy > from the internal Xen buffer into a user-provided one. > > How much memory is used on average by those buffers? That would help > decide a model that would best fit the usage. >From 4 kB to 4 GB. Right now I use 128 MB buffers and it takes just a few >seconds to fill them up completely. I think I've just copied the pattern which is already present in Xen's code, e.g. interfaces used by xenbaked/xentrace tools. > >> + } >> + } >> + else if ( a.cmd == HVMOP_vmtrace_ipt_get_buf ) >> + { >> + if ( !v->arch.hvm.vmx.ipt_state ) { >> + rc = -EINVAL; >> + goto out; >> + } >> + >> + a.mfn = v->arch.hvm.vmx.ipt_state->output_base >> PAGE_SHIFT; > > This will not work for translated domains, ie: a PVH or HVM domain > won't be able to use this interface since it has no way to request the > mapping of a specific mfn into it's physmap. I think we need to take > this into account when deciding how the interface should be, so that > we don't corner ourselves with a PV only interface. Please be aware that this is only going to be used by Dom0. Is is well-supported case that somebody is using PVH/HVM Dom0? I think that all Virtual Machine Introspection stuff currently requires to have Dom0 PV. Our main goal is to have this working well in combo with VMI. > >> + a.size = (v->arch.hvm.vmx.ipt_state->output_mask + 1) & >> 0xFFFFFFFFUL; > > You can truncate it easier by casting to uint32_t I think. > > Or even better, you could put output_mask in a union like: > > union { > uint64_t raw; > struct { > uint32_t size; > uint32_t offset; > } > } > > Then you can avoid the shifting and the castings. > >> + } >> + else if ( a.cmd == HVMOP_vmtrace_ipt_get_offset ) >> + { >> + if ( !v->arch.hvm.vmx.ipt_state ) { >> + rc = -EINVAL; >> + goto out; >> + } >> + >> + a.offset = v->arch.hvm.vmx.ipt_state->output_mask >> 32; >> + } >> + >> + rc = -EFAULT; >> + if ( __copy_to_guest(arg, &a, 1) ) >> + goto out; >> + rc = 0; >> + >> + out: >> + smp_wmb(); > > Why do you need a barrier here? > >> + domain_unpause(d); >> + rcu_unlock_domain(d); >> + >> + return rc; >> +} >> + >> +DEFINE_XEN_GUEST_HANDLE(compat_hvm_vmtrace_op_t); >> + >> static int hvmop_get_mem_type( >> XEN_GUEST_HANDLE_PARAM(xen_hvm_get_mem_type_t) arg) >> { >> @@ -5101,6 +5267,10 @@ long do_hvm_op(unsigned long op, >> XEN_GUEST_HANDLE_PARAM(void) arg) >> rc = current->hcall_compat ? compat_altp2m_op(arg) : >> do_altp2m_op(arg); >> break; >> >> + case HVMOP_vmtrace: >> + rc = do_vmtrace_op(arg); >> + break; >> + >> default: >> { >> gdprintk(XENLOG_DEBUG, "Bad HVM op %ld.\n", op); >> diff --git a/xen/include/public/hvm/hvm_op.h >> b/xen/include/public/hvm/hvm_op.h >> index 870ec52060..3bbcd54c96 100644 >> --- a/xen/include/public/hvm/hvm_op.h >> +++ b/xen/include/public/hvm/hvm_op.h >> @@ -382,6 +382,33 @@ struct xen_hvm_altp2m_op { >> typedef struct xen_hvm_altp2m_op xen_hvm_altp2m_op_t; >> DEFINE_XEN_GUEST_HANDLE(xen_hvm_altp2m_op_t); >> >> +/* HVMOP_vmtrace: Perform VM tracing related operation */ >> +#define HVMOP_vmtrace 26 >> + >> +#define HVMOP_VMTRACE_INTERFACE_VERSION 0x00000001 >> + >> +struct xen_hvm_vmtrace_op { >> + /* IN variable */ >> + uint32_t version; /* HVMOP_VMTRACE_INTERFACE_VERSION */ >> + uint32_t cmd; >> +/* Enable/disable external vmtrace for given domain */ >> +#define HVMOP_vmtrace_ipt_enable 1 >> +#define HVMOP_vmtrace_ipt_disable 2 >> +#define HVMOP_vmtrace_ipt_get_buf 3 >> +#define HVMOP_vmtrace_ipt_get_offset 4 >> + domid_t domain; > > You are missing a padding field here AFAICT. > > Roger. Thanks for your feedback, I will apply all the remaining suggestions in patch v2. Best regards, Michał Leszczyński CERT Polska

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.