
Re: [Xen-devel] [RFC 3/4] HVM x86 deprivileged mode: Code for switching into/out of deprivileged mode



On 06/08/15 17:45, Ben Catterall wrote:
> The process to switch into and out of deprivileged mode can be likened to
> setjmp/longjmp.
>
> To enter deprivileged mode, we take a copy of the stack from the guest's
> registers up to the current stack pointer. This allows us to restore the stack
> when we have finished the deprivileged mode operation, meaning we can continue
> execution from that point. This is similar to what would happen on a context switch.
>
> To exit deprivileged mode, we copy the stack back, replacing the current stack.
> We can then continue execution from where we left off, which will unwind the
> stack and free up resources. This method means that we do not need to
> change any other code paths and its invocation will be transparent to callers.
> This should allow the feature to be more easily deployed to different parts
> of Xen.
>
> Note that this copy of the stack is per-vcpu but it will contain per-pcpu data.
> Extra work is needed to properly migrate vcpus between pcpus.

Under what circumstances do you see there being persistent state in the
depriv area between calls, given that the calls are synchronous from VM
actions?

>
> The switch to and from deprivileged mode is performed using sysret and syscall
> respectively.

I suspect we need to borrow the SS attribute workaround from Linux to
make this function reliably on AMD systems.

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=61f01dd941ba9e06d2bf05994450ecc3d61b6b8b
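
For reference, the gist of that commit is roughly the following (paraphrased
from Linux's __switch_to(); an illustrative sketch rather than the exact
upstream code):

    /* On AMD, SYSRET can leave SS's cached attributes invalid, so reload
     * SS with a known-good selector when the erratum is present. */
    if (static_cpu_has_bug(X86_BUG_SYSRET_SS_ATTRS)) {
        unsigned short ss_sel;

        savesegment(ss, ss_sel);
        if (ss_sel != __KERNEL_DS)
            loadsegment(ss, __KERNEL_DS);
    }

Xen would need an equivalent fixup on the affected AMD parts.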

>
> The return paths in entry.S have been edited so that, when we receive an
> interrupt whilst in deprivileged mode, we return into that mode correctly.
>
> A hook on the syscall handler in entry.S has also been added which handles
> returning from user mode and will support deprivileged mode system calls when
> these are needed.
>
> Signed-off-by: Ben Catterall <Ben.Catterall@xxxxxxxxxx>
> ---
>  xen/arch/x86/domain.c               |  12 +++
>  xen/arch/x86/hvm/Makefile           |   1 +
>  xen/arch/x86/hvm/deprivileged.c     | 103 ++++++++++++++++++
>  xen/arch/x86/hvm/deprivileged_asm.S | 205 ++++++++++++++++++++++++++++++++++++
>  xen/arch/x86/hvm/vmx/vmx.c          |   7 ++
>  xen/arch/x86/x86_64/asm-offsets.c   |   5 +
>  xen/arch/x86/x86_64/entry.S         |  35 ++++++
>  xen/include/asm-x86/hvm/vmx/vmx.h   |   2 +
>  xen/include/xen/hvm/deprivileged.h  |  38 +++++++
>  xen/include/xen/sched.h             |  18 +++-
>  10 files changed, 425 insertions(+), 1 deletion(-)
>  create mode 100644 xen/arch/x86/hvm/deprivileged_asm.S
>
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 045f6ff..a0e5e70 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -62,6 +62,7 @@
>  #include <xen/iommu.h>
>  #include <compat/vcpu.h>
>  #include <asm/psr.h>
> +#include <xen/hvm/deprivileged.h>
>  
>  DEFINE_PER_CPU(struct vcpu *, curr_vcpu);
>  DEFINE_PER_CPU(unsigned long, cr4);
> @@ -446,6 +447,12 @@ int vcpu_initialise(struct vcpu *v)
>      if ( has_hvm_container_domain(d) )
>      {
>          rc = hvm_vcpu_initialise(v);
> +
> +        /* Initialise HVM deprivileged mode */
> +        printk("HVM initialising deprivileged mode ...");

All printk()s should have a XENLOG_$severity prefix.
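
e.g. something along the lines of (illustrative only):

    printk(XENLOG_G_DEBUG "d%dv%d: initialising HVM deprivileged mode\n",
           d->domain_id, v->vcpu_id);

or drop the message entirely once the feature is working.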

> +        hvm_deprivileged_prepare_vcpu(v);
> +        printk("Done.\n");
> +
>          goto done;
>      }
>  
> @@ -523,7 +530,12 @@ void vcpu_destroy(struct vcpu *v)
>      vcpu_destroy_fpu(v);
>  
>      if ( has_hvm_container_vcpu(v) )
> +    {
> +        /* Destroy the deprivileged mode on this vcpu */
> +        hvm_deprivileged_destroy_vcpu(v);
> +
>          hvm_vcpu_destroy(v);
> +    }
>      else
>          xfree(v->arch.pv_vcpu.trap_ctxt);
>  }
> diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
> index bd83ba3..6819886 100644
> --- a/xen/arch/x86/hvm/Makefile
> +++ b/xen/arch/x86/hvm/Makefile
> @@ -17,6 +17,7 @@ obj-y += quirks.o
>  obj-y += rtc.o
>  obj-y += save.o
>  obj-y += deprivileged.o
> +obj-y += deprivileged_asm.o
>  obj-y += stdvga.o
>  obj-y += vioapic.o
>  obj-y += viridian.o
> diff --git a/xen/arch/x86/hvm/deprivileged.c b/xen/arch/x86/hvm/deprivileged.c
> index 071d900..979fc69 100644
> --- a/xen/arch/x86/hvm/deprivileged.c
> +++ b/xen/arch/x86/hvm/deprivileged.c
> @@ -439,3 +439,106 @@ int hvm_deprivileged_copy_l1(struct domain *d,
>      }
>      return 0;
>  }
> +
> +/* Used to prepare each vcpus data for user mode. Call for each HVM vcpu.
> + */
> +int hvm_deprivileged_prepare_vcpu(struct vcpu *vcpu)
> +{
> +    struct page_info *pg;
> +
> +    /* TODO: clarify if this MEMF is correct */
> +    /* Allocate 2^STACK_ORDER contiguous pages */
> +    pg = alloc_domheap_pages(NULL, STACK_ORDER, MEMF_no_owner);
> +    if( pg == NULL )
> +    {
> +        panic("HVM: Out of memory on per-vcpu deprivileged mode init.\n");
> +        return -ENOMEM;
> +    }
> +
> +    vcpu->stack = page_to_virt(pg);

Xen has two heaps, the xenheap and the domheap.

You may only construct pointers like this into the xenheap.  The domheap
is not guaranteed to have virtual mappings.  (This code only works because
your test box isn't bigger than 5TB.  Also, there is a bug with xenheap
allocations at the same point, but I need to fix that bug.)

All access to domheap pages must strictly be within a
map_domain_page()/unmap_domain_page() region, which construct safe
temporary mappings.
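
i.e. either take the buffer from the xenheap, which does have a permanent
virtual mapping, or keep the domheap pages and only ever touch them under a
temporary mapping.  An (untested) sketch of the first option:

    vcpu->stack = alloc_xenheap_pages(STACK_ORDER, 0);
    if ( vcpu->stack == NULL )
        return -ENOMEM;

    /* ... and on teardown ... */
    free_xenheap_pages(vcpu->stack, STACK_ORDER);

If the pages are to stay in the domheap, every access needs to sit inside
map_domain_page()/unmap_domain_page() (which map a single 4k frame at a
time), or the frames need to be vmap()'d to get a contiguous mapping.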

> +    vcpu->rsp = 0;
> +    vcpu->user_mode = 0;
> +
> +    return 0;
> +}
> +
> +/* Called on destroying each vcpu */
> +void hvm_deprivileged_destroy_vcpu(struct vcpu *vcpu)
> +{
> +    free_domheap_pages(virt_to_page(vcpu->stack), STACK_ORDER);
> +}
> +
> +/* Called to perform a user mode operation.
> + * Execution context is saved and then we move into user mode.
> + * This method is then jumped into to restore execution context after
> + * exiting user mode.
> + */
> +void hvm_deprivileged_user_mode(void)
> +{
> +    struct vcpu *vcpu = get_current();
> +    unsigned long int efer = read_efer();
> +    register unsigned long sp asm("rsp");
> +
> +    ASSERT( vcpu->user_mode == 0 );
> +    ASSERT( vcpu->stack != 0 );
> +    ASSERT( vcpu->rsp == 0 );
> +
> +    /* Flip the SCE bit to allow sysret/call */
> +    write_efer(efer | EFER_SCE);
> +
> +    /* Save the msr lstar and star. Xen does lazy loading of these
> +     * so we need to put the host values in and then restore the
> +     * guest ones once we're done.
> +     */
> +    rdmsrl(MSR_LSTAR, vcpu->msr_lstar);
> +    rdmsrl(MSR_STAR, vcpu->msr_star);
> +    wrmsrl(MSR_LSTAR,get_host_msr_state()->msrs[VMX_INDEX_MSR_LSTAR]);
> +    wrmsrl(MSR_STAR, get_host_msr_state()->msrs[VMX_INDEX_MSR_STAR]);

A partial context switch like this should be implemented as two new
hvm_ops, such as hvm_op.depriv_ctxt_switch_{to,from}().

This allows you to keep the common code clean of vendor-specific code.
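
i.e. (untested sketch, using the names suggested above):

    /* In struct hvm_function_table: */
    void (*depriv_ctxt_switch_to)(struct vcpu *v);
    void (*depriv_ctxt_switch_from)(struct vcpu *v);

    /* VMX implementation, keeping the MSR juggling inside vmx.c: */
    static void vmx_depriv_ctxt_switch_to(struct vcpu *v)
    {
        struct vmx_msr_state *host = &this_cpu(host_msr_state);

        rdmsrl(MSR_LSTAR, v->msr_lstar);
        rdmsrl(MSR_STAR, v->msr_star);
        wrmsrl(MSR_LSTAR, host->msrs[VMX_INDEX_MSR_LSTAR]);
        wrmsrl(MSR_STAR, host->msrs[VMX_INDEX_MSR_STAR]);
    }

hvm_deprivileged_user_mode() then just calls
hvm_funcs.depriv_ctxt_switch_to(v), and the get_host_msr_state() export in
vmx.h goes away.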

> +
> +    /* The assembly routine to handle moving into/out of deprivileged mode */
> +    hvm_deprivileged_user_mode_asm();
> +
> +    /* If our copy failed */
> +    if( unlikely(vcpu->rsp == 0) )
> +    {
> +        gdprintk(XENLOG_ERR, "HVM: Stack too large in %s\n", __FUNCTION__);

__func__ please.  It conforms to C99 whereas __FUNCTION__ is a gnuism.
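
i.e.

    gdprintk(XENLOG_ERR, "HVM: Stack too large in %s\n", __func__);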

> +        domain_crash_synchronous();
> +    }
> +
> +    /* Debug info */
> +    vcpu->old_msr_lstar = get_host_msr_state()->msrs[VMX_INDEX_MSR_LSTAR];
> +    vcpu->old_msr_star = get_host_msr_state()->msrs[VMX_INDEX_MSR_STAR];
> +    vcpu->old_rsp = sp;
> +    vcpu->old_processor = smp_processor_id();
> +
> +    /* Restore the efer and saved msr registers */
> +    write_efer(efer);
> +    wrmsrl(MSR_LSTAR, vcpu->msr_lstar);
> +    wrmsrl(MSR_STAR, vcpu->msr_star);
> +    vcpu->user_mode = 0;
> +    vcpu->rsp = 0;
> +}
> +
> +/* Called when the user mode operation has completed
> + * Perform C-level processing on return pathx
> + */
> +void hvm_deprivileged_finish_user_mode(void)
> +{
> +    /* If we are not returning from user mode: bail */
> +    ASSERT(get_current()->user_mode == 1);
> +
> +    hvm_deprivileged_finish_user_mode_asm();
> +}
> +
> +void hvm_deprivileged_check_trap(const char* func_name)
> +{
> +    if( current->user_mode == 1 )
> +    {
> +        printk("HVM Deprivileged Mode: Trap whilst in user mode, %s\n",
> +               func_name);
> +        domain_crash_synchronous();
> +    }
> +}
> +
> +
> +
> diff --git a/xen/arch/x86/hvm/deprivileged_asm.S b/xen/arch/x86/hvm/deprivileged_asm.S
> new file mode 100644
> index 0000000..00a9e9c
> --- /dev/null
> +++ b/xen/arch/x86/hvm/deprivileged_asm.S
> @@ -0,0 +1,205 @@
> +/*
> + * HVM security enhancements assembly code
> + */
> +#include <xen/config.h>
> +#include <xen/errno.h>
> +#include <xen/softirq.h>
> +#include <asm/asm_defns.h>
> +#include <asm/apicdef.h>
> +#include <asm/page.h>
> +#include <public/xen.h>
> +#include <irq_vectors.h>
> +#include <xen/hvm/deprivileged.h>
> +
> +/* Handles entry into the deprivileged mode and returning from this
> + * mode. This requires copying the current Xen privileged stack across
> + * to a per-vcpu buffer as we need to be able to handle interrupts and
> + * exceptions whilst in this mode. Xen is non-preemptable so our
> + * privileged mode stack would  be clobbered if we did not save it.
> + *
> + * If we are entering deprivileged mode, then we use a sysret to get there.
> + * If we are returning from deprivileged mode, then we need to unwind the stack
> + * so we copy it back over the current stack so that we can return from the
> + * call path where we came in from.
> + *
> + * We're doing sort-of a long jump/set jump with copying to a stack to
> + * preserve it and allow returning code to continue executing from
> + * within this method.
> + */
> +ENTRY(hvm_deprivileged_user_mode_asm)
> +        /* Save our registers */
> +        push   %rax
> +        push   %rbx
> +        push   %rcx
> +        push   %rdx
> +        push   %rsi
> +        push   %rdi
> +        push   %rbp
> +        push   %r8
> +        push   %r9
> +        push   %r10
> +        push   %r11
> +        push   %r12
> +        push   %r13
> +        push   %r14
> +        push   %r15
> +        pushfq
> +
> +        /* Perform a near call to push rip onto the stack */
> +        call   1f
> +
> +        /* Magic: Add to the stored rip the size of the code between
> +         * label 1 and label 2. This allows us to restart execution at label 2.
> +         */
> +1:      addq   $2f-1b, (%rsp)
> +
> +        GET_CURRENT(%r8)
> +        xor    %rsi, %rsi
> +
> +        /* The following is equivalent to
> +         * (get_cpu_info() + sizeof(struct cpu_info))
> +         * This gets us to the top of the stack.
> +         */
> +        GET_STACK_BASE(%rcx)
> +        addq   $STACK_SIZE, %rcx
> +
> +        movq   VCPU_stack(%r8), %rdi
> +
> +        /* We need copy the current stack across to our buffer
> +         * Calculate the number of bytes to copy:
> +         * (top of stack - current stack pointer)
> +         * NOTE: We must not push any more data onto our stack after this point
> +         * as it won't be saved.
> +         */
> +        sub    %rsp, %rcx
> +
> +        /* If the stack is too big, we don't do the copy: handled by caller. */
> +        cmpq   $STACK_SIZE, %rcx
> +        ja     3f
> +
> +        mov    %rsp, %rsi
> +/* USER MODE ENTRY POINT */
> +2:
> +        /* More magic: If we came here from preparing to go into user mode,

There is a very fine line between magic and gross hack ;)

I haven't quite decided which this is yet, but it certainly is neat, if
rather opaque.

> +         * then we copy our current stack to the buffer (the lines above
> +         * have setup rsi, rdi and rcx to do this).
> +         *
> +         * If we came here from user mode, then we movsb to copy from
> +         * our buffer into our current stack so that we can continue
> +         * execution from the current code point, and return back to the guest
> +         * via the path we came in. rsi, rdi and rcx have been setup by the
> +         * de-privileged return path for this.
> +         */
> +        rep    movsb
> +        mov    %rsp, %rsi
> +
> +        GET_CURRENT(%r8)
> +        movq   VCPU_user_mode(%r8), %rdx
> +        movq   VCPU_rsp(%r8), %rax
> +
> +        /* If !user_mode  */
> +        cmpq   $0, %rdx
> +        jne    3f
> +        cli
> +
> +        movabs $HVM_DEPRIVILEGED_TEXT_ADDR, %rcx /* RIP in user mode */
> +
> +        movq   $0x10200, %r11          /* RFLAGS user mode enable interrupts */

Please use $(X86_EFLAGS_IF | X86_EFLAGS_MBS) to make it clearer which flags
are being set.

Also, by enabling interrupts, you need some hook to short-circuit the
scheduling softirq.  As it currently stands, a timer interrupt
interrupting depriv mode is liable to swap all your state out from under
you.

> +        movq   $1, VCPU_user_mode(%r8) /* Now in user mode */
> +        movq   %rsi, VCPU_rsp(%r8)     /* The rsp to restore to */
> +
> +        /* Stack ptr is set by user mode to avoid race conditions.

What race condition are you referring to?

> +         * See Intel manual 2 on the sysret instruction.

As a general rule, read both the Intel and the AMD manual for bits like
this.  sysret is one of the areas where implementations differ.

> +         */
> +        movq   $HVM_STACK_PTR, %rbx
> +        sysretq                         /* Enter deprivileged mode */
> +
> +3:      GET_CURRENT(%r8)
> +        movq   %rsi, VCPU_rsp(%r8)
> +        pop    %rax    /* Pop off rip: used in a jump so still on stack */
> +
> +        /* Restore registers */
> +        popfq
> +        pop    %r15
> +        pop    %r14
> +        pop    %r13
> +        pop    %r12
> +        pop    %r11
> +        pop    %r10
> +        pop    %r9
> +        pop    %r8
> +        pop    %rbp
> +        pop    %rdi
> +        pop    %rsi
> +        pop    %rdx
> +        pop    %rcx
> +        pop    %rbx
> +        pop    %rax
> +        ret
> +
> +/* Finished in user mode so return */
> +ENTRY(hvm_deprivileged_finish_user_mode_asm)
> +        /* The source is the copied stack in our buffer.
> +         * The destination is our current stack.
> +         *
> +         * We need to:
> +         * - Move the stack pointer to where it was before we entered
> +         *   deprivileged mode.
> +         * - Setup rsi, rdi and rcx as needed to perform the copy
> +         * - Jump to the address held at the top of the stack which
> +         *   is the user mode return address
> +         */
> +        cli
> +        GET_CURRENT(%rbx)
> +        movq   VCPU_stack(%rbx), %rsi
> +        movq   VCPU_rsp(%rbx), %rdi
> +
> +        /* The return address that the near call pushed onto the
> +         * buffer is pointed to by stack, so use that for rip.
> +         */
> +        movq   %rdi, %rsp
> +
> +        /* The following is equivalent to
> +         * (get_cpu_info() + sizeof(struct cpu_info) - vcpu->rsp)
> +         * This works out how many bytes we need to copy:
> +         * (top of stack - bottom of stack)
> +         */
> +        GET_STACK_BASE(%rcx)
> +        addq   $STACK_SIZE, %rcx
> +        subq   %rdi, %rcx
> +
> +        /* Go to user mode return code */
> +        jmp    *(%rsi)
> +
> +/* Entry point from the assembly syscall handlers */
> +ENTRY(hvm_deprivileged_handle_user_mode)
> +
> +        /* Handle a user mode hypercall here */
> +
> +
> +        /* We are finished in user mode */
> +        call hvm_deprivileged_finish_user_mode
> +
> +        ret
> +
> +.section .hvm_deprivileged_enhancement.text,"ax"
> +/* HVM deprivileged code */
> +ENTRY(hvm_deprivileged_ring3)
> +        /* sysret has loaded eip from rcx and rflags from r11.
> +         * CS and SS have been loaded from the MSR for ring 3.
> +         * We now need to  switch to the user mode stack
> +         */
> +        /* Setup usermode stack */
> +        movabs $HVM_STACK_PTR, %rsp
> +
> +        /* Perform user mode processing */
> +
> +        mov $0xf, %rcx
> +1: dec  %rcx
> +        cmp $0, %rcx
> +        jne 1b
> +
> +        /* Return to ring 0 */
> +        syscall
> +
> +.previous
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index c32d863..595b0f2 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -59,6 +59,8 @@
>  #include <asm/event.h>
>  #include <asm/monitor.h>
>  #include <public/arch-x86/cpuid.h>
> +#include <xen/hvm/deprivileged.h>
> +
>  
>  static bool_t __initdata opt_force_ept;
>  boolean_param("force-ept", opt_force_ept);
> @@ -194,6 +196,10 @@ void vmx_save_host_msrs(void)
>          set_bit(VMX_INDEX_MSR_ ## address, &host_msr_state->flags);     \
>      } while ( 0 )
>  
> +struct vmx_msr_state *get_host_msr_state(void) {
> +    return &this_cpu(host_msr_state);
> +}
> +
>  static enum handler_return
>  long_mode_do_msr_read(unsigned int msr, uint64_t *msr_content)
>  {
> @@ -272,6 +278,7 @@ long_mode_do_msr_write(unsigned int msr, uint64_t msr_content)
>      case MSR_LSTAR:
>          if ( !is_canonical_address(msr_content) )
>              goto uncanonical_address;
> +

Please avoid spurious changes like this.

>          WRITE_MSR(LSTAR);
>          break;
>  
> diff --git a/xen/arch/x86/x86_64/asm-offsets.c b/xen/arch/x86/x86_64/asm-offsets.c
> index 447c650..fd5de44 100644
> --- a/xen/arch/x86/x86_64/asm-offsets.c
> +++ b/xen/arch/x86/x86_64/asm-offsets.c
> @@ -115,6 +115,11 @@ void __dummy__(void)
>      OFFSET(VCPU_nsvm_hap_enabled, struct vcpu, arch.hvm_vcpu.nvcpu.u.nsvm.ns_hap_enabled);
>      BLANK();
>  
> +    OFFSET(VCPU_stack, struct vcpu, stack);
> +    OFFSET(VCPU_rsp, struct vcpu, rsp);
> +    OFFSET(VCPU_user_mode, struct vcpu, user_mode);
> +    BLANK();
> +
>      OFFSET(DOMAIN_is_32bit_pv, struct domain, arch.is_32bit_pv);
>      BLANK();
>  
> diff --git a/xen/arch/x86/x86_64/entry.S b/xen/arch/x86/x86_64/entry.S
> index 74677a2..fa9155c 100644
> --- a/xen/arch/x86/x86_64/entry.S
> +++ b/xen/arch/x86/x86_64/entry.S
> @@ -102,6 +102,15 @@ restore_all_xen:
>          RESTORE_ALL adj=8
>          iretq
>  
> +/* Returning from user mode */
> +handle_hvm_user_mode:
> +
> +        call hvm_deprivileged_handle_user_mode
> +
> +        /* Go back into user mode */
> +        cli
> +        jmp  restore_all_guest
> +
>  /*
>   * When entering SYSCALL from kernel mode:
>   *  %rax                            = hypercall vector
> @@ -131,6 +140,14 @@ ENTRY(lstar_enter)
>          testb $TF_kernel_mode,VCPU_thread_flags(%rbx)
>          jz    switch_to_kernel
>  
> +        /* Were we in Xen's ring 3?  */
> +        push %rbx
> +        GET_CURRENT(%rbx)
> +        movq VCPU_user_mode(%rbx), %rbx
> +        cmp  $1, %rbx
> +        je   handle_hvm_user_mode
> +        pop  %rbx

No need for the movq or rbx clobber.  This entire block can be:

    cmpb $1, VCPU_user_mode(%rbx)
    je   handle_hvm_user_mode

Similar to the $TF_kernel_mode check in context above.



> +
>  /*hypercall:*/
>          movq  %r10,%rcx
>          cmpq  $NR_hypercalls,%rax
> @@ -487,6 +504,13 @@ ENTRY(common_interrupt)
>  /* No special register assumptions. */
>  ENTRY(ret_from_intr)
>          GET_CURRENT(%rbx)
> +
> +        /* If we are in Xen's user mode, return into it */
> +        cmpq $1,VCPU_user_mode(%rbx)
> +        cli
> +        je    restore_all_guest
> +        sti
> +

None of this should be necessary - the exception frame created by
lstar_enter should cause ret_from_intr to do the correct thing.

>          testb $3,UREGS_cs(%rsp)
>          jz    restore_all_xen
>          movq  VCPU_domain(%rbx),%rax
> @@ -509,6 +533,14 @@ handle_exception_saved:
>          GET_CURRENT(%rbx)
>          PERFC_INCR(exceptions, %rax, %rbx)
>          callq *(%rdx,%rax,8)
> +
> +        /* If we are in Xen's user mode, return into it */
> +        /* TODO: Test this path */
> +        cmpq  $1,VCPU_user_mode(%rbx)
> +        cli
> +        je    restore_all_guest
> +        sti
> +
>          testb $3,UREGS_cs(%rsp)
>          jz    restore_all_xen
>          leaq  VCPU_trap_bounce(%rbx),%rdx
> @@ -664,6 +696,9 @@ handle_ist_exception:
>          movl  $EVENT_CHECK_VECTOR,%edi
>          call  send_IPI_self
>  1:      movq  VCPU_domain(%rbx),%rax
> +        /* This also handles Xen ring3 return for us.
> +         * So, there is no need to explicitly do a user mode check.
> +         */
>          cmpb  $0,DOMAIN_is_32bit_pv(%rax)
>          je    restore_all_guest
>          jmp   compat_restore_all_guest
> diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
> index 3fbfa44..98e269e 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> @@ -565,4 +565,6 @@ typedef struct {
>      u16 eptp_index;
>  } ve_info_t;
>  
> +struct vmx_msr_state *get_host_msr_state(void);
> +
>  #endif /* __ASM_X86_HVM_VMX_VMX_H__ */
> diff --git a/xen/include/xen/hvm/deprivileged.h b/xen/include/xen/hvm/deprivileged.h
> index 6cc803e..e42f39a 100644
> --- a/xen/include/xen/hvm/deprivileged.h
> +++ b/xen/include/xen/hvm/deprivileged.h
> @@ -68,6 +68,37 @@ int hvm_deprivileged_copy_l1(struct domain *d,
>                               unsigned int l1_flags);
>  
>  
> +/* Used to prepare each vcpu's data for user mode. Call for each HVM vcpu. */
> +int hvm_deprivileged_prepare_vcpu(struct vcpu *vcpu);
> +
> +/* Destroy each vcpu's data for Xen user mode. Again, call for each vcpu. */
> +void hvm_deprivileged_destroy_vcpu(struct vcpu *vcpu);
> +
> +/* Called to perform a user mode operation. */
> +void hvm_deprivileged_user_mode(void);
> +
> +/* Called when the user mode operation has completed */
> +void hvm_deprivileged_finish_user_mode(void);
> +
> +/* Called to move into and then out of user mode. Needed for accessing
> + * assembly features.
> + */
> +void hvm_deprivileged_user_mode_asm(void);
> +
> +/* Called on the return path to return to the correct execution point */
> +void hvm_deprivileged_finish_user_mode_asm(void);
> +
> +/* Handle any syscalls that the user mode makes */
> +void hvm_deprivileged_handle_user_mode(void);
> +
> +/* The ring 3 code */
> +void hvm_deprivileged_ring3(void);
> +
> +/* Call when inside a trap that should cause a domain crash if in user mode
> + * e.g. an invalid_op is trapped whilst in user mode.
> + */
> +void hvm_deprivileged_check_trap(const char* func_name);
> +
>  /* The segments where the user mode .text and .data are stored */
>  extern unsigned long int __hvm_deprivileged_text_start;
>  extern unsigned long int __hvm_deprivileged_text_end;
> @@ -91,4 +122,11 @@ extern unsigned long int __hvm_deprivileged_data_end;
>  
>  #define HVM_ERR_PG_ALLOC -1
>  
> +/* The user mode stack pointer.
> ++ * The stack grows down so set this to top of the stack region. Then,
> ++ * as this is 0-indexed, move into the stack, not just after it.
> ++ * Subtract 16 bytes for correct stack alignment.
> ++ */
> +#define HVM_STACK_PTR (HVM_DEPRIVILEGED_STACK_ADDR + STACK_SIZE - 16)
> +
>  #endif
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index 73d3bc8..180643e 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -137,7 +137,7 @@ void evtchn_destroy_final(struct domain *d); /* from complete_domain_destroy */
>  
>  struct waitqueue_vcpu;
>  
> -struct vcpu 
> +struct vcpu

Trailing whitespace is nasty, but we avoid inflating the patch by
dropping whitespace on lines not touched by semantic changes.

>  {
>      int              vcpu_id;
>  
> @@ -158,6 +158,22 @@ struct vcpu
>  
>      void            *sched_priv;    /* scheduler-specific data */
>  
> +    /* HVM deprivileged mode state */
> +    void *stack;             /* Location of stack to save data onto */
> +    unsigned long rsp;       /* rsp of our stack to restore our data to */
> +    unsigned long user_mode; /* Are we (possibly moving into) in user mode? */
> +
> +    /* The mstar of the processor that we are currently executing on.
> +     *  we need to save this because Xen does lazy saving of these.
> +     */
> +    unsigned long int msr_lstar; /* lstar */
> +    unsigned long int msr_star;

There should be no need to store this like this.  Follow what the
current context switching code does.

~Andrew

> +
> +    /* Debug info */
> +    unsigned long int old_rsp;
> +    unsigned long int old_processor;
> +    unsigned long int old_msr_lstar;
> +    unsigned long int old_msr_star;
>      struct vcpu_runstate_info runstate;
>  #ifndef CONFIG_COMPAT
>  # define runstate_guest(v) ((v)->runstate_guest)


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

