
Re: [Xen-devel] [PATCH 4/5] x86/alternative: Implement NMI/#MC-safe patching



>>> On 29.01.18 at 16:38, <andrew.cooper3@xxxxxxxxxx> wrote:
> +bool init_or_livepatch text_poke_live(const struct cpu_user_regs *regs)
> +{
> +    struct live_poke_info *i = &live_poke_info;
> +
> +    if ( unlikely(i->cpu != smp_processor_id()) )
> +    {
> +        ASSERT(regs);
> +
> +        /*
> +         * We hit a breakpoint, on a CPU which was not performing the
> +         * patching.  This is only expected to be possible for the NMI/#MC
> +         * paths, and even then, only if we hit the tiny race window below
> +         * while patching in the NMI/#MC handlers.
> +         *
> +         * We can't safely evaluate whether we hit a transient breakpoint
> +         * because i->cpu has likely completed the patch and moved on to the
> +         * next patch site.
> +         *
> +         * Go to sleep for a bit and synchronise the pipeline as we are now in
> +         * a cross-modifying scenario.
> +         */
> +        cpu_relax();
> +        cpuid_eax(0);
> +
> +        /*
> +         * Presume that we hit the transient breakpoint, as we can't confirm
> +         * whether we did or not.  We don't expect there to be any other
> +         * breakpoints to hit, but if we did hit another one, then in the
> +         * worst case we will only livelock until i->cpu has finished all of
> +         * its patching.
> +         */
> +        return true;
> +    }
> +
> +    /*
> +     * We are the CPU performing the patching, and might have ended up here by
> +     * hitting a breakpoint.
> +     *
> +     * Either way, we need to complete this particular patch to make forwards
> +     * progress.  This logic is safe even if executed recursively in the
> +     * breakpoint handler; the worst that will happen when normal execution
> +     * resumes is that we will rewrite the same bytes a second time.
> +     */
> +
> +    /* First, insert a breakpoint to prevent execution of the patch site. */
> +    i->addr[0] = 0xcc;
> +    smp_wmb();

This is necessary, but not sufficient when replacing more than a
single insn: The other CPU may be executing instructions _after_
the initial one that is being replaced, and ...

> +    /* Second, copy the remaining instructions into place. */
> +    memcpy(i->addr + 1, i->opcode + 1, i->len - 1);

... this may be altering things underneath its feet.

> @@ -153,7 +231,31 @@ void init_or_livepatch add_nops(void *insns, unsigned int len)
>  void init_or_livepatch noinline
>  text_poke(void *addr, const void *opcode, size_t len, bool live)
>  {
> -    memcpy(addr, opcode, len);
> +    if ( !live || len == 1 )
> +    {
> +        /*
> +         * If we know *addr can't be executed, or we are patching a single
> +         * byte, it is safe to use a straight memcpy().
> +         */
> +        memcpy(addr, opcode, len);

Is it really worth special casing this? Whether to actually ack
patches 2 and 3 depends on that.

> +    }
> +    else
> +    {
> +        /*
> +         * If not, arrange safe patching via the use of breakpoints.  Ordering
> +         * of actions here is between this CPU and the instruction fetch of
> +         * the breakpoint exception handler on any CPU.
> +         */
> +        live_poke_info = (struct live_poke_info){
> +            addr, opcode, len, smp_processor_id()

Better use C99 field initializers here? At which point (together with
the next comment) it may become better not to use a compound
initializer in the first place.

> +        };
> +        smp_wmb();
> +        active_text_patching = true;
> +        smp_wmb();
> +        text_poke_live(NULL);
> +        smp_wmb();
> +        active_text_patching = false;

Perhaps better to zap live_poke_info.cpu again here? That could
in fact replace the separate active_text_patching flag.

> --- a/xen/arch/x86/traps.c
> +++ b/xen/arch/x86/traps.c
> @@ -119,6 +119,8 @@ boolean_param("ler", opt_ler);
>  #define stack_words_per_line 4
>  #define ESP_BEFORE_EXCEPTION(regs) ((unsigned long *)regs->rsp)
>  
> +bool active_text_patching;

Why here rather than in alternative.c?

> --- a/xen/arch/x86/x86_64/entry.S
> +++ b/xen/arch/x86/x86_64/entry.S
> @@ -530,7 +530,10 @@ ENTRY(page_fault)
>          movl  $TRAP_page_fault,4(%rsp)
>  /* No special register assumptions. */
>  GLOBAL(handle_exception)
> -        SAVE_ALL CLAC
> +        SAVE_ALL
> +
> +handle_exception_gprs_saved:
> +        ASM_CLAC

I'm not convinced it is a good idea to defer the CLAC here. I see
no problem doing the CLAC below in the INT3 path before jumping
here.

> @@ -686,9 +689,36 @@ ENTRY(debug)
>          jmp   handle_exception
>  
>  ENTRY(int3)
> +        /* For patching-safety, there must not be any alternatives here. */
>          pushq $0
>          movl  $TRAP_int3,4(%rsp)
> -        jmp   handle_exception
> +
> +        /* If there is no patching active, continue normally.  */
> +        cmpb  $1, active_text_patching(%rip)

I think it is better to compare against zero in cases like this. But
then - is this safe? There's no guarantee that the INT3 handling
on a non-patching CPU makes it here before the patching CPU
clears the flag again.

Additionally the comment near anything like this should probably
call out that no CPU can be in guest context at the point of any
patching activity (otherwise you'd have to check the saved
CS.RPL first).

> +        jne   handle_exception
> +
> +        /*
> +         * We hit a debug trap, but not necessarily the one from active
> +         * patching.  Let text_poke_live() work out what to do.
> +         */
> +        SAVE_ALL
> +        mov   %rsp, %rdi
> +        call  text_poke_live
> +
> +        /*
> +         * Does text_poke_live() think we hit the transient debug trap?  If
> +         * not, continue down the normal int3 handler.
> +         */
> +        cmp   $0, %eax

test %al, %al (and then jz below)

> +        je    handle_exception_gprs_saved
> +
> +        /*
> +         * We think we hit the transient debug trap.  text_poke_live() has
> +         * probably completed the patching, so rewind the instruction pointer
> +         * and try again.
> +         */
> +        subq  $1, UREGS_rip(%rsp)
> +        jmp   restore_all_xen

If it was an NMI that was interrupted, you mustn't return using
IRET.

Overall I think your attempt to solve two problems at once here
goes beyond what we need immediately: Live patching in an NMI/
#MC safe way ought to be a second step. The immediate goal
should be to make boot time alternatives patching NMI/#MC safe,
and that can be had with a slimmed down version of this patch
(as at that time only a single CPU is up); most of the not purely
mechanical issues pointed out above are solely related to the
live patching scenario.

Jan
