
Re: [Xen-devel] [PATCH v3 07/23] xsplice: Implement support for applying/reverting/replacing patches. (v5)



On 23/02/2016 20:41, Konrad Rzeszutek Wilk wrote:
> . snip..
>>> + * Note that because of this NOP code the do_nmi is not safely patchable.
>>> + * Also if we do receive 'real' NMIs we have lost them.
>> The MCE path needs consideration as well.  Unlike the NMI path however,
>> that one cannot be ignored.
>>
>> In both cases, it might be best to see about raising a tasklet or
>> softirq to pick up some deferred work.
> I will put that in a separate patch as this patch is big enough.

Actually, after further thought, raising a tasklet won't help.

The biggest risk is the SMAP alternative in the asm entry points.  The
only way patching that can be made safe is to play games with debug
traps, i.e.:

Patch a 0xcc first
Then patch the rest of the bytes in the replacement
Then replace the 0xcc with the first byte of the replacement

This way, if the code path is hit while patching is in progress, you will
end up in the debug trap handler rather than executing junk.  There then
have to be some scheduling games so the NMI/MCE handler can take over
patching the code if it interrupted the patching pcpu.  Patching is in
principle a short operation, so performing it in the handlers is not too
much of a problem.
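
Roughly, the patch-site sequence would look like the sketch below (purely
illustrative, not the xsplice code; the function and helper names are made
up, and the real thing also needs the debug trap handler to park or emulate
whatever hits the 0xcc, plus a serialising IPI for the other CPUs):

    /* Sketch only - names here are made up for illustration. */
    static void text_poke_via_bp(uint8_t *dst, const uint8_t *repl, size_t len)
    {
        ASSERT(len >= 1);

        dst[0] = 0xcc;                        /* 1) trap anyone who wanders in */
        smp_wmb();

        memcpy(dst + 1, repl + 1, len - 1);   /* 2) write bytes 1 .. len-1 */
        smp_wmb();

        dst[0] = repl[0];                     /* 3) swap the 0xcc for the real
                                                    first byte */
    }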

The tricky part is patching the top of the debug trap handler itself without
ending up in an infinite loop.  I have a cunning idea, and will see if I
can find some copious free time to experiment with it.

For v1, however, the implementation is fine.  It can be documented that
patching functions on the NMI/MCE path is liable to end in sadness, and
the asm entry points will have been taken care of during boot.

>
>>> + */
>>> +static int mask_nmi_callback(const struct cpu_user_regs *regs, int cpu)
>>> +{
>>> +    return 1;
>>> +}
>>> +
>>> +static void reschedule_fn(void *unused)
>>> +{
>>> +    smp_mb(); /* Synchronize with setting do_work */
>>> +    raise_softirq(SCHEDULE_SOFTIRQ);
>> As you have to IPI each processor to raise a schedule softirq, you can
>> set a per-cpu "xsplice enter rendezvous" variable.  This prevents the
>> need for the return-to-guest path to poll one single byte.
> .. Not sure I follow. The IPI we send to the other CPU is 0xfb - which
> causes smp_call_function_interrupt() to run, which calls this function:
> reschedule_fn(). Then raise_softirq() sets the bit in softirq_pending.

Correct

>
> Great. Since we caused an IPI, we end up taking a VMEXIT, which eventually
> ends up calling process_pending_softirqs(), which calls schedule().
> And after that it calls check_for_xsplice_work().

Correct

> Are you suggesting adding a new softirq that would call into
> check_for_xsplice_work()?

No.  I am concerned that check_for_xsplice_work() is reading a single
global variable which, when patching is occurring, will be in a
repeatedly-dirtied cacheline.

> Or are you suggesting we skip the softirq_pending check and all the code
> around it, and instead have each VMEXIT code path check this per-cpu
> "xsplice enter" variable? If so, why not use the existing softirq
> infrastructure?

What I am suggesting is having reschedule_fn() set
this_cpu(xsplice_work_pending) = 1 and have check_for_xsplice_work()
check this_cpu(xsplice_work_pending) rather than the global semaphore.

This should ease the impact on the cache coherency fabric.
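
Concretely, something along the lines of the sketch below (untested, just to
illustrate the suggestion; the per-cpu variable name is the one used above):

    static DEFINE_PER_CPU(bool_t, xsplice_work_pending);

    static void reschedule_fn(void *unused)
    {
        this_cpu(xsplice_work_pending) = 1;
        smp_mb(); /* Synchronize with setting do_work, as before. */
        raise_softirq(SCHEDULE_SOFTIRQ);
    }

    void check_for_xsplice_work(void)
    {
        /* Fast path now only touches a cacheline local to this CPU. */
        if ( likely(!this_cpu(xsplice_work_pending)) )
            return;

        this_cpu(xsplice_work_pending) = 0;
        /* ... fall through into the existing rendezvous logic ... */
    }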

>
> .. snip..
>>> +}
>>> +
>>> +void do_xsplice(void)
>>> +{
>>> +    struct payload *p = xsplice_work.data;
>>> +    unsigned int cpu = smp_processor_id();
>>> +
>>> +    /* Fast path: no work to do. */
>>> +    if ( likely(!xsplice_work.do_work) )
>>> +        return;
>>> +    ASSERT(local_irq_is_enabled());
>>> +
>>> +    /* Set at -1, so will go up to num_online_cpus - 1 */
>>> +    if ( atomic_inc_and_test(&xsplice_work.semaphore) )
>>> +    {
>>> +        unsigned int total_cpus;
>>> +
>>> +        if ( !get_cpu_maps() )
>>> +        {
>>> +            printk(XENLOG_DEBUG "%s: CPU%u - unable to get cpu_maps lock.\n",
>>> +                   p->name, cpu);
>>> +            xsplice_work.data->rc = -EBUSY;
>>> +            xsplice_work.do_work = 0;
>>> +            return;
>> This error path leaves a ref in the semaphore.
> It does. And it also does so in xsplice_do_single() - if the xsplice_do_wait()
> fails, 
>>> +        }
>>> +
>>> +        barrier(); /* MUST do it after get_cpu_maps. */
>>> +        total_cpus = num_online_cpus() - 1;
>>> +
>>> +        if ( total_cpus )
>>> +        {
>>> +            printk(XENLOG_DEBUG "%s: CPU%u - IPIing the %u CPUs.\n", p->name,
>>> +                   cpu, total_cpus);
>>> +            smp_call_function(reschedule_fn, NULL, 0);
>>> +        }
>>> +        (void)xsplice_do_single(total_cpus);
> .. here, we never decrement the semaphore.
>
> Which is a safeguard (I will document that).
>
> The issue here is that say we have two CPUs:
>
> CPU0                          CPU1
>
> semaphore=0                   semaphore=1
>  !get_cpu_maps()
>   do_work = 0;                  .. now goes in the 'slave' part below
>                                    and exits out as do_work=0
>
> Now if we decremented the semaphore back on the error path:
>
> CPU0                          CPU1
>
> semaphore=0                   
>  !get_cpu_maps()
>                               .. do_work is still set.
>   do_work = 0;                        
>                    
>   semaphore=-1
>                               atomic_inc_and_test() takes semaphore to 0,
>                               .. so now it assumes the role of a master.
>
>                               .. it will fail as the other CPU will never
>                                  rendezvous (the do_work is set to zero).
>                               But we waste another 30ms spinning.
>
>
> The end result is that after patching the semaphore should equal
> num_online_cpus-1.

Yay concurrency!  I am going to have to consider this more closely.

~Andrew
