Re: [Xen-devel] [PATCH v3 07/23] xsplice: Implement support for applying/reverting/replacing patches. (v5)
.. snip..
> > + * Note that because of this NOP code the do_nmi is not safely patchable.
> > + * Also if we do receive 'real' NMIs we have lost them.
>
> The MCE path needs consideration as well.  Unlike the NMI path however,
> that one cannot be ignored.
>
> In both cases, it might be best to see about raising a tasklet or
> softirq to pick up some deferred work.

I will put that in a separate patch, as this patch is big enough.

> > + */
> > +static int mask_nmi_callback(const struct cpu_user_regs *regs, int cpu)
> > +{
> > +    return 1;
> > +}
> > +
> > +static void reschedule_fn(void *unused)
> > +{
> > +    smp_mb(); /* Synchronize with setting do_work */
> > +    raise_softirq(SCHEDULE_SOFTIRQ);
>
> As you have to IPI each processor to raise a schedule softirq, you can
> set a per-cpu "xsplice enter rendezvous" variable.  This prevents the
> need for the return-to-guest path to poll one single byte.

Not sure I follow.  The IPI we send to the other CPU is 0xfb - which
makes smp_call_function_interrupt() run, which calls this function:
reschedule_fn().  Then raise_softirq() sets the bit in softirq_pending.

Since we caused an IPI, we end up taking a VMEXIT, which eventually
calls process_pending_softirqs(), which calls schedule().  And after
that it calls check_for_xsplice_work().

Are you suggesting to add a new softirq that would call
check_for_xsplice_work()?  Or are you suggesting to skip the
softirq_pending check and all the code around it, and instead have
each VMEXIT code path check this per-cpu "xsplice enter" variable?
If so, why not use the existing softirq infrastructure?

.. snip..
> > +}
> > +
> > +void do_xsplice(void)
> > +{
> > +    struct payload *p = xsplice_work.data;
> > +    unsigned int cpu = smp_processor_id();
> > +
> > +    /* Fast path: no work to do. */
> > +    if ( likely(!xsplice_work.do_work) )
> > +        return;
> > +    ASSERT(local_irq_is_enabled());
> > +
> > +    /* Set at -1, so will go up to num_online_cpus - 1 */
> > +    if ( atomic_inc_and_test(&xsplice_work.semaphore) )
> > +    {
> > +        unsigned int total_cpus;
> > +
> > +        if ( !get_cpu_maps() )
> > +        {
> > +            printk(XENLOG_DEBUG "%s: CPU%u - unable to get cpu_maps lock.\n",
> > +                   p->name, cpu);
> > +            xsplice_work.data->rc = -EBUSY;
> > +            xsplice_work.do_work = 0;
> > +            return;
>
> This error path leaves a ref in the semaphore.

It does.  And it also does so in xsplice_do_single() - if
xsplice_do_wait() fails,

> > +        }
> > +
> > +        barrier(); /* MUST do it after get_cpu_maps. */
> > +        total_cpus = num_online_cpus() - 1;
> > +
> > +        if ( total_cpus )
> > +        {
> > +            printk(XENLOG_DEBUG "%s: CPU%u - IPIing the %u CPUs.\n", p->name,
> > +                   cpu, total_cpus);
> > +            smp_call_function(reschedule_fn, NULL, 0);
> > +        }
> > +        (void)xsplice_do_single(total_cpus);

.. here we never decrement the semaphore.  That is a safe-guard (I will
document it).

The issue is, say we have two CPUs:

CPU0                            CPU1
semaphore=0
                                semaphore=1
!get_cpu_maps()
do_work = 0;
                                .. now goes into the 'slave' part below
                                   and exits as do_work=0

Now if we decremented the semaphore back on the error path:

CPU0                            CPU1
semaphore=0
!get_cpu_maps()
                                .. do_work is still set.
do_work = 0;
semaphore=-1
                                atomic_inc_and_test(semaphore) succeeds
                                (semaphore == 0)
                                .. it now assumes the role of a master.
                                .. it will fail, as the other CPU will
                                   never rendezvous (do_work is set to
                                   zero), but we waste another 30ms
                                   spinning.

The end result is that after patching the semaphore should equal
num_online_cpus - 1.
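To spell the argument out in code, a trimmed-down sketch of the flow
(not the patch itself; do_xsplice_sketch() is just an illustrative
name, the xsplice_work fields are as in the patch):

/*
 * Sketch only - most of the error handling and the actual patching
 * are elided.
 */
void do_xsplice_sketch(void)
{
    if ( likely(!xsplice_work.do_work) )
        return;

    /* Counter starts at -1, so only the CPU taking it to 0 is master. */
    if ( atomic_inc_and_test(&xsplice_work.semaphore) )
    {
        if ( !get_cpu_maps() )
        {
            xsplice_work.data->rc = -EBUSY;
            xsplice_work.do_work = 0;
            /*
             * No atomic_dec() on purpose.  A CPU arriving after this
             * point still increments the counter, does not become
             * master, and in the slave branch below sees do_work == 0
             * and exits right away.  With a decrement here it could
             * instead win the master election and spin for the full
             * 30ms waiting for a rendezvous that can never complete.
             */
            return;
        }
        /* ... IPI the other CPUs and do the actual patching ... */
        put_cpu_maps();
    }
    else
    {
        /* Slave: nothing to do once the master has cleared do_work. */
        while ( xsplice_work.do_work && !xsplice_work.ready )
        {
            cpu_relax();
            smp_rmb();
        }
        /* ... rest of the slave side ... */
    }
}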
> > +
> > +        ASSERT(local_irq_is_enabled());
> > +
> > +        put_cpu_maps();
> > +
> > +        printk(XENLOG_DEBUG "%s finished with rc=%d\n", p->name, p->rc);
> > +    }
> > +    else
> > +    {
> > +        /* Wait for all CPUs to rendezvous. */
> > +        while ( xsplice_work.do_work && !xsplice_work.ready )
> > +        {
> > +            cpu_relax();
> > +            smp_rmb();
> > +        }
> > +
>
> What happens here if the rendezvous initiator times out?  Looks like we
> will spin forever waiting for do_work which will never drop back to 0.

Ross answered that, but in short: the other code path (the master) will
set do_work to zero, so we will exit this loop.

> > +        /* Disable IRQs and signal. */
> > +        local_irq_disable();
> > +        atomic_inc(&xsplice_work.irq_semaphore);
> > +
> > +        /* Wait for patching to complete. */
> > +        while ( xsplice_work.do_work )

Ditto for this.

> > +        {
> > +            cpu_relax();
> > +            smp_rmb();
> > +        }
> > +        local_irq_enable();
>
> Splitting the modification of do_work and ready across multiple
> functions makes it particularly hard to reason about the correctness of
> the rendezvous.  It would be better to have a xsplice_rendezvous()
> function whose purpose was to negotiate the rendezvous only, using local
> static state.  The action can then be just the switch() from
> xsplice_do_single().

The earlier code was like that, but it ended up being quite big.

Let me make it happen and leave the actions in xsplice_do_single()
(and rename it to xsplice_do_action()).

> > +    }
> > +}
> > +
> > diff --git a/xen/include/asm-arm/nmi.h b/xen/include/asm-arm/nmi.h
> > index a60587e..82aff35 100644
> > --- a/xen/include/asm-arm/nmi.h
> > +++ b/xen/include/asm-arm/nmi.h
> > @@ -4,6 +4,19 @@
> >  #define register_guest_nmi_callback(a)  (-ENOSYS)
> >  #define unregister_guest_nmi_callback() (-ENOSYS)
> >
> > +typedef int (*nmi_callback_t)(const struct cpu_user_regs *regs, int cpu);
> > +
> > +/**
> > + * set_nmi_callback
> > + *
> > + * Set a handler for an NMI. Only one handler may be
> > + * set. Return the old nmi callback handler.
> > + */
> > +static inline nmi_callback_t set_nmi_callback(nmi_callback_t callback)
> > +{
> > +    return NULL;
> > +}
> > +
>
> This addition suggests that there should probably be an
> arch_xsplice_prepare_rendezvous() and arch_xsplice_finish_rendezvous().

Yes indeed.  (A rough sketch of how the rendezvous split and the arch
hooks could fit together is at the end of this mail.)

>
> ~Andrew
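For completeness, the rough shape I have in mind is below.  This is
only a sketch, not final code - xsplice_rendezvous(),
xsplice_rendezvous_done(), the RENDEZVOUS_* values, xsplice_work.cmd
and apply_payload()/revert_payload() are placeholder names:

/* Placeholder names - only meant to illustrate the split. */
enum rendezvous_role { RENDEZVOUS_MASTER, RENDEZVOUS_SLAVE, RENDEZVOUS_FAILED };

static void xsplice_do_action(struct payload *p)
{
    /* The action is just a switch; no rendezvous logic in here. */
    switch ( xsplice_work.cmd )
    {
    case XSPLICE_ACTION_APPLY:
        p->rc = apply_payload(p);
        break;
    case XSPLICE_ACTION_REVERT:
        p->rc = revert_payload(p);
        break;
    default:
        p->rc = -EINVAL;
        break;
    }
}

void check_for_xsplice_work(void)
{
    enum rendezvous_role role;

    if ( likely(!xsplice_work.do_work) )
        return;

    arch_xsplice_prepare_rendezvous();      /* e.g. mask NMIs on x86. */

    /*
     * All of the do_work/ready/semaphore negotiation (master election,
     * waiting for the other CPUs, the 30ms timeout) lives in this one
     * function so it can be reasoned about in isolation.
     */
    role = xsplice_rendezvous();
    if ( role == RENDEZVOUS_MASTER )
        xsplice_do_action(xsplice_work.data);

    xsplice_rendezvous_done(role);          /* release the slaves. */

    arch_xsplice_finish_rendezvous();       /* restore NMI handling. */
}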