
Re: [Xen-devel] [PATCH v2 09/13] xsplice: Implement support for applying/reverting/replacing patches. (v2)

To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
From: Ross Lagerwall <ross.lagerwall@xxxxxxxxxx>
Date: Mon, 25 Jan 2016 11:43:10 +0000
Cc: wei.liu2@xxxxxxxxxx, ian.campbell@xxxxxxxxxx, andrew.cooper3@xxxxxxxxxx, ian.jackson@xxxxxxxxxxxxx, mpohlack@xxxxxxxxxx, stefano.stabellini@xxxxxxxxxx, jbeulich@xxxxxxxx, sasha.levin@xxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx
Delivery-date: Mon, 25 Jan 2016 11:43:25 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 01/19/2016 04:55 PM, Konrad Rzeszutek Wilk wrote:
snip

+/* Must be holding the payload_list lock. */


payload lock?

+static int schedule_work(struct payload *data, uint32_t cmd)
+{
+    /* Fail if an operation is already scheduled. */
+    if ( xsplice_work.do_work )
+        return -EAGAIN;


Hmm, I don't think EAGAIN is correct. It will cause xen-xsplice to poll for
a status update, but the operation hasn't actually been submitted.


-EBUSY -EDEADLK ?


I would choose -EBUSY.

+
+    xsplice_work.cmd = cmd;
+    xsplice_work.data = data;
+    atomic_set(&xsplice_work.semaphore, -1);
+    atomic_set(&xsplice_work.irq_semaphore, -1);
+
+    xsplice_work.ready = 0;
+    smp_wmb();
+    xsplice_work.do_work = 1;
+    smp_wmb();
+
+    return 0;
+}
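To make the -EBUSY argument concrete, here is a minimal userspace sketch using C11 atomics. It shows that the second submission is refused without queueing anything, so the caller has nothing to poll for. `struct work_state`, `work` and `schedule_work_sketch()` are hypothetical stand-ins for `xsplice_work` and `schedule_work()`. As in the patch, the sketch assumes the caller holds a lock, so the check and the set are not made atomic with respect to each other.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the patch's payload and xsplice_work state. */
struct payload { const char *name; int rc; };

struct work_state {
    atomic_bool do_work;   /* true once a command has been published */
    unsigned int cmd;
    struct payload *data;
};

static struct work_state work;

/*
 * Refuse a second submission with -EBUSY: nothing was queued, so the
 * caller must not start polling for a status update (which is what
 * -EAGAIN would suggest). Caller is assumed to hold the relevant lock.
 */
static int schedule_work_sketch(struct payload *data, unsigned int cmd)
{
    assert(data != NULL);

    if ( atomic_load(&work.do_work) )
        return -EBUSY;

    work.cmd = cmd;
    work.data = data;
    /* Release store: cmd/data become visible before do_work reads true. */
    atomic_store_explicit(&work.do_work, true, memory_order_release);
    return 0;
}
```

The release store plays the role of the `smp_wmb()` before `do_work = 1` in the patch.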
+
+/*
+ * Note that because of this NOP code the do_nmi is not safely patchable.
+ * Also if we do receive 'real' NMIs we have lost them.
+ */
+static int mask_nmi_callback(const struct cpu_user_regs *regs, int cpu)
+{
+    return 1;
+}
+
+static void reschedule_fn(void *unused)
+{
+    smp_mb(); /* Synchronize with setting do_work */
+    raise_softirq(SCHEDULE_SOFTIRQ);
+}
+
+static int xsplice_do_wait(atomic_t *counter, s_time_t timeout,
+                           unsigned int total_cpus, const char *s)
+{
+    int rc = 0;
+
+    while ( atomic_read(counter) != total_cpus && NOW() < timeout )
+        cpu_relax();
+
+    /* Log & abort. */
+    if ( atomic_read(counter) != total_cpus )
+    {
+        printk(XENLOG_DEBUG "%s: %s %u/%u\n", xsplice_work.data->name,
+               s, atomic_read(counter), total_cpus);
+        rc = -EBUSY;
+        xsplice_work.data->rc = rc;
+        xsplice_work.do_work = 0;
+        smp_wmb();
+        return rc;
+    }
+    return rc;
+}
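The bounded spin in xsplice_do_wait() can be modelled in userspace roughly as follows. This is a sketch only. It assumes CLOCK_MONOTONIC nanoseconds stand in for Xen's NOW(), and `now_ns()` and `wait_for_count()` are hypothetical names. Logging and the clearing of `do_work` on failure are left out.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds; stands in for Xen's NOW(). */
static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Spin until *counter reaches total or the absolute deadline passes.
 * Returns 0 on success, -EBUSY on timeout (the value the patch stores
 * in data->rc).
 */
static int wait_for_count(atomic_int *counter, int total, int64_t deadline_ns)
{
    assert(counter != NULL);

    while ( atomic_load(counter) != total && now_ns() < deadline_ns )
        ; /* cpu_relax() in the hypervisor */

    /* Re-check: the last CPU may have arrived just as time ran out. */
    return atomic_load(counter) == total ? 0 : -EBUSY;
}
```

Folding both outcomes into a single return also removes the duplicated `return rc;` inside the error branch of the original.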
+
+static void xsplice_do_single(unsigned int total_cpus)
+{
+    nmi_callback_t saved_nmi_callback;
+    s_time_t timeout;
+    struct payload *data, *tmp;
+    int rc;
+
+    data = xsplice_work.data;
+    timeout = data->timeout ? data->timeout : MILLISECS(30);


The design doc says that a timeout of 0 means infinity.


True. Let's update the document.
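For reference, here is a sketch of what the code under review actually does with a zero timeout. It assumes timeouts are in nanoseconds, as with Xen's s_time_t. `effective_timeout_ns()` is a hypothetical helper that mirrors the `data->timeout ? data->timeout : MILLISECS(30)` expression.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_MS 1000000ULL
/* Default taken from the patch: MILLISECS(30). */
#define DEFAULT_TIMEOUT_NS (30 * NS_PER_MS)

/*
 * Behaviour of the patch as posted: a requested timeout of 0 selects the
 * 30 ms default rather than "wait forever", so it is the design document
 * that needs updating.
 */
static uint64_t effective_timeout_ns(uint64_t requested_ns)
{
    return requested_ns ? requested_ns : DEFAULT_TIMEOUT_NS;
}
```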

+    printk(XENLOG_DEBUG "%s: timeout is %"PRI_stime"ms\n", data->name,
+           timeout / MILLISECS(1));
+
+    timeout += NOW();
+
+    if ( xsplice_do_wait(&xsplice_work.semaphore, timeout, total_cpus,
+                         "Timed out on CPU semaphore") )
+        return;
+
+    /* "Mask" NMIs. */
+    saved_nmi_callback = set_nmi_callback(mask_nmi_callback);
+
+    /* All CPUs are waiting, now signal to disable IRQs. */
+    xsplice_work.ready = 1;
+    smp_wmb();
+
+    atomic_inc(&xsplice_work.irq_semaphore);
+    if ( xsplice_do_wait(&xsplice_work.irq_semaphore, timeout, total_cpus,
+                         "Timed out on IRQ semaphore.") )
+        return;
+
+    local_irq_disable();


As far as I can tell, the mechanics of how this works haven't changed, the
code has just been reorganized. Which means the points that Martin raised
about this mechanism are still outstanding.


A bit. I added the extra timeout to both of the 'spin-around' loops and also
moved some of the barriers around. I also removed your spin-lock and used
the atomic_t mechanism to synchronize.

But the one thing that I didn't do was add a timeout to the 'workers' that
are just spinning idly. They will spin forever if, say, the 'master'
hasn't reached the IRQ semaphore part.

My thinking was that the 'workers' could also use the timeout feature
but just multiply it by two?
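A minimal sketch of that suggestion follows. It assumes each worker spins on a `ready` flag and gives up after twice the master's timeout, so the master always times out first. `worker_wait_ready()` and `now_ns()` are hypothetical helpers, not Xen functions.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds; stands in for Xen's NOW(). */
static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Worker side: wait for the master's "ready" signal, but bail out after
 * twice the master's budget instead of spinning forever.
 * Returns true if ready was observed, false on timeout.
 */
static bool worker_wait_ready(atomic_bool *ready, int64_t start_ns,
                              int64_t timeout_ns)
{
    int64_t deadline = start_ns + 2 * timeout_ns;

    /* Acquire load pairs with the master's release of "ready". */
    while ( !atomic_load_explicit(ready, memory_order_acquire) )
    {
        if ( now_ns() >= deadline )
            return false;   /* hypothetical: return to the scheduler */
    }
    return true;
}
```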

After looking at this again, I remembered that the algorithm I used is the same as the one used by stop_machine_run(). That function runs without timeouts at all (seemingly without problems), so why shouldn't this one? (The only reason stop_machine_run() itself isn't used for patching is because we need to enter the function without a stack, i.e. not from a tasklet.)
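Here is a userspace sketch of the two-phase rendezvous with no timeouts, in the style attributed to stop_machine_run() above. It is modelled with POSIX threads and C11 atomics rather than Xen CPUs. `worker()`, `master()`, `run_rendezvous()` and `NWORKERS` are hypothetical names, and this is not the code of stop_machine_run() itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NWORKERS 3  /* hypothetical number of non-master CPUs */

static atomic_int arrived;   /* phase 1: workers that entered the rendezvous */
static atomic_int irqs_off;  /* phase 2: workers that "disabled IRQs" */
static atomic_bool ready;    /* master -> workers: move to phase 2 */
static atomic_bool done;     /* master -> workers: patching finished */
static atomic_int patched;   /* stand-in for the code being patched */

/* Spin with no deadline; sched_yield() plays the role of cpu_relax(). */
static void spin_until_true(atomic_bool *flag)
{
    while ( !atomic_load(flag) )
        sched_yield();
}

static void spin_until_count(atomic_int *counter, int total)
{
    while ( atomic_load(counter) != total )
        sched_yield();
}

static void *worker(void *unused)
{
    (void)unused;
    atomic_fetch_add(&arrived, 1);
    spin_until_true(&ready);
    /* local_irq_disable() would go here in the hypervisor. */
    atomic_fetch_add(&irqs_off, 1);
    spin_until_true(&done);
    /* local_irq_enable() would go here. */
    return NULL;
}

static void master(void)
{
    spin_until_count(&arrived, NWORKERS);   /* everyone is parked */
    atomic_store(&ready, true);             /* signal: disable IRQs */
    spin_until_count(&irqs_off, NWORKERS);  /* everyone is quiescent */
    atomic_store(&patched, 1);              /* safe to modify text here */
    atomic_store(&done, true);              /* release the workers */
}

static int run_rendezvous(void)
{
    pthread_t t[NWORKERS];

    for ( int i = 0; i < NWORKERS; i++ )
    {
        int rc = pthread_create(&t[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    master();
    for ( int i = 0; i < NWORKERS; i++ )
        pthread_join(t[i], NULL);
    return atomic_load(&patched);
}
```

Without deadlines, forward progress depends on every CPU eventually entering the rendezvous. That is the same assumption a timeout-free stop_machine_run() relies on.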


--
Ross Lagerwall
