Re: [Xen-devel] [PATCH v14 for-xen-4.5 11/21] x86/VPMU: Interface for setting PMU mode and flags

On 10/28/2014 04:29 AM, Jan Beulich wrote:
On 27.10.14 at 19:52, <boris.ostrovsky@xxxxxxxxxx> wrote:
On 10/27/2014 12:24 PM, Jan Beulich wrote:
On 17.10.14 at 23:17, <boris.ostrovsky@xxxxxxxxxx> wrote:
+    mycpu = smp_processor_id();
+    if ( sync_vcpu != NULL ) /* if set, we may be in hypercall continuation */
+    {
+        if (sync_vcpu != curr_vcpu )
+            /* We are not the original caller */
+            return -EAGAIN;
That would seem to be the wrong return value then. Also, the HAP
side fix for XSA-97 taught us that identifying a caller by vCPU is
problematic - in the course of the retries the kernel's scheduler
may move the calling process to a different vCPU, yet it's still the
legitimate original caller.
If the process is rescheduled then we will time out the operation.
And potentially never be able to complete it. Not an acceptable
model imo.

I suppose I can set a bit in the argument's val to mark that particular
argument as pending a continuation completion (I don't think we need to
worry about malicious domain here since this is a privileged operation).
Privileged in the sense that it (conceptually) will always be restricted
to the hardware domain (or one granted equivalent privileges)?
Please don't forget about disaggregation - newly added code will not
get exceptions granted along the lines of XSA-77.

Also "I can set a bit in ..." is too vague to say whether that would end
up being an acceptable approach. The rationale behind the final
behavior we gave the XSA-97 fix was that if the operation is privileged
enough, it is okay for any vCPU of the originating domain to continue
the current one (including the non-determinism of which of them will
see the final successful completion of the hypercall, should more than
one of them race). I think you ought to follow that model here and
store/check the domain rather than the vCPU, in which case I don't
think you'll need any extra bit(s).
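For illustration, here is a minimal sketch of the "store/check the domain rather than the vCPU" model described above. The pared-down struct domain and struct vcpu, the sync_domain variable, the sync_claim()/sync_release() helpers and the -EBUSY return value are all assumptions made for this sketch; none of them come from the patch, and locking is left out on the assumption that the caller serializes access.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, pared-down stand-ins for Xen's struct domain / struct vcpu. */
struct domain { int domain_id; };
struct vcpu   { int vcpu_id; struct domain *domain; };

/* Domain that started the in-flight sync, or NULL if none is in flight. */
static struct domain *sync_domain;

/*
 * Returns 0 if the caller starts a new operation, 1 if it may continue
 * the in-flight one (any vCPU of the originating domain qualifies), and
 * -EBUSY (an assumed error code) if a different domain owns it.
 */
static int sync_claim(const struct vcpu *curr)
{
    if ( sync_domain == NULL )
    {
        sync_domain = curr->domain;
        return 0;
    }

    if ( sync_domain != curr->domain )
        return -EBUSY;

    return 1;
}

/* Called once the operation completes or is abandoned. */
static void sync_release(void)
{
    sync_domain = NULL;
}
```

With this shape a retry arriving on a different vCPU of the same domain is still treated as the legitimate caller, which is the property the vCPU comparison lacks.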

I am not sure that just keeping the domain ID is sufficient in this case. True, it doesn't matter which VCPU completes the operation, but what we want to avoid is having two simultaneous (and possibly different) requests from the same domain. If we keep it in some sort of static variable (as I do now with sync_vcpu) then it will be difficult to distinguish which request is the continuation and which is a new one.

What I was suggesting is keeping some sort of state in the hypercall argument, making it unique to the call. I said "a bit" but it could be, for example, setting the pad value in xen_pmu_params to some cookie (although that's probably not a particularly good idea, since the caller/domain would then have to clear it before making the hypercall). So, if we set, say, the upper bit in xen_pmu_params.val before creating the continuation, then when we come back we will know for sure that this is indeed the continuation and not a new call.

The comment about the privileged domain was meant to say that we don't need to worry that the caller may maliciously try setting this bit in the hope of causing trouble. The caller can only be a privileged guest (even if it is the "disaggregated" guest that is responsible for VPMU management) and if it does set the bit, well, it will break VPMUs but nothing else. As far as I can tell, that is.
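To make the marker-bit idea concrete, here is a rough sketch. The struct pmu_params layout (a 64-bit val plus a pad field), the PMU_CONT_PENDING name and both helpers are assumptions for illustration only; just the field names val and pad are taken from the discussion above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of xen_pmu_params; field widths are assumed. */
struct pmu_params {
    uint64_t val;
    uint32_t pad;
};

/* Assumed marker: the upper bit of val flags a pending continuation. */
#define PMU_CONT_PENDING (UINT64_C(1) << 63)

/* Set the marker just before the continuation is created. */
static void mark_continuation(struct pmu_params *p)
{
    p->val |= PMU_CONT_PENDING;
}

/*
 * On entry: report whether this call is a continuation, and strip the
 * marker so the remaining bits of val can be used as usual.
 */
static bool take_continuation(struct pmu_params *p)
{
    bool pending = (p->val & PMU_CONT_PENDING) != 0;

    p->val &= ~PMU_CONT_PENDING;
    return pending;
}
```

Since the argument lives in guest memory, the marked value would presumably have to be copied back to the guest before the continuation is created, and the upper bit of val would no longer be available for mode/flag values.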

+        goto cont_wait;
+    }
+    for_each_online_cpu ( i )
+    {
+        if ( i == mycpu )
+            continue;
+        per_cpu(sync_task, i) = xmalloc(struct tasklet);
+        if ( per_cpu(sync_task, i) == NULL )
+        {
+            printk(XENLOG_WARNING "vpmu_force_context_switch: out of 
+            ret = -ENOMEM;
+            goto out;
+        }
+        tasklet_init(per_cpu(sync_task, i), vpmu_sched_checkin, 0);
+    }
+    /* First count is for self */
+    atomic_set(&vpmu_sched_counter, 1);
+    for_each_online_cpu ( i )
+    {
+        if ( i != mycpu )
+            tasklet_schedule_on_cpu(per_cpu(sync_task, i), i);
+    }
+    vpmu_save(current);
+    sync_vcpu = curr_vcpu;
+    start = NOW();
+ cont_wait:
+    /*
+     * Note that we may fail here if a CPU is hot-plugged while we are
+     * waiting. We will then time out.
+     */
And I continue to miss the handling of the hot-unplug case (or at the
very least a note on this being unimplemented [and going to crash],
to at least clarify matters to the curious reader).
Where would we crash? I have no interest in that.
per_cpu() accesses are invalid for offline CPUs.


How about I use get_cpu_maps()/put_cpu_maps() to prevent hotplug/unplug while we are doing this?

+    while ( atomic_read(&vpmu_sched_counter) != numcpus )
+    {
+        s_time_t now;
+        cpu_relax();
+        now = NOW();
+        /* Give up after (arbitrarily chosen) 5 seconds */
+        if ( now > start + SECONDS(5) )
+        {
+            printk(XENLOG_WARNING
+                   "vpmu_force_context_switch: failed to sync\n");
+            ret = -EBUSY;
+            break;
+        }
+        if ( hypercall_preempt_check() )
+            return hypercall_create_continuation(
+                __HYPERVISOR_xenpmu_op, "i", XENPMU_mode_set);
Did you test this code path? I don't see how with the missing second
hypercall argument the continuation could reliably succeed.
I did test it (and retested it now) and it works. I guess it may be
picking the same value from the stack during continuation which is why
it does not fail.
Oh, right, hypercall argument clobbering (in debug builds) gets
skipped for continuations (and no clobbering is being done for
HVM/PVH at all). But I don't think you should rely on this, i.e. the
invocation above should get fixed in any event.

Of course. I was simply saying why the tests passed, not why it should stay this way.
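The effect of the missing argument can be shown with a small mock. This is not Xen's hypercall_create_continuation(); struct cont_state, mock_create_continuation() and the format handling ('i' read as an unsigned int, anything else as a long-sized value) are assumptions made for this sketch. The point is only that one argument is saved per format character, so a one-character format leaves the second argument slot untouched.

```c
#include <stdarg.h>

#define MAX_ARGS 6

/* Hypothetical stand-in for the register state a continuation restores. */
struct cont_state {
    unsigned int op;
    unsigned long args[MAX_ARGS];
    unsigned int nr_saved;
};

/*
 * Mock only: save one argument per format character. Slots beyond the
 * format string are not written and keep whatever they held before.
 */
static void mock_create_continuation(struct cont_state *st, unsigned int op,
                                     const char *format, ...)
{
    va_list ap;
    unsigned int n = 0;

    va_start(ap, format);
    for ( ; *format != '\0' && n < MAX_ARGS; format++, n++ )
    {
        if ( *format == 'i' )
            st->args[n] = va_arg(ap, unsigned int);
        else
            st->args[n] = va_arg(ap, unsigned long);
    }
    va_end(ap);

    st->op = op;
    st->nr_saved = n;
}
```

In this mock, a call with format "i" records only the sub-op, while a two-character format records the sub-op and the argument pointer, which is what the continuation in the patch needs in order to stop depending on stale values.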

