[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v10 11/20] x86/VPMU: Interface for setting PMU mode and flags



On 09/12/2014 02:49 AM, Jan Beulich wrote:
On 11.09.14 at 18:10, <boris.ostrovsky@xxxxxxxxxx> wrote:
On 09/11/2014 10:59 AM, Jan Beulich wrote:
On 11.09.14 at 16:12, <boris.ostrovsky@xxxxxxxxxx> wrote:
On 09/11/2014 02:44 AM, Jan Beulich wrote:
On 10.09.14 at 19:37, <boris.ostrovsky@xxxxxxxxxx> wrote:
On 09/10/2014 11:05 AM, Jan Beulich wrote:
On 04.09.14 at 05:41, <boris.ostrovsky@xxxxxxxxxx> wrote:
+ cont_wait:
+    /*
+     * Note that we may fail here if a CPU is hot-(un)plugged while we are
+     * waiting. We will then time out.
+     */
+    while ( atomic_read(&vpmu_sched_counter) != allbutself_num )
+    {
+        /* Give up after 5 seconds */
+        if ( NOW() > start + SECONDS(5) )
+        {
+            printk(XENLOG_WARNING
+                   "vpmu_force_context_switch: failed to sync\n");
+            ret = -EBUSY;
+            break;
+        }
+        cpu_relax();
+        if ( hypercall_preempt_check() )
+            return hypercall_create_continuation(
+                __HYPERVISOR_xenpmu_op, "ih", XENPMU_mode_set, arg);
+    }
I wouldn't complain about this not being synchronized with CPU
hotplug if there wasn't this hypercall continuation and relatively
long timeout. Much of the state you latch in static variables will
cause this operation to time out if in between a CPU got brought
down.
It seemed to me that if we were to correctly deal with CPU hotplug it
would add a bit too much complexity to the code. So I felt that letting
the operation timeout would be a better way out.
The please at least add a code comment making this explicit to
future readers.
Is the comment above 'while' keyword not sufficient?
Oh, it is of course. Must have not scrolled back enough...

And as already alluded to, all this looks rather fragile anyway,
even if I can't immediately spot any problems with it anymore.
The continuation is really a carry-over from earlier patch version when
I had double loops over domain and VCPUs to explicitly unload VPMUs. At
that time Andrew pointed out that these loops may take really long time
and so I added continuations.

Now that I changed that after realizing that having each PCPU go through
a context switch is sufficient perhaps I don't need it any longer. Is
the worst case scenario of being stuck here for 5 seconds (chosen
somewhat arbitrary) acceptable without continuation?
5 seconds is _way_ too long for doing this without continuation.
Then I am also adding back your other comment from this thread

   > > +long do_xenpmu_op(int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg)
   > > +{
   > > +    int ret = -EINVAL;
   > > +    xen_pmu_params_t pmu_params;
   > > +
   > > +    switch ( op )
   > > +    {
   > > +    case XENPMU_mode_set:
   > > +    {
   > > +        static DEFINE_SPINLOCK(xenpmu_mode_lock);
   > > +        uint32_t current_mode;
   > > +
   > > +        if ( !is_control_domain(current->domain) )
   > > +            return -EPERM;
   > > +
   > > +        if ( copy_from_guest(&pmu_params, arg, 1) )
   > > +            return -EFAULT;
   > > +
   > > +        if ( pmu_params.val & ~XENPMU_MODE_SELF )
   > > +            return -EINVAL;
   > > +
   > > +        /*
   > > +         * Return error is someone else is in the middle of changing
mode ---
   > > +         * this is most likely indication of two system administrators
   > > +         * working against each other
   > > +         */
   > > +        if ( !spin_trylock(&xenpmu_mode_lock) )
   > > +            return -EAGAIN;
   >
   > So what happens if you can't take the lock in a continuation? If
   > returning -EAGAIN in that case is not a problem, what do you
   > need the continuation for in the first place?

EAGAIN this case means that the caller was not able to initiate the
operation. Continuation will allow the caller to finish operation in
progress.
But that's only what you want, not what the code does. Also now
that I look again I don't think the comment really applies to this if().
Oh, I see. Then both first and second will fail.

I can make the second caller reset everything so that when continuation
gets to run it will start anew. And if it (i.e. the first caller) did
get -EAGAIN while trying to get the lock then it's just as well --- the
state will be clean when user tries this again.

As for the question why continuation is needed in the firs place ---
it's to make sure this hypercall doesn't prevent other unrelated
operations from executing. Not to manage simultaneous execution of this
hypercall from multiple VCPUs (if this is what you were asking).
No, that's not what I was asking. The point I'm trying to make is - if
the caller is in need of dealing with -EAGAIN anyway (i.e. you
require it to retry), why can't you simply return -EAGAIN also for
the case where you currently use a continuation?

You mean

 while ( atomic_read(&vpmu_sched_counter) != allbutself_num )
{
        /* Give up after 5 seconds */
        if ( NOW() > start + SECONDS(5) )
{
printk(XENLOG_WARNING "vpmu_force_context_switch: failed to sync\n");
            ret = -EBUSY;
break;
}
cpu_relax();
        if ( hypercall_preempt_check() ) {
            //return hypercall_create_continuation(
            //    __HYPERVISOR_xenpmu_op, "ih", XENPMU_mode_set, arg);
            return -EAGAIN; // plus cleanup
}
    }

?

-boris


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.