
Re: [Xen-devel] [PATCH v12 1/9] x86: add generic resource (e.g. MSR) access hypercall



On 15/07/14 03:23, Xu, Dongxiao wrote:
>> -----Original Message-----
>> From: Andrew Cooper [mailto:andrew.cooper3@xxxxxxxxxx]
>> Sent: Friday, July 11, 2014 5:25 PM
>> To: Xu, Dongxiao; Jan Beulich
>> Cc: Ian.Campbell@xxxxxxxxxx; George.Dunlap@xxxxxxxxxxxxx;
>> Ian.Jackson@xxxxxxxxxxxxx; stefano.stabellini@xxxxxxxxxxxxx;
>> xen-devel@xxxxxxxxxxxxx; konrad.wilk@xxxxxxxxxx; dgdegra@xxxxxxxxxxxxx;
>> keir@xxxxxxx
>> Subject: Re: [PATCH v12 1/9] x86: add generic resource (e.g. MSR) access
>> hypercall
>>
>> On 11/07/14 05:29, Xu, Dongxiao wrote:
>>>> -----Original Message-----
>>>> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
>>>> Sent: Friday, July 04, 2014 6:44 PM
>>>> To: Xu, Dongxiao
>>>> Cc: andrew.cooper3@xxxxxxxxxx; Ian.Campbell@xxxxxxxxxx;
>>>> George.Dunlap@xxxxxxxxxxxxx; Ian.Jackson@xxxxxxxxxxxxx;
>>>> stefano.stabellini@xxxxxxxxxxxxx; xen-devel@xxxxxxxxxxxxx;
>>>> konrad.wilk@xxxxxxxxxx; dgdegra@xxxxxxxxxxxxx; keir@xxxxxxx
>>>> Subject: Re: [PATCH v12 1/9] x86: add generic resource (e.g. MSR) access
>>>> hypercall
>>>>
>>>>>>> On 04.07.14 at 10:34, <dongxiao.xu@xxxxxxxxx> wrote:
>>>>> Add a generic resource access hypercall for the tool stack or other
>>>>> components, e.g. for accessing MSRs, port I/O, etc.
>>>> Sigh - you're still allocating an unbounded amount of memory for
>>>> passing around the input arguments, despite it being possible (and
>>>> having been suggested) to read these from the original buffer on
>>>> each iteration. You're still not properly checking for preemption
>>>> between iterations. And you're still not making use of
>>>> continue_hypercall_on_cpu(). Plus you now silently ignore the
>>>> upper 32 bits of the passed-in "idx" value as well as not
>>>> understood XEN_RESOURCE_OP_* values.
>>> continue_hypercall_on_cpu() is asynchronous, which requires that the
>>> "data" field always points to the right place before the hypercall
>>> returns.
>>> However, in our function we have a "for" loop covering multiple
>>> operations, so the "data" field is modified on each iteration, which
>>> cannot meet the continue_hypercall_on_cpu() requirements...
>>
>> This is because you are still copying all resource data at once from the
>> hypercall.
>>
>> As Jan points out, this is an unbounded allocation in Xen which must be
>> addressed.  If instead you were to copy each element one at a time, you
>> would avoid this allocation entirely and be able to correctly use
>> continue_hypercall_on_cpu().
> I've accepted the idea of copying the elements one by one, however it
> seems that it doesn't help with continue_hypercall_on_cpu().
> The full code looks like the following, where the variable "ra" is
> updated on every iteration of the "for" loop and so can't be used with
> continue_hypercall_on_cpu().
> Do you have any idea how to solve this issue and still use
> continue_hypercall_on_cpu() here?

Hmm - I am very sorry for pushing at continue_hypercall_on_cpu(), but
looking at the code, there is no possible way it can be made to work in
its current form.  It will BUG() at the second attempt to nest, making
it inappropriate to use.

For now, the on_selected_cpus() solution is acceptable, although I still
have some comments.

> static int resource_access_helper(struct xenpf_resource_op *op)
> {
>     struct xen_resource_access ra;
>     unsigned int i;
>     int ret = 0;
>
>     for ( i = 0; i < op->nr; i++ )
>     {
>         if ( copy_from_guest_offset(&ra.data, op->data, i, 1) )
>         {
>             ret = -EFAULT;
>             break;
>         }
>
>         if ( ra.data.cpu == smp_processor_id() )

The call to smp_processor_id() is expensive.  Please read it once outside
the loop (see the sketch below the quoted function).

>             resource_access_one(&ra);
>         else
>             on_selected_cpus(cpumask_of(ra.data.cpu),

ra.data.cpu needs validating, otherwise Xen could walk off the end of the
cpu_bit_bitmap array (again, see the sketch below).

>                              resource_access_one, &ra, 1);
>
>         if ( copy_to_guest_offset(op->data, i, &ra.data, 1) )
>         {
>             ret = -EFAULT;
>             break;
>         }

You still need a preemption check here, and possibly to create a
continuation.  See the uses of hypercall_create_continuation()
(do_set_trap_table() is a particularly relevant example); the sketch
below only stubs this out.

>     }
>
>     return ret;
> }
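
Putting those three points together, a rough and untested sketch of what
that could look like (the error values and the bare -ERESTART are
placeholders, not a definitive implementation):

static int resource_access_helper(struct xenpf_resource_op *op)
{
    struct xen_resource_access ra;
    unsigned int i, cpu = smp_processor_id(); /* read once, outside the loop */
    int ret = 0;

    for ( i = 0; i < op->nr; i++ )
    {
        /* Preemption point between elements; the continuation
         * bookkeeping that belongs here is sketched further down. */
        if ( i && hypercall_preempt_check() )
        {
            ret = -ERESTART;
            break;
        }

        if ( copy_from_guest_offset(&ra.data, op->data, i, 1) )
        {
            ret = -EFAULT;
            break;
        }

        /* Validate the cpu before using it to index cpumask structures. */
        if ( ra.data.cpu >= nr_cpu_ids || !cpu_online(ra.data.cpu) )
        {
            ret = -ENODEV;
            break;
        }

        if ( ra.data.cpu == cpu )
            resource_access_one(&ra);
        else
            on_selected_cpus(cpumask_of(ra.data.cpu),
                             resource_access_one, &ra, 1);

        if ( copy_to_guest_offset(op->data, i, &ra.data, 1) )
        {
            ret = -EFAULT;
            break;
        }
    }

    return ret;
}
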
>
>>
>>> For the preemption check, what about the following? Here the preemption
>>> is checked within each resource_access_one() function.
>>
>> None of this preemption works.
>>
>> In the case a hypercall gets preempted, you need to increment the guest
>> handle along to the next element to process, and decrement the count by
>> the number of elements processed in *the guest context*.
>>
>> That way, when the hypercall continues in Xen, it shall pick up with the
>> next action to perform rather than restarting from the beginning.
> Some actions (like CQM) require reading/writing the MSRs back to back;
> if the sequence is interrupted, this "continuity" can't be guaranteed.
> The RESTART return value indicates that the operations should be re-run.
> BTW, I tested it on my box, and the "failure" case doesn't happen frequently.
>
> Thanks,
> Dongxiao

-ERESTART should never be returned to userspace.  It is purely for
internal accounting.  -EAGAIN is also inappropriate in this case.
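
To illustrate, a rough sketch of the bookkeeping asked for above, assuming
the usual platform-op continuation pattern ("u_xenpf_op" stands in for the
guest handle which do_platform_op() was given, and exactly where the
updated op gets copied back to the guest is glossed over):

/* Inside the loop, in place of the bare -ERESTART above. */
if ( i && hypercall_preempt_check() )
{
    /* Make the re-entered hypercall resume at element i, not element 0. */
    guest_handle_add_offset(op->data, i);
    op->nr -= i;

    /* The updated op needs copying back into the guest's xen_platform_op
     * so that the continuation sees the new data/nr values. */
    ret = hypercall_create_continuation(__HYPERVISOR_platform_op,
                                        "h", u_xenpf_op);
    break;
}
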

Xen needs some way of knowing about the indirect MSR accesses.  This
means Xen either needs knowledge of the MSRs themselves (inflexible in
this circumstance), or userspace could set a "don't preempt yet" flag in
a xen_resource_access struct when it knows that the following access has
to be adjacent.

Improper use of the "don't preempt" flag (name subject to improvement),
i.e. "don't preempt" flags on consecutive 'ra's, or a "don't preempt"
flag between a pair of 'ra's which switches cpu, should result in a
hypercall failure.
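
As a strawman (every name below is invented for illustration; none of it
is an existing interface), that could look something like:

#define XEN_RESOURCE_ACCESS_NO_PREEMPT (1u << 0)

struct xen_resource_access_data {
    uint32_t cpu;
    uint32_t flags;    /* XEN_RESOURCE_ACCESS_NO_PREEMPT: the next entry
                        * must be processed without preemption in between. */
    /* ... resource type, idx, val ... */
};

/* In the processing loop: reject the improper uses, and suppress the
 * preemption check while a flagged pair is open. */
if ( prev_no_preempt &&
     ((ra.data.flags & XEN_RESOURCE_ACCESS_NO_PREEMPT) ||
      ra.data.cpu != prev_cpu) )
{
    ret = -EINVAL;
    break;
}
if ( !prev_no_preempt && i && hypercall_preempt_check() )
{
    /* ... create the continuation as sketched earlier ... */
}
prev_no_preempt = !!(ra.data.flags & XEN_RESOURCE_ACCESS_NO_PREEMPT);
prev_cpu = ra.data.cpu;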

~Andrew
