Re: [PATCH] x86/PV: simplify (and thus correct) guest accessor functions
On 02/09/2024 1:28 pm, Jan Beulich wrote:
> Taking a fault on a non-byte-granular insn means that the "number of
> bytes not handled" return value would need extra care in calculating, if
> we want callers to be able to derive e.g. exception context (to be
> injected to the guest) - CR2 for #PF in particular - from the value. To
> simplify things rather than complicating them, reduce inline assembly to
> just byte-granular string insns. On recent CPUs that's also supposed to
> be more efficient anyway.
>
> For singular element accessors, however, alignment checks are added,
> hence slightly complicating the code. Misaligned (user) buffer accesses
> will now be forwarded to copy_{from,to}_guest_ll().
>
> Naturally copy_{from,to}_unsafe_ll() accessors end up being adjusted the
> same way, as they're produced by mere re-processing of the same code.
> Otoh copy_{from,to}_unsafe() aren't similarly adjusted, but have their
> comments made match reality; down the road we may want to change their
> return types, e.g. to bool.
>
> Fixes: 76974398a63c ("Added user-memory accessing functionality for x86_64")
> Fixes: 7b8c36701d26 ("Introduce clear_user and clear_guest")
> Reported-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
This is definitely simpler; however, the code gen is less so.
add/remove: 0/0 grow/shrink: 42/9 up/down: 2759/-178 (2581)
> --- a/xen/arch/x86/include/asm/uaccess.h
> +++ b/xen/arch/x86/include/asm/uaccess.h
> @@ -251,7 +251,8 @@ do {
> static always_inline unsigned long
> __copy_to_guest_pv(void __user *to, const void *from, unsigned long n)
> {
> - if (__builtin_constant_p(n)) {
> + if ( __builtin_constant_p(n) && !((unsigned long)to & (n - 1)) )
> + {
The problem is that the if() condition now has to be evaluated at run
time, because the alignment check isn't constant-foldable. This in turn
forces the compiler to set up both the trivial inline case and the
function-call case at every call site, compounding the bloat.
e.g. the same example:
unsigned int foo(void *ptr)
{
    uint16_t val;

    return __copy_from_guest_pv(&val, ptr, sizeof(val));
}
now generates:
<foo>:
48 89 f8 mov %rdi,%rax
48 83 ec 08 sub $0x8,%rsp
48 89 fe mov %rdi,%rsi
83 e0 01 and $0x1,%eax
75 31 jne <foo+0x40>
0f 1f 00 nopl (%rax) // STAC
48 89 fa mov %rdi,%rdx
49 b8 ff ff ff ff ff movabs $0xffff87ffffffffff,%r8
87 ff ff
48 c7 c7 ff ff ff ff mov $0xffffffffffffffff,%rdi
49 39 d0 cmp %rdx,%r8
48 d1 df rcr %rdi
48 21 fa and %rdi,%rdx
66 8b 0a mov (%rdx),%cx
66 89 4c 24 06 mov %cx,0x6(%rsp)
0f 1f 00 nopl (%rax) // CLAC
48 83 c4 08 add $0x8,%rsp
c3 ret
90 nop
48 8d 7c 24 06 lea 0x6(%rsp),%rdi
ba 02 00 00 00 mov $0x2,%edx
e8 11 bc 03 00 call <copy_from_guest_ll>
48 83 c4 08 add $0x8,%rsp
c3 ret
If we're not sure of the alignment in the first place, then it's better
to hand off to copy_*_guest_ll than to emit logic like this.
However, I can't think of any way of doing this without forgoing the
optimisation entirely. We can't base anything on the type of the
guest-side pointer because, while a guest is expected to align it, there's
no hard requirement. In turn, this means there's essentially nothing we
can do with compiler heuristics in Xen.
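As a trivial illustration (a hypothetical guest-side fragment, not code
from this patch; the hypercall name is a placeholder):

    uint16_t v[2];
    void *buf = (char *)v + 1;            /* misaligned, but nothing forbids it */
    some_hypercall_taking_a_buffer(buf);  /* placeholder for any buffer-taking hypercall */

i.e. the pointer's C type on the Xen side gives the compiler nothing it
can legitimately assume about alignment.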
Perhaps the right thing to do is to give copy_*_guest_ll a fastpath for
aligned single accesses, and forgo the inline code generation entirely.
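For concreteness, a rough sketch of what such a fastpath might look like
(not a patch; get_guest_N() and copy_from_guest_bytewise() are placeholder
names for whatever fixed-size fixup-protected accessors and byte-granular
loop the real copy_from_guest_ll() would use):

    unsigned int copy_from_guest_ll(void *to, const void __user *from,
                                    unsigned int n)
    {
        /* Naturally aligned 1/2/4/8-byte copies take the fast path. */
        if ( n && n <= 8 && !(n & (n - 1)) &&
             !((unsigned long)from & (n - 1)) )
        {
            /*
             * Each accessor returns the number of bytes not handled,
             * matching our own return convention.
             */
            switch ( n )
            {
            case 1: return get_guest_1(to, from);
            case 2: return get_guest_2(to, from);
            case 4: return get_guest_4(to, from);
            case 8: return get_guest_8(to, from);
            }
        }

        /* Everything else goes through the existing byte-granular copy. */
        return copy_from_guest_bytewise(to, from, n);
    }

That would shrink each call site back to a plain call, while the common
aligned cases still avoid going byte by byte.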
~Andrew