
Re: [Xen-devel] [PATCH 2/4] x86/pv: Introduce pv_create_exception_frame()



>>> On 09.05.17 at 19:09, <andrew.cooper3@xxxxxxxxxx> wrote:
> On 09/05/17 16:58, Jan Beulich wrote:
>>>>> On 08.05.17 at 17:48, <andrew.cooper3@xxxxxxxxxx> wrote:
>>> +void pv_create_exception_frame(void)
>>> +{
>>> +    struct vcpu *curr = current;
>>> +    struct trap_bounce *tb = &curr->arch.pv_vcpu.trap_bounce;
>> const (twice)?
>>
>>> +    struct cpu_user_regs *regs = guest_cpu_user_regs();
>>> +    const bool user_mode_frame = !guest_kernel_mode(curr, regs);
>>> +    uint8_t *evt_mask = &vcpu_info(curr, evtchn_upcall_mask);
>>> +    unsigned long rflags;
>> Does this really need to be "long"?
> 
> The answer to several of these questions is "probably not, but that's
> how load_segments() did it".

Since you're overhauling it, I would think there's no reason not to
make sensible adjustments.

>>> +    unsigned int bytes, missing;
>>> +
>>> +    ASSERT_NOT_IN_ATOMIC();
>>> +
>>> +    if ( unlikely(null_trap_bounce(curr, tb)) )
>>> +    {
>>> +        gprintk(XENLOG_ERR, "Fatal: Attempting to inject null trap bounce\n");
>>> +        __domain_crash_synchronous();
>> Why not domain_crash() followed by "return"?
> 
> Because the existing code uses synchronous crashes.
> 
> Looking again at the callsites of pv_create_exception_frame(), we
> immediately jump back to {compat_,}test_all_events, which proceeds to
> run softirqs again.
> 
> Therefore, domain_crash() and a return should work.  (I think?)

I think so too.
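
(For concreteness, a minimal sketch of how that path might then look -
assuming the plain domain_crash(d) form used elsewhere, and that returning
here really does unwind cleanly back to {compat_,}test_all_events:)

    if ( unlikely(null_trap_bounce(curr, tb)) )
    {
        gprintk(XENLOG_ERR, "Fatal: Attempting to inject null trap bounce\n");
        domain_crash(curr->domain);   /* asynchronous crash ... */
        return;                       /* ... softirqs run again on the way out */
    }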

>>> +    }
>>> +
>>> +    /* Fold the upcall mask and architectural IOPL into the guests rflags. */
>>> +    rflags  = regs->rflags & ~(X86_EFLAGS_IF | X86_EFLAGS_IOPL);
>>> +    rflags |= ((*evt_mask ? 0 : X86_EFLAGS_IF) |
>>> +               (VM_ASSIST(curr->domain, architectural_iopl)
>>> +                ? curr->arch.pv_vcpu.iopl : 0));
>>> +
>>> +    if ( is_pv_32bit_vcpu(curr) )
>>> +    {
>>> +        /* { [ERRCODE,] EIP, CS/MASK , EFLAGS, [ESP, SS] } */
>>> +        unsigned int frame[6], *ptr = frame, ksp =
>>> +            (user_mode_frame ? curr->arch.pv_vcpu.kernel_sp : regs->esp);
>>> +
>>> +        if ( tb->flags & TBF_EXCEPTION_ERRCODE )
>>> +            *ptr++ = tb->error_code;
>>> +
>>> +        *ptr++ = regs->eip;
>>> +        *ptr++ = regs->cs | (((unsigned int)*evt_mask) << 16);
>> Do you really need the cast here?
> 
> Does it promote correctly if the top bit of the mask is set?

It's a uint8_t -> plain int conversion, which is a zero-extension.
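
(As a standalone illustration, not part of the patch - integer promotion
already zero-extends the uint8_t before the shift, so the cast can be
dropped:)

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint8_t mask = 0x80;                       /* top bit of the upcall mask set */
        unsigned int word = 0xe033 | (mask << 16); /* mask promotes to int 0x80 first */

        printf("%#x\n", word);                     /* prints 0x80e033 - no sign-extension */
        return 0;
    }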

>>> +        *ptr++ = rflags;
>>> +
>>> +        if ( user_mode_frame )
>>> +        {
>>> +            *ptr++ = regs->esp;
>>> +            *ptr++ = regs->ss;
>>> +        }
>>> +
>>> +        /* Copy the constructed frame to the guest kernel stack. */
>>> +        bytes = _p(ptr) - _p(frame);
>>> +        ksp -= bytes;
>>> +
>>> +        if ( unlikely((missing = __copy_to_user(_p(ksp), frame, bytes)) != 0) )
>> While I don't think we need to be really bothered, it's perhaps still
>> worth noting in a comment that the wrapping behavior here is
>> wrong (and slightly worse than the assembly original), due to
>> (implicit) address arithmetic all being done with 64-bit operands.
> 
> Ah - At some point, I had a comment here explaining the lack of an
> __access_ok() check, but it appears to have got lost in a rebase.  I
> will try to reinstate it.

Just to avoid misunderstandings - if anything more than a comment gets
added, it ought to be __compat_access_ok().

> The wrapping behaviour around the 4GB => 0 boundary is undefined, and
> differs between Intel and AMD (as we discovered with XSA-186).  If we
> pass the exception back to the guest, we would need to swap #PF for
> #SS (for Intel), or wrap around properly (for AMD).
> 
> Would it be ok just to comment this point and leave it as is?

Sure - that's what I did ask for.
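
(Purely as a suggestion for the comment to be reinstated above the
__copy_to_user() call - the wording is illustrative, not Andrew's original
text:)

        /*
         * No boundary check: with the (implicit) 64-bit address arithmetic a
         * compat frame straddling the 4GB => 0 boundary doesn't wrap, and real
         * hardware behaviour there is vendor-specific anyway (#SS on Intel,
         * wrapping on AMD - cf. XSA-186), so this is deliberately left as is.
         */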

>>> +    /*
>>> +     * Clobber the injection information now it has been completed.  Buggy
>>> +     * attempts to inject the same event twice will hit the null_trap_bounce()
>>> +     * check above.
>>> +     */
>>> +    *tb = (struct trap_bounce){};
>> Ah, so that prevents tb becoming a pointer to const. I wonder
>> though whether, on a rather hot path, we really want to zap the
>> entire structure here. While I can see the value in satisfying
>> null_trap_bounce(), how about zapping just ->eip / ->cs on the
>> split paths above?
> 
> This ends up being two 8-byte writes of zeroes into a cache-hot line; it
> isn't by any means a slow part of this path, whereas the 16-bit write to
> clobber just %cs would be.
> 
> Irrespective of that, the following patch depends on this clobbering of
> ->flags.

Right, I did see this. Keep it as it is then.
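
(For readers following along, the alternative floated above would have been
roughly the following on each of the two split paths, i.e. narrow
partial-width stores rather than the whole-structure clear that the next
patch relies on:)

        /* Hypothetical alternative, not taken: */
        tb->eip = 0;
        tb->cs  = 0;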

>> Overall, did you compare generated code with the current
>> assembly implementation? That one surely would have had some
>> room for improvement, so the result here at least shouldn't be
>> worse than that.
> 
> The final C version (including failsafe, and some error handling the asm
> functions didn't have) is a bit less than twice the absolute size of the
> asm functions.

Size alone is no meaningful criterion anyway.

> I haven't done any performance analysis, but I trust the compiler to
> make better code overall (there are definitely pipeline stalls in the
> asm versions), and wouldn't be surprised if it is faster to execute.

That's the hope, but if it were double the number of instructions on
the common path (we can surely leave error paths out of
consideration), I wouldn't be convinced these are faster to execute
than the asm code with a couple of stalls (some of which we surely
could take care of). If this weren't on such highly critical paths, I
wouldn't be worried, but this code really needs to be fast (besides
the very reasonable desire of making it more maintainable). For
example, at this point I'm not (yet) convinced making this a single
function serving both 64- and 32-bit guests is really the right choice,
even if it costs only one or two extra conditionals.

Jan
