
Re: [Xen-devel] [PATCH 6/6] mini-os/x86-64 entry: check against nested events and try to fix up



On 03/12/2013 07:42 PM, Xu Zhang wrote:
> On 03/09/2013 04:44 PM, Jeremy Fitzhardinge wrote:
>> On 03/08/2013 01:30 PM, Xu Zhang wrote:
>>> +# [How we do the fixup]. We want to merge the current stack frame with
>>> +# the just-interrupted frame. How we do this depends on where in the
>>> +# critical region the interrupted handler was executing, and so how many
>>> +# saved registers are in each frame. We do this quickly using the lookup
>>> +# table 'critical_fixup_table'. For each byte offset in the critical
>>> +# region, it provides the number of bytes which have already been popped
>>> +# from the interrupted stack frame. This is the number of bytes from the
>>> +# current stack that we need to copy to the end of the previous
>>> +# activation frame so that we can continue as if we've never even
>>> +# reached 11, running in the old activation frame.
>>> +critical_region_fixup:
>>> +        addq $critical_fixup_table - scrit, %rax
>>> +        movzbq (%rax),%rax    # %rax contains num bytes popped
>>> +        mov  %rsp,%rsi
>>> +        add  %rax,%rsi        # %rsi points at end of src region
>>> +
>>> +        movq RSP(%rsp),%rdi   # acquire interrupted %rsp from current
>>> +                              # stack frame; %rdi points at end of dst region
>>> +        mov  %rax,%rcx
>>> +        shr  $3,%rcx          # convert bytes into count of 64-bit entities
>>> +        je   16f              # skip loop if nothing to copy
>>> +15:        subq $8,%rsi          # pre-decrementing copy loop
>>> +        subq $8,%rdi
>>> +        movq (%rsi),%rax
>>> +        movq %rax,(%rdi)
>>> +        loop 15b
>>> +16:        movq %rdi,%rsp        # final %rdi is top of merged stack
>>> +        andb $KERNEL_CS_MASK,CS(%rsp)      # CS on stack might have changed
>>> +        jmp  11b
>>> +
>>> +
>>> +/* Nested event fixup look-up table */
>>> +critical_fixup_table:
>>> +    .byte 0x00,0x00,0x00                    # XEN_TEST_PENDING(%rsi)
>>> +    .byte 0x00,0x00,0x00,0x00,0x00,0x00     # jnz    14f
>>> +    .byte 0x00,0x00,0x00,0x00               # mov    (%rsp),%r15
>>> +    .byte 0x00,0x00,0x00,0x00,0x00          # mov    0x8(%rsp),%r14
>>> +    .byte 0x00,0x00,0x00,0x00,0x00          # mov    0x10(%rsp),%r13
>>> +    .byte 0x00,0x00,0x00,0x00,0x00          # mov    0x18(%rsp),%r12
>>> +    .byte 0x00,0x00,0x00,0x00,0x00          # mov    0x20(%rsp),%rbp
>>> +    .byte 0x00,0x00,0x00,0x00,0x00          # mov    0x28(%rsp),%rbx
>>> +    .byte 0x00,0x00,0x00,0x00               # add    $0x30,%rsp
>>> +    .byte 0x30,0x30,0x30,0x30               # mov    (%rsp),%r11
>>> +    .byte 0x30,0x30,0x30,0x30,0x30          # mov    0x8(%rsp),%r10
>>> +    .byte 0x30,0x30,0x30,0x30,0x30          # mov    0x10(%rsp),%r9
>>> +    .byte 0x30,0x30,0x30,0x30,0x30          # mov    0x18(%rsp),%r8
>>> +    .byte 0x30,0x30,0x30,0x30,0x30          # mov    0x20(%rsp),%rax
>>> +    .byte 0x30,0x30,0x30,0x30,0x30          # mov    0x28(%rsp),%rcx
>>> +    .byte 0x30,0x30,0x30,0x30,0x30          # mov    0x30(%rsp),%rdx
>>> +    .byte 0x30,0x30,0x30,0x30,0x30          # mov    0x38(%rsp),%rsi
>>> +    .byte 0x30,0x30,0x30,0x30,0x30          # mov    0x40(%rsp),%rdi
>>> +    .byte 0x30,0x30,0x30,0x30               # add    $0x50,%rsp
>>> +    .byte 0x80,0x80,0x80,0x80               # testl  $NMI_MASK,2*8(%rsp)
>>> +    .byte 0x80,0x80,0x80,0x80
>>> +    .byte 0x80,0x80                         # jnz    2f
>>> +    .byte 0x80,0x80,0x80,0x80               # testb  $1,(xen_features+XENFEAT_supervisor_mode_kernel)
>>> +    .byte 0x80,0x80,0x80,0x80
>>> +    .byte 0x80,0x80                         # jnz    1f
>>> +    .byte 0x80,0x80,0x80,0x80,0x80          # orb    $3,1*8(%rsp)
>>> +    .byte 0x80,0x80,0x80,0x80,0x80          # orb    $3,4*8(%rsp)
>>> +    .byte 0x80,0x80                         # iretq
>>> +    .byte 0x80,0x80,0x80,0x80               # andl   $~NMI_MASK,16(%rsp)
>>> +    .byte 0x80,0x80,0x80,0x80
>>> +    .byte 0x80,0x80                         # pushq  $\flag
>>> +    .byte 0x78,0x78,0x78,0x78,0x78          # jmp    hypercall_page + (__HYPERVISOR_iret * 32)
>>> +    .byte 0x00,0x00,0x00,0x00               # XEN_LOCKED_BLOCK_EVENTS(%rsi)
>>> +    .byte 0x00,0x00,0x00                    # mov    %rsp,%rdi
>>> +    .byte 0x00,0x00,0x00,0x00,0x00          # jmp    11b
>> This looks super-fragile.  The original Xen-linux kernel code had a
>> similar kind of fixup table, but I went to some lengths to make it as
>> simple and robust as possible in the pvops kernels.  See the comment in
>> xen-asm_32.S:
>>
>>   * Because the nested interrupt handler needs to deal with the current
>>   * stack state in whatever form it's in, we keep things simple by only
>>   * using a single register which is pushed/popped on the stack.
>>
>> 64-bit pvops Linux always uses the iret hypercall, so the issue is moot
>> there.  (In principle a nested kernel interrupt could avoid the iret,
>> but it wasn't obvious that the extra complexity was worth it.)
>>
>>      J
>>
>
> Thanks very much for your feedback.
>
> I guess the concern is that using a chunky look-up table like this is
> somewhat complicated and harder to maintain? Samuel made some
> suggestions that make it less "fragile" to some extent:
>
> Also, it would be good to check for critical-section size changes, in
> case somebody e.g. changes a value, or a macro like XEN_PUT_VCPU_INFO.
> For instance, stuff right after the table:
>
>     .if (ecrit - scrit) != (critical_fixup_table_end - critical_fixup_table)
>     .error "The critical section has changed, the fixup table needs updating"
>     .endif
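
(Spelled out, with the critical_fixup_table_end label his snippet
assumes actually defined, that guard would look something like:

    critical_fixup_table:
            .byte 0x00,0x00,0x00        # ...entries as in the patch...
    critical_fixup_table_end:

            .if (ecrit - scrit) != (critical_fixup_table_end - critical_fixup_table)
            .error "The critical section has changed, the fixup table needs updating"
            .endif

i.e. an assemble-time assertion that the table still provides exactly
one byte per byte of the critical region.)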

That would catch some problems, but not all.  I worry that the whole
construction is too error-prone, and it's too easy for some later
unsuspecting developer to come in and introduce a subtle bug that only
occurs under very particular timing conditions.

> I read through the Xen-Linux 32-bit implementation for this issue. If I
> understand it right, the use of a look-up table is avoided by leaving
> RESTORE_ALL/REST out of the critical section. So when a nested event
> comes in, you are always certain about the stack frame layout and how
> many bytes to copy over when fixing up (in Xen-Linux 32-bit, PT_EIP/4 - 1).
> And the register that gets clobbered in order to read and write the
> event mask (%eax in the Xen-Linux 32-bit case) is saved on the stack.
> This way we only need to worry about that one register instead of
> many when fixing up the stack frame.

Right.
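
For reference, the shape of that path is roughly the following (a
from-memory sketch of the xen-asm_32.S idea, not verbatim code; the
slow-path name is invented here):

            push %eax                       # the only register the critical
                                            # region ever clobbers
            movl xen_vcpu, %eax             # (simplified; the real code is per-cpu)
    scrit:  movb $0, XEN_vcpu_info_mask(%eax)        # re-enable event delivery
            testb $0xff, XEN_vcpu_info_pending(%eax) # anything pending?
            jnz  1f
            pop  %eax                       # nothing pending: restore and return
            iret
    1:      call handle_pending_sketch      # slow path (name invented)
    ecrit:

Since only %eax is ever live on the stack between scrit and ecrit, the
fixup only has to distinguish "bare iret frame" from "iret frame plus
saved %eax", and the copy size is a compile-time constant.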

> Basically, both approaches look correct to me and do the same job.
> Xen-Linux 32-bit is definitely a neater solution. However, one thing
> to note is that mini-os x86 32-bit also uses a fixup table.

I don't have mini-os in front of me at the moment, but I wouldn't be
surprised if the original Xen-Linux and mini-os share heritage in this area.

> I guess we could apply these patches as a temporary solution to make
> sure everything is correct and consistent, and have follow-up
> patches do the refinement.

That makes sense to me if the refinement is a large amount of new work
on top of what you've already done, but I don't think that's obviously
true.  Given that you're familiar with the code and the problem domain,
I expect you could knock out a clean solution pretty quickly.

I guess you're interested in this path for 64-bit as well, because a
same-ring interrupt return is much more likely in mini-os.
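
(Concretely, the win is being able to take the direct path in the iret
sequence your patch's table covers most of the time, roughly, eliding
the XENFEAT_supervisor_mode_kernel test:

            testl $NMI_MASK, 2*8(%rsp)      # RFLAGS slot of the iret frame
            jnz   2f                        # NMIs masked: must go via Xen
            orb   $3, 1*8(%rsp)             # set RPL 3 in the saved CS...
            orb   $3, 4*8(%rsp)             # ...and SS
            iretq                           # same-ring return, no hypervisor entry
    2:      andl  $~NMI_MASK, 16(%rsp)
            pushq $\flag
            jmp   hypercall_page + (__HYPERVISOR_iret * 32)

rather than always bouncing through the iret hypercall the way 64-bit
pvops Linux does.)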

    J

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

