[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 6/6] mini-os/x86-64 entry: check against nested events and try to fix up

To: Jeremy Fitzhardinge <jeremy@xxxxxxxx>
From: Xu Zhang <xzhang54@xxxxxxx>
Date: Tue, 12 Mar 2013 21:42:32 -0500
Cc: samuel.thibault@xxxxxxxxxxxx, stefano.stabellini@xxxxxxxxxxxxx, Xu Zhang <xzhang@xxxxxxxxxx>, gm281@xxxxxxxxx, xen-devel@xxxxxxxxxxxxx
Delivery-date: Wed, 13 Mar 2013 05:52:12 +0000
List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 03/09/2013 04:44 PM, Jeremy Fitzhardinge wrote:

On 03/08/2013 01:30 PM, Xu Zhang wrote:

+# [How we do the fixup]. We want to merge the current stack frame with the
+# just-interrupted frame. How we do this depends on where in the critical
+# region the interrupted handler was executing, and so how many saved
+# registers are in each frame. We do this quickly using the lookup table
+# 'critical_fixup_table'. For each byte offset in the critical region, it
+# provides the number of bytes which have already been popped from the
+# interrupted stack frame. This is the number of bytes from the current stack
+# that we need to copy at the end of the previous activation frame so that
+# we can continue as if we've never even reached 11 running in the old
+# activation frame.
+critical_region_fixup:
+               addq $critical_fixup_table - scrit, %rax
+               movzbq (%rax),%rax    # %rax contains num bytes popped
+               mov  %rsp,%rsi
+               add  %rax,%rsi        # %esi points at end of src region
+
+               movq RSP(%rsp),%rdi   # acquire interrupted %rsp from current 
stack frame
+                                     # %edi points at end of dst region
+               mov  %rax,%rcx
+               shr  $3,%rcx          # convert bytes into count of 64-bit 
entities
+               je   16f              # skip loop if nothing to copy
+15:            subq $8,%rsi          # pre-decrementing copy loop
+               subq $8,%rdi
+               movq (%rsi),%rax
+               movq %rax,(%rdi)
+               loop 15b
+16:            movq %rdi,%rsp        # final %rdi is top of merged stack
+               andb $KERNEL_CS_MASK,CS(%rsp)      # CS on stack might have 
changed
+               jmp  11b
+
+
+/* Nested event fixup look-up table*/
+critical_fixup_table:
+       .byte 0x00,0x00,0x00                    # XEN_TEST_PENDING(%rsi)
+       .byte 0x00,0x00,0x00,0x00,0x00,0x00     # jnz    14f
+       .byte 0x00,0x00,0x00,0x00               # mov    (%rsp),%r15
+       .byte 0x00,0x00,0x00,0x00,0x00          # mov    0x8(%rsp),%r14
+       .byte 0x00,0x00,0x00,0x00,0x00          # mov    0x10(%rsp),%r13
+       .byte 0x00,0x00,0x00,0x00,0x00          # mov    0x18(%rsp),%r12
+       .byte 0x00,0x00,0x00,0x00,0x00          # mov    0x20(%rsp),%rbp
+       .byte 0x00,0x00,0x00,0x00,0x00          # mov    0x28(%rsp),%rbx
+       .byte 0x00,0x00,0x00,0x00               # add    $0x30,%rsp
+       .byte 0x30,0x30,0x30,0x30               # mov    (%rsp),%r11
+       .byte 0x30,0x30,0x30,0x30,0x30          # mov    0x8(%rsp),%r10
+       .byte 0x30,0x30,0x30,0x30,0x30          # mov    0x10(%rsp),%r9
+       .byte 0x30,0x30,0x30,0x30,0x30          # mov    0x18(%rsp),%r8
+       .byte 0x30,0x30,0x30,0x30,0x30          # mov    0x20(%rsp),%rax
+       .byte 0x30,0x30,0x30,0x30,0x30          # mov    0x28(%rsp),%rcx
+       .byte 0x30,0x30,0x30,0x30,0x30          # mov    0x30(%rsp),%rdx
+       .byte 0x30,0x30,0x30,0x30,0x30          # mov    0x38(%rsp),%rsi
+       .byte 0x30,0x30,0x30,0x30,0x30          # mov    0x40(%rsp),%rdi
+       .byte 0x30,0x30,0x30,0x30               # add    $0x50,%rsp
+       .byte 0x80,0x80,0x80,0x80               # testl  $NMI_MASK,2*8(%rsp)
+       .byte 0x80,0x80,0x80,0x80
+       .byte 0x80,0x80                         # jnz    2f
+       .byte 0x80,0x80,0x80,0x80               # testb  
$1,(xen_features+XENFEAT_supervisor_mode_kernel)
+       .byte 0x80,0x80,0x80,0x80
+       .byte 0x80,0x80                         # jnz    1f
+       .byte 0x80,0x80,0x80,0x80,0x80          # orb    $3,1*8(%rsp)
+       .byte 0x80,0x80,0x80,0x80,0x80          # orb    $3,4*8(%rsp)
+       .byte 0x80,0x80                         # iretq
+       .byte 0x80,0x80,0x80,0x80               # andl   $~NMI_MASK, 16(%rsp)
+       .byte 0x80,0x80,0x80,0x80
+       .byte 0x80,0x80                         # pushq  $\flag
+       .byte 0x78,0x78,0x78,0x78,0x78          # jmp    hypercall_page + 
(__HYPERVISOR_iret * 32)
+       .byte 0x00,0x00,0x00,0x00               # XEN_LOCKED_BLOCK_EVENTS(%rsi)
+       .byte 0x00,0x00,0x00                    # mov    %rsp,%rdi
+       .byte 0x00,0x00,0x00,0x00,0x00          # jmp    11b

This looks super-fragile.  The original Xen-linux kernel code had a
similar kind of fixup table, but I went to some lengths to make it as
simple and robust as possible in the pvops kernels.  See the comment in
xen-asm_32.S:

  * Because the nested interrupt handler needs to deal with the current
  * stack state in whatever form its in, we keep things simple by only
  * using a single register which is pushed/popped on the stack.

64-bit pvops Linux always uses the iret hypercall, so the issue is moot
there.  (In principle a nested kernel interrupt could avoid the iret,
but it wasn't obvious that the extra complexity was worth it.)

     J


Thanks very much for you feedbacks.

I guess the concern is that using a chunky look-up table like this issomewhat complicated and would be less easier to maintain? Samuel madesome suggestions, making it less "fragile" to some extent:


Also, it would be good to check against critical section size change, in
case somebody e.g. changes a value, or a macro like XEN_PUT_VCPU_INFO.
For instance, stuff right after the table:

        .if (ecrit-scrit) != (critical_fixup_table_end - critical_fixup_table)
        .error "The critical has changed, the fixup table needs updating"
        .endif

I read through Xen-Linux 32bit implementation for this issue. If Iunderstand it right, the use of look-up table is avoided by leavingRESTORE_ALL/REST out of the critical section. So when a nested eventcame, you always certain about the stack frame layout and how many bytesto copy over when fixing up (in Xen-Linux 32bit, PT_EIP/4 - 1). And theregister that would be clobbered in order to read and write event mask(%eax in Xen-Linux 32bit case) shall be saved on stack. This time weonly need to worry about this one register instead of many when fixingup the stack frame.

Basically both approaches are correct to me and do the same job.Xen-Linux 32bit is definitely a more neat solution. However, one thingto notice is that mini-os x86 32bit also uses a fixup table. I guess wecould apply these patches as a temporary solution to make sureeverything is correct and consistent, and having following patches to dothe refinement.


Thanks,

--
xu

:q!


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

Follow-Ups:
- Re: [Xen-devel] [PATCH 6/6] mini-os/x86-64 entry: check against nested events and try to fix up
  - From: Jeremy Fitzhardinge
- Re: [Xen-devel] [PATCH 6/6] mini-os/x86-64 entry: check against nested events and try to fix up
  - From: Xu Zhang

References:
- [Xen-devel] [PATCH 0/6] mini-os: check and fix up against nested events in x86-64 kernel entry
  - From: Xu Zhang
- [Xen-devel] [PATCH 6/6] mini-os/x86-64 entry: check against nested events and try to fix up
  - From: Xu Zhang
- Re: [Xen-devel] [PATCH 6/6] mini-os/x86-64 entry: check against nested events and try to fix up
  - From: Jeremy Fitzhardinge

Prev by Date: Re: [Xen-devel] [PATCH 6/6] mini-os/x86-64 entry: check against nested events and try to fix up
Next by Date: Re: [Xen-devel] Is: SKB_MAX_LEN bites again. Was: Re: bug disabling guest interface
Previous by thread: Re: [Xen-devel] [PATCH 6/6] mini-os/x86-64 entry: check against nested events and try to fix up
Next by thread: Re: [Xen-devel] [PATCH 6/6] mini-os/x86-64 entry: check against nested events and try to fix up
Index(es):
- Date
- Thread

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.