Xen project Mailing List

Re: [Xen-devel] [PATCH 6/6] mini-os/x86-64 entry: check against nested events and try to fix up

From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>

Date: Wed, 13 Mar 2013 18:09:31 -0700

Cc: samuel.thibault@xxxxxxxxxxxx, stefano.stabellini@xxxxxxxxxxxxx, Xu Zhang <xzhang@xxxxxxxxxx>, gm281@xxxxxxxxx, xen-devel@xxxxxxxxxxxxx

Delivery-date: Thu, 14 Mar 2013 01:17:45 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On 03/12/2013 07:42 PM, Xu Zhang wrote: > On 03/09/2013 04:44 PM, Jeremy Fitzhardinge wrote: >> On 03/08/2013 01:30 PM, Xu Zhang wrote: >>> +# [How we do the fixup]. We want to merge the current stack frame >>> with the >>> +# just-interrupted frame. How we do this depends on where in the >>> critical >>> +# region the interrupted handler was executing, and so how many saved >>> +# registers are in each frame. We do this quickly using the lookup >>> table >>> +# 'critical_fixup_table'. For each byte offset in the critical >>> region, it >>> +# provides the number of bytes which have already been popped from the >>> +# interrupted stack frame. This is the number of bytes from the >>> current stack >>> +# that we need to copy at the end of the previous activation frame >>> so that >>> +# we can continue as if we've never even reached 11 running in the old >>> +# activation frame. >>> +critical_region_fixup: >>> + addq $critical_fixup_table - scrit, %rax >>> + movzbq (%rax),%rax # %rax contains num bytes popped >>> + mov %rsp,%rsi >>> + add %rax,%rsi # %esi points at end of src region >>> + >>> + movq RSP(%rsp),%rdi # acquire interrupted %rsp from >>> current stack frame >>> + # %edi points at end of dst region >>> + mov %rax,%rcx >>> + shr $3,%rcx # convert bytes into count of 64-bit >>> entities >>> + je 16f # skip loop if nothing to copy >>> +15: subq $8,%rsi # pre-decrementing copy loop >>> + subq $8,%rdi >>> + movq (%rsi),%rax >>> + movq %rax,(%rdi) >>> + loop 15b >>> +16: movq %rdi,%rsp # final %rdi is top of merged stack >>> + andb $KERNEL_CS_MASK,CS(%rsp) # CS on stack might have >>> changed >>> + jmp 11b >>> + >>> + >>> +/* Nested event fixup look-up table*/ >>> +critical_fixup_table: >>> + .byte 0x00,0x00,0x00 # XEN_TEST_PENDING(%rsi) >>> + .byte 0x00,0x00,0x00,0x00,0x00,0x00 # jnz 14f >>> + .byte 0x00,0x00,0x00,0x00 # mov (%rsp),%r15 >>> + .byte 0x00,0x00,0x00,0x00,0x00 # mov 0x8(%rsp),%r14 >>> + .byte 0x00,0x00,0x00,0x00,0x00 # mov 0x10(%rsp),%r13 >>> + .byte 0x00,0x00,0x00,0x00,0x00 # mov 0x18(%rsp),%r12 >>> + .byte 0x00,0x00,0x00,0x00,0x00 # mov 0x20(%rsp),%rbp >>> + .byte 0x00,0x00,0x00,0x00,0x00 # mov 0x28(%rsp),%rbx >>> + .byte 0x00,0x00,0x00,0x00 # add $0x30,%rsp >>> + .byte 0x30,0x30,0x30,0x30 # mov (%rsp),%r11 >>> + .byte 0x30,0x30,0x30,0x30,0x30 # mov 0x8(%rsp),%r10 >>> + .byte 0x30,0x30,0x30,0x30,0x30 # mov 0x10(%rsp),%r9 >>> + .byte 0x30,0x30,0x30,0x30,0x30 # mov 0x18(%rsp),%r8 >>> + .byte 0x30,0x30,0x30,0x30,0x30 # mov 0x20(%rsp),%rax >>> + .byte 0x30,0x30,0x30,0x30,0x30 # mov 0x28(%rsp),%rcx >>> + .byte 0x30,0x30,0x30,0x30,0x30 # mov 0x30(%rsp),%rdx >>> + .byte 0x30,0x30,0x30,0x30,0x30 # mov 0x38(%rsp),%rsi >>> + .byte 0x30,0x30,0x30,0x30,0x30 # mov 0x40(%rsp),%rdi >>> + .byte 0x30,0x30,0x30,0x30 # add $0x50,%rsp >>> + .byte 0x80,0x80,0x80,0x80 # testl >>> $NMI_MASK,2*8(%rsp) >>> + .byte 0x80,0x80,0x80,0x80 >>> + .byte 0x80,0x80 # jnz 2f >>> + .byte 0x80,0x80,0x80,0x80 # testb >>> $1,(xen_features+XENFEAT_supervisor_mode_kernel) >>> + .byte 0x80,0x80,0x80,0x80 >>> + .byte 0x80,0x80 # jnz 1f >>> + .byte 0x80,0x80,0x80,0x80,0x80 # orb $3,1*8(%rsp) >>> + .byte 0x80,0x80,0x80,0x80,0x80 # orb $3,4*8(%rsp) >>> + .byte 0x80,0x80 # iretq >>> + .byte 0x80,0x80,0x80,0x80 # andl $~NMI_MASK, >>> 16(%rsp) >>> + .byte 0x80,0x80,0x80,0x80 >>> + .byte 0x80,0x80 # pushq $\flag >>> + .byte 0x78,0x78,0x78,0x78,0x78 # jmp hypercall_page >>> + (__HYPERVISOR_iret * 32) >>> + .byte 0x00,0x00,0x00,0x00 # >>> XEN_LOCKED_BLOCK_EVENTS(%rsi) >>> + .byte 0x00,0x00,0x00 # mov %rsp,%rdi >>> + .byte 0x00,0x00,0x00,0x00,0x00 # jmp 11b >> This looks super-fragile. The original Xen-linux kernel code had a >> similar kind of fixup table, but I went to some lengths to make it as >> simple and robust as possible in the pvops kernels. See the comment in >> xen-asm_32.S: >> >> * Because the nested interrupt handler needs to deal with the current >> * stack state in whatever form its in, we keep things simple by only >> * using a single register which is pushed/popped on the stack. >> >> 64-bit pvops Linux always uses the iret hypercall, so the issue is moot >> there. (In principle a nested kernel interrupt could avoid the iret, >> but it wasn't obvious that the extra complexity was worth it.) >> >> J >> > > Thanks very much for you feedbacks. > > I guess the concern is that using a chunky look-up table like this is > somewhat complicated and would be less easier to maintain? Samuel made > some suggestions, making it less "fragile" to some extent: > > Also, it would be good to check against critical section size change, in > case somebody e.g. changes a value, or a macro like XEN_PUT_VCPU_INFO. > For instance, stuff right after the table: > > .if (ecrit-scrit) != (critical_fixup_table_end - > critical_fixup_table) > .error "The critical has changed, the fixup table needs updating" > .endif That would catch some problems, but not all. I worry that the whole construction is too error-prone, and its too easy for some later unsuspecting developer to come in and introduce a subtle bug that only occurs under very particular timing conditions. > I read through Xen-Linux 32bit implementation for this issue. If I > understand it right, the use of look-up table is avoided by leaving > RESTORE_ALL/REST out of the critical section. So when a nested event > came, you always certain about the stack frame layout and how many > bytes to copy over when fixing up (in Xen-Linux 32bit, PT_EIP/4 - 1). > And the register that would be clobbered in order to read and write > event mask (%eax in Xen-Linux 32bit case) shall be saved on stack. > This time we only need to worry about this one register instead of > many when fixing up the stack frame. Right. > Basically both approaches are correct to me and do the same job. > Xen-Linux 32bit is definitely a more neat solution. However, one thing > to notice is that mini-os x86 32bit also uses a fixup table. I don't have mini-os in front of me at the moment, but I wouldn't be surprised if the original Xen-Linux and mini-os share heritage in this area. > I guess we could apply these patches as a temporary solution to make > sure everything is correct and consistent, and having following > patches to do the refinement. That makes sense to me if the refinement is a large amount of new work on top of what you've already done, but I don't think that's obviously true. Given that you're familiar with the code and the problem domain, I expect you could knock out a clean solution pretty quickly. I guess you're interested in this path for 64-bit as well, because a same ring-ring interrupt return is much more likely in minios. J _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx http://lists.xen.org/xen-devel

©2013 Xen Project, A Linux Foundation Collaborative Project. All Rights Reserved.
Linux Foundation is a registered trademark of The Linux Foundation.
Xen Project is a trademark of The Linux Foundation.