|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Minios-devel] [UNIKRAFT PATCH] plat/kvm: Fix current thread retrieval in interrupt context on x86_64
On 8/28/19 1:28 PM, Simon Kuenzer wrote:
>
>
> On 20.06.19 12:30, Costin Lupu wrote:
>> Hi Simon,
>>
>> Please see inline.
>>
>> On 6/19/19 12:35 AM, Simon Kuenzer wrote:
>>> Hey Costin,
>>>
>>> On 17.06.19 21:15, Costin Lupu wrote:
>>>> On 6/17/19 5:15 PM, Simon Kuenzer wrote:
>>>>> Hey Costin,
>>>>>
>>>>> thanks a lot for the patch. I have a couple of questions which I need
>>>>> for my understanding. I put those inline...
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Simon
>>>>>
>>>>> On 11.06.19 21:33, Costin Lupu wrote:
>>>>>> Commit 017fffd5 introduced support for setting the current thread
>>>>>> pointer on top
>>>>>> of interrupt stacks in order to retrieve the current thread in
>>>>>> interrupt context
>>>>>> as well. Unfortunately, the wrong stack was picked for KVM platform.
>>>>>> This patch
>>>>>> fixes that and sets the thread on cpu_intr_stack instead.
>>>>>>
>>>>>> cpu_intr_stack was resized to STACK_SIZE because this is a mandatory
>>>>>> condition
>>>>>> when saving threads on top. However, given that it also needs a
>>>>>> STACK_SIZE
>>>>>> alignment, a new section was created for it, .intrstack, in order to
>>>>>> avoid
>>>>>> breaking the entire binary image layout. Without this new section,
>>>>>> the
>>>>>> entire
>>>>>> .text section would have a STACK_SIZE alignment (i.e. 64KB) and this
>>>>>> would imply
>>>>>> that the multiboot header, which is included in .text section, would
>>>>>> also be
>>>>>> moved at an address higher than STACK_SIZE, even though it must stay
>>>>>> in the
>>>>>> first 8KB of the binary.
>>>>>
>>>>> Maybe, this is a stupid question: What if we take the pre-allocated
>>>>> bootstack also for the interrupts? As soon as we enable scheduling the
>>>>> bootstack is not used anymore by any thread. Except the case where
>>>>> we do
>>>>> not have scheduling, the bootstack will be still used during life
>>>>> time,
>>>>> would this be an issue?
>>>>>
>>>>
>>>> In theory this should work. But what about the case when we don't have
>>>> scheduling? Wouldn't we still need a different stack?
>>>>
>>>
>>> Hum... this should be fine, I expect the CPU can handle this.
>>
>> What do you mean? The CPU will handle it properly only if you provide it
>> a different stack on which it will save the registers of the interrupted
>> context.
>>
>>> But we have actually also stacks for the NMI and the trap handler. So
>>> maybe this suggestion is not generic enough.
>>
>> Of course it's not generic enough. One way would be to align NMI and
>> trap stacks as well, but that would just bloat the binary. Fortunately,
>> (1) the NMI stack isn't used and (2) current thread is not needed when
>> traps are handled, so we can skip updating these 2 stacks for now.
>>
>>>>>>
>>>>>> Signed-off-by: Costin Lupu <costin.lupu@xxxxxxxxx>
>>>>>> ---
>>>>>> plat/kvm/Makefile.uk | 1 +
>>>>>> plat/kvm/memory.c | 6 ------
>>>>>> plat/kvm/x86/link64.lds.S | 10 ++++++++++
>>>>>> plat/kvm/x86/memory.c | 44
>>>>>> ++++++++++++++++++++++++++++++++++++++++++++
>>>>>> plat/kvm/x86/traps.c | 5 ++++-
>>>>>> 5 files changed, 59 insertions(+), 7 deletions(-)
>>>>>> create mode 100644 plat/kvm/x86/memory.c
>>>>>>
>>>>>> diff --git a/plat/kvm/Makefile.uk b/plat/kvm/Makefile.uk
>>>>>> index 71c4c419..8eb162d4 100644
>>>>>> --- a/plat/kvm/Makefile.uk
>>>>>> +++ b/plat/kvm/Makefile.uk
>>>>>> @@ -46,6 +46,7 @@ LIBKVMPLAT_SRCS-$(CONFIG_ARCH_X86_64) +=
>>>>>> $(LIBKVMPLAT_BASE)/x86/lcpu.c
>>>>>> LIBKVMPLAT_SRCS-$(CONFIG_ARCH_X86_64) +=
>>>>>> $(LIBKVMPLAT_BASE)/x86/intctrl.c
>>>>>> LIBKVMPLAT_SRCS-$(CONFIG_ARCH_X86_64) +=
>>>>>> $(LIBKVMPLAT_BASE)/x86/tscclock.c
>>>>>> LIBKVMPLAT_SRCS-$(CONFIG_ARCH_X86_64) +=
>>>>>> $(LIBKVMPLAT_BASE)/x86/time.c
>>>>>> +LIBKVMPLAT_SRCS-$(CONFIG_ARCH_X86_64) +=
>>>>>> $(LIBKVMPLAT_BASE)/x86/memory.c|x86
>>>>>> ifeq ($(findstring y,$(CONFIG_KVM_KERNEL_VGA_CONSOLE)
>>>>>> $(CONFIG_KVM_DEBUG_VGA_CONSOLE)),y)
>>>>>> LIBKVMPLAT_SRCS-$(CONFIG_ARCH_X86_64) +=
>>>>>> $(LIBKVMPLAT_BASE)/x86/vga_console.c
>>>>>> endif
>>>>>> diff --git a/plat/kvm/memory.c b/plat/kvm/memory.c
>>>>>> index e96bad2c..7f2fb46a 100644
>>>>>> --- a/plat/kvm/memory.c
>>>>>> +++ b/plat/kvm/memory.c
>>>>>> @@ -178,9 +178,3 @@ int _ukplat_mem_mappings_init(void)
>>>>>> {
>>>>>> return 0;
>>>>>> }
>>>>>> -
>>>>>> -void ukplat_stack_set_current_thread(void *thread_addr)
>>>>>> -{
>>>>>> - *((unsigned long *) _libkvmplat_cfg.bstack.end) =
>>>>>> - (unsigned long) thread_addr;
>>>>>> -}
>>>>>> diff --git a/plat/kvm/x86/link64.lds.S b/plat/kvm/x86/link64.lds.S
>>>>>> index 362ba3e6..6103fc2d 100644
>>>>>> --- a/plat/kvm/x86/link64.lds.S
>>>>>> +++ b/plat/kvm/x86/link64.lds.S
>>>>>> @@ -99,6 +99,16 @@ SECTIONS
>>>>>> . = ALIGN(__PAGE_SIZE);
>>>>>> }
>>>>>> + /* We keep the interrupt stack on a different section
>>>>>> + * given that it may have a big alignment and it would
>>>>>> + * change the entire binary layout
>>>>>> + */
>>>>>> + .intrstack :
>>>>>> + {
>>>>>> + *(.intrstack)
>>>>>> + . = ALIGN(__PAGE_SIZE);
>>>>>> + }
>>>>>> +
>>>>>
>>>>> Would every platform need to do this?
>>>>>
>>>>
>>>> On Xen, the interrupt is already aligned to STACK_SIZE. On linuxu, the
>>>> signals use the current process stack, we should change it to using an
>>>> alternative stack if we want to use the same approach.
>>>>
>>>>>> _end = .;
>>>>>> .comment 0 : { *(.comment) }
>>>>>> diff --git a/plat/kvm/x86/memory.c b/plat/kvm/x86/memory.c
>>>>>> new file mode 100644
>>>>>> index 00000000..b8c7c7e7
>>>>>> --- /dev/null
>>>>>> +++ b/plat/kvm/x86/memory.c
>>>>>> @@ -0,0 +1,44 @@
>>>>>> +/* SPDX-License-Identifier: BSD-3-Clause */
>>>>>> +/*
>>>>>> + * Authors: Costin Lupu <costin.lupu@xxxxxxxxx>
>>>>>> + *
>>>>>> + * Copyright (c) 2019, University Politehnica of Bucharest. All
>>>>>> rights reserved.
>>>>>> + *
>>>>>> + * Redistribution and use in source and binary forms, with or
>>>>>> without
>>>>>> + * modification, are permitted provided that the following
>>>>>> conditions
>>>>>> + * are met:
>>>>>> + *
>>>>>> + * 1. Redistributions of source code must retain the above copyright
>>>>>> + * notice, this list of conditions and the following disclaimer.
>>>>>> + * 2. Redistributions in binary form must reproduce the above
>>>>>> copyright
>>>>>> + * notice, this list of conditions and the following
>>>>>> disclaimer in
>>>>>> the
>>>>>> + * documentation and/or other materials provided with the
>>>>>> distribution.
>>>>>> + * 3. Neither the name of the copyright holder nor the names of its
>>>>>> + * contributors may be used to endorse or promote products
>>>>>> derived
>>>>>> from
>>>>>> + * this software without specific prior written permission.
>>>>>> + *
>>>>>> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
>>>>>> CONTRIBUTORS "AS IS"
>>>>>> + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
>>>>>> TO, THE
>>>>>> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
>>>>>> PARTICULAR
>>>>>> PURPOSE
>>>>>> + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
>>>>>> CONTRIBUTORS BE
>>>>>> + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
>>>>>> EXEMPLARY, OR
>>>>>> + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
>>>>>> PROCUREMENT OF
>>>>>> + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
>>>>>> BUSINESS
>>>>>> + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
>>>>>> WHETHER IN
>>>>>> + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
>>>>>> OTHERWISE)
>>>>>> + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
>>>>>> ADVISED OF THE
>>>>>> + * POSSIBILITY OF SUCH DAMAGE.
>>>>>> + *
>>>>>> + * THIS HEADER MAY NOT BE EXTRACTED OR MODIFIED IN ANY WAY.
>>>>>> + */
>>>>>> +
>>>>>> +#include <uk/plat/memory.h>
>>>>>> +
>>>>>> +
>>>>>> +extern char cpu_intr_stack[];
>>>>>> +
>>>>>> +void ukplat_stack_set_current_thread(void *thread_addr)
>>>>>> +{
>>>>>> + *((unsigned long *) cpu_intr_stack) =
>>>>>> + (unsigned long) thread_addr;
>>>>>
>>>>> In general, instead of using the stack top, we could save the current
>>>>> thread always on the stack bottom. Those bytes would be reserved and
>>>>> would only be popped on a misbehaving program. Would that avoid the
>>>>> issue that you are facing? Maybe we could get away from the problem
>>>>> that
>>>>> the stacks have to have a fixed size for just getting the current
>>>>> value...
>>>>>
>>>>
>>>> How do you determine the stack bottom when you are using it, e.g. when
>>>> retrieving the current thread? I think you would use the same way as
>>>> for
>>>> getting the current thread from the stack top. So you still need this
>>>> alignment constraint.
>>>>
>>>
>>> You are right.
>>>
>>>>>> +}
>>>>>> diff --git a/plat/kvm/x86/traps.c b/plat/kvm/x86/traps.c
>>>>>> index 27ef6d93..fe1dd5a4 100644
>>>>>> --- a/plat/kvm/x86/traps.c
>>>>>> +++ b/plat/kvm/x86/traps.c
>>>>>> @@ -25,7 +25,9 @@
>>>>>> */
>>>>>> #include <string.h>
>>>>>> +#include <uk/essentials.h>
>>>>>> #include <uk/arch/lcpu.h>
>>>>>> +#include <uk/plat/config.h>
>>>>>> #include <x86/desc.h>
>>>>>> #include <kvm-x86/traps.h>
>>>>>> @@ -59,7 +61,8 @@ static void gdt_init(void)
>>>>>> static struct tss64 cpu_tss;
>>>>>> -static char cpu_intr_stack[4096]; /* IST1 */
>>>>>> + __section(".intrstack") __align(STACK_SIZE)
>>>>>> +char cpu_intr_stack[STACK_SIZE]; /* IST1 */
>>>>>
>>>>> You removed the stack actually from the .bss section (instead of text)
>>>>> and moved it to an own section. What if you keep it on the .bss but
>>>>> with
>>>>> the bigger size? I expect this should not be a problem for the text
>>>>> section.
>>>>>
>>>>
>>>> In the commit message it should have been '.text segment', or the first
>>>> segment of the binary, which contains both .text and .bss sections. So,
>>>> yeah, it's the segment that gets realigned. If we don't move it to
>>>> another section, it would remain in .bss, which is what this fix tries
>>>> to avoid. If we move it to a new section, it will also be moved to a
>>>> different segment and that's how the segment containing .text will keep
>>>> its original alignment.
>>>>
>>>
>>> I am still not getting exactly why the .bss section has influence on the
>>> positioning on the .text section. The multiboot header is just one
>>> exceptional data section that we put into the beginning of the .text
>>> section. The rest should still go to its respective section. What is the
>>> bad thing about using the .bss section?
>>>
>>
>> Aligning cpu_intr_stack to 0x10000 also aligns .bss section to 0x10000.
>> For reasons I do not know, the linker also decides to move the .text
>> section to 0x10000 offset inside the binary file (please see
>> attachment). Now this is bad, because the .text section also contains
>> the multiboot information which must stay in the first 0x2000 bytes of
>> the binary. Moreover, the segment which will contain both sections in
>> the end will be aligned to 0x10000.
>>
>> One solution would be to move multiboot information in a section of its
>> own, out of the .text section. But unfortunately, we still cannot
>> control the offset where the linker will decide to put it in the binary.
>> From what I could find, the only way to fix that would be to put it with
>> objcopy at the beginning of the binary.
>>
>> Therefore, the least intrusive solution would be to just move the
>> aligned interrupt stack in a section of its own and the rest of the
>> binary would just keep its original layout.
>>
>>> Anyways, since we need a quick fix, I would suggest something else. The
>>> problem I have with the irq_stack section is that it complicates the
>>> linked layout. Additionally for being complete, the nmi and trap stack
>>> would also need to go in there. The other problem we have is that the
>>> stack sizes and/or their alignment need to be fulfilled.
>>> - If we put the thread context pointer on top of the stack (low
>>> address), all stacks need to be exactly sized and aligned to
>>> STACK_SIZE.
>>> - If we put this pointer to the stack bottom (high address), we need
>>> to make sure that the high address is aligned to STACK_SIZE.
>>> STACK_SIZE becomes automatically the maximum stack size but we could
>>> have stacks that are smaller sized than this. However, I am not
>>> sure how we could teach the alignment to the linker.
>>>
>>> Although I do not like the alternative but it may solve our bug right
>>> now: What if we store the current thread pointer to a platform-internal
>>> variable instead of the stack. On every context switch, we update this
>>> variable and get_current() is returning its value. In principle, with
>>> SMP you would need to introduce this variable CPU-wise but we do not
>>> have this now. This way we could keep the stack sizes that we currently
>>> have.
>>> I think we have to revisit this low-level platform API at some point
>>> considering all the lessons we have learned and that longer-term we
>>> should also protect the each stacks from over- and underflows (for
>>> example with an unmapped page before and after each stack). But this
>>> involves a bigger restructuring anyways and this we should not do with
>>> this patch.
>>>
>>> What do you think?
>>>
>>
>> This is something we need to do especially if we want to support
>> different stack sizes in the system. I also had this discussion with
>> Florian a few weeks ago about how this is also necessary in order to
>> avoid some races in the scheduler for which we had to apply a workaround.
>>
>> However, this needs a more complex analysis and for sure it would have a
>> big impact, with potential side effects, for the whole system. I would
>> wait to do that until we'll have a proper regression testing system.
>>
>> This current fix is the simplest one for now and it doesn't bring any
>> side effects. I strongly believe this is the best solution for now.
>
> Hum. Understood. Yeah, I agree we need to revisit this when we are going
> re-visiting the API and re-architecture the platform interfaces. As far
> as I can see, we are currently using the same technique to save and get
> the current thread context also on the other platforms. Because of
> consistency reasons, it would not be wise to change this to something
> completely different for the KVM platform.
>
> Could you make sure with a v2 that also the trap stack is properly sized
> and aligned? This implies storing the current thread pointer twice,
> right? I am not really a fan of this but if we want to keep get_current
> in its current form in order to avoid further implication to uksched and
> the other platforms for now, we should at least have a TODO comment
> there that explains our reason. I expect we will have a better suited
> platform API later that will avoid this, so this would be anyways
> temporary.
Alright, I sent a v2 on the list including amendment according to your
comments.
Cheers,
Costin
_______________________________________________
Minios-devel mailing list
Minios-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/minios-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |