
Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot



> -----Original Message-----
> From: Juergen Gross [mailto:jgross@xxxxxxxx]
> Sent: 07 June 2017 10:03
> To: Jan Beulich <JBeulich@xxxxxxxx>; Paul Durrant
> <Paul.Durrant@xxxxxxxxxx>
> Cc: Julien Grall (julien.grall@xxxxxxx) <julien.grall@xxxxxxx>; xen-devel
> (xen-devel@xxxxxxxxxxxxxxxxxxxx) <xen-devel@xxxxxxxxxxxxxxxxxxxx>; 'Boris
> Ostrovsky' <boris.ostrovsky@xxxxxxxxxx>
> Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
> 
> On 07/06/17 10:27, Jan Beulich wrote:
> >>>> On 07.06.17 at 10:07, <Paul.Durrant@xxxxxxxxxx> wrote:
> >>>  -----Original Message-----
> >>> From: Boris Ostrovsky [mailto:boris.ostrovsky@xxxxxxxxxx]
> >>> Sent: 06 June 2017 18:00
> >>> To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>; 'Jan Beulich'
> >>> <JBeulich@xxxxxxxx>
> >>> Cc: xen-devel (xen-devel@xxxxxxxxxxxxxxxxxxxx) <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> >>> Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
> >>>
> >>> On 06/06/2017 12:28 PM, Paul Durrant wrote:
> >>>>> -----Original Message-----
> >>>>> From: Xen-devel [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of
> >>>>> Paul Durrant
> >>>>> Sent: 06 June 2017 16:52
> >>>>> To: 'Jan Beulich' <JBeulich@xxxxxxxx>
> >>>>> Cc: xen-devel (xen-devel@xxxxxxxxxxxxxxxxxxxx) <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> >>>>> Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >>>>>> Sent: 06 June 2017 16:11
> >>>>>> To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> >>>>>> Cc: xen-devel (xen-devel@xxxxxxxxxxxxxxxxxxxx) <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> >>>>>> Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
> >>>>>>
> >>>>>>>>> On 06.06.17 at 16:32, <Paul.Durrant@xxxxxxxxxx> wrote:
> >>>>>>> I've been having fun setting up a new test rig...
> >>>>>>>
> >>>>>>> I have a skull canyon NUC and I put debian stretch (rc4) on it (so that's a
> >>>>>>> 4.9 kernel) and then tried building and installing the latest Xen staging-4.9
> >>>>>>> code. The system failed to boot... basically it got stuck before even
> >>>>>>> managing to get sufficiently into Xen to spit out anything on the console.
> >>>>>>> Xen 4.8 OTOH booted just fine so I started bisecting and after 14 iterations
> >>>>>>> I got down to the following commit as being the problem:
> >>>>>>>
> >>>>>>> commit c0655e492e6b33e26ec9cd33f59725d0db89cdd0
> >>>>>>> Author: Juergen Gross <jgross@xxxxxxxx>
> >>>>>>> Date:   Fri Mar 24 14:18:54 2017 +0100
> >>>>>>>
> >>>>>>>     x86: split boot trampoline into permanent and temporary part
> >>>>>>>
> >>>>>>>     The hypervisor needs a trampoline in low memory for early boot and
> >>>>>>>     later for bringing up cpus and during wakeup from suspend. Today this
> >>>>>>>     trampoline is kept completely even if most of it isn't needed later.
> >>>>>>>
> >>>>>>>     Split the trampoline into a permanent part and a temporary part needed
> >>>>>>>     at early boot only. Introduce a new entry at the boundary.
> >>>>>>>
> >>>>>>>     Reduce the stack for wakeup code in order for the permanent
> >>>>>>>     trampoline to fit in a single page. 4k of stack seems excessive, about
> >>>>>>>     3k should be more than enough.
> >>>>>>>
> >>>>>>>     Add an ASSERT() to the linker script to ensure the wakeup stack is
> >>>>>>>     always at least 3k.
> >>>>>>>
> >>>>>>>     Signed-off-by: Juergen Gross <jgross@xxxxxxxx>
> >>>>>>>     Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
> >>>>>>>
> >>>>>>> To verify this I checked out master, reverted that commit, and tried again.
> >>>>>>> The NUC still booted fine.
> >>>>>> Well, interesting, but I don't think it is very realistic to expect any
> >>>>>> fix with just the information you supply. There must be something
> >>>>>> rather special about that system, and likely it would help if we
> >>>>>> knew what that is. E.g. an unusual E820 map. Worse would be if
> >>>>>> they used memory outside of properly marked E820 regions in a
> >>>>>> way colliding with what we do.
> >>>>>>
> >>>>>> Otherwise I'm afraid we need to hope for you to debug the issue.
> >>>>>>
> >>>>> Yes, I was posting this more as a heads-up for the moment, so that 4.9 does not
> >>>>> go out with this regression.
> >>>>>
> >>>>> I will try to figure out what is going on... My initial thoughts on looking at what
> >>>>> the patch does are that it may be something to do with the fact I am using a
> >>>>> vga console rather than a serial one. I need to try another 4.9 on another
> >>>>> system (gigabyte brix) to see if the problem manifests there too. I'll also have
> >>>>> to play with the BIOS settings on the skull canyon.
> >>>>>
> >>>> The problem definitely doesn't manifest on the brix, so the next theory is
> >>>> that it is something to do with the BIOS of the skull canyon.
> >>>>
> >>>
> >>>
> >>> FWIW, one of the machines in our test farm choked on this very patch. I
> >>> don't remember details now but essentially it turned out that syslinux
> >>> (we are pxe-booting) could not handle changes in ELF sections layout
> >>> (the way syslinux calculated how to load the binary into memory resulted
> >>> in overlap of some sort).
> >>>
> >>> I hacked it (mboot.c32 specifically) to work around this but never came
> >>> up with a proper solution.
> >>>
> >>
> >> In my case it was grub2... and thinking about it I am running an older
> >> version on the brix so I guess it may still manifest there if I update.
> >> Either way it sounds like it may be better to revert the patch until the
> >> issue is better understood.
> >
> > I'm not sure if we could simply revert this one patch - it's the first of a
> > 3-patch series. At first glance I can't really see any dependency
> > of the later two patches on it, but then again I seem to recall that the
> > split was a prereq. Adding Jürgen.
> 
> I think it could be reverted. It was a prerequisite for another patch I
> prepared but didn't send as it was quite late in the 4.9 cycle and it
> depended on the other patches of Daniel.
> 
> TBH: I really can't see what is wrong with that patch. The only change
> which should be able to break something seems to be the reduction of the
> wakeup stack size to 3kB, but this shouldn't affect booting the system
> at all...
> 

Yeah, my next test is going to be increasing the size of the wakeup stack 
again, but there is really nothing obviously wrong with the patch.
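
For anyone wanting to check the bootloader overlap theory Boris mentions, here is a minimal sketch. It assumes the loadable segments have already been pulled out of the binary as (name, load address, size) tuples; the helper name and the layout values are hypothetical and not taken from a real Xen build or from syslinux/grub2 code.

```python
from typing import List, Tuple

# (name, load address, size in bytes); all values below are hypothetical.
Segment = Tuple[str, int, int]


def find_overlaps(segments: List[Segment]) -> List[Tuple[str, str]]:
    """Return pairs of segment names whose [addr, addr + size) ranges intersect."""
    ordered = sorted(segments, key=lambda s: s[1])
    overlaps = []
    for i, (name_a, addr_a, size_a) in enumerate(ordered):
        end_a = addr_a + size_a
        for name_b, addr_b, _ in ordered[i + 1:]:
            # Segments are sorted by start address, so once one starts at or
            # after end_a, none of the remaining ones can overlap segment a.
            if addr_b >= end_a:
                break
            overlaps.append((name_a, name_b))
    return overlaps


# Made-up layout in which the second segment starts inside the first.
example = [
    ("text", 0x100000, 0x2000),
    ("trampoline", 0x101800, 0x1000),
    ("data", 0x103000, 0x1000),
]

for a, b in find_overlaps(example):
    print(f"{a} overlaps {b}")
```

Feeding it the addresses a given loader would actually compute for the pre- and post-patch binaries would show whether the section layout change is what trips it up.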

  Paul

> 
> Juergen