
Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot



> -----Original Message-----
> From: Boris Ostrovsky [mailto:boris.ostrovsky@xxxxxxxxxx]
> Sent: 06 June 2017 18:00
> To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>; 'Jan Beulich'
> <JBeulich@xxxxxxxx>
> Cc: xen-devel (xen-devel@xxxxxxxxxxxxxxxxxxxx) <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
> 
> On 06/06/2017 12:28 PM, Paul Durrant wrote:
> >> -----Original Message-----
> >> From: Xen-devel [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of
> >> Paul Durrant
> >> Sent: 06 June 2017 16:52
> >> To: 'Jan Beulich' <JBeulich@xxxxxxxx>
> >> Cc: xen-devel (xen-devel@xxxxxxxxxxxxxxxxxxxx) <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> >> Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
> >>
> >>> -----Original Message-----
> >>> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >>> Sent: 06 June 2017 16:11
> >>> To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> >>> Cc: xen-devel (xen-devel@xxxxxxxxxxxxxxxxxxxx) <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> >>> Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
> >>>
> >>>>>> On 06.06.17 at 16:32, <Paul.Durrant@xxxxxxxxxx> wrote:
> >>>> I've been having fun setting up a new test rig...
> >>>>
> >>>> I have a Skull Canyon NUC and I put Debian stretch (rc4) on it (so that's a
> >>>> 4.9 kernel) and then tried building and installing the latest Xen staging-4.9
> >>>> code. The system failed to boot... basically it got stuck before even
> >>>> managing to get far enough into Xen to print anything on the console.
> >>>> Xen 4.8 OTOH booted just fine, so I started bisecting, and after 14
> >>>> iterations I narrowed it down to the following commit as being the problem:
> >>>>
> >>>> commit c0655e492e6b33e26ec9cd33f59725d0db89cdd0
> >>>> Author: Juergen Gross <jgross@xxxxxxxx>
> >>>> Date:   Fri Mar 24 14:18:54 2017 +0100
> >>>>
> >>>>     x86: split boot trampoline into permanent and temporary part
> >>>>
> >>>>     The hypervisor needs a trampoline in low memory for early boot and
> >>>>     later for bringing up cpus and during wakeup from suspend. Today this
> >>>>     trampoline is kept completely even if most of it isn't needed later.
> >>>>
> >>>>     Split the trampoline into a permanent part and a temporary part needed
> >>>>     at early boot only. Introduce a new entry at the boundary.
> >>>>
> >>>>     Reduce the stack for wakeup code in order for the permanent
> >>>>     trampoline to fit in a single page. 4k of stack seems excessive, about
> >>>>     3k should be more than enough.
> >>>>
> >>>>     Add an ASSERT() to the linker script to ensure the wakeup stack is
> >>>>     always at least 3k.
> >>>>
> >>>>     Signed-off-by: Juergen Gross <jgross@xxxxxxxx>
> >>>>     Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
> >>>>
> >>>> To verify this I checked out master, reverted that commit, and tried again.
> >>>> The NUC still booted fine.
> >>> Well, interesting, but I don't think it is very realistic to expect any
> >>> fix with just the information you supply. There must be something
> >>> rather special about that system, and likely it would help if we
> >>> knew what that is. E.g. an unusual E820 map. Worse would be if
> >>> they used memory outside of properly marked E820 regions in a
> >>> way colliding with what we do.
> >>>
> >>> Otherwise I'm afraid we need to hope for you to debug the issue.
> >>>
> >> Yes, I was posting this more as a heads-up for the moment, so that 4.9 does
> >> not go out with this regression.
> >>
> >> I will try to figure out what is going on... My initial thought, looking at
> >> what the patch does, is that it may have something to do with the fact that
> >> I am using a VGA console rather than a serial one. I need to try 4.9 on
> >> another system (Gigabyte Brix) to see if the problem manifests there too.
> >> I'll also have to play with the BIOS settings on the Skull Canyon.
> >>
> > The problem definitely doesn't manifest on the Brix, so the next theory is
> > that it is something to do with the BIOS of the Skull Canyon.
> >
> 
> 
> FWIW, one of the machines in our test farm choked on this very patch. I
> don't remember the details now, but essentially it turned out that syslinux
> (we are PXE-booting) could not handle the change in ELF section layout
> (the way syslinux calculated how to load the binary into memory resulted
> in an overlap of some sort).
> 
> I hacked it (mboot.c32 specifically) to work around this but never came
> up with a proper solution.
> 

In my case it was grub2... and, thinking about it, I am running an older version
on the Brix, so I guess it may still manifest there if I update. Either way, it
sounds like it may be better to revert the patch until the issue is better
understood.

  Paul

> -boris
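Both theories raised in this thread (a boot loader computing load addresses that overlap, and something being placed outside a properly marked E820 region) come down to simple interval checks. The following is a minimal sketch, assuming the load ranges and the E820 map have already been extracted by hand (for example from `readelf -l` output and the firmware's memory map dump). The helper names and every address below are hypothetical; nothing here is taken from Xen, grub2 or syslinux.

```python
from typing import List, Tuple

# A load range is (start, size) in bytes, i.e. the half-open interval [start, start + size).
Range = Tuple[int, int]
# An E820 entry is (base, length, type).
E820Entry = Tuple[int, int, int]

E820_RAM = 1  # E820 type 1 = usable RAM; other types are reserved, ACPI, etc.


def overlaps(a: Range, b: Range) -> bool:
    """True if the half-open ranges a and b share at least one byte."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]


def overlapping_pairs(ranges: List[Range]) -> List[Tuple[int, int]]:
    """Indices (i, j), i < j, of every pair of load ranges that collide."""
    return [(i, j)
            for i in range(len(ranges))
            for j in range(i + 1, len(ranges))
            if overlaps(ranges[i], ranges[j])]


def inside_usable_ram(r: Range, e820: List[E820Entry]) -> bool:
    """True if r lies wholly inside a single usable-RAM E820 entry.

    Simplification: adjacent RAM entries are not merged, so a range that
    straddles two touching type-1 entries is reported as False.
    """
    start, end = r[0], r[0] + r[1]
    return any(kind == E820_RAM and base <= start and end <= base + length
               for base, length, kind in e820)


# Hypothetical memory map: low RAM, a reserved hole, then RAM above 1 MiB.
e820 = [
    (0x0, 0x9D000, 1),
    (0x9D000, 0x3000, 2),
    (0x100000, 0x7FF00000, 1),
]
# Hypothetical load ranges; the third one pokes into the reserved hole.
load = [(0x100000, 0x200000), (0x280000, 0x100000), (0x9C000, 0x2000)]

print(overlapping_pairs(load))                      # [(0, 1)]
print([inside_usable_ram(r, e820) for r in load])   # [True, True, False]
```

A real loader would also have to account for the memory it occupies itself and for any relocation it performs, which this sketch ignores.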


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
