Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
> -----Original Message-----
> From: Juergen Gross [mailto:jgross@xxxxxxxx]
> Sent: 07 June 2017 10:03
> To: Jan Beulich <JBeulich@xxxxxxxx>; Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> Cc: Julien Grall (julien.grall@xxxxxxx) <julien.grall@xxxxxxx>; xen-devel
> (xen-devel@xxxxxxxxxxxxxxxxxxxx) <xen-devel@xxxxxxxxxxxxxxxxxxxx>; 'Boris
> Ostrovsky' <boris.ostrovsky@xxxxxxxxxx>
> Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
>
> On 07/06/17 10:27, Jan Beulich wrote:
> >>>> On 07.06.17 at 10:07, <Paul.Durrant@xxxxxxxxxx> wrote:
> >>> -----Original Message-----
> >>> From: Boris Ostrovsky [mailto:boris.ostrovsky@xxxxxxxxxx]
> >>> Sent: 06 June 2017 18:00
> >>> To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>; 'Jan Beulich' <JBeulich@xxxxxxxx>
> >>> Cc: xen-devel (xen-devel@xxxxxxxxxxxxxxxxxxxx) <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> >>> Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
> >>>
> >>> On 06/06/2017 12:28 PM, Paul Durrant wrote:
> >>>>> -----Original Message-----
> >>>>> From: Xen-devel [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of
> >>>>> Paul Durrant
> >>>>> Sent: 06 June 2017 16:52
> >>>>> To: 'Jan Beulich' <JBeulich@xxxxxxxx>
> >>>>> Cc: xen-devel (xen-devel@xxxxxxxxxxxxxxxxxxxx) <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> >>>>> Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >>>>>> Sent: 06 June 2017 16:11
> >>>>>> To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> >>>>>> Cc: xen-devel (xen-devel@xxxxxxxxxxxxxxxxxxxx) <xen-devel@xxxxxxxxxxxxxxxxxxxx>
> >>>>>> Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
> >>>>>>
> >>>>>>>>> On 06.06.17 at 16:32, <Paul.Durrant@xxxxxxxxxx> wrote:
> >>>>>>> I've been having fun setting up a new test rig...
> >>>>>>>
> >>>>>>> I have a skull canyon NUC and I put debian stretch (rc4) on it (so
> >>>>>>> that's a 4.9 kernel) and then tried building and installing the
> >>>>>>> latest Xen staging-4.9 code. The system failed to boot... basically
> >>>>>>> it got stuck before even managing to get sufficiently into Xen to
> >>>>>>> spit out anything on the console. Xen 4.8 OTOH booted just fine so I
> >>>>>>> started bisecting and after 14 iterations I got down to the
> >>>>>>> following commit as being the problem:
> >>>>>>>
> >>>>>>> commit c0655e492e6b33e26ec9cd33f59725d0db89cdd0
> >>>>>>> Author: Juergen Gross <jgross@xxxxxxxx>
> >>>>>>> Date: Fri Mar 24 14:18:54 2017 +0100
> >>>>>>>
> >>>>>>>     x86: split boot trampoline into permanent and temporary part
> >>>>>>>
> >>>>>>>     The hypervisor needs a trampoline in low memory for early boot and
> >>>>>>>     later for bringing up cpus and during wakeup from suspend. Today this
> >>>>>>>     trampoline is kept completely even if most of it isn't needed later.
> >>>>>>>
> >>>>>>>     Split the trampoline into a permanent part and a temporary part needed
> >>>>>>>     at early boot only. Introduce a new entry at the boundary.
> >>>>>>>
> >>>>>>>     Reduce the stack for wakeup code in order for the permanent
> >>>>>>>     trampoline to fit in a single page. 4k of stack seems excessive, about
> >>>>>>>     3k should be more than enough.
> >>>>>>>
> >>>>>>>     Add an ASSERT() to the linker script to ensure the wakeup stack is
> >>>>>>>     always at least 3k.
> >>>>>>>
> >>>>>>>     Signed-off-by: Juergen Gross <jgross@xxxxxxxx>
> >>>>>>>     Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
> >>>>>>>
> >>>>>>> To verify this I checked out master, reverted that commit, and tried
> >>>>>>> again. The NUC still booted fine.
> >>>>>>
> >>>>>> Well, interesting, but I don't think it is very realistic to expect any
> >>>>>> fix with just the information you supply. There must be something
> >>>>>> rather special about that system, and likely it would help if we
> >>>>>> knew what that is. E.g. an unusual E820 map. Worse would be if
> >>>>>> they used memory outside of properly marked E820 regions in a
> >>>>>> way colliding with what we do.
> >>>>>>
> >>>>>> Otherwise I'm afraid we need to hope for you to debug the issue.
> >>>>>>
> >>>>> Yes, I was posting this more as a heads-up for the moment, so that 4.9
> >>>>> does not go out with this regression.
> >>>>>
> >>>>> I will try to figure out what is going on... My initial thoughts on
> >>>>> looking at what the patch does are that it may be something to do with
> >>>>> the fact I am using a vga console rather than a serial one. I need to
> >>>>> try another 4.9 on another system (gigabyte brix) to see if the problem
> >>>>> manifests there too. I'll also have to play with the BIOS settings on
> >>>>> the skull canyon.
> >>>>>
> >>>> The problem definitely doesn't manifest on the brix, so the next theory
> >>>> is that it is something to do with the BIOS of the skull canyon.
> >>>>
> >>>
> >>>
> >>> FWIW, one of the machines in our test farm choked on this very patch. I
> >>> don't remember details now but essentially it turned out that syslinux
> >>> (we are pxe-booting) could not handle changes in ELF sections layout
> >>> (the way syslinux calculated how to load the binary into memory resulted
> >>> in overlap of some sort).
> >>>
> >>> I hacked it (mboot.c32 specifically) to work around this but never came
> >>> up with a proper solution.
> >>>
> >>
> >> In my case it was grub2... and thinking about it I am running an older
> >> version on the brix so I guess it may still manifest there if I update.
> >> Either way it sounds like it may be better to revert the patch until the
> >> issue is better understood.
> >
> > I'm not sure if we could simply revert this one patch - it's the first of a
> > 3-patch series. At the first glance I can't really see any dependency
> > of the later two patches on it, but then again I seem to recall that the
> > split was a prereq. Adding Jürgen.
>
> I think it could be reverted. It was a prerequisite for another patch I
> prepared but didn't send as it was quite late in the 4.9 cycle and it
> depended on the other patches of Daniel.
>
> TBH: I really can't see what is wrong with that patch. The only change
> which should be able to break something seems to be the reduction of the
> wakeup stack size to 3kB, but this shouldn't affect booting the system
> at all...
>

Yeah, my next test is going to be increasing the size of the wakeup stack
again, but there is really nothing obviously wrong with the patch.

  Paul

>
> Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel