Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
>>> On 07.06.17 at 10:07, <Paul.Durrant@xxxxxxxxxx> wrote:
>> -----Original Message-----
>> From: Boris Ostrovsky [mailto:boris.ostrovsky@xxxxxxxxxx]
>> Sent: 06 June 2017 18:00
>> To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>; 'Jan Beulich' <JBeulich@xxxxxxxx>
>> Cc: xen-devel (xen-devel@xxxxxxxxxxxxxxxxxxxx) <xen-devel@xxxxxxxxxxxxxxxxxxxx>
>> Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
>>
>> On 06/06/2017 12:28 PM, Paul Durrant wrote:
>> >> -----Original Message-----
>> >> From: Xen-devel [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of
>> >> Paul Durrant
>> >> Sent: 06 June 2017 16:52
>> >> To: 'Jan Beulich' <JBeulich@xxxxxxxx>
>> >> Cc: xen-devel (xen-devel@xxxxxxxxxxxxxxxxxxxx) <xen-devel@xxxxxxxxxxxxxxxxxxxx>
>> >> Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
>> >>
>> >>> -----Original Message-----
>> >>> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
>> >>> Sent: 06 June 2017 16:11
>> >>> To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>
>> >>> Cc: xen-devel (xen-devel@xxxxxxxxxxxxxxxxxxxx) <xen-devel@xxxxxxxxxxxxxxxxxxxx>
>> >>> Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
>> >>>
>> >>>>>> On 06.06.17 at 16:32, <Paul.Durrant@xxxxxxxxxx> wrote:
>> >>>> I've been having fun setting up a new test rig...
>> >>>>
>> >>>> I have a skull canyon NUC and I put debian stretch (rc4) on it (so that's a
>> >>>> 4.9 kernel) and then tried building and installing the latest Xen staging-4.9
>> >>>> code. The system failed to boot... basically it got stuck before even
>> >>>> managing to get sufficiently into Xen to spit out anything on the console.
>> >>>> Xen 4.8 OTOH booted just fine so I started bisecting and after 14 iterations
>> >>>> I got down to the following commit as being the problem:
>> >>>>
>> >>>> commit c0655e492e6b33e26ec9cd33f59725d0db89cdd0
>> >>>> Author: Juergen Gross <jgross@xxxxxxxx>
>> >>>> Date: Fri Mar 24 14:18:54 2017 +0100
>> >>>>
>> >>>> x86: split boot trampoline into permanent and temporary part
>> >>>>
>> >>>> The hypervisor needs a trampoline in low memory for early boot and
>> >>>> later for bringing up cpus and during wakeup from suspend. Today this
>> >>>> trampoline is kept completely even if most of it isn't needed later.
>> >>>>
>> >>>> Split the trampoline into a permanent part and a temporary part needed
>> >>>> at early boot only. Introduce a new entry at the boundary.
>> >>>>
>> >>>> Reduce the stack for wakeup code in order for the permanent
>> >>>> trampoline to fit in a single page. 4k of stack seems excessive, about
>> >>>> 3k should be more than enough.
>> >>>>
>> >>>> Add an ASSERT() to the linker script to ensure the wakeup stack is
>> >>>> always at least 3k.
>> >>>>
>> >>>> Signed-off-by: Juergen Gross <jgross@xxxxxxxx>
>> >>>> Reviewed-by: Jan Beulich <jbeulich@xxxxxxxx>
>> >>>>
>> >>>> To verify this I checked out master, reverted that commit, and tried again.
>> >>>> The NUC still booted fine.
>> >>> Well, interesting, but I don't think it is very realistic to expect any
>> >>> fix with just the information you supply. There must be something
>> >>> rather special about that system, and likely it would help if we
>> >>> knew what that is. E.g. an unusual E820 map. Worse would be if
>> >>> they used memory outside of properly marked E820 regions in a
>> >>> way colliding with what we do.
>> >>>
>> >>> Otherwise I'm afraid we need to hope for you to debug the issue.
>> >>>
>> >> Yes, I was posting this more as a heads-up for the moment, so that 4.9 does not
>> >> go out with this regression.
>> >>
>> >> I will try to figure out what is going on... My initial thoughts on looking at what
>> >> the patch does are that it may be something to do with the fact I am using a
>> >> vga console rather than a serial one. I need to try another 4.9 on another
>> >> system (gigabyte brix) to see if the problem manifests there too. I'll also have
>> >> to play with the BIOS settings on the skull canyon.
>> >>
>> > The problem definitely doesn't manifest on the brix, so the next theory is
>> > that it is something to do with the BIOS of the skull canyon.
>> >
>>
>>
>> FWIW, one of the machines in our test farm choked on this very patch. I
>> don't remember details now but essentially it turned out that syslinux
>> (we are pxe-booting) could not handle changes in ELF sections layout
>> (the way syslinux calculated how to load the binary into memory resulted
>> in overlap of some sort).
>>
>> I hacked it (mboot.c32 specifically) to work around this but never came
>> up with a proper solution.
>>
>
> In my case it was grub2... and thinking about it I am running an older
> version on the brix so I guess it may still manifest there if I update.
> Either way it sounds like it may be better to revert the patch until the
> issue is better understood.

I'm not sure if we could simply revert this one patch - it's the
first of a 3-patch series. At first glance I can't really see any
dependency of the later two patches on it, but then again I seem
to recall that the split was a prereq. Adding Jürgen.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel