
Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot



> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-bounces@xxxxxxxxxxxxx] On Behalf Of
> Paul Durrant
> Sent: 12 June 2017 15:29
> To: 'Jan Beulich' <JBeulich@xxxxxxxx>
> Cc: Juergen Gross <jgross@xxxxxxxx>; Andrew Cooper
> <Andrew.Cooper3@xxxxxxxxxx>; Julien Grall (julien.grall@xxxxxxx)
> <julien.grall@xxxxxxx>; 'Boris Ostrovsky' <boris.ostrovsky@xxxxxxxxxx>;
> xen-devel(xen-devel@xxxxxxxxxxxxxxxxxxxx) <xen-
> devel@xxxxxxxxxxxxxxxxxxxx>
> Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
> 
> > -----Original Message-----
> > From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> > Sent: 12 June 2017 14:55
> > To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> > Cc: Julien Grall (julien.grall@xxxxxxx) <julien.grall@xxxxxxx>; Andrew
> > Cooper <Andrew.Cooper3@xxxxxxxxxx>; xen-devel(xen-
> > devel@xxxxxxxxxxxxxxxxxxxx) <xen-devel@xxxxxxxxxxxxxxxxxxxx>; 'Boris
> > Ostrovsky' <boris.ostrovsky@xxxxxxxxxx>; Juergen Gross
> > <jgross@xxxxxxxx>
> > Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
> >
> > >>> On 12.06.17 at 14:05, <Paul.Durrant@xxxxxxxxxx> wrote:
> > >>  -----Original Message-----
> > >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> > >> Sent: 12 June 2017 12:12
> > >> To: Paul Durrant <Paul.Durrant@xxxxxxxxxx>
> > >> Cc: Julien Grall (julien.grall@xxxxxxx) <julien.grall@xxxxxxx>; Andrew
> > >> Cooper <Andrew.Cooper3@xxxxxxxxxx>; xen-devel(xen-
> > >> devel@xxxxxxxxxxxxxxxxxxxx) <xen-devel@xxxxxxxxxxxxxxxxxxxx>; 'Boris
> > >> Ostrovsky' <boris.ostrovsky@xxxxxxxxxx>; Juergen Gross
> > >> <jgross@xxxxxxxx>
> > >> Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
> > >>
> > >> >>> On 12.06.17 at 12:53, <Paul.Durrant@xxxxxxxxxx> wrote:
> > >> >>  -----Original Message-----
> > >> > [snip]
> > >> >> > >
> > >> >> > > What do you think is best to do for Xen 4.9? Hardcoding a 4k
> > >> >> > > alignment is clearly easy and would work around this BIOS
> > >> >> > > issue but, as you say, it does grow the image. Reverting
> > >> >> > > Juergen's patch also works round the issue, but that is more
> > >> >> > > by luck. Re-working the code is preferable, but I guess it's
> > >> >> > > too late to introduce such code-churn in 4.9.
> > >> >> >
> > >> >> > Reverting Jürgen's code is out of the question with all the information
> > >> >> > you've gathered by now. I think re-working the EDD code slightly
> > >> >> > is the best option. Would you mind giving the attached patch a
> > >> >> > try? This still slightly grows the trampoline due to a few more
> > >> >> > instructions being needed, but should still be far better than
> > >> >> > embedding a whole 4k buffer (and then later finding a BIOS/disk
> > >> >> > combination which wants even more). Note that I've left a tiny
> > >> >> > bit of debugging code in there.
> > >> >> >
> > >> >>
> > >> >> Sure, I'll give that a go now.
> > >> >>
> > >> >
> > >> > That worked fine:
> > >> >
> > >> > (XEN) MBR[80] @ 85e0 (86000)
> > >>
> > >> But that's contrary to your earlier findings: Didn't you say simply
> > >> avoiding a 4k-boundary wasn't enough? And it certainly tells us
> > >> that this isn't a 4k drive (or at least the BIOS doesn't surface 4k
> > >> sectors) - I was really expecting a larger gap between the two
> > >> logged values.
> > >>
> > >
> > > I'll go dump out the edd and double check what it is saying.
> > >
> > > My findings indicated that a read spanning a 4k boundary caused the
> > > problem, so using 0x85e00 would be safe. The anomaly was that simply
> > > aligning the edd_info buffer on a 512 byte boundary and continuing to
> > > use that for reading did not work.
> >
> > But a 512-byte aligned 512-byte buffer can't possibly cross a page
> > boundary.
> 
> Indeed, which is why I was perplexed. I found that 0x60e00 was ok. Your
> patch chose 0x85e00, which was ok too, but for some reason a '.align 512' in
> front of boot_edd_info yielded an address which was not ok. I just checked
> what address that yielded though (by booting with edd=off to avoid the
> hang) and it was 0x86f40... which clearly means that '.align 512' is not doing
> what I thought it would do.

No, the problem turns out to be the GLOBAL() macro which, in assembly files, 
contains an implicit .align 16!
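
For anyone following along, here is a quick way to sanity-check the three
addresses mentioned above. This is only an illustrative Python sketch, not
anything from the Xen tree: the helper names (is_aligned, crosses_boundary,
describe) are made up, and it assumes the read in question is a single
512-byte sector. It only checks the arithmetic and says nothing about how
the final address came about.

```python
SECTOR = 512   # assumed size of the single-sector read discussed here
PAGE = 4096    # the 4k boundary the int13 handler seemed sensitive to


def is_aligned(addr, boundary):
    """True if addr is a multiple of boundary."""
    return addr % boundary == 0


def crosses_boundary(addr, length, boundary=PAGE):
    """True if [addr, addr + length) spans more than one boundary-sized block."""
    return addr // boundary != (addr + length - 1) // boundary


def describe(addr):
    """Summarise the properties of a candidate buffer address."""
    return {
        "align16": is_aligned(addr, 16),
        "align512": is_aligned(addr, SECTOR),
        "crosses_4k": crosses_boundary(addr, SECTOR),
    }


if __name__ == "__main__":
    # Addresses quoted in this thread: two that worked, one that did not.
    for addr in (0x60e00, 0x85e00, 0x86f40):
        print(hex(addr), describe(addr))
```

By this arithmetic 0x60e00 and 0x85e00 are both 512-byte aligned and a
sector read from either stays inside one 4k page, whereas 0x86f40 is only
16-byte aligned and a sector read from it does span a 4k boundary.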

  Paul

> 
>   Paul
> 
> >
> > >> > so you can add my Tested-by to that.
> > >>
> > >> I.e. I'm not sure about this, as I'm still uncertain whether some
> > >> corruption didn't again occur. Of course APs coming up properly
> > >> would already be a relatively good sign (as now the permanent
> > >> part of the trampoline would be the predestined area for
> > >> corruption to occur in).
> > >>
> > >
> > > None of my findings ever indicated memory corruption (although there,
> > > of course, may have been some that I happened to miss), but rather
> > > misbehaviour of the int13 handler itself - either locking up, having
> > > odd effects (e.g. black screen), or both.
> >
> > Ah, I didn't understand it this way so far, and instead had implied
> > that the handler did return, but corrupt our trampoline area in
> > one way or another.
> >
> > Jan
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> https://lists.xen.org/xen-devel