Re: Xen 4.14.0 fails on Dell IoT Gateway without efi=no-rs
On Fri, Aug 21, 2020 at 1:23 AM Jan Beulich <jbeulich@xxxxxxxx> wrote:
>
> On 21.08.2020 09:38, Roman Shaposhnik wrote:
> > On Thu, Aug 20, 2020 at 11:47 PM Jan Beulich <jbeulich@xxxxxxxx> wrote:
> >> On 20.08.2020 21:31, Roman Shaposhnik wrote:
> >>> Well, default is overloaded. What I would like to see (and consider it
> >>> a void of a small downstream/distro) is a community-agreed and
> >>> maintained way of working around these issues. Yes, I'd love to see
> >>> it working by default -- but if we can at least agree on an officially
> >>> supported knob that is less of a hammer than efi=attr=uc -- that'd
> >>> be a good first step.
> >>>
> >>> Makes sense?
> >>
> >> Sure, just that I don't see what less heavyweight alternatives
> >> to "efi=attr=uc" there are (short of supplying an option to
> >> provide per-range memory attributes, which would end up ugly
> >> to use). For the specific case here, "efi=attr=wp" could be
> >> made work, but might not be correct for all of the range (it's
> >> a EfiMemoryMappedIO range, after all); in the majority of cases
> >> of lacking attribute information that I've seen, UC was indeed
> >> what was needed.
> >
> > I think we're talking slightly past each other here -- you seem to be
> > more after trying to figure out how to make this box look like a dozen
> > killobucks worth a server, I'm after trying to figure out what callsites
> > in Xen tickle that region.
>
> What I'm trying is to understand what exactly is wrong in the firmware,
> as that'll likely allow determining a minimal workaround.

Fair enough. So let me start with a major update. After a bit of trial
and error it became apparent that a combination of efi=attr=uc AND
removing the call to efi_get_time as per:
    https://lists.archive.carbon60.com/xen/devel/408709
allows Xen to boot just fine and function properly on that device.

I'm not sure if that answers your question around what's wrong with this
firmware, but perhaps it suggests that the point from that old thread
above may still be valid: perhaps avoiding GetTime() altogether may help
a lot of downstream users (especially those running on more consumer-like
h/w -- since this issue seems to come up in QubesOS context as well).

Btw, just out of curiosity -- I poked around GetTime() disassembly and
while it is pretty convoluted my hunch is that it is indeed broken for
some internal reasons, not something as simple as page mapping.

So I guess a short version of answering your question would be:
GetTime() seems to be broken on this firmware.

> Figuring out
> the call sites is certainly also an approach, but the stack trace
> provided isn't enough for doing so, I'm afraid. Even the raw hex stack
> dump contains only two pointers into Xen's .text, and to figure what
> they represent one would need the xen.efi that was used. Possibly even
> a deeper stack dump might be needed.

Agreed. I was mostly using it to poke around possible reasons for it
failing.

> > I appreciate and respect your position, but please hear mine as well:
> > yes we're clearly into the "workaround" territory here, but clearly
> > Linux is fully capable of these workaround and I would like to understand
> > how expensive it will be to teach Xen those tricks as well.
>
> My prime example here is their blanket avoiding of the time related
> runtime services, despite the EFI spec saying the exact opposite.

Well, to be fair, it seems that the practical experience with various
bits of hardware suggests that in this particular case avoidance may be
the lesser of all the evils. Or to ask a complementary question: what's
the danger of making that patch (in a cleaned up form) the default
behaviour? Will there be any instances of hardware where it may
actually hurt?

> "efi=no-rs" is just a wider scope workaround of this same kind.

The problem with "efi=no-rs" is that it is actually unbounded. IOW,
compare two cases:
   1. disable a single call to GetTime()
   2. disable all calls to EFI RS?

Case #1 I can reason about -- case #2 -- not so much (unless somebody
explains to me the full scope of what gets disabled when efi=no-rs).

Now, you may say (and seems like you do ;-)) that if a small part of
the implementation can't be trusted -- the entire thing shouldn't be
trusted -- I don't think I will buy into that policy -- but it is a
policy.

> The reasoning I see behind this is that if the time related runtime
> services are problematic, how likely is it that others are fine to
> use? And how would an admin know without first having run into some
> crash? If there are fair reasons to have finer grained disabling of
> runtime services - why not? But it'll still take a command line
> option to do so, unless (as was proposed) a build-time option of
> enabling all (common?) workarounds by default gets made use of.

Well, policy (and trust issues) aside -- I think the real question is
-- it seems that there's quite a bit of downstream that agrees that
avoiding GetTime() is a good idea. What options do we have to make
that possible without each downstream carrying a custom patch (which
I'm adding to EVE as we speak)?

> > Now, whether you'd accept these tricks upstream or not is an entirely
> > orthogonal question.
>
> Well, I'd say "separate", not "orthogonal", because the nature of
> such workarounds qualifies (to me) what is or is not acceptable as
> default behavior.

Good point.

Thanks,
Roman.
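
To make the two cases above concrete, here is a minimal, self-contained
sketch of what "finer grained disabling of runtime services" could look
like. It is purely illustrative: every name in it (rs_service,
rs_disable, rs_allowed, wallclock_read) is hypothetical and none of it
is taken from Xen's source or from the patch linked earlier. Case #1
masks out one service; case #2 is the same mechanism with every bit
cleared.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical identifiers for illustration only; not Xen's names. */
enum rs_service {
    RS_GET_TIME     = 1u << 0,
    RS_SET_TIME     = 1u << 1,
    RS_GET_VARIABLE = 1u << 2,
    RS_SET_VARIABLE = 1u << 3,
    RS_RESET_SYSTEM = 1u << 4
};

#define RS_ALL ((uint32_t)((1u << 5) - 1))

/* All services start out enabled. */
static uint32_t rs_enabled = RS_ALL;

/* Case #1: disable only the services named in mask. */
void rs_disable(uint32_t mask)
{
    rs_enabled &= ~mask;
}

/* Case #2: the blanket switch, i.e. nothing may be called. */
void rs_disable_all(void)
{
    rs_enabled = 0;
}

/* True only if every service in the argument is still enabled. */
bool rs_allowed(uint32_t service)
{
    return (rs_enabled & service) == service;
}

/*
 * Read the wall clock through a firmware callback, but only if the
 * GetTime service has not been masked out. On false the caller is
 * expected to fall back to some other time source.
 */
bool wallclock_read(uint64_t *secs, bool (*fw_get_time)(uint64_t *))
{
    if (!rs_allowed(RS_GET_TIME) || fw_get_time == NULL)
        return false;
    return fw_get_time(secs);
}
```

With something along these lines, calling rs_disable(RS_GET_TIME) makes
wallclock_read() return false without ever reaching the firmware, while
rs_allowed(RS_GET_VARIABLE) stays true. That is the bounded behaviour
of case #1. Whether such a mask would be driven by a command line
option or a build-time default is exactly the open policy question in
this thread.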