Re: [Xen-devel] [PATCH] mm: fix LLVM code-generation issue
On Fri, Nov 23, 2018 at 11:36:48AM +0000, Julien Grall wrote:
>
>
> On 23/11/2018 11:23, Roger Pau Monné wrote:
> > On Thu, Nov 22, 2018 at 05:46:19PM +0000, Julien Grall wrote:
> > >
> > >
> > > On 11/22/18 5:04 PM, George Dunlap wrote:
> > > > On 11/22/18 4:45 PM, Julien Grall wrote:
> > > > > Hi Roger,
> > > > >
> > > > > On 11/22/18 4:39 PM, Roger Pau Monné wrote:
> > > > > > On Thu, Nov 22, 2018 at 04:22:34PM +0000, Andrew Cooper wrote:
> > > > > > > On 22/11/2018 16:07, Roger Pau Monné wrote:
> > > > > > > > On Thu, Nov 22, 2018 at 03:23:41PM +0000, Andrew Cooper wrote:
> > > > > > > > > On 22/11/2018 15:20, Roger Pau Monné wrote:
> > > > > > > > > > On Thu, Nov 22, 2018 at 02:03:55PM +0000, Julien Grall wrote:
> > > > > > > > > > > Hi Jan,
> > > > > > > > > > >
> > > > > > > > > > > On 11/22/18 1:36 PM, Jan Beulich wrote:
> > > > > > > > > > > > > > > On 22.11.18 at 14:31, <andrew.cooper3@xxxxxxxxxx> wrote:
> > > > > > > > > > > > > I think Julien's point is that without explicitly barriers, CPU0's
> > > > > > > > > > > > > update to system_state may not be visible on CPU1, even though the
> > > > > > > > > > > > > mappings have been shot down.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Therefore, from the processors point of view, it did everything
> > > > > > > > > > > > > correctly, and hit a real pagefault.
> > > > > > > > > > > > Boot time updates of system_state should be of no interest here,
> > > > > > > > > > > > as at that time the APs are all idling.
> > > > > > > > > > > That's probably true today. But this code looks rather fragile as you don't
> > > > > > > > > > > know how this is going to be used in the future.
> > > > > > > > > > >
> > > > > > > > > > > If you decide to gate init code with system_state, then you need a barrier
> > > > > > > > > > > to ensure the code is future proof.
> > > > > > > > > > Wouldn't it be enough to declare system_state as volatile?
> > > > > > > > > No. volatility (or lack thereof) is a compiler level construct.
> > > > > > > > >
> > > > > > > > > ARM has weaker cache coherency than x86, so a write which has completed
> > > > > > > > > on one CPU0 in the past may legitimately not be visible on CPU1 yet.
> > > > > > > > >
> > > > > > > > > If you need guarantees about the visibility of updated, you must use
> > > > > > > > > appropriate barriers.
> > > > > > > > Right. There's some differences between ARM and x86, ARM sets
> > > > > > > > SYS_STATE_active and continues to make use of init functions. In any
> > > > > > > > case I have the following diff:
> > > > > > > >
> > > > > > > > diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
> > > > > > > > index e83221ab79..cf50d05620 100644
> > > > > > > > --- a/xen/arch/arm/setup.c
> > > > > > > > +++ b/xen/arch/arm/setup.c
> > > > > > > > @@ -966,6 +966,7 @@ void __init start_xen(unsigned long boot_phys_offset,
> > > > > > > >      serial_endboot();
> > > > > > > >
> > > > > > > >      system_state = SYS_STATE_active;
> > > > > > > > +    smp_wmb();
> > > > > > > >
> > > > > > > >      create_domUs();
> > > > > > > >
> > > > > > > > diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> > > > > > > > index 9cbff22fb3..41044c0b6f 100644
> > > > > > > > --- a/xen/arch/x86/setup.c
> > > > > > > > +++ b/xen/arch/x86/setup.c
> > > > > > > > @@ -593,6 +593,7 @@ static void noinline init_done(void)
> > > > > > > >      unsigned long start, end;
> > > > > > > >
> > > > > > > >      system_state = SYS_STATE_active;
> > > > > > > > +    smp_wmb();
> > > > > > > >
> > > > > > > >      domain_unpause_by_systemcontroller(dom0);
> > > > > > >
> > > > > > > I'm afraid that that won't do anything to help at all.
> > > > > > >
> > > > > > > smp_{wmb,rmb}() must be in matched pairs, and mb() must be matched with
> > > > > > > itself.
> > > > > >
> > > > > > Then I'm not sure about whether our previous plan still stands, are we
> > > > > > OK with using ACCESS_ONCE here and forgetting about the memory
> > > > > > barriers given the current usage?
> > > > >
> > > > > The problem is not the current usage but how it could be used. Debugging
> > > > > memory ordering is quite a pain so I would prefer this to be fixed
> > > > > correctly.
> > > >
> > > > But in this case it wouldn't be a pain, because the only possible
> > > > failure mode is if the processor faults trying to read opt_bootscrub,
> > > > right?
> > >
> > > Possibly. But I don't see any reason to defer the fix until someone comes up
> > > with unreliable crash.
> >
> > If we have to go down that route, shouldn't we also protect
> > system_state with a lock so that it cannot be modified by a CPU while
> > it's being read from another?
>
> The locking might be a bit too much. Modifying the system_state should not
> be an issue if you put the correct barrier in place.

What about the following scenario?

BSP  write;wmb();  remove init mapping;
AP   rmb();read    read init var;
-----------time------>

Yes matching barriers are in place, but the result is still wrong. Can
this happen?

Even if we make opt_bootscrub non-init to avoid the fault, we just defer
the error to a later point. This isn't really about coherency.

Maybe we should put reading state under heap lock?

Wei.

>
> Cheers,
>
> --
> Julien Grall

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel
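
For reference, the suggestion of reading the state under a lock can be sketched in a few lines of C11. This is only an illustration under stated assumptions, not Xen code: `state`, `init_var`, `teardown_init()` and `read_init_locked()` are hypothetical names, the `atomic_flag` spinlock merely stands in for the heap lock, and setting a pointer to NULL stands in for removing the init mapping. The point it tries to show is that the check of the state and the use of the init variable sit in one critical section, which paired barriers alone do not give you.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; none of this is Xen's code. */
enum { STATE_BOOT, STATE_ACTIVE };

static int state = STATE_BOOT;               /* models system_state */
static int init_storage = 1;                 /* models an init-section variable */
static int *init_var = &init_storage;        /* NULL models "init mapping removed" */
static atomic_flag lock = ATOMIC_FLAG_INIT;  /* stands in for the heap lock */

static void lock_acquire(void)
{
    /* Spin until the flag was previously clear; acquire pairs with the release below. */
    while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
        ;
}

static void lock_release(void)
{
    atomic_flag_clear_explicit(&lock, memory_order_release);
}

/* BSP side: flip the state and tear down init data inside one critical section. */
static void teardown_init(void)
{
    lock_acquire();
    state = STATE_ACTIVE;
    init_var = NULL;
    lock_release();
}

/*
 * AP side: check the state and use the init variable under the same lock,
 * so the teardown cannot run between the check and the use.
 * Returns true and stores the value in *out only while still booting.
 */
static bool read_init_locked(int *out)
{
    bool ok = false;

    lock_acquire();
    if (state == STATE_BOOT) {
        assert(init_var != NULL);  /* invariant: init data is intact while booting */
        *out = *init_var;
        ok = true;
    }
    lock_release();
    return ok;
}
```

Before `teardown_init()` runs, `read_init_locked()` returns true and hands back the value; afterwards it returns false and never dereferences the pointer. With only a wmb()/rmb() pair, the check and the use would remain two separate steps, which is the window the timeline above describes.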