[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v6 04/24] x86: refactor psr: implement CPU init and free flow.
On 17-02-08 14:03:15, Konrad Rzeszutek Wilk wrote: > > > +static void cpu_fini_work(unsigned int cpu) > > > +{ > > > + unsigned int socket = cpu_to_socket(cpu); > > > + > > > + if ( !socket_cpumask[socket] || > > > cpumask_empty(socket_cpumask[socket]) ) > > > > I fear I don't understand this. > > > > Looking at 65a399ac it says: > > > > + .notifier_call = cpu_callback, > > + /* > > + * Ensure socket_cpumask is still valid in CPU_DEAD notification > > + * (E.g. our CPU_DEAD notification should be called ahead of > > + * cpu_smpboot_free). > > > > Which means that socket_cpumask[socket] should have an value. > > > > In fact cpumask_any(socket_cpumask[socket]) will return true at this > > point. > > > > Which means that code above gets called if this psr_cpu_fini is called > > _after_ cpu_smpboot_free (b/c socket_cpumask[socket] will be NULL), see > > cpu_smpboot_free. > > > > But with .priority being -1 that won't happen. > > > > But lets ignore that, lets say it does get called _after_ > > cpu_smpboot_free. In which case the first part of the 'if' condition > > is true and we end up doing: cpumask_empty(NULL) which will result > > in an page fault. > > > > I think this is a bug. > > I missed the fact that we call this function on CPU_UP_CANCELED > _and_ CPU_DEAD and that remove_siblinginfo is called before CPU_DEAD. > > So say the socket is being onlined and we are the first CPU on > that (CPU=2). > > > 1) CPU_UP_PREPARE notifier calls psr_cpu_prepare which allocates > 'feat_l3_cat'. > > 2). Some other notifies dies and we end up calling CPU_UP_CANCELED. > psr_cpu_fini -> cpu_fini_work: > > Since __cpu_up has not been called that means > socket_cpumask[socket] == NULL > > and we enter free_feature. > > It does 'if !(socket_info + socket)' effectively. And that is false > as init_psr was the one initializing that array which means the > pointer (info aka socket_info + socket) certainly is not NULL. > > Anyhow this means free_feature tries to walk the list - but 'info' > is full of zeros. > > Which means list_for_each_entry_safe is going to blow when trying > to set 'n' (as pos is zero aka NULL). > > Perhaps you want an INIT_LIST_HEAD in 'init_psr' so that free_feature > can be idempotent? > > Or perhaps you need '&&' in the if conditional (and naturally > remove the !)? > > I wonder what other CPU notifiers are guilty of this.. > You are right. I made a mistake here. Thanks a lot for finding out the issue! In previous version, I called INIT_LIST_HEAD in init_psr. But I moved it into cpu_init_work to place it together with spin_lock_init per review comments. This causes this mis-match issue. I think your suggestion is good to change check criteria as below. if ( socket_cpumask[socket] && cpumask_empty(socket_cpumask[socket]) ) Then, free_feature will only be executed when CPU_STARTING has been done. > > Lets try another example: > > 1) CPU_UP_PREPARE notifier calls psr_cpu_prepare which allocates > 'feat_l3_cat'. > > 2) All notifiers were OK, the __cpu_up has been called. The > socket_cpumask[socket] = <some data> > > 3) All notifiers are called with CPU_ONLINE. > All good. > > > ..some time later .. > > 4). CPU is brought down. Notifiers for CPU_DOWN_PREPARE are called > [psr is not part of it] > > 5) CPU notifiers CPU_DEAD are called. They all have to return > NOTIFY_DONE otherwise we hit 'BUG_ON' > > We call psr_cpu_fini -> cpu_fini_work > > socket_cpumask[socket] has some data, so the first conditional > is false: > > ( !socket_cpumask[socket] || > > the second: > > cpumask_empty(socket_cpumask[socket]) ) > > is true as take_cpu_down -> __cpu_disable -> remove_siblinginfo > > was called before us.(and this is the first CPU in the socket). > > And we call free_feature. > which is correct and we free our data. > > And the 'cpumask_empty' guards us so that we only call this > on the very last CPU. > > > And we would still call this if the conditional was: > > if ( socket_cpumask[socket] && cpumask_empty(socket_cpumask[socket])) > > I think? > > Feel free to poke holes at my analysis - I am sure I missed some edge > case. _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxx https://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |