|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH v6 04/24] x86: refactor psr: implement CPU init and free flow.
> > +static void cpu_fini_work(unsigned int cpu)
> > +{
> > + unsigned int socket = cpu_to_socket(cpu);
> > +
> > + if ( !socket_cpumask[socket] || cpumask_empty(socket_cpumask[socket]) )
>
> I fear I don't understand this.
>
> Looking at 65a399ac it says:
>
> + .notifier_call = cpu_callback,
> + /*
> + * Ensure socket_cpumask is still valid in CPU_DEAD notification
> + * (E.g. our CPU_DEAD notification should be called ahead of
> + * cpu_smpboot_free).
>
> Which means that socket_cpumask[socket] should have an value.
>
> In fact cpumask_any(socket_cpumask[socket]) will return true at this
> point.
>
> Which means that code above gets called if this psr_cpu_fini is called
> _after_ cpu_smpboot_free (b/c socket_cpumask[socket] will be NULL), see
> cpu_smpboot_free.
>
> But with .priority being -1 that won't happen.
>
> But lets ignore that, lets say it does get called _after_
> cpu_smpboot_free. In which case the first part of the 'if' condition
> is true and we end up doing: cpumask_empty(NULL) which will result
> in an page fault.
>
> I think this is a bug.
I missed the fact that we call this function on CPU_UP_CANCELED
_and_ CPU_DEAD and that remove_siblinginfo is called before CPU_DEAD.
So say the socket is being onlined and we are the first CPU on
that (CPU=2).
1) CPU_UP_PREPARE notifier calls psr_cpu_prepare which allocates
'feat_l3_cat'.
2). Some other notifies dies and we end up calling CPU_UP_CANCELED.
psr_cpu_fini -> cpu_fini_work:
Since __cpu_up has not been called that means
socket_cpumask[socket] == NULL
and we enter free_feature.
It does 'if !(socket_info + socket)' effectively. And that is false
as init_psr was the one initializing that array which means the
pointer (info aka socket_info + socket) certainly is not NULL.
Anyhow this means free_feature tries to walk the list - but 'info'
is full of zeros.
Which means list_for_each_entry_safe is going to blow when trying
to set 'n' (as pos is zero aka NULL).
Perhaps you want an INIT_LIST_HEAD in 'init_psr' so that free_feature
can be idempotent?
Or perhaps you need '&&' in the if conditional (and naturally
remove the !)?
I wonder what other CPU notifiers are guilty of this..
Lets try another example:
1) CPU_UP_PREPARE notifier calls psr_cpu_prepare which allocates
'feat_l3_cat'.
2) All notifiers were OK, the __cpu_up has been called. The
socket_cpumask[socket] = <some data>
3) All notifiers are called with CPU_ONLINE.
All good.
..some time later ..
4). CPU is brought down. Notifiers for CPU_DOWN_PREPARE are called
[psr is not part of it]
5) CPU notifiers CPU_DEAD are called. They all have to return
NOTIFY_DONE otherwise we hit 'BUG_ON'
We call psr_cpu_fini -> cpu_fini_work
socket_cpumask[socket] has some data, so the first conditional
is false:
( !socket_cpumask[socket] ||
the second:
cpumask_empty(socket_cpumask[socket]) )
is true as take_cpu_down -> __cpu_disable -> remove_siblinginfo
was called before us.(and this is the first CPU in the socket).
And we call free_feature.
which is correct and we free our data.
And the 'cpumask_empty' guards us so that we only call this
on the very last CPU.
And we would still call this if the conditional was:
if ( socket_cpumask[socket] && cpumask_empty(socket_cpumask[socket]))
I think?
Feel free to poke holes at my analysis - I am sure I missed some edge
case.
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |