
Re: [Xen-devel] patch "x86/cpufreq: relocate the driver register function" breaks cpu hot(un)plug



On 09/10/2015 04:01, Konrad Rzeszutek Wilk wrote:
> On Fri, Oct 09, 2015 at 06:48:23PM +0200, Dario Faggioli wrote:
> > Hey,
> >
> > As far as my bisection goes, commit
> > 49388f11d512bb92706ce046643bfbb3c1d963c9 "x86/cpufreq: relocate the
> > driver register function" prevents me from hot unplugging pCPUs.
> >
> > Xen does not crash or anything, but dom0 is stalled. In fact, with
> > current staging, here's what I see:
> >
> > root@Zhaman:~# echo 0 > /sys/devices/system/xen_cpu/xen_cpu6/online
> > [   81.583001] INFO: rcu_sched detected stalls on CPUs/tasks: { 12} (detected by 3, t=5252 jiffies, g=1691, c=1690, q=76)
> > [   81.583036] Task dump for CPU 12:
> > [   81.583044] bash            R  running task        0  1347   1094 0x00000008
> > [   81.583056]  ffffffff00000000 0000000000000000 0000000000000000 ffff8800192c2e38
> > [   81.583070]  ffff8800008472e8 0000000000000002 ffff8800008472e8 ffff880013817858
> > [   81.583082]  0000000000000000 00000000000081a4 ffffffff811e8137 ffff8800192c2e38
> > [   81.583095] Call Trace:
> > [   81.583110]  [<ffffffff811e8137>] ? notify_change+0x2f7/0x390
> > [   81.583148]  [<ffffffff811c8c74>] ? do_truncate+0x74/0x90
> > [   81.583158]  [<ffffffff811e2866>] ? dput+0x26/0x230
> > [   81.583167]  [<ffffffff811d53c5>] ? terminate_walk+0x35/0x40
> > [   81.583176]  [<ffffffff811d92b1>] ? do_last+0x621/0x12c0
> > [   81.583188]  [<ffffffff8139f0e7>] ? xen_pcpu_down+0x47/0x70
> > [   81.583199]  [<ffffffff8156c64d>] ? store_online+0x9d/0xb0
> > [   81.583210]  [<ffffffff81240bfc>] ? kernfs_fop_write+0x12c/0x180
> > [   81.583220]  [<ffffffff811ca513>] ? __vfs_write+0x23/0xf0
> > [   81.583230]  [<ffffffff811cd142>] ? __sb_start_write+0x42/0xf0
> > [   81.583241]  [<ffffffff8125f711>] ? security_file_permission+0x21/0xa0
> > [   81.583250]  [<ffffffff811caea1>] ? vfs_write+0xa1/0x1c0
> > [   81.583259]  [<ffffffff811c828f>] ? filp_close+0x4f/0x70
> > [   81.583268]  [<ffffffff811cbb12>] ? SyS_write+0x42/0xb0
> > [   81.583277]  [<ffffffff811e9031>] ? __close_fd+0x71/0xb0
> > [   81.583287]  [<ffffffff815780f2>] ? system_call_fastpath+0x16/0x75
> > [  144.555020] INFO: rcu_sched detected stalls on CPUs/tasks: { 12} (detected by 4, t=21007 jiffies, g=1691, c=1690, q=244)
> > [  144.555046] Task dump for CPU 12:
> > [  144.555051] bash            R  running task        0  1347   1094 0x00000008
> > [  144.555059]  ffffffff00000000 0000000000000000 0000000000000000 ffff8800192c2e38
> > [  144.555068]  ffff8800008472e8 0000000000000002 ffff8800008472e8 ffff880013817858
> > [  144.555076]  0000000000000000 00000000000081a4 ffffffff811e8137 ffff8800192c2e38
> > [  144.555084] Call Trace:
> > [  144.555096]  [<ffffffff811e8137>] ? notify_change+0x2f7/0x390
> > [  144.555105]  [<ffffffff811c8c74>] ? do_truncate+0x74/0x90
> > [  144.555112]  [<ffffffff811e2866>] ? dput+0x26/0x230
> > [  144.555118]  [<ffffffff811d53c5>] ? terminate_walk+0x35/0x40
> > [  144.555124]  [<ffffffff811d92b1>] ? do_last+0x621/0x12c0
> > [  144.555164]  [<ffffffff8139f0e7>] ? xen_pcpu_down+0x47/0x70
> > [  144.555172]  [<ffffffff8156c64d>] ? store_online+0x9d/0xb0
> > [  144.555179]  [<ffffffff81240bfc>] ? kernfs_fop_write+0x12c/0x180
> > [  144.555186]  [<ffffffff811ca513>] ? __vfs_write+0x23/0xf0
> > [  144.555192]  [<ffffffff811cd142>] ? __sb_start_write+0x42/0xf0
> > [  144.555200]  [<ffffffff8125f711>] ? security_file_permission+0x21/0xa0
> > [  144.555206]  [<ffffffff811caea1>] ? vfs_write+0xa1/0x1c0
> > [  144.555212]  [<ffffffff811c828f>] ? filp_close+0x4f/0x70
> > [  144.555217]  [<ffffffff811cbb12>] ? SyS_write+0x42/0xb0
> > [  144.555223]  [<ffffffff811e9031>] ? __close_fd+0x71/0xb0
> > [  144.555230]  [<ffffffff815780f2>] ? system_call_fastpath+0x16/0x75
> >
> > If I revert that patch, the issue goes away.
> >
> > Any ideas?

Hi Dario,

Please also remove the "register_cpu_notifier(&cpu_nfb)" call from the
cpufreq_register_driver function. (I found that it is already included in
cpufreq_presmp_nfb().)
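
For illustration only, here is a toy model of why registering the same
notifier block twice can be harmful. It is an assumption on my part that this
is related to the stall above; the types and functions below (toy_notifier,
toy_register, toy_register_once, toy_chain_has_cycle) are hypothetical and are
not Xen's real notifier code, which may be structured differently. In this
simplified priority-ordered chain, a second registration of the same node
makes the node point at itself, so any walk over the chain never terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified notifier block; NOT Xen's struct notifier_block. */
struct toy_notifier {
    struct toy_notifier *next;
    int priority;
};

/*
 * Insert nb into a priority-ordered singly linked chain.
 * Deliberately performs no duplicate check, to show the hazard.
 */
void toy_register(struct toy_notifier **head, struct toy_notifier *nb)
{
    assert(head != NULL && nb != NULL);

    /* Skip entries whose priority is at least as high as nb's. */
    while (*head != NULL && nb->priority <= (*head)->priority)
        head = &(*head)->next;

    nb->next = *head;
    *head = nb;
}

/* Safer variant: refuse to insert a block that is already on the chain. */
bool toy_register_once(struct toy_notifier **head, struct toy_notifier *nb)
{
    assert(head != NULL && nb != NULL);

    for (const struct toy_notifier *it = *head; it != NULL; it = it->next)
        if (it == nb)
            return false; /* already registered, nothing to do */

    toy_register(head, nb);
    return true;
}

/* Floyd's tortoise-and-hare check: true if walking the chain never ends. */
bool toy_chain_has_cycle(const struct toy_notifier *head)
{
    const struct toy_notifier *slow = head;
    const struct toy_notifier *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return true;
    }
    return false;
}
```

In this model, calling toy_register() twice with the same block leaves the
block's next pointer aimed at itself, whereas toy_register_once() rejects the
second call and the chain stays well formed.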

Best,
Wei

> I think it is due to xen-acpi-processor re-uploading the C and P states
> whenever a CPU goes up. It also does this after S3 suspend.
> 
> Anyhow, it may be due to the fact that cpufreq_register_driver in Xen is now
> '__init'. If you remove that annotation, would it work?
> 
> >
> > Regards,
> > Dario
> >
> > PS. yes, I'll implement a cpu hotplug/unplug testcase ASAP. :-)
> >
> > --
> > <<This happens because I choose it to happen!>> (Raistlin Majere)
> > -----------------------------------------------------------------
> > Dario Faggioli, Ph.D, http://about.me/dario.faggioli Senior Software
> > Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
> >
> 
> 
> 
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@xxxxxxxxxxxxx
> > http://lists.xen.org/xen-devel



