[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [Xen-devel] [PATCH] x86/SMP: don't try to stop already stopped CPUs
>>> On 03.06.19 at 16:12, <roger.pau@xxxxxxxxxx> wrote: > On Wed, May 29, 2019 at 04:17:49AM -0600, Jan Beulich wrote: >> In particular with an enabled IOMMU (but not really limited to this >> case), trying to invoke fixup_irqs() after having already done >> disable_IO_APIC() -> clear_IO_APIC() is a rather bad idea: >> >> RIP: e008:[<ffff82d08026a036>] amd_iommu_read_ioapic_from_ire+0xde/0x113 >> RFLAGS: 0000000000010006 CONTEXT: hypervisor (d0v0) >> rax: ffff8320291de00c rbx: 0000000000000003 rcx: ffff832035000000 >> rdx: 0000000000000000 rsi: 0000000000000000 rdi: ffff82d0805ca840 >> rbp: ffff83009e8a79c8 rsp: ffff83009e8a79a8 r8: 0000000000000000 >> r9: 0000000000000004 r10: 000000000008b9f9 r11: 0000000000000006 >> r12: 0000000000010000 r13: 0000000000000003 r14: 0000000000000000 >> r15: 00000000fffeffff cr0: 0000000080050033 cr4: 00000000003406e0 >> cr3: 0000002035d59000 cr2: ffff88824ccb4ee0 >> fsb: 00007f2143f08840 gsb: ffff888256a00000 gss: 0000000000000000 >> ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008 >> Xen code around <ffff82d08026a036> > (amd_iommu_read_ioapic_from_ire+0xde/0x113): >> ff 07 00 00 39 d3 74 02 <0f> 0b 41 81 e4 00 f8 ff ff 8b 10 89 d0 25 00 00 >> Xen stack trace from rsp=ffff83009e8a79a8: >> ... >> Xen call trace: >> [<ffff82d08026a036>] amd_iommu_read_ioapic_from_ire+0xde/0x113 >> [<ffff82d08026bf7b>] iommu_read_apic_from_ire+0x10/0x12 >> [<ffff82d08027f718>] io_apic.c#modify_IO_APIC_irq+0x5e/0x126 >> [<ffff82d08027f9c5>] io_apic.c#unmask_IO_APIC_irq+0x2d/0x41 >> [<ffff82d080289bc7>] fixup_irqs+0x320/0x40b >> [<ffff82d0802a82c4>] smp_send_stop+0x4b/0xa8 >> [<ffff82d0802a7b2f>] machine_restart+0x98/0x288 >> [<ffff82d080252242>] console_suspend+0/0x28 >> [<ffff82d0802b01da>] do_general_protection+0x204/0x24e >> [<ffff82d080385a3d>] x86_64/entry.S#handle_exception_saved+0x68/0x94 >> [<00000000aa5b526b>] 00000000aa5b526b >> [<ffff82d0802a7c7d>] machine_restart+0x1e6/0x288 >> [<ffff82d080240f75>] hwdom_shutdown+0xa2/0x11d >> [<ffff82d08020baa2>] domain_shutdown+0x4f/0xd8 >> [<ffff82d08023fe98>] do_sched_op+0x12f/0x42a >> [<ffff82d08037e404>] pv_hypercall+0x1e4/0x564 >> [<ffff82d080385432>] lstar_enter+0x112/0x120 >> >> Don't call fixup_irqs() and don't send any IPI if there's only one >> online CPU anyway, and don't call __stop_this_cpu() at all when the CPU >> we're on was already marked offline (by a prior invocation of >> __stop_this_cpu()). >> >> Reported-by: Andrew Cooper <andrew.cooper3@xxxxxxxxxx> >> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx> >> >> --- a/xen/arch/x86/smp.c >> +++ b/xen/arch/x86/smp.c >> @@ -302,23 +302,31 @@ static void stop_this_cpu(void *dummy) >> */ >> void smp_send_stop(void) >> { >> - int timeout = 10; >> + unsigned int cpu = smp_processor_id(); >> >> - local_irq_disable(); >> - fixup_irqs(cpumask_of(smp_processor_id()), 0); >> - local_irq_enable(); >> - >> - smp_call_function(stop_this_cpu, NULL, 0); >> - >> - /* Wait 10ms for all other CPUs to go offline. */ >> - while ( (num_online_cpus() > 1) && (timeout-- > 0) ) >> - mdelay(1); >> - >> - local_irq_disable(); >> - disable_IO_APIC(); >> - hpet_disable(); >> - __stop_this_cpu(); >> - local_irq_enable(); >> + if ( num_online_cpus() > 1 ) >> + { >> + int timeout = 10; >> + >> + local_irq_disable(); >> + fixup_irqs(cpumask_of(cpu), 0); >> + local_irq_enable(); >> + >> + smp_call_function(stop_this_cpu, NULL, 0); >> + >> + /* Wait 10ms for all other CPUs to go offline. */ >> + while ( (num_online_cpus() > 1) && (timeout-- > 0) ) >> + mdelay(1); >> + } >> + >> + if ( cpu_online(cpu) ) > > Won't this be better placed inside the previous if? Is it valid to > have a single CPU and try to offline it? No to the first question, and I'm not sure I see how you came to the 2nd one. If there's just a single online CPU, then there's no need to fixup_irqs(), and there's no-one to IPI. Yet the local CPU still needs to do everything that should happen once in UP mode, unless this CPU has been offlined already before (as was the case in Andrew's report, at least as far as I was able to deduce). Jan _______________________________________________ Xen-devel mailing list Xen-devel@xxxxxxxxxxxxxxxxxxxx https://lists.xenproject.org/mailman/listinfo/xen-devel
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |