Re: [PATCH for-4.19 3/9] xen/cpu: ensure get_cpu_maps() returns false if CPU operations are underway
On 31.05.2024 09:31, Roger Pau Monné wrote:
> On Fri, May 31, 2024 at 09:02:20AM +0200, Jan Beulich wrote:
>> On 29.05.2024 18:14, Roger Pau Monné wrote:
>>> On Wed, May 29, 2024 at 05:49:48PM +0200, Jan Beulich wrote:
>>>> On 29.05.2024 17:03, Roger Pau Monné wrote:
>>>>> On Wed, May 29, 2024 at 03:35:04PM +0200, Jan Beulich wrote:
>>>>>> On 29.05.2024 11:01, Roger Pau Monne wrote:
>>>>>>> Due to the current rwlock logic, if the CPU calling get_cpu_maps() does so from a cpu_hotplug_{begin,done}() region the function will still return success, because a CPU taking the rwlock in read mode after having taken it in write mode is allowed. Such behavior however defeats the purpose of get_cpu_maps(), as it should always return false when called while a CPU hot{,un}plug operation is in progress.
>>>>>>
>>>>>> I'm not sure I can agree with this. The CPU doing said operation ought to be aware of what it is itself doing. And all other CPUs will get back false from get_cpu_maps().
>>>>>
>>>>> Well, the CPU is aware in the context of cpu_{up,down}(), but not in the interrupts that might be handled while that operation is in progress; see below for a concrete example.
>>>>>
>>>>>>> Otherwise the logic in send_IPI_mask() for example is wrong, as it could decide to use the shorthand even when a CPU operation is in progress.
>>>>>>
>>>>>> It's also not becoming clear what's wrong there: as long as a CPU isn't offline enough to not be in cpu_online_map anymore, it may well still need to be the target of IPIs, and targeting it with a shorthand then is still fine.
>>>>>
>>>>> The issue is in the online path: there's a window where the CPU is online (and the LAPIC active), but cpu_online_map hasn't been updated yet. A specific example would be time_calibration() being executed on the CPU that is running cpu_up(). That could result in a shorthand IPI being used, but the mask in r.cpu_calibration_map not containing the CPU that's being brought online, because it's not yet added to cpu_online_map. Then the number of CPUs actually running time_calibration_rendezvous_fn won't match the weight of the cpumask in r.cpu_calibration_map.
>>>>
>>>> I see, but maybe only partly. Prior to the CPU having its bit set in cpu_online_map, can it really take interrupts already? Shouldn't it be running with IRQs off until later, thus preventing it from making it into the rendezvous function in the first place? But yes, I can see how the IRQ (IPI) then being delivered later (once IRQs are enabled) might cause problems, too.
>>>
>>> The interrupt will get set in IRR and handled when interrupts are enabled.
>>>
>>>> Plus, with how the rendezvous function is invoked (via on_selected_cpus() with the mask copied from cpu_online_map), the first check in smp_call_function_interrupt() ought to prevent the function from being called on the CPU being onlined. A problem would arise though if the IPI arrived later and call_data was already (partly or fully) overwritten with the next request.
>>>
>>> Yeah, there's a small window where the fields in call_data are out of sync.
>>>
>>>>>> In any event this would again affect only the CPU leading the CPU operation, which should clearly know at which point(s) it is okay to send IPIs. Are we actually sending any IPIs from within CPU-online or CPU-offline paths?
>>>>>
>>>>> Yes, I've seen the time rendezvous happening while in the middle of a hotplug operation, and the CPU coordinating the rendezvous being the one doing the CPU hotplug operation, so get_cpu_maps() returning true.
>>>>
>>>> Right, yet together with ...
>>>>
>>>>>> Together with the earlier paragraph the critical window would be between the CPU being taken off of cpu_online_map and the CPU actually going "dead" (i.e. on x86: its LAPIC becoming unresponsive to other than INIT/SIPI). And even then the question would be what bad, if any, would happen to that CPU if an IPI was still targeted at it by way of using the shorthand. I'm pretty sure it runs with IRQs off at that time, so no ordinary IRQ could be delivered.
>>>>>>
>>>>>>> Adjust the logic in get_cpu_maps() to return false when the CPUs lock is already held in write mode by the current CPU, as read_trylock() would otherwise return true.
>>>>>>>
>>>>>>> Fixes: 868a01021c6f ('rwlock: allow recursive read locking when already locked in write mode')
>>>>>>
>>>>>> I'm puzzled by this as well: prior to that and the change referenced by its Fixes: tag, recursive spin locks were used. For the purposes here that's the same as permitting read locking even when the write lock is already held by the local CPU.
>>>>>
>>>>> I see, so the Fixes: tag should instead be:
>>>>>
>>>>> x86/smp: use APIC ALLBUT destination shorthand when possible
>>>>>
>>>>> which is the commit that started using get_cpu_maps() in send_IPI_mask().
>>>>
>>>> ... this I then wonder whether it's really only the condition in send_IPI_mask() which needs further amending, rather than fiddling with get_cpu_maps().
>>>
>>> That's the other option, but I have the impression it's more fragile to adjust the condition in send_IPI_mask() than to fiddle with get_cpu_maps().
>>>
>>> However, if that's the preference I can adjust.
>>
>> I guess we need input from other REST maintainers here then. The two of us clearly disagree on what use of get_cpu_maps() is meant to guarantee. And I deem fiddling with common code here more risky (and more intrusive - the other change would be a single-line code change afaict, plus extending the related comment).
>
> How do you envision that other change to be done? Adding an extra variable and toggling it in cpu_hotplug_{begin,done}() to signal whether a CPU hotplug operation is in progress?

I was thinking of an is-write-locked-by-me check on cpu_add_remove_lock.

Jan
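
For readers following the thread: the rwlock behavior at the root of the disagreement is the recursion permitted by commit 868a01021c6f, whereby a CPU already holding the lock in write mode may also take it in read mode. A minimal illustration using Xen's locking primitives (simplified; not code from the patch):

    static DEFINE_RWLOCK(lk);

    void demo(void)
    {
        write_lock(&lk);       /* as cpu_hotplug_begin() does on cpu_add_remove_lock */

        /*
         * Succeeds on this CPU despite the write lock being held, because
         * recursive read-after-write locking is allowed.  This is why the
         * read_trylock() in get_cpu_maps() also succeeds on the CPU that is
         * itself driving a hotplug operation.
         */
        if ( read_trylock(&lk) )
            read_unlock(&lk);

        write_unlock(&lk);     /* as cpu_hotplug_done() does */
    }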
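A minimal sketch of the adjustment the patch proposes for get_cpu_maps(), assuming a hypothetical rw_is_write_locked_by_me() helper that reports whether the current CPU holds the given rwlock in write mode (the helper's name and implementation are not part of this thread):

    bool get_cpu_maps(void)
    {
        /*
         * read_trylock() alone is not enough: it succeeds recursively when
         * this CPU already holds cpu_add_remove_lock in write mode, i.e.
         * when this very CPU is inside a cpu_hotplug_{begin,done}() region.
         * Report a CPU operation as underway (false) in that case instead.
         */
        return !rw_is_write_locked_by_me(&cpu_add_remove_lock) &&
               read_trylock(&cpu_add_remove_lock);
    }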
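And a sketch of the alternative Jan suggests: leave get_cpu_maps() untouched and instead extend the shorthand condition in send_IPI_mask() with the is-write-locked-by-me check on cpu_add_remove_lock. The shorthand-availability and mask checks of the real function are reduced to a placeholder here, and the helper is again hypothetical:

    /* Fragment from inside send_IPI_mask(), heavily simplified: */
    if ( shorthand_ok &&       /* placeholder for the existing shorthand checks */
         !rw_is_write_locked_by_me(&cpu_add_remove_lock) &&
         get_cpu_maps() )      /* no hot{,un}plug underway on any other CPU */
    {
        /* All-but-self shorthand is safe: IPI every other CPU at once. */
        send_IPI_shortcut(APIC_DEST_ALLBUT, vector, APIC_DEST_PHYSICAL);
        put_cpu_maps();
    }
    else
        /* Fall back to targeting each CPU in 'mask' individually. */
        alternative_vcall(genapic.send_IPI_mask, mask, vector);

Note the ordering: the write-lock check short-circuits before get_cpu_maps(), so the read lock is only ever taken (and put_cpu_maps() only needed) when the shorthand path is actually viable.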