Re: [PATCH v2 6/7] x86/irq: handle moving interrupts in _assign_irq_vector()
On 12.06.2024 17:36, Roger Pau Monné wrote:
> On Wed, Jun 12, 2024 at 03:42:58PM +0200, Jan Beulich wrote:
>> On 12.06.2024 12:39, Roger Pau Monné wrote:
>>> On Tue, Jun 11, 2024 at 03:18:32PM +0200, Jan Beulich wrote:
>>>> On 10.06.2024 16:20, Roger Pau Monne wrote:
>>>>> Currently there's logic in fixup_irqs() that attempts to prevent
>>>>> _assign_irq_vector() from failing, as fixup_irqs() is required to
>>>>> evacuate all interrupts from the CPUs not present in the input mask.
>>>>> The current logic in fixup_irqs() is incomplete, as it doesn't deal
>>>>> with interrupts that have move_cleanup_count > 0 and a non-empty
>>>>> ->arch.old_cpu_mask field.
>>>>>
>>>>> Instead of attempting to fix up the interrupt descriptor in
>>>>> fixup_irqs() so that _assign_irq_vector() cannot fail, introduce
>>>>> logic in _assign_irq_vector() to deal with interrupts that have
>>>>> either of move_{in_progress,cleanup_count} set and no remaining
>>>>> online CPUs in ->arch.cpu_mask.
>>>>>
>>>>> If _assign_irq_vector() is requested to move an interrupt in the
>>>>> state described above, first attempt to see if ->arch.old_cpu_mask
>>>>> contains any valid CPUs that could be used as a fallback, and if
>>>>> that's the case do move the interrupt back to the previous
>>>>> destination.  Note this is easier because the vector hasn't been
>>>>> released yet, so there's no need to allocate and set up a new
>>>>> vector on the destination.
>>>>>
>>>>> Due to the logic in fixup_irqs() that clears offline CPUs from
>>>>> ->arch.old_cpu_mask (and releases the old vector if the mask
>>>>> becomes empty) it shouldn't be possible to get into
>>>>> _assign_irq_vector() with ->arch.move_{in_progress,cleanup_count}
>>>>> set but no online CPUs in ->arch.old_cpu_mask.
>>>>>
>>>>> However if ->arch.move_{in_progress,cleanup_count} is set and the
>>>>> interrupt has also changed affinity, it's possible the members of
>>>>> ->arch.old_cpu_mask are no longer part of the affinity set,
>>>>
>>>> I'm having trouble relating this (->arch.old_cpu_mask related) to ...
>>>>
>>>>> move the interrupt to a different CPU part of the provided mask
>>>>
>>>> ... this (->arch.cpu_mask related).
>>>
>>> No, the "provided mask" here is the "mask" parameter, not
>>> ->arch.cpu_mask.
>>
>> Oh, so this describes the case of "hitting" the comment at the very
>> bottom of the first hunk then?  (I probably was misreading this
>> because I was expecting it to describe a code change, rather than the
>> case where the original behavior needs retaining.  IOW - all fine
>> here then.)
>>
>>>>> and keep the current ->arch.old_{cpu_mask,vector} for the pending
>>>>> interrupt movement to be completed.
>>>>
>>>> Right, that's to clean up state from before the initial move.  What
>>>> isn't clear to me is what's to happen with the state of the
>>>> intermediate placement.  Description and code changes leave me with
>>>> the impression that it's okay to simply abandon it, without any
>>>> cleanup, yet I can't quite figure out why that would be an okay
>>>> thing to do.
>>>
>>> There isn't much we can do with the intermediate placement, as the
>>> CPU is going offline.  However we can drain any pending interrupts
>>> from the IRR after the new destination has been set, since setting
>>> the destination is done from the CPU that's the current target of
>>> the interrupts.  So we can ensure the draining is done strictly
>>> after the target has been switched, hence ensuring no further
>>> interrupts from this source will be delivered to the current CPU.
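(For illustration only: a minimal sketch of the IRR-draining idea Roger
describes, assuming Xen's apic_read()/send_IPI_mask() helpers; the
function name drain_pending_irq and its exact call site are hypothetical
and not part of the patch under review.)

    /*
     * Hypothetical sketch: runs on the CPU being offlined, strictly after
     * desc->arch.{vector,cpu_mask} have been switched to the new
     * destination, so no further interrupts can arrive on this CPU.
     */
    static void drain_pending_irq(struct irq_desc *desc,
                                  unsigned int old_vector)
    {
        /* Each 32-bit APIC_IRR register covers 32 vectors, 0x10 apart. */
        uint32_t irr = apic_read(APIC_IRR + (old_vector / 32) * 0x10);

        if ( irr & (1U << (old_vector % 32)) )
            /*
             * An interrupt is still pending on this (outgoing) CPU:
             * re-raise it on the new destination by sending an IPI with
             * the new vector.
             */
            send_IPI_mask(desc->arch.cpu_mask, desc->arch.vector);
    }

Because the IRR check happens only after the destination switch, any
interrupt that raced with the switch is either already recorded in this
CPU's IRR (and gets forwarded) or was delivered to the new destination.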
>> Hmm, I'm afraid I still don't follow: I'm specifically in trouble
>> with the ...
>>
>>>>> --- a/xen/arch/x86/irq.c
>>>>> +++ b/xen/arch/x86/irq.c
>>>>> @@ -544,7 +544,53 @@ static int _assign_irq_vector(struct irq_desc *desc, const cpumask_t *mask)
>>>>>      }
>>>>>  
>>>>>      if ( desc->arch.move_in_progress || desc->arch.move_cleanup_count )
>>>>> -        return -EAGAIN;
>>>>> +    {
>>>>> +        /*
>>>>> +         * If the current destination is online refuse to shuffle.
>>>>> +         * Retry after the in-progress movement has finished.
>>>>> +         */
>>>>> +        if ( cpumask_intersects(desc->arch.cpu_mask, &cpu_online_map) )
>>>>> +            return -EAGAIN;
>>>>> +
>>>>> +        /*
>>>>> +         * Due to the logic in fixup_irqs() that clears offlined CPUs from
>>>>> +         * ->arch.old_cpu_mask it shouldn't be possible to get here with
>>>>> +         * ->arch.move_{in_progress,cleanup_count} set and no online CPUs
>>>>> +         * in ->arch.old_cpu_mask.
>>>>> +         */
>>>>> +        ASSERT(valid_irq_vector(desc->arch.old_vector));
>>>>> +        ASSERT(cpumask_intersects(desc->arch.old_cpu_mask, &cpu_online_map));
>>>>> +
>>>>> +        if ( cpumask_intersects(desc->arch.old_cpu_mask, mask) )
>>>>> +        {
>>>>> +            /*
>>>>> +             * Fallback to the old destination if moving is in progress
>>>>> +             * and the current destination is to be offlined.  This is
>>>>> +             * only possible if the CPUs in old_cpu_mask intersect with
>>>>> +             * the affinity mask passed in the 'mask' parameter.
>>>>> +             */
>>>>> +            desc->arch.vector = desc->arch.old_vector;
>>>>> +            cpumask_and(desc->arch.cpu_mask, desc->arch.old_cpu_mask, mask);
>>
>> ... replacing of the vector (and associated mask), without any further
>> accounting.
>
> It's quite likely I'm missing something here, but what further
> accounting would you like to do?
>
> The current target of the interrupt (->arch.cpu_mask prior to the
> cpumask_and()) is all going offline, so any attempt to record it in
> ->arch.old_cpu_mask would just result in a stale (offline) CPU getting
> set in ->arch.old_cpu_mask, which previous patches attempted to solve.
>
> Maybe by "further accounting" you meant something else not related to
> ->arch.old_{cpu_mask,vector}?

Indeed.  What I'm thinking of is what release_old_vec() would normally
do (of which only the desc->arch.used_vectors updating would appear to
be relevant, seeing the CPU's going offline).  The other one I was
thinking of, updating vector_irq[], is likely also unnecessary, again
because that's per-CPU data of a CPU going down.

Jan
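(For context, a paraphrased sketch of the release_old_vec() accounting
Jan refers to, reconstructed from memory of xen/arch/x86/irq.c rather
than quoted verbatim; consult the tree for the authoritative version.)

    /* Paraphrased sketch of release_old_vec(); not a verbatim quote. */
    static void release_old_vec(struct irq_desc *desc)
    {
        unsigned int vector = desc->arch.old_vector;

        /* Invalidate the old vector/mask pair ... */
        desc->arch.old_vector = IRQ_VECTOR_UNASSIGNED;
        cpumask_clear(desc->arch.old_cpu_mask);

        /* ... and release the vector from the IRQ's used-vectors bitmap. */
        if ( !valid_irq_vector(vector) )
            ASSERT_UNREACHABLE();
        else if ( desc->arch.used_vectors )
        {
            ASSERT(test_bit(vector, desc->arch.used_vectors));
            clear_bit(vector, desc->arch.used_vectors);
        }
    }

In the fallback path under discussion, the used_vectors update is the
piece Jan notes would still matter, while the vector_irq[] update only
concerns the per-CPU table of the CPU going down.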