
Re: [Xen-devel] [Patch v5 2/5] x86/hpet: Use single apic vector rather than irq_descs for HPET interrupts



On 27/11/2013 08:35, Jan Beulich wrote:
>>>> On 26.11.13 at 19:32, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>> On 25/11/13 07:50, Jan Beulich wrote:
>>>>>> On 22.11.13 at 17:23, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>>>> On 22/11/13 15:45, Jan Beulich wrote:
>>>>>>>> On 14.11.13 at 17:01, Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
>>>>>> The new logic is as follows:
>>>>>>  * A single high priority vector is allocated and used on all CPUs.
>>>>> Does this really need to be a high priority one? I'd think we'd be
>>>>> fine with the lowest priority one we can get, as we only need the
>>>>> wakeup here if nothing else gets a CPU to wake up.
>>>> Yes - absolutely.  We cannot have an HPET interrupt lower priority than
>>>> a guest line level interrupt.
>>>>
>>>> Another CPU could be registered with our HPET channel to be woken up,
>>>> and we need to service it in a timely fashion.
>>> Which I meanwhile think hints at an issue with the (re)design:
>>> These wakeups, from an abstract pov, shouldn't be high
>>> priority interrupts - they're meant to wake a CPU only when
>>> nothing else would wake them in time. And this could be
>>> accomplished by transferring ownership of the channel during
>>> wakeup from the waking CPU to the next one to wake.
>>>
>>> Which at once would eliminate the bogus logic selecting a channel
>>> for a CPU to re-use when no free one is available: It then wouldn't
>>> really matter which one gets re-used (i.e. could be assigned in e.g.
>>> a round robin fashion).
>>>
>>> The fundamental requirement would be to run the wakeup (in
>>> particular channel re-assignment) logic not just from the HPET
>>> interrupt, but inside an exit_idle() construct called from all IRQ
>>> paths (similar to how Linux does this).
>>>
>>> Jan
>>>
>> Irrespective of the problem of ownership, the HPET interrupt still needs
>> to be high priority.  Consider the following scenario:
>>
>> * A line level interrupt is received on pcpu 0.  It is left outstanding
>> at the LAPIC.
>> * A domain is scheduled on pcpu 0, and has an event injected for the
>> line level interrupt.
>> * The event handler takes a long time, and during the process, the
>> domain's vcpu is rescheduled elsewhere.
>> * pcpu0 is now completely idle and goes to sleep.
>>
>> This scenario has pcpu 0 going to sleep with an outstanding line level
>> irq unacked at the LAPIC, with a low priority HPET interrupt blocked
>> until the domain has signalled the completion of the event.
>>
>> There is no safe scenario (given Xen's handling of line level interrupts)
>> for timer interrupts to be lower priority than the highest possible line
>> level priority.
> That's true for the "new ack" model, but not the "old" one (which is
> in particular also being used when using directed EOI). Which may
> in turn be an argument to consider whether the vector selection
> (low or high priority) shouldn't depend on the IO-APIC ack model.
>
> Jan
>

How would you go about allocating vectors then?  All IRQs are allocated
randomly between vector 0x21 and 0xdf.  You can certainly prefer lower
vectors for line level interrupts, but the only way to guarantee that
the HPET vector is higher priority than any line level vector is to
have it above 0xdf.
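
For reference, the priority ordering follows from how the LAPIC derives
priority from the vector number: the upper nibble of the vector is its
priority class, and a pending interrupt is only delivered while its class
is strictly greater than the highest in-service class.  A minimal sketch
of that arithmetic (illustrative only, not Xen code; the function and
macro names are made up for the example):

    #include <stdbool.h>
    #include <stdint.h>

    /* On x86, the LAPIC priority class of a vector is its upper nibble. */
    #define VECTOR_PRIO_CLASS(vec)   ((uint8_t)(vec) >> 4)

    /*
     * Would 'pending' be delivered while 'in_service' is still unacked?
     * Delivery requires a strictly greater priority class.
     */
    static bool vector_can_preempt(uint8_t pending, uint8_t in_service)
    {
        return VECTOR_PRIO_CLASS(pending) > VECTOR_PRIO_CLASS(in_service);
    }

    /*
     * A line level vector left in-service anywhere in the dynamic range
     * 0x21-0xdf blocks any vector of the same or lower class, so only a
     * vector of 0xe0 or above is guaranteed to get through:
     *
     *   vector_can_preempt(0xd0, 0xdf) == false
     *   vector_can_preempt(0xe8, 0xdf) == true
     */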

I do think that allocating line level vectors lower is a good idea,
given how long they remain outstanding.  I also think that a useful
performance tweak would be for device driver domains to be able to
request a preferentially higher vector for their interrupts.

From a pragmatic point of view, there are plenty of spare high priority
vectors for use, whereas we at XenServer already have use cases where we
are running out of free vectors in the range 0x21 -> 0xdf due to the
sheer quantity of SR-IOV devices.  I already have half a mind to see
whether I can optimise the current allocation of vectors to make the
dynamic range larger.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
