[Xen-devel] Xen HPET improvement proposal
Hello,

Having now had enough time to mentally unravel the current HPET code, I present a proposal here for new logic to replace the current, buggy handling. The bugs can be observed using a debug build of Xen, where it is possible to overflow the stack because of incorrectly pointed interrupts and errors in nesting them.

In a system using HPETs in combination with idle, there are N HPETs and M cpus. When a cpu wishes to idle, it must set up an external interrupt to wake it back up in time for its next deadline. If there are free HPETs, this is easy: grab a free one and program it to interrupt you at some point in the future. If, however, all the HPETs are already allocated, one must be shared.

The root problems with the current situation are twofold.

Xen runs the action handler for all IRQs with interrupts enabled and with the interrupt already acknowledged at the LAPIC. This allows arbitrary stacking of interrupts, including lower priority interrupts. (This is a serious problem which shall be fixed, but it is not part of this proposal.)

Currently, HPET interrupts use the regular IRQ machinery, including irq migration, which results in HPET interrupts being delivered to the wrong cpus. It also means that they can be delayed for arbitrary lengths of time behind an active line level interrupt being delivered to a guest.

Furthermore, there appears to be extra, overly complicated fixup code in the interrupt handler itself, apparently working around a lack of understanding of why the interrupts are arriving at the wrong cpus or at the wrong time.

Independently of the HPET issues themselves, I have identified a race condition in the mwait-idle routines. A cpu which is preparing to sleep can arrange for another cpu to wake it up, and that other cpu can issue the wakeup before the sleeping cpu has enabled its mwait trigger. The wakeup is then lost, and the cpu will idle in mwait for an arbitrary length of time.
Realistically, the cpu will be woken up by the time calibration rendezvous once a second, and possibly by the watchdog NMI every half second.

For the new mechanism, I propose that HPET interrupts get a direct_apic_vector and completely bypass the IRQ mechanism. This gives the HPET interrupts guaranteed higher priority than all guest interrupts.

When a cpu wishes to idle, it tries to find an HPET.

If there is a free HPET, the cpu becomes the owner of the HPET. It sets the HPET up to interrupt itself at some point in the future and goes to sleep.

If there is not a free HPET, the cpu will need to share with another cpu. If this cpu can find another HPET which will fire at an appropriate time, it can merely ask to be woken up by the HPET owner when the owner wakes up.

If all the HPETs are programmed to fire too far into the future, one needs to be shortened. The cpu should choose the soonest HPET, add itself to the owner's list of other pcpus to wake, and reprogram the HPET to fire sooner. It should not reprogram the HPET to point to itself.

The final requirement makes it far, far easier to validate the correctness of the fix, and in particular that interrupts are arriving at the expected cpu. Given a validated solution proved to work, it might be possible to relax the requirement, so long as a reasonable solution to waking up the original owner is found (and I can't offhand think of a neat way of doing this, as ownership could move around arbitrarily).

I would appreciate thoughts and comments. This will end up being a substantial rewrite of most of hpet.c, but I believe the result will be shorter, simpler and far more reliable.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
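
For illustration only, the selection steps proposed above (take a free HPET, else share one that fires at an appropriate time, else shorten the soonest one without moving its interrupt) could be sketched roughly as follows. This is not code from hpet.c. Every name here (hpet_channel, hpet_sketch_idle, hpet_sketch_fire, the slack parameter, NR_HPETS, NR_CPUS) is hypothetical. The sketch assumes that "appropriate" means firing no later than the cpu's deadline and no more than slack ticks before it, and it ignores locking and real hardware programming entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_HPETS 2          /* hypothetical: N HPET channels */
#define NR_CPUS  8          /* hypothetical: M cpus */
#define NO_OWNER (-1)

struct hpet_channel {
    int      owner;          /* cpu the interrupt is pointed at, or NO_OWNER */
    uint64_t deadline;       /* absolute time (abstract ticks) at which it fires */
    bool     wake[NR_CPUS];  /* other cpus the owner must wake when it fires */
};

static struct hpet_channel channels[NR_HPETS];

void hpet_sketch_init(void)
{
    memset(channels, 0, sizeof(channels));
    for (int i = 0; i < NR_HPETS; i++)
        channels[i].owner = NO_OWNER;
}

/*
 * Called by a cpu that is about to idle.  Returns the index of the channel
 * that will (directly or via its owner) wake this cpu.  No locking: a real
 * implementation would need to serialise access to the channels.
 */
int hpet_sketch_idle(int cpu, uint64_t deadline, uint64_t slack)
{
    int soonest = 0;

    assert(cpu >= 0 && cpu < NR_CPUS);

    /* Case 1: a free channel exists.  Own it and point it at ourselves. */
    for (int i = 0; i < NR_HPETS; i++) {
        if (channels[i].owner == NO_OWNER) {
            channels[i].owner = cpu;
            channels[i].deadline = deadline;
            return i;
        }
    }

    /* Case 2: share a channel that already fires at an appropriate time. */
    for (int i = 0; i < NR_HPETS; i++) {
        struct hpet_channel *ch = &channels[i];

        if (ch->deadline <= deadline && deadline - ch->deadline <= slack) {
            ch->wake[cpu] = true;
            return i;
        }
        if (ch->deadline < channels[soonest].deadline)
            soonest = i;
    }

    /*
     * Case 3: nothing suitable.  Take the soonest channel, make it fire
     * sooner if it would be too late, and join the owner's wake list.
     * The owner is deliberately left unchanged.
     */
    if (channels[soonest].deadline > deadline)
        channels[soonest].deadline = deadline;
    channels[soonest].wake[cpu] = true;
    return soonest;
}

/* Runs on the owner when the channel fires: record everyone to be woken. */
void hpet_sketch_fire(int idx, bool woken[NR_CPUS])
{
    struct hpet_channel *ch = &channels[idx];

    assert(ch->owner != NO_OWNER);
    woken[ch->owner] = true;
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (ch->wake[cpu]) {
            woken[cpu] = true;
            ch->wake[cpu] = false;
        }
    }
    ch->owner = NO_OWNER;   /* channel becomes free again */
}
```

In this sketch a cpu that joins a channel firing before its own deadline is simply woken early and is expected to try to idle again. The proposal does not say how that case should be handled, so treat it as a simplification.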