[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH 2/2 V2] iommu/amd: Workaround for erratum 787



>>> On 10.06.13 at 17:11, Suravee Suthikulanit <suravee.suthikulpanit@xxxxxxx>
wrote:
> On 6/10/2013 8:59 AM, Jan Beulich wrote:
>>>>> On 10.06.13 at 15:53, Suravee Suthikulanit <suravee.suthikulpanit@xxxxxxx>
>> wrote:
>>> On 6/10/2013 5:53 AM, Jan Beulich wrote:
>>>>>>> On 10.06.13 at 12:40, Tim Deegan <tim@xxxxxxx> wrote:
>>>>> At 10:47 +0100 on 10 Jun (1370861263), Jan Beulich wrote:
>>>>>>>>> On 10.06.13 at 11:35, Tim Deegan <tim@xxxxxxx> wrote:
>>>>>>> At 00:05 -0500 on 10 Jun (1370822751), suravee.suthikulpanit@xxxxxxx 
>>>>>>> wrote:
>>>>>>>> From: Suravee Suthikulpanit <suravee.suthikulpanit@xxxxxxx>
>>>>>>>>
>>>>>>>> The IOMMU interrupt handling in bottom half must clear the PPR log
>>>>> interrupt
>>>>>>>> and event log interrupt bits to re-enable the interrupt. This is done 
>>>>>>>> by
>>>>>>>> writing 1 to the memory mapped register to clear the bit. Due to 
>>>>>>>> hardware
>>>>>>> bug,
>>>>>>>> if the driver tries to clear this bit while the IOMMU hardware also 
>>>>>>>> setting
>>>>>>>> this bit, the conflict will result with the bit being set. If the 
>>>>>>>> interrupt
>>>>>>>> handling code does not make sure to clear this bit, subsequent changes 
>>>>>>>> in
>>>>>>> the
>>>>>>>> event/PPR logs will no longer generating interrupts, and would result 
>>>>>>>> if
>>>>>>>> buffer overflow. After clearing the bits, the driver must read back
>>>>>>>> the register to verify.
>>>>>>> Is there a risk of livelock here?  That is, if some device is causing a
>>>>>>> lot of IOMMU faults, a CPU could get stuck in this loop re-enabling
>>>>>>> interrupts as fast as they are raised.
>>>>>>>
>>>>>>> The solution suggested in the erratum seems better: if the bit is set
>>>>>>> after clearing, process the interrupts again (i.e. run/schedule the
>>>>>>> top-half handler).  That way the bottom-half handler will definitely
>>>>>>> terminate and the system can make some progress.
>>>>>> That's what's being done really: The actual interrupt handler disables
>>>>>> the interrupt sources, and the tasklet re-enables them (or at least is
>>>>>> supposed to do so - patch 1 isn't really correct in the respect).
>>>>> Oh I see, the idea is that we use the control register to mask
>>>>> interrupts instead of relying on the status register?  That seems
>>>>> better.  But doesn't this IOMMU already mask further interrupts when the
>>>>> pending bit in the status register is set?  I can't see any wording
>>>>> about that in the IOMMU doc but the erratum implies it.  Suravee, do you
>>>>> know if this is the case?
>>>> Actually, the documentation has a subtle but perhaps important
>>>> difference int the wording here: For EventLogInt and ComWaitInt
>>>> is read "An interrupt is generated when <status bit> = 1b and MMIO
>>>> Offset 0018h[<control bit>] = 1b", where as for PPRInt and GAInt
>>>> it says "An interrupt is generated when <status bit> changes from 0b
>>>> to 1b and MMIO Offset 0018h[<control bit>] = 1b".
>>>>
>>>> So I'd like to be one the safe side and assume further interrupts can
>>>> be generated in all cases - see also the emulation behavior in
>>>> iommu_guest.c which - afaict - always raises an interrupt, not just on
>>>> a 0 -> 1 transition.
>>> The behavior should be that the interrupt will be generated if the "xxxInt"
>>> bit is 0.  Once generated, it will be set to "1" by the hardware.  If this
>>> bit is 1, events will be added to the log but interrupt will not be
>>> generated.
>> "Should be" isn't enough here, even more so given the mentioned
>> wording differences in your documentation. We need to know how
>> actual (past, current, and future) hardware behaves.
>>
> I am sure this is what the behavior of the hardware.  Besides, only the 
> hardware can set this bit.
> I have also tested by not clearing the bit, and basically I did not see 
> additional interrupts.

Which would allow us to simplify patch 1 quite a bit - there's then
no need to clear the two interrupt-enable bits in the interrupt
handler, and iommu_check_*_log() then also wouldn't need to set
them again. We would just need to find the right point in time
when to clear the corresponding status flag. And - as written in a
response to Tim already - probably some parts of what we
discussed for patch 2 really would need to move to patch 1.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.