
Re: [Xen-devel] Interrupt injection with ISR set on Intel hardware


  • To: Chao Gao <chao.gao@xxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Mon, 22 Oct 2018 08:57:33 +0100
  • Cc: Kevin Tian <kevin.tian@xxxxxxxxx>, Wei Liu <wei.liu2@xxxxxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, Jan Beulich <JBeulich@xxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Mon, 22 Oct 2018 07:57:55 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Openpgp: preference=signencrypt

On 22/10/2018 08:33, Chao Gao wrote:
> On Mon, Oct 15, 2018 at 01:06:12PM +0100, Andrew Cooper wrote:
>> On 15/10/18 11:30, Roger Pau Monné wrote:
>>> Hello,
>>>
>>> Wei recently discovered an issue when running a Linux PVH Dom0 on a
>>> box with an Intel Family 6 (0x6), Model 158 (0x9e), Stepping 9 (raw
>>> 000906e9) CPU. We are not sure whether the issue is limited to a PVH
>>> Dom0, or whether it just happens to be easier to trigger in this
>>> scenario.
>> This issue has been seen very occasionally for years.  My debugging
>> patch dates back to 2013, and it has been observed on Haswell systems as
>> well.  There have also been a handful of reports on xen-devel over the
>> years.
>>
>> Wei is the first person to get a reliable enough repro to debug.  It is
>> not exclusive to PVH Dom0, but that appears to be the easiest way to
>> tickle the problem.
>>
>>> The issue is caused by what seems to be an interrupt injection while
>>> Xen is still servicing a previous interrupt (ie: the interrupt hasn't
>>> been EOI'ed and ISR for the vector is set) with the same or lower
>>> priority than the interrupt currently being serviced. This injection
>>> always happens when returning from idle from state ACPI_STATE_C3 or
>>> lower.
>> As a bit of background, for some guest irqs, we need to inject the
>> interrupt into the guest and wait for an explicit ack.
>>
>> If the irq source doesn't have a mask bit which Xen can use, the only
>> option we have to avoid repeated interruption is to leave the irq in
>> service at the LAPIC.  The purpose of the Pending EOI stack is to manage
>> these as acks arrive back from guest context.
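
The bookkeeping described above can be modelled in a few lines. This is only a toy sketch, not Xen's implementation: the class name, the method names and the "push must outrank the top of the stack" check are assumptions chosen to mirror the PUSH/READY/POP records in the trace below.

```python
class PendingEoiStack:
    """Toy model of a per-CPU pending-EOI stack (illustrative, not Xen's code)."""

    def __init__(self):
        self.entries = []   # oldest first; each entry is a dict
        self.eoi_log = []   # vectors for which an EOI would be written

    def push(self, irq, vector):
        # While a vector is held in service, nothing of equal or lower
        # priority should be delivered, so each push must outrank the top.
        if self.entries and self.entries[-1]["vector"] >= vector:
            raise AssertionError(
                f"vector {vector:#x} delivered while "
                f"{self.entries[-1]['vector']:#x} is still in service")
        self.entries.append({"irq": irq, "vector": vector, "ready": False})

    def mark_ready(self, irq):
        # The guest has acknowledged this irq; its EOI may now be issued.
        for entry in self.entries:
            if entry["irq"] == irq:
                entry["ready"] = True

    def flush_ready(self):
        # EOIs retire the highest-priority in-service vector, so only pop
        # from the top, and stop at the first entry still awaiting an ack.
        while self.entries and self.entries[-1]["ready"]:
            self.eoi_log.append(self.entries.pop()["vector"])
```

In this model, pushing vector 0x21 a second time before it has been marked ready and flushed raises the assertion, which is the situation the crash report describes.
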
>>
>> For reasons which aren't clear, guest-bound MSI vectors which don't have
>> a mask bit also use this PEOI stack mechanism.  I think this is probably
>> a Xen bug, but it is also relevant to the issue.
>>
>> In Wei's case, the interrupt in question is an MSI non-maskable
>> interrupt from the USB controller.
>>
>>> Note that I haven't been able to reproduce this issue when using
>>> mwait-idle=0 or max_cstate=2 on the Xen command line, but again
>>> without knowing the underlying issue it's impossible to tell whether
>>> it's relevant.
>>>
>>> Andrew provided a debug patch which I've expanded to also log power
>>> state transition, and is attached to this email.
>>>
>>> Here is a trace of a crash, together with the debug info.
>>>
>>> (XEN) *** Pending EOI error ***
>>> (XEN)   cpu #1, irq 30, vector 0x21, sp 1
>>> (XEN) Peoi stack: sp 1
>>> (XEN)   [ 0] irq  30, vec 0x21, ready 0, ISR 1, TMR 0, IRR 0
>>> (XEN) Peoi stack trace records:
>>> (XEN)   [22619] POP      {sp  1, irq  30, vec 0x21}
>>> (XEN)   [22620] POWER    TYPE 4
>>> (XEN)   [22621] IDLE     PPR 0x00000010
>>> (XEN)                    IRR 
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22622] WAKE     PPR 0x00000010
>>> (XEN)                    IRR 
>>> 0000000000000000000000000000000000000000000000000000000000000004
>>> (XEN)                    ISR 
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22623] ACK_PRE  PPR 0x000000f0
>>> (XEN)                    IRR 
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000000000000000000000000000000000000000000000000000000000004
>>> (XEN)   [22624] ACK_POST PPR 0x00000010
>>> (XEN)                    IRR 
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22625] POWER    TYPE 5
>>> (XEN)   [22626] IDLE     PPR 0x00000010
>>> (XEN)                    IRR 
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22627] WAKE     PPR 0x00000010
>>> (XEN)                    IRR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22628] PUSH     {sp  0, irq  30, vec 0x21}
>>> (XEN)   [22629] POWER    TYPE 5
>>> (XEN)   [22630] IDLE     PPR 0x00000020
>>> (XEN)                    IRR 
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22631] WAKE     PPR 0x00000020
>>> (XEN)                    IRR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22632] POWER    TYPE 5
>>> (XEN)   [22633] IDLE     PPR 0x00000020
>>> (XEN)                    IRR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22634] WAKE     PPR 0x00000020
>>> (XEN)                    IRR 
>>> 0000000002000000000000000000000000000000000000000000000000000004
>>> (XEN)                    ISR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22635] ACK_PRE  PPR 0x000000f0
>>> (XEN)                    IRR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000002000000000000000000000000000000000000000000000000000004
>>> (XEN)   [22636] ACK_POST PPR 0x00000020
>>> (XEN)                    IRR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22637] READY    {sp  1, irq  30, vec 0x21}
>>> (XEN)   [22638] ACK_PRE  PPR 0x00000020
>>> (XEN)                    IRR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22639] ACK_POST PPR 0x00000010
>>> (XEN)                    IRR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22640] POP      {sp  1, irq  30, vec 0x21}
>>> (XEN)   [22641] PUSH     {sp  0, irq  30, vec 0x21}
>>> (XEN)   [22642] POWER    TYPE 4
>>> (XEN)   [22643] IDLE     PPR 0x00000020
>>> (XEN)                    IRR 
>>> 0000000000000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22644] WAKE     PPR 0x00000020
>>> (XEN)                    IRR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22645] POWER    TYPE 3
>>> (XEN)   [22646] IDLE     PPR 0x00000020
>>> (XEN)                    IRR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22647] WAKE     PPR 0x00000020
>>> (XEN)                    IRR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22648] POWER    TYPE 3
>>> (XEN)   [22649] IDLE     PPR 0x00000020
>>> (XEN)                    IRR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)   [22650] WAKE     PPR 0x00000020
>>> (XEN)                    IRR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>>> (XEN)                    ISR 
>>> 0000000002000000000000000000000000000000000000000000000000000000
>> What has happened here is that, despite vector 0x21 being in service
>> (starting at the PUSH), we see it injected a second time.  The ASSERT()
>> fires because we find this vector still on the pending EOI stack.
>>
>> After that, we go idle a few times, but still haven't acked the
>> vector (i.e. whatever we're waiting for the guest to acknowledge hasn't
>> happened yet, and Xen has nothing else to do on this CPU).
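
For anyone reading the trace by hand, the IRR/ISR dumps can be decoded with a short helper. This is a sketch under one assumption about the debug patch's output format: that the 64 hex digits list the register bytes in ascending order, so the first two digits cover vectors 0-7. That reading is consistent with the PPR values logged alongside each dump. The function names are made up for illustration, and the PPR rule is the one given in the SDM.

```python
def set_vectors(dump):
    """Return the vectors whose bits are set in a 256-bit IRR/ISR hex dump.

    Assumes the dump lists the register bytes in ascending order, so the
    first two hex digits cover vectors 0-7 and the last two cover 248-255.
    """
    raw = bytes.fromhex(dump)
    return [byte_idx * 8 + bit
            for byte_idx, byte in enumerate(raw)
            for bit in range(8)
            if byte & (1 << bit)]


def expected_ppr(tpr, isr_vectors):
    """PPR per the SDM: the higher of TPR and the in-service priority class."""
    isrv = max(isr_vectors, default=0)
    if (tpr & 0xF0) >= (isrv & 0xF0):
        return tpr
    return isrv & 0xF0
```

With that byte order, the dump beginning `0000000002` decodes to vector 0x21 and the dump ending `04` decodes to vector 0xfa. Taking TPR as 0x10 (inferred from the PPR logged while ISR is empty), `expected_ppr` gives 0x20 with 0x21 in service and 0xf0 with 0xfa in service, matching the logged PPR values.
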
>>
>> From the debugging, we see that PPR/IRR/ISR appear to retain their state
>> across the mwait, and there is nothing in the manual which I can see
>> discussing the interaction of LAPIC state and C states.
>>
>> However, from the behaviour seen here, we occasionally get woken from
>> mwait by an interrupt which is already pending.  I can only conclude that
>> there is some issue with priority calculations for edge triggered
>> interrupts when idle, which allows another one to slip in.  The fact
> Hi, Roger, Andrew and Wei,
>
> Jan's patch
> (https://lists.xen.org/archives/html/xen-devel/2018-10/msg01031.html)
> fixes an issue in the handling of SVI. Currently, when dealing with an EOI
> from the guest, SVI is simply cleared. The correct behaviour is to clear the
> corresponding bit in VISR and then set SVI to the highest index of any bit
> still set in VISR (please refer to SDM 29.1.4). If SVI is set to a value
> lower than the vector of the highest priority interrupt that is in service,
> the PPR virtualization (29.1.3) might set the VPPR to a lower value on
> VMEntry too. Thus an interrupt with the same or lower priority, which should
> be blocked by VPPR, slips in.
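
The SVI and VPPR rules referred to above can be written out as follows. This is a sketch of the SDM pseudocode only, not of Jan's patch or of Xen's vLAPIC code; VISR is modelled as a 256-bit Python integer and the function names are invented for the example.

```python
def virtual_eoi(visr, svi):
    """EOI virtualization: clear VISR[SVI], then recompute SVI.

    Returns (new_visr, new_svi). SVI becomes the index of the highest
    bit still set in VISR, or 0 when nothing remains in service.
    """
    visr &= ~(1 << svi)
    new_svi = visr.bit_length() - 1 if visr else 0
    return visr, new_svi


def virtual_ppr(vtpr, svi):
    """PPR virtualization: the higher of VTPR and the SVI priority class."""
    if (vtpr & 0xF0) >= (svi & 0xF0):
        return vtpr & 0xFF
    return svi & 0xF0
```

If vectors 0x21 and 0x31 are both in service and the guest EOIs 0x31, SVI should fall back to 0x21 and VPPR to 0x20. Setting SVI to 0 instead would drop VPPR to the VTPR value, which is how a vector of the same or lower priority class could be delivered.
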
>
> Could you apply Jan's patch and try to reproduce it again?

Hello,

I'm aware of Jan's patch, but it pertains to Xen's emulation of the virtual
Local APIC for a guest.

This bug is with the real hardware APIC, as it pertains to waking from
MWAIT.  At the point that things go wrong, there is no VT-x involved at all.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel