Re: [Xen-devel] One-off crash on staging d36b770458


  • To: Wei Liu <wei.liu2@xxxxxxxxxx>, Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Fri, 5 Oct 2018 12:06:26 +0100
  • Cc: Jan Beulich <JBeulich@xxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Delivery-date: Fri, 05 Oct 2018 11:06:37 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Openpgp: preference=signencrypt

On 05/10/18 11:48, Wei Liu wrote:
> Got this one-off crash while booting staging (d36b770458) on a skylake
> server. After rebooting it went away.
>
> (XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1173
> (XEN) ----[ Xen-4.12-unstable  x86_64  debug=y   Tainted:  C   ]----
> (XEN) CPU:    5
> (XEN) RIP:    e008:[<ffff82d080286921>] do_IRQ+0x496/0x680
> (XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
> (XEN) rax: ffff83085df7a4c0   rbx: ffff83085df81e00   rcx: 0000000000000001
> (XEN) rdx: 0000000000000021   rsi: 0000000000000021   rdi: 0000000000000001
> (XEN) rbp: ffff83085df77d98   rsp: ffff83085df77d38   r8:  0000000000000021
> (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000000
> (XEN) r12: ffff8308740e4f10   r13: 0000000000000021   r14: ffff83085df81e00
> (XEN) r15: 000000000000001e   cr0: 0000000080050033   cr4: 00000000003526e0
> (XEN) cr3: 000000085da9e000   cr2: 00007fc5b6a8cfe8
> (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) Xen code around <ffff82d080286921> (do_IRQ+0x496/0x680):
> (XEN)  be 00 00 00 7e 93 0f 0b <0f> 0b 0f 0b 0f 0b b8 00 00 00 00 eb 4e 83 bb 1c
> (XEN) Xen stack trace from rsp=ffff83085df77d38:
> (XEN)    ffff82d000000000 ffff83085df81e24 0000000000000000 0000001e8037a835
> (XEN)    ffff82d08037a841 ffff82d08037a835 ffff82d08037a841 0000000000000000
> (XEN)    0000000000000000 0000000000000000 ffff83085df77fff 0000000000000000
> (XEN)    00007cf7a2088237 ffff82d08037a8aa 0000000380f0b241 0000000000000008
> (XEN)    ffff83085df79448 ffff83085df79390 ffff83085df77ec0 0000000380f62e26
> (XEN)    00000003810cc680 ffff8307de5670a8 00000000001f644f 0000000000000809
> (XEN)    ffff83085df7a02c 0000000000000000 ffff83085df77fff 00000000000051f3
> (XEN)    ffff83085df793c0 0000002100000000 ffff82d0802e1684 000000000000e008
> (XEN)    0000000000000202 ffff83085df77e50 0000000000000000 ffff82d08059bc80
> (XEN)    00000020ffffffff ffff83085df77fff ffff82d0805a3c80 ffff83085df77eb0
> (XEN)    0000000000000000 0000000000000000 0000033b00000212 ffff82d08059bf00
> (XEN)    0000000000000005 ffff82d08059bf00 0000000000000005 0000000000000005
> (XEN)    ffff83085df39000 ffff83085df77ef0 ffff82d0802770b8 ffff830864159000
> (XEN)    ffff8300791fd000 ffff8300791fb000 ffff830864159000 ffff83085df77db8
> (XEN)    0000000000000000 0000000000000000 ffff88017dbe3d00 ffff88017dbe3d00
> (XEN)    0000000000000002 0000000000000002 0000000000000000 0000000000000000
> (XEN)    0000000148106000 000000005236fe09 ffffffff816fe980 ffff880182a9a7c0
> (XEN)    ffffffff82049af8 ffff880182a9a7c0 0000000000000082 0000beef0000beef
> (XEN)    ffffffff816fec52 000000bf0000beef 0000000000000246 ffffc900006dbe98
> (XEN)    000000000000beef 000000000000beef 000000000000beef 000000000000beef
> (XEN) Xen call trace:
> (XEN)    [<ffff82d080286921>] do_IRQ+0x496/0x680
> (XEN)    [<ffff82d08037a8aa>] common_interrupt+0x10a/0x120
> (XEN)    [<ffff82d0802e1684>] mwait-idle.c#mwait_idle+0x296/0x372
> (XEN)    [<ffff82d0802770b8>] domain.c#idle_loop+0xb3/0xb5
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 5:
> (XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1173
> (XEN) ****************************************
> (XEN)
> (XEN) Manual reset required ('noreboot' specified)
>
> Let me know what else is needed.

We've seen this reported sporadically in the past, but never with enough
information to investigate.  I once had a reliable repro of the issue,
which disappeared with a microcode update.

It always happens on the way out of mwait, and Xen's logic for which
interrupts are pending now disagrees with hardware.  This means we've
seen an interrupt at a lower priority than one we believe to be pending,
which is (to a first approximation) a violation of LAPIC priority logic.
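
For anyone following along, the invariant behind that assertion can be
sketched roughly as below.  This is a standalone illustration, not the
code in irq.c: the names `peoi`, `sp` and `vector` are taken from the
assertion text, while `struct eoi_stack`, `PENDING_EOI_MAX`,
`eoi_push_is_ordered()` and `eoi_push()` are hypothetical names made up
for the sketch.  The idea is that each newly delivered vector must be
strictly higher than the one on top of the pending-EOI stack, because
the LAPIC should not deliver anything at or below the priority of an
interrupt still in service.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PENDING_EOI_MAX 16  /* hypothetical capacity, for this sketch only */

struct pending_eoi {
    uint8_t vector;         /* interrupt vector still awaiting its EOI */
};

struct eoi_stack {
    struct pending_eoi peoi[PENDING_EOI_MAX];
    unsigned int sp;        /* number of entries currently on the stack */
};

/*
 * The condition from the failed assertion: either nothing is pending,
 * or the incoming vector is strictly higher than the most recent one.
 */
static bool eoi_push_is_ordered(const struct eoi_stack *s, uint8_t vector)
{
    return (s->sp == 0) || (s->peoi[s->sp - 1].vector < vector);
}

/*
 * Record a newly delivered vector.  Returns false, leaving the stack
 * untouched, if the ordering invariant would be violated.
 */
static bool eoi_push(struct eoi_stack *s, uint8_t vector)
{
    assert(s->sp < PENDING_EOI_MAX);

    if (!eoi_push_is_ordered(s, vector))
        return false;

    s->peoi[s->sp].vector = vector;
    s->sp++;
    return true;
}
```

With vectors 0x30 and 0x40 already pending (values picked arbitrarily),
a subsequent push of 0x21 is rejected, which corresponds to the
situation the assertion is complaining about.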

Perhaps I should insert a tonne of debugging in place of this assertion,
in the hope that the next time we randomly encounter it, we'll have a
better idea of what is going on.
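
As a very rough idea of the sort of debugging I mean, something along
these lines would record the incoming vector alongside everything we
think is still pending.  This is only a sketch under assumptions:
`format_eoi_state()` is a hypothetical helper, it takes a plain array of
vectors rather than Xen's real per-CPU structure, and it formats into a
buffer with snprintf() so it can be compiled on its own; in the
hypervisor the output would go through the console logging path
instead.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: describe the incoming vector and the vectors
 * believed to be pending, oldest first.  Returns the number of
 * characters the full message needs, as snprintf() does.
 */
static int format_eoi_state(char *buf, size_t len,
                            const uint8_t *vectors, unsigned int sp,
                            uint8_t incoming)
{
    int n = snprintf(buf, len, "incoming=0x%02x sp=%u pending:",
                     (unsigned int)incoming, sp);

    /* Stop early if snprintf() failed or the buffer is already full. */
    for (unsigned int i = 0; i < sp && n >= 0 && (size_t)n < len; i++)
        n += snprintf(buf + n, len - (size_t)n, " 0x%02x",
                      (unsigned int)vectors[i]);

    return n;
}
```

Dumping the whole stack rather than just the top entry would show
whether the ordering went wrong at the most recent push or earlier.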

~Andrew
