Re: [Xen-users] Dom0 crashes without logging lately on Debian Stretch with Xen 4.8
 
- To: xen-users@xxxxxxxxxxxxxxxxxxxx
 
- From: Michael <delajamal@xxxxxx>
 
- Date: Tue, 6 Nov 2018 10:08:02 +0100
 
 
- Delivery-date: Tue, 06 Nov 2018 09:09:04 +0000
 
- List-id: Xen user discussion <xen-users.lists.xenproject.org>
 
- Openpgp: preference=signencrypt
 
 
 
  
  
     Hello,

     I had the same issues. In my case I tried Ubuntu 18.04 with Xen 4.9, and kernel version 4.15.9 was the only one that would start up the DomU.

     Tested on AMD Ryzen 1800X and Intel 8700.

     In my case I got random system freezes, with uptimes between 7 and 30 days.

     Older and newer kernels won't run. This problem is still present; I am going to switch all services to Docker...

     Regards,
     Michael
     
     
     
     
     
    On 06.11.2018 at 09:37, Roalt Zijlstra | webpower wrote:
     
    
      
      
        
          
            Hi John,

            Yes, we are using PV only, and we run only Debian Linux on the servers. We still have some DomU Jessie servers running with the stock kernel. We did update our Dells to the latest firmware, so that includes more recent Intel microcode. But on Debian we have not yet enabled the intel-microcode package: we had so much instability, and so many parameters that could be the culprit, that we did not want to add another.

            If your server is very busy, I think the chance of a crash is higher. We have seen crashes on our active MySQL database servers, whereas the slave MySQL database server did not crash as quickly. However, after we had used the slave as the primary database for a while (because we were debugging the crashed master database), the slave could very well crash too.

            We have tested downgrading the Dell firmware (which also means using older Intel microcode), but that did not help. So having the latest firmware is okay.

            We are now testing a few scenarios:

              - one server with an older kernel (4.9.0-4-amd64) and DomUs on the 3.16 kernel, which has been running for 16 days now
              - one server with the updated kernel (4.9.0-8-amd64) and DomUs on the 3.16 kernel, which has surprisingly been running for 28 days now
              - one server with the updated kernel (4.9.0-8-amd64) and all DomUs on the backported 4.9 kernel

            It all doesn't really make much sense. We expect that the older kernel will keep running and that the 4.9 DomUs will help keep the servers alive.

            We have tested 4.14 and 4.16 kernels (from backports), but that did not make a difference in stability.
             
             
            
              
                
                  
                    
                      
                        
                          
                            
                              
                                
                                  
                                    
                                      
                                        
                                          
                                            
                                              
                                                
                                                  
                                                    
                                                    
                                                    
                                                    
                                                      
                                                        
                                                           
            Barcelona | Barneveld | Beijing | Chengdu | Guangzhou | Hamburg | Shanghai | Shenzhen | Stockholm
                                                 
                                              
                                             
                                            
                                           
                                         
                                       
                                     
                                   
                                 
                               
                             
                           
                         
                       
                     
                   
                 
               
             
            
           
         
       
       
      
        
        
          
            
              
                It could be as you mention... Are your DomUs PV? I am using paravirtualization exclusively, and this specific server has the following CPU:

                Intel(R) Xeon(R) CPU E5645 @ 2.40GHz

                Do you have the intel-microcode Debian package from the non-free repo installed on your servers? I currently don't...

                J.
                 
                 
                 
               
             
           
           
          
            
            
              
                
                  
                    Hi John,

                    It could very well be that it is also restricted to some CPUs, but I am inclined to believe that the DomU kernels in use can influence stability. We did have a pretty busy SSL offloader running on a 3.16 kernel, which might have caused the crashes.

                    Just for reference, the following two CPUs are causing us trouble, but I am not sure if it matters:

                    Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
                    Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz

                    Roalt
                    
                   
                 
               
               
              
                
                
                  
                    
                      
                        Hi,

                        Thanks for your feedback. I was wondering because I have just upgraded a Debian 9 server to the latest kernel with the latest Xen packages from the official Debian repository. The only difference is that mine is an older IBM server, about 7 years old, patched with the latest BIOS/UEFI, and so far so good: no crash. The uptime is 6 days for now. Here are the details of my kernel and Xen packages:

                        ii  xen-hypervisor-4.8-amd64   4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10  amd64  Xen Hypervisor on AMD64
                        ii  linux-image-4.9.0-8-amd64  4.9.110-3+deb9u6                          amd64  Linux 4.9 for 64-bit PCs

                        Regards,
                        J.
                         
                         
                         
                       
                     
                   
                   
                  
                    
                    
                      
                        Hi John,

                        The problem is that I cannot provide any metrics or log files showing an error. I can only tell that Dom0 reboots for a reason that is not logged. I have no physical access to the server. I have received one other report of this kind of issue.

                        My assumption that the backported patches are the cause is based on the current 16-day uptime; until 16 days ago, the server rebooted every 3-5 days. From my point of view, that would not make a useful bug report.

                        The other thing is that both of my servers now run upstream Xen and an upstream kernel, and I might not go back to the old versions of either in Debian Stretch. The other server had always run upstream versions and never had a problem; that's why I updated this one, too.

                        Best regards
                          
                          
                          
                            
                              
                                I was wondering whether any of you have reported this bug back to the Debian community, for example on the bugs.debian.org website?
                                 
                               
                              
                              
                                
                                
                                  Hi,

                                  I had these crash problems with the Xen version in Debian Stretch, too. After 3 to 7 days the Xen server rebooted without a log entry or anything else to observe. The problems started when Debian applied the first patches. Some updates made it better; the latest one made it worse again. I checked the hard drives and RAM and closely monitored metrics for anything that might be the cause.

                                  My solution, once I no longer suspected a hardware fault, was to build upstream Xen 4.11 for Debian Stretch. I am currently running this setup with my own build of kernel 4.19, and the machines are working stably again.
                                     
                                     
                                    
                                    
                                    
                                      
                                        
                                          
                                            Hi there,

                                            Ever since all the Meltdown and Spectre kernel updates, and possibly also the Xen 4.8 updates, we have been experiencing Dom0 crashes out of the blue. Sometimes after 1 day, sometimes after a few days or even 14 days; it is completely random.

                                            We have two Dell P730 servers and two Dell P720 servers with this behaviour. One thing is that we updated these machines to the latest available firmware, because that is the most secure way. Then we installed Debian Stretch with Xen 4.8 support.

                                            We have done several installs, and 4 servers seem to crash pretty fast while others don't. In the end we think we can trace it back to the xen-4.8.4-pre version being stable and xen-4.8.5-pre being unstable. This was largely independent of the kernel we were using, 4.14 or 4.9.0-8-amd64. These are, of course, all Debian package version numbers.

                                            As a last resort, on one server we updated the kernels of all Jessie DomUs on that Dom0 to 4.9.0 from backports instead of 3.16. For now that seems to work, but the crashes are random, so it could happen again at any time. The idea is that the 3.16 kernels are completely unaware of Spectre and Meltdown and might cause trouble in the Xen kernel support. I am not sure if this is true at all, but we are pretty lost as to what the actual cause is.

                                            We also tested with CentOS and had these crashes there too with certain kernel/Xen combinations, though the most recent updates seem to be more stable. The most frustrating part is that there are absolutely no logs to be found. No kernel oops or anything; the server just resets and boots again.

                                            Are there others experiencing problems like this? Do you see more frequent server/kernel crashes on production servers?
                                           
                                          
                                             
                                           
                                          
                                         
                                       
                                     
                                    
                                      
                                     
                                   
_______________________________________________ 
                                  Xen-users mailing list 
                                  Xen-users@xxxxxxxxxxxxxxxxxxxx 
                                  https://lists.xenproject.org/mailman/listinfo/xen-users 
                               
                             
                           
                         
                       