Hi there,
        
        
Ever since all the Meltdown and Spectre kernel updates, and possibly
also the Xen 4.8 updates, we have been experiencing crashes of the
Dom0 out of the blue. Sometimes after 1 day, sometimes after a few
days or even 14 days, completely at random.
        
        
We have two Dell P730 servers and two Dell P720 servers with this
behaviour. One thing to note is that we updated these machines to the
latest available firmware, because that is the most secure way. Then
we installed Debian Stretch with Xen 4.8 support.
        
        
We have done several installs; 4 servers seem to crash pretty fast
and others don't. In the end we think we can trace it back to the
xen-4.8.4-pre version being stable and the xen-4.8.5-pre version
being unstable. This was more or less independent of the kernel we
were using (4.14 or 4.9.0-8-amd64). These are of course all Debian
package version numbers.
        
        
As a last resort, on one server we updated all DomU kernels of the
Jessie guests on this Dom0 to 4.9.0 from backports, replacing the
3.16 kernel. For now that seems to work, but the crashes are random,
so it could happen again at any time. The idea is that the old 3.16
kernels are completely Spectre and Meltdown unaware and might cause
trouble for Xen. I am not sure if this is true at all, but we are
pretty lost as to what the actual cause is.
        
        
We also tested with CentOS and had these crashes there too, with
certain combinations of kernel and Xen. The most recent updates seem
to be more stable, though. The most frustrating part is that there
are absolutely no logs to be found. No kernel oops or anything; the
server just resets and boots again.
        
        
Are there others experiencing problems like this? Do you see more
frequent server/kernel crashes on production servers?