Re: [Xen-users] Dom0 crashes without logging lately on Debian Stretch with Xen 4.8
Hi Michael,
I am not sure about the status of Ubuntu and Xen. My advice would be to downgrade your Xen version to the previous version and see if that is more stable. For Debian that worked; it is less secure, but crashing servers are not what you want. Maybe an updated Xen will have stability fixes.
Barcelona | Barneveld | Beijing | Chengdu | Guangzhou | Hamburg | Shanghai | Shenzhen | Stockholm
Hello,
I had the same issues.
In my case I tried Ubuntu 18.04 with Xen 4.9, and kernel version 4.15.9 was the only one that started the DomU.
Tested on AMD Ryzen 1800X and Intel 8700.
In my case I got random system freezes, with uptimes between 7 and 30 days.
Older and newer kernels won't run.
This problem is still present; I am going to switch all services to Docker...
Regards,
Michael
On 06.11.2018 at 09:37, Roalt Zijlstra | webpower wrote:
Hi John,
Yes, we are using PV only and we only run Debian Linux on the servers. We still have some DomU Jessie servers running with the stock kernel. We did update our Dells to the latest firmware, so that includes more recent Intel microcode. But on Debian we have not yet enabled the intel-firmware, since we had so much instability and so many parameters that could be the culprit that we did not want to add another.
If your server is very busy, I think the chance of a crash is higher. We have seen crashes on our active MySQL databases, whereas the slave MySQL database server did not crash as quickly. However, after we used the slave as the primary database for a while (because we were debugging the crashed master), the slave could very well crash too.
We have done tests with downgrading the Dell firmware (which also means using older Intel microcode), but that did not help. So having the latest firmware is okay.
We are now testing a few scenarios:
- one server with an older kernel (4.9.0-4-amd64), with DomU 3.16 kernel, which has been running for 16 days now
- one server with the updated kernel (4.9.0-8-amd64), with DomU 3.16 kernel, which has, surprisingly, been running for 28 days now
- one server with the updated kernel (4.9.0-8-amd64), and all DomUs on the backported 4.9 kernel.
It all doesn't really make much sense. We do expect that the older kernel will keep on running and that the 4.9 DomUs will help to keep the servers alive.
We have tested with 4.14 and 4.16 kernels (from backports), but that did not make a difference in stability.
It could be as you mention... Are your DomUs PV? I am using paravirtualization exclusively, and this specific server has the following CPU:
Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
Do you have the intel-microcode Debian package from the non-free repo installed on your servers? I currently don't...
J.
Hi John,
It could very well be that it is also restricted to some CPUs, but I am inclined to believe that the DomU kernels in use can influence stability. We did have a pretty busy SSL offloader running on a 3.16 kernel, which might have caused the crashes.
Just for reference, the following two CPUs are causing us trouble, but I am not sure if it matters.
Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
Roalt
Hi,
Thanks for your feedback. I was wondering because I have just upgraded a Debian 9 server to the latest kernel with the latest Xen packages from the official Debian repo. The only difference is that I have an older IBM server, already ~7 years old, patched with the latest BIOS/UEFI, and so far so good: no crash. The uptime is 6 days for now. Here are the details about my kernel and Xen packages.
ii  xen-hypervisor-4.8-amd64   4.8.4+xsa273+shim4.10.1+xsa273-1+deb9u10  amd64  Xen Hypervisor on AMD64
ii  linux-image-4.9.0-8-amd64  4.9.110-3+deb9u6                          amd64  Linux 4.9 for 64-bit PCs
Regards,
J.
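Since several messages in this thread compare Xen and kernel package versions, the following is a minimal sketch of how the same two version strings could be collected on each host for side-by-side comparison. It assumes a Debian system where `dpkg-query` is available. The helper names `parse_versions` and `installed_versions` are hypothetical and not part of any Debian or Xen tool.

```python
import subprocess

# Package-name prefixes of interest; taken from the listing in the message above.
PREFIXES = ("xen-hypervisor-", "linux-image-")


def parse_versions(text, prefixes=PREFIXES):
    """Return {package: version} from lines of the form 'package version'."""
    versions = {}
    for line in text.splitlines():
        parts = line.split()
        # Packages known to dpkg but never installed have an empty version
        # and are skipped here because the line has only one field.
        if len(parts) == 2 and parts[0].startswith(prefixes):
            versions[parts[0]] = parts[1]
    return versions


def installed_versions():
    """Ask dpkg-query for every package and keep the Xen and kernel ones.

    Caveat: packages that were removed but still have configuration files
    left behind may also show up with a version.
    """
    result = subprocess.run(
        ["dpkg-query", "-W", "-f", "${Package} ${Version}\n"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_versions(result.stdout)
```

Calling `installed_versions()` on each Dom0 and comparing the resulting dictionaries would make it easier to see which hosts share the exact same hypervisor and kernel builds.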
Hi John,
the problem is that I cannot provide any metrics or logfiles showing an error. I can only tell that dom0 is rebooting for a reason that is not logged. I have no physical access to the server. I got one other report about this kind of issue.
My assumption that the backported patches are the cause is based on the current 16-day uptime. Until 16 days ago the server rebooted every 3-5 days. It won't be a useful bug report from my point of view.
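As a rough illustration of why a 16-day uptime is suggestive but not proof, here is a minimal sketch. It assumes crashes happen independently at a constant average rate (an exponential model that nobody in this thread has verified). The function name `prob_uptime_by_chance` is hypothetical.

```python
import math


def prob_uptime_by_chance(days_up, mean_days_between_crashes):
    """Probability of seeing no crash for `days_up` days if crashes are
    memoryless with the given mean interval (an assumed model)."""
    return math.exp(-days_up / mean_days_between_crashes)


if __name__ == "__main__":
    # The message reports reboots every 3-5 days before the change
    # and 16 days of uptime after it.
    for mean in (3.0, 5.0):
        p = prob_uptime_by_chance(16.0, mean)
        print(f"mean interval {mean:.0f} days: P(16 days up by chance) = {p:.3f}")
```

Under that assumption, reaching 16 days by luck alone would happen in roughly 0.5% to 4% of cases, so the change probably helped, although a single run cannot rule out coincidence.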
The other thing is that my two servers are now running upstream Xen and kernel, and I might not go back to the old versions of both in Debian stretch. The other server had always been running upstream versions and never had a problem; that's why I updated the other, too.
Best regards
I was wondering if any of you guys reported this bug/issue/problem back to the Debian community? For example on their bugs.debian.org web site?
Hi,
I had these crash problems with the Xen version in Debian stretch, too. After 3 to 7 days the Xen server rebooted without a log entry or anything else to observe. The problems started when the first patches were applied by Debian. Some updates made it better, the last one worse again. I checked hard drives and RAM and closely monitored metrics for a possible cause.
My solution, after no longer suspecting a hardware fault: build upstream Xen 4.11 for Debian stretch. I am currently running this setup with my own build of kernel 4.19. The machines are now stable again.
Hi there,
Ever since all the Meltdown and Spectre kernel updates, and possibly also the Xen 4.8 updates, we have been experiencing crashes of the Dom0 out of the blue. Sometimes after 1 day, sometimes after a few days or even 14 days; completely random.
We have two Dell P730 servers and two Dell P720 servers with this behaviour. One thing is that we updated these machines to the latest available firmware, because that is the most secure way. Then we installed Debian Stretch with Xen 4.8 support.
We have done several installs; 4 servers seem to crash pretty fast and others don't. In the end we think we can trace it back to the xen-4.8.4-pre version being stable and the xen-4.8.5-pre version being unstable. This was more or less independent of the kernel we were using, 4.14 or 4.9.0-8-amd64. This is of course all Debian package numbering.
As a last resort, on one server we updated the kernels of all Jessie DomUs on this Dom0 to 4.9.0 from backports instead of the 3.16 kernel. For now that seems to work, but the crashes are random so it could happen again at any time. The idea is that the 3.16 kernels are completely unaware of Spectre & Meltdown and might cause trouble in Xen kernel support. I am not sure if this is true at all, but we are pretty lost as to what the actual cause is.
We also tested with CentOS and we had these crashes there too, with certain combinations of kernel/Xen. The most recent updates seem to be more stable though. The most frustrating part is that there are absolutely no logs to be found. No kernel oops or anything; the server just resets and boots again.
Are there others experiencing problems like this? Do you see more frequent server/kernel crashes on production servers?
_______________________________________________
Xen-users mailing list
Xen-users@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-users