Re: CPU oversubscription =?=> spontaneous reboots
On Wed, Dec 04, 2024 at 10:21:38PM +0000, Mike wrote:
>
> The second domU is the sole worker node for the cluster. The command that I
> ran in it that triggered the reboot was `kubectl delete -f` of a Deployment
> that was already running from an `apply`.

Okay, do you have a full list of what this command does? Might it cause a
crucial Xen domain (domain 0) to panic, and this in turn cause Xen to panic?

On Thu, Dec 05, 2024 at 03:15:10PM +0000, Mike wrote:
> Joost Roeleveld wrote:
> > How do you overcommit memory?
>
> I don't. I took the memory from the other domU.

How much free memory does Xen have? Try running `xl info`; what does the
"free_memory" line say? It might be 0 if Xen is ballooning memory from
domain 0 to handle allocations. If ballooning memory from domain 0 has been
disabled, this should stay above 50 so Xen can allocate memory to handle
activity.

On Thu, Dec 05, 2024 at 03:23:22PM +0000, Mike wrote:
> Paul Leiber wrote:
> > Could it be possible that it's not the activity on the DomU that is
> > triggering the reboot, but rather network activity between two DomUs?
>
> Sure, that's possible. The domUs are a k8s control and worker node,
> respectively, so they need to communicate with each other when I issue the
> `kubectl delete` that trigger it.
>
> But I resolved the issue (for now) by increasing the control node's
> admittedly tight memory. So that doesn't point to a network issue in my
> mind.

Is either of these also domain 0? Domain 0 exhausting its free memory and
panicking might cause the issue you're describing.

> > What CPU architecture is your system based on?
>
> amd64
> Intel Core i9-14900T

Apparently there is a major issue with 14900K processors. I've been reading
mentions of other Intel 13xxx and 14xxx chips reputedly having failures at
lower rates. Right now there could still be configuration issues, but I
would keep an eye out for hardware failure.
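If reading the `xl info` output by eye is awkward, the "free_memory" check suggested above could be scripted roughly as follows. This is only a sketch: the function names are made up for illustration, the `key : value` line layout is assumed from typical `xl info` output, and the threshold of 50 is simply the figure mentioned above, not a documented Xen limit.

```python
import re
import subprocess


def parse_free_memory(xl_info_text):
    """Return the free_memory value from `xl info` text, or None if absent.

    Assumes lines of the form "free_memory            : 128".
    """
    match = re.search(r"^free_memory\s*:\s*(\d+)\s*$", xl_info_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def check_free_memory(xl_info_text, threshold=50):
    """Describe whether free_memory is above the threshold (default 50)."""
    free = parse_free_memory(xl_info_text)
    if free is None:
        return "free_memory line not found"
    if free > threshold:
        return f"free_memory={free}: above {threshold}"
    return f"free_memory={free}: at or below {threshold}"


if __name__ == "__main__":
    # Must be run in domain 0 with sufficient privileges to call xl.
    output = subprocess.run(
        ["xl", "info"], capture_output=True, text=True, check=True
    ).stdout
    print(check_free_memory(output))
```

Running this in domain 0 before and after the `kubectl delete` would show whether Xen's free memory drops to the low values described above.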
-- (\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/) \BS ( | ehem+sigmsg@xxxxxxx PGP 87145445 | ) / \_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445