Re: [PATCH] x86/pod: Do not fragment PoD memory allocations
On 27/01/2021 20:12, Elliott Mitchell wrote:
> On Wed, Jan 27, 2021 at 10:47:19AM +0100, Jan Beulich wrote:
>> On 26.01.2021 18:51, Elliott Mitchell wrote:
>>> Okay, this has been reliably reproducing for a while.  I had originally
>>> thought it was a problem of HVM plus memory != maxmem, but the
>>> non-immediate restart disagrees with that assessment.
>>
>> I guess it's not really clear what you mean with this, but anyway:
>> The important aspect here that I'm concerned about is what the
>> manifestations of the issue are.  I'm still hoping that you would
>> provide such information, so we can then start thinking about how
>> to solve these.  If, of course, there is anything worse than the
>> expected effects which use of PoD can have on the guest itself.
>
> Manifestation is domain 0 and/or Xen panic a few seconds after the
> domain.cfg file is loaded via `xl`.  Everything on the host is lost and
> the host restarts.  Any VMs which were present are lost and need to
> restart, similar to power loss without UPS.
>
> Upon pressing return for `xl create domain.cfg` there is a short period
> of apparently normal behavior in domain 0.  After this there is a short
> period of very laggy behavior in domain 0.  Finally domain 0 goes
> unresponsive, and so far by the time I've gotten to the host's console it
> has already started to reboot.
>
> The periods of apparently normal and laggy behavior are perhaps 5-10
> seconds each.
>
> The configurations I've reproduced with have had maxmem substantially
> larger than the total host memory (this is intended as a prototype of a
> future larger VM).  The first recorded observation of this was with
> Debian's build of Xen 4.8, though I recall running into it with Xen 4.4
> too.
>
> Part of the problem might also be attributable to QEMU touching all
> memory on start (thus causing PoD to try to populate *all* memory) or
> OVMF.

So.  What *should* happen is that if QEMU/OVMF dirties more memory than
exists in the PoD cache, the domain gets terminated.  Irrespective,
Xen/dom0 dying isn't an expected consequence of any normal action like
this.

Do you have a serial log of the crash?  If not, can you set up a crash
kernel environment to capture the logs, or alternatively reproduce the
issue on a different box which does have serial?

Whatever the underlying bug is, avoiding 2M degrading to 4K allocations
isn't a real fix, and is, at best, sidestepping the problem.

~Andrew
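For reference, the kind of setup Elliott describes -- an HVM guest whose
memory is well below its maxmem, so that the difference is backed by
populate-on-demand -- looks roughly like the sketch below.  The guest name
and sizes here are hypothetical, not taken from the report:

    # Hypothetical xl domain.cfg exercising PoD: "memory" is what Xen
    # actually populates at boot, "maxmem" is what the guest sees; the
    # gap between them is filled on demand from the PoD cache.
    name     = "pod-test"      # hypothetical guest name
    type     = "hvm"
    firmware = "ovmf"          # OVMF firmware, as mentioned in the report
    vcpus    = 2
    memory   = 2048            # MiB populated up front
    maxmem   = 65536           # MiB visible to the guest; the balance is PoD

To capture the panic messages Andrew asks for, the usual approach is a
serial console on the host, for example booting Xen with something like
console=com1 com1=115200,8n1 loglvl=all guest_loglvl=all (exact parameters
depend on the hardware).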