Re: [PATCH] x86/pod: Do not fragment PoD memory allocations

On Tue, Jan 26, 2021 at 12:08:15PM +0100, Jan Beulich wrote:
> On 25.01.2021 18:46, Elliott Mitchell wrote:
> > On Mon, Jan 25, 2021 at 10:56:25AM +0100, Jan Beulich wrote:
> >> On 24.01.2021 05:47, Elliott Mitchell wrote:
> >>>
> >>> ---
> >>> Changes in v2:
> >>> - Include the obvious removal of the goto target. Always realize you're
> >>> at the wrong place when you press "send".
> >>
> >> Please could you also label the submission then accordingly? I
> >> got puzzled by two identically titled messages side by side,
> >> until I noticed the difference.
> >
> > Sorry about that. Would you have preferred a third message mentioning
> > this mistake?
>
> No. But I'd have expected v2 to have its subject start with
> "[PATCH v2] ...", making it relatively clear that one might
> save looking at the one labeled just "[PATCH] ...".

Yes, in fact I spotted this just after. I was in a situation of, "does
this deserve sending an additional message out?" (ugh, yet more damage
from that issue...)

> >>> I'm not including a separate cover message since this is a single hunk.
> >>> This really needs some checking in `xl`. If one has a domain which
> >>> sometimes gets started on different hosts and is sometimes modified with
> >>> slightly differing settings, one can run into trouble.
> >>>
> >>> In this case most of the time the particular domain is most often used
> >>> PV/PVH, but every so often is used as a template for HVM. Starting it
> >>> HVM will trigger PoD mode. If it is started on a machine with less
> >>> memory than others, PoD may well exhaust all memory and then trigger a
> >>> panic.
> >>>
> >>> `xl` should likely fail HVM domain creation when the maximum memory
> >>> exceeds available memory (never mind total memory).
> >>
> >> I don't think so, no - it's the purpose of PoD to allow starting
> >> a guest despite there not being enough memory available to
> >> satisfy its "max", as such guests are expected to balloon down
> >> immediately, rather than triggering an oom condition.
> >
> > Even Qemu/OVMF is expected to handle ballooning for a *HVM* domain?
>
> No idea how qemu comes into play here. Any preboot environment
> aware of possibly running under Xen of course is expected to
> tolerate running with maxmem > memory (or clearly document its
> inability, in which case it may not be suitable for certain
> use cases). For example, I don't see why a preboot environment
> would need to touch all of the memory in a VM, except maybe
> for the purpose of zeroing it (which PoD can deal with fine).

I'm reading that as your answer to the above question is "yes".
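
A rough sketch of the kind of pre-flight check meant above, for the
record (untested; "maxmem_mb" stands in for whatever maxmem= the guest
config requests, and free_memory/total_memory are the fields `xl info`
prints, in MiB):

    maxmem_mb=2147483648   # taken from the guest config
    free_mb=$(xl info | awk '/^free_memory/ {print $3}')
    total_mb=$(xl info | awk '/^total_memory/ {print $3}')
    if [ "$maxmem_mb" -gt "$total_mb" ]; then
        echo "maxmem can never be satisfied on this host" >&2
    elif [ "$maxmem_mb" -gt "$free_mb" ]; then
        echo "maxmem exceeds free memory; PoD has to cover the gap" >&2
    fi

Whether that should fail domain creation outright or merely warn is
exactly the disagreement above, so the sketch only warns.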

> >>> For example try a domain with the following settings:
> >>>
> >>> memory = 8192
> >>> maxmem = 2147483648
> >>>
> >>> If type is PV or PVH, it will likely boot successfully. Change type to
> >>> HVM and unless your hardware budget is impressive, Xen will soon panic.
> >>
> >> Xen will panic? That would need fixing if so. Also I'd consider
> >> an excessively high maxmem (compared to memory) a configuration
> >> error. According to my experiments long, long ago I seem to
> >> recall that a factor beyond 32 is almost never going to lead to
> >> anything good, irrespective of guest type. (But as said, badness
> >> here should be restricted to the guest; Xen itself should limp
> >> on fine.)
> >
> > I'll confess I haven't confirmed the panic is in Xen itself. Problem is
> > when this gets triggered, by the time the situation is clear and I can
> > get to the console the computer is already restarting, thus no error
> > message has been observed.
>
> If the panic isn't in Xen itself, why would the computer be
> restarting?

I was thinking there was a possibility it is actually Domain 0 which is
panicking.

> > This is most certainly a configuration error. Problem is this is a very
> > small delta between a perfectly valid configuration and the one which
> > reliably triggers a panic.
> >
> > The memory:maxmem ratio isn't the problem. My example had a maxmem of
> > 2147483648 since that is enough to exceed the memory of sub-$100K
> > computers. The crucial features are maxmem >= machine memory,
> > memory < free memory (thus potentially bootable PV/PVH) and type = "hvm".
> >
> > When was the last time you tried running a Xen machine with near zero
> > free memory? Perhaps in the past Xen kept the promise of never panicking
> > on memory exhaustion, but it feels like this hasn't held for some time.
>
> Such bugs need fixing. Which first of all requires properly
> pointing them out. A PoD guest misbehaving when there's not
> enough memory to fill its pages (i.e. before it manages to
> balloon down) is expected behavior. If you can't guarantee the
> guest ballooning down quickly enough, don't configure it to
> use PoD. A PoD guest causing a Xen crash, otoh, is very likely
> even a security issue. Which needs to be treated as such: It
> needs fixing, not avoiding by "curing" one of perhaps many
> possible sources.

Okay, this has been reliably reproducing for a while. I had originally
thought it was a problem of HVM plus memory != maxmem, but the
non-immediate restart disagrees with that assessment.
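
For reference, the conditions described above (maxmem >= host memory,
memory < free memory, type "hvm") can be captured in a minimal repro
along these lines (untested sketch; "repro.cfg" is just a placeholder
name):

    printf '%s\n' 'type = "hvm"' 'memory = 8192' 'maxmem = 2147483648' > repro.cfg
    xl info | grep -E '^(total|free)_memory'   # record host memory state first
    xl dmesg -c > /dev/null                    # clear the hypervisor log
    xl create repro.cfg
    xl dmesg                                   # anything logged before the box goes down

Booting Xen with "noreboot" on its command line should also keep a
panic message on the console instead of rebooting straight away, which
would settle whether it is Xen or Domain 0 going down.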

> As an aside - if the PoD code had proper 1Gb page support,
> would you then propose to only allocate in 1Gb chunks? And if
> there was a 512Gb page feature in hardware, in 512Gb chunks
> (leaving aside the fact that scanning 512Gb of memory to be
> all zero would simply take too long with today's computers)?

That answer would vary over time. Today or tomorrow, certainly not.
In a decade's time (or several) when a typical pocket computer^W^W
cellphone has 4TB of memory and a $30K server has a minimum of 128TB,
then doing allocations in 1GB chunks would be worthy of consideration.

--
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg@xxxxxxx  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445