
Re: [PATCH] xen/page_alloc: Simplify domain_adjust_tot_pages



On Wed Mar 5, 2025 at 1:39 PM GMT, Jan Beulich wrote:
> On 05.03.2025 14:22, Alejandro Vallejo wrote:
> > On Wed Mar 5, 2025 at 10:49 AM GMT, Jan Beulich wrote:
> >> On 27.02.2025 15:36, Alejandro Vallejo wrote:
> >>> On Wed Feb 26, 2025 at 2:05 PM GMT, Jan Beulich wrote:
> >>>> On 24.02.2025 15:49, Alejandro Vallejo wrote:
> >>>>> Open question to whoever reviews this...
> >>>>>
> >>>>> On Mon Feb 24, 2025 at 1:27 PM GMT, Alejandro Vallejo wrote:
> >>>>>>      spin_lock(&heap_lock);
> >>>>>> -    /* adjust domain outstanding pages; may not go negative */
> >>>>>> -    dom_before = d->outstanding_pages;
> >>>>>> -    dom_after = dom_before - pages;
> >>>>>> -    BUG_ON(dom_before < 0);
> >>>>>> -    dom_claimed = dom_after < 0 ? 0 : dom_after;
> >>>>>> -    d->outstanding_pages = dom_claimed;
> >>>>>> -    /* flag accounting bug if system outstanding_claims would go negative */
> >>>>>> -    sys_before = outstanding_claims;
> >>>>>> -    sys_after = sys_before - (dom_before - dom_claimed);
> >>>>>> -    BUG_ON(sys_after < 0);
> >>>>>> -    outstanding_claims = sys_after;
> >>>>>> +    BUG_ON(outstanding_claims < d->outstanding_pages);
> >>>>>> +    if ( pages > 0 && d->outstanding_pages < pages )
> >>>>>> +    {
> >>>>>> +        /* `pages` exceeds the domain's outstanding count. Zero it out. */
> >>>>>> +        outstanding_claims -= d->outstanding_pages;
> >>>>>> +        d->outstanding_pages = 0;
> >>>>>
> >>>>> While this matches the previous behaviour, do we _really_ want it? It's
> >>>>> weird, quirky, and hard to extend to NUMA-aware claims (which is
> >>>>> something I'm midway through).
> >>>>>
> >>>>> Wouldn't it make sense to fail the allocation (earlier) if the claim
> >>>>> has run out? Do we even expect this to ever happen this late in the
> >>>>> allocation call chain?
> >>>>
> >>>> This goes back to what a "claim" means. Even without any claim, a domain
> >>>> may allocate memory. So a claim having run out doesn't imply allocation
> >>>> has to fail.
> >>>
> >>> Hmmm... but that violates the purpose of the claim infra as far as I
> >>> understand it. If a domain may overallocate by (e.g.) ballooning in
> >>> memory, it can distort the ability of another domain to start up, even
> >>> if it succeeded in its own claim.
> >>
> >> Why would that be? As long as we hold back enough memory to cover the
> >> claim, it shouldn't matter what kind of allocation we want to process.
> >> I'd say that a PV guest starting ballooned ought to be able to deflate
> >> its balloon as far as there was a claim established for it up front.
> > 
> > The fact that a domain allocated something does not mean it had claimed it
> > beforehand. A claim is a restriction on others' ability to allocate/claim,
> > not on the domain that made the claim.
>
> Yes and no. No in so far as the target's "ability to allocate" may or may not
> be meant to extend beyond domain creation.
>
> > e.g:
> > 
> >   0. host is idle. No domU.
> >   1. dom1 is created with a claim to 10% of host memory and max_mem of
> >      80% of host memory.
> >   2. dom1 balloons in 70% of host memory.
> >   3. dom1 balloons out 70% of host memory.
> >   4. dom1 now holds a claim to 80% of host memory.
>
> Sure, this shouldn't be the result. Yet that may merely be an effect of claim
> accounting being insufficient.
>
> > It's all quite perverse. Fortunately, looking at adjacent claims-related
> > code, xl seems to default to making a claim prior to populating the
> > physmap and cancelling the claim at the end of the meminit() hook, so
> > this is never a real problem.
> > 
> > This tells me that the logic's intent is to force early failure of
> > populate_physmap and nothing else. It's never active by the time
> > ballooning or memory exchange matter at all.
>
> Ah yes, this I find more convincing. (Oddly enough this is all x86-only code.)

(About claims being an x86-ism in the toolstack.) Yeah, that's weird. It
should be moved in time to a common area of xg. I guess no other arch has
cared about bootstorms or massive VMs just yet.

>
> > Xen ought to cancel the claim by itself though; the toolstack should not
> > NEED to do it.
>
> Fundamentally yes. Except that the toolstack can do it earlier than Xen could.
>
> Jan

Yes, in the sense that it can do it just after it's done populating the
physmap, but Xen should still zero it at the end in case the toolstack hasn't
done so. Nothing good can come of that counter going up and down in such a
dubious fashion.

Cheers,
Alejandro
