|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] mm/pdx: Add comments throughout the codebase for pdx
On Mon, Jun 19, 2023 at 05:30:20PM +0200, Jan Beulich wrote:
> > + * ma_{top,bottom}_mask is simply a shifted pfn_{top,pdx_bottom}_mask
> > where the
> > + * bottom one shifts in 1s rather than 0s.
> > + */
>
> Nit: That 2nd bottom variable is ma_va_bottom_mask.
Sure
>
> > @@ -57,9 +99,25 @@ uint64_t __init pdx_init_mask(uint64_t base_addr)
> > (uint64_t)1 << (MAX_ORDER + PAGE_SHIFT)) - 1);
> > }
> >
> > -u64 __init pdx_region_mask(u64 base, u64 len)
> > +uint64_t __init pdx_region_mask(uint64_t base, uint64_t len)
> > {
> > - return fill_mask(base ^ (base + len - 1));
> > + uint64_t last = base + len - 1;
> > + /*
> > + * The only bit that matters in base^last is the MSB. There are 2
> > cases.
> > + *
> > + * case msb(base) < msb(last):
> > + * then fill_mask(base^last) == fill_mask(last). This is non
> > + * compressible.
> > + * case msb(base) == msb(last):
> > + * This means that there _may_ be a sequence of compressible zeroes
> > + * for all addresses between `base` and `last` iff `base` has
> > enough
> > + * trailing zeroes. That is, it's compressible when
> > + * fill_mask(base^last) < fill_mask(last)
> > + *
> > + * The resulting mask is effectively the moving bits between `base` and
> > + * `last`
> > + */
> > + return fill_mask(base ^ last);
> > }
>
> I don't see a need for you to actually change the code here. You can
> as well introduce "last" as shorthand just for the comment.
I thought as you did initially and wrote it as such. In the end it felt
wrong to have an explanation in terms of a token not present in the code.
Furthermore, explaining what the shorthand is in the comment takes more
space than introducing `last` in the code itself.
```
uint64_t last = base + len - 1;
/*
* The only bit that matters in base^last is the MSB. There are 2 cases.
```
vs
```
/*
* Let `last = base + len -1` out of convenience.
* The only bit that matters in base^last is the MSB. There are 2 cases.
```
TL;DR: I didn't factor out `last` due to aesthetics (I'd rather not touch
the code in this patch, in fact), but it seems warranted in order to reduce
the impedance mismatch between this big-ish comment and the call it
describes. I'll post v2 without that adjustment in case I managed to
convince you. Otherwise feel free to adjust it on commit.
> What I dislike in your way of putting it is the use of fill_mask(last) when
> such a call never occurs in code. Starting from the first sentence,
> can't you explain things just in terms of said MSB
I see. I can refer to the MSBs instead. Works equally well.
e.g:
fill_mask(base^last) == fill_mask(last)
|
V
msb(fill_mask(base^last)) == msb(last)
> (where the two cases are "set" and "clear")?
I'm not sure I understand what you mean here.
>
> > --- a/xen/include/xen/mm.h
> > +++ b/xen/include/xen/mm.h
> > @@ -31,6 +31,22 @@
> > * (i.e. all devices assigned to) a guest share a single DMA address
> > space
> > * and, by default, Xen will ensure dfn == pfn.
> > *
> > + * pdx: Page InDeX
> > + * Indices into the frame table holding the per-page's book-keeping
> > + * metadata. A compression scheme is used and there's a non-identity
> > + * mapping between valid(mfn) <-> valid(pdx) See the comments in pdx.c
> > + * for an in-depth explanation of that mapping.
>
> The mapping may very well be (and on x86 typically is) an identity
> one. IOW you want to describe not only the compression case, but also
> the "no compression possible" one.
Point taken. I'll rephrase it slightly as "possibly non-identity" and
explicitly state the "no compression is possible" case.
>
> PDXes also aren't just indexes to the frame table, but also to the
> direct mapping.
I had something to that effect earlier on, but I removed it because it
doesn't seem to be the case on ARM. There's a directmap_base_pdx global
that states the first pdx to be mapped on the directmap.
>
> > + * maddr: Machine Address
> > + * The physical address that corresponds to an mfn
> > + *
> > + * vaddr: Xen Virtual Address
> > + * A virtual address of memory accesible to Xen. It is typically either
> > + * an address into the direct map or to Xen's own code/data. The direct
> > + * map implements several compression tricks to save memory, so an offset
> > + * into it does _not_ necessarily correspond to an maddr due to pdx
> > + * compression.
>
> We need to be careful here: If I'm not mistaken at least Arm uses vaddr
> also for guest addresses. In fact I'm not sure vaddr (and perhaps even
> maddr) need explaining here
I'd like to have at least maddr. It's sufficiently project-specific to be
otherwise confusing to find unexplained elsewhere. i.e: In other bare-metal
projects that would be a paddr instead.
vaddr might be trying too hard to boil the ocean as far as definitions go.
I can get rid of it.
> the more that nothing in this header uses either term.
True. But it should be somewhere and this is the main memory-management
header, where the frame number definitions are. In general, things that
change together ought to stay together.
> > + * ## PDX compression
> > + *
> > + * This is a technique to avoid wasting memory on machines known to have
> > + * split their machine address space in two big discontinuous and highly
> > + * disjoint chunks.
>
> Why two? There can be any number, and in fact on the system I originally
> had data from for reference (when first writing this code) iirc there
> were 8 nodes, each with a chunk of memory far away from the other chunks.
> The compression scheme used merely requires that some "inner" bits are
> unused (always zero) in all of those ranges.
Well, our implementation only supports two and I didn't see any obvious
hints about intending to increasing that number. I see where you're coming
from, though. I can make it more general so it's not outdated if the
pfn_to_pdx()/pdx_to_pfn() pair ever increases in scope to do several holes.
Out of curiosity (and for posterity's sake), what was/is that system?
>
> > + * In its uncompressed form the frame table must have book-keeping metadata
> > + * structures for every page between [0, max_mfn) (whether they exist or
>
> s/they exist/there is RAM/ ?
They exist is ambiguous, true. Rewrote it as "they are backed by RAM"
>
> > + * not), and a similar condition exists for the direct map. We know some
> > + * architectures, however, that have some sparsity in their address space,
> > + * leading to a lot of wastage in the form of unused frame table entries.
>
> Hmm, "architectures" suggests e.g. Arm might have such, but x86 won't.
> Perhaps "systems", "designs", or "system designs"?
I like `systems` better. Sure.
>
> > @@ -13,22 +69,77 @@ extern unsigned long pfn_top_mask, ma_top_mask;
> > (sizeof(*frame_table) & -sizeof(*frame_table)))
> > extern unsigned long pdx_group_valid[];
> >
> > -extern uint64_t pdx_init_mask(u64 base_addr);
> > -extern u64 pdx_region_mask(u64 base, u64 len);
> > +/**
> > + * Calculates a mask covering "moving" bits of all addresses of a region
> > + *
> > + * e.g:
> > + * base=0x1B00000000
> > + * len+base=0x1B0008200;
> > + *
> > + * ought to return 0x00000FFFFF;
> > + *
> > + * @param base Base address of the region
> > + * @param len Size in octets of the region
> > + * @return Mask of moving bits at the bottom of all the region addresses
> > + */
>
> This looks to be a copy-and-paste of pdx_region_mask()'s comment, when
> the function has neither a "base" parameter, nor a and one at all.
Oops. A victim of incompatible rebases. I extracted these comments from an
ongoing patch series I'm working on. I'll (try to) write an actual comment
for it.
>
> > +uint64_t pdx_init_mask(u64 base_addr);
>
> No u64 -> uint64_t here?
>
> > -extern void set_pdx_range(unsigned long smfn, unsigned long emfn);
> > +/**
> > + * Calculates a mask covering "moving" bits of all addresses of a region
> > + *
> > + * e.g:
> > + * base=0x1B00000000
> > + * len+base=0x1B0008200;
> > + *
> > + * ought to return 0x00000FFFFF;
>
> I think it would help if you actually said how the return value actually
> derives. The term "moving" may be understood differently be different
> people, and hence such an explanation actually would also clarify what
> "moving" means.
Hmmm, I'd rather not explicitly state the XOR here though. I'm adding a
couple more lines explaining things in terms of the i-th bit of the mask
and all the region addresses. Ideally this comment ought to explain the
intuition, while the comment in pdx.c explains the implementation.
>
> I also thing there's a 0 missing in the len+base value, without which
> the result would be quite a bit different.
Indeed.
> > +/**
> > + * Mark range between smfn and emfn is allocatable in the frame table
> > + *
> > + * @param smfn Start mfn
> > + * @param emfn End mfn
> > + */
> > +void set_pdx_range(unsigned long smfn, unsigned long emfn);
>
> This could do with mathematically expressing the range in the comment,
> such that (in|ex)clusiveness of, in particular, emfn is clarified.
Good point. Sure.
> > +/**
> > + * Invoked to determine if an mfn maps to a legal pdx
>
> I wouldn't use "pdx" here, but refer to frame_table[] instead.
I can rewrite it as something along those lines, sure.
>
> > + * In order for it to be legal it must pass bounds, grouping and
> > + * compression sanity checks.
> > + *
> > + * @param smfn Start mfn
> > + * @param emfn End mfn
> > + * @return True iff all checks pass
> > + */
> > bool __mfn_valid(unsigned long mfn);
>
> Comment again mentions inapplicable parameters.
Ack.
>
> > @@ -38,7 +149,16 @@ static inline unsigned long pdx_to_pfn(unsigned long
> > pdx)
> > #define mfn_to_pdx(mfn) pfn_to_pdx(mfn_x(mfn))
> > #define pdx_to_mfn(pdx) _mfn(pdx_to_pfn(pdx))
> >
> > -extern void pfn_pdx_hole_setup(unsigned long);
> > +/**
> > + * Initializes global variables with information about the compressible
> > + * range of the current memory regions.
> > + *
> > + * @param mask This mask is the biggest pdx_mask of every region in the
> > + * system ORed with all base addresses of every region in the
> > + * system. The result is a mask where every sequence of zeroes
> > + * surrounded by ones is compressible.
> > + */
> > +void pfn_pdx_hole_setup(unsigned long mask);
>
> With the function returning void, I find "The result" problematic. How about
> "This results in ..."?
Sounds better. Sure.
>
> Btw, "surrounded by ones" isn't really necessary. We could compress shorter
> sequences of zeros, so this may want re-wording a little to be as precise
> as possible.
>
> Jan
Fair. I'll tweak the definition.
Alejandro
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |