RE: [PATCH v5 2/6] xen/x86: move generically usable NUMA code from x86 to common
Hi Jan,
> -----Original Message-----
> From: Jan Beulich <jbeulich@xxxxxxxx>
> Sent: 29 September 2022 20:14
> To: Wei Chen <Wei.Chen@xxxxxxx>
> Cc: nd <nd@xxxxxxx>; Andrew Cooper <andrew.cooper3@xxxxxxxxxx>; Roger Pau
> Monné <roger.pau@xxxxxxxxxx>; Wei Liu <wl@xxxxxxx>; George Dunlap
> <george.dunlap@xxxxxxxxxx>; Julien Grall <julien@xxxxxxx>; Stefano
> Stabellini <sstabellini@xxxxxxxxxx>; xen-devel@xxxxxxxxxxxxxxxxxxxx
> Subject: Re: [PATCH v5 2/6] xen/x86: move generically usable NUMA code
> from x86 to common
>
> On 29.09.2022 09:43, Wei Chen wrote:
> > On 2022/9/27 16:19, Jan Beulich wrote:
> >> On 20.09.2022 11:12, Wei Chen wrote:
> >>> +        nodes_used++;
> >>> +        if ( epdx > memtop )
> >>> +            memtop = epdx;
> >>> +    }
> >>> +
> >>> +    if ( nodes_used <= 1 )
> >>> +        i = BITS_PER_LONG - 1;
> >>
> >> Is this actually going to be correct for all architectures? Aiui
> >> Arm64 has only up to 48 physical address bits, but what about an
> >> architecture allowing the use of all 64 bits? I think at the very
> >> least we want BUILD_BUG_ON(PADDR_BITS >= BITS_PER_LONG) here.
> >>
> >
> > OK, I will add the BUILD_BUG_ON above. But I also have a question: why
> > can't we use PADDR_BITS here directly?
>
> Well, if you used PADDR_BITS, then you would use it without subtracting
> 1, and you'd be in trouble again when PADDR_BITS == BITS_PER_LONG. What
> may be possible to do instead of BUILD_BUG_ON() is
>
>     if ( nodes_used <= 1 )
>         i = min(PADDR_BITS, BITS_PER_LONG - 1);
>
This one seems better; I will follow it.
> >>> +    else
> >>> +        i = find_first_bit(&bitfield, sizeof(unsigned long) * 8);
> >>> +
> >>> +    memnodemapsize = (memtop >> i) + 1;
> >>
> >> Again perhaps the subject of a separate patch: Isn't there an off-by-1
> >> mistake here? memtop is the maximum of all epdx-es, which are
> >> calculated to be the first PDX following the region. Hence I'd expect
> >>
> >> memnodemapsize = ((memtop - 1) >> i) + 1;
> >>
> >> here. I guess I'll make patches for both issues, which you may then
> >> need to re-base over.
> >>
> >
> > Thanks, I will wait for your patches.
>
> Already sent out yesterday.
>
Ok.
> >>> +static void cf_check dump_numa(unsigned char key)
> >>> +{
> >>> +    s_time_t now = NOW();
> >>> +    unsigned int i, j, n;
> >>> +    struct domain *d;
> >>> +    const struct page_info *page;
> >>> +    unsigned int page_num_node[MAX_NUMNODES];
> >>> +    const struct vnuma_info *vnuma;
> >>> +
> >>> +    printk("'%c' pressed -> dumping numa info (now = %"PRI_stime")\n",
> >>> +           key, now);
> >>> +
> >>> +    for_each_online_node ( i )
> >>> +    {
> >>> +        paddr_t pa = pfn_to_paddr(node_start_pfn(i) + 1);
> >>> +
> >>> +        printk("NODE%u start->%lu size->%lu free->%lu\n",
> >>> +               i, node_start_pfn(i), node_spanned_pages(i),
> >>> +               avail_node_heap_pages(i));
> >>> +        /* Sanity check phys_to_nid() */
> >>> +        if ( phys_to_nid(pa) != i )
> >>> +            printk("phys_to_nid(%"PRIpaddr") -> %d should be %u\n",
> >>> +                   pa, phys_to_nid(pa), i);
> >>> +    }
> >>> +
> >>> +    j = cpumask_first(&cpu_online_map);
> >>> +    n = 0;
> >>> +    for_each_online_cpu ( i )
> >>> +    {
> >>> +        if ( i != j + n || cpu_to_node[j] != cpu_to_node[i] )
> >>> +        {
> >>> +            if ( n > 1 )
> >>> +                printk("CPU%u...%u -> NODE%d\n", j, j + n - 1,
> >>> +                       cpu_to_node[j]);
> >>> +            else
> >>> +                printk("CPU%u -> NODE%d\n", j, cpu_to_node[j]);
> >>> +            j = i;
> >>> +            n = 1;
> >>> +        }
> >>> +        else
> >>> +            ++n;
> >>> +    }
> >>> +    if ( n > 1 )
> >>> +        printk("CPU%u...%u -> NODE%d\n", j, j + n - 1, cpu_to_node[j]);
> >>> +    else
> >>> +        printk("CPU%u -> NODE%d\n", j, cpu_to_node[j]);
> >>> +
> >>> +    rcu_read_lock(&domlist_read_lock);
> >>> +
> >>> +    printk("Memory location of each domain:\n");
> >>> +    for_each_domain ( d )
> >>> +    {
> >>> +        process_pending_softirqs();
> >>> +
> >>> +        printk("Domain %u (total: %u):\n", d->domain_id,
> >>> +               domain_tot_pages(d));
> >>> +
> >>> +        for_each_online_node ( i )
> >>> +            page_num_node[i] = 0;
> >>
> >> I'd be inclined to suggest to use memset() here, but I won't insist
> >> on you doing this "on the fly". Along with this would likely go the
> >> request to limit the scope of page_num_node[] (and then perhaps also
> >> vnuma and page).
> >>
> >
> > memset for page_num_node makes sense, I will do it before
> > for_each_domain ( d ).
>
> That won't be right - array elements need clearing on every iteration.
> Plus ...
>
Oh, yes.
> > About limiting the scope, did you mean we should move:
> >
> > "const struct page_info *page;
> > unsigned int page_num_node[MAX_NUMNODES];
> > const struct vnuma_info *vnuma;"
> >
> > to the block of for_each_domain ( d )?
>
> ... this limiting of scope (yes to your question) would also conflict
> with the movement you suggest. It is actually (among other things)
> such a mistaken movement which the more narrow scope is intended to
> prevent.
>
Thanks,
Wei Chen
> Jan