[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 2/2] NUMA: replace phys_to_nid()


  • To: Julien Grall <julien@xxxxxxx>
  • From: Jan Beulich <jbeulich@xxxxxxxx>
  • Date: Tue, 13 Dec 2022 15:08:14 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=suse.com; dmarc=pass action=none header.from=suse.com; dkim=pass header.d=suse.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=UkRol2jtFrbk4nMGRwE3Zao6cWnE8PZ56ERcc3/2JGk=; b=mdNxB6uL0Q9DBQR6PgbaLVa51AAEVqtkWRwm1XvyeJJ9jZv+wGtdaOCkdQyzu4WLO3h3HwV0VVTxQVkXAlsxEvgaL9JKQXebYKkODsVTsmXunytPV9ArMqGaWzozdYHT2Xa71Y/4qCr26M+nOXAr/ARv4zQ7NwXqeigg+A2qrH9s1JxmeGl3ybDgz94T059FH9YJDcVJVeidSTCyRoFg+4TB8weJpkB+7pG6Q4stpZxfRxLvC3pwsL7zogr7xF1RH/evhI+nsFrenwCpEjVfhwjKYibuFslTXvBFOwvImvwhYnNtsGHB2Hkz2BfCevwbp+WP0xzjIsQcBAIiCleLSQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=IoXDTplnB+a/h4Olv9XCeMZe5Ayru/arwuLxrp2KtaiNrpKmlSPEsESufcZI3RAZejWvBifAhKfzZOeXTiVl5dNZ6o7ikoicGS6eiNLhrTnrXVRmESfUYgzHfUt4ztplJe+E9P1Vu49V9AlL86LdcLxuB/ZTIbfx4Yp3sZ2PBtMPKUdF/pEog1jKt9Lu4k4FDMfXBbPLtTLoWEWjL+ygm0DmJiCk6pi4H/Rjxe2HJh309AQrx682dsRByEW1lAgOCrcUiEGfwCqin6qwpSbzSgcv94iS5h2Am1ariW5eps6zYIK8U4XaYAUtBgpYMgpZCtc0//P0lAxFZZ1pKczj+A==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=suse.com;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Volodymyr Babchuk <volodymyr_babchuk@xxxxxxxx>
  • Delivery-date: Tue, 13 Dec 2022 14:08:27 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 13.12.2022 14:48, Julien Grall wrote:
> On 13/12/2022 12:46, Jan Beulich wrote:
>> On 13.12.2022 13:06, Julien Grall wrote:
>>> On 13/12/2022 11:38, Jan Beulich wrote:
>>>> All callers convert frame numbers (perhaps in turn derived from struct
>>>> page_info pointers) to an address, just for the function to convert it
>>>> back to a frame number (as the first step of paddr_to_pdx()). Replace
>>>> the function by mfn_to_nid() plus a page_to_nid() wrapper macro. Replace
>>>> call sites by the respectively most suitable one.
>>>>
>>>> While there also introduce a !NUMA stub, eliminating the need for Arm
>>>> (and potentially other ports) to carry one individually.
>>>>
>>>> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>>>> ---
>>>> At the top of free_heap_pages() mfn_to_nid() could also be used, since
>>>> the MFN is calculated immediately ahead. The choice of using
>>>> page_to_nid() (for now at least) was with the earlier patch's RFC in
>>>> mind, addressing of which may require to make mfn_to_nid() do weaker
>>>> checking than page_to_nid().
>>>
>>> I haven't looked in details at the previous patch. However, I don't like
>>> the idea of making mfn_to_nid() do weaker checking because this could
>>> easily confuse the reader/developper.
>>>
>>> If you want to use weaker check, then it would be better if a separate
>>> helper is provided with a name reflecting its purpose.
>>
>> Well, the purpose then still is the very same conversion, so the name
>> is quite appropriate. I don't view mfn_to_nid_bug_dont_look_very_closely()
>> (exaggerating) as very sensible a name.
> 
> I understand they are both doing the same conversion. But the checks 
> will be different. With your proposal, we are now going to say if the 
> caller is "buggy" then use mfn_to_nid() if not then you can use any.
> 
> I think this is wrong to hide the "bug" just because the name is longer. 
> In fact, it means that any non-buggy caller will still have relaxed 
> check. The risk if we are going to introduce more "buggy" caller in the 
> future.

While I, too, have taken your perspective as one possible one, I've
also been considering a slightly different perspective: page_to_nid()
implies the caller to have a struct page_info *, which in turn implies
you pass in something identifying valid memory (which hence should have
a valid node ID associated with it). mfn_to_nid(), otoh, has nothing
to pre-qualify (see patch 1's RFC remark as to mfn_valid() not being
sufficient). Hence less rigid checking there can make sense (and you'll
notice that mfn_to_nid() was also used quite sparingly in the course of
the conversion.)

> So from my perspective there are only two acceptable solutions:
>    1. Provide a different helper that will be used for just "buggy" 
> caller. This will make super clear that the helper should only be used 
> in very limited circumstances.
>    2. Fix the "buggy" callers.
> 
>  From your previous e-mails, it wasn't clear whether 2) is possible. So 
> that's leave us only with 1).

The buggy callers are the ones touched by patch 1; see (again) the RFC
remark there for limitations of that approach.

>>>> --- a/xen/common/numa.c
>>>> +++ b/xen/common/numa.c
>>>> @@ -671,15 +671,15 @@ static void cf_check dump_numa(unsigned
>>>>    
>>>>        for_each_online_node ( i )
>>>>        {
>>>> -        paddr_t pa = pfn_to_paddr(node_start_pfn(i) + 1);
>>>> +        mfn_t mfn = _mfn(node_start_pfn(i) + 1);
>>>>    
>>>>            printk("NODE%u start->%lu size->%lu free->%lu\n",
>>>>                   i, node_start_pfn(i), node_spanned_pages(i),
>>>>                   avail_node_heap_pages(i));
>>>> -        /* Sanity check phys_to_nid() */
>>>> -        if ( phys_to_nid(pa) != i )
>>>> -            printk("phys_to_nid(%"PRIpaddr") -> %d should be %u\n",
>>>> -                   pa, phys_to_nid(pa), i);
>>>> +        /* Sanity check mfn_to_nid() */
>>>> +        if ( node_spanned_pages(i) && mfn_to_nid(mfn) != i )
>>>
>>>
>>>   From the commit message, I would have expected that we would only
>>> replace phys_to_nid() with either mfn_to_nid() or page_to_nid().
>>> However, here you added node_spanned_pages(). Can you explain why?
>>
>> Oh, indeed, I meant to say a word on this but then forgot. This
>> simply is because the adding of 1 to the start PFN (which by
>> itself is imo a little funny) makes it so that the printk()
>> inside the conditional would be certain to be called for an
>> empty (e.g. CPU-only) node.
> 
> Ok. I think this wants to be a separate patch as this sounds like bug 
> and we should avoid mixing code conversion with bug fix.

Yet then this is only in a debug key handler. (Else I would have made
it a separate patch, yes.)

Jan



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.