|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [PATCH] xen/arm: Skip memory nodes if not enabled
Hi Julien,
On 27/09/2023 13:01, Julien Grall wrote:
>
>
> Hi Michal,
>
> On 26/09/2023 09:36, Michal Orzel wrote:
>> On 26/09/2023 07:33, Leo Yan wrote:
>>>
>>>
>>> During the Linux kernel booting, an error is reported by the Xen
>>> hypervisor:
>>>
>>> (XEN) arch/arm/p2m.c:2202: d0v0: Failing to acquire the MFN 0x1a02dc
>>>
>>> The kernel attempts to use an invalid memory frame number, which can be
>>> converted to: 0x1a02dc << PAGE_SHIFT, resulting in 0x1_a02d_c000.
>>>
>>> The invalid memory frame falls into a reserved memory node, which is
>>> defined in the device tree as below:
>>>
>>> reserved-memory {
>>> #address-cells = <0x02>;
>>> #size-cells = <0x02>;
>>> ranges;
>>>
>>> ...
>>>
>>> ethosn_reserved {
>>> compatible = "shared-dma-pool";
>>> reg = <0x01 0xa0000000 0x00 0x20000000>;
>>> status = "disabled";
>>> no-map;
>>> };
>>>
>>> ...
>>> };
>>>
>>> Xen excludes all reserved memory regions from the frame management
>>> through the dt_unreserved_regions() function. On the other hand, the
>>> reserved memory nodes are passed to the Linux kernel. However, the Linux
>>> kernel ignores the 'ethosn_reserved' node since its status is
>>> "disabled". This leads to the Linux kernel to allocate pages from the
>>> reserved memory range.
>>>
>>> As a result, when the kernel passes the data structure residing in the
>>> frame 0x1_a02d_c000 in the Xen hypervisor, the hypervisor detects that
>>> it misses to manage the frame and reports the error.
>>>
>>> Essentially, this issue is caused by the Xen hypervisor which misses to
>>> handle the status for the memory nodes (for both the normal memory nodes
>>> and the reserved memory nodes) and transfers them to the Linux kernel.
>>>
>>> This patch introduces a function memory_node_is_available(). If it
>>> detects a memory node is not enabled, the hypervisor will not add the
>>> memory region into the memory lists. In the end, this avoids to generate
>>> the memory nodes from the memory lists sent to the kernel and the kernel
>>> cannot use the disabled memory nodes any longer.
>>
>> Interesting. So FWICS, we have 2 issues that have a common ground:
>> 1) If the reserved memory node has a status "disabled", it implies that this
>> region
>> is no longer reserved and can be used which is not handled today by Xen and
>> leads
>> to the above mentioned problem.
>>
>> 2) If the memory node has a status "disabled" it implies that it should not
>> be used
>> which is not the case in current Xen. This means that at the moment, Xen
>> would try
>> to use such a memory region which is incorrect.
>>
>> I think the commit msg should be more generic and focus on these two issues.
>> Summary:
>> Xen does not handle the status property of memory nodes and ends up using
>> them.
>> Xen does not handle the status property of reserved memory nodes. If status
>> is disabled
>> it means the reserved region shall no longer be treated as reserved. Xen
>> passes the reserved
>> memory nodes untouched to dom0 fdt and creates a memory node to cover
>> reserved regions.
>> Disabled reserved memory nodes are ignored by the guest which leaves us with
>> the exposed
>> memory nodes. Guest can access such a region but it is excluded from the
>> page management in Xen
>> which results in Xen being unable to obtain the corresponding MFN.
>>
>>>
>>> Signed-off-by: Leo Yan <leo.yan@xxxxxxxxxx>
>>> ---
>>> xen/arch/arm/bootfdt.c | 16 ++++++++++++++++
>>> 1 file changed, 16 insertions(+)
>>>
>>> diff --git a/xen/arch/arm/bootfdt.c b/xen/arch/arm/bootfdt.c
>>> index 2673ad17a1..b46dde06a9 100644
>>> --- a/xen/arch/arm/bootfdt.c
>>> +++ b/xen/arch/arm/bootfdt.c
>>> @@ -206,11 +206,27 @@ int __init device_tree_for_each_node(const void *fdt,
>>> int node,
>>> return 0;
>>> }
>>>
>>> +static bool __init memory_node_is_available(const void *fdt, unsigned long
>>> node)
>> This function is not strictly related to memory node so it would be better
>> to call it e.g. device_tree_node_is_available.
>
> +1.
>
>>> +{
>>> + const char *status = fdt_getprop(fdt, node, "status", NULL);
>>> +
>>> + if (!status)
>> white spaces between brackets and condition
>> if ( !status )
>>
>>> + return true;
>>> +
>>> + if (!strcmp(status, "ok") || !strcmp(status, "okay"))
>> white spaces between brackets and condition
>> if ( !strcmp(status, "ok") || !strcmp(status, "okay") )
>>
>>> + return true;
>>> +
>>> + return false;
>>> +}
>>> +
>>> static int __init process_memory_node(const void *fdt, int node,
>>> const char *name, int depth,
>>> u32 address_cells, u32 size_cells,
>>> void *data)
>>> {
>>> + if (!memory_node_is_available(fdt, node))
>>> + return 0;
>> I would move this check to device_tree_get_meminfo()
>
> I am ok with that. But the commit message would need to gain a paragraph
> explaining that we will now support "status" for static memory/heap.
>
>>> +
>>> return device_tree_get_meminfo(fdt, node, "reg", address_cells,
>>> size_cells,
>>> data, MEMBANK_DEFAULT);
>>> }
>>> --
>>> 2.39.2
>>>
>>>
>>
>> Also, I think it would be nice to add ASSERT(bootinfo.mem.nr_banks); e.g. in
>> boot_fdt_info().
>> Otherwise the panic from Xen when there is no memory bank:
>> The frametable cannot cover the physical region ...
>> is not really meaningful for normal users.
>>
>> This is just my suggestion (@Julien ?)
>
> I think a check for the number of banks makes sense. But I would prefer
> if the check also happens in production. So, something like:
>
> if ( !bootinfo.mem.nr_banks )
> panic(...);
>
> We already have one in the setup_mm() for arm32. So we need another one
> for the arm64 version. The other solution is to consolidate it in one
> place you suggested.
>
> I have a slightly preference for having it in setup_mm() even if this is
> duplicated.
Either way is fine. The advantage of placing the check in boot_fdt_info() is
that we can have a single check instead of duplicated and we do the check before
the "first" use which happens to be the banks sorting. The advantage of
setup_mm()
is that it is the one dealing with memory banks and is called after
early_print_info()
so user can see some additional info.
@Leo, will you send a patch for that? Don't feel obliged as it is not strictly
related
to your patch in which case I can handle it.
~Michal
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |