Re: RFC: arm64: Handling reserved memory nodes
Hello,
On 20/09/2023 12:03, Leo Yan wrote:
>
>
> Hi Julien,
>
> On Mon, Sep 18, 2023 at 08:26:21PM +0100, Julien Grall wrote:
>
> [...]
>
>> ... from my understanding, reserved-memory is just normal memory that is
>> set aside for a specific purpose. So Xen has to create a 'memory' node *and*
>> a 'reserved-memory' region.
>
> To be clear, Xen passes the 'reserved-memory' regions as normal memory
> nodes, see [1].
>
>> With that, the kernel is supposed to exclude all the 'reserved-memory' from
>> normal usage unless the node contains the property 'reusable'.
>> This was clearer before the binding was converted to YAML in [1].
>
> The Linux kernel reserves pages for the memory ranges in the 'reserved-memory'
> node, no matter whether the 'no-map' property is set for a range or not (see
> the function memmap_init_reserved_pages() -> __SetPageReserved() in the Linux
> kernel).
>
> If a reserved memory range is set with the 'no-map' property, the memory
> region will not be mapped in the kernel's linear (direct) mapping. This
> avoids data corruption caused by speculative fetches through a cacheable
> mapping while the same memory region is in use by devices
> (e.g. for DMA transfers).
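
To illustrate the behaviour described above, here is a minimal, self-contained
toy model (not the actual kernel code; every name below is invented for the
example): each reserved range has its pages marked as reserved so the buddy
allocator never hands them out, and only ranges without 'no-map' end up in the
linear mapping.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12UL

/* Toy stand-ins for struct page / the reserved-memory entries. */
struct toy_page { bool reserved; };

struct reserved_range {
    uint64_t base;
    uint64_t size;
    bool no_map;   /* 'no-map' property present in the DT node */
};

static struct toy_page mem_map[1 << 20];   /* one entry per 4K page */

/* Mark every page of a reserved range as reserved, regardless of
 * 'no-map' (this mirrors what memmap_init_reserved_pages() achieves). */
static void reserve_region_pages(const struct reserved_range *r)
{
    for (uint64_t pfn = r->base >> PAGE_SHIFT;
         pfn < (r->base + r->size) >> PAGE_SHIFT; pfn++)
        mem_map[pfn].reserved = true;
}

/* Toy linear-map setup: 'no-map' ranges are skipped entirely, so no
 * cacheable alias exists for memory owned by another agent. */
static void map_linear(const struct reserved_range *r)
{
    if (r->no_map) {
        printf("skip linear map for %#llx-%#llx\n",
               (unsigned long long)r->base,
               (unsigned long long)(r->base + r->size - 1));
        return;
    }
    printf("linear map %#llx-%#llx as Normal WB\n",
           (unsigned long long)r->base,
           (unsigned long long)(r->base + r->size - 1));
}

int main(void)
{
    struct reserved_range r = { 0x50000000, 0x10000000, true };

    reserve_region_pages(&r);   /* excluded from the buddy allocator */
    map_linear(&r);             /* and, because of no-map, never mapped */
    return 0;
}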
>
> [...]
>
>>> Here the problem is that these reserved memory regions are passed as normal
>>> memory nodes to the Dom0 kernel, so the Dom0 kernel allocates pages from
>>> these reserved memory regions. Apparently, this might lead to conflicts,
>>> e.g. the reserved memory is used by the Dom0 kernel while at the same time
>>> the memory is used for another purpose (e.g. by an MCU in the system).
>>
>> See above. I think it is correct to pass both 'memory' and
>> 'reserved-memory'. Now, it is possible that Xen may not create the
>> device-tree correctly.
>
> Agreed that Xen currently creates the DT binding for the 'reserved-memory'
> node incorrectly; more specifically, the reserved memory nodes are wrongly
> passed as normal memory nodes (again, see [1]).
>
>> I would suggest looking at how Linux populates the memory and whether it
>> actually skips the regions.
>
> The Linux kernel reserves the corresponding pages for all reserved
> memory regions, which means the kernel page management (buddy
> algorithm) doesn't allocate these pages at all.
>
> With the 'no-map' property, the memory range will not be mapped into the
> kernel's linear (direct) mapping.
>
>>> Here I am a bit confused by "Xen doesn't have the capability to know
>>> the memory attribute". I looked into the file arch/arm/guest_walk.c;
>>> IIUC, it walks through the stage-1 page tables of the virtual
>>> machine and gets the permissions for the mapping, so we can also get to
>>> know the mapping attribute, right?
>>
>> Most of the time, Xen will use the HW to translate the guest virtual address
>> to an intermediate physical address. Looking at the specification, it
>> looks like PAR_EL1 will contain the memory attribute, which I didn't
>> know.
>>
>> We would then need to read MAIR_EL1 to find the attribute, and also the
>> stage-2 memory attribute, to figure out the final memory attribute.
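
For illustration only, something like the following (untested) sketch shows the
mechanism being discussed: an AT S1E1R translation reports the stage-1 memory
attribute in PAR_EL1.ATTR (bits [63:56], MAIR encoding) when the translation
succeeds. It is aarch64-only, assumes a context where the AT instruction and
these registers are accessible (from EL2 the S1E1R/S12E1R variants would apply),
and is not Xen's actual code.

#include <stdint.h>

/* Translate 'va' through the stage-1 tables and return PAR_EL1.
 * When PAR_EL1.F (bit 0) is 0, PAR_EL1[63:56] holds the stage-1
 * memory attributes in MAIR encoding. */
static inline uint64_t stage1_translate(uint64_t va)
{
    uint64_t par;

    asm volatile("at s1e1r, %0" : : "r" (va));
    asm volatile("isb");
    asm volatile("mrs %0, par_el1" : "=r" (par));

    return par;
}

static inline uint64_t read_mair_el1(void)
{
    uint64_t mair;

    asm volatile("mrs %0, mair_el1" : "=r" (mair));
    return mair;
}

/* Example check: is the stage-1 attribute of 'va' Normal
 * Inner/Outer Write-Back (0xff in MAIR encoding)? */
static inline int is_normal_wb(uint64_t va)
{
    uint64_t par = stage1_translate(va);

    if (par & 1)            /* PAR_EL1.F set: translation aborted */
        return 0;

    return ((par >> 56) & 0xff) == 0xff;
}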
>
>> This is feasible, but the Xen ABI mandates that regions passed to Xen have
>> specific memory attributes (see the comment at the top of
>> xen/include/public/arch-arm.h).
>
> If you refer to the comment "All memory which is shared with other
> entities in the system ... which is mapped as Normal Inner Write-Back
> Outer Write-Back Inner-Shareable", I don't think it's relevant to the
> current issue. I will explain in detail below.
>
>> Anyway, in your case, the buffer Linux is using is on the stack. So the
>> region must have been mapped with the proper attributes.
>
> I think you may have misunderstood the issue. I would like to divide the
> issue into two parts:
>
> - The first question is about how to pass the reserved memory node from the
>   Xen hypervisor to the Dom0 Linux kernel. Currently, the Xen hypervisor
>   converts the reserved memory ranges and adds them into the normal memory
>   node.
>
>   The Xen hypervisor should keep the reserved memory node and pass it to
>   the Dom0 Linux kernel. With this change, the Dom0 kernel will only
>   allocate pages from the normal memory node, and the data in these pages
>   can be shared by the Xen hypervisor and the Dom0 Linux kernel.
>
> - The second question is about the memory attributes for the reserved memory
>   node. Note, the reserved memory ranges are not necessarily _shared_
>   between the Xen hypervisor and the Dom0 Linux kernel. I think in most
>   cases, the reserved memory will be ioremapped by drivers (for stage-1),
>   and the Xen hypervisor should map the P2M with the attribute
>   p2m_mmio_direct_c; or we can go a bit further based on the different
>   properties, e.g. for a 'no-map' memory range, we map the P2M with
>   p2m_mmio_direct_c; for a 'reusable' memory range, we map with the
>   attribute 'p2m_ram_rw'.
>
> To simplify the discussion, I think we can first finalize the fix for the
> first question and put the second question on hold. After we fix the first
> one, we can come back to the second issue.
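
For illustration only, the attribute policy proposed in the second point could
be expressed roughly as below. This is a sketch, not Xen code: the enum is a
stand-in for Xen's p2m_type_t values and the helper name is invented.

#include <stdbool.h>

/* Stand-ins for the relevant Xen p2m types. */
typedef enum {
    P2M_MMIO_DIRECT_C,   /* direct MMIO mapping, cacheable allowed */
    P2M_RAM_RW,          /* normal read-write RAM */
} toy_p2m_type_t;

/*
 * Proposed policy for hardware-domain reserved-memory ranges:
 *  - 'reusable' -> p2m_ram_rw: the range may be used as normal RAM
 *                  until the owning driver claims it back;
 *  - 'no-map'   -> p2m_mmio_direct_c: the range stays under the
 *                  control of whatever driver/agent owns it.
 */
static toy_p2m_type_t p2m_type_for_reserved(bool no_map, bool reusable)
{
    if (reusable)
        return P2M_RAM_RW;
    if (no_map)
        return P2M_MMIO_DIRECT_C;
    /* Neither property: default to the MMIO-direct mapping as well. */
    return P2M_MMIO_DIRECT_C;
}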
>
>>> Another question is about the attributes for MMIO regions. For mapping MMIO
>>> regions, prepare_dtb_hwdom() sets the attribute 'p2m_mmio_direct_c'
>>> for stage 2, but in the Linux kernel an MMIO mapping's attribute can
>>> be one of the variants below:
>>>
>>> - ioremap(): device type with nGnRE;
>>> - ioremap_np(): device type with nGnRnE (strongly-ordered);
>>> - ioremap_wc(): normal non-cacheable.
>>
>> The stage-2 memory attribute is used to restrict the final memory attribute.
>> In this case, p2m_mmio_direct_c allows the domain to set pretty much any
>> memory attribute.
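
To illustrate the restriction being described: the architecture combines the
stage-1 and stage-2 attributes by taking the more restrictive of the two, so a
permissive stage-2 attribute lets the guest's stage-1 choice take effect. The
sketch below is a heavily simplified toy model (it only distinguishes Device,
Normal Non-cacheable, Write-Through and Write-Back, and ignores FWB and the
Device sub-types), not the ARM ARM rules verbatim.

/* Simplified memory types, ordered from most to least restrictive. */
enum toy_attr {
    ATTR_DEVICE = 0,
    ATTR_NORMAL_NC,
    ATTR_NORMAL_WT,
    ATTR_NORMAL_WB,
};

/* Combined stage-1/stage-2 attribute: the more restrictive one wins.
 * If the stage-2 side is permissive (Normal WB), the result is simply
 * whatever the guest chose at stage 1: Device stays Device, Normal
 * Non-cacheable stays Non-cacheable, and so on. */
static enum toy_attr combine_attrs(enum toy_attr s1, enum toy_attr s2)
{
    return s1 < s2 ? s1 : s2;
}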
>
> Thanks for the confirmation. If so, I think the Xen hypervisor should
> follow the same approach and map the reserved regions with the attribute
> p2m_mmio_direct_c.
>
>>> If the Xen hypervisor can handle these MMIO types in stage 2, then we
>>> should be able to use the same way to map the stage-2 tables for the
>>> reserved memory. A difference for the reserved memory is that it can be
>>> mapped as cacheable normal memory.
>>
>> I am a bit confused. I read this as you think the region is not mapped in
>> the P2M (aka stage-2 page-tables for Arm). But from the logs you provided,
>> the regions are already mapped (you have an MFN in hand).
>
> You are right. The reserved memory regions have been mapped in P2M.
>
>> So to me the error is most likely in how we create the Device-Tree.
>
> Yeah, let's first focus on the DT binding for the reserved memory nodes.
>
>>> The DT binding is something like this (I tweaked it a bit for readability):
>>
>> Just to confirm, this is the host device tree, right? If so...
>
> Yes.
>
>>> memory@20000000 {
>>>     #address-cells = <0x02>;
>>>     #size-cells = <0x02>;
>>>     device_type = "memory";
>>>     reg = <0x00 0x20000000 0x00 0xa0000000>,
>>>           <0x01 0xa0000000 0x01 0x60000000>;
>>> };
>>
>> ... you can see the reserved regions are described within the normal memory. In
>> fact...
>>
>>>
>>>
>>> reserved-memory {
>>>     #address-cells = <0x02>;
>>>     #size-cells = <0x02>;
>>>     ranges;
>>>
>>>     reserved_mem1 {
>>>         reg = <0x00 0x20000000 0x00 0x00010000>;
>>>         no-map;
>>>     };
>>>
>>>     reserved_mem2 {
>>>         reg = <0x00 0x40000000 0x00 0x20000000>;
>>>         no-map;
>>>     };
>>>
>>>     reserved_mem3 {
>>>         reg = <0x01 0xa0000000 0x00 0x20000000>;
>>>         no-map;
>>>     };
>>
>> ... no-map should tell the kernel to not use the memory at all. So I am a
>> bit puzzled why it is trying to use it.
>
> No, 'no-map' doesn't mean the Linux kernel doesn't use it. Quoting from
> the kernel documentation
> Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml:
> 'no-map' means the kernel "must not create a virtual mapping of the
> region". The reserved memory regions are still "under the control of the
> device driver using the region".
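
As an illustration of that last point, a typical way a Linux driver takes
ownership of such a 'no-map' region is to look it up through a 'memory-region'
phandle and create its own mapping explicitly, along the lines of the untested
sketch below (the compatible string, property name and WC mapping choice are
made up for the example; registration details vary across kernel versions).

#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/of.h>
#include <linux/of_reserved_mem.h>
#include <linux/io.h>

static int demo_probe(struct platform_device *pdev)
{
    struct device_node *np;
    struct reserved_mem *rmem;
    void *va;

    /* The driver's DT node points at the reserved-memory child via
     * a 'memory-region' phandle. */
    np = of_parse_phandle(pdev->dev.of_node, "memory-region", 0);
    if (!np)
        return -ENODEV;

    rmem = of_reserved_mem_lookup(np);
    of_node_put(np);
    if (!rmem)
        return -ENODEV;

    /* The kernel never created a linear mapping ('no-map'), so the
     * driver maps the region itself with the attributes it needs. */
    va = memremap(rmem->base, rmem->size, MEMREMAP_WC);
    if (!va)
        return -ENOMEM;

    dev_info(&pdev->dev, "mapped reserved region %pa (+%pa)\n",
             &rmem->base, &rmem->size);
    platform_set_drvdata(pdev, va);
    return 0;
}

static const struct of_device_id demo_of_match[] = {
    { .compatible = "demo,reserved-user" },   /* made-up compatible */
    { }
};
MODULE_DEVICE_TABLE(of, demo_of_match);

/* Unbind/teardown (memunmap) omitted to keep the sketch short. */
static struct platform_driver demo_driver = {
    .probe  = demo_probe,
    .driver = {
        .name           = "demo-reserved-user",
        .of_match_table = demo_of_match,
    },
};
module_platform_driver(demo_driver);
MODULE_LICENSE("GPL");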
>
>> I would suggest checking whether Linux somehow doesn't understand the
>> reserved-memory nodes we wrote.
>
> Could you confirm that Xen does write reserved memory nodes? Or does Xen
> convert the reserved memory nodes to normal memory nodes, as I
> described above? :)
Xen passes the /reserved-memory node unchanged from the host device tree to the
dom0 FDT. Apart from that, it creates an additional memory node covering the
reserved ranges (see the sketch after the dtc output below).
Take a look at this example run (based on QEMU):
Host dt:
memory@40000000 {
    reg = <0x00 0x40000000 0x01 0x00>;
    device_type = "memory";
};

reserved-memory {
    #size-cells = <0x02>;
    #address-cells = <0x02>;
    ranges;

    test@50000000 {
        reg = <0x00 0x50000000 0x00 0x10000000>;
        no-map;
    };
};
Xen:
(XEN) MODULE[0]: 000000004ac00000 - 000000004ad65000 Xen
(XEN) MODULE[1]: 000000004ae00000 - 000000004ae03000 Device Tree
(XEN) MODULE[2]: 0000000042c00000 - 000000004aa8ea8b Ramdisk
(XEN) MODULE[3]: 0000000040400000 - 0000000042b30000 Kernel
(XEN) RESVD[0]: 0000000050000000 - 000000005fffffff
...
(XEN) BANK[0] 0x000000c0000000-0x00000100000000 (1024MB)
Linux dom0:
[ 0.000000] OF: reserved mem: 0x0000000050000000..0x000000005fffffff (262144 KiB) nomap non-reusable test@50000000
cat /proc/iomem:
50000000-5fffffff : reserved
c0000000-ffffffff : System RAM
dtc from Linux dom0:
memory@c0000000 {
    device_type = "memory";
    reg = <0x00 0xc0000000 0x00 0x40000000>;
};

memory@50000000 {
    device_type = "memory";
    reg = <0x00 0x50000000 0x00 0x10000000>;
};

reserved-memory {
    #address-cells = <0x02>;
    #size-cells = <0x02>;
    ranges;

    test@50000000 {
        reg = <0x00 0x50000000 0x00 0x10000000>;
        no-map;
    };
};
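
For illustration, here is a minimal libfdt sketch of emitting a memory node
that covers a reserved range with 2 address and 2 size cells, which is the
shape of the memory@50000000 node above. This is not Xen's actual domain-build
code; the buffer size and values are arbitrary and only the node layout is the
point.

#include <stdio.h>
#include <stdlib.h>
#include <libfdt.h>

static int add_memory_node(void *fdt, uint64_t base, uint64_t size)
{
    char name[32];
    fdt64_t reg[2] = { cpu_to_fdt64(base), cpu_to_fdt64(size) };
    int node;

    snprintf(name, sizeof(name), "memory@%llx", (unsigned long long)base);

    node = fdt_add_subnode(fdt, 0, name);
    if (node < 0)
        return node;

    /* device_type = "memory"; reg = <base size> with 2+2 cells. */
    fdt_setprop_string(fdt, node, "device_type", "memory");
    return fdt_setprop(fdt, node, "reg", reg, sizeof(reg));
}

int main(void)
{
    void *fdt = malloc(4096);

    if (!fdt || fdt_create_empty_tree(fdt, 4096))
        return 1;

    /* Root cell sizes matching the example dtb. */
    fdt_setprop_cell(fdt, 0, "#address-cells", 2);
    fdt_setprop_cell(fdt, 0, "#size-cells", 2);

    /* The reserved range from the example: 0x50000000, 256 MiB. */
    if (add_memory_node(fdt, 0x50000000ULL, 0x10000000ULL))
        return 1;

    printf("fdt totalsize: %u\n", fdt_totalsize(fdt));
    free(fdt);
    return 0;
}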
~Michal