Re: RFC: arm64: Handling reserved memory nodes
On 20/09/2023 11:31, Michal Orzel wrote:

Hello,

Hi Michal,

On 20/09/2023 12:03, Leo Yan wrote:

Hi Julien,

On Mon, Sep 18, 2023 at 08:26:21PM +0100, Julien Grall wrote:

[...]

... from my understanding reserved-memory regions are just normal memory that is set aside for a specific purpose. So Xen has to create a 'memory' node *and* a 'reserved-memory' region.

To be clear, Xen passes the 'reserved-memory' regions as normal memory nodes, see [1].

With that the kernel is supposed to exclude all the 'reserved-memory' from normal usage unless the node contains the property 'reusable'. This was clearer before the binding was converted to YAML in [1].

The Linux kernel reserves pages for memory ranges in the 'reserved-memory' node, whether or not the 'no-map' property is set for a range (see memmap_init_reserved_pages() -> __SetPageReserved() in the Linux kernel). If a reserved memory range has the 'no-map' property, the region will not be mapped in the kernel's linear mapping. This avoids data corruption caused by speculative fetches through a cacheable mapping while the same region is used by devices (e.g. for DMA transfers).

[...]

Here the problem is that these reserved memory regions are passed as normal memory nodes to the Dom0 kernel, and the Dom0 kernel then allocates pages from them. This might lead to conflicts, e.g. the reserved memory is used by the Dom0 kernel while it is also used for another purpose (e.g. by an MCU in the system).

See above. I think this is correct to pass both 'memory' and 'reserved-memory'.
Now, it is possible that Xen may not create the device-tree correctly.

Agreed that Xen currently creates the DT binding for the 'reserved-memory' node wrongly; more specifically, the reserved memory nodes are wrongly passed as normal memory nodes (again, see [1]).

I would suggest looking at how Linux is populating the memory and whether it actually skipped the regions.

The Linux kernel reserves the corresponding pages for all reserved memory regions, which means the kernel page management (buddy algorithm) doesn't allocate these pages at all. With the 'no-map' property, the memory range will not be mapped into the kernel's linear mapping.

Here I am a bit confused by "Xen doesn't have the capability to know the memory attribute". I looked into the file arch/arm/guest_walk.c; IIUC, it walks the stage-1 page tables for the virtual machine and gets the permissions for the mapping, so we can also learn the mapping attribute, right?

Most of the time, Xen will use the HW to translate the guest virtual address to an intermediate physical address. Looking at the specification, it looks like PAR_EL1 will contain the memory attribute, which I didn't know. We would then need to read MAIR_EL1 to find the attribute, and also the memory attribute in stage-2, to figure out the final memory attribute.

This is feasible, but the Xen ABI mandates that regions passed to Xen have specific memory attributes (see the comment at the top of xen/include/public/arch-arm.h).

If you refer to the comment "All memory which is shared with other entities in the system ... which is mapped as Normal Inner Write-Back Outer Write-Back Inner-Shareable", I don't think it's relevant to the current issue. I will explain in detail below.

Anyway, in your case, the buffer Linux is using is on the stack. So the region must have been mapped with the proper attribute.

I think you may misunderstand the issue.
I would like to divide the issue into two parts:

- The first question is how to pass the reserved memory node from the Xen hypervisor to the Dom0 Linux kernel. Currently, the Xen hypervisor converts the reserved memory ranges and adds them into the normal memory node. The Xen hypervisor should keep the reserved memory node and pass it to the Dom0 Linux kernel. With this change, the Dom0 kernel will only allocate pages from the normal memory node, and the data in these pages can be shared by the Xen hypervisor and the Dom0 Linux kernel.

- The second question is about the memory attribute for the reserved memory node. Note, the reserved memory ranges are not necessarily _shared_ between the Xen hypervisor and the Dom0 Linux kernel. I think in most cases the reserved memory will be ioremapped by drivers (for stage-1), and the Xen hypervisor should map the P2M with the attribute p2m_mmio_direct_c; or we can explore a bit based on different properties, e.g. for a 'no-map' memory range we map the P2M with p2m_mmio_direct_c, and for a 'reusable' memory range we map with the attribute 'p2m_ram_rw'.

To simplify the discussion, I think we can first finalize the fix for the first question and hold off on the second. After we fix the first one, we can come back to think about the second issue.

Another question is about the attribute for MMIO regions. For mapping MMIO regions, prepare_dtb_hwdom() sets the attribute 'p2m_mmio_direct_c' for stage 2, but in the Linux kernel the MMIO attribute can be one of the variants below:

- ioremap(): device type with nGnRE;
- ioremap_np(): device type with nGnRnE (strongly ordered);
- ioremap_wc(): normal non-cacheable.

The stage-2 memory attribute is used to restrict the final memory attribute. In this case, p2m_mmio_direct_c allows the domain to set pretty much any memory attribute.

Thanks for confirmation.
If so, I think the Xen hypervisor should follow the same approach and map the reserved regions with the attribute p2m_mmio_direct_c.

If the Xen hypervisor can handle these MMIO types in stage 2, then we should be able to map the stage-2 tables for the reserved memory in the same way. A difference for the reserved memory is that it can be mapped as normal cacheable memory.

I am a bit confused. I read this as you think the region is not mapped in the P2M (aka stage-2 page-tables for Arm). But from the logs you provided, the regions are already mapped (you have an MFN in hand).

You are right. The reserved memory regions have been mapped in P2M.

So to me the error is most likely in how we create the Device-Tree.

Yeah, let's first focus on the DT binding for reserved memory nodes.

The DT binding is something like (I tweaked it a bit for readability):

Just to confirm this is the host device tree, right? If so...

Yes.

    memory@20000000 {
        #address-cells = <0x02>;
        #size-cells = <0x02>;
        device_type = "memory";
        reg = <0x00 0x20000000 0x00 0xa0000000>,
              <0x01 0xa0000000 0x01 0x60000000>;
    };

... you can see the reserved-regions are described in the normal memory. In fact...

    reserved-memory {
        #address-cells = <0x02>;
        #size-cells = <0x02>;
        ranges;

        reserved_mem1 {
            reg = <0x00 0x20000000 0x00 0x00010000>;
            no-map;
        };

        reserved_mem2 {
            reg = <0x00 0x40000000 0x00 0x20000000>;
            no-map;
        };

        reserved_mem3 {
            reg = <0x01 0xa0000000 0x00 0x20000000>;
            no-map;
        };

... no-map should tell the kernel to not use the memory at all. So I am a bit puzzled why it is trying to use it.

No, 'no-map' doesn't mean the Linux kernel doesn't use it. I quote from the kernel documentation Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml: 'no-map' means the kernel "must not create a virtual mapping of the region".
The reserved memory regions are still "under the control of the device driver using the region".

I would suggest checking whether Linux somehow doesn't understand the reserved-memory nodes we wrote.

Could you confirm that Xen does write reserved memory nodes? Or does Xen convert the reserved memory nodes to normal memory nodes as I describe above :)

Xen passes the /reserved-memory node unchanged from the host device tree to the dom0 fdt. Apart from that, it creates an additional memory node covering the reserved ranges. Take a look at this example run (based on qemu):

Thanks for providing an example! This is quite handy.

Host dt:

    memory@40000000 {
        reg = <0x00 0x40000000 0x01 0x00>;
        device_type = "memory";
    };

    reserved-memory {
        #size-cells = <0x02>;
        #address-cells = <0x02>;
        ranges;

        test@50000000 {
            reg = <0x00 0x50000000 0x00 0x10000000>;
            no-map;
        };
    };

Xen:

    (XEN) MODULE[0]: 000000004ac00000 - 000000004ad65000 Xen
    (XEN) MODULE[1]: 000000004ae00000 - 000000004ae03000 Device Tree
    (XEN) MODULE[2]: 0000000042c00000 - 000000004aa8ea8b Ramdisk
    (XEN) MODULE[3]: 0000000040400000 - 0000000042b30000 Kernel
    (XEN) RESVD[0]: 0000000050000000 - 000000005fffffff
    ...
    (XEN) BANK[0] 0x000000c0000000-0x00000100000000 (1024MB)

Linux dom0:

    [ 0.000000] OF: reserved mem: 0x0000000050000000..0x000000005fffffff (262144 KiB) nomap non-reusable test@50000000

So Linux should tell whether a region has been reserved.

@Leo, can you share with us the serial console? Can you confirm the version of Xen you are using?

Cheers,

--
Julien Grall
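To make the address arithmetic in the quoted device trees easier to follow, here is a minimal Python sketch that decodes the 'reg' cells (two address cells and two size cells, as in the nodes above) and subtracts the reserved ranges from the memory banks. It is only an illustration of what "exclude the reserved-memory from normal usage" means for these particular numbers; it is not Xen or Linux code, and the helper names decode_reg() and subtract() are made up for this example.

```python
# Illustrative sketch only: not taken from Xen or Linux.
# decode_reg() and subtract() are hypothetical helper names.
from typing import List, Tuple

Range = Tuple[int, int]  # (start, end), end is exclusive


def decode_reg(cells: List[int], addr_cells: int = 2, size_cells: int = 2) -> List[Range]:
    """Decode a flat list of 32-bit 'reg' cells into (start, end) ranges."""
    stride = addr_cells + size_cells
    if len(cells) % stride:
        raise ValueError("reg length is not a multiple of address + size cells")
    ranges = []
    for i in range(0, len(cells), stride):
        addr = 0
        for cell in cells[i:i + addr_cells]:
            addr = (addr << 32) | cell
        size = 0
        for cell in cells[i + addr_cells:i + stride]:
            size = (size << 32) | cell
        ranges.append((addr, addr + size))
    return ranges


def subtract(banks: List[Range], reserved: List[Range]) -> List[Range]:
    """Return the parts of 'banks' that are not covered by 'reserved'."""
    free = list(banks)
    for r_start, r_end in reserved:
        remaining = []
        for start, end in free:
            if r_end <= start or r_start >= end:
                remaining.append((start, end))  # no overlap, keep as is
                continue
            if start < r_start:
                remaining.append((start, r_start))  # part below the hole
            if r_end < end:
                remaining.append((r_end, end))  # part above the hole
        free = remaining
    return sorted(free)


# Values copied from the host device tree quoted in this thread.
MEMORY_REG = [0x00, 0x20000000, 0x00, 0xA0000000,
              0x01, 0xA0000000, 0x01, 0x60000000]
RESERVED_REGS = [
    [0x00, 0x20000000, 0x00, 0x00010000],  # reserved_mem1
    [0x00, 0x40000000, 0x00, 0x20000000],  # reserved_mem2
    [0x01, 0xA0000000, 0x00, 0x20000000],  # reserved_mem3
]

banks = decode_reg(MEMORY_REG)
reserved = [r for reg in RESERVED_REGS for r in decode_reg(reg)]
usable = subtract(banks, reserved)

for start, end in usable:
    print(f"usable: {start:#x} - {end - 1:#x}")
```

All three reserved ranges fall inside the two memory banks, which matches the point made above that the reserved regions are also described in the normal 'memory' node. The ranges left in 'usable' are the ones a kernel could hand to its page allocator if it honours the /reserved-memory node.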