[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH V2 2/3] xen/arm: Add handling of extended regions for Dom0




On 21.09.21 02:21, Stefano Stabellini wrote:

Hi Stefano

On Sun, 19 Sep 2021, Oleksandr wrote:
On 18/09/2021 03:37, Stefano Stabellini wrote:
On Fri, 17 Sep 2021, Stefano Stabellini wrote:
On Fri, 17 Sep 2021, Oleksandr wrote:
+
+    dt_dprintk("Find unallocated memory for extended regions\n");
+
+    unalloc_mem = rangeset_new(NULL, NULL, 0);
+    if ( !unalloc_mem )
+        return -ENOMEM;
+
+    /* Start with all available RAM */
+    for ( i = 0; i < bootinfo.mem.nr_banks; i++ )
+    {
+        start = bootinfo.mem.bank[i].start;
+        end = bootinfo.mem.bank[i].start +
bootinfo.mem.bank[i].size - 1;
+        res = rangeset_add_range(unalloc_mem, start, end);
+        if ( res )
+        {
+            printk(XENLOG_ERR "Failed to add:
%#"PRIx64"->%#"PRIx64"\n",
+                   start, end);
+            goto out;
+        }
+    }
+
+    /* Remove RAM assigned to Dom0 */
+    for ( i = 0; i < assign_mem->nr_banks; i++ )
+    {
+        start = assign_mem->bank[i].start;
+        end = assign_mem->bank[i].start +
assign_mem->bank[i].size - 1;
+        res = rangeset_remove_range(unalloc_mem, start, end);
+        if ( res )
+        {
+            printk(XENLOG_ERR "Failed to remove:
%#"PRIx64"->%#"PRIx64"\n",
+                   start, end);
+            goto out;
+        }
+    }
+
+    /* Remove reserved-memory regions */
+    for ( i = 0; i < bootinfo.reserved_mem.nr_banks; i++ )
+    {
+        start = bootinfo.reserved_mem.bank[i].start;
+        end = bootinfo.reserved_mem.bank[i].start +
+            bootinfo.reserved_mem.bank[i].size - 1;
+        res = rangeset_remove_range(unalloc_mem, start, end);
+        if ( res )
+        {
+            printk(XENLOG_ERR "Failed to remove:
%#"PRIx64"->%#"PRIx64"\n",
+                   start, end);
+            goto out;
+        }
+    }
+
+    /* Remove grant table region */
+    start = kinfo->gnttab_start;
+    end = kinfo->gnttab_start + kinfo->gnttab_size - 1;
+    res = rangeset_remove_range(unalloc_mem, start, end);
+    if ( res )
+    {
+        printk(XENLOG_ERR "Failed to remove:
%#"PRIx64"->%#"PRIx64"\n",
+               start, end);
+        goto out;
+    }
+
+    start = EXT_REGION_START;
+    end = min((1ULL << p2m_ipa_bits) - 1, EXT_REGION_END);
+    res = rangeset_report_ranges(unalloc_mem, start, end,
+                                 add_ext_regions, ext_regions);
+    if ( res )
+        ext_regions->nr_banks = 0;
+    else if ( !ext_regions->nr_banks )
+        res = -ENOENT;
+
+out:
+    rangeset_destroy(unalloc_mem);
+
+    return res;
+}
+
+static int __init find_memory_holes(const struct kernel_info
*kinfo,
+                                    struct meminfo *ext_regions)
+{
+    struct dt_device_node *np;
+    struct rangeset *mem_holes;
+    paddr_t start, end;
+    unsigned int i;
+    int res;
+
+    dt_dprintk("Find memory holes for extended regions\n");
+
+    mem_holes = rangeset_new(NULL, NULL, 0);
+    if ( !mem_holes )
+        return -ENOMEM;
+
+    /* Start with maximum possible addressable physical memory
range */
+    start = EXT_REGION_START;
+    end = min((1ULL << p2m_ipa_bits) - 1, EXT_REGION_END);
+    res = rangeset_add_range(mem_holes, start, end);
+    if ( res )
+    {
+        printk(XENLOG_ERR "Failed to add:
%#"PRIx64"->%#"PRIx64"\n",
+               start, end);
+        goto out;
+    }
+
+    /* Remove all regions described by "reg" property (MMIO, RAM,
etc) */
Well... The loop below is not going to handle all the regions
described in
the property "reg". Instead, it will cover a subset of "reg" where
the
memory is addressable.
As I understand, we are only interested in subset of "reg" where the
memory is
addressable.



You will also need to cover "ranges" that will describe the BARs for
the PCI
devices.
Good point.
Yes, very good point!


Could you please clarify how to recognize whether it is a PCI
device as long as PCI support is not merged? Or just to find any
device nodes
with non-empty "ranges" property
and retrieve addresses?
Normally any bus can have a ranges property with the aperture and
possible address translations, including /amba (compatible =
"simple-bus"). However, in these cases dt_device_get_address already
takes care of it, see xen/common/device_tree.c:dt_device_get_address.

The PCI bus is special for 2 reasons:
- the ranges property has a different format
- the bus is hot-pluggable

So I think the only one that we need to treat specially is PCI.

As far as I am aware PCI is the only bus (or maybe just the only bus
that we support?) where ranges means the aperture.
Now that I think about this, there is another "hotpluggable" scenario we
need to think about:

[1] https://marc.info/?l=xen-devel&m=163056546214978

Xilinx devices have FPGA regions with apertures currently not described
in device tree, where things can programmed in PL at runtime making new
devices appear with new MMIO regions out of thin air.

Now let me start by saying that yes, the entire programmable region
aperture could probably be described in device tree, however, in
reality it is not currently done in any of the device trees we use
(including the upstream device trees in linux.git).
This is rather annoying, but not unheard. There are a couple of platforms
where the MMIOs are not fully described in the DT.

In fact, we have a callback 'specific_mappings' which create additional
mappings (e.g. on the omap5) for dom0.

So, we have a problem :-(


I can work toward getting the right info on device tree, but in reality
that is going to take time and for now the device tree doesn't have the
FPGA aperture in it. So if we accept this series as is, it is going to
stop features like [1] from working. >
If we cannot come up with any better plans, I think it would be better
to drop find_memory_holes, only rely on find_unallocated_memory even
when the IOMMU is on. One idea is that we could add on top of the
regions found by find_unallocated_memory any MMIO regions marked as
xen,passthrough: they are safe because they are not going to dom0 anyway.
(Oleksandr, it looks like some rationale about the different approach is
missing in the commit message. Can you add it?)
Yes sure, but let me please clarify what is different approach in this
context. Is it to *also* take into the account MMIO regions of the devices for
passthrough for case when IOMMU is off (in addition to unallocated memory)? If
yes, I wonder whether we will gain much with that according to that device's
MMIO regions are usually not big enough and we stick to allocate extended
regions with bigger size (> 64MB).
That's fair enough. There are a couple of counter examples where the
MMIO regions for the device to assign are quite large, for instance a
GPU, Xilinx AIEngine, or the PCIe Root Complex with the entire aperture,
but maybe they are not that common. I am not sure if it is worth
scanning the tree for xen,passthrough regions every time at boot for
this.

ok, I will add a few sentences to commit message about this different approach for now. At least this could be implemented later on if there is a need.


--
Regards,

Oleksandr Tyshchenko




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.