|
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: Hyperlaunch Device Tree Discussion
On Wed May 21, 2025 at 5:39 PM CEST, Daniel P. Smith wrote:
> Greetings,
>
> Per my response to Allejandro's message, here is the response sent the
> the DTB working group formed last year to discuss DTB parsing for x86.
>
>
> Original Message:
>
> I have copied everyone that attended the hyperlaunch working group a few
> weeks back to ensure everyone has a chance to review and comment.
>
> As a start and to provide a common understanding, first is a quick
> overview of Flattened Device Tree and Xen's "Unflattened Device Tree".
> The intent is to assist everyone in having an equal footing when
> considering the impacts that Device Tree parsing brings.
>
> A Flattened Device Tree (FDT) is a nested linear tree structure that
> uses a combination of tags, layout definition, and headers to allow
> navigation through the tree. Because the layout is nested, if given the
> offset for a node in the FDT, it is possible to start at that node and
> jump directly into the tree to access child nodes and properties.
> Provided below is a visual representation of what any parent node,
> including the root node, may look like:
>
> +------------------------------+
> | NODE TAG (parent node) |
> +------------------------------+
> | Null-term String (node name) |
> +------------------------------+
> | PROPERTY TAG |
> +------------------------------+
> | struct property { |
> | u32 len |
> | u32 name_offset |
> | } |
> +------------------------------+
> | Property Data |
> +------------------------------+
> | NODE TAG (child node) |
> +------------------------------+
> | Null-term String (node name) |
> +------------------------------+
> | PROPERTY TAG |
> +------------------------------+
> | struct property { |
> | u32 len |
> | u32 name_offset |
> | } |
> +------------------------------+
> | Property Data |
> +------------------------------+
> | END NODE TAG (child node) |
> +------------------------------+
> | END NODE TAG (parent node) |
> +------------------------------+
>
> Before moving forward, let us clarify some terminology to ensure a
> common understanding when discussing a tree.
>
> - node path: represents a series of hierarchical child nodes starting
> at the root node
> - adjacent node: the logically next node that is at the same level in
> the tree
> - child node: a node that is a one level lower leaf to another node,
> the parent node
> - tree walk: incrementally walking the nodes, to locate a specific
> node or to iterate over the whole tree
>
> The libfdt library provides handlers for finding the offset of a node,
> as well as handlers to jump to a node offset and iterate only on the
> child nodes. While the libfdt is fairly optimized, the reality is that
> to find a node, the library must do a tree walk starting with the first
> node written in the FDT. If a node is not a path match at the current
> depth, it must cross a null terminated string, all the node's property
> entries and all children nodes to reach the next adjacent node. Once a
> path match for the depth is found, then the search may descend into the
> next depth and repeat the process until a match at that level is found.
>
> This brings us to Xen's "Unflattened Device Tree" (UDT), for which I am
> quoting as I find myself thinking of it in another way, which IMHO is a
> more descriptive name, which is that it is an FDT lookup index. It just
> happens that the implementation for the lookup index structure is a tree
> structure. UDT uses a structure to represent a node and one to represent
> a property. The node structure is a traditional tree structure with
> adjacent and child node pointers. The contents of both structures are
> pointers to the respective memory locations within the FDT. As with the
> FDT, in order to locate a node in the index, a tree walk of the index
> must be done. The difference comes when a node is not a path match, to
> reach the adjacent node, it only needs to access the node pointed to by
> the adjacent node pointer of the current node. UDT provides an API for
> walking the node tree, walking the property list for a node, and methods
> for type-interpreted extraction of property values. NB: the
> type-interpreted extraction API is codified around taking a UDT property
> structure, but the interpreted extraction logic isn't UDT specific as it
> is still reading the property value from the FDT.
>
> The benefit UDT brings is when repeated node lookups and/or repeated
> phandle dereferencing are done. For both FDT and UDT, a tree walk must
> be done. The walk will start with a node, either the root node or one
> for which a reference has already been found, walking each adjacent node
> and descending into a node's children when a path match occurs. For
> phandle dereferencing, the benefit is greater due to the fact that when
> indexing the FDT, phandles get dereferenced, thus allowing direct
> reference in the index. For comparison, a phandle dereference using
> libfdt does a walk of the tree to find the node referenced by the phandle.
>
> The UDT, as implemented, is not without cost. The current implementation
> takes two complete walks of the entire FDT using libfdt. The first pass
> is to obtain the amount of memory that is required to allocate enough
> instances of the UDT node and property structures to represent the full
> tree. The second pass is when the FDT nodes and properties are indexed
> into the UDT.
>
> With the expense of using FDT and UDT established, it is important to
> put that expense into context. Consider hyperlaunch on x86 where the
> arch itself has no DT requirements. In all likelihood, an FDT
> constructed for this arch would only contain the nodes necessary for
> hyperlaunch. If hyperlaunch were constructed x86 only, loading the
> configuration could be done in a single full walk of the FDT, even when
> considering phandle usage. The reason this is true for the phandles
> case, is that as nodes known to be a phandle target are encountered,
> their offset into the FDT could be stored with dereferencing resolved
> post walk.
>
> For DT based archs, currently accepted costs are two FDT node lookups
> along with the two full walks to construct the UDT. These first two node
> lookups being the memory allocation table and the Xen command line. As
> noted above, an FDT node lookup requires a walk of the linear tree until
> the node is located. AIUI at this point is that the number of nodes that
> must be crossed is indeterminate. I did not see anything in the device
> tree specification that the FDT must be packed in the same order as the
> string representation. NB: I have not reviewed the DT compiler to see if
> it optimizes for early access nodes to be packed at the beginning of the
> linear tree to reduce the number of nodes that must be crossed.
>
> While the aforementioned strategy for x86 would be optimal for x86, it
> is not necessarily the best for DT based archs. Hyperlaunch started, and
> currently is, focused on the x86 arch, but long term it was always
> understood that its more expansive design would be desirable by all
> archs. Like anything that moves into common, a slightly less efficient
> approach for one platform is accepted for the benefit of a common
> implementation that reduces the amount of code while increasing the
> number of reviewers.
>
> After listening to everyone's concerns, re-reviewing all of Arm's device
> tree logic, and considering everything in totality, the conclusion is
> that there is a core root cause from with which all the concerns raised
> flow. First a summary of the main concerns raised,
>
> - The issue of memory allocator(s) available at the time when the
> first FDT walk/parsing occurs.
> - Overhead of doing a more than one FDT walk to obtain the hyperlaunch
> configuration when phandles are in use.
> - Supporting FDT would require the introduction of a
> duplicate/competing set of property parsers.
>
> This root cause is due to a design decision difference made for the
> hyperlaunch domain builder versus the dom0less domain builder and Arm's
> approach to device tree parsing. For dom0less, the approach is to walk
> the UDT index tree at the domain construction time, which appears to
> stem from Arm's need and practice of repeatedly accessing device tree
> entries. Whereas x86 has no need for the device tree and took the
> approach to do a single walk to extract its configuration into a code
> usable structure.
>
> With that understanding, it is believed that these two approaches are
> not diametrically opposed and in fact can be blended together to result
> in a generally optimized approach. The approach will be to conduct two
> full walks of the FDT, an early boot pass before memory allocation is
> available and a second pass after a memory allocator is set up. Both
> passes serve to populate the proposed boot info structure, specifically
> the scope of these passes are as follows:
>
> Early FDT Walk: (static values)
> - calculate the space required for the device tree index
> - count the number of domains defined
> - Xen command line
> - XSM policy
> - arch specific static values, via an arch_early_fdt()
>
> Late FDT Walk: (dynamic values)
> - construct device tree index, aka "unflattened device tree"
> - populate boot modules entries (NB: boot modules are expected to be
> static array)
> - store device tree node index for top level index and hyperlaunch node
> - populate boot domain entries, basis will be device tree index node
> - arch specific dynamic values, via an arch_late_fdt()
>
> By taking this approach which is constructed around a set of arch
> neutral structures will enable another goal of the hyperlaunch series,
> which is to move to having a common domain creation/construction logic.
> Currently, there is a significant amount of duplication in each arch's
> branch for boot time construction of domains. It will also allow
> removing arch specific code from the initialization of common
> infrastructure such as XSM.
>
> V/r,
> Daniel P. Smith
This is helpful context, thanks for digging it up. I'll keep the answers
on the other thread to avoid spreading the discussion.
Cheers,
Alejandro
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |