[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: Hyperlaunch Device Tree Discussion
On Wed May 21, 2025 at 5:39 PM CEST, Daniel P. Smith wrote: > Greetings, > > Per my response to Allejandro's message, here is the response sent the > the DTB working group formed last year to discuss DTB parsing for x86. > > > Original Message: > > I have copied everyone that attended the hyperlaunch working group a few > weeks back to ensure everyone has a chance to review and comment. > > As a start and to provide a common understanding, first is a quick > overview of Flattened Device Tree and Xen's "Unflattened Device Tree". > The intent is to assist everyone in having an equal footing when > considering the impacts that Device Tree parsing brings. > > A Flattened Device Tree (FDT) is a nested linear tree structure that > uses a combination of tags, layout definition, and headers to allow > navigation through the tree. Because the layout is nested, if given the > offset for a node in the FDT, it is possible to start at that node and > jump directly into the tree to access child nodes and properties. > Provided below is a visual representation of what any parent node, > including the root node, may look like: > > +------------------------------+ > | NODE TAG (parent node) | > +------------------------------+ > | Null-term String (node name) | > +------------------------------+ > | PROPERTY TAG | > +------------------------------+ > | struct property { | > | u32 len | > | u32 name_offset | > | } | > +------------------------------+ > | Property Data | > +------------------------------+ > | NODE TAG (child node) | > +------------------------------+ > | Null-term String (node name) | > +------------------------------+ > | PROPERTY TAG | > +------------------------------+ > | struct property { | > | u32 len | > | u32 name_offset | > | } | > +------------------------------+ > | Property Data | > +------------------------------+ > | END NODE TAG (child node) | > +------------------------------+ > | END NODE TAG (parent node) | > +------------------------------+ > > Before moving forward, let us clarify some terminology to ensure a > common understanding when discussing a tree. > > - node path: represents a series of hierarchical child nodes starting > at the root node > - adjacent node: the logically next node that is at the same level in > the tree > - child node: a node that is a one level lower leaf to another node, > the parent node > - tree walk: incrementally walking the nodes, to locate a specific > node or to iterate over the whole tree > > The libfdt library provides handlers for finding the offset of a node, > as well as handlers to jump to a node offset and iterate only on the > child nodes. While the libfdt is fairly optimized, the reality is that > to find a node, the library must do a tree walk starting with the first > node written in the FDT. If a node is not a path match at the current > depth, it must cross a null terminated string, all the node's property > entries and all children nodes to reach the next adjacent node. Once a > path match for the depth is found, then the search may descend into the > next depth and repeat the process until a match at that level is found. > > This brings us to Xen's "Unflattened Device Tree" (UDT), for which I am > quoting as I find myself thinking of it in another way, which IMHO is a > more descriptive name, which is that it is an FDT lookup index. It just > happens that the implementation for the lookup index structure is a tree > structure. UDT uses a structure to represent a node and one to represent > a property. The node structure is a traditional tree structure with > adjacent and child node pointers. The contents of both structures are > pointers to the respective memory locations within the FDT. As with the > FDT, in order to locate a node in the index, a tree walk of the index > must be done. The difference comes when a node is not a path match, to > reach the adjacent node, it only needs to access the node pointed to by > the adjacent node pointer of the current node. UDT provides an API for > walking the node tree, walking the property list for a node, and methods > for type-interpreted extraction of property values. NB: the > type-interpreted extraction API is codified around taking a UDT property > structure, but the interpreted extraction logic isn't UDT specific as it > is still reading the property value from the FDT. > > The benefit UDT brings is when repeated node lookups and/or repeated > phandle dereferencing are done. For both FDT and UDT, a tree walk must > be done. The walk will start with a node, either the root node or one > for which a reference has already been found, walking each adjacent node > and descending into a node's children when a path match occurs. For > phandle dereferencing, the benefit is greater due to the fact that when > indexing the FDT, phandles get dereferenced, thus allowing direct > reference in the index. For comparison, a phandle dereference using > libfdt does a walk of the tree to find the node referenced by the phandle. > > The UDT, as implemented, is not without cost. The current implementation > takes two complete walks of the entire FDT using libfdt. The first pass > is to obtain the amount of memory that is required to allocate enough > instances of the UDT node and property structures to represent the full > tree. The second pass is when the FDT nodes and properties are indexed > into the UDT. > > With the expense of using FDT and UDT established, it is important to > put that expense into context. Consider hyperlaunch on x86 where the > arch itself has no DT requirements. In all likelihood, an FDT > constructed for this arch would only contain the nodes necessary for > hyperlaunch. If hyperlaunch were constructed x86 only, loading the > configuration could be done in a single full walk of the FDT, even when > considering phandle usage. The reason this is true for the phandles > case, is that as nodes known to be a phandle target are encountered, > their offset into the FDT could be stored with dereferencing resolved > post walk. > > For DT based archs, currently accepted costs are two FDT node lookups > along with the two full walks to construct the UDT. These first two node > lookups being the memory allocation table and the Xen command line. As > noted above, an FDT node lookup requires a walk of the linear tree until > the node is located. AIUI at this point is that the number of nodes that > must be crossed is indeterminate. I did not see anything in the device > tree specification that the FDT must be packed in the same order as the > string representation. NB: I have not reviewed the DT compiler to see if > it optimizes for early access nodes to be packed at the beginning of the > linear tree to reduce the number of nodes that must be crossed. > > While the aforementioned strategy for x86 would be optimal for x86, it > is not necessarily the best for DT based archs. Hyperlaunch started, and > currently is, focused on the x86 arch, but long term it was always > understood that its more expansive design would be desirable by all > archs. Like anything that moves into common, a slightly less efficient > approach for one platform is accepted for the benefit of a common > implementation that reduces the amount of code while increasing the > number of reviewers. > > After listening to everyone's concerns, re-reviewing all of Arm's device > tree logic, and considering everything in totality, the conclusion is > that there is a core root cause from with which all the concerns raised > flow. First a summary of the main concerns raised, > > - The issue of memory allocator(s) available at the time when the > first FDT walk/parsing occurs. > - Overhead of doing a more than one FDT walk to obtain the hyperlaunch > configuration when phandles are in use. > - Supporting FDT would require the introduction of a > duplicate/competing set of property parsers. > > This root cause is due to a design decision difference made for the > hyperlaunch domain builder versus the dom0less domain builder and Arm's > approach to device tree parsing. For dom0less, the approach is to walk > the UDT index tree at the domain construction time, which appears to > stem from Arm's need and practice of repeatedly accessing device tree > entries. Whereas x86 has no need for the device tree and took the > approach to do a single walk to extract its configuration into a code > usable structure. > > With that understanding, it is believed that these two approaches are > not diametrically opposed and in fact can be blended together to result > in a generally optimized approach. The approach will be to conduct two > full walks of the FDT, an early boot pass before memory allocation is > available and a second pass after a memory allocator is set up. Both > passes serve to populate the proposed boot info structure, specifically > the scope of these passes are as follows: > > Early FDT Walk: (static values) > - calculate the space required for the device tree index > - count the number of domains defined > - Xen command line > - XSM policy > - arch specific static values, via an arch_early_fdt() > > Late FDT Walk: (dynamic values) > - construct device tree index, aka "unflattened device tree" > - populate boot modules entries (NB: boot modules are expected to be > static array) > - store device tree node index for top level index and hyperlaunch node > - populate boot domain entries, basis will be device tree index node > - arch specific dynamic values, via an arch_late_fdt() > > By taking this approach which is constructed around a set of arch > neutral structures will enable another goal of the hyperlaunch series, > which is to move to having a common domain creation/construction logic. > Currently, there is a significant amount of duplication in each arch's > branch for boot time construction of domains. It will also allow > removing arch specific code from the initialization of common > infrastructure such as XSM. > > V/r, > Daniel P. Smith This is helpful context, thanks for digging it up. I'll keep the answers on the other thread to avoid spreading the discussion. Cheers, Alejandro
|
![]() |
Lists.xenproject.org is hosted with RackSpace, monitoring our |