[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[RFC 00/41] qom-topo: Abstract Everything about CPU Topology



From: Zhao Liu <zhao1.liu@xxxxxxxxx>

Hi list,

This series is our latest attempt after the previous RFC [1] about
hybrid topology support, which is based on the commit 4705fc0c8511
("Merge tag 'pull-for-8.2-fixes-231123-1' of https://gitlab.com/
stsquad/qemu into staging") with our previous cleanup (patches link:
https://lore.kernel.org/all/20231127145611.925817-1-zhao1.liu@xxxxxxxxxxxxxxx/).

In the previous RFC, Daniel suggested [2] us to use the modern QOM
approach to define CPU topology, and based on this way, defining hybrid
topology through cli is natural. (Thanks Daniel!)

About why we chose -device other than -object, please see the chapter.3
"History of QOM Topology".

In fact, S390x already implements heterogeneity at the CPU level with
QOM CPUs, i.e., different CPUs have different entitlements [3]. However,
for more thorough heterogeneity, i.e., heterogeneous cores, clusters,
dies, caches and even more, we still need to go farther in the QOM
direction.

With these background, we propose this series to implement QOM "smp"
topology in QEMU, and it's also the first step towards the heterogeneous
topology (including CPU & cache topology) for virtualization case.

The overall goal is to both use QOM "smp" topology to be compatible with
current -smp behavior, and to take into account different architectural
setups/requirements for CPU topology (even including different designs
for CPU hotplug and possible_cpus[] implementation), and ultimately to
extend QEMU's ability to define CPU topology via -device even without
-smp.

The current remining issue, mainly related to PPC, as it chose to build
the pssible_cpus[] list at core granularity. Please see chapter.5 "Open
Questions" for more thoughts on this issue.

For other architectures that build possible_cpus[] at CPU granularity,
the transition to QOM topology will be similar to what was done for i386
in this patchset (please feel free to point out any issues I've missed).

There's a lot of work, and the devil is in the details.


Welcome your feedbacks!


1. Summary about What We Did?
=============================

This series implements the basics of QOM topology and supports QOM
topology for the i386 as the example:

* Introduce the general topology device and abstract all CPU topology
  levels to topology devices:

    - including "cpu", "cpu-core", "cpu-cluster", "cpu-die",
      "cpu-socket", "cpu-book", "cpu-drawer", and a special topology
      root "cpu-slot" to manage topology tree.

* Allow user to create "smp" CPU topology via "-device", for example:

    The topology with 8 CPUs:

    -accel kvm -cpu host \
    -device cpu-socket,id=sock0 \
    -device cpu-die,id=die0,parent=sock0 \
    -device cpu-core,id=core0,parent=die0,nr-threads=2 \
    -device cpu-core,id=core1,parent=die0,nr-threads=2,plugged-threads=1 \
    -device cpu-core,id=core2,parent=die0,nr-threads=2,plugged-threads=2 \
    -device cpu-core,id=core3,parent=die0,nr-threads=2 \
    -device host-x86_64-cpu,socket-id=0,die-id=0,core-id=1,thread-id=1 \

* Build a topology tree under machine, for example:

    One of the above CPUs:

    {
        "props": {
            "core-id": 0,
            "socket-id": 0,
            "thread-id": 0
        },
        "qom-path": 
"/machine/peripheral/cpu-slot/sock0/die0/core0/host-x86_64-cpu[0]",
        "type": "host-x86_64-cpu",
        "vcpus-count": 1
    }

* Convert topology information configured via -smp into a QOM "smp"
  topology tree (if the arch supports QOM topology).


2. What's the Problem?
======================

As computer architectures continue to innovate, the need for
heterogeneous topology of virtual machine in QEMU is growing.

On the one hand, there is the need for heterogeneous CPU topology
support. Heterogeneous CPU topology refers to systems that use more than
one kind of processor or cores [4], the typical example is intel hybrid
architecture with 2 core types [5], which will even have heterogeneous
die topology [6]. And we can see that not only Intel, but also ARM and
AMD's heterogeneous platforms will have the ability to support
virtualization.

On the other hand, there is also a growing need for heterogeneous cache
topology in QEMU. Not only will Intel's hybrid platforms introduce
complex heterogeneous cache topology, but more platforms have aslo seen
the strong demand for heterogeneous cache topology definition (e.g. ARM
[7]). Although cache topology is strictly speaking a separate topology
from CPU topology, in terms of the actual processor build process and
the current QEMU's way to define cache topology (e.g., i386 [8] & ARM
[9]), cache topology is and should be dependent on CPU topology to be
defined.

With this background of increasing interest in heterogeneous computing,
we need a flexible way (for accel) to define a wide variety of CPU
topologies in QEMU.

Obviously, -smp is not enough, so here we propose a way to define CPU
topology based on -device, in the hope that this will greatly expand the
flexibility of QEMU's CPU topology definition.


3. History of QOM Topology
==========================

The intent of QOM topology is not to stop at QOM CPUs, but to abstract
more CPU topology levels.

In fact, it's not a new term.

Back in 2014, Fan proposed [10] a hierarchical tree based on QOM node,
QOM sockets, QOM cores and QOM cpu. Andreas also propsed [11] socket/
core/thread QOM model to create hierarchical topology in place. From
the discussion at that time, the hierarchy topology tree representation
was what people (Igor said [12]) wanted. However, this work was not
continued.

Then Bharata abstracted [13] the cpu-core to support core granularity
hotplug in spapr. Until now, the only user of cpu-core is still spapr.

Cpu-cluster was introduced by Luc [14] to organize CPUs in different
containers to support GDB for TCG case. Though this abstraction was
descripted as: "mainly an internal QEMU representation and does not
necessarily match with the notion of clusters on the real hardware",
its name and function actually make it easy to correspond to the
physical cluster/smp cluster (the difference, of course, is that TCG
cluster collects CPUs directly, while the cluster as a CPU topology
level collect cores, but this difference is not the impediment gap to
converting "TCG " cluster to "general" cluster).

As CPU architectures evolve, QEMU has supported more topology levels
(clusters, die, book and drawer) for virtualization case, while the
existing cpu-core and cpu-cluster abstractions become fragmented.

And now, the need for defining hybrid topology has prompted us to
rethink the QOM topology.

Daniel suggested [2] to use -object interfaces to define CPU topology,
and we absorbed his design idea. But in practice we found that -device
looked like a better approach to take advantage of the current QOM CPU
device, cpu-core device and cpu-cluster device.


4. Design Overview
==================

4.1. General Topology Device
============================

We introduce a new topology device as the basic abstraction, then all
levels of the CPU topology are derived from this general topology device
type.

This topology device is the basic unit for building the topology tree.
Children topology devices are inserted into the queue of the parent, and
the child<> property is created between the children and their parent.

As the root of the topology tree, we introduce a special topology device
"cpu-slot". It is created by the machine at machine's initialization and
collects the topology devices created by the user from the cli, and thus
builds the topology tree.

The cpu-slot is also responsible for statistics on global topology
information, and whenever there is a new topology child, the cpu-slot as
root is notified to update topology informantion. In addition, different
architectures have different requirements for topology (e.g., support
for different levels), and such limitations/properties are applied to
cpu-slot, which is checked when adding the new child topology unit in
topology tree.


4.2. Derived QOM Topology Devices
=================================

Based on the new general topology device type, we convert CPU, cpu-core
and cpu-cluster from general devices to topology devices.

And we also abstract cpu-die, cpu-socket, cpu-book and cpu-drawer as
topology devices.


4.3. New Device Category "DEVICE_CATEGORY_CPU_DEF"
==================================================

The topology devices can be divided into two general categories:

* One, as the basic topology components, should be created before board
  initialization, to predefine the basic topology structure for the
  system, and is used to initialize MachineState.possible_cpus at
  machine's initialization. This category doesn't support hotplug, such
  as:

    cpu-core (non-PPC core), cpu-cluster, cpu-die, cpu-socket, cpu-book
    and cpu-drawer.

  Thus, we introduce the new device category "DEVICE_CATEGORY_CPU_DEF"
  to mark these devices and create them from cli before board
  initialization.

* The other are CPU and PPC core, which are the granularity of
  MachineState.possible_cpus.

  They're created from MachineState.possible_cpus in place during
  machine initialization or are plugged into MachineState.possible_cpus
  through hotplug way.

  For these devices, they could be created from cli only after board
  initialization.


4.4. User-child Interface to Build Child<> from Cli
===================================================

Topology device is bus-less device, and needs child<> to build topology
hierarchical relationship like:

/machine/peripheral/cpu-slot/sock*/die*/core*/cpu*

Therefore, we introduce a new user-child interface to insert hooks into
device_add path to get/specify object parent for topology devices.

If a topology device specify "parent" option in -device, it will be add
to the corresponding topology parent with child<> property.

If no "parent" option, the topology device will have the default parent
"cpu-slot". This ensures cpu-slot could collect all topology units to
build complete topology tree.


5. Open Questions
=================

There's a special case, user could define topology via -device without
-smp (that's the future hybrid topology case!).

In the design of current QOM topology, the numbers of maximum CPUs and
pre plugged CPUs could be collected during core devices realize.

For the (non-PPC) architectures which build possible_cpus[] at CPU
granularity, the cores will be created before possible_cpus[]
initialization and then CPU slot could know how many maximum CPUs will
be supported to fill possible_cpus[].

But for PPC, the possible_cpus[] is at core granularity and PPC core
could only be created after possible_cpus[] initialization, so that
CPU slot cannot know the the numbers of maximum CPUs (PPC cores) and pre
plugged CPUs (PPC cores). So for PPC, the "-smp" is necessary and cannot
be omitted.

For PPC this potential impact, i.e., even though QOM topology is
supported in PPC, it is not possible to omit -smp to create the topology
only via -device as for i386, and since PPC does not currently support
heterogeneous topology, this potential impact might be acceptable?


6. Future TODOs
===============

The current QOM topology RFC is only the very first step to introduce
the most basic QOM support, and it tries to be as compatible as possible
with existing SMP facilities.

The ultimate goal is to completely replace the current smp-related
topology structures with cpu-slot.

There are many TODOs:

* Add unit tests.
* Support QOM topology for all architectures.
* Get rid of MachineState.smp and MachineClass.smp_props with cpu-slot.
* Extend QOM topology to hybrid topology.
* Introduce "-device-set" which is derived from Daniel's "-object-set"
  idea [2] to create multiple duplicate devices.
...


7. Patch Summary
================

Patch  1- 3: Create DEVICE_CATEGORY_CPU_DEF devices before board
             initialization.
Patch  4- 7: Support child<> creation from cli.
Ptach  8-12: Introduce general topology device.
Patch 13-26: Abstract all topology levels to topology devices.
Patch 27-34: Introduce cpu-slot to manage the CPU topology of machine.
Patch 35-41: Convert i386's CPU creation & hotplug to be based on QOM
             topology.


8. Reference
============

[1]:  Hybrid topology RFC:
      https://mail.gnu.org/archive/html/qemu-devel/2023-02/msg03205.html
[2]:  Daniel's suggestion about QOM topology:
      https://mail.gnu.org/archive/html/qemu-devel/2023-02/msg03320.html
[3]:  S390x topology document (by Nina):
      https://lists.gnu.org/archive/html/qemu-devel/2023-10/msg04842.html
[4]:  Heterogeneous computing:
      https://en.wikipedia.org/wiki/Heterogeneous_computing
[5]:  12th gen’s Intel hybrid technology:
      
https://www.intel.com/content/www/us/en/support/articles/000091896/processors.html
[6]:  Intel Meteor Lake (14th gen) architecture overview:
      
https://www.intel.com/content/www/us/en/content-details/788851/meteor-lake-architecture-overview.html
[7]:  Need of ARM heterogeneous cache topology (by Yanan):
      https://mail.gnu.org/archive/html/qemu-devel/2023-02/msg05139.html
[8]:  Cache topology implementation for i386:
      https://lists.gnu.org/archive/html/qemu-devel/2023-10/msg08251.html
[9]:  Cluster for ARM to define shared L2 cache and L3 tag (by Yanan):
      https://lore.kernel.org/all/20211228092221.21068-1-wangyanan55@xxxxxxxxxx/
[10]: [PATCH v1 0/4] prebuild cpu QOM tree /machine/node/socket/core
      ->link-cpu (by Fan):
      
https://lore.kernel.org/all/cover.1395217538.git.chen.fan.fnst@xxxxxxxxxxxxxx/
[11]: [PATCH RFC 0/4] target-i386: PC socket/core/thread modeling,
      part 1 (by Andreas):
      
https://lore.kernel.org/all/1427131923-4670-1-git-send-email-afaerber@xxxxxxx/
[12]: "Now we want to have similar QOM tree for introspection which
      helps express topology as well" (by Igor):
      
https://lore.kernel.org/all/20150407170734.51faac90@igors-macbook-pro.local/
[13]: [for-2.7 PATCH v3 06/15] cpu: Abstract CPU core type (by Bharata)
      
https://lore.kernel.org/all/1463024905-28401-7-git-send-email-bharata@xxxxxxxxxxxxxxxxxx/
[14]: [PATCH v8 01/16] hw/cpu: introduce CPU clusters (by Luc):
      
https://lore.kernel.org/all/20181207090135.7651-2-luc.michel@xxxxxxxxxxxxx/

Thanks and Best Regards,
Zhao

---
Zhao Liu (41):
  qdev: Introduce new device category to cover basic topology device
  qdev: Allow qdev_device_add() to add specific category device
  system: Create base category devices from cli before board
    initialization
  qom/object: Introduce helper to resolve path from non-direct parent
  qdev: Set device parent and id after setting properties
  qdev: Introduce user-child interface to collect devices from -device
  qdev: Introduce parent option in -device
  hw/core/topo: Introduce CPU topology device abstraction
  hw/core/topo: Support topology index for topology device
  hw/core/topo: Add virtual method to update topology info for parent
  hw/core/topo: Add virtual method to check topology child
  hw/core/topo: Add helpers to traverse the CPU topology tree
  hw/core/cpu: Convert CPU from general device to topology device
  PPC/ppc-core: Offload core-id to PPC specific core abstarction
  hw/cpu/core: Allow to configure plugged threads for cpu-core
  PPC/ppc-core: Limit plugged-threads and nr-threads to be equal
  hw/cpu/core: Convert cpu-core from general device to topology device
  hw/cpu/cluster: Rename CPUClusterState to CPUCluster
  hw/cpu/cluster: Wrap TCG related ops and props into CONFIG_TCG
  hw/cpu/cluster: Descript cluster is not only used for TCG in comment
  hw/cpu/cluster: Allow cpu-cluster to be created by -device
  hw/cpu/cluster: Convert cpu-cluster from general device to topology
    device
  hw/cpu/die: Abstract cpu-die level as topology device
  hw/cpu/socket: Abstract cpu-socket level as topology device
  hw/cpu/book: Abstract cpu-book level as topology device
  hw/cpu/drawer: Abstract cpu-drawer level as topology device
  hw/core/slot: Introduce CPU slot as the root of CPU topology
  hw/core/slot: Maintain the core queue in CPU slot
  hw/core/slot: Statistics topology information in CPU slot
  hw/core/slot: Check topology child to be added under CPU slot
  hw/machine: Plug cpu-slot into machine to maintain topology tree
  hw/machine: Build smp topology tree from -smp
  hw/machine: Validate smp topology tree without -smp
  hw/core/topo: Implement user-child to collect topology device from cli
  hw/i386: Make x86_cpu_new() private in x86.c
  hw/i386: Allow x86_cpu_new() to specify parent for new CPU
  hw/i386: Allow i386 to create new CPUs from QOM topology
  hw/i386: Wrap apic id and topology sub ids assigning as helpers
  hw/i386: Add the interface to search parent for QOM topology
  hw/i386: Support QOM topology
  hw/i386: Cleanup non-QOM topology support

 MAINTAINERS                        |  16 +
 accel/kvm/kvm-all.c                |   4 +-
 gdbstub/system.c                   |   2 +-
 hw/core/cpu-common.c               |  25 +-
 hw/core/cpu-slot.c                 | 605 +++++++++++++++++++++++++++++
 hw/core/cpu-topo.c                 | 399 +++++++++++++++++++
 hw/core/machine-smp.c              |   9 +
 hw/core/machine.c                  |  10 +
 hw/core/meson.build                |   2 +
 hw/cpu/book.c                      |  46 +++
 hw/cpu/cluster.c                   |  50 ++-
 hw/cpu/core.c                      |  72 ++--
 hw/cpu/die.c                       |  46 +++
 hw/cpu/drawer.c                    |  46 +++
 hw/cpu/meson.build                 |   2 +-
 hw/cpu/socket.c                    |  46 +++
 hw/i386/x86.c                      | 319 ++++++++++-----
 hw/net/virtio-net.c                |   2 +-
 hw/ppc/meson.build                 |   1 +
 hw/ppc/pnv.c                       |   6 +-
 hw/ppc/pnv_core.c                  |  17 +-
 hw/ppc/ppc_core.c                  | 102 +++++
 hw/ppc/spapr.c                     |  28 +-
 hw/ppc/spapr_cpu_core.c            |  19 +-
 hw/usb/xen-usb.c                   |   3 +-
 hw/xen/xen-legacy-backend.c        |   2 +-
 include/hw/arm/armsse.h            |   2 +-
 include/hw/arm/xlnx-versal.h       |   4 +-
 include/hw/arm/xlnx-zynqmp.h       |   4 +-
 include/hw/boards.h                |  13 +
 include/hw/core/cpu-slot.h         | 108 +++++
 include/hw/core/cpu-topo.h         | 111 ++++++
 include/hw/core/cpu.h              |   8 +-
 include/hw/cpu/book.h              |  38 ++
 include/hw/cpu/cluster.h           |  51 ++-
 include/hw/cpu/core.h              |  26 +-
 include/hw/cpu/die.h               |  38 ++
 include/hw/cpu/drawer.h            |  38 ++
 include/hw/cpu/socket.h            |  38 ++
 include/hw/i386/x86.h              |   5 +-
 include/hw/ppc/pnv_core.h          |  11 +-
 include/hw/ppc/ppc_core.h          |  58 +++
 include/hw/ppc/spapr_cpu_core.h    |  12 +-
 include/hw/qdev-core.h             |   1 +
 include/hw/riscv/microchip_pfsoc.h |   4 +-
 include/hw/riscv/sifive_u.h        |   4 +-
 include/monitor/qdev.h             |   7 +-
 include/monitor/user-child.h       |  57 +++
 include/qom/object.h               |  26 ++
 qom/object.c                       |  31 ++
 system/meson.build                 |   1 +
 system/qdev-monitor.c              | 141 ++++++-
 system/user-child.c                |  72 ++++
 system/vl.c                        |  53 ++-
 target/i386/cpu.c                  |   4 +
 tests/unit/meson.build             |   5 +-
 56 files changed, 2607 insertions(+), 243 deletions(-)
 create mode 100644 hw/core/cpu-slot.c
 create mode 100644 hw/core/cpu-topo.c
 create mode 100644 hw/cpu/book.c
 create mode 100644 hw/cpu/die.c
 create mode 100644 hw/cpu/drawer.c
 create mode 100644 hw/cpu/socket.c
 create mode 100644 hw/ppc/ppc_core.c
 create mode 100644 include/hw/core/cpu-slot.h
 create mode 100644 include/hw/core/cpu-topo.h
 create mode 100644 include/hw/cpu/book.h
 create mode 100644 include/hw/cpu/die.h
 create mode 100644 include/hw/cpu/drawer.h
 create mode 100644 include/hw/cpu/socket.h
 create mode 100644 include/hw/ppc/ppc_core.h
 create mode 100644 include/monitor/user-child.h
 create mode 100644 system/user-child.c

-- 
2.34.1




 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.