Re: hardware domain and control domain separation

To: jbeulich@xxxxxxxx
From: Stefano Stabellini <sstabellini@xxxxxxxxxx>
Date: Mon, 23 Jun 2025 15:51:50 -0700 (PDT)
Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, jason.andryuk@xxxxxxx, Julien Grall <julien@xxxxxxx>, Bertrand Marquis <bertrand.marquis@xxxxxxx>, Michal Orzel <michal.orzel@xxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Anthony PERARD <anthony.perard@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, ayankuma@xxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx, demiobenour@xxxxxxxxx
Delivery-date: Mon, 23 Jun 2025 22:52:05 +0000
List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

I am replying out of order, hopefully to make things easier to follow.

On Mon, 23 Jun 2025, Demi Marie Obenour wrote:
> On 6/23/25 11:44, Jan Beulich wrote:
> > On 21.06.2025 02:41, Stefano Stabellini wrote:
> >> Regarding hardware domain and control domain separation, Ayan sent to
> >> xen-devel an architecture specification (a design document) that I wrote
> >> previously about the topic. This is written as safety document so it is
> >> using a language and structure specific for that. However, it contains
> >> much of the explanation needed on the topic:
> >>
> >> https://lore.kernel.org/xen-devel/20250304183115.2509666-1-ayan.kumar.halder@xxxxxxx/
> > 
> > Yet even there the line between Hardware and Control is already blurred
> > imo. Take "Reboot and shutdown the platform", for example. It seems
> > pretty likely that Hardware has ways to achieve that without involving
> > a hypercall.

That is not correct. We have all encountered similar issues with servers
and other boards we have worked on, and I have had comparable
experiences as well. However, such issues cannot happen on silicon
designed for safety-critical environments.

> I expect that in safety certified or fully disaggregated setups, even the
> hardware domain only gets an allowlist of devices, MMIO, and I/O ports.
> If an I/O resource could be abused, it is either assigned to a Safe domain
> or is simply not used at all.  This is going to be very platform-specific.

That's right.

> > Furthermore there it is (again) assumed that Control has full privileges.
> > I did mention before that I'm not convinced any domain, in a
> > disaggregated setup, would need to have (nor should have) full privilege.

Thinking more about this and also thanks to Jan's comments, I realized
that "the Control Domain has full privileges" is an oversimplification.
The main role of the Control Domain is to monitor other domains and
trigger a platform-wide reboot when necessary. As such, it is probably
not a good idea to describe it as "full privilege" because in reality it
doesn't create/destroy VMs, and doesn't do other operations that would
be part of the "full privilege" concept of Xen.

The Control Domain needs to be able to trigger a platform-wide reboot,
platform-wide shutdown, and platform-wide suspend. That is pretty much
it. I think I need to update the document I linked to explain this.

> > You can also see that (kind of) connection in the
> > hypervisor itself: The special handling of a domain shutting down is
> > in hwdom_shutdown(), with the call to it keyed to is_hardware_domain()
> > (as is to be expected from the function's name).

This is an important detail. As I wrote above, maybe we don't actually
need the Control Domain to do anything else or need any extra
privileges, but the Control Domain needs to be able to trigger a
platform-wide reboot. We might have to change this check.
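
To illustrate what changing the check could mean, here is a standalone
model in C. It is not Xen source: struct domain_model, enum
shutdown_scope and both functions are hypothetical stand-ins that only
mirror the behaviour described in this thread, where the escalation to a
platform-wide action is keyed to is_hardware_domain(). Whether the
Hardware Domain should keep that ability once the Control Domain has it
is an open question; the sketch assumes it does.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a domain; only the two role flags matter. */
struct domain_model {
    bool is_hardware;   /* plays the role of is_hardware_domain(d) */
    bool is_control;    /* plays the role of a control-domain check */
};

enum shutdown_scope {
    SCOPE_DOMAIN_ONLY,  /* only the calling domain goes down */
    SCOPE_PLATFORM      /* the whole platform reboots or shuts down */
};

/* Behaviour as described in the thread: only a Hardware Domain
 * shutdown escalates to the platform. */
static enum shutdown_scope scope_current(const struct domain_model *d)
{
    return d->is_hardware ? SCOPE_PLATFORM : SCOPE_DOMAIN_ONLY;
}

/* One possible change (an assumption, not a decision): a Control
 * Domain shutdown escalates as well. */
static enum shutdown_scope scope_proposed(const struct domain_model *d)
{
    return (d->is_hardware || d->is_control) ? SCOPE_PLATFORM
                                             : SCOPE_DOMAIN_ONLY;
}
```

With this model, a domain that is Control but not Hardware moves from
SCOPE_DOMAIN_ONLY to SCOPE_PLATFORM, and every other domain is
unaffected.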

> > Also a more fundamental question I was wondering about: If Control had
> > full privilege, nothing else in the system ought to be able to interfere
> > with it. Yet then how does that domain communicate with the outside
> > world? It can't have PV or Virtio drivers after all. And even if its
> > sole communication channel was a UART, Hardware would likely be able to
> > interfere.

There are well-established methods for implementing domain-to-domain
communication that are free from interference, such as using carefully
defined rings on static shared memory. I believe one of these techniques
involves placing the indexes on separate pages and mapping them
read-only from one of the two domains.
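
As a rough sketch of that idea, the C code below models a
single-producer, single-consumer ring in which each index lives on a
page that only one side writes. It is not taken from Xen or from any
existing protocol: RING_SLOTS, struct prod_page, struct cons_page,
ring_put() and ring_get() are all made-up names, and the read-only
mappings are assumed to be set up elsewhere (plain C cannot enforce
them). The point it shows is that each side treats the peer's index as
untrusted and rejects values that are out of range, so a misbehaving
peer can stall the ring but cannot make the other side read or write
outside it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 8u   /* hypothetical size */
static_assert((RING_SLOTS & (RING_SLOTS - 1u)) == 0,
              "RING_SLOTS must be a power of two");

/* Written only by the producer; assumed mapped read-only in the consumer. */
struct prod_page {
    _Atomic uint32_t prod;          /* free-running count of slots written */
    uint32_t slots[RING_SLOTS];
};

/* Written only by the consumer; assumed mapped read-only in the producer. */
struct cons_page {
    _Atomic uint32_t cons;          /* free-running count of slots read */
};

/* Producer side: writes pp, only reads cp. */
static bool ring_put(struct prod_page *pp, struct cons_page *cp, uint32_t msg)
{
    uint32_t prod = atomic_load_explicit(&pp->prod, memory_order_relaxed);
    uint32_t cons = atomic_load_explicit(&cp->cons, memory_order_acquire);

    /* Full, or the peer's index is bogus: refuse either way. */
    if (prod - cons >= RING_SLOTS)
        return false;

    pp->slots[prod & (RING_SLOTS - 1u)] = msg;
    /* Publish the slot contents before the new index becomes visible. */
    atomic_store_explicit(&pp->prod, prod + 1u, memory_order_release);
    return true;
}

/* Consumer side: writes cp, only reads pp. */
static bool ring_get(struct prod_page *pp, struct cons_page *cp, uint32_t *msg)
{
    uint32_t cons = atomic_load_explicit(&cp->cons, memory_order_relaxed);
    uint32_t prod = atomic_load_explicit(&pp->prod, memory_order_acquire);
    uint32_t avail = prod - cons;

    /* Empty, or the peer's index is bogus: refuse either way. */
    if (avail == 0 || avail > RING_SLOTS)
        return false;

    *msg = pp->slots[cons & (RING_SLOTS - 1u)];
    /* Release the slot back to the producer. */
    atomic_store_explicit(&cp->cons, cons + 1u, memory_order_release);
    return true;
}
```

The indexes are free-running counters and are masked only when
addressing a slot, which keeps the full/empty test and the range check
to a single unsigned subtraction.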
