[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 00/12] const-ify vendor checks


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Alejandro Vallejo <alejandro.garciavallejo@xxxxxxx>
  • Date: Mon, 9 Feb 2026 15:37:59 +0100
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=suse.com smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0)
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=Gn12UfPFihhH+wxtx1AeC7kZFezimNVmrRvx7f0cUCE=; b=Oh9J7H4ByCIxnCYJ1A/7+0BY5DbnZaEle9Uskni9rnNkActXL5+c32ONaMdu98OwTDChPlAS4uNNO6OY8yiuBMdPybF2TQG6+7YHTC1u6mn0yeSKP3RHws9nR7wXPCCumei16H/GYPXhgMVO3Xb00M9b9kUFNINCyOro0oEkYWbT86wWrtisG6q+qpa5r4e5tI/JO9k5yJoat7f9+hvrRfqzXHWWISNSKWoE2N1m1btfmnGS3KNgGaL2DrtEzb15WXFUdtVhW6XzqyoGtxdJAkMY4BDfSszu/+9tRO8K696z2NEHA643I23n8dA5k1FJ4AfOMVMRB0Nkap9a5/1I/Q==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=h+H+KWWznYDZbzRocsU4waH+x7nlb0rOSWPx+Tcw1ePoU4A8rzAbs5CPxD9ipbjELLzBvgj8Tdo2xr4aLBk5SWhhhf90v4eqp1snKfX8yFDpbCfTf4k5epjI6+qCyUSHC0B+YSAJwGPe1EWX8kGD/6yrH1oS236ZZc7vmF6vOQU0A/eHjlSL65MLIWa6RiL7BqY1OCOQr/RUuLODiSiM4OBd9EZBWuYKlJNJCKErg6hoEQw5cjcjGMIvMgDlpebkEh3X6GRMzJ1hFl670IbWsbiZK+wVm2AmIVIlRCiHiTPeEQcCI156dIPpED1Lj3pPKqVmmfidF6ynZjFu1PQ0Vw==
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Jason Andryuk <jason.andryuk@xxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Mon, 09 Feb 2026 14:38:23 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Mon Feb 9, 2026 at 1:52 PM CET, Jan Beulich wrote:
> On 09.02.2026 12:56, Alejandro Vallejo wrote:
>> On Mon Feb 9, 2026 at 11:15 AM CET, Jan Beulich wrote:
>>> On 09.02.2026 11:05, Alejandro Vallejo wrote:
>>>> On Mon Feb 9, 2026 at 10:21 AM CET, Jan Beulich wrote:
>>>>> On 06.02.2026 17:15, Alejandro Vallejo wrote:
>>>>>> High level description
>>>>>> ======================
>>>>>>
>>>>>> When compared to the RFC this makes a different approach The series 
>>>>>> introduces
>>>>>> cpu_vendor() which maps to a constant in the single vendor case and to
>>>>>> (boot_cpu_data.vendor & X86_ENABLED_VENDORS), where X86_ENABLED_VENDORS 
>>>>>> is a
>>>>>> mask of the compile-time chosen vendors. This enables the compiler to 
>>>>>> detect
>>>>>> dead-code at the uses and remove all unreachable branches, including in
>>>>>> switches.
>>>>>>
>>>>>> When compared to the x86_vendor_is() macro introduced in the RFC, this is
>>>>>> simpler. It achieves MOST of what the older macro did without touching 
>>>>>> the
>>>>>> switches, with a few caveats:
>>>>>>
>>>>>>   1. Compiled-out vendors cause a panic, they don't fallback onto the 
>>>>>> unknown
>>>>>>      vendor case. In retrospect, this is a much saner thing to do.
>>>>>
>>>>> I'm unconvinced here. Especially our Centaur and Shanghai support is at 
>>>>> best
>>>>> rudimentary. Treating those worse when configured-out than when 
>>>>> configured-in
>>>>> feels questionable.
>>>>
>>>> Isn't that the point of configuring out?
>>>
>>> That's what I'm unsure about.
>> 
>> I'm really missing what you're trying to make, sorry. How, if at all, is it
>> helpful for a hypervisor with a compiled out vendor to be bootable on a 
>> machine
>> of that vendor?
>
> No more and no less than for a system with CPUs from a vendor we don't have
> support for at all. Let's assume someone wants to start adding support for
> a new vendor. They may first try Xen as-is. This wouldn't panic.
> on how exactly they would start adding stuff, Xen might suddenly panic,
> despite functionally nothing having changed.

There's an "unknown CPU vendor" option for that. With that option enabled
unknown vendor keep working as intended. If it booted in the first place it's
because the unknown vendor option was enabled. The panic would happen iff they
add their own vendor to the code, hook it up on the lookup function and miss
the X86_ENABLED_VENDORS addition. Which, incidentally, would notify them right
away through a panic rather than having them waste time trying to trigger code
DCE is intentionally removing.

>
>>>> Besides the philosophical matter of whether or not a compiled-out vendor
>>>> should be allowed to run there's the more practical matter of what to do
>>>> with the x86_vendor field of boot_cpu_data. Because at that point our take
>>>> that cross-vendor support is forbidden is a lot weaker. If I can run an
>>>> AMD-hypervisor on an Intel host, what then? What policies would be 
>>>> allowed? If I
>>>> wipe x86_vendor then policies with some unknown vendor would be fine. 
>>>> Should the
>>>> leaves match too? If I do not wipe the field, should I do black magic to 
>>>> ensure
>>>> the behaviour is different depending on whether the vendor is compiled in 
>>>> or
>>>> not? What if I want to migrate a VM currently running in this hypothetical
>>>> hypervisor? The rules becomes seriously complex.
>>>>
>>>> It's just a lot cleaner to take the stance that compiled out vendors can't 
>>>> run.
>>>> Then everything else is crystal clear and we avoid a universe's worth of 
>>>> corner
>>>> cases. I expect upstream Xen to support all cases (I'm skeptical about the
>>>> utility of the unknown vendor path, but oh well), but many downstreams 
>>>> might
>>>> benefit from killing off support for vendors they really will never touch.
>>>
>>> To them, will panic()ing (or not) make a difference?
>> 
>> One would hope not because the're compiling them out for a reason.
>> But for upstream, not panicking brings a sea of corner cases. The ones I
>> mentioned above is not the whole list.
>> 
>> Turning the question around. Who benefits from not panicking?
>
> Certain things may work. But more generally - see above. Turning this
> question around also isn't quite appropriate imo: You want to introduce
> the panic(), so it's on you to justify doing so (which includes making
> clear why omitting that small piece of code would be a bad idea).

The justification is twofold. To aleviate complexity in Xen, and fullful a
security invariant in the presence of misuse. Letting a guest run is easy (not
quite a oneliner, but not much more). However it raises a number of questions
with complicated answers:

Save/restore | Live migration
=============================

The biggest hurdle, as I mentioned, is live-migrations/save-restore.

If I xl restore a Hygon VM on a host running in "unknown" mode (being an
AMD-only hypervisor with the unknown vendor cfg enabled on an Centaur host): 
What
should the vendor policy check say when I restore?

    a. They are incompatible: Because Hygon is compiled out.
    b. They are incompatible: Because Hygon doesn't match the vendor bytes of 
the host.
    b. They are compatible: Because they are both "unknown".
    c. What if the host _was_ Hygon, but Hygon was disabled?

It's just a mess. And the fact that the unknown vendor path has even less
testing than that of Centaur we can probably count on it being extremely buggy.

And on the topic of the consequences of bugs, there's the other argument...

Security argument
=================

If an AMD-only hypervisor runs on Intel hardware it just wouldn't be safe. We
can make statements that this configuration being unsupported and known unsafe,
and everything along those lines but mistakes happen. A mistake in production
where a vendor-trimmed hypervisor lands on the very trimmed vendor means a
security hole. And saying "that's a 'you' problem" doesn't cut it, because I can
make the same statement to users and developers of new vendors.

I can only see very theoretical use cases for the lack of a panic and a very
clear danger of misuse. early_cpu_init() runs after COM ports are configured so
it's not like we'd reboot without stating why.

================================================

If it's a hard requirement, I can change it so the panic behaves as it did
on the RFC, but IMO it creates real problems for the benefit of imaginary use
cases.

Cheers,
Alejandro



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.