
Re: [Xen-devel] [PATCH v4 6/6] x86/HVM: report the set of enabled emulated devices through CPUID

On 22/01/16 12:43, Roger Pau Monné wrote:
> On 22/01/16 at 11:57, Jan Beulich wrote:
>>>>> On 21.01.16 at 17:51, <roger.pau@xxxxxxxxxx> wrote:
>>> Add a new HVM-specific feature flag that signals the presence of a bitmap
>>> that contains the current set of enabled emulated devices. The bitmap is
>>> placed in the ecx register. The bit fields used in the bitmap are the same
>>> as the ones used in the xen_arch_domainconfig emulation_flags field, and
>>> their meaning can be found at arch-x86/xen.h.
>>> This will allow Xen to enable emulated devices for HVMlite guests in the
>>> future, by having a proper ABI for reporting which devices are enabled.
>> The idea is certainly nice and appreciated, but ...
>>> --- a/xen/include/public/arch-x86/cpuid.h
>>> +++ b/xen/include/public/arch-x86/cpuid.h
>>> @@ -78,12 +78,17 @@
>>>   * HVM-specific features
>>>   * EAX: Features
>>>   * EBX: vcpu id (iff EAX has XEN_HVM_CPUID_VCPU_ID_PRESENT flag)
>>> + * ECX: bitmap of enabled devices, according to the bit fields defined in
>>> + *      arch-x86/xen.h.
>> ... this set of definitions is not currently a stable ABI (limited to
>> hypervisor and tool stack), and if we wanted to make it stable
>> we'd first need to think a little about the complications that may
>> arise if the granularity chosen (think about the PM bit and the
>> discussion around it before your changes went in) turns out to
>> be a problem later on.
> Yes, in fact I'm having second thoughts on the PM flag, and I think I
> should have split it into ACPI_PM and ACPI_TIMER instead.
>> Also at least some of the features can be determined by other
>> means (CPUID, ACPI tables), so I'm not even sure we need all
>> of this, and I'd really prefer to avoid multiple distinct ways to
>> learn of a certain feature, as it's too easy for the two (or more)
>> mechanisms to get out of sync.
> So let's look at the flags and whether there's an existing way to signal
> their presence:
> LAPIC: CPUID.01h:EDX[bit 9]
> IOAPIC: tied to LAPIC (so either both enabled or none).

An IOAPIC is by no means required - they are only for turning legacy
interrupts into MSIs.  It would be perfectly fine for a PVH domain to
have an LAPIC and an SRIOV virtual function, without an IOAPIC at all.

The presence of LAPICs and IOAPICs is described in the MADT ACPI table.
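
As a rough illustration of reading that information, here is a minimal
sketch that walks a MADT already copied into memory and counts the enabled
Local APIC entries and the IOAPIC entries.  The offsets and entry types
follow the ACPI specification; the function and struct names are made up for
this example and are not Xen or Linux code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout constants from the ACPI specification's MADT definition. */
#define MADT_ENTRIES_OFFSET 44u /* 36-byte header + LAPIC address + flags */
#define MADT_TYPE_LAPIC     0u  /* Processor Local APIC structure */
#define MADT_TYPE_IOAPIC    1u  /* I/O APIC structure */

/* Hypothetical result type for this sketch. */
struct apic_counts {
    unsigned int lapics;   /* Local APIC entries with the enabled flag set */
    unsigned int ioapics;  /* I/O APIC entries */
};

/*
 * Walk the interrupt controller structures of an in-memory MADT.
 * Returns 0 on success, -1 if the table looks malformed.
 */
static int madt_count_apics(const uint8_t *madt, size_t size,
                            struct apic_counts *out)
{
    uint32_t length;
    size_t off = MADT_ENTRIES_OFFSET;

    out->lapics = out->ioapics = 0;

    if (size < MADT_ENTRIES_OFFSET || memcmp(madt, "APIC", 4) != 0)
        return -1;

    /* Bytes 4..7 of every ACPI table hold its total length, little-endian. */
    length = (uint32_t)madt[4] | (uint32_t)madt[5] << 8 |
             (uint32_t)madt[6] << 16 | (uint32_t)madt[7] << 24;
    if (length < MADT_ENTRIES_OFFSET || length > size)
        return -1;

    while (off + 2 <= length) {
        uint8_t type = madt[off];
        uint8_t len = madt[off + 1];

        if (len < 2 || off + len > length)
            return -1;

        if (type == MADT_TYPE_LAPIC) {
            /* Flags live at offset 4 of the entry; bit 0 means enabled. */
            if (len >= 8 && (madt[off + 4] & 1))
                out->lapics++;
        } else if (type == MADT_TYPE_IOAPIC) {
            out->ioapics++;
        }

        off += len;
    }

    return 0;
}
```

A table with LAPIC entries and no IOAPIC entry is well-formed, which is the
configuration described above.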

Note also that the cpuid bit is a fast-forward of the hardware enable bit
in the APIC_BASE MSR.  The cpuid bit will disappear from view if you
hardware-disable the LAPIC.
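
To make that relationship concrete, here is a small model of it.  The bit
positions are architectural (CPUID.01h:EDX bit 9, APIC_BASE MSR bit 11); the
helper names are invented for this sketch, and it works on plain values
rather than executing CPUID or RDMSR.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_1_EDX_APIC     (1u << 9)    /* CPUID.01h:EDX[9] */
#define APIC_BASE_MSR_ENABLE (1ull << 11) /* APIC global enable bit */

/* True if a CPUID.01h:EDX value advertises an on-chip APIC. */
static bool cpuid_reports_lapic(uint32_t cpuid_1_edx)
{
    return (cpuid_1_edx & CPUID_1_EDX_APIC) != 0;
}

/* True if an APIC_BASE MSR value has the hardware enable bit set. */
static bool apic_base_enabled(uint64_t apic_base_msr)
{
    return (apic_base_msr & APIC_BASE_MSR_ENABLE) != 0;
}

/*
 * Model of the behaviour described above: the EDX value software sees,
 * given the value the CPU would report with the LAPIC enabled and the
 * current APIC_BASE MSR contents.
 */
static uint32_t visible_cpuid_1_edx(uint32_t hw_edx, uint64_t apic_base_msr)
{
    if (!apic_base_enabled(apic_base_msr))
        hw_edx &= ~CPUID_1_EDX_APIC;

    return hw_edx;
}
```

So an OS that only looks at the cpuid bit cannot tell "no LAPIC" apart from
"LAPIC hardware-disabled".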

> HPET: can only be enabled from/with ACPI, since its base memory address
> is not fixed, and we would need to find a way to pass its address to
> the OS in the absence of ACPI.

In reality, there are heuristics to guess if an HPET is present.  The
legacy HPET traditionally always resides at physical address 0xfed00000
(pfn 0xfed00).  Linux even has heuristics to find the legacy HPET based on
the IOH, for when the BIOS doesn't present the HPET properly in ACPI.
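
A minimal sketch of such a guess, assuming the caller has already mapped the
candidate address and read the 64-bit General Capabilities and ID register
at offset 0.  The field layout and the 100ns period limit come from the HPET
specification; the function name is hypothetical and this is not the check
Linux performs.

```c
#include <stdbool.h>
#include <stdint.h>

#define HPET_LEGACY_BASE   0xfed00000ull /* traditional location */
#define HPET_MAX_PERIOD_FS 0x05f5e100u   /* 100ns, in femtoseconds */

/*
 * Decide whether a value read from offset 0 of a candidate HPET block
 * looks like a real capabilities register.
 */
static bool hpet_caps_plausible(uint64_t caps)
{
    uint32_t period_fs = (uint32_t)(caps >> 32); /* COUNTER_CLK_PERIOD */
    uint8_t rev_id = (uint8_t)(caps & 0xff);     /* REV_ID, must not be 0 */

    /*
     * Unbacked MMIO typically reads as all ones.  The period check below
     * would reject that too; it is spelled out here for clarity.
     */
    if (caps == UINT64_MAX)
        return false;

    return rev_id != 0 && period_fs != 0 && period_fs <= HPET_MAX_PERIOD_FS;
}
```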

This leads to an awkward bug where Linux is able to turn off legacy
timer interrupts behind Xen's back and cause carnage for the kdump
environment, as Xen does not know to re-enable legacy interrupts on the
crash path.

> RTC: I don't know of any way to signal the RTC presence, AFAICT it's
> always assumed to be there in the PC architecture. Could maybe return ~0
> when reading from IO port 0x71, but that's meh..., not the best way IMHO.
> PIC: same as RTC, I don't know of any way to signal its presence since
> it's assumed to be there.
> VGA: again I don't think there's an easy way to signal its presence,
> apart from returning ~0 from the multiple IO ports it uses. The fact
> that the 0xA0000-0xBFFFF memory range is also marked as RAM in the e820
> map in HVMlite DomUs should also trigger OSes into disabling VGA due to
> the lack of proper MMIO range, but sadly I think most OSes just assume
> it's there.

VGA can be found by following the VGA routing bit in PCI config space.
This is how real hardware makes the legacy IO ranges reach the graphics
card configured as the primary VGA device.
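
For reference, the routing bit is the VGA Enable bit in the Bridge Control
register of a PCI-to-PCI bridge.  A minimal sketch of testing it on a
snapshot of a bridge's first 64 bytes of config space follows.  The offsets
are from the PCI bridge specification and the macro names mirror the Linux
ones, but the function itself is only an illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_HEADER_TYPE    0x0e /* bits 6:0 give the header layout */
#define PCI_BRIDGE_CONTROL 0x3e /* 16-bit register, type 1 headers only */
#define PCI_BRIDGE_CTL_VGA 0x08 /* forward legacy VGA cycles downstream */

/*
 * Given the first 64 bytes of a function's config space, report whether
 * it is a PCI-to-PCI bridge that forwards the legacy VGA ranges.
 */
static bool bridge_forwards_vga(const uint8_t cfg[64])
{
    uint16_t ctl;

    /* Only type 1 (PCI-to-PCI bridge) headers have Bridge Control. */
    if ((cfg[PCI_HEADER_TYPE] & 0x7f) != 0x01)
        return false;

    ctl = (uint16_t)(cfg[PCI_BRIDGE_CONTROL] |
                     (cfg[PCI_BRIDGE_CONTROL + 1] << 8));

    return (ctl & PCI_BRIDGE_CTL_VGA) != 0;
}
```

Following the bridges with this bit set from the root bus downwards leads to
the bus holding the primary VGA device, if there is one.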

> PIT: assumed to be always present in the PC architecture.

PIT, RTC and PIC have their presence always assumed, but returning ~0 on
reads is completely fine.  A DMLite OS knows it is booting in a
virtualised environment.
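
A guest-side check along those lines could be as simple as the sketch below.
It assumes the caller has already sampled a few of the device's registers
(for the RTC, say, by selecting several indexes via port 0x70 and reading
port 0x71) and only decides whether the samples look like an unbacked port.
The function name is made up, and this is a heuristic rather than a defined
interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if every sampled byte is 0xff, i.e. the reads look like
 * they came from an IO port with no device behind it.  A single 0xff can
 * be a legitimate register value, so several samples should be passed.
 */
static bool reads_look_absent(const uint8_t *samples, size_t n)
{
    size_t i;

    if (n == 0)
        return false; /* no evidence either way */

    for (i = 0; i < n; i++)
        if (samples[i] != 0xff)
            return false;

    return true;
}
```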

> PM: I'm leaning towards splitting this into ACPI_PM and ACPI_TIMER as said
> before. The presence of ACPI_TIMER is described in the ACPI tables, and
> the availability of ACPI_PM (power management) can be inferred from the
> presence of ACPI itself.
> AMD guest IOMMU: AFAICT this seems to be currently disabled, since the
> MMIO range it checks is [~0ULL, ~0ULL + 0x8000]. There is a function to
> change the base address ~0ULL to something else, but it doesn't seem to
> be reachable from any path. In any case, I guess the presence of this
> device will be reported from ACPI.

It is indeed currently disabled.  (See
https://bugs.xenserver.org/browse/XSO-132 if you want to see why.  It
manifested as a very curious bug.)

It will be available via an IVRS ACPI table when implemented.

> So, we have the following devices that are assumed to be there: RTC,
> PIC, PIT. Everything else I think can be signalled by other means
> already available.
> IMHO, I think we could say that the PIC is never going to be available
> to HVMlite guests (in any case we would enable the lapic/ioapic), and
> maybe enable the RTC and PIT by default?
> Then I think we could get away without any Xen-specific way of reporting
> enabled devices.

DMLite is a new container type.  I would far rather it was assumed that
there was no legacy hardware at all.

