
Re: [Xen-devel] HVMlite ABI specification DRAFT A



On 05/02/16 18:05, Tim Deegan wrote:
> At 17:14 +0000 on 05 Feb (1454692488), Andrew Cooper wrote:
>> On 05/02/16 16:01, Tim Deegan wrote:
>>> At 18:48 +0100 on 04 Feb (1454611694), Roger Pau Monné wrote:
>>>> Hello,
>>>>
>>>> I've Cced a bunch of people who have expressed interest in the HVMlite
>>>> design/implementation, both from a Xen or OS point of view. If you
>>>> would like to be removed, please say so and I will remove you in
>>>> further iterations. The same applies if you want to be added to the Cc.
>>>>
>>>> This is an initial draft on the HVMlite design and implementation. I've
>>>> mixed certain aspects of the design with the implementation, because I
>>>> think we are quite tied by the implementation possibilities in certain
>>>> aspects, so not speaking about it would make the document incomplete. I
>>>> might be wrong on that, so feel free to comment otherwise if you would
>>>> prefer a different approach. At least this should get the conversation
>>>> started into a couple of pending items regarding HVMlite. I don't want
>>>> to spoil the fun, but IMHO they are:
>>>>
>>>>  - Local APIC: should we _always_ provide a local APIC to HVMlite
>>>>    guests?
>>>>  - HVMlite hardware domain: can we get rid of the PHYSDEV ops and PIRQ
>>>>    event channels?
>>>>  - HVMlite PCI-passthrough: can we get rid of pciback/pcifront?
>>> FWIW, I think we should err on the side of _not_ emulating hardware or
>>> providing ACPI; if the hypervisor interfaces are insufficient/unpleasant
>>> we should make them better.
>>>
>>> I understand that PCI passthrough is difficult because the hardware
>>> design is so awkward to retrofit isolation onto.  But I'm very
>>> uncomfortable with the idea of faking out things like PCI root
>>> complexes inside the hypervisor -- as a way of getting rid of qemu
>>> it's laughable.
>> Most certainly not.
>>
>> 90% of the necessary PCI infrastructure is already in the hypervisor,
>> and actively used for tracking interrupt mask bits.  Some of this was
>> even introduced in XSAs, and isn't going away.
> This is the chance to _make_ it go away.  If we commit to modelling
> IO-APICs and PCI bridges now, we'll be stuck with it for a while.

HVMLite at the moment has no emulated devices, and we definitely want to
keep that option available.

Both FreeBSD and Linux expect an LAPIC, and this appears to be a common
assumption (reasonably so, as the LAPIC is part of the CPU these
days).  I think it is worth offering an LAPIC by default, but retaining
the ability for the admin to configure it off.
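
For illustration, a guest configuration along those lines might look
something like the sketch below (option names are based on today's xl
syntax for device-model-less guests and may well change; "apic" is the
existing HVM toggle):

    # HVMlite-style guest: no device model, LAPIC exposed by default
    builder              = "hvm"
    device_model_version = "none"  # no qemu / emulated platform
    apic                 = 1       # expose a local APIC (proposed default)
    # apic               = 0       # admin opt-out, if we keep the knob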

Hardware extensions such as APICV/AVIC require LAPIC emulation for the
guest, and I expect there will be demand for such a configuration
simply for the performance benefit (ARAT is a common clocksource for
guests, and with these extensions it can be used without any Xen
interaction).
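
For reference, ARAT is advertised in CPUID.06H:EAX, bit 2.  A minimal,
purely illustrative check that a guest could perform (plain C, using
the GCC/Clang cpuid intrinsic) would be:

    #include <cpuid.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* ARAT: the LAPIC timer keeps running in deep C-states, so it can
     * be relied upon as a clocksource.  Advertised in CPUID leaf 6. */
    static bool has_arat(void)
    {
        unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

        if ( !__get_cpuid(6, &eax, &ebx, &ecx, &edx) )
            return false;

        return eax & (1u << 2);     /* CPUID.06H:EAX[2] == ARAT */
    }

    int main(void)
    {
        printf("ARAT %ssupported\n", has_arat() ? "" : "not ");
        return 0;
    }

Whether a guest actually sees the bit depends, of course, on the CPUID
policy Xen applies to it.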

>
> I'm not suggesting that we have to stick with pcifront, and I
> appreciate the argument that at some point Xen must control the PCI
> devices, but it doesn't follow that emulated hardware is the ABI Xen
> should expose for that.

I don't think the IOAPIC or PCI bridges should be in the base ABI. 
Apologies if I gave that impression.

I expect the overwhelming majority of HVMLite domains will be used
without PCI passthrough.


However, if passthrough is wanted, these devices are going to be
needed, one way or another.

>
>> Yes, this does involve adding a little extra emulation to Xen, but the
>> benefits are a substantially cleaner architecture for device models,
>> which doesn't require them to self-coordinate about their layout, or
>> have to talk to Qemu directly to negotiate hotplug notifications.
> Now that's a different thing altogether -- emulated device models
> presenting as PCI devices.  And here I still disagree with you -- Xen
> shouldn't have to decide device models' layouts.  That's _policy_, and
> the hypervisor's job is _enforcement_.

I am not suggesting that policy moves into Xen.

Currently, policy is in Qemu, even when multiple device models are
involved, and there is no enforcement anywhere.  All config accesses
must be broadcast to all ioreq servers because Xen has no idea which
ioreq server is serving which devices.  Secondary device models have to
choose a PCI BDF which they know Qemu will ignore accesses for.
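
To spell out the status quo (a conceptual sketch only; the helper names
below are invented and the real interfaces differ in detail), every
ioreq server ends up doing its own filtering:

    /* What each secondary device model effectively has to do today:
     * every config access is delivered to it, and it must silently
     * ignore anything for a BDF it does not own - while hoping that
     * the BDF it picked for itself is one Qemu will also ignore. */
    static bool handle_cfg_access(const ioreq_t *req, uint16_t my_bdf)
    {
        uint16_t bdf = decode_cfg_bdf(req);     /* invented helper */

        if ( bdf != my_bdf )
            return false;                       /* not ours - drop it */

        emulate_cfg_access(req);                /* invented helper */
        return true;
    }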

Instead, Xen should "own" bus 0, and be able to say yes/no to ioreq
servers requesting to set up emulation for a new device.  A traditional
device model would come along saying "I have $A, $B, $C and $D, and they
must be laid out like this".  A secondary device model can come along
and say "I have a hotplug $E. Please choose a free slot for me".

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel