Re: [Xen-devel] [PATCH RFC v2] Add SUPPORT.md



On 24/10/17 11:27, George Dunlap wrote:
> On 10/23/2017 06:55 PM, Andrew Cooper wrote:
>> On 23/10/17 17:22, George Dunlap wrote:
>>> On 09/11/2017 06:53 PM, Andrew Cooper wrote:
>>>> On 11/09/17 18:01, George Dunlap wrote:
>>>>> +### x86/RAM
>>>>> +
>>>>> +    Limit, x86: 16TiB
>>>>> +    Limit, ARM32: 16GiB
>>>>> +    Limit, ARM64: 5TiB
>>>>> +
>>>>> +[XXX: Andy to suggest what this should say for x86]
>>>> The limit for x86 is either 16TiB or 123TiB, depending on
>>>> CONFIG_BIGMEM.  CONFIG_BIGMEM is exposed via menuconfig without
>>>> XEN_CONFIG_EXPERT, so falls into at least some kind of support statement.
>>>>
>>>> As for practical limits, I don't think it's reasonable to claim anything
>>>> which we can't test.  What are the specs in the MA colo?
>>> At the moment the "Limit" tag specifically says that it's theoretical
>>> and may not work.
>>>
>>> We could add another tag, "Limit-tested", or something like that.
>>>
>>> Or, we could simply have the Limit-security be equal to the highest
>>> amount which has been tested (either by osstest or downstreams).
>>>
>>> For simplicity's sake I'd go with the second one.
>> I think it would be very helpful to distinguish the upper limits from
>> the supported limits.  There will be a large difference between the two.
>>
>> Limit-Theoretical and Limit-Supported ?
> Well "supported" without any modifiers implies "security supported".  So
> perhaps we could just `s/Limit-security/Limit-supported/;` ?

By this, you mean use Limit-Supported throughout this document?  That
sounds like a good plan.
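
Something along these lines, presumably (illustrative only; the
Limit-Supported figure would have to come from whatever osstest and the
downstreams actually test):

    ### x86/RAM

        Limit-Theoretical: 16TiB (123TiB with CONFIG_BIGMEM)
        Limit-Supported: <highest configuration actually tested>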

>
>>>>> +    Limit, x86 HVM: 128
>>>>> +    Limit, ARM32: 8
>>>>> +    Limit, ARM64: 128
>>>>> +
>>>>> +[XXX Andrew Cooper: Do we want to add "Limit-Security" here for some of 
>>>>> these?]
>>>> 32 for each.  64 vcpu HVM guests can exert enough p2m lock pressure to
>>>> trigger a 5 second host watchdog timeout.
>>> Is that "32 for x86 PV and x86 HVM", or "32 for x86 HVM and ARM64"?  Or
>>> something else?
>> The former.  I'm not qualified to comment on any of the ARM limits.
>>
>> There are several non-trivial for_each_vcpu() loops in the domain_kill
>> path which aren't handled by continuations.  ISTR 128 vcpus is enough to
>> trip a watchdog timeout when freeing pagetables.
> I don't think 32 is a really practical limit.

What do you mean by practical here, and what evidence are you basing
this on?

Amongst other things, there is an ABI boundary in Xen at 32 vcpus, and
given how often handling of that boundary gets broken in Linux, it's
clear that there isn't regular testing happening beyond this limit.
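
(To be concrete about the boundary: shared_info only has vcpu_info
slots for the first 32 vcpus, so a guest wanting more has to register
per-vcpu structures explicitly.  A rough sketch, using Linux-style
names; virt_to_mfn() and offset_in_page() stand in for however the
guest locates the structure:)

    /* Sketch (Linux-style names): below vcpu 32 a guest can use
     * shared_info->vcpu_info[cpu] directly; at or above 32 it must
     * register its own vcpu_info with VCPUOP_register_vcpu_info. */
    #include <xen/interface/vcpu.h>

    static int setup_vcpu_info(unsigned int cpu, struct vcpu_info *vi)
    {
        struct vcpu_register_vcpu_info info = {
            .mfn    = virt_to_mfn(vi),
            .offset = offset_in_page(vi),
        };

        return HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, cpu, &info);
    }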

> I'm inclined to say that if a rogue guest can crash a host with 33 vcpus, we 
> should issue an XSA
> and fix it.

The reason XenServer limits at 32 vcpus is that I can crash Xen with a
64 vcpu HVM domain.  The reason it hasn't been my top priority to fix
this is because there is very little customer interest in pushing this
limit higher.

Obviously, we should fix issues as and when they are discovered, and
work towards increasing the limits in the long term, but saying "this
limit seems too low, so let's provisionally set it higher" is
short-sighted and a recipe for more XSAs.
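
(On the domain_kill loops mentioned further up, the missing piece is
the usual preemption pattern, roughly as below.  free_vcpu_state() is
hypothetical, and real code would also have to record how far it had
got, so the restarted hypercall doesn't redo work:)

    /* Rough sketch only.  for_each_vcpu() and hypercall_preempt_check()
     * are real Xen helpers; free_vcpu_state() is made up for the example. */
    static int relinquish_vcpu_state(struct domain *d)
    {
        struct vcpu *v;

        for_each_vcpu ( d, v )
        {
            free_vcpu_state(v);

            if ( hypercall_preempt_check() )
                return -ERESTART; /* caller turns this into a continuation */
        }

        return 0;
    }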

>>>>> +
>>>>> +### x86 PV/Event Channels
>>>>> +
>>>>> +    Limit: 131072
>>>> Why do we call out event channel limits but not grant table limits? 
>>>> Also, why is this x86?  The 2l and fifo ABIs are arch agnostic, as far
>>>> as I am aware.
>>> Sure, but I'm pretty sure that ARM guests don't (perhaps cannot?) use PV
>>> event channels.
>> This is mixing the hypervisor API/ABI capabilities with the actual
>> abilities of guests (which is also different to what Linux would use in
>> the guests).
> I'd say rather that you are mixing up the technical abilities of a
> system with user-facing features.  :-)  At the moment there is no reason
> for any ARM user to even think about event channels, so there's no
> reason to bother them with the technical details.  If at some point that
> changes, we can modify the document.

You do realise that receiving an event is entirely asymmetric with
sending an event?

Even on ARM, {net,blk}front needs to speak event_{2l,fifo} with Xen to
bind and use its interdomain event channel(s) with {net,blk}back.
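
In other words, even a guest which takes delivery through its interrupt
controller still has to allocate and advertise the port via the generic
event-channel hypercall.  A rough sketch of the frontend side, using
Linux-style wrapper names:

    /* Sketch (Linux-style names): frontend end of an interdomain event
     * channel.  How the event is eventually delivered (2l bitmap, fifo,
     * or via the virtual interrupt controller) is a separate question. */
    #include <xen/interface/event_channel.h>

    static int alloc_frontend_port(domid_t backend, evtchn_port_t *port)
    {
        struct evtchn_alloc_unbound alloc = {
            .dom        = DOMID_SELF,
            .remote_dom = backend,
        };
        int rc = HYPERVISOR_event_channel_op(EVTCHNOP_alloc_unbound, &alloc);

        if ( rc == 0 )
            *port = alloc.port; /* written to xenstore; the backend binds
                                   its end with EVTCHNOP_bind_interdomain */

        return rc;
    }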

>
>> ARM guests, as well as x86 HVM with APICV (configured properly) will
>> actively want to avoid the guest event channel interface, because it's
>> slower.
>>
>> This solitary evtchn limit serves no useful purpose IMO.
> There may be a point to what you're saying: The event channel limit
> normally manifests itself as a limit on the number of guests / total
> devices.
>
> On the other hand, having these kinds of limits around does make sense.
>
> Let me give it some thought.  (If anyone else has any opinions...)

The event_fifo limit is per-domain, not system-wide.

In general this only matters for a monolithic dom0, as it is one end of
each event channel in the system.
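
(For reference, the 131072 figure in the draft is just the FIFO ABI's
per-domain port space; from the public header, as I recall:)

    #define EVTCHN_FIFO_LINK_BITS   17
    #define EVTCHN_FIFO_NR_CHANNELS (1 << EVTCHN_FIFO_LINK_BITS) /* 131072 */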

>
>>>>> +## High Availability and Fault Tolerance
>>>>> +
>>>>> +### Live Migration, Save & Restore
>>>>> +
>>>>> +    Status, x86: Supported
>>>> * x86 HVM with nested-virt (no relevant information included in the stream)
>>> [snip]
>>>> Also, features such as vNUMA and nested virt (which are two I know for
>>>> certain) have all state discarded on the source side, because they were
>>>> never suitably plumbed in.
>>> OK, I'll list these, as well as PCI pass-through.
>>>
>>> (Actually, vNUMA doesn't seem to be on the list!)
>>>
>>> And we should probably add a safety-catch to prevent a VM started with
>>> any of these from being live-migrated.
>>>
>>> In fact, if possible, that should be a whitelist: Any configuration that
>>> isn't specifically known to work with migration should cause a migration
>>> command to be refused.
>> Absolutely everything should be in whitelist form, but Xen has 14 years
>> of history to clean up after.
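
For what it's worth, a whitelist check like the one George describes
would live in the toolstack.  Purely as a sketch (none of these names
exist in libxl; the features listed are just the ones called out in
this thread as losing state on migrate):

    #include <stdbool.h>

    /* Hypothetical configuration summary; illustrative only. */
    struct guest_cfg {
        bool pci_passthrough;
        bool nested_virt;
        bool vnuma;
        bool vpmu;
        bool altp2m;
    };

    /* Whitelist semantics: refuse to migrate unless every feature in
     * use is explicitly known to work with migration. */
    static bool cfg_is_known_migrateable(const struct guest_cfg *cfg)
    {
        return !(cfg->pci_passthrough || cfg->nested_virt ||
                 cfg->vnuma || cfg->vpmu || cfg->altp2m);
    }
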
>>
>>> What about the following features?
>> What do you mean "what about"?  Do you mean "are they migrate safe?"?
> "Are they compatible with migration", yes.  By which I mean, "Do they
> operate as one would reasonably expect?"
>
>>>  * Guest serial console
>> Which consoles?  A qemu emulated-serial will be qemu's problem to deal
>> with.  Anything xenconsoled based will be the guests problem to deal
>> with, so pass.
> If the guest sets up extra consoles, these will show up in some
> appropriately-discoverable place after the migrate?

That is a complete can of worms.  Where do you draw the line?  Log files
will get spliced across the migrate point, and `xl console $DOM` will
terminate, but whether this is "reasonably expected" is very subjective.

>
>>>  * Crash kernels
>> These are internal to the guest until the point of crash, at which point
>> you may need SHUTDOWN_soft_reset support to crash successfully.  I don't
>> think there is any migration interaction.
> For some reason I thought you had to upload your kernel before the soft
> reset.  If the crash kernel lives entirely in the guest until the crash
> actually happens, then yes, this should be safe.
>
>>>  * Transcendent Memory
>> Excluded from security support by XSA-17.
>>
>> Legacy migration claimed to have TMEM migration support, but the code
>> was sufficiently broken that I persuaded Konrad to not block Migration
>> v2 on getting TMEM working again.  Its current state is "will be lost on
>> migrate if you try to use it", because it also turns out it is
>> nontrivial to work out if there are TMEM pages needing moving.
>>
>>>  * Alternative p2m
>> Lost on migrate.
>>
>>>  * vMCE
>> There appears to be code to move state in the migrate stream.  Whether
>> it works or not is an entirely different matter.
>>
>>>  * vPMU
>> Lost on migrate.  Furthermore, levelling vPMU is far harder than
>> levelling CPUID.  Anything using vPMU and migrated to non-identical
>> hardware is likely to blow up at the destination when a previously
>> established PMU setting now takes a #GP fault.
>>
>>>  * Intel Platform QoS
>> Not exposed to guests at all, so it has no migration interaction atm.
> Well suppose a user limited a guest to using only 1k of L3 cache, and
> then saved and restored it.  Would she be surprised that the QoS limit
> disappeared?
>
> I think so, so we should probably call it out.

Oh - you mean the xl configuration.

A quick `git grep` says that libxl_psr.c isn't referenced by any other
code in libxl, which means that the settings almost certainly get lost
on migrate.

>
>>>  * Remus
>>>  * COLO
>> These are both migration protocols themselves, so don't really fit into
>> this category.  Anything which works in normal migration should work when
>> using these.
> The question is, "If I have a VM which is using Remus, can I call `xl
> migrate/(save+restore)` on it?"

There is no such thing as "A VM using Remus/COLO" which isn't migrating.

Calling `xl migrate` a second time is user error, and they get to keep
all the pieces.

>
> I.e., suppose I have a VM on host A (local) being replicated to host X
> (remote) via REMUS.  Can I migrate that VM to host B (also local), while
> maintaining the replication to host X?
>
> Sounds like the answer is "no", so these are not compatible.

I think your expectations are off here.

To move a VM which is using Remus/COLO, you let it fail over to the
destination and then start replicating it again to a third location.

Attempting to do what you describe is equivalent to `xl migrate $DOM $X
& xl migrate $DOM $Y` and expecting any pieces to remain intact.

(As a complete guess) what will most likely happen is that one stream
will get memory corruption, and the other stream will take a hard error
on the source side, because both of them are trying to be the
controlling entity for logdirty mode.  One stream has logdirty turned
off behind its back, and the other gets a hard error for trying to
enable logdirty mode a second time.

>
>>>  * PV protocols: Keyboard, PVUSB, PVSCSI, PVTPM, 9pfs, pvcalls?
>> Pass.  These will have far more to do with what is arranged in the
>> receiving dom0 by the toolstack.
> No, no pass.  This is exactly the question:  If I call "xl migrate" or
> "xl save+xl restore" on a VM using these, will the toolstack on receive
> / restore re-arrange these features in a sensible way?
>
> If the answer is "no", then these are not compatible with migration.

The answer is no until proved otherwise.  I do not know the answer to
these (hence the pass), although I heavily suspect the answer is
definitely no for PVTPM.

>
>> PVTPM is the only one I'm aware of with state held outside of the rings,
>> and I'm not aware of any support for moving that state.
>>
>>>  * FLASK?
>> I don't know what you mean by this.  Flask is a setting in the
>> hypervisor, and isn't exposed to the guest.
> Yes, so if I as an administrator give a VM a certain label limiting or
> extending its functionality, and then I do a migrate/save+restore, will
> that label be applied afterwards?
>
> If the answer is 'no' then we need to specify it.

I don't know the answer.

~Andrew
