
Re: [Xen-devel] [PATCH v4 1/2] docs/designs: Add a design document for non-cooperative live migration


  • To: Paul Durrant <pdurrant@xxxxxxxxxx>, <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>
  • Date: Wed, 29 Jan 2020 19:47:12 +0000
  • Cc: Stefano Stabellini <sstabellini@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Wei Liu <wl@xxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxxxxx>, Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
  • Delivery-date: Wed, 29 Jan 2020 19:47:28 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On 29/01/2020 14:47, Paul Durrant wrote:
> diff --git a/docs/designs/non-cooperative-migration.md 
> b/docs/designs/non-cooperative-migration.md
> new file mode 100644
> index 0000000000..5db3939db5
> --- /dev/null
> +++ b/docs/designs/non-cooperative-migration.md
> @@ -0,0 +1,272 @@
> +# Non-Cooperative Migration of Guests on Xen
> +
> +## Background
> +
> +The normal model of migration in Xen is driven by the guest because it was
> +originally implemented for PV guests, where the guest must be aware it is
> +running under Xen and is hence expected to co-operate. 

For PV guests, it is more than "expected to co-operate".

Migrating a PV guest involves rewriting every pagetable entry with a
different MFN, so even before you consider things like the PV protocols,
there is no way this could be done without the cooperation of the guest.

Sadly, this fact was depended upon for migration of the PV protocols,
and has migrated (excuse the pun) into the HVM world as well.
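
To make that concrete, this is roughly the transformation involved
(illustrative sketch only; m2p() and p2m_new() are made-up stand-ins for
the real M2P/P2M handling in the libxc save/restore code):

#include <stdint.h>

#define PTES_PER_FRAME  512
#define PTE_FRAME_MASK  0x000ffffffffff000ULL
#define PTE_PRESENT     0x1ULL

extern uint64_t m2p(uint64_t mfn);     /* hypothetical: MFN -> PFN on source */
extern uint64_t p2m_new(uint64_t pfn); /* hypothetical: PFN -> MFN on dest   */

/* Save side: rewrite a pagetable frame so it refers to PFNs. */
static void canonicalise_pt_frame(uint64_t *ptes)
{
    for ( unsigned int i = 0; i < PTES_PER_FRAME; i++ )
    {
        if ( !(ptes[i] & PTE_PRESENT) )
            continue;

        uint64_t pfn = m2p((ptes[i] & PTE_FRAME_MASK) >> 12);

        ptes[i] = (ptes[i] & ~PTE_FRAME_MASK) | (pfn << 12);
    }
}

/* Restore side: substitute the MFNs allocated on the destination. */
static void uncanonicalise_pt_frame(uint64_t *ptes)
{
    for ( unsigned int i = 0; i < PTES_PER_FRAME; i++ )
    {
        if ( !(ptes[i] & PTE_PRESENT) )
            continue;

        uint64_t mfn = p2m_new((ptes[i] & PTE_FRAME_MASK) >> 12);

        ptes[i] = (ptes[i] & ~PTE_FRAME_MASK) | (mfn << 12);
    }
}

Every pagetable frame in the guest needs this treatment, on both ends of
the stream.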

> This model dates from
> +an era when it was assumed that the host administrator had control of at least
> +the privileged software running in the guest (i.e. the guest kernel) which may
> +still be true in an enterprise deployment but is not generally true in a cloud
> +environment.

I haven't seen it discussed elsewhere, but even enterprise environments
have problems.

Having host admin == guest admin doesn't mean that guest drivers aren't
buggy, or that the VM doesn't explode on migrate.

The simple fact is that involving the guest kernel adds unnecessary
moving parts which can (and do with a non-zero probability) go wrong.

>  The aim of this design is to provide a model which is purely host
> +driven, requiring no co-operation from the software running in the
> +guest, and is thus suitable for cloud scenarios.
> +
> +PV guests are out of scope for this project because, as is outlined above, they
> +have a symbiotic relationship with the hypervisor and therefore a certain level
> +of co-operation can be assumed.

If nothing else, I'd at least suggest s/can be assumed/is necessary/.

> +Because the service domain’s domid is used directly by the guest in setting
> +up grant entries and event channels, the backend drivers in the new host
> +environment must be provided by a service domain with the same domid. Also,
> +because the guest can sample its own domid from the frontend area and use it in
> +hypercalls (e.g. HVMOP_set_param) rather than DOMID_SELF, the guest domid must
> +also be preserved to maintain the ABI.

Has this been true since forever?  The grant and event APIs took some
care to avoid the guest needing to know its own domid.
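
For reference, the difference between the two styles of call looks like
this (hypercall_hvm_op() and xenstore_read_domid() are made-up stand-ins
for whatever wrappers a guest kernel has; struct xen_hvm_param and the
constants are the real public ABI):

#include <stdint.h>
#include <xen/xen.h>
#include <xen/hvm/hvm_op.h>
#include <xen/hvm/params.h>

extern long hypercall_hvm_op(unsigned int op, void *arg); /* hypothetical */
extern domid_t xenstore_read_domid(void);                 /* hypothetical */

static void set_callback_irq(uint64_t val)
{
    struct xen_hvm_param p = {
        .domid = DOMID_SELF,   /* well-behaved: let Xen resolve the caller */
        .index = HVM_PARAM_CALLBACK_IRQ,
        .value = val,
    };

    hypercall_hvm_op(HVMOP_set_param, &p);
}

static void set_callback_irq_sampled(uint64_t val)
{
    struct xen_hvm_param p = {
        /* What the doc describes: the guest samples its own domid and
         * passes it back explicitly.  This is what forces the domid to
         * be preserved across a non-cooperative migration. */
        .domid = xenstore_read_domid(),
        .index = HVM_PARAM_CALLBACK_IRQ,
        .value = val,
    };

    hypercall_hvm_op(HVMOP_set_param, &p);
}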

> +
> +Furthermore, it will be necessary to modify backend drivers to re-establish
> +communication with frontend drivers without perturbing the content of the
> +backend area or requiring any changes to the values of the xenstore state nodes.
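
(As an aside, a schematic of what such a reconnect path might look like;
the XenbusState values are the real ones from xen/io/xenbus.h, the
helpers are invented:)

#include <xen/io/xenbus.h>

extern enum xenbus_state read_frontend_state(void);   /* hypothetical */
extern int remap_rings_from_saved_grefs(void);        /* hypothetical */
extern void start_normal_handshake(void);             /* hypothetical */

static int backend_attach(void)
{
    if ( read_frontend_state() == XenbusStateConnected )
        /* The frontend believes it never disconnected: re-map the rings
         * from the grant refs already published in the frontend area and
         * carry on, leaving the xenstore state nodes untouched. */
        return remap_rings_from_saved_grefs();

    /* Fresh connection: the usual Initialising -> Connected handshake. */
    start_normal_handshake();
    return 0;
}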
> +
> +## Other Para-Virtual State
> +
> +### Shared Rings
> +
> +Because the console and store protocol shared pages are actually part of the
> +guest memory image (in an E820 reserved region just below 4G) 

Typically*.

Their exact location is entirely up to the domain builder, and they tend
not to be there for PVH guests, which aren't trying to fit the two frames
into a BAR.

> then the content
> +will get migrated as part of the guest memory image. Hence no additional code
> +is required to prevent any guest-visible change in the content.

I do agree with this conclusion however.
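
For what it's worth, where the two rings ended up is discoverable from
the HVM params on the sending side — a sketch using the real
xc_hvm_param_get(), with error handling kept minimal:

#include <inttypes.h>
#include <stdio.h>
#include <xenctrl.h>

static int dump_ring_pfns(xc_interface *xch, uint32_t domid)
{
    uint64_t store_pfn, console_pfn;

    if ( xc_hvm_param_get(xch, domid, HVM_PARAM_STORE_PFN, &store_pfn) ||
         xc_hvm_param_get(xch, domid, HVM_PARAM_CONSOLE_PFN, &console_pfn) )
        return -1;

    /* Wherever the domain builder put them, these are ordinary guest
     * frames, so PAGE_DATA records carry their contents automatically. */
    printf("xenstore ring at pfn %#"PRIx64", console ring at pfn %#"PRIx64"\n",
           store_pfn, console_pfn);

    return 0;
}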

> +### Shared Info
> +
> +There is already a record defined in *libxenctrl Domain Image Format* [3]
> +called `SHARED_INFO` which simply contains a complete copy of the domain’s
> +shared info page. It is not currently included in an HVM (type `0x0002`)
> +migration stream. It may be feasible to include it as an optional record
> +but it is not clear that the content of the shared info page ever needs
> +to be preserved for an HVM guest.
> +
> +For a PV guest the `arch_shared_info` sub-structure contains important
> +information about the guest’s P2M, but this information is not relevant for
> +an HVM guest where the P2M is not directly manipulated via the guest. The other
> +state contained in the `shared_info` structure relates to the domain wall-clock
> +(the state of which should already be transferred by the `RTC` HVM context
> +information which is contained in the `HVM_CONTEXT` save record) and some event
> +channel state (particularly if using the *2l* protocol). Event channel state
> +will need to be fully transferred if we are not going to require the guest's
> +co-operation to re-open the channels and so it should be possible to re-build a
> +shared info page for an HVM guest from such other state.
> +
> +Note that the shared info page also contains an array of `XEN_LEGACY_MAX_VCPUS`
> +(32) `vcpu_info` structures. A domain may nominate a different guest physical
> +address to use for the vcpu info. This is mandatory if a domain wants to
> +use more than 32 vCPUs and optional for legacy vCPUs. This mapping is not
> +currently transferred in the migration state so this will either need to be
> +added into an existing save record, or an additional type of save record will
> +be needed.

For non-cooperative migration in the current ABI, a minimum is to know
where the shared info frame is mapped, so it can be re-mapped on behalf
of the guest on the destination side.

The rest of this section will be very good evidence in the "new guest
ABI" design.

> +### Grant table
> +
> +The grant table is essentially the para-virtual equivalent of an IOMMU.

TBH, I think "shared memory" is a much better analogy than an IOMMU. 
OTOH, perhaps that doesn't cope with the grant copy aspect quite as well
as I'd like.
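
Either way, the reason the backend's domid ends up baked into guest state
falls straight out of the v1 grant entry format (grant_entry_v1 and the
GTF_* flags are the real ABI from xen/grant_table.h; the frontend-side
helper below is schematic):

#include <stdbool.h>
#include <stdint.h>
#include <xen/grant_table.h>

extern grant_entry_v1_t *gnttab;   /* hypothetical: the guest's mapped table */

static void grant_frame_to_backend(grant_ref_t ref, domid_t backend_domid,
                                   uint32_t gfn, bool readonly)
{
    gnttab[ref].domid = backend_domid;  /* the service domain's domid, held  */
    gnttab[ref].frame = gfn;            /* in guest memory, hence the need   */
                                        /* to keep it stable across migration */
    /* (real code needs a write barrier before setting flags) */
    gnttab[ref].flags = GTF_permit_access | (readonly ? GTF_readonly : 0);
}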

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

