[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v4] x86/vmx: save guest non-register state in hvm_hw_cpu
- To: Tamas K Lengyel <tamas.lengyel@xxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
- From: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
- Date: Tue, 22 Mar 2022 15:10:42 +0000
- Accept-language: en-GB, en-US
- Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
- Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=rJ1Xk2Smqnac0gsdG/9UAo8/R8+S4fhGVD96K3UI1Ig=; b=TKJXOnSZXRvNO3tyI9Icr4qm9fRK6qpcoNPO73i6hU3sOGrZa/6hi+sW5r/tu0V8YhkworHGXMuMQ6u4zaB/znL4wH6eiuYMNZRFIkwJvYX2RcR5dIWF8SosJ1UUBkmKveE+8RehI60WQH+U/IdgO4H/z68IDdhriItHwz6ZhcgR/2twd2ZO0qXT41YJRDbxCCWnUZ5rETZBbhtE84AXyyGBP89QacsurBVTpVAIhMG/3XcxxOrXkm7enfPaSkQcegkfj9KYv7+cNBr//3tT+kfW7V6PlRq17IV8TmyElR6FA69YW8joxPFeHguxDzX53B+9C10yxdMw+kafrbYWeA==
- Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=BdGEGu6tk6FB1jHKd0nejgHG+jQZqM1u7nzPJ+Ar9QLKI0dbcerJhlZe09+hToKFqMnnfbpjGZ0RL86cjv8X9qpHZFc40LDr+SOejAkmo48tQ8CjDNfkCceB4+9eLY784u7+oGpMJgL6Vy3u0J4GFyVs9Dg8/+sBApgEu1Wa3pHV9rj/28oi6yZnKmXXOszRaCfiRRJeD+/jMkFu/XV0RUrhEtJZy0KD/LofeAoKlhx9kSE7w1MVMfwOFvZUdbt/QCQmZyBW2+Szsr+B9fE/MZE+V90UdjP+0ovnjiFJj2OxX5A6nLcVxVy/EatHf+jLuywDWd3Cgse2DAg9BRJICA==
- Authentication-results: esa4.hc3370-68.iphmx.com; dkim=pass (signature verified) header.i=@citrix.onmicrosoft.com
- Cc: Jan Beulich <jbeulich@xxxxxxxx>, Roger Pau Monne <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, Jun Nakajima <jun.nakajima@xxxxxxxxx>, Kevin Tian <kevin.tian@xxxxxxxxx>, David Vrabel <dvrabel@xxxxxxxxxx>
- Delivery-date: Tue, 22 Mar 2022 15:10:55 +0000
- Ironport-data: A9a23:esOCOaDJAGSajxVW/97jw5YqxClBgxIJ4kV8jS/XYbTApGgn1jwGm 2AaUGrXOKuDZDGge9x0bt/i8RwDu5LQzdVqQQY4rX1jcSlH+JHPbTi7wuYcHM8wwunrFh8PA xA2M4GYRCwMZiaA4E/raNANlFEkvU2ybuOU5NXsZ2YgHWeIdA970Ug5w7Vj2NYy6TSEK1jlV e3a8pW31GCNg1aYAkpMg05UgEoy1BhakGpwUm0WPZinjneH/5UmJMt3yZWKB2n5WuFp8tuSH I4v+l0bElTxpH/BAvv9+lryn9ZjrrT6ZWBigVIOM0Sub4QrSoXfHc/XOdJFAXq7hQllkPhx1 PAdm6OKEjwJL43Dx90UQxRlTgZxaPguFL/veRBTsOSWxkzCNXDt3+9vHAc9OohwFuRfWD8Us 6ZCcXZUM07F17neLLGTE4GAguwKKsXxMZxZkXZn1TzDVt4tQIzZQrWM7thdtNs1rp4fR6qCP ZtHAdZpRCvMURlOFnccMYNkteaHnnbzVxYJkE3A8MLb5ECMlVcsgdABKuH9eMGMA8NcnU+ap 2fP12X/HhwecteYzFKt8X+yh+mJgSLyXqoTEqG18rhhh1j77mANEhQcWF+TqvC/lke0HdRSN yQ85S4GvaU0skuxQbHVQxS9qWXCuhMaVMtdF8U77h2Azuzf5APxLngJSHtNZcIrsOcyRCc2z RmZktXxHzttvbaJD3WH+d+8rzm/JCwUJm8qfjIfQE0O5NyLnW0opkuRFJA5Svfz14CrX2Grq 9yXkMQgr7UPqJ4l6J/gxnDWoz3zn5HZQzNowQqCCwpJ8ThFTIKiYoWp733S4vBBMJuVQzG9g ZQUpySNxLtQVM/QzURhVM1IRej0vKjdbFUwlHY1R/EcGyKRF2lPlGy6yBV3Pw9XP8kNYlcFi 2eD6FoKtPe/0JZHBJKbgr5d6ex3lMAM9vy/D5g4i+aihLArL2drGwk0OSatM5jFyhRErE3GE c7znTyQJXgbE7976zG9Wv0Q17Qmrghnmz+MFcCkkkv2i+TDDJJwdVvjGAHSBgzexPnZyDg5D v4Fb5fao/mheLOWjtbrHX47cglRcClT6WHeoM1LbO+TSjeK60l6Y8I9NYgJItQ/94wMz7+g1 ijkBidwlQqu7VWaeF7iQi0yN9vSsWNX8CtT0doEZg3zhRDOoO+Hsc8iSnfAVeJ+pbI5kqIsE aVtlgfpKq0ndwkrMg81NPHVhIdjaA6qlUSJOS+kayI4ZJluW0rC/dqMQ+cl3HBm4vaf3Sfmn 4Cd6w==
- Ironport-hdrordr: A9a23:o7DTTqAsop+J8Q7lHegIsceALOsnbusQ8zAXPh9KJiC9I/b1qy nxppkmPEfP+UsssHFJo6HkBEEZKUmsu6KdkrNhQYtKOzOW+VdATbsSorcKpgePJ8SQzJ8l6U 4NSdkcNDS0NykBsS+Y2nj5Lz9D+qj+zEnAv463pB0NLT2CKZsQlDuRYjzrSHGeLzM2YabRYa DsgPav0ADQHkj/AP7LZEUtbqzmnZnmhZjmaRkJC1oM8w+Vlw6l77b8Dlyxwgoeeykn+8ZgzU H11yjCoomzufCyzRHRk0XJ6Y5NpdfnwtxfQOSRl8kuLCn2gArAXvUiZ1TChkFxnAic0idsrD D+mWZnAy210QKJQoiBm2qo5+An6kd315at8y7CvZKpm72HeNtzMbs+uWseSGqF16NohqAN7E oAtVjpxqZ/HFfOmj/w6MPPUAwvnk2ooWA6mepWlHBHV5ACAYUh57D3U3klZKvoMRiKoLzPKt MeR/00JcwmBm+yfjTcpC1i0dasVnM8ElOPRVUDoNWc13xTkGpix0UVycQDljNYnahNB6Vs9q DBKOBlhbtORsgZYeZ0A/oAW9K+DijITQjXOGyfLFz7HOUMOm7LqZTw/LIpjdvaNaAg3d83gt DMQVlYvWk9dwbnDtCPxoRC9lTXTGC0TV3Wu4hjDlhCy8vBrZbQQF++oQoV4ridSt0kc7jmZ8 o=
- List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
- Thread-index: AQHYPWhtORpQ2jUrTESMnz8fIDv8cqzLgyyA
- Thread-topic: [PATCH v4] x86/vmx: save guest non-register state in hvm_hw_cpu
On 21/03/2022 21:12, Tamas K Lengyel wrote:
> During VM forking and resetting a failed vmentry has been observed due
> to the guest non-register state going out-of-sync with the guest register
> state. For example, a VM fork reset right after a STI instruction can trigger
> the failed entry. This is due to the guest non-register state not being saved
> from the parent VM, thus the reset operation only copies the register state.
>
> Fix this by including the guest non-register state in hvm_hw_cpu so that when
> its copied from the parent VM the vCPU state remains in sync.
>
> SVM is not currently wired-in as VM forking is VMX only and saving
> non-register
> state during normal save/restore/migration operation hasn't been needed. If
> deemed necessary in the future it can be wired in by adding a svm-substructure
> to hvm_hw_cpu.
I disagree here. This bug isn't really to do with fuzzing; it's to do
with our APIs being able to get/set vCPU state correctly, and fuzzing is
the example which demonstrated the breakage.
This will also cause very subtle bugs for the guest-transparent
migration work, and the live update work, both of which want to shift
vcpu state behind a VM which hasn't actively entered Xen via hypercall.
(Quick tangent. Seriously, can the SDM be fixed so it actually
enumerates the Activity States.)
Xen doesn't currently support any situation where Intel's idea of
Activity State is anything other than 0. This in turn is buggy, because
we don't encode VPF_blocked anywhere. An activity state of hlt might
not be best place to encode this, because notably absent from Intel's
activity state is mwait. We'll also terminate things like schedop_poll
early.
Next, AMD does have various fields from interruptibility. If you want
me to write the patch then fine, but they should not be excluded from a
patch like this. AMD do not have separate bits for STI and MovSS, due
to microarchitectural differences, but the single INTERRUPT_SHADOW bit
does need managing, as does the blocked-by-NMI bit which is emulated on
SVM and older Intel CPUs.
Minor tangent, blocked-by-SMI needs cross-linking with vm-entry
controls, so I'm not sure it is something we ought to expose.
Next, it turns out that MSR_DEBUGCTL isn't included anywhere (not even
the MSR list). It is important, because it forms part of the VMEntry
cross-check with PENDING_DBG and TF.
Next, we also don't preserve PDPTRs. This is another area where Intel
and AMD CPUs behave differently, but under Intel's architecture, the
guest kernel can clobber the 32bit PAE block of pointers and everything
will function correctly until the next mov into cr3. There are
definitely bugs in Xen's emulated pagewalk in this area.
So there are a lot of errors with hvm_hw_cpu and I bet I haven't found
them all.
As discussed at multiple previous XenSummits, the current load/save mess
needs moving out of Xen, and that would be the correct time to fix the
other errors, but it probably is too much for this case.
At a minimum, there shouldn't be a VMX specific union, because we are
talking about architecture-agnostic properties of the vCPU. It's fine
for the format to follow Intel's bit layout for the subset of bits we
tolerate saving/restoring, but this needs discussing in the header, and
needs rejecting on set. An extra check/reject is 0% overhead for
forking, so I don't find that a credible argument against doing it.
I'm not sure if we want to include the activity state at the moment
without a better idea of how to encode VPF_blocked, but I think DEBUGCTL
does need including. This in turn involves transmitting the LBR MSRs too.
~Andrew
|