[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] x86/ept: limit calls to memory_type_changed()


  • To: Roger Pau Monne <roger.pau@xxxxxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: Andrew Cooper <Andrew.Cooper3@xxxxxxxxxx>
  • Date: Mon, 26 Sep 2022 18:03:24 +0000
  • Accept-language: en-GB, en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=citrix.com; dmarc=pass action=none header.from=citrix.com; dkim=pass header.d=citrix.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=pXW/vj+YwTiqrobAJ7BbUg1GgTQPb1zjGrdTbyLZBDQ=; b=gUyBkKXPA57KigJspkunI7TIFA9DjRJYfXbLDFYyRjCPr4VIKq7wtQu68u3/iw6aqHpQLNRimcec2u0esqcGG/+ZayYu7Kd9JO1UdffIs9pFdLGoJnV5TxCvjCl1FDU9PsVMQQWecoYeS4Io3BF4pmhA6Se7PVHwjHbl2xBO367SQXKT9CLCP7hJYe6Z1ItsxBCtmCPU9G7za2lnt0unINWxrOXGR857K59QIyMkIYpNAQbRAfWJRZnERsLG6pB1RsviTIlkSIJGlR3VUGzwQtiia6oGYa+Mwpg9f1sGHLNsXX3QGTNhtK5sUKX1ZVWBpTf5qOrKjWYSKfYFKUXQiw==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=S1w1nky5XQR+QdD2+zxwLp1tKI8vBLZOgZ52CVIHd9chzpTqWw5hjGYSl+qyx+MNZIgBvQ6Lp4M0Qe4TfClG2S6ZDr0NK9nHmW3u18HjmXOEq0SDwUrksNpZ2KZZuxeMV16fjbatYLyp9nJJ5VwYg773sGfKBrQFrzdgC0tMhzGbxXKZxaqHz89jq0Ootb1OSGSYKgm25KkVFvF96alvR3AaTb6hAXk87XhVbTzGMM6/z+lrKrbvk5WbP8uxx20W5Ds4tlSgGqWF9fpA5tpFo8+BIwO48uFrvjXY6N8WVN58MFqLSAUoJO0jOBWderbrRE9XXnkF3RCC+uz4/Kv1iQ==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=citrix.com;
  • Cc: Jan Beulich <jbeulich@xxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <George.Dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>
  • Delivery-date: Mon, 26 Sep 2022 18:03:58 +0000
  • Ironport-data: A9a23:BlFtmazSCW25jvAw1SN6t+f2xyrEfRIJ4+MujC+fZmUNrF6WrkUCm 2YXXWvTM/jYMzD9c4p2bI+18RxVsZDTxt9hHQtrryAxQypGp/SeCIXCJC8cHc8zwu4v7q5Dx 59DAjUVBJlsFhcwnj/0bv676yEUOZigHtLUEPTDNj16WThqQSIgjQMLs+Mii8tjjMPR7zml4 LsemOWCfg7+s9JIGjhMsfjb+Ukx5K2aVA4w5TTSW9ga5DcyqFFNZH4vDfnZB2f1RIBSAtm7S 47rpF1u1jqEl/uFIorNfofTKiXmcJaLVeS9oiM+t5yZqgpDvkQPPpMTb5LwX6v1ZwKhxLidw P0V3XC5pJxA0qfkwIzxWDEAe81y0DEvFBYq7hFTvOTKp3AqfUcAzN10BkQGFosAx9pnCDBt8 OQSOS8xZxKc0rfeLLKTEoGAh+wFBeyyZsYzny8lyjvUS/E7XZrEXqPGo8dC2ys9jdxPGvCYY NcFbT1ob1LLZBgn1lU/Ucpi2rv3wCSuNWQG+Dp5poJui4TX5CV33KLgL5z+fduSSN8Otk2Zu njH7yLyBRRy2Nm3mWHVoyv13rCncSXTeaApJrKSzfNWuESX4lUYVho8eUOSvqzs4qK5c5cFQ 6AOwQI+oK53+EG1Q93VWxyjvGXCrhMaQ8BXEeAx9EeK0KW8ywSTC3UATzVBQMc7r8JwTjsvv neWm/v5CDopt6eaIVqf67OVoDWaKSUTa2gYakcsVhAZ6tPupIUyiBPnTdt5FqOxyNrvFlnNL yuiqSE/g/AfiJAN3qDip1Tf2Wvy9t7OUxI/4RjRUiS99ARlaYW5Zouur1/G8fJHK4XfRV6E1 JQZp/WjACk1JcnlvESwrC8lR9lFO97t3OXgvGNS
  • Ironport-hdrordr: A9a23:bQH34q6xqNOpCchuCgPXwWuBI+orL9Y04lQ7vn2ZFiY5TiXIra qTdaogviMc0AxhI03Jmbi7Scq9qeu1z+853WBjB8bZYOCAghrlEGgC1/qp/9SEIUHDH4FmpM BdmsRFaeEYSGIK9foSgzPIXOrIouP3lpxA7N22pxgCcegpUdAY0+4TMHf4LqQCfngjOXNPLu v42iMonVqdUEVSSv7+KmgOXuDFqdGOvonhewQ6Cxku7xTLpS+06ZbheiLonys2Yndq+/MP4G LFmwv26uGIqPeg0CLR0GfV8tB/hMbh8N1eH8aB4/JlagkEyzzYJ7iJaYfy+Qzdk9vfrGrCV+ O85CvICv4DqU85uFvF5ycFlTOQiQrGoEWStGNwyUGT3fARAghKRfapzLgpDCcwoSAbza5B+b MO0GSDu5VNCxTc2Cz7+tjTThlv0lG5uHw4jIco/jViuKYlGchsRLYkjTVoOYZFGDi/5JEsEe FoAs2Z7PFKcUmCZ3ScumV02tSjUnk6Ax/DGyE5y4eo+ikTmGo8w1oTxcQZkHtF/JUhS4Nc7+ CBNqhzjrlBQsIfcKo4DuYcRsm8DHDLXHv3QSqvCEWiELtCN2PGqpbx7rlw7Oa2eIYQxJ93g5 jFWEMwjx9HR6svM7z64HRmyGG/fIzmZ0Wd9ih33ekIhpTsALz2LCaEVFci18O9vvR3OLyoZ8 qO
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHYzp0jOSXqM10oEkmgUDyRmTrHkK3yB08A
  • Thread-topic: [PATCH] x86/ept: limit calls to memory_type_changed()

On 22/09/2022 17:05, Roger Pau Monne wrote:
> memory_type_changed() is currently only implemented for Intel EPT, and
> results in the invalidation of EMT attributes on all the entries in
> the EPT page tables.  Such invalidation causes EPT_MISCONFIG vmexits
> when the guest tries to access any gfns for the first time, which
> results in the recalculation of the EMT for the accessed page.  The
> vmexit and the recalculations are expensive, and as such should be
> avoided when possible.
>
> Remove the call to memory_type_changed() from
> XEN_DOMCTL_memory_mapping: there are no modifications of the
> iomem_caps ranges anymore that could alter the return of
> cache_flush_permitted() from that domctl.
>
> Calls to memory_type_changed() resulting from changes to the domain
> iomem_caps or ioport_caps ranges are only relevant for EMT
> calculations if the IOMMU is not enabled, and the call has resulted in
> a change to the return value of cache_flush_permitted().

This, and the perf problem Citrix have found, is caused by a more
fundamental bug which I identified during XSA-402.

Xen is written with assumption that cacheability other than WB is
dependent on having devices.  While this is perhaps true of current
configurations available, it is a layering violation, and will cease
being true in order to support encrypted RAM (and by extension,
encrypted VMs).

At the moment, we know the IOMMU-ness of a domain right from the outset,
but the cacheability permits are dynamic, based on the non-emptyness of
the domain's device list, ioport list, and various others.

All the memory_type_changed() calls here are to cover the fact that the
original design was buggy by not having the cacheability-ness part of
domain create in the first place.

The appropriate fix, but definitely 4.18 work at this point, is to have
a new CDF flag which permits the use of non-WB cacheability.

For testing purposes alone, turning it on on an otherwise "plain VM" is
useful (its how I actually debugged XSA-402, and the only sane way to go
about investigating the MTRR per disasters for VGPU VMs[1]), but for
regular usecases, it wants cross-checking with the IOMMU flag (and
encrypted VM flag in the future), and for all dynamic list checks to
turn into a simple 'd->config & CDF_full_cacheability'.

This way, we delete all calls to memory_type_changed() which are trying
to cover the various dynamic lists becoming empty/non-empty, and we
remove several ordering-of-hypercalls bugs where non-cacheable mappings
can't actually be created on a VM declared to have an IOMMU until a
device has actually been assigned to start with.

~Andrew

[1] MTRR handling is also buggy with reduced cacheability, causing some
areas of RAM to be used UC; notably the grant table.  This manifests as
PV device perf being worse than qemu-emulated device perf, only when a
GPU is added to a VM[2].  Instead of fixing this properly, it was hacked
around by forcing IPAT=1 for Xenheap pages, which only "fixed" the
problem on Intel (AMD has no equivalent mechanism), and needs reverting
and fixing properly (i.e. get the vMTRR layout working correctly) to
support VMs with encrypted RAM.

[2] There's a second bug with memory_type_changed() in that it causes
dreadful system performance during VM migration, which is something to
do with the interaction of restoring vMTRRs for a VM that has a device
but isn't running yet.  This still needs investigating, and I suspect
it's got a similar root cause.

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.