Re: [PATCH v3 1/2] IOMMU/x86: disallow device assignment to PoD guests
On Thu, Feb 03, 2022 at 09:31:03AM +0100, Jan Beulich wrote:
> On 02.02.2022 17:13, Roger Pau Monné wrote:
> > On Tue, Jan 04, 2022 at 10:41:32AM +0100, Jan Beulich wrote:
> >> @@ -359,7 +360,10 @@ p2m_pod_set_mem_target(struct domain *d,
> >>
> >> ASSERT( pod_target >= p2m->pod.count );
> >>
> >> - ret = p2m_pod_set_cache_target(p2m, pod_target, 1/*preemptible*/);
> >> + if ( has_arch_pdevs(d) || cache_flush_permitted(d) )
> >
> > Is it possible to have cache flush allowed without any PCI device
> > assigned? AFAICT the iomem/ioport_caps would only get setup when there
> > are device passed through?
>
> One can assign MMIO or ports to a guest the raw way. That's not
> secure, but functionally explicitly permitted.
>
> > TBH I would be fine if we just say that PoD cannot be used in
> > conjunction with an IOMMU, and just check for is_iommu_enabled(d) here.
> >
> > I understand it's technically possible for PoD to be used together
> > with a domain that will later get a device passed through once PoD is
> > no longer in use, but I doubt there's much value in supporting that
> > use case, and I fear we might be introducing corner cases that could
> > create issues in the future. Overall I think it would be safer to just
> > disable PoD in conjunction with an IOMMU.
>
> I consider it wrong to put in place such a restriction, but I could
> perhaps accept you and Andrew thinking this way if this was the only
> aspect playing into here. However, this would then want an equivalent
> tools side check, and while hunting down where to make the change as
> done here, I wasn't able to figure out where that alternative
> adjustment would need doing. Hence I would possibly(!) buy into this
> only if someone else took care of doing so properly in the tool stack
> (including the emission of a sensible error message).
What about the (completely untested) chunk below:
diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
index d7a40d7550..e585ef4c5c 100644
--- a/tools/libs/light/libxl_create.c
+++ b/tools/libs/light/libxl_create.c
@@ -1160,17 +1160,16 @@ int libxl__domain_config_setdefault(libxl__gc *gc,
pod_enabled = (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV) &&
(d_config->b_info.target_memkb < d_config->b_info.max_memkb);
- /* We cannot have PoD and PCI device assignment at the same time
+ /* We cannot have PoD and an active IOMMU at the same time
* for HVM guest. It was reported that IOMMU cannot work with PoD
* enabled because it needs to populated entire page table for
- * guest. To stay on the safe side, we disable PCI device
- * assignment when PoD is enabled.
+ * guest.
*/
if (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV &&
- d_config->num_pcidevs && pod_enabled) {
+ d_config->c_info.passthrough != LIBXL_PASSTHROUGH_DISABLED &&
+ pod_enabled) {
ret = ERROR_INVAL;
- LOGD(ERROR, domid,
- "PCI device assignment for HVM guest failed due to PoD enabled");
+ LOGD(ERROR, domid, "IOMMU cannot be enabled together with PoD");
goto error_out;
}
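[Editorial note: for reference, a guest configuration that would trip the proposed libxl check might look like the illustrative xl cfg fragment below. The values are made up; the point is that maxmem > memory makes pod_enabled true, while any passthrough setting other than disabled now conflicts with it.]

```
# maxmem > memory enables populate-on-demand (PoD)
memory = 1024
maxmem = 2048
# any enabled passthrough mode would now be rejected together with PoD
passthrough = "enabled"
```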
> Finally this still leaves out the "raw MMIO / ports" case mentioned
> above.
But the raw MMIO 'mode' doesn't care much about PoD: if there's no PCI
device assigned there's no IOMMU setup, and thus such raw MMIO regions
(could?) belong to a device that isn't constrained by the guest p2m
anyway?
> >> --- a/xen/common/vm_event.c
> >> +++ b/xen/common/vm_event.c
> >> @@ -639,7 +639,7 @@ int vm_event_domctl(struct domain *d, st
> >>
> >> rc = -EXDEV;
> >> /* Disallow paging in a PoD guest */
> >> - if ( p2m_pod_entry_count(p2m_get_hostp2m(d)) )
> >> + if ( p2m_pod_active(d) )
> >
> > Isn't it fine to just check for entry_count like you suggest in the
> > change to libxl?
>
> I didn't think it would be, but I'm not entirely sure: If paging was
> enabled before a guest actually starts, it wouldn't have any entries
> but still be a PoD guest if it has a non-empty cache. The VM event
> folks may be able to clarify this either way. But ...
>
> > This is what p2m_pod_entry_count actually does (rather than entry_count |
> > count).
>
> ... you really mean "did" here, as I'm removing p2m_pod_entry_count()
> in this patch. Of course locking could be added to it instead (or
> p2m_pod_get_mem_target() be used in its place), but I'd prefer if we
> could go with just the check which precisely matches what the comment
> says (IOW otherwise I'd need to additionally know what exactly the
> comment is to say).
Could you briefly mention this in the commit message? I.e. the VM event
code is also adjusted to make sure PoD is not in use and cannot be
used during the guest's lifetime?
Thanks, Roger.