Re: Design session "grant v3"
On 22.09.2022 15:42, Marek Marczykowski-Górecki wrote:
> Jürgen: today two grant formats, v1 supports addresses only up to 16TB
> v2 solves 16TB issue, introduces several more features^Wbugs
> v2 is 16 bytes per entry, v1 is 8 bytes per entry, v2 more
> complicated interface to the hypervisor
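For reference, the size difference mentioned above comes from the width of the
frame field. A simplified sketch of the two entry layouts, modelled on Xen's
public grant_table.h (field names are abbreviated here, not a verbatim copy of
the header):

#include <stdint.h>

/* v1: 8 bytes per entry.  The 32-bit frame number limits grantable
 * memory to 2^32 frames * 4KiB = 16TiB, which is the limit referred
 * to above. */
struct grant_entry_v1 {
    uint16_t flags;   /* GTF_* permission/type bits */
    uint16_t domid;   /* domain allowed to map/transfer the page */
    uint32_t frame;   /* frame number being granted */
};

/* v2 "full page" variant: 16 bytes per entry, with a 64-bit frame
 * number, so no 16TiB limit - at the price of a separate shared
 * status array and a more complex hypervisor interface. */
struct grant_entry_v2_full_page {
    uint16_t flags;
    uint16_t domid;
    uint32_t pad;
    uint64_t frame;
};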
> virtio could use per-device grant table, currently virtio iommu
> device, slow interface
> v3 could be a grant tree (like iommu page tables), not a flat array,
> separate trees for each grantee
> could support sharing large pages too
> easier to have more grants, contiguous grant numbers etc
> two options to distinguish trees (from HV PoV):
> - sharing guest ensures distinct grant ids between (multiple) trees
> - hv tells guest the index under which the tree got registered
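Purely to visualise the idea being floated here (nothing below exists in Xen;
all names and the layout are made up for illustration), such a per-grantee tree
could look much like an IOMMU page table, with leaf entries carrying the frame,
access rights and an order field for large-page grants:

#include <stdint.h>

/* Hypothetical v3 leaf entry - one grant.  The order field would let
 * a single entry grant a large page (e.g. order 9 == 2MiB). */
struct grant_v3_leaf {
    uint64_t frame    : 40;  /* frame number being granted */
    uint64_t order    :  5;  /* grant covers 4KiB << order */
    uint64_t readonly :  1;
    uint64_t in_use   :  1;
    uint64_t reserved : 17;
};

/* Hypothetical inner node - indexed by a slice of the grant reference,
 * like a page table level; this is what would make contiguous grant
 * numbers and growing the number of grants cheap. */
struct grant_v3_node {
    uint64_t child_frame[512];  /* assuming 4KiB nodes of 8-byte entries */
};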
> v3 can be an addition to v1/v2, old ones used for simpler cases where a tree
> is overkill
> hypervisor needs extra memory to keep refcounts - resource allocation
> discussion
How would refcounts be different from today? Perhaps I don't have a clear
enough picture yet how you envision the tree-like structure(s) to be used.
> hv could have TLB to speed up mapping
> issue with v1/v2 - granter cannot revoke pages from uncooperative
> backend
> tree could have special page for revoking grants (redirect to that
> page)
> special domids, local to the guest, toolstack restarting backend could
> request to keep the same virtual domid
> Marek: that requires stateless (or recoverable) protocol, reusing domid
> currently causes issues
> Andrei: how revoking could work
> Jürgen: there needs to be hypercall, replacing and invalidating mapping (scan
> page tables?), possibly adjusting IOMMU etc; may fail, problematic for PV
Why would this be problematic for PV only? In principle any
number of mappings of a grant are possible also for PVH/HVM. So
all of them would need finding and replacing. Because of the
multiple mappings, the M2P is of no use here.
While thinking about this I started wondering to what extent things
are actually working correctly right now for backends in PVH/HVM:
Any mapping of a grant is handed to p2m_add_page(), which insists
on there being exactly one mapping of any particular MFN, unless
the page is a foreign one. But how does that allow a domain to
map its own grants, e.g. when block-attaching a device locally in
Dom0? Afaict the grant-map would succeed, but the page would be
unmapped from its original GFN.
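To spell out the revoke flow sketched above: everything below is hypothetical
(the helper names do not exist in Xen and stand in for whatever reverse-mapping
information the hypervisor would need); as noted, without such information
every mapping of the MFN would have to be found by scanning, and any of the
steps may fail.

#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

typedef uint32_t grant_ref_t;
typedef uint64_t mfn_t;                 /* illustrative placeholder types */

struct mapping { void *pte; };          /* location of one mapping to rewrite */

/* Assumed helpers, purely illustrative. */
extern size_t lookup_mappings(grant_ref_t ref, struct mapping *out, size_t max);
extern bool   replace_mapping(struct mapping *m, mfn_t scratch_mfn);
extern void   flush_tlb_and_iommu(void);

/* Hypothetical revoke: block new mappings of the grant, swap every
 * existing mapping (CPU and IOMMU) over to a scratch page, flush,
 * then the page is safe to hand back to the granter. */
int revoke_grant(grant_ref_t ref, mfn_t scratch_mfn)
{
    struct mapping maps[64];
    size_t n = lookup_mappings(ref, maps, 64);

    for (size_t i = 0; i < n; i++)
        if (!replace_mapping(&maps[i], scratch_mfn))
            return -1;                  /* may fail, e.g. for PV page tables */

    flush_tlb_and_iommu();
    return 0;
}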
> Yann: can backend refuse revoking?
> Jürgen: it shouldn't be this way, but revoke could be controlled by feature
> flag; revoke could pass scratch page per revoke call (more flexible control)
A single scratch page comes with the risk of data corruption, as all
I/O would be directed there. A sink page (for memory writes) would
likely be okay, but device writes (memory reads) can't be done from
a surrogate page.
> Marek: what about unmap notification?
> Jürgen: revoke could even be async; ring page for unmap notifications
>
> Marek: downgrading mappings (rw -> ro)
> Jürgen: must be careful not to allow crashing the backend
>
> Jürgen: we should consider interface to mapping large pages ("map this area
> as a large page if backend shared it as large page")
s/backend/frontend/ I guess?
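Just to make the interface idea concrete (purely hypothetical; neither the flag
nor the structure below exists), the map request could carry a "try large page"
hint, with the hypervisor free to fall back to 4KiB mappings if the area cannot
be mapped as a large page:

#include <stdint.h>

#define MAPFLAG_TRY_LARGE_PAGE  (1u << 0)   /* made-up flag */

/* Hypothetical map-grant request covering a whole area at once. */
struct map_grant_request {
    uint32_t first_ref;   /* first grant reference of the area */
    uint32_t count;       /* number of 4KiB frames to map */
    uint64_t guest_addr;  /* destination, suitably aligned for a large page */
    uint32_t flags;       /* e.g. MAPFLAG_TRY_LARGE_PAGE */
};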
> Edwin: what happens when shattering that large page?
> Jürgen: on live migration pages are rebuilt anyway, can reconstruct large
> pages
If only we did already rebuild large pages ...
Jan