Re: Design session "grant v3"
On 22.09.22 20:43, Jan Beulich wrote:
> On 22.09.2022 15:42, Marek Marczykowski-Górecki wrote:
>> Jürgen: today there are two grant formats. v1 supports only up to
>> 16TB addresses; v2 solves the 16TB issue but introduces several more
>> features^Wbugs. v2 is 16 bytes per entry, v1 is 8 bytes per entry,
>> and v2 has a more complicated interface to the hypervisor.
>> virtio could use a per-device grant table; currently the virtio-iommu
>> device is used, a slow interface.
>> v3 could be a grant tree (like IOMMU page tables) instead of a flat
>> array, with separate trees for each grantee. It could support sharing
>> large pages too, and it makes it easier to have more grants,
>> contiguous grant numbers etc.
>> Two options to distinguish trees (from the HV PoV):
>> - the sharing guest ensures distinct grant ids between (multiple)
>>   trees
>> - the HV tells the guest the index under which a tree got registered
>> v3 can be an addition to v1/v2, with the old formats still used for
>> simpler cases where a tree is overkill.
>> The hypervisor needs extra memory to keep refcounts - resource
>> allocation discussion.
>
> How would refcounts be different from today? Perhaps I don't have a
> clear enough picture yet how you envision the tree-like structure(s)
> to be used.

What was meant here are the additional resources the hypervisor will
need for higher grant counts of a guest. With the tree approach the
number of grant frames will basically be controlled by the guest, and
imposing a limit like today wouldn't work very well (especially with
the current default of only 64 grant frames).

>> The HV could have a TLB to speed up mapping.
>> Issue with v1/v2: the granter cannot revoke pages from an
>> uncooperative backend. The tree could have a special page for
>> revoking grants (redirect to that page).
>> Special domids, local to the guest: a toolstack restarting a backend
>> could request to keep the same virtual domid.
>> Marek: that requires a stateless (or recoverable) protocol; reusing a
>> domid currently causes issues.
>> Andrei: how could revoking work?
>> Jürgen: there needs to be a hypercall replacing and invalidating the
>> mapping (scan page tables?), possibly adjusting the IOMMU etc; it may
>> fail, which is problematic for PV.
>
> Why would this be problematic for PV only? In principle any number of
> mappings of a grant are possible also for PVH/HVM. So all of them
> would need finding and replacing. Because of the multiple mappings,
> the M2P is of no use here.

It is an additional layer in the PV case: even when mapping a foreign
page to only a single local PFN there could be multiple PTEs
referencing it.

I didn't think of the problem of doing multiple mappings of the same
grant. I will look into that.

While thinking about this I started wondering in how far things are
actually working correctly right now for backends in PVH/HVM: any
mapping of a grant is handed to p2m_add_page(), which insists on there
being exactly one mapping of any particular MFN, unless the page is a
foreign one. But how does that allow a domain to map its own grants,
e.g. when block-attaching a device locally in Dom0? Afaict the
grant-map would succeed, but the page would be unmapped from its
original GFN.
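For reference, the two existing entry layouts mentioned at the top of
the notes, condensed from Xen's public grant_table.h (the sub-page and
transitive v2 variants are omitted here):

    #include <stdint.h>

    typedef uint16_t domid_t;

    /* v1: 8 bytes per entry.  The 32-bit frame number is what caps v1
     * at 16TB: 2^32 frames * 4KiB = 16TiB of addressable memory. */
    struct grant_entry_v1 {
        uint16_t flags;  /* GTF_* type/access flags, updated by Xen */
        domid_t  domid;  /* domain allowed to map/access the frame  */
        uint32_t frame;  /* GFN of the shared frame                 */
    };

    /* v2: 16 bytes per entry; the 64-bit frame lifts the 16TB limit. */
    struct grant_entry_header {
        uint16_t flags;
        domid_t  domid;
    };

    union grant_entry_v2 {
        struct grant_entry_header hdr;
        struct {
            struct grant_entry_header hdr;
            uint32_t pad0;
            uint64_t frame;  /* full 64-bit GFN */
        } full_page;
        /* ... sub-page and transitive variants omitted ... */
    };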
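The tree format itself is only an idea so far; as a purely illustrative
sketch (the 16-byte entry layout, 4KiB frames and two-level depth are
assumptions, not a proposal), a lookup could resemble an IOMMU
page-table walk, with an unpopulated subtree simply meaning "no such
grant":

    #include <stddef.h>
    #include <stdint.h>

    /* Assumed 16-byte v3 leaf entry (shape invented for illustration). */
    struct grant_entry_v3 {
        uint16_t flags;
        uint16_t domid;
        uint32_t pad;
        uint64_t frame;
    };

    #define ENTRIES_PER_LEAF (4096 / sizeof(struct grant_entry_v3)) /* 256 */
    #define PTRS_PER_NODE    (4096 / sizeof(void *))                /* 512 */

    /* Interior node: one guest-provided frame of pointers to leaves. */
    struct gnt_tree_node {
        struct grant_entry_v3 *slot[PTRS_PER_NODE]; /* NULL: unpopulated */
    };

    /* Two-level walk: @ref selects a leaf frame, then an entry in it.
     * This covers 512 * 256 = 128K refs; a deeper tree extends the
     * range the same way IOMMU tables do, and a leaf entry could also
     * describe a superpage instead of a single 4KiB frame. */
    static struct grant_entry_v3 *
    gnt_tree_lookup(struct gnt_tree_node *root, uint32_t ref)
    {
        uint32_t leaf_idx  = ref / ENTRIES_PER_LEAF;
        uint32_t entry_idx = ref % ENTRIES_PER_LEAF;

        if ( leaf_idx >= PTRS_PER_NODE || !root->slot[leaf_idx] )
            return NULL;  /* hole in the tree: grant not populated */
        return &root->slot[leaf_idx][entry_idx];
    }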
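The revoke operation equally has no defined interface yet. A
hypothetical shape, combining the redirect-page idea above with the
per-call scratch page that comes up below (every name here is invented
for illustration; none of this exists in Xen today):

    /* Hypothetical GNTTABOP_revoke: ask Xen to find every mapping of
     * the grant (PV page tables, p2m, IOMMU), replace each with the
     * caller-provided scratch frame so stale backend writes land in a
     * sink page, then release the grant.  This is where the "may fail"
     * concern bites: on PV a single mapped grant may be referenced by
     * multiple PTEs, all of which need finding. */
    struct gnttab_revoke {
        /* IN */
        uint32_t ref;          /* grant reference to revoke           */
        uint64_t scratch_gfn;  /* per-call sink frame for stale I/O   */
        /* OUT */
        int16_t  status;       /* GNTST_okay, or an error if e.g. an
                                * active mapping cannot be replaced   */
    };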
>> Yann: can the backend refuse revoking?
>> Jürgen: it shouldn't be this way, but revoke could be controlled by a
>> feature flag; revoke could pass a scratch page per revoke call (more
>> flexible control).
>
> A single scratch page comes with the risk of data corruption, as all
> I/O would be directed there. A sink page (for memory writes) would
> likely be okay, but device writes (memory reads) can't be done from a
> surrogate page.

I don't see that problem. In case the grant is revoked due to a
malicious/buggy backend, you can't trust the I/O data anyway. And in
case the frontend is revoking the grant because the frontend itself is
malicious, this isn't an issue either.

>> Marek: what about unmap notification?
>> Jürgen: revoke could even be async; a ring page for unmap
>> notifications.
>> Marek: downgrading mappings (rw -> ro)?
>> Jürgen: must be careful not to allow crashing the backend.
>> Jürgen: we should consider an interface for mapping large pages ("map
>> this area as a large page if the backend shared it as a large page").
>
> s/backend/frontend/ I guess?

Yes. But large pages have another downside: the backend needs to know
it is a large page, otherwise it might get confused. So while this
sounds like a nice idea, it is cumbersome in practice. But maybe
someone will come up with a nice idea how to solve that.

>> Edwin: what happens when shattering that large page?
>> Jürgen: on live migration pages are rebuilt anyway, so large pages
>> can be reconstructed.
>
> If only we did already rebuild large pages ...

Indeed. But OTOH shattering shouldn't be a problem, at least for
PVH/HVM guests, as we are speaking of gfns here. And PV guests don't
have large pages anyway.
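One hypothetical way to address the "backend needs to know" concern
would be to make superpage mappings opt-in and report back the order
actually used; again purely a sketch, not an existing or proposed
interface:

    /* Hypothetical GNTTABOP_map_area: the backend asks to map a whole
     * shared area and is told which page order it got, so it is never
     * silently handed a large mapping it doesn't know about. */
    struct gnttab_map_area {
        /* IN */
        uint32_t first_ref;  /* first grant ref of the shared area    */
        uint32_t nr_frames;  /* area size in 4KiB frames              */
        uint32_t flags;      /* e.g. an opt-in "superpage ok" flag    */
        /* OUT */
        uint32_t order;      /* 0: 4KiB mappings; 9: one 2MiB mapping,
                              * granted only if the frontend shared
                              * the area as a large page              */
        int16_t  status;     /* GNTST_* result                        */
    };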
Juergen