
Re: [PATCH v3 5/8] domain: map/unmap GADDR based shared guest areas


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
  • Date: Wed, 27 Sep 2023 16:53:38 +0200
  • Cc: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>
  • Delivery-date: Wed, 27 Sep 2023 14:53:55 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

On Wed, May 03, 2023 at 05:57:09PM +0200, Jan Beulich wrote:
> The registration by virtual/linear address has downsides: At least on
> x86 the access is expensive for HVM/PVH domains. Furthermore for 64-bit
> PV domains the areas are inaccessible (and hence cannot be updated by
> Xen) when in guest-user mode, and for HVM guests they may be
> inaccessible when Meltdown mitigations are in place. (There are yet
> more issues.)
> 
> In preparation of the introduction of new vCPU operations allowing to
> register the respective areas (one of the two is x86-specific) by
> guest-physical address, flesh out the map/unmap functions.
> 
> Noteworthy differences from map_vcpu_info():
> - areas can be registered more than once (and de-registered),
> - remote vCPU-s are paused rather than checked for being down (which in
>   principle can change right after the check),
> - the domain lock is taken for a much smaller region.
> 
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> ---
> RFC: By using global domain page mappings the demand on the underlying
>      VA range may increase significantly. I did consider to use per-
>      domain mappings instead, but they exist for x86 only. Of course we
>      could have arch_{,un}map_guest_area() aliasing global domain page
>      mapping functions on Arm and using per-domain mappings on x86. Yet
>      then again map_vcpu_info() doesn't (and can't) do so.

I guess it's fine as you propose it for now; we can look into using
per-domain mappings later.
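
Not something for this series, but FWIW the aliasing mentioned in the
RFC note could look roughly like the below (just a sketch; the
arch_map_guest_area() name and signature are made up for illustration):

    /* Hypothetical arch hook: per-domain mappings on x86, global
     * domain page mappings elsewhere. */
    #ifndef CONFIG_X86
    static inline void *arch_map_guest_area(struct page_info *pg)
    {
        return __map_domain_page_global(pg);
    }
    #endif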

> RFC: In map_guest_area() I'm not checking the P2M type, instead - just
>      like map_vcpu_info() - solely relying on the type ref acquisition.
>      Checking for p2m_ram_rw alone would be wrong, as at least
>      p2m_ram_logdirty ought to also be okay to use here (and in similar
>      cases, e.g. in Argo's find_ring_mfn()). p2m_is_pageable() could be
>      used here (like altp2m_vcpu_enable_ve() does) as well as in
>      map_vcpu_info(), yet then again the P2M type is stale by the time
>      it is being looked at anyway without the P2M lock held.

check_get_page_from_gfn() already does some type checks on the page.
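
If memory serves, that at least covers the paging and sharing special
cases, roughly along these lines (paraphrased from memory, not the
verbatim source):

    page = get_page_from_gfn(d, gfn_x(gfn), &p2mt,
                             readonly ? P2M_ALLOC : P2M_UNSHARE);
    if ( p2m_is_paging(p2mt) )      /* paged out: populate and retry */
        rc = -EAGAIN;
    else if ( p2m_is_shared(p2mt) ) /* shared entry: retry as well */
        rc = -EAGAIN;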

> ---
> v2: currd -> d, to cover mem-sharing's copy_guest_area(). Re-base over
>     change(s) earlier in the series. Use ~0 as "unmap" request indicator.
> 
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -1576,7 +1576,82 @@ int map_guest_area(struct vcpu *v, paddr
>                     struct guest_area *area,
>                     void (*populate)(void *dst, struct vcpu *v))
>  {
> -    return -EOPNOTSUPP;
> +    struct domain *d = v->domain;
> +    void *map = NULL;
> +    struct page_info *pg = NULL;
> +    int rc = 0;

Should you check/assert that size != 0?
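
E.g. something as simple as (sketch):

    ASSERT(size);    /* a zero-size area registration makes no sense */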

> +    if ( ~gaddr )

I guess I will find out in later patches, but why this special
handling for a ~0 address?

It might be worth adding a comment here, or a note in the patch
description.  Otherwise I would expect a ~0 address to simply fail the
first check for page boundary crossing.
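
E.g. something like the below, given the v2 note that ~0 is the
"unmap" request indicator (wording is only a suggestion):

    /* gaddr == ~0 requests unmapping of the currently registered area. */
    if ( ~gaddr )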

> +    {
> +        unsigned long gfn = PFN_DOWN(gaddr);
> +        unsigned int align;
> +        p2m_type_t p2mt;
> +
> +        if ( gfn != PFN_DOWN(gaddr + size - 1) )
> +            return -ENXIO;
> +
> +#ifdef CONFIG_COMPAT
> +        if ( has_32bit_shinfo(d) )
> +            align = alignof(compat_ulong_t);
> +        else
> +#endif
> +            align = alignof(xen_ulong_t);
> +        if ( gaddr & (align - 1) )

IS_ALIGNED() might be clearer.
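
I.e. the equivalent (IMO easier to read):

    if ( !IS_ALIGNED(gaddr, align) )
        return -ENXIO;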

> +            return -ENXIO;
> +
> +        rc = check_get_page_from_gfn(d, _gfn(gfn), false, &p2mt, &pg);
> +        if ( rc )
> +            return rc;
> +
> +        if ( !get_page_type(pg, PGT_writable_page) )
> +        {
> +            put_page(pg);
> +            return -EACCES;
> +        }
> +
> +        map = __map_domain_page_global(pg);
> +        if ( !map )
> +        {
> +            put_page_and_type(pg);
> +            return -ENOMEM;
> +        }
> +        map += PAGE_OFFSET(gaddr);
> +    }
> +
> +    if ( v != current )
> +    {
> +        if ( !spin_trylock(&d->hypercall_deadlock_mutex) )
> +        {
> +            rc = -ERESTART;
> +            goto unmap;
> +        }
> +
> +        vcpu_pause(v);
> +
> +        spin_unlock(&d->hypercall_deadlock_mutex);
> +    }
> +
> +    domain_lock(d);
> +
> +    if ( map )
> +        populate(map, v);

The call to map_guest_area() in copy_guest_area() passes NULL as the
populate parameter, so won't this crash?

Should you either assert that populate != NULL or change the condition
to map && populate?  (Both sketched below.)
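
I.e. either of (sketches):

    ASSERT(populate);

or:

    if ( map && populate )
        populate(map, v);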

> +
> +    SWAP(area->pg, pg);
> +    SWAP(area->map, map);
> +
> +    domain_unlock(d);
> +
> +    if ( v != current )
> +        vcpu_unpause(v);
> +
> + unmap:
> +    if ( pg )
> +    {
> +        unmap_domain_page_global(map);
> +        put_page_and_type(pg);
> +    }
> +
> +    return rc;
>  }
>  
>  /*
> @@ -1587,9 +1662,24 @@ int map_guest_area(struct vcpu *v, paddr
>  void unmap_guest_area(struct vcpu *v, struct guest_area *area)
>  {
>      struct domain *d = v->domain;
> +    void *map;
> +    struct page_info *pg;
>  
>      if ( v != current )
>          ASSERT(atomic_read(&v->pause_count) | atomic_read(&d->pause_count));
> +
> +    domain_lock(d);
> +    map = area->map;
> +    area->map = NULL;
> +    pg = area->pg;
> +    area->pg = NULL;

FWIW you could also use SWAP() here by initializing both map and pg to
NULL (I know it uses one extra local variable).
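
Something along these lines (sketch):

    void *map = NULL;
    struct page_info *pg = NULL;

    domain_lock(d);
    SWAP(area->map, map);
    SWAP(area->pg, pg);
    domain_unlock(d);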

Thanks, Roger.
