[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC XEN PATCH v9 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: "Chen, Jiqian" <Jiqian.Chen@xxxxxxx>
  • Date: Wed, 12 Jun 2024 10:12:18 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=amd.com; dmarc=pass action=none header.from=amd.com; dkim=pass header.d=amd.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=mPNspxN0qPxIyjp0LEj+8zHpDKYloUZF3qw2f1jDq8E=; b=MjzKg/lxjm5wp+6bdEC9BgrUcNuF0qI8FNqrZ+TUAr9YMict0LlLsk+gKZWv2F6Q78U69QqtYRSoB1oLmgxWcXteXr2gRbgDIym9c3jot9G4Tljad1vNUHhQ17/p9gowtOz6hhscZYMEGPVjYHBDi/rkoEy/o0fB+fsv6OVUT8P/mrADDqAr7sQmoaIzwxa9MLzGo0dR+s7nCQRyhRdWRpLy28a7I8GXaDF85e3L/FuF0eTXT54h94t+wES8TTiM7cl8514qC8bSS6tbHHZb/1eD+L/zKnLpZ/4RMLZyK43hMcJSov2imjJqJXFBKb5haPucmx7CsclDRumZmPYRzQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=kHWKtqXsmou//7VUbjbnOASyecYH8nZ1p7ixa4ESHzS7k2gowC7pzRJPnJ91SLXYJukEcmwEHpx5LuBeu8AZWE6hrQP3gDjzlARbocxXP+27i3/IGTpm33BTOzeJGUPg1Xe9WwYcoColfLmBxA+QPrVKpKufNOqe/B+8qzKv/ApcImQtyo+MkttQPxDCSk+nye3/iIm27UnrGekmfLJb1EI5iAqtzS5JqGvsCL7z4lv5eApz597qM9eum1BpIKVwKS+U0REsfAo86yd7ItegwYLxuKlNPtjEXAR9eOsYETX37u3rmt9u0Za1kmaCML+I1ZjQ7dhqzjxP6iSvMpYbhg==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=amd.com;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Anthony PERARD <anthony@xxxxxxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>, "Hildebrand, Stewart" <Stewart.Hildebrand@xxxxxxx>, "Huang, Ray" <Ray.Huang@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "Daniel P . Smith" <dpsmith@xxxxxxxxxxxxxxxxxxxx>, "Chen, Jiqian" <Jiqian.Chen@xxxxxxx>
  • Delivery-date: Wed, 12 Jun 2024 10:12:29 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHauLJn3FT0TCW9KkmRq1iRWqwBt7HCqQ8AgAG9m4A=
  • Thread-topic: [RFC XEN PATCH v9 5/5] domctl: Add XEN_DOMCTL_gsi_permission to grant gsi

Hi Jan,

On 2024/6/11 22:39, Jan Beulich wrote:
> On 07.06.2024 10:11, Jiqian Chen wrote:
>> Some type of domain don't have PIRQ, like PVH, it do not do
>> PHYSDEVOP_map_pirq for each gsi. When passthrough a device
>> to guest on PVH dom0, callstack
>> pci_add_dm_done->XEN_DOMCTL_irq_permission will failed at
>> domain_pirq_to_irq, because PVH has no mapping of gsi, pirq
>> and irq on Xen side.
> 
> All of this is, to me at least, in pretty sharp contradiction to what
> patch 2 says and does. IOW: Do we want the concept of pIRQ in PVH, or
> do we want to keep that to PV?
It's not contradictory.
What I did is not to add the concept of PIRQs for PVH.
All previous passthrough code was implemented on the basis of pv dom0 + hvm 
domU.
For pv dom0, it has PIRQs. For hvm domU, it has PIRQs too.
So the codes are not suitable for PVH dom0 + hvm domU, because PVH dom0 has no 
PIRQs.
Patch 2 do PHYSDEVOP_map_pirq for hvm domU even when dom0 is PVH instead of PV. 
It didn't add PIRQs for PVH.
This patch is to grant irq( that get from gsi ) to hvm domU, why 
XEN_DOMCTL_irq_permission is not useful is because PVH has no PIRQs, we can't 
get irq through pirq like PV does.

> 
>> What's more, current hypercall XEN_DOMCTL_irq_permission require
>> passing in pirq and grant the access of irq, it is not suitable
>> for dom0 that has no PIRQ flag, because passthrough a device
>> needs gsi and grant the corresponding irq to guest. So, add a
>> new hypercall to grant gsi permission when dom0 is not PV or dom0
>> has not PIRQ flag.
>>
>> Signed-off-by: Huang Rui <ray.huang@xxxxxxx>
>> Signed-off-by: Jiqian Chen <Jiqian.Chen@xxxxxxx>
> 
> A problem throughout the series as it seems: Who's the author of these
> patches? There's no From: saying it's not you, but your S-o-b also
> isn't first.
So I need to change to:
Signed-off-by: Jiqian Chen <Jiqian.Chen@xxxxxxx> means I am the author.
Signed-off-by: Huang Rui <ray.huang@xxxxxxx> means Rui sent them to upstream 
firstly.
Signed-off-by: Jiqian Chen <Jiqian.Chen@xxxxxxx> means I take continue to 
upstream.

> 
>> --- a/tools/libs/light/libxl_pci.c
>> +++ b/tools/libs/light/libxl_pci.c
>> @@ -1412,6 +1412,37 @@ static bool pci_supp_legacy_irq(void)
>>  #define PCI_SBDF(seg, bus, devfn) \
>>              ((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
>>  
>> +static int pci_device_set_gsi(libxl_ctx *ctx,
>> +                              libxl_domid domid,
>> +                              libxl_device_pci *pci,
>> +                              bool map,
>> +                              int *gsi_back)
>> +{
>> +    int r, gsi, pirq;
>> +    uint32_t sbdf;
>> +
>> +    sbdf = PCI_SBDF(pci->domain, pci->bus, (PCI_DEVFN(pci->dev, 
>> pci->func)));
>> +    r = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
>> +    *gsi_back = r;
>> +    if (r < 0)
>> +        return r;
>> +
>> +    gsi = r;
>> +    pirq = r;
> 
> r is a GSI as per above; why would you store such in a variable named pirq?
> And how can ...
> 
>> +    if (map)
>> +        r = xc_physdev_map_pirq(ctx->xch, domid, gsi, &pirq);
>> +    else
>> +        r = xc_physdev_unmap_pirq(ctx->xch, domid, pirq);
> 
> ... that value be the correct one to pass into here? In fact, the pIRQ number
> you obtain above in the "map" case isn't handed to the caller, i.e. it is
> effectively lost. Yet that's what would need passing into such an unmap call.
Yes r is GSI and I know pirq will be replaced by xc_physdev_map_pirq.
What I do "pirq = r" is for xc_physdev_unmap_pirq, unmap need passing in pirq,
and the number of pirq is always equal to gsi.

> 
>> +    if (r)
>> +        return r;
>> +
>> +    r = xc_domain_gsi_permission(ctx->xch, domid, gsi, map);
> 
> Looking at the hypervisor side, this will fail for PV Dom0. In which case imo
> you better would avoid making the call in the first place.
Yes, for PV dom0, the errno is EOPNOTSUPP, then it will do below 
xc_domain_irq_permission.

> 
>> +    if (r && errno == EOPNOTSUPP)
> 
> Before here you don't really need the pIRQ number; if all it really is needed
> for is ...
> 
>> +        r = xc_domain_irq_permission(ctx->xch, domid, pirq, map);
> 
> ... this, then it probably also should only be obtained when it's needed. Yet
> overall the intentions here aren't quite clear to me.
Adding the function pci_device_set_gsi is for PVH dom0, while also ensuring 
compatibility with PV dom0.
When PVH dom0, it does xc_physdev_map_pirq and xc_domain_gsi_permission(new 
hypercall for PVH dom0)
When PV dom0, it keeps the same actions as before codes, it does 
xc_physdev_map_pirq and xc_domain_irq_permission.

> 
>> @@ -1485,6 +1516,19 @@ static void pci_add_dm_done(libxl__egc *egc,
>>      fclose(f);
>>      if (!pci_supp_legacy_irq())
>>          goto out_no_irq;
>> +
>> +    r = pci_device_set_gsi(ctx, domid, pci, 1, &gsi);
>> +    if (gsi >= 0) {
>> +        if (r < 0) {
> 
> This unusual way of error checking likely wants a comment.
Will add in next version.

> 
>> +            rc = ERROR_FAIL;
>> +            LOGED(ERROR, domainid,
>> +                  "pci_device_set_gsi gsi=%d (error=%d)", gsi, errno);
>> +            goto out;
>> +        } else {
>> +            goto process_permissive;
>> +        }
> 
> Btw, no need for "else" when the earlier if ends in "goto" or alike.
OK, I will change in next version.

> 
>> @@ -1493,13 +1537,6 @@ static void pci_add_dm_done(libxl__egc *egc,
>>          goto out_no_irq;
>>      }
>>      if ((fscanf(f, "%u", &irq) == 1) && irq) {
>> -        sbdf = PCI_SBDF(pci->domain, pci->bus,
>> -                        (PCI_DEVFN(pci->dev, pci->func)));
>> -        r = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
>> -        /* if fail, keep using irq; if success, r is gsi, use gsi */
>> -        if (r != -1) {
>> -            irq = r;
>> -        }
> 
> If I'm not mistaken, this and ...
> 
>> @@ -2255,13 +2302,6 @@ skip_bar:
>>      }
>>  
>>      if ((fscanf(f, "%u", &irq) == 1) && irq) {
>> -        sbdf = PCI_SBDF(pci->domain, pci->bus,
>> -                        (PCI_DEVFN(pci->dev, pci->func)));
>> -        r = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
>> -        /* if fail, keep using irq; if success, r is gsi, use gsi */
>> -        if (r != -1) {
>> -            irq = r;
>> -        }
> 
> ... this is code added by the immediately preceding patch. It's pretty odd
> for that to be deleted here again right away. Can the interaction of the
> two patches perhaps be re-arranged to avoid this anomaly?
OK, I will do in next version.

> 
>> @@ -237,6 +238,43 @@ long arch_do_domctl(
>>          break;
>>      }
>>  
>> +    case XEN_DOMCTL_gsi_permission:
>> +    {
>> +        unsigned int gsi = domctl->u.gsi_permission.gsi;
>> +        int irq = gsi_2_irq(gsi);
> 
> I'm not sure it is a good idea to issue this call ahead of the basic error
> checks below.
I will move it below the checks.

> 
>> +        bool allow = domctl->u.gsi_permission.allow_access;
> 
> This allows any non-zero values to mean "true". I think you want to bail
> on values larger than 1, much like you also want to check that the padding
> fields are all zero.
Will change in next version.

> 
>> +        /*
>> +         * If current domain is PV or it has PIRQ flag, it has a mapping
>> +         * of gsi, pirq and irq, so it should use XEN_DOMCTL_irq_permission
>> +         * to grant irq permission.
>> +         */
>> +        if ( is_pv_domain(current->domain) || has_pirq(current->domain) )
> 
> Please use currd here (and also again below).
Will change in next version.
> 
>> +        {
>> +            ret = -EOPNOTSUPP;
>> +            break;
>> +        }
>> +
>> +        if ( gsi >= nr_irqs_gsi || irq < 0 )
>> +        {
>> +            ret = -EINVAL;
>> +            break;
>> +        }
>> +
>> +        if ( !irq_access_permitted(current->domain, irq) ||
>> +             xsm_irq_permission(XSM_HOOK, d, irq, allow) )
> 
> Daniel, is it okay to issue the XSM check using the translated value, not
> the one that was originally passed into the hypercall?
> 
>> --- a/xen/arch/x86/io_apic.c
>> +++ b/xen/arch/x86/io_apic.c
>> @@ -955,6 +955,27 @@ static int pin_2_irq(int idx, int apic, int pin)
>>      return irq;
>>  }
>>  
>> +int gsi_2_irq(int gsi)
>> +{
>> +    int entry, ioapic, pin;
>> +
>> +    ioapic = mp_find_ioapic(gsi);
>> +    if ( ioapic < 0 )
>> +        return -1;
> 
> Can this be a proper errno value (likely -EINVAL), please?
Will change in next version.
> 
>> +    pin = gsi - io_apic_gsi_base(ioapic);
> 
> Hmm, instead of the below: Once you have an (ioapic,pin) tuple, can't
> you simply call apic_pin_2_gsi_irq()?
Will change in next version.
> 
>> +    entry = find_irq_entry(ioapic, pin, mp_INT);
>> +    /*
>> +     * If there is no override mapping for irq and gsi in mp_irqs,
>> +     * then the default identity mapping applies.
>> +     */
>> +    if ( entry < 0 )
>> +        return gsi;
>> +
>> +    return pin_2_irq(entry, ioapic, pin);
> 
> Under certain conditions this may return 0. Yet you surely don't want
> to pass IRQ0 back as a result; you want to hand an error back instead.
Will add in next version.
> 
>> --- a/xen/include/public/domctl.h
>> +++ b/xen/include/public/domctl.h
>> @@ -465,6 +465,14 @@ struct xen_domctl_irq_permission {
>>  };
>>  
>>  
>> +/* XEN_DOMCTL_gsi_permission */
>> +struct xen_domctl_gsi_permission {
>> +    uint32_t gsi;
>> +    uint8_t allow_access;    /* flag to specify enable/disable of x86 gsi 
>> access */
>> +    uint8_t pad[3];
>> +};
>> +
>> +
> 
> Nit: No (new) double blank lines please. In fact (didn't I say this before
> already?) you could insert between the two above, such that the existing issue
> also disappears.
I remember I changed it here, maybe I made a mistake somewhere, and I will 
modify it in the next version.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.