[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev


  • To: Jan Beulich <jbeulich@xxxxxxxx>
  • From: "Chen, Jiqian" <Jiqian.Chen@xxxxxxx>
  • Date: Tue, 25 Jun 2024 07:38:24 +0000
  • Accept-language: en-US
  • Arc-authentication-results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=amd.com; dmarc=pass action=none header.from=amd.com; dkim=pass header.d=amd.com; arc=none
  • Arc-message-signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=lIkTpXYcxtuRaBZxR6VLRxSWjDFGw8w+XJBfy5vgzXw=; b=UUe3/8GOSKQteGUTSTbImEIrXgkdH3i2lip/I20akz6RuTHQ2iEOZc56gXNkYPoqqrjLeipFw/OMuvPB/kYqFOy8o3HQroDM6XEK2wJqHeIOsrdq3RI4TaHeWC2HDQCP2rVMnOzVZuYgYLxqVoIYdW4nnx0wUqJw+nyGyC/UgEA82sH7sNFgfK9Bgwi505Mx/fmKtuZnsqtild3DZWNO16CRcNquqAXuAfaz4lQgD4vp97Iid8uYowkBYNAlc1HpZg2ilISjFhfpQuIbcRlrhZi9by9CW1QvjLpK55s3IM/Q0ZRIFKSSsSqJbrrD/LPXUAE4ZknRZ2+5e8a27X+fyQ==
  • Arc-seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=mgmmSTzNuT31ry9QXwdFPYbewGzL5ZkZBW5w3ANwQ9fnfFUopvSJ+c1Cn48ft9ktm8kqKP7UgUsyjzJiqS9BHnMxh3KBx1+hkmJ2OMgzQ7k5hh96AjQLTwyJzJxS0aK70kJXBzPzU1uFCMjn16oIwc9WIdM688STTidfYEox0F+yxOG4ST3cUimX8PPVv2N/Ex74c+71nBMzvaYp/nyan3q4LbXKWA3SwMdo5B/vhJXWDxGnfGQ28Cr0Em1jxUT+pTnINRLnrVZdLPliBLRVeQvs//jHAdctLnUrOLNmtp6MQxX8IwoEHfy6S1rEcJDbulWCF7Hp7w9P1n4MFwWcww==
  • Authentication-results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=amd.com;
  • Cc: Andrew Cooper <andrew.cooper3@xxxxxxxxxx>, Roger Pau Monné <roger.pau@xxxxxxxxxx>, Wei Liu <wl@xxxxxxx>, George Dunlap <george.dunlap@xxxxxxxxxx>, Julien Grall <julien@xxxxxxx>, Stefano Stabellini <sstabellini@xxxxxxxxxx>, Anthony PERARD <anthony@xxxxxxxxxxxxxx>, Juergen Gross <jgross@xxxxxxxx>, "Daniel P . Smith" <dpsmith@xxxxxxxxxxxxxxxxxxxx>, "Hildebrand, Stewart" <Stewart.Hildebrand@xxxxxxx>, "Huang, Ray" <Ray.Huang@xxxxxxx>, "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>, "Chen, Jiqian" <Jiqian.Chen@xxxxxxx>
  • Delivery-date: Tue, 25 Jun 2024 07:38:34 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AQHawJToSxHkdXzldkCDZE5kihITT7HMD9QAgAGTOgD//5t7AIADeOiA//+SewCAAKx5AP//hDUAAD2nHAAAhnaTAABBP3WA
  • Thread-topic: [XEN PATCH v10 4/5] tools: Add new function to get gsi from dev

On 2024/6/24 16:13, Jan Beulich wrote:
> On 21.06.2024 10:15, Chen, Jiqian wrote:
>> On 2024/6/20 18:37, Jan Beulich wrote:
>>> On 20.06.2024 12:23, Chen, Jiqian wrote:
>>>> On 2024/6/20 15:43, Jan Beulich wrote:
>>>>> On 20.06.2024 09:03, Chen, Jiqian wrote:
>>>>>> On 2024/6/18 17:13, Jan Beulich wrote:
>>>>>>> On 18.06.2024 10:10, Chen, Jiqian wrote:
>>>>>>>> On 2024/6/17 23:10, Jan Beulich wrote:
>>>>>>>>> On 17.06.2024 11:00, Jiqian Chen wrote:
>>>>>>>>>> --- a/tools/libs/light/libxl_pci.c
>>>>>>>>>> +++ b/tools/libs/light/libxl_pci.c
>>>>>>>>>> @@ -1406,6 +1406,12 @@ static bool pci_supp_legacy_irq(void)
>>>>>>>>>>  #endif
>>>>>>>>>>  }
>>>>>>>>>>  
>>>>>>>>>> +#define PCI_DEVID(bus, devfn)\
>>>>>>>>>> +            ((((uint16_t)(bus)) << 8) | ((devfn) & 0xff))
>>>>>>>>>> +
>>>>>>>>>> +#define PCI_SBDF(seg, bus, devfn) \
>>>>>>>>>> +            ((((uint32_t)(seg)) << 16) | (PCI_DEVID(bus, devfn)))
>>>>>>>>>
>>>>>>>>> I'm not a maintainer of this file; if I were, I'd ask that for 
>>>>>>>>> readability's
>>>>>>>>> sake all excess parentheses be dropped from these.
>>>>>>>> Isn't it a coding requirement to enclose each element in parentheses 
>>>>>>>> in the macro definition?
>>>>>>>> It seems other files also do this. See 
>>>>>>>> tools/libs/light/libxl_internal.h
>>>>>>>
>>>>>>> As said, I'm not a maintainer of this code. Yet while I'm aware that 
>>>>>>> libxl
>>>>>>> has its own CODING_STYLE, I can't spot anything towards excessive use of
>>>>>>> parentheses there.
>>>>>> So, which parentheses do you think are excessive use?
>>>>>
>>>>> #define PCI_DEVID(bus, devfn)\
>>>>>             (((uint16_t)(bus) << 8) | ((devfn) & 0xff))
>>>>>
>>>>> #define PCI_SBDF(seg, bus, devfn) \
>>>>>             (((uint32_t)(seg) << 16) | PCI_DEVID(bus, devfn))
>>>> Thanks, will change in next version.
>>>>
>>>>>
>>>>>>>>>> @@ -1486,6 +1496,18 @@ static void pci_add_dm_done(libxl__egc *egc,
>>>>>>>>>>          goto out_no_irq;
>>>>>>>>>>      }
>>>>>>>>>>      if ((fscanf(f, "%u", &irq) == 1) && irq) {
>>>>>>>>>> +#ifdef CONFIG_X86
>>>>>>>>>> +        sbdf = PCI_SBDF(pci->domain, pci->bus,
>>>>>>>>>> +                        (PCI_DEVFN(pci->dev, pci->func)));
>>>>>>>>>> +        gsi = xc_physdev_gsi_from_dev(ctx->xch, sbdf);
>>>>>>>>>> +        /*
>>>>>>>>>> +         * Old kernel version may not support this function,
>>>>>>>>>
>>>>>>>>> Just kernel?
>>>>>>>> Yes, xc_physdev_gsi_from_dev depends on the function implemented on 
>>>>>>>> linux kernel side.
>>>>>>>
>>>>>>> Okay, and when the kernel supports it but the underlying hypervisor 
>>>>>>> doesn't
>>>>>>> support what the kernel wants to use in order to fulfill the request, 
>>>>>>> all
>>>>>> I don't know what things you mentioned hypervisor doesn't support are,
>>>>>> because xc_physdev_gsi_from_dev is to get the gsi of pcidev through sbdf 
>>>>>> information,
>>>>>> that relationship can be got only in dom0 instead of Xen hypervisor.
>>>>>>
>>>>>>> is fine? (See also below for what may be needed in the hypervisor, even 
>>>>>>> if
>>>>>> You mean xc_physdev_map_pirq needs gsi?
>>>>>
>>>>> I'd put it slightly differently: You arrange for that function to now 
>>>>> take a
>>>>> GSI when the caller is PVH. But yes, the function, when used with
>>>>> MAP_PIRQ_TYPE_GSI, clearly expects a GSI as input (see also below).
>>>>>
>>>>>>> this IOCTL would be satisfied by the kernel without needing to interact 
>>>>>>> with
>>>>>>> the hypervisor.)
>>>>>>>
>>>>>>>>>> +         * so if fail, keep using irq; if success, use gsi
>>>>>>>>>> +         */
>>>>>>>>>> +        if (gsi > 0) {
>>>>>>>>>> +            irq = gsi;
>>>>>>>>>
>>>>>>>>> I'm still puzzled by this, when by now I think we've sufficiently 
>>>>>>>>> clarified
>>>>>>>>> that IRQs and GSIs use two distinct numbering spaces.
>>>>>>>>>
>>>>>>>>> Also, as previously indicated, you call this for PV Dom0 as well. 
>>>>>>>>> Aiui on
>>>>>>>>> the assumption that it'll fail. What if we decide to make the 
>>>>>>>>> functionality
>>>>>>>>> available there, too (if only for informational purposes, or for
>>>>>>>>> consistency)? Suddenly you're fallback logic wouldn't work anymore, 
>>>>>>>>> and
>>>>>>>>> you'd call ...
>>>>>>>>>
>>>>>>>>>> +        }
>>>>>>>>>> +#endif
>>>>>>>>>>          r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
>>>>>>>>>
>>>>>>>>> ... the function with a GSI when a pIRQ is meant. Imo, as suggested 
>>>>>>>>> before,
>>>>>>>>> you strictly want to avoid the call on PV Dom0.
>>>>>>>>>
>>>>>>>>> Also for PVH Dom0: I don't think I've seen changes to the hypercall
>>>>>>>>> handling, yet. How can that be when GSI and IRQ aren't the same, and 
>>>>>>>>> hence
>>>>>>>>> incoming GSI would need translating to IRQ somewhere? I can once 
>>>>>>>>> again only
>>>>>>>>> assume all your testing was done with IRQs whose numbers happened to 
>>>>>>>>> match
>>>>>>>>> their GSI numbers. (The difference, imo, would also need calling out 
>>>>>>>>> in the
>>>>>>>>> public header, where the respective interface struct(s) is/are 
>>>>>>>>> defined.)
>>>>>>>> I feel like you missed out on many of the previous discussions.
>>>>>>>> Without my changes, the original codes use irq (read from file 
>>>>>>>> /sys/bus/pci/devices/<sbdf>/irq) to do xc_physdev_map_pirq,
>>>>>>>> but xc_physdev_map_pirq require passing into gsi instead of irq, so we 
>>>>>>>> need to use gsi whether dom0 is PV or PVH, so for the original codes, 
>>>>>>>> they are wrong.
>>>>>>>> Just because by chance, the irq value in the Linux kernel of pv dom0 
>>>>>>>> is equal to the gsi value, so there was no problem with the original 
>>>>>>>> pv passthrough.
>>>>>>>> But not when using PVH, so passthrough failed.
>>>>>>>> With my changes, I got gsi through function xc_physdev_gsi_from_dev 
>>>>>>>> firstly, and to be compatible with old kernel versions(if the ioctl
>>>>>>>> IOCTL_PRIVCMD_GSI_FROM_DEV is not implemented), I still need to use 
>>>>>>>> the irq number, so I need to check the result
>>>>>>>> of gsi, if gsi > 0 means IOCTL_PRIVCMD_GSI_FROM_DEV is implemented I 
>>>>>>>> can use the right one gsi, otherwise keep using wrong one irq. 
>>>>>>>
>>>>>>> I understand all of this, to a (I think) sufficient degree at least. 
>>>>>>> Yet what
>>>>>>> you're effectively proposing (without explicitly saying so) is that e.g.
>>>>>>> struct physdev_map_pirq's pirq field suddenly may no longer hold a pIRQ
>>>>>>> number, but (when the caller is PVH) a GSI. This may be a necessary 
>>>>>>> adjustment
>>>>>>> to make (simply because the caller may have no way to express things in 
>>>>>>> pIRQ
>>>>>>> terms), but then suitable adjustments to the handling of 
>>>>>>> PHYSDEVOP_map_pirq
>>>>>>> would be necessary. In fact that field is presently marked as "IN or 
>>>>>>> OUT";
>>>>>>> when re-purposed to take a GSI on input, it may end up being necessary 
>>>>>>> to pass
>>>>>>> back the pIRQ (in the subject domain's numbering space). Or 
>>>>>>> alternatively it
>>>>>>> may be necessary to add yet another sub-function so the GSI can be 
>>>>>>> translated
>>>>>>> to the corresponding pIRQ for the domain that's going to use the IRQ, 
>>>>>>> for that
>>>>>>> then to be passed into PHYSDEVOP_map_pirq.
>>>>>> If I understood correctly, your concerns about this patch are two:
>>>>>> First, when dom0 is PV, I should not use xc_physdev_gsi_from_dev to get 
>>>>>> gsi to do xc_physdev_map_pirq, I should keep the original code that use 
>>>>>> irq.
>>>>>
>>>>> Yes.
>>>> OK, I can change to do this.
>>>> But I still have a concern:
>>>> Although without my changes, passthrough can work now when dom0 is PV.
>>>> And you also agree that: for xc_physdev_map_pirq, when use with 
>>>> MAP_PIRQ_TYPE_GSI, it expects a GSI as input.
>>>> Isn't it a wrong for PV dom0 to pass irq in? Why don't we use gsi as it 
>>>> should be used, since we have a function to get gsi now?
>>>
>>> Indeed this and ...
>>>
>>>>>> Second, when dom0 is PVH, I get the gsi, but I should not pass gsi as 
>>>>>> the fourth parameter of xc_physdev_map_pirq, I should create a new local 
>>>>>> parameter pirq=-1, and pass it in.
>>>>>
>>>>> I think so, yes. You also may need to record the output value, so you can 
>>>>> later
>>>>> use it for unmap. xc_physdev_map_pirq() may also need adjusting, as right 
>>>>> now
>>>>> it wouldn't put a negative incoming *pirq into the .pirq field. 
>>>> xc_physdev_map_pirq's logic is if we pass a negative in, it sets *pirq to 
>>>> index(gsi).
>>>> Is its logic right? If not how do we change it?
>>>
>>> ... this matches ...
>>>
>>>>> I actually wonder if that's even correct right now, i.e. independent of 
>>>>> your change.
>>>
>>> ... the remark here.
>> So, what should I do next step?
>> If assume the logic of function xc_physdev_map_pirq and PHYSDEVOP_map_pirq 
>> is right,
>> I think what I did now is right, both PV and PVH dom0 should pass gsi into 
>> xc_physdev_map_pirq.
> 
> It may sound unfriendly, and I'm willing to accept other maintainers
> disagreeing with me, but I think before we think of any extensions of
> what we have here, we want to address any issues with what we have.
> Since you're working in the area, and since getting the additions right
> requires properly understanding how things work (and where things may
> not work correctly right now), I view you as being in the best position
> to see about doing that (imo) prereq step.
Make sense.
OK, I think there may be two issues in the callstack of function 
xc_physdev_map_pirq
(xc_physdev_map_pirq-> physdev_map_pirq-> allocate_and_map_gsi_pirq-> 
allocate_pirq).
For function xc_physdev_map_pirq, I think it should have two use cases:
First, when caller pass a pirq that ">=0", they want to map gsi to a specific 
pirq. In this case, when it reaches allocate_pirq,
if the gsi already has a mapped pirq(current_pirq), and current_pirq is not 
equal with specific pirq, it fails due to EEXIST, if current_pirq
is equal with specific pirq or the gsi doesn't have a current_pirq, 
allocate_pirq return the specific pirq. It successes.
Second, when caller pass a pirq that "<0", they want to get a free pirq for 
gsi. In this case, when it reaches allocate_pirq,
if the gsi already has a mapped pirq(current_pirq), we should return 
current_pirq, otherwise, it allocate a free pirq through get_free_pirqs
and then return the free pirq, it successes.
Roadmap is below:

                                                                                
                        caller pass gsi and pirq into xc_physdev_map_pirq
                                                                                
                                        (xc_physdev_map_pirq)
                                                                                
                                        /               \
                                                                                
                                *pirq>=0                *pirq<0 [issue 1]
                                                                                
                        (map gsi to a specific pirq)    (map gsi to a free pirq)
                                 _                                              
                        /                                               \
                                |                                               
        gsi already has a mapped pirq                           gsi already has 
a mapped pirq
                                |                                               
                (current_pirq)                                          
(current_pirq)
                                |                                               
        /                       \                                       /       
        \
                                |                                               
        yes                     no                                      yes     
        no
allocate_pirq-------------------------|                                         
/                                       \                       /               
                \
                                |                                       
current_pirq == pirq                    return specific pirq    return 
current_pirq [issue2]            return free pirq
                                |                               /               
                \
                                |                               yes             
                no
                                |                       /                       
                        \
                                |_              return specific pirq            
                        fail -EEXIST

But for current code,
[issue 1]: when *pirq<0, in xc_physdev_map_pirq, it re-sets *pirq to index by 
default. " map.pirq = *pirq < 0 ? index : *pirq; ", so that it can't allocate a 
free pirq for gsi(above second case)
[issue 2]: when *pirq<0 and gsi already has mapped pirq and has no need to 
allocate a new free pirq, in allocate_pirq, it returns *pirq(<0) directly, it 
means allocate_pirq fail. Here should return the already mapped pirq for that 
gsi and mean successful.

> 
>> By the way, I found xc_physdev_map_pirq didn't support negative pirq is 
>> since your commit 934a5253d932b6f67fe40fc48975a2b0117e4cce, do you remember 
>> why?
> 
> Counter question: What is a negative pIRQ?
I mean when caller pass a pirq that "<0" in.

> 
> Jan

-- 
Best regards,
Jiqian Chen.

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.