[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [RFC PATCH v1 2/7] iommu/arm: ipmmu-vmsa: Add Xen changes for main driver



Hi, Julien.

Sorry for the late response.

On Thu, Aug 10, 2017 at 6:13 PM, Julien Grall <julien.grall@xxxxxxx> wrote:
> Hi,
>
> On 10/08/17 15:27, Oleksandr Tyshchenko wrote:
>>
>> On Tue, Aug 8, 2017 at 2:34 PM, Julien Grall <julien.grall@xxxxxxx> wrote:
>>>
>>> On 26/07/17 16:09, Oleksandr Tyshchenko wrote:
>>>>
>>>> @@ -355,6 +557,10 @@ static struct hw_register
>>>> *root_pgtable[IPMMU_CTX_MAX] = {
>>>>
>>>>  static bool ipmmu_is_root(struct ipmmu_vmsa_device *mmu)
>>>>  {
>>>> +       /* Xen: Fix */
>>>
>>>
>>>
>>> Hmmm. Can we get a bit more details?
>>
>>
>> These is a case when ipmmu_is_root is called with "mmu" being NULL.
>>
>> https://github.com/otyshchenko1/xen/blob/fc231a0f2edb3d01d178fb5c27dd6c1065807c81/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L2330
>>
>> In ipmmu_vmsa_alloc_page_table() we need to find "root mmu", but we
>> doesn't have argument to pass.
>> So, I had two options:
>>
>> 1. Add code searching for it.
>> ...
>> spin_lock(&ipmmu_devices_lock);
>> list_for_each_entry(mmu, &ipmmu_devices, list) {
>>    if (ipmmu_is_root(mmu))
>>       break;
>> }
>> spin_unlock(&ipmmu_devices_lock);
>> ...
>>
>> 2. Use existing ipmmu_find_root() with adding this check for a valid
>> value.
>> So, if we call ipmmu_find_root() with argument being NULL we will
>> actually get searching the list.
>>
>> I decided to use 2 option.
>
>
> Can you please expand the comment then?
Will do.

>
>
>>
>>>
>>>> +       if (!mmu)
>>>> +               return false;
>>>> +
>>>>         if (mmu->features->has_cache_leaf_nodes)
>>>>                 return mmu->is_leaf ? false : true;
>>>>         else
>>>> @@ -405,14 +611,28 @@ static void ipmmu_ctx_write(struct
>>>> ipmmu_vmsa_domain
>>>> *domain, unsigned int reg,
>>>>         ipmmu_write(domain->root, domain->context_id * IM_CTX_SIZE +
>>>> reg,
>>>> data);
>>>>  }
>>>>
>>>> -static void ipmmu_ctx_write2(struct ipmmu_vmsa_domain *domain, unsigned
>>>> int reg,
>>>> +/* Xen: Write the context for cache IPMMU only. */
>>>
>>>
>>>
>>> Same here. Why does it need to be different with Xen?
>>
>>
>> Well, let me elaborate a bit more about this.
>>
>> I feel that I need to explain in a few words about IPMMU itself:
>> Generally speaking,
>> The IPMMU hardware (R-Car Gen3) has 8 context banks and consists of next
>> parts:
>> - root IPMMU
>> - a number of cache IPMMUs
>>
>> Each cache IPMMU is connected to root IPMMU and has uTLB ports the
>> master devices can be tied to.
>> Something, like this:
>>
>> master device1 ---> cache IPMMU1 [8 ctx] ---> root IPMMU [8 ctx] -> memory
>>                            |                                          |
>> master device2 --                                          |
>>                                                                       |
>> master device3 ---> cache IPMMU2 [8 ctx] --
>>
>> Each context bank has registers.
>> Some registers exist for both root IPMMU and cache IPMMUs -> IMCTR
>> Some registers exist only for root IPMMU -> IMTTLBRx/IMTTUBRx, IMMAIR0,
>> etc
>>
>> So, original driver has two helpers:
>> 1. ipmmu_ctx_write() - is intended to write a register in context bank
>> N* for root IPMMU only.
>> 2. ipmmu_ctx_write2() - is intended to write a register in context
>> bank N for both root IPMMU and cache IPMMU.
>> *where N=0-7
>>
>> AFAIU, original Linux driver provides each IOMMU domain with a
>> separate IPMMU context:
>> master device1 + master device2 are in IOMMU domain1 and use IPMMU context
>> 0
>> master device3 is in IOMMU domain2 and uses IPMMU context 1
>>
>> So, when attaching device to new IOMMU domain in Linux we have to
>> initialize context for root IPMMU and enable context (IMCTR register)
>> for both root and cache IPMMUs.
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas-bsp.git/tree/drivers/iommu/ipmmu-vmsa.c?h=v4.9/rcar-3.5.3#n620
>>
>> In Xen we need additional helper "ipmmu_ctx_write1" for writing a
>> register in context bank N for cache IPMMU only.
>> The reason is that we need a way to control cache IPMMU separately
>> since we have a little bit another model.
>>
>> All IOMMU domains within a single Xen domain (dom_iommu(d)->arch.priv)
>> use the same IPMMU context N
>> which was initialized and enabled at the domain creation time. This
>> means that all master devices
>> that are assigned to the guest domain "d" use only this IPMMU context
>> N which actually contains P2M mapping for domain "d":
>> master device1 + master device2 are in IOMMU domain1 and use IPMMU context
>> 0
>> master device3 is in IOMMU domain2 and also uses IPMMU context 0
>>
>> So, when attaching device to new IOMMU domain in Xen we don't have to
>> initialize and enable context,
>> because it has been already done at domain initialization time:
>>
>> https://github.com/otyshchenko1/xen/blob/ipmmu_v2/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L2380
>> we just have to enable context for corresponding cache IPMMU only:
>>
>> https://github.com/otyshchenko1/xen/blob/ipmmu_v2/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L1083
>>
>> This is the main difference between drivers in Linux and Xen.
>>
>> So, as you can see there is a need to manipulate context registers for
>> cache IPMMU without touching root IPMMU,
>> that's why I added this helper.
>>
>> Does this make sense?
>
>
> I think it does.
good.

>
>
>>
>>>
>>>
>>>> +static void ipmmu_ctx_write1(struct ipmmu_vmsa_domain *domain, unsigned
>>>> int reg,
>>>>                              u32 data)
>>>>  {
>>>>         if (domain->mmu != domain->root)
>>>> -               ipmmu_write(domain->mmu,
>>>> -                           domain->context_id * IM_CTX_SIZE + reg,
>>>> data);
>>>> +               ipmmu_write(domain->mmu, domain->context_id *
>>>> IM_CTX_SIZE
>>>> + reg, data);
>>>> +}
>>>>
>>>> -       ipmmu_write(domain->root, domain->context_id * IM_CTX_SIZE +
>>>> reg,
>>>> data);
>>>> +/*
>>>> + * Xen: Write the context for both root IPMMU and all cache IPMMUs
>>>> + * that assigned to this Xen domain.
>>>> + */
>>>> +static void ipmmu_ctx_write2(struct ipmmu_vmsa_domain *domain, unsigned
>>>> int reg,
>>>> +                            u32 data)
>>>> +{
>>>> +       struct ipmmu_vmsa_xen_domain *xen_domain =
>>>> dom_iommu(domain->d)->arch.priv;
>>>> +       struct iommu_domain *io_domain;
>>>> +
>>>> +       list_for_each_entry(io_domain, &xen_domain->contexts, list)
>>>> +               ipmmu_ctx_write1(to_vmsa_domain(io_domain), reg, data);
>>>> +
>>>> +       ipmmu_ctx_write(domain, reg, data);
>>>>  }
>>>>
>>>>  /*
>>>>
>>>> -----------------------------------------------------------------------------
>>>> @@ -488,6 +708,10 @@ static void ipmmu_tlb_flush_all(void *cookie)
>>>>  {
>>>>         struct ipmmu_vmsa_domain *domain = cookie;
>>>>
>>>> +       /* Xen: Just return if context_id has non-existent value */
>>>
>>>
>>>
>>> Same here.
>>
>>
>> I think that there is a possible race.
>> In ipmmu_domain_init_context() we are trying to allocate context and
>> if allocation fails we will call free_io_pgtable_ops(),
>> but "domain->context_id" hasn't been initialized yet (likely it is 0).
>>
>> https://github.com/otyshchenko1/xen/blob/fc231a0f2edb3d01d178fb5c27dd6c1065807c81/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L799
>>
>> And having following call stack:
>> free_io_pgtable_ops() -> io_pgtable_tlb_flush_all() ->
>> ipmmu_tlb_flush_all() -> ipmmu_tlb_invalidate()
>> we will get a mistaken cache flush for a context pointed by
>> uninitialized "domain->context_id".
>>
>> That's why I initialized context_id with non-existent value before
>> allocating context
>>
>> https://github.com/otyshchenko1/xen/blob/fc231a0f2edb3d01d178fb5c27dd6c1065807c81/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L792
>> and checked it for a valid value here
>>
>> https://github.com/otyshchenko1/xen/blob/fc231a0f2edb3d01d178fb5c27dd6c1065807c81/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L712
>> and everywhere it is need to checked.
>
>
> The race is in the code added or the one from Linux? If the latter, then you
> should have an action to fix it there. If the former, the I'd like to
> understand how come we introduced a race compare to Linux.
I think the latter. I have just pushed a patch in.
https://lists.linuxfoundation.org/pipermail/iommu/2017-August/023857.html

>
> [...]
>
>>>
>>>> +
>>>>         /*
>>>>          * Find an unused context.
>>>>          */
>>>> @@ -578,6 +807,11 @@ static int ipmmu_domain_init_context(struct
>>>> ipmmu_vmsa_domain *domain)
>>>>
>>>>         /* TTBR0 */
>>>>         ttbr = domain->cfg.arm_lpae_s1_cfg.ttbr[0];
>>>> +
>>>> +       /* Xen: */
>>>> +       dev_notice(domain->root->dev, "d%d: Set IPMMU context %u (pgd
>>>> 0x%"PRIx64")\n",
>>>> +                       domain->d->domain_id, domain->context_id, ttbr);
>>>
>>>
>>>
>>> If you want to keep driver close to Linux, then you need to avoid
>>> unecessary
>>> change.
>>
>> Shall I drop it?
>
>
> Depends. How useful is it? If it is, then may you want to upstream that?
Not so useful, but it is better to keep it while driver is in progress.
However, I can move this print out of the ipmmu_domain_init_context().
"Our" ipmmu_vmsa_alloc_page_table() is a good candidate to have it.

>
> [...]
>
>
>>>>  static int ipmmu_attach_device(struct iommu_domain *io_domain,
>>>>                                struct device *dev)
>>>> @@ -787,7 +1042,20 @@ static int ipmmu_attach_device(struct iommu_domain
>>>> *io_domain,
>>>>                 /* The domain hasn't been used yet, initialize it. */
>>>>                 domain->mmu = mmu;
>>>>                 domain->root = root;
>>>> +
>>>> +/*
>>>> + * Xen: We have already initialized and enabled context for root IPMMU
>>>> + * for this Xen domain. Enable context for given cache IPMMU only.
>>>> + * Flush the TLB as required when modifying the context registers.
>>>
>>>
>>>
>>> Why?
>>
>>
>> Original Linux driver provides each IOMMU domain with a separate IPMMU
>> context.
>> So, when attaching device to IOMMU domain which hasn't been
>> initialized yet we have to
>> call ipmmu_domain_init_context() for initializing (root only) and
>> enabling (root + cache * ) context for this IOMMU domain.
>>
>> * You can see at the end of the "original" ipmmu_domain_init_context()
>> implementation, that context is enabled for both cache and root IPMMUs
>> because of "ipmmu_ctx_write2".
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas-bsp.git/tree/drivers/iommu/ipmmu-vmsa.c?h=v4.9/rcar-3.5.3#n620
>>
>> From my point of view, we don't have to do the same when we are
>> attaching device in Xen, as we keep only one IPMMU context (P2M
>> mappings) per Xen domain
>> for using by all assigned to this guest devices.
>> What is more a number of context banks is limited (8), and if we
>> followed Linux way here, we would be quickly run out of available
>> contexts.
>> But having one IPMMU context per Xen domain allow us to passthrough
>> devices to 8 guest domain.
>
>
> The way you describe it give an impression that the driver is fundamentally
> different in Xen compare to Linux. Am I right?
It is hard to say, is "fundamentally different" suitable combination here.

Drivers differ mostly in context assignment.
Also Xen driver has "VMSAv8-64 mode" enabled and asynchronous page
table deallocation sequence.

So, probably, yes.

>
>
>>
>> Taking into the account described above, we initialize (root only) and
>> enable (root only ** ) context at the domain creation time
>> if IOMMU is expected to be used for this guest.
>>
>> https://github.com/otyshchenko1/xen/blob/ipmmu_v2/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L2380
>>
>> ** You can see at the end of the "modified"
>> ipmmu_domain_init_context() implementation, that context is enabled
>> for root IPMMU only
>> because of "ipmmu_ctx_write".
>>
>> https://github.com/otyshchenko1/xen/blob/ipmmu_v2/xen/drivers/passthrough/arm/ipmmu-vmsa.c#L882
>>
>> That's why, here, in ipmmu_attach_device() we don't have to call
>> ipmmu_domain_init_context() anymore, because
>> the context has been already initialized and enabled. All what we need
>> here is to enable this context for cache IPMMU the device
>> is physically connected to.
>>
>> Does this make sense?
>>
>>>
>>>
>>>> + */
>>>> +#if 0
>>>>                 ret = ipmmu_domain_init_context(domain);
>>>> +#endif
>>>> +               ipmmu_ctx_write1(domain, IMCTR,
>>>> +                               ipmmu_ctx_read(domain, IMCTR) |
>>>> IMCTR_FLUSH);
>>>> +
>>>> +               dev_info(dev, "Using IPMMU context %u\n",
>>>> domain->context_id);
>>>> +#if 0 /* Xen: Not needed */
>>>>                 if (ret < 0) {
>>>>                         dev_err(dev, "Unable to initialize IPMMU
>>>> context\n");
>>>>                         domain->mmu = NULL;
>>>> @@ -795,6 +1063,7 @@ static int ipmmu_attach_device(struct iommu_domain
>>>> *io_domain,
>>>>                         dev_info(dev, "Using IPMMU context %u\n",
>>>>                                  domain->context_id);
>>>>                 }
>>>> +#endif
>>>>         } else if (domain->mmu != mmu) {
>>>>                 /*
>>>>                  * Something is wrong, we can't attach two devices using
>>>> @@ -834,6 +1103,14 @@ static void ipmmu_detach_device(struct
>>>> iommu_domain
>>>> *io_domain,
>>>>          */
>>>>  }
>>>>
>>>> +/*
>>>> + * Xen: The current implementation of these callbacks is insufficient
>>>> for
>>>> us
>>>> + * since they are intended to be called from Linux IOMMU core that
>>>> + * has already done all required actions such as doing various checks,
>>>> + * splitting into memory block the hardware supports and so on.
>>>
>>>
>>>
>>> Can you expand it here? Why can't our IOMMU framework could do that?
>>
>>
>> If add all required support to IOMMU framework and modify all existing
>> IOMMU drivers
>> to follow this support, then yes, it will avoid IOMMU drivers such as
>> IPMMU-VMSA from having these stuff in.
>>
>> To be honest, I was trying to touch IOMMU common code and other IOMMU
>> drivers as little as possible,
>> but I had to introduce a few changes ("non-shared IOMMU").
>
>
> What I am looking is something we can easily maintain in the future. If it
> requires change in the common code then we should do it. If it happens to be
> too complex, then maybe we should not take it from Linux.
I understand your point.

>
>>
>>>
>>> IHMO, if we want to get driver from Linux, we need to get an interface
>>> very
>>> close to it. Otherwise it is not worth it because you would have to
>>> implement for each IOMMU.
>>
>> You are right.
>>
>>>
>>> My overall feeling at the moment is Xen is not ready to welcome this
>>> driver
>>> directly from Linux. This is also a BSP driver, so no thorough review
>>> done
>>> by the community.
>>
>>
>> As I said in a cover letter the BSP driver had more complete support
>> than the mainline one.
>
>
> I know. But this means we are going to bring code in Xen that was not fully
> reviewed and don't know the quality of the code.
>
>> I would like to clarify what need to be done from my side.
>> Should I wait for the missing things reach upsteam and then rebase on
>> the mainline driver?
>> Or should I rewrite this driver without following Linux?
>
>
> I don't have a clear answer here. As I said, we need to weight pros and cons
> to use Linux driver over our own.
>
> At the moment, you are using a BSP driver which has more features but
> modified quite a lot. We don't even know when this is going to be merged in
> Linux.
>
> Keeping code close to Linux requires some hacks that are acceptable if you
> can benefits from the community (bug fix, review...). As the driver is taken
> from the BSP, we don't know if the code will stay in the current form nor be
> able to get bug fix.

I got it. Completely agree with you.
But, we need to choose which direction we should follow. We have 3
options at the moment
and I am OK with each of them:
1. direct port from BSP (current implementation).
2. direct port from mainline Linux (when it has required support).
3. new driver based on BSP/Linux and contains only relevant to Xen things.

I am starting to think that options 2 or 3 (+) would be more suitable.
What do you think?

>
> Cheers,
>
> --
> Julien Grall

-- 
Regards,

Oleksandr Tyshchenko

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.