[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [XEN PATCH v1] xen/arm : Add support for SMMUv3 driver





On 26/10/2020 11:03, Rahul Singh wrote:
Hello Julien,

Hi Rahul,

On 23 Oct 2020, at 4:19 pm, Julien Grall <julien@xxxxxxx> wrote:



On 23/10/2020 15:27, Rahul Singh wrote:
Hello Julien,
On 23 Oct 2020, at 2:00 pm, Julien Grall <julien@xxxxxxx> wrote:



On 23/10/2020 12:35, Rahul Singh wrote:
Hello,
On 23 Oct 2020, at 1:02 am, Stefano Stabellini <sstabellini@xxxxxxxxxx> wrote:

On Thu, 22 Oct 2020, Julien Grall wrote:
On 20/10/2020 16:25, Rahul Singh wrote:
Add support for ARM architected SMMUv3 implementations. It is based on
the Linux SMMUv3 driver.
Major differences between the Linux driver are as follows:
1. Only Stage-2 translation is supported as compared to the Linux driver
    that supports both Stage-1 and Stage-2 translations.
2. Use P2M  page table instead of creating one as SMMUv3 has the
    capability to share the page tables with the CPU.
3. Tasklets is used in place of threaded IRQ's in Linux for event queue
    and priority queue IRQ handling.

Tasklets are not a replacement for threaded IRQ. In particular, they will
have priority over anything else (IOW nothing will run on the pCPU until
they are done).

Do you know why Linux is using thread. Is it because of long running
operations?

Yes you are right because of long running operations Linux is using the
threaded IRQs.

SMMUv3 reports fault/events bases on memory-based circular buffer queues not
based on the register. As per my understanding, it is time-consuming to
process the memory based queues in interrupt context because of that Linux
is using threaded IRQ to process the faults/events from SMMU.

I didn’t find any other solution in XEN in place of tasklet to defer the
work, that’s why I used tasklet in XEN in replacement of threaded IRQs. If
we do all work in interrupt context we will make XEN less responsive.

So we need to make sure that Xen continue to receives interrupts, but we also
need to make sure that a vCPU bound to the pCPU is also responsive.


If you know another solution in XEN that will be used to defer the work in
the interrupt please let me know I will try to use that.

One of my work colleague encountered a similar problem recently. He had a long
running tasklet and wanted to be broken down in smaller chunk.

We decided to use a timer to reschedule the taslket in the future. This allows
the scheduler to run other loads (e.g. vCPU) for some time.

This is pretty hackish but I couldn't find a better solution as tasklet have
high priority.

Maybe the other will have a better idea.

Julien's suggestion is a good one.

But I think tasklets can be configured to be called from the idle_loop,
in which case they are not run in interrupt context?

  Yes you are right tasklet will be scheduled from the idle_loop that is not 
interrupt conext.

This depends on your tasklet. Some will run from the softirq context which is 
usually (for Arm) on the return of an exception.

Thanks for the info. I will check and will get better understanding of the 
tasklet how it will run in XEN.

4. Latest version of the Linux SMMUv3 code implements the commands queue
    access functions based on atomic operations implemented in Linux.

Can you provide more details?

I tried to port the latest version of the SMMUv3 code than I observed that
in order to port that code I have to also port atomic operation implemented
in Linux to XEN. As latest Linux code uses atomic operation to process the
command queues (atomic_cond_read_relaxed(),atomic_long_cond_read_relaxed() ,
atomic_fetch_andnot_relaxed()) .

Thank you for the explanation. I think it would be best to import the atomic
helpers and use the latest code.

This will ensure that we don't re-introduce bugs and also buy us some time
before the Linux and Xen driver diverge again too much.

Stefano, what do you think?

I think you are right.
Yes, I agree with you to have XEN code in sync with Linux code that's why I 
started with to port the Linux atomic operations to XEN  then I realised that 
it is not straightforward to port atomic operations and it requires lots of 
effort and testing. Therefore I decided to port the code before the atomic 
operation is introduced in Linux.

Hmmm... I would not have expected a lot of effort required to add the 3 atomics 
operations above. Are you trying to also port the LSE support at the same time?
There are other atomic operations used in the SMMUv3 code apart from the 3 
atomic operation I mention. I just mention 3 operation as an example.

Ok. Do you have a list you could share?


Yes. Please find the list that we have to port to the XEN in order to merge the 
latest SMMUv3 code.

Thanks!


If we start to port the below list we might have to port another atomic 
operation based on which below atomic operations are implemented. I have not 
spent time on how these atomic operations are implemented in detail but as per 
my understanding, it required an effort to port them to XEN and required a lot 
of testing.

For the beginning, I think it is fine to implement them with a stronger memory barrier than necessary or in a less efficient. This can be refined afterwards.

Those helpers can directly be defined in the SMMUv3 drivers so we know they are not for general purpose :).


1. atomic_set_release

This could be implemented as:

smp_mb();
atomic_set();

2. atomic_fetch_andnot_relaxed

This would need to be imported.

3. atomic_cond_read_relaxed

This would need to be imported. The simplest version seems to be the generic version provided by include/asm-generic/barrier.h (see smp_cond_load_relaxed).

4. atomic_long_cond_read_relaxed
5. atomic_long_xor

The two would require the implementation of atomic64. Volodymyr also required a version. I offered my help, however I didn't find enough time to do it yet :(.

For Arm64, it would be possible to do a copy/paste of the existing helpers and replace anything related to a 32-bit register with a 64-bit one.

For Arm32, they are a bit more complex because you now need to work with 2 registers.

However, for your purpose, you would be using atomic_long_t. So the the Arm64 implementation should be sufficient.

6. atomic_set_release

Same as 1.

7. atomic_cmpxchg_relaxed might be we can use atomic_cmpxchg that is 
implemented in XEN need to check.
atomic_cmpxchg() is strongly ordered. So it would be fine to use it for implementing the helper. Although, more inefficient :).

8. atomic_dec_return_release

Xen implements a stronger version atomic_dec_return(). You can re-use it here.

9. atomic_fetch_inc_relaxed

This would need to be imported.

Cheers,

--
Julien Grall



 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.