[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] Re: [XEN PATCH v1] xen/arm : Add support for SMMUv3 driver
On 26/10/2020 11:03, Rahul Singh wrote: Hello Julien, Hi Rahul, On 23 Oct 2020, at 4:19 pm, Julien Grall <julien@xxxxxxx> wrote: On 23/10/2020 15:27, Rahul Singh wrote:Hello Julien,On 23 Oct 2020, at 2:00 pm, Julien Grall <julien@xxxxxxx> wrote: On 23/10/2020 12:35, Rahul Singh wrote:Hello,On 23 Oct 2020, at 1:02 am, Stefano Stabellini <sstabellini@xxxxxxxxxx> wrote: On Thu, 22 Oct 2020, Julien Grall wrote:On 20/10/2020 16:25, Rahul Singh wrote:Add support for ARM architected SMMUv3 implementations. It is based on the Linux SMMUv3 driver. Major differences between the Linux driver are as follows: 1. Only Stage-2 translation is supported as compared to the Linux driver that supports both Stage-1 and Stage-2 translations. 2. Use P2M page table instead of creating one as SMMUv3 has the capability to share the page tables with the CPU. 3. Tasklets is used in place of threaded IRQ's in Linux for event queue and priority queue IRQ handling.Tasklets are not a replacement for threaded IRQ. In particular, they will have priority over anything else (IOW nothing will run on the pCPU until they are done). Do you know why Linux is using thread. Is it because of long running operations?Yes you are right because of long running operations Linux is using the threaded IRQs. SMMUv3 reports fault/events bases on memory-based circular buffer queues not based on the register. As per my understanding, it is time-consuming to process the memory based queues in interrupt context because of that Linux is using threaded IRQ to process the faults/events from SMMU. I didn’t find any other solution in XEN in place of tasklet to defer the work, that’s why I used tasklet in XEN in replacement of threaded IRQs. If we do all work in interrupt context we will make XEN less responsive.So we need to make sure that Xen continue to receives interrupts, but we also need to make sure that a vCPU bound to the pCPU is also responsive.If you know another solution in XEN that will be used to defer the work in the interrupt please let me know I will try to use that.One of my work colleague encountered a similar problem recently. He had a long running tasklet and wanted to be broken down in smaller chunk. We decided to use a timer to reschedule the taslket in the future. This allows the scheduler to run other loads (e.g. vCPU) for some time. This is pretty hackish but I couldn't find a better solution as tasklet have high priority. Maybe the other will have a better idea.Julien's suggestion is a good one. But I think tasklets can be configured to be called from the idle_loop, in which case they are not run in interrupt context?Yes you are right tasklet will be scheduled from the idle_loop that is not interrupt conext.This depends on your tasklet. Some will run from the softirq context which is usually (for Arm) on the return of an exception.Thanks for the info. I will check and will get better understanding of the tasklet how it will run in XEN.4. Latest version of the Linux SMMUv3 code implements the commands queue access functions based on atomic operations implemented in Linux.Can you provide more details?I tried to port the latest version of the SMMUv3 code than I observed that in order to port that code I have to also port atomic operation implemented in Linux to XEN. As latest Linux code uses atomic operation to process the command queues (atomic_cond_read_relaxed(),atomic_long_cond_read_relaxed() , atomic_fetch_andnot_relaxed()) .Thank you for the explanation. I think it would be best to import the atomic helpers and use the latest code. This will ensure that we don't re-introduce bugs and also buy us some time before the Linux and Xen driver diverge again too much. Stefano, what do you think?I think you are right.Yes, I agree with you to have XEN code in sync with Linux code that's why I started with to port the Linux atomic operations to XEN then I realised that it is not straightforward to port atomic operations and it requires lots of effort and testing. Therefore I decided to port the code before the atomic operation is introduced in Linux.Hmmm... I would not have expected a lot of effort required to add the 3 atomics operations above. Are you trying to also port the LSE support at the same time?There are other atomic operations used in the SMMUv3 code apart from the 3 atomic operation I mention. I just mention 3 operation as an example.Ok. Do you have a list you could share?Yes. Please find the list that we have to port to the XEN in order to merge the latest SMMUv3 code. Thanks! If we start to port the below list we might have to port another atomic operation based on which below atomic operations are implemented. I have not spent time on how these atomic operations are implemented in detail but as per my understanding, it required an effort to port them to XEN and required a lot of testing. For the beginning, I think it is fine to implement them with a stronger memory barrier than necessary or in a less efficient. This can be refined afterwards. Those helpers can directly be defined in the SMMUv3 drivers so we know they are not for general purpose :). 1. atomic_set_release This could be implemented as: smp_mb(); atomic_set(); 2. atomic_fetch_andnot_relaxed This would need to be imported. 3. atomic_cond_read_relaxed This would need to be imported. The simplest version seems to be the generic version provided by include/asm-generic/barrier.h (see smp_cond_load_relaxed). 4. atomic_long_cond_read_relaxed 5. atomic_long_xor The two would require the implementation of atomic64. Volodymyr also required a version. I offered my help, however I didn't find enough time to do it yet :(. For Arm64, it would be possible to do a copy/paste of the existing helpers and replace anything related to a 32-bit register with a 64-bit one. For Arm32, they are a bit more complex because you now need to work with 2 registers. However, for your purpose, you would be using atomic_long_t. So the the Arm64 implementation should be sufficient. 6. atomic_set_release Same as 1. atomic_cmpxchg() is strongly ordered. So it would be fine to use it for implementing the helper. Although, more inefficient :).7. atomic_cmpxchg_relaxed might be we can use atomic_cmpxchg that is implemented in XEN need to check. 8. atomic_dec_return_release Xen implements a stronger version atomic_dec_return(). You can re-use it here. 9. atomic_fetch_inc_relaxed This would need to be imported. Cheers, -- Julien Grall
|
Lists.xenproject.org is hosted with RackSpace, monitoring our |