
Re: [BUG]SMMU-V3 queue need no-cache memory





On 08/12/2022 13:04, Rahul Singh wrote:
Hi Julien,

Hi Rahul,

On 7 Dec 2022, at 12:13 pm, Julien Grall <julien@xxxxxxx> wrote:

Hi,

I only noticed this e-mail because I was skimming xen-devel. If you want to get
our attention, then I would suggest CCing both of us, because I (and I guess
Stefano) have filter rules so those e-mails land directly in my inbox.

On 07/12/2022 10:24, Rahul Singh wrote:
On 7 Dec 2022, at 2:04 am, sisyphean <sisyphean@zlw.email> wrote:

Hi,

     I am trying to run Xen on my Arm board (sorry, for commercial reasons I
can't tell you which platform I am running Xen on) with SMMU-V3 enabled, but
all commands in the command queue failed when Xen started.

     After tracking this down with a debugger, the cause of the problem is that
the queue in the smmu-v3 driver is not allocated as non-cached memory, so after
arm_smmu_cmdq_build_cmd is executed the command is still sitting in the cache.
Therefore, the SMMU-V3 hardware cannot fetch the correct command from memory
for execution.
Yes, you are right. As of now we allocate the memory for the command queue via
_xzalloc(), which returns cached memory, and because of that you are observing
the issue. We have tested the Xen SMMUv3 driver on an SoC where the SMMUv3 HW
is in the coherency domain, and because of that we have not encountered this
issue. I think that in your case the SMMUv3 HW is not in the coherency domain.
Please confirm from your side whether the "dma-coherent" property is set in
your DT.
I think there is currently no function available to request that Xen allocate
memory that is not cached.

You are correct.

@Julien and @Stefano, do you have any suggestion on how we can request memory
from Xen that is not cached, something like dma_alloc_coherent() in Linux?

At the moment all the RAM is mapped cacheable in Xen, so it will require some
work to make some memory uncacheable.

There are two options:
1) Allocate a pool of memory at boot time that will be mapped with different
memory attributes. This means we would need a separate pool, and the user would
have to size it.
2) Modify the caching attributes of the memory after allocation and revert them
when the memory is freed (rough sketch below). The downside is that we would
end up shattering superpages. We also can't re-create superpages (yet), but
that might be fine if the memory is never freed.
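As a very rough sketch of what option 2 might look like from the caller's side,
assuming modify_xen_mappings() (or a new variant of it) were taught to change
the memory attributes of xenheap mappings on Arm; the helper name is made up
and this is not working code today:

    /* Hypothetical helper (not existing Xen code): allocate xenheap pages and
     * remap them Normal Non-cacheable, shattering the superpage mapping of
     * the directmap in the process (the downside mentioned above). */
    static void *alloc_uncached_pages(unsigned int order)
    {
        void *p = alloc_xenheap_pages(order, 0);
        unsigned long va, size = PAGE_SIZE << order;

        if ( !p )
            return NULL;

        va = (unsigned long)p;

        /* Avoid dirty cache lines being written back over the buffer after
         * the attributes have been changed. */
        clean_and_invalidate_dcache_va_range(p, size);

        if ( modify_xen_mappings(va, va + size, PAGE_HYPERVISOR_NOCACHE) )
        {
            free_xenheap_pages(p, order);
            return NULL;
        }

        return p;
    }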

Option two would probably be the best. But before going that route I have one
question...

The temporary solution I am using is to call clean_dcache every time a command
is copied to the command queue in queue_write. But it is obvious that this will
seriously affect efficiency.
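(For reference, roughly what that workaround looks like against the
Linux-derived queue_write() in xen/drivers/passthrough/arm/smmu-v3.c; the exact
types/signature may differ a little in the Xen port, and clean_dcache_va_range()
is the Arm cache-maintenance helper Xen provides:)

    static void queue_write(__le64 *dst, u64 *src, size_t n_dwords)
    {
        int i;

        for (i = 0; i < n_dwords; ++i)
            *dst++ = cpu_to_le64(*src++);

        /* Temporary workaround: clean the cache lines covering the command so
         * a non-coherent SMMU reads the up-to-date bytes from memory. */
        clean_dcache_va_range(dst - n_dwords, n_dwords * sizeof(*dst));
    }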

I agree you will see some performance impact in micro-benchmarks. But I am not
sure about normal use cases. How often do you expect the command queue to be
used?

To be precise, the command queue will be used when:

Thanks for the list. See my comments below.

  - Setting up the stage-2 translation when we assign devices to guests. This
typically happens at dom0 boot and domU creation.

Hotplugging a device is another case. At the moment, I would expect that in this situation the cache flush will just be noise, as domain creation is quite complex.
  - When there is a call to iommu_iotlb_flush(), which will call the
IOMMU-specific iotlb_flush. The SMMUv3 driver will then send the command to the
SMMUv3 HW to invalidate the entries.

This is an interesting one. Those operations will usually be heavily used by backend PV drivers when mapping/unmapping the grant entries.

I am not aware of anyone that has done performance tests with the IOMMU enabled (I think Stefano did some in the past with it disabled).

Grant mappings are usually done one page at a time. So it would be interesting to check the overhead of the SMMU (even without the cache flush). The tests I am thinking of would compare the numbers with and without the IOMMU enabled:
 1) Micro-benchmark the map/unmap operations
 2) Benchmark throughput for block and network devices

Cheers,

--
Julien Grall



 

