Re: [Xen-devel] Shattering superpages impact on IOMMU in Xen
On Tue, Apr 4, 2017 at 12:28 PM, Oleksandr Tyshchenko <olekstysh@xxxxxxxxx> wrote:
> Hi, Stefano.
>
> On Mon, Apr 3, 2017 at 11:33 PM, Stefano Stabellini <sstabellini@xxxxxxxxxx> wrote:
>> On Mon, 3 Apr 2017, Oleksandr Tyshchenko wrote:
>>> On Mon, Apr 3, 2017 at 9:06 PM, Julien Grall <julien.grall@xxxxxxx> wrote:
>>> > Hi Andrew,
>>> >
>>> > On 03/04/17 18:16, Andrew Cooper wrote:
>>> >> On 03/04/17 18:02, Julien Grall wrote:
>>> >>> Hi Andrew,
>>> >>>
>>> >>> On 03/04/17 17:42, Andrew Cooper wrote:
>>> >>>> On 03/04/17 17:24, Oleksandr Tyshchenko wrote:
>>> >>>>> Hi, all.
>>> >>>>>
>>> >>>>> Playing with the non-shared IOMMU in Xen on ARM I noticed one interesting thing: superpages get shattered during the domain life cycle. This is the result of mapping foreign pages, ballooning memory, the domain mapping Xen shared pages, etc.
>>> >>>>> I am not worried about memory fragmentation at the moment, but shattering does worry me from the IOMMU point of view. As Xen owns the IOMMU, it may manipulate the IOMMU page tables while a passed-through/protected device is doing DMA in Linux. It is hard to detect when no DMA transaction is in progress in order to prevent this race, so if a device has an in-flight transaction while we change the IOMMU mapping we can get into trouble. Unfortunately, the faulting transaction cannot be restarted in all cases. The chance of hitting the problem increases during shattering.
>>> >>>>>
>>> >>>>> I did the following test:
>>> >>>>> The dom0 on my setup contains an ethernet IP that is protected by the IOMMU. Moreover, as the IOMMU I am playing with supports superpages (2M, 1G), the IOMMU driver takes these capabilities into account when building page tables. As I gave 256 MB to dom0, the IOMMU mapping was built from 2M memory blocks only. As I am using NFS for both dom0 and domU, the ethernet IP performs DMA transactions almost all the time.
>>> >>>>> Sometimes I see IOMMU page faults while creating a guest domain. I think they happen while Xen is shattering 2M mappings into 4K mappings (it unmaps dom0 pages one 4K page at a time, then maps domU pages there for copying the domU images). But I don't see any page faults when the IOMMU page table is built from 4K pages only.
>>> >>>>>
>>> >>>>> I had a talk with Julien on IRC and we came to the conclusion that the safest way would be to use 4K pages to prevent shattering, so the IOMMU should not report superpage capability. On the other hand, if we build the IOMMU tables from 4K pages only we will see a performance drop (when building and walking page tables), TLB pressure, etc.
>>> >>>>> Another possible solution Julien suggested is to always balloon in 2M or 1G chunks, never 4K. That would help us avoid the shattering effect.
>>> >>>>> The discussion was moved to the ML since this seems to be a generic issue and the right solution should be thought through.
>>> >>>>>
>>> >>>>> What do you think is the right way to follow? Use 4K pages and don't bother with shattering, or try to optimize? And if the idea of making the balloon mechanism smarter makes sense, how do we teach the balloon driver to do so?
>>> >>>>> Thank you.
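As a minimal illustration of the shattering described above (not Xen code; pte_t, the PTE_* bits and alloc_table() are made-up names for the sketch): unmapping a single 4K page that currently sits inside a 2M block mapping cannot be done in place, the whole block entry has to be replaced by a level-3 table covering the remaining 511 pages.

    #include <stdint.h>

    typedef uint64_t pte_t;                    /* one LPAE-style descriptor */

    #define ENTRIES_PER_TABLE 512
    #define PAGE_SIZE         4096ULL
    #define BLOCK_SIZE_2M     (ENTRIES_PER_TABLE * PAGE_SIZE)
    #define PTE_VALID         (1ULL << 0)
    #define PTE_TABLE         (1ULL << 1)      /* table vs. block descriptor */

    /* Hypothetical helper: allocate a zeroed level-3 table and return its
     * physical address (in real code this would come from the page allocator). */
    extern pte_t *alloc_table(uint64_t *pa_out);

    /* Replace a 2M block entry with a level-3 table that omits one 4K page. */
    void shatter_2m_for_unmap(pte_t *l2_entry, unsigned int victim_idx)
    {
        uint64_t base_pa = *l2_entry & ~(BLOCK_SIZE_2M - 1); /* 2M-aligned output address */
        uint64_t l3_pa;
        pte_t *l3 = alloc_table(&l3_pa);

        for (unsigned int i = 0; i < ENTRIES_PER_TABLE; i++) {
            if (i == victim_idx)
                continue;                      /* leave the victim page unmapped */
            l3[i] = (base_pa + i * PAGE_SIZE) | PTE_VALID;
        }

        /* How this level-2 entry is switched over to the new table is exactly
         * what the rest of the thread is about: a single atomic store, or
         * break-before-make. */
        *l2_entry = l3_pa | PTE_TABLE | PTE_VALID;
    }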
>>> >>>> Ballooning and foreign mappings are terrible for trying to retain superpage mappings. No OS, not even Linux, can sensibly provide victim pages in a useful way to avoid shattering.
>>> >>>>
>>> >>>> If you care about performance, don't ever balloon. Foreign mappings in translated guests should start from the top of RAM, and work upwards.
>>> >>>
>>> >>> I am not sure I understand this. Can you expand?
>>> >>
>>> >> I am not sure what is unclear. Handing random frames of RAM back to the hypervisor is what exacerbates host superpage fragmentation, and all balloon drivers currently do it.
>>> >>
>>> >> If you want to avoid host superpage fragmentation, don't use a scattergun approach of handing frames back to Xen. However, because even Linux doesn't provide enough hooks into the physical memory management logic, the only solution is to not balloon at all, and to use already-unoccupied frames for foreign mappings.
>>> >
>>> > Do you have any pointers to the Linux code?
>>> >
>>> >>>> As for the IOMMU specifically, things are rather easier. It is the guest's responsibility to ensure that frames offered up for ballooning or foreign mappings are unused. Therefore, if anything cares about the specific 4K region becoming non-present in the IOMMU mappings, it is the guest kernel's fault for offering up a frame already in use.
>>> >>>>
>>> >>>> For the shattering however, it is Xen's responsibility to ensure that all other mappings stay valid at all points. The correct way to do this is to construct a new L1 table, mirroring the L2 superpage but lacking the specific 4K mapping in question, then atomically replace the L2 superpage entry with the new L1 table, then issue an IOMMU TLB invalidation to remove any cached mappings.
>>> >>>>
>>> >>>> By following that procedure, all DMA within the 2M region, but not hitting the 4K frame, won't observe any interim lack of mappings. It appears from your description that Xen isn't following the procedure.
>>> >>>
>>> >>> Xen is following what the ARM ARM mandates. For shattering page tables, we have to follow the break-before-make sequence, i.e.:
>>> >>>  - Invalidate the L2 entry
>>> >>>  - Flush the TLBs
>>> >>>  - Add the new L1 table
>>> >>> See D4-1816 in ARM DDI 0487A.k_iss10775 for details. So we end up with a small window where there is no valid mapping. It is easy to trap a data abort from the processor and restart it, but not a device memory transaction.
>>> >>>
>>> >>> Xen by default shares the stage-2 page tables between the IOMMU and the MMU. However, from the discussion I had with Oleksandr, they are not sharing page tables and still see the problem. I am not sure how they are updating the page tables here. Oleksandr, can you provide more details?
>>> >>
>>> >> Are you saying that ARM has no way of making atomic updates to the IOMMU mappings? (How do I get access to that document? Google gets me to http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.subset.architecture.reference/index.html, but http://infocenter.arm.com/help/topic/com.arm.doc.ddi0487a.k/index.html, which looks like the document you specified, results in a 404.)
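A rough sketch, for illustration only, of the two update orderings discussed above, assuming the same made-up pte_t/PTE_* definitions and hypothetical write_pte(), tlb_flush() and iommu_tlb_invalidate() helpers (not real Xen or IOMMU-driver APIs); new_l3_pa is assumed to be the physical address of a level-3 table already populated with the 511 surviving 4K entries.

    #include <stdint.h>

    typedef uint64_t pte_t;
    #define PTE_VALID (1ULL << 0)
    #define PTE_TABLE (1ULL << 1)

    /* Hypothetical helpers standing in for the real primitives. */
    extern void write_pte(volatile pte_t *entry, pte_t val);  /* single 64-bit store */
    extern void tlb_flush(void);
    extern void iommu_tlb_invalidate(void);

    /*
     * Andrew's ordering: the old 2M block mapping stays live until the new
     * level-3 table becomes visible, so DMA to the rest of the 2M region
     * never observes a missing mapping.
     */
    void shatter_atomic(volatile pte_t *l2_entry, uint64_t new_l3_pa)
    {
        write_pte(l2_entry, new_l3_pa | PTE_TABLE | PTE_VALID);
        iommu_tlb_invalidate();             /* drop any cached 2M translation */
    }

    /*
     * Break-before-make, as the ARM ARM requires when the stage-2 tables are
     * shared with the CPU: between the "break" and the "make" there is a
     * window with no valid mapping, which an in-flight DMA transaction can
     * hit and fault on.
     */
    void shatter_bbm(volatile pte_t *l2_entry, uint64_t new_l3_pa)
    {
        write_pte(l2_entry, 0);             /* break: invalidate the L2 entry */
        tlb_flush();                        /* flush the TLBs */
        /* window: DMA anywhere in the 2M region faults here */
        write_pte(l2_entry, new_l3_pa | PTE_TABLE | PTE_VALID);  /* make */
    }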
>>> > Below is a link; I am not sure why Google does not find it:
>>> > http://infocenter.arm.com/help/topic/com.arm.doc.ddi0487a.k_10775/index.html
>>> >
>>> >> If so, that is an architecture bug IMO. By design, the IOMMU is out of the control of guest software, and the hypervisor should be able to make atomic modifications without guest cooperation.
>>> >
>>> > I think you misread what I meant: the IOMMU supports atomic operations. However, if you share the page tables we have to apply break-before-make when shattering a superpage. This is mandatory if you want to get Xen running on all the micro-architectures.
>>> >
>>> > Some IOMMUs may cope with BBM, some may not. I haven't seen any issue so far (which does not mean there are none).
>>> >
>>> > The IOMMU used by Oleksandr (i.e. the VMSA-IPMMU) is an IP from Renesas which I have never used myself. In his case he needs separate page tables because the layouts are not the same.
>>> >
>>> > Oleksandr, looking at the code you provided, the superpages are split the way Andrew said, i.e.:
>>> > 1) allocate a level 3 table minus the 4K mapping
>>> > 2) replace the level 2 entry with the new table
>>> >
>>> > Am I right?
>>>
>>> It seems so, yes. Walking the page table down when trying to unmap, we bump into the leaf entry (the 2M mapping), so the 2M-minus-4K mappings are inserted at the next level and after that the page table entry is replaced.
>>
>> Let me premise that Andrew pointed out well what the right approach to dealing with this issue should be. However, if we have to use break-before-make for IOMMU page tables, then it means we cannot do atomic updates to IOMMU mappings, as Andrew wrote. Therefore, we have to make a choice: we either disable superpage IOMMU mappings or ballooning. I would disable IOMMU superpage mappings, on the grounds that supporting superpage mappings without supporting atomic shattering or restartable transactions is not really supporting superpage mappings.
>
> Sounds reasonable. As Julien mentioned too, "using 4K pages only" is the safest way. At least until I find the reason why DMA faults take place despite the fact that the shattering is done in an atomic way.
>
>> However, you are not doing break-before-make here. I would investigate whether break-before-make is required by the VMSA-IPMMU. If it is not required, why are you seeing DMA faults?
>
> Unfortunately, I can't say anything about a break-before-make sequence for the IPMMU at the moment. The TRM says nothing about it.
>
> --
> Regards,
>
> Oleksandr Tyshchenko

Hi, guys. It seems it was only my fault. The issue wasn't exactly in the shattering; the shattering just increased the probability of IOMMU page faults occurring. I didn't do a clean_dcache for the page table entry after updating it. So, with clean_dcache I don't see page faults when shattering superpages! BTW, can I configure the domheap pages (which I am using for the IOMMU page tables) to be uncached? What do you think?

--
Regards,

Oleksandr Tyshchenko
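A minimal sketch of the fix described above, for illustration only: if the IOMMU's table walker is not coherent with the CPU data caches, every page-table update has to be cleaned to the point of coherency before the walker can be expected to see it. Here pte_t and clean_dcache_va() are made-up names, not the actual Xen or IPMMU-driver API.

    #include <stddef.h>
    #include <stdint.h>

    typedef uint64_t pte_t;

    /* Hypothetical helper: clean the data cache by virtual address to the
     * point of coherency (e.g. DC CVAC on ARMv8). */
    extern void clean_dcache_va(const void *va, size_t size);

    /*
     * Without the cache clean, the new value can sit in a dirty CPU cache
     * line while the (non-coherent) IOMMU walker still reads the stale
     * descriptor from memory, and faults even though the update itself was
     * performed atomically.
     */
    void write_iommu_pte(volatile pte_t *entry, pte_t val)
    {
        *entry = val;                               /* update the descriptor */
        clean_dcache_va((const void *)entry, sizeof(*entry));
        /* ...then the usual IOMMU TLB invalidation, as before. */
    }

Whether the page tables could instead be allocated from uncached memory, as Oleksandr asks at the end, is left as the open question of the thread.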