Re: [PATCH v4 13/21] IOMMU/x86: prefill newly allocate page tables
On Mon, Apr 25, 2022 at 10:40:55AM +0200, Jan Beulich wrote:
> Page tables are used for two purposes after allocation: They either
> start out all empty, or they get filled to replace a superpage.
> Subsequently, to replace all empty or fully contiguous page tables,
> contiguous sub-regions will be recorded within individual page tables.
> Install the initial set of markers immediately after allocation. Make
> sure to retain these markers when further populating a page table in
> preparation for it to replace a superpage.
>
> The markers are simply 4-bit fields holding the order value of
> contiguous entries. To demonstrate this, if a page table had just 16
> entries, this would be the initial (fully contiguous) set of markers:
>
> index  0 1 2 3 4 5 6 7 8 9 A B C D E F
> marker 4 0 1 0 2 0 1 0 3 0 1 0 2 0 1 0
>
> "Contiguous" here means not only present entries with successively
> increasing MFNs, each one suitably aligned for its slot, but also a
> respective number of all non-present entries.
>
> Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
> Reviewed-by: Kevin Tian <kevin.tian@xxxxxxxxx>
> ---
> An alternative to the ASSERT()s added to set_iommu_ptes_present() would
> be to make the function less general-purpose; it's used in a single
> place only after all (i.e. it might as well be folded into its only
> caller).

I would think adding a comment that the function requires the PDE to be
empty would be good. Also given the current usage we could drop the
nr_ptes parameter and just name the function fill_pde() or similar.

>
> While in VT-d's comment ahead of struct dma_pte I'm adjusting the
> description of the high bits, I'd like to note that the description of
> some of the lower bits isn't correct either. Yet I don't think adjusting
> that belongs here.
> ---
> v4: Add another comment referring to pt-contig-markers.h. Re-base.
> v3: Add comments. Re-base.
> v2: New.
>
> --- a/xen/arch/x86/include/asm/iommu.h
> +++ b/xen/arch/x86/include/asm/iommu.h
> @@ -146,7 +146,8 @@ void iommu_free_domid(domid_t domid, uns
>
>  int __must_check iommu_free_pgtables(struct domain *d);
>  struct domain_iommu;
> -struct page_info *__must_check iommu_alloc_pgtable(struct domain_iommu *hd);
> +struct page_info *__must_check iommu_alloc_pgtable(struct domain_iommu *hd,
> +                                                   uint64_t contig_mask);
>  void iommu_queue_free_pgtable(struct domain_iommu *hd, struct page_info *pg);
>
>  #endif /* !__ARCH_X86_IOMMU_H__ */
> --- a/xen/drivers/passthrough/amd/iommu-defs.h
> +++ b/xen/drivers/passthrough/amd/iommu-defs.h
> @@ -446,11 +446,13 @@ union amd_iommu_x2apic_control {
>  #define IOMMU_PAGE_TABLE_U32_PER_ENTRY (IOMMU_PAGE_TABLE_ENTRY_SIZE / 4)
>  #define IOMMU_PAGE_TABLE_ALIGNMENT 4096
>
> +#define IOMMU_PTE_CONTIG_MASK 0x1e /* The ign0 field below. */
> +
>  union amd_iommu_pte {
>      uint64_t raw;
>      struct {
>          bool pr:1;
> -        unsigned int ign0:4;
> +        unsigned int ign0:4; /* Covered by IOMMU_PTE_CONTIG_MASK. */
>          bool a:1;
>          bool d:1;
>          unsigned int ign1:2;
> --- a/xen/drivers/passthrough/amd/iommu_map.c
> +++ b/xen/drivers/passthrough/amd/iommu_map.c
> @@ -115,7 +115,19 @@ static void set_iommu_ptes_present(unsig
>
>      while ( nr_ptes-- )
>      {
> -        set_iommu_pde_present(pde, next_mfn, 0, iw, ir);
> +        ASSERT(!pde->next_level);
> +        ASSERT(!pde->u);
> +
> +        if ( pde > table )
> +            ASSERT(pde->ign0 == find_first_set_bit(pde - table));
> +        else
> +            ASSERT(pde->ign0 == PAGE_SHIFT - 3);

I think PAGETABLE_ORDER would be clearer here. While here, could you also
assert that next_mfn matches the contiguous order currently set in the
PTE?
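(As an aside on these ASSERT()s: the expected ign0 value is simply the
order of the contiguous run a slot can head — slot 0 covers the whole
table, every other slot covers 1 << ctz(index) entries. A small standalone
sketch, using __builtin_ctz() in place of Xen's find_first_set_bit(),
reproduces the 16-entry example from the description:)

    #include <stdio.h>

    /* Order of the contiguous run headed by slot i: slot 0 spans the whole
     * table, any other slot spans 1 << (trailing zero bits of its index)
     * entries.  A real 512-entry table gets 9 in slot 0, i.e.
     * PAGETABLE_ORDER (== PAGE_SHIFT - 3). */
    static unsigned int marker(unsigned int i, unsigned int table_order)
    {
        return i ? (unsigned int)__builtin_ctz(i) : table_order;
    }

    int main(void)
    {
        unsigned int i;

        for ( i = 0; i < 16; ++i )      /* 16-entry example table */
            printf("%X:%u ", i, marker(i, 4));
        printf("\n");
        return 0;
    }

which prints 0:4 1:0 2:1 3:0 4:2 5:0 6:1 7:0 8:3 9:0 A:1 B:0 C:2 D:0 E:1
F:0, i.e. the marker row shown in the description above.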
> +
> +        pde->iw = iw;
> +        pde->ir = ir;
> +        pde->fc = true; /* See set_iommu_pde_present(). */
> +        pde->mfn = next_mfn;
> +        pde->pr = true;
>
>          ++pde;
>          next_mfn += page_sz;
> @@ -295,7 +307,7 @@ static int iommu_pde_from_dfn(struct dom
>              mfn = next_table_mfn;
>
>              /* allocate lower level page table */
> -            table = iommu_alloc_pgtable(hd);
> +            table = iommu_alloc_pgtable(hd, IOMMU_PTE_CONTIG_MASK);
>              if ( table == NULL )
>              {
>                  AMD_IOMMU_ERROR("cannot allocate I/O page table\n");
> @@ -325,7 +337,7 @@ static int iommu_pde_from_dfn(struct dom
>
>          if ( next_table_mfn == 0 )
>          {
> -            table = iommu_alloc_pgtable(hd);
> +            table = iommu_alloc_pgtable(hd, IOMMU_PTE_CONTIG_MASK);
>              if ( table == NULL )
>              {
>                  AMD_IOMMU_ERROR("cannot allocate I/O page table\n");
> @@ -717,7 +729,7 @@ static int fill_qpt(union amd_iommu_pte
>                   * page table pages, and the resulting allocations are always
>                   * zeroed.
>                   */
> -                pgs[level] = iommu_alloc_pgtable(hd);
> +                pgs[level] = iommu_alloc_pgtable(hd, 0);

Is it worth not setting up the contiguous data for quarantine page
tables? I think it's fine now given the current code, but the ASSERTs you
have added in set_iommu_ptes_present() checking that the contig data is
correct make me wonder whether we could trigger them in the future. I
understand that the contig data is not helpful for quarantine page
tables, but it still doesn't seem bad to have it, just for coherency.

>                  if ( !pgs[level] )
>                  {
>                      rc = -ENOMEM;
> @@ -775,7 +787,7 @@ int cf_check amd_iommu_quarantine_init(s
>          return 0;
>      }
>
> -    pdev->arch.amd.root_table = iommu_alloc_pgtable(hd);
> +    pdev->arch.amd.root_table = iommu_alloc_pgtable(hd, 0);
>      if ( !pdev->arch.amd.root_table )
>          return -ENOMEM;
>
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -342,7 +342,7 @@ int amd_iommu_alloc_root(struct domain *
>
>      if ( unlikely(!hd->arch.amd.root_table) && d != dom_io )
>      {
> -        hd->arch.amd.root_table = iommu_alloc_pgtable(hd);
> +        hd->arch.amd.root_table = iommu_alloc_pgtable(hd, 0);
>          if ( !hd->arch.amd.root_table )
>              return -ENOMEM;
>      }
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -334,7 +334,7 @@ static uint64_t addr_to_dma_page_maddr(s
>              goto out;
>
>          pte_maddr = level;
> -        if ( !(pg = iommu_alloc_pgtable(hd)) )
> +        if ( !(pg = iommu_alloc_pgtable(hd, 0)) )
>              goto out;
>
>          hd->arch.vtd.pgd_maddr = page_to_maddr(pg);
> @@ -376,7 +376,7 @@ static uint64_t addr_to_dma_page_maddr(s
>          }
>
>          pte_maddr = level - 1;
> -        pg = iommu_alloc_pgtable(hd);
> +        pg = iommu_alloc_pgtable(hd, DMA_PTE_CONTIG_MASK);
>          if ( !pg )
>              break;
>
> @@ -388,12 +388,13 @@ static uint64_t addr_to_dma_page_maddr(s
>              struct dma_pte *split = map_vtd_domain_page(pte_maddr);
>              unsigned long inc = 1UL << level_to_offset_bits(level - 1);
>
> -            split[0].val = pte->val;
> +            split[0].val |= pte->val & ~DMA_PTE_CONTIG_MASK;
>              if ( inc == PAGE_SIZE )
>                  split[0].val &= ~DMA_PTE_SP;
>
>              for ( offset = 1; offset < PTE_NUM; ++offset )
> -                split[offset].val = split[offset - 1].val + inc;
> +                split[offset].val |=
> +                    (split[offset - 1].val & ~DMA_PTE_CONTIG_MASK) + inc;
>
>              iommu_sync_cache(split, PAGE_SIZE);
>              unmap_vtd_domain_page(split);
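(A note on the split hunk above, in case the masking looks odd: the
replacement table returned by iommu_alloc_pgtable(hd, DMA_PTE_CONTIG_MASK)
already carries the per-slot markers, so the split code ORs the
translation bits into each pre-filled entry and strips the previous slot's
marker before adding the increment. A toy model of that, with made-up
values and a 16-entry table in place of PTE_NUM:)

    #include <stdint.h>
    #include <stdio.h>

    #define CONTIG_MASK (0xfull << 52) /* marker bits, mirroring DMA_PTE_CONTIG_MASK */

    int main(void)
    {
        /* Pre-filled markers for a toy 16-entry table (cf. the example above). */
        static const unsigned int order[16] = { 4,0,1,0,2,0,1,0,3,0,1,0,2,0,1,0 };
        uint64_t split[16];
        uint64_t super = (0x123ull << 21) | 0x83; /* made-up superpage PTE: SP + R/W */
        uint64_t inc = (1ull << 21) / 16;         /* each slot covers 1/16 of the range */
        unsigned int i;

        for ( i = 0; i < 16; ++i )
            split[i] = (uint64_t)order[i] << 52;

        /* As in the patch: OR translation bits in, strip the previous slot's
         * marker before adding the increment, keep each slot's own marker. */
        split[0] |= super & ~CONTIG_MASK;
        for ( i = 1; i < 16; ++i )
            split[i] |= (split[i - 1] & ~CONTIG_MASK) + inc;

        for ( i = 0; i < 16; ++i )  /* markers survive, addresses step by inc */
            printf("%X: %#llx marker %u\n", i,
                   (unsigned long long)split[i], (unsigned int)(split[i] >> 52));
        return 0;
    }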
> @@ -2173,7 +2174,7 @@ static int __must_check cf_check intel_i
>      if ( iommu_snoop )
>          dma_set_pte_snp(new);
>
> -    if ( old.val == new.val )
> +    if ( !((old.val ^ new.val) & ~DMA_PTE_CONTIG_MASK) )
>      {
>          spin_unlock(&hd->arch.mapping_lock);
>          unmap_vtd_domain_page(page);
> @@ -3052,7 +3053,7 @@ static int fill_qpt(struct dma_pte *this
>                   * page table pages, and the resulting allocations are always
>                   * zeroed.
>                   */
> -                pgs[level] = iommu_alloc_pgtable(hd);
> +                pgs[level] = iommu_alloc_pgtable(hd, 0);
>                  if ( !pgs[level] )
>                  {
>                      rc = -ENOMEM;
> @@ -3109,7 +3110,7 @@ static int cf_check intel_iommu_quaranti
>      if ( !drhd )
>          return -ENODEV;
>
> -    pg = iommu_alloc_pgtable(hd);
> +    pg = iommu_alloc_pgtable(hd, 0);
>      if ( !pg )
>          return -ENOMEM;
>
> --- a/xen/drivers/passthrough/vtd/iommu.h
> +++ b/xen/drivers/passthrough/vtd/iommu.h
> @@ -253,7 +253,10 @@ struct context_entry {
>   * 2-6: reserved
>   * 7: super page
>   * 8-11: available
> - * 12-63: Host physcial address
> + * 12-51: Host physcial address
> + * 52-61: available (52-55 used for DMA_PTE_CONTIG_MASK)
> + * 62: reserved
> + * 63: available
>   */
>  struct dma_pte {
>      u64 val;
> @@ -263,6 +266,7 @@ struct dma_pte {
>  #define DMA_PTE_PROT (DMA_PTE_READ | DMA_PTE_WRITE)
>  #define DMA_PTE_SP (1 << 7)
>  #define DMA_PTE_SNP (1 << 11)
> +#define DMA_PTE_CONTIG_MASK (0xfull << PADDR_BITS)
>  #define dma_clear_pte(p) do {(p).val = 0;} while(0)
>  #define dma_set_pte_readable(p) do {(p).val |= DMA_PTE_READ;} while(0)
>  #define dma_set_pte_writable(p) do {(p).val |= DMA_PTE_WRITE;} while(0)
> @@ -276,7 +280,7 @@ struct dma_pte {
>  #define dma_pte_write(p) (dma_pte_prot(p) & DMA_PTE_WRITE)
>  #define dma_pte_addr(p) ((p).val & PADDR_MASK & PAGE_MASK_4K)
>  #define dma_set_pte_addr(p, addr) do {\
> -            (p).val |= ((addr) & PAGE_MASK_4K); } while (0)
> +            (p).val |= ((addr) & PADDR_MASK & PAGE_MASK_4K); } while (0)

While I'm not opposed to this, I would assume that addr is not expected
to contain bits cleared by PADDR_MASK (or by PAGE_MASK_4K, FWIW)? Or else
callers are really messed up.

>  #define dma_pte_present(p) (((p).val & DMA_PTE_PROT) != 0)
>  #define dma_pte_superpage(p) (((p).val & DMA_PTE_SP) != 0)
>
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -522,11 +522,12 @@ int iommu_free_pgtables(struct domain *d
>      return 0;
>  }
>
> -struct page_info *iommu_alloc_pgtable(struct domain_iommu *hd)
> +struct page_info *iommu_alloc_pgtable(struct domain_iommu *hd,
> +                                      uint64_t contig_mask)
>  {
>      unsigned int memflags = 0;
>      struct page_info *pg;
> -    void *p;
> +    uint64_t *p;
>
>  #ifdef CONFIG_NUMA
>      if ( hd->node != NUMA_NO_NODE )
> @@ -538,7 +539,29 @@ struct page_info *iommu_alloc_pgtable(st
>          return NULL;
>
>      p = __map_domain_page(pg);
> -    clear_page(p);
> +
> +    if ( contig_mask )
> +    {
> +        /* See pt-contig-markers.h for a description of the marker scheme. */
> +        unsigned int i, shift = find_first_set_bit(contig_mask);
> +
> +        ASSERT(((PAGE_SHIFT - 3) & (contig_mask >> shift)) == PAGE_SHIFT - 3);

I think it might be clearer to use PAGETABLE_ORDER rather than
PAGE_SHIFT - 3.

Thanks, Roger.
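(For anyone double-checking the ASSERT quoted last: with the two masks the
patch introduces — 0x1e for AMD and 0xf << 52 for VT-d — the shifted-down
mask is 0xf in both cases, and (PAGE_SHIFT - 3) & 0xf == 9, so the 4-bit
field is wide enough for every order up to PAGETABLE_ORDER. A trivial
standalone check, using __builtin_ctzll() in place of
find_first_set_bit():)

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        const uint64_t masks[] = { 0x1e, 0xfull << 52 }; /* AMD, VT-d contig masks */
        unsigned int i;

        for ( i = 0; i < 2; ++i )
        {
            unsigned int shift = (unsigned int)__builtin_ctzll(masks[i]);

            /* PAGE_SHIFT - 3 == PAGETABLE_ORDER == 9 must fit in the field. */
            assert((9 & (masks[i] >> shift)) == 9);
        }
        return 0;
    }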