Re: [Xen-devel] swiotlb=force in Konrad's xen-pcifront-0.8.2 pvops domU kernel with PCI passthrough
On Tue, Nov 16, 2010 at 10:57 AM, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
>> >> Using the bounce buffers limits the DMA operations to under 32-bit. So
>> >> could it be that you are using some casting macro that casts a PFN to
>> >> unsigned long or vice-versa and we end up truncating it to 32-bit? (I've
>> >> seen this issue actually with InfiniBand drivers back in RHEL5 days..).
>> >> Lastly, do you set your DMA mask on the device to 32BIT?
>> >>
>> >> The tachyon chip supports both 32-bit & 45-bit dma. Some features need to
>> >> set 32-bit physical addr to chip. Others need to set 45-bit physical addr
>> >> to chip.
>> >
>> > Oh boy. That complicates it.
>> >
>> >> The driver doesn't set DMA mask on the device to 32 bit.
>> >
>> > Is it set then to 45bit?
>> >
>> We were not explicitly setting the DMA mask. pci_alloc_coherent was
>
> You should. But only once (during startup).
>
>> always returning 32 bits but pci_map_single was returning a 34-bit
>> address which we truncate by casting it to a uint32_t since the
>
> Truncating any bus (DMA) address is a big no no.
>
>> Tachyon's HBA register is only 32 bits. With swiotlb=force, both
>
> Not knowing the driver I can't comment here much, but
> 1). When you say 'HBA registers' I think PCI MMIO BARs. Those are
>     usually found beneath the 4GB limit and you get the virtual
>     address when doing ioremap (or the PCI equivalent). And the
>     bus address is definitely under 4GB.
> 2). After you have done that, set your pci_dma_mask to 34-bit, and then
> 3). For all other operations where you can do 34-bit use
>     pci_map_single. The swiotlb buffer looks at the dma_mask (and if
>     none is set it assumes 32-bit), and if it finds the physical address
>     to be within the DMA mask it will gladly translate the physical
>     to bus and nothing else. If however the physical address is way
>     beyond the DMA mask it will give you the bounce buffer which
>     you will later have to copy from (using pci_sync..). I've written
>     a little blurb at the bottom of the email explaining this in more
>     detail.
>
> Or is the issue that when you write to your HBA register the DMA
> address, the HBA register can _only_ deal with 32-bit values (4 bytes)?

The HBA register which is using the address returned by pci_map_single
is limited to a 32-bit value.

> In which case the PCI device seems to be limited to addressing only up to
> 4GB, right?

The HBA has some 32-bit registers and some that are 45-bit.

>
>> returned 32 bits without explicitly setting the DMA mask. Once we set
>> the mask to 32 bits using pci_set_dma_mask, the NMIs stopped. However
>> with iommu=soft (and no more swiotlb=force), we're still stuck with
>> the abysmal I/O performance (same as when we had swiotlb=force).
>
> Right, that is expected.

So with iommu=soft, all I/Os have to go through Xen-SWIOTLB, which
explains why we're seeing the abysmal I/O performance, right?

Is it true then that with an HVM domU kernel and PCI passthrough, it
does not use Xen-SWIOTLB and therefore results in better performance?

>
>> In pvops domU (xen-pcifront-0.8.2), what does iommu=soft do? What's
>> the default if we don't specify it? Without it, we get no I/Os (it
>
> If you don't specify it you can't do PCI passthrough in PV guests.
> It is automatically enabled when you boot Linux as Dom0.
>
>> seems the interrupts and/or DMA don't work).
>
> It has two purposes:
>
> 1). The predominant one, which is used for both DomU and Dom0, is to
> translate physical addresses to machine frame numbers (PFNs->MFNs).
> Xen PV guests have a P2M array that is consulted when setting
> virtual addresses (PTEs). For PCI BARs, they are equivalent
> (PFN == MFN), but for memory regions they can be discontiguous,
> and in decreasing order. If you would traverse the P2M list you
> could see: p2m(0x1000)==0x5121, p2m(0x1001)==0x5120, p2m(0x1002)==0x5119.
>
> So obviously we need a lookup mechanism to, say, find for
> virtual address 0xfffff8000010000 the DMA address (bus address).
> Naively on baremetal on x86 you could use virt_to_phys which would
> get you PFN 0x10000. On Xen however, we need to consult the P2M array.
> For example, for p2m[0x10000], the real machine frame number might
> be 0x102323.
>
> So when you do 'pci_map_*' Xen-SWIOTLB looks up the P2M to find you the
> machine frame number and returns that (DMA address aka bus address). That
> is the value you tell the HBA to transfer from/to.
>
> If you don't enable Xen-SWIOTLB, and use the native one (or none at all),
> you end up programming the PCI device with bogus data since the bus
> address you are giving the card does not correspond to the real bus
> address.
>
> 2). Using our example before, the p2m[0x10000] returned MFN 0x102323. That
> MFN is above 4GB (0x100000) and if your device can _only_ do PCI Memory
> Write and PCI Memory Read b/c it only has 32-bit address bits we need
> some way of still getting the contents of 0x102323 to the PCI card.
> This is where bounce buffers come into play. During bootup, Xen-SWIOTLB
> initializes a 64MB chunk of space that is underneath the 4GB space - it
> is also contiguous. When you do 'pci_map_*' Xen-SWIOTLB looks at the
> DMA mask you have and at the MFN, and if the machine address lies above
> the DMA mask it copies the value from 0x102323 to one of its buffers,
> gives you the MFN of its buffer (say 0x20000) and you program that in
> the PCI card. When you get an interrupt from the PCI card, you call
> pci_sync_* which copies from MFN 0x20000 to 0x102323 and sticks the MFN
> 0x20000 back on the list of buffers to be used. And now you have in MFN
> 0x102323 the result.
>
>>
>> Are there any profiling tools you can suggest for domU? I was able to
>> apply Dulloor's xenoprofile patch to our dom0 kernel (2.6.32.25-pvops)
>> but not to xen-pcifront-0.8.2.
>
> Oh boy. I don't, sorry.
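To make the two purposes described above concrete, here is a small userspace
model in Python. It is only a sketch, not the Xen-SWIOTLB code: the names
P2M, dma_mask, pfn_to_bus and map_single are hypothetical, 4 KiB pages are
assumed, and the table entries and the bounce frame 0x20000 are just the
example values from the mail. It does not copy any data or manage a pool of
buffers; it only shows the address arithmetic.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages, as on x86

# Hypothetical P2M table holding only the example values from the mail.
P2M = {0x1000: 0x5121, 0x1001: 0x5120, 0x1002: 0x5119, 0x10000: 0x102323}

BOUNCE_MFN = 0x20000  # example bounce-buffer frame below 4GB, from the mail


def dma_mask(bits):
    """Address mask of a device that can drive `bits` address bits."""
    return (1 << bits) - 1


def pfn_to_bus(pfn, offset=0):
    """Purpose 1: translate a guest PFN to a machine (bus) address via P2M."""
    return (P2M[pfn] << PAGE_SHIFT) | offset


def map_single(pfn, size, mask, offset=0):
    """Purpose 2: return (bus_address, bounced) for a buffer inside one page.

    The mapping is bounced when the last byte of the buffer lies above the
    device's DMA mask; otherwise the translated address is passed through.
    """
    bus = pfn_to_bus(pfn, offset)
    if bus + size - 1 > mask:
        return (BOUNCE_MFN << PAGE_SHIFT) | offset, True
    return bus, False


if __name__ == "__main__":
    for bits in (32, 34, 45):
        bus, bounced = map_single(0x10000, 512, dma_mask(bits))
        print(f"{bits}-bit mask: bus={bus:#x} bounced={bounced}")
    # Truncating to 32 bits instead of bouncing yields a different address.
    real = pfn_to_bus(0x10000)
    print(f"real={real:#x} truncated={real & 0xFFFFFFFF:#x}")
```

In this model, PFN 0x10000 translates to bus address 0x102323000. With a
32-bit mask the mapping is bounced to 0x20000000, while with a 34-bit or
45-bit mask the translated address is passed through unchanged. Casting the
address to 32 bits instead gives 0x2323000, which is a different machine
page altogether; that is why truncating a bus address is a problem.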
_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel