
Re: [Xen-devel] swiotlb=force in Konrad's xen-pcifront-0.8.2 pvops domU kernel with PCI passthrough


  • To: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>
  • From: Dante Cinco <dantecinco@xxxxxxxxx>
  • Date: Thu, 11 Nov 2010 14:32:15 -0800
  • Cc: Xen-devel <xen-devel@xxxxxxxxxxxxxxxxxxx>
  • Delivery-date: Thu, 11 Nov 2010 14:33:46 -0800
  • List-id: Xen developer discussion <xen-devel.lists.xensource.com>

With iommu=off,verbose on the Xen command line, the pvops domU works only
with swiotlb=force, and with the same performance degradation as before.
Without swiotlb=force there is no NMI, but DMA does not work (see Ray Lin's
reply on Thu 11/11/2010 11:42 AM).
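
For reference, here is roughly where each of the settings discussed so far
lives. This is an illustrative boot configuration, not copied verbatim from
my setup:

  # Xen hypervisor command line (dom0 grub entry):
  kernel /boot/xen.gz iommu=off,verbose
  #   previously: iommu=1,passthrough,no-intremap
  # pvops domU kernel command line (xm guest config):
  extra = "swiotlb=force"
  #   versus: iommu=soft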

The XenPCIpassthrough wiki
(http://wiki.xensource.com/xenwiki/XenPCIpassthrough) talks about setting
iommu=pv in order to use the hardware IOMMU (VT-d) passthrough for PV
guests, but I didn't see any difference compared to my original setting
(iommu=1,passthrough,no-intremap). Is iommu=pv still required for this
particular pvops domU kernel (xen-pcifront-0.8.2), and if it is, what should
I be looking for in the Xen log (xm dmesg) to verify that it is taking
effect?
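
(For anyone wanting to reproduce this, the VT-d related lines can be
filtered out of the full Xen log with something along the lines of

  xm dmesg | grep -Ei 'vt-d|iommu'

assuming the xm toolstack used here.)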

With my original setting (iommu=1,passthrough,no-intremap), here's what I see:

(XEN) [VT-D]dmar.c:702: Host address width 39
(XEN) [VT-D]dmar.c:717: found ACPI_DMAR_DRHD:
(XEN) [VT-D]dmar.c:413:   dmaru->address = e7ffe000
(XEN) [VT-D]iommu.c:1136: drhd->address = e7ffe000 iommu->reg = ffff82c3fff57000
(XEN) [VT-D]iommu.c:1138: cap = c90780106f0462 ecap = f0207e
(XEN) [VT-D]dmar.c:356:   IOAPIC: 0:1e.1
(XEN) [VT-D]dmar.c:356:   IOAPIC: 0:13.0
(XEN) [VT-D]dmar.c:427:   flags: INCLUDE_ALL
(XEN) [VT-D]dmar.c:722: found ACPI_DMAR_RMRR:
(XEN) [VT-D]dmar.c:341:   endpoint: 0:1d.7
(XEN) [VT-D]dmar.c:594:   RMRR region: base_addr df7fc000 end_address df7fdfff
(XEN) [VT-D]dmar.c:722: found ACPI_DMAR_RMRR:
(XEN) [VT-D]dmar.c:341:   endpoint: 0:1d.0
(XEN) [VT-D]dmar.c:341:   endpoint: 0:1d.1
(XEN) [VT-D]dmar.c:341:   endpoint: 0:1d.2
(XEN) [VT-D]dmar.c:341:   endpoint: 0:1d.3
(XEN) [VT-D]dmar.c:341:   endpoint: 2:0.0
(XEN) [VT-D]dmar.c:341:   endpoint: 2:0.2
(XEN) [VT-D]dmar.c:341:   endpoint: 2:0.4
(XEN) [VT-D]dmar.c:594:   RMRR region: base_addr df7f5000 end_address df7fafff
(XEN) [VT-D]dmar.c:722: found ACPI_DMAR_RMRR:
(XEN) [VT-D]dmar.c:341:   endpoint: 5:0.0
(XEN) [VT-D]dmar.c:341:   endpoint: 2:0.0
(XEN) [VT-D]dmar.c:341:   endpoint: 2:0.2
(XEN) [VT-D]dmar.c:594:   RMRR region: base_addr df63e000 end_address df63ffff
(XEN) [VT-D]dmar.c:727: found ACPI_DMAR_ATSR:
(XEN) [VT-D]dmar.c:622:   atsru->all_ports: 0
(XEN) [VT-D]dmar.c:327:   bridge: 0:a.0  start = 0 sec = 7  sub = 7
(XEN) [VT-D]dmar.c:327:   bridge: 0:9.0  start = 0 sec = 8  sub = a
(XEN) [VT-D]dmar.c:327:   bridge: 0:8.0  start = 0 sec = b  sub = d
(XEN) [VT-D]dmar.c:327:   bridge: 0:7.0  start = 0 sec = e  sub = 10
(XEN) [VT-D]dmar.c:327:   bridge: 0:6.0  start = 0 sec = 18  sub = 1a
(XEN) [VT-D]dmar.c:327:   bridge: 0:5.0  start = 0 sec = 15  sub = 17
(XEN) [VT-D]dmar.c:327:   bridge: 0:4.0  start = 0 sec = 14  sub = 14
(XEN) [VT-D]dmar.c:327:   bridge: 0:3.0  start = 0 sec = 11  sub = 13
(XEN) [VT-D]dmar.c:327:   bridge: 0:2.0  start = 0 sec = 6  sub = 6
(XEN) [VT-D]dmar.c:327:   bridge: 0:1.0  start = 0 sec = 5  sub = 5
(XEN) Intel VT-d Snoop Control not enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping not enabled.
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) [VT-D]iommu.c:743: iommu_enable_translation: iommu->reg = ffff82c3fff57000

domU bringup:

(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 11:0.3
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 11:0.3
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 11:0.2
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 11:0.2
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 11:0.1
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 11:0.1
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 11:0.0
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 11:0.0
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 8:0.3
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 8:0.3
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 8:0.2
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 8:0.2
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 8:0.1
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 8:0.1
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 8:0.0
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 8:0.0
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 15:0.0
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 15:0.0
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 15:0.1
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 15:0.1
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 18:0.0
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 18:0.0
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = 18:0.1
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = 18:0.1
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = b:0.0
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = b:0.0
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = b:0.1
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = b:0.1
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = e:0.0
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = e:0.0
(XEN) [VT-D]iommu.c:1514: d0:PCIe: unmap bdf = e:0.1
(XEN) [VT-D]iommu.c:1387: d1:PCIe: map bdf = e:0.1
mapping kernel into physical memory
about to get started...

- Dante

On Thu, Nov 11, 2010 at 11:03 AM, Konrad Rzeszutek Wilk
<konrad.wilk@xxxxxxxxxx> wrote:
> On Thu, Nov 11, 2010 at 10:31:48AM -0800, Dante Cinco wrote:
>> Konrad,
>>
>> Without swiotlb=force, I don't see "PCI-DMA: Using software bounce
>> buffering for IO" in /var/log/kern.log.
>>
>> With iommu=soft and without swiotlb=force, I see the "software bounce
>> buffering" in /var/log/kern.log and an NMI (see below) when I load the
>> kernel module drivers. I made sure the NMI is reproducible and not a
>
> What is the kernel module doing to cause this? DMA?
>> one-time event.
>
> So doing 64-bit DMA causes an NMI. Do you have the hypervisor's IOMMU (VT-d)
> enabled or disabled (iommu=off,verbose)? If you turn it off, does this work?
>>
>> /var/log/kern.log (iommu=soft):
>> PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
>> Placing 64MB software IO TLB between ffff880005800000 - ffff880009800000
>> software IO TLB at phys 0x5800000 - 0x9800000
>>
>> (XEN)
>> (XEN)
>> (XEN) NMI - I/O ERROR
>> (XEN) ----[ Xen-4.1-unstable  x86_64  debug=y  Not tainted ]----
>> (XEN) CPU:    0
>> (XEN) RIP:    e008:[<ffff82c4801701b2>] smp_send_event_check_mask+0x1/0x10
>> (XEN) RFLAGS: 0000000000000012   CONTEXT: hypervisor
>> (XEN) rax: 0000000000000080   rbx: ffff82c480287c48   rcx: 0000000000000000
>> (XEN) rdx: 0000000000000080   rsi: 0000000000000080   rdi: ffff82c480287c48
>> (XEN) rbp: ffff82c480287c78   rsp: ffff82c480287c38   r8:  0000000000000000
>> (XEN) r9:  0000000000000037   r10: 0000ffff0000ffff   r11: 00ff00ff00ff00ff
>> (XEN) r12: ffff82c48029f080   r13: 0000000000000001   r14: 0000000000000008
>> (XEN) r15: ffff82c4802b0c20   cr0: 000000008005003b   cr4: 00000000000026f0
>> (XEN) cr3: 00000001250a9000   cr2: 00007f6165ae9428
>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
>> (XEN) Xen stack trace from rsp=ffff82c480287c38:
>> (XEN)    ffff82c480287c78 ffff82c48012001f 0000000000000100 0000000000000000
>> (XEN)    ffff82c480287ca8 ffff83011dadd8b0 ffff83019fffa9d0 ffff82c4802c2300
>> (XEN)    ffff82c480287cc8 ffff82c480117d0d ffff82c48029f080 0000000000000001
>> (XEN)    0000000000000100 0000000000000000 0000000000000002 ffff8300df606000
>> (XEN)    000000411de66867 ffff82c4802c2300 ffff82c480287d28 ffff82c48011f299
>> (XEN)    0000000000000100 0000000000000086 ffff83019e3fa000 ffff83011dadd8b0
>> (XEN)    ffff83019fffa9d0 ffff8300df606000 0000000000000000 0000000000000000
>> (XEN)    000000000000007f ffff83019fe02200 ffff82c480287d38 ffff82c48011f6ea
>> (XEN)    ffff82c480287d58 ffff82c48014e4c1 ffff83011dae2000 0000000000000066
>> (XEN)    ffff82c480287d68 ffff82c48014e54d ffff82c480287d98 ffff82c480105d59
>> (XEN)    ffff82c480287da8 ffff8301616a6990 ffff83011dae2000 0000000000000000
>> (XEN)    ffff82c480287da8 ffff82c480105f81 ffff82c480287e28 ffff82c48015c043
>> (XEN)    0000000000000043 0000000000000043 ffff83019fe02234 0000000000000000
>> (XEN)    000000000000010c 0000000000000000 0000000000000000 0000000000000002
>> (XEN)    ffff82c480287e10 ffff82c480287f18 ffff82c48024f6c0 ffff82c480287f18
>> (XEN)    ffff82c4802c2300 0000000000000002 00007d3b7fd781a7 ffff82c480154ee6
>> (XEN)    0000000000000002 ffff82c4802c2300 ffff82c480287f18 ffff82c48024f6c0
>> (XEN)    ffff82c480287ee0 ffff82c480287f18 00ff00ff00ff00ff 0000ffff0000ffff
>> (XEN)    0000000000000000 0000000000000000 ffff82c4802c23a0 0000000000000000
>> (XEN)    0000000000000000 ffff82c4802c2e80 0000000000000000 0000007a00000000
>> (XEN) Xen call trace:
>> (XEN)    [<ffff82c4801701b2>] smp_send_event_check_mask+0x1/0x10
>> (XEN)    [<ffff82c480117d0d>] csched_vcpu_wake+0x2e1/0x302
>> (XEN)    [<ffff82c48011f299>] vcpu_wake+0x243/0x43e
>> (XEN)    [<ffff82c48011f6ea>] vcpu_unblock+0x4a/0x4c
>> (XEN)    [<ffff82c48014e4c1>] vcpu_kick+0x21/0x7f
>> (XEN)    [<ffff82c48014e54d>] vcpu_mark_events_pending+0x2e/0x32
>> (XEN)    [<ffff82c480105d59>] evtchn_set_pending+0xbf/0x190
>> (XEN)    [<ffff82c480105f81>] send_guest_pirq+0x54/0x56
>> (XEN)    [<ffff82c48015c043>] do_IRQ+0x3b2/0x59c
>> (XEN)    [<ffff82c480154ee6>] common_interrupt+0x26/0x30
>> (XEN)    [<ffff82c48014e3c3>] default_idle+0x82/0x87
>> (XEN)    [<ffff82c480150664>] idle_loop+0x5a/0x68
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) FATAL TRAP: vector = 2 (nmi)
>> (XEN) [error_code=0000] , IN INTERRUPT CONTEXT
>> (XEN) ****************************************
>> (XEN)
>> (XEN) Reboot in five seconds...
>>
>> Dante
>>
>>
>> On Thu, Nov 11, 2010 at 8:04 AM, Konrad Rzeszutek Wilk
>> <konrad.wilk@xxxxxxxxxx> wrote:
>> > On Wed, Nov 10, 2010 at 05:16:14PM -0800, Dante Cinco wrote:
>> >> We have Fibre Channel HBA devices that we PCI passthrough to our pvops
>> >> domU kernel. Without swiotlb=force in the domU's kernel command line,
>> >> both domU and dom0 lock up after loading the kernel module drivers for
>> >> the HBA devices. With swiotlb=force, the domU and dom0 are stable
>> >
>> > Whoa. That is not good - what happens if you just pass in iommu=soft?
>> > Does the "PCI-DMA: Using..." message show up if you don't pass in any of
>> > those parameters? (I don't think it does, but just doing 'iommu=soft'
>> > should enable it.)
>> >
>> >
>> >> after loading the kernel module drivers but the I/O performance is at
>> >> least an order of magnitude worse than what we were seeing with the
>> >> HVM kernel. I see the following in /var/log/kern.log in the pvops
>> >> domU:
>> >>
>> >> PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
>> >> Placing 64MB software IO TLB between ffff880005800000 - ffff880009800000
>> >> software IO TLB at phys 0x5800000 - 0x9800000
>> >>
>> >> Is swiotlb=force responsible for the I/O performance degradation? I
>> >> don't understand what swiotlb=force does, so I would appreciate an
>> >> explanation or a pointer.
>> >
>> > So, you should only need to use 'iommu=soft'. It will enable the Linux
>> > kernel IOMMU to translate the pseudo-PFNs to the real machine frame
>> > numbers (bus addresses).
>> >
>> > If your card is 64-bit, then that is all it would do. If, however, your
>> > card is 32-bit and you are DMA-ing data from above the 32-bit limit, it
>> > would copy the user-space page to memory below 4GB, DMA that, and when
>> > done, copy it back to where the user-space page is. This is called
>> > bounce-buffering, and this is why you would use a mix of pci_map_page
>> > and pci_dma_sync_single_for_[cpu|device] calls in your driver.
>> >
>> > However, I think your cards are 64-bit, so you don't need this
>> > bounce-buffering. But if you say 'swiotlb=force', it will force _all_
>> > DMAs to go through the bounce buffer.
>> >
>> > So, try just 'iommu=soft' and see what happens.
>> >
>
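
P.S. For anyone else following this thread, here is a minimal sketch of the
streaming-DMA pattern Konrad describes above. It is hypothetical driver code
(the function and names are made up, not taken from our actual HBA driver),
but it shows where swiotlb=force hurts: every pci_map_single() lands in the
bounce buffer, and the sync/unmap steps are where the extra copies happen.

#include <linux/pci.h>
#include <linux/dma-mapping.h>

static int hba_dma_read(struct pci_dev *pdev, void *buf, size_t len)
{
        dma_addr_t bus;

        /* A 64-bit capable card sets a 64-bit DMA mask (normally once, at
         * probe time) and then never needs bounce buffering, unless
         * swiotlb=force overrides that. */
        if (pci_set_dma_mask(pdev, DMA_BIT_MASK(64)))
                return -EIO;

        /* Map 'buf' for device-to-memory DMA; with the swiotlb this may
         * return the bus address of a bounce page, not of 'buf' itself. */
        bus = pci_map_single(pdev, buf, len, PCI_DMA_FROMDEVICE);
        if (pci_dma_mapping_error(pdev, bus))
                return -ENOMEM;

        /* ... program the HBA with 'bus' and wait for the I/O to finish ... */

        /* Hand the buffer back to the CPU; with bounce buffering this is
         * where the data is copied from the bounce page into 'buf'. */
        pci_dma_sync_single_for_cpu(pdev, bus, len, PCI_DMA_FROMDEVICE);

        /* ... CPU inspects 'buf'; pci_dma_sync_single_for_device() would be
         * needed before letting the device touch the mapping again ... */

        pci_unmap_single(pdev, bus, len, PCI_DMA_FROMDEVICE);
        return 0;
}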



 

