Re: [Xen-devel] MSI message data register configuration in Xen guests
On Fri, Jun 29, 2012 at 4:10 AM, Stefano Stabellini <stefano.stabellini@xxxxxxxxxxxxx> wrote:
> On Thu, 28 Jun 2012, Deep Debroy wrote:
>> On Wed, Jun 27, 2012 at 4:18 PM, Deep Debroy <ddebroy@xxxxxxxxx> wrote:
>> > On Mon, Jun 25, 2012 at 7:51 PM, Rolu <rolu@xxxxxxxx> wrote:
>> >>
>> >> On Tue, Jun 26, 2012 at 4:38 AM, Deep Debroy <ddebroy@xxxxxxxxx> wrote:
>> >> > Hi, I was playing around with an MSI-capable virtual device (so far submitted as patches only) in the upstream qemu tree but having trouble getting it to work in a Xen HVM guest. The device happens to be a QEMU implementation of VMware's pvscsi controller. The device works fine in a Xen guest when I switch the device's code to force usage of legacy interrupts with upstream QEMU. With MSI-based interrupts, the device works fine in a KVM guest but, as stated before, not in a Xen guest. After digging a bit, it appears the reason for the failure in Xen guests is that the MSI data register in the Xen guest ends up with a value of 0x4300, where the Delivery Mode value of 3 happens to be reserved (per spec) and therefore illegal. The vmsi_deliver routine in Xen rejects MSI interrupts with such data as illegal (per expectation), causing all commands issued by the guest OS on the device to time out.
>> >> >
>> >> > Given the above scenario, I was wondering if anyone can shed some light on how to debug this further for Xen. Something I would specifically like to know is where the MSI data register configuration actually happens. Is it done by code specific to Xen and within the Xen codebase, or is it all done within QEMU?
>> >>
>> >> This seems like the same issue I ran into, though in my case it is with passed-through physical devices. See http://lists.xen.org/archives/html/xen-devel/2012-06/msg01423.html and the older messages in that thread for more info on what's going on. No fix yet, but help debugging is very welcome.
>> >
>> > Thanks Rolu for pointing out the other thread - it was very useful. Some of the symptoms appear to be identical in my case. However, I am not using a pass-through device. Instead, in my case it's a fully virtualized device pretty much identical to a raw file-backed disk image, where the controller is pvscsi rather than lsi. Therefore I guess some of the later discussion in the other thread around pass-through-specific areas of code in qemu is not relevant? Please correct me if I am wrong. Also note that I am using upstream qemu, where neither the #define for PT_PCI_MSITRANSLATE_DEFAULT nor xenstore.c exists (which is where Stefano's suggested change appeared to be).
>> >
>> > So far, here's what I am observing in the HVM Linux guest:
>> >
>> > On the guest side, as discussed in the other thread, xen_hvm_setup_msi_irqs is invoked for the device and a value of 0x4300 is composed by xen_msi_compose_msg and written into the data register.
>> > On the qemu (upstream) side, when the virtualized controller is trying to complete a request, it invokes the following chain of calls: stl_le_phys -> xen_apic_mem_write -> xen_hvm_inject_msi.
>> > On the Xen side, this ends up in hvmop_inject_msi -> hvm_inject_msi -> vmsi_deliver. vmsi_deliver, as previously discussed, rejects the delivery mode of 0x3.
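
For context, the MSI message data register packs the vector into bits 7:0 and the delivery mode into bits 10:8 (where encoding 3 is reserved), with the level-assert bit at bit 14 and the trigger mode at bit 15; that is why 0x4300 is rejected, since it carries vector 0 and delivery mode 3. Below is a minimal standalone decode sketch (not Xen or Linux code; the field layout is taken from the MSI spec) covering 0x4300 and, for comparison, a normal fixed-delivery value like the 0x4049 mentioned later in the thread:

#include <stdint.h>
#include <stdio.h>

/* Decode an x86 MSI message data value per the spec's bit layout. */
static void decode_msi_data(uint32_t data)
{
    uint8_t vector        = data & 0xff;         /* bits 7:0 */
    uint8_t delivery_mode = (data >> 8) & 0x7;   /* bits 10:8; 3 is reserved */
    uint8_t level         = (data >> 14) & 0x1;  /* bit 14: assert/deassert */
    uint8_t trig_mode     = (data >> 15) & 0x1;  /* bit 15: 0 = edge, 1 = level */

    printf("data=0x%04x vector=0x%02x delivery_mode=%u level=%u trigger=%s\n",
           data, vector, delivery_mode, level, trig_mode ? "level" : "edge");
}

int main(void)
{
    decode_msi_data(0x4300); /* vector 0, delivery mode 3 (reserved) -> vmsi_deliver rejects it */
    decode_msi_data(0x4049); /* vector 0x49, delivery mode 0 (fixed) -> accepted */
    return 0;
}
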
>> >
>> > Is the above sequence of interactions the expected path for an HVM guest trying to use a fully virtualized device/controller that uses MSI in upstream qemu? If so, and if a standard Linux guest always populates the value of 0x4300 in the MSI data register through xen_hvm_setup_msi_irqs, how are MSI notifications from a device in qemu supposed to work and get past the vmsi_deliver check, given that the delivery mode of 0x3 is indeed reserved?
>> >
>> I wanted to see whether the HVM guest can interact with the virtualized MSI controller properly without any of the Xen-specific code in the Linux kernel kicking in (i.e. allowing the regular PCI/MSI code in Linux to fire). So I rebuilt the kernel with CONFIG_XEN disabled such that pci_xen_hvm_init no longer sets x86_msi.*msi_irqs to Xen-specific routines like xen_hvm_setup_msi_irqs, which is where the 0x4300 is getting populated. This seems to work properly. The MSI data register for the controller ends up getting a valid value like 0x4049, vmsi_deliver no longer complains, all MSI notifications are delivered in the expected way to the guest, and the raw, file-backed disks attached to the controller show up in fdisk -l.
>>
>> My conclusion: the Linux kernel's Xen-specific code, specifically routines like xen_hvm_setup_msi_irqs, needs to be tweaked to work with fully virtualized qemu devices that use MSI. I will follow up regarding that on LKML.
>
> Thanks for your analysis of the problem, I think it is correct: Linux PV on HVM is trying to set up event channel delivery for the MSI as it always does (therefore choosing 0x3 as delivery mode). However, emulated devices in QEMU don't support that. To be honest, emulated devices in QEMU didn't support MSIs at all until very recently, which is why we are seeing this issue only now.
>
> Could you please try this Xen patch and let me know if it makes things better?

Thanks Stefano. I have tested the patch below with the MSI device and it's now working (without any additional changes to the Linux guest kernel).
>
> diff --git a/xen/arch/x86/hvm/irq.c b/xen/arch/x86/hvm/irq.c
> index a90927a..f44f3b9 100644
> --- a/xen/arch/x86/hvm/irq.c
> +++ b/xen/arch/x86/hvm/irq.c
> @@ -281,6 +281,31 @@ void hvm_inject_msi(struct domain *d, uint64_t addr, uint32_t data)
>                  >> MSI_DATA_TRIGGER_SHIFT;
>      uint8_t vector = data & MSI_DATA_VECTOR_MASK;
>
> +    if ( !vector )
> +    {
> +        int pirq = ((addr >> 32) & 0xffffff00) | ((addr >> 12) & 0xff);
> +        if ( pirq > 0 )
> +        {
> +            struct pirq *info = pirq_info(d, pirq);
> +
> +            /* if it is the first time, allocate the pirq */
> +            if (info->arch.hvm.emuirq == IRQ_UNBOUND)
> +            {
> +                spin_lock(&d->event_lock);
> +                map_domain_emuirq_pirq(d, pirq, IRQ_MSI_EMU);
> +                spin_unlock(&d->event_lock);
> +            } else if (info->arch.hvm.emuirq != IRQ_MSI_EMU)
> +            {
> +                printk("%s: pirq %d does not correspond to an emulated MSI\n", __func__, pirq);
> +                return;
> +            }
> +            send_guest_pirq(d, info);
> +            return;
> +        } else {
> +            printk("%s: error getting pirq from MSI: pirq = %d\n", __func__, pirq);
> +        }
> +    }
> +
>      vmsi_deliver(d, vector, dest, dest_mode, delivery_mode, trig_mode);
>  }
>
> diff --git a/xen/include/asm-x86/irq.h b/xen/include/asm-x86/irq.h
> index 40e2245..066f64d 100644
> --- a/xen/include/asm-x86/irq.h
> +++ b/xen/include/asm-x86/irq.h
> @@ -188,6 +188,7 @@ void cleanup_domain_irq_mapping(struct domain *);
>  })
>  #define IRQ_UNBOUND -1
>  #define IRQ_PT -2
> +#define IRQ_MSI_EMU -3
>
>  bool_t cpu_has_pending_apic_eoi(void);
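
To spell out what the patch is doing, as described by Stefano above: the guest's xen_hvm_setup_msi_irqs path hides the pirq number in the MSI address and leaves the vector at 0 (hence the 0x4300 data value), so hvm_inject_msi now treats a zero vector as "this is really a pirq", recovers the pirq from the address, maps it as an emulated MSI (IRQ_MSI_EMU) the first time it is seen, and injects it with send_guest_pirq over the event channel instead of going through vmsi_deliver. The following standalone sketch only illustrates that address encoding; the encode side is an assumption written as the inverse of the decode expression in the patch, not actual kernel code:

#include <stdint.h>
#include <stdio.h>
#include <assert.h>

/* Hypothetical guest-side encode: low 8 bits of the pirq go into the
 * MSI destination ID field (address bits 19:12), the remaining bits
 * into address_hi. Written purely as the inverse of the patch's decode. */
static uint64_t encode_pirq_in_msi_addr(uint32_t pirq)
{
    uint64_t address_lo = 0xfee00000u | ((pirq & 0xffu) << 12);
    uint64_t address_hi = pirq & 0xffffff00u;
    return (address_hi << 32) | address_lo;
}

/* Decode expression as used by hvm_inject_msi() in the patch. */
static int pirq_from_msi_addr(uint64_t addr)
{
    return ((addr >> 32) & 0xffffff00) | ((addr >> 12) & 0xff);
}

int main(void)
{
    uint32_t pirq = 0x1a5;  /* arbitrary example pirq number */
    uint64_t addr = encode_pirq_in_msi_addr(pirq);
    assert(pirq_from_msi_addr(addr) == (int)pirq);
    printf("pirq 0x%x -> MSI addr 0x%016llx -> pirq %d\n",
           pirq, (unsigned long long)addr, pirq_from_msi_addr(addr));
    return 0;
}
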