[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH RFC] x86/vMSI-X: avoid missing first unmask of vectors



>>> On 01.04.16 at 15:01, <Paul.Durrant@xxxxxxxxxx> wrote:
>> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
>> Sent: 01 April 2016 12:21
>> >>> On 01.04.16 at 12:56, <Paul.Durrant@xxxxxxxxxx> wrote:
>> >> From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
>> >> Sent: 01 April 2016 10:59
>> >> >>> On 01.04.16 at 11:15, <JBeulich@xxxxxxxx> wrote:
>> >> > Recent changes to Linux result in there just being a single unmask
>> >> > operation prior to expecting the first interrupts to arrive. However,
>> >> > we've had a chicken-and-egg problem here: Qemu invokes
>> >> > xc_domain_update_msi_irq(), ultimately leading to
>> >> > msixtbl_pt_register(), upon seeing that first unmask operation. Yet
>> >> > for msixtbl_range() to return true (in order to msixtbl_write() to get
>> >> > invoked at all) msixtbl_pt_register() must have completed.
>> >> >
>> >> > Deal with this by snooping suitable writes in msixtbl_range() and
>> >> > triggering the invocation of msix_write_completion() from
>> >> > msixtbl_pt_register() when that happens in the context of a still in
>> >> > progress vector control field write.
>> >> >
>> >> > Note that the seemingly unrelated deletion of the redundant
>> >> > irq_desc->msi_desc checks in msixtbl_pt_register() is to make clear to
>> >> > any compiler version used that the "msi_desc" local variable isn't
>> >> > being used uninitialized. (Doing the same in msixtbl_pt_unregister() is
>> >> > just for consistency reasons.)
>> >> >
>> >> > Signed-off-by: Jan Beulich <jbeulich@xxxxxxxx>
>> >> > ---
>> >> > TODO: How to deal with REP MOVS to the MSI-X table (in
>> msixtbl_range())?
>> >>
>> >> Some more detail on the thoughts I so far had for this aspect:
>> >> It has always been puzzling me that the hypervisor doesn't
>> >> see _all_ the MSI-X table accesses (which is a result of the
>> >> addresses only getting registered via XEN_DOMCTL_bind_pt_irq);
>> >> it's quite natural that this is an at least latent issue possibly
>> >> causing guest misbehavior. I cannot, however, currently see any
>> >> way to address this without altering both Xen and qemu, since for
>> >> Xen to see all accesses it would need to become aware of the
>> >> GPA of the MSI-X table much earlier (read: before the domain
>> >> actually start, or at the latest when the domain first enables
>> >> memory decoding on the device).
>> >>
>> >> The mapping of the MMIO BARs of the device into guest memory,
>> >> however, intentionally excludes the page(s) covering the MSI-X
>> >> table, so the hypervisor can't become aware of them by just
>> >> looking at data it gets presented today. Hence either we need to
>> >> add some new hypercall for qemu to invoke, or we need to make
>> >> qemu map the full BAR ranges, filtering out MSI-X table pages
>> >> in the hypervisor (using those mapping requests just to learn the
>> >> GPA of the MSI-X table, without entering them into P2M).
>> >>
>> >> Unless someone can think of a way which doesn't require altering
>> >> both qemu and Xen (creating the well known compatibility issues
>> >> between unmatched pairs), I think the patch as presented should
>> >> be okay without handling this case, i.e. best possible effort, and a
>> >> subsequent change then ought to be to deal with this by changing
>> >> both components. In which case I'd suggest that the change here
>> >> go into 4.7, but the full fix would then likely need deferring until
>> >> 4.8.
>> >
>> > I guess it could be handled entirely in Xen if we are willing to snoop on
>> > PCI configuration. It would not be too hard to snoop guest writes to the
>> BARs
>> > in config space so that Xen can keep track of where they are. Snooping on
>> the
>> > MSI-X capability could then tell Xen when to start interposing on the 
>> > table,
>> > and allow it to discover the GPA at that point (via the BIR and offset
>> > values).
>> 
>> Well, that's a possibility, but won't - afaict - work without qemu's
>> help at another point: So far we don't know the guest's PCI bus
>> topology, hence we can't correlate vBAR writes we might snoop
>> with the physical devices they correspond to.
> 
> Does Xen need to know that correspondence just to track state? I thought the 
> problem here was that Xen does not see every guest access to an MSI-X table. 
> If Xen always interposes on MSI-X tables then it can at least track the state 
> of the emulated table, even if we end up just forwarding the access for QEMU 
> to handle at first. When the mapping is created to the actual h/w table then, 
> presumably, Xen's idea of state should correspond to QEMU's.

But Xen doesn't see the guest view of config space, and the
whereabouts of the MSI-X table are in read-only config space fields
(i.e. snooping writes to those doesn't make sense, as the guest may
not ever write them or write anything, which would just get dropped
on the floor).

And additionally msixtbl_addr_to_desc() needs to know the physical
device.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.