[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.



> On January 27, 2016 10:29pm, <JBeulich@xxxxxxxx> wrote:
> >>> On 27.01.16 at 15:13, <quan.xu@xxxxxxxxx> wrote:
> >>  On January 27, 2016 at 9:15pm, <JBeulich@xxxxxxxx> wrote:
> >> >>> On 27.01.16 at 13:38, <quan.xu@xxxxxxxxx> wrote:
> >> >>  On January 27, 2016 at 7:24pm, <JBeulich@xxxxxxxx> wrote:
> >> >> >>> On 27.01.16 at 12:09, <quan.xu@xxxxxxxxx> wrote:
> >> >> >>  On January 27, 2016 at 6:48am, <Tian, Kevin> wrote:
> >> >> >> > From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> >> >> > Sent: Tuesday, January 26, 2016 11:53 PM
> >> >> >
> >> >> >
> >> >> >> > Once again: Before getting started, please assess which route
> >> >> >> > is going to be the better one. Remember that we had already
> >> >> >> > discussed and put aside some form of deferring the hiding of
> >> >> >> > devices, so if you come back with a patch doing that again,
> >> >> >> > you'll have to be able to explain why the alternative(s) are worse.
> >> >> >> >
> >> >> >>
> >> >> >> Quan, could you list pros/cons of those alternatives based on
> >> >> >> discussion so
> >> >> far?
> >> >> >> Then we can decide which way should be done before you go to
> >> >> >> actual
> >> >> coding.
> >> >> >> Earlier suggestion on hiding device immediately is under the
> >> >> >> assumption that all locks have been held. If this part becomes
> >> >> >> too complex, and you can explain clearly that deferring the
> >> >> >> hiding action doesn't lead to any race condition, then people
> >> >> >> can see why you are
> >> >> proposing defer again.
> >> >> >
> >> >> >
> >> >> > The following are pros/cons of those alternatives. It is also
> >> >> > why I propose defer again.
> >> >> >
> >> >> > -- --
> >> >> > 1. Hiding the devices immediately
> >> >> > Pros:
> >> >> >      * it makes whatever changes are ASAP after the Device-TLB
> >> >> > flush
> >> error.
> >> >> >
> >> >> > Cons:
> >> >> >      * It may break the code path.
> >> >> >      * It may lead to any race condition.
> >> >> >      * Hiding the devices immediately is under the assumption
> >> >> > that all
> >> > locks
> >> >> have been held.
> >> >> >       Different locking state is possible for different call trees.
> >> >> > This
> >> > part
> >> >> becomes too complex.
> >> >>
> >> >> So you just repeat what you've already said before. "This part
> >> >> becomes too complex" you say without any kind of proof, yet that's
> >> >> what we need to understand whether the alternative of doing the
> >> >> locking correctly really is
> >> > this
> >> >> bad (and I continue to not see why it would).
> >> >
> >> >
> >> > Such as pcidevs_lock:
> >> >
> >> > 1. as I mentioned, it is indeed different locking state is possible
> >> > for different call trees of flush Device-TLB. When Device-TLB flush
> >> > is error, It is really challenge to judge whether to acquire the
> >> > pcidevs_lock or
> >> not.
> >> >
> >> >    For example,
> >> >    *It is _not_under_ lock for the following call tree:
> >> > $ flush_iotlb_qi()--- iommu_flush_iotlb_psi() --
> >> > __intel_iommu_iotlb_flush()
> >> > --intel_iommu_iotlb_flush() --iommu_iotlb_flush()
> >> > --xenmem_add_to_physmap()--do_memory_op()
> >> >
> >> >    *It is _under_ lock for the following call tree:
> >> > $flush_iotlb_qi()--iommu_flush_iotlb_dsi()--domain_context_unmap_on
> >> > e()
> >> > --domain_con
> >> > text_unmap()--reassign_device_ownership()--deassign_device()-iommu_
> >> > do_
> >> > pci_domctl()
> >> >
> >> > 2. if I try to acquire the pcidevs_lock for some _not_under_ lock
> >> > call tree, it makes things worse. As the pcidevs_lock is a big lock, then
> >> >   Frequent memory modification may block the pci-device assign due
> >> > to the pcidevs_lock. if I try to split the pcidevs_lock into small locks.
> >> >   It may takes a great deal of time to make it stable.
> >>
> >> I don't understand this, namely in the context of my suggestion to
> >> simply pass down a flag indicating whether the lock is being held
> >> (and hence acquiring it only in the most narrow scope if not already owning
> it).
> >>
> >
> > This is also an idea.
> > BTW, Does the lock refer to pcidevs_lock?
> 
> Yes, for now I assume that only that lock actually needs special attention.
> 

Thanks. I think so too.
-Quan


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.