Re: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.
> On January 27, 2016 at 7:24pm, <JBeulich@xxxxxxxx> wrote:
> >>> On 27.01.16 at 12:09, <quan.xu@xxxxxxxxx> wrote:
> >> On January 27, 2016 at 6:48am, <Tian, Kevin> wrote:
> >> > From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
> >> > Sent: Tuesday, January 26, 2016 11:53 PM
> >> >
> >> > Once again: Before getting started, please assess which route is
> >> > going to be the better one. Remember that we had already discussed
> >> > and put aside some form of deferring the hiding of devices, so if
> >> > you come back with a patch doing that again, you'll have to be able
> >> > to explain why the alternative(s) are worse.
> >> >
> >> Quan, could you list pros/cons of those alternatives based on the
> >> discussion so far? Then we can decide which way should be done before
> >> you go to actual coding. The earlier suggestion of hiding the device
> >> immediately is under the assumption that all locks have been held.
> >> If this part becomes too complex, and you can explain clearly that
> >> deferring the hiding action doesn't lead to any race condition, then
> >> people can see why you are proposing deferral again.
> >
> > The following are the pros/cons of those alternatives. They are also
> > why I propose deferral again.
> >
> > 1. Hiding the devices immediately
> > Pros:
> >   * It makes whatever changes are needed ASAP after the Device-TLB
> >     flush error.
> > Cons:
> >   * It may break the code path.
> >   * It may lead to race conditions.
> >   * Hiding the devices immediately assumes that all the relevant
> >     locks are already held, but a different locking state is possible
> >     for each call tree, so this part becomes too complex.
>
> So you just repeat what you've already said before. "This part becomes
> too complex" you say without any kind of proof, yet that's what we need
> to understand whether the alternative of doing the locking correctly
> really is this bad (and I continue to not see why it would).

Take pcidevs_lock as an example:

1. As I mentioned, a different locking state is indeed possible for each
call tree that flushes the Device-TLB. When a Device-TLB flush fails, it
is a real challenge to judge whether to acquire pcidevs_lock or not.
For example, pcidevs_lock is _not_ held on this call tree:

    do_memory_op()
      -> xenmem_add_to_physmap()
        -> iommu_iotlb_flush()
          -> intel_iommu_iotlb_flush()
            -> __intel_iommu_iotlb_flush()
              -> iommu_flush_iotlb_psi()
                -> flush_iotlb_qi()

but it _is_ held on this one:

    iommu_do_pci_domctl()
      -> deassign_device()
        -> reassign_device_ownership()
          -> domain_context_unmap()
            -> domain_context_unmap_one()
              -> iommu_flush_iotlb_dsi()
                -> flush_iotlb_qi()

2. If I try to acquire pcidevs_lock on the call trees that do not hold
it, things get worse: pcidevs_lock is a big lock, so frequent memory
modifications could block PCI device assignment for as long as they
hold it. If I instead try to split pcidevs_lock into smaller locks, it
may take a great deal of time to make that stable, and the other locks
would have to be reconsidered as well.
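To make the idea concrete, below is a rough sketch of the deferred
route, not actual patch code: the flush paths only queue the failing
device under a small private lock, and a tasklet that runs outside all
of the call trees above does the hiding under pcidevs_lock. The names
dtlb_error_list, dtlb_error_lock, the pdev->dtlb_error_link field and
pci_hide_broken_device() are made up for illustration; the list,
spinlock and tasklet primitives are meant to be Xen's usual ones.

#include <xen/list.h>
#include <xen/pci.h>
#include <xen/spinlock.h>
#include <xen/tasklet.h>

static LIST_HEAD(dtlb_error_list);       /* made up: devices whose flush timed out */
static DEFINE_SPINLOCK(dtlb_error_lock); /* made up: protects only the list above */

static void hide_broken_devices(unsigned long unused);
static DECLARE_TASKLET(dtlb_error_tasklet, hide_broken_devices, 0);

/*
 * Called from any Device-TLB flush path on timeout.  It only takes the
 * small private lock, so it is safe no matter whether the caller holds
 * pcidevs_lock (the deassign tree) or not (the add_to_physmap tree).
 */
void queue_broken_device(struct pci_dev *pdev)
{
    spin_lock(&dtlb_error_lock);
    list_add_tail(&pdev->dtlb_error_link, &dtlb_error_list); /* made-up field */
    spin_unlock(&dtlb_error_lock);

    tasklet_schedule(&dtlb_error_tasklet);
}

/*
 * Tasklet context: none of the flush call trees are on the stack here,
 * so taking pcidevs_lock unconditionally cannot deadlock against them.
 */
static void hide_broken_devices(unsigned long unused)
{
    for ( ; ; )
    {
        struct pci_dev *pdev = NULL;

        spin_lock(&dtlb_error_lock);
        if ( !list_empty(&dtlb_error_list) )
        {
            pdev = list_first_entry(&dtlb_error_list, struct pci_dev,
                                    dtlb_error_link);
            list_del(&pdev->dtlb_error_link);
        }
        spin_unlock(&dtlb_error_lock);

        if ( !pdev )
            break;

        spin_lock(&pcidevs_lock);
        pci_hide_broken_device(pdev);    /* made-up hiding helper */
        spin_unlock(&pcidevs_lock);
    }
}

The point of the split is that no flush path ever has to guess its own
locking state; only the tasklet touches pcidevs_lock, and it always
acquires it from a clean context.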
-Quan