
Re: [Xen-devel] [PATCH v4 3/3] VT-d: Fix vt-d Device-TLB flush timeout issue.



>>> On 27.01.16 at 13:38, <quan.xu@xxxxxxxxx> wrote:
>>  On January 27, 2016 at 7:24pm, <JBeulich@xxxxxxxx> wrote:
>> >>> On 27.01.16 at 12:09, <quan.xu@xxxxxxxxx> wrote:
>> >>  On January 27, 2016 at 6:48am, <Tian, Kevin> wrote:
>> >> > From: Jan Beulich [mailto:JBeulich@xxxxxxxx]
>> >> > Sent: Tuesday, January 26, 2016 11:53 PM
>> >
>> >
>> >> > Once again: Before getting started, please assess which route is
>> >> > going to be the better one. Remember that we had already discussed
>> >> > and put aside some form of deferring the hiding of devices, so if
>> >> > you come back with a patch doing that again, you'll have to be able
>> >> > to explain why the alternative(s) are worse.
>> >> >
>> >>
>> >> Quan, could you list the pros/cons of those alternatives based on the
>> >> discussion so far? Then we can decide which way should be taken before
>> >> you go to actual coding.
>> >> The earlier suggestion of hiding the device immediately is under the
>> >> assumption that all locks have been held. If this part becomes too
>> >> complex, and you can explain clearly that deferring the hiding action
>> >> doesn't lead to any race condition, then people can see why you are
>> >> proposing to defer again.
>> >
>> >
>> > The following are the pros/cons of those alternatives; they are also
>> > why I propose deferring again.
>> >
>> > -- --
>> > 1. Hiding the devices immediately
>> > Pros:
>> >      * It makes whatever changes are needed as soon as possible after
>> > the Device-TLB flush error.
>> >
>> > Cons:
>> >      * It may break the current code path.
>> >      * It may lead to race conditions.
>> >      * Hiding the devices immediately is under the assumption that all
>> > locks have been held. A different locking state is possible for
>> > different call trees, which makes this part too complex.
>> 
>> So you just repeat what you've already said before. "This part becomes too
>> complex", you say, without any kind of proof, yet that's what we need to
>> understand whether the alternative of doing the locking correctly really is
>> this bad (and I continue to not see why it would be).
> 
> 
> Take the pcidevs_lock as an example:
> 
> 1. As I mentioned, different locking states are indeed possible for
> different call trees that flush the Device-TLB. When a Device-TLB flush
> fails, it is a real challenge to judge whether to acquire the
> pcidevs_lock or not (see the sketch below).
> 
>    For example:
>    * It is _not_ under the lock for the following call tree:
>      flush_iotlb_qi() <- iommu_flush_iotlb_psi() <- __intel_iommu_iotlb_flush()
>      <- intel_iommu_iotlb_flush() <- iommu_iotlb_flush()
>      <- xenmem_add_to_physmap() <- do_memory_op()
> 
>    * It is _under_ the lock for the following call tree:
>      flush_iotlb_qi() <- iommu_flush_iotlb_dsi() <- domain_context_unmap_one()
>      <- domain_context_unmap() <- reassign_device_ownership()
>      <- deassign_device() <- iommu_do_pci_domctl()
> 
> 2. If I try to acquire the pcidevs_lock on the call trees that do _not_
> already hold it, it makes things worse: as the pcidevs_lock is a big
> lock, frequent memory modifications could block PCI device assignment
> behind it. And if I instead try to split the pcidevs_lock into smaller
> locks, it may take a great deal of time to make that stable.
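
For illustration, a minimal sketch of the two lock states described in
point 1 above (simplified signatures, not the actual Xen code; the real
call chains pass more arguments):

    extern spinlock_t pcidevs_lock;       /* the existing global lock */
    static void flush_iotlb_qi(void);     /* flushes the Device-TLB */

    /* (1) Reached without pcidevs_lock held, e.g. from do_memory_op(). */
    static void iommu_iotlb_flush_path(void)
    {
        flush_iotlb_qi();                 /* lock NOT held here */
    }

    /* (2) Reached with pcidevs_lock held, e.g. from iommu_do_pci_domctl(). */
    static void deassign_device_path(void)
    {
        spin_lock(&pcidevs_lock);
        flush_iotlb_qi();                 /* lock held here */
        spin_unlock(&pcidevs_lock);
    }

If a flush timeout were handled inside flush_iotlb_qi() by hiding the
device, the handler could not tell which of the two states it is in:
with a non-recursive spinlock, taking pcidevs_lock again would deadlock
in case (2), while touching the device list without it would be unsafe
in case (1).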

I don't understand this, namely in the context of my suggestion to
simply pass down a flag indicating whether the lock is being held
(and hence acquiring it only in the most narrow scope if not already
owning it).
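
A hypothetical sketch of that suggestion (names and signatures here are
illustrative, not the actual Xen interfaces):

    extern spinlock_t pcidevs_lock;       /* the existing global lock */

    /* Hide a device after a fatal Device-TLB flush timeout.  Callers that
     * already hold pcidevs_lock pass lock_held = true; otherwise the lock
     * is acquired here, in the narrowest possible scope. */
    static void pci_hide_device_on_error(struct pci_dev *pdev, bool lock_held)
    {
        if ( !lock_held )
            spin_lock(&pcidevs_lock);

        /* ... mark the device hidden / remove it from its domain ... */

        if ( !lock_held )
            spin_unlock(&pcidevs_lock);
    }

Each call tree knows statically whether it owns pcidevs_lock, so every
caller can pass its own state down, and the flush error path no longer
has to guess at run time.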

Jan

