
Re: [Xen-devel] [Patch RFC 00/13] VT-d Asynchronous Device-TLB Flush for ATS Device



Jan Beulich wrote on 2015-10-15:
>>>> On 15.10.15 at 10:52, <yang.z.zhang@xxxxxxxxx> wrote:
>> Jan Beulich wrote on 2015-10-15:
>>>>>> On 15.10.15 at 09:28, <yang.z.zhang@xxxxxxxxx> wrote:
>>>> The premise for a misbehaving guest to impact the system is that
>>>> the IOMMU is buggy and takes a long time to complete the invalidation.
>>>> In other words, if all invalidations can complete within
>>>> several us, why does the spin time matter?
>>> 
>>> The risk of exploits of such poorly behaving IOMMUs. I.e. if
>>> properly
>> 
>> But this is not a software flaw. A guest has no way to know that the
>> underlying IOMMU is buggy, and it cannot exploit it.
> 
> A guest doesn't need to know what IOMMU is there in order to try some
> exploit. Plus - based on other information it may be able to make an
> educated guess.

As I said before, the premise is that the IOMMU is buggy. 

> 
>>> operating IOMMUs only require several us, why spin for several ms?
>> 
>> 10ms is just my suggestion. I don't know whether future hardware
>> will need more time to complete the invalidation, so I think we need
>> a sufficiently large timeout here. Meanwhile, it doesn't impact the
>> scheduling.
> 
> It does, as explained further down in my previous reply.
> 
>>>>>> I remember the original motivation for handling the ATS problem was: 1.
>>>>>> the ATS spec allows a 60s timeout to complete the flush while Xen only
>>>>>> allows 1s, and 2. spinning for 1s is not reasonable since it
>>>>>> will hurt the scheduler. For the former, as we discussed before,
>>>>>> either disabling ATS support or only supporting some specific ATS
>>>>>> devices (those completing the flush in less than 10ms or 1ms) is
>>>>>> acceptable. For the latter, if spinning for 1s is not acceptable, we
>>>>>> can reduce the timeout to 10ms or 1ms to eliminate the performance
>>>>>> impact.
>>>>> 
>>>>> If we really can, why has it been chosen to be 1s in the first place?
>>>> 
>>>> What I can tell is that 1s is just the value the original author chose;
>>>> it has no special meaning. I have double-checked with our hardware
>>>> expert and he suggests we use as small a value as possible.
>>>> According to his comment, 10ms is sufficiently large.
>>> 
>>> So here you talk about milliseconds again, while above you talked
>>> about microseconds. Can we at least settle on the order of magnitude
>>> of what is required? 10ms is 10 times the minimum time slice credit1
>>> allows, i.e. awfully long.
>> 
>> We can use whatever value you think is reasonable, as long as it
>> covers most invalidation cases. For the remaining cases, the vCPU can
>> yield the CPU to others until a timer fires. In the callback function,
>> the hypervisor can check whether the invalidation has completed. If
>> yes, it schedules the vCPU back in. Otherwise, it kills the guest due
>> to an unpredictable invalidation timeout.
> 
> Using a timer implies you again think about pausing the vCPU until the
> invalidation completes. Which, as discussed before, has its own
> problems and, even worse, won't cover the domain's other vCPUs or
> devices possibly still doing work involving the entries being
> invalidated. Or did you have something else in mind?

Why not pause the whole domain? Based on Quan's data, all the invalidations in 
his experiment completed within 3us, so perhaps 10us is enough to cover all 
invalidations on today's IOMMUs (I need to check with a hardware expert to get 
the exact number). The timer mechanism is a backup only for the extreme case, 
which exists only in theory, so the probability of a guest triggering the timer 
mechanism is negligible. Even if it happens, it only affects the guest itself.
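
To illustrate, here is a minimal sketch of the spin-then-timer model described
above. All names in it (dtlb_flush_done(), domain_pause_whole(), set_timer_ms(),
etc.) are hypothetical placeholders for illustration only, not actual Xen
interfaces, and the timeouts are just the numbers from this thread:

#include <stdbool.h>
#include <stdint.h>

struct domain;                        /* opaque here */

#define SPIN_TIMEOUT_US  10U          /* expected to cover the common ~3us case */
#define HARD_TIMEOUT_MS  10U          /* backup timer for the theoretical worst case */

/* Hypothetical hooks, assumed for illustration -- not real Xen interfaces. */
extern bool dtlb_flush_done(struct domain *d);       /* poll invalidation status */
extern uint64_t now_us(void);                        /* current time in microseconds */
extern void domain_pause_whole(struct domain *d);    /* pause all vCPUs of the domain */
extern void domain_unpause(struct domain *d);
extern void domain_crash(struct domain *d);          /* "kill the guest" */
extern void set_timer_ms(void (*fn)(void *), void *arg, unsigned int ms);

/* Timer callback: either resume the paused domain or give up on it. */
static void dtlb_flush_timeout(void *arg)
{
    struct domain *d = arg;

    if ( dtlb_flush_done(d) )
        domain_unpause(d);            /* invalidation finished late: resume */
    else
        domain_crash(d);              /* unpredictable invalidation timeout */
}

void dtlb_flush_wait(struct domain *d)
{
    uint64_t deadline = now_us() + SPIN_TIMEOUT_US;

    /* Fast path: spin only for a few microseconds. */
    while ( now_us() < deadline )
        if ( dtlb_flush_done(d) )
            return;

    /* Slow path: pause the whole domain and re-check from a timer. */
    domain_pause_whole(d);
    set_timer_ms(dtlb_flush_timeout, d, HARD_TIMEOUT_MS);
}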

> 
> IOW - as soon as spinning time reaches the order of the scheduler time
> slice, I think the only sane model is async operation with proper refcounting.
> 
> Jan
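
For comparison, a minimal sketch of the async-with-refcounting model suggested
above, again using hypothetical helpers (get_page_ref(), put_page_ref(),
iommu_queue_invalidation()) rather than actual Xen interfaces: the page backing
the mapping stays referenced until the IOMMU reports completion, so nothing has
to spin or be paused.

#include <stdint.h>

struct domain;
struct page_info;

/* Hypothetical helpers, assumed for illustration -- not real Xen interfaces. */
extern void get_page_ref(struct page_info *pg);
extern void put_page_ref(struct page_info *pg);      /* may free the page */
extern void iommu_queue_invalidation(struct domain *d, uint64_t dfn,
                                     void (*done)(void *), void *arg);

struct flush_request {
    struct page_info *pg;        /* page whose mapping is being torn down */
};

/* Completion handler, e.g. run from the invalidation-completion interrupt. */
static void flush_done(void *arg)
{
    struct flush_request *req = arg;

    put_page_ref(req->pg);       /* last reference dropped: page can be freed */
}

/* Unmap path: no spinning and no pause; just keep the page referenced. */
void async_unmap(struct domain *d, uint64_t dfn, struct page_info *pg,
                 struct flush_request *req)
{
    get_page_ref(pg);                                  /* pin across the flush */
    req->pg = pg;
    iommu_queue_invalidation(d, dfn, flush_done, req);
    /* return immediately; vCPUs and other devices keep running */
}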


Best regards,
Yang



_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel


 

