[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel] [PATCH] xen/pvcalls: fix potential endless loop in pvcalls-front.c



On 15/11/17 22:20, Stefano Stabellini wrote:
> On Wed, 15 Nov 2017, Boris Ostrovsky wrote:
>> On 11/15/2017 02:09 PM, Stefano Stabellini wrote:
>>> On Wed, 15 Nov 2017, Juergen Gross wrote:
>>>>>>>         while(mutex_is_locked(&map->active.in_mutex.owner) ||
>>>>>>>               mutex_is_locked(&map->active.out_mutex.owner))
>>>>>>>                 cpu_relax();
>>>>>>>
>>>>>>> ?
>>>>>> I'm not convinced there isn't a race.
>>>>>>
>>>>>> In pvcalls_front_recvmsg() sock->sk->sk_send_head is being read and only
>>>>>> then in_mutex is taken. What happens if pvcalls_front_release() resets
>>>>>> sk_send_head and manages to test the mutex before the mutex is locked?
>>>>>>
>>>>>> Even in case this is impossible: the whole construct seems to be rather
>>>>>> fragile.
>>> I agree it looks fragile, and I agree that it might be best to avoid the
>>> usage of in_mutex and out_mutex as refcounts. More comments on this
>>> below.
>>>
>>>  
>>>>> I think we can wait until pvcalls_refcount is 1 (i.e. it's only us) and
>>>>> not rely on mutex state.
>>>> Yes, this would work.
>>> Yes, I agree it would work and for the sake of getting something in
>>> shape for the merge window I am attaching a patch for it. Please go
>>> ahead with it. Let me know if you need anything else immediately, and
>>> I'll work on it ASAP.
>>>
>>>
>>>
>>> However, I should note that this is a pretty big hammer we are using:
>>> the refcount is global, while we only need to wait until it's only us
>>> _on this specific socket_.
>>
>> Can you explain why socket is important?
> 
> Yes, of course: there are going to be many open sockets on a given
> pvcalls connection. pvcalls_refcount is global: waiting on
> pvcalls_refcount means waiting until any operations on any unrelated
> sockets stop. While we only need to wait until the operations on the one
> socket we want to close stop.
> 
> 
>>>
>>> We really need a per socket refcount. If we don't want to use the mutex
>>> internal counters, then we need another one.
>>>
>>> See the appended patch that introduces a per socket refcount. However,
>>> for the merge window, also using pvcalls_refcount is fine.
>>>
>>> The race Juergen is concerned about is only theoretically possible:
>>>
>>> recvmsg:                                 release:
>>>   
>>>   test sk_send_head                      clear sk_send_head
>>>   <only few instructions>                <prepare a message>
>>>   grab in_mutex                          <send a message to the server>
>>>                                          <wait for an answer>
>>>                                          test in_mutex
>>>
>>> Without kernel preemption is not possible for release to clear
>>> sk_send_head and test in_mutex after recvmsg tests sk_send_head and
>>> before recvmsg grabs in_mutex.
>>
>> Sorry, I don't follow --- what does preemption have to do with this? If
>> recvmsg and release happen on different processors the order of
>> operations can be
>>
>> CPU0                                   CPU1
>>
>> test sk_send_head
>> <interrupt>
>>                                         clear sk_send_head
>>                                         <send a message to the server>
>>                                         <wait for an answer>
>>                                         test in_mutex
>>                                         free everything
>> grab in_mutex
>>
>> I actually think RCU should take care of all of this.
> 
> Preemption could cause something very similar to happen, but your
> example is very good too, even better, because it could trigger the
> issue even with preemption disabled. I'll think more about this and
> submit a separate patch on top of the simple pvcalls_refcount patch
> below.

We are running as a guest. Even with interrupts off the vcpu could be
off the pcpu for several milliseconds!

Don't count on code length to avoid races!


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
https://lists.xen.org/xen-devel

 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.