
Re: [Xen-devel] [PATCH] xen/pvcalls: fix potential endless loop in pvcalls-front.c



On 11/14/2017 04:11 AM, Juergen Gross wrote:
> On 13/11/17 19:33, Stefano Stabellini wrote:
>> On Mon, 13 Nov 2017, Juergen Gross wrote:
>>> On 11/11/17 00:57, Stefano Stabellini wrote:
>>>> On Tue, 7 Nov 2017, Juergen Gross wrote:
>>>>> On 06/11/17 23:17, Stefano Stabellini wrote:
>>>>>> mutex_trylock() returns 1 if you take the lock and 0 if not. Assume you
>>>>>> take in_mutex on the first try, but you can't take out_mutex. The next
>>>>>> time you call mutex_trylock(), in_mutex is going to fail. It's an
>>>>>> endless loop.
>>>>>>
>>>>>> Solve the problem by moving the two mutex_trylock calls to two separate
>>>>>> loops.
>>>>>>
>>>>>> Reported-by: Dan Carpenter <dan.carpenter@xxxxxxxxxx>
>>>>>> Signed-off-by: Stefano Stabellini <sstabellini@xxxxxxxxxx>
>>>>>> CC: boris.ostrovsky@xxxxxxxxxx
>>>>>> CC: jgross@xxxxxxxx
>>>>>> ---
>>>>>>  drivers/xen/pvcalls-front.c | 5 +++--
>>>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
>>>>>> index 0c1ec68..047dce7 100644
>>>>>> --- a/drivers/xen/pvcalls-front.c
>>>>>> +++ b/drivers/xen/pvcalls-front.c
>>>>>> @@ -1048,8 +1048,9 @@ int pvcalls_front_release(struct socket *sock)
>>>>>>                   * is set to NULL -- we only need to wait for the existing
>>>>>>                   * waiters to return.
>>>>>>                   */
>>>>>> -                while (!mutex_trylock(&map->active.in_mutex) ||
>>>>>> -                           !mutex_trylock(&map->active.out_mutex))
>>>>>> +                while (!mutex_trylock(&map->active.in_mutex))
>>>>>> +                        cpu_relax();
>>>>>> +                while (!mutex_trylock(&map->active.out_mutex))
>>>>>>                          cpu_relax();
>>>>> Any reason you don't just use mutex_lock()?
>>>> Hi Juergen, sorry for the late reply.
>>>>
>>>> Yes, you are right. Given the patch, it would be just the same to use
>>>> mutex_lock.
>>>>
>>>> This is where I realized that we actually have a problem: whether we
>>>> use mutex_lock or mutex_trylock, there is no guarantee that we'll be
>>>> the last to take the in/out_mutex. Other waiters could still be
>>>> outstanding.
>>>>
>>>> We solved the same problem using a refcount in pvcalls_front_remove. In
>>>> this case, I was thinking of reusing the mutex internal counter for
>>>> efficiency, instead of adding one more refcount.
>>>>
>>>> To use the mutex as a refcount, there is really no need to call
>>>> mutex_trylock or mutex_lock. I suggest checking the mutex owner field
>>>> directly:
>>>>
>>>>
>>>>            while (atomic_long_read(&map->active.in_mutex.owner) != 0UL ||
>>>>                   atomic_long_read(&map->active.out_mutex.owner) != 0UL)
>>>>                    cpu_relax();
>>>>
>>>> Cheers,
>>>>
>>>> Stefano
>>>>
>>>>
>>>> ---
>>>>
>>>> xen/pvcalls: fix potential endless loop in pvcalls-front.c
>>>>
>>>> mutex_trylock() returns 1 if you take the lock and 0 if not. Assume you
>>>> take in_mutex on the first try, but you can't take out_mutex. The next
>>>> time you call mutex_trylock(), in_mutex is going to fail. It's an
>>>> endless loop.
>>>>
>>>> Actually, we don't want to use mutex_trylock at all: we don't need to
>>>> take the mutex, we only need to wait until the last mutex waiter/holder
>>>> releases it.
>>>>
>>>> Instead of calling mutex_trylock or mutex_lock, just check the mutex
>>>> owner field directly.
>>>>
>>>> Reported-by: Dan Carpenter <dan.carpenter@xxxxxxxxxx>
>>>> Signed-off-by: Stefano Stabellini <sstabellini@xxxxxxxxxx>
>>>> CC: boris.ostrovsky@xxxxxxxxxx
>>>> CC: jgross@xxxxxxxx
>>>>
>>>> diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
>>>> index 0c1ec68..9f33cb8 100644
>>>> --- a/drivers/xen/pvcalls-front.c
>>>> +++ b/drivers/xen/pvcalls-front.c
>>>> @@ -1048,8 +1048,8 @@ int pvcalls_front_release(struct socket *sock)
>>>>             * is set to NULL -- we only need to wait for the existing
>>>>             * waiters to return.
>>>>             */
>>>> -          while (!mutex_trylock(&map->active.in_mutex) ||
>>>> -                     !mutex_trylock(&map->active.out_mutex))
>>>> +          while (atomic_long_read(&map->active.in_mutex.owner) != 0UL ||
>>>> +                 atomic_long_read(&map->active.out_mutex.owner) != 0UL)
>>> I don't like this.
>>>
>>> Can't you use a kref here? Even if it looks like more overhead, it is
>>> much cleaner. There will be no questions regarding possible races,
>>> while an approach like yours will always smell racy (can't someone
>>> take the mutex just after the above test?).
>>>
>>> In no case should you make use of the mutex internals.
>> Boris' suggestion solves that problem well. Would you be OK with the
>> proposed
>>
>>         while (mutex_is_locked(&map->active.in_mutex) ||
>>                mutex_is_locked(&map->active.out_mutex))
>>                 cpu_relax();
>>
>> ?
> I'm not convinced there isn't a race.
>
> In pvcalls_front_recvmsg(), sock->sk->sk_send_head is read and only
> then is in_mutex taken. What happens if pvcalls_front_release() resets
> sk_send_head and manages to test the mutex before the mutex is locked?
>
> Even if this turns out to be impossible, the whole construct seems
> rather fragile.
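
As a rough sketch, Juergen's kref suggestion might look something like
the following. The "users" kref, the "released" completion and the
pvcalls_active_release() helper are hypothetical names for illustration
and are not part of the driver; the idea is that every reader/writer
holds a reference around its critical section, and the waiter sleeps
until the last reference is dropped instead of touching mutex internals:

        #include <linux/kref.h>
        #include <linux/completion.h>

        /* Hypothetical fields in the active part of struct sock_mapping:
         * "users" counts in-flight readers/writers, "released" is
         * completed when the last reference goes away. */
        struct kref users;
        struct completion released;

        static void pvcalls_active_release(struct kref *kref)
        {
                struct sock_mapping *map =
                        container_of(kref, struct sock_mapping, active.users);

                complete(&map->active.released);
        }

        /* In pvcalls_front_release(), after clearing sk_send_head: drop
         * our own reference and sleep until all outstanding users have
         * dropped theirs. */
        kref_put(&map->active.users, pvcalls_active_release);
        wait_for_completion(&map->active.released);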


I think we can wait until pvcalls_refcount is 1 (i.e. it's only us) and
not rely on mutex state.
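
A minimal sketch of that idea, assuming pvcalls_refcount is the
driver-wide atomic_t taken by pvcalls_enter()/pvcalls_exit() and that
the caller here holds exactly one reference of its own:

        /* Sketch only: spin until every other in-flight caller has
         * dropped its pvcalls_refcount reference; the one reference
         * left is our own, so the maps can be torn down safely. */
        while (atomic_read(&pvcalls_refcount) > 1)
                cpu_relax();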

In any case, we are in the merge window now and I don't want to send a
pull request with a known bug, so we need to have this resolved in the
next couple of days.

-boris
