Re: [Xen-devel] [PATCH 2/2] evtchn/fifo: don't corrupt queues if an old tail is linked
On 19/11/13 18:17, David Vrabel wrote:
> From: David Vrabel <david.vrabel@xxxxxxxxxx>
>
> An event may still be the tail of a queue even if the queue is now
> empty (an 'old tail' event).  There is logic to handle the case when
> this old tail event needs to be added to the now empty queue (by
> checking for q->tail == port).
>
> However, this does not cover all cases.
>
> 1. An old tail may be re-added simultaneously with another event.
>    LINKED is set on the old tail, and the other CPU may misinterpret
>    this as the old tail still being valid and set LINK instead of
>    HEAD.  All events on this queue will then be lost.
>
> 2. If the old tail event on queue A is moved to a different queue B
>    (by changing its VCPU or priority), the event may then be linked
>    onto queue B.  When another event is linked onto queue A it will
>    check the old tail, see that it is linked (but on queue B) and
>    overwrite the LINK field, corrupting both queues.
>
> When an event is linked, save the vcpu id and priority of the queue it
> is being linked onto.  Use this when linking an event to check if it
> is an unlinked old tail event.  If it is an old tail event, the old
> queue is empty and old_q->tail is invalidated to ensure adding another
> event to old_q will update HEAD.  The tail is invalidated by setting
> it to 0 since the event 0 is never linked.
>
> The old_q->lock is held while setting LINKED to avoid the race with
> the test of LINKED in evtchn_fifo_set_link().

Whilst this is considerably more reliable than before, it isn't
completely safe.

evtchn_fifo_set_pending() relies on being serialized for each event
channel.  The serialization is required to protect evtchn->last_* and
the split test_bit(LINKED) and set_bit(LINKED).

In most cases the serialization is done by a suitable lock in
event_channel.c: e.g., interdomain events are serialized with the
remote domain's event lock, virqs by the local virq lock.
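As an aside, the split test_bit(LINKED)/set_bit(LINKED) needs this
serialization precisely because it is not a single atomic
read-modify-write.  A minimal user-space C11 sketch of the distinction
(atomic_fetch_or() standing in for Xen's test_and_set_bit(); the bit
position and helper names are invented for illustration):

```c
#include <stdatomic.h>
#include <assert.h>

/*
 * Illustrative sketch, not Xen code.  A split test-then-set lets two
 * CPUs both observe LINKED clear and both proceed to link the event.
 * An atomic read-modify-write closes the window: its return value
 * tells exactly one caller that it made the 0 -> 1 transition.
 */
#define LINKED_BIT (1u << 1)   /* invented bit position for the sketch */

/* Racy without an external lock: two concurrent callers can both
 * return 0 if both pass the test before either performs the set. */
static int split_test_then_set(_Atomic unsigned int *word)
{
    if ( atomic_load(word) & LINKED_BIT )
        return 1;                           /* already linked */
    atomic_fetch_or(word, LINKED_BIT);      /* window between test and set */
    return 0;
}

/* Safe: the fetch-or is one atomic step, so exactly one concurrent
 * caller sees the bit clear, no matter how calls interleave. */
static int test_and_set_linked(_Atomic unsigned int *word)
{
    return (atomic_fetch_or(word, LINKED_BIT) & LINKED_BIT) != 0;
}
```

This is the property the second proposed fix relies on when it bails
out early if another CPU has already linked the event.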
However, if evtchn_fifo_set_pending() is called from
evtchn_fifo_unmask() or evtchn_fifo_expand_array() then only the local
event lock is held, which is a different lock from the one used for
interdomain and virq events.

One race is:

    CPU A                              CPU B
    -----                              -----
    EVTCHNOP_send()
      evtchn_fifo_set_pending()
        evtchn->last_vcpu = ...
                                       old_v = d->vcpu[evtchn->last_vcpu_id]
                                       old_q = old_v->queue[evtchn->last_pri]
        evtchn->last_pri = ...
                                       lock(old_q)   // wrong q
        set_bit(LINKED)
        q->tail = port
        ...
    Guest: clear_bit(LINKED)
                                       set_bit(LINKED)
                                       // q->tail not invalidated
                                       unlock(old_q)
                                       lock(q)
                                       link port to itself
                                       unlock(q)

Linking an event to itself will lose any other events that were in the
queue (they're LINKED but not reachable by the guest).

Another race is:

    CPU A                              CPU B
    -----                              -----
    EVTCHNOP_send()
      evtchn_fifo_set_pending()
                                       clear_bit(MASKED)
        set_bit(PENDING)
                                       test_bit(PENDING)
        test_bit(LINKED)
                                       EVTCHNOP_unmask()
                                         evtchn_fifo_set_pending()
                                           test_bit(LINKED)
        lock(q->lock)
        set_bit(LINKED)
        link to q->tail
        q->tail = port
        unlock(q->lock)
                                           lock(q->lock)
                                           q->tail = 0 because q->tail == port
                                           q->head = port
                                           unlock(q->lock)

The double link of the event loses any events preceding it in the
queue (they have LINKED set but are no longer reachable by the guest).

There are two ways to fix this:

1. Introduce a per-event lock and use this to serialize
   evtchn_fifo_set_pending().  After 4.4, this per-event lock can be
   used instead of the domain's event lock and virq lock when raising
   events, hopefully reducing lock contention and improving event
   channel scalability.

   This would halve the number of struct evtchn per page, as the lock
   takes the structure over 32 bytes in size.

2. Check for old_q changing after locking old_q->lock and use
   test_and_set_bit(LINKED) to bail early if another CPU linked it
   (see below patch).

Any opinions on either of these solutions?
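The lock-then-recheck retry loop at the heart of option 2 (the
lock_old_queue() helper in the patch below) can be sketched in
user-space C with pthreads.  Everything here (struct queue, struct
event, the two-queue array) is invented for illustration and is not
Xen's actual data layout:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <assert.h>

/*
 * Sketch of the lock-then-recheck pattern: the queue an event was
 * last linked on can change while we are taking its lock, so lock
 * the candidate queue, re-read which queue the event is on, and
 * retry if it moved in the meantime.
 */
struct queue {
    pthread_mutex_t lock;
};

struct event {
    _Atomic int last_q;   /* index of the queue the event was last on */
};

static struct queue queues[2] = {
    { PTHREAD_MUTEX_INITIALIZER },
    { PTHREAD_MUTEX_INITIALIZER },
};

/* Returns with the event's current queue locked. */
static struct queue *lock_old_queue(struct event *ev)
{
    for ( ;; )
    {
        int i = atomic_load(&ev->last_q);
        struct queue *old_q = &queues[i];

        pthread_mutex_lock(&old_q->lock);

        /* Did the event move queues while we were taking the lock? */
        if ( atomic_load(&ev->last_q) == i )
            return old_q;                    /* still right: keep the lock */

        pthread_mutex_unlock(&old_q->lock);  /* moved: drop it and retry */
    }
}
```

Once the lock is held, last_q can no longer change under the caller,
which is the invariant the real patch needs before it touches
old_q->tail.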
--- a/xen/common/event_fifo.c	Tue Nov 19 11:06:54 2013 +0000
+++ b/xen/common/event_fifo.c	Wed Nov 20 16:41:32 2013 +0000
@@ -34,6 +34,30 @@ static inline event_word_t *evtchn_fifo_
     return d->evtchn_fifo->event_array[p] + w;
 }
 
+static struct evtchn_fifo_queue *lock_old_queue(const struct domain *d,
+                                                struct evtchn *evtchn,
+                                                unsigned long *flags)
+{
+    struct vcpu *v;
+    struct evtchn_fifo_queue *q, *old_q;
+
+    for (;;)
+    {
+        v = d->vcpu[evtchn->last_vcpu_id];
+        old_q = &v->evtchn_fifo->queue[evtchn->last_priority];
+
+        spin_lock_irqsave(&old_q->lock, *flags);
+
+        v = d->vcpu[evtchn->last_vcpu_id];
+        q = &v->evtchn_fifo->queue[evtchn->last_priority];
+
+        if ( old_q == q )
+            return old_q;
+
+        spin_unlock_irqrestore(&old_q->lock, *flags);
+    }
+}
+
 static int try_set_link(event_word_t *word, event_word_t *w, uint32_t link)
 {
     event_word_t new, old;
@@ -127,7 +151,6 @@ static void evtchn_fifo_set_pending(stru
     if ( !test_bit(EVTCHN_FIFO_MASKED, word)
          && !test_bit(EVTCHN_FIFO_LINKED, word) )
     {
-        struct vcpu *old_v;
         struct evtchn_fifo_queue *q, *old_q;
         event_word_t *tail_word;
         bool_t linked = 0;
@@ -139,10 +162,13 @@ static void evtchn_fifo_set_pending(stru
          */
         q = &v->evtchn_fifo->queue[evtchn->priority];
 
-        old_v = d->vcpu[evtchn->last_vcpu_id];
-        old_q = &old_v->evtchn_fifo->queue[evtchn->last_priority];
+        old_q = lock_old_queue(d, evtchn, &flags);
 
-        spin_lock_irqsave(&old_q->lock, flags);
+        if ( test_and_set_bit(EVTCHN_FIFO_LINKED, word) )
+        {
+            spin_unlock_irqrestore(&old_q->lock, flags);
+            goto done;
+        }
 
         /*
          * If this event was a tail, the old queue is now empty and
@@ -152,12 +178,9 @@ static void evtchn_fifo_set_pending(stru
         if ( old_q->tail == port )
             old_q->tail = 0;
 
-        set_bit(EVTCHN_FIFO_LINKED, word);
-
         /* Moved to a different queue? */
-        if ( old_q != q)
+        if ( old_q != q )
         {
-            evtchn->last_vcpu_id = evtchn->notify_vcpu_id;
             evtchn->last_priority = evtchn->priority;
@@ -191,7 +214,7 @@ static void evtchn_fifo_set_pending(stru
              &v->evtchn_fifo->control_block->ready) )
             vcpu_mark_events_pending(v);
     }
-
+done:
     if ( !was_pending )
         evtchn_check_pollers(d, port);
 }

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel