Re: [Xen-devel] [PATCH] MCE: Fix race condition in mctelem_reserve
Yes, all these patches came from some customer reports. We managed to
reproduce the problem with xen-mceinj and got these kinds of races. These
patches (the other one is already applied) fix the race issues we found.
I tested with a CentOS 6.4 system running mcelog.

Frediano

On Tue, 2014-02-18 at 08:42 +0000, Liu, Jinsong wrote:
> This logic (mctelem) is related to dom0 mcelog logic. Have you tested if
> mcelog works fine with your patch?
>
> Thanks,
> Jinsong
>
> Frediano Ziglio wrote:
> > These lines (in mctelem_reserve)
> >
> >         newhead = oldhead->mcte_next;
> >         if (cmpxchgptr(freelp, oldhead, newhead) == oldhead) {
> >
> > are racy. After you read the newhead pointer, another flow (a thread or
> > a recursive invocation) can change the whole list and yet leave the head
> > with the same value. So oldhead is still equal to *freelp, but the new
> > head you install can point to any element, even one that is already in
> > use.
> >
> > This patch instead uses a bit array and atomic bit operations.
> >
> > It actually uses an unsigned long rather than a bitmap type, as testing
> > for all zeroes is easier.
> >
> > Signed-off-by: Frediano Ziglio <frediano.ziglio@xxxxxxxxxx>
> > ---
> >  xen/arch/x86/cpu/mcheck/mctelem.c | 52 ++++++++++++++++++++++---------------
> >  1 file changed, 31 insertions(+), 21 deletions(-)
> >
> > diff --git a/xen/arch/x86/cpu/mcheck/mctelem.c b/xen/arch/x86/cpu/mcheck/mctelem.c
> > index 895ce1a..e56b6fb 100644
> > --- a/xen/arch/x86/cpu/mcheck/mctelem.c
> > +++ b/xen/arch/x86/cpu/mcheck/mctelem.c
> > @@ -69,6 +69,11 @@ struct mctelem_ent {
> >  #define MC_URGENT_NENT      10
> >  #define MC_NONURGENT_NENT   20
> >
> > +/* Check if we can fit enough bits in the free bit array */
> > +#if MC_URGENT_NENT + MC_NONURGENT_NENT > BITS_PER_LONG
> > +#error Too many elements
> > +#endif
> > +
> >  #define MC_NCLASSES         (MC_NONURGENT + 1)
> >
> >  #define COOKIE2MCTE(c)      ((struct mctelem_ent *)(c))
> > @@ -77,11 +82,9 @@ struct mctelem_ent {
> >  static struct mc_telem_ctl {
> >      /* Linked lists that thread the array members together.
> >       *
> > -     * The free lists are singly-linked via mcte_next, and we allocate
> > -     * from them by atomically unlinking an element from the head.
> > -     * Consumed entries are returned to the head of the free list.
> > -     * When an entry is reserved off the free list it is not linked
> > -     * on any list until it is committed or dismissed.
> > +     * The free lists are now a bit array where a set bit means free.
> > +     * As the number of elements is quite small, it is easy to
> > +     * allocate them atomically that way.
> >       *
> >       * The committed list grows at the head and we do not maintain a
> >       * tail pointer; insertions are performed atomically. The head
> > @@ -101,7 +104,7 @@ static struct mc_telem_ctl {
> >       * we can lock it for updates. The head of the processing list
> >       * always has the oldest telemetry, and we append (as above)
> >       * at the tail of the processing list. */
> > -    struct mctelem_ent *mctc_free[MC_NCLASSES];
> > +    unsigned long mctc_free[MC_NCLASSES];
> >      struct mctelem_ent *mctc_committed[MC_NCLASSES];
> >      struct mctelem_ent *mctc_processing_head[MC_NCLASSES];
> >      struct mctelem_ent *mctc_processing_tail[MC_NCLASSES];
> > @@ -214,7 +217,10 @@ static void mctelem_free(struct mctelem_ent *tep)
> >      BUG_ON(MCTE_STATE(tep) != MCTE_F_STATE_FREE);
> >
> >      tep->mcte_prev = NULL;
> > -    mctelem_xchg_head(&mctctl.mctc_free[target], &tep->mcte_next, tep);
> > +    tep->mcte_next = NULL;
> > +
> > +    /* set free in array */
> > +    set_bit(tep - mctctl.mctc_elems, &mctctl.mctc_free[target]);
> >  }
> >
> >  /* Increment the reference count of an entry that is not linked on to
> > @@ -284,7 +290,7 @@ void mctelem_init(int reqdatasz)
> >      }
> >
> >      for (i = 0; i < MC_URGENT_NENT + MC_NONURGENT_NENT; i++) {
> > -        struct mctelem_ent *tep, **tepp;
> > +        struct mctelem_ent *tep;
> >
> >          tep = mctctl.mctc_elems + i;
> >          tep->mcte_flags = MCTE_F_STATE_FREE;
> > @@ -292,16 +298,15 @@ void mctelem_init(int reqdatasz)
> >          tep->mcte_data = datarr + i * datasz;
> >
> >          if (i < MC_URGENT_NENT) {
> > -            tepp = &mctctl.mctc_free[MC_URGENT];
> > -            tep->mcte_flags |= MCTE_F_HOME_URGENT;
> > +            __set_bit(i, &mctctl.mctc_free[MC_URGENT]);
> > +            tep->mcte_flags = MCTE_F_HOME_URGENT;
> >          } else {
> > -            tepp = &mctctl.mctc_free[MC_NONURGENT];
> > -            tep->mcte_flags |= MCTE_F_HOME_NONURGENT;
> > +            __set_bit(i, &mctctl.mctc_free[MC_NONURGENT]);
> > +            tep->mcte_flags = MCTE_F_HOME_NONURGENT;
> >          }
> >
> > -        tep->mcte_next = *tepp;
> > +        tep->mcte_next = NULL;
> >          tep->mcte_prev = NULL;
> > -        *tepp = tep;
> >      }
> >  }
> >
> > @@ -310,18 +315,21 @@ static int mctelem_drop_count;
> >
> >  /* Reserve a telemetry entry, or return NULL if none available.
> >   * If we return an entry then the caller must subsequently call exactly one of
> > - * mctelem_unreserve or mctelem_commit for that entry.
> > + * mctelem_dismiss or mctelem_commit for that entry.
> >   */
> >  mctelem_cookie_t mctelem_reserve(mctelem_class_t which)
> >  {
> > -    struct mctelem_ent **freelp;
> > -    struct mctelem_ent *oldhead, *newhead;
> > +    unsigned long *freelp;
> > +    unsigned long oldfree;
> > +    unsigned bit;
> >      mctelem_class_t target = (which == MC_URGENT) ?
> >          MC_URGENT : MC_NONURGENT;
> >
> >      freelp = &mctctl.mctc_free[target];
> >      for (;;) {
> > -        if ((oldhead = *freelp) == NULL) {
> > +        oldfree = *freelp;
> > +
> > +        if (oldfree == 0) {
> >              if (which == MC_URGENT && target == MC_URGENT) {
> >                  /* raid the non-urgent freelist */
> >                  target = MC_NONURGENT;
> > @@ -333,9 +341,11 @@ mctelem_cookie_t mctelem_reserve(mctelem_class_t which)
> >              }
> >          }
> >
> > -        newhead = oldhead->mcte_next;
> > -        if (cmpxchgptr(freelp, oldhead, newhead) == oldhead) {
> > -            struct mctelem_ent *tep = oldhead;
> > +        /* try to allocate: atomically clear the free bit */
> > +        bit = find_first_set_bit(oldfree);
> > +        if (test_and_clear_bit(bit, freelp)) {
> > +            /* return the element we got */
> > +            struct mctelem_ent *tep = mctctl.mctc_elems + bit;
> >
> >              mctelem_hold(tep);
> >              MCTE_TRANSITION_STATE(tep, FREE, UNCOMMITTED);
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@xxxxxxxxxxxxx
> http://lists.xen.org/xen-devel
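To make the race concrete, here is a minimal, self-contained C simulation of
the interleaving described in the quoted commit message. It is not the Xen
code: struct node, the freelist variable and the A/B/C elements are invented
for illustration, and the two racing flows are replayed sequentially so the
ordering is explicit.

/*
 * aba_sketch.c - single-threaded simulation of the ABA race described above.
 * All names are invented for illustration; build with: gcc -std=c11 aba_sketch.c
 */
#include <stdio.h>
#include <stdatomic.h>

struct node {
    struct node *next;
    int id;
};

/* Free list populated as A -> B -> C. */
static struct node C = { NULL, 2 };
static struct node B = { &C,   1 };
static struct node A = { &B,   0 };

static _Atomic(struct node *) freelist;

int main(void)
{
    atomic_store(&freelist, &A);

    /* "Flow 1" begins a pop: it reads the head and the head's successor. */
    struct node *oldhead = atomic_load(&freelist); /* A */
    struct node *newhead = oldhead->next;          /* B */

    /*
     * Before flow 1 performs its compare-and-exchange, "flow 2" (another
     * CPU, or a recursive invocation) runs to completion: it pops A, pops B
     * (B is now in use), then frees A back. The head is A again, but
     * A->next is now C, not B.
     */
    atomic_store(&freelist, &C); /* flow 2 popped A and B      */
    A.next = &C;                 /* flow 2 frees A ...         */
    atomic_store(&freelist, &A); /* ... so the head is A again */

    /*
     * Flow 1 resumes. Its compare succeeds because the head pointer still
     * equals oldhead, but the newhead it installs (B) is an element that
     * flow 2 is still using.
     */
    struct node *expected = oldhead;
    if (atomic_compare_exchange_strong(&freelist, &expected, newhead))
        printf("cmpxchg succeeded; new free-list head is node %d, "
               "which is still in use\n", newhead->id);

    return 0;
}

The compare-and-exchange only verifies that the head pointer itself is
unchanged; it cannot see that the rest of the list was rebuilt underneath it.
This is the classic ABA problem, and it is why comparing oldhead against
*freelp is not a sufficient guard.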
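The scheme the patch moves to avoids that problem because the only operation
that commits an allocation is an atomic clear of a single bit. The sketch
below mirrors that logic in userspace C11: atomic_fetch_and/atomic_fetch_or
and __builtin_ctzl stand in for Xen's test_and_clear_bit()/set_bit() and
find_first_set_bit(), and the free_mask, reserve and release names are
invented for this example rather than taken from the Xen source.

/*
 * bitmap_alloc_sketch.c - userspace sketch of bit-array allocation.
 * Build with: gcc -std=c11 bitmap_alloc_sketch.c
 */
#include <stdio.h>
#include <stdatomic.h>

#define NENT 30   /* MC_URGENT_NENT + MC_NONURGENT_NENT in the patch */

/* One bit per entry; a set bit means the entry is free. */
static _Atomic unsigned long free_mask = (1UL << NENT) - 1;

/* Reserve one entry, or return -1 if none is free. */
static int reserve(void)
{
    for (;;) {
        unsigned long old = atomic_load(&free_mask);
        if (old == 0)
            return -1;                     /* nothing free */
        int bit = __builtin_ctzl(old);     /* lowest set bit */
        unsigned long mask = 1UL << bit;
        /*
         * Atomically clear the bit. If it was still set we own the entry;
         * if another flow raced us and cleared it first, simply retry.
         */
        if (atomic_fetch_and(&free_mask, ~mask) & mask)
            return bit;
    }
}

static void release(int bit)
{
    atomic_fetch_or(&free_mask, 1UL << bit);   /* mark free again */
}

int main(void)
{
    int a = reserve(), b = reserve();
    printf("reserved entries %d and %d\n", a, b);
    release(a);
    release(b);
    return 0;
}

A raced flow simply sees the bit already cleared and retries, so a stale view
of the free set can never be installed. Keeping the whole mask in one
unsigned long (enforced by the BITS_PER_LONG build-time check in the patch)
also makes "nothing free" a single comparison against zero, which is the
stated reason for preferring it over a bitmap type.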