[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Xen-devel][PV-ops][PATCH] Netback: Fix PV network issue for netback multiple threads patchset



On Thu, 2010-07-01 at 16:29 +0100, Jeremy Fitzhardinge wrote:
> On 07/01/2010 04:48 PM, Ian Campbell wrote:
> > On Mon, 2010-06-21 at 12:14 +0100, Jeremy Fitzhardinge wrote:
> >   
> >> Subject: [PATCH] xen/netback: make sure all the group structures are 
> >> initialized before starting async code
> >>
> >> Split the netbk group array initialization into two phases: one to do
> >> simple "can't fail" initialization of lists, timers, locks, etc; and a
> >> second pass to allocate memory and start the async code.
> >>
> >> This makes sure that all the basic stuff is in place by the time the async 
> >> code
> >> is running.
> >>     
> > Paul noticed a crash in netback init because...
> >
> >   
> >> @@ -1764,16 +1766,6 @@ static int __init netback_init(void)
> >>            netbk->netbk_tx_pending_timer.function =
> >>                    netbk_tx_pending_timeout;
> >>  
> >> -          netbk->mmap_pages =
> >> -                  alloc_empty_pages_and_pagevec(MAX_PENDING_REQS);
> >> -          if (!netbk->mmap_pages) {
> >> -                  printk(KERN_ALERT "%s: out of memory\n", __func__);
> >> -                  del_timer(&netbk->netbk_tx_pending_timer);
> >> -                  del_timer(&netbk->net_timer);
> >> -                  rc = -ENOMEM;
> >> -                  goto failed_init;
> >> -          }
> >> -
> >>            for (i = 0; i < MAX_PENDING_REQS; i++) {
> >>                    page = netbk->mmap_pages[i];
> >>                    SetPageForeign(page, netif_page_release);
> >>     
> > ...this dereference of netbk->mmap_pages[i]...
> >
> >   
> >> @@ -1786,6 +1778,26 @@ static int __init netback_init(void)
> >>     
> > [...]
> >   
> >> +          netbk->mmap_pages =
> >> +                  alloc_empty_pages_and_pagevec(MAX_PENDING_REQS);
> >>     
> > ... happens before this initialisation of the array.
> >   
> 
> Hm, I hadn't meant to commit that properly.  I had it locally and
> accidentally pushed it out.
> 
> I only did that patch as an RFC in response to an issue alluded to by
> Dongxiao (or was it you?) about things not being fully initialized by
> the time the async code starts.  Is this a real issue, and if so, what's
> the correct fix?

I don't think there is an actual current issue, just a potential one
since we are relying on data structures being zeroed rather than
properly initialised to keep the async code from running off into the
weeds, it just seemed a little fragile this way.

Originally I said:
>> The crash is in one of the calls to list_move_tail and I think it is
>> because netbk->pending_inuse_head not being initialised until after
>> the 
>> threads and/or tasklets are created (I was running in threaded mode).
>> Perhaps even though we are now zeroing the netbk struct those fields
>> should still be initialised before kicking off any potentially
>> asynchronous tasks?

this specific issue was fixed by zeroing the netbk array as it is
allocated, I just thought we could make things more robust by not
triggering the async code until everything was fully setup.

Ian.


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxx
http://lists.xensource.com/xen-devel


 


Rackspace

Lists.xenproject.org is hosted with RackSpace, monitoring our
servers 24x7x365 and backed by RackSpace's Fanatical Support®.