Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles
On Thu, Feb 27, 2014 at 03:43:51PM +0100, Sander Eikelenboom wrote:
[...]
> > As far as I can tell netfront has a pool of grant references and it
> > will BUG_ON() if there's no grefs in the pool when you request one.
> > Since your DomU didn't crash, I suspect the book-keeping is still
> > intact.
>
> >> > Domain 1 seems to have increased its nr_grant_entries from 2048 to
> >> > 3072 somewhere during the night.
> >> > Domain 7 is the domain that happens to give the netfront messages.
> >>
> >> > I also don't get why it is reporting the "Bad grant reference" for
> >> > domain 0, which seems to have 0 active entries...
> >> > Also, is this amount of grant entries "normal", or could it be a
> >> > leak somewhere?
> >>
> > I suppose Dom0 expanding its maptrack is normal. I see it as well when
> > I increase the number of domains. But if it keeps increasing while the
> > number of DomUs stays the same, then it is not normal.
>
> It keeps increasing (without (re)starting domains), although eventually
> it looks like it is settling at around a maptrack size of 31/256 frames.

Then I guess that's reasonable. You have 15 DomUs after all...

> > Presumably you only have netfront and blkfront using the grant table,
> > and your workload as described below involved both, so it would be
> > hard to tell which one is faulty.
>
> > There are no immediate functional changes regarding slot counting in
> > this dev cycle for the network driver. But there are some changes to
> > blkfront/back which seem interesting (memory related).
>
> Hmm, all the times I get a "Bad grant reference" are related to that one
> specific guest.
> And it's not doing much blkback/front I/O (it's providing webdav and
> rsync to network-based storage (glusterfs)).

OK, I misunderstood; I thought you were rsync'ing from / to your VM disk.

What does webdav do anyway? Does it have a specific traffic pattern?

> Added some more printk's:
>
> @@ -2072,7 +2076,11 @@ __gnttab_copy(
>                                        &s_frame, &s_pg,
>                                        &source_off, &source_len, 1);
>          if ( rc != GNTST_okay )
> -            goto error_out;
> +            PIN_FAIL(error_out, GNTST_general_error,
> +                     "?!?!? src_is_gref: aquire grant for copy failed current_dom_id:%d src_dom_id:%d dest_dom_id:%d\n",
> +                     current->domain->domain_id, op->source.domid, op->dest.domid);
> +
> +
>          have_s_grant = 1;
>          if ( op->source.offset < source_off ||
>               op->len > source_len )
> @@ -2096,7 +2104,11 @@ __gnttab_copy(
>                               current->domain->domain_id, 0,
>                               &d_frame, &d_pg, &dest_off, &dest_len, 1);
>          if ( rc != GNTST_okay )
> -            goto error_out;
> +            PIN_FAIL(error_out, GNTST_general_error,
> +                     "?!?!? dest_is_gref: aquire grant for copy failed current_dom_id:%d src_dom_id:%d dest_dom_id:%d\n",
> +                     current->domain->domain_id, op->source.domid, op->dest.domid);
> +
> +
>          have_d_grant = 1;
>
> This comes out:
>
> (XEN) [2014-02-27 02:34:37] grant_table.c:2109:d0 ?!?!? dest_is_gref: aquire grant for copy failed current_dom_id:0 src_dom_id:32752 dest_dom_id:7

If it fails in gnttab_copy then I very much suspect this is a network
driver problem, as persistent grants in the blk driver don't use grant
copy.

> > My suggestion is, if you have a working baseline, you can try to set
> > up different frontend / backend combinations to help narrow down the
> > problem.
>
> Will see what I can do after the weekend.

Thanks

Wei.

<snip>
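The netfront behaviour described at the top of this mail (a fixed pool of
grant references, with a BUG_ON() on exhaustion) boils down to the pattern
below. This is a minimal user-space sketch with illustrative names
(gref_pool, claim_gref, release_gref), not the actual xen-netfront code:

    /* Simplified grant-reference pool: a fixed-size LIFO free list. */
    #include <assert.h>
    #include <stdint.h>

    typedef uint32_t grant_ref_t;

    #define POOL_SIZE 256

    static grant_ref_t gref_pool[POOL_SIZE];
    static unsigned int gref_free;      /* entries still available */

    /* Populate the pool at setup time; in a real frontend the refs
     * would have been obtained from the hypervisor. */
    static void init_pool(const grant_ref_t *refs)
    {
        for (gref_free = 0; gref_free < POOL_SIZE; gref_free++)
            gref_pool[gref_free] = refs[gref_free];
    }

    /* Take a free reference; crash loudly if the pool is exhausted. */
    static grant_ref_t claim_gref(void)
    {
        assert(gref_free > 0);          /* in-kernel this is a BUG_ON() */
        return gref_pool[--gref_free];
    }

    /* Return a reference once the backend is done with the page. */
    static void release_gref(grant_ref_t ref)
    {
        assert(gref_free < POOL_SIZE);  /* double release corrupts the count */
        gref_pool[gref_free++] = ref;
    }

Since exhaustion trips the BUG_ON() and takes the DomU down outright, a
guest that keeps running, as Sander's does, suggests the pool book-keeping
is still consistent.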
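One more observation on the failing log line: src_dom_id:32752 is 0x7ff0,
i.e. DOMID_SELF, so the source of the copy was plain local Dom0 memory and
only the destination was a grant reference, owned by domain 7. That is the
shape of a netback receive-path copy, which fits the suspicion above that
this is a network driver problem. A sketch of how such an op is filled in
(struct and flag names are from Xen's public grant_table.h; fill_rx_copy()
itself and its arguments are hypothetical):

    #include <linux/types.h>
    #include <xen/interface/xen.h>          /* DOMID_SELF, domid_t */
    #include <xen/interface/grant_table.h>  /* struct gnttab_copy, GNTCOPY_* */

    static void fill_rx_copy(struct gnttab_copy *op,
                             xen_pfn_t local_gmfn, grant_ref_t guest_gref,
                             domid_t guest_domid, uint16_t len)
    {
        op->source.u.gmfn = local_gmfn;   /* plain local page, not a grant */
        op->source.domid  = DOMID_SELF;   /* logged above as 32752 */
        op->source.offset = 0;

        op->dest.u.ref    = guest_gref;   /* the gref Xen failed to acquire */
        op->dest.domid    = guest_domid;  /* domain 7 in the log above */
        op->dest.offset   = 0;

        op->len   = len;
        op->flags = GNTCOPY_dest_gref;    /* only the destination is a gref */
    }

The nr_grant_entries and maptrack figures quoted earlier in the thread come
from Xen's grant-table usage dump; it should be possible to trigger that
from Dom0 with "xl debug-keys g" and read the output back with "xl dmesg".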