Re: [Xen-devel] Xen-unstable Linux 3.14-rc3 and 3.13 Network troubles
On Thu, Feb 27, 2014 at 03:43:51PM +0100, Sander Eikelenboom wrote:
[...]
> > As far as I can tell netfront has a pool of grant references and it
> > will BUG_ON() if there's no grefs in the pool when you request one.
> > Since your DomU didn't crash, I suspect the book-keeping is still
> > intact.
>
> >> > Domain 1 seems to have increased its nr_grant_entries from 2048 to
> >> > 3072 somewhere during the night.
> >> > Domain 7 is the domain that happens to give the netfront messages.
> >>
> >> > I also don't get why it is reporting the "Bad grant reference" for
> >> > domain 0, which seems to have 0 active entries...
> >> > Also, is this amount of grant entries "normal", or could it be a
> >> > leak somewhere?
> >>
> > I suppose Dom0 expanding its maptrack is normal. I see it as well when
> > I increase the number of domains. But if it keeps increasing while the
> > number of DomUs stays the same, then it is not normal.
>
> It keeps increasing (without (re)starting domains), although eventually
> it looks like it is settling at around a maptrack size of 31/256 frames.

Then I guess that's reasonable. You have 15 DomUs after all...

> > Presumably you only have netfront and blkfront using the grant table,
> > and your workload as described below involved both, so it would be
> > hard to tell which one is faulty.
>
> > There are no immediate functional changes regarding slot counting in
> > this dev cycle for the network driver. But there are some changes to
> > blkfront/back which seem interesting (memory related).
>
> Hmm, all the times I get a "Bad grant reference" are related to that one
> specific guest.
> And it's not doing much blkback/front I/O (it's providing webdav and
> rsync to network-based storage (glusterfs)).

OK, I misunderstood; I thought you were rsync'ing from / to your VM disk.

What does webdav do anyway? Does it have a specific traffic pattern?

> Added some more printk's:
>
> @@ -2072,7 +2076,11 @@ __gnttab_copy(
>                                        &s_frame, &s_pg,
>                                        &source_off, &source_len, 1);
>          if ( rc != GNTST_okay )
> -            goto error_out;
> +            PIN_FAIL(error_out, GNTST_general_error,
> +                     "?!?!? src_is_gref: aquire grant for copy failed current_dom_id:%d src_dom_id:%d dest_dom_id:%d\n",
> +                     current->domain->domain_id, op->source.domid, op->dest.domid);
> +
> +
>          have_s_grant = 1;
>          if ( op->source.offset < source_off ||
>               op->len > source_len )
> @@ -2096,7 +2104,11 @@ __gnttab_copy(
>                               current->domain->domain_id, 0,
>                               &d_frame, &d_pg, &dest_off, &dest_len, 1);
>          if ( rc != GNTST_okay )
> -            goto error_out;
> +            PIN_FAIL(error_out, GNTST_general_error,
> +                     "?!?!? dest_is_gref: aquire grant for copy failed current_dom_id:%d src_dom_id:%d dest_dom_id:%d\n",
> +                     current->domain->domain_id, op->source.domid, op->dest.domid);
> +
> +
>          have_d_grant = 1;
>
> This comes out:
>
> (XEN) [2014-02-27 02:34:37] grant_table.c:2109:d0 ?!?!? dest_is_gref: aquire grant for copy failed current_dom_id:0 src_dom_id:32752 dest_dom_id:7

If it fails in gnttab_copy then I very much suspect this is a network
driver problem, as persistent grants in the blk driver don't use grant
copy.

> > My suggestion is, if you have a working baseline, you can try to set
> > up different frontend / backend combinations to help narrow down the
> > problem.
>
> Will see what I can do after the weekend.

Thanks

Wei.

<snip>
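The netfront behaviour described at the top of this mail (a fixed pool of
grant references, with a BUG_ON() on exhaustion) boils down to the pattern
below. This is a minimal user-space sketch with illustrative names
(gref_pool, claim_gref, release_gref), not the actual xen-netfront code:

    /* Simplified grant-reference pool: a fixed-size LIFO free list. */
    #include <assert.h>
    #include <stdint.h>

    typedef uint32_t grant_ref_t;

    #define POOL_SIZE 256

    static grant_ref_t gref_pool[POOL_SIZE];
    static unsigned int gref_free;      /* entries still available */

    /* Populate the pool at setup time; in a real frontend the refs
     * would have been obtained from the hypervisor. */
    static void init_pool(const grant_ref_t *refs)
    {
        for (gref_free = 0; gref_free < POOL_SIZE; gref_free++)
            gref_pool[gref_free] = refs[gref_free];
    }

    /* Take a free reference; crash loudly if the pool is exhausted. */
    static grant_ref_t claim_gref(void)
    {
        assert(gref_free > 0);          /* in-kernel this is a BUG_ON() */
        return gref_pool[--gref_free];
    }

    /* Return a reference once the backend is done with the page. */
    static void release_gref(grant_ref_t ref)
    {
        assert(gref_free < POOL_SIZE);  /* double release corrupts the count */
        gref_pool[gref_free++] = ref;
    }

Since exhaustion trips the BUG_ON() and takes the DomU down outright, a
guest that keeps running, as Sander's does, suggests the pool book-keeping
is still consistent.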
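One more observation on the failing log line: src_dom_id:32752 is 0x7ff0,
i.e. DOMID_SELF, so the source of the copy was plain local Dom0 memory and
only the destination was a grant reference, owned by domain 7. That is the
shape of a netback receive-path copy, which fits the suspicion above that
this is a network driver problem. A sketch of how such an op is filled in
(struct and flag names are from Xen's public grant_table.h; fill_rx_copy()
itself and its arguments are hypothetical):

    #include <linux/types.h>
    #include <xen/interface/xen.h>          /* DOMID_SELF, domid_t */
    #include <xen/interface/grant_table.h>  /* struct gnttab_copy, GNTCOPY_* */

    static void fill_rx_copy(struct gnttab_copy *op,
                             xen_pfn_t local_gmfn, grant_ref_t guest_gref,
                             domid_t guest_domid, uint16_t len)
    {
        op->source.u.gmfn = local_gmfn;   /* plain local page, not a grant */
        op->source.domid  = DOMID_SELF;   /* logged above as 32752 */
        op->source.offset = 0;

        op->dest.u.ref    = guest_gref;   /* the gref Xen failed to acquire */
        op->dest.domid    = guest_domid;  /* domain 7 in the log above */
        op->dest.offset   = 0;

        op->len   = len;
        op->flags = GNTCOPY_dest_gref;    /* only the destination is a gref */
    }

The nr_grant_entries and maptrack figures quoted earlier in the thread come
from Xen's grant-table usage dump; it should be possible to trigger that
from Dom0 with "xl debug-keys g" and read the output back with "xl dmesg".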