
Re: [Xen-devel] [Xen-users] ARM: "xen_add_mach_to_phys_entry: cannot add ... already exists and panics"



On Thu, 3 Jul 2014, Ian Campbell wrote:
> On Thu, 2014-07-03 at 02:01 +0200, Denis Schneider wrote:
> > Hi Ian,
> > thank you for your reply.
> > 
> > 2014-07-02 11:56 GMT+02:00 Ian Campbell <Ian.Campbell@xxxxxxxxxx>:
> > > First thing I would recommend would be to try the latest mainline stable
> > > 3.15.x release. I think everything needed for a usable sunxi system is
> > > in there already so no need for the sunxi-devel branch
> > 
> > I tried Linus' linux.git/master, which corresponds to 3.16 --
> > resulting in the same messages and panic.
> > Besides that, the mainline kernel works quite well.
> > BTW, git shows that sunxi-devel and mainline Linux v3.15.2 are the
> > same for drivers/net/xen-netback, though linux.git/master shows some
> > changes.
> > 
> > The bug can easily be triggered if you access blkback and netback in
> > parallel (thanks to Maximilian), e.g.
> > domU: iperf -s & cat /dev/xvda > /dev/null
> > dom0: iperf -t 3600 -c domU
> > 
> > It does not matter if the underlying dom0 block device is on a SATA,
> > USB or mmc device. The panic is similar.
> > 
> > > The reason I suggest the latest 3.15.x is that there were a few
> > > interesting netback bugs but I think they've all been backported to
> > > stable by now.
> > 
> > I hope that they are all included in linux.git/master @ 16874b2,
> > regarding xen-netback, those changes occurred from sunxi-devel to
> > 16874b2:
> > * xen-netback: bookkeep number of active queues in our own module
> > * net: xen-netback: include linux/vmalloc.h again
> > * xen-netback: Add support for multiple queues
> > * xen-netback: Factor queue-specific data into queue struct
> > * xen-netback: Move grant_copy_op array back into struct xenvif.
> > * net: get rid of SET_ETHTOOL_OPS
> > 
> > Interestingly, it takes some time until the bug triggers and the time
> > increased when I switched from linux-sunxi to mainline.
> > 
> > Do you have any idea what happens here? I am a bit clueless what's going on.
> 
> Me too. Since there are mach_to_phys messages perhaps Stefano (CCd) has
> a clue. Original logs are in
> http://lists.xen.org/archives/html/xen-users/2014-07/msg00004.html
> 
> Lots of these under network load:
> 
> [ 189.507495] xen_add_phys_to_mach_entry: cannot add pfn=0x0006930f -> mfn=0x0004c3bc: pfn=0x00069310 -> mfn=0x0004c3bc already exists
> [ 189.531185] xen_add_phys_to_mach_entry: cannot add pfn=0x0006921d -> mfn=0x0004c489: pfn=0x0006928f -> mfn=0x0004c489 already exists

Unfortunately this is a known issue without a proper solution at the
moment.  It is caused by Zoltan's patch series to switch from grant
copies to grant mappings in netfront/netback, which went into Linux
3.15-rc1.  If you use Linux 3.14 instead, you shouldn't have any
problems.

These are the details: ARM support in Linux cannot deal with multiple
foreign mappings of the same physical page. If the frontend decides to
grant the same page twice, using two different grant references, it
breaks the accounting in arch/arm/xen/p2m.c on the backend side. It is
not trivial to come up with a solution, because the data structures in
p2m.c are already pretty slow as they are; accounting for multiple
mappings of a single mfn would slow things down further.
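
To make the failure mode concrete, here is a toy model in Python. It is
only a sketch and not the code in arch/arm/xen/p2m.c: the class and
method names are made up for illustration, and a plain dict stands in
for whatever structure the kernel really uses. The only assumption is
the one described above, namely that the accounting keeps at most one
pfn per mfn.

```python
# Toy model of a one-entry-per-mfn lookup; NOT the code in arch/arm/xen/p2m.c.
# All class and method names here are hypothetical.


class MachToPhys:
    """Maps each machine frame number (mfn) to at most one local pfn."""

    def __init__(self):
        self._by_mfn = {}

    def add(self, pfn, mfn):
        """Record pfn -> mfn. Return None on success, or an error string
        if the mfn is already tracked under some pfn."""
        existing = self._by_mfn.get(mfn)
        if existing is not None:
            return (f"cannot add pfn={pfn:#010x} -> mfn={mfn:#010x}: "
                    f"pfn={existing:#010x} -> mfn={mfn:#010x} already exists")
        self._by_mfn[mfn] = pfn
        return None


table = MachToPhys()

# The backend maps a granted page at one local pfn ...
first = table.add(pfn=0x69310, mfn=0x4C3BC)

# ... then the frontend grants the same machine page again under a second
# grant reference, and the backend maps it at a different local pfn.
second = table.add(pfn=0x6930F, mfn=0x4C3BC)

print(first)
print(second)
```

The first call succeeds; the second is refused with a message of the
same shape as the log lines quoted above, because the table has no room
for a second pfn pointing at an mfn it already tracks.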

At the moment I would like a way to disable grant mappings and go back
to grant copies on demand. Maybe we could have a feature flag to change
the behaviour of the network backend?
