
Re: [Xen-devel] [linux-4.1 test] 63030: regressions - FAIL



On Wed, 2015-10-21 at 18:34 +0100, Wei Liu wrote:
> On Wed, Oct 21, 2015 at 05:47:06PM +0100, Ian Campbell wrote:
> > On Tue, 2015-10-20 at 16:34 +0100, Ian Jackson wrote:
> > > Wei Liu writes ("Re: [Xen-devel] [linux-4.1 test] 63030: regressions - FAIL"):
> > > > From mere code inspection and document of lwip 1.3.0 I think mini-os
> > > > does send gratuitous ARP.
> > > 
> > > The guest is using the PVHVM drivers at this point, with the backend
> > > directly in dom0, so it is the guest's gratuitous arp which is
> > > needed, I think.
> > 
> > It would be worth investigating whether mini-os's gratuitous ARP might
> > also be occurring and confusing things, e.g. by coming after and
> > therefore taking precedence over the one coming from the guest.
> > 
> 
> Several observations:
> 
> 1. The guest doesn't always send gratuitous arp -- but this might not be
>    the cause of this failure. Guest works fine when using qemu-trad
>    only.

As in it always sends the arp when using qemu-trad, or that it is fine
irrespective of not always sending it?

> 2. Guest only sends one gratuitous arp at most.

This is as expected, but does the stubdom also send one?
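
One way to check this would be to capture on xenbr0 and list every
gratuitous ARP with its sender MAC and IP. A minimal sketch in Python,
assuming untagged Ethernet/IPv4 frames are already available as raw bytes.
The function names are made up for illustration and are not part of any
Xen or osstest tooling, and whether the sender MAC alone distinguishes the
guest from the stubdom depends on the configuration:

```python
import struct

ETH_P_ARP = 0x0806  # EtherType for ARP


def build_gratuitous_arp(mac: bytes, ip: bytes) -> bytes:
    """Build a broadcast gratuitous ARP request (hypothetical test helper)."""
    eth = b"\xff" * 6 + mac + struct.pack("!H", ETH_P_ARP)
    # htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4, oper=1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += mac + ip + b"\x00" * 6 + ip  # sender IP == target IP
    return eth + arp


def parse_gratuitous_arp(frame: bytes):
    """Return (sender_mac, sender_ip) for a gratuitous ARP, else None."""
    if len(frame) < 14 + 28:
        return None
    if struct.unpack("!H", frame[12:14])[0] != ETH_P_ARP:
        return None
    htype, ptype, hlen, plen, _oper = struct.unpack("!HHBBH", frame[14:22])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return None
    sender_mac = frame[22:28]
    sender_ip = frame[28:32]
    target_ip = frame[38:42]
    # An ARP is gratuitous when it announces the sender's own address.
    if sender_ip != target_ip:
        return None
    mac = ":".join(f"{b:02x}" for b in sender_mac)
    ip = ".".join(str(b) for b in sender_ip)
    return mac, ip
```

Running parse_gratuitous_arp() over the frames seen around the migration
would show how many gratuitous ARPs arrive and from which MAC.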

> 3. When using stubdom, guest is a lot less responsive. See two
>    experiments and analysis below.

Less responsive in use or only while migrating, or to ssh after migration,
or to something else?

> Scenario 1:
>   xl shows "Migration successful."
>   ...30s...
>   xenbr0 receives gratuitous arp
>   ...1s...
>   ssh date command comes back
> 
> Scenario 2:
>   xenbr0 receives gratuitous arp
>   ...1s...
>   xl shows "Migration successful."
>   ssh date command comes back
> 
> When stubdom was not present I never saw scenario 1.

It would be worth looking at the possibility of a delay between "Migration
successful" and the target domain actually running. A 30s delay between the
guest restarting and it sending the ARP would be pretty strange IMHO.

> Note that my machine is relatively old (>6 years). It would never pass
> the test in osstest because in osstest the timeout is 10s.
> 
> The slowness in osstest seems to be host specific because all failures
> in the guest migrate test were on merlot*. It's not only linux-4.1 that
> is failing; other branches fail the same test step on merlot*, too.

This could be a factor in common with the other qemu timeout on merlot which
led to 9acfbe14d726.

It might be worth prodding AMD over that issue again.

Ian.

