
Re: [Xen-devel] test report for Xen 4.3 RC1



On Tue, Jun 04, 2013 at 03:59:33PM +0000, Ren, Yongjie wrote:
> Sorry for replying late. :-)
> 
> > -----Original Message-----
> > From: Konrad Rzeszutek Wilk [mailto:konrad.wilk@xxxxxxxxxx]
> > Sent: Tuesday, May 28, 2013 11:16 PM
> > To: Ren, Yongjie; george.dunlap@xxxxxxxxxxxxx
> > Cc: xen-devel@xxxxxxxxxxxxx; Xu, YongweiX; Liu, SongtaoX; Tian, Yongxue
> > Subject: Re: [Xen-devel] test report for Xen 4.3 RC1
> > 
> > On Mon, May 27, 2013 at 03:49:27AM +0000, Ren, Yongjie wrote:
> > > Hi All,
> > > This is a report based on our testing for Xen 4.3.0 RC1 on Intel 
> > > platforms.
> > > (Sorry it's a little late. :-)  If the status changes, I'll have an update
> > later.)
> > 
> > OK, I have some updates and ideas that can help narrow some of
> > these issues down. Thank you for doing this.
> > 
> > >
> > > Test environment:
> > > Xen: Xen 4.3 RC1 with qemu-upstream-unstable.git
> > > Dom0: Linux kernel 3.9.3
> > 
> > Could you please test v3.10-rc3? There have been some changes
> > for VCPU hotplug added in v3.10 that I am not sure are in v3.9.
> I didn't try every bug with v3.10-rc3, but most of them still exist.
> 
> > > Hardware: Intel Sandy Bridge, Ivy Bridge, Haswell systems
> > >
> > > Below are the features we tested.
> > > - PV and HVM guest booting (HVM: Ubuntu, Fedora, RHEL, Windows)
> > > - Save/Restore and live migration
> > > - PCI device assignment and SR-IOV
> > > - power management: C-state/P-state, Dom0 S3, HVM S3
> > > - AVX and XSAVE instruction set
> > > - MCE
> > > - CPU online/offline for Dom0
> > > - vCPU hot-plug
> > > - Nested Virtualization  (Please look at my report in the following link.)
> > >  http://lists.xen.org/archives/html/xen-devel/2013-05/msg01145.html
> > >
> > > New bugs (4): (some of which are not regressions)
> > > 1. sometimes failed to online cpu in Dom0
> > >
> > http://bugzilla-archived.xenproject.org//bugzilla/show_bug.cgi?id=1851
> > 
> > That looks like you are hitting the udev race.
> > 
> > Could you verify that these patches:
> > https://lkml.org/lkml/2013/5/13/520
> > 
> > fix the issue? (They are destined for v3.11.)
> > 
> Not tried yet. I'll update you later.

Thanks!
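For anyone hitting this before the udev fixes land, the CPU can usually be onlined by hand through sysfs, which sidesteps the udev race. A minimal sketch (cpu1 is an example CPU number):

```shell
# Hypothetical example: online vCPU 1 in dom0 by hand, bypassing udev.
CPU_SYSFS=/sys/devices/system/cpu/cpu1

# 'online' reads 0 while the CPU is offline, 1 once it is online.
cat "$CPU_SYSFS/online"

# Write 1 to online the CPU directly; udev would normally do this in
# response to the hotplug event, which is where the race lives.
echo 1 > "$CPU_SYSFS/online"
```

If the manual write succeeds while the udev-driven path fails, that points at the race rather than at the hotplug machinery itself.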
> 
> > > 2. dom0 call trace when running sriov hvm guest with igbvf
> > >
> > http://bugzilla-archived.xenproject.org//bugzilla/show_bug.cgi?id=1852
> > >   -- a regression in Linux kernel (Dom0).
> > 
> > Hm, the call-trace you refer to:
> > 
> > [   68.404440] Already setup the GSI :37
> > 
> > [   68.405105] igb 0000:04:00.0: Enabling SR-IOV VFs using the module
> > parameter is deprecated - please use the pci sysfs interface.
> > 
> > [   68.506230] ------------[ cut here ]------------
> > 
> > [   68.506265] WARNING: at
> > /home/www/builds_xen_unstable/xen-src-27009-20130509/linux-2.6-pvop
> > s.git/fs/sysfs/dir.c:536 sysfs_add_one+0xcc/0xf0()
> > 
> > [   68.506279] Hardware name: S2600CP
> > 
> > is a deprecation warning. Did you try the 'pci sysfs' interface instead?
> > 
> > Looking at da36b64736cf2552e7fb5109c0255d4af804f5e7
> >     ixgbe: Implement PCI SR-IOV sysfs callback operation
> > it says it is using this:
> > 
> > commit 1789382a72a537447d65ea4131d8bcc1ad85ce7b
> > Author: Donald Dutile <ddutile@xxxxxxxxxx>
> > Date:   Mon Nov 5 15:20:36 2012 -0500
> > 
> >     PCI: SRIOV control and status via sysfs
> > 
> >     Provide files under sysfs to determine the maximum number of VFs
> >     an SR-IOV-capable PCIe device supports, and methods to enable and
> >     disable the VFs on a per-device basis.
> > 
> >     Currently, VF enablement by SR-IOV-capable PCIe devices is done
> >     via driver-specific module parameters.  If not setup in modprobe
> >     files, it requires admin to unload & reload PF drivers with number
> >     of desired VFs to enable.  Additionally, the enablement is system
> >     wide: all devices controlled by the same driver have the same
> >     number of VFs enabled.  Although the latter is probably desired,
> >     there are PCI configurations setup by system BIOS that may not
> >     enable that to occur.
> > 
> >     Two files are created for the PF of PCIe devices with SR-IOV support:
> > 
> >         sriov_totalvfs  Contains the maximum number of VFs the device
> >                         could support as reported by the TotalVFs
> >                         register in the SR-IOV extended capability.
> > 
> >         sriov_numvfs    Contains the number of VFs currently enabled on
> >                         this device as reported by the NumVFs register
> >                         in the SR-IOV extended capability.
> > 
> >                         Writing zero to this file disables all VFs.
> > 
> >                         Writing a positive number to this file enables
> >                         that number of VFs.
> > 
> >     These files are readable for all SR-IOV PF devices.  Writes to the
> >     sriov_numvfs file are effective only if a driver that supports the
> >     sriov_configure() method is attached.
> > 
> >     Signed-off-by: Donald Dutile <ddutile@xxxxxxxxxx>
> > 
> > 
> > Can you try that please?
> > 
> Recently, one of my workmates already posted a fix, below:
> https://lkml.org/lkml/2013/5/30/20
> And it seems it has also been fixed by someone else:
> https://patchwork.kernel.org/patch/2613481/
> 

Great! Care to update the bug with said relevant information?
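For reference, the sysfs interface from that commit can be exercised like this (a sketch; the PF address 0000:04:00.0 is taken from the igb log above and may differ on other systems):

```shell
# PF of the SR-IOV-capable device; adjust the address for your system.
PF=/sys/bus/pci/devices/0000:04:00.0

# Maximum VFs the device supports (from the TotalVFs register).
cat "$PF/sriov_totalvfs"

# Writes to sriov_numvfs are rejected if VFs are already enabled
# (e.g. via the deprecated max_vfs= module parameter), so write 0 first.
echo 0 > "$PF/sriov_numvfs"

# Enable 7 VFs through sysfs instead of the module parameter.
echo 7 > "$PF/sriov_numvfs"
```

This only works if the PF driver implements the sriov_configure() callback, per the commit message above.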
> > 
> > > 3. Booting multiple guests will lead Dom0 call trace
> > >
> > http://bugzilla-archived.xenproject.org//bugzilla/show_bug.cgi?id=1853
> > 
> > That one worries me. Did you do a git bisect to figure out which
> > commit is causing this?
> > 
> I only found this bug on some Intel EX-class servers. 
> I don't know which Xen/Dom0 version works fine.
> It would be good if anyone wants to reproduce or debug it.
> Our team is trying to debug it internally first.

Ah, OK. Then please continue on debugging it. Thanks!
> 
> > > 4. After live migration, guest console continuously prints "Clocksource
> > tsc unstable"
> > >
> > http://bugzilla-archived.xenproject.org//bugzilla/show_bug.cgi?id=1854
> > 
> > This looks like a current bug with QEMU unstable missing an ACPI table?
> > 
> > Did you try booting the guest with the old QEMU?
> > 
> > device_model_version = 'qemu-xen-traditional'
> > 
> This issue still exists with traditional qemu-xen.
> After more testing, this bug can't be reproduced with some other guests.
> A RHEL 6.4 guest has this issue after live migration, while RHEL 6.3, 
> Fedora 17, and Ubuntu 12.10 guests work fine.

There is a recent thread on this where the culprit was the PV timeclock
not being updated correctly. But that would seem to be at odds with
your report - where Fedora 17 works fine.

Hm, I am at a loss on this one.
> 
> > >
> > > Old bugs: (11)
> > > 1. [ACPI] Dom0 can't resume from S3 sleep
> > >   http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1707
> > 
> > That should be fixed in v3.11 (as now we have the fixes).
> > Could you try v3.10 with Rafael's ACPI tree merged in?
> > (i.e. the patches that he wants to submit for v3.11)
> > 
> I re-tested with Rafael's linux-pm.git tree (master and acpi-hotplug branches), 
> and found Dom0 S3 sleep/resume still doesn't work.

The patches he has to submit for v3.11 are in the linux-next branch.
You need to use that branch.

> 
> > > 2. [XL]"xl vcpu-set" causes dom0 crash or panic
> > >   http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1730
> > 
> > That I think is fixed in v3.10. Could you please check v3.10-rc3?
> > 
> Still exists on v3.10-rc3.
> The following command lines can reproduce it:
> # xl vcpu-set 0 1
> # xl vcpu-set 0 20

Ugh, the exact same stack trace? And can you attach the full dmesg or serial
output (so that I can see what is there at bootup)?
> 
> > > 3. Sometimes Xen panic on ia32pae Sandybridge when restore guest
> > >   http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1747
> > 
> > That looks to be with v2.6.32. Is the issue present with v3.9
> > or v3.10-rc3?
> >
> We haven't tested ia32pae Xen for a long time. 
> Now we only cover ia32e Xen/Dom0,
> so this bug is only a legacy issue. 
> If we find the effort to verify it, we'll update the bugzilla.

How about just dropping that bug as 'WONTFIX'?

> 
> > > 4. 'xl vcpu-set' can't decrease the vCPU number of a HVM guest
> > >   http://bugzilla.xen.org/bugzilla/show_bug.cgi?id=1822
> > 
> > That I believe was a QEMU bug:
> > http://lists.xen.org/archives/html/xen-devel/2013-05/msg01054.html
> > 
> > which should be in QEMU traditional now (05-21 was when it went
> > in the tree)
> > 
> This bug has existed throughout this year and last (at least in our testing):
> 'xl vcpu-set' can't decrease the vCPU number of an HVM guest.

Could you retry with Xen 4.3 please?
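For whoever retests on 4.3, a minimal reproduction sketch (the guest name "hvm1" and the vCPU counts are examples):

```shell
# Hypothetical HVM guest "hvm1" configured with vcpus=4 (maxvcpus=4).
# Try to decrease the online vCPU count to 2:
xl vcpu-set hvm1 2

# What the toolstack believes the guest now has:
xl vcpu-list hvm1

# Inside the guest, count online CPUs; 4 instead of 2 means the
# decrease did not take effect:
grep -c '^processor' /proc/cpuinfo
```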

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel

