Xen project Mailing List

Re: [Xen-devel] S3 is broken again in xen-unstable

To: Ian Campbell <Ian.Campbell@xxxxxxxxxx>

From: Ben Guthro <ben@xxxxxxxxxx>

Date: Fri, 26 Apr 2013 08:19:45 -0400

Cc: George Dunlap <george.dunlap@xxxxxxxxxxxxx>, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, Ian Jackson <Ian.Jackson@xxxxxxxxxxxxx>, Marek Marczykowski <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, xen-devel <xen-devel@xxxxxxxxxxxxx>

Delivery-date: Fri, 26 Apr 2013 12:20:05 +0000

List-id: Xen developer discussion <xen-devel.lists.xen.org>

On Fri, Apr 26, 2013 at 4:10 AM, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
> On Thu, 2013-04-25 at 18:02 +0100, Ben Guthro wrote:
>> On Thu, Apr 25, 2013 at 8:00 AM, Ben Guthro <ben@xxxxxxxxxx> wrote:
>> > Since this is something that XenClient really relies on working, it
>> > has been a pain point with every upgrade of Xen for us.
>> > It is enormously time consuming to debug on every upgrade, and has a
>> > long tail in discovering problems (I started debugging S3 last Aug on
>> > xen-unstable, prior to 4.2 being cut)
>> >
>> > How can we work with the community to try to get some sort of
>> > regression testing for this feature that we rely on in our product?
>>
>> I am still interested in ideas for getting this into automated
>> testing, and any ideas people may have for this.
>
> CCing Ian Jackson who runs the test infrastructure.

I've also CC'ed a few people here, who I mention in my reply below.

>
> Contributing new tests is now less onerous than it once was (i.e. it
> might even possible at all). There is some info at
> http://lists.xen.org/archives/html/xen-devel/2012-10/msg01517.html
> although the branch may be out of date -- Ian was working on merging the
> standalone branch at one point.

I'll read up on this

>
> Some questions:
>       * How automatable is s3?
>       * In particular can we automate the wakeup? s3 is save to RAM
>         IIRC, and most power control in the test system is done with PDU
>         power cycling.

I spoke with George Dunlap a bit about this while I was over in the UK
a few weeks ago, and drew up an example shell script for this:
http://xen.markmail.org/thread/ghj2ffngemccq6p4

Marek also weighed in, and included some of his own tests, and experiences.

In my experience, this mechanism is about as reliable as your RTC. On
some systems you might tell it to sleep for 30s, and it will wake in
10s.
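A minimal sketch of what one RTC-driven S3 test cycle could look like,
given here in Python rather than as the shell script linked above (whose
contents are not reproduced). It assumes the util-linux `rtcwake` tool is
available in dom0 and that `rtcwake -m mem -s N` is an acceptable way to
trigger S3 there; the function names and the 5 second tolerance are
hypothetical, chosen only for illustration. It flags the "asked for 30s,
woke in 10s" case as an early wake rather than a pass.

```python
import subprocess
import time
from typing import Callable, List, Optional


def build_rtcwake_command(sleep_seconds: int) -> List[str]:
    # rtcwake (util-linux): suspend to RAM ("mem") and program the RTC
    # alarm to fire after sleep_seconds.
    return ["rtcwake", "-m", "mem", "-s", str(sleep_seconds)]


def classify_wake(requested: float, elapsed: float, tolerance: float) -> str:
    # Compare how long the machine actually slept against the request.
    if elapsed < requested - tolerance:
        return "early"
    if elapsed > requested + tolerance:
        return "late"
    return "on-time"


def run_s3_cycle(
    sleep_seconds: int,
    tolerance: float = 5.0,  # hypothetical default, tune per machine
    runner: Optional[Callable[[List[str]], int]] = None,
    clock: Callable[[], float] = time.time,
) -> str:
    # Wall-clock time is used on purpose: a monotonic clock typically
    # does not advance while the system is suspended.
    if runner is None:
        def runner(cmd: List[str]) -> int:
            return subprocess.run(cmd, check=False).returncode

    start = clock()
    returncode = runner(build_rtcwake_command(sleep_seconds))
    elapsed = clock() - start

    if returncode != 0:
        return "suspend-failed"
    return classify_wake(sleep_seconds, elapsed, tolerance)
```

Calling `run_s3_cycle(30)` as root would attempt a real suspend. A machine
that never resumes cannot report anything, so a harness would still need
an external timeout and a way to power cycle the box.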
That said, when things go wrong, the machine does need to be power
cycled...so if you are not physically located near the machine under
test, you would need a PDU as a recovery mechanism, I suppose.

>       * Would s3 ever be expected to work on the sorts of whitebox
>         server systems which form the osstest pool or do we need to
>         investigate additional hardware?

I don't see why it wouldn't work, though admittedly I haven't dealt
with Xen on servers since 2009.

>       * How hardware specific are the s3 failures -- we obviously can't
>         have one of every laptop ever ;-)

Clearly. I'm just looking to get a foot in the door here, so there is
a chance of catching gross regressions.

The hardware differences seem to be more timing related, due to
speed... i.e., you are likely to uncover new failures when new, faster
hardware comes out for laptops. Since typically server hardware is
faster than laptop hardware, that would theoretically catch problems
at a higher frequency.

>
> So assuming the answers to the above are positive then contributing a
> test case for s3 to the relevant flights seems like a reasonable first
> step, even if the expectation is that it would always fail with the
> current mainline Xen + mainline Linux. The test system only tracks
> regressions, so always failing test cases are OK (you can think of this
> in the test-drive development kind of way ;-)).

I'll take a look at the test infrastructure, and see if I can make
heads/tails of it, and come up with a simplistic test.

>
>> Would it be helpful to maintain a branch in my xenbits repo that could
>> be a rebased version of konrad's acpi-s3 patches against Linus' latest
>> kernel?
>
> What is keeping those out of Linus' tree?

Added Konrad here, but I believe he is on vacation this week.

This has been a bullet point on his OSS presentation, as outstanding
pvops work for at least 3 years now.
IIRC, the x86 guys NACK'ed the change as being too invasive.
I googled around a bit, but can't seem to find the thread about it.

>
> Once we have a test case in the standard flights then we can consider
> the options around new flights testing other trees.

I'm not sure I understand this point.
Are you saying you want to see a test that fails in the standard test
flight first...because without Konrad's patches, it will be guaranteed
not to work.

...and without other changesets queued up for the 3.10 merge window,
non-boot CPUs will always have incorrect C-states.

Thanks
Ben

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
