
Re: [Xen-devel] S3 is broken again in xen-unstable



On Fri, Apr 26, 2013 at 4:10 AM, Ian Campbell <Ian.Campbell@xxxxxxxxxx> wrote:
> On Thu, 2013-04-25 at 18:02 +0100, Ben Guthro wrote:
>> On Thu, Apr 25, 2013 at 8:00 AM, Ben Guthro <ben@xxxxxxxxxx> wrote:
>> > Since this is something that XenClient really relies on working, it
>> > has been a pain point with every upgrade of Xen for us.
>> > It is enormously time consuming to debug on every upgrade, and has a
>> > long tail in discovering problems (I started debugging S3 last Aug on
>> > xen-unstable, prior to 4.2 being cut)
>> >
>> > How can we work with the community to try to get some sort of
>> > regression testing for this feature that we rely on in our product?
>>
>> I am still interested in ideas for getting this into automated
>> testing, and any ideas people may have for this.
>
> CCing Ian Jackson who runs the test infrastructure.

I've also CC'ed a few people here, who I mention in my reply below.

>
> Contributing new tests is now less onerous than it once was (i.e. it
> might even be possible at all). There is some info at
> http://lists.xen.org/archives/html/xen-devel/2012-10/msg01517.html
> although the branch may be out of date -- Ian was working on merging the
> standalone branch at one point.

I'll read up on this.

>
> Some questions:
>       * How automatable is s3?
>       * In particular can we automate the wakeup? s3 is suspend to RAM
>         IIRC, and most power control in the test system is done with PDU
>         power cycling.

I spoke with George Dunlap a bit about this while I was over in the
UK a few weeks ago, and drew up an example shell script for this:
http://xen.markmail.org/thread/ghj2ffngemccq6p4
Marek also weighed in, and included some of his own tests and experiences.

In my experience, this mechanism is about as reliable as your RTC. On
some systems you might tell it to sleep for 30s, and it will wake in
10s.

That said, when things go wrong, the machine does need to be power
cycled, so if you are not physically located near the machine under
test, you would need a PDU as a recovery mechanism, I suppose.
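
For illustration, one suspend/resume cycle of this kind can be sketched
in Python. This is a minimal sketch and not the script from the thread
linked above: it assumes a Linux dom0 that exposes the RTC wake alarm at
/sys/class/rtc/rtc0/wakealarm and accepts "mem" in /sys/power/state, and
the names s3_cycle and classify, as well as the 5 second slack, are made
up for the example.

```python
import time
from pathlib import Path

# Assumed sysfs paths on a Linux dom0; adjust for the machine under test.
WAKEALARM = Path("/sys/class/rtc/rtc0/wakealarm")
POWER_STATE = Path("/sys/power/state")


def s3_cycle(sleep_seconds, wakealarm=WAKEALARM, power_state=POWER_STATE,
             clock=time.time):
    """Arm the RTC alarm, request S3, and return the observed elapsed seconds.

    Wall-clock time is used on purpose: a monotonic clock may not advance
    while the machine is suspended.
    """
    wakealarm.write_text("0\n")                  # clear any stale alarm
    wakealarm.write_text(f"+{sleep_seconds}\n")  # alarm relative to now
    start = clock()
    power_state.write_text("mem\n")              # returns after resume
    return clock() - start


def classify(requested, elapsed, slack=5.0):
    """Label a cycle as an 'early' wake, 'ok', or a 'late' wake."""
    if elapsed < requested - slack:
        return "early"
    if elapsed > requested + slack:
        return "late"
    return "ok"
```

A call such as classify(30, s3_cycle(30)) would flag the "asked for 30s,
woke in 10s" case as "early". If the write to the power-state file never
returns, the machine did not resume, and the harness needs an external
recovery path such as the PDU.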

>       * Would s3 ever be expected to work on the sorts of whitebox
>         server systems which form the osstest pool or do we need to
>         investigate additional hardware?

I don't see why it wouldn't work, though admittedly I haven't dealt
with Xen on servers since 2009.

>       * How hardware specific are the s3 failures -- we obviously can't
>         have one of every laptop ever ;-)

Clearly. I'm just looking to get a foot in the door here, so there is
a chance of catching gross regressions.
The hardware differences seem to be more timing related, due to
speed; i.e., you are likely to uncover new failures when new, faster
laptop hardware comes out.
Since server hardware is typically faster than laptop hardware, it
would theoretically catch problems at a higher frequency.

>
> So assuming the answers to the above are positive then contributing a
> test case for s3 to the relevant flights seems like a reasonable first
> step, even if the expectation is that it would always fail with the
> current mainline Xen + mainline Linux. The test system only tracks
> regressions, so always failing test cases are OK (you can think of this
> in the test-driven development kind of way ;-)).

I'll take a look at the test infrastructure, see if I can make
heads or tails of it, and come up with a simple test.
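
The regression-only rule described in the quoted text above can be
sketched as follows. This is only an illustration of the idea, not
osstest's actual logic; the function name and the "pass"/"fail" labels
are assumptions made for the example.

```python
def is_regression(baseline, current):
    """True only when a test that passed in the baseline now fails."""
    return baseline == "pass" and current == "fail"


# An S3 test that has never passed does not count as a regression.
always_failing = is_regression("fail", "fail")
# A test that used to pass and now fails does.
newly_broken = is_regression("pass", "fail")
```

Under this rule an S3 test that fails from day one would not block
anything, but would start to matter as soon as it has passed once.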

>
>> Would it be helpful to maintain a branch in my xenbits repo that could
>> be a rebased version of Konrad's acpi-s3 patches against Linus' latest
>> kernel?
>
> What is keeping those out of Linus' tree?

Added Konrad here, but I believe he is on vacation this week.
This has been a bullet point in his OSS presentation, listed as
outstanding pvops work, for at least 3 years now.

IIRC, the x86 guys NACK'ed the change as being too invasive.
I googled around a bit, but can't seem to find the thread about it.

>
> Once we have a test case in the standard flights then we can consider
> the options around new flights testing other trees.

I'm not sure I understand this point.
Are you saying you want to see a test that fails in the standard test
flight first? Without Konrad's patches, it is guaranteed not to work.

Also, without other changesets queued up for the 3.10 merge window,
non-boot CPUs will always have incorrect C-states.

Thanks
Ben

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel