[Xen-devel] rimava0 (Re: [xen-4.6-testing test] 66466: trouble: broken/fail/pass)
osstest service owner writes ("[xen-4.6-testing test] 66466: trouble: broken/fail/pass"):
> flight 66466 xen-4.6-testing real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/66466/
>
> Failures and problems with tests :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-i386-qemut-rhel6hvm-amd  3 host-install(3)  broken REGR. vs. 65639

This is a timeout waiting for rimava0 to fetch its preseed file.

rimava0 was attended to by Yogesh Patel of Credativ (CC'd) on Thursday, and we found it worked. The only thing we think we did to make it work was to push all the cables home. We don't really know what changed, although Yogesh had a vague feeling that the power cable was not properly inserted into the PDU and that he might have seated it better.

rimava0 then seems to have been rebootable from 2015-12-17 13:57:55 Z to 2015-12-18 12:04:53 Z (at least). After that it experienced a run of successive failures.

I just allocated the machine to myself and found it had been left up and booted by some previous test job (presumably the last one that failed). Power cycling it caused it to reboot as expected. I confirmed the boot order settings in the BIOS. On exiting the BIOS it started running the d-i autoinstall that had been left set up in the PXE area by the most recent failed test job, as I would expect.

I looked at the serial log, and it shows the machine having apparently been up between Dec 18 12:20:29 (when 66466 test-amd64-i386-migrupgrade xen-boot/dst_host completed) and Dec 18 16:15:59 (when I manually power cycled it). There is a lot of Xen debug-key output, which will be from the log capture steps of the failed jobs. This indicates that the machine was actually continuously up, and responsive on the serial port, for all of this time. The test jobs' attempts to power cycle it had not actually rebooted it.

I have two theories:

1. Depriving rimava0 of power for a mere 15s (the previous configuration of PowerCycleTime) is not sufficient to cause rimava0 to reboot, particularly if it is very idle.

   This is not particularly convincing. It does not explain the previous problem where rimava0 apparently wouldn't boot even when I turned the power off for (IIRC) 2 minutes; that is what led me to ask Credativ to investigate in person. It does not explain why rimava1 is not affected. It does not explain very well why the failures clump so much. It also doesn't explain why the serial log shows no "Modem lines changed" messages, which sympathy (our serial concentrator) usually reports for this machine when the power goes off or on. In a series of manual power-on/power-off tests I found that this modem line change appeared in the sympathy client UI almost immediately.

   However, this theory is fairly easy to test. I have just set PowerCycleTime to 120s, which should surely be enough, and thrown the machine back in.

2. The PDU has a fault (e.g. a sticky relay). This would explain many of the symptoms.

   We could test this possibility by using a known-good PDU port, e.g. by borrowing the port assigned to one of the removed machines. But this would require a site visit.

I intend to see what the results look like from the runs over the weekend and then decide what to do next.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxx
http://lists.xen.org/xen-devel
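The serial-log reasoning in this message (no "Modem lines changed" messages, hence the power cycles probably never dropped power) could be checked mechanically across many jobs. The following is a minimal Python sketch, not part of osstest or sympathy. It assumes a hypothetical log format of one "YYYY-MM-DD HH:MM:SS text" entry per line, a list of times at which a power cycle was requested, and a 30-second window in which a modem-line change should appear. All three are illustrative assumptions, as are the function names `parse_log` and `ineffective_cycles` and the sample lines in the usage block.

```python
from datetime import datetime, timedelta

# ASSUMPTION: hypothetical log line format "YYYY-MM-DD HH:MM:SS <text>".
# The real sympathy serial log format may differ.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"
TS_LEN = 19  # length of "YYYY-MM-DD HH:MM:SS"
MARKER = "Modem lines changed"


def parse_log(lines):
    """Return (timestamp, text) pairs, skipping lines without a timestamp."""
    events = []
    for line in lines:
        try:
            ts = datetime.strptime(line[:TS_LEN], TS_FORMAT)
        except ValueError:
            continue  # e.g. continuation lines of a Xen debug-key dump
        events.append((ts, line[TS_LEN + 1:]))
    return events


def ineffective_cycles(events, cycle_times, window=timedelta(seconds=30)):
    """Return the power-cycle request times after which no modem-line change
    was logged within `window`, i.e. power probably never actually dropped.
    ASSUMPTION: the 30s default window is a guess, not a measured value."""
    suspicious = []
    for requested in cycle_times:
        seen = any(
            requested <= ts <= requested + window and MARKER in text
            for ts, text in events
        )
        if not seen:
            suspicious.append(requested)
    return suspicious


if __name__ == "__main__":
    # Made-up sample lines, for illustration only (not real rimava0 logs).
    log = [
        "2015-12-18 13:00:05 (XEN) debug-key output",
        "2015-12-18 16:15:59 Modem lines changed",
    ]
    requests = [datetime(2015, 12, 18, 13, 0, 0),
                datetime(2015, 12, 18, 16, 15, 50)]
    for t in ineffective_cycles(parse_log(log), requests):
        print("power cycle requested at", t, "left no modem-line change")
```

The modem-line marker is used because, per the observations above, sympathy reports it almost immediately when this machine's power goes off or on, so its absence after a requested cycle is the same evidence used by hand here. It cannot by itself distinguish theory 1 from theory 2; it only flags which cycles look ineffective.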