[Xen-devel] Help commissioning x86 boxes intended for builds [himrod[012]]
Sorry for the rather random CC list.

Last year we bought a variety of test boxes. Amongst them were three biggish Intel machines which I had primarily intended for use as dedicated build servers. These are himrod[012].

Unfortunately I have not been able to commission them because they have been failing their commissioning tests. Investigations have not found the problem.

The symptom is that, occasionally, the network stops working for a while. It then comes back, spontaneously. There are no log messages recorded on the box itself in /var/log for this, and no messages on the serial console.

The failure probability is about 10% for any one individual test job.

It seems to do it only under Xen with our own kernels (4.14.x). For initial installation and for builds we use stock Debian kernels (currently jessie, so 3.16.56-1 for the installer and 3.16.57-2 for the installed system), and I haven't seen failures there. I have not tried other combinations (yet).

They have Intel I350 NICs. We have the same NICs in another pair of boxes, debina[01], which work fine. (The himrods are set to use UEFI; the debinas, BIOS.)

I have already had the machines' firmware updated.

I know that it isn't the whole machine freezing, because here is an example where the test box itself experiences a timeout trying to talk to the network:

  http://logs.test-lab.xenproject.org/osstest/logs/133269/test-amd64-amd64-xl-multivcpu/10.ts-debian-install.log
  http://logs.test-lab.xenproject.org/osstest/logs/133269/test-amd64-amd64-xl-multivcpu/info.html

Here, when the test box's network connection starts working again, the TCP connection carrying the ssh session eventually retransmits and is unblocked, so the test box ends up sending the whole lot of buffered-up error messages to the controller VM, which duly logs them.

The most recent failed commissioning attempt was for himrods 0 and 2 only; himrod1 has an unrelated problem with its serial cable.
However, my notes indicate that I had previous problems with himrod1 too, so I think we need to take this as applying to all three machines.

Just in case, I have asked Credativ to give the machines fresh cables to different switch ports.

I don't have a good model of what to do (or try) next. Suggestions welcome.

The full report from my most recent commissioning test attempt is here:

  http://logs.test-lab.xenproject.org/osstest/logs/133269/

Thanks,
Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel