
Re: [Xen-devel] [OSSTEST PATCH] README.hardware-acquisition [and 1 more messages]



Ian Jackson writes ("Re: [OSSTEST PATCH] README.hardware-acquisition [and 1 
more messages]"):
> So overall, for the reasons I explain, I'm going to commit this
> document (subject to the other comments etc.) *with* the requirement
> that hardware must be supported by Debian (at least, in -backports).

This didn't happen.  There was considerable further discussion.  The
fact that various kinds of uncertainty meant this document didn't get
committed is now blocking us from giving the go-ahead for some new hardware
acquisition:

Ie, I can't answer the question "should we accept hardware XYZ"
without reference to at least an implied checklist like this.
Having written it down I ought to use the one I've written down,
because to do otherwise is simply to pointlessly invite mistakes.  And
if I'm to use a written-down checklist it should be one which is
actually official.

Accordingly, I intend to commit this to osstest now.  Juergen, this is
just a document: can I have your release ack for it ?

I will then reply separately about the specific new hardware, using
the checklist as a guide.  Obviously a checklist is always a
guidelines document: if we find that a point is best answered a
different way than the checklist expects, or that the checklist ought
to be changed, then changes to the checklist are a reasonable part of
the outcome of such a process; that would be in the form of further
patches to this document in osstest.

Ian.

From fae48bd584a0b58934a2df97b6db1d06eacf1724 Mon Sep 17 00:00:00 2001
From: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
Date: Tue, 30 Oct 2018 16:12:27 +0000
Subject: [OSSTEST PATCH] README.hardware-acquisition

New document-cum-checklist, for helping with hardware procurement.

Signed-off-by: Ian Jackson <ian.jackson@xxxxxxxxxxxxx>
CC: infra@xxxxxxxxxxxxxx
CC: George Dunlap <dunlapg@xxxxxxxxx>
CC: Stefano Stabellini <sstabellini@xxxxxxxxxx>
CC: Julien Grall <julien.grall@xxxxxxx>
--
v2: Add caveats about the Xen ARM Linux branch
    Say something, albeit rather vague, about device trees
---
 README.hardware-acquisition | 317 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 317 insertions(+)
 create mode 100644 README.hardware-acquisition

diff --git a/README.hardware-acquisition b/README.hardware-acquisition
new file mode 100644
index 00000000..0a429db3
--- /dev/null
+++ b/README.hardware-acquisition
@@ -0,0 +1,317 @@
+====================================
+# HARDWARE ACQUISITION FOR OSSTEST #
+====================================
+
+This document can be used as a checklist when procuring hardware for
+an osstest instance.  A few of the points have details specific to the
+Xen Project test lab in Massachusetts, but most of it will be relevant
+to all osstest installations.
+
+
+POWER
+=====
+
+osstest needs to turn each host on and off under program control.
+
+When a host is power cycled, all state in it must be reset.  This
+includes onboard control and management software (eg IPMI), since such
+systems can be buggy and bugs in them can be provoked by bugs in
+system software (ie, buggy versions of Xen can break the LOM, even if
+the LOM, unusually, is not simply flaky).
+
+However, it is often necessary to use the LOM (Lights Out Management)
+as part of the poweron/poweroff sequence as otherwise some machines
+draw enough current to wear out our mains PDU contacts too quickly.
+
+(I use the English word `mains' for the single phase 110V/220V-240V AC
+electrical power supply prevalent in datacentres.)
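+
+As a concrete illustration, a LOM which speaks IPMI can typically be
+driven from an infrastructure host with the free ipmitool utility.
+This is only a sketch of the kind of control required; the address
+and credentials below are hypothetical:
+
+    # address and credentials are placeholders
+    ipmitool -I lanplus -H 10.0.0.99 -U osstest -P secret chassis power status
+    ipmitool -I lanplus -H 10.0.0.99 -U osstest -P secret chassis power cycle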
+
+Requirements for typical server hardware
+----------------------------------------
+
+ * If the system has a LOM it should be driveable with Free Software,
+   eg via the IPMI protocol.
+
+ * Redundant PSUs are not required.
+
+ * Provisioning: One PDU port is required per host.
+
+Requirements for embedded or devboard hardware
+----------------------------------------------
+
+ * There must be arrangements to control the actual power supply
+   to each board (node).  Options include:
+
+     (i) Each node has a separate mains power supply, each of which
+         we will plug into a PDU port.
+
+     (ii) A separate management or PDU board or backplane, which
+         has one single mains power input and which has relays
+         or similar to control power to individual nodes.
+         The management system must have its own separate network
+         connection and not be at risk of corruption from
+         bad software on nodes.
+
+ * Provisioning:
+    + Number of PDU ports required depends on the approach taken.
+    + With a separate PDU controller, a switch port is required.
+
+
+SERIAL
+======
+
+We always use hardware serial for console output.  This is essential
+to capture kernel and hypervisor crash messages, including from early
+boot; as well as bootloader output, and so on.  We use our own serial
+concentrator hardware, separate from the systems under test.  Built-in
+console-over-LAN systems (eg IPMI serial over LAN) are not reliable
+enough for our purposes.
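+
+For illustration, the sort of configuration this implies is a serial
+console selected on the hypervisor and kernel command lines, eg (the
+device names and baud rates here are only examples and vary by
+platform):
+
+    # Xen hypervisor command line
+    com1=115200,8n1 console=com1
+    # baremetal Linux kernel command line
+    console=ttyS0,115200n8
+    # dom0 under Xen typically uses the Xen console instead
+    console=hvc0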
+
+Requirements for typical server hardware
+----------------------------------------
+
+ * At least one conventional RS232 UART, accessible to system
+   software in the conventional way.
+
+ * For ARM, supported as console by both Xen[1] and Linux[2].
+
+ * Presented on a standard 9-pin D connector.  (RJ45 is acceptable
+   if we know the pinout.)
+
+ * Provisioning: one serial concentrator port required per host.
+
+Requirements for embedded or devboard hardware
+----------------------------------------------
+
+ * At least one suitable UART
+
+ * Supported in software by both Xen[1] and Linux[2]
+
+ * With suitable physical presentation:
+    (i)
+       + Proper RS232 (full voltage, not TTL or 3.3V)
+       + presented on a 9-pin D or RJ45 connector
+       + with known pinout;
+   or
+    (ii)
+       + Connected somehow to a USB-to-serial adapter
+       + Adapter supported by Linux[2]
+       + Where several adapters are needed, presenting them through
+         one physical USB port for all nodes (ie a built-in hub) is
+         preferred
+   or
+    (iii) Some other suitable arrangement to be discussed.
+
+ * Provisioning: Requires serial concentrator port(s) and/or spare USB
+   port(s) on appropriate infrastructure host(s).
+
+
+PHYSICAL PRESENTATION
+=====================
+
+ * All equipment should be mounted inside one or more 19" rack
+   mount cases.
+
+ * In as few U as possible: usually 1U (or, exceptionally, maybe 2U)
+   for a single server-type host.  
+
+ * Forbidden: External power adapters (laptop-style mains power supply
+   bricks); external USB hubs; any equipment not physically
+   restrained.  There is no shelf in the rack.
+
+ * Pair principle: Every host or node must be part of a set of several
+   identical hosts.  This allows us to distinguish hardware faults
+   from software bugs.  (In the case of a chassis with a backplane,
+   one backplane is OK.)  Conversely, we also want diversity, to find
+   as many host-specific bugs as possible, so usually around two of
+   each type is best.
+
+ * Provisioning: Enough rack space must be available.
+
+
+MASS STORAGE
+============
+
+Each host needs some locally attached mass storage of its own.
+
+Requirements for typical server hardware
+----------------------------------------
+
+ * SATA controller supported by Linux[2]
+
+ * If the SATA controller has multiple modes (eg, AHCI vs RAID)
+   it is sufficient for it to be supported in one mode.
+
+ * Storage redundancy is not required: one disk will do.
+
+ * SSD is not required: rotating rust is cheaper and will do.
+
+Requirements for embedded or devboard hardware
+----------------------------------------------
+
+ * Some mass storage supported by Linux[2].  Best is an onboard SATA
+   controller, connected to a SATA HDD in the same enclosure.
+   High-endurance flash drives are another possibility.
+
+ * If the hardware always starts by booting from a mass storage
+   device, that boot device must be physically read-only and separate
+   from the primary mass storage.  See BOOT ARRANGEMENTS.
+
+
+REMOTE FIRMWARE ACCESS VIA SERIAL
+=================================
+
+Configuration of the primary system firmware must be possible
+remotely, using only the power and serial accesses just described:
+specifically, by interacting with the firmware via the serial port.
+
+Requirements for typical server hardware with UEFI or BIOS
+----------------------------------------------------------
+
+ * `BIOS' configuration (including the UEFI equivalent) accessible and
+   useable via BIOS `serial console redirection'.
+
+ * UEFI shell (if provided) also available via serial.
+
+ * Specifically, boot order configuration available via serial.
+
+
+Requirements for embedded or devboard hardware
+----------------------------------------------
+
+ * See BOOT ARRANGEMENTS.
+
+
+BOOT ARRANGEMENTS, NETBOOT
+==========================
+
+Every host must netboot as its first boot source.  The netboot
+configuration must be able to `chain' to the local writeable mass
+storage.  This ensures that a host can be completely wiped, even if
+bad software has corrupted the mass storage.
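+
+One possible arrangement (sketched here only for illustration; the
+labels and filenames are not prescriptive) is a per-host pxelinux
+configuration whose default entry chains to the local disk, and which
+the netboot server rewrites when a host is to be installed or booted
+differently:
+
+    # pxelinux.cfg/<host> -- default is to boot from local storage
+    default local
+    label local
+        localboot 0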
+
+Requirements for typical server hardware with UEFI or BIOS
+----------------------------------------------------------
+
+ * PXE and/or UEFI netboot.
+
+Requirements for embedded or devboard hardware
+----------------------------------------------
+
+ * Some firmware must be available and provided which is capable of
+   netbooting Xen[1] and Linux[2], under control from the netboot
+   server.  A suitable version of u-boot can meet this need (see the
+   sketch below).
+
+ * The firmware which performs the netbooting must be on a read-only
+   storage device (flagged as such in hardware, not software) so that
+   it cannot be corrupted by system software.  So it must be on a
+   separate physical storage device from the primary mass storage
+   (see MASS STORAGE, above).
+
+ * This firmware will not usually be updated.
+
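+As a rough sketch, the kind of firmware interaction intended is
+something like the following u-boot sequence (the variable names,
+filenames and boot command are illustrative and depend on the board
+and on how the images are packaged):
+
+    # fetch a kernel and device tree from the netboot server, then boot
+    dhcp
+    tftpboot ${kernel_addr_r} Image
+    tftpboot ${fdt_addr_r} board.dtb
+    booti ${kernel_addr_r} - ${fdt_addr_r}
+
+Booting Xen itself is similar but also involves loading the dom0
+kernel and describing it to Xen via the device tree.
+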
+
+NETWORKING
+==========
+
+Requirements
+------------
+
+ * Each host must have at least one RJ45 ethernet port compatible
+   with ordinary 100Mbit ethernet.   xxx
+
+ * The primary ethernet port must be compatible with Linux[2].
+
+ * In the case of a chassis with backplane, it is acceptable if the
+   chassis contains an ethernet switch, provided that it is a normal
+   and reliable ethernet switch (not a proprietary interconnect).
+
+ * In the case of a system with IPMI or similar LOM, it is best if the
+   LOM has its own physical ethernet port.
+
+
+CPU, CHIPSET, MOTHERBOARD, ETC.
+===============================
+
+General advice and preferences
+------------------------------
+
+ * We prefer multicore, multisocket and NUMA systems because they
+   expose a greater variety of exciting bugs.  But we don't care much
+   about performance and we want a wide variety of different hosts.
+   We want a mixture of systems with different CPU variants and
+   feature support.
+
+ * Memory requirements are modest.  8G or 16G per host is fine. xxx
+
+Compatibility with Xen and Linux - requirements
+-----------------------------------------------
+
+(Normally these issues are not a problem for x86, except perhaps for
+the network and storage controllers - see MASS STORAGE and NETWORKING,
+above.)
+
+ * [1] Xen: The CPU and other hardware must be supported by current
+   versions of xen-unstable, at the very least.
+
+ * [2] Linux: The CPU and other hardware must be supported by existing
+   widely available versions of Linux.  There are two principal
+   requirements:
+
+   + Baremetal boot from Debian stable or stable-backports:
+
+     A suitable Linux kernel binary which can boot baremetal on the
+     proposed hardware must be available from Debian (at least
+     `stable', or, if that is not possible `stable-backports').  It is
+     not OK to require a patched version of Linux, or a version of
+     Linux built from a particular git branch, or some such.  If the
+     required kernel is not available in Debian, the vendor should
+     first work with the Debian project to ensure and validate that
+     the Debian stable-backports kernel binaries boot on the proposed
+     hardware.
+
+   + Boot under Xen with Linux kernel built from source code.
+
+     For x86, recent Linux LTS or mainline kernel source code must be
+     able to boot under Xen, on the proposed hardware.
+
+     For ARM, there is a special Xen ARM kernel branch. The proposed
+     hardware must be able to boot that version of Linux under Xen.
+
+     If the Xen ARM Linux branch does not support the proposed
+     hardware yet, the hardware should not be accepted until that is
+     remedied.  Where this involves adding kernel patches to that
+     branch, this is subject to the approval of its maintainers,
+     considering the need to keep it very close to upstream.
+
+ * Board-specific Linux and Xen versions are not acceptable.
+
+ * Hardware vendor offering a "board support package" is a red flag.
+   We will not be using a "board support package".  If we are offered
+   one we will need explicit confirmation, and perhaps verification,
+   of the points above.
+
+ * For ARM systems using Device Tree: check what DT is expected to be
+   used, and where and how we are expecting osstest to get it from.
+
+
+RELIABILITY
+===========
+
+ * osstest stresses systems in unusual ways.  The need to completely
+   wipe the machine for each test means test hosts are power cycled
+   more often than usual.
+
+ * Random failures due to unreliable hardware are not tolerable.  Some
+   hosts do not boot reliably.  Even a very small probability of a
+   random boot failure, per boot, is intolerable in this CI
+   environment: hosts are rebooted many times a day, and a random boot
+   failure looks just like a `hypervisor could not boot' bug.  (The
+   same bug would not be noticeable in a server farm where hosts are
+   nearly never rebooted.)
+
+
+NON-REQUIREMENTS
+================
+
+ * No VGA console needed.
+ * Redundant PSUs are not needed (see POWER, above).
+ * RAID is not needed (or wanted) (see MASS STORAGE, above).
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@xxxxxxxxxxxxxxxxxxxx
https://lists.xenproject.org/mailman/listinfo/xen-devel

 

