Xen project Mailing List

Re: [PATCH 2/2] automation: add a smoke test for xen.efi on X86

To: Stefano Stabellini <sstabellini@xxxxxxxxxx>

From: Stefano Stabellini <sstabellini@xxxxxxxxxx>

Date: Wed, 2 Oct 2024 15:22:59 -0700 (PDT)

Cc: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>, xen-devel@xxxxxxxxxxxxxxxxxxxx, Doug Goldstein <cardoe@xxxxxxxxxx>

Delivery-date: Wed, 02 Oct 2024 22:23:10 +0000

List-id: Xen developer discussion <xen-devel.lists.xenproject.org>

I forgot to reply to one important part below

On Wed, 2 Oct 2024, Stefano Stabellini wrote:
> On Wed, 2 Oct 2024, Marek Marczykowski-Górecki wrote:
> > Check if xen.efi is bootable with an XTF dom0.
> >
> > The TEST_TIMEOUT is set in the script to override project-global value.
> > Setting it in the gitlab yaml file doesn't work, as it's too low priority
> > (https://docs.gitlab.com/ee/ci/variables/#cicd-variable-precedence).
> >
> > The multiboot2+EFI path is tested on hardware tests already.
> >
> > Signed-off-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> > ---
> > This requires rebuilding debian:bookworm container.
> >
> > The TEST_TIMEOUT issue mentioned above applies to xilix-* jobs too. It's
> > not clear to me why the default TEST_TIMEOUT is set at the group level
> > instead of in the yaml file, so I'm not adjusting the other places.
>
> Let me premise that now that we use "expect" all successful tests will
> terminate as soon as the success condition is met, without waiting for
> the test timeout to expire.
>
> There is a CI/CD variable called TEST_TIMEOUT set at the
> gitlab.com/xen-project level. (There is also a check in console.exp in
> case TEST_TIMEOUT is not set so that we don't run into problems in case
> the CI/CD variable is removed accidentally.) The global TEST_TIMEOUT is
> meant to be a high value to account for slow QEMU tests running
> potentially on our slowest cloud runners.
>
> However, for hardware-based tests such as the xilinx-* jobs, we know
> that the timeout is supposed to be less than that. The test is running
> on real hardware which is considerably faster than QEMU running on our
> slowest runners. Basically, the timeout depends on the runner more than
> the test. So we override the TEST_TIMEOUT variable for the xilinx-* jobs
> providing a lower timeout value.
>
> The global TEST_TIMEOUT is set to 1500.
> The xilinx-* timeout is set to 120 for ARM and 1000 for x86.
>
> You are welcome to override the TEST_TIMEOUT value for the
> hardware-based QubesOS tests. At the same time, given that on success
> the timeout is not really used, it is also OK to leave it like this.
>
> > ---
> >  automation/build/debian/bookworm.dockerfile |  1 +
> >  automation/gitlab-ci/test.yaml              |  7 ++++
> >  automation/scripts/qemu-smoke-x86-64-efi.sh | 44 +++++++++++++++++++++
> >  3 files changed, 52 insertions(+)
> >  create mode 100755 automation/scripts/qemu-smoke-x86-64-efi.sh
> >
> > diff --git a/automation/build/debian/bookworm.dockerfile b/automation/build/debian/bookworm.dockerfile
> > index 3dd70cb6b2e3..061114ba522d 100644
> > --- a/automation/build/debian/bookworm.dockerfile
> > +++ b/automation/build/debian/bookworm.dockerfile
> > @@ -46,6 +46,7 @@ RUN apt-get update && \
> >          # for test phase, qemu-smoke-* jobs
> >          qemu-system-x86 \
> >          expect \
> > +        ovmf \
> >          # for test phase, qemu-alpine-* jobs
> >          cpio \
> >          busybox-static \
> > diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
> > index 8675016b6a37..74fd3f3109ae 100644
> > --- a/automation/gitlab-ci/test.yaml
> > +++ b/automation/gitlab-ci/test.yaml
> > @@ -463,6 +463,13 @@ qemu-smoke-x86-64-clang-pvh:
> >    needs:
> >      - debian-bookworm-clang-debug
> >
> > +qemu-smoke-x86-64-gcc-efi:
> > +  extends: .qemu-x86-64
> > +  script:
> > +    - ./automation/scripts/qemu-smoke-x86-64-efi.sh pv 2>&1 | tee ${LOGFILE}
> > +  needs:
> > +    - debian-bookworm-gcc-debug
>
> Given that the script you wrote (thank you!) can also handle pvh, can we
> directly add a pvh job to test.yaml too?
> >
> >  qemu-smoke-riscv64-gcc:
> >    extends: .qemu-riscv64
> >    script:
> > diff --git a/automation/scripts/qemu-smoke-x86-64-efi.sh b/automation/scripts/qemu-smoke-x86-64-efi.sh
> > new file mode 100755
> > index 000000000000..e053cfa995ba
> > --- /dev/null
> > +++ b/automation/scripts/qemu-smoke-x86-64-efi.sh
> > @@ -0,0 +1,44 @@
> > +#!/bin/bash
> > +
> > +set -ex -o pipefail
> > +
> > +# variant should be either pv or pvh
> > +variant=$1
> > +
> > +# Clone and build XTF
> > +git clone https://xenbits.xen.org/git-http/xtf.git
> > +cd xtf && make -j$(nproc) && cd -
> > +
> > +case $variant in
> > +    pvh) k=test-hvm64-example extra="dom0-iommu=none dom0=pvh" ;;
> > +    *) k=test-pv64-example extra= ;;
> > +esac
> > +
> > +mkdir -p boot-esp/EFI/BOOT
> > +cp binaries/xen.efi boot-esp/EFI/BOOT/BOOTX64.EFI
> > +cp xtf/tests/example/$k boot-esp/EFI/BOOT/kernel
> > +
> > +cat > boot-esp/EFI/BOOT/BOOTX64.cfg <<EOF
> > +[global]
> > +default=test
> > +
> > +[test]
> > +options=loglvl=all console=com1 noreboot console_timestamps=boot $extra
> > +kernel=kernel
> > +EOF
> > +
> > +cp /usr/share/OVMF/OVMF_CODE.fd OVMF_CODE.fd
> > +cp /usr/share/OVMF/OVMF_VARS.fd OVMF_VARS.fd
> > +
> > +rm -f smoke.serial
> > +export TEST_CMD="qemu-system-x86_64 -nographic -M q35,kernel-irqchip=split \
> > +        -drive if=pflash,format=raw,readonly=on,file=OVMF_CODE.fd \
> > +        -drive if=pflash,format=raw,file=OVMF_VARS.fd \
> > +        -drive file=fat:rw:boot-esp,media=disk,index=0,format=raw \
> > +        -m 512 -monitor none -serial stdio"
> > +
> > +export TEST_LOG="smoke.serial"
> > +export PASSED="Test result: SUCCESS"
> > +export TEST_TIMEOUT=120

Although this works, I would prefer keeping the TEST_TIMEOUT overrides in
test.yaml for consistency. However, it might be better not to override
it (or to override to a higher timeout value), as successful tests will
terminate immediately anyway.

We need to be cautious about setting TEST_TIMEOUT values too low, as using a slow runner (like a small, busy cloud instance) can lead to false positive failures. This issue occurred frequently with ARM tests when we temporarily moved from a fast ARM server to slower ARM cloud instances a couple of months ago. On the other hand, adjusting TEST_TIMEOUT for non-QEMU hardware-based tests is acceptable since those tests rely on real hardware availability, which is unlikely to become suddenly slower.
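
For illustration only, here is a minimal sketch of a test script that avoids hard-coding a low timeout: it keeps whatever TEST_TIMEOUT the CI environment already provides and falls back to a default only when the variable is unset or empty. The helper name `effective_timeout` is hypothetical and not part of the patch or the automation scripts, and 1500 is used as the fallback only because it is the global value quoted in this thread.

```shell
#!/bin/bash

# Hypothetical helper (not part of the patch): print the timeout to use.
# Prints $TEST_TIMEOUT when it is set and non-empty, otherwise the
# fallback value passed as the first argument.
effective_timeout() {
    local fallback="$1"
    echo "${TEST_TIMEOUT:-$fallback}"
}

# Keep a timeout coming from the CI environment; fall back to 1500
# (an assumed default, taken from the global value mentioned in the thread).
TEST_TIMEOUT="$(effective_timeout 1500)"
export TEST_TIMEOUT
echo "TEST_TIMEOUT=${TEST_TIMEOUT}"
```

With this shape, a slow cloud runner keeps the generous group-level value, while a job that does export a lower TEST_TIMEOUT still gets it.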
