
Re: [PATCH 2/2] automation: add a smoke test for xen.efi on X86



I forgot to reply to one important part below.


On Wed, 2 Oct 2024, Stefano Stabellini wrote:
> On Wed, 2 Oct 2024, Marek Marczykowski-Górecki wrote:
> > Check if xen.efi is bootable with an XTF dom0.
> > 
> > The TEST_TIMEOUT is set in the script to override project-global value.
> > Setting it in the gitlab yaml file doesn't work, as it's too low
> > priority
> > (https://docs.gitlab.com/ee/ci/variables/#cicd-variable-precedence).
> > 
> > The multiboot2+EFI path is tested on hardware tests already.
> > 
> > Signed-off-by: Marek Marczykowski-Górecki <marmarek@xxxxxxxxxxxxxxxxxxxxxx>
> > ---
> > This requires rebuilding debian:bookworm container.
> > 
> > The TEST_TIMEOUT issue mentioned above applies to xilinx-* jobs too. It's
> > not clear to me why the default TEST_TIMEOUT is set at the group level
> > instead of in the yaml file, so I'm not adjusting the other places.
> 
> Let me start by noting that, now that we use "expect", all successful
> tests terminate as soon as the success condition is met, without
> waiting for the test timeout to expire.
> 
> There is a CI/CD variable called TEST_TIMEOUT set at the
> gitlab.com/xen-project level. (There is also a check in console.exp in
> case TEST_TIMEOUT is not set so that we don't run into problems in case
> the CI/CD variable is removed accidentally.) The global TEST_TIMEOUT is
> meant to be a high value to account for slow QEMU tests running
> potentially on our slowest cloud runners.
> 
> However, for hardware-based tests such as the xilinx-* jobs, we know
> that the timeout is supposed to be less than that. The test is running
> on real hardware which is considerably faster than QEMU running on our
> slowest runners. Basically, the timeout depends on the runner more than
> the test. So we override the TEST_TIMEOUT variable for the xilinx-* jobs
> providing a lower timeout value.
> 
> The global TEST_TIMEOUT is set to 1500.
> The xilinx-* timeout is set to 120 for ARM and 1000 for x86.
> 
> You are welcome to override the TEST_TIMEOUT value for the
> hardware-based QubesOS tests. At the same time, given that on success
> the timeout is not really used, it is also OK to leave it like this.
 
 
> > ---
> >  automation/build/debian/bookworm.dockerfile |  1 +
> >  automation/gitlab-ci/test.yaml              |  7 ++++
> >  automation/scripts/qemu-smoke-x86-64-efi.sh | 44 +++++++++++++++++++++
> >  3 files changed, 52 insertions(+)
> >  create mode 100755 automation/scripts/qemu-smoke-x86-64-efi.sh
> > 
> > diff --git a/automation/build/debian/bookworm.dockerfile b/automation/build/debian/bookworm.dockerfile
> > index 3dd70cb6b2e3..061114ba522d 100644
> > --- a/automation/build/debian/bookworm.dockerfile
> > +++ b/automation/build/debian/bookworm.dockerfile
> > @@ -46,6 +46,7 @@ RUN apt-get update && \
> >          # for test phase, qemu-smoke-* jobs
> >          qemu-system-x86 \
> >          expect \
> > +        ovmf \
> >          # for test phase, qemu-alpine-* jobs
> >          cpio \
> >          busybox-static \
> > diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
> > index 8675016b6a37..74fd3f3109ae 100644
> > --- a/automation/gitlab-ci/test.yaml
> > +++ b/automation/gitlab-ci/test.yaml
> > @@ -463,6 +463,13 @@ qemu-smoke-x86-64-clang-pvh:
> >    needs:
> >      - debian-bookworm-clang-debug
> >  
> > +qemu-smoke-x86-64-gcc-efi:
> > +  extends: .qemu-x86-64
> > +  script:
> > +    - ./automation/scripts/qemu-smoke-x86-64-efi.sh pv 2>&1 | tee ${LOGFILE}
> > +  needs:
> > +    - debian-bookworm-gcc-debug
> 
> Given that the script you wrote (thank you!) can also handle pvh, can we
> directly add a pvh job to test.yaml too?
> 
> 
> >  qemu-smoke-riscv64-gcc:
> >    extends: .qemu-riscv64
> >    script:
> > diff --git a/automation/scripts/qemu-smoke-x86-64-efi.sh b/automation/scripts/qemu-smoke-x86-64-efi.sh
> > new file mode 100755
> > index 000000000000..e053cfa995ba
> > --- /dev/null
> > +++ b/automation/scripts/qemu-smoke-x86-64-efi.sh
> > @@ -0,0 +1,44 @@
> > +#!/bin/bash
> > +
> > +set -ex -o pipefail
> > +
> > +# variant should be either pv or pvh
> > +variant=$1
> > +
> > +# Clone and build XTF
> > +git clone https://xenbits.xen.org/git-http/xtf.git
> > +cd xtf && make -j$(nproc) && cd -
> > +
> > +case $variant in
> > +    pvh) k=test-hvm64-example    extra="dom0-iommu=none dom0=pvh" ;;
> > +    *)   k=test-pv64-example     extra= ;;
> > +esac
> > +
> > +mkdir -p boot-esp/EFI/BOOT
> > +cp binaries/xen.efi boot-esp/EFI/BOOT/BOOTX64.EFI
> > +cp xtf/tests/example/$k boot-esp/EFI/BOOT/kernel
> > +
> > +cat > boot-esp/EFI/BOOT/BOOTX64.cfg <<EOF
> > +[global]
> > +default=test
> > +
> > +[test]
> > +options=loglvl=all console=com1 noreboot console_timestamps=boot $extra
> > +kernel=kernel
> > +EOF
> > +
> > +cp /usr/share/OVMF/OVMF_CODE.fd OVMF_CODE.fd
> > +cp /usr/share/OVMF/OVMF_VARS.fd OVMF_VARS.fd
> > +
> > +rm -f smoke.serial
> > +export TEST_CMD="qemu-system-x86_64 -nographic -M q35,kernel-irqchip=split \
> > +        -drive if=pflash,format=raw,readonly=on,file=OVMF_CODE.fd \
> > +        -drive if=pflash,format=raw,file=OVMF_VARS.fd \
> > +        -drive file=fat:rw:boot-esp,media=disk,index=0,format=raw \
> > +        -m 512 -monitor none -serial stdio"
> > +
> > +export TEST_LOG="smoke.serial"
> > +export PASSED="Test result: SUCCESS"
> > +export TEST_TIMEOUT=120

Although this works, I would prefer to keep the TEST_TIMEOUT overrides
in test.yaml for consistency. However, it might be better not to
override it at all (or to override it with a higher value), as
successful tests terminate as soon as the success condition is met
anyway. We need to be careful about setting TEST_TIMEOUT too low,
because a slow runner (like a small, busy cloud instance) can then cause
spurious test failures. This happened frequently with the ARM tests when
we temporarily moved from a fast ARM server to slower ARM cloud
instances a couple of months ago.

On the other hand, adjusting TEST_TIMEOUT for non-QEMU hardware-based
tests is acceptable, since those tests run on real hardware, which is
unlikely to suddenly become slower.
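
To illustrate one possible way of keeping a per-job value in test.yaml
despite the variable precedence issue mentioned in the commit message,
here is a minimal sketch. It assumes a hypothetical, separately named
variable, TEST_TIMEOUT_OVERRIDE, which does not exist today; the 1500
fallback is simply the current global value. This is only an
illustration, not a tested change to the automation scripts.

```shell
#!/bin/bash
# Hypothetical helper: pick the effective timeout in seconds.
# TEST_TIMEOUT_OVERRIDE is an assumed per-job variable name (it would be
# set in test.yaml); it is not an existing variable in the automation.
pick_timeout() {
    # Order: per-job override, then the group-level TEST_TIMEOUT,
    # then a fallback equal to the current global value (1500).
    echo "${TEST_TIMEOUT_OVERRIDE:-${TEST_TIMEOUT:-1500}}"
}

# Export the result so that the code consuming TEST_TIMEOUT sees it.
export TEST_TIMEOUT="$(pick_timeout)"
echo "effective TEST_TIMEOUT=${TEST_TIMEOUT}"
```

Because the override has a different name from the group-level
variable, the two would not compete for precedence, and a job that sets
nothing keeps the high global default.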

 

