Discussion on unexpected behavior of ARINC653 scheduler
We are observing incorrect or unexpected behavior of the ARINC653 scheduler
when we set up multiple ARINC653 CPU pools and assign a different number of
domains to each CPU pool.
Here's the test configuration to reproduce the issue.
[Test environment]
Yocto release : 5.05
Xen release : 4.19 (hash = 026c9fa29716b0ff0f8b7c687908e71ba29cf239)
Target machine : QEMU ARM64
Number of physical CPUs : 4
[CPU pool configuration files]
cpupool_arinc0.cfg
- name="Pool-arinc0"
- sched="arinc653"
- cpus=["2"]
cpupool_arinc1.cfg
- name="Pool-arinc1"
- sched="arinc653"
- cpus=["3"]
[Domain configuration files]
The following settings are common to all domX.cfg files.
- kernel = "/usr/lib/xen/images/zImage"
- ramdisk = "/usr/lib/xen/images/rootfs.cpio.gz"
- extra = "root=/dev/loop0 rw nohlt"
- memory = 512
dom1.cfg
- vcpus = 1
- pool = "Pool-arinc0"
dom2.cfg / dom3.cfg / dom4.cfg
- vcpus = 1
- pool = "Pool-arinc1"
[CPU pool and domain creation]
$ xl cpupool-cpu-remove Pool-0 2,3
$ xl cpupool-create -f cpupool_arinc0.cfg
$ xl cpupool-create -f cpupool_arinc1.cfg
$ xl create dom1.cfg
$ xl create dom2.cfg
$ xl create dom3.cfg
$ xl create dom4.cfg
[ARINC653 scheduler setup]
$ a653_sched -P Pool-arinc0 dom1:100
$ a653_sched -P Pool-arinc1 dom2:100 dom3:100 dom4:100
It seems there is a corner case in the use of the global variables "sched_index"
and "next_switch_time" when multiple ARINC653 CPU pools are running on different
physical CPUs.
The variables sched_index and next_switch_time are defined as static in
xen/common/sched/arinc653.c, as shown below.
static void cf_check
a653sched_do_schedule(
    const struct scheduler *ops,
    struct sched_unit *prev,
    s_time_t now,
    bool tasklet_work_scheduled)
{
    struct sched_unit *new_task = NULL;
    static unsigned int sched_index = 0;    <==
    static s_time_t next_switch_time;       <==
First, a race condition on the global variables sched_index and
next_switch_time is observed.
They can be accessed concurrently from different physical CPUs, but they are
not correctly protected, since each CPU pool uses its own scheduler-private
spinlock. In effect, each pool's lock is local to that pool and does not
serialize accesses coming from the other pool.
The spinlock is held here (line# 522),
spin_lock_irqsave(&sched_priv->lock, flags);
and released here (line# 577),
spin_unlock_irqrestore(&sched_priv->lock, flags);
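To make the pattern concrete, below is a minimal user-space sketch (not Xen
code; all names are illustrative): two "pCPUs" in different pools each take
their own private lock, yet both update the same static counter, so the
increments race exactly as if no lock were held at all, just like the shared
sched_index/next_switch_time above.

/*
 * Minimal user-space sketch (NOT Xen code) of per-pool locks around a
 * shared static.  Build with: gcc -pthread race.c
 */
#include <pthread.h>
#include <stdio.h>

/* Shared by every pool instance (function-static in arinc653.c; file
 * scope here only so main() can print the final value). */
static unsigned int sched_index;

struct pool_priv {
    pthread_mutex_t lock;             /* stands in for sched_priv->lock */
};

static void do_schedule(struct pool_priv *priv)
{
    pthread_mutex_lock(&priv->lock);  /* serializes only this pool...     */
    sched_index++;                    /* ...not accesses from other pools */
    pthread_mutex_unlock(&priv->lock);
}

static void *pcpu(void *arg)
{
    for (int i = 0; i < 1000000; i++)
        do_schedule(arg);
    return NULL;
}

int main(void)
{
    struct pool_priv pool0 = { PTHREAD_MUTEX_INITIALIZER };
    struct pool_priv pool1 = { PTHREAD_MUTEX_INITIALIZER };
    pthread_t t0, t1;

    pthread_create(&t0, NULL, pcpu, &pool0);
    pthread_create(&t1, NULL, pcpu, &pool1);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);

    /* Usually prints less than 2000000 because updates are lost. */
    printf("sched_index = %u\n", sched_index);
    return 0;
}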
Second, even with proper locking, a single global sched_index and a single
global next_switch_time cannot serve ARINC653 CPU pools that have different
numbers of domains and therefore different runtimes and major frames.
The domains in an ARINC653 cpupool are organized in that pool's schedule array,
and the next domain to run is selected using the global sched_index.
Since sched_index is reset to 0 whenever the major frame of any one ARINC653
CPU pool completes, the domains that belong to the pool with the longer
schedule, i.e. those reached only at higher values of sched_index, may never
be scheduled.
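To illustrate with the test configuration above (Pool-arinc0 with one 100ms
slot for dom1, Pool-arinc1 with three 100ms slots for dom2/dom3/dom4), here is
a simplified, deterministic user-space sketch of two pools sharing one
sched_index; it is not the real scheduling logic, only the shared-index effect:

/*
 * Simplified sketch (NOT the real scheduling logic) of two pools that
 * share one sched_index, with both pools' 100ms ticks firing together.
 */
#include <stdio.h>

static unsigned int sched_index;   /* shared across pools, as in arinc653.c */

static void do_schedule(const char *pool, const char *sched[],
                        unsigned int entries)
{
    if (sched_index >= entries)    /* this pool's major frame is over: */
        sched_index = 0;           /* reset the SHARED index to 0      */
    printf("%s runs %s (index %u)\n", pool, sched[sched_index], sched_index);
    sched_index++;
}

int main(void)
{
    const char *pool0_sched[] = { "dom1" };
    const char *pool1_sched[] = { "dom2", "dom3", "dom4" };

    for (int tick = 0; tick < 5; tick++) {
        do_schedule("Pool-arinc0", pool0_sched, 1);
        do_schedule("Pool-arinc1", pool1_sched, 3);
    }
    return 0;
}

In this interleaving Pool-arinc1 runs dom3 on every tick while dom2 and dom4
never get a slot; which domains starve depends on how the two pools' ticks
interleave, but the dom2 -> dom3 -> dom4 cycle is never completed.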
We think this can be corrected by making sched_index and next_switch_time
per-cpupool state, and we would be happy to provide a patch implementing this
fix if this is the correct approach.
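To be concrete about the proposed direction, here is the same sketch with the
index moved into per-pool private state (again user space, not an actual patch
to arinc653.c; field names are illustrative):

/*
 * Same sketch with the cursor moved into per-pool private state.
 * Each pool now wraps only its own major frame.
 */
#include <stdio.h>

struct pool_priv {
    const char **schedule;       /* this pool's schedule array          */
    unsigned int num_entries;    /* entries in this pool's major frame  */
    unsigned int sched_index;    /* was a function-static; now per pool */
    /* next_switch_time would move into this structure in the same way */
};

static void do_schedule(const char *name, struct pool_priv *priv)
{
    if (priv->sched_index >= priv->num_entries)
        priv->sched_index = 0;   /* wraps only this pool's schedule */
    printf("%s runs %s\n", name, priv->schedule[priv->sched_index]);
    priv->sched_index++;
}

int main(void)
{
    const char *sched0[] = { "dom1" };
    const char *sched1[] = { "dom2", "dom3", "dom4" };
    struct pool_priv pool0 = { sched0, 1, 0 };
    struct pool_priv pool1 = { sched1, 3, 0 };

    for (int tick = 0; tick < 5; tick++) {
        do_schedule("Pool-arinc0", &pool0);
        do_schedule("Pool-arinc1", &pool1);
    }
    return 0;
}

With per-pool state, Pool-arinc1 cycles dom2 -> dom3 -> dom4 as expected, and
keeping the per-pool fields under the existing sched_priv->lock would also
resolve the race described above.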
Can I get your advice on this subject?
Should you have any questions about the description, please let me know.
Kindly understand that my description might not be clear enough, as I'm not a
native English speaker.
Regards,
Anderson