Discussion on unexpected behavior of ARINC653 scheduler
We are observing incorrect or unexpected behavior with the ARINC653 scheduler when we set up multiple ARINC653 CPU pools and assign a different number of domains to each CPU pool. Here is the test configuration to reproduce the issue.

[Test environment]
Yocto release : 5.05
Xen release : 4.19 (hash = 026c9fa29716b0ff0f8b7c687908e71ba29cf239)
Target machine : QEMU ARM64
Number of physical CPUs : 4

[CPU pool configuration files]
cpupool_arinc0.cfg
- name = "Pool-arinc0"
- sched = "arinc653"
- cpus = ["2"]

cpupool_arinc1.cfg
- name = "Pool-arinc1"
- sched = "arinc653"
- cpus = ["3"]

[Domain configuration files]
The following common domain configuration is applied to each domx.cfg file:
- kernel = "/usr/lib/xen/images/zImage"
- ramdisk = "/usr/lib/xen/images/rootfs.cpio.gz"
- extra = "root=/dev/loop0 rw nohlt"
- memory = 512

dom1.cfg
- vcpus = 1
- pool = "Pool-arinc0"

dom2.cfg / dom3.cfg / dom4.cfg
- vcpus = 1
- pool = "Pool-arinc1"

[Commands]
$ xl cpupool-cpu-remove Pool-0 2,3
$ xl cpupool-create -f cpupool_arinc0.cfg
$ xl cpupool-create -f cpupool_arinc1.cfg
$ xl create dom1.cfg
$ xl create dom2.cfg
$ xl create dom3.cfg
$ xl create dom4.cfg

[ARINC653 scheduler setup]
$ a653_sched -P Pool-arinc0 dom1:100
$ a653_sched -P Pool-arinc1 dom2:100 dom3:100 dom4:100

There seems to be a corner case in the use of the global variables "sched_index" and "next_switch_time" when multiple ARINC653 CPU pools are running on different physical CPUs. Both variables are defined as static in xen/common/sched/arinc653.c, as shown below:

    static void cf_check a653sched_do_schedule(
        const struct scheduler *ops,
        struct sched_unit *prev,
        s_time_t now,
        bool tasklet_work_scheduled)
    {
        struct sched_unit *new_task = NULL;
        static unsigned int sched_index = 0;    <==
        static s_time_t next_switch_time;       <==

First, a race condition on the global variables sched_index and next_switch_time is observed. They can be accessed concurrently from different physical CPUs, but they are not correctly protected, because each CPU pool uses its own scheduler-private spinlock. Technically, this is equivalent to using a local spinlock.

The spinlock is taken here (line# 522):
    spin_lock_irqsave(&sched_priv->lock, flags);
and released here (line# 577):
    spin_unlock_irqrestore(&sched_priv->lock, flags);

Second, even if they were properly protected, a global sched_index and a global next_switch_time cannot work for ARINC653 CPU pools that have an asymmetric number of domains and different run times and major frames. The domains of an ARINC653 CPU pool are organized in the schedule array, and the next domain to run is selected via the global variable sched_index. Since sched_index is reset to 0 whenever the major frame of any one ARINC653 CPU pool completes, the domains that belong to the CPU pool with the longer schedule and are only reached at higher values of sched_index can never be scheduled.

We think this can be corrected by making sched_index and next_switch_time per-cpupool, and we would be happy to provide a patch implementing this fix if this is the correct approach. A rough sketch of the change we have in mind is appended below.

Can I get your advice on this subject? Should you have any questions about the description, please let me know. Kindly understand that my description might not be clear enough, as I am not a native English speaker.

Regards,
Anderson
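[Appendix: untested sketch of the proposed direction, for illustration only. Apart from sched_index and next_switch_time themselves, the field and helper names below are approximations of the current xen/common/sched/arinc653.c and may not match the file exactly. The point is only that the two function-local statics become per-cpupool state, so they are naturally serialized by the existing sched_priv->lock and one pool's major-frame rollover no longer resets another pool's position.]

    /*
     * Rough, untested sketch -- direction of the fix, not a final patch.
     */

    /* 1) Move the scheduling cursor into the per-scheduler private data,
     *    so each ARINC653 cpupool tracks its own position in its own
     *    major frame, protected by its own sched_priv->lock. */
    typedef struct a653sched_priv_s
    {
        spinlock_t lock;            /* existing per-scheduler lock       */
        /* ... existing fields (schedule[], num_schedule_entries,
         *     major_frame, next_major_frame, wait_list, ...) ...        */
        unsigned int sched_index;   /* NEW: index into schedule[]        */
        s_time_t next_switch_time;  /* NEW: end of current minor frame   */
    } a653sched_priv_t;

    /* 2) Drop the function-local statics and use the per-pool fields. */
    static void cf_check a653sched_do_schedule(
        const struct scheduler *ops,
        struct sched_unit *prev,
        s_time_t now,
        bool tasklet_work_scheduled)
    {
        struct sched_unit *new_task = NULL;
        a653sched_priv_t *sched_priv = SCHED_PRIV(ops);
        unsigned long flags;

        spin_lock_irqsave(&sched_priv->lock, flags);

        if ( now >= sched_priv->next_major_frame )
        {
            /* This cpupool's major frame expired: restart its own
             * schedule without touching any other cpupool's cursor. */
            sched_priv->sched_index = 0;
            sched_priv->next_major_frame = now + sched_priv->major_frame;
            sched_priv->next_switch_time =
                now + sched_priv->schedule[0].runtime;
        }
        else if ( now >= sched_priv->next_switch_time )
        {
            /* Advance to the next minor frame of this cpupool only. */
            sched_priv->sched_index++;
            /* ... existing end-of-schedule / remaining-time handling,
             *     now operating on sched_priv->sched_index and
             *     sched_priv->next_switch_time ... */
        }

        /* ... remainder of a653sched_do_schedule() unchanged apart from
         *     replacing the statics with the sched_priv-> fields ...    */

        spin_unlock_irqrestore(&sched_priv->lock, flags);
    }

    /* 3) Initialise the new fields (e.g. sched_priv->sched_index = 0)
     *    in a653sched_init() and whenever a new schedule is installed. */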