
Discussion on unexpected behavior of ARINC653 scheduler


  • To: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: "Choi, Anderson" <Anderson.Choi@xxxxxxxxxx>
  • Date: Thu, 13 Mar 2025 06:51:49 +0000
  • Accept-language: en-US
  • Cc: "nathan.studer@xxxxxxxxxxxxxxx" <nathan.studer@xxxxxxxxxxxxxxx>, "stewart@xxxxxxx" <stewart@xxxxxxx>, "Weber (US), Matthew L" <matthew.l.weber3@xxxxxxxxxx>, "Whitehead (US), Joshua C" <joshua.c.whitehead@xxxxxxxxxx>
  • Delivery-date: Thu, 13 Mar 2025 06:52:20 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AduT5GPU8k+LryI1RQGgVtY2Yt6hPw==
  • Thread-topic: Discussion on unexpected behavior of ARINC653 scheduler

We are observing incorrect or unexpected behavior with the ARINC653 scheduler 
when we set up multiple ARINC653 CPU pools and assign a different number of 
domains to each CPU pool.

Here's the test configuration to reproduce the issue.

[Test environment]
Yocto release : 5.05
Xen release : 4.19 (hash = 026c9fa29716b0ff0f8b7c687908e71ba29cf239)
Target machine : QEMU ARM64
Number of physical CPUs : 4

[CPU pool configuration files]
cpupool_arinc0.cfg
- name="Pool-arinc0"
- sched="arinc653"
- cpus=["2"]

cpupool_arinc1.cfg
- name="Pool-arinc1"
- sched="arinc653"
- cpus=["3"]

[Domain configuration files]
The following settings are common to all domX.cfg files.
- kernel = "/usr/lib/xen/images/zImage"
- ramdisk = "/usr/lib/xen/images/rootfs.cpio.gz"
- extra = "root=/dev/loop0 rw nohlt"
- memory = 512

dom1.cfg
- vcpus = 1
- pool = "Pool-arinc0"

dom2.cfg / dom3.cfg / dom4.cfg 
- vcpus = 1
- pool = "Pool-arinc1"

$ xl cpupool-cpu-remove Pool-0 2,3
$ xl cpupool-create -f cpupool_arinc0.cfg
$ xl cpupool-create -f cpupool_arinc1.cfg
$ xl create dom1.cfg
$ xl create dom2.cfg
$ xl create dom3.cfg
$ xl create dom4.cfg

[ARINC653 scheduler setup]
$ a653_sched -P Pool-arinc0 dom1:100
$ a653_sched -P Pool-arinc1 dom2:100 dom3:100 dom4:100

There appears to be a corner case in the use of the global variables 
"sched_index" and "next_switch_time" when multiple ARINC653 cpupools are 
running on different physical CPUs.

The variables sched_index and next_switch_time are declared static inside 
a653sched_do_schedule() in xen/common/sched/arinc653.c, as shown below.

static void cf_check
a653sched_do_schedule(
    const struct scheduler *ops,
    struct sched_unit *prev,
    s_time_t now,
    bool tasklet_work_scheduled)
{
    struct sched_unit *new_task = NULL;
    static unsigned int sched_index = 0;    <==
    static s_time_t next_switch_time;       <==

First, a race condition on the global variables sched_index and 
next_switch_time is observed.
They can be accessed concurrently from different physical CPUs, but they are 
not correctly protected, because each CPU pool takes only its own 
scheduler-private spinlock. With respect to these shared statics that is 
effectively no protection at all: each pool serializes only against itself.

The spinlock is acquired here (line 522):
spin_lock_irqsave(&sched_priv->lock, flags);

and released here (line 577):
spin_unlock_irqrestore(&sched_priv->lock, flags);
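
The structure is roughly as follows (a simplified sketch to illustrate the 
problem, not the actual arinc653.c code; the per-pool private type 
a653sched_priv_t is the existing one, everything else is illustrative):

/*
 * Simplified sketch: each pool has its own private data and its own
 * lock, but the two statics live at file scope and are shared by
 * every ARINC653 pool in the system.
 */
static unsigned int sched_index;    /* shared by ALL ARINC653 pools */
static s_time_t next_switch_time;   /* shared by ALL ARINC653 pools */

static void do_schedule_sketch(a653sched_priv_t *sched_priv, s_time_t now)
{
    unsigned long flags;

    spin_lock_irqsave(&sched_priv->lock, flags); /* per-pool lock only */

    /*
     * Pool-arinc0 on pCPU2 and Pool-arinc1 on pCPU3 can both be here at
     * the same time, each holding its own sched_priv->lock, so the
     * accesses to sched_index and next_switch_time below race with each
     * other across pools.
     */
    if ( now >= next_switch_time )
        sched_index++;

    spin_unlock_irqrestore(&sched_priv->lock, flags);
}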

Second, even if they were properly protected, a single global sched_index and 
next_switch_time cannot serve ARINC653 CPU pools that have different numbers 
of domains and therefore different runtimes and major-frame lengths.
The domains in an ARINC653 cpupool are organized in the schedule array, and 
the next domain to run is selected by indexing that array with the global 
sched_index.
Because sched_index is reset to 0 whenever the major frame of any one ARINC653 
CPU pool completes, the domains that belong to a pool with a longer schedule, 
i.e. the ones only reached at higher values of sched_index, may never be 
scheduled.
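
To illustrate with the configuration above (a hypothetical interleaving; it 
assumes the value 100 passed to a653_sched is a 100 ms minor frame and that 
both pools start their major frames at about the same time):

t =   0 ms : Pool-arinc0 starts its major frame -> sched_index = 0 (dom1)
             Pool-arinc1 starts its major frame -> sched_index = 0 (dom2)
t = 100 ms : Pool-arinc0's 100 ms major frame completes and resets
             sched_index to 0, just as Pool-arinc1 expects to advance to
             index 1 (dom3)
t = 200 ms : the same happens again for index 2 (dom4), so dom3 and dom4
             may never be selected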

We think this can be corrected by using a per-cpupool sched_index and 
next_switch_time, and we would be happy to provide a patch implementing this 
fix if that is the right approach.
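
A rough sketch of the direction we have in mind is shown below. It is not a 
tested patch; it assumes the existing a653sched_priv_t private structure and 
the SCHED_PRIV() accessor in arinc653.c, and only the two new fields are 
spelled out:

typedef struct a653sched_priv_s
{
    spinlock_t lock;
    /* ... existing fields unchanged ... */
    unsigned int sched_index;      /* was: static inside do_schedule() */
    s_time_t next_switch_time;     /* was: static inside do_schedule() */
} a653sched_priv_t;

static void cf_check
a653sched_do_schedule(
    const struct scheduler *ops,
    struct sched_unit *prev,
    s_time_t now,
    bool tasklet_work_scheduled)
{
    a653sched_priv_t *sched_priv = SCHED_PRIV(ops);
    unsigned long flags;

    spin_lock_irqsave(&sched_priv->lock, flags);

    /*
     * All uses of sched_priv->sched_index and sched_priv->next_switch_time
     * are now serialized by the pool's own lock, and one pool's
     * major-frame rollover can no longer reset the schedule position of
     * another pool.
     */

    /* ... rest of the existing scheduling logic, with the two former
     *     statics replaced by the sched_priv fields ... */

    spin_unlock_irqrestore(&sched_priv->lock, flags);
}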

Can I get your advice on this subject?

Should you have any questions about the description, please let me know.

Please bear with me if the description is not clear enough; I am not a native 
English speaker.

Regards,
Anderson



 

