
Discussion on unexpected behavior of ARINC653 scheduler


  • To: "xen-devel@xxxxxxxxxxxxxxxxxxxx" <xen-devel@xxxxxxxxxxxxxxxxxxxx>
  • From: "Choi, Anderson" <Anderson.Choi@xxxxxxxxxx>
  • Date: Thu, 13 Mar 2025 06:51:49 +0000
  • Accept-language: en-US
  • Cc: "nathan.studer@xxxxxxxxxxxxxxx" <nathan.studer@xxxxxxxxxxxxxxx>, "stewart@xxxxxxx" <stewart@xxxxxxx>, "Weber (US), Matthew L" <matthew.l.weber3@xxxxxxxxxx>, "Whitehead (US), Joshua C" <joshua.c.whitehead@xxxxxxxxxx>
  • Delivery-date: Thu, 13 Mar 2025 06:52:20 +0000
  • List-id: Xen developer discussion <xen-devel.lists.xenproject.org>
  • Thread-index: AduT5GPU8k+LryI1RQGgVtY2Yt6hPw==
  • Thread-topic: Discussion on unexpected behavior of ARINC653 scheduler

We are observing incorrect or unexpected behavior with the ARINC653 scheduler 
when we set up multiple ARINC653 CPU pools and assign a different number of 
domains to each CPU pool.

Here's the test configuration to reproduce the issue.

[Test environment]
Yocto release : 5.05
Xen release : 4.19 (hash = 026c9fa29716b0ff0f8b7c687908e71ba29cf239)
Target machine : QEMU ARM64
Number of physical CPUs : 4

[CPU pool configuration files]
cpupool_arinc0.cfg
- name="Pool-arinc0"
- sched="arinc653"
- cpus=["2"]

cpupool_arinc1.cfg
- name="Pool-arinc1"
- sched="arinc653"
- cpus=["3"]

[Domain configuration files]
The following settings are common to all domX.cfg files.
- kernel = "/usr/lib/xen/images/zImage"
- ramdisk = "/usr/lib/xen/images/rootfs.cpio.gz"
- extra = "root=/dev/loop0 rw nohlt"
- memory = 512

dom1.cfg
- vcpus = 1
- pool = "Pool-arinc0"

dom2.cfg / dom3.cfg / dom4.cfg 
- vcpus = 1
- pool = "Pool-arinc1"

$ xl cpupool-cpu-remove Pool-0 2,3
$ xl cpupool-create -f cpupool_arinc0.cfg
$ xl cpupool-create -f cpupool_arinc1.cfg
$ xl create dom1.cfg
$ xl create dom2.cfg
$ xl create dom3.cfg
$ xl create dom4.cfg

[ARINC653 scheduler setup]
$ a653_sched -P Pool-arinc0 dom1:100
$ a653_sched -P Pool-arinc1 dom2:100 dom3:100 dom4:100

There appears to be a corner case in the use of the global variables 
"sched_index" and "next_switch_time" when multiple ARINC653 cpupools are 
running on different physical CPUs.

The variables sched_index and next_switch_time are declared static inside 
a653sched_do_schedule() in xen/common/sched/arinc653.c, as shown below.

static void cf_check
a653sched_do_schedule(
    const struct scheduler *ops,
    struct sched_unit *prev,
    s_time_t now,
    bool tasklet_work_scheduled)
{
    struct sched_unit *new_task = NULL;
    static unsigned int sched_index = 0;    <==
    static s_time_t next_switch_time;       <==

First, a race condition on the global variables sched_index and 
next_switch_time is observed.
They can be accessed concurrently from different physical CPUs, but they are 
not correctly protected, because each CPU pool takes only its own 
scheduler-private spinlock. With respect to these shared statics that is 
effectively no protection at all: each pool serializes only against itself.

The spinlock is acquired here (line 522):
spin_lock_irqsave(&sched_priv->lock, flags);

and released here (line 577):
spin_unlock_irqrestore(&sched_priv->lock, flags);
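
The structure is roughly as follows (a simplified sketch to illustrate the 
problem, not the actual arinc653.c code; the per-pool private type 
a653sched_priv_t is the existing one, everything else is illustrative):

/*
 * Simplified sketch: each pool has its own private data and its own
 * lock, but the two statics live at file scope and are shared by
 * every ARINC653 pool in the system.
 */
static unsigned int sched_index;    /* shared by ALL ARINC653 pools */
static s_time_t next_switch_time;   /* shared by ALL ARINC653 pools */

static void do_schedule_sketch(a653sched_priv_t *sched_priv, s_time_t now)
{
    unsigned long flags;

    spin_lock_irqsave(&sched_priv->lock, flags); /* per-pool lock only */

    /*
     * Pool-arinc0 on pCPU2 and Pool-arinc1 on pCPU3 can both be here at
     * the same time, each holding its own sched_priv->lock, so the
     * accesses to sched_index and next_switch_time below race with each
     * other across pools.
     */
    if ( now >= next_switch_time )
        sched_index++;

    spin_unlock_irqrestore(&sched_priv->lock, flags);
}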

Second, even if they were properly protected, a single global sched_index and 
next_switch_time cannot serve ARINC653 CPU pools that have different numbers 
of domains and therefore different runtimes and major-frame lengths.
The domains in an ARINC653 cpupool are organized in the schedule array, and 
the next domain to run is selected by indexing that array with the global 
sched_index.
Because sched_index is reset to 0 whenever the major frame of any one ARINC653 
CPU pool completes, the domains that belong to a pool with a longer schedule, 
i.e. the ones only reached at higher values of sched_index, may never be 
scheduled.
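
To illustrate with the configuration above (a hypothetical interleaving; it 
assumes the value 100 passed to a653_sched is a 100 ms minor frame and that 
both pools start their major frames at about the same time):

t =   0 ms : Pool-arinc0 starts its major frame -> sched_index = 0 (dom1)
             Pool-arinc1 starts its major frame -> sched_index = 0 (dom2)
t = 100 ms : Pool-arinc0's 100 ms major frame completes and resets
             sched_index to 0, just as Pool-arinc1 expects to advance to
             index 1 (dom3)
t = 200 ms : the same happens again for index 2 (dom4), so dom3 and dom4
             may never be selected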

We think this can be corrected by using a per-cpupool sched_index and 
next_switch_time, and we would be happy to provide a patch implementing this 
fix if that is the right approach.
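
A rough sketch of the direction we have in mind is shown below. It is not a 
tested patch; it assumes the existing a653sched_priv_t private structure and 
the SCHED_PRIV() accessor in arinc653.c, and only the two new fields are 
spelled out:

typedef struct a653sched_priv_s
{
    spinlock_t lock;
    /* ... existing fields unchanged ... */
    unsigned int sched_index;      /* was: static inside do_schedule() */
    s_time_t next_switch_time;     /* was: static inside do_schedule() */
} a653sched_priv_t;

static void cf_check
a653sched_do_schedule(
    const struct scheduler *ops,
    struct sched_unit *prev,
    s_time_t now,
    bool tasklet_work_scheduled)
{
    a653sched_priv_t *sched_priv = SCHED_PRIV(ops);
    unsigned long flags;

    spin_lock_irqsave(&sched_priv->lock, flags);

    /*
     * All uses of sched_priv->sched_index and sched_priv->next_switch_time
     * are now serialized by the pool's own lock, and one pool's
     * major-frame rollover can no longer reset the schedule position of
     * another pool.
     */

    /* ... rest of the existing scheduling logic, with the two former
     *     statics replaced by the sched_priv fields ... */

    spin_unlock_irqrestore(&sched_priv->lock, flags);
}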

Can I get your advice on this subject?

Should you have any questions about the description, please let me know.

Please bear with me if the description is not clear enough; I am not a native 
English speaker.

Regards,
Anderson



 

