Schedulability Analysis of Non-preemptive Real-time Scheduling for Multicore Processors with Shared Caches

Xiao, J.; Altmeyer, S.; Pimentel, A.

DOI
10.1109/RTSS.2017.00026

Publication date
2017

Document Version
Final published version

Published in
2017 IEEE Real-Time Systems Symposium

License
Article 25fa Dutch Copyright Act (https://www.openaccess.nl/en/in-the-netherlands/you-share-we-take-care)

Link to publication

Citation for published version (APA):

General rights
It is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), other than for strictly personal, individual use, unless the work is under an open content license (like Creative Commons).

Disclaimer/Complaints regulations
If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library: https://uba.uva.nl/en/contact, or a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible.

UvA-DARE is a service provided by the library of the University of Amsterdam (https://dare.uva.nl)
Schedulability Analysis of Non-preemptive Real-time Scheduling for Multicore Processors with Shared Caches

Jun Xiao*, Sebastian Altmeyer† and Andy Pimentel‡
Informatics Institute, University of Amsterdam, Amsterdam, Netherlands
Email: *J.Xiao@uva.nl, †S.J.Altmeyer@uva.nl, ‡A.D.Pimentel@uva.nl

Abstract—Shared caches in multicore processors introduce serious difficulties in providing guarantees on the real-time properties of embedded software due to the interaction and the resulting contention in the shared caches. To address this problem, we develop a new schedulability analysis for real-time multicore systems with shared caches. To the best of our knowledge, this is the first work that addresses the schedulability problem with inter-core cache interference. We construct an integer programming formulation, which can be transformed to an integer linear programming formulation, to calculate an upper bound on cache interference exhibited by a task within a given execution window. Using the integer programming formulation, an iterative algorithm is presented to obtain the upper bound on cache interference a task may exhibit during one job execution. The upper bound on cache interference is subsequently integrated into the schedulability analysis to derive a new schedulability condition. A range of experiments is performed to investigate how the schedulability is degraded by shared cache interference.

I. INTRODUCTION

Multicore architectures are increasingly used in both the desktop and the embedded markets. Modern multicore processors incorporate shared resources between cores to improve performance and efficiency. Shared caches are among the most critical shared resources on multicore systems as they can efficiently bridge the performance gap between memory and processor speeds by backing up small private caches. However, this brings major difficulties in providing guarantees on real-time properties of embedded software due to the interaction and the resulting contention in a shared cache.

In a multicore processor with shared caches, a real-time task may suffer from two different kinds of cache interferences [13], which severely degrade the timing predictability of multicore systems. The first is called intra-core cache interference, which occurs within a core, when a task is preempted and its data is evicted from the cache by other real-time tasks. The second is inter-core cache interference, which happens when tasks executing on different cores access the shared cache simultaneously. Inter-core cache interference may cause several types of cache misses including capacity misses, conflict misses and so on [4].

It is challenging to design real-time applications executing on multicore platforms with shared caches, which cannot afford to miss deadlines and hence demand timing predictability. Any schedulability analysis requires knowledge about the Worst-Case Execution Time (WCET) of real-time tasks. With a multicore system, the WCETs are strongly dependent on the amount of inter-core interference on shared hardware resources such as main memory, shared caches and interconnects. In this paper, we shall only focus on the shared cache interferences and study the schedulability analysis problem for hard real-time tasks that exhibit shared cache interferences.

A major obstacle is to predict the cache behavior to accurately obtain the WCET of a real-time task considering inter-core cache interference since different cache behaviors (cache hit or miss) will result in different execution times of each instruction. In [19], it was even pointed out that "it will be extremely difficult, if not impossible, to develop analysis methods that can accurately capture the contention among multiple cores in a shared cache".

In this paper, task’s WCET does not account for shared cache interference. [12] presents such an approach to derive a task’s WCET without considering shared cache interference. We propose a novel schedulability analysis of non-preemptive real-time scheduling for multicore systems with shared caches. Intra-core cache interference is avoided since no preemption is possible during task execution. We therefore focus on inter-core cache interference and specially conflict misses that caused by limited cache associativity. The main contributions of this work are:

- An integer programming formulation is constructed to calculate the upper bound on the cache interference exhibited by a task within a given execution window;
- An iterative algorithm is presented to obtain the upper bound on inter-core cache interference a task may exhibit during its job executions;
- A new schedulability condition is derived by integrating the upper bound on inter-core cache interference into the schedulability analysis;
- A set of experiments are performed using the proposed schedulability analysis to investigate the effects of inter-core cache interference for a range of different tasksets.

The rest of the paper is organized as follows. Section II gives an overview of the related work. The system model is described in Section III. Section IV describes the proposed schedulability analysis, where we also detail the computation of processor-
contention and inter-core cache interferences applied in the analysis. Section V presents an iterative computation to obtain the upper bound of inter-core cache interferences. Section VI presents the experimental results, after which Section VII concludes the paper.

II. RELATED WORK

For hard real-time systems, it is essential to obtain each real-time task’s WCET, which provides the basis for the schedulability analysis. WCET analysis has been actively investigated in the last two decades, of which an excellent overview can be found in [21]. There are well-developed techniques to estimate real-time tasks’ WCET for single processor systems. Unfortunately, the existing techniques for single processor platforms are not applicable to multicore systems with shared caches. Only a few methods have been developed to estimate task WCETs for multicore systems with shared caches [23], [11], [15]. In almost all those works, due to the assumption that cache interferences can occur at any program point, WCET analysis will be extremely pessimistic, especially when the system contains many cores and tasks. An overestimated WCET is not useful as it degrades system schedulability.

Since shared caches introduce the difficulty into accurately estimating the WCET, many researchers in the real-time systems community have recognized and studied the problem of cache interference in order to use shared caches in a predictable manner. Cache partitioning, which isolates application workloads that interfere with each other by assigning separate shared cache partitions to individual tasks, is a successful and widely-used approach to address contention for shared caches in (real-time) multicore applications. There are two cache partitioning methods: software-based and hardware-based techniques [8]. The most common software-based cache partitioning technique is page coloring [16], [20]. By exploiting the virtual to physical page address translations present in virtual memory systems at OS-level, page addresses are mapped to pre-defined cache regions to avoid the overlap of cache spaces. Hardware-based cache partitioning is achieved using a cache locking mechanism [6], [19], [17], which prevents cache lines from being evicted during program execution. The main drawback of cache locking is that it requires specific hardware support that is not available in many commercial processors. With shared cache partitioning techniques, one can apply existing analyses to derive the upper bounds of a task’s WCET assuming that no cache interference can occur between tasks simultaneously running on different cores. In that case, it is safe to use the derived WCETs in the schedulability analysis.

Although the schedulability analysis of global multiprocessor scheduling has been intensively studied [14], [5], [3], few works addressing schedulability analysis for multicorecs with shared caches are [9], [22], where cache space isolation is deployed. In addition to regular temporal constraints, cache constraints due to cache space isolation are added in the schedulability analysis. They propose a linear programming formulation to perform the schedulability test and an over approximation of this formulation to improve the scalability of the test. However, this solution is not applicable to our problem since our system architecture does not deploy any cache isolation techniques.

Our work also differs from other approaches to the timing verification of multicore systems [1] in that all other sources of interferences are assumed to be included within the WCET. We analyze the effect of shared cache interference on the schedulability. To the best of our knowledge, this is the first work that integrates inter-core cache interferences into schedulability analysis.

III. SYSTEM MODEL

A. Task Model

We consider a set $\tau$ of $n$ periodic or sporadic real-time tasks $\tau_1, \tau_2, \ldots, \tau_n$ to be scheduled on a multicore processor. Each task $\tau_k = (C_k, D_k, T_k) \in \tau$ is characterized by a worst-case computation time $C_k$, a period or minimum inter-arrival time $T_k$, and a relative deadline $D_k$. All tasks are considered to be deadline constrained, i.e. the task relative deadline is less or equal to task period: $D_k \leq T_k$.

We further assume that all those tasks are independent, i.e. they have no shared variables, no precedence constraints, and so on. Moreover, jobs of any task cannot be executed at the same time on more than one core. A task $\tau_k$ is a sequence of jobs $J_k^j$, where $j$ is the job index. We denote the arrival time, starting time, finishing time and absolute deadline of a job $j$ as $r_k^j$, $s_k^j$, $f_k^j$ and $d_k^j$, respectively. Note that the goal of a real-time scheduling algorithm is to guarantee that each job will complete before its absolute deadline: $f_k^j \leq d_k^j = r_k^j + D_k$.

As explained, it is difficult to accurately estimate $C_k$ considering cache interference of other tasks executing concurrently. It should be pointed out that $C_k$ in this paper refers to the WCET of task $k$, assuming task $k$ is the only task executing on the multicore processor platform, i.e. any cache interference delays are not included in $C_k$.

Since time measurement cannot be more precise than one tick of the system clock, all timing parameters and variables in this paper are assumed to be non-negative integer values.

B. Architecture Model

Our system architecture consists of a multicore processor with $m$ identical cores onto which the individual tasks are scheduled. Most multicore processors have different independent caches, including instruction and data caches. Caches are organized as a hierarchy of multiple cache levels to address the tradeoff between cache latency and hit rate. The low level caches ($L1$) in our considered multicore processor are assumed to be private, while the last level caches ($LLC$, for example $L2$) are shared between all cores. Furthermore, we assume that the LLC cache is noninclusive with respect to the private caches ($L1$), and that LLC caches are direct-mapped caches.

We believe this work can be extended to set-associative LRU caches, and we plan to do so as future work.
In this work, we only consider instruction caches since we adopt the approach in [12], which only accounts for instruction cache, to derive WCET.

C. NP-FP Scheduler

In this paper, we focus on non-preemptive global scheduling, thus we do not have to consider intra-core cache interference. If not explicitly stated, cache interference will therefore refer to inter-core cache interference in the following discussion. For simplicity reasons, we will take the Non-Preemptive Fixed Priority (NP-FP) global scheduling as the example in this paper. We will extend our work to other non-preemptive schedulers such as a global Non-Preemptive Earliest Deadline First (NP-EDF) scheduler.

To use NP-FP scheduling, a priority $P_k$ is assigned to each task $\tau_k$ ($k = 1, 2, ..., n$). As each task has a unique priority, we use $hp(k)$ to denote the set of tasks with higher priorities than $\tau_k$, and $lhp(k) = hp(k) \cup \{\tau_k\}$ the set of tasks whose priorities are not lower than $\tau_k$. Similarly, $lp(k)$ is the set of tasks with lower priorities than $\tau_k$ and $lep(k) = lp(k) \cup \{\tau_k\}$ the set of tasks whose priorities are not higher than $\tau_k$.

The NP-FP scheduling algorithm is work-conserving, according to the following definition.

**Definition 1.** A scheduling algorithm is work-conserving if there are no idle cores when a ready task is waiting for execution.

IV. Schedulability Analysis

In this section, we give an overview of the new schedulability analysis that accounts for cache interference. We also present the approaches to derive the upper bound on the parameters used in the schedulability condition.

A. Overview

We first analyze the execution of one job $J_k^j$ of a task $\tau_k$. The time interval $[r_k^j, d_k^j]$ is called a problem window [3].

![Figure 1: Overview of the new schedulability analysis that accounts for cache interference.](image)

As shown in Figure 1, a task $\tau_k$ exhibits two kinds of interferences during the execution of a job. The first interference is $I_k^{pre}(r_k^j, d_k^j)$, denoting the cumulative length of all intervals over the problem window in which $\tau_k$ is ready to execute but cannot proceed due to unavailability of cores as they are executing other jobs. We define the interference $I_k^{pre}(r_k^j, d_k^j)$ of a task $\tau_j$ on a task $\tau_k$ over the interval $[r_k^j, d_k^j]$ as the cumulative length of all intervals in which $\tau_k$ is ready to execute, and $\tau_j$ is executing while $\tau_k$ is not. The second type of interference is the cumulative length of all extra execution delays caused by shared cache interference due to conflict accesses from all other tasks running concurrently on other cores, denoted as $I_k^{sc}(r_k^j, d_k^j)$. We also define the interference $I_{k,j}^{sc}(r_k^j, d_k^j)$ as the cumulative length of all extra execution delays of $\tau_k$ over the problem window caused by conflict shared cache accesses between task $\tau_j$ and task $\tau_k$.

It is clear that for a job to meet its deadline, the sum of the two interferences a task exhibits in the problem window plus the task’s WCET $C_k$ must be less than the length of the problem window, which is $D_k$. For a task to be schedulable, this condition must hold for all its jobs.

We define the worst-case interference for task $\tau_k$ as:

$$ I_k = \max(I_k^{pre}(r_k^j, d_k^j) + I_k^{sc}(r_k^j, d_k^j)) $$

$$ = I_k^{pre}(r_k^j, d_k^j) + I_k^{sc}(r_k^j, d_k^j) $$

where $p*$ is the job instance in which the sum of the two interferences is maximal.

By construction, we have the first schedulability test for $\tau$.

**Theorem 1.** A task set $\tau$ is schedulable with a NP-FP scheduling policy on a multicore processor composed of $m$ identical cores with shared caches if and only if for each task $\tau_k \in \tau$

$$ I_k + C_k < D_k $$

The necessary and sufficient schedulability condition expressed by Theorem 1 cannot be used to check if a task set is schedulable without knowing how to compute the interference terms $I_k$. Unfortunately, we are not aware of any method to compute $I_k$ starting from the given task parameters. To sidestep this problem, we will use an upper bound on each of the interferences.

We define the worst-case processor-contention interference $I_k^{pre}(r_k^j, d_k^j)$ and worst-case shared cache interference $I_k^{sc}(r_k^j, d_k^j)$ for task $\tau_k$ as:

$$ I_k^{pre} = \max(I_k^{pre}(r_k^j, d_k^j)) = I_k^{pre}(r_k^{q*}, d_k^{q*}) $$

where $q*$ is the job instance in which the processor-contention interference is maximal and

$$ I_k^{sc} = \max(I_k^{sc}(r_k^j, d_k^j)) = I_k^{sc}(r_k^{s*}, d_k^{s*}) $$

where $s*$ is the job instance in which the cache interference is maximal, respectively.

**Theorem 2.** A task set $\tau$ is schedulable with a NP-FP scheduling policy on a multicore processor composed of $m$ identical cores with shared caches if for each task $\tau_k \in \tau$

$$ I_k^{pre} + C_k + I_k^{sc} < D_k $$
Proof.

\[ \bar{I}_k = \max_j (P^\text{pre}_k (r^j_k, d^j_k) + I^\text{ec}_k (r^j_k, d^j_k)) \]
\[ \leq \max_j (P^\text{pre}_k (r^j_k, d^j_k)) + \max_j (I^\text{ec}_k (r^j_k, d^j_k)) \]
\[ = \bar{P}^\text{pre}_k + \bar{I}^\text{ec}_k \]

if \( \bar{P}^\text{pre}_k + C_k + \bar{I}^\text{ec}_k < D_k \), then \( \bar{I}_k + C_k < D_k \). The Theorem follows from Theorem 1.

B. Computing an upper bound of \( \bar{I}^\text{pre}_k \)

The workload \( W_i(D_k) \) of a task \( \tau_i \) in the problem window \([r^j_k, d^j_k]\) of length \( D_k \) is the time \( \tau_i \) executes during interval \([r^j_k, d^j_k]\), according to a given scheduling policy.

Lemma 1. The processor-contention interference that a task \( \tau_i \) causes on a task \( \tau_k \) in the problem window of \( \tau_i \) is never greater than the workload of \( \tau_i \) in the problem window.

\[ \forall i, k, j \quad P^\text{pre}_k (r^j_k, d^j_k) \leq W_i(D_k) \]

Lemma 1 is obvious, since \( W_i(D_k) \) is an upper bound on the execution of \( \tau_i \) in the problem window.

Since it is difficult to compute the exact value of \( P^\text{pre}_k \), we will compute the upper bound of the worst-case workload by each task in the problem window, and use the sum of each task’s workload to derive an upper bound on \( P^\text{pre}_k \).

As shown in Figure 2, the upper bound of the worst-case workload can be calculated by categorizing each job of \( \tau_i \) in the problem window into one of the three types [2]:

- Carry-in job: a job with its release time earlier than \( r^j_k \), but with its deadline in the problem window;
- Body job: a job with both its release time and its deadline in the problem window;
- Carry-out job: a job with its release time in the problem window, but with its deadline later than \( d^j_k \).

![Figure 2: Three types of contribution jobs and problem window. Case (a) and (b) shows the densest possible packing of jobs of \( \tau_i \) if \( \tau_i \) has a carry-in job and has no carry-in jobs, respectively.](image)

The worst-case workload of \( \tau_i \) occurs when a carry-in job (if \( \tau_i \) has a carry-in job) finishes execution as late as possible and a carry-out job starts its execution as early as possible. We use \( W^\text{nc}_i(D_k) \) to denote an upper bound of \( \tau_i \)'s workload in the problem window if \( \tau_i \) has no carry-in job, and use \( W^\text{ci}_i(D_k) \) to denote an upper bound of \( \tau_i \)'s workload if \( \tau_i \) has a carry-in job. Following the approach in [10], we compute \( W^\text{nc}_i(D_k) \) and \( W^\text{ci}_i(D_k) \) as follows:

- Computing \( W^\text{nc}_i(D_k) \)
  For \( \tau_i \in \text{hp}(k) \), the worst-case workload of task \( \tau_i \) occurs when a job of \( \tau_i \) arrives at exactly the start of the problem window, as shown in case (b) in Figure 2. The next jobs of \( \tau_i \) are then released periodically every \( T_i \) time units. Therefore the number of body jobs of \( \tau_i \) that contribute with \( C_i \) to the workload in the problem window is at most \( \left\lfloor \frac{D_k}{T_i} \right\rfloor \). The contribution of the carry-out job can then be bounded by \( \min(D_k \mod T_i, C_i) \).

\[ \tau_i \)'s workload in the problem window is 0 for \( \tau_i \in \text{lep}(k) \).

We can compute \( W^\text{nc}_i(D_k) \) by:

\[ W^\text{nc}_i(D_k) = \begin{cases} \frac{D_k}{T_i} C_i + \omega^{\text{nc}} & \tau_i \in \text{hp}(k) \\ 0 & \tau_i \in \text{lep}(k) \end{cases} \]

where \( \omega^{\text{nc}} = \min(D_k \mod T_i, C_i) \).

- Computing \( W^\text{ci}_i(D_k) \)
  If \( \tau_i \in \text{hp}(k) \), the worst-case workload of \( \tau_i \) is generated when \( r^j_k \) coincides with the starting time of the carry-in job of \( \tau_i \); moving the problem window backwards, the contribution of carry-in cannot increase and the carry-out can only decrease; while advancing the problem window, the carry-in will decrease and the carry-out can increase by at most the same amount. Such a situation is depicted as case (a) in Figure 2. The contribution of the carry-in job is bounded by \( C_i \). Note that the first body job of \( \tau_i \) after the carry-in jobs, is released at time \( r^j_k + C_i + T_i - D_i \). The number of body jobs that contribute to \( \tau_i \)'s workload is \( N_i(D_k) = \left\lfloor \frac{\max(0, D_k - C_i - T_i + D_i)}{T_i} \right\rfloor \). The contribution of the carry-out job can then be bounded by \( \min(C_i, \max(0, D_k - C_i - T_i + D_i) \mod T_i) \).

For \( \tau_i \in \text{lep}(k) \), only the carry in job of \( \tau_i \) starting execution before \( r^j_k \) can contribute to the workload in the problem window.

Thus, we compute \( W^\text{ci}_i(D_k) \) by:

\[ W^\text{ci}_i(D_k) = \begin{cases} (1 + N_i(D_k)) C_i + \omega^{\text{ci}} & \tau_i \in \text{hp}(k) \\ \min(D_k, C_i) & \tau_i \in \text{lep}(k) \end{cases} \]

where

\[ \omega^{\text{ci}} = \min(C_i, \max(0, D_k - C_i - T_i + D_i) \mod T_i) \].

Lemma 2. If tasks are scheduled with a NP-FP scheduling policy on a multicore processor composed of m identical cores, at most \( m \) tasks have carry-in jobs.

Proof. See Lemma 5.2, in [10].

The task set \( \tau \) can be partitioned into two subsets \( \tau^{\text{nc}} \) and \( \tau^{\text{ci}} \) that include tasks with carry-in jobs and tasks without carry-in jobs in the problem window, respectively. According to Lemma 2, \( \tau^{\text{ci}} \) has at most \( m \) tasks. Now we define \( \Omega_k \) as the maximal value of the sum of all tasks' workloads (other
than $\tau_k$’ workload) in the problem window of $\tau_k$ among all possible cases:

$$\Omega_k = \max \sum_{i \neq k} W_i(D_k)$$

where $\tau^nc_i$ and $\tau^{ci}$ satisfy $\tau^nc_i \cup \tau^{ci} = \tau_k \setminus \{\tau_k\}$, $\tau^nc_i \cap \tau^{ci} = \emptyset$ and $|\tau^{ci}| \leq m$.

By taking the maximum over the task set, $\Omega_k$ describes an upper bound on the total worst-case workload (other than $\tau_k$’ workload) in the problem window of $\tau_k$. The complexity to compute $\Omega_k$ is $O(n)$, as explained in [3].

Replacing $C_i$. The computation of $W_i^nc(D_k)$, $W_i^{ci}(D_k)$, $\Omega_k$ depends on $C_i$. We denote the value $\Omega_k$ as $\Omega_k(C)$ when $C_i$ is used in Equation (2.1) and (2.2). Fixing parameters other than $C_i$ in Equation (2.1) and (2.2), $W_i^nc(D_k)$, $W_i^{ci}(D_k)$ and the resulting $\Omega_k$ are non-decreasing with respect to $C_i$. In the following discussion, we will show that the actual execution time of $\tau_i$ including cache interference could be larger than $C_i$. Since cache interference could also contribute to the task workload, we will use $C_i^*$ which is the sum of $C_i$ and the upper bound on cache interference exhibited by $\tau_i$ to replace $C_i$ in Equation (2.1) and (2.2) in order to get the correct upper bound on the worst-case workload. We denote $\Omega_k(C^*)$ as resulting value if $C_i^*$ is used in the computation.

We are now ready to compute an upper bound of $I^pre_k$.

**Lemma 3.** If tasks are scheduled with a NP-FP scheduling policy on a multicore processor composed of $m$ identical cores with shared cache,

$$I^pre_k \leq \frac{\Omega_k(C^*)}{m}$$

**Proof.** Since the scheduling algorithm NP-FP is work-conserving, in the time instants in which a job of $\tau_k$ is ready but not executing, each core must be occupied by a job of another task. As $I^pre_k(\tau^qa^*_k, d^qa^*_k) = 0$, we can exclude the contribution of $\tau_k$ to the interference. So

$$I^pre_k \leq \frac{\sum_{i \neq k} I^pre_k(\tau^qa^*_k, d^qa^*_k)}{m}.$$  

By Lemma 1, the interference that a task $\tau_i$ causes on a task $\tau_k$ in the problem window is bounded its workload,

$$I^pre_k(\tau^qa^*_k, d^qa^*_k) \leq W_i(D_k).$$

So, we have

$$I^pre_k \leq \frac{\sum_{i \neq k} I^pre_k(\tau^qa^*_k, d^qa^*_k)}{m} \leq \frac{\max \sum_{i \neq k} W_i(D_k)}{m} = \frac{\Omega_k(C^*)}{m}$$

**C. Computing an upper bound of $I^{bac}_k$**

We first identify the maximum cache interference between two tasks and then we construct an integer programming formulation to calculate the upper bound on the shared cache interference exhibited by a task within an execution window.

1) *Cache interference between two tasks*: We first analyze the cache interference during one job execution between $\tau_k$ and $\tau_i$. Let $\tau_k$ be the interfered and $\tau_i$ be the interfering task.

Following the approach in [12], we can obtain the WCET of a task by performing a Cache Access Classification (CAC) and Cache Hit/Miss Classification (CHMC) analysis for each instruction memory access at the private caches and the shared LLC cache separately. The CAC determines the possibility that an instruction being fetched from memory will access a certain cache level, and the access to a certain cache level can be *Always (A)*, *Uncertain (U)* or *Never (N)*. CHMC assigns a cache lookup result to each memory reference according to the cache states. As a result, a reference to a memory block of instructions can be classified as *Always Hit (AH)*, *Always Miss (AM)* or *Uncertain (U)*.

Since we consider noninclusive caches, accesses to the private caches cannot be affected by tasks executing on other cores. Accesses classified as *AM* or *U* at the shared LLC cache will also not be affected by shared cache interferences, since they are already counted as misses in the WCET analysis.

We start the cache interference analysis by defining two concepts for cache blocks.

**Definition 2.** A Hit Block (HB) is a memory block whose access is classified as AH at the shared LLC cache.

**Definition 3.** A Conflicting Block (CB) is a memory block whose access is classified as A or U at the shared LLC cache.

We use $HB_k = \{m_{k,1}, m_{k,2}, ..., m_{k,p}\}$ to represent the set of HB for task $\tau_k$ and use $m_{k,x}$ ($x = 1, 2, ..., p$) to denote the number of $m_{k,x}$’s accesses that are classified as an AH at the LLC cache. Similarly, we define $CB_i = \{m_{i,1}, m_{i,2}, ..., m_{i,q}\}$ as the set of CB for task $\tau_i$ and denote $m_{i,x}$ as the number of $m_{i,x}$’s accesses that are classified as an A or U at the LLC cache. Note that $HB_k$ and $CB_i$ includes the memory blocks that meet the requirement in every program path that may be taken by the task.

In our system architecture, cache interference occurs only at the shared LLC cache when a cache line used by $\tau_k$ is evicted by $\tau_i$ and consequently causing reload overhead for $\tau_k$. A cache line that may cause cache interference for $\tau_k$ needs to satisfy at least two conditions:

(i) access to that cache line will result in a cache hit at the LLC cache in WCET analysis of $\tau_k$.

(ii) the cache line may be used by $\tau_i$.

From the above two conditions, we can analyze memory block accessing that may cause interference. The first condition implies that only accessing to $HB_k$ may cause cache interference for $\tau_k$, while the second condition indicates that accessing to $CB_i$ by $\tau_i$ may interfere with $\tau_k$. Furthermore, cache interference occurs only if $\tau_k$ accesses memory blocks in $HB_k$ and $\tau_i$ accesses memory blocks in $CB_i$ concurrently, and those memory blocks have the same cache index.
We use $I_{i,k}^{sc}$ to represent the upper bound on the shared cache interference imposed on $\tau_i$ by only one job execution of $\tau_i$.

Suppose the indices of the LLC cache range from 0 to $N - 1$, we can derive $N$ subsets of $HB_k$ according to the mapping function $idx$ that maps a memory address to the cache line index at the LLC cache as follows,

$$\hat{n}_{k,u} = \{m_{k,x} \in HB_k | idx(m_{k,x}) = u\}, \ (0 \leq u < N, u \in \mathbb{N}).$$

We define the characteristic function of a set $A$ which indicates membership of an element $x$ in $A$ as:

$$\chi_A(x) = \begin{cases} 1 & x \in A \\ 0 & otherwise \end{cases}.$$  

Let $N_{k,u}$ represent the number of hit accesses to the $u$-th cache line by $\tau_i$ without cache interference. $N_{k,u}$ equals to the total number of access to the $HB$s mapping to the $k$-th cache line,

$$N_{k,u} = \sum_{x=1}^{p} n_{k,x} \chi_{\hat{n}_{k,u}}(m_{k,x}).$$

Similarly, we divide $CB_i$ into $N$ subsets by

$$\hat{e}_{i,u} = \{m_{i,x} \in CB_i | idx(m_{i,x}) = u\}, \ (0 \leq u < N, u \in \mathbb{N}).$$

The number of accesses to the $k$-th cache line by $\tau_i$ is bounded by

$$N_{i,u} = \sum_{x=1}^{q} n_{i,x} \chi_{\hat{e}_{i,u}}(m_{i,x}).$$

Cache interference can only happen among memory blocks that are in the same subset that maps to the same cache line. For the $u$-th cache line, $\tau_k$ can be interfered at most $N_{k,u}$ times and $\tau_i$ can interfere at most $N_{i,u}$ times. The following formula gives an upper bound on the number of cache miss by accessing the $HB$s for task $\tau_k$.

$$S(\tau_i, \tau_k) = \sum_{u=0}^{N-1} \min(N_{i,u}, N_{k,u}).$$

Suppose the penalty for an LLC cache miss is a constant, $C_{miss}$, then $I_{i,k}^{sc}$ satisfies:

$$I_{i,k}^{sc} = S(\tau_i, \tau_k) C_{miss}.$$  

The computation only takes the memory accesses of $\tau_k$ and $\tau_i$ as input, so $I_{i,k}^{sc}$ only depends on memory accesses of $\tau_k$ and $\tau_i$.

**Lemma 4.** $I_{i,k}^{sc} = S(\tau_i, \tau_k) C_{miss}$.

**Proof.** The lemma holds as discussed above.  

Given a taskset, $I_{i,k}^{sc}$ can be computed, as shown in the proof of Lemma 4. In the following discussion, we assume $I_{i,k}^{sc}$ is known.

**Lemma 4** gives an upper bound on cache interference for $\tau_k$ imposed by only one job of $\tau_i$. It is possible that more than one jobs of $\tau_i$ interfere with $\tau_k$. We denote the number of jobs of $\tau_i$ that interfere with $\tau_k$ as $N_{i,k}$.

**Lemma 5.** The total cache interference $\tau_k$ exhibited from $N_{i,k}$ jobs of $\tau_i$ is bounded by $N_{i,k} I_{i,k}^{sc}$.

**Proof.** For $N_{i,k}$ jobs of $\tau_i$, the total number of accesses to each memory block $m_{i,x}$ is bounded by $N_{i,k} n_{i,x}$. Thus, the execution of $N_{i,k}$ jobs of $\tau_i$ accesses the $k$-th cache line at most $N_{i,k} n_{i,x}$ times. From the proof of Lemma 4, the upper bound of the total cache interference exhibited by $\tau_k$ from $N_{i,k}$ jobs of $\tau_i$ is $\sum_{u=0}^{N-1} \min(N_{i,k} n_{i,u}, N_{k,u}) C_{miss}$.

$$N_{i,k} I_{i,k}^{sc} = N_{i,k} \sum_{u=0}^{N-1} \min(N_{i,u}, N_{k,u}) C_{miss} = \sum_{u=0}^{N-1} \min(N_{i,k} n_{i,u}, N_{k,u}) C_{miss} \geq \sum_{u=0}^{N-1} \min(N_{i,k} n_{i,u}, N_{k,u}) C_{miss} = \sum_{u=0}^{N-1} \min(N_{i,k} n_{i,u}, N_{k,u}) C_{miss} \geq \sum_{u=0}^{N-1} \min(N_{i,k} n_{i,u}, N_{k,u}) C_{miss}.$$  

2) IP formulation: We can compute an upper bound of the maximum cache interference a task may exhibit during an execution window by introducing an Integer Programming (IP) formulation, which can be transformed to an integer linear programming formulation.

It is necessary to check the schedulability of the task-set without considering cache interference. If the task-set does not pass the initial schedulability test, there is no need to calculate the cache interference. Only if all tasks (including $\tau_k$) pass the schedulability test (without considering cache interference), the IP is solved to compute the upper bound on cache interference. Therefore, the IP formulation is based on the assumption that $\tau_k$ is schedulable without cache interference.

If $N_{i,k}$ jobs of $\tau_i$ executing concurrently with $\tau_k$, the cache interference that $\tau_i$ causes on $\tau_k$ is bounded by $N_{i,k} I_{i,k}^{sc}$ according to Lemma 5. As a task may exhibit cache interference from more than one task during a job execution, the total cache interference for one job execution of $\tau_k$ is bounded by the sum of the contributions of all other tasks $\tau_j (j \neq k)$ in the task set $\tau$. Thus, the objective function of the IP formulation is:

$$\max \sum_{i \neq k} N_{i,k} I_{i,k}^{sc}. \quad (2.4)$$

The IP formulation will have an unbounded solution without further constraints to the variable $N_{i,k}$. To get a bounded solution, we analyze the constraints on $N_{i,k}$. First, we define the concept of the execution window of a job.

**Definition 4.** The Execution Window (EW) of the $j$-th job of $\tau_k$ ($J_k^j$) is time interval $[s_k^j, f_k^j]$ from the staring time to the finishing time of $J_k^j$.

Note that the length of an execution window may be larger than $C_k$, since the EW includes the cache interference. We use $C_k^j$ as the length of the EW because of the iterative computation which will be described later on.
\( N_{i,k} \) reaches its minimal value when a job of \( \tau_i \) starts to execute as soon as it is released and the execution finishes just before the start of the \( EW \), as shown the case (a) in Figure 3. Denoting \( C_{i}^{\text{min}} \) as the smallest execution time of \( \tau_i \), often called Best-Case Execution Time (BCET), we have the following constraint:

\[
\forall i \neq k, \quad \left[ \max(0, C'_{k} - T_i + C_{i}^{\text{min}}) \right] / T_i + \xi_i \leq N_{i,k} \tag{2.5}
\]

where \( \xi_i \) = \begin{cases} 1 & \text{if } ((C'_{k} + C_{i}^{\text{min}}) \mod T_i) - D_i + C_{i}^{\text{min}} > 0 \\ 0 & \text{otherwise} \end{cases}

The term \( \xi_i \) indicates whether the last job of \( \tau_i \) released within the \( EW \) that interferes with \( \tau_k \) since the last released job should start its execution \( C_{i}^{\text{min}} \) before its relative deadline if the task is schedulable.

\[ \tau_i : (a) \]

\[ \tau_i : (b) \]

\[ \tau_k \]

\[ \text{Execution window: } C_{i}^{\text{min}} \]

Figure 3: Situations where \( \tau_i \) interferes \( \tau_k \) with the most and least number of jobs.

The maximum value of \( N_{i,k} \) is taken when the first interfering job of \( \tau_i \) finishes just after the start of the \( EW \) and the last interfering job of \( \tau_i \) starts to execute at the time when it is released. Such a situation is depicted as case (b) in Figure 3. Thus, we have the second constraint on \( N_{i,k} \):

\[ \forall i \neq k, \quad N_{i,k} \leq 1 + \left[ \max(0, C'_{k} - T_i + D_i) \right] / T_i \tag{2.6} \]

If \( N_{i,k} > 2 \), the first and last interfering jobs of \( \tau_i \) may occupy almost 0 computation capacity in the \( EW \). Let \( J_i^2 \) be such a job among the remaining \( N_{i,k} - 2 \) interfering jobs of \( \tau_i \) between the first and the last ones. Both release time \( r_i^2 \) and deadline \( d_i^2 \) of \( J_i^2 \) are within the \( EW \) of \( \tau_k \).

**Lemma 6.** If \( \tau_i \) is schedulable without considering cache interference, \( C_i \) computation capacity of the processing core is reserved for the execution of \( J_i^2 \) during \([r_i^2, d_i^2]\). If \( J_i^2 \) executes for \( C_{i}^{\text{act}} < C_i \), the processing core will be accumulatively idle (executing nothing, simply wasting the processing capacity for \( \tau_i \)) for at least \( C_i - C_{i}^{\text{act}} \) during \([r_i^2, d_i^2]\).

**Proof.** If \( \tau_i \) satisfies the schedulability condition without considering cache interference (shown in Pseudocode 1): \( \frac{N_i(C_i)}{m} + C_i < D_i \), the core on which \( J_i^2 \) is executed spends at most \( D_i - C_i \) in total for the execution of other interfering tasks during \([r_i^2, d_i^2]\). \( J_i^2 \) is guaranteed to have \( C_i \) computation capacity during \([r_i^2, d_i^2]\).

The remaining computation capacity of a multicore processor with \( m \) cores is \( (m - 1)C_k' \) since one core is dedicated to the execution of \( \tau_k \). Due to the limited computation capacity of the processor, the total execution of the tasks that may interfere with \( \tau_k \) within the \( EW \) can not exceed \( (m - 1)C_k' \). Hence, we have the third constraint:

\[
\sum_{i \neq k} \max(0, N_{i,k} - 2) C_i \leq (m - 1) C_k' \tag{2.7}
\]

The objective function (2.4) together with three constraints on \( N_{i,k} \) i.e., inequalities (2.5), (2.6) and (2.7) form our IP problem. Since \( C_{i}^{\text{min}} \) is a relatively small number, we take the extreme case: \( C_i^{\text{min}} = 0 \). As task parameters such as \( C_i, D_i, T_i \) is known, the optimal solution of the IP only depends on the length of \( EW \). Thus, we use \( I^{\text{sc}}(C_k') \) to denote the optimal value of the IP problem if \( C_k' \) is used as the length of the \( EW \) in the IP.

Note that Inequalities (2.5) and (2.7) are based on the assumption that \( \tau_i \) is schedulable. Thus, before solving the IP, we have to check the schedulability of the taskset assuming no cache interference between tasks, i.e., \( I^{\text{sc}} = 0 \).

**Computation complexity of the IP.** The original IP can be easily transformed to an Integer Linear Programming (ILP) problem by introducing a new integer variable \( y_{i,j} \) for each \( i,j \) with two additional constraints: \( y_{i,j} \geq 0 \) and \( y_{i,j} \geq N_{i,k} - 2 \). Inequality (2.7) can be replaced by \( \sum_{i \neq k} y_{i,k} C_i \leq (m - 1) C_k' \). In the transformed ILP problem, we have totally \( 2(n - 1) \) variables and \( 4(n - 1) + 1 \) constraints. The complexity of the IP is the same as the complexity of solving the transformed ILP problem, which is \( O(4n6^{4n} \ln 4n) \) [7].

**V. ITERATIVE COMPUTATION**

Due to the presence of cache interference, a job may execute longer than \( C_k \) on a multicore platform with shared caches. However, a larger execution time may introduce more cache interference, as illustrated in Figure 4.

In Figure 4 (a), if the job of \( \tau_k \) executes for \( C_k' \), only one job of \( \tau_i \) interferes with \( \tau_k \). In Figure 4 (b), if the job of \( \tau_k \) executes for a larger execution time, say \( C_k' + I^{\text{sc}}(C_k') \), two jobs of \( \tau_i \) could possibly interfere with \( \tau_k \), which potentially may increase the cache interference exhibited by \( \tau_k \). This example suggests an iterative method to find an upper bound on the cache interference.

\[ \tau_i \]

\[ \tau_k \]

\[ \text{Execution window: } C_{i}^{\text{min}} \]

Figure 4: More cache interference if \( \tau_k \) executes for a longer time.
Lemma 7. $I^{sc}(C^*_k)$ is non-decreasing with respect to $C^*_k$

Lemma 7 is explained by the above example.

We give a sufficient condition for a certain value that can be used as an upper bound on cache interference.

Lemma 8. If $\exists C^*_k \geq C_k$ such that $C^*_k = C_k + I^{sc}(C^*_k)$, then $I^{sc}(C^*_k)$ is the upper bound on cache interference exhibited by $\tau_k$.

**Proof.** If $C^*_k = C_k + I^{sc}(C^*_k)$, then $I^{sc}(C^*_k) = I^{sc}(C_k + I^{sc}(C^*_k))$. According to Lemma 7, given an execution window of $\tau_k$ that is no more than $C_k + I^{sc}(C^*_k)$, the cache interference exhibited by $\tau_k$ is not larger than $I^{sc}(C^*_k)$. Therefore, $I^{sc}(C^*_k)$ is the upper bound on cache interference for $\tau_k$. \qed

We now derive the iterative algorithm, called $CacheInterference(\tau)$ which takes taskset $\tau$ as input, to compute an upper bound on cache interference for each task $\tau_k \in \tau$:

- Since the constraints of IP assume the taskset is schedulable, we first check the schedulability of the taskset assuming no cache interference between each task. Only if all tasks pass schedulability test, the following steps will be taken.
- $C^*_k$ is initialized with $C_k$ and an upper bound value on the cache interference $I^{sc}(C^*_k)$ is created which is initially set to zero.
- By solving the IP, we compute a new upper bound of the cache interference $I^{sc}(C^*_k)$.

If the new upper bound of cache interference is the same as the old upper bound, the $I^{sc}(C^*_k)$ is the final upper bound of $\tau_k$. Otherwise, another round of computing the upper bound on cache interference is performed using the upper bound derived at the previous iteration. The iteration for $\tau_k$ stops either if no update on $I^{sc}(C^*_k)$ is possible anymore or if the computed $I^{sc}(C^*_k)$ is large enough to make $\tau_k$ unschedulable.

- The previous steps are repeated for every task in $\tau$.

A more formal version of the $CacheInterference(\tau, m)$ algorithm is given by Pseudocode 1. The algorithm returns $I^*$ which includes the upper bounds on cache interference $I^{sc}(C^*_k)$ for each task $\tau_k$ and $C^*$ which includes the upper bounds on the execution length $C^*_k$ for each $\tau_k$. If $I^*$ and $C^*$ are empty, the taskset is not schedulable. Since the solution of the IP is non-decreasing with respect to $C^*_k$ according to Lemma 7 and one termination condition is $C^*_k \geq D_k$, the termination of the algorithm is guaranteed.

We propose the following Theorem to check the schedulability of the task set.

**Theorem 3.** A task set $\tau$ is schedulable with the NP-FP scheduling policy on a multicore platform composed of $m$ identical cores with shared caches if for each task $\tau_k \in \tau$

1. $\exists C^*_k \geq C_k$ such that $C^*_k = C_k + I^{sc}(C^*_k)$.
2. $\frac{\Omega_k(C^*)}{m} + C^*_k < D_k$.

Proof. From (1), $I^{sc}(C^*_k)$ is the upper bound on cache interference exhibited by $\tau_k$ according to Lemma 8. So, $I^{sc}(C^*_k) \geq I^{sc}$.

From Lemma 3, $\Omega_k(C^*_k) \geq \hat{I}^{pre}$.

If $\frac{\Omega_k(C^*)}{m} + C_k = \frac{\Omega_k(C^*_k)}{m} + C_k + I^{sc}(C^*_k) < D_k$ then $I^* \setminus C_k + I^{sc}(C^*_k) < D_k$. Theorem 3 follows from Theorem 2. \qed

Finally, we give the procedure $CheckSchedulability(\tau, m)$ to perform the schedulability test, as illustrated by Pseudocode 2.

**Pseudocode 2: CheckSchedulability(\tau, m)**

1: Input: Task parameters, number of cores: $m$
2: $I^* \leftarrow$ empty list, used to store $I^{sc}(C^*_k)$ for each task
3: $C^* \leftarrow$ empty list, used to store $C^*_k$ for each task
4: pass $\leftarrow$ true
5: for all $\tau_k \in \tau$ do
6: $\Omega_k(C^*) \leftarrow$ calculation of Equation (2.3) using $C^*$
7: if $\Omega_k(C^*) + C_k \geq D_k$ then
8: pass $\leftarrow$ false
9: break
10: end if
11: end for
12: if pass then
13: for all $\tau_k \in \tau$ do
14: update $\leftarrow$ true, $I^{old} \leftarrow 0, I^{new} \leftarrow 0$
15: $C^*_k \leftarrow C_k$
16: while update do
17: $I^{old} \leftarrow I^{new}$
18: $I^{new} \leftarrow$ Solution of IP with $C^*_k$ as the EW
19: $C^*_k = C_k + I^{new}$
20: if $I^{new} == I^{old}$ or $C^*_k \geq D_k$ then
21: update $\leftarrow$ false
22: end if
23: end while
24: Add $I^{new}$ to $I^*$
25: Add $C^*_k$ to $C^*$
26: end for
27: end if
28: return $I^*, C^*$

Proof. From (1), $I^{sc}(C^*_k)$ is the upper bound on cache interference exhibited by $\tau_k$ according to Lemma 8. So, $I^{sc}(C^*_k) \geq I^{sc}$.

From Lemma 3, $\Omega_k(C^*) \geq \hat{I}^{pre}$.

If $\frac{\Omega_k(C^*)}{m} + C_k = \frac{\Omega_k(C^*_k)}{m} + C_k + I^{sc}(C^*_k) < D_k$ then $I^* \setminus C_k + I^{sc}(C^*_k) < D_k$. Theorem 3 follows from Theorem 2. \qed

Pseudocode 1: CacheInterference(\tau, m)

1: Input: Task parameters, number of cores: $m$
2: $I^* \leftarrow$ empty list, used to store $I^{sc}(C^*_k)$ for each task
3: $C^* \leftarrow$ empty list, used to store $C^*_k$ for each task
4: pass $\leftarrow$ true
5: for all $\tau_k \in \tau$ do
6: $\Omega_k(C^*) \leftarrow$ calculation of Equation (2.3) using $C^*$
7: if $\Omega_k(C^*) + C_k \geq D_k$ then
8: pass $\leftarrow$ false
9: break
10: end if
11: end for
12: if pass then
13: for all $\tau_k \in \tau$ do
14: update $\leftarrow$ true, $I^{old} \leftarrow 0, I^{new} \leftarrow 0$
15: $C^*_k \leftarrow C_k$
16: while update do
17: $I^{old} \leftarrow I^{new}$
18: $I^{new} \leftarrow$ Solution of IP with $C^*_k$ as the EW
19: $C^*_k = C_k + I^{new}$
20: if $I^{new} == I^{old}$ or $C^*_k \geq D_k$ then
21: update $\leftarrow$ false
22: end if
23: end while
24: Add $I^{new}$ to $I^*$
25: Add $C^*_k$ to $C^*$
26: end for
27: end if
28: return $I^*, C^*$
Computational complexity: Let \( n \) represent the number of tasks in the task-set. For \( \tau_k \), let \( I^{\min}_{k} \) be the smallest difference between cache interference caused by one job of \( \tau_i \) and \( \tau_j \), i.e. \( I^{\min}_{k} = \min(I_{i,k}^{sc} - I_{j,k}^{sc}) \), the while loop in \( \text{CacheInterference}(\tau,m) \) takes at most \( \gamma = \max_{k}(D_k - C_k) \) times since \( C'_k \) either stays the same or increases at least with \( I^{\min}_{k} \) in each iteration. Thus, the complexity of \( \text{CacheInterference}(\tau) \) is \( O(\gamma n^2) \). The computational complexity of \( \Omega_k(C^*) \) is \( O(n) \). Therefore, the complexity of \( \text{CheckSchedulability}(\tau,m) \) is \( O(\gamma n^4) \).

VI. Experiments

In this section, we evaluate the performance of the proposed schedulability test in terms of acceptance ratio. More specifically, we will quantify the effects of cache interference on the schedulability of the generated tasksets.

The experiments have been performed varying i) the number of cores \( m \) (\( m = 2, 4, \) or 8), ii) the number of tasks \( n \) (\( n = 10, 20, \) or 30) in the taskset, iii) total task utilization \( U_{\text{tot}} \) (\( U_{\text{tot}} \) from 0 to \( m \) with steps of 0.2), iv) the cache interference factor \( IF \) (\( IF = 0, 0.3, 0.6 \) or 0.9), and v) the probability of two tasks having cache interference on each other: \( P \) (\( P = 0.1, 0.2, 0.3 \) or 0.4). Given these five parameters, we have generated 200000 tasksets in each experiment. As the task generation policies may significantly affect experimental results, we give the policies used in the experiments as follows.

Task utilization generation policy. We use Randfixed-sum [18] to generate vectors that consist of \( N \) elements and whose components sum to the \( U_{\text{tot}} \). Each element in the vector is assigned an individual task utilization \( U_k \) in the taskset.

Task period and WCET generation policy. For each task \( \tau_k \), \( T_k \) is uniformly distributed over the interval [100000, 200000]. The WCET of \( \tau_k \) is derived by \( C_k = T_k \times U_k \). We consider an implicit deadline task system, which implies that \( D_i = T_i \).

Cache interference generation policy. The probability of two task having cache interference is \( P \). If two tasks \( \tau_k \) and \( \tau_i \) interfere with each other, \( I^{sc}_{i,k} \) is generated as \( I^{sc}_{i,k} = IF \times \min(0.5C_k, 0.5C_i) \).

In each experiment, we measure the number of schedulable tasksets that pass the proposed schedulability test. The acceptance ratio is the number of schedulable tasksets devied by the total number of tasksets (200000).

Figure 5 shows the acceptance ratio for the case \( IF = 0, 0.3, 0.6, 0.9 \), when fixing \( m = 8, n = 10, P = 0.1 \). The red line with \( IF = 0 \) represents the acceptance ratio when tasks have no cache interference. Evidently, the acceptance ratios with a lower \( IF \) are better than those with a larger \( IF \). As we increase \( IF \) with the same amount, the average acceptance ratio decreases in a slower fashion. However, it does not indicate that a lower bound on the average acceptance ratio is possible since the cache interference gets larger as \( IF \) increases, eventually making the interfered tasks unschedulable.

Figure 6 compares the acceptance ratio with different \( P \), fixing \( m = 8, n = 10, IF = 0.3 \). With the same \( U_{\text{tot}} \), the acceptance ratio decreases as \( P \) increases because a larger \( P \) indicates more tasks in the taskset could interfere with each other, which may potentially increase the upper bound on cache interference for each task.

Figure 5: Acceptance ratio with different IF when \( m = 8, n = 10, P = 0.1 \).

Figure 6: Acceptance ratio with different \( P \) when \( m = 8, n = 10, IF = 0.3 \).

Figure 7 illustrates the acceptance ratio with respect to the number of cores. The acceptance ratio for task having no cache interference is also plotted in Figure 7. Instead of using \( U_{\text{tot}} \) as horizontal axis, we scale the horizontal axis with \( U_{\text{tot}} \times \frac{m}{8} \) for \( m = 2, 4, 8 \). It is worth noting that an execution platform with fewer cores is more efficient in terms of acceptance ratio than those with more cores. However, for processors with different cores, the difference in the acceptance ratio between the baseline (tasks having no cache interference, \( IF = 0 \)) and tasks having cache interference is almost similar.

A set of experiments are performed to investigate the impact of the number of tasks in the taskset on the acceptance ratio. Figure 8 shows the acceptance ratio for different \( n \) in the taskset. It is interesting to note that when \( U_{\text{tot}} \) is less than 2, the acceptance ratio of tasksets with less tasks is slightly worse than those with more tasks. When \( U_{\text{tot}} \) is very small, \( U_k \) and \( C_k \) in a taskset with more tasks are on average smaller than those with more tasks, thus \( I^{sc}_{i,k} \) is also smaller. While as \( U_{\text{tot}} \) increases, the acceptance ratio for tasksets with fewer tasks becomes better than those with more tasks. This may be due to the fact that more tasks in the taskset results in more tasks having cache interference as \( P \) is fixed.

In order to compute the average running time of the proposed
schedulability test with different task-set scales, we measured the execution time of the schedulability test for the task-sets used in the previous experiment. The executions are conducted on a server with an 48-core AMD processor (2.1GHz). On average, it takes 0.2026 seconds to check the schedulability of the task-set consisting of 10 tasks, 0.4925 seconds for task-set with 20 tasks, while 1.0117 seconds for task-set with 30 tasks.

VII. CONCLUSIONS

In this paper, we developed a new schedulability analysis of non-preemptive real-time scheduling for multicore processors with shared caches. We constructed an integer programming formulation that can be transformed to an integer linear programming formulation to calculate the upper bound on the shared cache interference a task may exhibit during one job execution. We derive a new schedulability condition by integrating the upper bound on the cache interference into the schedulability analysis. A set of experiments has been performed using our proposed schedulability analysis to demonstrate the effects of cache interference for a range of different tasks sets. As for future work, we plan to extend our schedulability analysis to real-time multicore systems with shared caches that use preemptive task scheduling.

REFERENCES

[7] K. L. Clarkson. Las vegas algorithms for linear and integer programming when the dimension is small.