Progress on Static Probabilistic Timing Analysis for Systems with Random Cache Replacement Policies
Altmeyer, S.J.; Cucu-Grosjean, L.; Davis, R.I.; Lesage, B.

Published in:

Citation for published version (APA):
Progress on static probabilistic timing analysis for systems with random cache replacement policies

Sebastian Altmeyer  Liliana Cucu-Grosjean  Robert I. Davis  Benjamin Lesage
University of Amsterdam, altmeyer@uva.nl  INRIA Paris-Rocquencourt liliana.cucu@inria.fr  University of York, rob.davis@york.ac.uk  University of York, benjamin.lesage@york.ac.uk

I. ORIGINAL PROBLEM STATEMENT

Real-time systems such as those deployed in space, aerospace, automotive and railway applications require guarantees that the probability of the system failing to meet its timing constraints is below an acceptable threshold (e.g. a failure rate of less than $10^{-9}$ per hour). Advances in hardware technology and the large gap between processor and memory speeds, bridged by the use of cache, make it difficult to provide such guarantees without significant over-provisioning of hardware resources. The use of deterministic cache replacement policies means that pathological worst-case behaviours need to be accounted for, even when in practice they may have a vanishingly small probability of actually occurring. The use of cache with random replacement policies [3] can negate the effects of pathological worst-case behaviours while still achieving efficient average-case performance, hence providing a way of increasing guaranteed performance in hard real-time systems.

The timing behaviour of programs running on a processor with a random cache replacement policy can be determined using Static Probabilistic Timing Analysis (SPTA). SPTA computes an upper bound on the probabilistic Worst-Case Execution Time (pWCET) in terms of an exceedence function, which gives the probability, as a function of all possible values for an execution time budget $x$, that the execution time of the program will not exceed that budget on any single run. SPTA [5] requires a probability function that can be used to compute an estimate of the probability of a cache hit for each memory access. This probability function is valid if it provides a lower bound on the probability of a cache hit. As shown last year at RTSOPS 2013 [4], the only valid cache-hit probability known by then is given as follows:

$$P^D(k) = \begin{cases} \left(\frac{N-k}{N}\right)^k & N > k \\ 0 & \text{otherwise} \end{cases}$$

where $N$ denotes the associativity of the cache and $k$ the reuse distance, i.e., the number of intervening memory accesses that could cause an eviction, since the memory block was last accessed. All other estimations of the hit-probability [7, 6] that had been proposed by then have been refuted as they may lead to optimistic results. The complexity of deriving a sound estimate of the hit-probability is caused by the dependency of the current event of a cache hit or miss on the history of prior events; caused by the finite size of the cache. In Equation (1), this dependency is accounted for by setting the probability of a cache hit to zero in cases where the reuse distance exceeds the associativity; which results in a large over-approximation even for simple access sequences. The open problem presented in last year’s RTSOPS [4] was thus: how to improve upon the simple SPTA analysis?

II. CORRECTNESS CONDITIONS AND OPTIMALITY

Instead of immediately answering the open problem, we tried to learn from the failed approaches to improve upon Equation (1) and identified the correctness conditions [2] that any sound approximation of the cache-hit probability must fulfill. Sound in this context means that for any sequence of cache accesses $e_1, \ldots, e_n$, the approximation $\hat{P}$ complies with two constraints: (C1) it does not over-estimate the probability of a cache hit, and (C2) it gives zero probability of a cache hit for these elements in the subset are a hit, is at most the precise probability of such an event occurring:

$$\begin{align*}
&C1 \quad \forall e \in \{e_1, \ldots, e_n\} : \hat{P}(e_{hit}) \geq P(e_{hit}), \\
&C2 \quad \forall E \subseteq \{e_1, \ldots, e_n\} : \hat{P}\left(\bigwedge_{e \in E} e_{hit}\right) \geq \prod_{e \in E} \hat{P}(e_{hit}).
\end{align*}$$

Using these soundness conditions, we have been able to clearly identify why former approaches [7, 6] failed and we have been able to show that Equation (1) is not only correct, but also optimal with respect to the limited information it uses: any cache-hit probability that only uses the associativity and the reuse distance is either at most as precise as Equation (1) or optimistic. Due to space limitation, we refer to [1] for the proof of optimality.

III. USING OTHER INFORMATION

The negative result that we can not improve the existing cache-hit probability by using the same information also gives the key to providing better bounds: we have to include additional information which is not yet taken into account.

A. Stack Distance

$\hat{P}^D(k)$ can be pessimistic in the commonly observed case of sequences with repeated accesses (e.g. loops). For example, the trace $a, b, c, d, c, d, c, d, a, b$ repeats the accesses $c, d$ three times within the reuse distance of the final accesses to $a$ and $b$. Assuming an associativity of 4, then $\hat{P}^D(k)$ gives zero probability of a cache hit for these accesses, since their reuse distance exceeds the associativity of the cache. However, it is possible for the cache to contain all four distinct memory blocks $a, b, c, d$ accessed in this sequence, and so a zero value for the probability of a cache hit for the final accesses to $a$ and $b$ is pessimistic.

Let $\Delta$ be the stack distance of element $e_i$, i.e., the total number of pair-wise distinct memory blocks that are accessed within the reuse distance $k$ of element $e_i$. The maximum number of distinct cache locations loaded during the reuse...
distance of \( e_l \) is upper bounded by \( \Delta \), hence it follows that a lower bound on the probability that \( e_l \) will survive all of the loads and remain in the cache is given by:

\[
\hat{P}(\Delta, k) = \begin{cases} \frac{N-k}{N} & (N > \Delta) \land (k \neq \infty) \\ 0 & \text{otherwise} \end{cases}
\]

We note that \( \hat{P}(\Delta, k) \) and \( \hat{P}(k) \) are incomparable, yet both give valid lower bounds on the probability of a cache hit. We thus may use the maximum of them to compute an improved lower bound that dominates each individually.

B. Cache Contention

Equation (1) and Equation (2) both provide a tight lower bound on the probability of a cache hit, but are imprecise even for simple access sequences. If we consider for instance a random cache with associativity 4 and the following access sequence, \( a, b, c, d, f, a^4, b^4, c^4, d^4, f^4 \) all accesses are considered cache misses. The reason for this is that for each of the last five accesses, the probability of a cache hit is set to 0 to ensure correctess with respect to condition C2, i.e., that the probability of the last five access all being hits is zero. However, this can also be ensured by considering the probability of a cache hit for the preceding accesses. To this end, we define the concept of the cache contention of a memory block \( e_l \) which denotes the number of memory accesses within the reuse distance of \( e_l \) that potentially contend with \( e_l \) for space in the cache. We only need to set the probability of a cache hit for an access \( e_l \) to zero when the cache contention is greater than or equal to the associativity \( N \):

\[
\hat{P}(e^m_l) = \begin{cases} 0 & \text{con}(e_l, T) \geq N \\ \max\{\hat{P}(\Delta, k), \frac{N-k}{N}\} & \text{otherwise} \end{cases}
\]

Conceptually, the cache contention assumes that each access within the reuse distance of \( e_l \) that has been assigned non-zero probability of being a hit as requiring its own separate location in the cache. Due to space limitation, we refer to [1] for the exact definition of the cache contention.

IV. Collecting Semantics and Combined Approach

An orthogonal approach to compute the pWCET is to enumerate all possible cache states and the associated probabilities. As this solution is computationally intractable, we have developed a combined approach with scalable precision: The idea is to use the precise approach for a small subset of relevant memory blocks, while using the imprecise approach for the remaining blocks. So, instead of enumerating all possible cache states, we abstract the set of cache states and focus only on the most important memory blocks, where \( m \) can be chosen to control both the precision and the runtime of the analysis. In this way, we effectively reduce the complexity of the precise component of the analysis for a trace with \( l \) distinct elements from \( 2^l \) to \( 2^m \) (typically with \( m \ll l \)). We again refer to [2] for the details of this approach.

V. Open Problems and Future Work

This progress report presents the solutions to one of last year’s open problems: how to improve upon the simple SPTA analysis? A first negative result, namely that the original hit probability can not be improved without additional information, has led us towards (i) the discovery of alternative approaches to bound cache-hit probability that rely on additional information such as the stack distance and the cache contention and (ii) the development of an orthogonal approach that relies on complete, or partial enumeration of the cache contents. As according to George Bernard Shaw science never solves a problem without creating ten more, the recent advancements lead to new, open problems. Foremost, how to extend the analysis to control-flow graphs and how to select the relevant memory blocks for the combined approach.

Acknowledgements

This work was funded by COST Action IC1202 (TACLe), and the EU FP7 Integrated Project PROXIMA (611085).

References