On the operating unit size of load/store architectures

Bergstra, J.A.; Middelburg, C.A.

DOI
10.1017/S0960129509990314

Publication date
2010

Document Version
Final published version

Published in
Mathematical Structures in Computer Science

Citation for published version (APA):
On the operating unit size of load/store architectures†

J. A. BERGSTRA and C. A. MIDDELBURG

Informatics Institute, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, the Netherlands
Email: {J.A.Bergstra,C.A.Middelburg}@uva.nl

Received 3 August 2007; revised 13 January 2009

We introduce a strict version of the concept of a load/store instruction set architecture in the setting of Maurer machines. We take the view that transformations on the states of a Maurer machine are achieved by applying threads as considered in thread algebra to the Maurer machine. We study how the transformations on the states of the main memory of a strict load/store instruction set architecture that can be achieved by applying threads depend on the operating unit size, the cardinality of the instruction set and the maximal number of states of the threads.

1. Introduction

In Bergstra and Middelburg (2007), we introduced Maurer machines, which are based on the model for computers proposed in Maurer (1966), and extended basic thread algebra, which was introduced in Bergstra and Loots (2002) under the name of basic polarised process algebra, with operators for applying threads to Maurer machines. Threads can be looked upon as the behaviours of deterministic sequential programs as run on a machine. By applying threads to a Maurer machine, transformations on the states of the Maurer machine are achieved. In Bergstra and Middelburg (2008), we proposed a strict version of the concept of a load/store instruction set architecture for theoretical work relevant to the design of micro-architectures (architectures of micro-processors). We described the concept in the setting of Maurer machines. The idea underlying it is that there is a main memory whose elements contain data, an operating unit with a small internal memory through which data can be manipulated, and an interface between the main memory and the operating unit for data transfer between them. The bit size of the operating unit memory of a load/store instruction set architecture is called its operating unit size.

In this paper, we study how the transformations on the states of the main memory of a strict load/store instruction set architecture that can be achieved by applying threads to it depend on the operating unit size, the cardinality of the instruction set

† This research was carried out in part in the framework of the GLANCE-project MICROGRIDS, which is funded by the Netherlands Organisation for Scientific Research (NWO).
and the maximal number of states of the threads. The motivation for this work is our assumption that load/store instruction set architectures impose restrictions on the expressiveness of computers, evidence for which is produced in the paper. To assist in the presentation of certain results, we introduce the concept of a thread powered function class. The idea underlying this concept is that the transformations on the main memory of strict load/store instruction set architectures achievable by applying threads to them are primarily determined by the address width, the word length, the operating unit size and the cardinality of their instruction set, and the number of states of the threads that can be applied to them.

We choose to use Maurer machines and basic thread algebra to study issues relevant to the design of instruction set architectures. Maurer machines are based on the view that a computer has a memory, the contents of all memory elements make up the state of the computer, the computer processes instructions and the processing of an instruction amounts to performing an operation on the state of the computer that results in changes of the contents of certain memory elements. The design of instruction set architectures must deal with these aspects of real computers. Turing machines and the other kinds of machines used in theoretical computer science (see, for example, Hopcroft et al. (2001)) abstract from these aspects of real computers. Basic thread algebra is a form of process algebra. Well-known process algebras, such as ACP (Baeten and Weijland 1990), CCS (Milner 1989) and CSP (Hoare 1985), are too general for our purposes, viz. modelling deterministic sequential processes that interact with a machine. Basic thread algebra was designed as an algebra of processes with this end in mind. We showed in Bergstra and Middelburg (2006) that the processes considered in basic thread algebra can be viewed as processes that are definable over ACP. However, modelling and analysing them is rather difficult using such a general process algebra.

The structure of this paper is as follows. First, we review basic thread algebra (Section 2), Maurer machines (Section 3) and the operators for applying threads to Maurer machines (Section 4). In Section 5, we introduce the concept of a strict load/store Maurer instruction set architecture. In Section 6, we study the consequences of reducing the operating unit size of a strict load/store Maurer instruction set architecture. In Section 7, we give conditions under which all possible transformations on the states of the main memory of a strict load/store Maurer ISA with a certain address width and word length can be achieved by applying a thread to such a strict load/store Maurer ISA. In Section 8, we give a condition under which not all possible transformations can be achieved. Finally, Section 9 presents some concluding remarks.

2. Basic thread algebra

In this section, we review BTA (Basic Thread Algebra). This is a form of process algebra, which was first presented in Bergstra and Loots (2002) under the name BPPA (Basic Polarised Process Algebra), that is tailored to the description of the behaviour of deterministic sequential programs under execution. These behaviours are called threads.
In BTA, it is assumed that there is a fixed but arbitrary set of basic actions $\mathcal{A}$. BTA has the following constants and operators:

— the deadlock constant $D$;
— the termination constant $S$;
— for each $a \in \mathcal{A}$, a binary postconditional composition operator $\preceq a \succeq$.

We use infix notation for postconditional composition. We introduce action prefixing as an abbreviation: $a \circ p$, where $p$ is a term of BTA, abbreviates $p \preceq a \succeq p$.

The intuition is that each basic action performed by a thread is taken as a command to be processed by the execution environment of the thread. The processing of a command may involve a change of state of the execution environment. At completion of the processing of the command, the execution environment produces a reply. This reply is either $T$ or $F$ and is returned to the thread concerned. Let $p$ and $q$ be closed terms of BTA. Then $p \preceq a \succeq q$ will perform action $a$, and after that proceed as $p$ if the processing of $a$ leads to the reply $T$ (called a positive reply) and proceed as $q$ if the processing of $a$ leads to the reply $F$ (called a negative reply).

Each closed term of BTA denotes a finite thread, that is, a thread where the length of the sequences of actions that it can perform is bounded. Guarded recursive specifications can give rise to infinite threads.

A guarded recursive specification over BTA is a set of recursion equations $E = \{X = t_X \mid X \in V\}$, where $V$ is a set of variables and each $t_X$ is a term of the form $D$, $S$ or $t \preceq a \succeq t'$ with $t$ and $t'$ terms of BTA that contain only variables from $V$. We write $V(E)$ for the set of all variables that occur on the left-hand side of an equation in $E$. We are only interested in models of BTA in which guarded recursive specifications have unique solutions, such as the projective limit model of BTA presented in Bergstra and Bethke (2003).

We extend BTA with guarded recursion by adding constants for solutions of guarded recursive specifications and axioms concerning these additional constants. For each guarded recursive specification $E$ and each $X \in V(E)$, we add to the constants of BTA a constant standing for the unique solution of $E$ for $X$, which we denote by $\langle X \mid E \rangle$. We also add the axioms for guarded recursion given in Table 1 to BTA, where we write $\langle t_X \mid E \rangle$ for $t_X$ with, for all $Y \in V(E)$, all occurrences of $Y$ in $t_X$ replaced by $\langle Y \mid E \rangle$. In Table 1, $X$, $t_X$ and $E$ stand for an arbitrary variable, an arbitrary term of BTA and an arbitrary guarded recursive specification, respectively. Side conditions are added to restrict the variables, terms and guarded recursive specifications for which $X$, $t_X$ and $E$ stand. The additional axioms for guarded recursion are known as the recursive definition principle (RDP) and the recursive specification principle (RSP). The equations $\langle X \mid E \rangle = \langle t_X \mid E \rangle$ for a fixed $E$ express the fact that the constants $\langle X \mid E \rangle$ make up a solution of $E$. The conditional equations $E \Rightarrow X = \langle X \mid E \rangle$ express the fact that this solution is the only one.

| $\langle X \mid E \rangle = \langle t_X \mid E \rangle$ if $X = t_X \in E$ | RDP |
| $E \Rightarrow X = \langle X \mid E \rangle$ if $X \in V(E)$ | RSP |

Table 1. Axioms for guarded recursion
\[ \bigwedge_{n \geq 0} \pi_n(x) = \pi_n(y) \Rightarrow x = y \quad \text{AIP} \]

Table 2. Approximation induction principle

| \( \pi_0(x) = D \) | P0 |
| \( \pi_{n+1}(S) = S \) | P1 |
| \( \pi_{n+1}(D) = D \) | P2 |
| \( \pi_{n+1}(x \leq a \geq y) = \pi_n(x) \leq a \geq \pi_n(y) \) | P3 |

Table 3. Axioms for projection operators

We will write BTA+REC for BTA extended with the constants for solutions of guarded recursive specifications and axioms RDP and RSP.

Closed terms of BTA+REC that denote the same infinite thread cannot always be proved to be equal using the axioms of BTA+REC. We introduce the approximation induction principle to remedy this. The approximation induction principle, AIP, is based on the view that two threads are identical if their approximations up to any finite depth are identical. The approximation up to depth \( n \) of a thread is obtained by cutting it off after performing a sequence of actions of length \( n \).

AIP is the infinitary conditional equation given in Table 2. Here, following Bergstra and Bethke (2003), the approximation of depth \( n \) is given in terms of a unary projection operator \( \pi_n(\_ \_) \). The axioms for the projection operators are given in Table 3, where \( a \) stands for an arbitrary member of \( A \).

From now on, we will write \( \mathcal{E}_{\text{lin}}(A) \), where \( A \subseteq \mathcal{A} \), for the set of all finite guarded recursive specifications over BTA that only contain postconditional operators \( \_ \leq a \geq \_ \) for which \( a \in A \). Moreover, we write \( \mathcal{T}_{\text{finrec}}(A) \), where \( A \subseteq \mathcal{A} \), for the set of all closed terms of BTA+REC that only contain postconditional operators \( \_ \leq a \geq \_ \) for which \( a \in A \) and only contain constants \( \langle X | E \rangle \) for which \( E \in \mathcal{E}_{\text{fin}}(A) \).

A linear recursive specification over BTA is a guarded recursive specification \( E = \{ X = t_X | X \in V \} \), where each \( t_X \) is a term of the form \( D, S \) or \( Y \leq a \geq Z \) with \( Y,Z \in V \). For each closed term \( p \in \mathcal{T}_{\text{finrec}}(A) \), there exist a linear recursive specification \( E \in \mathcal{E}_{\text{lin}}(A) \) and a variable \( X \in V(E) \) such that \( p = \langle X | E \rangle \) is derivable from the axioms of BTA+REC.

From now on, we will write \( \mathcal{E}_{\text{lin}}(A) \), where \( A \subseteq \mathcal{A} \), for the set of all linear recursive specifications from \( \mathcal{E}_{\text{lin}}(A) \).

In the following, the interpretations of the constants and operators of BTA+REC in models of BTA+REC will be denoted by the constants and operators themselves. Let \( \mathcal{M} \) be some model of BTA+REC, and \( p \) be an element from the domain of \( \mathcal{M} \). Then the set of states or residual threads of \( p \), written \( \text{Res}(p) \), is defined inductively as follows:

- \( p \in \text{Res}(p) \);
- if \( q \leq a \geq r \in \text{Res}(p) \), then \( q \in \text{Res}(p) \) and \( r \in \text{Res}(p) \).

We are only interested in models of BTA+REC in which \( \text{card}(\text{Res}(\langle X | E \rangle)) \leq \text{card}(E) \) for all finite linear recursive specifications \( E \), such as the projective limit model of BTA presented in Bergstra and Bethke (2003).
3. Maurer machines

In this section, we review the concept of a Maurer machine. This concept was first introduced in Bergstra and Middelburg (2007).

A Maurer machine $H$ consists of the following components:

- a non-empty set $M$;
- a set $B$ with $\text{card}(B) \geq 2$;
- a set $\mathcal{S}$ of functions $S : M \rightarrow B$;
- a set $\mathcal{O}$ of functions $O : \mathcal{S} \rightarrow \mathcal{S}$;
- a set $A \subseteq \mathcal{A}$;
- a function $[\cdot] : A \rightarrow (\mathcal{O} \times M)$.

It also satisfies the following conditions:

- if $S_1, S_2 \in \mathcal{S}$, $M' \subseteq M$, and $S_3 : M \rightarrow B$ is such that $S_3(x) = S_1(x)$ if $x \in M'$ and $S_3(x) = S_2(x)$ if $x \notin M'$, then $S_3 \in \mathcal{S}$;
- if $S_1, S_2 \in \mathcal{S}$, then the set $\{ x \in M \mid S_1(x) \neq S_2(x) \}$ is finite;
- if $S \in \mathcal{S}$, $a \in A$, and $[a] = (O, m)$, then $S(m) \in \{T, F\}$.

$M$ is called the memory of $H$, and $B$ is called the base set of $H$. The members of $\mathcal{S}$ are called the states of $H$, the members of $\mathcal{O}$ are called the operations of $H$, the members of $A$ are called the basic actions of $H$ and $[\cdot]$ is called the basic action interpretation function of $H$.

We write $M_H$, $B_H$, $\mathcal{S}_H$, $\mathcal{O}_H$, $A_H$ and $[\cdot]_H$, where $H = (M, B, \mathcal{S}, \mathcal{O}, A, [\cdot])$ is a Maurer machine, for $M$, $B$, $\mathcal{S}$, $\mathcal{O}$, $A$ and $[\cdot]$, respectively.

A Maurer machine has much in common with a real computer. The memory of a Maurer machine consists of memory elements whose contents are elements from its base set. The term memory must not be interpreted too strictly. For example, register files and caches must be regarded as parts of the memory. The contents of all memory elements together make up a state of the Maurer machine. State changes are accomplished by performing its operations. Every state change amounts to a change in the contents of certain memory elements. The basic actions of a Maurer machine are the instructions that it is able to process. The processing of a basic action amounts to performing the operation associated with the basic action using the basic action interpretation function. At completion of the processing, the content of the memory element associated with the basic action by the basic action interpretation function is the reply produced by the Maurer machine. The term basic action originates from BTA.

The first condition on the states of a Maurer machine is a structural condition and the second is a finite variability condition. We will return to these conditions, which are met by any real computer, after we have introduced the input and output regions of an operation. The third condition on the states of a Maurer machine restricts the possible replies at completion of the processing of a basic action to $T$ or $F$.

Maurer (1966) proposed a model for computers. We then introduced the term Maurer computer in Bergstra and Middelburg (2007) to denote a computer conforming to Maurer's definition. Separating out the set of basic actions and the basic action interpretation function from a Maurer machine leaves a Maurer computer. The set of
basic actions and the basic action interpretation function constitute the interface of a
Maurer machine with its environment, which effectuates state changes by issuing basic
actions.

The notions of the input and output regions of an operation, which originated in
Maurer (1966), will be used in later sections.

Let \( H = (M, B, \mathcal{S}, O, A, \llbracket - \rrbracket) \) be a Maurer machine and let \( O : \mathcal{S} \to \mathcal{S} \). Then the input region of \( O \), written \( IR(O) \), and the output region of \( O \), written \( OR(O) \), are the subsets of \( M \) defined as follows:

\[
IR(O) = \{ x \in M \mid \exists S_1, S_2 \in \mathcal{S} \cdot (\forall z \in M \setminus \{x\} \cdot S_1(z) = S_2(z) \land \exists y \in OR(O) \cdot O(S_1)(y) \neq O(S_2)(y))\}
\]

\[
OR(O)) = \{ x \in M \mid \exists S \in \mathcal{S} \cdot S(x) \neq O(S)(x)\}^\dagger.
\]

\( OR(O) \) is the set of all memory elements that might possibly be affected by \( O \); and \( IR(O) \) is the set of all memory elements that might possibly affect elements of \( OR(O) \) under \( O \). For example, the input and output regions of an operation that adds the content of a given main memory cell, say \( X \), to the content of a given register, say \( R_0 \), are \{\( X, R_0 \)\} and \{\( R_0 \)\}, respectively.

Let \( H = (M, B, \mathcal{S}, O, A, \llbracket - \rrbracket) \) be a Maurer machine. Let \( S_1, S_2 \in \mathcal{S} \) and \( O \in \mathcal{O} \). Then \( S_1 \upharpoonright IR(O) = S_2 \upharpoonright IR(O) \) implies \( O(S_1) \upharpoonright OR(O) = O(S_2) \upharpoonright OR(O) \). In other words, every operation transforms states that coincide on the input region of the operation to states that coincide on the output region of the operation. The second condition on the states of a Maurer machine is required for this fundamental property to hold. The first condition on the states of a Maurer machine could be relaxed somewhat.

More results relating to input regions and output regions are given in Maurer (1966). Recently, a revised and expanded version of Maurer (1966), which includes all the proofs, has appeared as Maurer (2006).

4. Applying threads to Maurer machines

In this section, we add a binary apply operator \( - \bullet_H - \) to BTA+REC for each Maurer
machine \( H \), and introduce a notion of computation in the resulting setting.

The apply operators associated with Maurer machines are related to the apply operators introduced in Bergstra and Ponse (2002). They allow threads to transform states of the associated Maurer machine by means of its operations. Such state transformations produce either a state of the associated Maurer machine or the undefined state \( \uparrow \). It is assumed that \( \uparrow \) is not a state of any Maurer machine. We extend function restriction to \( \uparrow \) by stipulating that \( \uparrow \upharpoonright M = \uparrow \) for any set \( M \). The first operand of the apply operator \( - \bullet_H - \) associated

\[\dagger\] The following precedence conventions are used in logical formulas. Operators bind more strongly than predicate symbols, and predicate symbols bind more strongly than logical connectives and quantifiers. Moreover, \( \neg \) binds more strongly than \( \land \) and \( \lor \), and \( \land \) and \( \lor \) bind more strongly than \( \Rightarrow \) and \( \Leftrightarrow \). Quantifiers are given the smallest possible scope.

\[\ddagger\] We use the notation \( f \upharpoonright D \), where \( f \) is a function and \( D \subseteq \text{dom}(f) \), for the function \( g \) with \( \text{dom}(g) = D \) such that for all \( d \in \text{dom}(g) \), \( g(d) = f(d) \).
with Maurer machine \( H = (M, B, \mathcal{O}, A, \llbracket \_ \rrbracket) \) must be a term from \( \mathcal{T}_{\text{finrec}}(A) \), and its second argument must be a state from \( \mathcal{T} \cup \{ \uparrow \} \).

Let \( H = (M, B, \mathcal{O}, A, \llbracket \_ \rrbracket) \) be a Maurer machine, \( p \in \mathcal{T}_{\text{finrec}}(A) \) and \( S \in \mathcal{T} \). Then \( p \cdot_H S \) is the state that results if all basic actions performed by thread \( p \) are processed by the Maurer machine \( H \) from initial state \( S \). The processing of a basic action \( a \) by \( H \) amounts to a state change according to the operation associated with \( a \) by \( \llbracket \_ \rrbracket \). In the resulting state, the reply produced by \( H \) is contained in the memory element associated with \( a \) by \( \llbracket \_ \rrbracket \). If \( p \) is \( S \), there will be no state change; if \( p \) is \( D \), the result is \( \uparrow \).

Let \( H = (M, B, \mathcal{O}, A, \llbracket \_ \rrbracket) \) be a Maurer machine and \( (O_a, m_a) = \llbracket a \rrbracket \) for all \( a \in A \). Then the apply operator \( \_ \cdot_H \_ \) is defined by the equations given in Table 4 and the rule given in Table 5. In these tables, \( a \) stands for an arbitrary member of \( A \), and \( S \) stands for an arbitrary member of \( \mathcal{T} \).

Let \( H = (M, B, \mathcal{O}, A, \llbracket \_ \rrbracket) \) be a Maurer machine. Let \( p \in \mathcal{T}_{\text{finrec}}(A) \) and \( S \in \mathcal{T} \). Then \( p \) converges from \( S \) on \( H \) if there exists an \( n \in \mathbb{N} \) such that \( \pi_n(p) \cdot_H S \neq \uparrow \). The rule from Table 5 can be read as follows: if \( x \) does not converge from \( S \) on \( H \), then \( x \cdot_H S \) equals \( \uparrow \).

Before we introduce a notion of computation in the current setting, we need to introduce some auxiliary notions.

Let \( H = (M, B, \mathcal{O}, A, \llbracket \_ \rrbracket) \) be a Maurer machine and \( (O_a, m_a) = \llbracket a \rrbracket \) for all \( a \in A \). Then the step relation \( \_ \vdash_H \_ \subseteq (\mathcal{T}_{\text{finrec}}(A) \times \mathcal{T}) \times (\mathcal{T}_{\text{finrec}}(A) \times \mathcal{T}) \) is defined inductively as follows:

1. if \( O_a(S)(m_a) = T \) and \( p = p' \leq a \geq p'' \), then \( (p, S) \vdash_H (p', O_a(S)) \);
2. if \( O_a(S)(m_a) = F \) and \( p = p' \leq a \geq p'' \), then \( (p, S) \vdash_H (p'', O_a(S)) \).

Let \( H = (M, B, \mathcal{O}, A, \llbracket \_ \rrbracket) \) be a Maurer machine. Then a full path in \( \_ \vdash_H \_ \) is one of the following:

1. a finite path \( \langle (p_0, S_0), \ldots, (p_n, S_n) \rangle \) in \( \_ \vdash_H \_ \) such that there exists no \( (p_{n+1}, S_{n+1}) \in \mathcal{T}_{\text{finrec}}(A) \times \mathcal{T} \) with \( (p_n, S_n) \vdash_H (p_{n+1}, S_{n+1}) \);
2. an infinite path \( \langle (p_0, S_0), (p_1, S_1), \ldots \rangle \) in \( \_ \vdash_H \_ \).

Moreover, let \( p \in \mathcal{T}_{\text{finrec}}(A) \) and \( S \in \mathcal{T} \). Then the full path of \( (p, S) \) on \( H \) is the unique full path in \( \_ \vdash_H \_ \) from \( (p, S) \). If \( p \) converges from \( S \) on \( H \), then the full path of \( (p, S) \) on \( H \) is called the computation of \( (p, S) \) on \( H \) and we write \( \| (p, S) \|_H \) for the length of the computation of \( (p, S) \) on \( H \).

---

**Table 4. Defining equations for apply operator**

| \( x \cdot_H \) \( \uparrow \) | \( S \cdot_H S = S \) | \( D \cdot_H S = \uparrow \) |
| \( (x \leq a \geq y) \cdot_H S = x \cdot_H O_a(S) \) if \( O_a(S)(m_a) = T \) | \( (x \leq a \geq y) \cdot_H S = y \cdot_H O_a(S) \) if \( O_a(S)(m_a) = F \) |

**Table 5. Rule for divergence**

\[ \land_{n \geq 0} \pi_n(x) \cdot_H S = \uparrow \Rightarrow x \cdot_H S = \uparrow \]
It is easy to see that \((p_0, S_0) \vdash_H (p_1, S_1)\) only if \(p_0 \cdot_H S_0 = p_1 \cdot_H S_1\), and that \(\langle(p_0, S_0), \ldots, (p_n, S_n)\rangle\) is the computation of \((p_0, S_0)\) on \(H\) only if \(p_n = S\) and \(S_n = p_0 \cdot_H S_0\). It is also easy to see that if \(p_0\) converges from \(S_0\) on \(H\), then \(\| (p_0, S_0) \|_H\) is the least \(n \in \mathbb{N}\) such that \(\pi_n(p_0) \cdot_H S_0 \neq \uparrow\).

5. Instruction set architectures

In this section, we introduce the concept of a strict load/store Maurer instruction set architecture. This concept, which was first introduced in Bergstra and Middelburg (2008), takes its name from the following features:

— it is described in the setting of Maurer machines;
— it is only concerned with load/store architectures; and
— the load/store architectures concerned are strict in some respects – this will be explained after we have completed the formalisation.

The concept of a strict load/store Maurer instruction set architecture, or a strict load/store Maurer ISA for short, is an approximation of the concept of a load/store instruction set architecture (see, for example, Hennessy and Patterson (2003)). It is focussed on instructions for data manipulation and data transfer. The transfer of program control is treated in a uniform way over different strict load/store Maurer ISAs by working at the abstraction level of threads. All that is left of the transfer of program control at this level is postconditional composition.

Each Maurer machine has a number of basic actions with which an operation is associated. From now on, in the context of Maurer machines that are strict load/store Maurer ISAs, such basic actions will be referred to loosely as basic instructions – the term basic action is uncommon when we are concerned with ISAs.

The idea underlying the concept of a strict load/store Maurer ISA is that there is a main memory whose elements contain data, an operating unit with a small internal memory through which data can be manipulated, and an interface between the main memory and the operating unit for data transfer between them. For the sake of simplicity, data is restricted to the natural numbers between 0 and some upper bound. Other types of data that one might want to support can always be represented by the natural numbers provided. Moreover, the data manipulation instructions offered by a strict load/store Maurer ISA are not restricted, and may include ones that are tailored to the manipulation of representations of other types of data. Therefore, we believe that nothing essential is lost by the restriction to natural numbers.

The concept of a strict load/store Maurer ISA is parametrised by:

— an address width \(aw\);
— a word length \(wl\);
— an operating unit size \(ous\);
— a number \(nrpl\) of pairs of data and address registers for load instructions;
— a number \(nrps\) of pairs of data and address registers for store instructions;
— a set \(A_{dm}\) of basic instructions for data manipulation;

where \(aw, ous \geq 0, wl, nrpl, nrps > 0\) and \(A_{dm} \subseteq \mathcal{A}\).
The address width \( aw \) may be thought of as the number of bits used for the binary representation of addresses of data memory elements. The word length \( wl \) may be thought of as the number of bits used to represent data in data memory elements. The operating unit size \( os \) may be thought of as the number of bits that the internal memory of the operating unit contains. The operating unit size is measured in bits because this allows us to establish results for which no assumptions about the internal structure of the operating unit are involved.

We assume that, for each \( n \in \mathbb{N} \), a fixed but arbitrary countably infinite set \( M^n_{\text{data}} \) and a fixed but arbitrary bijection \( m^n_{\text{data}} : \mathbb{N} \to M^n_{\text{data}} \) are given. The members of \( M^n_{\text{data}} \) are called data memory elements. The contents of data memory elements are taken as data. The data memory elements from \( M^n_{\text{data}} \) can contain natural numbers in the interval \([0, 2^n - 1]\).

We assume that a fixed but arbitrary countably infinite set \( M_{\text{ou}} \) and a fixed but arbitrary bijection \( m_{\text{ou}} : \mathbb{N} \to M_{\text{ou}} \) are given. The members of \( M_{\text{ou}} \) are called operating unit memory elements. They can contain natural numbers in the set \( \{0, 1\} \), that is, bits. Usually, a part of the operating unit memory is partitioned into groups that data manipulation instructions can refer to.

We assume that, for each \( n \in \mathbb{N} \), fixed but arbitrary countably infinite sets \( M^n_{\text{ld}}, M^n_{\text{ia}}, M^n_{\text{sd}} \) and \( M^n_{\text{sa}} \) and fixed but arbitrary bijections \( m^n_{\text{ld}} : \mathbb{N} \to M^n_{\text{ld}}, m^n_{\text{ia}} : \mathbb{N} \to M^n_{\text{ia}}, m^n_{\text{sd}} : \mathbb{N} \to M^n_{\text{sd}} \) and \( m^n_{\text{sa}} : \mathbb{N} \to M^n_{\text{sa}} \) are given. The members of \( M^n_{\text{ld}}, M^n_{\text{ia}}, M^n_{\text{sd}} \) and \( M^n_{\text{sa}} \) are called load data registers, load address registers, store data registers and store address registers, respectively. The contents of load data registers and store data registers are taken to be data, whereas the contents of load address registers and store address registers are taken to be addresses. The load data registers from \( M^n_{\text{ld}} \), the load address registers from \( M^n_{\text{ia}} \), the store data registers from \( M^n_{\text{sd}} \) and the store address registers from \( M^n_{\text{sa}} \) can contain natural numbers in the interval \([0, 2^n - 1]\). The load and store registers are special memory elements designated for the transfer of data between the data memory and the operating unit memory.

We assume a single special memory element \( rr \) for passing on the replies resulting from the processing of basic instructions. This special memory element is called the reply register.

We assume that, for each \( n, n' \in \mathbb{N} \), \( M^n_{\text{data}}, M^n_{\text{ou}}, M^n_{\text{ld}}, M^n_{\text{ia}}, M^n_{\text{sd}}, M^n_{\text{sa}} \) and \( \{rr\} \) are pairwise disjoint sets.

If \( M \subseteq M^n_{\text{data}} \) and \( m^n_{\text{data}}(i) \in M \), we write \( M[i] \) for \( m^n_{\text{data}}(i) \). If \( M \subseteq M^n_{\text{ld}} \) and \( m^n_{\text{ld}}(i) \in M \), we write \( M[i] \) for \( m^n_{\text{ld}}(i) \). If \( M \subseteq M^n_{\text{ia}} \) and \( m^n_{\text{ia}}(i) \in M \), we write \( M[i] \) for \( m^n_{\text{ia}}(i) \). If \( M \subseteq M^n_{\text{sd}} \) and \( m^n_{\text{sd}}(i) \in M \), we write \( M[i] \) for \( m^n_{\text{sd}}(i) \). If \( M \subseteq M^n_{\text{sa}} \) and \( m^n_{\text{sa}}(i) \in M \), we write \( M[i] \) for \( m^n_{\text{sa}}(i) \). Moreover, we write \( B \) for the set \( \{T, F\} \).

Let \( aw, os, nl, nrpl, nrps > 0 \) and \( A_{dn} \subseteq \mathcal{A} \). Then a strict load/store Maurer instruction set architecture with parameters \( aw, wl, os, nl, nrpl, nrps \) and \( A_{dn} \) is a Maurer machine \( H = (M, B, \mathcal{A}, \mathcal{C}, A, \{rr\}) \) with

\[
M = M_{\text{data}} \cup M_{\text{ou}} \cup M_{\text{ld}} \cup M_{\text{ia}} \cup M_{\text{sd}} \cup M_{\text{sa}} \cup \{rr\},
\]

\[
B = [0, 2^{wl} - 1] \cup [0, 2^{aw} - 1] \cup B.
\]
\[
\mathcal{S} = \{ S : M \rightarrow B \mid \\
\forall m \in M_{data} \cup M_{ld} \cup M_{sd} : S(m) \in [0, 2^{aw} - 1] \land \\
\forall m \in M_{ld} \cup M_{sa} : S(m) \in [0, 2^{aw} - 1] \land \\
\forall m \in M_{ou} : S(m) \in \{0,1\} \land S(rr) \in B \}
\]

\[\emptyset = \{ O_a \mid a \in A \} \]

\[A = \{ \text{load} : n \mid n \in [0, nrpl - 1] \} \cup \{ \text{store} : n \mid n \in [0, nrps - 1] \} \cup A_{dm} \]

\[\llbracket a \rrbracket = (O_a, rr) \text{ for all } a \in A,\]

where

\[
M_{data} = \{ m_{data}^{wl}(i) \mid i \in [0, 2^{aw} - 1] \}
\]

\[
M_{ou} = \{ m_{ou}(i) \mid i \in [0, ous - 1] \}
\]

\[
M_{ld} = \{ m_{ld}^{wl}(i) \mid i \in [0, nrpl - 1] \}
\]

\[
M_{la} = \{ m_{la}^{aw}(i) \mid i \in [0, nrpl - 1] \}
\]

\[
M_{sd} = \{ m_{sd}^{wl}(i) \mid i \in [0, nrps - 1] \}
\]

\[
M_{sa} = \{ m_{sa}^{aw}(i) \mid i \in [0, nrps - 1] \}
\]

and, for all \( n \in [0, nrpl - 1] \), we have \( O_{\text{load}, n} \) is the unique function from \( \mathcal{S} \) to \( \mathcal{S} \) such that for all \( S \in \mathcal{S} \),

\[
O_{\text{load}, n}(S) \upharpoonright (M \setminus \{ M_{ld}[n], rr \}) = S \upharpoonright (M \setminus \{ M_{ld}[n], rr \})
\]

\[
O_{\text{load}, n}(S)(M_{ld}[n]) = S(M_{data}[S(M_{la}[n])])
\]

\[
O_{\text{load}, n}(S)(rr) = T
\]

and, for all \( n \in [0, nrps - 1] \), we have \( O_{\text{store}, n} \) is the unique function from \( \mathcal{S} \) to \( \mathcal{S} \) such that for all \( S \in \mathcal{S} \),

\[
O_{\text{store}, n}(S) \upharpoonright (M \setminus \{ M_{data}[S(M_{sa}[n])], rr \}) = S \upharpoonright (M \setminus \{ M_{data}[S(M_{sa}[n])], rr \})
\]

\[
O_{\text{store}, n}(S)(M_{data}[S(M_{sa}[n])]) = S(M_{sd}[n])
\]

\[
O_{\text{store}, n}(S)(rr) = T
\]

and, for all \( a \in A_{dm} \), we have \( O_a \) is a function from \( \mathcal{S} \) to \( \mathcal{S} \) such that

\[
IR(O_a) \subseteq M_{ou} \cup M_{ld}
\]

\[
OR(O_a) \subseteq M_{ou} \cup M_{la} \cup M_{sd} \cup M_{sa} \cup \{ rr \}.
\]

We will write \( \mathcal{J} = \mathcal{S}_{sls}(aw, wl, ous, nrpl, nrps, A_{dm}) \) for the set of all strict load/store Maurer ISAs with parameters \( aw, wl, ous, nrpl, nrps \) and \( A_{dm} \).

In our opinion, load/store architectures give rise to a relatively simple interface between the data memory and the operating unit.

A strict load/store Maurer ISA is strict in the following respects:

— with data transfer between the data memory and the operating unit, a strict separation is made between data registers for loading, address registers for loading, data registers for storing, and address registers for storing;
— from these registers, only the registers of the first kind (data registers for loading) are
allowed in the input regions of data manipulation operations, and only the registers
of the other three kinds are allowed in the output regions of data manipulation
operations;
— a data memory with a size less than the number of addresses determined by the
address width is not allowed.

The first two of these conditions concern the interface between the data memory and the
operating unit. We believe that they provide the most convenient interface for theoretical
work relevant to the design of instruction set architectures. The third condition means
we do not need to deal with addresses that do not address a memory element. Such
addresses can be dealt with in many different ways, but since they all complicate the
architecture considerably, we believe it is a good idea to exclude them in much theoretical
work relevant to the design of instruction set architectures.

A strict separation between data registers for loading, address registers for loading,
data registers for storing, and address registers for storing was also made in Cray and
Thornton’s design of the CDC 6600 computer (Thornton 1970), which was, arguably, the
first load/store architecture to be implemented. However, in their design, data registers
for loading are also allowed in the input regions of data manipulation operations.

6. Reducing the operating unit size

In a strict load/store Maurer ISA, data manipulation takes place in the operating unit.
This raises questions about the consequences of changing the operating unit size. One of
the questions is whether, if the operating unit size is reduced by one, it is possible with
new instructions for data manipulation to transform each thread that can be applied to
the original ISA into one or more threads that can each be applied to the ISA with the
reduced operating unit size and that together yield the same state changes on the data
memory. This question can be answered in the affirmative.

**Theorem 1.** Let \( aw \geq 0, wl, ous, nrpl, nrps > 0 \) and \( A_{dn} \subseteq \mathcal{A} \). Let

\[
H = (M, B, \mathcal{S}, c, A, [\_]) \in \mathcal{MIS}_{sls}(aw, wl, ous, nrpl, nrps, A_{dn})
\]

\[
M_{data} = \{ m_{data}^w(i) \mid i \in [0, 2^{aw} - 1] \}
\]

\[
b_c = m_{bc}(ous - 1).
\]

Then there exist an \( A'_{dn} \subseteq \mathcal{A} \) and an

\[
H' = (M', B, \mathcal{S}', c', A', [\_]) \in \mathcal{MIS}_{sls}(aw, wl, ous - 1, nrpl, nrps, A'_{dn})
\]

such that for all \( p \in \mathcal{T}_{finrec}(A) \) there exist \( p'_0, p'_1 \in \mathcal{T}_{finrec}(A') \) such that

\[
\{(S \upharpoonright M_{data}, (p \circ_H S) \upharpoonright M_{data}) \mid S \in \mathcal{S} \land S(b_c) = 0\} = \{(S' \upharpoonright M_{data}, (p'_0 \circ_{H'} S') \upharpoonright M_{data}) \mid S' \in \mathcal{S}'\}
\]

and

\[
\{(S \upharpoonright M_{data}, (p \circ_H S) \upharpoonright M_{data}) \mid S \in \mathcal{S} \land S(b_c) = 1\} = \{(S' \upharpoonright M_{data}, (p'_1 \circ_{H'} S') \upharpoonright M_{data}) \mid S' \in \mathcal{S}'\}.
\]
Notice that bc is the operating unit memory element of H that is missing in H’. In the proof of Theorem 1 given below, we take \( A_{dm} \) such that, for each instruction \( a \) in \( A_{dm} \), there are four instructions \( a(0), a(1), \overline{a}(0) \) and \( \overline{a}(1) \) in \( A’_{dm} \). The operations \( O_{a(0)} \) and \( O_{a(1)} \) affect the memory elements of \( H’ \) in the same way as \( O_a \) would affect them if the content of the missing operating unit memory element was 0 or 1, respectively. The effect that \( O_a \) would have on the missing operating unit memory element is made available by \( O_{\overline{a}(0)} \) and \( O_{\overline{a}(1)} \), respectively. They do nothing but reply \( F \) if the content of the missing operating unit memory element would become 0 and \( T \) if the content of the missing operating unit memory element would become 1.

**Proof of Theorem 1.** Instead of proving the statement directly, we prove that there exist an \( A’_{dm} \subseteq \mathcal{A} \) and an

\[
H’ = (M’, B, \mathcal{S}', \mathcal{O}', A', \llnull) \in \mathcal{MA}\mathcal{SF}_\text{all}(aw, \mathcal{W}, \mathcal{O}_\text{us}, 1, \mathcal{N}_\text{rpl}, \mathcal{N}_\text{rps}, A_{dm})
\]

such that for all \( p \in \mathcal{S}_\text{finrec}(A) \) there exist \( p'_0, p'_1 \in \mathcal{S}_\text{finrec}(A') \) such that

\[
\{(S \uparrow (M' \setminus \{rr\}), (p \bullet_H S) \uparrow (M' \setminus \{rr\})) \mid S \in \mathcal{S} \land S(\text{bc}) = 0\}
\]

\[
= \{(S' \uparrow (M' \setminus \{rr\}), (p'_0 \bullet_H S') \uparrow (M' \setminus \{rr\})) \mid S' \in \mathcal{S}'\}
\]

and

\[
\{(S \uparrow (M' \setminus \{rr\}), (p \bullet_H S) \uparrow (M' \setminus \{rr\})) \mid S \in \mathcal{S} \land S(\text{bc}) = 1\}
\]

\[
= \{(S' \uparrow (M' \setminus \{rr\}), (p'_1 \bullet_H S') \uparrow (M' \setminus \{rr\})) \mid S' \in \mathcal{S}'\}.
\]

This is sufficient because \( M_{\text{data}} \subseteq M' \setminus \{rr\} \).

We take

\[
A'_{dm} = \{a(k), \overline{a}(k) \mid a \in A_{dm} \land k \in \{0, 1\}\},
\]

and

\[
H' = (M', B, \mathcal{S}', \mathcal{O}', A', \llnull')
\]

such that for each \( a \in A_{dm} \) and \( k \in \{0, 1\} \), we have \( O_{a(k)} \) and \( O_{\overline{a}(k)} \) are the unique functions from \( \mathcal{S}' \) to \( \mathcal{S} \) such that for all \( S' \in \mathcal{S}' \):

\[
O_{a(k)}(S') = O_a(\rho_k(S')) \uparrow M'
\]

and

\[
O_{\overline{a}(k)}(S') \uparrow (M' \setminus \{rr\}) = S' \uparrow (M' \setminus \{rr\})
\]

\[
O_{\overline{a}(k)}(S')(rr) = \gamma(O_a(\rho_k(S'))(\text{bc}))
\]

where, for each \( k \in \{0, 1\} \), we have \( \rho_k \) is the unique function from \( \mathcal{S}' \) to \( \mathcal{S} \) such that

\[
\rho_k(S') \uparrow M' = S'
\]

\[
\rho_k(S')(\text{bc}) = k
\]

and \( \gamma : \{0, 1\} \rightarrow \mathcal{B} \) is defined by

\[
\gamma(0) = \mathcal{F}
\]

\[
\gamma(1) = \mathcal{T}.
\]
We restrict ourselves to $p \in \{ \langle X | E \rangle \mid E \in \mathcal{E}_{\text{lin}}^\text{fin}(A) \land X \in V(E) \}$ because each term from $\mathcal{T}_{\text{finrec}}(A)$ can be proved to be equal to some constant from this set by means of the axioms of BTA+REC.

We define transformation functions

$$\phi_k : \{ \langle X | E \rangle \mid E \in \mathcal{E}_{\text{lin}}^\text{fin}(A) \land X \in V(E) \} \rightarrow \{ \langle X | E \rangle \mid E \in \mathcal{E}_{\text{lin}}^\text{fin}(A') \land X \in V(E) \},$$

for $k \in \{0, 1\}$, as follows:

$$\phi_k(\langle X | E \rangle) = \langle X_k | \phi_k'(E) \rangle,$$

where $\phi_k' : \mathcal{E}_{\text{lin}}^\text{fin}(A) \rightarrow \mathcal{E}_{\text{lin}}^\text{fin}(A')$, for $k \in \{0, 1\}$ is defined as follows:

$$\begin{align*}
\phi_k'(\{X = S\}) &= \{X_k = S\} \\
\phi_k'(\{X = D\}) &= \{X_k = D\} \\
\phi_k'(\{X = Y \subseteq a \supseteq Z\}) &= \{X_k = Y_k \subseteq a \supseteq Z_k\} \quad \text{if} \ a \notin A_{dm} \\
\phi_k'(\{X = Y \subseteq a \supseteq Z\}) &= \{X_k = X_k \subseteq a(k) \supseteq X''_k, X'_k = Y_1 \subseteq a(k) \supseteq Z_1, X''_k = Y_0 \subseteq a(k) \supseteq Z_0\} \quad \text{if} \ a \in A_{dm} \\
\phi_k'(E' \cup E'') &= \phi_k'(E') \cup \phi_k'(E'').
\end{align*}$$

Here, for each variable $X$, the new variables $X_0$, $X'_0$, $X''_0$, $X_1$, $X'_1$ and $X''_1$ are taken such that:

(i) they are pairwise different variables;

(ii) for each variable $Y$ different from $X$, we have $\{X_0, X'_0, X''_0, X_1, X'_1, X''_1\}$ and $\{Y_0, Y'_0, Y''_0, Y_1, Y'_1, Y''_1\}$ are disjoint sets.

Let $p = \{ \langle X | E \rangle \mid E \in \mathcal{E}_{\text{lin}}^\text{fin}(A) \land X \in V(E) \}$. Let $S \in \mathcal{S}$ and $S' \in \mathcal{S}'$ be such that $S \upharpoonright M' = S'$, let $(p_i, S_i)$ be the $(i+1)$th element in the full path of $(p, S)$ on $H$, and let $(p'_i, S'_i)$ be the $(i+1)$th element in the full path of $(\phi_{S(bc)}(p), S')$ on $H'$ whose first component does not equal $p \subseteq a(k) \supseteq q$ for any $p, q \in \mathcal{T}_{\text{finrec}}$ and $k \in \{0, 1\}$. Moreover, let $a_i$ be the unique $a \in A$ such that $p_i = p \subseteq a \supseteq q$ for some $p, q \in \mathcal{T}_{\text{finrec}}$. It is then easy to prove by induction on $i$ that if $a_i \in A_{dm}$, then

$$\begin{align*}
o_{a_i}(S_i)(bc) &= \gamma^{-1}(o_{\overline{M}(bc)}(S'_i)(rr)) \quad (1) \\
o_{a_i}(S_i)(rr) &= o_{a_i}(S_i)(bc)(o_{\overline{M}(bc)}(S'_i)))(rr) \quad (2)
\end{align*}$$

(if $i + 1 < \|(p, S)\|_H$ when $p$ converges from $S$ on $H$). Now, using (1) and (2), it is easy to prove by induction on $i$ that:

$$\begin{align*}
\phi_{S(bc)}(p_i) &= p'_i \\
S_i \upharpoonright (M \setminus \{rr\}) &= \rho_{S(bc)}(S'_i) \upharpoonright (M \setminus \{rr\})
\end{align*}$$

(if $i < \|(p, S)\|_H$ when $p$ converges from $S$ on $H$). The result then follows immediately. \qed
The proof of Theorem 1 gives us some upper bounds:
— for each thread that can be applied to the original ISA, the number of threads that can together produce the same state changes on the data memory of the ISA with the reduced operating unit does not have to be more than 2;
— the number of states of the new threads does not have to be more than 6 times the number of states of the original thread;
— the number of steps that the new threads take to produce some state change does not have to be more than 2 times the number of steps that the original thread takes to produce that state change;
— the number of instructions of the ISA with the reduced operating unit does not have to be more than 4 times the number of instructions of the original ISA.

Moreover, the proof indicates that more efficient new threads are possible: equations $X = Y \leq a \geq Z$ with $a \in A_{dm}$ can be treated as if $a \notin A_{dm}$ when the missing operating unit memory element is not in $IR(O_a)$.

As a corollary of the proof of Theorem 1, we have that only one transformed thread is needed if the input region of the operation associated with the first instruction performed by the original thread does not include the operating unit memory element that is missing in the ISA with the reduced operating unit size.

**Corollary 1.** Let $aw \geq 0$, $wl$, $ous$, $nrpl$, $nrps > 0$ and $A_{dm} \subseteq \mathcal{A}$. Let:

$$H = (M, B, \mathcal{S}, \emptyset, A, \{\_\}) \in \mathcal{MIF}\widehat{\mathcal{A}}_{\mathcal{s}s}(aw, wl, ous, nrpl, nrps, A_{dm})$$

$$M_{data} = \{m_{data}^w(i) \mid i \in [0, 2^{aw} - 1]\}$$

$$bc = m_{ou}(ous - 1).$$

Moreover, let

$$\mathcal{S}' = \{q \leq a \geq r \mid q, r \in \mathcal{S}_{\text{finrec}}(A) \land a \in A \land bc \in IR(O_a)\}.$$

Then there exist an $A'_{dm} \subseteq \mathcal{A}$ and an

$$H' = (M', B, \mathcal{S}', \emptyset, A', \{\_\}') \in \mathcal{MIF}\widehat{\mathcal{A}}_{\mathcal{s}s}(aw, wl, ous - 1, nrpl, nrps, A'_{dm})$$

such that for all $p \in \mathcal{S}_{\text{finrec}}(A) \setminus \mathcal{S}'$ there exists a $p' \in \mathcal{S}_{\text{finrec}}(A')$ such that

$$\{(S \upharpoonright M_{data}, (p \bullet_H S) \upharpoonright M_{data}) \mid S \in \mathcal{S}'.\}$$

$$= \{(S' \upharpoonright M_{data}, (p' \bullet_H S') \upharpoonright M_{data}) \mid S' \in \mathcal{S}'\}.$$

As another corollary of the proof of Theorem 1, if the operating unit size is reduced to zero, it is still possible to transform each thread that can be applied to the original ISA into a number of threads that can each be applied to the ISA with the reduced operating unit size and that together yield the same state changes on the data memory.

**Corollary 2.** Let $aw \geq 0$, $wl$, $ous$, $nrpl$, $nrps > 0$ and $A_{dm} \subseteq \mathcal{A}$. Let

$$H = (M, B, \mathcal{S}, \emptyset, A, \{\_\}) \in \mathcal{MIF}\widehat{\mathcal{A}}_{\mathcal{s}s}(aw, wl, ous, nrpl, nrps, A_{dm})$$

$$M_{data} = \{m_{data}^w(i) \mid i \in [0, 2^{aw} - 1]\}$$

$$M_{ou} = \{m_{ou}(i) \mid i \in [0, ous - 1]\}.$$
Then there exist an $A_{dm}' \subseteq A$ and an
\[ H' = (M', B, \mathcal{F}', 0', A', [\ldots]) \in \mathcal{M}F \mathcal{A}_{abs}(aw, wl, 0, nrpl, nrps, A_{dm}') \]
such that for all $p \in \mathcal{F}_{finrec}(A)$ and $S_{ou} \in \{ S \upharpoonright M_{ou} \mid S \in \mathcal{F} \}$ there exists a $p' \in \mathcal{F}_{finrec}(A')$ such that
\[
\{(S \upharpoonright M_{data}, (p \cdot H S) \upharpoonright M_{data}) \mid S \in \mathcal{F} \land S \upharpoonright M_{ou} = S_{ou}\} = \{(S' \upharpoonright M_{data}, (p' \cdot H' S') \upharpoonright M_{data}) \mid S' \in \mathcal{F}'\}.
\]

Let $M_{ou}$ be as in Corollary 2. Then the cardinality of $\{ S \upharpoonright M_{ou} \mid S \in \mathcal{F} \}$ is $2^{ous}$. Therefore, if the operating unit size is reduced to zero, at most $2^{ous}$ transformed threads are needed for each thread that can be applied to the original ISA. Notice that Corollary 2 does not go through if the number of states of the new threads is bounded.

7. Thread powered function classes

A simple calculation shows that for a strict load/store Maurer ISA with address width $aw$ and word length $wl$, the number of possible transformations on the states of the data memory is $2^{(2^{aw \cdot wl + aw})}$). This raises questions concerning the possibility of achieving all these state transformation by applying a thread to a strict load/store Maurer ISA with this address width and word length. One of the questions is how this possibility depends on the operating unit size of the ISAs, the size of the instruction set of the ISAs and the maximal number of states of the threads. This leads us to introduce the concept of a thread powered function class.

The concept of a thread powered function class is parametrised by:

- an address width $aw$;
- a word length $wl$;
- an operating unit size $ous$;
- an instruction set size $iss$;
- a state space bound $ssb$;
- a working area flag $waf$;

where $aw, ous \geq 0$, $wl, iss, ssb > 0$ and $waf \in \mathbb{B}$.

The instruction set size $iss$ is the number of basic instructions, excluding load and store instructions. To simplify the setting, we will only consider the case where there is one load instruction and one store instruction. The state space bound $ssb$ is a bound on the number of states of the thread that is applied. The working area flag $waf$ indicates whether a part of the data memory is taken as a working area. A part of the data memory is taken as a working area if we are not interested in the state transformations with respect to that part. To simplify the setting, we always set aside half of the data memory for working area if a working area is in order.

Intuitively, the thread powered function class with parameters $aw, wl, ous, iss, ssb$ and $waf$ are the transformations on the states of the data memory, or the first half of the data memory, depending on $waf$, that can be achieved by applying threads with no more than $ssb$ states to a strict load/store Maurer ISA of which the address width is $aw$, the word length is $wl$, the operating unit size is $ous$, the number of register pairs for load
instructions is 1, the number of register pairs for store instructions is 1, and the cardinality of the set of instructions for data manipulation is iss. From now on, we will use the term external memory for the data memory if waf = F and for the first half of the data memory if waf = T. Moreover, if waf = T, we will use the term internal memory for the second half of the data memory.

For aw ≥ 0 and wl > 0, we define $M_{\text{data}}^{aw,wl}$, $S_{\text{data}}^{aw,wl}$ and $T_{\text{data}}^{aw,wl}$ as follows:

$$
M_{\text{data}}^{aw,wl} = \{ m_{\text{data}}^{wl}(i) \mid i \in [0, 2^{aw} - 1] \}
$$

$$
S_{\text{data}}^{aw,wl} = \{ S \mid S : M_{\text{data}}^{aw,wl} \rightarrow [0, 2^{wl} - 1] \}
$$

$$
T_{\text{data}}^{aw,wl} = \{ T \mid T : S_{\text{data}}^{aw,wl} \rightarrow S_{\text{data}}^{aw,wl} \}.
$$

Here:

— $M_{\text{data}}^{aw,wl}$ is the data memory of a strict load/store Maurer ISA with address width aw and word length wl;
— $S_{\text{data}}^{aw,wl}$ is the set of possible states of that data memory;
— $T_{\text{data}}^{aw,wl}$ is the set of possible transformations on those states.

Let $aw, ou, waf, iss, ssb > 0$. Let $waf \in \mathbb{B}$ be such that $waf = F$ if $aw = 0$. Then the thread powered function class with parameters $aw, wl, ou, iss, ssb$ and $waf$, written $\mathcal{TPFC}(aw, wl, ou, iss, ssb, waf)$, is the subset of $T_{\text{data}}^{aw,wl}$ defined by

$$
T \in \mathcal{TPFC}(aw, wl, ou, iss, ssb, waf)
\iff
\exists A_{dm} \subseteq \mathcal{S} \cdot
\exists H \in \mathcal{MIA}_{sls}(aw, wl, ou, 1, 1, A_{dm}) \cdot
\exists p \in \mathcal{TPFC}(aw, wl, ou, iss, ssb, waf) \cdot (\text{card}(A_{dm}) = \text{iss} \land \text{card}(\text{Res}(p)) \leq \text{ssb} \land
\forall S \in \mathcal{S} \cdot
\left( (waf = F \Rightarrow T(S \uparrow M_{\text{data}}^{aw,wl}) = (p \bullet_H S) \uparrow M_{\text{data}}^{aw,wl}) \land
(waf = T \Rightarrow T(S \uparrow M_{\text{data}}^{aw,wl}) \uparrow M_{\text{data}}^{aw-1,wl} = (p \bullet_H S) \uparrow M_{\text{data}}^{aw-1,wl}) \right).
$$

We say that $\mathcal{TPFC}(aw, wl, ou, iss, ssb, waf)$ is complete if

$$
\mathcal{TPFC}(aw, wl, ou, iss, ssb, waf) = T_{\text{data}}^{aw,wl}.
$$

The following theorem states that $\mathcal{TPFC}(aw, wl, ou, iss, ssb, waf)$ is complete if ou = $2^{aw} \cdot wl + aw + 1$, iss = 5 and ssb = 8. Because $2^{aw} \cdot wl$ is the data memory size, that is, the number of bits that the data memory contains, this means that completeness can be obtained using 5 data manipulation instructions and threads whose number of states is less than or equal to 8 by taking the operating unit size to be slightly greater than the data memory size.

**Theorem 2.** Let $aw > 0$, $wl > 0$ and $waf \in \mathbb{B}$, and let $dms = 2^{aw} \cdot wl$. Then $\mathcal{TPFC}(aw, wl, dms + aw + 1, 5, 8, waf)$ is complete.

The idea behind the proof of Theorem 2 is:

1. The content of the whole data memory is copied data memory element by data memory element via the load data register to the operating unit.
2 The intended state transformation is then applied to the copy in the operating unit.

3 Finally, the result is copied back data memory element by data memory element via the store data register to the data memory.

The data manipulation instructions used to accomplish this are an initialisation instruction, a pre-load instruction, a post-load instruction, a pre-store instruction and a transformation instruction:

- The pre-load instruction is used to update the load address register before a data memory element is loaded.
- The post-load instruction is used to store the content of the load data register to the operating unit after a data memory element has been loaded.
- The pre-store instruction is used to update the store address register and to load the content of the store data register from the operating unit before a data memory element is stored.
- The transformation instruction is used to apply the intended state transformation to the copy in the operating unit.

Proof of Theorem 2. For convenience, we define

\[
M_{\text{data}} = \{ m_{\text{data}}^l(i) \mid i \in [0, 2^{aw} - 1]\}
\]

\[
M_{\text{ou}} = \{ m_{\text{ou}}(j) \mid j \in [0, dms + aw]\}
\]

\[
M_{d\text{ou}} = \{ m_{\text{ou}}(j) \mid j \in [0, dms - 1]\}
\]

\[
M_{a\text{ou}} = \{ m_{\text{ou}}(j) \mid j \in [dms, dms + aw]\}
\]

\[
M_{d\text{ou}}^d(i) = \{ m_{\text{ou}}(j) \mid j \in [i \cdot wl, (i + 1) \cdot wl - 1]\} \text{ for } i \in [0, 2^{aw} - 1]
\]

\[
\text{ldr} = m_{\text{id}}^l(0)
\]

\[
\text{sdr} = m_{\text{sd}}^l(0)
\]

\[
\text{lar} = m_{\text{la}}^l(0)
\]

\[
\text{sar} = m_{\text{sa}}^l(0).
\]

We have \( M_{d\text{ou}}^d = \bigcup_{i \in [0, 2^{aw} - 1]} M_{d\text{ou}}^d(i) \) and \( M_{\text{ou}} = M_{d\text{ou}}^d \cup M_{a\text{ou}} \).

We have to deal with the binary representations of natural numbers in the operating unit. The set of possible binary representations of natural numbers in the operating unit is

\[
\mathcal{R} = \bigcup_{n \in [0, dms + aw], m \in [n, dms + aw]} \{ R \mid R : \{ m_{\text{ou}}(i) \mid i \in [n, m]\} \to \{0, 1\}\}.
\]

For each \( R \in \mathcal{R} \), the natural number for which \( R \) is a binary representation is given by the function \( v : \mathcal{R} \to \mathbb{N} \) defined by

\[
v(R) = \sum i \text{ s.t. } m_{\text{ou}}(i) \in \text{dom}(R) R(m_{\text{ou}}(i)) \cdot 2^{i - \min\{j \mid m_{\text{ou}}(j) \in \text{dom}(R)\}}.
\]

To prove the theorem, we take a fixed but arbitrary \( T \in T_{\text{data}}^{aw, wl} \) and show that \( T \in \mathcal{TPFC}(aw, wl, dms + aw + 1, 5, 8, waf) \).

We take

\[
A_{\text{dm}} = \{ \text{init, preload, postload, prestore, transform} \}
\]
and

\[
H = (M, B, \mathcal{S}, \mathcal{C}, A, [\ldots]) \in MFA_{sk}(aw, wl, dms + aw + 1, 1, A_{dm})
\]

such that \(O_{\text{init}}, O_{\text{preload}}, O_{\text{postload}}, O_{\text{prestore}}, \) and \(O_{\text{transform}}\) are the unique functions from \(\mathcal{S}\) to \(\mathcal{S}\) such that for all \(S \in \mathcal{S}\):

\[
O_{\text{init}}(S) \downarrow (M \setminus (M_{ou}^a \cup \{rr\})) = S \downarrow (M \setminus (M_{ou}^a \cup \{rr\}))
\]

\[
v(O_{\text{init}}(S) \uparrow M_{ou}^a) = 0
\]

\[
O_{\text{init}}(S)(rr) = T
\]

\[
O_{\text{preload}}(S) \downarrow (M \setminus (M_{ou}^a \cup \{lar\} \cup \{rr\})) = S \downarrow (M \setminus (M_{ou}^a \cup \{lar\} \cup \{rr\}))
\]

\[
v(O_{\text{preload}}(S) \uparrow M_{ou}^a) = v(S \uparrow M_{ou}^a) + 1 \quad \text{if } v(S \uparrow M_{ou}^a) < 2^{aw}
\]

\[
O_{\text{preload}}(S) \uparrow M_{ou}^a = S \uparrow M_{ou}^a \quad \text{if } v(S \uparrow M_{ou}^a) \geq 2^{aw}
\]

\[
O_{\text{preload}}(S)(lar) = v(S \uparrow M_{ou}^a) \quad \text{if } v(S \uparrow M_{ou}^a) < 2^{aw}
\]

\[
O_{\text{preload}}(S)(lar) = S(lar) \quad \text{if } v(S \uparrow M_{ou}^a) \geq 2^{aw}
\]

\[
O_{\text{preload}}(S)(rr) = T \quad \text{if } v(S \uparrow M_{ou}^a) < 2^{aw}
\]

\[
O_{\text{preload}}(S)(rr) = F \quad \text{if } v(S \uparrow M_{ou}^a) \geq 2^{aw}
\]

\[
O_{\text{postload}}(S) \downarrow (M \setminus (M_{ou}^d \langle v(S \uparrow M_{ou}^a) \rangle \cup \{rr\})) = S \downarrow (M \setminus (M_{ou}^d \langle v(S \uparrow M_{ou}^a) \rangle \cup \{rr\}))
\]

\[
v(O_{\text{postload}}(S) \uparrow M_{ou}^d \langle v(S \uparrow M_{ou}^a) \rangle) = S(ldr)
\]

\[
O_{\text{postload}}(S)(rr) = T
\]

\[
O_{\text{prestore}}(S) \downarrow (M \setminus (M_{ou}^a \cup \{sar, sdr\} \cup \{rr\})) = S \downarrow (M \setminus (M_{ou}^a \cup \{sar, sdr\} \cup \{rr\}))
\]

\[
v(O_{\text{prestore}}(S) \uparrow M_{ou}^a) = v(S \uparrow M_{ou}^a) + 1 \quad \text{if } v(S \uparrow M_{ou}^a) < 2^{aw}
\]

\[
O_{\text{prestore}}(S) \uparrow M_{ou}^a = S \uparrow M_{ou}^a \quad \text{if } v(S \uparrow M_{ou}^a) \geq 2^{aw}
\]

\[
O_{\text{prestore}}(S)(sar) = v(S \uparrow M_{ou}^a) \quad \text{if } v(S \uparrow M_{ou}^a) < 2^{aw}
\]

\[
O_{\text{prestore}}(S)(sar) = S(sar) \quad \text{if } v(S \uparrow M_{ou}^a) \geq 2^{aw}
\]

\[
O_{\text{prestore}}(S)(sdr) = v(S \uparrow M_{ou}^d \langle v(S \uparrow M_{ou}^a) \rangle) \quad \text{if } v(S \uparrow M_{ou}^a) < 2^{aw}
\]

\[
O_{\text{prestore}}(S)(sdr) = S(sdr) \quad \text{if } v(S \uparrow M_{ou}^a) \geq 2^{aw}
\]

\[
O_{\text{prestore}}(S)(rr) = T \quad \text{if } v(S \uparrow M_{ou}^a) < 2^{aw}
\]

\[
O_{\text{prestore}}(S)(rr) = F \quad \text{if } v(S \uparrow M_{ou}^a) \geq 2^{aw}
\]

\[
O_{\text{transform}}(S) \downarrow (M \setminus (M_{ou} \cup \{rr\})) = S \downarrow (M \setminus (M_{ou} \cup \{rr\}))
\]

\[
O_{\text{transform}}(S) \uparrow M_{ou}^d = T(S \uparrow M_{ou}^d)
\]

\[
v(O_{\text{transform}}(S) \uparrow M_{ou}^a) = 0
\]

\[
O_{\text{transform}}(S)(rr) = T
\]

where \(T'\) is the unique function from \(\{S \uparrow M_{ou}^d \mid S \in \mathcal{S}\}\) to \(\{S \uparrow M_{ou}^d \mid S \in \mathcal{S}\}\) such that for all \(S_{ou}^d \in \{S \uparrow M_{ou}^d \mid S \in \mathcal{S}\}\), there exists \(S_{data} \in \{S \uparrow M_{data} \mid S \in \mathcal{S}\}\) such that

\[
\forall i \in [0, 2^{aw} - 1] \cdot (v(S_{ou}^d \uparrow M_{ou}^a(i)) = S_{data}(M_{data}[i]) \land
\]

\[
v(T'(S_{ou}^d) \uparrow M_{ou}^d(i)) = T(S_{data})(M_{data}[i]).
\]
Moreover, we take \( p \in \mathcal{F}_{\text{finrec}}(A) \) such that \( p = \langle X | E \rangle \) with \( E \) consisting of the following equations:

\[
X = \text{init} \circ Y \\
Y = (\text{load}:0 \circ \text{postload} \circ Y) \sqsubseteq \text{preload} \succeq (\text{transform} \circ Z) \\
Z = (\text{store}:0 \circ Z) \sqsubseteq \text{prestore} \succeq S.
\]

Let \( S \in \mathcal{F} \) and \((p_i, S_i)\) be the \((i + 1)\)th element in the full path of \((\langle X | E \rangle, S)\) on \( H \) whose first component equals \( \langle X | E \rangle \), \( \langle Y | E \rangle \), \( \langle Z | E \rangle \) or \( S \). It is then easy to prove by induction on \( i \) that

\[
i = 1 \Rightarrow p_i = \langle Y | E \rangle \land v(S_i \uparrow M_{ou}^d) = 0
\]

\[
i \in [2, 2^{aw} + 2] \Rightarrow
p_i = \langle Y | E \rangle \land
\forall j \in [0, i - 2] \cdot v(S_i \uparrow M_{ou}^d(j)) = S(M_{data}[j]) \land
v(S_i \uparrow M_{ou}^d) = i - 1
\]

\[
i = 2^{aw} + 3 \Rightarrow
p_i = \langle Z | E \rangle \land
\forall j \in [0, 2^{aw} - 1] \cdot v(S_i \uparrow M_{ou}^d(j)) = T(S \uparrow M_{data})(M_{data}[j]) \land
v(S_i \uparrow M_{ou}^d) = 0
\]

\[
i \in [2^{aw} + 4, 2^{aw+1} + 4] \Rightarrow
p_i = \langle Z | E \rangle \land
\forall j \in [0, i - (2^{aw} + 4)] \cdot v(S_i \uparrow M_{ou}^d(j)) = T(S \uparrow M_{data})(M_{data}[j]) \land
v(S_i \uparrow M_{ou}^d) = i - (2^{aw} + 3)
\]

\[
i = 2^{aw+1} + 5 \Rightarrow p_i = S \land S_i \uparrow M_{data} = T(S \uparrow M_{data}). \text{ That is, } T \text{ can be achieved by applying } \langle X | E \rangle \text{ to } H. \]

Hence, \( \langle X | E \rangle_H S \rangle \uparrow M_{data} = T(S \uparrow M_{data}). \) That is, \( T \) can be achieved by applying \( \langle X | E \rangle \) to \( H. \)

As a corollary of the proof of Theorem 2, when \( waf = T \), completeness can also be obtained if we take about half the external memory size as the operating unit size.

**Corollary 3.** Let \( aw > 0 \) and \( wl > 0 \), and let \( \text{ems} = 2^{aw-1} \cdot wl \). Then \( \mathcal{F}_PFC(aw, wl, \text{ems} + aw, 5, 8, T) \) is complete.

As a corollary of the proofs of Theorems 1 and 2, we have that completeness can even be obtained if we take the operating unit size to be zero. However, this may require quite a large number of data manipulation instructions and threads with quite a large number of states.

**Corollary 4.** Let \( aw \geq 0 \) and \( wl > 0 \), let \( waf \in \mathbb{B} \) be such that \( waf = F \) if \( aw = 0 \), and let \( \text{dms} = 2^{aw} \cdot wl \). Then \( \mathcal{F}_PFC(aw, wl, 0, 5 \cdot 4^{\text{dms}+aw+1}, 8 \cdot 6^{\text{dms}+aw+1}, waf) \) is complete.
8. On incomplete thread powered function classes

From Corollary 4, we know that it is possible to achieve all transformations on the states of the external memory of a strict load/store Maurer ISA with given address width and word length even if the operating unit size is zero. However, this may require quite a large number of data manipulation instructions and threads with quite a large number of states. This leads us to ask whether the operating unit size of the ISAs, the size of the instructions set of the ISAs and the maximal number of states of the threads can be taken such that it is impossible to achieve all transformations on the states of the external memory.

Theorem 3 will address this question, but first we give a lemma that will be used in its proof.

**Lemma 1.** Let \( aw > 1 \) and \( wl, ous, iss, ssb > 0 \), and let \( ems = 2^{aw-1} \cdot wl \). Then \( TPFC(aw, wl, ous, iss, ssb, T) \) is not complete if \( ous \leq ems/2 \) and \( iss \leq 2^{ems/2} \) and there are no more than \( 2^{ems} \) threads that can be applied to the members of \( \bigcup_{adm \subseteq \mathcal{M}} \mathcal{F}_{adm}(aw, wl, ous, 1, 1, Adm) \).

**Proof.** If the operating unit size is no greater than \( ems/2 \), then no more than
\[
\left(2^{ems/2}\right)^2
\]
transformations on the states of the operating unit can be associated with one data manipulation instruction. Because the load and store instructions are interpreted in a fixed way, it follows that if there are no more than \( 2^{ems/2} \) data manipulation instructions, then no more than
\[
\left(2^{ems/2}\right)^2 2^{ems/2}
\]
transformations on the states of the external memory can be achieved with one thread. Hence, if no more than \( 2^{ems} \) threads can be applied, no more than
\[
\left(2^{ems/2}\right)^2 2^{ems/2} \cdot 2^{ems}
\]
transformations on the states of the external memory can be achieved. Using elementary arithmetic, we easily establish that
\[
\left(2^{ems/2}\right)^2 2^{ems/2} \cdot 2^{ems} < \left(2^{ems}\right)2^{ems}.
\]
It follows that \( TPFC(aw, wl, ous, iss, ssb, T) \) is not complete because the number of transformations that are possible on the states of the external memory is \( \left(2^{ems}\right)2^{ems} \). \( \square \)

In Lemma 1, the bound on the number of threads that can be applied seems to appear out of the blue. However, it is the number of threads that can at most be represented in the internal memory: even with the most efficient representations, we cannot have more than one thread per state of the internal memory.

The following theorem states that \( TPFC(aw, wl, ous, iss, ssb, T) \) is not complete if the operating unit size is no greater than half the external memory size, the instruction set size is no greater than \( 2^{wl} - 4 \) and the maximal number of states of the threads is no
greater than $2^{aw-2}$. Notice that $2^{wl}$ is the number of instructions that can be represented in memory elements with word length $wl$ and that $2^{aw-2}$ is half the number of memory elements in the internal memory.

**Theorem 3.** Let $aw, wl > 1$ and $ous, iss, ssb > 0$, and let $ems = 2^{aw-1} \cdot wl$. Then $TPFC(aw, wl, ous, iss, ssb, T)$ is not complete if $ous \leq ems/2$ and $iss \leq 2^{wl} - 4$ and $ssb \leq 2^{aw-2}$.

**Proof.** The number of threads with at most $ssb$ states does not exceed
\[
\left( (iss + 2) \cdot ssb^2 + 2 \right)^{ssb}
\]
(recall that there is also one load instruction and one store instruction). Because $ssb > 0$,\[
\left( (iss + 2) \cdot ssb^2 + 2 \right)^{ssb} \leq \left( (iss + 4) \cdot ssb^2 \right)^{ssb}.
\]
Using elementary arithmetic, we can easily establish that\[
\left( (iss + 4) \cdot ssb^2 \right)^{ssb} \leq 2^{ems}.
\]
Consequently, there are no more than $2^{ems}$ threads with at most $ssb$ states. From this, and the fact that $ous \leq ems/2$ and $iss < 2^{ems/2}$, it follows from Lemma 1 that $TPFC(aw, wl, ous, iss, ssb, T)$ is not complete. \qed

9. Conclusions

In Bergstra and Middelburg (2007; 2008), we worked towards a formal approach to micro-architecture design based on Maurer machines and basic thread algebra. In those papers, we made hardly any assumptions about the instruction set architectures for which new micro-architectures are designed, but we put forward strict load/store Maurer instruction set architectures as the preferred instruction set architectures. In the current paper, we have established general properties of strict load/store Maurer instruction set architectures.

Some of these properties presumably belong to the non-trivial insights of practitioners involved in the design of instruction set architectures:

— Theorem 1 clarifies the ever increasing operating unit size. In principle, it is possible to undo an increase in the operating unit size originating from matters such as more advanced pipelined instruction processing, more advanced superscalar instruction issue and more accurate floating point calculations. However, the price of that is prohibitive measured by the increase in the size of the instruction set needed to produce the same state changes on the data memory concerned, the number of steps needed for those state changes, and so on.

— To a certain extent, Theorem 2 explains the existence of programming. In principle, it is possible to achieve each possible transformation on the states of the data memory of an instruction set architecture using a very short and simple program. However, that requires instructions dedicated to the different transformations and an operating unit size that is broadly the same as the data memory size.

— Theorem 3 points out the limitations imposed by existing instruction set architectures. Even if we assume the size of the operating unit, the size of the instruction set and
the maximal number of states of the threads applied are much larger than found in practice, we cannot achieve all possible transformations on the states of the data memory concerned.

Theorems 1, 2, and 3 still go through if we remove the conditions imposed on the data manipulation instructions of strict load/store Maurer instruction set architectures. These conditions originated in Bergstra and Middelburg (2008), where they were introduced to allow the exploitation of parallelism to speed up instruction processing to the maximum.

In this paper, we have taken the view that transformations on the states of the external memory are achieved by applying threads. We could have taken the less abstract view that transformations on the states of the external memory are achieved by running stored programs. This would have led to complications that appear to be unnecessary since only those threads that are represented by these programs are relevant to the transformations on the states of the external memory that are achieved. However, in the case of Theorem 3, the bounds that are imposed on strict load/store Maurer instruction set architectures and threads induce an upper bound on the number of threads that can be applied. This leads us to ask whether the number of threads that can be represented in the internal memory of the instruction set architectures concerned is possibly higher than the upper bound on the number of threads that can be applied. This question can be answered in the negative: the upper bound is the largest number of threads that can be represented in the internal memory. The upper bound is in fact much higher than the number of threads that can in practice be represented in the internal memory (cf. the remark following Lemma 1). This means that Theorem 3 demonstrates that load/store instruction set architectures also impose restrictions on the expressiveness of computers if we take the view that transformations on the states of the external memory are achieved by running stored programs.

One option for future work is to try to improve on the results given in the current paper. For example, we know from Theorem 2 that in order to obtain completeness with 5 data manipulation instructions and threads whose number of states is less than or equal to 8, it is sufficient to take the operating unit size to be slightly greater than the data memory size. However, we do not yet know the size of the smallest operating unit required for completeness. Another option for future work is to establish results concerning the use of half the data memory as internal memory. No such results are given in this paper. Lemma 1 assumes that half the data memory is used as internal memory because it provides a good case for the number taken as the maximal number of threads that can be applied. However, the proofs of Lemma 1 and Theorem 3 both go through without internal memory.

The speed with which transformations on the states of the external memory are achieved depends largely on the way in which the strict load/store Maurer instruction set architecture in question is implemented, which makes establishing general results about it a bit tricky. However, the speed with which transformations on the states of the external memory are achieved also depends on the volume of data transfer needed between the external memory and the operating unit. Establishing a connection between this volume and the parameters of thread powered function classes is yet another option for future work.
References


