A Model for Policy Interest Rates

This paper introduces a model that addresses the key worldwide features of modern monetary policy making: the discreteness of policy interest rates both in magnitude and in timing, the preponderance of status quo decisions, policy inertia and regime switching. We capture them by developing a new dynamic discrete-choice model with switching among three latent policy regimes (dovish, neutral and hawkish), estimated via a Gibbs sampler with data augmentation. The simulations and an application to the federal funds rate target demonstrate that ignoring these features leads to biased estimates, worse in- and out-of-sample fit, and qualitatively different inference. Using all the Federal Open Market Committee's (FOMC) decisions made both at scheduled and unscheduled meetings as sample observations, we model the Federal Reserve's response to real-time data available right before each meeting, and control for the endogeneity of monetary policy shocks. The new model, fitted to data from Greenspan's tenure, correctly predicts the directions of about 90% of the next decisions on the target rate (hike, no change, or cut) out of sample during Bernanke's term, including the status quo decisions after reaching the zero lower bound, while the conventional linear model fails to adequately tackle the zero bound and wrongly predicts further cuts.


Introduction
We develop a dynamic model of discrete ordered choice with lagged latent dependent variables among the regressors and with time-varying transition probabilities of switching among three latent regimes, interpreted in the interest-rate-setting context as easing, neutral and tight monetary policy stances. The model assumes three implicit decisions: the policy regime decision and two decisions (conditional on the easing or tight regime) on the amount of rate change. The new methodology synthesizes and extends the existing models of ordered choice and avoids important common distortions of the data-generating process (DGP) in the empirical identification of monetary policy rules.
The policy interest rates are currently a critical instrument and a principal measure of monetary policy in many countries. They are an anchor for other short-term market interest rates, and are closely watched and anticipated by many economic agents. If we can quantitatively formalize how monetary authorities set the policy rates, and "if practitioners in financial markets gain a better understanding of how policy is likely to respond to incoming information, asset prices and bond yields will tend to respond to economic data in ways that further the central bank's policy objectives" (Bernanke, 2007). To forecast the state of the economy, or to evaluate the effects of economic shocks and of monetary and fiscal policy actions, we need to understand the central bank's systematic response to economic data. To improve monetary policy, we also need a clear empirical description of what is to be improved. It is difficult to describe monetary policy without using an econometric model. Since the policy rates are set administratively by the monetary authorities, and are neither the outcomes of market interaction between supply and demand nor subject to technical fluctuations or extraneous sources of noise, they are of special interest for econometric modeling (Hamilton and Jorda, 2002).
We implement an econometric framework that addresses the main worldwide features of modern monetary policy making: the discreteness of policy interest rates, both in magnitude and in timing; the preponderance and heterogeneity of status quo (no change to the rate) decisions; interest rate smoothing and policy inertia; policy regime switching and the different nature of positive and negative interest rate movements. None of the existing models is able to adequately capture all these stylized facts. We extend and synthesize the models of Brooks et al. (2012), Dale and Sirchenko (2021), Hamilton and Jorda (2002), Harris and Zhao (2007), Müller and Czado (2005), and Sirchenko (2019, 2020). More specifically, we address the following issues.
The discreteness of policy rates. Monetary policy modeling has historically been implemented in a continuous framework, either by estimating a 'simple' linear policy rule such as the Taylor rule (Taylor, 1993) or by estimating a monetary policy reaction function as a linear equation of a vector autoregressive model. Regression methods for a continuous dependent variable have always been, and still are, appropriate for modeling market interest rates, both long- and short-term, but they are no longer appropriate for policy interest rates set by many central banks. Nowadays, many central banks (and all the major ones) change policy rates by discrete amounts, typically by multiples of 25 basis points (bp), at special meetings of monetary policy committees, which are held 6-12 times a year. Notable examples are the Federal Open Market Committee of the U.S. Federal Reserve System (Fed), the Governing Council of the European Central Bank, the Monetary Policy Committee of the Bank of England, and the Policy Board of the Bank of Japan, among others.
Despite the extensive literature on monetary policy modeling, most studies use monthly or longer-period data averages and employ regression techniques for a continuous dependent variable, and thus do not address the discreteness of interest rates. Ignoring the limited (discrete) nature of the dependent variable may lead to serious biases. Using regressions for continuous outcomes with aggregated data raises the problems of distortions caused by the use of an incorrect information set and of simultaneity caused by disregarding possible interactions between the policy rates and the other variables that can occur during a period of aggregation. In recent years, discrete-choice approaches such as the ordered probit models (Gerlach, 2007; Hayo and Neuenkirch, 2010; Vanderhart, 2000) have been employed. There have also been attempts to use discrete-choice models at the FOMC-meeting data frequency (Hu and Phillips, 2004; Kauppi, 2012; Kim et al., 2009; Piazzesi, 2005); these studies, however, restricted their attention to the decisions made at the scheduled FOMC meetings only, thereby not reflecting the entire policy-making process. In fact, from 1983 to 1993 only about 15% of the changes to the U.S. federal funds rate target (target henceforth) were made at the scheduled FOMC meetings.
We address the discrete nature of policy rates with an ordered choice approach. We carefully mimic the policy-making process by using all interest rate decisions (made both at the scheduled and the unscheduled FOMC meetings) as sample observations. We match the FOMC decisions with the latest real-time vintages of macroeconomic data and non-aggregated daily financial data truly available immediately before each meeting and not revised later on. Information about the exact timing of FOMC meetings, together with the use of daily financial data (such as spreads between the long- and short-term market interest rates), allows us to substantially improve the identification of the Fed policy rule, which would not be possible using monthly aggregates of financial data. In addition, the market interest rates encapsulate the huge volume of data available to market participants, and can help avoid the omitted-variables and time-varying-parameter problems. The empirical results show that discreteness does matter in the estimation of monetary policy rules. The out-of-sample performance of the proposed model is remarkably superior to that of the linear model estimated using the same set of regressors: for example, the mean absolute forecast error is less than half that of the linear model, and the percentage of correct predictions is higher by 19 percentage points.
The preponderance of status quo decisions. Central banks often prefer to wait and see: many of them leave the policy rates unchanged at more than half of their policy meetings. They make status quo (no change) decisions in different macroeconomic circumstances: between rate hikes, between cuts, and also between changes in different directions. See Fig. 1 for the FOMC decisions on the target changes during the 7/1987-1/2006 period.
Many of the status quo decisions observed between cuts are made in rather different economic circumstances than those observed between hikes. Many of the status quo decisions observed between changes in different directions (that is, before rate reversals) are generated by economic conditions that differ substantially from those observed in periods of unidirectional changes to the target. We address the heterogeneity of status quo decisions by a three-part modelling approach that allows status quo outcomes to be generated by three different latent regimes interpreted as easing, neutral and tight policy stances. We extend the zero-inflated model of Harris and Zhao (2007) and the middle-inflated model of Brooks et al. (2012) by making them suitable for ordinal outcomes that take on negative, zero and positive values, and by allowing them to be past-dependent and autocorrelated.
We detect oscillating switches among the three latent regimes even during a relatively stable period such as Greenspan's tenure, identify three types of status quo decisions, and find that almost half of them are generated in either the easing or the tight policy regime. In an out-of-sample forecasting exercise during the zero lower bound (ZLB) period, the proposed model correctly predicts 53 out of 55 status quo decisions, successfully treating the ZLB via an extended switch to an easing policy regime. By contrast, the Taylor rule and the linear model fail to manage the ZLB and correctly forecast only 34 no-change decisions, wrongly forecasting 21 further cuts.
Regime switching. Central banks may respond asymmetrically to incoming data: rate increases and decreases may be generated by different decision-making processes, and the numerous status quo decisions may likewise be driven by distinct mechanisms. In other words, the policy actions may be generated in different regimes.
A change to the monetary policy regime is often modelled as a permanent, non-stochastic switch: empirical studies divide samples into regime-specific subperiods or employ a gradual regime-switching approach with smoothed transitions between regimes. However, treating regime changes as breaks rather than as a slowly fluctuating stochastic process is largely a matter of convenience and even logically inconsistent, as Cooley et al. (1984) argue. The literature on modeling short-lived oscillating switches in monetary policy demonstrates that dividing samples into regime-specific subperiods can distort qualitative and quantitative inferences, and that monetary policy changes are better modeled as stochastic and repeated fluctuations among two or more regimes; see, among others, Assenmacher-Wesche (2006) in the univariate context, Bikbov and Chernov (2013) and Sims and Zha (2006) in the structural vector autoregression context, and Andolfatto and Gomme (2003), Bianchi (2013), Davig and Leeper (2007) and Baele et al. (2015) in the dynamic stochastic general equilibrium (DSGE) framework.
Despite the increasing popularity of the regime-switching approach in macroeconomics, the existing regime-switching applications in monetary economics focus on models for continuous outcomes. Building on the seminal work of Hamilton (1989), recurring policy regimes are typically modelled as unobserved states generated by a stochastic exogenous Markov-chain process, which is independent of the endogenous economic variables and has constant probabilities of transition from one regime to another. This assumption is realistic in a wide range of contexts but highly unlikely to hold in monetary policy applications. It is natural to assume that central banks react systematically to changes in economic conditions; regime switches are thus endogenous to the state of the economy and have time-varying transition probabilities.
We model this possibility by a regime-switching approach with time-varying probabilities of transition among three latent regimes. This allows the probabilities of positive, negative and no changes to be treated differently and to be asymmetrically affected by the economic data. The structure of the new model shares features with the Markov switching models with constant transition probabilities introduced by Hamilton (1989) in the framework of linear regressions for a continuous dependent variable, later extended by Diebold et al. (1994), Filardo (1994) and Bazzi et al. (2017) to the case when the transition probabilities may change over time. We generalize these endogenous switching models from continuous to discrete outcomes.
Interest rate smoothing. The policy interest rates are persistent; this stylized fact is usually referred to as interest rate smoothing. If central banks decide to change the rate, they will most likely move it again in the same direction several times at the next meetings, avoiding frequent rate reversals; this feature is referred to as monetary policy inertia. We address these two stylized facts by introducing the lagged latent dependent variable into the covariates, adopting the autoregressive ordered probit (AOP) model of Müller and Czado (2005) in each latent equation. This allows for a partial adjustment of interest rates and, together with the lagged covariates, can capture the autocorrelation of latent monetary shocks.
Dynamic single-equation ordered probit models have been developed and applied to policy interest rates by, among others, Davutyan and Parke (1995), Dueker (1999), Eichengreen et al. (1985), Hu and Phillips (2004), Kim et al. (2009), and Monokroussos (2011). Hamilton and Jorda (2002) developed a dynamic two-equation ordered probit model, in which the first-stage binary decision (change or no change) is determined by the autoregressive conditional hazard model, and the magnitude of rate changes is determined by the ordered probit model conditional on a change at the first stage. Grammig and Kehrle (2008) modified this model by implementing the autoregressive conditional multinomial model of Russell and Engle (2005) at the second stage. Each stage in these models is estimated separately, and status quo observations are not included in the estimation of the second stage. We further extend these two-part models by implementing at the first stage a trichotomous decision on the latent policy regime (easing, neutral or tight), which seems more realistic than a binary decision (change or no change): the policymakers who are determined to change the rate have already chosen the direction of change. Furthermore, we estimate both stages simultaneously and do not exclude no-change outcomes from the second stage. This allows us to discriminate among different types of status quo decisions.
In the next section we describe the proposed Switching Autoregressive Ordered Probit (SwAOP) model. We discuss the estimation and inference in Section 3. The estimation of such a dynamic three-equation ordered probit model is a daunting computational challenge: it requires the evaluation of multiple integrals with no closed-form solution. We opt for a Bayesian approach using Markov chain Monte Carlo (MCMC) methods (a Gibbs sampler), which makes estimation computationally feasible and requires no numerical optimization and no high-dimensional integration. In Section 4 we study the finite-sample performance of the SwAOP estimator under both its own DGP and the AOP one. In Section 5 we discuss the empirical application to the FOMC decisions on the target. Section 6 concludes. Supporting materials are provided in Appendix A and Online Appendix B.

Model
The SwAOP model is designed for changes in ordinal variables that are subject to regime switching or sample separation and have abundant and heterogeneous observations in a middle neutral category (typically, a zero or no change). We illustrate the model in the context of monetary authority decisions on the policy interest rate. The observed dependent variable is the change to the policy rate, y_t = ỹ_t − ỹ_{t−1}, where ỹ_t is the level of the rate set at meeting t = 1, 2, ..., T; y_t takes on a finite number of discrete ordered values j (typically, in steps of 25 basis points (bp)) coded as −J⁻, ..., −1, 0, 1, ..., J⁺. We mark the latent (unobserved) variables by an asterisk (*). Fig. 2 shows the decision tree of the model.
The model assumes three latent regimes, which are observed only partially, namely, only if the observed outcome is a change; if no change is observed, the regime is unknown. The regimes are determined by the continuous latent variable r^{0*}_t, representing the degree of the central bank's policy stance, according to the latent regime decision

r^{0*}_t = β^{0′} x^0_t + φ^0 r^{0*}_{t−1} + ε^0_t,  (1)

where β^0 is a vector of k^0 unknown slope parameters, x^0_t is the t-th column of the observed k^0 × T data matrix X^0, φ^0 is an unknown autoregressive (AR) parameter, and ε^0_t is an error term that is independently and identically distributed (IID) according to the standard normal cumulative distribution function (CDF).
The observed discrete change y_t is conditional on a latent discrete variable s*_t (coded as −1, 0, or 1 if the central bank policy regime is easing, neutral or tight, respectively). Both s*_t and y_t are determined in an ordered probit fashion: s*_t by the regime decision (1), and y_t by the latent amount decisions in the easing and tight regimes, respectively,

r^{−*}_t = β^{−′} x^−_t + φ^− r^{−*}_{t−1} + ε^−_t  and  r^{+*}_t = β^{+′} x^+_t + φ^+ r^{+*}_{t−1} + ε^+_t,  (2)

where β^− and β^+ are the vectors of k^− and k^+ unknown slope parameters, x^−_t and x^+_t are the t-th columns of the observed k^− × T and k^+ × T data matrices X^− and X^+, φ^− and φ^+ are the unknown AR parameters, and ε^−_t and ε^+_t are the IID error terms with the standard normal CDF. The errors ε^0_t, ε^−_t and ε^+_t are mutually independent. Henceforth, the superscript indexes '0', '−' and '+' refer to the regime decision (1) and the two amount decisions (2) conditional on the easing and tight regimes, respectively.
The SwAOP model can be summarized as follows: if s*_t = 0, then y_t = 0; if s*_t = −1 (easing), the outcome y_t ∈ {−J⁻, ..., 0} is determined by r^{−*}_t and the cutpoints c^−; if s*_t = 1 (tight), the outcome y_t ∈ {0, ..., J⁺} is determined by r^{+*}_t and the cutpoints c^+. The probabilities to observe the outcome j are then given by

Pr(y_t = j) = Pr(s*_t = −1) Pr(y_t = j | s*_t = −1) I_{j≤0} + Pr(s*_t = 0) I_{j=0} + Pr(s*_t = 1) Pr(y_t = j | s*_t = 1) I_{j≥0},  (3)

where I_{j≤0}, I_{j=0} and I_{j≥0} are the indicator functions such that: I_{j≤0} = 1 if j ≤ 0 and I_{j≤0} = 0 otherwise; I_{j=0} = 1 if j = 0 and I_{j=0} = 0 otherwise; and I_{j≥0} = 1 if j ≥ 0 and I_{j≥0} = 0 otherwise. As is common in latent-class models of discrete choice, the parameters are identified only jointly. The slope and autoregressive parameters are identified jointly with the variance of the error term; in particular, only the ratios β^i/σ^i and φ^i/σ^i are identified. The cutpoint parameters are identified jointly with the intercept parameter and the variance of the error term; in particular, only the cutpoints net of the intercept and divided by σ^i are identified. To identify the parameters we fix the variances of ε^0_t, ε^−_t and ε^+_t to one, and the intercept components of β^0, β^− and β^+ to zero. These identifying assumptions are arbitrary and standard in discrete-choice modeling. The probabilities in (3) are fully identifiable and invariant to the choice of parameter-identifying assumptions.
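The three-part probability structure in (3) can be sketched numerically. The following Python sketch is illustrative only: the function and variable names, and the example cutpoint values, are our own, and the five-category case (j from −2 to 2) mirrors the empirical application rather than the general model.

```python
from math import erf, sqrt

def norm_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def ordered_probs(mu, cuts):
    # Ordered-probit category probabilities for latent mean mu and sorted cutpoints.
    cdf = [norm_cdf(c - mu) for c in cuts]
    edges = [0.0] + cdf + [1.0]
    return [edges[k + 1] - edges[k] for k in range(len(edges) - 1)]

def swaop_probs(mu0, cuts0, mu_minus, cuts_minus, mu_plus, cuts_plus):
    # Regime probabilities: easing (s = -1), neutral (s = 0), tight (s = +1).
    p_easing, p_neutral, p_tight = ordered_probs(mu0, cuts0)
    # Amount decisions: the easing regime yields outcomes j <= 0,
    # the tight regime yields outcomes j >= 0, as in Eq. (3).
    amt_minus = ordered_probs(mu_minus, cuts_minus)   # j = -2, -1, 0
    amt_plus = ordered_probs(mu_plus, cuts_plus)      # j =  0,  1, 2
    return {
        -2: p_easing * amt_minus[0],
        -1: p_easing * amt_minus[1],
        0: p_easing * amt_minus[2] + p_neutral + p_tight * amt_plus[0],
        1: p_tight * amt_plus[1],
        2: p_tight * amt_plus[2],
    }
```

By construction the five outcome probabilities sum to one, and the no-change probability receives contributions from all three regimes, which is what allows the model to discriminate among different types of status quo decisions.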
The marginal effect (ME) of a covariate x_m on the probabilities is the partial derivative of the probabilities with respect to that covariate, ME_{m,j,t} := ∂Pr(y_t = j)/∂x_{t,m}, keeping all other parameters and covariates constant. The ME involves f, the standard normal probability density function (PDF), and β^{i,all}, the vector of slope coefficients in the latent equation i on all covariates in the model (in X^0, X^− and X^+), with β^{i,all}_m = 0 if the covariate m does not appear in the equation i. We compute the MEs and probabilities at the empirical medians of the covariates and the theoretical medians of the lagged dependent latent variables.
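A finite-difference check is a simple way to validate analytical MEs. The sketch below uses a hypothetical single-covariate ordered probit (not the full SwAOP specification) and compares a central difference of Pr(y = j) with the analytical derivative −β[f(c_j − βx) − f(c_{j−1} − βx)]; all names are our own.

```python
from math import erf, exp, pi, sqrt

def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def norm_pdf(z):
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def probit_probs(beta, cuts, x):
    # Ordered-probit probabilities for a single covariate x with slope beta.
    mu = beta * x
    cdf = [norm_cdf(c - mu) for c in cuts]
    edges = [0.0] + cdf + [1.0]
    return [edges[k + 1] - edges[k] for k in range(len(edges) - 1)]

def finite_diff_me(beta, cuts, x, j, h=1e-6):
    # Central finite difference of Pr(y = j) with respect to the covariate.
    up = probit_probs(beta, cuts, x + h)[j]
    dn = probit_probs(beta, cuts, x - h)[j]
    return (up - dn) / (2.0 * h)
```

For the lowest category, Pr(y = 0) = Φ(c_1 − βx), so the analytical ME is −β f(c_1 − βx), and the numerical difference should agree to high precision.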

Joint Distribution
For parameter estimation through a Gibbs sampler we need to derive the model's joint distribution. Let R* denote the (T + 1) × 3 matrix whose rows correspond to t = 0, 1, ..., T and whose three columns are r^{0*}, r^{−*} and r^{+*}; let y denote the vector of dependent variables (y_1, ..., y_T); and let θ denote the vector of model parameters (β, φ, c). The joint density for R*, y and θ can be factorized as the prior π(θ) times the product over t of the conditional probability of y_t given r*_t and θ, and the transition density of r*_t given r*_{t−1} and θ; by the Markov property, each factor depends on the latent path only through the adjacent values. The conditional probability of y_t is a sum of indicators of whether the r*_t lie between the corresponding cutpoints, as in (3). The transition density of r*_t is the product of the three normal densities implied by the latent AR Eqs. (1) and (2). Finally, we have to specify a prior π. We choose a uniform prior on the real numbers R, i.e. an improper prior, for almost all parameters. The only restriction we have to impose is φ^i ∈ (−1, 1) to ensure stationarity of the latent AR Eqs. (1) and (2). Besides, we choose the initial latent variables to be distributed as r^{i*}_0 ∼ N(0, τ² = 10²) for numerical reasons that will become apparent later. Finally, each pair of cutpoints is uniformly distributed only on the subset of R² where the ordering c^i_1 < c^i_2 holds.
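The Markov factorization of the latent path can be illustrated for a single latent AR(1) equation. The sketch below is our own simplification to one equation: mu_fn stands in for the regression part β′x_t, and the function accumulates the log prior density of r_0 under N(0, τ²) and the log transition densities with unit innovation variance.

```python
from math import log, pi

def latent_log_density(r, mu_fn, phi, tau=10.0):
    # log p(r_0, ..., r_T) = log N(r_0; 0, tau^2)
    #   + sum over t of log N(r_t; mu_fn(t) + phi * r_{t-1}, 1).
    def log_norm(x, m, var):
        return -0.5 * (log(2.0 * pi * var) + (x - m) ** 2 / var)

    total = log_norm(r[0], 0.0, tau * tau)       # diffuse initial condition
    for t in range(1, len(r)):
        total += log_norm(r[t], mu_fn(t) + phi * r[t - 1], 1.0)
    return total
```

In the full model this term appears three times (once per latent decision i) and is multiplied by the observation indicators and the prior on θ.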

Challenges to embedding the model into a New Keynesian (NK) framework
Embedding the proposed SwAOP model into an NK DSGE framework would require substantial modifications and would cause serious computational issues.
The SwAOP model is designed for changes to the interest rates; it represents a policy rule in differences. The monetary policy equations in the NK DSGE models are formulated in terms of the levels of interest rates; typically they are represented by a Taylor-type rule. While the SwAOP model can be specified with the Taylor rule's variables (and its underlying equations for the latent continuous variables are linear and can be interpreted as optimal policy rules in differences, derived from minimizing the central bank's quadratic loss function), the NK model would need to be reformulated in terms of changes, not levels, of the policy interest rate. The solution methods for the NK model would need to be adapted to allow for only discrete, rather than continuous, changes in the policy rates. The NK modeler would need to go far beyond the conventional log-linearization around a steady state and use complicated statistical inference for the NK agents, who do not observe the optimal interest rate. The technical problems with the computation of the equilibria, caused by the discreteness and nonlinearity of the policy rule, would be exacerbated by the unobserved switching among three latent regimes.

Estimation and inference
While the classical (not simulation-based) statistical approaches, such as maximum likelihood or the method of moments, will not work for this model, we can estimate the parameters and choice probabilities using a Gibbs sampler based on the MCMC algorithm for the AOP model with a grouped move step that drastically accelerates the convergence of the Gibbs sampler (Müller and Czado, 2005). The structure of the Gibbs sampler is as follows:
• Update for the latent variables: Sample the latent dependent variables R* from the corresponding truncated normal distributions.
• Update for the cutpoint parameters: Sample the cutpoint parameters c from the uniform distributions.
• Update for the slope parameters: Sample the slope parameters β and the AR parameters φ from the multivariate normal distributions.
• Grouped move step: Sample a scale parameter from a Gamma distribution to rescale the parameters c, β and φ and the latent variables R*.
We impose no assumptions on the signs of the slope or AR parameters. Without loss of generality, and to match the data on the changes to the target in our empirical application, we restrict our inference to the case of five outcome categories j ∈ {−2, −1, 0, 1, 2} of the dependent variable y_t, with two unknown cutpoints c^i_1 < c^i_2 in each latent decision. Initial values for the Gibbs sampler are chosen for all three latent decisions i ∈ {0, −, +}. We will now derive all full conditional densities from the joint distribution, exploiting the fact that each full conditional density is proportional to the joint distribution.
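To fix ideas, here is a deliberately stripped-down Gibbs sampler in the Albert-Chib style for a static three-category ordered probit with one covariate. It is a toy illustration of the update cycle only: it omits the AR terms, the regime switching and the grouped move, so it is not the SwAOP sampler; all names and the truncation bound of ±9 are our own choices.

```python
import random
from statistics import NormalDist

N = NormalDist()  # standard normal

def sample_trunc_norm(rng, mu, lo, hi):
    # Inverse-CDF sampling from N(mu, 1) truncated to (lo, hi).
    a, b = N.cdf(lo - mu), N.cdf(hi - mu)
    u = a + (b - a) * rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # guard against boundary values
    return mu + N.inv_cdf(u)

def gibbs_ordered_probit(y, x, n_iter=500, seed=0):
    # Toy Gibbs sampler: categories y_t in {0, 1, 2}, one covariate,
    # two cutpoints, flat priors, unit error variance.
    rng = random.Random(seed)
    T = len(y)
    beta, c1, c2 = 0.0, -1.0, 1.0
    draws = []
    for _ in range(n_iter):
        # 1) Latent variables: truncated normal given the category of y_t.
        z = []
        for t in range(T):
            lo, hi = ((-9.0, c1), (c1, c2), (c2, 9.0))[y[t]]
            z.append(sample_trunc_norm(rng, beta * x[t], lo, hi))
        # 2) Slope: normal with OLS-type mean under unit error variance.
        sxx = sum(xi * xi for xi in x)
        mean = sum(xi * zi for xi, zi in zip(x, z)) / sxx
        beta = rng.gauss(mean, sxx ** -0.5)
        # 3) Cutpoints: uniform between the neighbouring latent variables.
        c1 = rng.uniform(max(z[t] for t in range(T) if y[t] == 0),
                         min(z[t] for t in range(T) if y[t] == 1))
        c2 = rng.uniform(max(z[t] for t in range(T) if y[t] == 1),
                         min(z[t] for t in range(T) if y[t] == 2))
        draws.append((beta, c1, c2))
    return draws
```

Note that the cutpoints can move only within the narrow gaps between neighbouring latent draws, so they mix slowly; this is precisely the problem that the grouped move step addresses.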

Update for the latent variables
For convenience we use the abbreviation rem for all other remaining parameters and latent variables. For instance, the full conditional for the latent variable r^{0*}_t will be written as p(r^{0*}_t | rem). For an interior point t, the full conditional of the latent variable r^{i*}_t is a normal density with variance (σ^i)² = (1 + (φ^i)²)^{−1} and mean (σ^i)² [β^{i′}x^i_t + φ^i r^{i*}_{t−1} + φ^i (r^{i*}_{t+1} − β^{i′}x^i_{t+1})], truncated if the observation is influenced by r^{i*}_t (this is not always the case: e.g., if y_t = 1 or 2, r^{−*}_t can be arbitrary). The truncation intervals for the regime decision r^{0*}_t, and for the amount decisions in the dovish and hawkish regimes, are bounded by the cutpoints consistent with the observed outcome y_t.
The updates for r^{i*}_0 and r^{i*}_T differ slightly due to a missing predecessor or successor. In addition, the density of r^{i*}_0 is never truncated, since there are no observations for it; it is normal with variance (σ^i_0)² := ((φ^i)² + τ^{−2})^{−1} and mean (σ^i_0)² φ^i (r^{i*}_1 − β^{i′}x^i_1), where the prior hyperparameter τ = 10 is chosen to avoid a large variance (σ^i_0)² for parameters φ^i close to zero. Finally, r^{i*}_T has a (truncated) normal full conditional with mean β^{i′}x^i_T + φ^i r^{i*}_{T−1} and unit variance, since no successor term enters. The sampling of the 3(T + 1) latent variables R* in this section constitutes the bottleneck of the whole Gibbs sampler. To speed up the sampler, every second step can be sampled simultaneously, since r*_t depends only on r*_{t−1} and r*_{t+1}.
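The interior-point full conditional can be checked numerically. The sketch below (a single latent equation with unit innovation variance; the notation is our own) computes the mean and variance obtained by combining the predecessor factor N(r_t; b′x_t + φ r_{t−1}, 1) with the successor factor N(r_{t+1}; b′x_{t+1} + φ r_t, 1).

```python
from math import exp

def ar_full_conditional(phi, mean_t, resid_next):
    # Combining the two normal factors in r_t gives precision 1 + phi^2 and
    # mean (mean_t + phi * resid_next) / (1 + phi^2), where
    # mean_t = b'x_t + phi * r_{t-1} and resid_next = r_{t+1} - b'x_{t+1}.
    prec = 1.0 + phi * phi
    return (mean_t + phi * resid_next) / prec, 1.0 / prec
```

A brute-force check integrates the unnormalized product of the two factors on a grid and compares the grid posterior mean with the closed-form mean.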

Update for the slope parameters
We denote the parameters on all regressors in each latent decision i ∈ {0, −, +} by B^i := (β^i, φ^i)′ and all regressors in each latent equation by a T × (k^i + 1) matrix Z^i, the t-th row of which is z^{i′}_t := (x^{i′}_t, r^{i*}_{t−1}), so that the last column of Z^i stacks the lagged latent variables (r^{i*}_0, ..., r^{i*}_{T−1}). Note that z^i_t z^{i′}_t is not a scalar but a dyadic product. Under the flat prior and unit error variance, all B^i must then be sampled independently from the multivariate normal distributions with covariance matrix (Z^{i′}Z^i)^{−1} and mean (Z^{i′}Z^i)^{−1} Z^{i′} r^{i*}.
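For the one-covariate case, B^i = (β^i, φ^i)′ has only two components and the posterior moments can be written out explicitly. The sketch below is a hypothetical two-regressor illustration under a flat prior and unit error variance (our own names); it computes the mean (Z′Z)^{−1}Z′r and the covariance (Z′Z)^{−1} with an explicit 2×2 inverse.

```python
def posterior_moments_2d(Z, r):
    # Posterior of B = (beta, phi)': B | rem ~ N((Z'Z)^{-1} Z'r, (Z'Z)^{-1}),
    # where row t of Z is (x_t, r*_{t-1}) and r stacks the current r*_t.
    a = sum(z[0] * z[0] for z in Z)
    b = sum(z[0] * z[1] for z in Z)
    d = sum(z[1] * z[1] for z in Z)
    det = a * d - b * b
    cov = [[d / det, -b / det], [-b / det, a / det]]   # (Z'Z)^{-1}
    ztr = [sum(Z[t][0] * r[t] for t in range(len(r))),
           sum(Z[t][1] * r[t] for t in range(len(r)))]
    mean = [cov[0][0] * ztr[0] + cov[0][1] * ztr[1],
            cov[1][0] * ztr[0] + cov[1][1] * ztr[1]]
    return mean, cov
```

When the stacked latent values satisfy an exact linear relation r = Z B, the posterior mean recovers B exactly, which gives a simple deterministic check.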

Update for the cutpoint parameters
The full conditional density of each cutpoint c^i_m is uniform on an interval [l^i_m, u^i_m], where the lower bound l^i_m is the largest value that must remain below the cutpoint (the maximum of the latent variables in the lower adjacent category and, for c^i_2, the other cutpoint c^i_1), and the upper bound u^i_m is the smallest value that must remain above it. These restrictions do not allow the cutpoint parameters to move substantially within each Gibbs step and slow down the convergence of the sampler.
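The admissible interval for a cutpoint is pinned between neighbouring latent order statistics, and its width shrinks roughly at rate 1/T as the sample grows, which is why the plain cutpoint update mixes slowly. The sketch below (our own toy setup with standard normal latent values and a single cutpoint) computes this interval and illustrates the shrinkage.

```python
import random

def cutpoint_interval(z, y, lower_cat, upper_cat):
    # The cutpoint must stay above every latent value in the lower category
    # and below every latent value in the upper category.
    lo = max(z[t] for t in range(len(z)) if y[t] == lower_cat)
    hi = min(z[t] for t in range(len(z)) if y[t] == upper_cat)
    return lo, hi

def interval_width(T, cut=0.0, seed=0):
    # Width of the admissible interval when latent values are standard
    # normal and the two categories are defined by the true cutpoint.
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(T)]
    y = [0 if v < cut else 1 for v in z]
    lo, hi = cutpoint_interval(z, y, 0, 1)
    return hi - lo
```

With a few dozen observations the interval is sizeable, while with tens of thousands it is tiny, so the cutpoint performs an extremely slow random walk within each Gibbs step.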

Grouped move step
The problem with the slowly moving cutpoints was solved for the AOP model by using grouped move steps (Müller and Czado, 2005). To develop a suitable grouped move step for the SwAOP model, we employ a theorem from Liu and Sabatti (2000) that states: if Γ is a locally compact group of transformations defined on the sample space S, L is its left Haar measure with density l, ω ∈ S follows a distribution with density f, and γ ∈ Γ is drawn from the density f_ω(γ) ∝ f(γ(ω)) |J_γ(ω)| l(γ), where |J| is the Jacobian determinant of the transformation, then ω* := γ(ω) has the density f too. To apply the theorem to our case (where the invariant density f in the theorem is the posterior), we write all latent variables and parameters into one vector ω = (r^{0*}, r^{−*}, r^{+*}, θ) and exploit the fact that the joint density is the product of the normal densities of the error terms ε^i_t = r^{i*}_t − β^{i′}x^i_t − φ^i r^{i*}_{t−1} and the product I of the indicator functions in (4), which equals 1 if the latent variables, the cutpoints and y match each other and 0 otherwise. If we rescale the latent variables and the cutpoint parameters by the same value g > 0, the value of the indicator function I is unaffected (e.g., r^{0*}_t < c^0_1 ⇔ g r^{0*}_t < g c^0_1). To obtain a simple distribution for the sampling of g we must also rescale the slope parameters to gβ. Therefore, we use the transformation γ_g(ω) = (g r^{0*}, g r^{−*}, g r^{+*}, gβ, gc, φ), in which we rescale everything except the AR parameters φ. The group is, hence, the group of positive real numbers under multiplication {R_{>0}, ·}.
The left Haar measure of this group has the density l(γ_g) = 1/g, and the Jacobian determinant of γ_g is g^p, where p = 3T + k^0 + k^− + k^+ + C is the number of scaled parameters (C = 6 is the number of cutpoint parameters). The density for the scaling parameter g > 0 is then proportional to g^{p−1} exp(−q g²), where we abbreviate q := ½ Σ_i Σ_t (ε^i_t)². Changing variables to g², we obtain the density of a Gamma distribution Γ(a, q) for g², with shape parameter a = p/2 and rate parameter q. The accelerating grouped move step consists of drawing a number g² from this distribution and multiplying the corresponding parameters by g. In fact, because the joint distribution factorizes into three parts for the three decisions i ∈ {0, −, +}, it is even possible to perform three independent grouped moves for the parameters of each decision separately. This leads to an even higher variation and a faster convergence of the Gibbs sampler.
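A single grouped move can be sketched directly. The code below is illustrative: the latent variables, parameters and error terms are supplied by the caller, and note that Python's `random.gammavariate` takes a scale parameter, i.e. the inverse of the rate q.

```python
import random
from math import sqrt

def grouped_move(rng, latents, beta, cuts, errors):
    # Draw g^2 from Gamma(shape = p/2, rate = q), with q = 0.5 * sum of
    # squared current error terms, then rescale the latent variables, slope
    # and cutpoint parameters by g; the AR parameters are left untouched.
    p = len(latents) + len(beta) + len(cuts)       # number of scaled quantities
    q = 0.5 * sum(e * e for e in errors)
    g = sqrt(rng.gammavariate(p / 2.0, 1.0 / q))   # scale = 1 / rate

    def scale(v):
        return [g * u for u in v]

    return scale(latents), scale(beta), scale(cuts), g
```

Because g > 0, all orderings of latent variables relative to cutpoints are preserved, so the indicator I is unaffected, and when the errors behave like standard normals the draws of g² concentrate near one.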
The efficiency of the grouped move can be explained as follows. Since the expectation and variance of the Gamma distribution are E_{Γ(a,q)}[g²] = a/q and Var_{Γ(a,q)}[g²] = a/q², both Gamma distribution parameters are huge numbers in our case. The summands of q are the squared error terms ε^i_t. Assuming both our model and the current Gibbs parameters are true, these error terms are standard normally distributed, so their squares are χ² distributed with expectation 1. Thus, close to the true parameters, we have a ≈ 3T/2 ≈ E[q]. The expectation of g² is then E_{Γ(a,q)}[g²] ≈ 1 and the variance Var_{Γ(a,q)}[g²] ≈ 2/(3T) is small. This means that almost no rescaling happens (all sampled values of g will be close to 1) if the algorithm has almost converged. Far from convergence, e.g. for ω ≈ (λr^{0*}, λr^{−*}, λr^{+*}, λβ, λc, φ) = γ_λ(ω_likely), the error terms are scaled by λ, so q ≈ λ² · 3T/2 and E[g²] ≈ 1/λ². Thus, the variance is still small and a g close to E[g] ≈ 1/λ will be sampled, so the next (scaled) parameters γ_g(ω) = gω ≈ ω_likely will be closer to the true parameters.

Finite sample performance
We performed Monte Carlo experiments to assess the finite-sample bias and uncertainty of the estimates of the parameters, choice probabilities and MEs of covariates on choice probabilities (and their asymptotic standard errors) in the proposed SwAOP estimator, and to compare the performance of the AOP and SwAOP estimators under each of the two true DGPs.
The design and results of the Monte Carlo experiments are reported in Appendix A. The results support the asymptotic consistency of the employed Bayesian estimator, which also performs well in small samples. The experiments suggest that the single-equation AOP model delivers asymptotically biased estimates of the probabilities if the DGP is subject to the heterogeneity of status quo outcomes. The SwAOP estimator behaves much better under the AOP true DGP than the AOP estimator does under data generated by the SwAOP model (although the AOP model is not nested in the SwAOP model). When the SwAOP model is fitted to data from the AOP model, the accuracy of the SwAOP estimator improves as the sample size grows. However, when the AOP model is fitted to data from the SwAOP model, the accuracy of the AOP estimator deteriorates as the sample size grows.

Empirical application
We apply the SwAOP model to explain the FOMC decisions on the target during the 7/1987-1/2006 period under Greenspan's chairmanship, and to predict the target decisions out of sample for the 3/2006-6/2019 period. We contrast the in- and out-of-sample performance of the SwAOP model with those of the AOP model and the linear model estimated by ordinary least squares (OLS) using the same set of explanatory variables, and also with that of the Taylor-type rule.

Data and model specifications
The target, a principal measure of U.S. monetary policy during the entire sample (de facto since the fall of 1982; see Thornton, 2005), is set by the FOMC either at the eight prescheduled meetings per year or sometimes at extra (unscheduled) meetings. The FOMC decision-making cycle of about one and a half months between meetings is long enough to update the macroeconomic data that are targeted by the Fed (such as inflation, output and unemployment) and arrive at a monthly frequency, and to accumulate enough of the data that arrive weekly or daily (the Fed, which faces uncertainty and incurs costs in the case of a rate reversal, prefers to "wait and see" and to react to more accumulated economic data to minimize the risk of reversals).
From 1993 through 2019, there were seven target changes in the intermeeting periods, all caused by extraordinary circumstances or crises. Among the 114 FOMC decisions from 11/1992 to 1/2006 there were only seven intermeeting decisions (of which only five were target changes: in 1994, 1998, and 2001). From 2/2006 through 12/2019, there were only two intermeeting moves (both in 2008).
Prior to 1993, most target changes implemented two weeks or even later after a scheduled meeting were unambiguously settled at that meeting, with a delayed implementation. In fact, prior to 1993 more than two thirds of target changes were executed between the scheduled FOMC meetings. Out of the 76 FOMC decisions from 7/1987 to 10/1992, 33 decisions (including 29 changes) were made at unscheduled meetings and teleconferences, and 43 decisions (including only 12 changes) were made at scheduled meetings.
The Fed changed its operating procedure after 1992: it not only stopped making hidden policy decisions at the scheduled meetings (to be implemented one, two or three weeks later) but also drastically reduced (and eventually abandoned) its practice of holding unscheduled meetings absent extraordinary circumstances. After 1992, even if a change was desirable during an intermeeting period, the FOMC typically made no change between the meetings and waited until the next scheduled meeting (unless a rare extraordinary event called for an immediate reaction). The observed policy inactions in the intermeeting periods may thus reflect not the Fed's actual response to economic developments, but rather its desire to obtain more accumulated data and its reluctance to change the target between the scheduled meetings.
To model the target, instead of using the quarterly or monthly averages of the federal funds rate and economic variables, as is common practice in the literature, we employ the data at the frequency of FOMC decisions, both scheduled and unscheduled. We use the historical dates of all FOMC decisions (made either at scheduled or unscheduled meetings, or occasionally at the discretion of the chairman during intermeeting periods) as sample observations, with the following advantages.
First, we avoid the problem of reverse causation typical of time-aggregated data. Second, we avoid noise from periods with no movements in the target, when the observed policy inactions may reflect not the Fed's actual response to economic developments but rather its reluctance to change the target between scheduled meetings, especially in the weeks just before them. Third, information about the exact timing of FOMC meetings, together with the use of daily financial data, allows us to substantially improve the identification of the Fed's policy rule. We match the FOMC decisions with the latest real-time vintages of monthly macroeconomic and non-aggregated daily financial data that were truly available before each meeting and not revised later on.
Omitting the unscheduled meetings (and hence ignoring important information on the FOMC's behavior), interpreting the policy inactions between the scheduled meetings as implicit decisions not to adjust the rate (and perhaps even augmenting the sample with these hypothetical intermeeting no-change decisions), or, worse, using a monthly, weekly or daily frequency to model the FOMC decisions would distort the statistical inference about the Fed's policy responses.
The dates of FOMC decisions and the original (unconsolidated) target changes are reported in a table in the Online Appendix B, and are based on information available at ALFRED. The observed target decisions have the following distribution: −50 bp, 13 times; −31.25 bp, once; −25 bp, 30 times; −18.75 bp, once; −12.5 bp, once; 0 bp, 99 times; 6.25 bp, three times; 12.5 bp, twice; 18.75 bp, three times; 25 bp, 27 times; 31.25 bp, twice; 43.75 bp, twice; 50 bp, five times; and 75 bp, once. Since October 1989, the Fed has made changes to the target only in increments of 25 bp. We classify target changes into five categories of the dependent variable $y_t$: a 'large cut' (a decrease of 31.25 bp or more), a 'small cut' (a decrease of between 12.5 and 25 bp), 'no change' (either no change or a change of no more than 6.25 bp), a 'small hike' (an increase of between 12.5 and 25 bp), and a 'large hike' (an increase of 31.25 bp or more). The sample consists of 190 observations, with 14, 32, 102, 32 and 10 observations in the above categories, respectively.
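The five-way classification can be sketched mechanically. The following is an illustrative sketch (not the authors' code) that maps a target change in basis points to the ordinal categories and verifies that the distribution reported above aggregates to the stated category counts:

```python
# Illustrative sketch: classify a target change (in basis points) into the
# paper's five ordered categories.
def classify_change(bp):
    """Return the ordinal category y_t for a target change of `bp` basis points."""
    if bp <= -31.25:
        return "large cut"
    elif bp <= -12.5:
        return "small cut"
    elif bp < 12.5:          # includes 0 and changes of at most 6.25 bp
        return "no change"
    elif bp < 31.25:
        return "small hike"
    else:
        return "large hike"

# The distribution reported in the text: change size (bp) -> frequency
counts = {-50: 13, -31.25: 1, -25: 30, -18.75: 1, -12.5: 1, 0: 99,
          6.25: 3, 12.5: 2, 18.75: 3, 25: 27, 31.25: 2, 43.75: 2, 50: 5, 75: 1}

totals = {}
for bp, n in counts.items():
    cat = classify_change(bp)
    totals[cat] = totals.get(cat, 0) + n

print(totals)
# -> {'large cut': 14, 'small cut': 32, 'no change': 102, 'small hike': 32, 'large hike': 10}
```

The category counts reproduce the 14, 32, 102, 32 and 10 observations reported in the text, summing to the 190 sample observations.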
The relationship between economic developments and the Fed's response to them is often modeled using a simple policy rule such as the Taylor rule (Taylor, 1993), which establishes a simple linear relation between the policy interest rate, inflation and the output gap:

$$y^{eff}_t = r^* + \pi_t + 0.5(\pi_t - \pi^*) + 0.5\,gap_t,$$

where $y^{eff}_t$ is the effective federal funds rate, $r^*$ the equilibrium real interest rate, $\pi_t$ the inflation rate over the previous four quarters, $\pi^*$ the long-run inflation target, and $gap_t$ the output gap (the percent deviation of the real gross domestic product (GDP) from the potential one). Unfortunately, the equilibrium rate and potential output are not observed. To provide a better benchmark for the target decisions, we estimate a more general version of the Taylor rule (with interest-rate smoothing) using OLS and publicly available real-time data:

$$y_t = \alpha_0 + \alpha_1 y_{t-1} + \alpha_2 \pi_t + \alpha_3\, gap_t + \varepsilon_t,$$

where $\pi_t$ is the real-time forecast of the core inflation rate for the current quarter, seasonally adjusted, in annualized percentage points (before 1/2000, the Greenbook projection for the quarter-over-quarter core consumer price index inflation rate; from 1/2000 through 12/2013, the Greenbook projection for the quarter-over-quarter core personal consumption expenditures (PCE) inflation rate, chain weight; since 1/2014, the median forecast of the annualized quarter-over-quarter percent change in the core PCE inflation rate for the current quarter from the Survey of Professional Forecasters (SPF) provided by the Philadelphia Fed); $gap_t$ is the real-time forecast of the output gap, the difference between actual and potential output expressed as a percent of potential output, for the current quarter (before 1/2014, the Greenbook projections; since 1/2014, projections derived from the Congressional Budget Office's estimates of potential real GDP and the SPF's median forecasts of real GDP, seasonally adjusted); and $\alpha_0$, $\alpha_1$, $\alpha_2$ and $\alpha_3$ are the unknown OLS parameters to be estimated. The real-time data on $y_t$, $y_{t-1}$, $\pi_t$ and $gap_t$ are retrieved from the Philadelphia Fed's RTDSM and ALFRED. The nowcasts of $\pi_t$ and $gap_t$ for the current quarter provided similar or better explanatory power than the one-, two-, three- and four-quarter-ahead forecasts.
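Estimating a smoothed Taylor-type rule of this form is a routine OLS exercise. The following sketch illustrates it on synthetic stand-ins for the real-time series (the parameter values and series here are made up for illustration; the paper uses real-time Greenbook/SPF data instead):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the real-time series (illustration only)
T = 190
pi = 2.0 + 0.5 * rng.standard_normal(T)   # inflation nowcast, %
gap = rng.standard_normal(T)              # output-gap nowcast, %
y = np.empty(T)
y[0] = 5.0
for t in range(1, T):                     # data generated from a known smoothed rule
    y[t] = (0.1 + 0.9 * y[t-1] + 0.15 * pi[t] + 0.1 * gap[t]
            + 0.05 * rng.standard_normal())

# OLS for y_t = a0 + a1*y_{t-1} + a2*pi_t + a3*gap_t + e_t
X = np.column_stack([np.ones(T - 1), y[:-1], pi[1:], gap[1:]])
alpha, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
print(np.round(alpha, 2))  # should be close to the true (0.10, 0.90, 0.15, 0.10)

# Implied long-run inflation response: a2 / (1 - a1)
print(round(float(alpha[2] / (1 - alpha[1])), 2))
```

With a smoothing coefficient $\alpha_1$ near one, the short-run responses $\alpha_2$ and $\alpha_3$ are small even when the implied long-run responses $\alpha_2/(1-\alpha_1)$ and $\alpha_3/(1-\alpha_1)$ are large, which is why the smoothing term matters for the benchmark.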
Recent studies, using discrete-choice models for the target, document that financial indicators, such as the spread between long- and short-term interest rates, explain the Fed policy decisions better than the Taylor-rule variables (Hamilton and Jorda, 2002; Kauppi, 2012; Piazzesi, 2005). The FOMC always starts its meetings with a review of the 'financial outlook'. The spread can be interpreted as a market-based proxy of future inflation and real activity (Estrella and Hardouvelis, 1991; Frankel and Lown, 1994; Mishkin, 1990). In addition to the spread, we employ two forward-looking indicators of the economic situation computed by Fed staff in the Greenbook for each FOMC meeting: the forecast of housing starts (one of the leading indicators of economic activity, frequently mentioned in the FOMC documents) and the forecast of GDP growth (economic growth is one of the four goals of the Fed's monetary policy). Finally, we construct a Fed-based proxy of future inflation and real activity, derived from the FOMC documents. Together with its decision on the target, the FOMC issued (before 2000) a statement about its expectations of the future stance of monetary policy and (since 2000) a statement on the balance of risks for inflation and economic growth. These post-meeting statements have been widely seen as indicators of the next FOMC actions, and are shown to contain predictive content for forecasting the target even after controlling for macroeconomic variables (Hayo and Neuenkirch, 2010; Lapp and Pearce, 2000; Pakko, 2005). The FOMC statements and interest rate spreads summarize the vast amount of forward-looking information available both to the Fed in setting the policy rate and to Fed watchers in anticipating the Fed decisions, and can help surmount the omitted-variables and time-varying-parameter problems (Cochrane and Piazzesi, 2002).
Thus, the explanatory variables in the linear, AOP and SwAOP models include: (i) $spread_t$, the difference between the one-year treasury constant maturity rate and the effective federal funds rate, three-business-day moving average (real-time data source: ALFRED); (ii) $houstart_t$, the nowcast of the total number of new privately owned housing units started (in thousands) for the current quarter (before 1/2014, the Greenbook projections; since 1/2014, the latest available monthly data on housing starts; real-time data source: ALFRED and RTDSM); (iii) $gdp_t$, the nowcast of the quarter-over-quarter growth in nominal GDP for the current quarter, percent change at annual rate, seasonally adjusted (before 1/1992, the Greenbook projections for the gross national product; from 1/1992 through 12/2013, the Greenbook projections for the GDP; since 1/2014, the New York Fed staff nowcast of the GDP; real-time data source: RTDSM and the New York Fed); and (iv) $tight_{t-1}$ and (v) $easing_{t-1}$, two binary indicators constructed from the 'policy bias' or 'balance-of-risks' statements at the previous FOMC decision announcement: $tight_{t-1}$ equals one if the statement at the previous FOMC meeting was tightening, and zero otherwise; $easing_{t-1}$ equals one if the statement was easing, and zero otherwise (real-time data source: the FOMC statements and minutes). Examples of easing, neutral and tightening FOMC statements are given below.
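The two lagged statement indicators are purely mechanical to construct. A toy sketch with made-up statement labels (not the actual FOMC sequence):

```python
# Toy sketch: build tight_{t-1} and easing_{t-1} from the previous meeting's
# statement. The statement labels below are made up for illustration.
stances = ["neutral", "tightening", "tightening", "easing", "neutral"]

# The first observation has no previous statement in this toy sample,
# so both indicators are set to zero there.
tight_lag = [0] + [1 if s == "tightening" else 0 for s in stances[:-1]]
easing_lag = [0] + [1 if s == "easing" else 0 for s in stances[:-1]]

print(tight_lag)   # -> [0, 0, 1, 1, 0]
print(easing_lag)  # -> [0, 0, 0, 0, 1]
```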
An example of the easing 'balance-of-risks' statement from the FOMC press release on June 25, 2003: The Committee perceives that the upside and downside risks to the attainment of sustainable growth for the next few quarters are roughly equal. In contrast, the probability, though minor, of an unwelcome substantial fall in inflation exceeds that of a pickup in inflation from its already low level. On balance, the Committee believes that the latter concern is likely to predominate for the foreseeable future.
An example of the tightening 'balance-of-risks' statement from the FOMC press release on March 21, 2000: Against the background of its long-run goals of price stability and sustainable economic growth and of the information currently available, the Committee believes the risks are weighted mainly toward conditions that may generate heightened inflation pressures in the foreseeable future.
And an example of the neutral 'balance-of-risks' statement released on May 3, 2005: The Committee perceives that, with appropriate monetary policy action, the upside and downside risks to the attainment of both sustainable growth and price stability should be kept roughly equal.
The values of the dependent and explanatory variables are reported in Table B1 in the Online Appendix B. Sample descriptive statistics are shown in Table B2 in the Online Appendix B. The covariates' first-order autocorrelation coefficients are between 0.63 and 0.98. According to the augmented Dickey-Fuller unit root test, the null hypothesis of a unit root is rejected for all employed variables but one at the 0.0001 significance level (see Table B3 in the Online Appendix B); $houstart_t$ appears stationary only at the 0.06 level. However, if we test the actual housing starts series, which is highly correlated with the Greenbook projections (the correlation coefficient is 0.98) but available for a far longer period at monthly frequency, we reject the null of a unit root at the 0.01 level.
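The logic of the unit root test can be sketched compactly. The example below implements the basic Dickey-Fuller regression by hand on simulated series (an illustration only, with made-up series; the paper's tests are augmented and use the actual data). With an intercept, the 1% critical value of the Dickey-Fuller t-statistic is about −3.43 for large samples:

```python
import numpy as np

rng = np.random.default_rng(1)

def df_tstat(x):
    """t-statistic on rho in the Dickey-Fuller regression dx_t = c + rho*x_{t-1} + e_t."""
    dx, xl = np.diff(x), x[:-1]
    X = np.column_stack([np.ones(len(xl)), xl])
    beta, *_ = np.linalg.lstsq(X, dx, rcond=None)
    e = dx - X @ beta
    s2 = e @ e / (len(dx) - 2)                 # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)          # OLS covariance matrix
    return beta[1] / np.sqrt(cov[1, 1])

T = 500
ar, rw = np.zeros(T), np.zeros(T)
for t in range(1, T):
    ar[t] = 0.7 * ar[t-1] + rng.standard_normal()  # stationary AR(1)
    rw[t] = rw[t-1] + rng.standard_normal()        # random walk (unit root)

# The stationary series should produce a t-statistic far below the 1% critical
# value of about -3.43, while the random walk typically does not.
print(round(float(df_tstat(ar)), 1), round(float(df_tstat(rw)), 1))
```

Under the null of a unit root the statistic follows the nonstandard Dickey-Fuller distribution, which is why the rejection threshold is far below the usual normal critical values.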

Estimation results
Table 1 shows the estimation results for the Taylor rule and the linear, AOP and SwAOP models. The slope parameters in the SwAOP model (see the right three columns of Table 1) have the expected signs and are statistically different from zero at the 0.05 significance level (according to the empirical confidence intervals from the Gibbs sampling and assuming asymptotic normality of the posterior distribution). The latent continuous dependent variable $r^*_{0,t}$, representing the degree of the Fed's policy stance and determining the regime decision, is driven by $tight_{t-1}$ (if the 'policy bias' was tightening at the previous meeting, the probability of a tight regime at the next meeting is larger), $easing_{t-1}$ (if the 'policy bias' was easing, the probability of a tight regime is smaller), $spread_t$, $houstart_t$ and $gdp_t$ (the larger the covariate values, the larger the probability of a tight regime and the smaller the probability of an easing regime), and its lagged value $r^*_{0,t-1}$ (the higher the degree of the policy stance at the previous meeting, the larger the probability of a tight regime at the next meeting). The continuous latent dependent variables $r^*_{-,t}$ and $r^*_{+,t}$, representing the desired amount of the target change in the easing and tight regimes, respectively, are driven by $spread_t$ (only in the easing regime) and $gdp_t$ (the larger the covariate value, the larger the probability of a higher target level), and by their lagged values $r^*_{-,t-1}$ and $r^*_{+,t-1}$, respectively (the larger their values, the smaller the probability of a higher target level). The coefficient on $spread_t$ is not significant in the tight regime. The coefficients on $houstart_t$, $tight_{t-1}$ and $easing_{t-1}$ are not significant in either amount equation.
The responses to the easing and tightening policy statements $easing_{t-1}$ and $tight_{t-1}$ are asymmetric: the Fed seems much more eager to cut the target rate under an easing policy directive than to hike it under a tightening directive. The policy reactions to economic growth and expected inflation (proxied by the interest rate spread) differ in the two regimes: in the tight regime (when the Fed is considering hikes to the target) it reacts more strongly to $gdp_t$, while in the easing regime (when the Fed is considering cuts) it pays more attention to $spread_t$.
An interesting finding is the opposite signs of the coefficients on the lagged dependent variables: the sign is positive in the regime equation (the p-value is 0.0019) but negative in the amount equations (the p-value is 0.0000 in the tight regime and 0.1328 in the easing regime). This implies that the regime and amount decisions have different dynamics. The positive autocorrelation in the regime equation leads to the persistence of regime decisions, whereas the negative autocorrelations in the amount equations mean that the larger the desired change (a cut or a hike) at the previous meeting, the more likely a status quo decision at the next meeting. The Fed seems to deliberately smooth the path of its target: it prefers to wait and see and to avoid making consecutive changes. Such an inference is impossible if we estimate a single-equation AOP or a linear model with the same set of covariates as in the SwAOP model (see the third or second column of Table 1): the AR coefficients on the lagged dependent variable in the linear and AOP models are small and not significant (at the 0.06 level in the linear model and the 0.51 level in the AOP model), implying a lack of intentional interest-rate smoothing by the Fed.
The estimated probabilities of the three policy regimes are shown in Fig. 3 for each FOMC decision during the Greenspan era. The sample average probabilities of the easing, neutral and tight regimes are 0.47, 0.28 and 0.25, respectively, though the frequencies of observed cuts, status quo decisions and hikes are 0.24, 0.54 and 0.22, respectively. Apparently, not all status quo decisions are generated by a neutral policy stance. The decomposition of the probability of no change $\Pr(y_t = 0)$ into the three components $\Pr(y_t = 0 \mid s_t = -1)$, $\Pr(y_t = 0 \mid s_t = 0)$ and $\Pr(y_t = 0 \mid s_t = 1)$, conditional on the easing, neutral and tight regimes, is on average 0.42, 0.54 and 0.05, respectively. The amount decisions tend to smooth the target and moderate or offset the vast majority of the easing and tight policy stances: almost half of the status quo decisions are generated in either the easing or the tight regime (see Table 1).

Table 1
Parameter estimates of the Taylor rule, the linear, AOP and SwAOP models.
Notes. Sample period: 7/1987-1/2006 (190 observations). Standard deviations of parameters are in parentheses. Each FOMC decision in the sample is matched with the real-time values of covariates as they were known at the end of the previous day.

In-and out-of-sample comparison of competing models
The Taylor rule and the three competing models, estimated with the same set of covariates, are contrasted in Table 2. We compare the in-sample fit for the Greenspan era and the out-of-sample one-step-ahead forecasting performance, using recursive re-estimation with an increasing window, for the next 111 FOMC decisions during the 3/2006-6/2019 period. It is a rather challenging forecasting exercise: during the in-sample period the target was always far above zero, whereas out of sample it abruptly moved toward zero, remained stuck close to zero for seven years, and then began gradually moving up (the 1/2016-6/2019 period). We compare the accuracy (the percentage of correct predictions), the mean absolute error (MAE) and two strictly proper scores: the probability, or Brier, score (Brier, 1950) and the ranked probability score (Epstein, 1969). Both scores measure the accuracy of the probabilistic forecasts (in contrast to the percentage of correct predictions, which does not distinguish between cases in which the estimated probability of a particular choice is, for example, 0.51 or 0.99). Both scores attain their minimum value of zero when all the observed choices are forecast with unit probability. In contrast to the Brier score, the ranked probability score punishes forecasts more severely for non-zero predicted probabilities of choices that are further from the observed choice. Probabilities for the AOP and SwAOP models are evaluated numerically as $\Pr(y_t = j \mid \Omega) = \int \Pr(y_t = j \mid \Omega, R^*, \theta)\, d\Pr(R^*, \theta \mid \Omega)$, where $\Omega$ denotes the available data.
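Both scores can be computed directly from the matrix of predicted choice probabilities. A small self-contained sketch with made-up forecasts for three decisions over the five ordered choices:

```python
import numpy as np

# Made-up probabilistic forecasts over the five ordered choices for three
# decisions (each row sums to one); observed choices are indices 0..4.
P = np.array([[0.05, 0.15, 0.60, 0.15, 0.05],
              [0.10, 0.50, 0.30, 0.08, 0.02],
              [0.02, 0.08, 0.30, 0.40, 0.20]])
obs = np.array([2, 1, 4])                      # indices of realized choices

D = np.zeros_like(P)
D[np.arange(len(obs)), obs] = 1.0              # one-hot outcomes d_jt

brier = np.mean(np.sum((P - D) ** 2, axis=1))                              # (1/T) sum_t sum_j [P - d]^2
rps = np.mean(np.sum((np.cumsum(P, 1) - np.cumsum(D, 1)) ** 2, axis=1))    # cumulative version
accuracy = np.mean(P.argmax(1) == obs)                                     # highest-probability choice

print(round(float(brier), 3), round(float(rps), 3), round(float(accuracy), 3))
# -> 0.488 0.359 0.667
```

Note how the third forecast (a near miss: most mass on the choice adjacent to the realized one) contributes much more heavily to both scores than to the simple accuracy measure, and the ranked probability score's cumulative sums are what make distance from the observed choice matter.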
The SwAOP model outperforms the competitors both in and out of sample according to all employed criteria, whereas the Taylor rule demonstrates the worst results. The MAE of the SwAOP model is lower than that of the AOP model by one third, both in and out of sample. The performance of the linear model is better than that of the AOP model in sample but much worse out of sample. The out-of-sample forecasts of the Taylor rule and the linear model are remarkably inferior to those of the discrete-choice competitors: their MAEs are larger by 69% and 52%, respectively, and their fractions of correct predictions are lower by 32 and 20 percentage points, than those of the SwAOP model. In particular, during the zero-lower-bound (ZLB) period from 1/2009 through 12/2015, the SwAOP model successfully accommodates the ZLB by a protracted switch to the easing regime (see Fig. 4), correctly predicts 53 out of 55 status quo decisions, and wrongly predicts only two further cuts. By contrast, the Taylor rule and the linear model fail to handle the ZLB, correctly predicting only 34 no-change decisions and wrongly predicting 21 further cuts.

Table 2
Performance of competing models: the FOMC decisions on federal funds rate target favor the SwAOP model.
Notes. Specifications of all models are those reported in Table 1. Each FOMC decision is matched with the real-time covariates as they were known at the end of the previous day. The out-of-sample forecast of the next FOMC decision is performed using recursive re-estimation with an increasing window. The predicted choice for the AOP and SwAOP models is the choice with the highest predicted probability; the predicted choice for the Taylor rule and the linear model is determined by rounding the continuous-valued prediction to the nearest discrete-valued choice. The Brier probability score is computed as $\frac{1}{T}\sum_{t=1}^{T}\sum_{j=1}^{5}[\Pr(y_t = j) - d_{jt}]^2$.
The fractions of correct predictions of the SwAOP model in terms of three choices (hike, no change, or cut) are 0.79 in sample and 0.92 out of sample, exceeding the accuracy of the existing dynamic discrete-choice models for the FOMC decisions such as those in Hu and Phillips (2004), Piazzesi (2005) and Kim et al. (2009). Those papers model FOMC decisions made only at the scheduled meetings (which are easier to predict than unscheduled ones) in the 2/1994-12/2001, 2/1994-12/1998 and 2/1994-12/2001 periods, respectively, and report fractions of correct in-sample predictions of 0.78, 0.75 and 0.67. The out-of-sample forecast is performed only by Kim et al. (2009), for the 2002-2006 period, with 0.75 accuracy.

Concluding remarks
We develop a new dynamic discrete-choice model for monetary policy interest rates. Central banks typically adjust policy rates by discrete increments and often leave them unchanged in quite different economic circumstances; if a central bank makes a change, it is usually followed by further changes in the same direction. To address these stylized facts, the new regime-switching autoregressive ordered probit allows (i) the status quo outcomes to be heterogeneous and generated in three latent regimes, which can be interpreted as monetary policy stances (easing, neutral and tightening); (ii) the probabilities of positive and negative changes to the rate to be driven by different processes; and (iii) the persistence of policy rates and monetary policy inertia to be captured by lagged latent dependent variables among the regressors.
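The mechanism in (i)-(iii) can be sketched as a toy data-generating process (a deliberately simplified illustration with made-up parameters and thresholds, not the paper's exact specification):

```python
import random
from collections import Counter

random.seed(42)

# Toy sketch: a latent stance index r0 with positive AR selects the regime;
# conditional on an easing or tight regime, a latent amount index with
# negative AR decides whether the rate actually moves. Status quo outcomes
# (decision 0) can thus be generated in all three regimes.
T = 200
r0 = r_minus = r_plus = 0.0
decisions = []
for _ in range(T):
    x = random.gauss(0, 1)                              # illustrative covariate
    r0 = 0.8 * r0 + x + random.gauss(0, 1)              # regime equation
    if r0 < -1.0:                                       # easing regime
        r_minus = -0.4 * r_minus + 0.8 * x + random.gauss(0, 1)
        decisions.append(-1 if r_minus < -0.5 else 0)   # cut, or status quo
    elif r0 > 1.0:                                      # tight regime
        r_plus = -0.4 * r_plus + 0.8 * x + random.gauss(0, 1)
        decisions.append(1 if r_plus > 0.5 else 0)      # hike, or status quo
    else:                                               # neutral regime
        decisions.append(0)                             # always status quo

print(Counter(decisions))
```

The positive AR coefficient in the regime equation makes regimes persistent, while the negative AR coefficients in the amount equations make a large desired change at one step favor a status quo at the next, mirroring the opposite-sign finding in the empirical application.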
The simulations demonstrate that the proposed Bayesian estimator performs well in small samples. The application to the FOMC decisions on the federal funds rate target shows that the discrete-choice approach and regime switching do matter in the empirical estimation of monetary policy rules. The proposed model overwhelmingly outperforms the linear models (including the Taylor rule), the single-equation ordered probit and other discrete-choice models both in and out of sample.
The new methodology can be employed to model the policy rate decisions of many central banks, changes to rankings, tick-by-tick stock price changes, and other ordinal data that are subject to regime switching or sample separation and have abundant and heterogeneous observations in a middle neutral category (typically, zero or no change).

Monte Carlo design
The dependent variable is generated with five ordered choices. The values of the parameters are calibrated to yield, on average, the following frequencies of the observed choices: 7%, 14%, 58%, 14% and 7%. The number of replications is 10,000 in each experiment. Three vectors of covariates $v_1$, $v_2$ and $v_3$ are drawn at each replication as $v_{1,t} \sim 0.41 v_{1,t-1} + N(0, 0.3^2)$, $v_{2,t} \sim 0.17 v_{2,t-1} + N(0, 0.4^2)$ and $v_{3,t} \sim 0.01 + 0.87 v_{3,t-1} + N(0, 0.8^2)$. Since the dependent variable represents changes to policy interest rates, the simulated artificial covariates $v_1$, $v_2$ and $v_3$ mimic macroeconomic variables of interest to central banks, such as changes to the inflation rate (personal consumption expenditures: chain-type price index, percent change from a year ago), changes to the output gap, and the interest rate spread (one-year treasury constant maturity minus the federal funds rate), respectively. The vectors of error terms in the latent Eqs. (1) and (2) are repeatedly generated as IID standard normal random variables.
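The covariate draws above are plain AR(1) simulations; a minimal sketch (with an arbitrary seed and sample size chosen here for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)

def ar1(phi, sigma, T, c=0.0):
    """Simulate x_t = c + phi * x_{t-1} + N(0, sigma^2), starting from zero."""
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = c + phi * x[t-1] + sigma * rng.standard_normal()
    return x

T = 500
v1 = ar1(0.41, 0.3, T)           # mimics changes to the inflation rate
v2 = ar1(0.17, 0.4, T)           # mimics changes to the output gap
v3 = ar1(0.87, 0.8, T, c=0.01)   # mimics the interest rate spread

# Sample first-order autocorrelations should sit near the true phi values
for v, phi in [(v1, 0.41), (v2, 0.17), (v3, 0.87)]:
    print(round(float(np.corrcoef(v[:-1], v[1:])[0, 1]), 2), phi)
```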
Under each true DGP, the competing models are estimated using the same set of covariates: (i) under the AOP DGP, the AOP model is estimated with the covariates $v_1$, $v_2$ and $v_3$, and the SwAOP model is estimated with $X = X^- = X^+ = (v_1, v_2, v_3)$; and (ii) under the SwAOP DGP, the AOP model is estimated with $v_1$, $v_2$ and $v_3$, and the SwAOP model is estimated with $X = (v_1, v_2)$ and $X^- = X^+ = v_3$. The simulations in (ii) are expected to provide evidence that the AOP estimator delivers asymptotically biased estimates under the SwAOP DGP, whereas the simulations in (i) are expected to show that the SwAOP estimator provides asymptotically unbiased estimates under its own DGP and even when the true DGP is the AOP model.
For each repeated estimation, we initialize the parameters at the starting point described in Section 3. We run the Gibbs sampler for a burn-in period of 80,000 iterations. We found that the Gibbs sampler converges much faster in most runs; if not, we produce 30,000 more iterations to approximate the posterior distribution. From this estimation period we compute the posterior means and all other estimates.

Monte Carlo results
First, we study the finite-sample performance of the SwAOP estimator when the data are generated by its own DGP with 200, 500 and 1000 observations. As Table A1 shows, the accuracy of the estimates improves with the number of observations. The small true values of $\phi^-$ and $\phi^+$ imply a small positive AR effect, which, however, is detected even with only 200 observations. The root mean square errors (RMSE) of the parameter estimates shrink roughly with the square root of the sample size. The biases do so as well, with the exception of a few cutpoint parameters. The results support the asymptotic consistency of the employed Gibbs sampler.
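The root-n pattern the table documents is the usual hallmark of a consistent estimator. It can be illustrated on a much simpler estimator (the sample mean of standard normal data; this is an analogy, not the paper's estimator): the RMSE shrinks roughly in proportion to $1/\sqrt{n}$, so RMSE times $\sqrt{n}$ is roughly constant across the sample sizes used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(3)

# Root-n check on a simple consistent estimator: the sample mean of N(0,1)
# data, whose true value is 0. RMSE * sqrt(n) should be roughly constant.
ratios = []
for n in (200, 500, 1000):
    est = rng.standard_normal((2000, n)).mean(axis=1)   # 2000 replications
    rmse = np.sqrt(np.mean(est ** 2))
    ratios.append(float(rmse * np.sqrt(n)))
    print(n, round(float(rmse), 4))

print([round(r, 2) for r in ratios])  # all close to 1
```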
Next, we compare the performance of the AOP and SwAOP models under each DGP. The parameter estimates are not directly comparable due to the different structures of the two models. To compare the models we therefore also estimate the choice probabilities, the marginal effects (MEs) of each covariate on each choice probability (the matrix of MEs has 3 × 5 = 15 elements; their values, which depend on the values of the covariates, are computed at the population medians of the covariates and lagged latent dependent variables), and the standard errors of the ME estimates. Table A2 shows the measures of accuracy of the ME estimates in both models under the two alternative DGPs. If the simulated and estimated models are identical, the empirical coverage probabilities

Table A1
Monte Carlo results: the accuracy of parameter estimates in the SwAOP model.

Table A2
Monte Carlo results: the accuracy of ME estimates in the AOP and SwAOP models under each DGP.

Fig. 1. Historical changes to the federal funds rate target during the Greenspan tenure.
(Notes to Table 2, continued.) Here the indicator $d_{jt} = 1$ if $y_t = j$ and $d_{jt} = 0$ otherwise. The ranked probability score is computed as $\frac{1}{T}\sum_{t=1}^{T}\sum_{j=1}^{5}[P_{jt} - D_{jt}]^2$, where $P_{jt} = \sum_{i=1}^{j}\Pr(y_t = i)$ and $D_{jt} = \sum_{i=1}^{j}d_{it}$. The better the prediction, the smaller both score values. The mean absolute error is the average absolute difference between the observed and predicted discrete choice.

Fig. 4. Out-of-sample predicted changes to the target and probabilities of latent policy regimes at each FOMC meeting. Notes. Out-of-sample period: 3/2006-6/2019 (111 observations). The forecasts are obtained from the SwAOP model (see Table 1) using recursive re-estimation with an increasing window.
Notes. All reported measures of accuracy are averaged across all five choices and all three covariates. Bias, RMSE and standard deviation bias are multiplied by 100.

If the simulated and estimated models are identical, the empirical coverage probabilities are quite close to the 95% nominal level even for 200 observations, and get closer still as the sample size increases. The bias and RMSE are larger for the SwAOP model, since it has more parameters than the AOP model. The biases and RMSEs of both estimators appear to converge to zero. The biases of the standard deviation estimates also shrink with a growing number of observations, which means the standard deviation of the posterior means is close to the average posterior standard deviation, supporting the reliability of the posterior. The SwAOP estimator behaves much better under the AOP true DGP than the AOP estimator under data generated using the SwAOP model. When the SwAOP model is fitted to data from the AOP model, as the sample size grows, the biases