ARIMA models are very powerful for forecasting time series data when this data is univariate. However, there is a type of ARIMA model that can also consider other variables. This type of model is called ARIMAX, which stands for “Auto-Regressive Integrated Moving Average with eXogenous variables”.

ARIMAX is an extension of the traditional ARIMA model that allows for the inclusion of additional variables, known as exogenous variables, which may have an effect on the time series being forecasted.

These exogenous variables can be any type of data:

• time-varying measurements: economic indicators such as inflation rate or price indices, weather data…
• categorical variables: day of the week, month…
• Boolean values: festive days, leap year…

By incorporating these external factors, ARIMAX models can provide more accurate and comprehensive predictions. Additionally, ARIMAX models can also be used for causal analysis, where the relationship between the exogenous variables and the time series data can be examined. Overall, ARIMAX models offer a powerful tool for forecasting and analyzing time series data in a multivariate context.

### Model description

We can see how this ARIMAX model compares with the standard ARIMA.

For simplicity let’s first consider an ARIMA(1,1,1):

$$y_t' = c + \phi_1 y_{t-1}' + \theta_1 \varepsilon_{t-1} + \varepsilon_t$$

• c : represents a constant or drift
• y : refers to the variable of interest (which appears differenced, written y′, because d = 1)
• ϕ : are the AR coefficients
• θ : are the MA coefficients
• εₜ : is the error term, which is white noise

The ARIMAX(1,1,1) will add another term to the equation:

$$y_t' = c + \beta X_t + \phi_1 y_{t-1}' + \theta_1 \varepsilon_{t-1} + \varepsilon_t$$

The new term consists of the coefficient β, which is estimated from the data when the model is fitted, and the exogenous variable Xₜ. It is important to note that this exogenous variable must be available for every time period.
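To make the equation concrete, here is a minimal sketch that generates a series step by step from the ARIMAX(1,1,1) recursion. The function name `simulate_arimax_111` and the choice to start the lagged terms at zero are assumptions made for illustration; they are not part of any library.

```python
def simulate_arimax_111(c, beta, phi, theta, x, eps, y0=0.0):
    """Generate the levels y_0..y_T implied by the ARIMAX(1,1,1) equation.

    x   : values of the exogenous variable, one per time period
    eps : white-noise errors, one per time period
    Assumption: y'_0 and eps_0 (the pre-sample lags) are taken as 0.
    """
    y = [y0]
    dy_prev = 0.0   # y'_{t-1}
    eps_prev = 0.0  # eps_{t-1}
    for x_t, eps_t in zip(x, eps):
        # y'_t = c + beta * X_t + phi * y'_{t-1} + theta * eps_{t-1} + eps_t
        dy_t = c + beta * x_t + phi * dy_prev + theta * eps_prev + eps_t
        # Undo the differencing (d = 1): y_t = y_{t-1} + y'_t
        y.append(y[-1] + dy_t)
        dy_prev, eps_prev = dy_t, eps_t
    return y


# Noise-free example so the effect of each term is easy to follow by hand.
levels = simulate_arimax_111(
    c=1.0, beta=2.0, phi=0.5, theta=0.0,
    x=[1.0, 0.0], eps=[0.0, 0.0], y0=10.0,
)
print(levels)  # [10.0, 13.0, 15.5]
```

In the first step the exogenous term contributes β·X = 2, so the differenced value is 3. In the second step X is 0 and only the constant and the AR term (0.5 · 3) remain.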

### Prediction vs Forecast

Before moving forward, it is worth discussing the difference between these two words. Prediction and forecast are similar in that both involve estimating the value of a variable for a given time period. However, there is a subtle difference between the two terms:

• Predictions: this refers to in-sample estimations.
• Forecasts: this refers to out-of-sample estimations.

When we have a testing or validation set, we can make predictions with our model: the values for those periods are already known because they are part of the historical data, so they are “in-sample”. However, when we want to estimate a value in the future that has not arrived yet, we are estimating a value that is “out-of-sample”: we don’t know that value yet.

This distinction matters because the exogenous variables must be known in order to estimate the variable of interest $y_t$. This is fine as long as our exogenous variables are holidays, days of the week, and so on, because their future values are known in advance. However, we cannot use the price of an index or the temperature at a particular place to forecast the dependent variable, because we would need the future value of that exogenous variable too. In this latter case, ARIMAX models are great for analysis, but not for forecasting.

### Multi-variable forecasts

If what we need is a multi-variable forecast, we need to either transform the exogenous data or use a different type of model. Let’s introduce each option:

1. The first possibility is transforming the data used as exogenous variables. One option is shifting the data to a specific period in the past, for example so that it refers to yesterday’s, last week’s or last month’s value. We could also calculate the average value of that exogenous variable over the last week or month. There are many possibilities.
2. The second and most interesting option is Vector Auto-Regression (VAR) models. They allow for estimating several dependent variables at the same time. Therefore, the variables we were calling exogenous will also be forecasted, instead of requiring prior knowledge of their values.
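The first option can be sketched with pandas. The example below assumes a hypothetical daily `temperature` series and builds three lagged features from it: yesterday’s value, the value from one week ago, and the mean of the previous seven days. The column names are illustrative.

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=10, freq="D")
temperature = pd.Series(
    [10, 12, 14, 16, 18, 20, 22, 24, 26, 28], index=idx, dtype=float
)

features = pd.DataFrame({
    # Yesterday's value.
    "temp_lag_1": temperature.shift(1),
    # The value observed one week ago.
    "temp_lag_7": temperature.shift(7),
    # Mean of the previous 7 days; shift(1) keeps today's value out of the window.
    "temp_mean_prev_7": temperature.shift(1).rolling(window=7).mean(),
})

# The first rows have no history to look back on, so they are dropped.
features = features.dropna()
print(features)
```

How far ahead such features are known depends on the lag: a value shifted by seven days is available for the next seven periods, while a one-day lag only covers a one-step-ahead forecast.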