Evaluation metrics, also known as performance measures, are quantitative measurements used to assess how well a model or algorithm solves a particular problem. They provide a standardized way to evaluate and compare different models and algorithms against specific criteria.

In our case, we are interested in evaluation metrics that are specifically used to evaluate the performance of time series models. They are designed to measure various aspects such as accuracy, precision, recall, error, and predictive power, depending on the type of problem being addressed. These metrics can be helpful in several scenarios:

• Performance Evaluation: Evaluation metrics quantitatively measure how well a time series model performs in predicting future values. They help assess the model’s accuracy and effectiveness in capturing the underlying patterns and trends in the data.
• Model comparison: You can use various metrics to compare multiple models and determine which model performs better. By comparing metrics between models, we can make informed decisions about which model to use for our particular needs.
• Parameter optimization: Evaluation metrics can guide the process of parameter optimization and model selection. Evaluating model performance under different parameter configurations allows us to optimize model performance and select the optimal set of parameters.
• Decision-making: Time series models are often used for forecasting and decision-making purposes. Metrics help us assess the reliability and accuracy of predictions, giving us confidence in making informed decisions based on model results.

There are several evaluation metrics available because different metrics capture different aspects of model performance. Each metric focuses on a particular characteristic or requirement, allowing us to choose the most suitable metric based on our specific needs and objectives.

In this article, we will introduce what we call “Error Metrics”. These metrics measure the size of the errors in the forecasted values when compared to the actual values. They emphasize the magnitude of errors rather than their direction and provide insight into the overall accuracy of the forecasting model.

Before introducing the most commonly used evaluation metrics of this kind, let’s introduce the notation used in the equations:

• ŷᵢ : represents the predicted value for the i-th observation
• yᵢ : represents the true value for the i-th observation
• n : represents the number of observations
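The code snippets in this article assume two equal-length arrays, `actual_values` and `predicted_values`, which the text does not define. As a minimal sketch, the following maps the notation above onto NumPy arrays. The numbers are made up for illustration and are not the data used later in the article.

```python
import numpy as np

# Hypothetical values for illustration only (not the article's dataset)
actual_values = np.array([10.0, 12.0, 15.0, 20.0, 25.0, 30.0])     # y_i
predicted_values = np.array([11.0, 11.0, 16.0, 18.0, 26.0, 33.0])  # y-hat_i

n = len(actual_values)                     # number of observations
errors = predicted_values - actual_values  # signed errors: y-hat_i - y_i
abs_errors = np.abs(errors)                # magnitudes only, direction discarded

print("n:", n)
print("signed errors:", errors)
print("absolute errors:", abs_errors)
```

Every metric below is some way of averaging these per-observation errors, which is why the two arrays must have the same length and be aligned in time.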

### Error Metrics

#### Mean Absolute Error (MAE)

Mean Absolute Error (MAE) measures the average size of the error between predicted and actual values in a time series dataset. MAE is calculated by taking the absolute difference between the predicted and actual values and averaging them.

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right|$$

• Advantages: MAE is simple and easy to interpret as the mean error is expressed in the same units as the original data. It is less sensitive to outliers compared to other error metrics such as mean squared error (MSE).
• Disadvantages: MAE does not distinguish between overestimation and underestimation and does not provide information about the direction or magnitude of individual errors. In addition, depending on your particular problem it may be seen as a disadvantage that it does not penalize wrong predictions as much as MSE.
Python
# Install library
!pip install scikit-learn

# Import function
from sklearn.metrics import mean_absolute_error

# Calculate MAE
mae = mean_absolute_error(actual_values, predicted_values)
print("Mean Absolute Error:", mae)

#### Mean Squared Error (MSE)

Mean Squared Error (MSE) is a commonly used metric for evaluating the accuracy of time series forecasting models. It measures the average squared difference between the predicted and actual values over a given time period.

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2$$

• Advantages: MSE is simple to compute and, because the errors are squared, it penalizes large errors more heavily than small ones.
• Disadvantages: the same squaring makes it sensitive to outliers. It is also harder to interpret because it is expressed in squared units of the data, so Mean Absolute Error (MAE) or Root Mean Squared Error (RMSE) may be preferred when interpretability is a requirement. Like MAE, it does not differentiate between overestimation and underestimation.
Python
# Install library
!pip install scikit-learn

# Import function
from sklearn.metrics import mean_squared_error

# Calculate MSE
mse = mean_squared_error(actual_values, predicted_values)
print("Mean Squared Error:", mse)


#### Root Mean Squared Error (RMSE)

Root Mean Squared Error (RMSE) is another commonly used metric for evaluating the accuracy of time series forecasting models. It is the square root of the MSE. Like MSE, it emphasizes larger errors because the differences are squared before averaging, but taking the square root expresses the result in the original units of the variable.

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2}$$

• Advantages: it penalizes large errors due to squaring, and it is easy to interpret because it is expressed in the same units as the data.
• Disadvantages: it is sensitive to outliers and can be skewed by extreme values. Additionally, RMSE doesn’t provide insights into the direction or pattern of forecast errors, and it doesn’t distinguish between over- and under-predictions.
Python
# Install library
!pip install scikit-learn

# Import libraries
from sklearn.metrics import mean_squared_error
import numpy as np

# Calculate MSE
mse = mean_squared_error(actual_values, predicted_values)

# Calculate RMSE
rmse = np.sqrt(mse)

print("RMSE:", rmse)
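To make the difference in outlier sensitivity between MAE and RMSE concrete, the sketch below computes both directly from their formulas and then repeats the calculation after one prediction is made badly wrong. The helper names `mae` and `rmse` and the data are illustrative assumptions, not part of any library.

```python
import numpy as np

def mae(actual, predicted):
    """Mean of the absolute errors."""
    return np.mean(np.abs(predicted - actual))

def rmse(actual, predicted):
    """Square root of the mean of the squared errors."""
    return np.sqrt(np.mean((predicted - actual) ** 2))

# Made-up data: every prediction in `good` is off by exactly 1 unit
actual = np.array([10.0, 12.0, 15.0, 20.0, 25.0])
good = np.array([11.0, 11.0, 16.0, 19.0, 26.0])

# Same predictions, except the last one is now off by 11 units
bad = good.copy()
bad[-1] = 36.0

print("without outlier -> MAE:", mae(actual, good), "RMSE:", rmse(actual, good))
print("with outlier    -> MAE:", mae(actual, bad), "RMSE:", rmse(actual, bad))
```

When all errors have the same size, the two metrics agree. With a single large error, MAE rises from 1 to 3 while RMSE rises from 1 to 5, because the squared term dominates the average.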

#### Mean Absolute Percentage Error (MAPE)

Mean Absolute Percentage Error (MAPE) calculates the average percentage difference between the predicted values and the actual values. MAPE is calculated by taking the absolute difference between the predicted and actual values, dividing it by the actual value, and then averaging the results.

$$\text{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\hat{y}_i - y_i}{y_i} \right| \times 100$$

• Advantages: it expresses errors as percentages, making them easier to understand and to compare across datasets with different scales.
• Disadvantages: it is undefined when an actual value is zero and becomes extremely large when actual values are close to zero. MAPE is also not symmetric in practice: for non-negative forecasts, an under-forecast can contribute at most 100%, while an over-forecast has no upper bound. As a result, selecting models by MAPE can favor forecasts that run low, so it may not be suitable when overestimations and underestimations should be penalized equally.
Python
# Install library
!pip install scikit-learn

# Import function (returns a fraction, not a percentage)
from sklearn.metrics import mean_absolute_percentage_error

# Calculate MAPE
mape = mean_absolute_percentage_error(actual_values, predicted_values) * 100
print("MAPE:", mape, "%")
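The behavior of MAPE around zero and its unequal bounds for low and high forecasts can be checked with a small hand-written version of the formula. This is a sketch under stated assumptions: the function name `mape` and the choice to raise an error on zero actual values are illustrative, not the behavior of any particular library.

```python
import numpy as np

def mape(actual, predicted):
    """MAPE in percent, following the formula above."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Assumption for this sketch: refuse zero actuals instead of dividing by zero
    if np.any(actual == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return np.mean(np.abs((predicted - actual) / actual)) * 100

# The same absolute error (5 units) weighs more when the actual value is small
print(mape([100.0], [105.0]))  # about 5 percent
print(mape([10.0], [15.0]))    # about 50 percent

# For non-negative forecasts, under-forecasting is capped, over-forecasting is not
print(mape([10.0], [0.0]))     # lowest possible forecast: 100 percent
print(mape([10.0], [40.0]))    # over-forecast: 300 percent
```

Raising an error is only one option. Depending on the application, dropping zero-valued observations or switching to a scale-free metric such as MASE may be more appropriate.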

#### Symmetric Mean Absolute Percentage Error (SMAPE)

Symmetric Mean Absolute Percentage Error (SMAPE) is a metric that measures the percentage difference between the observed and predicted values, taking into account the scale of the data. SMAPE is calculated by finding the absolute difference between the actual and forecasted values, dividing it by the average of the absolute values of the actual and forecasted values, and then multiplying by 100 to express it as a percentage.

$$\text{SMAPE} = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| \hat{y}_i - y_i \right|}{(\left| \hat{y}_i \right| + \left| y_i \right|)/2} \times 100$$

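• Advantages: it remains defined when an actual value is zero, as long as the prediction is not also zero, and the formula is symmetric in the actual and predicted values. With this definition each term lies between 0% and 200%, which limits the influence of any single observation.
• Disadvantages: like any percentage error, it can be dominated by observations whose actual and predicted values are both close to zero. It is undefined (0/0) when the actual and predicted values are both exactly zero.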
Python
# Import libraries
import numpy as np

# Define function
def sym_mean_absolute_percentage_error(actual, predicted):
    """
    Calculate SMAPE (Symmetric Mean Absolute Percentage Error).
    """
    return 2 * np.mean(np.abs(actual - predicted) / (np.abs(actual) + np.abs(predicted))) * 100

# Calculate SMAPE
smape = sym_mean_absolute_percentage_error(actual_values, predicted_values)
print("SMAPE:", smape, "%")
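The function above returns `nan` if an actual value and its prediction are both zero. One possible way to handle that case, sketched here, is to count such terms as zero error, since the forecast is exact. This convention and the name `smape_safe` are assumptions for illustration. Other conventions exist, such as dropping those observations.

```python
import numpy as np

def smape_safe(actual, predicted):
    """SMAPE in percent, counting 0/0 terms as zero error (an assumed convention)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    numerator = np.abs(predicted - actual)
    denominator = (np.abs(actual) + np.abs(predicted)) / 2
    # Divide only where the denominator is non-zero; other terms stay at 0
    ratio = np.divide(
        numerator,
        denominator,
        out=np.zeros_like(numerator),
        where=denominator != 0,
    )
    return np.mean(ratio) * 100

# Both values zero in the first position: treated as a perfect forecast
print(smape_safe([0.0, 10.0], [0.0, 10.0]))

# Actual value of zero with a non-zero forecast contributes the maximum of 200%
print(smape_safe([0.0, 100.0], [5.0, 100.0]))
```

The second call illustrates a side effect of the formula: whenever the actual value is zero, any non-zero forecast contributes 200%, regardless of how close it is.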

#### Mean Absolute Scaled Error (MASE)

Mean Absolute Scaled Error (MASE) measures the relative forecast accuracy compared to the naïve or benchmark model, which refers to simply using the last observed value of the time series as the forecast for all future points. MASE is calculated by dividing the mean absolute error (MAE) of the model’s forecasts by the MAE of the naïve model.

$$\text{MASE} = \frac{\text{MAE}}{\frac{1}{n-1} \sum_{i=2}^{n} \left| y_i - y_{i-1} \right|}$$

• Advantages: it is scale-independent, making it suitable for comparing forecast accuracy across time series with different scales. It is easy to interpret: a value below 1 means the model's errors are smaller on average than those of the naïve forecast, and a value above 1 means they are larger. Because it is built on absolute errors, it is also less sensitive to outliers than squared-error metrics.
• Disadvantages: the denominator is zero or near zero when the series is constant or nearly constant, which leads to instability or division by zero. Its interpretation also depends on the naïve forecast being a sensible baseline, which might not always be the case.
Python
# Install library
!pip install scikit-learn

# Import libraries
import numpy as np
from sklearn.metrics import mean_absolute_error

def mean_absolute_scaled_error(actual, predicted):
    """
    Calculate MASE (Mean Absolute Scaled Error).
    """
    # MAE of the model's forecasts
    mae = mean_absolute_error(actual, predicted)
    # MAE of the naive forecast (previous value) on the same series
    naive_error = np.mean(np.abs(actual[1:] - actual[:-1]))
    return mae / naive_error

# Calculate MASE
mase = mean_absolute_scaled_error(actual_values, predicted_values)
print("MASE:", mase)
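The function above computes the naïve error on the same values that are being forecast. MASE is often defined instead with the denominator computed on the training series, so that the scaling does not depend on the evaluation window. The following is a minimal sketch of that variant. The name `mase`, the extra `train` argument, and the data are assumptions for illustration.

```python
import numpy as np

def mase(train, actual, predicted):
    """MASE scaled by the one-step naive MAE of the training series."""
    train = np.asarray(train, dtype=float)
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # MAE of the model on the evaluation window
    model_mae = np.mean(np.abs(predicted - actual))
    # MAE of "forecast = previous value" on the training data
    naive_mae = np.mean(np.abs(np.diff(train)))
    if naive_mae == 0:
        raise ValueError("Training series is constant; MASE is undefined")
    return model_mae / naive_mae

# Made-up data for illustration
train = np.array([10.0, 12.0, 11.0, 14.0, 13.0])
actual = np.array([15.0, 16.0])
predicted = np.array([14.0, 16.5])

score = mase(train, actual, predicted)
print("MASE:", score)
```

Here the model's MAE is 0.75 and the naïve MAE on the training data is 1.75, so the score is below 1 and the model beats the naïve baseline on this toy example.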

### Example

Let’s see how to apply and interpret these metrics with an example. We will use the SARIMA model we trained in our previous article to predict the maximum monthly temperature in Madrid.

Let’s check one by one:

• MAE (Mean Absolute Error): The MAE value is 1.468. It represents the average magnitude of the errors between the predicted values and the actual values. In this case, the average error magnitude is 1.468 degrees. A lower MAE indicates better accuracy, as it means the model’s predictions are closer to the actual values on average.
• MSE (Mean Squared Error): The MSE value is 2.805. It measures the average of the squared differences between the predicted and actual values. MSE gives more weight to larger errors compared to MAE. In this case, the average squared error is 2.805. It is important to emphasize that the unit of this metric is the squared unit of the original magnitude, in this case, squared degrees. A lower MSE indicates better accuracy.
• RMSE (Root Mean Squared Error): The RMSE value is 1.675. It is the square root of the MSE and is expressed in the same unit as the predicted and actual values. It can be read as a typical error size that gives extra weight to large errors, and it equals the standard deviation of the errors only when the forecasts are unbiased. In this case, the RMSE is 1.675 degrees. Similarly, a lower RMSE indicates better accuracy.
• MAPE (Mean Absolute Percentage Error): The MAPE value is 8.42%. It represents the average percentage difference between the predicted and actual values, relative to the actual values. MAPE is useful when you want to understand the relative error in terms of percentage. In this case, the predictions deviate from the actual values by 8.42% on average. A lower MAPE also indicates better accuracy.
• SMAPE (Symmetric Mean Absolute Percentage Error): The SMAPE value is 8.15%. It is similar to MAPE, but divides each absolute difference by the average of the absolute predicted and actual values instead of by the actual value alone. SMAPE provides a symmetric view of the percentage difference between the predicted and actual values. In this case, the average symmetric percentage difference is 8.15%. A lower SMAPE also indicates better accuracy.
• MASE (Mean Absolute Scaled Error): The MASE value is 0.333. It measures the accuracy of a forecast relative to the mean absolute error of a naive baseline model. MASE is a useful metric for comparing forecast accuracy across different time series datasets. In this case, the model’s mean absolute error is about one third of the naive baseline’s, so its errors are roughly three times smaller on average. A lower MASE indicates better accuracy, and any value below 1 means the model outperforms the baseline.
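The reported values can be cross-checked against each other with a little arithmetic. The sketch below uses only the numbers quoted above. The variable names are illustrative, and the implied naive MAE is derived from the MASE definition rather than reported in the article.

```python
import math

# Values reported in the example above
mae, mse, rmse, mase = 1.468, 2.805, 1.675, 0.333

# RMSE is defined as the square root of MSE
print("sqrt(MSE):", round(math.sqrt(mse), 3))

# MASE = MAE / naive MAE, so the naive MAE implied by the numbers is MAE / MASE
implied_naive_mae = mae / mase
print("implied naive MAE (degrees):", round(implied_naive_mae, 2))

# How many times smaller the model's MAE is than the naive baseline's
print("naive MAE / model MAE:", round(1 / mase, 1))
```

The square root of the MSE matches the reported RMSE to three decimals, and the MASE implies a naive baseline error of about 4.41 degrees, roughly three times the model's MAE.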

### Conclusion

In addition to these metrics, we can define another type of evaluation metric, which we will refer to as “Performance Metrics“. These metrics are commonly employed in time series analysis and forecasting to assess different aspects of the forecasting model beyond the error measurements. They focus on aspects such as the bias in the forecasts, the coverage of prediction intervals, the accuracy of predicting the direction of change, and the overall goodness-of-fit of the model.

Depending on the particular requirements and characteristics of the time series data, different metrics may be more suitable. It is common to use a mix of evaluation metrics from both types to assess forecasting models. In a future article, we will delve into Performance Metrics, so that you gain a complete overview of how to assess your time series forecasting model.
