In a previous article, we introduced the so-called “Error Metrics”, which measure the accuracy and magnitude of errors in the forecasted values compared to the actual values. They emphasize the magnitude of errors rather than their direction and provide insights into the overall performance and precision of the forecasting model.

However, there is a distinct category of evaluation metrics known as “Performance Metrics”. These metrics go beyond conventional error measurements and assess other aspects of forecasting models, such as forecast bias, prediction interval coverage and accuracy in predicting direction changes.

Because time series data have unique characteristics and requirements, selecting appropriate metrics is essential. A balanced approach often combines standard error metrics with performance metrics to evaluate forecasting models properly. In this article, we will explore Performance Metrics in detail, so that you can assess your time series forecasting model effectively.

Before introducing the most commonly used evaluation metrics of this kind, let’s introduce the notation used in the equations:

• ŷᵢ : represents the predicted value for the i-th observation
• yᵢ : represents the true value for the i-th observation
• n : represents the number of observations

#### Forecast Bias

Forecast Bias is an evaluation metric for time series forecasting that measures the systematic deviation of forecasted values from the actual values. It quantifies the tendency of forecasts to consistently overestimate or underestimate the true values. A positive bias indicates overestimation, while a negative bias suggests underestimation.

$$\text{Bias} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)$$

• Advantages: it shows whether forecasts are systematically too high or too low; helps identify systematic errors, aiding in model improvement; and facilitates decision-making by highlighting both the direction and the average size of the systematic deviation.
• Disadvantages: positive and negative errors cancel each other out, so a bias close to zero does not mean the forecasts are accurate, and it ignores when the errors occur; it captures only systematic errors, not random ones; and it requires historical (actual) data to compare the forecasts against.
```python
# Calculate forecast differences (predicted minus actual)
differences = [predicted - actual for predicted, actual in zip(predicted_values, actual_values)]

# Calculate forecast bias
bias = sum(differences) / len(differences)
print("Forecast Bias:", bias)
```
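To make the cancellation issue concrete, the sketch below uses hypothetical toy numbers (not from the Madrid dataset) in which every forecast is off by 5 units. The bias is exactly zero, while the mean absolute error (MAE), a standard error metric, reveals the size of the errors.

```python
# Hypothetical toy data: the errors alternate between +5 and -5
actual_values = [20.0, 22.0, 25.0, 23.0]
predicted_values = [25.0, 17.0, 30.0, 18.0]

# Signed errors (predicted minus actual): [5.0, -5.0, 5.0, -5.0]
errors = [p - a for p, a in zip(predicted_values, actual_values)]

bias = sum(errors) / len(errors)                 # signed errors cancel out
mae = sum(abs(e) for e in errors) / len(errors)  # magnitudes do not

print("Forecast Bias:", bias)  # Forecast Bias: 0.0
print("MAE:", mae)             # MAE: 5.0
```

This is why bias is best read together with a magnitude-based error metric.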

#### Forecast Interval Coverage

Forecast Interval Coverage or FIC is an evaluation metric used to assess the accuracy and reliability of time series forecasting models. It measures the proportion of actual values that fall within the predicted forecast intervals.

In time series forecasting, forecast intervals are often produced alongside the point predictions to show the range of values within which the actual data are likely to fall. These intervals (prediction intervals, sometimes loosely called confidence intervals) capture the uncertainty of the forecasts: a 95% interval, for example, is expected to contain the actual value about 95% of the time.
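Models such as SARIMA produce these intervals directly. As a rough illustration of what a 95% interval means, the sketch below builds symmetric intervals around point forecasts under the simplifying assumption that forecast errors are normally distributed with a known, constant standard deviation. The helper `normal_forecast_intervals` and the parameter `residual_std` are hypothetical names; real time series models usually widen the intervals as the forecast horizon grows.

```python
from statistics import NormalDist


def normal_forecast_intervals(point_forecasts, residual_std, level=0.95):
    """Hypothetical helper: symmetric (lower, upper) intervals assuming normally
    distributed forecast errors with constant standard deviation residual_std."""
    # Two-sided z-score, about 1.96 for level=0.95
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return [(f - z * residual_std, f + z * residual_std) for f in point_forecasts]


intervals = normal_forecast_intervals([30.0, 32.5], residual_std=1.5)
print(intervals)
```

The output is a list of `(lower, upper)` pairs, the same shape the `calculate_coverage` function in this section expects.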

To calculate the Forecast Interval Coverage, we can use the following equation:

$$\text{FIC} = \frac{\text{Number of actual values within the forecasted intervals}}{n} \times 100\%$$

In this equation, you count the number of actual values that fall within the forecast intervals predicted by the model, divide it by the total number of observations in the test dataset, and multiply by 100 to express the result as a percentage. The result represents the percentage of actual values covered by the forecast intervals. Ideally, the coverage should be close to the nominal level of the intervals (for example, around 95% for 95% intervals): much lower coverage means the intervals understate the uncertainty, while coverage well above the nominal level may mean the intervals are wider than necessary.

• Advantages: simplicity and interpretability. It provides a straightforward measure of how well the forecast intervals capture the actual observations, making it easy to understand and communicate the reliability of the forecasts. Additionally, it can help identify potential issues with the forecast intervals, such as underestimation or overestimation of uncertainty.
• Disadvantages: it does not consider the width of the intervals or the distribution of the forecast errors. Two forecast models could have the same coverage, but one may have narrower intervals and thus provide more precise predictions. Another drawback is that it treats all observations equally, without considering the potential impact of certain observations. In cases where certain observations are more critical or have higher importance, the Forecast Interval Coverage may not adequately reflect the performance of the forecast intervals in capturing those specific values.
```python
def calculate_coverage(forecasted_intervals, actual_values):
    # Calculate the number of actual values that fall within the intervals
    num_within_interval = sum(
        (lower <= actual <= upper)
        for actual, (lower, upper) in zip(actual_values, forecasted_intervals)
    )

    # Calculate the total number of observations
    total_observations = len(actual_values)

    # Calculate the coverage (FIC) as a percentage
    fic = num_within_interval / total_observations * 100

    return fic

# Calculate FIC
fic = calculate_coverage(forecasted_intervals, actual_values)
print("Forecast Interval Coverage:", fic, '%')
```
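Because coverage alone says nothing about interval width, it can help to report both side by side. In this sketch (hypothetical helper `coverage_and_width` and toy data), two sets of intervals reach the same 100% coverage even though one is ten times wider than the other.

```python
def coverage_and_width(forecasted_intervals, actual_values):
    """Return the FIC (%) and the mean interval width for (lower, upper) pairs."""
    hits = sum(
        lower <= actual <= upper
        for actual, (lower, upper) in zip(actual_values, forecasted_intervals)
    )
    fic = hits / len(actual_values) * 100
    mean_width = sum(upper - lower for lower, upper in forecasted_intervals) / len(forecasted_intervals)
    return fic, mean_width


# Hypothetical data: two sets of intervals for the same four observations
actual_values = [20.0, 22.0, 25.0, 23.0]
narrow = [(19.0, 21.0), (21.0, 23.0), (24.0, 26.0), (22.0, 24.0)]
wide = [(10.0, 30.0), (12.0, 32.0), (15.0, 35.0), (13.0, 33.0)]

print(coverage_and_width(narrow, actual_values))  # (100.0, 2.0)
print(coverage_and_width(wide, actual_values))    # (100.0, 20.0)
```

Both sets are "perfect" by FIC, but only the narrow one is informative.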

#### Prediction Direction Accuracy

Prediction Direction Accuracy or PDA is an evaluation metric used to assess the accuracy of a time series forecasting model in terms of predicting the direction of future values. It measures the percentage of correct directional predictions made by the model. In this context, direction refers to the movement or trend of future values, indicating whether they are predicted to increase, decrease, or remain unchanged.

To calculate Prediction Direction Accuracy, we can use the following equation:

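$$\text{PDA} = \frac{1}{n-1} \sum_{i=2}^{n} \mathbf{1}\left[\operatorname{sign}(\hat{y}_i - \hat{y}_{i-1}) = \operatorname{sign}(y_i - y_{i-1})\right] \times 100\%$$

where $\mathbf{1}[\cdot]$ equals 1 when the condition holds and 0 otherwise, and $\operatorname{sign}(\cdot)$ returns +1, -1 or 0 for an increase, a decrease or no change. The sum starts at $i = 2$ because a direction requires a previous value, so there are $n - 1$ changes to compare.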

• Advantages: a straightforward and intuitive measure of the model’s ability to predict the direction of future values in a time series. It focuses on the correct directionality rather than the magnitude of the predictions, making it suitable for scenarios where the absolute values are less critical. It is easy to interpret, as it provides a percentage that directly represents the accuracy of the model’s directional predictions. It can be particularly useful in financial forecasting or trend analysis, where determining the correct direction of change is often more important than precise numerical predictions.
• Disadvantages: it does not capture the magnitude or precision of predictions, as it only considers the direction of the forecasted values. Consequently, two forecasts with the same PDA can have very different levels of accuracy in terms of actual values. PDA can also be misleading for strongly trending series, where a naive model that always predicts the same direction achieves a high score, and for series dominated by small, noisy fluctuations, where the direction of change is largely random. Therefore, it is essential to use it along with other error metrics.
```python
def calculate_pda(predicted_values, actual_values):
    # Initialize the counter of correctly predicted directions
    correct_directions = 0

    # Iterate over consecutive pairs of observations
    for i in range(1, len(predicted_values)):
        # Calculate predicted and actual changes
        pred_change = predicted_values[i] - predicted_values[i - 1]
        actual_change = actual_values[i] - actual_values[i - 1]

        # Check whether the predicted direction (increase, decrease or no change) matches the actual one
        if ((pred_change > 0 and actual_change > 0)
                or (pred_change < 0 and actual_change < 0)
                or (pred_change == 0 and actual_change == 0)):
            correct_directions += 1

    # Calculate PDA over the n - 1 observed changes
    pda = (correct_directions / (len(predicted_values) - 1)) * 100
    return pda

# Calculate PDA
pda = calculate_pda(predicted_values, actual_values)
print("Prediction Direction Accuracy:", pda, '%')
```
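The same logic can be written more compactly by comparing the sign of each change, following the definition above. The sketch below (hypothetical helper names `direction` and `direction_accuracy`, toy data) also illustrates the trending-series caveat: a naive forecast that rises by a constant step every period scores 100% PDA even though its final error is 10 units.

```python
def direction(x):
    # Direction of a change: 1 for increase, -1 for decrease, 0 for no change
    return (x > 0) - (x < 0)


def direction_accuracy(predicted_values, actual_values):
    """PDA (%) comparing the sign of consecutive changes."""
    matches = [
        direction(predicted_values[i] - predicted_values[i - 1])
        == direction(actual_values[i] - actual_values[i - 1])
        for i in range(1, len(actual_values))
    ]
    return sum(matches) / len(matches) * 100


# Hypothetical steadily rising series and a naive forecast that always rises by 1
actual_values = [10.0, 12.0, 15.0, 19.0, 24.0]
naive_forecast = [10.0, 11.0, 12.0, 13.0, 14.0]

print(direction_accuracy(naive_forecast, actual_values))  # 100.0
print(actual_values[-1] - naive_forecast[-1])             # 10.0
```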

### Example

As in the previous article, let's see how to apply and interpret these performance metrics with an example. We will again use the SARIMA model trained in the previous article to predict the maximum monthly temperature in Madrid.

From these performance metrics we can see the following:

• Forecast Bias: the average bias is 0.0495, which is very close to zero, so the model shows almost no systematic tendency to over- or underestimate the actual values.
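• Forecast Interval Coverage (FIC): all the actual values lie within the forecasted intervals (100% coverage), so the intervals do not appear to understate the uncertainty. If these are, for example, 95% intervals, this is above the nominal level, so it is also worth checking that the intervals are not wider than necessary.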
• Prediction Direction Accuracy (PDA): in 91.30% of the cases the direction of the value (whether it increases or decreases) is correctly forecasted.
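To produce this kind of summary for your own model, the three metrics can be bundled into one function. The sketch below is a hypothetical helper (`performance_report`) that takes the lower and upper bounds as separate sequences; with a statsmodels SARIMAX results object, for instance, these could come from the two columns of `get_forecast(...).conf_int()`, and the point forecasts from its `predicted_mean` attribute. The data here are toy values, not the Madrid results.

```python
def performance_report(actual, predicted, lower, upper):
    """Hypothetical helper bundling Forecast Bias, FIC (%) and PDA (%).

    actual, predicted, lower, upper: equal-length sequences for the test period.
    """
    n = len(actual)
    # Forecast Bias: mean of (predicted - actual)
    bias = sum(p - a for p, a in zip(predicted, actual)) / n
    # FIC: share of actual values inside [lower, upper], as a percentage
    fic = sum(lo <= a <= hi for a, lo, hi in zip(actual, lower, upper)) / n * 100

    def direction(x):
        # 1 = increase, -1 = decrease, 0 = no change
        return (x > 0) - (x < 0)

    # PDA: share of the n - 1 changes whose direction is predicted correctly
    hits = sum(
        direction(predicted[i] - predicted[i - 1]) == direction(actual[i] - actual[i - 1])
        for i in range(1, n)
    )
    pda = hits / (n - 1) * 100
    return {"bias": bias, "fic": fic, "pda": pda}


# Hypothetical toy data
actual = [20.0, 22.0, 25.0, 23.0]
predicted = [21.0, 23.0, 22.5, 22.0]
lower = [18.0, 20.0, 23.0, 21.0]
upper = [24.0, 26.0, 27.0, 25.0]

print(performance_report(actual, predicted, lower, upper))
```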

### Conclusion

To ensure a comprehensive evaluation of a model, it is essential to employ a combination of error metrics and performance metrics. While error metrics provide valuable insights into the accuracy of predictions, performance metrics offer a broader perspective by assessing additional aspects such as forecast bias, prediction interval coverage, and directional prediction accuracy. When we use both types of metrics together, we can get a better understanding of how well the model works. This helps us make smarter decisions and improves the accuracy of our forecasts.
