[NeurIPS🎉] Are Self-Attentions Effective for Time Series Forecasting?

Introduction
Preliminary
- Black-box
- Post-hoc Methods vs. Intrinsic Methods
Main Results
Conclusion
- Related Papers from the Lab

Introduction

When Artificial Intelligence (AI) models are deployed in the real world, working “most of the time” is merely the minimum requirement. What matters more is whether we can trust the model. Explainability asks whether an AI system can sufficiently explain its results to humans. It is a prerequisite for meeting, and for overseeing compliance with, core AI regulatory requirements such as transparency.

“Are Self-Attentions Effective for Time Series Forecasting?” [Paper, Repo] is a paper from our lab presented at NeurIPS 2024, one of the top AI conferences. In this article, we aim to explore the basic concepts of explainability and the contents of the paper.

Preliminary

Black-box

AI demonstrates high performance across various fields, but one problem persists: its black-box nature. "Black box" describes AI models, especially deep learning models, whose internal workings are difficult to understand or explain. This difficulty stems from the large number of parameters and the structural complexity of these models.

Black-box: Cases where it is difficult to clearly understand how outputs are derived from inputs

The research field that seeks to overcome this black-box nature of AI is called Explainable AI (XAI). Explainable AI aims to enhance the explainability of AI, making the decision-making process of AI more transparent and building trust among people. XAI research can be broadly divided into two approaches.

Post-hoc Methods vs. Intrinsic Methods

Post-hoc methods apply separate explanation techniques to interpret results after the model has been trained. For example, a complex deep learning model is first trained, and then interpretation techniques (LIME, SHAP) are applied to explain the prediction results.

A representative example of interpretation techniques that enable post-hoc explanation is LIME (Local Interpretable Model-agnostic Explanations). Given an original frog image and a trained black-box AI model, LIME finds “which part of the original frog image was most critical.” It feeds images with specific regions masked into the trained black-box model, then identifies the most significant regions based on the corresponding probability values.

LIME (Local Interpretable Model-agnostic Explanations)
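
The masking procedure described above can be sketched in a few lines. This is a simplified illustration, not the official LIME implementation: the `black_box` function is a hypothetical stand-in for a trained classifier, the image is a toy 4x4 array split into a 2x2 grid of regions (real LIME typically uses superpixels), and the function and parameter names are assumptions made for this example.

```python
import numpy as np

def black_box(image):
    """Hypothetical stand-in for a trained model: returns a 'frog' probability.
    By construction it only looks at the top-left quadrant of a 4x4 image."""
    return float(image[:2, :2].mean())

def lime_like_importance(image, model, grid=2, n_samples=200, seed=0):
    """Estimate per-region importance by fitting a linear surrogate
    to the model's outputs on randomly masked copies of the image."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    rh, rw = h // grid, w // grid
    n_regions = grid * grid

    # Each row marks which regions are kept (1) or masked out (0).
    masks = rng.integers(0, 2, size=(n_samples, n_regions))
    probs = np.empty(n_samples)
    for i, mask in enumerate(masks):
        perturbed = image.copy()
        for r in np.flatnonzero(mask == 0):
            row, col = divmod(int(r), grid)
            perturbed[row * rh:(row + 1) * rh, col * rw:(col + 1) * rw] = 0.0
        probs[i] = model(perturbed)

    # Give more weight to samples that stay close to the unmasked image.
    weights = np.exp(-(n_regions - masks.sum(axis=1)) / n_regions)
    sw = np.sqrt(weights)

    # Weighted least squares: probability ~ region indicators + intercept.
    design = np.hstack([masks, np.ones((n_samples, 1))])
    coef, *_ = np.linalg.lstsq(design * sw[:, None], probs * sw, rcond=None)
    return coef[:n_regions].reshape(grid, grid)

image = np.ones((4, 4))
importance = lime_like_importance(image, black_box)
print(importance.round(3))
```

Because the toy model depends only on the top-left quadrant, the surrogate assigns that region a weight close to 1 and the other regions weights close to 0, which is the kind of "most critical region" answer LIME is meant to give.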

In contrast, intrinsic methods design the model itself to be inherently explainable. Such models need no separate interpretation technique: their predictions can be interpreted directly from the trained model. Linear regression and decision trees fall into this category; they can be interpreted by plotting the fitted model against the data in a scatter plot or by visualizing the model structure itself.

Linear regression (left) and decision trees (right) can be easily interpreted through visualization
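
As a small illustration of why a linear model counts as intrinsically interpretable, the sketch below fits a linear regression on synthetic data and reads the explanation straight from the fitted coefficients. The data, the two features, and their true weights (3.0, -1.0, bias 0.5) are invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two hypothetical features and a noise-free linear target.
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5

# Ordinary least squares with an intercept column.
design = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
w0, w1, bias = coef

# The fitted weights are the explanation: each one states how much the
# prediction changes when its feature increases by one unit.
print(f"feature 0: {w0:+.2f}, feature 1: {w1:+.2f}, bias: {bias:+.2f}")
```

No post-hoc technique is needed here: the sign and magnitude of each coefficient describe the model's behavior for every input.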

High-performing deep learning models have far more complex structures than linear regression or decision trees, so post-hoc methods have been used much more often than intrinsic methods. However, recent studies have shown that the explanations post-hoc methods produce for deep learning models can be inaccurate.

"Saliency tells us where a deep learning model looks, but does not provide information about the correct answer" - Rudin, Cynthia. "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead." Nature machine intelligence 1.5 (2019): 206-215.

Accordingly, a line of research has proposed methods to improve explainability by modifying the architecture of deep learning models. Related studies have leveraged Generalized Additive Models (GAM) or the attention mechanism of Transformers to enhance the explanatory power of deep learning. The paper below also discovered that structural modifications to the Transformer can significantly enhance interpretability.

Darcet, Timothée, et al. "Vision Transformers Need Registers." The Twelfth International Conference on Learning Representations.

Main Results

Building on prior work, the paper analyzes what attention, and self-attention in particular, contributes to time-series forecasting in terms of both interpretability and performance. To this end, it examines the self-attention component of PatchTST, a widely used time-series model.

Visualization of a time-series model (PatchTST): (left) using original self-attention as-is, (right) replacing original self-attention with a simple linear model

The results above show that the relationship between input and output values, which appeared blurry with the original self-attention, became clearer when self-attention was replaced with a simple linear network. In other words, self-attention may not be better than a simple linear network for interpreting temporal information. The same holds for performance: self-attention showed no significant advantage over the simple linear network, and in some cases performed worse.
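
To make the comparison concrete, the sketch below contrasts a single-head self-attention layer over patch tokens with a fixed linear mixing of the same tokens. It is only an illustration under assumptions: the shapes, the random weights, and the exact form of the linear replacement (`linear_mixing`) are chosen for this example and are not taken from the paper's code.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, wq, wk, wv):
    """Single-head self-attention: the mixing weights depend on the input."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return scores @ v, scores

def linear_mixing(tokens, mix, wv):
    """Assumed linear replacement: one fixed patch-to-patch mixing matrix."""
    return mix @ (tokens @ wv)

rng = np.random.default_rng(0)
n_patches, d_model = 6, 8
wq, wk, wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
mix = rng.normal(size=(n_patches, n_patches))

patches_a = rng.normal(size=(n_patches, d_model))
patches_b = rng.normal(size=(n_patches, d_model))

out_a, attn_a = self_attention(patches_a, wq, wk, wv)
out_b, attn_b = self_attention(patches_b, wq, wk, wv)
lin_a = linear_mixing(patches_a, mix, wv)

# Attention maps differ from input to input; the linear mixing matrix
# is the same for every input, so it can be inspected once, directly.
print(np.allclose(attn_a, attn_b), lin_a.shape)
```

In this sketch the attention map has to be recomputed and interpreted per input, whereas the linear variant exposes a single matrix relating input patches to output patches.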

Based on these observations and prior work on simple linear networks, the paper concluded that cross-attention, rather than self-attention, can have several advantages for time-series forecasting, including temporal information analysis.

Comparison of the proposed model (d) with previous models (a-c)
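
A minimal sketch of cross-attention for forecasting is given below. It assumes that each future step has its own learnable query vector and that keys and values come from the past input patches; this setup, along with all names and shapes (`horizon_queries`, `past_patches`, and so on), is an assumption for illustration and should not be read as the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, memory, wk, wv):
    """Queries attend over a separate memory sequence (here: past patches)."""
    k, v = memory @ wk, memory @ wv
    weights = softmax(queries @ k.T / np.sqrt(k.shape[-1]))
    return weights @ v, weights

rng = np.random.default_rng(0)
n_past, n_future, d_model = 8, 3, 4

past_patches = rng.normal(size=(n_past, d_model))
# Hypothetical learnable parameters: one query per future step.
horizon_queries = rng.normal(size=(n_future, d_model))
wk = rng.normal(size=(d_model, d_model))
wv = rng.normal(size=(d_model, d_model))

decoded, attn = cross_attention(horizon_queries, past_patches, wk, wv)

# attn[i, j] is how strongly future step i draws on past patch j.
print(decoded.shape, attn.shape)
```

In this formulation each row of the attention map belongs to one forecast step and each column to one past patch, so the map can be read as a statement of which parts of the history each prediction relies on.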

With the proposed cross-attention-based model (CATS), our research team achieved high performance with fewer parameters, outperforming existing state-of-the-art (SOTA) models. CATS also produced easily interpretable results: as shown in the figure below, it accurately captured periodic patterns in the input time series and was additionally able to detect shocks.

Comparison of the proposed model (Ours) and previous SOTA models (x-axis: input time series length, y-axis: prediction performance)

Analysis of the proposed model's cross-attention

Conclusion

Explainability is one of the core concepts in AI trustworthiness. Improving model explainability is essential for verifying whether AI is operating as intended by humans and whether there is any bias. Explainability enhances system transparency and strengthens trustworthiness by providing the ability to understand and analyze how the decisions made by AI models were derived. We hope that the findings of this paper regarding self-attention and cross-attention contribute to advancing the field of model explainability.

Related Papers from the Lab

Outside the (Black) Box: Explaining Risk Premium via Interpretable Machine Learning [Under Review] | [Paper]
Fair sampling in diffusion models through switching mechanism [AAAI 2024] | [Paper] | [Code]

