Transformer Revisited: Inversion is more effective, a new SOTA for real-world prediction emerges

In time series forecasting, the Transformer has demonstrated a powerful ability to model dependencies and extract multi-level representations. However, some researchers have questioned the effectiveness of Transformer-based forecasters. These forecasters typically embed multiple variates of the same timestamp into indistinguishable channels and apply attention over these temporal tokens to capture temporal dependencies. The researchers found that simple linear layers, which consider numerical rather than semantic relationships, outperform complex Transformers in both performance and efficiency. At the same time, recent work has paid increasing attention to keeping variates independent and exploiting mutual information, explicitly modeling multivariate correlations to achieve accurate forecasts. However, this goal remains hard to achieve without subverting the common Transformer architecture.

Given the controversy around Transformer-based forecasters, the researchers asked why the Transformer performs even worse than linear models in time series forecasting, while dominating so many other fields.

Recently, a new paper from Tsinghua University offered a different perspective: the Transformer's poor performance is not intrinsic, but is caused by an improper application of the architecture to time series data.


The link to the paper is: https://arxiv.org/pdf/2310.06625.pdf

The existing structure of Transformer-based forecasters may not be suitable for multivariate time series forecasting. As shown on the left side of Figure 2, points at the same time step represent essentially different physical meanings recorded by inconsistent measurements, yet they are embedded into a single token, and multivariate correlations are erased. Furthermore, in the real world, the token formed by a single time step rarely reveals useful information, because of its excessively local receptive field and because the timestamps of multivariate time points may not be aligned. In addition, although the variations of a series are heavily influenced by its temporal order, the permutation-invariant attention mechanism is inappropriately applied on the temporal dimension. As a result, the Transformer's ability to capture essential series representations and describe multivariate correlations is weakened, limiting its capacity and generalization across diverse time series data.


To address the irrationality of embedding the multivariate points of each time step into a single (temporal) token, the researchers take an inverted view of the time series and independently embed the whole time series of each variate into a (variate) token, an extreme case of patching that enlarges the local receptive field. Through inversion, the embedded token aggregates a global representation of the series, which can be more variate-centric and better exploit the attention mechanism for multivariate correlation. Meanwhile, the feed-forward network can proficiently learn generalizable representations for distinct variates encoded from arbitrary lookback series and decode them to predict future series.
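
To make the inverted embedding concrete, here is a minimal PyTorch-style sketch, not the authors' official code; the module name, dimensions, and the simple linear embedding are illustrative assumptions. It shows how a batch of series of shape (batch, T, N) is transposed so that each variate's whole lookback window is mapped to one variate token:

```python
import torch
import torch.nn as nn

class InvertedEmbedding(nn.Module):
    """Embed the whole lookback series of each variate into one token.

    Input:  x of shape (batch, T, N)  -- T time steps, N variates
    Output: tokens of shape (batch, N, d_model) -- one token per variate
    """
    def __init__(self, lookback_len: int, d_model: int):
        super().__init__()
        # A single linear layer maps the T observations of one variate
        # to a d_model-dimensional variate token.
        self.proj = nn.Linear(lookback_len, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (batch, T, N) -> (batch, N, T): each row is now one variate's series
        x = x.transpose(1, 2)
        return self.proj(x)          # (batch, N, d_model)

# Illustrative usage: 96-step lookback, 7 variates, 64-dim tokens
emb = InvertedEmbedding(lookback_len=96, d_model=64)
tokens = emb(torch.randn(32, 96, 7))
print(tokens.shape)                  # torch.Size([32, 7, 64])
```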

The researchers point out that, for time series forecasting, the Transformer is not ineffective; it is simply used improperly. In this paper they revisit the structure of the Transformer and propose iTransformer as a fundamental backbone for time series forecasting. They embed each time series as a variate token, apply attention to model multivariate correlations, and use the feed-forward network to encode the series. Experimental results show that the proposed iTransformer reaches the state-of-the-art level on real-world forecasting benchmarks (Figure 1) and unexpectedly resolves the problems faced by Transformer-based forecasters.


In summary, the contributions of this article are three-fold:

  • The researchers reflect on the Transformer architecture and find that the capabilities of native Transformer components on time series have not yet been fully exploited.
  • The proposed iTransformer treats independent time series as tokens, captures multivariate correlations through self-attention, and uses layer normalization and feed-forward network modules to learn better global representations of series for time series forecasting.
  • Through experiments, iTransformer reaches SOTA on real-world forecasting benchmarks. The researchers analyze the inverted modules and architectural choices, pointing out a direction for future improvements of Transformer-based forecasters.

iTransformer

In multivariate time series forecasting, given historical observations $\mathbf{X} = \{\mathbf{x}_1, \ldots, \mathbf{x}_T\} \in \mathbb{R}^{T \times N}$ with $T$ time steps and $N$ variates, the goal is to predict the future $S$ time steps $\mathbf{Y} = \{\mathbf{x}_{T+1}, \ldots, \mathbf{x}_{T+S}\} \in \mathbb{R}^{S \times N}$. For convenience, $\mathbf{X}_{t,:}$ denotes the multivariate values recorded simultaneously at time step $t$, and $\mathbf{X}_{:,n}$ denotes the whole time series of the variate indexed by $n$. It is worth noting that, in the real world, $\mathbf{X}_{t,:}$ may not contain time points that essentially share the same timestamp, owing to the system latency of monitors and loosely organized datasets. The elements of $\mathbf{X}_{t,:}$ can differ from one another in physical measurement and statistical distribution, whereas the points of a single variate $\mathbf{X}_{:,n}$ generally share them.
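
As a tiny illustration of this problem setup, the toy numbers below are chosen only for this example and are not from the paper:

```python
import numpy as np

T, N, S = 96, 7, 24            # lookback length, number of variates, horizon
X = np.random.randn(T, N)      # historical observations X in R^{T x N}
# A forecaster maps X to a prediction of the next S steps for all N variates.
Y_hat = np.zeros((S, N))       # predicted future series, shape (S, N)

X_t = X[0, :]                  # X_{t,:}: all variates at one time step (a temporal token in vanilla Transformers)
X_n = X[:, 0]                  # X_{:,n}: the whole series of one variate (a variate token in iTransformer)
print(X_t.shape, X_n.shape)    # (7,) (96,)
```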


The Transformer variant equipped with the proposed architecture is called iTransformer. Essentially, it places no further specific requirements on the Transformer variant; the only requirement is that the attention mechanism should be suitable for modeling multivariate correlations. Therefore, a family of efficient attention mechanisms can serve as plug-ins to reduce the complexity of the association when the number of variates grows large.

As shown in Figure 4, iTransformer uses a simpler, encoder-only Transformer architecture, comprising embedding, projection, and Transformer blocks.
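
The following is a minimal sketch of what such an inverted encoder could look like; it is a hand-written approximation rather than the authors' implementation, and the layer sizes, the use of a standard `nn.TransformerEncoderLayer`, and the plain multi-head attention are assumptions. Attention mixes information across the N variate tokens, layer normalization and the feed-forward network act on each variate token, and a final projection decodes each token into the S future values of its variate:

```python
import torch
import torch.nn as nn

class ITransformerSketch(nn.Module):
    def __init__(self, lookback_len: int, horizon: int,
                 d_model: int = 64, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Linear(lookback_len, d_model)   # series -> variate token
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=4 * d_model, batch_first=True)
        # Self-attention runs over the N variate tokens, not over time steps.
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.project = nn.Linear(d_model, horizon)      # token -> future series

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, T, N) -> variate tokens: (batch, N, d_model)
        tokens = self.embed(x.transpose(1, 2))
        tokens = self.encoder(tokens)                   # multivariate correlation
        y = self.project(tokens)                        # (batch, N, S)
        return y.transpose(1, 2)                        # (batch, S, N)

# Illustrative usage: 96-step lookback, 7 variates, 24-step horizon
model = ITransformerSketch(lookback_len=96, horizon=24)
print(model(torch.randn(8, 96, 7)).shape)               # torch.Size([8, 24, 7])
```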


Experiments and Results

The researchers conducted a comprehensive evaluation of iTransformer across various time series forecasting applications, confirming the versatility of the framework, and further studied the effect of inverting the responsibilities of the Transformer components with respect to specific time series dimensions.

The experiments extensively cover six real-world datasets, including the ETT, Weather, Electricity, Traffic, Solar-Energy, and PEMS datasets. For detailed dataset information, please refer to the original paper.

Prediction results

As shown in Table 1, where red indicates the best result and underline the second best, and lower MSE/MAE means more accurate predictions, the iTransformer proposed in this article achieves SOTA performance. The native Transformer components are competent for temporal modeling and multivariate correlation, and the proposed inverted architecture can effectively handle real-world time series forecasting scenarios.
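
For reference, MSE and MAE are the usual point-wise forecasting metrics. A minimal sketch of how they could be computed over a set of predictions follows; the function and variable names are illustrative and not taken from the paper:

```python
import numpy as np

def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # Mean squared error over all horizon steps and variates
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # Mean absolute error over all horizon steps and variates
    return float(np.mean(np.abs(y_true - y_pred)))

# Toy example: forecasts for S=24 steps of N=7 variates
y_true = np.random.randn(24, 7)
y_pred = y_true + 0.1 * np.random.randn(24, 7)
print(mse(y_true, y_pred), mae(y_true, y_pred))
```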


Universality of iTransformer

The researchers applied the framework to the Transformer and its variants, which generally address the quadratic complexity of the self-attention mechanism, including Reformer, Informer, Flowformer, and FlashAttention. They found that simply inverting the perspective can improve the performance of Transformer-based forecasters, improve efficiency, generalize to unseen variates, and make better use of historical observations.
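
One way to picture this "framework" property is that the inverted layer does not care which attention implementation mixes the variate tokens. The sketch below is an assumption about such an interface, not the paper's code: any attention callable with the same signature, whether full softmax attention or an efficient linear-complexity variant, can be dropped in.

```python
import torch
import torch.nn as nn
from typing import Callable

# Any function mapping (batch, N, d_model) -> (batch, N, d_model) can serve as
# the attention over variate tokens.
AttentionFn = Callable[[torch.Tensor], torch.Tensor]

class InvertedLayer(nn.Module):
    def __init__(self, d_model: int, attention: AttentionFn):
        super().__init__()
        self.attention = attention
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        tokens = self.norm1(tokens + self.attention(tokens))  # variate mixing
        tokens = self.norm2(tokens + self.ffn(tokens))        # per-variate FFN
        return tokens

# Plug in standard multi-head self-attention as one possible choice.
mha = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
layer = InvertedLayer(64, attention=lambda t: mha(t, t, t, need_weights=False)[0])
print(layer(torch.randn(8, 7, 64)).shape)                     # torch.Size([8, 7, 64])
```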

Table 2 evaluates the Transformers and their corresponding iTransformers. Notably, the framework consistently improves the various Transformers: overall, Transformer improves by an average of 38.9%, Reformer by 36.1%, Informer by 28.5%, Flowformer by 16.8%, and Flashformer by 32.2%.

Another point is that iTransformer can be widely applied to Transformer-based forecasters because its inverted structure applies the attention mechanism on the variate dimension, so efficient attention with linear complexity can be introduced, fundamentally alleviating the efficiency problem caused by a large number of variates. This problem is common in real-world applications but can be resource-intensive for Channel Independence.


To verify this, the researchers compared iTransformer with another generalization strategy, Channel Independence, which forces a shared Transformer to learn the patterns of all variates. As shown in Figure 5, the generalization error of Channel Independence (CI-Transformers) can increase significantly, whereas the increase in iTransformer's prediction error is much smaller.


Since the responsibilities of the attention and feed-forward networks are inverted, Figure 6 evaluates the performance of Transformers and iTransformers as the lookback length increases. It validates the rationale of leveraging MLPs on the temporal dimension: iTransformers can benefit from an extended lookback window, yielding more accurate predictions.


Model analysis

To verify the rationality of the Transformer components, the researchers conducted detailed ablation experiments, including component replacement (Replace) and component removal (w/o) experiments. Table 3 lists the results.


For more details, please refer to the original article.
