The paper by 马树森 (Shusen Ma), titled “FMamba: Bi-Mamba Based on Fast Gated-Attention for Multivariate Time-Series Forecasting”, has been accepted for publication in IEEE Transactions on Artificial Intelligence. Its abstract is as follows:

In multivariate time-series forecasting (MTSF), extracting both the temporal features of the input sequences and the correlation features among variables is crucial. Under the channel-independent strategy, popular Transformer-based predictive models capture inter-variable correlations well, but their quadratic computational complexity makes them inefficient and costly when the input contains many variables. Mamba, a recently proposed selective state space model, has shown promising results in many fields thanks to its strong temporal feature extraction and its linear computational complexity on long sequences. However, Mamba cannot explicitly capture inter-variable relationships as the Transformer does, although the limitation of its unidirectional nature can be mitigated by a bidirectional Mamba (Bi-Mamba) structure. Therefore, to explicitly perceive inter-variable relationships while keeping linear computational complexity on long sequences, we combine the Bi-Mamba structure with improved Performers (Gated-Performers), a linear Transformer structure with a gating mechanism, and propose a novel framework named FMamba with global selective capabilities for MTSF. Technically, we first extract the temporal features of the input variables through an embedding layer, then explicitly compute the dependencies among input variables via the proposed fast gated-attention module of the Gated-Performers. Next, Bi-Mamba selectively processes the input features, and a multi-layer perceptron block (MLP-block) further extracts the temporal dependencies of the variables. Finally, FMamba produces its predictions through the projector, a linear layer.
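The data flow described above can be sketched in NumPy as follows. This is a minimal, non-authoritative sketch under several assumptions that the abstract does not state: each variable's lookback window is embedded as a single token (an inverted, variable-as-token embedding); the fast gated-attention is approximated by Performer-style FAVOR+ positive random features (i.i.d. Gaussian rather than orthogonal) with a hypothetical sigmoid output gate; and Bi-Mamba is replaced by a heavily simplified input-dependent diagonal recurrence run forward and backward over the variable tokens. All function and parameter names (`fmamba_sketch`, `gated_attention`, `W_dt_fwd`, and so on) are hypothetical, and this is not the authors' implementation. The point it illustrates is that attention across N variables costs O(N·m·D) here, since the N × N attention matrix is never materialized.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def favor_features(x, omega):
    """FAVOR+ positive random features: E[phi(q) . phi(k)] = exp(q . k) for Gaussian omega."""
    m = omega.shape[1]
    return np.exp(x @ omega - 0.5 * np.sum(x**2, axis=-1, keepdims=True)) / np.sqrt(m)


def linear_attention(q, k, v, omega):
    """Softmax-kernel attention approximated in O(N*m*D) instead of O(N^2*D)."""
    d = q.shape[-1]
    qf = favor_features(q / d**0.25, omega)  # (N, m)
    kf = favor_features(k / d**0.25, omega)  # (N, m)
    kv = kf.T @ v                            # (m, D): the N x N matrix is never formed
    normalizer = qf @ kf.sum(axis=0)         # (N,) row sums of the implicit kernel matrix
    return (qf @ kv) / normalizer[:, None]


def gated_attention(tokens, p):
    """Hypothetical gating: a sigmoid gate computed from the tokens scales the attention output."""
    attn = linear_attention(tokens @ p["Wq"], tokens @ p["Wk"], tokens @ p["Wv"], p["omega"])
    return sigmoid(tokens @ p["Wg"]) * attn


def selective_scan(u, w_dt, a):
    """Toy input-dependent diagonal recurrence, standing in for Mamba's selective scan."""
    dt = np.logaddexp(0.0, u @ w_dt)  # softplus step size that depends on the input
    h = np.zeros(u.shape[1])
    out = np.empty_like(u)
    for t in range(u.shape[0]):
        h = np.exp(-dt[t] * a) * h + dt[t] * u[t]  # decay the old state, add the new input
        out[t] = h
    return out


def bi_scan(u, p):
    """Scan the token sequence forward and backward, then sum the two outputs."""
    fwd = selective_scan(u, p["W_dt_fwd"], p["A"])
    bwd = selective_scan(u[::-1], p["W_dt_bwd"], p["A"])[::-1]
    return fwd + bwd


def fmamba_sketch(x, p):
    """x: (lookback, n_vars) -> forecast of shape (horizon, n_vars)."""
    tokens = x.T @ p["W_emb"]                     # embedding: one d_model token per variable
    tokens = tokens + gated_attention(tokens, p)  # inter-variable dependencies
    tokens = tokens + bi_scan(tokens, p)          # bidirectional selective processing
    hidden = np.maximum(0.0, tokens @ p["W1"])    # MLP block with a residual connection
    tokens = tokens + hidden @ p["W2"]
    return (tokens @ p["W_proj"]).T               # projector: linear map to the horizon


def init_params(lookback, d_model, horizon, n_features, seed=0):
    rng = np.random.default_rng(seed)

    def w(*shape):
        return rng.normal(0.0, 1.0 / np.sqrt(shape[0]), size=shape)

    return {
        "W_emb": w(lookback, d_model),
        "Wq": w(d_model, d_model), "Wk": w(d_model, d_model),
        "Wv": w(d_model, d_model), "Wg": w(d_model, d_model),
        "omega": rng.normal(size=(d_model, n_features)),
        "W_dt_fwd": w(d_model, d_model), "W_dt_bwd": w(d_model, d_model),
        "A": np.ones(d_model),
        "W1": w(d_model, 4 * d_model), "W2": w(4 * d_model, d_model),
        "W_proj": w(d_model, horizon),
    }


x = np.random.default_rng(1).normal(size=(96, 7))  # lookback 96, 7 variables
params = init_params(lookback=96, d_model=16, horizon=24, n_features=32)
print(fmamba_sketch(x, params).shape)  # (24, 7)
```

Computing `kf.T @ v` before multiplying by `qf` is what makes the attention linear in the number of variables: by associativity it gives the same result as normalizing the explicit N × N kernel matrix and multiplying by `v`. The real Gated-Performers and Bi-Mamba modules are certainly richer (normalization, learned state matrices, hardware-aware scans); the sketch only shows where each stage sits in the pipeline.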
Experimental results on seven public datasets show that FMamba reduces the mean squared error (MSE) by an average of 15.5% relative to the Mamba-based model and by an average of 15.2% relative to the Transformer-based model.
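The abstract reports average MSE reductions but does not say how the average is formed. The sketch below, using hypothetical function names and made-up MSE values that are not results from the paper, contrasts two common conventions: averaging the per-dataset relative reductions, and computing the relative reduction of the mean MSE. The same numbers can yield different percentages under each.

```python
def per_dataset_average_reduction(baseline_mse, model_mse):
    """Mean of per-dataset percentage reductions, 100 * (b - m) / b."""
    reductions = [100.0 * (b - m) / b for b, m in zip(baseline_mse, model_mse)]
    return sum(reductions) / len(reductions)


def pooled_reduction(baseline_mse, model_mse):
    """Percentage reduction of the mean MSE taken across datasets."""
    b = sum(baseline_mse) / len(baseline_mse)
    m = sum(model_mse) / len(model_mse)
    return 100.0 * (b - m) / b


# Hypothetical MSE values for two datasets (not results from the paper).
baseline = [0.40, 0.20]
model = [0.30, 0.18]
print(round(per_dataset_average_reduction(baseline, model), 2))  # 17.5
print(round(pooled_reduction(baseline, model), 2))               # 20.0
```

The per-dataset average weights every dataset equally, whereas the pooled version is dominated by datasets with larger absolute MSE.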