Publications

You can also find my articles on my Google Scholar profile.

Paper List


WaveFormer: Frequency-Time Decoupled Vision Modeling with Wave Equation

Published in The 40th Annual AAAI Conference on Artificial Intelligence, 2026

Vision modeling has advanced rapidly with Transformers, whose attention mechanisms capture visual dependencies but lack a principled account of how semantic information propagates spatially. We revisit this problem from a wave-based perspective: feature maps are treated as spatial signals whose evolution over an internal propagation time (aligned with network depth) is governed by an underdamped wave equation. In this formulation, spatial frequency—from low-frequency global layout to high-frequency edges and textures—is modeled explicitly, and its interaction with propagation time is controlled rather than implicitly fixed. We derive a closed-form, frequency–time decoupled solution and implement it as the Wave Propagation Operator (WPO), a lightweight module that models global interactions in $\mathcal{O}(N \log N)$ time, well below the quadratic cost of attention. Building on WPO, we propose a family of WaveFormer models as drop-in replacements for standard ViTs and CNNs, achieving competitive accuracy across image classification, object detection, and semantic segmentation, while delivering up to $1.6\times$ higher throughput and 30\% fewer FLOPs than attention-based alternatives. Furthermore, our results demonstrate that wave propagation introduces a complementary modeling bias to heat-based methods, effectively capturing both global coherence and high-frequency details essential for rich visual semantics. Code is available at: https://github.com/ZishanShu/WaveFormer.
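The closed-form solution itself is given in the paper, not in this abstract; the sketch below only illustrates the general recipe it describes, assuming the standard analytic response of an underdamped wave equation per spatial frequency (the function name, damping value, and initial conditions here are illustrative choices, not the paper's):

```python
import numpy as np

def wave_propagation_operator(x, t=1.0, c=1.0, gamma=0.1):
    """Illustrative frequency-time decoupled wave filter.

    x: (H, W) feature map treated as a spatial signal.
    In Fourier space, u_tt + 2*gamma*u_t = c^2 * Laplacian(u) decouples
    into an independent damped oscillator per spatial frequency k, so a
    propagation step is one FFT, a per-frequency multiply, and an
    inverse FFT: O(N log N) overall.
    """
    H, W = x.shape
    ky = np.fft.fftfreq(H) * 2 * np.pi
    kx = np.fft.fftfreq(W) * 2 * np.pi
    k2 = ky[:, None] ** 2 + kx[None, :] ** 2        # |k|^2 per frequency

    # Closed-form response for u(0) = x, u_t(0) = 0:
    # e^{-gamma t} (cos(w_d t) + (gamma / w_d) sin(w_d t)),
    # w_d = sqrt(c^2 |k|^2 - gamma^2); the 0j makes the sqrt complex-safe
    # for the overdamped low-frequency modes.
    wd = np.sqrt(c ** 2 * k2 - gamma ** 2 + 0j)
    wd = np.where(np.abs(wd) < 1e-12, 1e-12, wd)    # guard division at w_d = 0
    resp = np.exp(-gamma * t) * (np.cos(wd * t) + (gamma / wd) * np.sin(wd * t))

    X = np.fft.fft2(x)                              # global mixing via FFT
    return np.real(np.fft.ifft2(X * resp))
```

Note how the response damps high frequencies over propagation time while leaving the DC component (global layout) intact, which is the frequency-selective behavior the abstract attributes to the wave-based view.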

Recommended citation: Z. Shu, J. Wu, W. Yan, X. Liu, H. Zhang, C. Liu, Y. Mao and J. Chen, "WaveFormer: Frequency-Time Decoupled Vision Modeling with Wave Equation," in Proceedings of the 40th Annual AAAI Conference on Artificial Intelligence, Jan. 2026.
Download Paper

MTPNet: Multi-Grained Target Perception for Unified Activity Cliff Prediction

Published in International Joint Conferences on Artificial Intelligence (IJCAI), 2025

Activity cliff prediction is a critical task in drug discovery and material design. Existing computational methods are limited to handling single binding targets, which restricts the applicability of these prediction models. In this paper, we present the Multi-Grained Target Perception network (MTPNet) to incorporate prior knowledge of the interactions between molecules and their target proteins. Specifically, MTPNet is a unified framework for activity cliff prediction, which consists of two components: Macro-level Target Semantic (MTS) guidance and Micro-level Pocket Semantic (MPS) guidance. In this way, MTPNet dynamically optimizes molecular representations through multi-grained protein semantic conditions. To our knowledge, this is the first work to employ receptor proteins as guiding information to effectively capture critical interaction details. Extensive experiments on 30 representative activity cliff datasets demonstrate that MTPNet significantly outperforms previous approaches, achieving an average RMSE improvement of 18.95% on top of several mainstream GNN architectures. Overall, MTPNet internalizes interaction patterns through conditional deep learning to achieve unified predictions of activity cliffs, helping to accelerate compound optimization and design. Code is available at: https://github.com/ZishanShu/MTPNet.
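The abstract describes conditioning molecular representations on protein semantics but not the mechanism; one common way such conditioning is realized is FiLM-style scale-and-shift modulation, sketched below. Every name and the modulation form here are assumptions for illustration, not MTPNet's actual MTS/MPS guidance modules:

```python
import numpy as np

def condition_on_target(mol_h, prot_h, w_gamma, w_beta):
    """Hypothetical FiLM-style conditioning: the target-protein embedding
    is projected into a per-dimension scale and shift that modulate the
    molecular embedding, so the same molecule is represented differently
    under different binding targets."""
    gamma = prot_h @ w_gamma   # scale derived from target semantics
    beta = prot_h @ w_beta     # shift derived from target semantics
    return gamma * mol_h + beta

# Toy dimensions for the demo: 4 molecule-target pairs.
rng = np.random.default_rng(1)
mol_h = rng.normal(size=(4, 16))          # molecular embeddings
prot_h = rng.normal(size=(4, 32))         # matching protein embeddings
w_gamma = rng.normal(size=(32, 16)) * 0.1 # projection weights (would be learned)
w_beta = rng.normal(size=(32, 16)) * 0.1
out = condition_on_target(mol_h, prot_h, w_gamma, w_beta)
```

The design point this illustrates is the one the abstract makes: the protein acts as a *condition* on the molecular representation rather than a separate input fused late, which is what lets a single model serve multiple targets.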

Recommended citation: Z. Shu, Y. Deng, H. Zhang, Z. Nie and J. Chen, "MTPNet: Multi-Grained Target Perception for Unified Activity Cliff Prediction," in Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence (IJCAI-25), pp. 7733-7741, Aug. 2025, doi: 10.24963/ijcai.2025/860.
Download Paper

A Real-Time Forecast Model Based on Convolutional Neural Network and Attention Mechanism for Passenger Car Sales in 5G Environment

Published in IEEE Transactions on Intelligent Transportation Systems, 2023

Achieving accurate forecasts of passenger car sales can help car companies set reasonable sales targets. However, existing forecast models are plagued by the following problems. First, they do not take into account the impact of feature importance on forecast accuracy. Second, single-feature data cannot reflect the complex buying and selling logic of the passenger car market, and previous models have not been able to explore the combination effects between different features well. Therefore, in this work, we propose a passenger car sales forecast model based on the convolutional neural network and attention mechanism (PCSFCA). Its first innovation lies in the use of an attention mechanism to compute feature importance, which enables the model to emphasize important features and down-weight unimportant ones. The second is the use of convolutional neural networks to extract higher-order feature information, which helps the model capture complex data distributions. In addition, we use a 5G network to build a cloud platform for real-time collection of passenger car sales records. The collected sales data are fed into the model, which can then be updated in real time. Comparisons with several benchmark models demonstrate the effectiveness of the PCSFCA model.
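The first of the two mechanisms above, attention-derived feature importance, can be sketched minimally as follows. This is a generic illustration of scoring features and softmax-normalizing the scores into weights; the scoring function and names are assumptions, not the PCSFCA architecture itself:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def feature_attention(x, w):
    """Score each input feature, normalize scores into importance
    weights, and rescale the features so influential ones dominate.

    x: (batch, n_features) raw feature matrix
    w: (n_features,) scoring vector (learned in a real model; fixed here)
    """
    scores = x * w                  # per-feature relevance scores
    alpha = softmax(scores)         # importance weights, sum to 1 per sample
    return x * alpha, alpha

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5))         # e.g. 4 months x 5 sales-related features
w = rng.normal(size=5)
weighted, alpha = feature_attention(x, w)
```

Downstream layers (convolutions, in the model's case) then see `weighted` instead of `x`, so features with near-zero attention weight contribute little to the forecast.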

Recommended citation: Y. Lu, Z. Shu, A. Li and H. Zhang, "A Real-Time Forecast Model Based on Convolutional Neural Network and Attention Mechanism for Passenger Car Sales in 5G Environment," in IEEE Transactions on Intelligent Transportation Systems, vol. 25, no. 3, pp. 2858-2868, March 2024, doi: 10.1109/TITS.2023.3311541.
Download Paper