Skeleton-Based Human Action Recognition Using Seq2Seq and BiLSTM Networks
Enabling smart surveillance and ambient intelligence with efficient AI motion recognition systems.
The research paper “Seq2seq Model for Human Action Recognition Based on Skeleton and Two-Layer Bidirectional LSTM” investigates an efficient deep learning framework for recognizing human actions from video sequences using skeletal motion data. Written by Shouke Wei, Jindong Zhao, Junhuai Li, and Meixue Yuan, the study contributes to the fields of intelligent surveillance, human-computer interaction, smart environments, and activity analysis by proposing a lightweight yet highly accurate human action recognition (HAR) model.
HAR is an important topic in computer vision because it enables machines to understand human behavior from visual data. Traditional approaches based on RGB video often require large computational resources and may raise privacy concerns, because they process full image frames that contain personal appearance information. To address these issues, the authors focus on skeleton-based action recognition, where only body joint coordinates and their movement trajectories are analyzed. A skeleton representation describes each frame with a handful of coordinates rather than a full grid of pixels, which reduces data complexity while improving privacy protection and computational efficiency.
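To give a rough sense of the reduction in data per frame, the sketch below compares the number of values in one RGB frame with the number in one skeleton frame, and shows how a movement trajectory can be derived from consecutive skeleton frames. The frame size (320x240), the joint count (17), the use of 2D coordinates, and all function names are assumptions chosen for illustration; the summary above does not state which skeleton format the authors used.

```python
def values_per_frame_rgb(width, height, channels=3):
    """Number of pixel values in one RGB frame."""
    return width * height * channels


def values_per_frame_skeleton(num_joints, dims=2):
    """Number of coordinate values in one skeleton frame."""
    return num_joints * dims


def joint_displacements(prev_frame, next_frame):
    """Per-joint (dx, dy) movement between two consecutive skeleton frames."""
    return [
        (x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(prev_frame, next_frame)
    ]


# Assumed sizes, for illustration only.
rgb_values = values_per_frame_rgb(width=320, height=240)   # 230400 values
skeleton_values = values_per_frame_skeleton(num_joints=17)  # 34 values
print(f"RGB frame: {rgb_values} values, skeleton frame: {skeleton_values} values")

# Two toy skeleton frames with two joints each.
frame_a = [(0.0, 0.0), (1.0, 2.0)]
frame_b = [(0.5, 0.0), (1.0, 1.0)]
print(joint_displacements(frame_a, frame_b))
```

Under these assumed sizes a skeleton frame holds a few dozen numbers where an RGB frame holds hundreds of thousands, and none of those numbers encode a person's appearance.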
The proposed model, called SB2_Seq2Seq, combines a Sequence-to-Sequence (Seq2Seq) architecture with a two-layer Bidirectional Long Short-Term Memory (BiLSTM) network. A Seq2Seq model maps an input sequence to an output through an encoder and a decoder, learning temporal dependencies between frames along the way, while a BiLSTM reads the sequence both forward and backward in time. Because each time step then carries context from earlier and later frames, the model can represent complex human movement patterns better than a traditional unidirectional recurrent network.
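The bidirectional, two-layer idea can be sketched in plain Python. This is a minimal illustration and not the authors' SB2_Seq2Seq implementation: the weights are random and untrained, there is no decoder or classification head, and the class names, the hidden size, and the input size of 34 (17 joints with 2D coordinates) are all assumptions.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """One LSTM cell with random, untrained weights (illustration only)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, candidate, output.
        self.w = {
            g: [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(hidden_size)]
            for g in "ifgo"
        }
        self.b = {g: [0.0] * hidden_size for g in "ifgo"}

    def _gate(self, name, xh, act):
        return [
            act(sum(w * v for w, v in zip(row, xh)) + b)
            for row, b in zip(self.w[name], self.b[name])
        ]

    def step(self, x, h, c):
        xh = x + h  # concatenate the input with the previous hidden state
        i = self._gate("i", xh, sigmoid)
        f = self._gate("f", xh, sigmoid)
        g = self._gate("g", xh, math.tanh)
        o = self._gate("o", xh, sigmoid)
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def run_direction(cell, sequence):
    """Run one cell over the sequence and return the hidden state at each step."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    outputs = []
    for x in sequence:
        h, c = cell.step(x, h, c)
        outputs.append(h)
    return outputs


class BiLSTMLayer:
    """Runs a forward and a backward LSTM and concatenates their outputs."""

    def __init__(self, input_size, hidden_size, rng):
        self.fwd = LSTMCell(input_size, hidden_size, rng)
        self.bwd = LSTMCell(input_size, hidden_size, rng)

    def __call__(self, sequence):
        forward = run_direction(self.fwd, sequence)
        # Process the reversed sequence, then restore the original time order.
        backward = run_direction(self.bwd, sequence[::-1])[::-1]
        return [f + b for f, b in zip(forward, backward)]


def two_layer_bilstm(sequence, input_size, hidden_size, seed=0):
    """Stack two BiLSTM layers; the second consumes the first layer's output."""
    rng = random.Random(seed)
    layer1 = BiLSTMLayer(input_size, hidden_size, rng)
    layer2 = BiLSTMLayer(2 * hidden_size, hidden_size, rng)
    return layer2(layer1(sequence))


# Five toy skeleton frames, each flattened to 34 values (assumed: 17 joints x 2).
data_rng = random.Random(1)
frames = [[data_rng.random() for _ in range(34)] for _ in range(5)]
features = two_layer_bilstm(frames, input_size=34, hidden_size=4)
print(len(features), len(features[0]))  # time steps, features per step
```

Each output step concatenates a forward and a backward hidden state, so even the first time step depends on the last frame. That dependence on later frames is what a unidirectional network lacks.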
The system was evaluated on the widely used UCF50 dataset, which contains diverse categories of human activities captured in realistic videos. The dataset was divided into 60% training data, 20% validation data, and 20% testing data. In these experiments the proposed SB2_Seq2Seq model reached a recognition accuracy of approximately 93.54% with a low mean squared error (MSE) of 0.0214, outperforming several existing CNN-, LSTM-, and Seq2Seq-based methods and performing at a level competitive with, or better than, previously published approaches in the HAR literature.
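The evaluation protocol described above can be sketched as follows. This is an assumption-laden illustration rather than the authors' code: the function names are hypothetical, the split is a simple seeded shuffle, and the MSE is computed between predicted class probabilities and one-hot labels, which is one plausible reading of the reported metric that the summary does not confirm.

```python
import random


def split_60_20_20(items, seed=0):
    """Shuffle the items and split them 60% / 20% / 20%."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = (len(items) * 6) // 10
    n_val = (len(items) * 2) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class matches the label."""
    correct = 0
    for p, y in zip(probs, labels):
        predicted = max(range(len(p)), key=p.__getitem__)
        if predicted == y:
            correct += 1
    return correct / len(labels)


def mse_one_hot(probs, labels):
    """Mean squared error between predicted probabilities and one-hot labels."""
    total = 0.0
    count = 0
    for p, y in zip(probs, labels):
        for k, pk in enumerate(p):
            target = 1.0 if k == y else 0.0
            total += (pk - target) ** 2
            count += 1
    return total / count


# Toy usage: 100 hypothetical clip indices and three two-class predictions.
train, val, test = split_60_20_20(range(100))
print(len(train), len(val), len(test))

probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
labels = [0, 1, 1]
print(accuracy(probs, labels), mse_one_hot(probs, labels))
```

In practice the validation set would guide choices such as early stopping, and the two metrics would be reported on the held-out test set only.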
One of the key innovations of this research is its emphasis on balancing three critical factors simultaneously:
- Recognition accuracy
- Computational efficiency
- User privacy
By using lightweight skeleton representations instead of raw RGB video frames, the model becomes suitable for deployment on edge devices, smart surveillance systems, IoT platforms, and embedded AI applications where computing resources are limited. This design philosophy aligns with modern trends in ambient intelligence and smart environments.
The research also connects to broader developments in skeleton-based action recognition using deep recurrent neural networks and attention-based learning methods. Previous studies demonstrated the effectiveness of LSTM networks for temporal sequence modeling, but many suffered from high computational complexity or made insufficient use of the spatiotemporal relationships among body joints. The SB2_Seq2Seq model contributes to this field by providing a lightweight architecture capable of strong temporal feature extraction at lower computational cost.
This work has potential applications in multiple real-world domains, including:
- Smart surveillance and public safety
- Healthcare and elderly monitoring
- Human-computer interaction
- Sports analytics
- Robotics and intelligent assistants
- Smart home and ambient intelligence systems
The study demonstrates how lightweight deep learning architectures can enable accurate and privacy-preserving activity recognition systems suitable for practical deployment in modern intelligent environments.
Reference:
Wei, S., Zhao, J., Li, J., Yuan, M. (2023). Seq2seq model for human action recognition based on skeleton and two-layer bidirectional LSTM. Journal of Ambient Intelligence and Smart Environments, 15(4), 315–331. DOI: https://doi.org/10.3233/AIS-220125

