Comparative Analysis of Neural Network Architectures for ECG Classification

A Comprehensive Study of Fifteen Approaches Including Deep Learning and Probabilistic Models

Shyamal Suhana Chandra
Sapana Micro Software, Research Division

Abstract

This project presents a comprehensive comparative analysis of fifteen machine learning architectures for electrocardiogram (ECG) classification, spanning both deep learning and probabilistic/statistical approaches. The deep learning models are a traditional feedforward neural network (FFNN), a Transformer-based model, a Three-Stage Hierarchical Transformer (3stageFormer), a 1D Convolutional Neural Network (CNN), a Long Short-Term Memory (LSTM) network, a Hopfield Network, a Variational Autoencoder (VAE), and a Liquid Time-Constant Network (LTC); the probabilistic and statistical models are Hidden Markov Models (HMM), Hierarchical HMMs, Dynamic Bayesian Networks (DBN), Markov Decision Processes (MDP), Partially Observable MDPs (PO-MDP), Markov Random Fields (MRF), and Granger causality. The feedforward architecture is based on the seminal work by Lloyd et al. (2001) on ischemia detection; the Transformer model follows the approach of Ikram et al. (2025) for early detection of cardiac arrhythmias; the 3stageFormer implements the hierarchical multi-scale approach of Tang et al. (2024); the Hopfield Network is based on energy-based associative-memory approaches to ECG analysis (ETASR, 2013); the VAE implements the FactorECG approach of van de Leur et al. (2022) for explainable ECG analysis; and the LTC implements the continuous-time neural ODE approach of Hasani et al. (2020) for adaptive temporal dynamics. The CNN and LSTM models represent alternative approaches built on convolution and recurrent connections, respectively. We implement all fifteen models from scratch and conduct extensive benchmarking on synthetic ECG data. Our results demonstrate that Transformer-based models achieve superior classification accuracy by effectively capturing temporal dependencies, with the Three-Stage Hierarchical Transformer gaining further accuracy from multi-scale feature extraction. The CNN offers an excellent balance of accuracy and efficiency by capturing local morphological patterns; the LSTM provides strong sequential modeling; the Hopfield Network contributes energy-based pattern recognition; the VAE yields explainable latent representations that support both reconstruction and classification; and the LTC adapts its temporal dynamics through continuous-time neural ODEs, capturing both fast and slow patterns. The FFNN is the most computationally efficient, making it the best fit for real-time applications. This study quantifies the trade-offs between model complexity and performance, guiding the selection of appropriate architectures for different ECG classification scenarios.

Fifteen Machine Learning Architectures

1. Feedforward NN

  • Type: Feature-based MLP
  • Input: Statistical features
  • Speed: Fastest
  • Best For: Real-time, edge devices

2. Transformer

  • Type: Single-scale Attention
  • Input: Raw signals
  • Speed: Moderate
  • Best For: High-accuracy, research

3. 3stageFormer

  • Type: Multi-scale Attention
  • Input: Raw (3 resolutions)
  • Speed: Slowest
  • Best For: Multi-scale patterns

4. 1D CNN

  • Type: Convolutional
  • Input: Raw signals
  • Speed: Fast
  • Best For: Local patterns, efficiency

5. LSTM

  • Type: Recurrent
  • Input: Raw signals
  • Speed: Moderate
  • Best For: Sequential patterns

6. Hopfield

  • Type: Energy-based
  • Input: Raw signals
  • Speed: Moderate
  • Best For: Pattern completion

7. VAE

  • Type: Variational Autoencoder
  • Input: Raw signals
  • Speed: Moderate
  • Best For: Explainable AI

8. LTC

  • Type: Continuous-time Neural ODE
  • Input: Raw signals
  • Speed: Moderate
  • Best For: Adaptive temporal dynamics

9. HMM

  • Type: Probabilistic Sequence
  • Input: Raw signals (discretized)
  • Speed: Fast
  • Best For: Probabilistic modeling

10. Hierarchical HMM

  • Type: Multi-level HMM
  • Input: Raw signals (discretized)
  • Speed: Fast
  • Best For: Multi-scale patterns

11. DBN

  • Type: Temporal Bayesian Network
  • Input: Raw signals
  • Speed: Moderate
  • Best For: Uncertainty quantification

12. MDP

  • Type: Sequential Decision
  • Input: Raw signals
  • Speed: Moderate
  • Best For: Decision-making

13. PO-MDP

  • Type: Partially Observable MDP
  • Input: Raw signals
  • Speed: Moderate
  • Best For: Hidden state modeling

14. MRF

  • Type: Spatial-temporal
  • Input: Raw signals
  • Speed: Moderate
  • Best For: Dependency modeling

15. Granger

  • Type: Causal Analysis
  • Input: Raw signals
  • Speed: Moderate
  • Best For: Causal relationships

Comprehensive Comparison

Architectural Comparison

Figure: Model Comparison (Architecture vs. Performance). Each of the fifteen models (FFNN, CNN, LSTM, Hopfield, VAE, LTC, HMM, HHMM, DBN, MDP, PO-MDP, MRF, Granger, Transformer, 3stageFormer) is plotted by computational complexity against classification accuracy.

Performance Metrics Comparison

Model | Architecture Type | Input Format | Temporal Modeling | Parameters | Training Speed | Accuracy | Explainability
FFNN | Feature MLP | Statistical features | None | Few (100s-1Ks) | Fastest | Good | Moderate
Transformer | Single-scale Attention | Raw signals | Global | Many (100Ks) | Moderate | Excellent | High (attention)
3stageFormer | Multi-scale Attention | Raw (3 resolutions) | Multi-scale | Most (100Ks+) | Slowest | Excellent+ | High (hierarchical)
CNN | Convolutional | Raw signals | Local | Moderate (10Ks-100Ks) | Fast | Good-Excellent | Moderate
LSTM | Recurrent | Raw signals | Sequential | Moderate (10Ks-100Ks) | Moderate | Good-Excellent | High (sequential)
Hopfield | Energy-based | Raw signals | Associative | Moderate (10Ks-100Ks) | Moderate | Good-Excellent | Moderate
VAE | Variational Autoencoder | Raw signals | Latent factors | Moderate (10Ks-100Ks) | Moderate | Good-Excellent | Highest (factors)
LTC | Continuous-time Neural ODE | Raw signals | Continuous-time | Moderate (10Ks-100Ks) | Moderate | Good-Excellent | Moderate
HMM | Probabilistic Sequence | Raw signals (discretized) | Hidden states | Few (1Ks) | Fast | Good | Moderate
Hierarchical HMM | Multi-level HMM | Raw signals (discretized) | Multi-scale hidden states | Few (1.5Ks) | Fast | Good-Excellent | Moderate
DBN | Temporal Bayesian Network | Raw signals | Temporal dependencies | Moderate (50Ks) | Moderate | Good-Excellent | High (uncertainty)
MDP | Sequential Decision | Raw signals | Decision process | Few (5Ks) | Moderate | Good | Moderate
PO-MDP | Partially Observable MDP | Raw signals | Hidden state decision | Moderate (8Ks) | Moderate | Good | Moderate
MRF | Spatial-temporal | Raw signals | Dependency modeling | Moderate (40Ks) | Moderate | Good-Excellent | Moderate
Granger | Causal Analysis | Raw signals | Causal relationships | Moderate (30Ks) | Moderate | Good | High (causal)

Trade-offs Visualization

Figure: Accuracy vs. Efficiency Trade-offs. Training speed (fast → slow) is plotted against classification accuracy (good → excellent): FFNN is fastest, Transformer is high-accuracy, 3stageFormer achieves the best accuracy, and CNN occupies the labeled sweet spot as the best balance, with LSTM, Hopfield, VAE, LTC, HMM, HHMM, DBN, MDP, PO-MDP, MRF, and Granger in between.

Architectural Paradigms Comparison

Figure: Temporal Modeling Paradigms. Models are grouped by paradigm: feature-based (FFNN), probabilistic (HMM, HHMM, DBN, MDP, PO-MDP, MRF, Granger), attention-based (Transformer, 3stageFormer), convolutional (CNN), recurrent (LSTM), continuous-time (LTC), energy-based (Hopfield), and generative (VAE). Bar height represents modeling capacity/complexity.

Key Features

🎯 Comprehensive Benchmarking

Systematic comparison of fifteen distinct machine learning architectures on standardized metrics including accuracy, precision, recall, F1-score, training time, and inference time.
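
For reference, here is a minimal sketch of one way to compute these six metrics; the evaluate helper and the use of scikit-learn are illustrative assumptions, not necessarily how benchmark.py is implemented.

import time
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def evaluate(model, X_train, y_train, X_test, y_test):
    """Fit a model and report the six standardized benchmark metrics."""
    t0 = time.perf_counter()
    model.fit(X_train, y_train)            # measure training time
    train_time = time.perf_counter() - t0

    t0 = time.perf_counter()
    y_pred = model.predict(X_test)         # measure inference time
    infer_time = time.perf_counter() - t0

    acc = accuracy_score(y_test, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="macro", zero_division=0)
    return {"accuracy": acc, "precision": prec, "recall": rec,
            "f1": f1, "train_time_s": train_time, "infer_time_s": infer_time}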

📊 Multi-Scale Processing

Three-Stage Hierarchical Transformer uniquely processes ECG signals at multiple temporal resolutions (1000, 500, 250 timesteps) for comprehensive pattern recognition.
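
For illustration, the three resolutions can be produced by simple average pooling, as in the sketch below; the tensor shapes and pooling scheme are assumptions for the sketch, not necessarily how three_stage_former.py derives its inputs.

import torch
import torch.nn.functional as F

def multi_resolution(ecg: torch.Tensor):
    """ecg: (batch, channels, 1000) -> the three temporal resolutions."""
    full = ecg                                   # 1000 timesteps
    half = F.avg_pool1d(ecg, kernel_size=2)      # 500 timesteps
    quarter = F.avg_pool1d(ecg, kernel_size=4)   # 250 timesteps
    return [full, half, quarter]

x = torch.randn(8, 1, 1000)                      # batch of 8 single-lead signals
for scale in multi_resolution(x):
    print(scale.shape)   # (8, 1, 1000), (8, 1, 500), (8, 1, 250)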

🔍 Explainable AI

Variational Autoencoder provides 21 interpretable latent factors (FactorECG approach) enabling clinical interpretability and generative capabilities.
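
A minimal sketch of a FactorECG-style VAE with a 21-dimensional latent space follows; the fully connected layers and their sizes are illustrative assumptions rather than the actual vae_ecg.py architecture.

import torch
import torch.nn as nn

class ECGVAE(nn.Module):
    def __init__(self, sig_len=1000, latent_dim=21):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(sig_len, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)       # factor means
        self.logvar = nn.Linear(256, latent_dim)   # factor log-variances
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, sig_len))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(z), mu, logvar   # reconstruction + interpretable factors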

⚡ Efficiency Optimization

CNN model offers optimal balance between accuracy and computational efficiency, making it ideal for practical deployment scenarios.
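
As a concrete illustration of the kind of compact model this trade-off favors, here is a minimal 1D CNN sketch; the filter counts and kernel sizes are assumptions, not the cnn_lstm_ecg.py configuration.

import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
    nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(4),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(32, 2),                        # e.g. normal vs. abnormal beat
)
logits = cnn(torch.randn(8, 1, 1000))        # (8, 2)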

🧠 Energy-Based Learning

Hopfield Network demonstrates unique pattern completion and noise robustness through energy-based associative memory mechanisms.
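
Below is a minimal classical (binary) Hopfield sketch illustrating Hebbian storage and pattern completion from a noisy probe; the binarized random templates and synchronous updates are simplifications relative to hopfield_ecg.py.

import numpy as np

def train_hopfield(patterns):
    """Hebbian storage: patterns is (n_patterns, n_units) of +/-1 values."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0)                  # no self-connections
    return W

def recall(W, probe, steps=20):
    """Iterate s = sign(W s), descending the energy E = -0.5 * s^T W s."""
    s = probe.copy()
    for _ in range(steps):
        s = np.where(W @ s >= 0, 1, -1)
    return s

rng = np.random.default_rng(0)
stored = rng.choice([-1, 1], size=(3, 200))                 # three binarized templates
noisy = stored[0] * rng.choice([1, -1], size=200, p=[0.9, 0.1])  # 10% bit flips
print(np.mean(recall(train_hopfield(stored), noisy) == stored[0]))  # typically ~1.0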

🔄 Sequential Modeling

LSTM network provides bidirectional sequential processing with explicit memory gates for rhythm analysis and temporal pattern recognition.
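
A minimal bidirectional LSTM classifier sketch; the hidden size and single linear head are illustrative assumptions, not the cnn_lstm_ecg.py settings.

import torch
import torch.nn as nn

class BiLSTMECG(nn.Module):
    def __init__(self, hidden=64, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)  # forward + backward states

    def forward(self, x):              # x: (batch, timesteps, 1)
        out, _ = self.lstm(x)          # (batch, timesteps, 2*hidden)
        return self.head(out[:, -1])   # classify from the final timestep

logits = BiLSTMECG()(torch.randn(8, 1000, 1))   # (8, 2)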

⏱️ Continuous-Time Dynamics

Liquid Time-Constant Network (LTC) models ECG signals as continuous-time processes using neural ODEs with adaptive time constants, capturing both fast and slow temporal patterns.
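
In Hasani et al. (2020) the hidden state follows dx/dt = -(1/τ + f(x, I))·x + f(x, I)·A, so the effective time constant τ/(1 + τ·f(x, I)) adapts to the input. Below is a minimal NumPy sketch of the paper's fused-Euler update; the layer sizes and the sine-wave stand-in for an ECG trace are assumptions, not the ltc_ecg.py implementation.

import numpy as np

def ltc_step(x, inp, Wx, Wi, b, A, tau, dt=0.05):
    f = np.tanh(Wx @ x + Wi @ inp + b)   # input-dependent nonlinearity
    # Fused Euler step: x <- (x + dt*f*A) / (1 + dt*(1/tau + f))
    return (x + dt * f * A) / (1.0 + dt * (1.0 / tau + f))

rng = np.random.default_rng(0)
n_hidden, n_in = 16, 1
Wx = 0.1 * rng.standard_normal((n_hidden, n_hidden))
Wi = rng.standard_normal((n_hidden, n_in))
b, A = np.zeros(n_hidden), np.ones(n_hidden)
tau = np.full(n_hidden, 1.0)             # base time constants
x = np.zeros(n_hidden)
for sample in np.sin(np.linspace(0, 6 * np.pi, 1000)):   # stand-in signal
    x = ltc_step(x, np.array([sample]), Wx, Wi, b, A, tau)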

Key Findings

Accuracy Performance

3stageFormer achieves highest accuracy through multi-scale hierarchical processing. Transformer provides excellent accuracy with global attention. CNN, LSTM, VAE, Hopfield, and LTC offer competitive accuracy with different architectural strengths.

Computational Efficiency

FFNN is fastest for training and inference, ideal for real-time applications. CNN provides the best accuracy-efficiency balance. 3stageFormer is slowest but achieves highest accuracy.

Explainability

VAE offers highest explainability through interpretable latent factors. Transformer and 3stageFormer provide attention-based interpretability. LSTM offers sequential processing interpretability.

Generalization

Models processing raw signals (all except FFNN) demonstrate better generalization. 3stageFormer excels at multi-scale patterns. Hopfield shows superior noise robustness.

Quick Start

Installation

pip install -r requirements.txt

Run Complete Benchmark

python benchmark.py

Individual Model Testing

# Feedforward Neural Network
python neural_network.py

# Transformer Model
python transformer_ecg.py

# Three-Stage Hierarchical Transformer
python three_stage_former.py

# CNN and LSTM
python cnn_lstm_ecg.py

# Hopfield Network
python hopfield_ecg.py

# Variational Autoencoder
python vae_ecg.py

# Liquid Time-Constant Network
python ltc_ecg.py

# Hidden Markov Model
python hmm_ecg.py

# Dynamic Bayesian Network
python dbn_ecg.py

# Markov Decision Process / PO-MDP
python mdp_ecg.py

# Markov Random Field
python mrf_ecg.py

# Granger Causality
python granger_ecg.py

Citation

If you use this code or findings, please cite:

@article{chandra2025ecg,
  title={Comparative Analysis of Neural Network Architectures for ECG Classification: A Comprehensive Study of Fifteen Approaches Including Deep Learning and Probabilistic Models},
  author={Chandra, Shyamal Suhana},
  journal={Sapana Micro Software Research},
  year={2025},
  note={Implementation and benchmarking of FFNN, Transformer, 3stageFormer, CNN, LSTM, Hopfield, VAE, LTC, HMM, Hierarchical HMM, DBN, MDP, PO-MDP, MRF, and Granger Causality architectures}
}

Related Work Citations

This work builds upon the following foundational research:

  • Feedforward NN: Lloyd, M. D., et al. (2001). "Detection of Ischemia in the Electrocardiogram Using Artificial Neural Networks." Circulation, 103(22), 2711-2716. DOI: 10.1161/01.CIR.103.22.2711
  • Transformer: Ikram, S., et al. (2025). "Transformer-based ECG classification for early detection of cardiac arrhythmias." Frontiers in Medicine, 12, 1600855.
  • 3stageFormer: Tang, X., Berquist, J., Steinberg, B. A., & Tasdizen, T. (2024). "Hierarchical Transformer for Electrocardiogram Diagnosis." arXiv preprint arXiv:2411.00755.
  • CNN: LeCun, Y., et al. (1998). "Gradient-based learning applied to document recognition." Proceedings of the IEEE, 86(11), 2278-2324. (Standard convolutional neural network architecture)
  • LSTM: Hochreiter, S., & Schmidhuber, J. (1997). "Long short-term memory." Neural Computation, 9(8), 1735-1780. (Standard LSTM architecture for sequential modeling)
  • Hopfield Network: "Electrocardiogram (ECG) Signal Modeling and Noise Reduction Using Hopfield Neural Networks." Engineering, Technology & Applied Science Research (ETASR), Vol. 3, No. 1, 2013. https://etasr.com/index.php/ETASR/article/view/243/156
  • VAE (FactorECG): van de Leur, R. R., et al. (2022). "Improving explainability of deep neural network-based electrocardiogram interpretation using variational auto-encoders." European Heart Journal - Digital Health, 3(3). DOI: 10.1093/ehjdh/ztac038. https://github.com/UMCUtrecht-ECGxAI/ecgxai
  • LTC: Hasani, R., et al. (2020). "Liquid Time-Constant Networks." arXiv preprint arXiv:2006.04439. https://github.com/raminmh/liquid_time_constant_networks
  • HMM: Rabiner, L. R. (1989). "A tutorial on hidden Markov models and selected applications in speech recognition." Proceedings of the IEEE, 77(2), 257-286. (Classical HMM reference)
  • Hierarchical HMM: Fine, S., Singer, Y., & Tishby, N. (1998). "The hierarchical hidden Markov model: Analysis and applications." Machine Learning, 32(1), 41-62.
  • DBN: Murphy, K. P. (2002). "Dynamic Bayesian Networks: Representation, Inference and Learning." Ph.D. thesis, UC Berkeley. (Classical DBN reference)
  • MDP: Puterman, M. L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons. (Classical MDP reference)
  • PO-MDP: Kaelbling, L. P., Littman, M. L., & Cassandra, A. R. (1998). "Planning and acting in partially observable stochastic domains." Artificial Intelligence, 101(1-2), 99-134.
  • MRF: Kindermann, R., & Snell, J. L. (1980). Markov Random Fields and Their Applications. American Mathematical Society. (Classical MRF reference)
  • Granger Causality: Granger, C. W. J. (1969). "Investigating causal relations by econometric models and cross-spectral methods." Econometrica, 37(3), 424-438.
