An Application of Deep Learning Model to Specify Cardiovascular Diseases via Analyzing ECG Diagrams

A UTMIST Project by Hanni Ahmed, Maanas Arora, Nabil Mohamed, Thomas Zeger, (Timothy Kwong), and Yan Zhu.

* Reference for image: A. J. Huber, A. H. Leggett, and A. K. Miller, “Electrocardiogram: Blog chronicles value of old, but still vital cardiac test,” Scope, 19-Dec-2017. [Online]. Available: [Accessed: 24-May-2022].


Cardiovascular disease (CVD) is the leading cause of death worldwide. According to the World Health Organization, about 17.9 million people died from CVDs in 2019, representing 32% of all global deaths [1]. Among the many types of CVD, we focus on heart arrhythmias, which are among the most common cardiac abnormalities.

Heart arrhythmia is characterized by irregular heartbeats and is caused by dysfunction of the heart's electrical system. An electrocardiogram (ECG) measures the electrical activity of the heart, and arrhythmia is diagnosed by comparing a patient's ECG recording with normal, healthy ECGs to identify abnormal heart rhythms [2]. ECGs are valuable to doctors because they allow arrhythmias to be diagnosed non-invasively and help determine the best course of treatment. Unfortunately, distinguishing heartbeats on ECGs can be time-consuming and difficult: the recorded signals usually contain substantial noise, which increases both the time required to analyze the ECG morphology and the probability of misinterpretation [3][4]. Automating the classification process can therefore help doctors identify a patient's ECG pattern more quickly and accurately.

This motivates our project: developing supervised machine learning models to classify ECG heartbeats. Many researchers believe that machine learning can help solve healthcare and biomedical problems more efficiently, and learning how to use computation to advance health science is worthwhile, as it may become one of the main research directions in the future. In particular, we draw on two methods from the literature [3][5] to develop convolutional neural networks that classify heartbeats from ECG recordings. We compare the two architectures, evaluate their advantages and drawbacks, and identify which performs best for heart arrhythmia classification.

Key Words: Arrhythmia, Heartbeats Classification, Convolutional Neural Network, Long Short-Term Memory, Deep Learning

Background: What Is an ECG?

The heartbeat is primarily controlled by a group of autorhythmic cells in the wall of the right atrium, called the sinoatrial node (SA node). The electrical signals generated by the SA node spread through the entire heart and produce regular cycles of muscle contraction and relaxation. An ECG measures and records the changes in electrical potential caused by these signals. A healthy heart has a normal rate (60–100 beats per minute) and a consistent pattern consisting of the P wave, the QRS complex, and the T wave (Figure 1). The P wave reflects atrial depolarization, the QRS complex reflects ventricular depolarization, and the T wave reflects ventricular repolarization; these electrical events trigger the contraction and relaxation of the atria and ventricles. Many cardiac diseases can be identified from ECGs; for instance, Figure 2 shows the ECGs of some typical CVDs.

Figure 1: Segments of an ECG Signal [12]
Figure 2: Normal ECG vs. ECGs for CVDs [12]

Related Works

Among deep learning architectures, convolutional neural networks (CNNs) are the most commonly used models for biomedical image processing. Thanks to weight sharing, they are computationally cheaper than conventional fully connected deep neural networks, and they are good at analyzing spatial information [6]. CNNs have been widely applied to medical image classification, such as detecting lung diseases in chest X-ray images and brain tumors in magnetic resonance imaging (MRI) scans [7]. More recently, CNNs have also become popular for ECG analysis and classification. The following two prior works show how CNNs are applied to such tasks, and they also inspired our project:

1. A Deep Convolutional Neural Network Model to Classify Heartbeats by U. Rajendra Acharya et al. [3]

This article proposed a 9-layer CNN model to detect cardiac abnormalities (especially arrhythmias) and classify heartbeats into five categories: non-ectopic (N), supraventricular ectopic (S), ventricular ectopic (V), fusion (F), and unknown (Q). Figure 3 illustrates the CNN structure, and Figure 4 describes the different heartbeat categories. The input ECGs were filtered to remove high-frequency noise, and the results showed that the model is robust enough to handle noisy data. The model was trained on the open-source PhysioBank MIT-BIH Arrhythmia Database [8]. When trained on the original, imbalanced dataset (most beats belong to the N class), the CNN achieved accuracies of 89.07% and 89.3% on noisy and noise-free ECGs, respectively. When trained on augmented, class-balanced data, its accuracy increased to 94.03% and 93.47% on original (noisy) and noise-free ECGs, respectively.

This work motivates our first approach to the classification problem, as the CNN model is easy to implement while producing acceptable accuracy.

Figure 3: CNN model layers [3]
Figure 4: Explanations for the five categories [3]

2. Automated Atrial Fibrillation Detection Using a Hybrid CNN-LSTM Network on Imbalanced ECG Datasets by Georgios Petmezas et al. [5]

Although CNNs are well known for their ability to process spatial information, they are not designed to capture long-range temporal dependencies. Therefore, while the first article proposed a simple and intuitive solution, some researchers argue that combining a CNN with a recurrent neural network (RNN), such as a long short-term memory (LSTM) network, can further improve the model's accuracy and robustness.

For instance, in Automated Atrial Fibrillation Detection Using a Hybrid CNN-LSTM Network on Imbalanced ECG Datasets, Petmezas et al. proposed “a hybrid neural model utilizing focal loss, an improved version of cross-entropy loss, to deal with training data imbalance”. Spatial features are first extracted by a CNN and then fed to an LSTM, which models their temporal dynamics. Instead of classifying heartbeats into the five classes used in the first related work, Petmezas et al. classified ECGs into four rhythm types: normal (N), atrial fibrillation (AFIB), atrial flutter (AFL), and AV junctional rhythm (J). The model was trained on the MIT-BIH Atrial Fibrillation Database [8][9] and achieved a sensitivity of 97.87% and a specificity of 99.29% using ten-fold cross-validation.

(Note: atrial fibrillation is also a type of heart arrhythmia.)

Figure 5: CNN-LSTM Architecture from Automated Atrial Fibrillation Detection Using a Hybrid CNN-LSTM Network on Imbalanced ECG Datasets [5]
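Focal loss is not part of our own models, but a short sketch helps show what it does. It uses the commonly cited form FL(p_t) = -α(1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class; the defaults γ = 2 and α = 1 are illustrative assumptions, not the settings reported by Petmezas et al.

```python
import math


def focal_loss(p_true: float, gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Focal loss for one example, given the predicted probability of its true class.

    FL(p_t) = -alpha * (1 - p_t)**gamma * log(p_t). With gamma = 0 and alpha = 1
    it reduces to the ordinary cross-entropy -log(p_t).
    gamma and alpha defaults are illustrative assumptions.
    """
    if not 0.0 < p_true <= 1.0:
        raise ValueError("p_true must be in (0, 1]")
    # The modulating factor (1 - p_t)**gamma shrinks the loss of well-classified examples
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)


# Compare how much of the cross-entropy survives for an easy and a hard example
for p in (0.95, 0.3):
    ce = -math.log(p)
    print(f"p_t={p}: cross-entropy={ce:.4f}, focal={focal_loss(p):.4f}")
```

For an easy example with p_t = 0.95, the modulating factor (1 - 0.95)^2 = 0.0025 keeps only 0.25% of the cross-entropy, while a hard example with p_t = 0.3 keeps 49% of it ((1 - 0.3)^2 = 0.49), so training concentrates on hard examples, which in imbalanced ECG data are often from minority classes.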


Dataset

The dataset we used for this project is the PhysioBank MIT-BIH Arrhythmia Database [8]. “It contains 48 half-hour excerpts of two-channel ambulatory ECG recordings, obtained from 47 subjects studied by the BIH Arrhythmia Laboratory between 1975 and 1979” [8], and it has been widely used in biomedical and machine learning projects. It is also the dataset used in the first related work.

Figure 6: ECG Samples for the Five Categories We Aim to Classify [3]
Figure 7: Sample ECGs from the MIT-BIH Arrhythmia Database [8]
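The five categories correspond to the standard AAMI grouping of MIT-BIH beat annotation symbols (the grouping summarized in Figure 4). A minimal mapping sketch, assuming the annotation symbols have already been read, for example from the `symbol` attribute of an annotation loaded with the `wfdb` Python package (this tooling is our assumption; the article does not state how the annotations were read):

```python
# Grouping of MIT-BIH beat annotation symbols into the five AAMI heartbeat classes.
# Non-beat annotations such as '+' (rhythm change) or '~' (signal quality) are not listed.
AAMI_CLASSES = {
    "N": ["N", "L", "R", "e", "j"],  # normal, left/right bundle branch block, atrial/nodal escape
    "S": ["A", "a", "J", "S"],       # atrial, aberrated atrial, nodal and supraventricular premature
    "V": ["V", "E"],                 # premature ventricular contraction, ventricular escape
    "F": ["F"],                      # fusion of ventricular and normal beats
    "Q": ["/", "f", "Q"],            # paced, fusion of paced and normal, unclassifiable
}

# Invert the table: symbol -> class letter
SYMBOL_TO_CLASS = {sym: cls for cls, syms in AAMI_CLASSES.items() for sym in syms}


def map_annotations(symbols):
    """Return (annotation index, AAMI class) pairs, skipping non-beat symbols."""
    return [(i, SYMBOL_TO_CLASS[s]) for i, s in enumerate(symbols) if s in SYMBOL_TO_CLASS]


print(map_annotations(["+", "N", "N", "V", "A", "/", "~", "F"]))
# -> [(1, 'N'), (2, 'N'), (3, 'V'), (4, 'S'), (5, 'Q'), (7, 'F')]
```

Keeping the annotation index makes it easy to look up the matching sample position (and hence the R peak) for each labeled beat.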


Data Preprocessing

The raw ECG recordings were preprocessed in two steps. First, a wavelet transform was used to denoise the signal. Second, the Pan-Tompkins algorithm was used to detect R peaks and segment the ECG signal into individual heartbeats.

Figure 8: The original input ECG diagram
Figure 9: ECG after denoising and oversampling

Signal Denoising: Daubechies-6 Wavelet Decomposition Transform

For denoising, we applied a Daubechies-6 (db6) wavelet decomposition. “The Daubechies wavelets are a family of orthogonal wavelets defining a discrete wavelet transform and characterized by a maximal number of vanishing moments for some given support” [13]. Wavelet denoising decomposes the signal into approximation and detail coefficients at several scales, suppresses the coefficients dominated by high-frequency noise, and reconstructs the signal from what remains. The reconstructed ECG has most of its high-frequency noise removed while retaining the diagnostic information contained in the original signal [3][10], so it conveys heartbeat information more precisely.
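A minimal sketch of db6 wavelet denoising with the PyWavelets package. The decomposition level (4), the median-based noise estimate, the universal threshold, and soft thresholding are common defaults that we assume here for illustration; the article does not specify these settings.

```python
import numpy as np
import pywt


def wavelet_denoise(signal, wavelet="db6", level=4):
    """Denoise a 1D signal by soft-thresholding its wavelet detail coefficients."""
    signal = np.asarray(signal, dtype=float)
    # coeffs = [cA_level, cD_level, ..., cD_1]
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Noise standard deviation estimated from the finest details (median absolute deviation)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    # Universal threshold: sigma * sqrt(2 * ln(N))
    thr = sigma * np.sqrt(2 * np.log(len(signal)))
    # Keep the approximation, soft-threshold every detail level
    denoised = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    # Reconstruct and trim any padding sample introduced by the transform
    return pywt.waverec(denoised, wavelet)[: len(signal)]


rng = np.random.default_rng(0)
t = np.linspace(0, 1, 2048, endpoint=False)
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.2 * rng.standard_normal(t.size)
print(np.mean((noisy - clean) ** 2), np.mean((wavelet_denoise(noisy) - clean) ** 2))
```

Soft thresholding shrinks every surviving coefficient toward zero instead of keeping it unchanged, which tends to avoid the small artifacts that hard thresholding can leave around sharp features.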

Peak Detection: Pan-Tompkins Algorithm

To avoid relying on the expert-labeled R-peak locations provided with the dataset, and to produce results closer to a live detection setting, we used the Pan-Tompkins algorithm [11] to detect R peaks and segment the ECG signal into individual heartbeats [3].

Figure 7: Visualization of the Pan-Tompkins algorithm for R-peak detection

The algorithm works as follows. First, the derivative of the ECG signal was computed with the five-point difference coefficients [-1, -2, 0, 2, 1] (scaled by 1/8 in [11]). On a moving average of this derivative, every point at which the slope changed from positive to negative (i.e., a local maximum) was located and marked as a QRS complex. These QRS locations were used to extract QRS intervals, and the maximum of the ECG signal within each interval was marked as a potential R peak. Using the adaptive thresholding scheme described in [3][11], detected peaks that fell below a signal threshold were removed as false R peaks. Finally, to avoid detecting T waves, whenever two detected peaks occurred closer together than a refractory period, only the larger peak was kept. The resulting R-peak detections were used to segment the ECG signal into individual heartbeats, which served as model inputs.
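A minimal NumPy sketch of these steps. It is a simplification, not our exact implementation: it adds the squaring step of the original Pan-Tompkins algorithm [11], replaces the adaptive thresholds with a fixed fraction of the maximum, and assumes a 150 ms integration window, a 200 ms refractory period, and a hypothetical 260-sample beat window (90 samples before and 170 after each R peak).

```python
import numpy as np

FS = 360  # MIT-BIH Arrhythmia Database sampling rate (Hz)


def detect_r_peaks(ecg, fs=FS, window_s=0.15, refractory_s=0.2, thresh_frac=0.3):
    """Simplified Pan-Tompkins-style R-peak detector (illustrative sketch)."""
    ecg = np.asarray(ecg, dtype=float)
    # 1. Five-point derivative; np.convolve flips the kernel, so this computes
    #    (1/8) * (-x[n-2] - 2x[n-1] + 2x[n+1] + x[n+2])
    deriv = np.convolve(ecg, np.array([1.0, 2.0, 0.0, -2.0, -1.0]) / 8.0, mode="same")
    # 2. Squaring makes all values positive and emphasises the steep QRS slopes
    squared = deriv ** 2
    # 3. Moving-window integration (a moving average over ~150 ms)
    win = max(1, int(window_s * fs))
    integrated = np.convolve(squared, np.ones(win) / win, mode="same")
    # 4. Candidate QRS locations: local maxima, where the slope changes from + to -
    slope = np.diff(integrated)
    candidates = np.flatnonzero((slope[:-1] > 0) & (slope[1:] <= 0)) + 1
    # 5. Fixed threshold, a simplification of the adaptive thresholds in [11]
    candidates = candidates[integrated[candidates] > thresh_frac * integrated.max()]
    r_peaks = []
    for c in candidates:
        # 6. The R peak is the largest raw-ECG sample near the candidate
        lo, hi = max(0, c - win), min(len(ecg), c + win)
        r = lo + int(np.argmax(ecg[lo:hi]))
        # 7. Refractory period: of two peaks closer than ~200 ms, keep the larger one
        if r_peaks and r - r_peaks[-1] < refractory_s * fs:
            if ecg[r] > ecg[r_peaks[-1]]:
                r_peaks[-1] = r
        else:
            r_peaks.append(r)
    return np.array(r_peaks, dtype=int)


def segment_beats(ecg, r_peaks, before=90, after=170):
    """Cut a fixed window around each R peak; beats too close to the edges are skipped."""
    ecg = np.asarray(ecg, dtype=float)
    beats = [ecg[r - before:r + after] for r in r_peaks
             if r - before >= 0 and r + after <= len(ecg)]
    return np.stack(beats) if beats else np.empty((0, before + after))
```

For example, `segment_beats(ecg, detect_r_peaks(ecg))` returns an array of shape (number_of_beats, 260) that can be fed to the models.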

Addressing Class Imbalance

Once these two steps had been performed, the class imbalance had to be addressed: around 90% of the heartbeat sequences obtained from the MIT-BIH Arrhythmia Database belong to the non-ectopic (N) class. Balancing the classes is important to keep the model from overfitting to the majority class and to improve its sensitivity on the minority classes, so we tried two approaches: generating synthetic data and oversampling. We kept the latter, since it gave similar or better performance while being much simpler to implement. Importantly, this data augmentation was applied only to the training set. This ensures that the test set contains only real heartbeat sequences in their natural class distribution, and thus serves as a fair measure of performance in a practical setting.

Generation of Synthetic Data

We generated synthetic heartbeat sequences by adding random Gaussian noise to existing sequences from the minority classes.
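A minimal sketch of this augmentation (the approach we ultimately set aside in favor of oversampling). The function name, the noise standard deviation of 0.05, and the per-class target count are hypothetical choices for illustration.

```python
import numpy as np


def augment_with_noise(beats, labels, target_per_class, noise_std=0.05, seed=0):
    """Top up each minority class to `target_per_class` beats by adding Gaussian
    noise to randomly chosen real beats of that class (illustrative sketch)."""
    beats, labels = np.asarray(beats, dtype=float), np.asarray(labels)
    rng = np.random.default_rng(seed)
    new_beats, new_labels = [beats], [labels]
    for cls in np.unique(labels):
        members = np.flatnonzero(labels == cls)
        shortfall = target_per_class - len(members)
        if shortfall <= 0:
            continue  # classes already at or above the target are left unchanged
        picks = rng.choice(members, size=shortfall, replace=True)
        noisy = beats[picks] + rng.normal(0.0, noise_std, size=beats[picks].shape)
        new_beats.append(noisy)
        new_labels.append(np.full(shortfall, cls))
    return np.concatenate(new_beats), np.concatenate(new_labels)
```

The original beats come first in the returned arrays, followed by the synthetic ones; shuffle before training.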


Oversampling

This method addresses class imbalance by simply duplicating examples from the relatively underrepresented classes in the dataset.
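A minimal sketch of random oversampling that also enforces the ordering described earlier: the test split is held out first, and only the training part is oversampled. Matching every class to the largest class and splitting at the level of individual beats are assumptions made for illustration; the function names are hypothetical.

```python
import numpy as np


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen examples of each class until all classes match the largest."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = []
    for cls, count in zip(classes, counts):
        members = np.flatnonzero(y == cls)
        extra = rng.choice(members, size=target - count, replace=True)
        keep.append(np.concatenate([members, extra]))
    order = rng.permutation(np.concatenate(keep))  # shuffle so classes are interleaved
    return X[order], y[order]


def split_then_oversample(X, y, test_frac=0.1, seed=0):
    """Hold out a test split first, then oversample only the training part."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))
    n_test = int(round(test_frac * len(y)))
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    X_train, y_train = random_oversample(X[train_idx], y[train_idx], seed)
    return X_train, y_train, X[test_idx], y[test_idx]
```

Because the test indices are drawn before any duplication, no duplicated beat can appear in both sets, and the test set keeps its natural class distribution.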

Network Architecture

Model 1: Convolutional Neural Network

The convolutional neural network we implemented is based on the aforementioned paper by Acharya et al. (Related Work 1). It has three one-dimensional convolutional layers, each using a leaky ReLU activation and followed by a max-pooling layer. The convolutional layers pick up key local features of the heartbeat sequences, such as the QRS complex and the morphological changes that characterize different kinds of arrhythmia. The resulting feature map is then passed to a three-layer fully connected network, also using leaky ReLUs, with a softmax in the final layer. The exact architecture, including the kernel sizes of the convolutional layers, is shown in the diagram below.

Figure 8: The layers in the CNN model
Figure 9: Visualization of the CNN model [3]
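A minimal PyTorch sketch of this architecture. The channel counts (5, 10, 20), kernel sizes (3, 4, 4), pooling size 2, hidden widths (30, 20), and the 260-sample input length are assumptions modeled on the 9-layer design of Acharya et al. [3], not a copy of our diagram; adjust them to match Figure 8.

```python
import torch
import torch.nn as nn


class ECGCNN(nn.Module):
    """Three Conv1d + LeakyReLU + MaxPool blocks followed by a three-layer classifier."""

    def __init__(self, input_len: int = 260, n_classes: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 5, kernel_size=3), nn.LeakyReLU(), nn.MaxPool1d(2),
            nn.Conv1d(5, 10, kernel_size=4), nn.LeakyReLU(), nn.MaxPool1d(2),
            nn.Conv1d(10, 20, kernel_size=4), nn.LeakyReLU(), nn.MaxPool1d(2),
        )
        # Infer the flattened feature size from a dummy beat instead of hard-coding it
        with torch.no_grad():
            n_flat = self.features(torch.zeros(1, 1, input_len)).numel()
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_flat, 30), nn.LeakyReLU(),
            nn.Linear(30, 20), nn.LeakyReLU(),
            nn.Linear(20, n_classes),  # logits; softmax is applied at inference time
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, input_len) -> logits: (batch, n_classes)
        return self.classifier(self.features(x))


model = ECGCNN()
probs = torch.softmax(model(torch.randn(8, 1, 260)), dim=1)  # class probabilities
print(probs.shape)
```

Keeping the final layer as raw logits and letting `nn.CrossEntropyLoss` apply the (log-)softmax during training is numerically more stable than placing an explicit softmax layer inside the model.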

Model 2: CNN-LSTM Combination

While convolutional layers are well suited to extracting local features, they are less suited to analyzing long-distance relations in sequential data. Each convolutional feature depends only on a short window of the input (its receptive field, set by the kernel and pooling sizes), so relations between distant parts of a sequence are not captured directly by the convolutions.
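To make the locality argument concrete, the receptive field of a stack of convolution and pooling layers can be computed with the standard recurrence r ← r + (k - 1)·j, j ← j·s, where k is the kernel size, s the stride, and j the spacing between adjacent outputs measured in input samples. The layer stack below is a hypothetical one (kernel sizes 3, 4, 4, each followed by pooling of size 2), modeled on Acharya et al.; Figure 8 gives the exact values.

```python
def receptive_field(layers):
    """Receptive field (in input samples) of a stack of (kernel_size, stride) layers."""
    rf, jump = 1, 1
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump  # each extra tap reaches `jump` more input samples
        jump *= stride                  # distance between adjacent outputs, in input samples
    return rf


# Hypothetical stack: conv(k=3), pool(2), conv(k=4), pool(2), conv(k=4), pool(2)
LAYERS = [(3, 1), (2, 2), (4, 1), (2, 2), (4, 1), (2, 2)]
FS = 360  # MIT-BIH sampling rate in Hz

rf = receptive_field(LAYERS)
print(f"{rf} samples = {1000 * rf / FS:.1f} ms")  # prints: 28 samples = 77.8 ms
```

Under these assumed sizes, each feature leaving the last pooling layer depends on only 28 input samples (about 78 ms), whereas a 260-sample beat window spans about 0.72 s. Fully connected layers can still combine all positions afterwards, but the convolutional features themselves are local.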

For that reason, recurrent neural network (RNN) models are often preferred over CNNs for sequential data.
However, as the results of our first model show, local features are still important for ECG data. Furthermore, the convolution kernels can learn to denoise the data, potentially removing the need for the separate denoising step, which an RNN on its own is not designed to do. For these reasons, convolutional layers remain desirable.
To leverage the inductive biases of both architectures, our CNN-LSTM model first applies convolutional layers to the data for denoising and local feature extraction, and then feeds the extracted features into an LSTM layer so that the sequential nature of the data is exploited.
The CNN-LSTM model comprises three 1D convolutional layers, followed by an LSTM layer and two dense (linear) layers. Leaky ReLU is used as the activation function for the convolutional and dense layers. For the related architecture of Petmezas et al., see Figure 5.

Figure 10 below presents the model architecture in detail.

Figure 10: The layers in the CNN-LSTM combination model
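A matching PyTorch sketch of the CNN-LSTM combination. The convolutional front end reuses assumed sizes (5, 10, 20 channels; kernels 3, 4, 4; pooling 2), and the LSTM hidden size of 32 and dense width of 20 are hypothetical choices; Figure 10 has the actual layer sizes.

```python
import torch
import torch.nn as nn


class ECGCNNLSTM(nn.Module):
    """Conv1d front end (denoising/local features), LSTM over time, two dense layers."""

    def __init__(self, n_classes: int = 5, hidden_size: int = 32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 5, kernel_size=3), nn.LeakyReLU(), nn.MaxPool1d(2),
            nn.Conv1d(5, 10, kernel_size=4), nn.LeakyReLU(), nn.MaxPool1d(2),
            nn.Conv1d(10, 20, kernel_size=4), nn.LeakyReLU(), nn.MaxPool1d(2),
        )
        # The LSTM reads the conv feature map as a sequence of 20-dimensional vectors
        self.lstm = nn.LSTM(input_size=20, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Sequential(
            nn.LeakyReLU(),
            nn.Linear(hidden_size, 20), nn.LeakyReLU(),
            nn.Linear(20, n_classes),  # logits
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.conv(x)            # (batch, channels=20, time)
        feats = feats.transpose(1, 2)   # (batch, time, channels) for batch_first=True
        _, (h_n, _) = self.lstm(feats)  # h_n: (num_layers, batch, hidden_size)
        return self.head(h_n[-1])       # last layer's final hidden state -> logits


model = ECGCNNLSTM()
print(model(torch.randn(8, 1, 260)).shape)
```

Classifying from the LSTM's final hidden state lets the prediction depend on the whole sequence of convolutional features rather than only on local windows.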


Training and Experiments

We trained the CNN and CNN-LSTM models on the preprocessed dataset for 20 epochs, using categorical cross-entropy loss and the Adam optimizer with a learning rate of 0.001 and beta values of 0.9 and 0.999. The data were split 90% for training and 10% for testing.
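This training configuration can be sketched as follows. The batch size of 64 and the function names are hypothetical; the function expects a training set that has already been split off and oversampled, and it works with any model that maps a (batch, 1, length) tensor to class logits.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train(model, X_train, y_train, epochs=20, batch_size=64):
    """Train with categorical cross-entropy and Adam(lr=1e-3, betas=(0.9, 0.999)).

    X_train: (num_beats, beat_length) array or tensor; y_train: integer class labels.
    Returns a list of (mean loss, accuracy) pairs, one per epoch.
    """
    X = torch.as_tensor(X_train, dtype=torch.float32).unsqueeze(1)  # add channel dim
    y = torch.as_tensor(y_train, dtype=torch.long)
    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
    criterion = nn.CrossEntropyLoss()  # applies log-softmax to the logits internally
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
    history = []
    for _ in range(epochs):
        model.train()
        loss_sum, correct, total = 0.0, 0, 0
        for xb, yb in loader:
            optimizer.zero_grad()
            logits = model(xb)
            loss = criterion(logits, yb)
            loss.backward()
            optimizer.step()
            loss_sum += loss.item() * len(yb)
            correct += (logits.argmax(dim=1) == yb).sum().item()
            total += len(yb)
        history.append((loss_sum / total, correct / total))
    return history


@torch.no_grad()
def evaluate_accuracy(model, X_test, y_test):
    """Accuracy on the held-out test split (left in its natural class distribution)."""
    model.eval()
    X = torch.as_tensor(X_test, dtype=torch.float32).unsqueeze(1)
    y = torch.as_tensor(y_test, dtype=torch.long)
    return (model(X).argmax(dim=1) == y).float().mean().item()
```

The per-epoch (loss, accuracy) pairs returned by `train` are the kind of values plotted in the training curves below.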

To compare and evaluate the performance of the two models, we ran four experiments, each including training and testing:

  1. CNN model on the denoised dataset
  2. CNN-LSTM model on the denoised dataset
  3. CNN model on the noisy dataset
  4. CNN-LSTM model on the noisy dataset

The training loss and accuracy of the two models on the denoised and noisy datasets are shown in Figures 11 to 18.

Figure 11: Training loss for CNN on denoised dataset (0.006459 after 20 epochs)
Figure 12: Training accuracy for CNN on denoised dataset (99.93% after 20 epochs)
Figure 13: Training loss for CNN-LSTM on denoised dataset (0.014350 after 20 epochs)
Figure 14: Training accuracy for CNN-LSTM on denoised dataset (99.82% after 20 epochs)
Figure 15: Training loss for CNN on noisy dataset (0.006163 after 20 epochs)
Figure 16: Training accuracy for CNN on noisy dataset (99.94% after 20 epochs)
Figure 17: Training loss for CNN-LSTM on noisy dataset (0.013887 after 20 epochs)
Figure 18: Training accuracy for CNN-LSTM on noisy dataset (99.83% after 20 epochs)

Results & Discussion

Comparison with Acharya et al.

Notably, our CNN model outperformed the model proposed by Acharya et al. in accuracy on both the original and the denoised test sets (see Table 1 below). Furthermore, denoising the ECG waveforms appears to have little effect on accuracy for both implementations, which indicates that the models are robust to noise and agrees with the conclusion of Acharya et al. In addition, Acharya et al. trained with stochastic gradient descent, whereas we found that Adam further improved training and test accuracy for the same network trained for the same number of epochs.

It is important to note that this apparent improvement in accuracy is partly explained by differences in the test sets. The test set used by Acharya et al. also contained synthetically generated data, which allowed them to ensure that both the training and test sets had a balanced class distribution. However, this is not a rigorous evaluation method: synthetic samples generated from training examples can end up in the test set (and vice versa), so the test accuracy partly reflects performance on near-copies of training data rather than the model's ability to generalize. As mentioned earlier, we applied oversampling only to the training set, so our test set contains relatively more non-ectopic (class N) examples than examples of any other class. Because of this, reaching high test accuracy may have been easier for our models (which had more N-class examples to learn from) than for the model of Acharya et al. However, the class distribution of our test set better represents reality, so the accuracy we obtained is a more useful indicator of the model's performance in real-world applications.

Table 1: Reported accuracies for our CNN and CNN-LSTM models on the original and denoised training and test sets. To assess the effect of noise in the input data, we also include results for inputs without denoising. The negligible change in accuracy after denoising indicates that the transformation had little effect on model performance.
Table 2: Reported test accuracies for CNN and CNN-LSTM models on different arrhythmia classes.

Comparison with Petmezas et al.

Since Petmezas et al. used a different dataset in the second related work, it is hard to conclude whether our model outperforms theirs. Due to limited compute resources, our team ran experiments only on the MIT-BIH Arrhythmia Database (about 24 hours of ECG recordings) and trained our models for 20 epochs. A rigorous and fair comparison with the model of Petmezas et al. (who used about 230 hours of training data and trained for about 100 epochs) is therefore not possible.

Comparison of CNN and CNN-LSTM

When the CNN-LSTM model is evaluated on the same dataset with the same classification scheme as the CNN model, the CNN model achieves higher training and test accuracy on both the denoised and noisy datasets. Table 2 shows the test accuracies of the two models for each of the five arrhythmia classes: the CNN model beats the CNN-LSTM model in every class except class F, where it trails by less than 3%. Furthermore, our simple CNN model trains about twice as fast as the CNN-LSTM model, so the CNN-LSTM approach is more computationally expensive and power-hungry.

Therefore, according to our experimental results, a CNN model may be more suitable for classifying ECG diagrams, as it is easy to implement and robust to noise in the data.

For completeness, we also compare the average sensitivity, average specificity, and average positive predictive value (PPV) of our two models and the CNN model proposed by Acharya et al. The results are summarized in Table 3 below:

Table 3: Comparison of average sensitivity, specificity, and positive predictive value (PPV) across different models.

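Sensitivity, specificity, and PPV are metrics for evaluating model performance (higher values are better). With TP, TN, FP, and FN denoting the numbers of true positives, true negatives, false positives, and false negatives for a given class, they are defined as [5]:

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}, \qquad \text{PPV} = \frac{TP}{TP + FP}$$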
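For a multi-class problem, these metrics are computed for each class one-vs-rest (that class is "positive", all others "negative") and then averaged. A plain-Python sketch; the unweighted (macro) average is our assumption about how the averages in Table 3 are formed, and the function name is hypothetical.

```python
def class_metrics(y_true, y_pred, classes):
    """One-vs-rest sensitivity, specificity and PPV per class, plus their unweighted means."""
    per_class = {}
    n = len(y_true)
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        tn = n - tp - fn - fp
        nan = float("nan")  # metric is undefined when its denominator is zero
        per_class[c] = {
            "sensitivity": tp / (tp + fn) if tp + fn else nan,
            "specificity": tn / (tn + fp) if tn + fp else nan,
            "ppv": tp / (tp + fp) if tp + fp else nan,
        }
    # Unweighted mean over classes; an undefined (NaN) class metric propagates to the mean
    averages = {
        m: sum(scores[m] for scores in per_class.values()) / len(per_class)
        for m in ("sensitivity", "specificity", "ppv")
    }
    return per_class, averages


per_class, averages = class_metrics(["N", "N", "N", "V", "V", "S"],
                                    ["N", "N", "V", "V", "V", "N"], ["N", "V"])
print(per_class["V"], averages)
```

Because every class counts equally in the macro average, errors on rare classes such as F or Q weigh as much as errors on the dominant N class, which is why these averages are informative on an imbalanced test set.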


Conclusion

In this project, we implemented two deep learning models that take ECG signals as input and classify heartbeats into five categories. Among heart diseases, we focused on cardiac arrhythmia and used the PhysioBank MIT-BIH Arrhythmia Database for model training and final evaluation. Both models performed similarly well and were robust to noise in the dataset. Despite having fewer layers, the CNN model achieved slightly higher accuracy overall and in most arrhythmia classes. After hyperparameter tuning and optimizer selection, our CNN model outperformed the model proposed by Acharya et al. in test accuracy, achieving 98.32% and 98.19% on the denoised and original datasets, respectively, although, as discussed above, the two evaluations used test sets with different class distributions.


References

[1] “Cardiovascular diseases (CVDs),” World Health Organization, 11-Jul-2021. [Online]. Available: [Accessed: 24-May-2022].

[2] “Arrhythmias — what is an arrhythmia?,” National Heart Lung and Blood Institute, 24-Mar-2022. [Online]. Available: [Accessed: 24-May-2022].

[3] U. R. Acharya, S. L. Oh, Y. Hagiwara, J. H. Tan, M. Adam, A. Gertych, and R. S. Tan, “A deep convolutional neural network model to classify heartbeats,” Computers in Biology and Medicine, vol. 89, pp. 389–396, 2017.

[4] Z. Faramand, S. O. Frisch, A. DeSantis, M. Alrawashdeh, C. Martin-Gill, C. Callaway, and S. Al-Zaiti, “Lack of significant coronary history and ECG misinterpretation are the strongest predictors of undertriage in prehospital chest pain,” Journal of Emergency Nursing, vol. 45, no. 2, pp. 161–168, 2019.

[5] G. Petmezas, K. Haris, L. Stefanopoulos, V. Kilintzis, A. Tzavelis, J. A. Rogers, A. K. Katsaggelos, and N. Maglaveras, “Automated atrial fibrillation detection using a hybrid CNN-LSTM network on imbalanced ECG datasets,” Biomedical Signal Processing and Control, vol. 63, p. 102194, 2021.

[6] C. Cao, F. Liu, H. Tan, D. Song, W. Shu, W. Li, Y. Zhou, X. Bo, and Z. Xie, “Deep learning and its applications in biomedicine,” Genomics, Proteomics & Bioinformatics, vol. 16, no. 1, pp. 17–32, 2018.

[7] S. S. Yadav and S. M. Jadhav, “Deep convolutional neural network based medical image classification for disease diagnosis,” Journal of Big Data, vol. 6, no. 1, 2019.

[8] A. L. Goldberger et al., “PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals,” Circulation, vol. 101, no. 23, pp. e215–e220, 2000.

[9] G. B. Moody and R. G. Mark, “A new method for detecting atrial fibrillation using R-R intervals,” Computers in Cardiology, vol. 10, pp. 227–230, 1983.

[10] B. N. Singh and A. K. Tiwari, “Optimal selection of wavelet basis function applied to ECG signal denoising,” Digital Signal Processing: A Review Journal, vol. 16, no. 3, pp. 275–287, 2006.

[11] J. Pan and W. J. Tompkins, “A real-time QRS detection algorithm,” IEEE Transactions on Biomedical Engineering, vol. BME-32, no. 3, pp. 230–236, 1985.

[12] L. Sherwood, Human physiology: From cells to systems. Boston, MA, USA: Cengage Learning, 2016.

[13] “Daubechies wavelet,” Wikipedia, 28-Nov-2021. [Online]. Available: [Accessed: 24-May-2022].



University of Toronto Machine Intelligence Team

UTMIST’s Technical Writing Team publishes articles on topics within machine learning to our official publication: