Emotion Recognition in a Noisy Environment Using Machine Learning Algorithms


Project Description

The aim of this project is to find out how different machine learning algorithms perform in classifying emotions from noisy samples. Additionally, it endeavours to find whether there is an optimal set of sound filters that, when applied to the noisy samples, increases the accuracy of these algorithms. Due to a lack of available databases with "real" samples of speakers exhibiting emotion, the data was personally retrieved from YouTube clips. Factors taken into account when selecting the clips were the amount and type of background noise, the emotion exhibited, the gender of the speaker, and the length of the samples. In this way 60 samples were retrieved in total, evenly split between the two emotions (happiness and neutral) and the gender of the speaker. These were then split 40:20, used for training/validation and testing respectively.
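The 40:20 split can be sketched as a stratified split that keeps the four emotion/gender groups balanced. This is a minimal illustration with stdlib Python only; the clip names, group sizes per split, and the random seed are placeholders, not the actual dataset.

```python
import random

def stratified_split(samples, train_size=40, seed=0):
    """Split (id, emotion, gender) samples 40:20, balanced per group."""
    rng = random.Random(seed)
    groups = {}
    for sample_id, emotion, gender in samples:
        groups.setdefault((emotion, gender), []).append(sample_id)
    train, test = [], []
    per_group = train_size // len(groups)  # e.g. 10 training clips per group
    for members in groups.values():
        members = sorted(members)
        rng.shuffle(members)
        train.extend(members[:per_group])
        test.extend(members[per_group:])
    return train, test

# 60 synthetic clips: 15 per (emotion, gender) combination.
samples = [(f"clip_{i:02d}", emo, gen)
           for i, (emo, gen) in enumerate(
               (e, g) for e in ("happy", "neutral")
               for g in ("male", "female") for _ in range(15))]
train, test = stratified_split(samples)
print(len(train), len(test))  # 40 20
```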


Filters

Before any training could take place, the samples had to be preprocessed. This includes converting them into the right format (.wav) and applying audio filters in order to reduce the noise. The software used for both of these steps is ffmpeg. The filters used to preprocess the samples were a combination of the lowpass and highpass filters and the arnndn filter. With the lowpass and highpass filters it is possible to cut out certain frequencies from the files, meaning that all frequencies outside the human voice range can be removed. The frequency range used was 200 Hz to 3000 Hz. The arnndn filter reduces noise from speech using recurrent neural networks: it takes as input a model m and applies it to the sample in order to try to isolate the voice. These filters were tested extensively with different options, alone and together, to determine which combination gave the best accuracy on the training and validation set.
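The preprocessing chain above can be sketched as a single ffmpeg invocation built from Python. The input/output paths and the RNN model file name ("sh.rnnn") are placeholders; only the highpass/lowpass/arnndn filter chain reflects the setup described here.

```python
import subprocess

def build_ffmpeg_cmd(src, dst, model="sh.rnnn"):
    """Build an ffmpeg command that band-limits audio to the 200-3000 Hz
    voice range and then denoises it with the arnndn RNN filter."""
    filters = ",".join([
        "highpass=f=200",     # drop rumble below 200 Hz
        "lowpass=f=3000",     # drop hiss above 3000 Hz
        f"arnndn=m={model}",  # RNN-based speech denoising
    ])
    return ["ffmpeg", "-y", "-i", src, "-af", filters, dst]

cmd = build_ffmpeg_cmd("clip_01.mp4", "clip_01.wav")
print(" ".join(cmd))
# Uncomment to actually run (requires an ffmpeg build with arnndn support):
# subprocess.run(cmd, check=True)
```

Keeping the band-limiting before arnndn means the neural denoiser only sees the frequency range where speech lives, which is one plausible ordering for the combination tested above.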


Machine Learning Algorithms

Once the preprocessing of the samples was completed, they were used to train models for 3 different machine learning algorithms: SVM, Random Forest, and K-Nearest Neighbours. Grid search and 5-fold cross-validation were used to determine the best hyperparameters. Preliminary results show that Random Forest performs better than SVM and KNN, with an average accuracy of 73% compared to their 72%. For Random Forest an average was taken, since its results fluctuate due to the randomised nature of the classifier; SVM and KNN returned consistent results and did not require averaging. Note that these figures are still from the training and validation set, not the final result.
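The grid search with 5-fold cross-validation works as follows: for each candidate hyperparameter value, the training data is split into 5 folds, each fold takes a turn as the validation set, and the value with the best mean accuracy wins. This is a stdlib-only sketch of that mechanism using a toy 1-D k-NN on synthetic data; the real features, classifiers, and parameter grids are not reproduced here.

```python
import random
from collections import Counter

def knn_predict(train, x, k):
    """Classify x by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cross_validate(data, k, folds=5):
    """Mean k-NN accuracy over `folds` contiguous validation folds."""
    fold_size = len(data) // folds
    scores = []
    for f in range(folds):
        val = data[f * fold_size:(f + 1) * fold_size]
        train = data[:f * fold_size] + data[(f + 1) * fold_size:]
        hits = sum(knn_predict(train, x, k) == y for x, y in val)
        scores.append(hits / len(val))
    return sum(scores) / folds

rng = random.Random(0)
# Two overlapping 1-D clusters stand in for the two emotion classes.
data = ([(rng.gauss(0.0, 1.0), "neutral") for _ in range(20)]
        + [(rng.gauss(2.0, 1.0), "happy") for _ in range(20)])
rng.shuffle(data)

# Grid search: try each candidate k and keep the one with the best CV score.
best_k = max([1, 3, 5, 7], key=lambda k: cross_validate(data, k))
print("best k:", best_k)
```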


Results

When comparing the results, it can be seen that SVM performed best of the three algorithms, with an F1-score of 0.762 and 0.737 for the happy and neutral classes respectively. KNN and RF performed similarly, with KNN outperforming RF in the happy class and RF outperforming KNN in the neutral class. For a more in-depth analysis of the results, please refer to the thesis, which you can find below.
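For reference, a per-class F1-score like those above is the harmonic mean of precision and recall for that class. The confusion counts in this sketch are made up for illustration and are not the thesis results.

```python
def f1(tp, fp, fn):
    """F1-score from per-class true positives, false positives, false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical test-set confusion for one class:
# 7 true positives, 4 false positives, 3 false negatives.
print(round(f1(tp=7, fp=4, fn=3), 3))  # 0.667
```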

Download the thesis here.