Comparative analysis of machine and deep learning models with text embeddings for sentiment analysis
DOI: https://doi.org/10.18488/76.v13i1.4709

Abstract
This study presents a comprehensive comparative evaluation of traditional machine learning (ML) algorithms, namely Naïve Bayes, Random Forest, and Support Vector Machine (SVM), against a deep learning model, Long Short-Term Memory (LSTM), using three distinct text embedding techniques: Term Frequency-Inverse Document Frequency (TF-IDF), FastText, and Word2Vec. A dataset of 30,001 social media posts was employed to assess performance across multiple evaluation metrics, including accuracy, precision, recall, F1-score, ROC-AUC, and log loss. Experimental findings reveal that the combination of LSTM with Word2Vec embeddings achieves superior performance, recording an accuracy of 92.65%, an F1-score of 94.37%, a ROC-AUC of 95.70%, and the lowest log loss of 0.2074. Among the classical machine learning models, Random Forest emerged as the most effective, outperforming Naïve Bayes and SVM in balanced accuracy and generalization capability. The results underscore the pivotal influence of the embedding representation in sentiment analysis and demonstrate that deep learning models, when integrated with semantically rich embeddings, can effectively capture contextual dependencies within textual data. The study thus provides valuable insights for developing robust sentiment analysis frameworks and recommends future exploration of hybrid and ensemble learning approaches to enhance generalization and interpretability in real-world natural language processing applications.
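To make the role of the embedding step concrete, the following is a minimal, self-contained sketch of TF-IDF weighting, the first of the three text representations compared above. It uses the standard raw-frequency TF and log(N / df) IDF formulation; the abstract does not specify which TF-IDF variant or library the study used, so this is purely illustrative.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Compute TF-IDF weights for a tokenized corpus.

    TF is the term's relative frequency within a document; IDF is
    log(N / df), where df is the number of documents containing the
    term. Returns one {term: weight} dict per document.
    (Illustrative only; the study's exact TF-IDF variant is not
    stated in the abstract.)
    """
    n_docs = len(corpus)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    weights = []
    for doc in corpus:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

# Toy sentiment-style corpus of three short "posts".
corpus = [
    "the service was great".split(),
    "the food was terrible".split(),
    "great food great price".split(),
]
vectors = tf_idf(corpus)
# Distinctive terms ("terrible", document 2) receive higher weight
# than terms shared across documents ("the", "was").
```

Unlike Word2Vec or FastText, these sparse TF-IDF vectors carry no notion of semantic similarity between terms, which is consistent with the abstract's finding that semantically rich embeddings paired with LSTM performed best.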
