Deepfake video detection using a PSO-optimized EfficientNet-B4 and LSTM hybrid framework
DOI: https://doi.org/10.18488/76.v13i1.4818

Abstract
Recent advances in deepfake generation technologies make it possible to produce synthetic videos of unprecedented quality with relatively little effort, raising serious concerns for digital security, media authenticity, and the spread of misinformation. This paper presents a hybrid architecture that combines EfficientNet-B4, which extracts high-quality spatial features, with Long Short-Term Memory (LSTM) networks, which model the temporal sequence of those features. In addition, Particle Swarm Optimization (PSO) is used within the training framework to automatically tune key hyperparameters, such as the learning rate and the number of LSTM hidden units, which improves convergence stability and detection performance. The model is trained and tested on the FaceForensics++ (FF++) dataset, which contains 6,450 real and fake videos. Experimental results show that the baseline EfficientNet-B4+LSTM model achieves an accuracy of 86.51%, with a precision of 85.28% and a recall of 73.87%. After PSO hyperparameter optimization, performance improves to 90.91% accuracy, 86.98% precision, and 81.23% recall. A comparative study with re-implemented baseline models, RNN+LSTM and ResNet-50+LSTM, further supports the advantage of the proposed hybrid method. These results demonstrate the effectiveness of combining optimized spatial and temporal learning for deepfake detection. In practice, the proposed framework is intended to provide a reliable solution for digital forensics, cybersecurity, and media authentication systems, with strong potential for deployment in real-world content verification applications.
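To make the PSO step more concrete, the following is a minimal sketch of particle swarm optimization over the two hyperparameters named above (learning rate and LSTM hidden units). It is an illustration under stated assumptions, not the authors' implementation. The search ranges, swarm size, inertia weight `w`, and coefficients `c1`/`c2` are hypothetical choices, and `toy_validation_loss` is a made-up synthetic stand-in. In a real run, the objective would train and evaluate the EfficientNet-B4+LSTM model for each candidate, for example returning 1 minus validation accuracy. The learning rate is searched on a log10 scale, and the hidden-unit count is rounded to an integer when a particle is decoded.

```python
import math
import random
from typing import Callable, List, Tuple

# Hypothetical search space (assumed, not taken from the paper):
#   dim 0: log10(learning rate) in [-5, -2]
#   dim 1: LSTM hidden units in [32, 512], rounded to an integer when decoded
BOUNDS: List[Tuple[float, float]] = [(-5.0, -2.0), (32.0, 512.0)]


def decode(position: List[float]) -> Tuple[float, int]:
    """Map a particle position to (learning_rate, hidden_units)."""
    return 10.0 ** position[0], int(round(position[1]))


def pso_minimize(
    objective: Callable[[float, int], float],
    n_particles: int = 10,
    n_iters: int = 50,
    w: float = 0.7,   # inertia weight (assumed value)
    c1: float = 1.5,  # cognitive coefficient: pull toward the particle's own best
    c2: float = 1.5,  # social coefficient: pull toward the swarm's best
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Basic PSO that minimizes objective(lr, hidden) inside BOUNDS."""
    rng = random.Random(seed)
    dims = len(BOUNDS)
    # Random initial positions inside the bounds, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(n_particles)]
    vel = [[0.0] * dims for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(*decode(p)) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dims):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = BOUNDS[d]
                # Move the particle, then clamp it back into the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(*decode(pos[i]))
            # Update personal and global bests immediately (asynchronous PSO).
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_validation_loss(lr: float, hidden: int) -> float:
    """Synthetic stand-in objective (hypothetical): minimum at lr=1e-4, hidden=256."""
    return (math.log10(lr) + 4.0) ** 2 + ((hidden - 256) / 128.0) ** 2


if __name__ == "__main__":
    best_pos, best_val = pso_minimize(toy_validation_loss)
    lr, hidden = decode(best_pos)
    print(f"lr={lr:.2e}, hidden_units={hidden}, objective={best_val:.4f}")
```

Each objective call in this toy version is cheap. In the setting the abstract describes, every evaluation would correspond to a full training run, so the swarm size and iteration count would likely need to be much smaller.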
