How DeepShield AI Detects Deepfakes
A dual-branch neural network that analyses both spatial pixel patterns and frequency-domain artifacts — trained on 566K+ face images from three major deepfake datasets.
Model Architecture
Spatial Branch
EfficientNetV2-S (ImageNet pretrained) extracts 1280-dimensional spatial feature vectors. It detects pixel-level artifacts such as blending boundaries, lighting inconsistencies, and warping.
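A minimal sketch of how such a branch could be wired with torchvision's EfficientNetV2-S; the wrapper class and usage below are illustrative, not the exact DeepShield code:

```python
# Spatial branch sketch: ImageNet-pretrained EfficientNetV2-S with the classifier
# head removed, so it emits 1280-d pooled feature vectors per face crop.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_v2_s, EfficientNet_V2_S_Weights

class SpatialBranch(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = efficientnet_v2_s(weights=EfficientNet_V2_S_Weights.IMAGENET1K_V1)
        backbone.classifier = nn.Identity()  # drop the 1000-class head, keep the 1280-d features
        self.backbone = backbone

    def forward(self, x):        # x: (B, 3, 224, 224) normalised face crops
        return self.backbone(x)  # -> (B, 1280)

if __name__ == "__main__":
    feats = SpatialBranch()(torch.randn(2, 3, 224, 224))
    print(feats.shape)           # torch.Size([2, 1280])
```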
Frequency Branch
A 4-layer CNN processes the FFT magnitude spectrum of each face. Deepfakes leave subtle spectral fingerprints invisible to the human eye but detectable in the frequency domain.
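A sketch of what this branch could look like in PyTorch. The 256-d output is inferred from the 1536-d fused vector described below (1280 + 256); the exact layer widths and the grayscale/log-scaling choices are assumptions:

```python
# Frequency branch sketch: take the 2-D FFT of the (grayscale) face, use the
# log-scaled, centred magnitude spectrum as a one-channel image, and run it
# through a small 4-layer CNN that outputs a 256-d feature vector.
import torch
import torch.nn as nn

class FrequencyBranch(nn.Module):
    def __init__(self, out_dim: int = 256):
        super().__init__()
        def block(cin, cout):
            return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                                 nn.BatchNorm2d(cout), nn.ReLU(inplace=True))
        self.cnn = nn.Sequential(block(1, 32), block(32, 64), block(64, 128), block(128, 256),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.proj = nn.Linear(256, out_dim)

    @staticmethod
    def fft_magnitude(x):                   # x: (B, 3, H, W) in [0, 1]
        gray = x.mean(dim=1, keepdim=True)  # simple grayscale conversion
        spec = torch.fft.fftshift(torch.fft.fft2(gray), dim=(-2, -1))
        return torch.log1p(spec.abs())      # log scale tames the huge dynamic range

    def forward(self, x):
        return self.proj(self.cnn(self.fft_magnitude(x)))  # -> (B, 256)
```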
SE Attention Fusion
Squeeze-and-Excitation attention dynamically weights the concatenated 1536-d feature vector before the final binary classifier, focusing on the most discriminative signals.
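A sketch of that fusion head; the reduction ratio and the placement of the 0.4 dropout (listed in the training config below) are assumptions:

```python
# SE fusion sketch: concatenate the 1280-d spatial and 256-d frequency features,
# re-weight the resulting 1536-d vector with a squeeze-and-excitation gate, and
# classify real vs fake.
import torch
import torch.nn as nn

class SEFusionHead(nn.Module):
    def __init__(self, dim: int = 1536, reduction: int = 16, dropout: float = 0.4):
        super().__init__()
        self.gate = nn.Sequential(  # squeeze-and-excitation over the fused channels
            nn.Linear(dim, dim // reduction), nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim), nn.Sigmoid())
        self.classifier = nn.Sequential(nn.Dropout(dropout), nn.Linear(dim, 2))

    def forward(self, spatial_feat, freq_feat):              # (B, 1280), (B, 256)
        fused = torch.cat([spatial_feat, freq_feat], dim=1)  # (B, 1536)
        fused = fused * self.gate(fused)                      # channel-wise re-weighting
        return self.classifier(fused)                         # (B, 2) real/fake logits
```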
Training Pipeline
Datasets
Three public deepfake datasets covering:
- 4 manipulation methods (Deepfakes, Face2Face, FaceSwap, NeuralTextures)
- High-quality celebrity deepfakes
- Diverse demographics and lighting conditions
Training Config
Optimizer: AdamW
LR (backbone): 1e-4
LR (new layers): 1e-3
Scheduler: Cosine Annealing
Warmup: 3 epochs
Batch Size: 8 (effective 32)
Precision: FP16 AMP
Regularisation: Dropout 0.4
Label Smoothing: 0.05
Early Stopping: Patience 7
Unfreezing: 3-phase progressive
Best Epoch: 6 of 13
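A sketch of this configuration in PyTorch. Module names such as model.backbone and model.head are placeholders, and the 3-phase progressive unfreezing and early-stopping logic are omitted for brevity:

```python
# Training configuration sketch: AdamW with discriminative learning rates,
# 3-epoch warmup into cosine annealing, label smoothing, FP16 AMP, and
# gradient accumulation (per-step batch 8, effective batch 32).
import torch
import torch.nn as nn

def build_training(model, epochs=13, warmup_epochs=3):
    optimizer = torch.optim.AdamW([
        {"params": model.backbone.parameters(), "lr": 1e-4},  # pretrained backbone
        {"params": model.head.parameters(),     "lr": 1e-3},  # newly added layers
    ], weight_decay=1e-2)
    warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1, total_iters=warmup_epochs)
    cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs - warmup_epochs)
    scheduler = torch.optim.lr_scheduler.SequentialLR(optimizer, [warmup, cosine],
                                                      milestones=[warmup_epochs])  # step once per epoch
    criterion = nn.CrossEntropyLoss(label_smoothing=0.05)
    scaler = torch.cuda.amp.GradScaler()  # FP16 automatic mixed precision
    return optimizer, scheduler, criterion, scaler

def train_one_epoch(model, loader, optimizer, criterion, scaler, accum_steps=4, device="cuda"):
    model.train()
    optimizer.zero_grad(set_to_none=True)
    for step, (images, labels) in enumerate(loader):  # per-step batch size 8
        with torch.cuda.amp.autocast():
            loss = criterion(model(images.to(device)), labels.to(device)) / accum_steps
        scaler.scale(loss).backward()
        if (step + 1) % accum_steps == 0:             # effective batch size 8 * 4 = 32
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad(set_to_none=True)
```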
Grad-CAM++ Explainability
Every detection comes with a visual explanation. Grad-CAM++ generates a heatmap showing which spatial regions the model focused on — making AI decisions transparent and interpretable.
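A sketch of how such a heatmap can be produced with the open-source pytorch-grad-cam package; the model handle, target layer, and class index below are placeholders rather than DeepShield's internal wiring:

```python
# Grad-CAM++ sketch: compute a heatmap for the "fake" class and overlay it on the face crop.
from pytorch_grad_cam import GradCAMPlusPlus
from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget
from pytorch_grad_cam.utils.image import show_cam_on_image

def explain(model, input_tensor, face_rgb_float01, fake_class_index=1):
    # The last convolutional block of the spatial backbone is a common target-layer choice.
    target_layers = [model.backbone.features[-1]]
    cam = GradCAMPlusPlus(model=model, target_layers=target_layers)
    grayscale_cam = cam(input_tensor=input_tensor,
                        targets=[ClassifierOutputTarget(fake_class_index)])[0]  # (H, W) in [0, 1]
    return show_cam_on_image(face_rgb_float01, grayscale_cam, use_rgb=True)     # uint8 overlay image
```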
Face Region
The aligned 224×224 face crop is automatically extracted using RetinaFace landmark detection.
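A sketch of this preprocessing step using the retina-face package from PyPI; the exact detector wrapper and alignment settings DeepShield uses are assumptions:

```python
# Face extraction sketch: detect and align the face with RetinaFace, then resize
# the crop to the 224x224 model input.
import cv2
from retinaface import RetinaFace

def extract_face(image_path, size=224):
    faces = RetinaFace.extract_faces(img_path=image_path, align=True)  # aligned face crops
    if not faces:
        raise ValueError(f"No face detected in {image_path}")
    return cv2.resize(faces[0], (size, size), interpolation=cv2.INTER_AREA)  # (224, 224, 3)
```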
Grad-CAM++ Heatmap
Red/warm regions indicate where the model's attention is highest. For fakes, this typically highlights blending boundaries and texture inconsistencies.
FFT Spectrum
The frequency magnitude spectrum reveals spectral artifacts. Deepfakes often show distinctive cross or grid patterns.
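A sketch of how the spectrum panel can be computed with NumPy; the colour map and output handling are illustrative:

```python
# FFT spectrum panel sketch: centred, log-scaled magnitude spectrum of the face crop.
import numpy as np
import cv2
import matplotlib.pyplot as plt

def fft_spectrum_panel(face_rgb, out_path="fft_spectrum.png"):
    gray = cv2.cvtColor(face_rgb, cv2.COLOR_RGB2GRAY).astype(np.float32)
    spectrum = np.fft.fftshift(np.fft.fft2(gray))  # move the DC component to the centre
    log_mag = np.log1p(np.abs(spectrum))           # compress the dynamic range for display
    plt.imsave(out_path, log_mag, cmap="viridis")  # off-centre grid/cross peaks hint at manipulation
    return log_mag
```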