🎬
Efficient 3D CNN for Action Recognition
Python · PyTorch · 3D CNNs · Video Classification
Designed an efficient 3D CNN for video action recognition, built from ResNet-50-style bottleneck blocks and trained with TrivialAugment. The model has approximately 13.8M parameters, 44.8% fewer than I3D, and reaches 86.24% accuracy on UCF-101 (vs. 84.5% for RGB-only I3D). It was trained from scratch, showing that careful architectural choices and simple augmentation can yield competitive performance without massive parameter counts.
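To give a rough sense of why bottleneck blocks keep the parameter count low, here is a minimal sketch that counts convolution weights for a 1x1x1 → 3x3x3 → 1x1x1 bottleneck against a plain pair of 3x3x3 convolutions. The channel width (256), the reduction factor (4), and the helper names are illustrative assumptions, not the project's actual configuration; normalization layers and biases are ignored.

```python
def conv3d_params(c_in, c_out, kernel):
    """Weight count of a bias-free 3D convolution."""
    kt, kh, kw = kernel
    return c_in * c_out * kt * kh * kw


def bottleneck_params(channels, reduction=4):
    """1x1x1 reduce -> 3x3x3 -> 1x1x1 expand, as in ResNet-50-style blocks."""
    mid = channels // reduction  # assumed reduction factor, hypothetical
    return (
        conv3d_params(channels, mid, (1, 1, 1))
        + conv3d_params(mid, mid, (3, 3, 3))
        + conv3d_params(mid, channels, (1, 1, 1))
    )


def plain_params(channels):
    """Two stacked 3x3x3 convolutions at full width, for comparison."""
    return 2 * conv3d_params(channels, channels, (3, 3, 3))


if __name__ == "__main__":
    channels = 256  # illustrative width, not taken from the project
    b = bottleneck_params(channels)
    p = plain_params(channels)
    print(f"bottleneck: {b:,} weights")
    print(f"plain:      {p:,} weights")
    print(f"ratio:      {b / p:.3f}")
```

At this width the expensive 3x3x3 convolution runs on a quarter of the channels, so the bottleneck needs only a small fraction of the weights of the plain block. The real savings depend on the network's actual widths and depth.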