Famous CNN Architectures (LeNet, AlexNet, VGG, ResNet)

🔶 1. LeNet-5 (1998) – By Yann LeCun

🕰 Year: 1998
👤 Developer: Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner

📌 Description:

LeNet-5 was developed by Yann LeCun and his colleagues. Networks from this family were deployed in practice to read handwritten digits, including ZIP codes on mail and amounts on bank checks.
This architecture is widely regarded as the first real success story for CNNs.

📚 Significance:

Performed very well on digit recognition datasets such as MNIST
One of the first architectures to combine convolution, pooling, and fully connected layers in a single structure
The work was done at Bell Labs

📌 Use Case: Handwritten digit recognition (MNIST)

🧱 Architecture:

Input: 32x32 Grayscale Image  
→ Conv (6 filters, 5x5)  
→ Avg Pooling  
→ Conv (16 filters, 5x5)  
→ Avg Pooling  
→ Flatten  
→ FC (120)  
→ FC (84)  
→ Output (10 classes)
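
The layer sizes above can be checked with a little arithmetic. The following is a minimal sketch, assuming unpadded stride-1 convolutions, 2x2 pooling with stride 2 and no trainable pooling weights, and full connectivity between the two conv layers (the 1998 paper used a sparser connection scheme, so its count differs). The helper names are made up for this illustration.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_params(in_ch, out_ch, kernel):
    """Weights plus one bias per output channel."""
    return out_ch * in_ch * kernel * kernel + out_ch

def fc_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out

size = 32
size = out_size(size, 5)            # Conv 5x5  -> 28
size = out_size(size, 2, stride=2)  # Avg pool  -> 14
size = out_size(size, 5)            # Conv 5x5  -> 10
size = out_size(size, 2, stride=2)  # Avg pool  -> 5
flat = 16 * size * size             # Flatten   -> 400

lenet_params = (
    conv_params(1, 6, 5)
    + conv_params(6, 16, 5)
    + fc_params(flat, 120)
    + fc_params(120, 84)
    + fc_params(84, 10)
)
print(size, flat, lenet_params)  # 5 400 61706
```

Under these assumptions the whole network has about 62 thousand parameters, tiny by modern standards.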

✅ Highlights:

One of the first successful CNN architectures
Very few parameters (roughly 60,000)
Still useful for MNIST-style tasks

🔶 2. AlexNet (2012) – By Alex Krizhevsky et al.

📌 Use Case: ImageNet Large Scale Visual Recognition Challenge (ILSVRC)

🕰 Year: 2012
👤 Developer: Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton
🏫 Institution: University of Toronto

📌 Description:

AlexNet entered the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) and finished far ahead of every other entry: its top-5 error was 15.3%, against 26.2% for the runner-up.

📚 Significance:

The mainstream popularity of deep learning is usually traced back to this model
One of the first CNNs to train on a very large dataset using GPUs
Popularized ReLU, Dropout, and data augmentation for CNN training

🧱 Architecture:

Input: 224x224x3 Image  
→ Conv1 (96 filters, 11x11, stride 4)  
→ MaxPool  
→ Conv2 (256 filters, 5x5)  
→ MaxPool  
→ Conv3 (384 filters, 3x3)  
→ Conv4 (384 filters, 3x3)  
→ Conv5 (256 filters, 3x3)  
→ MaxPool  
→ Flatten  
→ FC (4096)  
→ FC (4096)  
→ FC (1000)
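
A quick size check on this stack is instructive. The sketch below uses the standard output-size formula and assumes settings that are not listed above: no padding on Conv1, 3x3 max pooling with stride 2, padding 2 on Conv2, and padding 1 on Conv3 to Conv5. With those assumptions a 224x224 input does not tile evenly under an 11x11, stride-4 filter, while 227x227 does, which is why many implementations use 227.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Return (output size, True if the window tiles the input exactly)."""
    span = size + 2 * padding - kernel
    return span // stride + 1, span % stride == 0

print(out_size(224, 11, stride=4))  # (54, False)
print(out_size(227, 11, stride=4))  # (55, True)

# Trace the spatial size through the network, starting from 227x227.
size, _ = out_size(227, 11, stride=4)   # Conv1   -> 55
size, _ = out_size(size, 3, stride=2)   # MaxPool -> 27
size, _ = out_size(size, 5, padding=2)  # Conv2   -> 27
size, _ = out_size(size, 3, stride=2)   # MaxPool -> 13
for _ in range(3):                      # Conv3, Conv4, Conv5 keep 13
    size, _ = out_size(size, 3, padding=1)
size, _ = out_size(size, 3, stride=2)   # MaxPool -> 6

flat = 256 * size * size                # input width of the first FC layer
print(size, flat)  # 6 9216
```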

✅ Highlights:

Popularized ReLU activation in deep CNNs
Used Dropout and Data Augmentation
Trained on GPU
Won ILSVRC 2012 with huge margin

🔶 3. VGGNet (2014) – By Oxford (Simonyan & Zisserman)

📌 Use Case: ImageNet Classification

🕰 Year: 2014
👤 Developer: Karen Simonyan & Andrew Zisserman
🏫 Institution: University of Oxford – Visual Geometry Group (VGG)

📌 Description:

VGGNet arranged its convolution layers in a uniform pattern of 3×3 filters, which made deeper networks simpler to design.
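
One common argument for the 3×3 choice is that a stack of small filters covers the same receptive field as one large filter while using fewer weights. The sketch below is a rough illustration, assuming stride-1 layers with the same number of input and output channels (256 here, chosen arbitrarily) and ignoring biases. The function names are made up for this example.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)

def conv_weights(kernel, channels):
    """Weights in one conv layer with `channels` inputs and outputs."""
    return kernel * kernel * channels * channels

channels = 256
for num_layers, big_kernel in [(2, 5), (3, 7)]:
    field = stacked_receptive_field(num_layers)
    small = num_layers * conv_weights(3, channels)
    large = conv_weights(big_kernel, channels)
    print(f"{num_layers} x 3x3: field {field}x{field}, {small:,} weights "
          f"vs one {big_kernel}x{big_kernel}: {large:,} weights")
```

The stacked version also places a non-linearity after every layer, so it gains extra non-linearity along with the weight savings.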

📚 Significance:

Showed how increasing depth improves accuracy
Still widely used as a pre-trained model and for transfer learning
Its two best-known variants are VGG-16 and VGG-19

🧱 Architecture (VGG-16):

Input: 224x224x3  
→ (Conv 3x3, 64) ×2  
→ MaxPool  
→ (Conv 3x3, 128) ×2  
→ MaxPool  
→ (Conv 3x3, 256) ×3  
→ MaxPool  
→ (Conv 3x3, 512) ×3  
→ MaxPool  
→ (Conv 3x3, 512) ×3  
→ MaxPool  
→ Flatten  
→ FC (4096)  
→ FC (4096)  
→ FC (1000)
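
The cost of this design shows up when the parameters are counted. The following sketch walks the VGG-16 layout above, assuming every convolution uses padding 1 (so only pooling shrinks the feature map), every pool is 2x2 with stride 2, and every layer has a bias term.

```python
# (output channels, number of 3x3 conv layers) for each block of VGG-16
blocks = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]

conv_total = 0
in_ch, size = 3, 224
for out_ch, num_layers in blocks:
    for _ in range(num_layers):
        conv_total += 3 * 3 * in_ch * out_ch + out_ch  # weights + biases
        in_ch = out_ch
    size //= 2  # 2x2 max pool halves the height and width

flat = in_ch * size * size  # 512 * 7 * 7

fc_total = 0
n_in = flat
for n_out in (4096, 4096, 1000):
    fc_total += n_in * n_out + n_out
    n_in = n_out

print(flat)                   # 25088
print(conv_total)             # 14714688
print(fc_total)               # 123642856
print(conv_total + fc_total)  # 138357544
```

Roughly 138 million parameters in total, with most of them in the three fully connected layers rather than in the convolutions.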

✅ Highlights:

Simple, uniform design
Uses only 3×3 convolution filters
Very deep for its time (16–19 weight layers)
More accurate than AlexNet, but slower

🔶 4. ResNet (2015) – By Microsoft Research

📌 Use Case: Deep classification without degradation

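🕰 Year: 2015
👤 Developer: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
🏢 Institution: Microsoft Research

📌 Description:

ResNet addressed a major obstacle in deep learning: very deep networks were hard to train, suffering from degradation (accuracy getting worse as layers are added) and from gradients that propagate poorly through many layers. Its residual connections made networks with 100+ layers trainable.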

📚 Significance:

Introduced Residual (Skip) Connections, which let very deep models keep learning
Took 1st place in ILSVRC 2015
Many of today's deep vision models (e.g., Faster R-CNN and Mask R-CNN with ResNet backbones, ResNeXt) build on ResNet

🧱 Key Innovation: Residual Connections

Output = F(x) + x   (x is the block's input, F(x) is the residual learned by the block's layers)
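
One intuition for why the added x helps: the derivative of F(x) + x with respect to x is F'(x) + 1, so the gradient always has a direct path through the identity term. The toy below is only a scalar illustration, assuming every layer has the same small local derivative (0.01, picked arbitrarily). Real networks involve Jacobian matrices, normalization, and non-linearities, so this is not a faithful model of training.

```python
depth = 100        # number of stacked layers
local_grad = 0.01  # assumed derivative F'(x) of every layer

# Plain stack: y = F(x) per layer, so the gradients multiply directly.
plain = local_grad ** depth

# Residual stack: y = F(x) + x per layer, so each factor is F'(x) + 1.
residual = (1 + local_grad) ** depth

print(f"plain:    {plain:.3e}")
print(f"residual: {residual:.3f}")
```

In the plain stack the signal reaching the first layer is vanishingly small, while in the residual stack it stays on the order of 1.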

🧱 Architecture (ResNet-50):

Uses bottleneck blocks:
- Conv1x1 → Conv3x3 → Conv1x1
- Shortcut connections (skip layers)
Total ~50 layers
Variants: ResNet-18, 34, 50, 101, 152
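
The bottleneck pattern can be sketched in PyTorch as follows. This is a simplified illustration rather than the torchvision implementation: it assumes the input and output have the same number of channels and the same spatial size, so the shortcut is a plain identity with no projection or stride. The class name and channel sizes are chosen for this example only.

```python
import torch
from torch import nn

class Bottleneck(nn.Module):
    """Conv1x1 -> Conv3x3 -> Conv1x1 with an identity shortcut."""

    def __init__(self, channels, reduced):
        super().__init__()
        self.residual = nn.Sequential(
            # 1x1 conv shrinks the channel count
            nn.Conv2d(channels, reduced, kernel_size=1, bias=False),
            nn.BatchNorm2d(reduced),
            nn.ReLU(inplace=True),
            # 3x3 conv works on the cheaper, reduced representation
            nn.Conv2d(reduced, reduced, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(reduced),
            nn.ReLU(inplace=True),
            # 1x1 conv restores the original channel count
            nn.Conv2d(reduced, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Output = F(x) + x, followed by the final activation
        return self.relu(self.residual(x) + x)

block = Bottleneck(channels=256, reduced=64)
x = torch.randn(2, 256, 14, 14)
y = block(x)
print(y.shape)  # torch.Size([2, 256, 14, 14])
```

Because the 3x3 convolution runs on 64 channels instead of 256, the block is much cheaper than three full-width 3x3 layers.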

✅ Highlights:

Eases the degradation and vanishing gradient problems in very deep networks
Enables very deep networks (100+ layers)
Won ILSVRC 2015

📊 Comparison Table:

Model	Year	Layers	Key Feature	Use Case
LeNet	1998	~7	Simplicity	Digit Recognition
AlexNet	2012	8	ReLU, GPU, Dropout	Large-scale image classification
VGG-16	2014	16	Uniform 3×3 filters	ImageNet
ResNet-50	2015	50	Residual connections	Deep classification

📈 Visual Insight:

LeNet → Basic template
AlexNet → CNN popularized
VGG → Depth with simplicity
ResNet → Much greater depth without losing accuracy

🔧 PyTorch Example: Load Pretrained ResNet

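from torchvision import models

# Load ResNet-50 with pretrained ImageNet weights.
# torchvision >= 0.13 uses `weights=`; the older `pretrained=True` is deprecated.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.eval()  # switch to inference mode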

📝 Practice Questions:

What are the differences between LeNet and AlexNet?
Why are all convolutions in the VGG architecture 3×3?
What is the benefit of the residual connection in ResNet?
Why was AlexNet trained on GPUs?
What made it possible to train ResNet with 100+ layers?

🎯 Summary:

Architecture	Best For	Key Contribution
LeNet	Simple tasks	CNN base design
AlexNet	Complex images	ReLU, GPU training
VGG	Clean structure	Deep with uniformity
ResNet	Ultra-deep nets	Skip connections