Vanishing and Exploding Gradients

(The problem of gradients that shrink or blow up)

🔶 1. Problem Statement:

When a deep neural network (DNN) is trained with backpropagation, gradients are propagated backward through the layers, from the output toward the input.

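In very deep networks, however, these gradients can go wrong in two opposite ways.

Vanishing gradients:

Gradient values become so small that the weights effectively stop updating.
Training slows down or gets completely stuck.

This happens when the derivatives of the activation functions are small (at most 0.25 for Sigmoid, at most 1 for Tanh) and the chain rule multiplies one such factor per layer, so the product shrinks roughly exponentially with depth.

Exploding gradients:

When the per-layer factors (weights times activation derivatives) are instead larger than 1, the same product grows very quickly:
→ Weights become extremely large
→ Model becomes unstable
→ Loss: NaN or infinity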
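
The repeated multiplication described above can be sketched numerically. The following is a minimal pure-Python illustration, not a real training setup: it assumes a chain of scalar layers (one weight per layer), and the names `backprop_signal`, `weights`, and `x` are hypothetical choices made for this sketch. It returns the backward signal d(output)/d(z_k) that reaches each layer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_signal(weights, x=0.5, activation="sigmoid"):
    """Return d(output)/d(z_k) for each layer k of a chain of scalar layers.

    Forward pass: a_0 = x, z_k = w_k * a_{k-1}, a_k = f(z_k).
    """
    a = x
    local_derivs = []                                # f'(z_k) for each layer
    for w in weights:
        z = w * a
        if activation == "sigmoid":
            a = sigmoid(z)
            local_derivs.append(a * (1.0 - a))       # sigmoid'(z) is at most 0.25
        else:                                        # treated as "relu"
            a = max(0.0, z)
            local_derivs.append(1.0 if z > 0 else 0.0)

    signals = [0.0] * len(weights)
    upstream = 1.0                                   # d(output)/d(a_L)
    for k in reversed(range(len(weights))):
        signals[k] = upstream * local_derivs[k]      # d(output)/d(z_k)
        upstream = signals[k] * weights[k]           # d(output)/d(a_{k-1})
    return signals

# Assumed toy settings: 10 layers, all weights equal.
vanishing = backprop_signal([1.0] * 10, activation="sigmoid")
exploding = backprop_signal([2.0] * 10, activation="relu")
print("sigmoid, w=1: layer 1 =", vanishing[0], " last layer =", vanishing[-1])
print("relu,    w=2: layer 1 =", exploding[0], " last layer =", exploding[-1])
```

With these assumed settings, the ReLU chain with weights of 2 gives a signal of 512 at layer 1 and 1 at the last layer, while in the sigmoid chain the signal at layer 1 is several orders of magnitude smaller than at the last layer.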

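Vanishing example (illustrative gradient values reaching each layer):
Layer 1 ← 0.0003
Layer 2 ← 0.0008
Layer 3 ← 0.0011
...
Early layers (closest to the input) learn almost nothing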

Exploding example (illustrative values):
Layer 1 ← 8000.2
Layer 2 ← 40000.9
Layer 3 ← 90000.1
...
Loss becomes NaN

Problem	Solution
Vanishing Gradient	ReLU Activation Function
	He Initialization (weights)
	Batch Normalization
	Residual Connections (ResNet)
Exploding Gradient	Gradient Clipping
	Proper Initialization (e.g. Xavier or He)
	Lower Learning Rate
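
As a rough illustration of the "He Initialization" entry in the table, the sketch below draws weights from a normal distribution with standard deviation sqrt(2 / fan_in), which is the usual definition of He (Kaiming) normal initialization for ReLU layers. The function name `he_normal`, the layer sizes, and the fixed seed are assumptions made for this example; in PyTorch the corresponding built-in is `torch.nn.init.kaiming_normal_`.

```python
import math
import random

def he_normal(fan_in, fan_out, rng):
    """Weight matrix (fan_out rows, fan_in columns) drawn from N(0, 2 / fan_in)."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]

rng = random.Random(0)            # fixed seed, only to make the sketch repeatable
weights = he_normal(fan_in=256, fan_out=128, rng=rng)

# Compare the empirical spread of the weights with the target standard deviation.
flat = [w for row in weights for w in row]
mean = sum(flat) / len(flat)
sample_std = math.sqrt(sum((w - mean) ** 2 for w in flat) / len(flat))
print("target std:", math.sqrt(2.0 / 256), "sample std:", sample_std)
```

Scaling the spread by the number of inputs is intended to keep the size of activations and gradients roughly constant from layer to layer, so the per-layer factors stay close to 1 at the start of training.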

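import torch

# Gradient clipping in PyTorch: rescale all gradients so their total norm is at most max_norm
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# In the training loop, clip after backward() and before step()
loss.backward()                                                    # compute gradients
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)   # rescale if the norm exceeds 1.0
optimizer.step()                                                   # update weights with the clipped gradients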
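
To make clear what clipping by norm does to the gradients, here is a minimal pure-Python sketch of the idea. It is not PyTorch's implementation: the function name `clip_by_global_norm`, the nested-list representation of per-layer gradients, and the small `eps` guard against division by zero are assumptions of this example.

```python
import math

def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale all gradients by one common factor so their global L2 norm is at most max_norm.

    grads is a list of per-layer gradient lists. Returns (clipped_grads, original_norm).
    """
    total_norm = math.sqrt(sum(g * g for layer in grads for g in layer))
    scale = min(1.0, max_norm / (total_norm + eps))   # never scale gradients up
    clipped = [[g * scale for g in layer] for layer in grads]
    return clipped, total_norm

# Large gradients (global norm 13) are shrunk to a norm of about 1.
big, big_norm = clip_by_global_norm([[3.0, 4.0], [12.0]], max_norm=1.0)
# Small gradients (global norm 0.5) pass through unchanged.
small, small_norm = clip_by_global_norm([[0.3, 0.4]], max_norm=1.0)
print(big_norm, big)
print(small_norm, small)
```

Because every gradient is multiplied by the same factor, the direction of the update is preserved and only its length is capped.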

Issue	Cause	Effect	Fix
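Vanishing	Gradients shrink toward the early layers of a deep network	Early layers stop learning	ReLU, He init, BatchNorm
Exploding	Gradients grow as they are multiplied through the layers	Unstable training, NaN loss	Gradient clipping, Proper init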