1. What is Batch Normalization?

Batch Normalization (BatchNorm) is a technique used to stabilize, accelerate, and improve the training of neural networks.

It normalizes each layer's outputs (activations) over the current mini-batch so that their distribution stays close to mean 0 and variance 1.

This makes gradients smoother and speeds up training.

🔁 2. Why is it needed?

In deep networks, as more layers are stacked, the distribution of each layer's activations keeps shifting during training. This problem is called:

📉 Internal Covariate Shift

BatchNorm was introduced to address this: it re-centers and rescales each layer's activations using the statistics of the current mini-batch.

🧮 3. Mathematical Explanation

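Suppose a layer produces the outputs x_1, ..., x_m for a mini-batch of size m.

Step 1: Compute the Mean and Variance

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2$$

Step 2: Normalize

$$\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$$

Step 3: Scale and Shift

$$y_i = \gamma \hat{x}_i + \beta$$

Here:

γ, β are learnable parameters
ϵ is a small constant added for numerical stability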
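
The three steps above can be sketched in plain Python for a single feature. This is a minimal illustration, not how a library implements it: the function name `batch_norm`, the sample batch, and the default `eps=1e-5` are assumptions made for the example.

```python
import math

def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Apply the three BatchNorm steps to one feature over a mini-batch."""
    m = len(xs)
    # Step 1: mini-batch mean and (biased) variance
    mu = sum(xs) / m
    var = sum((x - mu) ** 2 for x in xs) / m
    # Step 2: normalize to roughly zero mean and unit variance
    x_hat = [(x - mu) / math.sqrt(var + eps) for x in xs]
    # Step 3: scale by gamma and shift by beta
    return [gamma * xh + beta for xh in x_hat]

batch = [2.0, 4.0, 6.0, 8.0]                      # hypothetical activations
normalized = batch_norm(batch)                    # gamma=1, beta=0
scaled = batch_norm(batch, gamma=2.0, beta=1.0)   # same values, rescaled
```

With γ=1 and β=0 the output has mean 0 and variance very close to 1. Other values of γ and β let the network move away from that distribution if training finds it useful.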

🔧 4. PyTorch Implementation

import torch.nn as nn

# Small MLP: 128 input features -> 64 hidden units -> 10 outputs
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.BatchNorm1d(64),     # normalizes each of the 64 features across the batch
    nn.ReLU(),
    nn.Linear(64, 10),
)

For image inputs (feature maps of shape N × C × H × W), use nn.BatchNorm2d(num_channels), which normalizes each channel separately.
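
As a rough sketch of what `nn.BatchNorm2d` does to an image batch, the example below checks the per-channel statistics of its output. The batch size, image size, and the offset and scale of the random input are arbitrary assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical batch: 8 RGB images of 32x32 pixels, deliberately off-center
images = torch.randn(8, 3, 32, 32) * 5.0 + 2.0

bn = nn.BatchNorm2d(num_features=3)  # one (gamma, beta) pair per channel
bn.train()                           # use the statistics of this batch
out = bn(images)

# Statistics are taken over batch and spatial dims (N, H, W), per channel
channel_mean = out.mean(dim=(0, 2, 3))
channel_var = out.var(dim=(0, 2, 3), unbiased=False)
print(out.shape, channel_mean, channel_var)
```

The output keeps the input shape, and each of the three channels ends up with mean near 0 and variance near 1, since γ starts at 1 and β at 0.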

📈 5. Benefits of BatchNorm

| Benefit | Explanation |
| --- | --- |
| ✅ Faster Training | Smoother gradients → faster convergence |
| ✅ Higher Learning Rates | Larger steps are possible without instability |
| ✅ Reduced Need for Dropout | Acts as a light regularizer |
| ✅ Mitigates Vanishing/Exploding Gradients | Keeps activations in check |
| ✅ Improved Generalization | Often better test accuracy |

🔍 6. Where to Apply?

| Layer Type | Typical Ordering |
| --- | --- |
| Linear (Dense) | `Linear → BatchNorm1d → Activation` |
| Conv2D Layer | `Conv2d → BatchNorm2d → Activation` |
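
The convolutional ordering can be wrapped in a small reusable block. The helper name `conv_bn_relu`, the kernel size, and the padding are assumptions chosen for this sketch. Setting `bias=False` on the convolution is a common choice rather than a requirement, since BatchNorm's β already supplies a per-channel shift.

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_channels, out_channels):
    """Hypothetical helper: Conv2d -> BatchNorm2d -> ReLU."""
    return nn.Sequential(
        # bias=False: BatchNorm's beta already provides a per-channel shift
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(),
    )

block = conv_bn_relu(3, 16)
features = block(torch.randn(4, 3, 32, 32))  # shape: (4, 16, 32, 32)
```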

⚠️ 7. Training vs Inference

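During training → the mean and variance of the current mini-batch are used (and running averages are updated)
During inference → the stored running averages of mean and variance are used

PyTorch picks the behavior from the module's mode, so call model.train() before training and model.eval() before inference.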
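
The difference between the two modes can be observed directly on a single layer. This is a small sketch with made-up input data. It relies on the default `nn.BatchNorm1d` settings, where the running estimates start at mean 0 and variance 1.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)
batch = torch.randn(16, 4) * 3.0 + 10.0  # hypothetical activations, mean ~10

bn.train()
train_out = bn(batch)                         # normalized with this batch's stats
running_mean_after = bn.running_mean.clone()  # has moved from 0 toward ~10

bn.eval()
with torch.no_grad():
    eval_out = bn(batch)      # normalized with the running estimates
    single = bn(batch[:1])    # a batch of one works in eval mode
```

After only one training step the running estimates are still far from the true statistics, so `eval_out` differs noticeably from `train_out`. In practice the estimates settle over many batches. Forgetting `model.eval()` at inference time makes predictions depend on whatever else is in the batch.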

🔁 With and Without BatchNorm (Effect on Accuracy):

| Epoch | Accuracy Without BatchNorm | Accuracy With BatchNorm |
| --- | --- | --- |
| 5 | 62% | 79% |
| 10 | 71% | 87% |
| 20 | 76% | 91% |

📝 Practice Questions:

What is the main purpose of Batch Normalization?
What is Internal Covariate Shift?
What is the difference between BatchNorm1d and BatchNorm2d in PyTorch?
What role do γ and β play in BatchNorm?
Does BatchNorm also act as a regularizer, the way dropout does?

🎯 Summary:

| Feature | BatchNorm Impact |
| --- | --- |
| Stability | ⬆️ Improves |
| Speed | ⬆️ Faster Training |
| Generalization | ✅ Helps reduce overfitting |
| Gradient Flow | ✅ Helps mitigate vanishing/exploding gradients |