Anand Singh, Author at AlfaTechLab

Deep Neural Networks (DNN)

July 11, 2025 by Anand Singh

(Hindi: डीप न्यूरल नेटवर्क्स)

🔶 1. What is a Deep Neural Network?

📌 Definition:

A Deep Neural Network (DNN) is an artificial neural network with more than one hidden layer.

👉 It differs from a shallow network (such as a simple MLP with a single hidden layer) because it has "depth": several layers that progressively abstract the data on its way from input to output.

🧠 Structure of a DNN:

Input Layer → Hidden Layer 1 → Hidden Layer 2 → ... → Hidden Layer N → Output Layer

Each layer is a group of neurons.
Each neuron applies:

z=w⋅x+b, a=f(z)

where w is the neuron's weight vector, x its input, b its bias, and f an activation function.
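As a minimal sketch of this formula, the snippet below computes the output of a single neuron in plain Python. The weights, input, and bias are made-up illustrative values, and ReLU is assumed as the activation f.

```python
def relu(z):
    """ReLU activation: max(0, z)."""
    return max(0.0, z)

def neuron(w, x, b, f=relu):
    """Compute a = f(w . x + b) for a single neuron."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # weighted sum plus bias
    return f(z)

# Illustrative values (not from the article)
w = [0.5, -0.25, 1.0]
x = [2.0, 4.0, 1.0]
b = 0.5

a = neuron(w, x, b)  # z = 1.0 - 1.0 + 1.0 + 0.5 = 1.5, and ReLU leaves it unchanged
print(a)
```

A full layer simply repeats this computation once per neuron, each with its own weight vector and bias.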

📊 Example:

Consider a DNN with:

Input Layer: 784 nodes (28×28 image pixels)
Hidden Layer 1: 512 neurons
Hidden Layer 2: 256 neurons
Output Layer: 10 neurons (digits 0–9 classification)
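To get a feel for the size of this example network, the sketch below counts its trainable parameters. It assumes fully connected layers with one bias per neuron, which is how `nn.Linear` behaves by default; the helper name `count_parameters` is made up for illustration.

```python
layer_sizes = [784, 512, 256, 10]  # input, hidden 1, hidden 2, output

def count_parameters(sizes):
    """Weights (n_in * n_out) plus biases (n_out) for each pair of adjacent layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

total = count_parameters(layer_sizes)
print(total)  # 535818
```

Most of the parameters (401,920 of them) sit in the first layer, because it connects all 784 input pixels to 512 neurons.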

🔷 2. Why Use Deep Networks?

❓ Why aren't shallow networks enough?

Shallow networks work well for simple problems.
In complex tasks (such as image recognition, NLP, and audio classification), however, the input-output relationship is highly nonlinear.

✅ Deep networks:

Can extract high-level features automatically
Capture abstractions as a hierarchy

🧠 Hierarchical Feature Learning:

Layer	Learns
Layer 1	Edges, curves
Layer 2	Shapes, textures
Layer 3	Objects, faces

🔶 What Is the Architecture of a DNN?

The architecture specifies how many layers the DNN has, how many neurons each layer contains, which activation functions are used, and how data flows from input to output.

📊 High-Level Structure:

Input Layer → Hidden Layer 1 → Hidden Layer 2 → ... → Output Layer

Each layer does two things:

Linear Transformation z=W⋅x+b
Activation Function a=f(z)

🔷 2. Components of a DNN Architecture

Component	Description
Input Layer	Raw input data (e.g., image pixels, features)
Hidden Layers	Intermediate processing layers (more = more depth)
Output Layer	Final predictions (e.g., class scores)
Weights & Biases	Parameters learned during training
Activation Functions	Adds non-linearity (ReLU, Sigmoid, etc.)
Loss Function	Measures prediction error
Optimizer	Updates weights using gradients (SGD, Adam)

🧠 Typical Architecture Example (MNIST Digits):

Layer Type	Shape	Notes
Input	(784,)	28×28 image flattened
Dense 1	(784 → 512)	Hidden Layer 1 + ReLU
Dense 2	(512 → 256)	Hidden Layer 2 + ReLU
Output	(256 → 10)	Digit prediction + Softmax

🧮 3. Mathematical View
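One way to write the network mathematically, assuming the per-layer notation z = W⋅x + b and a = f(z) used above. The layer superscripts (l = 1, ..., L) are an added convention, and a softmax output is assumed, as in the MNIST table:

```latex
\begin{aligned}
a^{(0)} &= x \\
z^{(l)} &= W^{(l)} a^{(l-1)} + b^{(l)}, \qquad l = 1, \dots, L \\
a^{(l)} &= f\left(z^{(l)}\right), \qquad l = 1, \dots, L-1 \\
\hat{y} &= \operatorname{softmax}\left(z^{(L)}\right)
\end{aligned}
```

For the MNIST example, L = 3, and the weight matrices W^(1), W^(2), W^(3) have shapes 512×784, 256×512, and 10×256.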

🔧 4. PyTorch Code: Custom DNN Architecture

import torch.nn as nn

class DNN(nn.Module):
    def __init__(self):
        super(DNN, self).__init__()
        self.net = nn.Sequential(
            nn.Linear(784, 512),     # Input to Hidden 1
            nn.ReLU(),
            nn.Linear(512, 256),     # Hidden 1 to Hidden 2
            nn.ReLU(),
            nn.Linear(256, 10)       # Output Layer
        )

    def forward(self, x):
        return self.net(x)
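A short usage sketch, assuming a dummy batch of random "images" rather than real MNIST data. It rebuilds the same layer stack as a plain `nn.Sequential` so that it runs on its own. The network returns raw scores (logits); the softmax from the architecture table is applied separately here, since `nn.CrossEntropyLoss` expects logits and applies log-softmax internally.

```python
import torch
import torch.nn as nn

# Same layer sizes as the DNN class above, written as a plain Sequential
net = nn.Sequential(
    nn.Linear(784, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

images = torch.randn(4, 1, 28, 28)      # dummy batch: 4 grayscale 28x28 images
flat = images.view(images.size(0), -1)  # flatten each image to 784 values
logits = net(flat)                      # raw class scores, shape (4, 10)
probs = torch.softmax(logits, dim=1)    # probabilities; each row sums to 1

print(logits.shape, probs.sum(dim=1))
```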

📈 Visualization of Architecture

[Input Layer: 784]
         ↓
[Dense Layer: 512 + ReLU]
         ↓
[Dense Layer: 256 + ReLU]
         ↓
[Output Layer: 10 (classes)]

🔍 Key Architecture Design Questions

How many hidden layers should there be?
How many neurons in each layer?
Which activation function should be used?
Are dropout and batch norm needed?
Which loss function should be used?

🎯 Summary:

Element	Role
Layers	Input → Hidden(s) → Output
Activation	Introduces non-linearity
Depth	Number of layers
Width	Neurons per layer
Optimizer	Updates weights using gradients

📝 Practice Questions:

What are the components of a DNN architecture?
How many hidden layers should a network have, and what difference does the choice make?
Why does the activation function matter in an architecture?
How is overfitting prevented in a DNN architecture?
How is an architecture tuned?

🔶 Training a DNN

💡 Standard Process:

Forward Pass: Generate a prediction
Loss Calculation: Compare the prediction with the ground truth
Backward Pass: Compute the gradients
Optimizer Step: Update the weights

🚧 Challenges in Training Deep Networks:

Challenge	Solution
Vanishing Gradients	ReLU, BatchNorm, Residual connections
Overfitting	Dropout, Data Augmentation
Computational Cost	GPU acceleration, Mini-batch training

🔧 4. PyTorch Code: Simple DNN for Classification

import torch.nn as nn

class SimpleDNN(nn.Module):
    def __init__(self):
        super(SimpleDNN, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(784, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, 10)
        )

    def forward(self, x):
        return self.model(x)
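The four training steps can be sketched for this network as a single update on dummy data. The loss (`nn.CrossEntropyLoss`), the optimizer settings, and the random batch are assumptions chosen for illustration, not part of the original example.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 10),
)
criterion = nn.CrossEntropyLoss()                 # expects raw logits and integer class labels
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(32, 784)                          # dummy batch of 32 flattened images
y = torch.randint(0, 10, (32,))                   # dummy digit labels 0-9

logits = model(x)                                 # 1. forward pass
loss = criterion(logits, y)                       # 2. loss calculation
optimizer.zero_grad()                             # clear gradients from any earlier step
loss.backward()                                   # 3. backward pass
optimizer.step()                                  # 4. optimizer step

print(loss.item())
```

With random inputs and labels the loss should start near ln(10) ≈ 2.3, the value expected when all ten classes look equally likely.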

🔬 5. Applications of DNNs

Domain	Use Case
Computer Vision	Image classification, Object detection
NLP	Text classification, Sentiment analysis
Healthcare	Disease prediction from X-rays
Finance	Credit scoring, Fraud detection
Robotics	Sensor fusion, control systems

📈 Summary:

Term	Meaning
DNN	Neural network with 2+ hidden layers
Depth	Refers to number of layers
Power	Learns complex mappings from data
Challenges	Vanishing gradients, Overfitting, Compute cost

📝 Practice Questions:

What is the difference between a DNN and a shallow network?
What are the steps in training a DNN?
What is the vanishing gradient problem, and how is it solved?
Explain how to implement a DNN in PyTorch.
In which fields are DNNs used?

Overfitting, Underfitting and Regularization

July 11, 2025 by Anand Singh

(Hindi: ओवरफिटिंग, अंडरफिटिंग और रेग्युलराइजेशन)

🔶 1. What Is Underfitting?

📌 Definition:

Underfitting occurs when the model fails to learn even the training data well.

🔍 Signs:

High training loss
Low accuracy (on both train and test)
The model is too simple for the complexity of the data

🧠 Causes:

The model is too small
Too few training epochs
Features do not represent the data well

🔶 2. What Is Overfitting?

📌 Definition:

Overfitting occurs when the model memorizes the training data very well but fails on test data.

🔍 Signs:

Very low training loss
Very high test loss
High accuracy on train, low on test

🧠 Causes:

The model is too complex (too many parameters)
Too little data
Too many epochs
The model has also learned the noise

📈 Summary Table:

Type	Train Accuracy	Test Accuracy	Error
Underfitting	Low	Low	High Bias
Overfitting	High	Low	High Variance
Just Right	High	High	Low Bias & Variance

🔧 3. Regularization Techniques

🔷 Purpose:

Regularization techniques help the model generalize, that is, perform better on unseen (test) data.

📌 Common Regularization Methods:

✅ A. L1 & L2 Regularization:

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=0.001)  # L2
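Both penalties add a term to the loss that grows with the size of the weights: L1 adds λ·Σ|w| and L2 adds λ·Σw². Here is a minimal sketch in plain Python, with made-up weights, λ, and data loss. Note that `weight_decay` in the optimizer line above corresponds to the L2 case only, so an L1 term would have to be added to the loss by hand.

```python
weights = [0.5, -1.5, 2.0]   # illustrative weights
lam = 0.001                  # regularization strength (assumed value)
data_loss = 0.30             # pretend loss from the data term

l1_penalty = lam * sum(abs(w) for w in weights)   # lambda * sum |w|  = 0.001 * 4.0
l2_penalty = lam * sum(w * w for w in weights)    # lambda * sum w^2  = 0.001 * 6.5

loss_l1 = data_loss + l1_penalty
loss_l2 = data_loss + l2_penalty
print(loss_l1, loss_l2)
```

Because the L2 penalty squares each weight, it punishes large weights much more heavily than small ones, while the L1 penalty tends to push small weights all the way to zero.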

✅ B. Dropout:

During training, some neurons are randomly deactivated.
This keeps the model from relying too heavily on any particular features.

nn.Dropout(p=0.5)
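The difference between training and evaluation mode can be seen in a small sketch with an all-ones input (chosen only to make the effect easy to read). In training mode PyTorch zeroes elements at random and scales the survivors by 1/(1−p); in evaluation mode the layer passes its input through unchanged.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()              # training mode: elements are zeroed at random
train_out = drop(x)       # survivors are scaled by 1 / (1 - p) = 2.0

drop.eval()               # evaluation mode: dropout is a no-op
eval_out = drop(x)

print(train_out)          # a random mix of 0.0 and 2.0
print(eval_out)           # all ones
```

This is why `model.train()` and `model.eval()` should be called before training and before evaluation, respectively.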

✅ C. Early Stopping:

Training is stopped as soon as the validation loss starts rising.
This prevents overfitting.
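A minimal sketch of the stopping rule in plain Python. It adds a `patience` parameter (an assumption, though a common refinement) so that one noisy epoch does not end training prematurely; the function name and the validation losses are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop.

    Stops once the validation loss has failed to improve on its best
    value for `patience` consecutive epochs.
    """
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            bad_epochs = 0          # improvement: reset the counter
        else:
            bad_epochs += 1         # no improvement this epoch
            if bad_epochs >= patience:
                return epoch
    return len(val_losses)          # never triggered: ran all epochs

# Made-up validation losses: improving until epoch 4, then rising
val_losses = [0.90, 0.70, 0.60, 0.55, 0.58, 0.62, 0.70]
stop_epoch = early_stopping_epoch(val_losses)
print(stop_epoch)  # 6
```

In practice the weights saved at the best epoch (epoch 4 here) are usually the ones kept.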

✅ D. Data Augmentation:

Enlarge the training set by slightly modifying the image, text, or audio data.
This helps the model learn general patterns.

✅ E. Batch Normalization:

nn.BatchNorm1d(num_features)
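Batch normalization standardizes each feature across the mini-batch and then applies a learnable scale (γ) and shift (β). The plain-Python sketch below shows the computation for one feature; the batch values are made up, and γ = 1, β = 0 are assumed as starting values.

```python
def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n     # biased (population) variance
    return [gamma * (v - mean) / (var + eps) ** 0.5 + beta for v in values]

batch = [1.0, 2.0, 3.0, 4.0]      # one feature, four samples (made-up)
normed = batch_norm(batch)

new_mean = sum(normed) / len(normed)
new_var = sum((v - new_mean) ** 2 for v in normed) / len(normed)
print(new_mean, new_var)          # approximately 0 and 1
```

`nn.BatchNorm1d(num_features)` does this for every feature in training mode and additionally tracks running statistics for use at evaluation time.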

🔁 PyTorch Example with Dropout:

import torch.nn as nn

model = nn.Sequential(
    nn.Linear(100, 50),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(50, 10)
)

🧠 Diagnostic Plot:

Epochs →	📉 Train Loss	📈 Test Loss
1–5	High → Low	High → Low
6–20	Low	Starts rising → Overfitting starts

🎯 Summary:

Concept	Definition	Solution
Underfitting	The model learns too little	Bigger model, more training
Overfitting	The model learns the training data too closely	Regularization
Regularization	Techniques for improving generalization	Dropout, L2, Data Augmentation

📝 Practice Questions:

What is the difference between underfitting and overfitting?
How does dropout work?
What does L2 regularization contribute to the loss function?
Why does early stopping work?
How does data augmentation protect against overfitting?

Learning Rate, Epochs, Batches

July 11, 2025 by Anand Singh

(Hindi: लर्निंग रेट, एपॉक्स, और बैचेस)

🔶 1. Learning Rate (How Fast the Model Learns)

📌 Definition:

The learning rate (η) is a hyperparameter that controls how large a step the weights take at each update during training.

It is part of the gradient descent update rule:

w ← w − η⋅∂L/∂w

🎯 Learning Rate की भूमिका:

Value	Effect
Too small (e.g. <0.0001)	Slow learning, stuck in local minima
Too large (e.g. >1.0)	Overshooting, unstable training
Well-chosen middle value	Smooth convergence to minimum loss

📈 Visual Explanation:

Low LR: Reaches the valley slowly
High LR: Keeps jumping back and forth and misses the valley
Ideal LR: Heads straight into the valley

📘 PyTorch में Learning Rate:

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
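The three regimes can be reproduced on a toy problem. The sketch below runs plain gradient descent (w ← w − lr·dL/dw) on the assumed loss L(w) = w², whose minimum is at w = 0; the specific learning rates are chosen only to make each behaviour visible.

```python
def gradient_descent(lr, steps=20, w=1.0):
    """Minimize the toy loss L(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        grad = 2 * w
        w = w - lr * grad     # update rule: w <- w - lr * dL/dw
    return w

w_small = gradient_descent(lr=0.01)   # moves toward 0, but slowly
w_good = gradient_descent(lr=0.4)     # converges quickly to ~0
w_large = gradient_descent(lr=1.1)    # overshoots more each step and diverges

print(w_small, w_good, w_large)
```

After 20 steps the small learning rate has barely moved, the moderate one is essentially at the minimum, and the large one has grown in magnitude instead of shrinking. Which values count as "small" or "large" depends on the loss surface, so the thresholds for a real network will differ.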

🔶 2. Epochs (Training Iterations over Dataset)

📌 Definition:

An epoch is one cycle in which the entire training dataset passes through the neural network once, including both the forward and backward passes.

If you have 1000 images and run 10 epochs, the model has seen the dataset 10 times.

🎯 More epochs mean:

The model gets more opportunity to learn
But the risk of overfitting increases

🔶 3. Batches और Batch Size

📌 Batch:

Dividing the dataset into small chunks and training on them one at a time is called batch training.

A forward and backward pass is performed on each batch.

Batch Size: How many samples are processed together
Common sizes: 8, 16, 32, 64, 128

🎯 Why Use Batches?

Advantage	Explanation
Memory Efficient	The whole dataset does not need to be loaded into memory
Faster Computation	Work is vectorized on the GPU
Noise helps generalization	Stochastic updates help protect the model from overfitting

🔁 Relationship Between All Three:

Concept	Definition
Epoch	One full pass over the entire dataset
Batch Size	Number of samples processed at once
Iteration	One update step = One batch

Example:

Dataset size = 1000
Batch size = 100
Then, 1 epoch = 10 iterations
If we train for 10 epochs → total 100 iterations
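The same arithmetic as a small helper (the function name is made up for illustration). It rounds up, because when the dataset size is not a multiple of the batch size the last, smaller batch still counts as one iteration.

```python
import math

def iterations_per_epoch(dataset_size, batch_size):
    """Number of batches (weight updates) in one epoch; the last batch may be smaller."""
    return math.ceil(dataset_size / batch_size)

per_epoch = iterations_per_epoch(1000, 100)   # 10
total = per_epoch * 10                        # 100 over 10 epochs
uneven = iterations_per_epoch(1000, 64)       # 16: fifteen full batches plus one of 40
print(per_epoch, total, uneven)
```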

🔧 PyTorch Code:

from torch.utils.data import DataLoader, TensorDataset
import torch

# Dummy data
X = torch.randn(1000, 10)
y = torch.randint(0, 2, (1000, 1)).float()
dataset = TensorDataset(X, y)

# DataLoader with batch size
dataloader = DataLoader(dataset, batch_size=64, shuffle=True)

# Training loop
for epoch in range(5):  # 5 epochs
    for batch_x, batch_y in dataloader:
        # Forward pass, loss calculation, backward, step
        ...
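One way the loop body could be filled in is sketched below. The model (a single `nn.Linear`), the loss (`nn.BCEWithLogitsLoss`, which suits the 0/1 float labels in the dummy data), and the optimizer settings are all assumptions for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

X = torch.randn(1000, 10)
y = torch.randint(0, 2, (1000, 1)).float()
dataloader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

model = nn.Linear(10, 1)                       # assumed model: one logit per sample
criterion = nn.BCEWithLogitsLoss()             # sigmoid + binary cross-entropy in one step
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

iterations = 0
for epoch in range(5):                         # 5 epochs
    for batch_x, batch_y in dataloader:        # 16 batches per epoch
        logits = model(batch_x)                # forward pass
        loss = criterion(logits, batch_y)      # loss calculation
        optimizer.zero_grad()                  # clear old gradients
        loss.backward()                        # backward pass
        optimizer.step()                       # weight update
        iterations += 1

print(iterations)  # 80 = 5 epochs * 16 batches
```

With 1000 samples and a batch size of 64, each epoch has 15 full batches and one final batch of 40 samples.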

📝 Summary Table:

Term	Meaning	Typical Value
Learning Rate	Step size for weight updates	0.001 – 0.01
Epoch	One full pass over dataset	10 – 100
Batch Size	Samples per update	32, 64, 128
Iteration	One weight update step	dataset_size / batch_size per epoch (rounded up)

🎯 Objectives Recap:

Learning Rate = How far the weights move per update
Epoch = How many times the dataset is passed through
Batch Size = How many samples are processed at once
Tuning all three is critical for model performance

📝 Practice Questions:

What is the learning rate, and what does it do?
What is the relationship between batch size and iterations?
When is the risk of overfitting higher: with few epochs or with many epochs?
What does DataLoader do in PyTorch?
Why is batch training necessary?

Backpropagation: Backward Pass (Gradient Descent)

July 11, 2025 by Anand Singh

Backward Pass (Gradient Descent)

(Hindi: बैकवर्ड पास और ग्रेडिएंट डिसेंट)

🔶 1. What Is the Backward Pass?

The backward pass (or backpropagation) is the process of propagating the network's error (the loss) "back" toward the input, to determine how much each weight contributed to that error.

This gradient information is then used to adjust the weights in the right direction so that the next prediction is better.

🎯 Purpose:

"To mathematically trace the error in the neural network's prediction, find out which of the model's weights are responsible for it, and determine how to correct them."

🔄 2. Process Overview: Forward → Loss → Backward → Update

Full training loop:

Input → Forward Pass → Output → Loss Calculation → 
Backward Pass → Gradient Calculation → 
Optimizer Step → Weight Update

🧮 3. Mathematical View: Computing Gradients with the Chain Rule

Suppose:

y=f(x)
L=Loss(y,y^)

Then:

∂L/∂w = ∂L/∂y ⋅ ∂y/∂z ⋅ ∂z/∂w

where:

w: model parameter (weight)
z=w⋅x+b
y=f(z) (activation function)

Here the chain rule propagates the derivative from one neuron to the next; this is what is called backpropagation.
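The chain-rule product ∂L/∂y ⋅ ∂y/∂z ⋅ ∂z/∂w can be checked numerically on a single neuron. The sketch below assumes a sigmoid activation and a squared-error loss, with made-up values for w, x, b, and the target, and compares the analytic gradient with a finite-difference estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_for(w, x=2.0, b=0.5, target=1.0):
    """Squared-error loss of a single sigmoid neuron."""
    y = sigmoid(w * x + b)
    return (y - target) ** 2

# Illustrative values (assumptions, not from the article)
w, x, b, target = 0.3, 2.0, 0.5, 1.0

z = w * x + b
y = sigmoid(z)
dL_dy = 2 * (y - target)      # derivative of (y - target)^2 w.r.t. y
dy_dz = y * (1 - y)           # derivative of the sigmoid
dz_dw = x                     # derivative of w*x + b w.r.t. w
analytic = dL_dy * dy_dz * dz_dw

# Numerical check with a central finite difference
h = 1e-6
numeric = (loss_for(w + h) - loss_for(w - h)) / (2 * h)

print(analytic, numeric)
```

The two values should agree to several decimal places. In a multi-layer network the same product simply gains more factors, one set per layer.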

📘 4. Gradient Descent: The Core Training Algorithm

Weight Update Rule:

w ← w − η⋅∂L/∂w

where:

η: learning rate (how fast the model learns)
∂L/∂w: gradient of the loss with respect to that weight
It tells us in which direction and by how much to adjust the weight

⚠️ If the learning rate is too large:

The model overshoots
Training becomes unstable

⚠️ If it is too small:

The model learns very slowly
It can get stuck in local minima

🔧 5. PyTorch Implementation Example:

import torch
import torch.nn as nn
import torch.optim as optim

# Model
model = nn.Sequential(
    nn.Linear(2, 3),
    nn.ReLU(),
    nn.Linear(3, 1),
    nn.Sigmoid()
)

# Data
x = torch.tensor([[1.0, 2.0]])
y = torch.tensor([[0.0]])

# Loss and Optimizer
criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# ----------- Training Step ------------
# Step 1: Forward Pass
y_pred = model(x)

# Step 2: Loss Calculation
loss = criterion(y_pred, y)

# Step 3: Backward Pass
optimizer.zero_grad()     # Clear old gradients
loss.backward()           # Backpropagate
optimizer.step()          # Update weights

🧠 6. Visual Explanation:

Training Flowchart:

          Prediction
              ↑
          Forward Pass
              ↓
        Loss Calculation
              ↓
        Backward Pass
              ↓
    Gradient w.r.t. Weights
              ↓
       Optimizer Step
              ↓
        Weights Updated

🔍 7. Roles of Key PyTorch Methods:

Method	Purpose
`loss.backward()`	Computes the gradients of the loss with respect to all weights
`optimizer.step()`	Updates the weights using the computed gradients
`optimizer.zero_grad()`	Clears the old gradients
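Clearing matters because PyTorch accumulates gradients: each call to `backward()` adds to whatever is already stored in `.grad`. A minimal sketch with a single scalar parameter (the value 2.0 and the loss w² are made up for illustration):

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

(w ** 2).backward()          # d(w^2)/dw = 2w = 4
first = w.grad.item()

(w ** 2).backward()          # a second backward pass ADDS to the stored gradient
accumulated = w.grad.item()  # 4 + 4 = 8

w.grad.zero_()               # reset by hand; optimizer.zero_grad() resets every parameter
cleared = w.grad.item()

print(first, accumulated, cleared)
```

Without the reset, each optimizer step would use the sum of the gradients from all previous batches rather than the gradient of the current one.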

💡 Example: How Does the Gradient Work?

Suppose the model predicted y^=0.8 and the true label was y=1. The loss is then: L=(y−y^)²=(1−0.8)²=0.04

Its gradient: dL/dy^=2(y^−y)=2(0.8−1)=−0.4

The negative gradient shows that the prediction was too low: increasing y^ lowers the loss, so the update moves the weights in whichever direction raises the prediction.
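The arithmetic of this example can be checked in a few lines of plain Python. As a simplification, the sketch applies one gradient-descent step directly to the prediction instead of to a weight, and the learning rate of 0.1 is an assumed value.

```python
import math

y_true = 1.0
y_hat = 0.8

loss = (y_true - y_hat) ** 2          # squared error
grad = 2 * (y_hat - y_true)           # dL/dy_hat

# One gradient-descent step directly on y_hat (lr is an assumed value)
lr = 0.1
y_hat_new = y_hat - lr * grad         # subtracting a negative gradient raises y_hat
new_loss = (y_true - y_hat_new) ** 2

print(loss, grad, y_hat_new, new_loss)
```

The prediction moves from 0.8 to about 0.84 and the loss drops from about 0.04 to about 0.0256.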

📝 Practice Questions:

What does the backward pass do in a neural network?
Write the update rule for gradient descent.
What does loss.backward() do in PyTorch?
Why is the chain rule necessary in backpropagation?
What is the risk of a learning rate that is too high?

🎯 Objectives Recap:

Backward Pass = The process of computing gradients from the loss
Gradient Descent = The technique for updating weights
Chain Rule = The basis for propagating gradients
PyTorch automates this entire process

Backpropagation and Training, Forward Pass and Loss Calculation

July 11, 2025 / July 9, 2025 by Anand Singh

(Hindi: बैकप्रोपेगेशन और प्रशिक्षण प्रक्रिया)

🔷 1. Introduction

Backpropagation = "Backwards propagation of errors"
It is the algorithm that computes the gradient of the loss function with respect to each weight in a neural network, so that the weights can be updated.

🔁 2. The Complete Training Flow

Forward Pass:
Input → Hidden → Output
(a prediction is generated)
Loss Calculation:
The loss between the prediction and the ground truth is measured
Backward Pass (Backpropagation):
The gradient of the loss is computed for every weight
Weight Update (Optimizer):
The weights are updated by gradient descent
Repeat for Epochs:
Until the model's predictions are accurate enough

🔧 3. Backpropagation: How Does It Work?

Backpropagation is a mathematical tool that uses the chain rule of calculus:

∂L/∂w = ∂L/∂y ⋅ ∂y/∂z ⋅ ∂z/∂w

where:

L: Loss
y: Prediction
z: Neuron input
w: Weight

🧠 Visual Diagram (Flowchart):

Input → Hidden → Output → Loss
                    ↑
                 Backpropagate Gradient
                    ↓
           Update Weights (via Optimizer)

💻 Training Code in PyTorch (Simple)

import torch
import torch.nn as nn
import torch.optim as optim

# Simple Model
model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Training Data
x = torch.tensor([[1.0], [2.0]])
y = torch.tensor([[2.0], [4.0]])

# Training Loop
for epoch in range(10):
    outputs = model(x)                  # Forward
    loss = criterion(outputs, y)       # Compute Loss
    
    optimizer.zero_grad()              # Clear gradients
    loss.backward()                    # Backpropagation
    optimizer.step()                   # Update weights
    
    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")

🔍 Important Terms Recap

Term	Explanation
Forward Pass	Producing a prediction
Loss	Measuring the error
Backward Pass	Computing the gradients
Optimizer	Updating the weights
Epoch	One complete pass over the dataset

🎯 Learning Objectives Summary

Backpropagation is the foundation of learning
It provides the gradients of the loss function that are used to correct the weights
The optimizer uses the gradients to improve the model
PyTorch automates the gradient computation in the training loop

📝 Practice Questions

Which rule is backpropagation based on?
What is the difference between the forward pass and the backward pass?
What does loss.backward() do?
Why is optimizer.zero_grad() necessary?
What are the four main steps of a training loop?

Forward Pass and Loss Calculation

(Hindi: फ़ॉरवर्ड पास और लॉस की गणना)

🔶 1. Forward Pass (The Journey from Input to Output)

The forward pass is the stage in which the neural network takes an input and passes it through its layers to produce the final output (the prediction).

🔁 Step-by-Step Process:

The input vector (x) is fed into the network
Each layer:
- Performs a linear transformation: z=W⋅x+b
- Applies an activation function: a=f(z)
The final output layer produces the prediction y^

🧠 Neural Network Flow:

Input → Hidden Layer 1 → Hidden Layer 2 → ... → Output Layer → Prediction

The linear + activation sequence is applied at every layer
The output layer's activation function depends on the task (for example, softmax or sigmoid for classification)
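The flow can be traced by hand on a tiny network. The sketch below is plain Python and assumes a 2 → 2 → 1 architecture with ReLU in the hidden layer and sigmoid at the output; all weights and biases are made-up values chosen so the numbers are easy to follow.

```python
import math

def dense(W, x, b):
    """Linear transformation z = W . x + b for one layer (W is a list of rows)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(z):
    return [max(0.0, v) for v in z]

def sigmoid(z):
    return [1.0 / (1.0 + math.exp(-v)) for v in z]

# Made-up weights for a 2 -> 2 -> 1 network
W1, b1 = [[0.5, -0.5], [0.25, 0.25]], [0.0, 0.0]
W2, b2 = [[1.0, 2.0]], [-1.5]

x = [1.0, 2.0]
a1 = relu(dense(W1, x, b1))         # hidden layer: z1 = [-0.5, 0.75] -> a1 = [0.0, 0.75]
y_hat = sigmoid(dense(W2, a1, b2))  # output layer: z2 = [0.0] -> y_hat = [0.5]
print(y_hat)
```

The first hidden neuron receives a negative input and is switched off by ReLU, so only the second neuron contributes to the output.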

📌 Purpose of the Forward Pass:

Progressively abstract the input features
Generate an output from the network's current weights
Put that output in a form that can be compared with the true label

🔷 2. Loss Calculation (Measuring the Prediction's Error)

Once the prediction (y^) is ready, it is compared to the true label (y) using a loss function.
The loss function tells us how right or wrong the prediction is.

🔢 Common Loss Functions:
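Two loss functions that appear in the code examples of this series are mean squared error (`nn.MSELoss`) and binary cross-entropy (`nn.BCELoss`). As a sketch of what they compute, here are plain-Python versions with the default mean reduction; the helper names and sample values are made up for illustration.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error, typically used for regression."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def bce(y_true, y_pred):
    """Binary cross-entropy; y_pred holds probabilities strictly between 0 and 1."""
    return -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for t, p in zip(y_true, y_pred)
    ) / len(y_true)

mse_value = mse([2.0, 4.0], [1.5, 4.5])   # (0.25 + 0.25) / 2 = 0.25
bce_value = bce([0.0], [0.5])             # -log(0.5) = log(2)
print(mse_value, bce_value)
```

For multi-class classification, such as the ten-digit MNIST example, the usual choice is the categorical cross-entropy (`nn.CrossEntropyLoss` in PyTorch).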

🔁 Flow of the Loss Calculation:

Compute the prediction (y^)
Compare it with the ground truth label y
Calculate the error using the loss function
This error is what gets propagated during backpropagation

💻 PyTorch Example: Loss and Prediction

import torch
import torch.nn as nn

# Input and True label
x = torch.tensor([[1.0, 2.0]])
y_true = torch.tensor([[0.0]])

# Define simple model
model = nn.Sequential(
    nn.Linear(2, 3),
    nn.ReLU(),
    nn.Linear(3, 1),
    nn.Sigmoid()
)

# Loss function
criterion = nn.BCELoss()

# Forward Pass
y_pred = model(x)

# Loss Calculation
loss = criterion(y_pred, y_true)

print("Prediction:", y_pred.item())
print("Loss:", loss.item())

🔁 Complete Diagram Flow:

Input → Hidden → Output → Loss
                    ↑
                 Backpropagate Gradient
                    ↓
           Update Weights (via Optimizer)

✅ This shows that the forward pass produces the prediction and the loss function measures the quality of that prediction.

📌 Summary Table

Step	Description
Forward Pass	Sending the input through the layers
Output	Generating the prediction
Loss Calculation	Measuring the difference between the prediction and the label
Next Step	Using the loss to compute gradients

🎯 Objectives Recap:

Forward Pass transforms input → prediction
Loss function tells how wrong the prediction is
Loss is key for guiding weight updates

📝 Practice Questions:

Which operations take place in the forward pass?
What does z=Wx+b mean?
Which loss function is used for binary classification?
What is the difference between a loss function and an optimizer?
What are the steps for computing a loss in PyTorch?