Model Training, Saving, and Loading in PyTorch

July 12, 2025 by Anand Singh

Let us now walk through the full process of training a model in PyTorch and then saving and loading it,
which is a core part of any deep learning project.

🔷 1. 🔁 Model Training in PyTorch

🧱 Training Steps Overview:

Build the model (nn.Module)
Choose a loss function (nn.CrossEntropyLoss, etc.)
Set up the optimizer (torch.optim)
Run the forward pass
Compute the loss
Clear old gradients (optimizer.zero_grad())
Backward pass (loss.backward())
Optimizer step (optimizer.step())

🧪 Full Example (Classifier):

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Sample data
X = torch.tensor([[0.,0.],[0.,1.],[1.,0.],[1.,1.]])
y = torch.tensor([[0.],[1.],[1.],[0.]])

dataset = TensorDataset(X, y)
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)

# Step 1: Model
class XORNet(nn.Module):
    def __init__(self):
        super(XORNet, self).__init__()
        self.fc1 = nn.Linear(2, 4)
        self.fc2 = nn.Linear(4, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return torch.sigmoid(self.fc2(x))

model = XORNet()

# Step 2: Loss and Optimizer
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.1)

# Step 3: Training loop
for epoch in range(500):
    for xb, yb in dataloader:
        y_pred = model(xb)
        loss = criterion(y_pred, yb)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    if epoch % 100 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item():.4f}")

🔷 2. 💾 Saving a PyTorch Model

PyTorch offers two ways to save a model:

✅ Option 1: State Dict Only (Recommended)

torch.save(model.state_dict(), "xor_model.pth")

This saves only the model's weights, not its architecture.
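To make "weights only" concrete, here is a small sketch that prints what a state dict holds for a network with the same layer layout as XORNet. The class name `TinyNet` is a stand-in introduced for this illustration.

```python
import torch
import torch.nn as nn

# TinyNet mirrors the XORNet layer layout; the name is illustrative only.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(2, 4)
        self.fc2 = nn.Linear(4, 1)

    def forward(self, x):
        return torch.sigmoid(self.fc2(torch.relu(self.fc1(x))))

state = TinyNet().state_dict()

# A state dict is an ordered mapping from parameter names to tensors.
state_shapes = {name: tuple(t.shape) for name, t in state.items()}
for name, shape in state_shapes.items():
    print(name, shape)
```

Only names and tensors appear here. Nothing describes the class itself, which is why the architecture has to be rebuilt in code before the weights are loaded.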

✅ Option 2: Complete Model (Not Recommended)

torch.save(model, "xor_model_full.pth")

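This saves the whole model object together with its structure, but it can run into version-compatibility issues, because the saved file depends on the exact class definition being importable when it is loaded.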

🔷 3. 📂 Loading a Model

🔁 Load from State Dict (Best Practice):

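model = XORNet()  # build the architecture first
model.load_state_dict(torch.load("xor_model.pth"))
model.eval()  # switch to evaluation mode before inference

🔍 .eval() switches layers such as BatchNorm and Dropout to their inference behavior: Dropout is turned off and BatchNorm uses its running statistics instead of batch statistics.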
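The effect of the two modes is easiest to see on a single Dropout layer. The sketch below is a toy illustration (the tensor of ones and `p=0.5` are arbitrary choices), not part of the XOR example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the run is repeatable

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()            # training mode: elements are zeroed at random
train_out = drop(x)     # surviving elements are scaled by 1 / (1 - p) = 2

drop.eval()             # evaluation mode: dropout passes the input through
eval_out = drop(x)

print("train:", train_out)
print("eval: ", eval_out)
```

In training mode the output is a mix of 0s and 2s. In evaluation mode it equals the input, which is what you want for stable predictions.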

🧠 Bonus: GPU Compatibility (Saving/Loading)

✅ Save on GPU, Load on CPU:

# Save (from GPU)
torch.save(model.state_dict(), "model_gpu.pth")

# Load on CPU
device = torch.device("cpu")
model.load_state_dict(torch.load("model_gpu.pth", map_location=device))

🧪 Example: Inference After Loading

model.eval()
with torch.no_grad():
    test = torch.tensor([[1., 1.]])
    output = model(test)
    print("Predicted:", output.item())

📦 Advanced Tip: Save Optimizer State Too

torch.save({
    'model_state': model.state_dict(),
    'optimizer_state': optimizer.state_dict(),
}, "checkpoint.pth")

Load Later:

checkpoint = torch.load("checkpoint.pth")
model.load_state_dict(checkpoint['model_state'])
optimizer.load_state_dict(checkpoint['optimizer_state'])
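A checkpoint is mostly useful for resuming training, so here is a hedged round-trip sketch. It assumes a tiny `nn.Linear` model, an in-memory buffer in place of a real file, and an extra `'epoch'` key, which is a hypothetical addition that the checkpoint above does not store.

```python
import io
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(2, 1)
optimizer = optim.Adam(model.parameters(), lr=0.1)

# One training step so the optimizer has internal state worth saving.
loss = model(torch.ones(4, 2)).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()

# 'epoch' is an extra, hypothetical key added for this sketch.
buffer = io.BytesIO()  # stands in for a file such as "checkpoint.pth"
torch.save({
    'model_state': model.state_dict(),
    'optimizer_state': optimizer.state_dict(),
    'epoch': 5,
}, buffer)

# Later: rebuild the objects first, then restore their state.
buffer.seek(0)
checkpoint = torch.load(buffer)
new_model = nn.Linear(2, 1)
new_optimizer = optim.Adam(new_model.parameters(), lr=0.1)
new_model.load_state_dict(checkpoint['model_state'])
new_optimizer.load_state_dict(checkpoint['optimizer_state'])
start_epoch = checkpoint['epoch'] + 1  # resume from the next epoch

print("resuming at epoch", start_epoch)
```

Restoring the optimizer matters for optimizers like Adam, which keep running averages per parameter; without them, training restarts with cold statistics.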

🧠 Evaluation Tips

Always call model.eval() before inference
Run predictions inside torch.no_grad() (for memory efficiency)
Use model checkpoints for large models

📝 Practice Questions:

What are the main steps for training a model in PyTorch?
What is the difference between saving a state dict and saving the full model?
Why should the optimizer state be saved?
What is the difference between the .eval() and .train() modes?
Why use torch.no_grad() during inference?

🧠 Summary Table

Task	Method
🔧 Train	Forward → Loss → Backward → Optimizer
💾 Save Weights	`torch.save(model.state_dict(), path)`
📂 Load Weights	`model.load_state_dict(torch.load(path))`
📌 Inference	`model.eval()` + `with torch.no_grad()`
🔁 Save with Optimizer	Use `checkpoint = {'model_state': ..., 'optimizer_state': ...}`

Introduction to PyTorch

July 12, 2025 by Anand Singh

You have learned TensorFlow and Keras; now it is PyTorch's turn. It is one of the most popular frameworks in the deep learning community, valued for research work and flexibility.

PyTorch is an open-source deep learning framework developed by Facebook AI Research (FAIR) and first released in 2016.

It is especially popular among researchers and advanced developers because it:

Provides a dynamic computation graph (it can change at runtime)
Offers a Pythonic, NumPy-like syntax
Makes GPU acceleration straightforward

🔑 Features:

Feature	Description
🧮 Dynamic Graphs	Real-time control (more flexibility)
📊 Tensor Library	NumPy-like operations with GPU support
🧠 Autograd	Automatic gradient calculation
🔧 Modular API	Neural nets = Modules
🖥️ GPU Ready	CUDA support

🔶 2. PyTorch Installation

pip install torch torchvision

🔷 3. Tensors in PyTorch

Tensors are multi-dimensional arrays (similar to NumPy arrays) but with GPU support.

import torch

x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
print(x.shape)           # torch.Size([2, 2])
print(x + x)             # Tensor addition
print(x @ x.T)           # Matrix multiplication

✅ Use GPU:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = x.to(device)

🔷 4. Autograd – Automatic Differentiation

x = torch.tensor(2.0, requires_grad=True)
y = x**3 + 2*x
y.backward()
print(x.grad)   # dy/dx = 3x^2 + 2 = 3*2^2 + 2 = 14
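As a sanity check on the value above, the gradient from autograd can be compared with a numerical estimate. This is a small sketch; the helper `f` and the step size `eps` are choices made here for illustration.

```python
import torch

def f(x):
    return x**3 + 2*x

# Gradient from autograd.
x = torch.tensor(2.0, requires_grad=True)
f(x).backward()
autograd_grad = x.grad.item()   # 3 * 2**2 + 2 = 14

# Central finite difference as an independent check.
# float64 keeps the rounding error of the subtraction small.
eps = 1e-4
x0 = torch.tensor(2.0, dtype=torch.float64)
numeric_grad = ((f(x0 + eps) - f(x0 - eps)) / (2 * eps)).item()

print(autograd_grad, numeric_grad)
```

The two numbers agree to several decimal places. Autograd gives the exact derivative at the point, while the finite difference is only an approximation.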

🔷 5. Building a Simple Neural Network

🔨 Step 1: Import Libraries

import torch
import torch.nn as nn
import torch.optim as optim

🔨 Input and Output for XOR

X = torch.tensor([[0.,0.],[0.,1.],[1.,0.],[1.,1.]])
y = torch.tensor([[0.],[1.],[1.],[0.]])

🔨 Step 2: Define Model

class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        self.fc1 = nn.Linear(2, 4)
        self.fc2 = nn.Linear(4, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return torch.sigmoid(self.fc2(x))

🔨 Step 3: Instantiate Model

model = MyNet()

🔨 Step 4: Define Loss and Optimizer

criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

🔨 Step 5: Train the Model

for epoch in range(100):
    y_pred = model(X)
    loss = criterion(y_pred, y)
    
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    print(f"Epoch {epoch}, Loss: {loss.item()}")

🔧 Important PyTorch Modules

Module	Description
`torch.Tensor`	Main data structure
`torch.nn`	For building neural nets
`torch.nn.functional`	Activation functions, loss functions
`torch.optim`	Optimizers like Adam, SGD
`torch.utils.data`	Dataset and DataLoader tools
`torchvision`	Image datasets and transformations

🔷 6. Example: XOR with MLP

X = torch.tensor([[0.,0.],[0.,1.],[1.,0.],[1.,1.]])
y = torch.tensor([[0.],[1.],[1.],[0.]])

model = nn.Sequential(
    nn.Linear(2, 4),
    nn.ReLU(),
    nn.Linear(4, 1),
    nn.Sigmoid()
)

loss_fn = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

for epoch in range(2000):
    y_pred = model(X)
    loss = loss_fn(y_pred, y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if epoch % 200 == 0:
        print(f"Epoch {epoch}, Loss: {loss.item()}")

📝 Practice Questions:

What is the main difference between PyTorch and TensorFlow?
How do you create a tensor in PyTorch?
How is Autograd used to compute gradients?
How do you write a simple model class?
What is the difference between the Sequential API and a custom model class?

🧠 Summary Table

Concept	Explanation
Tensor	PyTorch's data container (NumPy + GPU)
Autograd	Automatic differentiation
`nn.Module`	Base class for neural network architectures
Optimizer	Updates the parameters
Loss Function	Measures the model's error

TensorFlow and Keras Basics

July 12, 2025 by Anand Singh

Now that you have a good grasp of deep learning theory and models (such as CNN, RNN, BERT),
the next practical step is:
⚙️ learning to build deep learning models with TensorFlow and Keras.

🔷 1. What is TensorFlow?

TensorFlow is an open-source deep learning framework created by Google.
It is designed for numerical computation and large-scale machine learning models.

🧠 The name TensorFlow comes from “Tensor” (the data structure) + “Flow” (the computation graph).

✅ Key Features:

Feature	Detail
📊 Automatic Differentiation	Gradient calculation
🧮 GPU/TPU Support	Faster computation
🧠 High-level + Low-level APIs	Flexibility
🔧 Deployment	Android, Web, Edge devices
🤝 Ecosystem	TF Hub, TF Lite, TF.js, TF-Serving

🔷 2. What is Keras?

Keras is a high-level deep learning API that runs on top of TensorFlow.
It makes writing, training, and debugging models much easier.

🎯 “Keras = Simplicity + Productivity + Modularity”

✅ Why choose Keras?

Benefit	Reason
🚀 Easy to Learn	Pythonic syntax
🧩 Modular	Layers, optimizers, and losses are separate components
🧠 Powerful	Advanced models possible
🔧 Fast prototyping	See results quickly
🔌 TF Backend	Uses the power of TensorFlow

🔷 3. Tensor, Model, and Layer Basics

🔹 Tensor:

Multidimensional array (like a NumPy array, but GPU-compatible)

import tensorflow as tf
x = tf.constant([[1, 2], [3, 4]])
print(x.shape)  # (2, 2)

🔹 Layer:

A building block of a neural network (Dense, Conv2D, LSTM)

from tensorflow.keras.layers import Dense
dense = Dense(units=64, activation='relu')

🔹 Model:

The complete network architecture from input to output

from tensorflow.keras.models import Sequential

model = Sequential([
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

🔷 4. Building a Model in Keras (Step-by-Step)

✅ Step 1: Import

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

✅ Step 2: Model Define

model = Sequential([
    Dense(64, activation='relu', input_shape=(100,)),
    Dense(10, activation='softmax')
])

✅ Step 3: Compile

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

✅ Step 4: Train

model.fit(x_train, y_train, epochs=10, batch_size=32)

✅ Step 5: Evaluate

model.evaluate(x_test, y_test)

🧪 Example: Simple Binary Classifier

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(16, activation='relu', input_shape=(2,)),
    Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

model.fit(X_train, y_train, epochs=20)

🔧 Useful Layers in Keras

Layer	Use
Dense	Fully connected layer
Conv2D	Image convolution layer
LSTM / GRU	Sequence modeling
Dropout	Regularization
Flatten	Input flattening
Embedding	Word embedding for NLP

🧠 Visualization: Model Summary

model.summary()

📝 Practice Questions:

What is the difference between TensorFlow and Keras?
What is a Sequential model?
What do you need to specify when compiling a model?
What is a Dense layer?
Write the code for a simple 3-layer model.

🧠 Summary Table

Concept	Description
TensorFlow	Google's ML framework
Keras	Easy high-level API
Tensor	Multidimensional data
Layer	A part of the model (Dense, Conv)
Model	Complete NN architecture

Transformers and BERT

July 12, 2025 by Anand Singh

We now turn to one of the most revolutionary inventions in NLP:
🚀 Transformers and BERT, which have completely changed the field.

🔶 1. Transformers: Introduction

The Transformer architecture was introduced by Google researchers in 2017, in the paper:

📄 “Attention Is All You Need” (Vaswani et al.)

It removed the dependence on recurrent networks (RNN, LSTM) and revolutionized NLP.

📐 The Key Idea of the Transformer: Self-Attention

Each word takes into account the context of all the other words in the sentence at once, not only the words that come before it.

🔧 Architecture Overview

The Transformer is split into two main parts:

[Encoder] →→→→→→→→→ [Decoder]

Part	Role
Encoder	Understands the input text (e.g., sentence meaning)
Decoder	Generates the output (e.g., translation, caption)

Note: BERT uses only the Encoder; GPT uses only the Decoder.

🔁 Self-Attention Mechanism

Every word in the input relates to every other word:

Sentence: "The cat sat on the mat"
"cat" → attends to "the", "sat", "mat" etc. via attention scores

🔢 Attention Equation:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V$$

where:

Q: Query
K: Key
V: Value
d_k: Key vector dimension
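Assuming the standard scaled dot-product form from the Vaswani et al. paper, Attention(Q, K, V) = softmax(QKᵀ / √d_k) V, a single attention head can be sketched in a few lines. The function name, the toy sizes, and the random inputs are illustrative assumptions; a real Transformer also applies learned projections and uses multiple heads.

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V):
    """Return softmax(Q K^T / sqrt(d_k)) V and the attention weights."""
    d_k = K.shape[-1]
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)  # (seq_q, seq_k)
    weights = torch.softmax(scores, dim=-1)            # each row sums to 1
    return weights @ V, weights

torch.manual_seed(0)
seq_len, d_k, d_v = 6, 8, 4     # toy sizes, e.g. six tokens
Q = torch.randn(seq_len, d_k)
K = torch.randn(seq_len, d_k)
V = torch.randn(seq_len, d_v)

output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape)         # torch.Size([6, 4])
print(weights.sum(dim=-1))  # every entry is 1 (up to rounding)
```

Row i of `weights` says how strongly token i attends to every token in the sequence, and the output for token i is the corresponding weighted average of the value vectors.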

⚙️ Components of the Transformer:

Component	Explanation
🔹 Multi-Head Attention	Parallel attention layers for better learning
🔹 Positional Encoding	Adds information about sequence order
🔹 Feedforward Network	Linear + non-linear layers
🔹 Layer Normalization	Stable training
🔹 Residual Connections	Keep gradients flowing

🧠 2. BERT: Bidirectional Encoder Representations from Transformers

📄 “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” – Devlin et al., 2018

🎯 Main goal:

Language understanding: chatbots, Q&A, classification

🔧 How does it work?

BERT is built only on the Transformer Encoder architecture.
It reads the context on both sides of a word at the same time, which is why it is called bidirectional.

📊 Pretraining Tasks:

Masked Language Modeling (MLM)
- Some words in the sentence are masked, and the model has to predict them.
Input: "The [MASK] is shining"
Output: "sun"
Next Sentence Prediction (NSP)
- Two sentences are given, and the model has to predict whether the second one follows the first.
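The masking step of MLM can be sketched without any deep learning library. The function below is a simplified illustration: its name, the example sentence, and the seed are assumptions made here. The BERT paper masks about 15% of tokens and sometimes keeps or randomly replaces the chosen token, which this sketch leaves out.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace a random subset of tokens with [MASK].

    Returns the masked sequence and a dict {position: original token}
    that the model would be trained to predict.
    Simplification: real BERT pre-training also sometimes keeps the
    chosen token or swaps in a random one instead of always masking.
    """
    rng = random.Random(seed)
    masked, labels = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append("[MASK]")
            labels[i] = tok
        else:
            masked.append(tok)
    return masked, labels

tokens = "the sun is shining in the clear blue sky today".split()
# A higher mask_prob than BERT's 0.15 so a short sentence shows some masks.
masked, labels = mask_tokens(tokens, mask_prob=0.3)
print(masked)
print(labels)
```

During pre-training the loss is computed only at the positions stored in `labels`, so the model has to use the words on both sides of each `[MASK]` to recover the original token.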

📦 Pretrained BERT Models:

Variant	Description
`bert-base-uncased`	Lowercase English, 12 layers
`bert-large-uncased`	24 layers, large model
`DistilBERT`	Lightweight, faster
`Multilingual BERT`	100+ languages

🔧 BERT Applications:

Task	Example
✅ Sentiment Analysis	“I love this product!” → Positive
🧠 Question Answering	“Where is Taj Mahal?” → “Agra”
✍️ Named Entity Recognition	“Barack Obama is from USA” → Person, Country
💬 Chatbots	Intent understanding
📃 Text Classification	News, spam, legal docs

🧰 Example: HuggingFace Transformers

from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("I love deep learning", return_tensors="pt")
outputs = model(**inputs)

🧠 Transformer vs BERT

Aspect	Transformer	BERT
Type	General architecture	Pretrained NLP model
Structure	Encoder + Decoder	Only Encoder
Direction	Depends	Bidirectional
Application	Translation, captioning	Understanding, classification

📈 Transformers & BERT Impact

Area	Impact
📚 Research	State-of-the-art accuracy on many NLP benchmarks
🗣️ Chatbots	Smarter conversations
🧾 Legal/Medical	Automated document understanding
🧠 AI Models	Foundation for GPT, T5, RoBERTa, etc.

📝 Practice Questions:

What role does self-attention play in the Transformer architecture?
Why is BERT bidirectional?
What does Masked Language Modeling mean?
Which NLP tasks is BERT used for?
How do you load BERT from HuggingFace?

🧠 Summary Table

Term	Description
Transformer	Sequence model using attention mechanism
BERT	Bidirectional encoder for NLP tasks
MLM	Mask words and predict
NSP	Predict sentence relationship
Applications	Q&A, classification, chatbot, NER

Sequence Models for Text (RNN, LSTM)

July 12, 2025 by Anand Singh

Let us move on to an important category in NLP: sequence models.
Text data is inherently sequential (the order of the words matters), so we need models that can remember a sequence.

🔶 1. What is Sequence Data?

Text = a sequence of words.
For example: "मैं स्कूल जा रहा हूँ।" ("I am going to school.")
Here “जा रहा” (masculine) and “जा रही” (feminine) mean different things, so the order and form of the words matter.

🧠 The job of a sequence model is to capture this order and context.

🔁 2. Recurrent Neural Network (RNN)

📌 Goal:

Build a model that keeps the context of earlier words in memory in order to understand or predict the next word.

🔧 Working (Step-by-step):

At each time step an input (a word) arrives and the hidden state is updated:

x₁ → x₂ → x₃ ...
↓     ↓     ↓
h₁ → h₂ → h₃ → Output

The hidden state hₜ carries information from earlier steps into the processing of the next word.
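The update in the diagram can be written out by hand. The sketch below assumes the common simple-RNN form hₜ = tanh(W_x xₜ + W_h hₜ₋₁ + b); the weight names, toy sizes, and random inputs are illustrative, and in practice `nn.RNN` learns these weights.

```python
import torch

torch.manual_seed(0)
input_dim, hidden_dim, seq_len = 3, 5, 4   # toy sizes

# Parameters of a single-layer RNN (randomly initialised for the sketch).
W_x = torch.randn(hidden_dim, input_dim) * 0.1
W_h = torch.randn(hidden_dim, hidden_dim) * 0.1
b = torch.zeros(hidden_dim)

xs = torch.randn(seq_len, input_dim)   # one input vector per time step
h = torch.zeros(hidden_dim)            # h_0: empty memory

hidden_states = []
for x_t in xs:
    # h_t = tanh(W_x x_t + W_h h_{t-1} + b)
    h = torch.tanh(W_x @ x_t + W_h @ h + b)
    hidden_states.append(h)

hidden_states = torch.stack(hidden_states)
print(hidden_states.shape)   # torch.Size([4, 5])
```

Because each `h` depends on the previous one, the loop cannot be run in parallel over time steps.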

⚠️ Limitations of RNNs

Problem	Description
❌ Vanishing Gradient	In long sentences, information about earlier context is lost
❌ Fixed memory	Does not remember older words well
❌ Slow training	The sequential nature makes parallelization difficult

🔄 3. LSTM (Long Short-Term Memory)

LSTM is an improved version of the RNN, proposed by Hochreiter & Schmidhuber in 1997 to solve this problem.

📌 Core Idea:

An LSTM has a special memory cell whose gates decide which information to keep, which to forget, and which to update.

🧠 Key Components of LSTM:

Gate	Role
🔒 Forget Gate	What to forget
🔓 Input Gate	What to add
📤 Output Gate	What to pass on to the next step

📊 LSTM Architecture (Flow)

Input xₜ → [Forget Gate, Input Gate, Output Gate] → Cell State → Output hₜ

An LSTM can retain information across a sequence for much longer than a plain RNN.

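🔢 Equations (Simplified):

In the standard formulation, with $\sigma$ the sigmoid function and $\odot$ element-wise multiplication:

$$
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) && \text{(input gate)} \\
\tilde{c}_t &= \tanh(W_c [h_{t-1}, x_t] + b_c) && \text{(candidate memory)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state)} \\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(output)}
\end{aligned}
$$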

🧪 Practical Example:

📌 Use Case: Text Generation

Input: “The sun”
Output: “The sun is shining brightly today…”

The LSTM keeps the preceding words in memory and uses them to predict the next word.

🧰 Python Code Example (PyTorch)

import torch.nn as nn

class LSTMModel(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x):
        x = self.embedding(x)
        out, _ = self.lstm(x)
        out = self.fc(out)
        return out
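To see what this model returns, the sketch below runs it on a batch of random token ids. The class is repeated so the snippet runs on its own; the sizes and the greedy next-token step are illustrative assumptions, and the model is untrained, so the predictions are meaningless.

```python
import torch
import torch.nn as nn

class LSTMModel(nn.Module):
    # Same layout as the model above, repeated so the snippet is self-contained.
    def __init__(self, vocab_size, embed_dim, hidden_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x):
        out, _ = self.lstm(self.embedding(x))
        return self.fc(out)

# Toy sizes chosen for illustration.
vocab_size, embed_dim, hidden_dim = 50, 16, 32
model = LSTMModel(vocab_size, embed_dim, hidden_dim)

# A batch of 2 sequences, each 7 token ids long.
token_ids = torch.randint(0, vocab_size, (2, 7))
logits = model(token_ids)
print(logits.shape)       # torch.Size([2, 7, 50])

# Greedy guess for the token after each sequence: argmax at the last step.
next_token = logits[:, -1, :].argmax(dim=-1)
print(next_token.shape)   # torch.Size([2])
```

The output has one score per vocabulary entry at every position, so for text generation only the scores at the last position are needed to pick the next word.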

🤖 RNN vs LSTM Comparison

Feature	RNN	LSTM
Memory	Short	Long
Gates	No	Yes (forget, input, output)
Vanishing Gradient	Common	Handled
Use Case	Simple patterns	Complex sequences

📈 Applications of Sequence Models

Task	Use
🔤 Language Modeling	Next word prediction
✍️ Text Generation	Poetry, story generation
📧 Spam Detection	Sequential classification
🎧 Speech Recognition	Audio-to-text
🧠 Sentiment Analysis	Review understanding
💬 Chatbots	Human-like conversation

📝 Practice Questions:

Why are sequence models needed in NLP?
What are the drawbacks of an RNN?
How does an LSTM remember context?
What are the three main gates in an LSTM?
Write the code for a small PyTorch LSTM model.

🧠 Summary Table

Term	Meaning
RNN	Sequence modeling network
LSTM	Long-memory capable RNN
Gates	Decide memory control
Application	Text, audio, time-series
Limitation	RNN: short memory; LSTM: handles long-term context