Deep Q-Network (DQN)

You have already understood Q-Learning, the core technique of Reinforcement Learning.
Now we will learn its Deep Learning version:
🧠 Deep Q-Network (DQN)


🔶 1. What is DQN?

Deep Q-Network (DQN) is an algorithm that combines traditional Q-Learning with a Deep Neural Network.
When the state space is very large or continuous (e.g., images or video frames), building a Q-table is not feasible, so we instead use a neural network to approximate the Q-function: Q(s, a) ≈ Qθ(s, a)

🎯 “DQN maps states to Q-values using a deep neural network.”


📈 2. Why DQN?

Limitation of Q-Learning   | DQN's solution
Large state-action space   | Neural network approximation
Slow convergence           | Experience replay
Instability in training    | Target networks

🧠 3. Key Concepts in DQN

🔹 a) Q-Network

  • A deep neural network takes the state as input
  • It outputs a Q-value for each possible action (see the sketch below)
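
A minimal sketch of such a Q-network, assuming PyTorch, a small vector state (e.g., 4 dimensions, as in CartPole), and a discrete action space; the class name and layer sizes are illustrative, not from the original text:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per action."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),   # one output per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)              # shape: (batch, n_actions)

# Hypothetical usage: 4-dimensional state, 2 actions
q_net = QNetwork(state_dim=4, n_actions=2)
q_values = q_net(torch.randn(1, 4))         # one Q-value per action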

🔹 b) Experience Replay Buffer

  • Past experiences (s, a, r, s′) are stored in a buffer
  • Training uses random mini-batches drawn from this buffer → reduces correlation between consecutive samples (see the sketch below)
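
A minimal replay-buffer sketch in Python; the class name, capacity, and the extra `done` flag are illustrative assumptions, not from the original text:

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores transitions (s, a, r, s', done) and samples random mini-batches."""
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)   # oldest experiences are dropped automatically

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size: int):
        batch = random.sample(self.buffer, batch_size)   # random sampling breaks correlation
        s, a, r, s_next, done = zip(*batch)
        return s, a, r, s_next, done

    def __len__(self):
        return len(self.buffer)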

🔹 c) Target Network

  • Q-value targets are computed from a fixed target network
  • The target network is updated only every few steps
  • This keeps training stable (see the sketch below)
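
A sketch of the hard target-network update, assuming PyTorch and the QNetwork class from the earlier sketch; the update interval N = 1000 is an illustrative value:

```python
import copy

q_net = QNetwork(state_dim=4, n_actions=2)           # online network
q_target = copy.deepcopy(q_net)                      # frozen copy used for targets
for p in q_target.parameters():
    p.requires_grad = False                          # targets never receive gradients

TARGET_UPDATE_EVERY = 1000                           # illustrative value of N

def maybe_sync_target(step: int):
    """Every N steps, copy the online weights into the target network."""
    if step % TARGET_UPDATE_EVERY == 0:
        q_target.load_state_dict(q_net.state_dict())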

🧪 4. DQN Architecture (High Level)

  • Input: state (e.g., image pixels)
  • Convolutional layers (if the input is an image)
  • Fully connected layers
  • Output: Q-values for all possible actions
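
A sketch of the image-input version in PyTorch, roughly following the classic Atari DQN layer sizes (four stacked 84×84 grayscale frames); the exact sizes and names here are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ConvQNetwork(nn.Module):
    """Convolutional Q-network for image states (e.g., stacked 84x84 frames)."""
    def __init__(self, n_actions: int, in_channels: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),            # one Q-value per action
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.conv(x / 255.0))      # normalize pixel values

# Hypothetical usage for a 6-action Atari-like game
q = ConvQNetwork(n_actions=6)
q_values = q(torch.zeros(1, 4, 84, 84))           # shape: (1, 6)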

🔁 5. DQN Training Loop

Initialize Q-network (Q) and target network (Q_target)
Initialize replay memory D

For each episode:
    Initialize state s
    For each step in episode:
        Choose action a using ε-greedy policy on Q(s)
        Execute action a → observe reward r and next state s'
        Store (s, a, r, s') in replay memory D

        Sample random mini-batch from D:
            For each (s, a, r, s'):
                target = r + γ * max_a' Q_target(s', a')
                loss = (Q(s, a) - target)^2
                Backpropagate and update Q

        Every N steps:
            Q_target ← Q   # update target network
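
The ε-greedy action choice in the loop above can be sketched like this in Python, assuming PyTorch and a Q-network like the earlier sketches (function name and arguments are illustrative):

```python
import random
import torch

def epsilon_greedy_action(q_net, state, epsilon: float, n_actions: int) -> int:
    """With probability epsilon explore randomly, otherwise exploit the best Q-value."""
    if random.random() < epsilon:
        return random.randrange(n_actions)                  # explore
    with torch.no_grad():
        state_t = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
        q_values = q_net(state_t)
        return int(q_values.argmax(dim=1).item())           # exploit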

🧮 6. Loss Function

L(θ) = E[(r + γ · max_a′ Qθ⁻(s′, a′) − Qθ(s, a))²]

  • Qθ: main (online) network
  • Qθ⁻: target network (frozen for N steps)
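
A sketch of this loss for one sampled mini-batch in PyTorch, assuming `states`, `rewards`, `next_states`, `dones` are float tensors and `actions` is a long tensor taken from the replay buffer (these names are illustrative):

```python
import torch
import torch.nn.functional as F

GAMMA = 0.99

def dqn_loss(q_net, q_target, states, actions, rewards, next_states, dones):
    """Squared Bellman error between Q(s, a) and the frozen-target estimate."""
    # Q(s, a) for the actions actually taken
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # TD target: r + γ * max_a' Q_target(s', a'), with no bootstrap at terminal states
    with torch.no_grad():
        max_next_q = q_target(next_states).max(dim=1).values
        target = rewards + GAMMA * (1.0 - dones) * max_next_q

    return F.mse_loss(q_sa, target)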

🕹️ 7. Applications of DQN

Domain          | Example
🎮 Games        | Atari (Breakout, Space Invaders)
🤖 Robotics     | Navigation and obstacle avoidance
📈 Finance      | Trading bots
🧭 Navigation   | Path-finding agents
🧪 Simulations  | Training virtual agents (OpenAI Gym)

🧠 DQN Variants

Variant          | Idea
🧮 Double DQN    | Reduces overestimation of Q-values (see the sketch below)
🔄 Dueling DQN   | Splits value and advantage streams
📦 PER           | Prioritized Experience Replay
🌐 Rainbow DQN   | Combines all of these tricks for best results
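
For example, the Double DQN bootstrap target differs from vanilla DQN only in how the maximizing action is chosen; a sketch reusing the networks from the earlier examples (function name and the `double` flag are illustrative):

```python
import torch

def td_target(q_net, q_target, rewards, next_states, dones, gamma=0.99, double=True):
    """Vanilla vs. Double DQN bootstrap target."""
    with torch.no_grad():
        if double:
            # Double DQN: online net picks the action, target net evaluates it
            best_a = q_net(next_states).argmax(dim=1, keepdim=True)
            next_q = q_target(next_states).gather(1, best_a).squeeze(1)
        else:
            # Vanilla DQN: target net both picks and evaluates → overestimation bias
            next_q = q_target(next_states).max(dim=1).values
        return rewards + gamma * (1.0 - dones) * next_q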

📝 Practice Questions

  1. Why is a Q-table not built in DQN?
  2. What is the benefit of experience replay?
  3. Why is a target network necessary?
  4. What is the main difference between DQN and Q-Learning?
  5. What is the ε-greedy policy and why is it used?

🧠 Summary Table

Concept           | Description
DQN               | Deep Learning + Q-Learning
Q-Network         | Approximates Q(s, a)
Experience Replay | Past experiences stored and reused
Target Network    | Frozen Q-network for stability
Loss              | Squared Bellman error