Basics of Reinforcement Learning

We now turn to another major branch of machine learning that is often combined with Deep Learning:
🎮 Reinforcement Learning (RL)
Here an agent learns on its own from an environment, through trial and error.

🔶 1. What is Reinforcement Learning?

Reinforcement Learning (RL) is a learning paradigm in which an agent takes actions in an environment and learns, from the rewards it receives, how to make better decisions.

🎯 “RL is learning by interacting with the environment.”

🧠 Real-World Analogy:

Scenario	RL Mapping
A child learning to ride a bicycle	Agent learns by falling & balancing
Raising your score while playing a game	Agent earns rewards for the right actions
Trying a new dish at a restaurant	Exploration of unknown choices

🧩 2. Key Components of RL

Component	Description
🧠 Agent	The decision maker (the AI system)
🌍 Environment	The world in which the agent operates
🎯 State (S)	The current situation (e.g., board configuration)
🎮 Action (A)	The move the agent makes
💰 Reward (R)	The feedback received in return for an action
🔄 Policy (π)	The strategy for choosing actions
🔮 Value Function (V)	The expected future reward starting from a given state
🧮 Q-Value (Q)	The expected future reward for taking a given action in a given state
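
To make the Value Function and Q-Value rows more concrete, here is a minimal sketch of how "future reward" is usually summed up. It assumes a discount factor `gamma` (not introduced in this section) that makes later rewards count less. The function names, the reward list, and the Q-values are hypothetical and chosen only for illustration.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each discounted by gamma per time step."""
    total = 0.0
    # Work backwards so each step computes G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def state_value_from_q(q_values, policy_probs):
    """V(s) = sum over actions of pi(a|s) * Q(s, a)."""
    return sum(policy_probs[a] * q_values[a] for a in q_values)


# Hypothetical episode: no reward for two steps, then a reward of 10.
episode_rewards = [0.0, 0.0, 10.0]
g = discounted_return(episode_rewards, gamma=0.9)

# Hypothetical Q-values for one state, and a policy that prefers "right".
q = {"left": 1.0, "right": 5.0}
pi = {"left": 0.25, "right": 0.75}
v = state_value_from_q(q, pi)
```

The second function shows how the two quantities relate: a state's value is the policy-weighted average of the Q-values of the actions available in that state.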

🔁 3. The RL Interaction Cycle (Markov Decision Process, MDP)

     ┌──────────────┐
     │    Agent     │
     └──────────────┘
        ▲         │
        │         │ action a(t)
 state s(t+1),    │
 reward r(t)      ▼
     ┌──────────────┐
     │ Environment  │
     └──────────────┘

🔁 Cycle Explained:

Agent observes state S_t
Chooses an action A_t using policy π
Environment responds with next state S_{t+1} and reward R_t
Agent uses this feedback to improve policy π
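
The cycle above can be sketched as a short loop. This is a minimal illustration, not a real library API: the `Corridor` environment, its `reset`/`step` methods, the reward values, and the fixed `always_right` policy are all hypothetical. The policy here does not learn; the comment marks where a learning agent would update it.

```python
class Corridor:
    """Hypothetical 1-D environment: cells 0..4, goal at cell 4."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); walls clamp the position.
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.1  # small cost per step, bonus at goal
        return self.state, reward, done


def always_right(state):
    """A fixed placeholder policy; a learning agent would adapt this."""
    return +1


env = Corridor()
state = env.reset()                               # observe the initial state
trajectory = []
done = False
while not done:
    action = always_right(state)                  # choose A_t using the policy
    next_state, reward, done = env.step(action)   # env returns S_{t+1} and R_t
    trajectory.append((state, action, reward, next_state))
    # A learning agent would use (state, action, reward, next_state)
    # here to improve its policy.
    state = next_state                            # the new state becomes S_t

total_reward = sum(r for _, _, r, _ in trajectory)
```

Each tuple stored in `trajectory` is one pass through the cycle: the observed state, the chosen action, the reward, and the next state.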

🎮 4. Types of Reinforcement Learning

Type	Description
✅ Positive RL	Reinforcing a behavior by rewarding it
❌ Negative RL	Avoiding wrong behavior through punishment (negative rewards)
🔄 Model-Free RL	Learning from direct interaction (e.g., Q-Learning)
🧠 Model-Based RL	Building an internal model of the environment
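
As an illustration of the model-free row, here is a minimal sketch of tabular Q-Learning. The agent never builds a model of the environment; it only updates a table of Q-values from the transitions it experiences. Everything specific here is an assumption made for the example: the 5-cell corridor, the reward values, the `step` and `train` function names, and the hyperparameters `alpha` (learning rate), `gamma` (discount factor), and `epsilon` (exploration rate).

```python
import random

ACTIONS = (-1, +1)      # left, right
N_STATES = 5            # cells 0..4, goal at cell 4
GOAL = N_STATES - 1


def step(state, action):
    """Hypothetical corridor dynamics: move, clamp at walls, reward at goal."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    reward = 1.0 if done else -0.1
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit, occasionally explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best next value.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            td_target = reward + gamma * best_next
            q[(state, action)] += alpha * (td_target - q[(state, action)])
            state = next_state
    return q


q_table = train()
# Greedy policy read off the learned table for every non-goal cell.
greedy_policy = {
    s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(GOAL)
}
```

The positive reward at the goal and the small negative reward per step play the roles of the positive and negative rows in the table: the first reinforces reaching the goal, the second discourages wandering.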

📈 5. Applications of RL

Domain	Example
🕹️ Games	AlphaGo, Chess, Atari
🚗 Robotics	Arm control, walking agents
📈 Finance	Portfolio optimization
🌐 Recommendation	Ad placement, content ranking
🤖 NLP	Chatbot behavior tuning
🧬 Healthcare	Treatment policies, dosage optimization

📝 Practice Questions

What is the difference between Reinforcement Learning and Supervised Learning?
What is a Markov Decision Process (MDP)?
What is the difference between a policy and a value function?
Why is the reward important in RL?
What does a Q-value represent?

🧠 Summary

Concept	Explanation
RL	Learning by interacting
Agent	Learner / decision maker
Environment	Where actions happen
Reward	Feedback for actions
Policy	Strategy to act
Q-Value	Expected future reward for taking an action in a state