import torch
import torch.nn as nn
import torch.optim as optim

X = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)
Y = torch.tensor([[0], [1], [1], [0]], dtype=torch.float32)

class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.hidden = nn.Linear(2, 2)   # 2 neurons in hidden layer
        self.output = nn.Linear(2, 1)   # 1 neuron in output layer
        self.sigmoid = nn.Sigmoid()     # Activation function

    def forward(self, x):
        x = self.sigmoid(self.hidden(x))  # Hidden layer
        x = self.sigmoid(self.output(x))  # Output layer
        return x

model = MLP()
criterion = nn.MSELoss()                           # Mean Squared Error Loss
optimizer = optim.SGD(model.parameters(), lr=0.1)  # Stochastic Gradient Descent

epochs = 1000
for epoch in range(epochs):
    optimizer.zero_grad()        # Clear gradients
    output = model(X)            # Forward pass
    loss = criterion(output, Y)  # Compute loss
    loss.backward()              # Backpropagation
    optimizer.step()             # Update weights
    if (epoch + 1) % 100 == 0:
        print(f"Epoch {epoch+1}/{epochs}, Loss: {loss.item():.4f}")

print("\nFinal Predictions After Training:")
print(model(X).detach().numpy())  # Convert to NumPy for readability
This is a simple Multi-Layer Perceptron (MLP) in PyTorch that learns the XOR function. Let's break it down line by line and understand the purpose of each keyword and expression, with a beginner-friendly explanation.
import torch
import torch.nn as nn
import torch.optim as optim

import torch: Loads the PyTorch library, which allows you to work with tensors (like arrays) and build models.
import torch.nn as nn: Imports the neural network module, which includes layers, activation functions, etc.
import torch.optim as optim: Imports optimization algorithms (e.g., SGD, Adam) to update model weights during training.
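One practical note: the weights of nn.Linear layers are initialized randomly, so two runs of this script can converge differently (and with only 2 hidden units, an unlucky start may train slowly). A minimal optional sketch for making runs repeatable; the seed value 0 here is an arbitrary choice, not part of the original script:

import torch

# Fix the random seed so weight initialization (and therefore training)
# is reproducible from run to run.
torch.manual_seed(0)

# Any model created after this point starts from the same random weights.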
X = torch.tensor([[0,0],[0,1],[1,0],[1,1]], dtype=torch.float32)
Y = torch.tensor([[0],[1],[1],[0]], dtype=torch.float32)

torch.tensor(...): Converts lists into PyTorch tensors (a short sketch below inspects their dtype and shape).
dtype=torch.float32: Ensures inputs are in float format (required for neural networks).
X is the input (the 2-bit values for XOR), and Y is the target (the output of XOR).
So:
0 XOR 0 = 0
0 XOR 1 = 1
1 XOR 0 = 1
1 XOR 1 = 0
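To make the tensor part concrete, here is a small standalone sketch that builds the same X and Y and inspects their dtype and shape; it assumes the same 4x2 input / 4x1 target layout used above:

import torch

X = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)
Y = torch.tensor([[0], [1], [1], [0]], dtype=torch.float32)

print(X.dtype, X.shape)  # torch.float32 torch.Size([4, 2]) -> 4 samples, 2 features each
print(Y.dtype, Y.shape)  # torch.float32 torch.Size([4, 1]) -> 4 samples, 1 target each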
class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.hidden = nn.Linear(2, 2)
        self.output = nn.Linear(2, 1)
        self.sigmoid = nn.Sigmoid()
class MLP(nn.Module): Defines a custom neural network called MLP, subclassing nn.Module (the base class for all models).
def __init__(self): Constructor; defines the layers and activations.
super(MLP, self).__init__(): Initializes the parent class (nn.Module) so its features can be used.
self.hidden = nn.Linear(2, 2): First layer (input to hidden); 2 inputs → 2 hidden units.
self.output = nn.Linear(2, 1): Second layer (hidden to output); 2 inputs → 1 output.
self.sigmoid = nn.Sigmoid(): Sigmoid activation to introduce non-linearity, necessary for learning XOR (see the sketch after this list).
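As a sketch of what those pieces actually compute (illustrative code, separate from the script above): nn.Linear(2, 2) stores a 2x2 weight matrix plus a bias of length 2 and applies matrix multiplication + bias, while nn.Sigmoid() squashes each value into (0, 1):

import torch
import torch.nn as nn

hidden = nn.Linear(2, 2)
sigmoid = nn.Sigmoid()

x = torch.tensor([[1.0, 0.0]])   # one XOR input
z = hidden(x)                    # same as x @ hidden.weight.T + hidden.bias
a = sigmoid(z)                   # each entry squashed into (0, 1)

print(hidden.weight.shape, hidden.bias.shape)                # torch.Size([2, 2]) torch.Size([2])
print(torch.allclose(z, x @ hidden.weight.T + hidden.bias))  # True
print(a)                                                     # two values between 0 and 1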
def forward(self, x):
    x = self.sigmoid(self.hidden(x))
    x = self.sigmoid(self.output(x))
    return x
forward(): This is called automatically when you do model(X). It defines the flow of data through the network (a short sketch below shows that call chain and the intermediate hidden activations).
self.hidden(x): Applies the hidden layer (matrix multiplication + bias).
self.sigmoid(...): Applies the sigmoid to each layer's output.
return x: Gives the final output.
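A small sketch of that call chain, reusing the MLP class and X defined above (illustrative only): calling model(X) goes through nn.Module's __call__, which runs forward(); you can also run the sub-layers yourself to peek at the intermediate hidden activations:

model = MLP()

out = model(X)                               # same as calling forward(X) via nn.Module.__call__
hidden_act = model.sigmoid(model.hidden(X))  # intermediate hidden-layer activations

print(out.shape)         # torch.Size([4, 1]) -> one output per sample
print(hidden_act.shape)  # torch.Size([4, 2]) -> two hidden activations per sample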
model = MLP()
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
model = MLP(): Instantiates the model.
criterion = nn.MSELoss(): Mean Squared Error loss function (commonly used in regression and simple binary tasks); a sketch after this list computes the same value by hand.
optimizer = optim.SGD(model.parameters(), lr=0.1):
SGD is Stochastic Gradient Descent (used to update weights).
model.parameters(): Gets the weights and biases of the model.
lr=0.1: The learning rate controls how fast the model learns.
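To see what MSELoss is doing, here is a standalone sketch (with made-up prediction values) that computes the same mean squared error by hand:

import torch
import torch.nn as nn

criterion = nn.MSELoss()

pred   = torch.tensor([[0.2], [0.9], [0.8], [0.1]])  # hypothetical model outputs
target = torch.tensor([[0.0], [1.0], [1.0], [0.0]])  # XOR targets

loss_builtin = criterion(pred, target)
loss_manual  = ((pred - target) ** 2).mean()  # MSE = mean of the squared differences

print(torch.allclose(loss_builtin, loss_manual))  # True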
epochs = 1000
for epoch in range(epochs):
    optimizer.zero_grad()
    output = model(X)
    loss = criterion(output, Y)
    loss.backward()
    optimizer.step()
Explanation:
epochs = 1000: Number of training iterations.
for epoch in range(epochs): Loop over the training process.
optimizer.zero_grad(): Clears gradients from the previous step (important! see the sketch after this list).
output = model(X): Runs a forward pass through the model.
loss = criterion(output, Y): Calculates the loss between the predicted output and the true output.
loss.backward(): Backpropagates the loss (computes gradients).
optimizer.step(): Updates the model parameters using the gradients.
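The zero_grad() call matters because PyTorch accumulates gradients in .grad by default. A tiny standalone sketch (a made-up one-parameter example, not part of the script) showing that accumulation:

import torch

w = torch.tensor(2.0, requires_grad=True)

# First backward pass: d(w*w)/dw = 2w = 4
(w * w).backward()
print(w.grad)  # tensor(4.)

# Second backward pass WITHOUT clearing: gradients add up (4 + 4 = 8)
(w * w).backward()
print(w.grad)  # tensor(8.)

# Clearing first gives the fresh gradient again
w.grad.zero_()
(w * w).backward()
print(w.grad)  # tensor(4.)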
    if (epoch + 1) % 100 == 0:
        print(f"Epoch {epoch+1}/{epochs}, Loss: {loss.item():.4f}")
Every 100 epochs, this prints the loss.
loss.item() gets the plain Python number out of the tensor for display.
:.4f formats it to 4 decimal places.
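A quick sketch of that conversion and formatting (the loss value here is made up):

import torch

loss = torch.tensor(0.2471856)

print(loss.item())           # a plain Python float, no longer a tensor
print(f"{loss.item():.4f}")  # 0.2472 -> rounded to 4 decimal places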
print("\nFinal Predictions After Training:")
print(model(X).detach().numpy())
model(X): Feeds the input to the trained model.
.detach(): Detaches the output from the computation graph; we use this because we don't need gradients during inference.
.numpy(): Converts the tensor into a NumPy array for easy reading/printing.
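An alternative sketch for the same inference step (not from the original script), reusing the trained model and X from above: wrapping the call in torch.no_grad() also avoids building a computation graph, and rounding the sigmoid outputs gives hard 0/1 predictions:

with torch.no_grad():      # no gradients are tracked inside this block
    probs = model(X)       # sigmoid outputs in (0, 1)
    preds = probs.round()  # threshold at 0.5 -> hard 0/1 labels

print(probs.numpy())  # values near 0 or 1 after successful training
print(preds.numpy())  # ideally [[0.], [1.], [1.], [0.]]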
✅ WHY USE THESE KEYWORDS?

| Keyword | Why It's Used |
|---|---|
| super() | Initializes the base class (nn.Module) |
| Sigmoid() | Provides the non-linearity required to solve XOR |
| zero_grad() | Clears old gradients to avoid accumulation |
| backward() | Computes gradients using backpropagation |
| step() | Updates weights using the gradients |
| detach() | Stops PyTorch from tracking computations |
| numpy() | Converts a tensor to a NumPy array for viewing |