A simple Multi-Layer Perceptron (MLP) in PyTorch to learn the XOR function

import torch
import torch.nn as nn
import torch.optim as optim

X = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)
Y = torch.tensor([[0], [1], [1], [0]], dtype=torch.float32)

class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.hidden = nn.Linear(2, 2)   # 2 neurons in hidden layer
        self.output = nn.Linear(2, 1)   # 1 neuron in output layer
        self.sigmoid = nn.Sigmoid()     # Activation function

    def forward(self, x):
        x = self.sigmoid(self.hidden(x))  # Hidden layer
        x = self.sigmoid(self.output(x))  # Output layer
        return x

model = MLP()
criterion = nn.MSELoss() # Mean Squared Error Loss
optimizer = optim.SGD(model.parameters(), lr=0.1) # Stochastic Gradient Descent

epochs = 1000
for epoch in range(epochs):
    optimizer.zero_grad()          # Clear gradients
    output = model(X)              # Forward pass
    loss = criterion(output, Y)    # Compute loss
    loss.backward()                # Backpropagation
    optimizer.step()               # Update weights

    if (epoch + 1) % 100 == 0:
        print(f"Epoch {epoch+1}/{epochs}, Loss: {loss.item():.4f}")

print("\nFinal Predictions After Training:")
print(model(X).detach().numpy()) # Convert to NumPy for readability

This is a simple Multi-Layer Perceptron (MLP) in PyTorch that learns the XOR function. Let's break it down line by line and look at the purpose of each keyword and expression, with a beginner-friendly explanation.

import torch
import torch.nn as nn
import torch.optim as optim

import torch: Loads the PyTorch library, which lets you work with tensors (array-like objects) and build models.

import torch.nn as nn: Imports the neural network module, which includes layers, activation functions, etc.

import torch.optim as optim: Imports optimization algorithms (e.g., SGD, Adam) used to update model weights during training.
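If you want to confirm the installation before going further, a quick check like this prints the installed PyTorch version and a small random tensor (the exact output will differ on your machine):

print(torch.__version__)    # e.g. 2.1.0, whatever version is installed locally
print(torch.rand(2, 3))     # a random 2x3 tensor, confirming torch is working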

X = torch.tensor([[0,0],[0,1],[1,0],[1,1]], dtype=torch.float32)
Y = torch.tensor([[0],[1],[1],[0]], dtype=torch.float32)

torch.tensor(...): Converts lists into PyTorch tensors.

dtype=torch.float32: Ensures inputs are in float format (required for neural networks).

X is the input (2-bit values for XOR), and Y is the target (output of XOR).
So:

0 XOR 0 = 0
0 XOR 1 = 1
1 XOR 0 = 1
1 XOR 1 = 0
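As a quick sanity check on these tensors (not required by the original code, it just makes the shapes explicit):

print(X.shape, X.dtype)    # torch.Size([4, 2]) torch.float32, four input pairs
print(Y.shape, Y.dtype)    # torch.Size([4, 1]) torch.float32, one target per pair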

class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.hidden = nn.Linear(2, 2)
        self.output = nn.Linear(2, 1)
        self.sigmoid = nn.Sigmoid()

class MLP(nn.Module): Defines a custom neural network called MLP, subclassing nn.Module (base class for all models).

def __init__(self): Constructor; defines the layers and activations.

super(MLP, self).__init__(): Initializes the parent class (nn.Module) to use its features.
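As a side note, in Python 3 the shorter form super().__init__() does the same thing and is what you will often see in newer code:

class MLP(nn.Module):
    def __init__(self):
        super().__init__()   # equivalent to super(MLP, self).__init__()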

self.hidden = nn.Linear(2, 2): First layer (input to hidden); 2 inputs → 2 hidden units.

self.output = nn.Linear(2, 1): Second layer (hidden to output); 2 inputs → 1 output.

self.sigmoid = nn.Sigmoid(): Sigmoid activation to introduce non-linearity, necessary for learning XOR.
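To make these pieces concrete, here is a small standalone sketch (separate from the MLP class) that inspects what nn.Linear(2, 2) creates and what nn.Sigmoid() does to a few values:

layer = nn.Linear(2, 2)
print(layer.weight.shape)   # torch.Size([2, 2]), i.e. (out_features, in_features)
print(layer.bias.shape)     # torch.Size([2]), one bias per output neuron

act = nn.Sigmoid()
print(act(torch.tensor([-2.0, 0.0, 2.0])))   # roughly tensor([0.1192, 0.5000, 0.8808])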

def forward(self, x):
    x = self.sigmoid(self.hidden(x))
    x = self.sigmoid(self.output(x))
    return x

forward(): This is automatically called when you do model(X). It defines the flow of data in the network.

self.hidden(x): Applies the hidden layer (matrix multiplication + bias).

self.sigmoid(...): Applies sigmoid to each layer’s output.

return x: Gives the final output.
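A tiny illustration, assuming the model has already been created as in the next snippet: calling the model object is the recommended way to run the forward pass.

out = model(X)      # nn.Module.__call__ runs forward(X) plus any registered hooks
print(out.shape)    # torch.Size([4, 1]): one prediction per input row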

model = MLP()
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

model = MLP(): Instantiates the model.

criterion = nn.MSELoss(): Mean Squared Error Loss function (commonly used in regression and simple binary tasks).
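A minimal sketch of what MSELoss computes, using made-up values: it is just the mean of the squared differences between prediction and target.

pred = torch.tensor([[0.2], [0.9]])
target = torch.tensor([[0.0], [1.0]])
print(nn.MSELoss()(pred, target))       # tensor(0.0250)
print(((pred - target) ** 2).mean())    # same value, computed by hand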

optimizer = optim.SGD(model.parameters(), lr=0.1):

  • SGD is Stochastic Gradient Descent (used to update weights).
  • model.parameters(): Gets the weights and biases of the model.
  • lr=0.1: Learning rate controls how fast the model learns.
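For reference, this is what model.parameters() hands to the optimizer in this model: every weight matrix and bias vector. A short sketch using named_parameters() so the shapes are labelled:

for name, p in model.named_parameters():
    print(name, tuple(p.shape))
# hidden.weight (2, 2)
# hidden.bias (2,)
# output.weight (1, 2)
# output.bias (1,)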

epochs = 1000
for epoch in range(epochs):
    optimizer.zero_grad()
    output = model(X)
    loss = criterion(output, Y)
    loss.backward()
    optimizer.step()

Explanation:

  • epochs = 1000: Number of training iterations.
  • for epoch in range(epochs): Loop over the training process.
  • optimizer.zero_grad(): Clears gradients from the previous step (important!).
  • output = model(X): Runs a forward pass through the model.
  • loss = criterion(output, Y): Calculates loss between predicted output and true output.
  • loss.backward(): Backpropagates the loss (computes gradients).
  • optimizer.step(): Updates the model parameters using gradients.
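The reason zero_grad() matters is that PyTorch accumulates gradients across backward() calls by default. A tiny standalone sketch (unrelated to the XOR model) makes this visible:

w = torch.tensor([1.0], requires_grad=True)
(w * 2).backward()
print(w.grad)    # tensor([2.])
(w * 2).backward()
print(w.grad)    # tensor([4.]), accumulated, which is why zero_grad() is called every step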

    if (epoch + 1) % 100 == 0:
        print(f"Epoch {epoch+1}/{epochs}, Loss: {loss.item():.4f}")

Every 100 epochs, prints the loss.

loss.item() gets the Python number from the tensor for display.

:.4f formats it to 4 decimal places.
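A quick illustration of the formatting (the exact digits depend on float32 rounding):

loss_value = torch.tensor(0.123456)
print(loss_value.item())             # a plain Python float, roughly 0.123456
print(f"{loss_value.item():.4f}")    # 0.1235, rounded to 4 decimal places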

print("\nFinal Predictions After Training:")
print(model(X).detach().numpy())

model(X): Feeds the input to the trained model.

.detach(): Detaches the output from the computation graph. We use this because we don’t need gradients during inference.

.numpy(): Converts the tensor into a NumPy array for easy reading/printing.
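An optional follow-up that is not in the original snippet: because the sigmoid outputs lie between 0 and 1, you can threshold them at 0.5 to get hard 0/1 predictions. Wrapping inference in torch.no_grad() is another common way to avoid tracking gradients:

with torch.no_grad():                 # no gradient tracking needed at inference time
    preds = (model(X) > 0.5).int()
print(preds)    # ideally tensor([[0], [1], [1], [0]], dtype=torch.int32) once training has converged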

✅ WHY USE THESE KEYWORDS?

Keyword       Why It's Used
super()       Initializes the base class (nn.Module)
Sigmoid()     Required for non-linearity to solve XOR
zero_grad()   Clears old gradients to avoid accumulation
backward()    Computes gradients using backpropagation
step()        Updates weights using gradients
detach()      Stops PyTorch from tracking computations
numpy()       Converts tensor to NumPy array for viewing