Coding a Recurrent Neural Network (RNN) from scratch using PyTorch
ML · 10 min read · Jan 23, 2025


Build a Recurrent Neural Network (RNN) from scratch with PyTorch. Our guide makes RNN coding easy for all skill levels. Start deep learning now!
SolarDevs Team
Technical Leadership

RNN vs Feedforward Architecture

Personally, I find it easier to understand RNNs by comparing them to feedforward networks: feedforward networks are a known concept, so I can attach the new ideas to previous knowledge. For that reason I will be comparing the two often.

Unlike feedforward networks, an RNN's machinery is a bit more complex. Inside a single Recurrent Neural Network layer we have 3 weight matrices, as well as 2 input tensors and 2 output tensors.

People often say "RNNs are just feedforward networks with an internal state", but even a simple diagram of the layer shows it is not quite that simple. The components of a Recurrent Net are more involved, but don't worry: I will explain how this works, and hopefully seeing the code will make it click.

Recurrent Neural Network (RNN) Layer Architecture

Recurrent Nets introduce a new concept called the "hidden state", which is simply another input built from the outputs of previous time steps. But wait, if it is based on previous outputs, how do I get it for the first step? Simple: just start it filled with zeros.

RNNs are fed in a different way than feedforward networks. Because we are working with sequences, the order in which we input the data matters, which is why each time we feed the net we input a single item of the sequence. For example, if it's a stock price, we input the price for one day at a time. If it's text, we enter a single letter/word each time.

We enter one step at a time because we need to compute the hidden state on each iteration. This hidden state holds information from previous steps, so when we input the next item of the sequence, its projection is summed with the projection of the hidden state, and the result carries data from the previous runs forward.

Inputs

  • Input tensor: a single step of the sequence. If your total sequence is, for example, 100 characters of text, then the input is a single character.
  • Hidden state tensor: the hidden state itself. Remember that for the first step of each sequence this tensor is filled with zeros. Following the example above, if you have 10 sequences of 100 characters each (a text of 1,000 characters in total), then for each sequence you start from a fresh hidden state filled with zeros.

Weight Matrices

  • Input Dense: Dense matrix used to compute inputs (just like feedforward).
  • Hidden Dense: Dense matrix used to compute hidden state input.
  • Output Dense: Dense matrix used to compute the output from the result of activation(input_dense + hidden_dense).

Outputs

  • New hidden state: the new hidden state tensor, activation(input_dense + hidden_dense). You will use this as input on the next step of the sequence.
  • Output: output_dense applied to the new hidden state. This is your prediction vector, just like a feedforward network's output. (In the code below no activation is applied here, because PyTorch loss functions expect raw scores.)
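
Putting those pieces together, here is a minimal sketch of the recurrence written with plain tensors. The weight names (W_xh, W_hh, W_hy) and the sizes are illustrative assumptions; the PyTorch layer in the next section wraps the same idea in nn.Linear modules.

import torch

# Illustrative sizes and random weights (assumptions, not values from the layer below).
input_size, hidden_size, output_size, seq_len = 4, 8, 4, 5
W_xh = torch.randn(hidden_size, input_size)   # Input Dense
W_hh = torch.randn(hidden_size, hidden_size)  # Hidden Dense
W_hy = torch.randn(output_size, hidden_size)  # Output Dense
sequence = torch.randn(seq_len, input_size)   # one row per step of the sequence

h = torch.zeros(hidden_size)                  # hidden state starts filled with zeros
for x_t in sequence:                          # feed a single step at a time
    h = torch.tanh(W_xh @ x_t + W_hh @ h)     # new hidden state: activation(input_dense + hidden_dense)
    y_t = W_hy @ h                            # prediction for this step (output_dense)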

Recurrent Neural Network (RNN) Layer Code

import torch
import torch.nn as nn

class RNN(nn.Module):
    """ Basic RNN block. This represents a single layer of RNN """
    def __init__(self, input_size: int, hidden_size: int, output_size: int) -> None:
        """
        input_size: Number of features of your input vector
        hidden_size: Number of hidden neurons
        output_size: Number of features of your output vector
        """
        super().__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        
        self.i2h = nn.Linear(input_size, hidden_size, bias=False)
        self.h2h = nn.Linear(hidden_size, hidden_size)
        self.h2o = nn.Linear(hidden_size, output_size)

    def forward(self, x: torch.Tensor, hidden_state: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        """
        Returns computed output and tanh(i2h + h2h)
        Inputs
        ------
        x: Input vector
        hidden_state: Previous hidden state
        Outputs
        -------
        out: Linear output (without activation because of how pytorch works)
        hidden_state: New hidden state matrix
        """
        x = self.i2h(x)                               # project the input into hidden space
        hidden_state = self.h2h(hidden_state)         # project the previous hidden state
        hidden_state = torch.tanh(x + hidden_state)   # combine both and squash: the new hidden state
        out = self.h2o(hidden_state)                  # raw output scores for this step (no activation)
        return out, hidden_state

    def init_zero_hidden(self, batch_size=1) -> torch.Tensor:
        """
        Helper function. Returns a hidden state with specified batch size. Defaults to 1
        """
        return torch.zeros(batch_size, self.hidden_size, requires_grad=False)
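
As a quick usage sketch of the layer above (the sizes are assumptions for illustration; they match the character-level setup used for training below, where each input step is a single feature):

rnn = RNN(input_size=1, hidden_size=64, output_size=128)   # e.g. 128 possible output characters
hidden = rnn.init_zero_hidden(batch_size=1)                # fresh hidden state for a new sequence
x = torch.tensor([[42.0]])                                 # a single step, shape (batch_size, input_size)
out, hidden = rnn(x, hidden)                               # out: (1, 128) raw scores, hidden: (1, 64)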

Training with batches

Feeding a neural network in batches almost always computes much faster (easily 10x), and Recurrent Neural Networks are no exception. Training with batches will not improve the model's quality though: if your network doesn't learn from a single training example at a time, it won't learn from 10 or 100 either.

The Recurrent Neural Network I show as an example is trained on text, one character at a time, so the training function feeds one character of each sequence per step. Doing this with batches saves a ton of time: instead of a single sequence, I can feed any number of sequences in parallel on every epoch.

import torch.optim as optim
from torch.utils.data import DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def train(model: RNN, data: DataLoader, epochs: int, optimizer: optim.Optimizer, loss_fn: nn.Module) -> None:
    """ Trains the model for the specified number of epochs """
    train_losses = {}
    model.to(device)
    model.train()
    print("=> Starting training")
    for epoch in range(epochs):
        epoch_losses = list()
        for X, Y in data:
            # Start each batch of sequences from a zeroed hidden state sized to the actual batch
            hidden = model.init_zero_hidden(batch_size=X.shape[0])
            X, Y, hidden = X.to(device), Y.to(device), hidden.to(device)
            
            model.zero_grad()
            loss = 0
            for c in range(X.shape[1]):  # feed one character of each sequence at a time
                out, hidden = model(X[:, c].reshape(X.shape[0],1), hidden)
                l = loss_fn(out, Y[:, c].long())
                loss += l
            
            loss.backward()
            nn.utils.clip_grad_norm_(model.parameters(), 3)
            optimizer.step()
            epoch_losses.append(loss.detach().item() / X.shape[1])
        
        train_losses[epoch] = torch.tensor(epoch_losses).mean()
        print(f'=> epoch: {epoch + 1}, loss: {train_losses[epoch]}')
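
A hypothetical way to wire everything together: the fake dataset, sizes, and hyperparameters below are illustrative assumptions, the only requirement being that the DataLoader yields (X, Y) pairs of shape (batch, sequence_length), with Y holding the target character for each step.

from torch.utils.data import TensorDataset

vocab_size, seq_len = 128, 100
# Fake character data: X holds input character codes, Y the target character at each step.
X = torch.randint(0, vocab_size, (1000, seq_len)).float()
Y = torch.randint(0, vocab_size, (1000, seq_len))
loader = DataLoader(TensorDataset(X, Y), batch_size=64, shuffle=True)

model = RNN(input_size=1, hidden_size=256, output_size=vocab_size)
optimizer = optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.CrossEntropyLoss()
train(model, loader, epochs=10, optimizer=optimizer, loss_fn=loss_fn)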

Manually coding this really helps to understand the underlying operations and workflow, and it is also very satisfying to watch the Recurrent Neural Network learn from the text and generate cool text of its own.
