How to create non-square images with DALL-E 2

The DALL-E API from OpenAI is a very powerful image generation tool. However one limitation is that it only outputs square images. In this tutorial we will learn how to generate images of arbitrary dimension such as portraits and landscapes.

6 min read

How to generate subtitles with the Whisper API in Python

In this tutorial we will learn how to use the Whisper API to generate subtitles for a video. We will generate subtitles for the opening of A Star Is Born (1937), an early colour movie that has been frequently remade.

3 min read

Transformer Part 2: Train a Translation Model

In Part 1 of the tutorial we learned how to how build a transformer model. Now it is time to put it into action. There are lots of things we can do with a transformer but we will start off with a machine translation task to translate from Spanish to English.

19 min read

Transformer Part 1: Build

Since its introduction the Transformer architecture has become very influential and has had successes in many tasks not just in NLP but in other areas like vision. This tutorial is inspired by the approach in The Annotated Transformer and uses text directly quoted from the paper to explain the code.

19 min read

PixelCNN

The PixelRNN family of auto-regressive architectures consists of models that learn to generate images a pixel at a time. In this post we will focus on the PixelCNN model which is based on the same principles but uses convolutional layers instead of recurrent layers. It is simpler to implement and once we have grasped the core concepts, it is straightforward to understand the recurrent versions.

17 min read

Strassen’s algorithm for matrix multiplication

This is the first of a planned series of blogs covering background topics for DeepMind’s AlphaTensor paper. In this post we will cover Strassen’s algorithm for matrix multiplication.

9 min read

Implementing the Fréchet Inception distance

A popular metric for evaluating image generation models is the Fréchet Inception Distance (FID). Like the Inception score, it is computed on the embeddings from an Inception model. But unlike the Inception score, it makes use of the true images as well as the generated ones. In the post we will learn how to implement it in PyTorch.

6 min read