How to generate subtitles with the Whisper API in Python

In this tutorial we will learn how to use the Whisper API to generate subtitles for a video. We will generate subtitles for the opening of A Star Is Born (1937), an early colour movie that has been frequently remade. This first version is in my opinion the best and very much worth watching if you have not done so before. The film is in the public domain and may be downloaded from the Internet Archive if you want to follow along.

Setup

First the imports. moviepy is a useful library for manipulating video files with Python; we also need requests and openai. You can install moviepy with pip (the other two install the same way):

pip install moviepy

Replace YOUR_API_KEY with your key.

import moviepy.editor as mp
import requests
import openai
openai.api_key = "YOUR_API_KEY" 

Set filename to the location where you have saved the full video. The other two names are the output files for the clipped video and audio.

filename = "YOUR_FILENAME" 
video_intro = 'Star-intro.mp4'
audio_intro = 'Star-intro.mp3'

Preparing the inputs

Let us load the movie and save a clip of a scene in which the heroine Esther tries to get a job as an extra.

start, end = (12*60 + 35), (12*60 + 49)

logger = None # Turn off logging for cleaner output for blog post
# Uncomment below to see progress bar when saving
# logger='bar' 

video = mp.VideoFileClip(filename)

# Clip a small section
video_clip = video.subclip(start, end)

# Save audio
video_clip.audio.write_audiofile(audio_intro, logger=logger)

# Save video
# From here: https://stackoverflow.com/questions/40445885/no-audio-when-adding-mp3-to-videofileclip-moviepy
# Doesn't appear to save the audio otherwise
video_clip.write_videofile(video_intro,
                     codec='libx264', 
                     audio_codec='aac', 
                     temp_audiofile='temp-audio.m4a', 
                     remove_temp=True,
                     logger=logger 
                     )

Generating the subtitles

First we will send the audio file, the smaller of the two clips, which skips the credits, and see what happens. For this we use the functionality of the openai library. We set only two arguments:

model, which is whisper-1
file, which is a file object opened in binary mode

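# Open the file in a context manager so the handle is closed afterwards
with open(audio_intro, 'rb') as audio_file:
    result = openai.Audio.transcribe(
        model='whisper-1',
        file=audio_file
    )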

print(result['text'])

I beg your pardon. I'd like to register for extra work. How long have you been in Hollywood? Well, it's about a month now. We haven't put anyone on our books for over two years.

It is an accurate transcription, but we would like it in the form of subtitles that can be added to a video. From the documentation we see that we can request alternative formats by setting the response_format field:

The format of the transcript output, in one of these options: json, text, srt, verbose_json, or vtt.

But at present openai.Audio.transcribe does not seem to be able to handle the output returned when using this setting and throws a JSONDecodeError:

openai.Audio.transcribe(
    model='whisper-1',
    file=open(audio_intro, 'rb'),
    response_format='vtt'
) #=> JSONDecodeError     
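
A likely explanation, and this is my assumption rather than something the documentation states, is that the srt and vtt formats are returned as plain text, which the library then tries to decode as JSON. The sketch below shows how a subtitle body fails to parse with the standard json module; sample_srt and is_json are made-up names for illustration.

```python
import json

# Hypothetical stand-in for a response body in SRT format (plain text, not JSON)
sample_srt = "1\n00:00:00,000 --> 00:00:03,000\nI beg your pardon.\n"

def is_json(body):
    """Return True if body parses as JSON, False otherwise."""
    try:
        json.loads(body)
        return True
    except json.JSONDecodeError:
        return False

print(is_json('{"text": "I beg your pardon."}'))  # a JSON body parses
print(is_json(sample_srt))                        # an SRT body does not
```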

So we will use the Python requests library instead. The parameters go in the data dict and the file in the files dict. Notice that we have added an extra parameter, language, which is the language of the input video. It is not required, and as we saw above we got a good transcription without it, but the documentation states:

Supplying the input language in ISO-639-1 format will improve accuracy and latency.

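def get_subtitles(file, subtitle_format='srt', **kwargs):
    url = 'https://api.openai.com/v1/audio/transcriptions'
    headers = {
        'Authorization': f'Bearer {openai.api_key}',
    }
    data = {
        'model': 'whisper-1',
        'response_format': subtitle_format,
        'language': 'en',
    }
    # Extra keyword arguments override or extend the default parameters
    data.update(kwargs)

    # Open the file in a context manager so the handle is closed after the upload
    with open(file, 'rb') as f:
        files = {'file': (file, f)}
        response = requests.post(url, headers=headers, data=data, files=files)

    # Fail loudly on an HTTP error instead of returning the error body as subtitles
    response.raise_for_status()
    return response.text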

subtitles = get_subtitles(audio_intro)

print(subtitles)

1
00:00:00,000 --> 00:00:03,000
I beg your pardon.

2
00:00:03,000 --> 00:00:06,000
I'd like to register for extra work.

3
00:00:06,000 --> 00:00:08,000
How long have you been in Hollywood?

4
00:00:08,000 --> 00:00:10,000
Well, it's about a month now.

5
00:00:10,000 --> 00:00:30,000
We haven't put anyone on our books for over two years.
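
If you want to work with the cues in Python rather than just print them, the SRT text is simple enough to parse by hand. The following is a minimal sketch using only the standard library; parse_srt, to_seconds and CUE_PATTERN are names I made up for illustration, and it assumes well-formed output like the above.

```python
import re

# Matches one SRT cue: index, "start --> end", then one or more text lines
CUE_PATTERN = re.compile(
    r"(\d+)\s*\n"
    r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\s*\n"
    r"(.*?)(?=\n\s*\n|\Z)",
    re.DOTALL,
)

def to_seconds(timestamp):
    """Convert an SRT timestamp such as 00:00:03,000 to seconds."""
    hours, minutes, rest = timestamp.split(":")
    seconds, millis = rest.split(",")
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000

def parse_srt(srt_text):
    """Return a list of (index, start_seconds, end_seconds, text) tuples."""
    cues = []
    for index, start, end, text in CUE_PATTERN.findall(srt_text.strip()):
        cues.append((int(index), to_seconds(start), to_seconds(end), text.strip()))
    return cues

# First two cues copied from the output above
sample = """1
00:00:00,000 --> 00:00:03,000
I beg your pardon.

2
00:00:03,000 --> 00:00:06,000
I'd like to register for extra work.
"""

for cue in parse_srt(sample):
    print(cue)
```

Parsing makes it easy to sanity-check the timings. For instance, the last cue above ends at 30 seconds although the clip is only 14 seconds long (12:35 to 12:49), so the final end time may need trimming.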

We can also send the video file directly, since MP4 is also an accepted input format. It takes a bit longer because the file is larger, and it returns the same text, although the timestamps differ by a second or so.

subtitles_from_video = get_subtitles(video_intro)

print(subtitles_from_video)

1
00:00:00,000 --> 00:00:02,000
I beg your pardon.

2
00:00:02,000 --> 00:00:05,000
I'd like to register for extra work.

3
00:00:05,000 --> 00:00:07,000
How long have you been in Hollywood?

4
00:00:07,000 --> 00:00:09,000
Well, it's about a month now.

5
00:00:09,000 --> 00:00:31,000
We haven't put anyone on our books for over two years.
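
To check that the two results agree apart from the timings, we can compare only the spoken lines. This is a rough sketch; cue_text is a made-up helper, and it would wrongly skip a subtitle line that consists only of digits.

```python
def cue_text(srt_text):
    """Return the spoken lines of an SRT string, skipping indices and timestamps."""
    lines = []
    for line in srt_text.splitlines():
        line = line.strip()
        # Skip blank separators, cue numbers and timestamp lines
        if not line or line.isdigit() or "-->" in line:
            continue
        lines.append(line)
    return lines

# First cue of each result, copied from the outputs above
from_audio = """1
00:00:00,000 --> 00:00:03,000
I beg your pardon.
"""
from_video = """1
00:00:00,000 --> 00:00:02,000
I beg your pardon.
"""

print(cue_text(from_audio) == cue_text(from_video))  # compares the text only
print(from_audio == from_video)                      # compares timestamps too
```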

Finally let us save the subtitles to an .srt file. If you save the file in the same folder as the video, with the same base name, and open the video in a player such as VLC, the subtitles will be shown automatically.

with open('Star-intro.srt', 'w') as f:
    f.write(subtitles)
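
These timestamps are relative to the start of the clip. If you would rather use the subtitles with the full film, one option is to shift every timestamp by the clip's start offset. The sketch below does this with a regular expression; shift_srt is a hypothetical helper of my own, not part of any library.

```python
import re

# Matches an SRT timestamp such as 00:00:03,000
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def shift_srt(srt_text, offset_seconds):
    """Return srt_text with every timestamp shifted by offset_seconds."""
    offset_ms = int(round(offset_seconds * 1000))

    def shift(match):
        hours, minutes, seconds, millis = (int(group) for group in match.groups())
        total_ms = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis + offset_ms
        total_ms = max(total_ms, 0)  # never produce a negative timestamp
        hours, remainder = divmod(total_ms, 3_600_000)
        minutes, remainder = divmod(remainder, 60_000)
        seconds, millis = divmod(remainder, 1000)
        return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"

    return TIMESTAMP.sub(shift, srt_text)

clip_start = 12 * 60 + 35  # same start offset used when cutting the clip
sample = "1\n00:00:00,000 --> 00:00:03,000\nI beg your pardon.\n"
print(shift_srt(sample, clip_start))
```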
