In this tutorial we will learn how to use the Whisper API to generate subtitles for a video. We will generate subtitles for the opening of A Star Is Born (1937), an early colour movie that has been frequently remade. This first version is in my opinion the best and very much worth watching if you have not done so before. The film is in the public domain and may be downloaded from the Internet Archive if you want to follow along.
Setup
Import the libraries we need. moviepy is a useful library for manipulating video files with Python. You can install it with pip:
pip install "moviepy<2.0"
Replace YOUR_API_KEY with your key.
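# This code targets the moviepy 1.x API (moviepy.editor, subclip)
# and the pre-1.0 openai Python library (openai.Audio.transcribe)
import moviepy.editor as mp
import requests
import openai
openai.api_key = "YOUR_API_KEY"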
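Rather than pasting the key into the script, you may prefer to keep it in an environment variable. A minimal sketch, assuming the key is stored in a variable named OPENAI_API_KEY (a common convention; any name works). The helper load_api_key is hypothetical, not part of the openai library:

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in env_var, failing loudly if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key


# Hypothetical usage: openai.api_key = load_api_key()
```

Failing early with a clear message is easier to debug than an authentication error returned later by the API.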
Change this to the location where you have saved the video.
filename = "YOUR_FILENAME"
video_intro = 'Star-intro.mp4'
audio_intro = 'Star-intro.mp3'
Preparing the inputs
Let us load the movie and save a short clip of the scene where the heroine, Esther, tries to get a job as an extra.
# Clip from 12:35 to 12:49 (times in seconds)
start, end = (12*60 + 35), (12*60 + 49)
logger = None  # Turn off logging for cleaner output for blog post
# Uncomment below to see progress bar when saving
# logger = 'bar'
video = mp.VideoFileClip(filename)
# Clip a small section
video_clip = video.subclip(start, end)
# Save audio
video_clip.audio.write_audiofile(audio_intro, logger=logger)
# Save video
# From here: https://stackoverflow.com/questions/40445885/no-audio-when-adding-mp3-to-videofileclip-moviepy
# Doesn't appear to save the audio otherwise
video_clip.write_videofile(
    video_intro,
    codec='libx264',
    audio_codec='aac',
    temp_audiofile='temp-audio.m4a',
    remove_temp=True,
    logger=logger,
)
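The start and end times above are written as minutes*60 + seconds. If you are reading times off a video player, a small helper that parses "mm:ss" or "hh:mm:ss" strings may be less error-prone. The to_seconds function below is a hypothetical convenience, not part of moviepy:

```python
def to_seconds(timestamp):
    """Convert 'mm:ss' or 'hh:mm:ss' (seconds may be fractional) to seconds."""
    seconds = 0.0
    for part in timestamp.split(":"):
        # Each colon-separated field is worth 60 times the field after it
        seconds = seconds * 60 + float(part)
    return seconds


# Reproduces the values used above: 755.0 and 769.0
start, end = to_seconds("12:35"), to_seconds("12:49")
```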
Generating the subtitles
First we will send the audio clip, which is the smaller file and skips the credits, and see what happens. At this point we are going to use the transcription functionality from the openai library. We only set two arguments: model, which is whisper-1, and file, which is a file object opened in binary mode.
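# Open the audio in binary mode; the with block closes it afterwards
with open(audio_intro, 'rb') as audio_file:
    result = openai.Audio.transcribe(
        model='whisper-1',
        file=audio_file,
    )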
print(result['text'])
I beg your pardon. I'd like to register for extra work. How long have you been in Hollywood? Well, it's about a month now. We haven't put anyone on our books for over two years.
It is an accurate transcription, but we would like it in the form of subtitles that can be added to a video. From the documentation we see that we can request alternative formats by setting the response_format field:
The format of the transcript output, in one of these options: json, text, srt, verbose_json, or vtt.
At the time of writing, however, openai.Audio.transcribe does not seem able to handle the output returned with this setting and raises a JSONDecodeError:
openai.Audio.transcribe(
    model='whisper-1',
    file=open(audio_intro, 'rb'),
    response_format='vtt'
)  #=> JSONDecodeError
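Our guess, not confirmed by the documentation, is that with response_format='vtt' the API returns plain WebVTT text rather than JSON, and the library tries to decode the body as JSON regardless. The standard library raises the same kind of error on a made-up VTT snippet:

```python
import json

# A made-up WebVTT body, standing in for the plain text the API returns
vtt_body = """WEBVTT

00:00:00.000 --> 00:00:03.000
I beg your pardon.
"""

try:
    json.loads(vtt_body)
except json.JSONDecodeError as err:
    # The text does not start with a JSON value, so decoding fails at once
    print("Not JSON:", err.msg)
```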
So we will use the Python requests library instead and call the transcription endpoint directly. The parameters go in the data dict and the file in the files dict. Notice that we have added an extra parameter, language, which is the language of the input audio. It is not required, and as we saw above we got a good transcription without it, but the documentation states:
Supplying the input language in ISO-639-1 format will improve accuracy and latency.
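def get_subtitles(file, subtitle_format='srt', **kwargs):
    """Transcribe file with Whisper and return the subtitles as text."""
    url = 'https://api.openai.com/v1/audio/transcriptions'
    headers = {
        'Authorization': f'Bearer {openai.api_key}',
    }
    data = {
        'model': 'whisper-1',
        'response_format': subtitle_format,
        'language': 'en',
    }
    # Extra keyword arguments are sent as form fields and override the defaults above
    data.update(kwargs)
    with open(file, 'rb') as f:
        files = {'file': (file, f)}
        response = requests.post(url, headers=headers, data=data, files=files)
    # Raise an exception on HTTP errors instead of returning the error body
    response.raise_for_status()
    return response.text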
subtitles = get_subtitles(audio_intro)
print(subtitles)
1
00:00:00,000 --> 00:00:03,000
I beg your pardon.

2
00:00:03,000 --> 00:00:06,000
I'd like to register for extra work.

3
00:00:06,000 --> 00:00:08,000
How long have you been in Hollywood?

4
00:00:08,000 --> 00:00:10,000
Well, it's about a month now.

5
00:00:10,000 --> 00:00:30,000
We haven't put anyone on our books for over two years.
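The SRT format is simple: each cue is a sequence number, a start --> end timing line (with a comma before the milliseconds) and one or more lines of text, with blank lines between cues. If you want to inspect or post-process the result in Python, a minimal parser might look like this. parse_srt is a hypothetical helper written for illustration; it assumes well-formed input like the output above:

```python
import re
from datetime import timedelta

TIMING = re.compile(r"(\d+):(\d\d):(\d\d),(\d{3}) --> (\d+):(\d\d):(\d\d),(\d{3})")


def _to_timedelta(h, m, s, ms):
    return timedelta(hours=int(h), minutes=int(m), seconds=int(s), milliseconds=int(ms))


def parse_srt(text):
    """Parse SRT text into a list of (start, end, caption) tuples."""
    cues = []
    lines = text.strip().splitlines()
    for i, line in enumerate(lines):
        match = TIMING.fullmatch(line.strip())
        if not match:
            continue
        start = _to_timedelta(*match.groups()[:4])
        end = _to_timedelta(*match.groups()[4:])
        # Caption lines run until a blank line or the next cue's index line
        caption = []
        j = i + 1
        while j < len(lines) and lines[j].strip():
            is_next_index = (
                lines[j].strip().isdigit()
                and j + 1 < len(lines)
                and TIMING.fullmatch(lines[j + 1].strip())
            )
            if is_next_index:
                break
            caption.append(lines[j].strip())
            j += 1
        cues.append((start, end, " ".join(caption)))
    return cues


sample = """1
00:00:00,000 --> 00:00:03,000
I beg your pardon.

2
00:00:03,000 --> 00:00:06,000
I'd like to register for extra work.
"""

for start, end, caption in parse_srt(sample):
    print(start, end, caption)
```

Returning timedelta objects makes it easy to compute cue durations or compare timings, for example between the audio and video transcriptions.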
We can also send the video file directly, since MP4 is also accepted as an input format. It takes a bit longer because the file is larger, and it returns the same text, although the timestamps differ slightly.
subtitles_from_video = get_subtitles(video_intro)
print(subtitles_from_video)
1
00:00:00,000 --> 00:00:02,000
I beg your pardon.

2
00:00:02,000 --> 00:00:05,000
I'd like to register for extra work.

3
00:00:05,000 --> 00:00:07,000
How long have you been in Hollywood?

4
00:00:07,000 --> 00:00:09,000
Well, it's about a month now.

5
00:00:09,000 --> 00:00:31,000
We haven't put anyone on our books for over two years.
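Because the video file is larger, it is worth checking its size before uploading. At the time of writing, OpenAI's documentation gives a 25 MB limit for uploaded files; treat that figure as an assumption and check the current documentation. A minimal sketch, where check_upload_size is a hypothetical helper and MAX_UPLOAD_BYTES is our own constant (assuming 25 MB means 25 * 1024 * 1024 bytes):

```python
import os

# Assumed upload limit: 25 MB, interpreted here as 25 * 1024 * 1024 bytes
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def check_upload_size(path, limit=MAX_UPLOAD_BYTES):
    """Return the file size in bytes, or raise ValueError if it exceeds limit."""
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(
            f"{path} is {size / 1024 / 1024:.1f} MB; split or compress it before uploading"
        )
    return size


# Hypothetical usage with the files from this tutorial:
# check_upload_size(video_intro)
```

For a full-length film, extracting and uploading only the audio, as we did first, keeps the upload much smaller.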
Finally, let us save the subtitles to an .srt file. If you save the file in the same folder as the video, with the same base name, and open the video in a player such as VLC, the subtitles will be shown automatically.
# Use the same base name as the video so players can find the subtitles
with open('Star-intro.srt', 'w', encoding='utf-8') as f:
    f.write(subtitles)
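The timestamps in this file are relative to the start of the clip. If you wanted to reuse them with the full film, each timestamp would need shifting by the clip's start time (12*60 + 35 = 755 seconds here). A minimal sketch using only the standard library; shift_srt is a hypothetical helper and assumes the HH:MM:SS,mmm timestamps shown above:

```python
import re

TIMESTAMP = re.compile(r"(\d+):(\d\d):(\d\d),(\d{3})")


def shift_srt(text, offset_seconds):
    """Add offset_seconds to every HH:MM:SS,mmm timestamp in SRT text."""
    offset_ms = round(offset_seconds * 1000)

    def shift(match):
        h, m, s, ms = (int(g) for g in match.groups())
        total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
        total = max(total, 0)  # clamp so negative offsets never go below zero
        h, rest = divmod(total, 3_600_000)
        m, rest = divmod(rest, 60_000)
        s, ms = divmod(rest, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    return TIMESTAMP.sub(shift, text)


cue = "1\n00:00:00,000 --> 00:00:03,000\nI beg your pardon.\n"
# The timing line becomes 00:12:35,000 --> 00:12:38,000; the index "1" is untouched
print(shift_srt(cue, 12 * 60 + 35))
```

Working in integer milliseconds avoids floating-point rounding drift when converting back to the SRT format.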