How to Build and Run a Flask API with OpenAI’s Whisper Local Model Using Docker

A while ago I worked on a project to transcribe contact centre calls, both to help ensure compliance with GDPR and to automate actions picked up in post-processing. Whilst that was a big, complex solution, it highlighted how easy a lightweight version could be using only a few custom files and standard packages.

The key to this one is that it uses an offline copy of the model, so there are no API keys or usage costs involved!

In this blog post, I’ll go through how to create a Flask API that uses OpenAI’s Whisper model to transcribe audio files. We’ll then containerize the application using Docker, allowing for easy deployment and scalability. A quick word of warning – this is a local dev tool – do not use it in production!

Prerequisites

To work through this, you will need the following pre-installed. Some prior knowledge of working with them is expected, but if you’re new and get stuck, a quick search of the failing command should get you over it.

  1. Python 3.7 or higher installed.
  2. Docker installed on your machine.
  3. An IDE such as VS Code

Step 1: Setting Up the Flask API

To start, we’ll make a Flask API that accepts an audio file, transcribes it using Whisper, and returns the transcription as a JSON response.

Install Required Dependencies

First, you need to install a couple of Python packages. Whisper relies on ffmpeg for audio processing, so you’ll need to install that too.

Run the following command to install the necessary Python packages:

pip install Flask pydub

Next, install the Whisper model directly from its GitHub repository:

pip install git+https://github.com/openai/whisper.git

You’ll also need ffmpeg, which is used to handle audio files. Here’s how to install ffmpeg based on your operating system:

  • Linux: sudo apt update && sudo apt install ffmpeg
  • macOS (using Homebrew): brew install ffmpeg
  • Windows: download and install ffmpeg from its official site, or use a package manager like Chocolatey: choco install ffmpeg
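Whichever route you take, it’s worth confirming ffmpeg is actually on your PATH before going any further – a quick sanity check:

```shell
# Check that ffmpeg is installed and visible on the PATH
if command -v ffmpeg >/dev/null 2>&1; then
    ffmpeg -version | head -n 1
else
    echo "ffmpeg not found - install it before running the app"
fi
```

If this prints a version line, you’re good to carry on.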

Writing the API

Make a new file called app.py and add the following:

from flask import Flask, request, jsonify
import whisper
import os
import tempfile
from pydub import AudioSegment

app = Flask(__name__)

# Load the Whisper model once when the app starts
model = whisper.load_model("base")

def transcribe_audio(file_path):
    # Convert the file to WAV format if necessary
    audio = AudioSegment.from_file(file_path)
    if not file_path.lower().endswith('.wav'):
        file_path = file_path.rsplit('.', 1)[0] + '.wav'
        audio.export(file_path, format='wav')

    # Transcribe the audio file using the local Whisper model
    result = model.transcribe(file_path)
    return result['text']

@app.route('/transcribe', methods=['POST'])
def transcribe():
    if 'audio' not in request.files:
        return jsonify({'error': 'No audio file provided'}), 400

    audio_file = request.files['audio']

    # Save the uploaded file to a temporary location, keeping the
    # original extension so pydub can detect the format
    suffix = os.path.splitext(audio_file.filename)[1]
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp_file:
        audio_file.save(tmp_file.name)
        tmp_file_path = tmp_file.name

    try:
        # Transcribe the audio file
        transcription = transcribe_audio(tmp_file_path)
        return jsonify({'transcription': transcription}), 200
    except Exception as e:
        return jsonify({'error': str(e)}), 500
    finally:
        # Clean up the temporary file
        os.remove(tmp_file_path)

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=5000)

This code sets up a simple API that accepts POST requests with audio files and returns the transcription.
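As written, the endpoint accepts any upload and lets pydub/Whisper fail later if the format is unsupported. If you want to fail fast, a small guard like this can be called at the top of the route (a sketch – `ALLOWED_EXTENSIONS` and `allowed_file` are hypothetical names, not part of Flask or Whisper):

```python
import os

# Hypothetical whitelist - extend to whatever formats ffmpeg handles for you
ALLOWED_EXTENSIONS = {'.wav', '.mp3', '.m4a', '.ogg', '.flac'}

def allowed_file(filename):
    """Return True if the filename has a whitelisted audio extension."""
    _, ext = os.path.splitext(filename.lower())
    return ext in ALLOWED_EXTENSIONS

print(allowed_file('call-recording.mp3'))  # True
print(allowed_file('notes.txt'))           # False
```

Inside the route you would then return a 400 before saving the file, e.g. `if not allowed_file(audio_file.filename): return jsonify({'error': 'Unsupported file type'}), 400`.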

Step 2: Running the API Locally

Now that the Flask API is set up, you can run it locally on any platform (Windows, macOS, or Linux).

Installing ffmpeg on Your Local System

Before running the app, make sure ffmpeg is installed. Depending on your platform, you may need to manually install it:

  • Linux: sudo apt install ffmpeg
  • macOS (Homebrew): brew install ffmpeg
  • Windows: download ffmpeg manually, or use Chocolatey (a Windows package manager): choco install ffmpeg

Running the API

Once everything is installed, you can run the API with:

python app.py

Now, your Flask app is running at http://localhost:5000. You can send audio files to the /transcribe endpoint for transcription.

Step 3: Containerizing the Application with Docker

To make the app more portable, we can containerize it with Docker. This ensures it runs the same wherever you put it, either on your machine or in the cloud.

Creating the Dockerfile

Here’s a Dockerfile that installs Whisper, ffmpeg, and all required dependencies inside the container:

# Use an official Python runtime as a parent image
FROM python:3.9-slim

# Install ffmpeg for audio processing
RUN apt-get update && apt-get install -y ffmpeg && apt-get clean

# Set the working directory in the container
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app

# Install Whisper and other Python dependencies
RUN pip install --no-cache-dir git+https://github.com/openai/whisper.git
RUN pip install --no-cache-dir -r requirements.txt

# Make port 5000 available to the world outside this container
EXPOSE 5000

# Run app.py when the container launches
CMD ["python", "app.py"]
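One optional tweak: because the Dockerfile copies the whole directory into the image, a `.dockerignore` file keeps virtual environments and caches out of the build context. A minimal example (adjust to your own project layout):

```
venv/
__pycache__/
*.pyc
.git/
```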

Creating the requirements.txt

Add a requirements.txt file in your project directory to handle Python dependencies:

Flask
pydub

Building and Running the Docker Container

  1. Build the Docker image. Run this command in the directory containing the Dockerfile and app.py: docker build -t flask-whisper-app-local .
  2. Run the Docker container. Once the image is built, run it with: docker run -p 5000:5000 flask-whisper-app-local

Your API will now be available at http://localhost:5000, served from inside the container via the port mapping.

Step 4: Testing the API

Now that the API is up and running, you can test it by sending audio files for transcription.

Using curl, you can send a POST request with an audio file:

curl -X POST -F "audio=@your-audio-file.wav" http://localhost:5000/transcribe

The server will return a JSON response containing the transcription.
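If you’d rather call the endpoint from Python than curl, the request is just a standard multipart/form-data POST with the file under the `audio` field. Here’s a sketch of how that body is put together using only the standard library (most people would use the third-party `requests` library instead – `files={'audio': open(...)}` – but this shows what’s actually on the wire):

```python
import uuid

def build_multipart(field_name, filename, payload, content_type='audio/wav'):
    """Build a multipart/form-data body and its Content-Type header value."""
    boundary = uuid.uuid4().hex
    body = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f'Content-Type: {content_type}\r\n\r\n'
    ).encode() + payload + f'\r\n--{boundary}--\r\n'.encode()
    return body, f'multipart/form-data; boundary={boundary}'

# Example: wrap some dummy bytes as if they were a WAV file
body, content_type = build_multipart('audio', 'test.wav', b'RIFF....WAVE')
print(content_type.startswith('multipart/form-data'))  # True
```

To actually send it, pass `body` and the header to `urllib.request.Request('http://localhost:5000/transcribe', data=body, headers={'Content-Type': content_type})` and call `urllib.request.urlopen` – assuming the server from Step 2 is running.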

Summary

So hopefully you now have a Flask API that uses the OpenAI Whisper model locally to transcribe audio files and return the transcript in the API response. With Docker, you can run it on your own machine or put it in the cloud (although please add security layers such as a VPN/VNet or authentication on the API if you do), so you can use it easily for your own use cases. The joy, as I said before, is that by installing the model locally, the only cost is your own machine – no API costs to worry about!

Happy transcribing!
