A while ago I worked on a project to transcribe contact center calls to help ensure compliance with GDPR and potentially automate actions picked up in post-processing. Whilst that was a big, complex solution, it highlighted how easy a lightweight version could be to build using only a few custom files and standard packages.
The key for this one is that it uses an offline copy of the model, so there are no API keys or usage costs involved!
In this blog post, I’ll go through how to create a Flask API that uses OpenAI’s Whisper model for transcribing audio files. We’ll then containerize the application using Docker, allowing for easy deployment and scalability. As a quick word of warning: this is a local dev tool, so do not use it in production!
Prerequisites
To work through this, you will need the following pre-installed. Some prior knowledge of working with them is expected, but if you’re new and get stuck, a quick search of the offending line should turn up something to help you over it.
- Python 3.7 or higher installed.
- Docker installed on your machine.
- An IDE such as VS Code
Step 1: Setting Up the Flask API
To start, we’ll make a Flask API that accepts an audio file, transcribes it using Whisper, and returns the transcription as a JSON response.
Install Required Dependencies
First, you need to install a couple of Python packages. Whisper uses ffmpeg for audio processing, so you’ll also need to install that.
Run the following command to install the necessary Python packages:
pip install Flask pydub
Next, install the Whisper model directly from its GitHub repository:
pip install git+https://github.com/openai/whisper.git
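If you want to check the install worked before writing any code, this one-liner should print the list of available model sizes (a quick sanity check, nothing more):
python -c "import whisper; print(whisper.available_models())"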
You’ll also need ffmpeg, which is used to handle audio files. Here’s how to install ffmpeg based on your operating system:
- Linux:
sudo apt update && sudo apt install ffmpeg
- macOS (using Homebrew):
brew install ffmpeg
- Windows: Download and install ffmpeg from its official site, or use a package manager like Chocolatey to install it:
choco install ffmpeg
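Whichever route you take, it’s worth confirming ffmpeg is on your PATH before moving on:
ffmpeg -version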
Writing the API
Make a new file called app.py and add the following:
from flask import Flask, request, jsonify
import whisper
import os
import tempfile
from pydub import AudioSegment
app = Flask(__name__)
# Load the Whisper model once when the app starts
model = whisper.load_model("base")
def transcribe_audio(file_path):
    # Convert the file to WAV format if necessary so Whisper can read it
    audio = AudioSegment.from_file(file_path)
    converted_path = None
    if file_path.split('.')[-1] != 'wav':
        converted_path = file_path.rsplit('.', 1)[0] + '.wav'
        audio.export(converted_path, format='wav')
        file_path = converted_path
    try:
        # Transcribe the audio file using the local Whisper model
        result = model.transcribe(file_path)
        return result['text']
    finally:
        # Remove the intermediate WAV file so it doesn't pile up in /tmp
        if converted_path and os.path.exists(converted_path):
            os.remove(converted_path)
@app.route('/transcribe', methods=['POST'])
def transcribe():
    if 'audio' not in request.files:
        return jsonify({'error': 'No audio file provided'}), 400

    audio_file = request.files['audio']

    # Save the uploaded file to a temporary location
    # (the handle is closed before saving so this also works on Windows)
    with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
        tmp_file_path = tmp_file.name
    audio_file.save(tmp_file_path)

    try:
        # Transcribe the audio file
        transcription = transcribe_audio(tmp_file_path)
        return jsonify({'transcription': transcription}), 200
    except Exception as e:
        return jsonify({'error': str(e)}), 500
    finally:
        # Clean up the temporary file
        os.remove(tmp_file_path)

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=5000)
This code sets up a simple API that accepts POST requests with audio files and returns the transcription.
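One tweak worth knowing about: the base model is a reasonable balance of speed and accuracy for a dev tool, but Whisper ships several checkpoint sizes (tiny, base, small, medium, large), and swapping one in is a one-line change:
# Larger checkpoints are slower and use more memory, but transcribe more accurately
model = whisper.load_model("small")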
Step 2: Running the API Locally
Now that the Flask API is set up, you can run it locally on any platform (Windows, macOS, or Linux).
Installing ffmpeg on Your Local System
Before running the app, make sure ffmpeg is installed. Depending on your platform, you may need to manually install it:
- Linux: Run the following command:
sudo apt install ffmpeg
- macOS: Install ffmpeg with Homebrew:
brew install ffmpeg
- Windows: Manually download ffmpeg, or use Chocolatey (the Windows package manager):
choco install ffmpeg
Running the API
Once everything is installed, you can run the API with:
python app.py
Now, your Flask app is running at http://localhost:5000. You can send audio files to the /transcribe endpoint for transcription.
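If you prefer testing from Python rather than curl, a minimal client sketch looks something like this (sample.wav is just a placeholder file name, and it assumes you have the requests package installed):

import requests

# Post an audio file to the local transcription endpoint
with open("sample.wav", "rb") as f:
    response = requests.post(
        "http://localhost:5000/transcribe",
        files={"audio": f},  # the field name must match request.files['audio'] in app.py
    )

print(response.json())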
Step 3: Containerizing the Application with Docker
To make the app more portable, we can containerize it with Docker. This makes sure it runs the same wherever you put it, whether on your machine or in the cloud.
Creating the Dockerfile
Here’s a Dockerfile that installs Whisper, ffmpeg, and all required dependencies inside the container:
# Use an official Python runtime as a parent image
FROM python:3.9-slim
# Install ffmpeg for audio processing
RUN apt-get update && apt-get install -y ffmpeg && apt-get clean
# Set the working directory in the container
WORKDIR /app
# Copy the current directory contents into the container at /app
COPY . /app
# Install Whisper and other Python dependencies
RUN pip install --no-cache-dir git+https://github.com/openai/whisper.git
RUN pip install --no-cache-dir -r requirements.txt
# Make port 5000 available to the world outside this container
EXPOSE 5000
# Run app.py when the container launches
CMD ["python", "app.py"]
Creating the requirements.txt
Add a requirements.txt file in your project directory to handle Python dependencies:
Flask
pydub
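The Dockerfile above installs Whisper with a separate pip command, but if you’d rather keep every dependency in one place, pip also accepts git URLs inside requirements.txt, so the file could just as easily look like this:

Flask
pydub
git+https://github.com/openai/whisper.git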
Building and Running the Docker Container
- Build the Docker Image: Run this command in the directory containing the Dockerfile and app.py:
docker build -t flask-whisper-app-local .
- Run the Docker Container: Once the image is built, run it with:
docker run -p 5000:5000 flask-whisper-app-local
Your API is now running inside the container and available at http://localhost:5000 on your machine.
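If you find yourself rebuilding and restarting often, a small docker-compose.yml saves some typing. This is just a sketch, and the service name whisper-api is arbitrary:

services:
  whisper-api:
    build: .
    ports:
      - "5000:5000"

With that in place, docker compose up --build does the build and run in one step.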
Step 4: Testing the API
Now that the API is up and running, you can test it by sending audio files for transcription.
Using curl, you can send a POST request with an audio file:
curl -X POST -F "audio=@your-audio-file.wav" http://localhost:5000/transcribe
The server will return a JSON response containing the transcription.
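A successful request returns the transcription, while failures return an error key with an appropriate status code, so the response will take one of these shapes (the actual text obviously depends on your audio):

{"transcription": "...your transcribed speech..."}
{"error": "No audio file provided"}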
Summary
So hopefully you now have a Flask API that uses the OpenAI Whisper model locally to transcribe audio files and return the transcript in the API response. By using Docker you can run it on your own machine or stick it in the cloud (although please add security layers like a VPN/VNet or authentication on the API if you do this), so you can use it easily for your own use cases. The joy, as I said before, is that because the model runs locally, the only cost is your own machine, with no API fees to worry about!
Happy transcribing!