Installation
Install Montaj on macOS or Linux — Homebrew, pip, and optional dependencies.
macOS (Recommended)
One command installs everything — Node.js and all Python dependencies including bundled ffmpeg:
brew install theSamPadilla/montaj/montaj
Then install Whisper model weights:
montaj install whisper
Linux / Manual
git clone https://github.com/theSamPadilla/montaj
cd montaj
pip install -e .
This installs Python dependencies (including a bundled ffmpeg). Node.js cannot be installed via pip, so install it separately:
# Install Node.js >= 18: https://nodejs.org
montaj install whisper # whisper-cpp binary + model weights
montaj install ui # npm deps + UI build
Optional Dependencies
montaj install rvm # background removal (torch + RVM weights)
montaj install connectors # pyjwt, requests, google-genai, openai (for API steps)
montaj install credentials # interactive setup for API keys
montaj install all # everything above
Dependency Groups
| Group | What it installs | Required for |
|---|---|---|
| whisper | whisper-cpp binary (pinned), base.en model weights | transcribe, rm_fillers, rm_nonspeech, waveform_trim, render pipeline |
| ui | npm deps for render/ and ui/; production UI build | montaj serve, render engine |
| rvm | torch, torchvision, av (pip) + RVM model weights | remove_bg |
| connectors | pyjwt, requests, google-genai, openai | kling_generate, analyze_media, generate_image |
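To work out which groups a given workflow needs, the table above can be turned into a small lookup. This is only a sketch: `group_for_step` and `install_commands_for` are hypothetical helper names, not part of montaj, and only the named steps from the table are mapped (the render pipeline, render engine, and montaj serve entries are left out). The script prints the install commands rather than running them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not montaj commands): map steps to dependency groups
# using the "Required for" column of the Dependency Groups table.

# Print the group a step needs; fail for steps the table does not list.
group_for_step() {
  case "$1" in
    transcribe|rm_fillers|rm_nonspeech|waveform_trim) echo "whisper" ;;
    remove_bg) echo "rvm" ;;
    kling_generate|analyze_media|generate_image) echo "connectors" ;;
    *) echo "unknown step: $1" >&2; return 1 ;;
  esac
}

# Print one "montaj install <group>" line per distinct group, in first-seen order.
install_commands_for() {
  local step group seen=" "
  for step in "$@"; do
    group="$(group_for_step "$step")" || continue   # skip unlisted steps
    case "$seen" in
      *" $group "*) ;;                               # group already printed
      *) seen="$seen$group "; echo "montaj install $group" ;;
    esac
  done
}

install_commands_for transcribe remove_bg rm_fillers
```

For the three steps in the last line, this prints `montaj install whisper` followed by `montaj install rvm`, since transcribe and rm_fillers share the whisper group.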
Whisper Models
By default, montaj install whisper downloads the base.en model. To install a different model:
montaj install whisper --model medium.en
montaj install whisper is safe to re-run: it skips the binary if it is already at the pinned version and skips the weights if they are already downloaded.
Upgrade
montaj update # upgrade everything (whisper binary, pip packages)
montaj update whisper # re-download whisper binary if pinned version changed
montaj update pip # pip install --upgrade for all Python packages
System Requirements
| Tool | Purpose |
|---|---|
| ffmpeg + ffprobe | Core video processing (bundled via pip/brew) |
| whisper.cpp | Local speech-to-text with word-level timestamps |
| Python 3.x | Script + step runtime |
| Node.js >= 18 | Render engine (React + Puppeteer) + UI server |
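On a manual install, it can help to confirm the two tools that are not bundled (Python 3 and Node.js >= 18) before running montaj install ui. The following is a minimal sketch, not something montaj ships: `node_version_ok` and `check_prereqs` are hypothetical helper names, and it assumes the interpreter is exposed as `python3` on PATH, which may not hold on every system. ffmpeg is not checked because it is bundled.

```shell
#!/usr/bin/env bash
# Hypothetical prerequisite check (not part of montaj).

# Succeed if a version string such as "v18.17.0" has a major version >= 18.
node_version_ok() {
  local version="${1#v}"        # strip the leading "v" printed by `node --version`
  local major="${version%%.*}"  # keep everything before the first dot
  case "$major" in
    ''|*[!0-9]*) return 1 ;;    # empty or non-numeric: treat as not OK
  esac
  [ "$major" -ge 18 ]
}

# Report whether python3 and a new enough node are available on PATH.
check_prereqs() {
  if command -v python3 >/dev/null 2>&1; then
    echo "python3: found"
  else
    echo "python3: missing"
  fi
  if command -v node >/dev/null 2>&1 && node_version_ok "$(node --version)"; then
    echo "node: $(node --version) (ok)"
  else
    echo "node: missing or older than 18"
  fi
}

check_prereqs
```

The version comparison is kept in its own function so it can be exercised with fixed strings, for example `node_version_ok v16.20.0` fails while `node_version_ok v20.11.1` succeeds.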