Installation

Install Montaj on macOS or Linux — Homebrew, pip, and optional dependencies.

Homebrew

A single command installs everything, including Node.js and all Python dependencies (with a bundled ffmpeg):

brew install theSamPadilla/montaj/montaj

Then install Whisper model weights:

montaj install whisper

Linux / Manual

git clone https://github.com/theSamPadilla/montaj
cd montaj
pip install -e .

This installs the Python dependencies (including a bundled ffmpeg). Node.js cannot be installed via pip, so install it separately before running the remaining setup commands:

# Install Node.js >= 18: https://nodejs.org
montaj install whisper   # whisper-cpp binary + model weights
montaj install ui        # npm deps + UI build
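
Because `montaj install ui` depends on Node.js >= 18, it can help to check the Node.js version before running it. The following is a minimal sketch, not part of Montaj: the function names `node_major` and `node_version_ok` are hypothetical helpers, and the only assumption about Node.js is that `node --version` prints a string such as `v18.17.0`.

```shell
# Hypothetical helpers (not part of Montaj) for checking the Node.js version.

# Print the major version from a string such as "v18.17.0" or "20.11.1".
node_major() {
  nm_version="${1#v}"                 # drop a leading "v" if present
  printf '%s\n' "${nm_version%%.*}"   # keep everything before the first dot
}

# Succeed when the version string is at least the required major (default 18).
node_version_ok() {
  nv_major="$(node_major "$1")"
  # Reject empty or non-numeric input rather than guessing.
  case "$nv_major" in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "$nv_major" -ge "${2:-18}" ]
}
```

One possible use is to guard the UI install, for example `node_version_ok "$(node --version)" && montaj install ui`.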

Optional Dependencies

montaj install rvm          # background removal (torch + RVM weights)
montaj install connectors   # pyjwt, requests, google-genai, openai (for API steps)
montaj install credentials  # interactive setup for API keys
montaj install all          # everything above
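
If only some groups are wanted (for example, skipping `rvm` to avoid pulling in torch), the individual commands above can be wrapped in a small loop. This is a sketch under stated assumptions: `install_groups` is a hypothetical helper, and `MONTAJ_CMD` is a hypothetical override used here for dry runs, not a setting that Montaj itself reads.

```shell
# Hypothetical helper (not part of Montaj): install the given dependency
# groups one at a time and stop at the first failure.
# MONTAJ_CMD is an illustrative override; it defaults to "montaj".
install_groups() {
  for group in "$@"; do
    echo "Installing group: $group"
    "${MONTAJ_CMD:-montaj}" install "$group" || {
      echo "Failed to install group: $group" >&2
      return 1
    }
  done
}
```

For example, `install_groups whisper ui connectors` would run the three installs in order, while `MONTAJ_CMD=echo install_groups whisper ui` only prints the commands that would be run.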

Dependency Groups

| Group | What it installs | Required for |
| --- | --- | --- |
| whisper | whisper-cpp binary (pinned), base.en model weights | transcribe, rm_fillers, rm_nonspeech, waveform_trim, render pipeline |
| ui | npm deps for render/ and ui/; production UI build | montaj serve, render engine |
| rvm | torch, torchvision, av (pip) + RVM model weights | remove_bg |
| connectors | pyjwt, requests, google-genai, openai | kling_generate, analyze_media, generate_image |

Whisper Models

By default, montaj install whisper downloads the base.en model. To install a different model:

montaj install whisper --model medium.en

Re-running montaj install whisper is safe: it skips the binary if it is already at the pinned version and skips the weights if they are already downloaded.

Upgrade

montaj update            # upgrade everything (whisper binary, pip packages)
montaj update whisper    # re-download whisper binary if pinned version changed
montaj update pip        # pip install --upgrade for all Python packages

System Requirements

| Tool | Purpose |
| --- | --- |
| ffmpeg + ffprobe | Core video processing (bundled via pip/brew) |
| whisper.cpp | Local speech-to-text with word-level timestamps |
| Python 3.x | Script + step runtime |
| Node.js >= 18 | Render engine (React + Puppeteer) + UI server |
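
To see at a glance which of these tools are reachable from the current shell, a small check like the one below may be useful. It is only a sketch: `missing_tools` is a hypothetical helper, and the executable names passed to it (`python3`, `node`, `ffmpeg`, `ffprobe`) are assumptions about a typical setup. In particular, the bundled ffmpeg may not be exposed on `PATH`, so a reported miss there does not necessarily mean Montaj cannot find it.

```shell
# Hypothetical helper (not part of Montaj): print each named command
# that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    # "command -v" is the POSIX way to test whether a command resolves.
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

A typical call would be `missing_tools python3 node ffmpeg ffprobe`; empty output means every listed command was found.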