Quick Start

First-Run Setup

After installing Montaj (via Homebrew or pip), run the one-time setup:

montaj doctor               # diagnose what's missing
montaj install ui           # build UI bundle into ~/.cache/montaj/

montaj doctor prints the exact commands to fix each missing piece. The same flow works for both brew and pip installs.

Send It to Your Agent

The fastest way to get started is to paste this into your AI agent:

Install Montaj from https://github.com/theSamPadilla/montaj, then read skills/onboarding/SKILL.md to get us started.

Headless — Agent Edits, Renders, Done

montaj run ./clips --prompt "tight cuts, remove filler, 9:16"

This runs the default workflow (overlays) against all clips in the directory. The pipeline:

montaj run creates project.json [pending] with the clip paths, prompt, and workflow name
The agent picks it up, reads the workflow, and calls steps as tools
The agent updates project.json as it works
The agent marks the project [draft] when the edit is complete
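
The lifecycle above can be pictured with a short Python sketch. This is an illustration only: the field names (status, clips, prompt, workflow) and both helper functions are assumptions, not Montaj's actual schema or API. The source states only that project.json holds the clip paths, the prompt, and the workflow name, and that it moves from [pending] to [draft].

```python
import json
import tempfile
from pathlib import Path


# Hypothetical field names; the real project.json schema may differ.
def create_project(path, clips, prompt, workflow="overlays"):
    """Write a new project file in the pending state."""
    project = {
        "status": "pending",
        "clips": [str(c) for c in clips],
        "prompt": prompt,
        "workflow": workflow,
    }
    path.write_text(json.dumps(project, indent=2))
    return project


def mark_draft(path):
    """Flip a pending project to draft once the edit is complete."""
    project = json.loads(path.read_text())
    if project["status"] != "pending":
        raise ValueError(f"expected pending, got {project['status']}")
    project["status"] = "draft"
    path.write_text(json.dumps(project, indent=2))
    return project


def demo():
    """Run the pending -> draft transition in a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "project.json"
        created = create_project(
            path,
            ["clips/a.mp4", "clips/b.mp4"],
            "tight cuts, remove filler, 9:16",
        )
        finished = mark_draft(path)
        return created["status"], finished["status"]
```

Keeping the state in a single file is what lets the CLI, the agent, and the UI observe the same project without sharing a process.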

With a Named Workflow

montaj run ./clips --workflow tight-reel --prompt "tight cuts, upbeat pacing"

Animation Project (No Source Footage)

montaj run --workflow animations --prompt "60s animated explainer, dark theme"

With UI — Watch the Agent Work Live

montaj serve

This opens the browser UI at http://localhost:3000. From here you can:

Upload — drag and drop clips, write your editing prompt, select a workflow
Live View — watch the agent build the edit in real time via SSE
Review — adjust the timeline, captions, and overlays when the agent finishes
Render — trigger the final render to MP4
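
The Live View receives updates over Server-Sent Events (SSE). For readers unfamiliar with the format, here is a minimal Python sketch of how raw SSE text breaks into events. The parsing follows the standard SSE framing (event: and data: fields, a blank line ends each event); the event name "step" and the JSON payload in SAMPLE are invented for illustration and are not taken from Montaj.

```python
def field_value(line, prefix):
    """Return the text after 'prefix', dropping one leading space if present."""
    value = line[len(prefix):]
    return value[1:] if value.startswith(" ") else value


def parse_sse(stream_text):
    """Split raw Server-Sent Events text into (event, data) pairs."""
    events = []
    event_name, data_lines = "message", []
    for line in stream_text.splitlines():
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_lines:
                events.append((event_name, "\n".join(data_lines)))
            event_name, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event_name = field_value(line, "event:")
        elif line.startswith("data:"):
            data_lines.append(field_value(line, "data:"))
    return events


# Hypothetical stream; real event names and payloads may differ.
SAMPLE = (
    "event: step\n"
    'data: {"step": "transcribe"}\n'
    "\n"
    ": keep-alive\n"
    "\n"
    "data: done\n"
    "\n"
)
```

Calling parse_sse(SAMPLE) yields one "step" event and one default "message" event; the comment line produces nothing.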

Render

After the agent finishes (or after your review), render to a final MP4:

montaj render

With explicit paths:

montaj render --project ./workspace/project.json --out ./output/final.mp4

Clean up intermediate files after compositing:

montaj render --clean

What Happens Under the Hood

montaj run ./clips --prompt "tight cuts"
    │
    ├── Creates project.json [pending]
    ├── Reads workflows/overlays.json (default)
    │
    │   Agent takes over:
    ├── probe + snapshot → understand the clips
    ├── waveform_trim → detect and remove silence
    ├── transcribe → word-level transcript
    ├── rm_fillers → remove um/uh/hmm
    ├── select_takes → pick best takes from repeats
    ├── concat → join clips in a single encode pass
    ├── caption → animated caption track
    ├── overlays → custom JSX title cards, lower thirds
    └── resize → 9:16 for shorts/reels

Every step writes its result to stdout, and steps chain: the output of one becomes the input of the next. No intermediate video files are created until concat. The editing steps before it work with trim specs (timestamp ranges), which are applied together in a single encode pass.
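
To make the trim-spec idea concrete, here is a rough Python sketch of how successive steps could narrow a list of timestamp ranges without touching any video. The (start, end) tuple representation, the subtract_ranges helper, and the sample timestamps are assumptions for illustration; Montaj's actual trim-spec format is not described here.

```python
def subtract_ranges(keep, cuts):
    """Remove every cut range from the keep ranges.

    All ranges are (start, end) tuples in seconds.
    """
    result = []
    for start, end in keep:
        cursor = start
        for cut_start, cut_end in sorted(cuts):
            if cut_end <= cursor or cut_start >= end:
                continue  # cut does not overlap what remains of this range
            if cut_start > cursor:
                result.append((cursor, cut_start))
            cursor = max(cursor, cut_end)
        if cursor < end:
            result.append((cursor, end))
    return result


def demo_spec():
    """Chain two editing passes over a hypothetical 30-second clip."""
    spec = [(0.0, 30.0)]
    # Pass 1: drop detected silence (made-up timestamps).
    spec = subtract_ranges(spec, [(0.0, 1.5), (12.0, 14.0)])
    # Pass 2: drop a filler word found in the transcript (made-up timestamp).
    spec = subtract_ranges(spec, [(5.0, 5.5)])
    return spec


def total_duration(spec):
    """Seconds of footage that would survive the single encode pass."""
    return sum(end - start for start, end in spec)
```

Under these assumptions, demo_spec() returns [(1.5, 5.0), (5.5, 12.0), (14.0, 30.0)]. Each pass only rewrites a small list of numbers, so the footage is encoded once, when the final ranges are known.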
