Quick Start
Your first video edit with Montaj — headless and with UI.
First-Run Setup
After installing Montaj (via Homebrew or pip), run the one-time setup:
montaj doctor # diagnose what's missing
montaj install ui # build UI bundle into ~/.cache/montaj/
montaj doctor prints the exact commands to fix each missing piece. The same flow works for both Homebrew and pip installs.
Send It to Your Agent
The fastest way to get started is to paste this into your AI agent:
Install Montaj from https://github.com/theSamPadilla/montaj, then read skills/onboarding/SKILL.md to get us started.
Headless: Agent Edits, Renders, Done
montaj run ./clips --prompt "tight cuts, remove filler, 9:16"
This runs the default workflow (overlays) against all clips in the directory. The pipeline:
- Creates project.json [pending] with the clip paths, prompt, and workflow name
- Agent picks it up, reads the workflow, and calls steps as tools
- Agent writes project.json as it works
- Agent marks the project [draft] when the edit is complete
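The pending → draft lifecycle above can be pictured with a minimal Python sketch. It assumes project.json is a flat JSON object; the key names (status, workflow, prompt, clips), the extension filter, and the helpers create_project and mark are hypothetical and are not Montaj's actual schema or code. Only the clip paths, prompt, workflow name, default overlays workflow, and the pending and draft states come from this page.

```python
import json

# Hypothetical schema: only the clip paths, prompt, workflow name, and the
# [pending]/[draft] states come from the docs; the key names are made up.
VIDEO_EXTENSIONS = (".mp4", ".mov", ".mkv")
TRANSITIONS = {"pending": {"draft"}}  # the agent moves a project from pending to draft


def create_project(clip_paths, prompt, workflow="overlays"):
    """Return a new project dict in the pending state."""
    clips = sorted(p for p in clip_paths if p.lower().endswith(VIDEO_EXTENSIONS))
    return {"status": "pending", "workflow": workflow, "prompt": prompt, "clips": clips}


def mark(project, status):
    """Return a copy of `project` moved to `status`, rejecting unknown transitions."""
    if status not in TRANSITIONS.get(project["status"], set()):
        raise ValueError(f"cannot move project from {project['status']!r} to {status!r}")
    return {**project, "status": status}


project = create_project(
    ["clips/b.MOV", "clips/a.mp4", "clips/notes.txt"],
    "tight cuts, remove filler, 9:16",
)
print(json.dumps(project, indent=2))
draft = mark(project, "draft")  # what the agent does once the edit is complete
```

Returning a copy from mark keeps the pending snapshot intact, which makes it easy to compare what the agent started with against what it produced.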
With a Named Workflow
montaj run ./clips --workflow tight-reel --prompt "tight cuts, upbeat pacing"
Animation Project (No Source Footage)
montaj run --workflow animations --prompt "60s animated explainer, dark theme"
With UI: Watch the Agent Work Live
montaj serve
This opens http://localhost:3000 with the browser UI. From here you can:
- Upload — drag and drop clips, write your editing prompt, select a workflow
- Live View — watch the agent build the edit in real time via SSE
- Review — adjust the timeline, captions, and overlays when the agent finishes
- Render — trigger the final render to MP4
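Server-Sent Events (SSE) is a plain-text protocol: each event is a group of "field: value" lines (event, data, and so on) terminated by a blank line, and lines starting with ":" are comments. As a rough illustration of the kind of stream the Live View consumes, here is a minimal Python parser for that wire format. The event names in the sample stream are hypothetical; the events Montaj actually emits, and the URL the UI subscribes to, are not documented on this page.

```python
def parse_sse(lines):
    """Yield (event_type, data) pairs from an iterable of SSE text lines."""
    event_type, data = "", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event if it carried any data.
            if data:
                yield event_type or "message", "\n".join(data)
            event_type, data = "", []
        elif line.startswith(":"):
            continue  # comment line, commonly used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single space after the colon is not part of the value
            if field == "event":
                event_type = value
            elif field == "data":
                data.append(value)
            # `id` and `retry` fields are ignored in this sketch.


# Hypothetical stream; the event names are illustrative, not Montaj's.
sample = [
    "event: step\n",
    "data: {\"step\": \"transcribe\"}\n",
    "\n",
    ": keep-alive\n",
    "data: first line\n",
    "data: second line\n",
    "\n",
]
for event_type, data in parse_sse(sample):
    print(event_type, data)
```

Against a live server you would feed the parser decoded lines from an HTTP response, for example by iterating over the object returned by urllib.request.urlopen. An event with no event field defaults to the "message" type, and a trailing event without a closing blank line is discarded, as the SSE format specifies.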
Render
After the agent finishes (or after your review), render to a final MP4:
montaj render
With explicit paths:
montaj render --project ./workspace/project.json --out ./output/final.mp4
Clean up intermediate files after compositing:
montaj render --clean
What Happens Under the Hood
montaj run ./clips --prompt "tight cuts"
│
├── Creates project.json [pending]
├── Reads workflows/overlays.json (default)
│
│ Agent takes over:
├── probe + snapshot → understand the clips
├── waveform_trim → detect and remove silence
├── transcribe → word-level transcript
├── rm_fillers → remove um/uh/hmm
├── select_takes → pick best takes from repeats
├── concat → join clips in a single encode pass
├── caption → animated caption track
├── overlays → custom JSX title cards, lower thirds
└── resize → 9:16 for shorts/reels
Every step writes its result to stdout, and steps chain: the output of one becomes the input of the next. No intermediate video files are created until concat. Up to that point, all editing steps work with trim specs (timestamp ranges) that are applied in a single encode pass.
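The trim-spec idea can be sketched in Python: each step emits timestamp ranges rather than video, later steps refine those ranges, and only the final spec is turned into one encode. In this sketch the input ranges are made-up stand-ins for waveform_trim and rm_fillers output, the helpers subtract and ffmpeg_filtergraph are hypothetical, and the result is rendered as an FFmpeg trim/atrim plus concat filtergraph for a single source clip. This shows one way to apply ranges in a single pass, not necessarily how Montaj's concat step does it.

```python
# Hypothetical sketch: trim specs as (start, end) pairs in seconds, applied in one pass.

def subtract(keep: list[tuple[float, float]],
             remove: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Cut every `remove` range out of the sorted, non-overlapping `keep` ranges."""
    result = []
    for start, end in keep:
        cursor = start  # beginning of the not-yet-emitted part of this keep range
        for r_start, r_end in sorted(remove):
            if r_end <= cursor or r_start >= end:
                continue  # removal does not overlap what is left of this range
            if r_start > cursor:
                result.append((cursor, r_start))  # keep the gap before the removal
            cursor = max(cursor, r_end)
        if cursor < end:
            result.append((cursor, end))  # keep the tail after the last removal
    return result


def ffmpeg_filtergraph(ranges: list[tuple[float, float]]) -> str:
    """Build one FFmpeg filtergraph that trims input 0 to `ranges` and joins them."""
    parts, pads = [], []
    for i, (start, end) in enumerate(ranges):
        # Trim video and audio to the same range and reset timestamps to zero.
        parts.append(f"[0:v]trim=start={start}:end={end},setpts=PTS-STARTPTS[v{i}]")
        parts.append(f"[0:a]atrim=start={start}:end={end},asetpts=PTS-STARTPTS[a{i}]")
        pads.append(f"[v{i}][a{i}]")
    # concat expects the segments interleaved as v0 a0 v1 a1 ...
    parts.append(f"{''.join(pads)}concat=n={len(ranges)}:v=1:a=1[outv][outa]")
    return ";".join(parts)


# Made-up step outputs standing in for waveform_trim and rm_fillers.
speech = [(0.0, 4.2), (5.0, 12.5)]   # ranges kept after silence removal
fillers = [(2.0, 2.4), (7.1, 7.5)]   # filler-word spans to drop

spec = subtract(speech, fillers)
graph = ffmpeg_filtergraph(spec)
print(spec)   # → [(0.0, 2.0), (2.4, 4.2), (5.0, 7.1), (7.5, 12.5)]
print(graph)
```

The printed filtergraph could be passed to FFmpeg with -filter_complex and -map "[outv]" -map "[outa]" to cut every range in one encode. Handling several source clips would mean tagging each range with its input index and referencing [1:v], [2:v], and so on.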