Quick Start
Your first video edit with Montaj — headless and with UI.
Send It to Your Agent
The fastest way to get started is to paste this into your AI agent:
```text
Install Montaj from https://github.com/theSamPadilla/montaj, then read skills/onboarding/SKILL.md to get us started.
```
Headless — Agent Edits, Renders, Done
```bash
montaj run ./clips --prompt "tight cuts, remove filler, 9:16"
```
This runs the default workflow (overlays) against every clip in the directory. The pipeline:
- Creates project.json [pending] with the clip paths, prompt, and workflow name
- Agent picks it up, reads the workflow, and calls its steps as tools
- Agent writes to project.json as it works
- Agent marks the project [draft] when the edit is complete
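The project.json handoff described above works like a small state record. The CLI creates it, and the agent updates it until the edit is ready for review. The sketch below is a hypothetical Python model, not montaj's actual schema. The field names, the steps list, and the assumption that the status stays "pending" until the agent marks it "draft" are all illustrative.

```python
# Hypothetical model of the project.json lifecycle described above.
# Field names ("clips", "prompt", "workflow", "status", "steps") are
# illustrative assumptions, not montaj's actual schema.
import json
from dataclasses import dataclass, field


@dataclass
class Project:
    clips: list[str]
    prompt: str
    workflow: str = "overlays"   # default workflow when none is named
    status: str = "pending"      # set when the project file is created
    steps: list[str] = field(default_factory=list)

    def record_step(self, name: str) -> None:
        """Assumed: the agent appends each finished step as it works."""
        self.steps.append(name)

    def mark_draft(self) -> None:
        """The agent hands the finished edit over for review or render."""
        self.status = "draft"

    def to_json(self) -> str:
        """Serialize the record the way a project file might be written."""
        return json.dumps(
            {
                "clips": self.clips,
                "prompt": self.prompt,
                "workflow": self.workflow,
                "status": self.status,
                "steps": self.steps,
            },
            indent=2,
        )


if __name__ == "__main__":
    project = Project(
        clips=["clips/a.mp4", "clips/b.mp4"],
        prompt="tight cuts, remove filler, 9:16",
    )
    for step in ["probe", "waveform_trim", "transcribe"]:
        project.record_step(step)
    project.mark_draft()
    print(project.to_json())
```

Keeping all state in one file is what lets the headless run and the browser UI observe the same edit. In this sketch that file is modeled as a plain dictionary serialized to JSON.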
With a Named Workflow
```bash
montaj run ./clips --workflow tight-reel --prompt "tight cuts, upbeat pacing"
```
Animation Project (No Source Footage)
```bash
montaj run --workflow animations --prompt "60s animated explainer, dark theme"
```
With UI — Watch the Agent Work Live
```bash
montaj serve
```
This opens the browser UI at http://localhost:3000. From there you can:
- Upload — drag and drop clips, write your editing prompt, select a workflow
- Live View — watch the agent build the edit in real time via SSE
- Review — adjust the timeline, captions, and overlays when the agent finishes
- Render — trigger the final render to MP4
Render
After the agent finishes (or after your review), render to a final MP4:
```bash
montaj render
```
With explicit paths:
```bash
montaj render --project ./workspace/project.json --out ./output/final.mp4
```
To clean up intermediate files after compositing:
```bash
montaj render --clean
```
What Happens Under the Hood
```text
montaj run ./clips --prompt "tight cuts"
  │
  ├── Creates project.json [pending]
  ├── Reads workflows/overlays.json (default)
  │
  │   Agent takes over:
  ├── probe + snapshot  → understand the clips
  ├── waveform_trim     → detect and remove silence
  ├── transcribe        → word-level transcript
  ├── rm_fillers        → remove um/uh/hmm
  ├── select_takes      → pick best takes from repeats
  ├── concat            → join clips in a single encode pass
  ├── caption           → animated caption track
  ├── overlays          → custom JSX title cards, lower thirds
  └── resize            → 9:16 for shorts/reels
```
Every step writes its result to stdout, and steps chain: the output of one becomes the input of the next. No intermediate video files are created until concat. Before that point, every editing step works on trim specs (timestamp ranges), which are then applied together in a single encode pass.
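The trim-spec idea can be sketched in a few lines of Python. This is a hypothetical model, not montaj's implementation or data format. It assumes the following:

- A trim spec is a list of (start, end) ranges, in seconds, to keep from one source clip.
- Every step reports timestamps on that original source timeline.
- Silence removal and filler removal can both be expressed as "cut these ranges".

Each step only rewrites the list of ranges, so chaining them never touches video. A later concat would cut all surviving ranges in one encode.

```python
# Hypothetical trim spec: (start, end) ranges in seconds to KEEP from one
# source clip, on the clip's original timeline. Not montaj's actual format.
from typing import Callable

Range = tuple[float, float]
Step = Callable[[list[Range]], list[Range]]


def subtract_ranges(keep: list[Range], cuts: list[Range]) -> list[Range]:
    """Return the parts of `keep` that do not overlap any range in `cuts`."""
    result: list[Range] = []
    for start, end in keep:
        pieces = [(start, end)]
        for cut_start, cut_end in cuts:
            remaining: list[Range] = []
            for s, e in pieces:
                if cut_end <= s or cut_start >= e:
                    remaining.append((s, e))  # cut does not touch this piece
                    continue
                if cut_start > s:
                    remaining.append((s, cut_start))  # part before the cut
                if cut_end < e:
                    remaining.append((cut_end, e))  # part after the cut
            pieces = remaining
        result.extend(pieces)
    return result


def cut_step(cuts: list[Range]) -> Step:
    """Build a step that removes the given ranges (e.g. silence or fillers)."""
    return lambda keep: subtract_ranges(keep, cuts)


def run_chain(keep: list[Range], steps: list[Step]) -> list[Range]:
    """Feed each step's output spec into the next step; no video is decoded."""
    for step in steps:
        keep = step(keep)
    return keep


def total_duration(keep: list[Range]) -> float:
    """Length of the edit that a single encode pass would produce."""
    return sum(e - s for s, e in keep)


if __name__ == "__main__":
    full_clip = [(0.0, 10.0)]                      # a 10 s source clip
    silence = cut_step([(0.0, 1.0), (9.0, 10.0)])  # made-up silence ranges
    fillers = cut_step([(3.0, 3.5)])               # made-up "um" at 3.0-3.5 s
    spec = run_chain(full_clip, [silence, fillers])
    print(spec)                  # [(1.0, 3.0), (3.5, 9.0)]
    print(total_duration(spec))  # 7.5
```

Keeping every timestamp on the source timeline is what lets later steps stack their cuts without re-encoding. The silence and filler ranges in the example are invented for illustration.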