Quick Start

Your first video edit with Montaj — headless and with UI.

Send It to Your Agent

The fastest way to get started is to paste this into your AI agent:

Install Montaj from https://github.com/theSamPadilla/montaj, then read skills/onboarding/SKILL.md to get us started.

Headless — Agent Edits, Renders, Done

montaj run ./clips --prompt "tight cuts, remove filler, 9:16"

This runs the default workflow, overlays, against every clip in the directory. The pipeline:

  1. The command creates project.json in the [pending] state with clip paths, prompt, and workflow name
  2. The agent picks it up, reads the workflow, and calls steps as tools
  3. The agent updates project.json as it works
  4. The agent marks the project [draft] when the edit is complete
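
The lifecycle above amounts to a small state change on project.json. The sketch below is illustrative only: the field names (`status`, `clips`, `prompt`, `workflow`) and both helper functions are assumptions made for this example, not Montaj's actual schema or API.

```python
import json
import tempfile
from pathlib import Path


def create_project(path, clips, prompt, workflow="overlays"):
    """Write a hypothetical project.json in the [pending] state."""
    project = {
        "status": "pending",  # assumed field name for the [pending] marker
        "clips": [str(c) for c in clips],
        "prompt": prompt,
        "workflow": workflow,
    }
    Path(path).write_text(json.dumps(project, indent=2))
    return project


def mark_draft(path):
    """Flip the project to [draft], as the agent does once the edit is complete."""
    project = json.loads(Path(path).read_text())
    project["status"] = "draft"
    Path(path).write_text(json.dumps(project, indent=2))
    return project


# Use a throwaway directory so the sketch never touches a real workspace.
workdir = Path(tempfile.mkdtemp())
project_path = workdir / "project.json"

pending = create_project(
    project_path,
    ["clips/a.mp4", "clips/b.mp4"],
    "tight cuts, remove filler, 9:16",
)
draft = mark_draft(project_path)
print(pending["status"], "->", draft["status"])
```

Keeping the state in a single file means the CLI, the agent, and the UI can all agree on progress by reading the same document.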

With a Named Workflow

montaj run ./clips --workflow tight-reel --prompt "tight cuts, upbeat pacing"

Animation Project (No Source Footage)

montaj run --workflow animations --prompt "60s animated explainer, dark theme"

With UI — Watch the Agent Work Live

montaj serve

Opens http://localhost:3000 with the browser UI. From here you can:

  1. Upload — drag and drop clips, write your editing prompt, select a workflow
  2. Live View — watch the agent build the edit in real time via SSE
  3. Review — adjust the timeline, captions, and overlays when the agent finishes
  4. Render — trigger the final render to MP4
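
The Live View relies on SSE (server-sent events), a plain-text stream in which each event is a group of `event:` and `data:` lines terminated by a blank line. As a rough sketch of what a client does with such a stream, the parser below handles the core of that wire format. The event name `step` and the JSON payload shape are invented for illustration; the events Montaj actually emits are not documented here.

```python
import json


def parse_sse(stream_text):
    """Split a text/event-stream payload into a list of {event, data} dicts."""
    events = []
    event_type, data_lines = "message", []
    for line in stream_text.splitlines():
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_lines:
                events.append({"event": event_type, "data": "\n".join(data_lines)})
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The format allows one optional space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
    return events


# Hypothetical stream: the "step" event and its fields are assumptions.
sample = (
    "event: step\n"
    'data: {"step": "transcribe", "state": "running"}\n'
    "\n"
    ": keep-alive\n"
    "event: step\n"
    'data: {"step": "transcribe", "state": "done"}\n'
    "\n"
)

events = parse_sse(sample)
for ev in events:
    print(ev["event"], json.loads(ev["data"]))
```

In a browser the built-in EventSource API does this parsing for you; the sketch only shows why the UI can update as each step finishes rather than polling.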

Render

After the agent finishes (or after your review), render to a final MP4:

montaj render

With explicit paths:

montaj render --project ./workspace/project.json --out ./output/final.mp4

Clean up intermediate files after compositing:

montaj render --clean

What Happens Under the Hood

montaj run ./clips --prompt "tight cuts"

    ├── Creates project.json [pending]
    ├── Reads workflows/overlays.json (default)
    │
    │   Agent takes over:
    ├── probe + snapshot → understand the clips
    ├── waveform_trim → detect and remove silence
    ├── transcribe → word-level transcript
    ├── rm_fillers → remove um/uh/hmm
    ├── select_takes → pick best takes from repeats
    ├── concat → join clips in a single encode pass
    ├── caption → animated caption track
    ├── overlays → custom JSX title cards, lower thirds
    └── resize → 9:16 for shorts/reels

Every step writes its result to stdout, and steps chain: the output of one becomes the input of the next. No intermediate video files are created until concat. Until then, every editing step works on trim specs (timestamp ranges), and those specs are applied in a single encode pass.
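
To make the trim-spec idea concrete, here is a minimal sketch of how two editing steps could chain without touching any video. It assumes a trim spec is simply a list of `(start, end)` pairs in seconds; that representation, the function name `subtract_ranges`, and the sample timestamps are all hypothetical and not taken from Montaj's code.

```python
def subtract_ranges(keep, remove):
    """Cut every `remove` range out of the `keep` ranges.

    Both arguments are lists of (start, end) tuples in seconds.
    """
    result = []
    for start, end in keep:
        cursor = start
        for r_start, r_end in sorted(remove):
            if r_end <= cursor or r_start >= end:
                continue  # no overlap with what is left of this segment
            if r_start > cursor:
                result.append((cursor, r_start))
            cursor = max(cursor, r_end)
        if cursor < end:
            result.append((cursor, end))
    return result


# Step 1 (stand-in for silence trimming): the ranges worth keeping.
after_silence = [(0.5, 4.0), (5.0, 9.5)]

# Step 2 (stand-in for filler removal): ranges where "um"/"uh" was spoken.
fillers = [(1.2, 1.5), (6.0, 6.4)]

# Chaining: the output of step 1 is the input of step 2. Still no video I/O.
final_spec = subtract_ranges(after_silence, fillers)
print(final_spec)
```

Only the final list of ranges needs to reach the encoder, which is why the footage can be decoded and encoded once instead of once per step.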