Montaj

Montaj is an open-source video and carousel editing toolkit for AI agents. It is CLI-first, agent-native, and runs entirely on your local machine.

What Montaj Is

Montaj is a CLIP: a CLI Program for agents. It clips onto your existing AI agent (Claude Code, Cursor, or any harness) and gives it specialized video-editing tools. Built-in steps cover the full editing pipeline. The agent decides what to run, in what order, and with what parameters.

The fundamental dependency is an agent. Montaj does not edit on its own. It provides the tools; the agent makes the creative decisions.

Who It's For

AI agent developers who want to give their agents video editing capabilities
Content creators who use AI coding assistants and want to automate post-production
Developers building video processing pipelines with agent orchestration

How It Works

1. Upload clips (or assets) + write an editing prompt
2. montaj creates project.json [pending]
3. Agent picks it up, reads the workflow, calls steps as tools
4. Agent writes project.json as it works; the UI updates live via SSE
5. Agent marks project [draft]
6. Human reviews in the browser (optional), makes tweaks, and marks it [final]
7. Render engine produces the final MP4 (or N PNG slides for carousels)
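The status markers in the flow above ([pending], [draft], [final]) behave like a small state machine. The sketch below assumes project.json carries a top-level "status" field and that a draft can be revised repeatedly before being marked final; the field name, the transition table, and the render gate are all hypothetical illustrations, not Montaj's actual schema or code.

```python
from enum import Enum


class Status(Enum):
    """Project states named in the flow (hypothetical encoding)."""
    PENDING = "pending"  # created by montaj from the upload + prompt
    DRAFT = "draft"      # agent has finished its editing pass
    FINAL = "final"      # signed off after (optional) human review


# Assumed transitions: draft -> draft models repeated tweaks during review.
TRANSITIONS = {
    Status.PENDING: {Status.DRAFT},
    Status.DRAFT: {Status.DRAFT, Status.FINAL},
    Status.FINAL: set(),
}


def advance(project: dict, new_status: Status) -> dict:
    """Return a copy of the project with its status moved to new_status."""
    current = Status(project["status"])  # hypothetical "status" field
    if new_status not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value} to {new_status.value}")
    return {**project, "status": new_status.value}


def can_render(project: dict) -> bool:
    """Assumption: the render engine only runs on a final project."""
    return Status(project["status"]) is Status.FINAL
```

Keeping transitions in a table rather than scattered if-statements makes it easy to see, and test, which moves are legal.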

The same flow shape covers all four project types — editing, music_video, ai_video, and carousel. What changes is what the agent does in step 3 and what the renderer produces in step 7.
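Since only step 7's output differs by project type, the choice can be expressed as one lookup. This is a hypothetical sketch: the four type names come from the text above, while the RenderPlan type and the output filenames are invented for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

PROJECT_TYPES = ("editing", "music_video", "ai_video", "carousel")


@dataclass(frozen=True)
class RenderPlan:
    kind: str                  # "mp4" or "png"
    filenames: Tuple[str, ...]  # hypothetical names the renderer would write


def plan_render(project_type: str, slide_count: int = 0) -> RenderPlan:
    if project_type not in PROJECT_TYPES:
        raise ValueError(f"unknown project type: {project_type}")
    if project_type == "carousel":
        # One PNG per slide; zero-padded numbers keep slides in order when sorted.
        names = tuple(f"slide_{i:02d}.png" for i in range(1, slide_count + 1))
        return RenderPlan("png", names)
    # editing, music_video, and ai_video all end in a single video file.
    return RenderPlan("mp4", ("final.mp4",))
```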

What's Inside

steps/              Step executables + JSON schemas (probe, trim, transcribe, generate, etc.)
workflows/          Editing plans (clean_cut, overlays, ai_video, lyrics_video, etc.)
skills/             Agent skill contracts (onboarding, edit-session, ai-video-plan, ai-video-generate, etc.)
connectors/         API connectors (Kling, Gemini, OpenAI)

render/             React + Puppeteer + ffmpeg render engine
serve/              Local HTTP + SSE server (montaj serve)
ui/                 Browser UI (Vite + React + Tailwind)
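serve/ is described as a local HTTP + SSE server, and the flow above has the UI update live as the agent writes project.json. A minimal sketch of the Server-Sent Events side might frame each changed snapshot as a text/event-stream message. The "project" event name and the snapshot-polling approach are assumptions, not how `montaj serve` is actually implemented.

```python
import json
from typing import Iterable, Iterator, Optional


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Frame one Server-Sent Events message (text/event-stream format)."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps escapes newlines inside strings, so the payload is one line.
    lines.append(f"data: {json.dumps(data)}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


def project_updates(snapshots: Iterable[dict]) -> Iterator[str]:
    """Yield an SSE frame only when the project.json content changes."""
    last = None
    for snapshot in snapshots:
        if snapshot != last:
            yield format_sse(snapshot, event="project")  # hypothetical event name
            last = snapshot
```

Suppressing unchanged snapshots keeps the browser from re-rendering when the agent rewrites identical content.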

Key Principles

Agent-native interface — CLI, HTTP, and MCP; steps are callable from any harness without writing code
Editing existing footage — trim, cut, transcribe, composite against source clips
AI video generation — storyboard-driven scene generation via Kling, with music and voiceover
Animation generation — agent can generate React overlay components rendered frame-by-frame via headless Chrome
Image carousels — slide-based design surface for Instagram and TikTok photo posts; renders to N PNGs with a download-as-zip flow
Local-first — ffmpeg + whisper.cpp, no external APIs required for core editing (just an agent)
Open source — MIT, self-hosted, no vendor lock-in
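For carousels, the render produces N PNGs and offers them as a single zip download. The following is a standard-library sketch of that bundling step; the function name and in-memory approach are illustrative assumptions, not Montaj's code.

```python
import io
import zipfile
from pathlib import Path


def zip_slides(slide_paths) -> bytes:
    """Bundle rendered PNG slides into an in-memory zip for download."""
    buf = io.BytesIO()
    # PNG data is already compressed, so store entries instead of deflating.
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for path in sorted(Path(p) for p in slide_paths):
            # Sorting by path keeps slide order stable inside the archive.
            zf.write(path, arcname=path.name)
    return buf.getvalue()
```

Returning bytes lets an HTTP handler send the archive directly without writing a temporary file.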

Agent Agnostic

Montaj exposes three interfaces for agents to call steps: CLI, MCP, and HTTP API. None is mandatory; the agent uses whichever it has access to. All three wrap the same underlying executables.
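Because all three interfaces wrap the same executables, each adapter only has to translate its input into one shared call. This sketch uses plain Python functions in place of Montaj's step executables; the registry, the `trim` parameters, and the CLI and HTTP payload shapes are hypothetical. The MCP adapter relies only on the fact that MCP `tools/call` requests carry a tool `name` and an `arguments` object.

```python
import argparse
import json
from typing import Callable, Dict

# Hypothetical registry: step name -> callable taking keyword parameters.
STEPS: Dict[str, Callable[..., dict]] = {}


def step(name: str):
    def register(fn):
        STEPS[name] = fn
        return fn
    return register


@step("trim")
def trim(path: str, start: float, end: float) -> dict:
    # Placeholder: a real step would invoke ffmpeg on the source clip.
    return {"output": f"{path}.trimmed", "duration": end - start}


def run_step(name: str, params: dict) -> dict:
    """The single code path every interface funnels into."""
    if name not in STEPS:
        raise KeyError(f"unknown step: {name}")
    return STEPS[name](**params)


def from_cli(argv) -> dict:
    # Hypothetical shape: <step> '<JSON params>'
    parser = argparse.ArgumentParser()
    parser.add_argument("step")
    parser.add_argument("params", help="JSON object of step parameters")
    args = parser.parse_args(argv)
    return run_step(args.step, json.loads(args.params))


def from_http(body: bytes) -> dict:
    # Hypothetical POST body: {"step": ..., "params": {...}}
    payload = json.loads(body)
    return run_step(payload["step"], payload.get("params", {}))


def from_mcp(call_params: dict) -> dict:
    # MCP tools/call params carry {"name": ..., "arguments": {...}}.
    return run_step(call_params["name"], call_params.get("arguments", {}))
```

Keeping the adapters thin means a new step becomes reachable from every interface as soon as it is registered.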
