Agent Integration
How agents use Montaj — MCP, HTTP API, CLI usage, and writing effective editing prompts.
Montaj exposes three interfaces for agents to call steps — CLI, MCP, and HTTP API. All are optional. All wrap the same underlying executables. The agent uses whichever it has access to.
MCP Server
Montaj runs as a local MCP (Model Context Protocol) server via montaj mcp. The MCP client (Claude Desktop, Claude Code) starts the server automatically. The agent calls steps as native tools, so no shell access is required.
How It Works
Claude Desktop opens
→ spawns: montaj mcp
→ montaj mcp reads steps/*.json, registers each as an MCP tool
→ agent calls: trim({input: "clip.mp4", start: 2.5, end: 8.3})
→ montaj mcp invokes the CLI executable, returns result
Session ends → process dies
Configuration
Claude Desktop
Add to claude_desktop_config.json:
{
"mcpServers": {
"montaj": { "command": "montaj", "args": ["mcp"] }
}
}
Claude Code
Claude Code can use Montaj via MCP or directly through CLI access. With CLI access, the agent runs montaj commands directly — no MCP configuration needed.
Automatic Step Discovery
New steps are picked up automatically. Adding steps/my-step.py + steps/my-step.json makes it available as an MCP tool with no extra configuration.
The MCP server reads step schemas from:
- Built-in steps (montaj/steps/)
- User-global steps (~/.montaj/steps/)
- Project-local steps (./steps/)
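The lookup across these three locations could be sketched as follows. This is a minimal illustration, not Montaj's actual loader: the function name discover_steps, the fallback to the file stem as the step name, and the rule that a later directory overrides an earlier one are all assumptions.

```python
import json
from pathlib import Path


def discover_steps(step_dirs):
    """Collect step schemas from each directory, keyed by step name.

    Assumption: directories are scanned in order and a later directory
    overrides an earlier one on a name clash. The source does not state
    Montaj's actual precedence.
    """
    steps = {}
    for directory in step_dirs:
        directory = Path(directory).expanduser()
        if not directory.is_dir():
            continue  # a missing location is simply skipped
        for schema_path in sorted(directory.glob("*.json")):
            with schema_path.open(encoding="utf-8") as handle:
                schema = json.load(handle)
            # Assumption: fall back to the file stem if "name" is absent.
            steps[schema.get("name", schema_path.stem)] = schema
    return steps
```

A call such as discover_steps(["montaj/steps", "~/.montaj/steps", "./steps"]) would mirror the three locations listed above.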
Tool Registration
Each step's JSON schema is used to register an MCP tool. The schema defines:
- name: the tool name (e.g., trim, transcribe)
- description: what the step does (agent-readable)
- params: input parameters with types and descriptions
The agent sees these as native callable tools and can invoke them directly without constructing shell commands.
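As a rough sketch of that mapping, the function below turns a step schema into a tool definition with the name, description, and inputSchema fields that MCP tools carry. The layout of params (a dict of parameter name to type, description, and an optional required flag) and the sample trim schema are assumptions for illustration. The real step schema may be structured differently.

```python
def step_to_mcp_tool(step_schema):
    """Translate a step schema into an MCP-style tool definition.

    Assumption: "params" maps each parameter name to a dict with "type",
    "description", and an optional "required" flag. The real step schema
    layout may differ.
    """
    properties = {}
    required = []
    for param_name, spec in step_schema.get("params", {}).items():
        properties[param_name] = {
            "type": spec.get("type", "string"),
            "description": spec.get("description", ""),
        }
        if spec.get("required", False):
            required.append(param_name)
    return {
        "name": step_schema["name"],
        "description": step_schema.get("description", ""),
        "inputSchema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


# Hypothetical schema, written only to exercise the function.
trim_schema = {
    "name": "trim",
    "description": "Trim a clip to a time range",
    "params": {
        "input": {"type": "string", "description": "Source clip path", "required": True},
        "start": {"type": "number", "description": "Start time in seconds"},
        "end": {"type": "number", "description": "End time in seconds"},
    },
}
tool = step_to_mcp_tool(trim_schema)
```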
Output
The MCP server invokes the CLI executable under the hood and returns stdout as the tool result. Same output convention as CLI — result path or JSON on success, JSON error on failure.
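A caller can tell the three outcomes apart with a small helper like the one below. It is a sketch under one stated assumption: that a failure is a JSON object containing an "error" key. The source says only that errors are JSON, not which keys they carry.

```python
import json


def parse_step_output(stdout):
    """Classify a step's stdout as a path, a JSON result, or an error.

    Assumption: a failure is a JSON object with an "error" key. The source
    only says failures are reported as JSON, not which keys they carry.
    """
    text = stdout.strip()
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        # Not JSON, so treat it as the path of the produced file.
        return {"kind": "path", "value": text}
    if isinstance(payload, dict) and "error" in payload:
        return {"kind": "error", "value": payload}
    return {"kind": "json", "value": payload}
```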
HTTP API
montaj serve exposes a step execution API alongside the browser UI. Any HTTP-capable agent can call steps via POST — no shell access or MCP required. The UI uses the same API.
Start the Server
montaj serve # http://localhost:3000
montaj serve --network # bind to all interfaces (trusted networks only)
Step Execution
POST /api/steps/trim body: { "input": "clip.mp4", "start": 2.5, "end": 8.3 }
POST /api/steps/transcribe body: { "input": "clip.mp4", "model": "medium.en" }
GET /api/steps returns: list of available steps with schemas
The server invokes the CLI executable and returns stdout as the response body. Same output convention as CLI.
Project Management
POST /api/run # create project: clips + prompt + workflow
GET /api/projects # list all projects
GET /api/projects?status=pending # agent polls for pending work
PUT /api/projects/{id} # update project.json
SSE Streaming
GET /api/projects/:id/stream # SSE stream of project.json changes
The server watches the filesystem for project.json writes. Every change the agent makes is pushed to connected browsers via Server-Sent Events.
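For a non-browser client that wants to consume this stream, the standard SSE framing can be parsed with a few lines of code. The sketch below works on any iterable of text lines. It assumes each event's data is the updated project.json as JSON text, which the source does not confirm, and the sample events are invented for illustration.

```python
import json


def iter_sse_data(lines):
    """Yield the data payload of each event in a Server-Sent Events stream.

    Follows the standard SSE framing: "data:" lines accumulate, a blank
    line ends the event, and lines starting with ":" are comments.
    """
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data_lines:
                yield "\n".join(data_lines)
                data_lines = []
        elif line.startswith(":"):
            continue  # comment or keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one leading space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.


# Assumption: each event carries the updated project.json as JSON text.
sample = [
    'data: {"status": "pending"}\n',
    "\n",
    ": keep-alive\n",
    'data: {"status": "draft"}\n',
    "\n",
]
statuses = [json.loads(event)["status"] for event in iter_sse_data(sample)]
```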
How the Agent Loop Works
Agent (Claude, Cursor, etc.)
  ├── GET /api/projects?status=pending ← polls for pending work
  ├── reads project.json from workspace directly
  ├── makes editorial decisions
  └── writes project.json [draft] to disk
        │
        └── file watcher detects change
              │
              └── SSE push → browser rerenders
The agent writes directly to disk. montaj serve watches. Every write immediately pushes to the browser.
The agent polls serve — serve does not notify the agent. This is a pull model, not push.
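The pull model amounts to a plain polling loop on the agent side. The sketch below keeps the HTTP call behind an injected fetch_pending callable so the loop itself stays testable. The function names, the 5-second interval, and the assumption that the endpoint returns a JSON list of projects are all illustrative.

```python
import time


def poll_pending(fetch_pending, handle_project, interval=5.0, max_polls=None, sleep=time.sleep):
    """Pull pending projects on a fixed interval and hand each to a callback.

    fetch_pending: callable returning a list of pending projects, e.g. a
        wrapper around GET /api/projects?status=pending.
    handle_project: callable that does the editing work for one project.
    Assumption: the endpoint returns a JSON list; its real shape is not
    documented in the source.
    """
    polls = 0
    handled = 0
    # max_polls=None means poll forever; a number bounds the loop.
    while max_polls is None or polls < max_polls:
        for project in fetch_pending():
            handle_project(project)
            handled += 1
        polls += 1
        sleep(interval)
    return handled
```

Because serve never calls back into the agent, the interval sets the maximum delay between a project being created and the agent noticing it.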
API Route Namespacing
All API routes are namespaced under /api/ so they never collide with React Router paths. The SPA catch-all at /{path} serves index.html for everything else — no Accept-header heuristics needed.
Using from External Agents
Any agent with HTTP access can use the API:
import requests
# Run a step
response = requests.post("http://localhost:3000/api/steps/probe", json={
"input": "/path/to/clip.mp4"
})
metadata = response.json()
# List available steps
steps = requests.get("http://localhost:3000/api/steps").json()
CLI Usage from Agents
The simplest integration — agents with shell access run montaj commands directly. This works with Claude Code, Cursor, or any framework that can execute shell commands.
The Core Loop
1. The user provides clips, a prompt, and a preferred workflow
2. Agent reads the workflow from workflows/{name}.json
3. Agent applies editorial judgment (select/order/trim clips via probe + transcribe)
4. Agent executes workflow steps following the dependency graph
5. Agent writes/updates project.json in the project directory as it works
6. Agent probes the final output → sets inPoint: 0, outPoint: <duration>
7. Agent marks the project as draft (status: "draft") when complete
Running Steps
montaj probe clip.mp4
montaj snapshot clip.mp4
montaj trim clip.mp4 --start 2.5 --end 8.3
montaj cut clip.mp4 --start 3.0 --end 7.5
montaj cut clip.mp4 --cuts '[[0,1.2],[5.3,7.8]]'
montaj materialize-cut clip.mp4 --inpoint 2.0 --outpoint 8.0
montaj materialize-cut spec.json
montaj waveform-trim clip.mp4 --threshold -30 --min-silence 0.3
montaj rm-nonspeech clip_spec.json --model base
montaj transcribe clip.mp4 --model base.en
montaj caption clip.mp4 --style word-by-word
montaj crop-spec --input spec.json --keep 8.5:14.8 --keep 40.0:end
montaj normalize clip.mp4 --target youtube
montaj resize clip.mp4 --ratio 9:16
To see all available steps, including project-local custom steps:
montaj step -h
Step Chaining
Steps are composable at the shell level — stdout of one step is the --input of the next:
FILE=$(montaj step rm_fillers --input clip.mp4 --model base.en)
FILE=$(montaj step trim --input "$FILE" --start 5 --end 90)
FILE=$(montaj step resize --input "$FILE" --ratio 9:16)
# $FILE is the final output path
JSON Output
Use --json on any command for machine-readable output:
montaj probe clip.mp4 --json
# → {"duration": 12.5, "resolution": [1920, 1080], ...}
Detecting the Interface
Agents should auto-detect which interface to use:
- Try GET http://localhost:3000/api/projects?status=pending
- If it responds → HTTP mode (montaj serve is running)
- If connection refused → CLI mode
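The check above can be sketched with the standard library alone. The function name detect_interface, the 2-second timeout, and the choice to treat a timeout the same as a refused connection are assumptions. Any HTTP response, including an error status, is counted as "it responds".

```python
import urllib.error
import urllib.request


def detect_interface(base_url="http://localhost:3000", opener=urllib.request.urlopen, timeout=2.0):
    """Return "http" if montaj serve answers, otherwise "cli".

    Assumption: a timeout or any other connection failure is treated the
    same as a refused connection.
    """
    url = f"{base_url}/api/projects?status=pending"
    try:
        response = opener(url, timeout=timeout)
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status.
        return "http"
    except OSError:
        # URLError subclasses OSError; this covers connection refused.
        return "cli"
    close = getattr(response, "close", None)
    if close is not None:
        close()
    return "http"
```

The opener parameter exists only so the check can be exercised without a running server. In normal use it is left at its default.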
Skills System
Montaj includes skill files that provide detailed task instructions for agents. The main skill file is SKILL.md at the repository root. Sub-skills provide specialized guidance:
| Skill | When to load |
|---|---|
| serve | HTTP mode detected |
| parallel | Multiple clips or workflow has foreach steps |
| mcp | Running as MCP client |
| select-takes | Executing montaj/select_takes in a workflow |
| overlay | Executing montaj/overlay in a workflow |
| write-overlay | Writing custom JSX overlay components |
Writing the Editing Prompt
The editing prompt is the primary way you communicate your creative vision to the agent. The prompt goes into project.json as the editingPrompt field and guides every decision the agent makes.
How Prompts Work
The agent reads:
- The workflow — the suggested step sequence and default params
- The editing prompt — your creative direction
- The source clips, inspected via probe and snapshot to understand the content
The prompt modifies how the agent follows the workflow. It can cause the agent to skip steps, add steps, or change parameters.
Example Prompts
Tight Social Cut
tight cuts, remove filler, 9:16 for Reels
This tells the agent to aggressively trim silence and filler words, and resize for vertical video.
Minimal Editing
keep it raw, minimal cuts, 16:9
This causes the agent to skip rm_fillers and waveform_trim, preserving the natural flow of speech.
Captioned Short
clean edits, word-by-word captions, upbeat pacing, 9:16
The agent will run the full clean pipeline plus captions with the word-by-word style.
Explainer with Animation
build an explainer video: clear transitions between sections, add title cards for each topic, dark theme with cyan accents
The agent generates custom JSX overlays for title cards and transitions.
Deviation Rules
The agent follows the assigned workflow and deviates only when the prompt explicitly requires it:
| Prompt says | Agent action |
|---|---|
| "no captions" | Skip the caption step |
| "keep it raw" | Skip rm_fillers, waveform_trim |
| "YouTube format" | resize --ratio 16:9 instead of 9:16 |
| "no overlays" | Skip the overlay step |
Prompt + Workflow Interaction
The workflow provides the default plan. The prompt overrides it. Examples:
- Workflow: short_captions (includes caption + resize 9:16)
- Prompt: "no captions, 16:9" → agent skips caption step and resizes to 16:9
The agent does not invent a step sequence from scratch — it follows the assigned workflow and adapts based on the prompt.
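The "workflow as default, prompt as override" relationship can be pictured as a filter over the workflow's steps. The sketch below is only a toy model: a real agent interprets the prompt with judgment, not substring matching, and the workflow representation used here (a list of dicts with "step" and "params" keys) is a hypothetical stand-in for the actual workflow JSON.

```python
def apply_prompt_overrides(workflow_steps, prompt):
    """Return a copy of the workflow's steps adjusted by the editing prompt.

    Assumption: a workflow is a list of {"step": name, "params": {...}}
    dicts. Plain substring matching stands in for the agent's judgment.
    """
    text = prompt.lower()
    skip = set()
    if "no captions" in text:
        skip.add("caption")
    if "keep it raw" in text:
        skip.update({"rm_fillers", "waveform_trim"})
    if "no overlays" in text:
        skip.add("overlay")

    adjusted = []
    for entry in workflow_steps:
        if entry["step"] in skip:
            continue  # the prompt explicitly rules this step out
        params = dict(entry.get("params", {}))
        if entry["step"] == "resize" and ("16:9" in text or "youtube format" in text):
            params["ratio"] = "16:9"
        adjusted.append({"step": entry["step"], "params": params})
    return adjusted


# Hypothetical rendering of the short_captions workflow from the example above.
short_captions = [
    {"step": "caption", "params": {"style": "word-by-word"}},
    {"step": "resize", "params": {"ratio": "9:16"}},
]
plan = apply_prompt_overrides(short_captions, "no captions, 16:9")
```

Steps the prompt does not mention pass through unchanged, which matches the rule that the agent deviates only when the prompt explicitly requires it.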
Style Profiles
When a style profile is loaded, the agent uses it to inform editorial decisions — pacing, color palette, caption style, and editorial direction. Profiles are created via the style-profile skill and stored in ~/.montaj/profiles/.
Tips
- Be specific about aspect ratio (9:16, 16:9, 1:1)
- Mention caption style if you want captions (word-by-word, karaoke, pop, subtitle)
- Reference the platform (for Reels, for YouTube) to guide format decisions
- Use "tight cuts" or "keep it raw" to control editing aggressiveness
- Describe overlay style if you want title cards or lower thirds