How agents use Montaj — MCP, HTTP API, CLI usage, skills, and writing effective editing prompts.

Agent Integration

Montaj exposes three interfaces for agents to call steps: CLI, MCP, and HTTP API. Each is optional and all three wrap the same underlying executables, so the agent uses whichever one it has access to.

MCP Server

Montaj runs as a local MCP (Model Context Protocol) server via montaj mcp. The MCP client (Claude Desktop, Claude Code) starts the server automatically. The agent calls steps as native tools, so no shell access is required.

How It Works

Claude Desktop opens
  → spawns: montaj mcp
  → montaj mcp reads steps/*.json, registers each as an MCP tool
  → agent calls: trim({input: "clip.mp4", start: 2.5, end: 8.3})
  → montaj mcp invokes the CLI executable, returns result
Session ends → process dies

Configuration

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "montaj": { "command": "montaj", "args": ["mcp"] }
  }
}

Claude Code

Claude Code can use Montaj via MCP or directly through CLI access. With CLI access, the agent runs montaj commands directly — no MCP configuration needed.

Automatic Step Discovery

New steps are picked up automatically. Adding steps/my-step.py and steps/my-step.json makes my-step available as an MCP tool with no extra configuration.

The MCP server reads step schemas from:

Built-in steps (montaj/steps/)
User-global steps (~/.montaj/steps/)
Project-local steps (./steps/)
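
The lookup across these three locations might look roughly like the sketch below. It is an illustration, not Montaj's implementation: the function name discover_steps is hypothetical, and the rule that a later directory overrides an earlier one on a name clash is an assumption that the text above does not confirm.

```python
from pathlib import Path


def discover_steps(step_dirs):
    """Map step name -> schema path across several step directories.

    Assumption (not confirmed by the docs): directories are scanned in
    order and a later directory overrides an earlier one on a name clash.
    """
    found = {}
    for directory in step_dirs:
        directory = Path(directory).expanduser()
        if not directory.is_dir():
            continue  # a missing directory is simply skipped
        for schema in sorted(directory.glob("*.json")):
            found[schema.stem] = schema
    return found
```

A call such as discover_steps(["montaj/steps", "~/.montaj/steps", "./steps"]) would then return one entry per step name.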

Tool Registration

Each step's JSON schema is used to register an MCP tool. The schema defines:

name — the tool name (e.g., trim, transcribe)
description — what the step does (agent-readable)
params — input parameters with types and descriptions

The agent sees these as native callable tools and can invoke them directly without constructing shell commands.
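
As a rough sketch of that registration, the function below converts a step schema into an MCP-style tool descriptor (MCP tools carry a name, a description, and a JSON Schema under inputSchema). The layout of params used here, a mapping from parameter name to a dict with type, description, and an optional required flag, is an assumption for illustration. The real step schema may be shaped differently.

```python
def schema_to_tool(step_schema):
    """Convert a step schema into an MCP-style tool descriptor.

    Assumption: "params" maps each parameter name to a dict with "type",
    "description", and an optional "required" flag. The real step schema
    layout may differ.
    """
    properties = {}
    required = []
    for name, spec in step_schema.get("params", {}).items():
        properties[name] = {
            "type": spec.get("type", "string"),
            "description": spec.get("description", ""),
        }
        if spec.get("required", False):
            required.append(name)
    return {
        "name": step_schema["name"],
        "description": step_schema.get("description", ""),
        "inputSchema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }
```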

Output

The MCP server invokes the CLI executable under the hood and returns stdout as the tool result. Same output convention as CLI — result path or JSON on success, JSON error on failure.
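
A caller could interpret that convention along the lines of the sketch below. Two points are assumptions rather than documented behavior: that failure is signalled by a non-zero exit code, and that any stdout which is not a JSON object or array is a plain result path. The names parse_step_output and StepError are hypothetical.

```python
import json


class StepError(Exception):
    """Raised when a step reports failure."""


def parse_step_output(stdout, returncode=0):
    """Interpret a step's stdout: result path or JSON, or a JSON error.

    Assumptions: failure is signalled by a non-zero exit code, and any
    stdout that is not a JSON object or array is a plain result path.
    """
    text = stdout.strip()
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        payload = None
    if not isinstance(payload, (dict, list)):
        payload = None  # treat bare scalars as plain text, not JSON results
    if returncode != 0:
        raise StepError(payload if payload is not None else text)
    return payload if payload is not None else text
```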

HTTP API

montaj serve exposes a step execution API alongside the browser UI. Any HTTP-capable agent can call steps via POST — no shell access or MCP required. The UI uses the same API.

Start the Server

montaj serve                # http://localhost:3000
montaj serve --network      # bind to all interfaces (trusted networks only)

Step Execution

POST /api/steps/trim        body: { "input": "clip.mp4", "start": 2.5, "end": 8.3 }
POST /api/steps/transcribe  body: { "input": "clip.mp4", "model": "medium.en" }
GET  /api/steps             returns: list of available steps with schemas

The server invokes the CLI executable and returns stdout as the response body. Same output convention as CLI.

Project Management

POST /api/run               # create project: clips + prompt + workflow
GET  /api/projects          # list all projects
GET  /api/projects?status=pending  # agent polls for pending work
PUT  /api/projects/{id}     # update project.json

SSE Streaming

GET /api/projects/{id}/stream   # SSE stream of project.json changes

The server watches the filesystem for project.json writes. Every change the agent makes pushes to connected browsers via Server-Sent Events.
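
A client other than the browser could consume this stream with a small parser. The sketch below handles only the basic Server-Sent Events framing, where data: lines accumulate and a blank line ends the event. The payload format of Montaj's events is not specified here, so the function returns raw strings and leaves decoding to the caller. The name parse_sse is hypothetical.

```python
def parse_sse(lines):
    """Yield the data payload of each event from an iterable of SSE lines.

    Follows the basic Server-Sent Events framing: "data:" lines accumulate
    and a blank line ends the event. Other fields (event, id, retry) and
    comment lines are ignored in this sketch.
    """
    data = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield "\n".join(data)
                data = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The spec strips one leading space after the colon, if present.
            data.append(value[1:] if value.startswith(" ") else value)
```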

How the Agent Loop Works

Agent (Claude, Cursor, etc.)
  ├── GET /api/projects?status=pending   ← polls for pending work
  ├── reads project.json from workspace directly
  ├── makes editorial decisions
  └── writes project.json [draft] to disk
              │
              └── file watcher detects change
                        │
                        └── SSE push → browser rerenders

The agent writes directly to disk. montaj serve watches. Every write immediately pushes to the browser.

The agent polls serve — serve does not notify the agent. This is a pull model, not push.
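
The pull model can be sketched as a simple polling loop. The helper below is hypothetical: fetch_pending stands for any caller-supplied function that issues GET /api/projects?status=pending and decodes the response, and treating that response as a list of projects is an assumption about its shape.

```python
import time


def wait_for_pending(fetch_pending, interval=2.0, max_polls=None, sleep=time.sleep):
    """Poll until fetch_pending() returns a non-empty list of projects.

    fetch_pending is a caller-supplied function, for example one that
    issues GET /api/projects?status=pending and decodes the response.
    Returns the pending projects, or [] if max_polls is exhausted.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        projects = fetch_pending()
        if projects:
            return projects
        polls += 1
        sleep(interval)  # serve never notifies the agent, so wait and retry
    return []
```

Passing sleep as a parameter keeps the loop easy to exercise without real delays.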

API Route Namespacing

All API routes are namespaced under /api/ so they never collide with React Router paths. The SPA catch-all at /{path} serves index.html for everything else — no Accept-header heuristics needed.

Using from External Agents

Any agent with HTTP access can use the API:

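import requests

BASE_URL = "http://localhost:3000"  # default address of montaj serve

# Run a step: POST the step's params as a JSON body
response = requests.post(
    f"{BASE_URL}/api/steps/probe",
    json={"input": "/path/to/clip.mp4"},
    timeout=60,
)
response.raise_for_status()  # raise on a non-2xx status
metadata = response.json()

# List available steps with their schemas
steps_response = requests.get(f"{BASE_URL}/api/steps", timeout=10)
steps_response.raise_for_status()
steps = steps_response.json()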

CLI Usage from Agents

The simplest integration — agents with shell access run montaj commands directly. This works with Claude Code, Cursor, or any framework that can execute shell commands.

The Core Loop

1. The user provides clips, a prompt, and a preferred workflow
2. Agent reads the workflow from workflows/{name}.json
3. Agent applies editorial judgment (select/order/trim clips via probe + transcribe)
4. Agent executes workflow steps following the dependency graph
5. Agent writes/updates project.json in the project directory as it works
6. Agent probes the final output → sets inPoint: 0, outPoint: <duration>
7. Agent marks project as draft (status: "draft") when complete
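
Steps 6 and 7 of the loop could be sketched as below. The helper finalize_project is hypothetical, and it assumes that inPoint, outPoint, and status are top-level keys of project.json. The real file may nest them elsewhere, so treat this only as an outline of the idea.

```python
import json
from pathlib import Path


def finalize_project(project_path, duration):
    """Apply steps 6 and 7 of the loop to a project.json file.

    Assumption: inPoint, outPoint, and status are top-level keys of
    project.json. The real file may nest them elsewhere.
    """
    path = Path(project_path)
    project = json.loads(path.read_text(encoding="utf-8"))
    project["inPoint"] = 0
    project["outPoint"] = duration  # duration as reported by probe
    project["status"] = "draft"
    path.write_text(json.dumps(project, indent=2), encoding="utf-8")
    return project
```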

Running Steps

montaj probe clip.mp4
montaj snapshot clip.mp4
montaj trim clip.mp4 --start 2.5 --end 8.3
montaj cut clip.mp4 --start 3.0 --end 7.5
montaj cut clip.mp4 --cuts '[[0,1.2],[5.3,7.8]]'
montaj materialize-cut clip.mp4 --inpoint 2.0 --outpoint 8.0
montaj materialize-cut spec.json
montaj waveform-trim clip.mp4 --threshold -30 --min-silence 0.3
montaj rm-nonspeech clip_spec.json --model base
montaj transcribe clip.mp4 --model base.en
montaj caption clip.mp4 --style word-by-word
montaj crop-spec --input spec.json --keep 8.5:14.8 --keep 40.0:end
montaj normalize clip.mp4 --target youtube
montaj resize clip.mp4 --ratio 9:16
montaj generate-music --prompt "upbeat electronic" --out music.wav
montaj generate-voiceover --text "Welcome" --out vo.wav

To see all available steps including project-local custom steps:

montaj step -h

Step Chaining

Steps are composable at the shell level — stdout of one step is the --input of the next:

FILE=$(montaj step rm_fillers --input clip.mp4 --model base.en)
FILE=$(montaj step trim --input "$FILE" --start 5 --end 90)
FILE=$(montaj step resize --input "$FILE" --ratio 9:16)
# $FILE is the final output path
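
An agent framework that drives the CLI from Python could express the same chain as follows. This is a sketch under assumptions: the helper names run_step and chain are hypothetical, each keyword is passed through as a --key value flag, and the montaj step NAME --input form is taken from the shell example above.

```python
import subprocess


def run_step(name, input_path, **params):
    """Run one step via the CLI and return its stdout (the result path)."""
    cmd = ["montaj", "step", name, "--input", input_path]
    for key, value in params.items():
        cmd += [f"--{key}", str(value)]
    # check=True raises CalledProcessError if the step exits non-zero.
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return done.stdout.strip()


def chain(input_path, steps, run=run_step):
    """Feed each step's output path into the next step's --input."""
    path = input_path
    for name, params in steps:
        path = run(name, path, **params)
    return path
```

For example, chain("clip.mp4", [("rm_fillers", {"model": "base.en"}), ("trim", {"start": 5, "end": 90}), ("resize", {"ratio": "9:16"})]) mirrors the three shell lines.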

JSON Output

Use --json on any command for machine-readable output:

montaj probe clip.mp4 --json
# → {"duration": 12.5, "resolution": [1920, 1080], ...}

Detecting the Interface

Agents should auto-detect which interface to use:

Try GET http://localhost:3000/api/projects?status=pending
If it responds → HTTP mode (montaj serve is running)
If connection refused → CLI mode
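
One way to code this check with only the standard library is sketched below. The function name detect_interface is hypothetical. Treating an HTTP error status as "the server responded" and treating a timeout like a refused connection are both interpretations of the rule above, not documented behavior.

```python
import urllib.error
import urllib.request

PENDING_URL = "http://localhost:3000/api/projects?status=pending"


def detect_interface(url=PENDING_URL, timeout=2.0, opener=urllib.request.urlopen):
    """Return "http" if montaj serve answers at url, otherwise "cli"."""
    try:
        with opener(url, timeout=timeout):
            return "http"
    except urllib.error.HTTPError:
        return "http"  # the server responded, even if with an error status
    except OSError:
        return "cli"  # connection refused, timeout, or host unreachable
```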

Skills System

Montaj includes skill files that provide detailed task instructions for agents. The main skill file is SKILL.md at the repository root. Skills are organized into two types:

Step skills are loaded automatically when the agent encounters a matching step in a workflow:

Skill	When to load
`ai-video-plan`	`montaj/ai-video-plan` step or `projectType: "ai_video"` — story clarification, storyboard writes, approval gate
`ai-video-generate`	After storyboard approval — scene generation, audio assembly, regenQueue processing
`eval-scenes`	After generating scenes — quality evaluation + retry loop
`select-takes`	Executing `montaj/select_takes` in a workflow
`overlay`	Executing `montaj/overlay` in a workflow
`animation-sections`	Building animation-only sections via JSX overlays
`lyrics-video`	Working on a `lyrics_video` workflow project
`carousel`	`montaj/carousel` step or `projectType: "carousel"` — slide schema, image generation, overlay authoring, render via `render-carousel.js`
`waveform-silence`	Visual silence detection via waveform images

Reference skills are loaded on demand for specialized guidance:

Skill	When to load
`serve`	HTTP mode detected
`parallel`	Multiple clips or workflow has `foreach` steps
`mcp`	Running as MCP client
`write-overlay`	Writing custom JSX overlay components
`camera-vocabulary`	Planning scenes — shot scale + camera move selection
`style-profile`	Creating or updating a creator style profile
`workflow-builder`	Creating or editing workflows
`onboarding`	Orientation for new agents/users
`edit-session`	Interactive editing reference

Skills can declare sub-skills — other skills they automatically load at runtime. For example, ai-video-plan loads camera-vocabulary when planning scenes, and ai-video-generate loads eval-scenes for quality evaluation.
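
Sub-skill loading can be pictured as a small graph walk. In the sketch below, the mapping from a skill to its declared sub-skills is a hypothetical representation (how skills actually declare sub-skills is not shown here), and following declarations transitively is an assumption.

```python
def resolve_skills(root, sub_skills):
    """Return root plus every skill it loads, directly or transitively.

    sub_skills maps a skill name to the skills it declares. Order is
    depth-first; each skill appears once even if declared repeatedly.
    """
    ordered = []
    seen = set()
    stack = [root]
    while stack:
        skill = stack.pop()
        if skill in seen:
            continue
        seen.add(skill)
        ordered.append(skill)
        # Reverse so declared order is preserved when popping from the stack.
        stack.extend(reversed(sub_skills.get(skill, [])))
    return ordered
```

With the two examples from the text, resolve_skills("ai-video-plan", {"ai-video-plan": ["camera-vocabulary"], "ai-video-generate": ["eval-scenes"]}) yields ai-video-plan followed by camera-vocabulary.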

Writing the Editing Prompt

The editing prompt is the primary way you communicate your creative vision to the agent. The prompt goes into project.json as the editingPrompt field and guides every decision the agent makes.

How Prompts Work

The agent reads:

The workflow — the suggested step sequence and default params
The editing prompt — your creative direction
The source clips — via probe and snapshot to understand the content

The prompt modifies how the agent follows the workflow. It can cause the agent to skip steps, add steps, or change parameters.

Example Prompts

tight cuts, remove filler, 9:16 for Reels

This tells the agent to aggressively trim silence and filler words, and resize for vertical video.

Minimal Editing

keep it raw, minimal cuts, 16:9

This causes the agent to skip rm_fillers and waveform_trim — preserving the natural flow of speech.

Captioned Short

clean edits, word-by-word captions, upbeat pacing, 9:16

The agent will run the full clean pipeline plus captions with the word-by-word style.

Explainer with Animation

build an explainer video: clear transitions between sections, add title cards for each topic, dark theme with cyan accents

The agent generates custom JSX overlays for title cards and transitions.

AI Video

a 30-second animated story about a dog exploring a magical forest, 16:9, cartoon style

The agent uses the ai_video workflow — writes a storyboard, generates reference images, waits for approval, then generates scenes via Kling.

Deviation Rules

The agent follows the assigned workflow and deviates only when the prompt explicitly requires it:

Prompt says	Agent action
"no captions"	Skip the `caption` step
"keep it raw"	Skip `rm_fillers`, `waveform_trim`
"YouTube format"	`resize --ratio 16:9` instead of 9:16
"no overlays"	Skip the `overlay` step

Prompt + Workflow Interaction

The workflow provides the default plan, and the prompt overrides it. For example:

Workflow: short_captions (includes caption + resize 9:16)
Prompt: "no captions, 16:9" → agent skips caption step and resizes to 16:9

The agent does not invent a step sequence from scratch — it follows the assigned workflow and adapts based on the prompt.
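
To make the interaction concrete, the sketch below applies the deviation rules to a workflow's step list. It is purely illustrative: in practice the agent uses judgment rather than substring matching, and both the function name apply_prompt and the (step_name, params) representation of a workflow are hypothetical.

```python
def apply_prompt(workflow_steps, prompt):
    """Adapt a workflow's step list using the deviation rules.

    workflow_steps is a list of (step_name, params) pairs. Plain substring
    matching stands in for the agent's judgment, purely for illustration.
    """
    text = prompt.lower()
    skip = set()
    if "no captions" in text:
        skip.add("caption")
    if "keep it raw" in text:
        skip.update({"rm_fillers", "waveform_trim"})
    if "no overlays" in text:
        skip.add("overlay")
    wants_16_9 = "16:9" in text or "youtube format" in text

    adapted = []
    for name, params in workflow_steps:
        if name in skip:
            continue
        if name == "resize" and wants_16_9:
            params = {**params, "ratio": "16:9"}  # copy; leave the workflow untouched
        adapted.append((name, params))
    return adapted
```

The workflow stays the starting point: steps are only dropped or re-parameterized, never invented.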

Style Profiles

When a style profile is loaded, the agent uses it to inform editorial decisions: pacing, color palette, caption style, and overall direction. Profiles are created via the style-profile skill and stored in ~/.montaj/profiles/.

Tips

Be specific about aspect ratio (9:16, 16:9, 1:1)
Mention caption style if you want captions (word-by-word, karaoke, pop, subtitle)
Reference the platform (for Reels, for YouTube) to guide format decisions
Use "tight cuts" or "keep it raw" to control editing aggressiveness
Describe overlay style if you want title cards or lower thirds
