
Agent Integration

How agents use Montaj — MCP, HTTP API, CLI usage, and writing effective editing prompts.

Montaj exposes three interfaces for agents to call steps: CLI, MCP, and HTTP API. Each is optional, and all three wrap the same underlying executables, so an agent uses whichever one it has access to.


MCP Server

Montaj runs as a local MCP (Model Context Protocol) server via montaj mcp. The MCP client (Claude Desktop, Claude Code) starts the server automatically. The agent calls steps as native tools, with no shell access required.

How It Works

Claude Desktop opens
  → spawns: montaj mcp
  → montaj mcp reads steps/*.json, registers each as an MCP tool
  → agent calls: trim({input: "clip.mp4", start: 2.5, end: 8.3})
  → montaj mcp invokes the CLI executable, returns result
Session ends → process dies

Configuration

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "montaj": { "command": "montaj", "args": ["mcp"] }
  }
}

Claude Code

Claude Code can use Montaj via MCP or directly through CLI access. With CLI access, the agent runs montaj commands directly — no MCP configuration needed.

Automatic Step Discovery

New steps are picked up automatically. Adding steps/my-step.py together with steps/my-step.json makes the step available as an MCP tool with no extra configuration.

The MCP server reads step schemas from:

  1. Built-in steps (montaj/steps/)
  2. User-global steps (~/.montaj/steps/)
  3. Project-local steps (./steps/)
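
The lookup across these three locations might be sketched as follows. This is an illustration only, not Montaj's implementation: the function name discover_steps is hypothetical, and the rule that a later directory overrides an earlier one on a name clash is an assumption the documentation does not state.

```python
import json
from pathlib import Path


def discover_steps(search_dirs):
    """Collect step schemas from each directory, in the order given.

    Assumption: a schema found in a later directory replaces one with the
    same name from an earlier directory.
    """
    steps = {}
    for directory in search_dirs:
        directory = Path(directory).expanduser()
        if not directory.is_dir():
            continue  # a missing location is skipped
        for schema_path in sorted(directory.glob("*.json")):
            schema = json.loads(schema_path.read_text(encoding="utf-8"))
            # Fall back to the file name when the schema has no "name" field.
            steps[schema.get("name", schema_path.stem)] = schema
    return steps
```

A caller would pass the three locations in the listed order, for example discover_steps(["montaj/steps", "~/.montaj/steps", "./steps"]).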

Tool Registration

Each step's JSON schema is used to register an MCP tool. The schema defines:

  • name — the tool name (e.g., trim, transcribe)
  • description — what the step does (agent-readable)
  • params — input parameters with types and descriptions

The agent sees these as native callable tools and can invoke them directly without constructing shell commands.
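
As a rough sketch of that mapping: an MCP tool definition carries a name, a description, and an inputSchema expressed as JSON Schema. The layout of params assumed here (a list of objects with name, type, description, and required) is hypothetical, since the step schema format is not spelled out on this page, and step_to_mcp_tool is an illustrative name.

```python
def step_to_mcp_tool(step):
    """Map a step schema to an MCP-style tool definition.

    Assumption: step["params"] is a list of dicts with "name", "type",
    "description", and an optional "required" flag.
    """
    properties = {}
    required = []
    for param in step.get("params", []):
        properties[param["name"]] = {
            "type": param.get("type", "string"),
            "description": param.get("description", ""),
        }
        if param.get("required", False):
            required.append(param["name"])
    return {
        "name": step["name"],
        "description": step.get("description", ""),
        "inputSchema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }
```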

Output

The MCP server invokes the CLI executable under the hood and returns stdout as the tool result. The output convention matches the CLI: a result path or JSON on success, a JSON error on failure.
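
A caller could interpret that convention along these lines. This is a minimal sketch under two assumptions not confirmed by this page: failure is signalled by a non-zero exit code, and JSON output always starts with an object or array. StepError and parse_step_output are hypothetical names.

```python
import json


class StepError(Exception):
    """Raised when a step reports failure; details holds the parsed error."""

    def __init__(self, details):
        super().__init__(str(details))
        self.details = details


def _parse_json_or_text(text):
    # Only treat output as JSON when it looks like an object or array,
    # so a plain result path is returned unchanged.
    if text.startswith(("{", "[")):
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            pass
    return text


def parse_step_output(stdout, returncode):
    """Return a result path (str) or parsed JSON; raise StepError on failure."""
    value = _parse_json_or_text(stdout.strip())
    if returncode != 0:  # assumption: non-zero exit code means failure
        raise StepError(value)
    return value
```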


HTTP API

montaj serve exposes a step execution API alongside the browser UI. Any HTTP-capable agent can call steps via POST — no shell access or MCP required. The UI uses the same API.

Start the Server

montaj serve                # http://localhost:3000
montaj serve --network      # bind to all interfaces (trusted networks only)

Step Execution

POST /api/steps/trim        body: { "input": "clip.mp4", "start": 2.5, "end": 8.3 }
POST /api/steps/transcribe  body: { "input": "clip.mp4", "model": "medium.en" }
GET  /api/steps             returns: list of available steps with schemas

The server invokes the CLI executable and returns stdout as the response body. Same output convention as CLI.

Project Management

POST /api/run               # create project: clips + prompt + workflow
GET  /api/projects          # list all projects
GET  /api/projects?status=pending  # agent polls for pending work
PUT  /api/projects/{id}     # update project.json

SSE Streaming

GET /api/projects/{id}/stream  # SSE stream of project.json changes

The server watches the filesystem for project.json writes. Every change the agent makes is pushed to connected browsers via Server-Sent Events.
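
A non-browser client can consume such a stream by parsing the Server-Sent Events wire format itself. The sketch below follows the standard SSE framing (data: lines, a blank line ends an event, lines starting with a colon are comments). What Montaj puts in each event's data is not documented here, so the payload is returned as raw text; iter_sse_data is a hypothetical helper name.

```python
def iter_sse_data(lines):
    """Yield the data payload of each event from an iterable of text lines."""
    data_lines = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield "\n".join(data_lines)
                data_lines = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "data":
            data_lines.append(value)
```

The function accepts any iterable of decoded lines, so it can be fed from a streaming HTTP response or from a list in a test.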

How the Agent Loop Works

Agent (Claude, Cursor, etc.)
  ├── GET /api/projects?status=pending   ← polls for pending work
  ├── reads project.json from workspace directly
  ├── makes editorial decisions
  └── writes project.json [draft] to disk

              └── file watcher detects change

                        └── SSE push → browser rerenders

The agent writes directly to disk while montaj serve watches. Every write is pushed to the browser immediately.

The agent polls serve — serve does not notify the agent. This is a pull model, not push.
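
The pull model can be sketched as a plain polling loop. The names poll_pending, fetch_pending, and handle_project are hypothetical, and the sketch assumes that GET /api/projects?status=pending yields a list of projects, which this page does not specify. The fetch function is injected so the loop can be exercised without a running server.

```python
import time


def poll_pending(fetch_pending, handle_project, interval=5.0,
                 max_polls=None, sleep=time.sleep):
    """Repeatedly fetch pending projects and hand each one to the agent.

    fetch_pending: callable returning a list of pending projects (assumed shape).
    handle_project: callable invoked once per pending project.
    max_polls: stop after this many polls; None means poll forever.
    Returns the number of projects handled.
    """
    polls = 0
    handled = 0
    while max_polls is None or polls < max_polls:
        for project in fetch_pending():
            handle_project(project)
            handled += 1
        polls += 1
        if max_polls is None or polls < max_polls:
            sleep(interval)  # the server never notifies the agent, so wait and ask again
    return handled
```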

API Route Namespacing

All API routes are namespaced under /api/ so they never collide with React Router paths. The SPA catch-all at /{path} serves index.html for everything else — no Accept-header heuristics needed.

Using from External Agents

Any agent with HTTP access can use the API:

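import requests

BASE_URL = "http://localhost:3000"  # default address of montaj serve

# Run a step: the JSON body carries the step's parameters
response = requests.post(
    f"{BASE_URL}/api/steps/probe",
    json={"input": "/path/to/clip.mp4"},
    timeout=60,  # requests has no default timeout
)
metadata = response.json()

# List available steps and their schemas
steps = requests.get(f"{BASE_URL}/api/steps", timeout=10).json()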

CLI Usage from Agents

The simplest integration — agents with shell access run montaj commands directly. This works with Claude Code, Cursor, or any framework that can execute shell commands.

The Core Loop

1. The user provides clips, a prompt, and a preferred workflow
2. Agent reads the workflow from workflows/{name}.json
3. Agent applies editorial judgment (select/order/trim clips via probe + transcribe)
4. Agent executes workflow steps following the dependency graph
5. Agent writes/updates project.json in the project directory as it works
6. Agent probes the final output → sets inPoint: 0, outPoint: <duration>
7. Agent marks project as draft (status: "draft") when complete
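
Steps 6 and 7 amount to a small update of project.json. The sketch below assumes inPoint, outPoint, and status are top-level keys, which this page does not confirm, and mark_draft is a hypothetical helper. It writes through a temporary file and renames it, so a file watcher is less likely to read a half-written project.json.

```python
import json
import os
import tempfile
from pathlib import Path


def mark_draft(project_path, duration):
    """Set inPoint/outPoint from the probed duration and mark the project as draft."""
    path = Path(project_path)
    project = json.loads(path.read_text(encoding="utf-8"))
    # Assumption: these fields live at the top level of project.json.
    project["inPoint"] = 0
    project["outPoint"] = duration
    project["status"] = "draft"
    # Write to a sibling temp file, then swap it into place in one step.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(project, tmp, indent=2)
    os.replace(tmp_name, path)
    return project
```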

Running Steps

montaj probe clip.mp4
montaj snapshot clip.mp4
montaj trim clip.mp4 --start 2.5 --end 8.3
montaj cut clip.mp4 --start 3.0 --end 7.5
montaj cut clip.mp4 --cuts '[[0,1.2],[5.3,7.8]]'
montaj materialize-cut clip.mp4 --inpoint 2.0 --outpoint 8.0
montaj materialize-cut spec.json
montaj waveform-trim clip.mp4 --threshold -30 --min-silence 0.3
montaj rm-nonspeech clip_spec.json --model base
montaj transcribe clip.mp4 --model base.en
montaj caption clip.mp4 --style word-by-word
montaj crop-spec --input spec.json --keep 8.5:14.8 --keep 40.0:end
montaj normalize clip.mp4 --target youtube
montaj resize clip.mp4 --ratio 9:16

To see all available steps including project-local custom steps:

montaj step -h

Step Chaining

Steps are composable at the shell level — stdout of one step is the --input of the next:

FILE=$(montaj step rm_fillers --input clip.mp4 --model base.en)
FILE=$(montaj step trim --input "$FILE" --start 5 --end 90)
FILE=$(montaj step resize --input "$FILE" --ratio 9:16)
# $FILE is the final output path
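
The same chain can be driven from Python with subprocess. This sketch mirrors the montaj step <name> --input <file> form shown above; build_step_command and run_chain are hypothetical helpers, and the runner is injectable so the chaining logic can be checked without Montaj installed.

```python
import subprocess


def build_step_command(step, input_path, params):
    """Build the argv list for one step, e.g. montaj step trim --input f --start 5."""
    command = ["montaj", "step", step, "--input", str(input_path)]
    for flag, value in params.items():
        command += [f"--{flag}", str(value)]
    return command


def run_chain(first_input, steps, run=subprocess.run):
    """Run steps in order, feeding each step's stdout to the next as --input.

    steps: list of (step_name, params_dict) pairs.
    Returns the final output path printed by the last step.
    """
    current = first_input
    for step, params in steps:
        result = run(
            build_step_command(step, current, params),
            capture_output=True,
            text=True,
            check=True,  # stop the chain as soon as a step exits non-zero
        )
        current = result.stdout.strip()
    return current
```

Unlike the shell version, check=True stops the chain at the first failing step instead of passing an empty path onward.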

JSON Output

Use --json on any command for machine-readable output:

montaj probe clip.mp4 --json
# → {"duration": 12.5, "resolution": [1920, 1080], ...}

Detecting the Interface

Agents should auto-detect which interface to use:

  1. Try GET http://localhost:3000/api/projects?status=pending
  2. If it responds → HTTP mode (montaj serve is running)
  3. If connection refused → CLI mode
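
These three steps could be implemented roughly as below, using only the standard library. detect_interface is a hypothetical name. Two choices here are assumptions: an HTTP error status still counts as "it responds", and a timeout is treated the same as a refused connection.

```python
import urllib.error
import urllib.request


def detect_interface(base_url="http://localhost:3000",
                     opener=urllib.request.urlopen, timeout=2.0):
    """Return "http" if montaj serve answers, otherwise "cli"."""
    url = f"{base_url}/api/projects?status=pending"
    try:
        with opener(url, timeout=timeout):
            return "http"
    except urllib.error.HTTPError:
        # The server answered, even if with an error status (assumption).
        return "http"
    except (urllib.error.URLError, OSError):
        # Connection refused, unreachable host, or timeout (assumption).
        return "cli"
```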

Skills System

Montaj includes skill files that provide detailed task instructions for agents. The main skill file is SKILL.md at the repository root. Sub-skills provide specialized guidance:

Skill          When to load
serve          HTTP mode detected
parallel       Multiple clips or workflow has foreach steps
mcp            Running as MCP client
select-takes   Executing montaj/select_takes in a workflow
overlay        Executing montaj/overlay in a workflow
write-overlay  Writing custom JSX overlay components

Writing the Editing Prompt

The editing prompt is the primary way you communicate your creative vision to the agent. The prompt goes into project.json as the editingPrompt field and guides every decision the agent makes.

How Prompts Work

The agent reads:

  1. The workflow — the suggested step sequence and default params
  2. The editing prompt — your creative direction
  3. The source clips — via probe and snapshot to understand the content

The prompt modifies how the agent follows the workflow. It can cause the agent to skip steps, add steps, or change parameters.

Example Prompts

Tight Social Cut

tight cuts, remove filler, 9:16 for Reels

This tells the agent to aggressively trim silence and filler words, and resize for vertical video.

Minimal Editing

keep it raw, minimal cuts, 16:9

This causes the agent to skip rm_fillers and waveform_trim — preserving the natural flow of speech.

Captioned Short

clean edits, word-by-word captions, upbeat pacing, 9:16

The agent will run the full clean pipeline plus captions with the word-by-word style.

Explainer with Animation

build an explainer video: clear transitions between sections, add title cards for each topic, dark theme with cyan accents

The agent generates custom JSX overlays for title cards and transitions.

Deviation Rules

The agent follows the assigned workflow and deviates only when the prompt explicitly requires it:

Prompt says         Agent action
"no captions"       Skip the caption step
"keep it raw"       Skip rm_fillers, waveform_trim
"YouTube format"    resize --ratio 16:9 instead of 9:16
"no overlays"       Skip the overlay step

Prompt + Workflow Interaction

The workflow provides the default plan, and the prompt overrides it. For example:

  • Workflow: short_captions (includes caption + resize 9:16)
  • Prompt: "no captions, 16:9" → agent skips caption step and resizes to 16:9

The agent does not invent a step sequence from scratch — it follows the assigned workflow and adapts based on the prompt.
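
To make the override behavior concrete, the rules can be modeled as a small transformation of a workflow's step list. This is purely illustrative: a real agent interprets the prompt with judgment rather than substring matching, and the workflow representation used here (a list of dicts with step and params) is a hypothetical stand-in for the actual workflow file format.

```python
# Phrases that cause steps to be skipped (taken from the deviation rules).
SKIP_RULES = {
    "no captions": {"caption"},
    "keep it raw": {"rm_fillers", "waveform_trim"},
    "no overlays": {"overlay"},
}
# Phrases that change the resize ratio.
RATIO_RULES = {"youtube format": "16:9", "16:9": "16:9"}


def apply_prompt(workflow, prompt):
    """Return a new plan: the workflow with prompt-driven deviations applied."""
    text = prompt.lower()
    skipped = set()
    for phrase, steps in SKIP_RULES.items():
        if phrase in text:
            skipped |= steps
    ratio = next((r for phrase, r in RATIO_RULES.items() if phrase in text), None)
    plan = []
    for entry in workflow:
        if entry["step"] in skipped:
            continue  # the prompt explicitly rules this step out
        params = dict(entry.get("params", {}))  # copy so the workflow is untouched
        if entry["step"] == "resize" and ratio is not None:
            params["ratio"] = ratio
        plan.append({"step": entry["step"], "params": params})
    return plan
```

With a prompt that matches no rule, the plan equals the workflow, which reflects the point that the agent adapts the assigned workflow rather than inventing a new sequence.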

Style Profiles

When a style profile is loaded, the agent uses it to inform editorial decisions — pacing, color palette, caption style, and editorial direction. Profiles are created via the style-profile skill and stored in ~/.montaj/profiles/.

Tips

  • Be specific about aspect ratio (9:16, 16:9, 1:1)
  • Mention caption style if you want captions (word-by-word, karaoke, pop, subtitle)
  • Reference the platform (for Reels, for YouTube) to guide format decisions
  • Use "tight cuts" or "keep it raw" to control editing aggressiveness
  • Describe overlay style if you want title cards or lower thirds