The optional browser-based interface — upload, watch the agent work, review, and render.

UI Guide

The Montaj UI is an optional browser-based interface that wraps the entire pipeline. It does not replace the CLI — every action in the UI maps to a CLI command. The full pipeline works headlessly without it.

montaj serve   # starts local server + opens http://localhost:3000

Overview

The Modes

For video projects:

Upload — drop clips, write a prompt, select a workflow, hit Run
Live View — watch the agent build the edit in real time via SSE
Review — adjust the timeline, captions, and overlays when the agent finishes
Render — trigger the final render to MP4

For carousel projects:

Intake — name + prompt + aspect ratio + drop reference assets
Pending screen — copy a one-line message to your agent; watch its log lines stream as it builds slides
Canvas editor — slide grid, drag-to-reposition with snap guides, double-click overlays to edit text
Render modal — full-screen gallery of every PNG with a download-as-zip button

Tabs

The UI has four top-level tabs:

Tab	Description
Editor	Default view. Upload → live view → review flow.
Workflows	Node graph UI for building and editing workflows.
Overlays	Live preview environment for custom JSX overlay components.
Profiles	View and manage creator style profiles.

How `montaj serve` Works

montaj serve is a thin local HTTP + SSE server — the bridge between the browser and the filesystem.

montaj serve
  ├── POST /api/run              → receives clips + prompt + workflow, starts pipeline
  ├── GET  /api/projects         → list projects and their status
  ├── GET  /api/projects/:id/stream  → SSE stream of project.json changes
  └── file watcher               → watches workspace/ for project.json writes → SSE

The agent polls serve; serve never notifies the agent. The agent writes project.json directly to disk, montaj serve watches the workspace, and each write is pushed to the browser immediately over SSE.
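To make the browser side of this loop concrete, here is a minimal sketch of turning raw SSE text into project updates. It assumes each event carries the full project.json as a JSON `data:` payload, which the guide does not specify. `ProjectState`, `parseSseEvents`, and `latestProject` are hypothetical names, not Montaj's API. A real browser client would normally rely on the built-in EventSource rather than parsing by hand.

```typescript
// Hypothetical shape: only the fields this sketch touches.
interface ProjectState {
  id: string;
  status: string;
}

// Split a buffer of SSE text into complete events (events end with a blank line).
// Returns the `data` payload of each complete event plus any trailing partial text.
function parseSseEvents(buffer: string): { payloads: string[]; rest: string } {
  const blocks = buffer.replace(/\r\n/g, "\n").split("\n\n");
  // The last block has not been terminated by a blank line yet.
  const rest = blocks.pop() ?? "";
  const payloads: string[] = [];
  for (const block of blocks) {
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith("data:")) {
        // The SSE format strips one optional space after the colon.
        const value = line.slice(5);
        dataLines.push(value.startsWith(" ") ? value.slice(1) : value);
      }
    }
    if (dataLines.length > 0) payloads.push(dataLines.join("\n"));
  }
  return { payloads, rest };
}

// Assumption: every payload is a whole project.json, so the newest one wins.
function latestProject(payloads: string[], current: ProjectState | null): ProjectState | null {
  let state = current;
  for (const payload of payloads) {
    state = JSON.parse(payload) as ProjectState;
  }
  return state;
}
```

Keeping `rest` and prepending it to the next network chunk lets the parser cope with events that arrive split across chunks.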

Key Design Decisions

montaj serve is thin — no business logic, just file watching, SSE, and process spawning
Filesystem is the source of truth — agent writes project.json to disk, serve watches, browser reflects
No frame-by-frame browser rendering — native <video> + CSS overlays for fast preview
project.json is the only state — all edits mutate JSON in memory; save writes to disk
Every UI action has a CLI equivalent — the UI is a layer on top of the CLI

Tech Stack

The UI is built with Vite + React + Tailwind CSS. Source code lives in ui/ in the Montaj repository.

ui/
  src/
    app/
      ProjectList.tsx           # Project list (home)
      editor/                   # Editor tab
      WorkflowsPage.tsx         # Workflow node graph
      overlays/                 # JSX overlay preview
      profiles/                 # Creator style profiles
    components/
      PreviewPlayer.tsx         # <video> + CSS overlay rendering
      Timeline.tsx              # Clip / caption / overlay tracks
      NodeGraph.tsx             # Workflow builder
    lib/
      project.ts                # Read/write project.json
      sse.ts                    # SSE client
      overlay-eval.ts           # In-browser JSX compilation

Upload, Live View & Review

The Editor tab follows a three-phase flow: upload your clips, watch the agent work in real time, then review and adjust the result.

Upload

Drop clips, write a prompt, select a workflow, hit Run.

Drag-and-drop clip upload (or file picker)
Free-form prompt textarea: "tight cuts, remove filler, 9:16 for Reels"
Workflow selector: choose from available workflows (native + custom)
Run → POST /api/run to montaj serve → pipeline starts immediately

Live View

As the agent works, the UI updates in real time.

montaj serve watches project.json for any file change
Every write the agent makes — trim points added, clips reordered, captions cleaned — pushes to the browser via SSE
Timeline rerenders on each update
Preview player reflects the current state of the edit
You watch the edit take shape as the agent builds it

Review

When the agent marks the project as draft, the UI surfaces it for human adjustment.

Available Controls

Full timeline with clip, caption, and overlay tracks
Preview player: native <video> + CSS overlays synced to scrubber
Caption editor: click to edit text inline, drag to retime
Overlay editor: add/remove/reposition title cards, lower thirds
Prompt bar: modify the prompt and re-run the agent
Save: writes updated project.json to disk

Skipping Review

Review is optional — click Render directly from live view if the first pass is good enough.

Render

Triggers the render pass. Progress streams back via SSE. Final MP4 lands in the project workspace directory.

Preview vs. Final Render

The browser preview is an approximation: CSS overlays come close to the final render but are not pixel-perfect, because font rendering and CSS compositing differ slightly from the Puppeteer environment. The render output is what matters.

The preview player uses:

Native <video> element
CSS-positioned overlays (absolutely positioned divs, shown/hidden by currentTime)
Timeline scrubber synced to video.currentTime
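The show/hide rule for overlays can be sketched as a pure function of the playback time. This is an illustration only: `OverlayItem` and `visibleOverlayIds` are hypothetical names, and the `start`/`end` fields are assumed to be seconds on the output timeline.

```typescript
// Hypothetical overlay item: `start` and `end` are seconds on the output timeline.
interface OverlayItem {
  id: string;
  start: number;
  end: number;
}

// Return the ids of overlays whose time window contains `currentTime`.
// The window is half-open [start, end), so back-to-back overlays
// are never both visible at the same instant.
function visibleOverlayIds(items: OverlayItem[], currentTime: number): string[] {
  return items
    .filter((item) => currentTime >= item.start && currentTime < item.end)
    .map((item) => item.id);
}
```

In a player component, this would typically be called with `video.currentTime` whenever the video reports a time update or the scrubber moves.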

Carousel Editor

Carousel projects use a dedicated editor — there's no timeline, no scrubber, no live view in the video sense. The flow is:

Intake (`/projects/new`)

The carousel form takes name, profile, prompt (required), aspect ratio, and a drop zone for reference assets. Assets dropped here are uploaded to the workspace at submit and surface in project.assets[]. Hit "Create carousel" and you land on the pending screen.

Pending Screen

The center of the editing area shows:

A bold "Message your agent to start" headline
A subtitle: "Nothing will happen automatically. Copy this and send it to your agent."
A blue-bordered card with a copy-to-clipboard button. The text resolves to "There is a new project pending: "<name>". Please see @<root-skill-path> and start. Talk to me if you run into questions."

Once the agent starts working and emits log lines via POST /api/projects/{id}/log, the same area swaps to a live status readout: spinner + "N assets attached. Agent is working:" + the latest log line in a blue mono pill. The right rail keeps showing the asset library throughout.

When the agent flips status off pending, the canvas takes over.

Canvas

[Slide grid]  |  [Canvas + hint]  |  [Property panel]
                                    [Asset library]

Slide grid — drag to reorder, hover for duplicate / delete, click to select.
Canvas — direct manipulation of the selected slide. Drag elements to move, resize from the eight handles, rotate from the handle above the element.
Snap guides — pink lines appear when an element gets within 2.5% of the slide's center axes or any of the four edges. Rotation snaps to 0°/90°/180°/270° within ±5°.
In-place text editing — double-click an overlay with a text prop to edit it directly on the canvas; commit with Cmd+Enter, cancel with Esc.
Property panel — exact x/y/w/h/rotation fields, base color picker for the slide, per-prop editors for overlay props.
Refresh button — top-left of the canvas area. Re-fetches project.json from disk if you bypassed SSE.
Render button — top-right. Flips status to final, opens the render modal.

A small line below the canvas: "Drag elements to reposition. Ask the agent for any other changes." The canvas is for polish; structural edits go through the agent.
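The snapping rules can be sketched per axis. This is one possible reading, not Montaj's implementation. It assumes positions and sizes are stored as fractions (0 to 1) of the slide dimension, that an element's center snaps to the center axis, and that its own edges snap to the slide edges. `snapAxis` and `snapRotation` are hypothetical names.

```typescript
const SNAP_THRESHOLD = 0.025; // 2.5% of the slide dimension
const ROTATION_SNAP_DEG = 5; // rotation snaps within +/- 5 degrees

interface SnapCandidate {
  guide: number; // where the guide line is drawn (fraction of the slide)
  snapped: number; // leading-edge position that aligns the element with the guide
  distance: number; // how far the element currently is from that alignment
}

// Snap one axis. `pos` is the element's leading edge and `size` its extent,
// both as fractions of the slide dimension (an assumption of this sketch).
function snapAxis(pos: number, size: number): { pos: number; guide: number | null } {
  const candidates: SnapCandidate[] = [
    { guide: 0.5, snapped: 0.5 - size / 2, distance: Math.abs(pos + size / 2 - 0.5) },
    { guide: 0, snapped: 0, distance: Math.abs(pos) },
    { guide: 1, snapped: 1 - size, distance: Math.abs(pos + size - 1) },
  ];
  let best: SnapCandidate | null = null;
  for (const candidate of candidates) {
    if (candidate.distance <= SNAP_THRESHOLD && (best === null || candidate.distance < best.distance)) {
      best = candidate;
    }
  }
  return best === null ? { pos, guide: null } : { pos: best.snapped, guide: best.guide };
}

// Snap to the nearest quarter turn when within the threshold.
// The result is always normalized into the range [0, 360).
function snapRotation(degrees: number): number {
  const normalized = ((degrees % 360) + 360) % 360;
  const nearest = Math.round(normalized / 90) * 90;
  return Math.abs(normalized - nearest) <= ROTATION_SNAP_DEG ? nearest % 360 : normalized;
}
```

Calling `snapAxis` once with x/width and once with y/height covers both center axes and all four edges. A non-null `guide` is where the pink line would be drawn.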

Render Modal

The render modal streams the carousel renderer's log lines while the render runs. On completion, it opens a full-screen overlay:

Left: a grid of every slide as a clickable thumbnail (each opens the full-res PNG in a new tab)
Right: "Render complete · N slides ready", the absolute output dir, Download all (.zip) button, Close

The zip endpoint (GET /api/projects/{id}/render-zip) bundles every PNG as <project-name>-slides.zip and excludes manifest.json from the archive (it stays on disk for tooling).

Timeline Editor

The timeline is the core review interface. It shows all tracks in the project — video clips, captions, and overlays — on a multi-track timeline.

Track Layout

Track 0 (video):    [clip-0 ████████] [clip-1 ████████████]
Track 1 (overlays): [hook ███] [lower-third ██████] [outro █████]
Track 2 (captions): [captions ██████████████████████████████████]

Track 0 — primary video track (clips with inPoint/outPoint and timeline start/end)
Track 1+ — overlay tracks (JSX overlays, images, video clips)
Caption track — word-level caption segments

Interactions

Clip Track

View clip arrangement and ordering
See inPoint/outPoint boundaries for each clip
Clip positions map to the output timeline

Overlay Track

Visual representation of each overlay's time window
Overlays show their start/end on the timeline

Caption Track

Caption segments shown along the timeline
Each segment displays its text content

Scrubber

The scrubber syncs with the preview player's video.currentTime. Moving the scrubber updates:

The preview player position
Which captions are visible
Which overlays are shown

Undo Stack

The UI maintains an in-memory undo stack for the current review session. Every caption, overlay, or trim edit is undoable without touching disk. The undo stack is cleared on save or page reload.
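A snapshot-based undo stack with this behavior might look like the sketch below. `UndoStack` is a hypothetical class, not the UI's actual code. It assumes each edit produces a new project object rather than mutating the previous one.

```typescript
// Minimal snapshot-based undo stack. `T` would be the in-memory project.json
// object; callers are assumed to treat snapshots as immutable.
class UndoStack<T> {
  private past: T[] = [];
  private present: T;

  constructor(initial: T) {
    this.present = initial;
  }

  get current(): T {
    return this.present;
  }

  // Record an edit: remember the previous state, then move to the new one.
  apply(next: T): void {
    this.past.push(this.present);
    this.present = next;
  }

  // Step back one edit. Returns false when there is nothing to undo.
  undo(): boolean {
    if (this.past.length === 0) return false;
    this.present = this.past.pop() as T;
    return true;
  }

  // Called after a save: history is dropped, the current state is kept.
  clear(): void {
    this.past = [];
  }
}
```

Because the stack lives only in memory, a page reload discards it along with the rest of the session state, which matches the behavior described above.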

Saving

Click Save to write the updated project.json to disk. Saving also creates a git commit at that state transition, so the project keeps a version history.

Caption Editor

The caption editor lets you refine captions after the agent generates them. Captions are stored as data (text + word-level timestamps) in project.json, not as rendered pixels.

Inline Text Editing

Click any caption segment to edit its text directly in the timeline or in the editor panel. Changes update the segments array in the caption track.

Retiming

Drag caption segments to adjust their timing. This updates the start/end values for the segment and the individual word timestamps within it.
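As a rough sketch of what a retime drag does to the data, the function below shifts a segment and its word timestamps together. It uses the segment shape documented under Caption Data Format. `shiftSegment` is a hypothetical helper, and treating the drag as a pure shift (rather than a resize) is an assumption.

```typescript
interface CaptionWord {
  word: string;
  start: number;
  end: number;
}

interface CaptionSegment {
  text: string;
  start: number;
  end: number;
  words: CaptionWord[];
}

// Shift a segment and all of its word timestamps by `delta` seconds.
// Returns a new object so the previous state can stay on an undo stack.
function shiftSegment(segment: CaptionSegment, delta: number): CaptionSegment {
  // Do not let the segment start before the beginning of the timeline.
  const applied = Math.max(delta, -segment.start);
  return {
    ...segment,
    start: segment.start + applied,
    end: segment.end + applied,
    words: segment.words.map((w) => ({
      ...w,
      start: w.start + applied,
      end: w.end + applied,
    })),
  };
}
```

Shifting words by the same amount as the segment keeps word-level styles such as karaoke aligned with the segment after the drag.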

Caption Styles

The agent chooses a caption style when running the caption step. You can change the style during review:

Style	Description
`word-by-word`	One word at a time, spring pop-in animation
`pop`	Segment-at-a-time with scale entry
`karaoke`	Words highlight progressively as they're spoken
`subtitle`	Static line at bottom, segments replace sequentially

Caption Data Format

Captions are stored inline in the project's tracks:

{
  "id": "captions",
  "type": "caption",
  "style": "word-by-word",
  "segments": [
    {
      "text": "Hello world",
      "start": 0.0,
      "end": 1.2,
      "words": [
        { "word": "Hello", "start": 0.0, "end": 0.5 },
        { "word": "world", "start": 0.5, "end": 1.2 }
      ]
    }
  ]
}
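Given this shape, finding the word to show or highlight at a given playback time is a direct lookup. The sketch below is illustrative; `activeWord` is a hypothetical helper, and the half-open time windows are an assumption.

```typescript
interface CaptionWord {
  word: string;
  start: number;
  end: number;
}

interface CaptionSegment {
  text: string;
  start: number;
  end: number;
  words: CaptionWord[];
}

// Return the word being spoken at `time` (seconds), or null if none is.
// Windows are treated as half-open [start, end).
function activeWord(segments: CaptionSegment[], time: number): string | null {
  for (const segment of segments) {
    if (time < segment.start || time >= segment.end) continue;
    for (const w of segment.words) {
      if (time >= w.start && time < w.end) return w.word;
    }
  }
  return null;
}
```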

Preview vs. Render

In the browser, captions are rendered as CSS-positioned divs synced to the video playback time. This is an approximation — the final render uses Puppeteer to render the caption template frame-by-frame for pixel-perfect output.

Workflow Builder

The Workflows tab provides a node graph UI, inspired by n8n, for building and editing workflows visually.

Interface

Sidebar                    Canvas
─────────────────────────────────────────────────────
Native steps:              ┌──────────┐
  probe                    │  probe   ├──► ┌─────────────┐
  rm_fillers               └──────────┘    │  rm_fillers │
  waveform_trim                            └──────┬──────┘
  transcribe                                      │
  trim                                   click to configure:
  concat                                 sensitivity: [====|  ] 0.8
  resize                                 words: [um, uh, hmm]  + add
  caption
  ...

Custom steps:
  viral-hook-detector
  b-roll-inserter
  + New step

Building a Workflow

Drag steps from the sidebar onto the canvas
Connect nodes to define data flow (edges = needs dependencies)
Click a node to configure its params — controls are rendered from the step's JSON schema
Invalid connections (type mismatch) are rejected visually
Save → writes workflows/<name>.json to disk
Run → executes the workflow against the current clips

Step Discovery

The sidebar shows all available steps across all three scopes:

Native steps — built into Montaj
User-global steps — from ~/.montaj/steps/
Project-local steps — from ./steps/ in the current project

Custom steps appear automatically, with no registration needed. Adding steps/my-step.py and steps/my-step.json makes my-step appear in the sidebar.

Node Configuration

Clicking a node opens its configuration panel. The controls are dynamically generated from the step's JSON schema:

String params → text inputs
Number params → number inputs or sliders
Enum params → dropdowns
Boolean params → toggles

Default values from the workflow file are pre-populated. Changes here update the params field in the workflow JSON.
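The mapping from schema to control can be sketched as a small function. This is illustrative only: `ParamSchema` covers a small subset of JSON Schema, `controlFor` is a hypothetical name, and choosing a slider only when both `minimum` and `maximum` are present is an assumption.

```typescript
// Subset of JSON Schema that this sketch understands.
interface ParamSchema {
  type: "string" | "number" | "integer" | "boolean";
  enum?: string[];
  minimum?: number;
  maximum?: number;
}

type ControlKind = "text" | "number" | "slider" | "dropdown" | "toggle";

// Pick the UI control for one step parameter.
function controlFor(schema: ParamSchema): ControlKind {
  // An enum wins over the base type: the user picks from fixed choices.
  if (schema.enum !== undefined && schema.enum.length > 0) return "dropdown";
  switch (schema.type) {
    case "boolean":
      return "toggle";
    case "number":
    case "integer":
      // Assumption: a slider only makes sense when both bounds are known.
      return schema.minimum !== undefined && schema.maximum !== undefined ? "slider" : "number";
    default:
      return "text";
  }
}
```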

CLI Equivalent

montaj workflow list              # list all available workflows
montaj workflow new <name>        # scaffold a new workflow file
montaj workflow edit <name>       # open in the node graph UI
montaj workflow run <name> ./clips --prompt "..."

Output Format

The node graph saves directly to the standard workflow JSON format:

{
  "name": "my-workflow",
  "description": "Custom editing pipeline",
  "steps": [
    { "id": "probe", "uses": "montaj/probe" },
    { "id": "silence", "uses": "montaj/waveform_trim", "foreach": "clips" },
    { "id": "transcribe", "uses": "montaj/transcribe", "needs": ["silence"] }
  ]
}

The saved file is immediately available to montaj run --workflow <name>.
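Because edges in the graph are stored as `needs` entries, a valid run order can be derived from the saved file alone. The sketch below is a simple layered topological sort over the format shown above. `executionOrder` is a hypothetical helper and does not claim to be how Montaj schedules steps.

```typescript
interface WorkflowStep {
  id: string;
  uses: string;
  needs?: string[];
}

// Order steps so every step comes after the steps it needs.
// Throws on unknown dependencies or dependency cycles.
function executionOrder(steps: WorkflowStep[]): string[] {
  const ids = new Set(steps.map((s) => s.id));
  for (const step of steps) {
    for (const dep of step.needs ?? []) {
      if (!ids.has(dep)) throw new Error(`step "${step.id}" needs unknown step "${dep}"`);
    }
  }
  const done = new Set<string>();
  const order: string[] = [];
  let pending = steps.slice();
  while (pending.length > 0) {
    // A step is ready once all of its dependencies have been scheduled.
    const ready = pending.filter((s) => (s.needs ?? []).every((dep) => done.has(dep)));
    if (ready.length === 0) throw new Error("workflow contains a dependency cycle");
    for (const step of ready) {
      done.add(step.id);
      order.push(step.id);
    }
    pending = pending.filter((s) => !done.has(s.id));
  }
  return order;
}
```

For the example workflow above this yields probe, silence, transcribe. The same check is one way a graph editor could reject a connection that would introduce a cycle.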

Overlay Preview

The Overlays tab provides a live preview environment for custom JSX overlay components.

How It Works

Select any overlay JSX file from the current project or global overlays
The overlay is compiled and rendered at 1080 x 1920 (design resolution), scaled to fit the viewport
A file watcher pushes change events over SSE, so the overlay recompiles and rerenders automatically on every save
Compile errors are displayed inline

Preview Pipeline

When montaj serve is running, the UI previews overlays and captions live in the browser via ui/src/lib/overlay-eval.ts:

The JSX file is fetched from the filesystem
Transpiled in-browser by @babel/standalone
Called directly on every animation frame

This is an approximation — font rendering and CSS compositing differ slightly from the Puppeteer render environment. The render output is what matters.

Real-Time Iteration

The file watcher makes this a rapid iteration loop:

Edit JSX in your editor → save → UI recompiles → preview updates instantly

No manual refresh needed. The SSE connection detects the file change and triggers recompilation.

Component Globals

Overlay components have access to these globals at render time:

frame — current frame number
fps — frames per second (from project settings)
props — arbitrary data from the overlay item in project.json
interpolate(frame, inputRange, outputRange) — map frame number to any value
spring({ frame, fps, config }) — physics-based easing (mass, stiffness, damping)
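To illustrate what a frame-to-value mapping like interpolate does, here is a stand-in written from scratch. It is not Montaj's implementation: `linearInterpolate` is a hypothetical name, the input range is assumed to be strictly increasing, and clamping outside the range is an assumption (the real helper may extrapolate).

```typescript
// Piecewise-linear mapping from a frame number to an output value.
// Frames outside the input range are clamped to the first/last output value.
function linearInterpolate(frame: number, inputRange: number[], outputRange: number[]): number {
  if (inputRange.length < 2 || inputRange.length !== outputRange.length) {
    throw new Error("inputRange and outputRange need the same length (at least 2)");
  }
  const last = inputRange.length - 1;
  if (frame <= inputRange[0]) return outputRange[0];
  if (frame >= inputRange[last]) return outputRange[last];
  // Find the segment [inputRange[i], inputRange[i + 1]] that contains the frame.
  let i = 0;
  while (frame > inputRange[i + 1]) i++;
  const t = (frame - inputRange[i]) / (inputRange[i + 1] - inputRange[i]);
  return outputRange[i] + t * (outputRange[i + 1] - outputRange[i]);
}
```

An overlay could use such a mapping to fade in over the first 30 frames by mapping the frame range [0, 30] to the opacity range [0, 1].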

Design Resolution

Overlays are always designed and rendered at 1080 x 1920 regardless of the output resolution. The render pipeline upscales at compose time (e.g., 2x for 4K output at 2160 x 3840).
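The scale arithmetic is simple enough to sketch. `fitScale` is a hypothetical helper. Taking the smaller of the two axis ratios is an assumption that keeps the 1080 x 1920 design inside the target without distortion, and the same calculation would apply when fitting the preview into a viewport.

```typescript
const DESIGN_WIDTH = 1080;
const DESIGN_HEIGHT = 1920;

// Uniform factor that scales the design resolution to fit a target size.
function fitScale(targetWidth: number, targetHeight: number): number {
  const sx = targetWidth / DESIGN_WIDTH;
  const sy = targetHeight / DESIGN_HEIGHT;
  // Use the smaller factor so the overlay never overflows the target.
  return Math.min(sx, sy);
}
```

For a 2160 x 3840 output this gives the 2x factor mentioned above.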
