UI Guide

The optional browser-based interface — upload, watch the agent work, review, and render.

The Montaj UI is an optional browser-based interface that wraps the entire pipeline. It does not replace the CLI — every action in the UI maps to a CLI command. The full pipeline works headlessly without it.

montaj serve   # starts local server + opens http://localhost:3000

Overview

The Four Modes

  1. Upload — drop clips, write a prompt, select a workflow, hit Run
  2. Live View — watch the agent build the edit in real time via SSE
  3. Review — adjust the timeline, captions, and overlays when the agent finishes
  4. Render — trigger the final render to MP4

Tabs

The UI has four top-level tabs:

Tab        Description
Editor     Default view. Upload → live view → review flow.
Workflows  Node graph UI for building and editing workflows.
Overlays   Live preview environment for custom JSX overlay components.
Profiles   View and manage creator style profiles.

How montaj serve Works

montaj serve is a thin local HTTP + SSE server — the bridge between the browser and the filesystem.

montaj serve
  ├── POST /api/run              → receives clips + prompt + workflow, starts pipeline
  ├── GET  /api/projects         → list projects and their status
  ├── GET  /api/projects/:id/stream  → SSE stream of project.json changes
  └── file watcher               → watches workspace/ for project.json writes → SSE

The agent polls montaj serve; serve never notifies the agent. The agent writes directly to disk, montaj serve watches the workspace, and every write is pushed to the browser immediately.
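The watcher-to-browser hop can be pictured as serializing each project.json write into one Server-Sent Events frame. The sketch below follows the standard SSE wire format; the event name `project`, the `ProjectUpdate` shape, and the `toSseFrame` helper are assumptions for illustration, not Montaj's actual protocol.

```typescript
// Hypothetical payload: the project id plus the parsed project.json contents.
interface ProjectUpdate {
  projectId: string;
  project: unknown;
}

// Serialize an update as one SSE frame: an "event:" line, a "data:" line,
// and a blank line that terminates the frame.
function toSseFrame(update: ProjectUpdate, eventName = "project"): string {
  // JSON.stringify without indentation emits a single line, so one
  // "data:" field is enough to carry the whole payload.
  const payload = JSON.stringify(update);
  return `event: ${eventName}\ndata: ${payload}\n\n`;
}

const sampleFrame = toSseFrame({
  projectId: "demo",
  project: { status: "draft" },
});
console.log(sampleFrame);
```

In the browser, an EventSource listener registered for the same event name would receive the payload as a string and could pass it to JSON.parse before updating the timeline.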

Key Design Decisions

  • montaj serve is thin — no business logic, just file watching, SSE, and process spawning
  • Filesystem is the source of truth — agent writes project.json to disk, serve watches, browser reflects
  • No frame-by-frame browser rendering — native <video> + CSS overlays for fast preview
  • project.json is the only state — all edits mutate JSON in memory; save writes to disk
  • Every UI action has a CLI equivalent — the UI is a layer on top of the CLI

Tech Stack

The UI is built with Vite + React + Tailwind CSS. Source code lives in ui/ in the Montaj repository.

ui/
  src/
    app/
      ProjectList.tsx           # Project list (home)
      editor/                   # Editor tab
      WorkflowsPage.tsx         # Workflow node graph
      overlays/                 # JSX overlay preview
      profiles/                 # Creator style profiles
    components/
      PreviewPlayer.tsx         # <video> + CSS overlay rendering
      Timeline.tsx              # Clip / caption / overlay tracks
      NodeGraph.tsx             # Workflow builder
    lib/
      project.ts                # Read/write project.json
      sse.ts                    # SSE client
      overlay-eval.ts           # In-browser JSX compilation

Upload, Live View & Review

The Editor tab follows a three-phase flow: upload your clips, watch the agent work in real time, then review and adjust the result.

Upload

Drop clips, write a prompt, select a workflow, hit Run.

  • Drag-and-drop clip upload (or file picker)
  • Free-form prompt textarea: "tight cuts, remove filler, 9:16 for Reels"
  • Workflow selector: choose from available workflows (native + custom)
  • Run: POST /api/run to montaj serve → pipeline starts immediately

Live View

As the agent works, the UI updates in real time.

  • montaj serve watches project.json for any file change
  • Every write the agent makes — trim points added, clips reordered, captions cleaned — pushes to the browser via SSE
  • Timeline rerenders on each update
  • Preview player reflects the current state of the edit
  • You watch the edit take shape as the agent builds it

Review

When the agent marks the project as draft, the UI surfaces it for human adjustment.

Available Controls

  • Full timeline with clip, caption, and overlay tracks
  • Preview player: native <video> + CSS overlays synced to scrubber
  • Caption editor: click to edit text inline, drag to retime
  • Overlay editor: add/remove/reposition title cards, lower thirds
  • Prompt bar: modify the prompt and re-run the agent
  • Save: writes updated project.json to disk

Skipping Review

Review is optional — click Render directly from live view if the first pass is good enough.

Render

Triggers the render pass. Progress streams back via SSE. Final MP4 lands in the project workspace directory.

Preview vs. Final Render

The browser preview is an approximation — CSS overlays are close but not pixel-perfect to the final render. Font rendering and CSS compositing differ slightly from the Puppeteer environment. The render output is what matters.

The preview player uses:

  • Native <video> element
  • CSS-positioned overlays (absolutely positioned divs, shown/hidden by currentTime)
  • Timeline scrubber synced to video.currentTime
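As a rough illustration of "shown/hidden by currentTime", the visibility check reduces to a time-window filter. This is a minimal sketch: the `OverlayItem` shape and the `visibleOverlayIds` helper are hypothetical, and the half-open window (start inclusive, end exclusive) is an assumption.

```typescript
// Hypothetical overlay shape: only the time window matters for visibility.
interface OverlayItem {
  id: string;
  start: number; // seconds on the output timeline
  end: number;   // seconds on the output timeline
}

// Return the ids of overlays whose window contains the playhead.
// Start is inclusive and end is exclusive, so back-to-back overlays
// are never visible at the same instant.
function visibleOverlayIds(items: OverlayItem[], currentTime: number): string[] {
  return items
    .filter((item) => currentTime >= item.start && currentTime < item.end)
    .map((item) => item.id);
}

const sampleOverlays: OverlayItem[] = [
  { id: "hook", start: 0, end: 3 },
  { id: "lower-third", start: 3, end: 8 },
  { id: "outro", start: 20, end: 25 },
];

console.log(visibleOverlayIds(sampleOverlays, 3)); // playhead at 3 seconds
```

A preview player could run a check like this whenever video.currentTime changes and toggle each overlay div accordingly.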

Timeline Editor

The timeline is the core review interface. It shows all tracks in the project — video clips, captions, and overlays — on a multi-track timeline.

Track Layout

Track 0 (video):    [clip-0 ████████] [clip-1 ████████████]
Track 1 (overlays): [hook ███] [lower-third ██████] [outro █████]
Track 2 (captions): [captions ██████████████████████████████████]

  • Track 0 — primary video track (clips with inPoint/outPoint and timeline start/end)
  • Track 1+ — overlay tracks (JSX overlays, images, video clips)
  • Caption track — word-level caption segments

Interactions

Clip Track

  • View clip arrangement and ordering
  • See inPoint/outPoint boundaries for each clip
  • Clip positions map to the output timeline

Overlay Track

  • Visual representation of each overlay's time window
  • Overlays show their start/end on the timeline

Caption Track

  • Caption segments shown along the timeline
  • Each segment displays its text content

Scrubber

The scrubber syncs with the preview player's video.currentTime. Moving the scrubber updates:

  • The preview player position
  • Which captions are visible
  • Which overlays are shown

Undo Stack

The UI maintains an in-memory undo stack for the current review session. Every caption, overlay, or trim edit is undoable without touching disk. The undo stack is cleared on save or page reload.
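One way to picture such a stack is snapshot-based: each edit records the state that preceded it, and undo pops the latest snapshot. The `UndoStack` class below is a hypothetical sketch under that assumption; the UI may well store edits differently (for example as inverse operations).

```typescript
// Snapshot-based undo: each edit records the previous state; undo pops it.
class UndoStack<T> {
  private past: T[] = [];

  // Record the state as it was before an edit is applied.
  record(previous: T): void {
    this.past.push(previous);
  }

  // Return the most recently recorded state, or undefined if there is
  // nothing left to undo.
  undo(): T | undefined {
    return this.past.pop();
  }

  // Drop all history, as would happen on save.
  clear(): void {
    this.past.length = 0;
  }

  get size(): number {
    return this.past.length;
  }
}
```

Because the history lives only in this object, a page reload discards it without any extra work, which matches the session-scoped behavior described above.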

Saving

Click Save to write the updated project.json to disk. This creates a git commit at the state transition for version history.


Caption Editor

The caption editor lets you refine captions after the agent generates them. Captions are stored as data (text + word-level timestamps) in project.json, not as rendered pixels.

Inline Text Editing

Click any caption segment to edit its text directly in the timeline or in the editor panel. Changes update the segments array in the caption track.

Retiming

Drag caption segments to adjust their timing. This updates the start/end values for the segment and the individual word timestamps within it.
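For a drag that moves a segment without resizing it, the update amounts to adding the same offset to the segment and to every word inside it. The sketch below assumes the segment shape from the caption data format; `shiftSegment` is a hypothetical helper, and drags that stretch or shrink a segment would need different handling.

```typescript
interface CaptionWord {
  word: string;
  start: number;
  end: number;
}

interface CaptionSegment {
  text: string;
  start: number;
  end: number;
  words: CaptionWord[];
}

// Shift a segment and all of its words by the same offset (in seconds).
// A new object is returned, so the original stays untouched.
function shiftSegment(segment: CaptionSegment, delta: number): CaptionSegment {
  return {
    ...segment,
    start: segment.start + delta,
    end: segment.end + delta,
    words: segment.words.map((w) => ({
      ...w,
      start: w.start + delta,
      end: w.end + delta,
    })),
  };
}

const originalSegment: CaptionSegment = {
  text: "Hello world",
  start: 0,
  end: 1.25,
  words: [
    { word: "Hello", start: 0, end: 0.5 },
    { word: "world", start: 0.5, end: 1.25 },
  ],
};

// Drag the segment half a second later on the timeline.
const shiftedSegment = shiftSegment(originalSegment, 0.5);
console.log(shiftedSegment.start, shiftedSegment.end);
```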

Caption Styles

The agent chooses a caption style when running the caption step. You can change the style during review:

Style         Description
word-by-word  One word at a time, spring pop-in animation
pop           Segment-at-a-time with scale entry
karaoke       Words highlight progressively as they're spoken
subtitle      Static line at bottom, segments replace sequentially

Caption Data Format

Captions are stored inline in the project's tracks:

{
  "id": "captions",
  "type": "caption",
  "style": "word-by-word",
  "segments": [
    {
      "text": "Hello world",
      "start": 0.0,
      "end": 1.2,
      "words": [
        { "word": "Hello", "start": 0.0, "end": 0.5 },
        { "word": "world", "start": 0.5, "end": 1.2 }
      ]
    }
  ]
}
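The same structure can be written as TypeScript types, which also makes it easy to look up the word being spoken at a given playback time. The type names and the `activeWordAt` helper are illustrative assumptions; only the field names and style values come from the format above.

```typescript
type CaptionStyle = "word-by-word" | "pop" | "karaoke" | "subtitle";

interface CaptionWord {
  word: string;
  start: number;
  end: number;
}

interface CaptionSegment {
  text: string;
  start: number;
  end: number;
  words: CaptionWord[];
}

interface CaptionTrack {
  id: string;
  type: "caption";
  style: CaptionStyle;
  segments: CaptionSegment[];
}

// Find the word whose time window contains `time`, if any.
// Windows are treated as start-inclusive, end-exclusive (an assumption).
function activeWordAt(track: CaptionTrack, time: number): string | undefined {
  for (const segment of track.segments) {
    if (time < segment.start || time >= segment.end) continue;
    const hit = segment.words.find((w) => time >= w.start && time < w.end);
    return hit?.word;
  }
  return undefined;
}

const sampleTrack: CaptionTrack = {
  id: "captions",
  type: "caption",
  style: "word-by-word",
  segments: [
    {
      text: "Hello world",
      start: 0.0,
      end: 1.2,
      words: [
        { word: "Hello", start: 0.0, end: 0.5 },
        { word: "world", start: 0.5, end: 1.2 },
      ],
    },
  ],
};

console.log(activeWordAt(sampleTrack, 0.6));
```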

Preview vs. Render

In the browser, captions are rendered as CSS-positioned divs synced to the video playback time. This is an approximation — the final render uses Puppeteer to render the caption template frame-by-frame for pixel-perfect output.


Workflow Builder

The Workflows tab provides a visual node graph UI for building and editing workflows. Inspired by n8n, it lets you construct editing pipelines visually.

Interface

Sidebar                    Canvas
─────────────────────────────────────────────────────
Native steps:              ┌──────────┐
  probe                    │  probe   ├──► ┌─────────────┐
  rm_fillers               └──────────┘    │  rm_fillers │
  waveform_trim                            └──────┬──────┘
  transcribe                                      │
  trim                                   click to configure:
  concat                                 sensitivity: [====|  ] 0.8
  resize                                 words: [um, uh, hmm]  + add
  caption
  ...

Custom steps:
  viral-hook-detector
  b-roll-inserter
  + New step

Building a Workflow

  1. Drag steps from the sidebar onto the canvas
  2. Connect nodes to define data flow (edges = needs dependencies)
  3. Click a node to configure its params — controls are rendered from the step's JSON schema
  4. Invalid connections (type mismatch) are rejected visually
  5. Save → writes workflows/<name>.json to disk
  6. Run → executes the workflow against the current clips

Step Discovery

The sidebar shows all available steps across all three scopes:

  • Native steps — built into Montaj
  • User-global steps — from ~/.montaj/steps/
  • Project-local steps — from ./steps/ in the current project

Custom steps appear automatically, with no registration needed. Adding steps/my-step.py and steps/my-step.json makes the step appear in the sidebar.
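Discovery of this kind can be sketched as pairing file names by stem. The `discoverSteps` helper is hypothetical, and the rule that a step is listed only when both the script and the schema are present is an assumption; the source only states that adding both files makes the step appear.

```typescript
// Given the file names found in a steps/ directory, return the names of
// steps that have both a script (.py) and a schema (.json).
function discoverSteps(fileNames: string[]): string[] {
  const scripts = new Set<string>();
  const schemas = new Set<string>();
  for (const file of fileNames) {
    if (file.endsWith(".py")) {
      scripts.add(file.slice(0, -".py".length));
    } else if (file.endsWith(".json")) {
      schemas.add(file.slice(0, -".json".length));
    }
  }
  // Keep only stems present in both sets, sorted for a stable sidebar order.
  return Array.from(scripts)
    .filter((stem) => schemas.has(stem))
    .sort();
}

console.log(discoverSteps(["my-step.py", "my-step.json", "draft.py", "README.md"]));
```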

Node Configuration

Clicking a node opens its configuration panel. The controls are dynamically generated from the step's JSON schema:

  • String params → text inputs
  • Number params → number inputs or sliders
  • Enum params → dropdowns
  • Boolean params → toggles

Default values from the workflow file are pre-populated. Changes here update the params field in the workflow JSON.
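The mapping from schema to control can be sketched as a small pure function. The `ParamSchema` slice and the `controlFor` helper are hypothetical, and the rule for choosing a slider over a plain number input (both bounds declared) is an assumption, since the source does not say how that choice is made.

```typescript
// Minimal slice of a JSON Schema property; real step schemas may carry more.
interface ParamSchema {
  type: "string" | "number" | "boolean";
  enum?: string[];
  minimum?: number;
  maximum?: number;
}

type ControlKind = "text" | "number" | "slider" | "dropdown" | "toggle";

function controlFor(schema: ParamSchema): ControlKind {
  // An enum wins over the base type: the user picks from fixed values.
  if (schema.enum !== undefined) return "dropdown";
  if (schema.type === "boolean") return "toggle";
  if (schema.type === "number") {
    // Assumption: a slider is used only when both bounds are declared.
    const bounded = schema.minimum !== undefined && schema.maximum !== undefined;
    return bounded ? "slider" : "number";
  }
  return "text";
}

console.log(controlFor({ type: "number", minimum: 0, maximum: 1 }));
```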

CLI Equivalent

montaj workflow list              # list all available workflows
montaj workflow new <name>        # scaffold a new workflow file
montaj workflow edit <name>       # open in the node graph UI
montaj workflow run <name> ./clips --prompt "..."

Output Format

The node graph saves directly to the standard workflow JSON format:

{
  "name": "my-workflow",
  "description": "Custom editing pipeline",
  "steps": [
    { "id": "probe", "uses": "montaj/probe" },
    { "id": "silence", "uses": "montaj/waveform_trim", "foreach": "clips" },
    { "id": "transcribe", "uses": "montaj/transcribe", "needs": ["silence"] }
  ]
}

The saved file is immediately available to montaj run --workflow <name>.
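Since edges in the graph are stored as needs dependencies, any runner has to order steps so that dependencies come first. The sketch below shows one generic way to derive such an order (repeatedly emitting steps whose dependencies are done); `WorkflowStep` mirrors the fields in the JSON above, while `executionOrder` is a hypothetical helper and not a description of how Montaj schedules steps.

```typescript
interface WorkflowStep {
  id: string;
  uses: string;
  needs?: string[];
  foreach?: string;
}

// Order steps so that every step comes after the steps listed in its `needs`.
// Throws if a step needs an unknown id or if the dependencies form a cycle.
function executionOrder(steps: WorkflowStep[]): string[] {
  const known = new Set(steps.map((s) => s.id));
  for (const step of steps) {
    for (const dep of step.needs ?? []) {
      if (!known.has(dep)) {
        throw new Error(`step "${step.id}" needs unknown step "${dep}"`);
      }
    }
  }

  const done = new Set<string>();
  const order: string[] = [];
  while (order.length < steps.length) {
    // A step is ready once all of its dependencies have been emitted.
    const ready = steps.filter(
      (s) => !done.has(s.id) && (s.needs ?? []).every((dep) => done.has(dep)),
    );
    if (ready.length === 0) throw new Error("dependency cycle detected");
    for (const step of ready) {
      done.add(step.id);
      order.push(step.id);
    }
  }
  return order;
}

const sampleSteps: WorkflowStep[] = [
  { id: "transcribe", uses: "montaj/transcribe", needs: ["silence"] },
  { id: "probe", uses: "montaj/probe" },
  { id: "silence", uses: "montaj/waveform_trim", foreach: "clips" },
];

console.log(executionOrder(sampleSteps));
```

The same check is a natural place to reject a saved graph whose edges point at a step that no longer exists.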


Overlay Preview

The Overlays tab provides a live preview environment for custom JSX overlay components.

How It Works

  1. Select any overlay JSX file from the current project or global overlays
  2. The overlay is compiled and rendered at 1080 x 1920 (design resolution), scaled to fit the viewport
  3. A file watcher notifies the UI over SSE, and the overlay is recompiled and rerendered automatically on every save
  4. Compile errors are displayed inline

Preview Pipeline

When montaj serve is running, the UI previews overlays and captions live in the browser via ui/src/lib/overlay-eval.ts:

  1. The JSX file is fetched from the filesystem
  2. Transpiled in-browser by @babel/standalone
  3. The compiled component is called directly on every animation frame

This is an approximation — font rendering and CSS compositing differ slightly from the Puppeteer render environment. The render output is what matters.

Real-Time Iteration

The file watcher makes this a rapid iteration loop:

Edit JSX in your editor → save → UI recompiles → preview updates instantly

No manual refresh needed. The SSE connection detects the file change and triggers recompilation.

Component Globals

Overlay components have access to these globals at render time:

  • frame — current frame number
  • fps — frames per second (from project settings)
  • props — arbitrary data from the overlay item in project.json
  • interpolate(frame, inputRange, outputRange) — map frame number to any value
  • spring({ frame, fps, config }) — physics-based easing (mass, stiffness, damping)
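To make the role of a helper like interpolate concrete, here is a minimal stand-in that maps a frame range linearly onto an output range. `linearInterpolate` is a hypothetical name, the two-point ranges and the clamping at both ends are assumptions, and the real global may behave differently outside the input range.

```typescript
// Map `frame` linearly from inputRange onto outputRange, clamped at both
// ends. Assumes inputRange[1] > inputRange[0].
function linearInterpolate(
  frame: number,
  inputRange: [number, number],
  outputRange: [number, number],
): number {
  const [inStart, inEnd] = inputRange;
  const [outStart, outEnd] = outputRange;
  // Progress through the input range, limited to the interval [0, 1].
  const progress = Math.min(1, Math.max(0, (frame - inStart) / (inEnd - inStart)));
  return outStart + progress * (outEnd - outStart);
}

// Fade in over the first 30 frames while sliding up by 40 pixels.
const opacityAtFrame15 = linearInterpolate(15, [0, 30], [0, 1]);
const offsetAtFrame15 = linearInterpolate(15, [0, 30], [40, 0]);
console.log(opacityAtFrame15, offsetAtFrame15);
```

An overlay component would typically compute values like these from frame on every render and feed them into inline styles such as opacity and transform.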

Design Resolution

Overlays are always designed and rendered at 1080 x 1920 regardless of the output resolution. The render pipeline upscales at compose time (e.g., 2x for 4K output at 2160 x 3840).
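The arithmetic behind both scalings is small enough to show directly. The helper names below are hypothetical; the design resolution and the 2x example come from the text above, and the uniform fit-to-viewport rule for the preview is an assumption.

```typescript
const DESIGN_WIDTH = 1080;
const DESIGN_HEIGHT = 1920;

// Factor by which overlays would be scaled for a given output width.
function renderScale(outputWidth: number): number {
  return outputWidth / DESIGN_WIDTH;
}

// Largest uniform scale at which the design canvas still fits the viewport.
function previewScale(viewportWidth: number, viewportHeight: number): number {
  return Math.min(viewportWidth / DESIGN_WIDTH, viewportHeight / DESIGN_HEIGHT);
}

console.log(renderScale(2160));       // 4K portrait output, 2160 x 3840
console.log(previewScale(540, 1200)); // a 540 x 1200 preview pane
```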