UI Guide
The optional browser-based interface — upload, watch the agent work, review, and render.
The Montaj UI is an optional browser-based interface that wraps the entire pipeline. It does not replace the CLI — every action in the UI maps to a CLI command. The full pipeline works headlessly without it.
montaj serve # starts local server + opens http://localhost:3000
Overview
The Four Modes
- Upload — drop clips, write a prompt, select a workflow, hit Run
- Live View — watch the agent build the edit in real time via SSE
- Review — adjust the timeline, captions, and overlays when the agent finishes
- Render — trigger the final render to MP4
Tabs
The UI has four top-level tabs:
| Tab | Description |
|---|---|
| Editor | Default view. Upload → live view → review flow. |
| Workflows | Node graph UI for building and editing workflows. |
| Overlays | Live preview environment for custom JSX overlay components. |
| Profiles | View and manage creator style profiles. |
How montaj serve Works
montaj serve is a thin local HTTP + SSE server — the bridge between the browser and the filesystem.
montaj serve
├── POST /api/run → receives clips + prompt + workflow, starts pipeline
├── GET /api/projects → list projects and their status
├── GET /api/projects/:id/stream → SSE stream of project.json changes
└── file watcher → watches workspace/ for project.json writes → SSE
The agent polls serve; serve does not notify the agent. The agent writes directly to disk. montaj serve watches. Every write immediately pushes to the browser.
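On the browser side, following the stream comes down to splitting SSE frames and parsing each payload. The sketch below assumes each event carries the full project.json as its `data` field; the payload shape and the `status` values are assumptions, and `parseSseFrames` is a hypothetical helper rather than part of Montaj.

```typescript
// Hypothetical helper: split raw SSE text into the `data` payloads it contains.
// In the SSE wire format, events are separated by a blank line and
// multi-line payloads repeat the `data:` prefix on each line.
function parseSseFrames(raw: string): string[] {
  const payloads: string[] = [];
  for (const frame of raw.split(/\r?\n\r?\n/)) {
    const dataLines = frame
      .split(/\r?\n/)
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, ""));
    if (dataLines.length > 0) {
      payloads.push(dataLines.join("\n"));
    }
  }
  return payloads;
}

// Assumed payload: the whole project.json document, serialized as JSON.
const sampleStream =
  'data: {"id":"demo","status":"pending"}\n\n' +
  'data: {"id":"demo","status":"draft"}\n\n';

const snapshots = parseSseFrames(sampleStream).map(
  (payload) => JSON.parse(payload) as { id: string; status: string },
);
```

In a real page the browser's built-in `EventSource` handles this framing, so a client would only need the `JSON.parse` step inside its message handler.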
Key Design Decisions
- montaj serve is thin: no business logic, just file watching, SSE, and process spawning
- Filesystem is the source of truth: agent writes project.json to disk, serve watches, browser reflects
- No frame-by-frame browser rendering: native <video> + CSS overlays for fast preview
- project.json is the only state: all edits mutate JSON in memory; save writes to disk
- Every UI action has a CLI equivalent: the UI is a layer on top of the CLI
Tech Stack
The UI is built with Vite + React + Tailwind CSS. Source code lives in ui/ in the Montaj repository.
ui/
  src/
    app/
      ProjectList.tsx      # Project list (home)
      editor/              # Editor tab
      WorkflowsPage.tsx    # Workflow node graph
      overlays/            # JSX overlay preview
      profiles/            # Creator style profiles
    components/
      PreviewPlayer.tsx    # <video> + CSS overlay rendering
      Timeline.tsx         # Clip / caption / overlay tracks
      NodeGraph.tsx        # Workflow builder
    lib/
      project.ts           # Read/write project.json
      sse.ts               # SSE client
      overlay-eval.ts      # In-browser JSX compilation
Upload, Live View & Review
The Editor tab follows a three-phase flow: upload your clips, watch the agent work in real time, then review and adjust the result.
Upload
Drop clips, write a prompt, select a workflow, hit Run.
- Drag-and-drop clip upload (or file picker)
- Free-form prompt textarea: "tight cuts, remove filler, 9:16 for Reels"
- Workflow selector: choose from available workflows (native + custom)
- Run → POST /api/run to montaj serve → pipeline starts immediately
Live View
As the agent works, the UI updates in real time.
- montaj serve watches project.json for any file change
- Every write the agent makes (trim points added, clips reordered, captions cleaned) pushes to the browser via SSE
- Timeline rerenders on each update
- Preview player reflects the current state of the edit
- You watch the edit take shape as the agent builds it
Review
When the agent marks the project as draft, the UI surfaces it for human adjustment.
Available Controls
- Full timeline with clip, caption, and overlay tracks
- Preview player: native <video> + CSS overlays synced to scrubber
- Caption editor: click to edit text inline, drag to retime
- Overlay editor: add/remove/reposition title cards, lower thirds
- Prompt bar: modify the prompt and re-run the agent
- Save: writes updated project.json to disk
Skipping Review
Review is optional — click Render directly from live view if the first pass is good enough.
Render
Triggers the render pass. Progress streams back via SSE. Final MP4 lands in the project workspace directory.
Preview vs. Final Render
The browser preview is an approximation — CSS overlays are close but not pixel-perfect to the final render. Font rendering and CSS compositing differ slightly from the Puppeteer environment. The render output is what matters.
The preview player uses:
- Native <video> element
- CSS-positioned overlays (absolutely positioned divs, shown/hidden by currentTime)
- Timeline scrubber synced to video.currentTime
Timeline Editor
The timeline is the core review interface. It shows all tracks in the project — video clips, captions, and overlays — on a multi-track timeline.
Track Layout
Track 0 (video): [clip-0 ████████] [clip-1 ████████████]
Track 1 (overlays): [hook ███] [lower-third ██████] [outro █████]
Track 2 (captions): [captions ██████████████████████████████████]
- Track 0: primary video track (clips with inPoint/outPoint and timeline start/end)
- Track 1+: overlay tracks (JSX overlays, images, video clips)
- Caption track: word-level caption segments
Interactions
Clip Track
- View clip arrangement and ordering
- See inPoint/outPoint boundaries for each clip
- Clip positions map to the output timeline
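The relationship between the two pairs of clip fields can be sketched as a lookup from output-timeline time to source-file time. The sketch assumes `inPoint`/`outPoint` are seconds in the source file, `start`/`end` are seconds on the output timeline, and clips play at normal speed; `ClipItem` and `toSourceTime` are hypothetical names.

```typescript
// Assumed clip shape: inPoint/outPoint are source-file seconds,
// start/end are output-timeline seconds.
interface ClipItem {
  id: string;
  inPoint: number;
  outPoint: number;
  start: number;
  end: number;
}

// Find the clip under a timeline position and the matching source time.
// Assumes 1x playback speed, so offsets carry over unchanged.
function toSourceTime(
  clips: ClipItem[],
  timelineTime: number,
): { clipId: string; sourceTime: number } | null {
  for (const clip of clips) {
    if (timelineTime >= clip.start && timelineTime < clip.end) {
      return {
        clipId: clip.id,
        sourceTime: clip.inPoint + (timelineTime - clip.start),
      };
    }
  }
  return null; // position falls in a gap or past the last clip
}

const videoTrack: ClipItem[] = [
  { id: "clip-0", inPoint: 2, outPoint: 6, start: 0, end: 4 },
  { id: "clip-1", inPoint: 10, outPoint: 16, start: 4, end: 10 },
];

// Timeline second 5 is one second into clip-1, i.e. source second 11.
const clipHit = toSourceTime(videoTrack, 5);
```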
Overlay Track
- Visual representation of each overlay's time window
- Overlays show their start/end on the timeline
Caption Track
- Caption segments shown along the timeline
- Each segment displays its text content
Scrubber
The scrubber syncs with the preview player's video.currentTime. Moving the scrubber updates:
- The preview player position
- Which captions are visible
- Which overlays are shown
Undo Stack
The UI maintains an in-memory undo stack for the current review session. Every caption, overlay, or trim edit is undoable without touching disk. The undo stack is cleared on save or page reload.
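One way to picture this behavior is a stack of immutable project snapshots that is emptied on save. The class below is a hypothetical sketch under that assumption, not the UI's actual implementation.

```typescript
// Hypothetical in-memory undo stack over immutable state snapshots.
class UndoStack<T> {
  private past: T[] = [];
  private current: T;

  constructor(initial: T) {
    this.current = initial;
  }

  get state(): T {
    return this.current;
  }

  // Record the current snapshot, then apply the edit to produce a new one.
  apply(edit: (state: T) => T): void {
    this.past.push(this.current);
    this.current = edit(this.current);
  }

  // Step back one edit; returns false when there is nothing to undo.
  undo(): boolean {
    if (this.past.length === 0) return false;
    this.current = this.past.pop() as T;
    return true;
  }

  // Assumed to be called on save: history is dropped, current state is kept.
  clear(): void {
    this.past = [];
  }
}

const editHistory = new UndoStack({ caption: "Helo world" });
editHistory.apply((s) => ({ ...s, caption: "Hello world" }));
editHistory.apply((s) => ({ ...s, caption: "Hello, world" }));
editHistory.undo(); // back to "Hello world", nothing written to disk
```

Because every edit produces a new object instead of mutating the old one, undo only has to swap references.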
Saving
Click Save to write the updated project.json to disk. This creates a git commit at the state transition for version history.
Caption Editor
The caption editor lets you refine captions after the agent generates them. Captions are stored as data (text + word-level timestamps) in project.json, not as rendered pixels.
Inline Text Editing
Click any caption segment to edit its text directly in the timeline or in the editor panel. Changes update the segments array in the caption track.
Retiming
Drag caption segments to adjust their timing. This updates the start/end values for the segment and the individual word timestamps within it.
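A drag that moves a segment without resizing it amounts to adding the same offset to the segment and to every word inside it. The sketch below assumes segment and word objects with `start`/`end` in seconds, as in the caption data format; `shiftSegment` is a hypothetical helper name.

```typescript
interface CaptionWord {
  word: string;
  start: number;
  end: number;
}

interface CaptionSegment {
  text: string;
  start: number;
  end: number;
  words: CaptionWord[];
}

// Move a segment by `delta` seconds, keeping word timings aligned with it.
// Returns a new object so the original can stay on an undo stack.
function shiftSegment(segment: CaptionSegment, delta: number): CaptionSegment {
  return {
    ...segment,
    start: segment.start + delta,
    end: segment.end + delta,
    words: segment.words.map((w) => ({
      ...w,
      start: w.start + delta,
      end: w.end + delta,
    })),
  };
}

const originalSegment: CaptionSegment = {
  text: "Hello world",
  start: 0.0,
  end: 1.2,
  words: [
    { word: "Hello", start: 0.0, end: 0.5 },
    { word: "world", start: 0.5, end: 1.2 },
  ],
};

// Dragging the segment half a second later.
const movedSegment = shiftSegment(originalSegment, 0.5);
```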
Caption Styles
The agent chooses a caption style when running the caption step. You can change the style during review:
| Style | Description |
|---|---|
| word-by-word | One word at a time, spring pop-in animation |
| pop | Segment-at-a-time with scale entry |
| karaoke | Words highlight progressively as they're spoken |
| subtitle | Static line at bottom, segments replace sequentially |
Caption Data Format
Captions are stored inline in the project's tracks:
{
  "id": "captions",
  "type": "caption",
  "style": "word-by-word",
  "segments": [
    {
      "text": "Hello world",
      "start": 0.0,
      "end": 1.2,
      "words": [
        { "word": "Hello", "start": 0.0, "end": 0.5 },
        { "word": "world", "start": 0.5, "end": 1.2 }
      ]
    }
  ]
}
Preview vs. Render
In the browser, captions are rendered as CSS-positioned divs synced to the video playback time. This is an approximation — the final render uses Puppeteer to render the caption template frame-by-frame for pixel-perfect output.
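For word-level styles, syncing to playback time reduces to finding which word's window contains the current time. This is a minimal sketch of that lookup over the `words` array from the caption data format; `activeWordIndex` is a hypothetical helper, and the preview's real logic may differ.

```typescript
interface TimedWord {
  word: string;
  start: number;
  end: number;
}

// Index of the word being spoken at `currentTime`, or -1 if none is active.
function activeWordIndex(words: TimedWord[], currentTime: number): number {
  return words.findIndex(
    (w) => currentTime >= w.start && currentTime < w.end,
  );
}

const spokenWords: TimedWord[] = [
  { word: "Hello", start: 0.0, end: 0.5 },
  { word: "world", start: 0.5, end: 1.2 },
];

// At 0.6 s the second word ("world") is active.
const activeIndex = activeWordIndex(spokenWords, 0.6);
```

A karaoke-style preview could highlight every word up to and including this index, while a word-by-word style would show only the word at it.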
Workflow Builder
The Workflows tab provides a visual node graph UI for building and editing workflows. Inspired by n8n, it lets you construct editing pipelines visually.
Interface
Sidebar               Canvas
─────────────────────────────────────────────────────
Native steps:         ┌──────────┐
  probe               │  probe   ├──► ┌─────────────┐
  rm_fillers          └──────────┘    │ rm_fillers  │
  waveform_trim                       └──────┬──────┘
  transcribe                                 │
  trim                                click to configure:
  concat                              sensitivity: [====| ] 0.8
  resize                              words: [um, uh, hmm] + add
  caption
  ...
Custom steps:
  viral-hook-detector
  b-roll-inserter
  + New step
Building a Workflow
- Drag steps from the sidebar onto the canvas
- Connect nodes to define data flow (edges = needs dependencies)
- Click a node to configure its params; controls are rendered from the step's JSON schema
- Invalid connections (type mismatch) are rejected visually
- Save → writes workflows/<name>.json to disk
- Run → executes the workflow against the current clips
Step Discovery
The sidebar shows all available steps across all three scopes:
- Native steps: built into Montaj
- User-global steps: from ~/.montaj/steps/
- Project-local steps: from ./steps/ in the current project
Custom steps appear automatically — no registration needed. Adding steps/my-step.py + steps/my-step.json makes it appear in the sidebar.
Node Configuration
Clicking a node opens its configuration panel. The controls are dynamically generated from the step's JSON schema:
- String params → text inputs
- Number params → number inputs or sliders
- Enum params → dropdowns
- Boolean params → toggles
Default values from the workflow file are pre-populated. Changes here update the params field in the workflow JSON.
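The mapping from param type to control can be sketched as a small function. It assumes a JSON-Schema-like param description with `type`, optional `enum`, and optional `minimum`/`maximum`; the exact schema fields Montaj uses are not shown here, so `ParamSchema` and `controlFor` are illustrative only, as is the rule that a bounded number becomes a slider.

```typescript
// Assumed, JSON-Schema-like description of a single step parameter.
type ParamSchema = {
  type: "string" | "number" | "boolean";
  enum?: string[];
  minimum?: number;
  maximum?: number;
  default?: unknown;
};

type ControlKind = "text" | "number" | "slider" | "dropdown" | "toggle";

// Pick a control for a param. Enum wins over the base type; a number with
// both bounds becomes a slider (an assumption), otherwise a number input.
function controlFor(param: ParamSchema): ControlKind {
  if (param.enum !== undefined) return "dropdown";
  switch (param.type) {
    case "boolean":
      return "toggle";
    case "number":
      return param.minimum !== undefined && param.maximum !== undefined
        ? "slider"
        : "number";
    case "string":
      return "text";
  }
}

// Hypothetical params for an rm_fillers-style step.
const sensitivityControl = controlFor({
  type: "number",
  minimum: 0,
  maximum: 1,
  default: 0.8,
});
const styleControl = controlFor({ type: "string", enum: ["pop", "karaoke"] });
```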
CLI Equivalent
montaj workflow list # list all available workflows
montaj workflow new <name> # scaffold a new workflow file
montaj workflow edit <name> # open in the node graph UI
montaj workflow run <name> ./clips --prompt "..."
Output Format
The node graph saves directly to the standard workflow JSON format:
{
  "name": "my-workflow",
  "description": "Custom editing pipeline",
  "steps": [
    { "id": "probe", "uses": "montaj/probe" },
    { "id": "silence", "uses": "montaj/waveform_trim", "foreach": "clips" },
    { "id": "transcribe", "uses": "montaj/transcribe", "needs": ["silence"] }
  ]
}
The saved file is immediately available to montaj run --workflow <name>.
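Since edges are stored as `needs` dependencies, a valid execution order can be derived with a depth-first topological sort. The sketch below illustrates the idea over the step fields shown above; `executionOrder` is a hypothetical helper, and how Montaj actually schedules steps is not described here.

```typescript
interface WorkflowStep {
  id: string;
  uses: string;
  needs?: string[];
  foreach?: string;
}

// Depth-first topological sort: every step appears after the steps it needs.
// Throws on unknown dependencies and on cycles.
function executionOrder(steps: WorkflowStep[]): string[] {
  const byId = new Map<string, WorkflowStep>();
  for (const step of steps) byId.set(step.id, step);

  const visitState = new Map<string, "visiting" | "done">();
  const ordered: string[] = [];

  const visit = (id: string): void => {
    const step = byId.get(id);
    if (step === undefined) throw new Error(`Unknown step in needs: ${id}`);
    if (visitState.get(id) === "done") return;
    if (visitState.get(id) === "visiting") {
      throw new Error(`Dependency cycle at: ${id}`);
    }
    visitState.set(id, "visiting");
    for (const dep of step.needs ?? []) visit(dep);
    visitState.set(id, "done");
    ordered.push(id);
  };

  for (const step of steps) visit(step.id);
  return ordered;
}

// Same steps as the example above, deliberately listed out of order.
const runOrder = executionOrder([
  { id: "transcribe", uses: "montaj/transcribe", needs: ["silence"] },
  { id: "probe", uses: "montaj/probe" },
  { id: "silence", uses: "montaj/waveform_trim", foreach: "clips" },
]);
```

Here `silence` is placed before `transcribe` regardless of the order in the file, which is the property the node graph's edges are meant to guarantee.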
Overlay Preview
The Overlays tab provides a live preview environment for custom JSX overlay components.
How It Works
- Select any overlay JSX file from the current project or global overlays
- The overlay is compiled and rendered at 1080 x 1920 (design resolution), scaled to fit the viewport
- A file watcher via SSE recompiles and rerenders automatically on every save
- Compile errors are displayed inline
Preview Pipeline
When montaj serve is running, the UI previews overlays and captions live in the browser via ui/src/lib/overlay-eval.ts:
- The JSX file is fetched from the filesystem
- Transpiled in-browser by @babel/standalone
- Called directly on every animation frame
This is an approximation — font rendering and CSS compositing differ slightly from the Puppeteer render environment. The render output is what matters.
Real-Time Iteration
The file watcher makes this a rapid iteration loop:
Edit JSX in your editor → save → UI recompiles → preview updates instantly
No manual refresh needed. The SSE connection detects the file change and triggers recompilation.
Component Globals
Overlay components have access to these globals at render time:
- frame: current frame number
- fps: frames per second (from project settings)
- props: arbitrary data from the overlay item in project.json
- interpolate(frame, inputRange, outputRange): map frame number to any value
- spring({ frame, fps, config }): physics-based easing (mass, stiffness, damping)
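To give a feel for what a frame-to-value mapping does, here is a stand-in written from scratch. It assumes a two-point linear mapping that clamps outside the input range; whether Montaj's own `interpolate` clamps or supports more than two points is not stated, so `interpolateLinear` is an illustration only.

```typescript
// Illustrative stand-in for a frame-to-value mapping (not Montaj's code).
// Linear between the two range endpoints, clamped outside them.
function interpolateLinear(
  frame: number,
  inputRange: [number, number],
  outputRange: [number, number],
): number {
  const [inStart, inEnd] = inputRange;
  const [outStart, outEnd] = outputRange;
  const progress = Math.min(
    1,
    Math.max(0, (frame - inStart) / (inEnd - inStart)),
  );
  return outStart + progress * (outEnd - outStart);
}

// Fade an overlay in over its first 30 frames (one second at 30 fps).
const opacityAtFrame15 = interpolateLinear(15, [0, 30], [0, 1]);
const opacityAtFrame45 = interpolateLinear(45, [0, 30], [0, 1]);
```

An overlay component would compute a value like this from `frame` on each render and apply it as a style, for example as `opacity`.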
Design Resolution
Overlays are always designed and rendered at 1080 x 1920 regardless of the output resolution. The render pipeline upscales at compose time (e.g., 2x for 4K output at 2160 x 3840).
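The upscale factor follows directly from the output size. A small sketch of the arithmetic, assuming the output keeps the same 9:16 aspect ratio as the design resolution (`overlayScale` is a hypothetical helper name):

```typescript
const DESIGN_WIDTH = 1080;
const DESIGN_HEIGHT = 1920;

// Scale factor from design resolution to output resolution.
// Assumes the output is also 9:16, so one uniform factor is enough.
function overlayScale(outputWidth: number, outputHeight: number): number {
  const scaleX = outputWidth / DESIGN_WIDTH;
  const scaleY = outputHeight / DESIGN_HEIGHT;
  if (Math.abs(scaleX - scaleY) > 1e-9) {
    throw new Error("Output aspect ratio differs from the 9:16 design resolution");
  }
  return scaleX;
}

// 4K portrait output: 2160 / 1080 = 3840 / 1920 = 2
const fourKScale = overlayScale(2160, 3840);
```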