Connectors

External API integrations — Kling, Gemini, OpenAI — and credential management.

Connectors are Python modules in connectors/ that wrap external vendor APIs. They turn a vendor's SDK or HTTP endpoints into clean Python functions that Montaj steps can call.


Overview

The Layering Rule

Connectors are organized by vendor. Steps are organized by use case.

| Layer | Organized by | Example |
| --- | --- | --- |
| connectors/<vendor>.py | Vendor: one file per API key | connectors/gemini.py handles video analysis + image gen + TTS + music |
| steps/<category>/<verb>_<noun>.py | Use case: one file per action | steps/generate/generate_image.py dispatches to gemini or openai |

A vendor like Gemini unlocks multiple use cases (video analysis, image generation, TTS, music generation) through one API key and one SDK. The connector owns auth, request shape, polling, and response normalization. The step owns the agent-facing interface.

Architecture

cli/commands/<step>.py     # thin argparse wrapper (agent-facing)
serve/server.py            # /api/steps/{name} dispatch (agent-facing)
mcp/server.js              # introspects CLI parsers (agent-facing)


steps/<category>/<verb>_<noun>.py  # argparse + fail() + stdout (one per use case)


connectors/<vendor>.py     # SDK/HTTP calls (one per vendor)


lib/credentials.py         # ~/.montaj/credentials.json + env override

Key Rules

  • Connectors are never agent-callable directly — workflows, CLI, HTTP API, and MCP all dispatch to steps
  • No vendor SDK at import time — imports are lazy, inside functions
  • Credentials only from lib.credentials — never read env vars directly
  • Errors via ConnectorError — step code catches and translates to fail()
  • Long operations block — connectors return when done, they don't return job IDs
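
The rules above can be sketched in a few lines of Python. This is a minimal illustration, not Montaj's actual code: ConnectorError is named in the rules, but load_sdk, wait_until_done, and the (state, payload) status shape are hypothetical names chosen for the example.

```python
import importlib
import time


class ConnectorError(Exception):
    """Error type connectors raise; step code catches it and calls fail()."""


def load_sdk(module_name):
    """Import a vendor SDK lazily, inside the call, never at module import time."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ConnectorError(
            f"{module_name} is not installed; run: montaj install connectors"
        ) from exc


def wait_until_done(check_status, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Block until the job finishes, so callers never handle job IDs.

    check_status is a hypothetical callable returning (state, payload).
    """
    deadline = time.monotonic() + timeout
    while True:
        state, payload = check_status()
        if state == "succeeded":
            return payload
        if state == "failed":
            raise ConnectorError(f"job failed: {payload}")
        if time.monotonic() >= deadline:
            raise ConnectorError("timed out waiting for job")
        sleep(interval)
```

A connector function built this way returns only once the vendor job has reached a terminal state, which is what lets steps stay simple: they call, catch ConnectorError, and report.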

Installing

montaj install connectors        # installs pyjwt, requests, google-genai, openai
montaj install credentials       # interactive setup for API keys

Current Connectors

| Vendor | Functions | Steps | Model(s) | Credentials |
| --- | --- | --- | --- | --- |
| kling.py | generate, generate_speech | kling_generate, generate_voiceover | kling-v3-omni, kling-video-o1, kling-tts-v1 | kling.access_key, kling.secret_key |
| gemini.py | analyze_media, generate_image, generate_speech, generate_music | analyze_media, generate_image, generate_voiceover, generate_music | gemini-2.5-flash, gemini-3-pro-image-preview, gemini-2.5-flash-preview-tts, lyria-3-clip-preview | gemini.api_key |
| openai.py | generate_image | generate_image | gpt-image-1 | openai.api_key |

Multi-Provider Steps

A single step can dispatch to multiple connectors. For example, steps/generate/generate_image.py dispatches to either connectors/gemini.py or connectors/openai.py based on a --provider flag:

montaj generate-image --prompt "portrait" --provider gemini --out portrait.png
montaj generate-image --prompt "portrait" --provider openai --out portrait.png

Same step name, same CLI interface, different backend. Similarly, generate_voiceover dispatches to Kling or Gemini TTS via --vendor.
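
The dispatch described here can be pictured as a lookup table from provider name to connector function. The sketch below is an assumption about shape only: the two backend functions are stand-ins for connectors.gemini.generate_image and connectors.openai.generate_image, and the PROVIDERS table is a hypothetical name.

```python
def _gemini_generate_image(prompt, out_path):
    # Stand-in for connectors.gemini.generate_image (hypothetical body).
    return {"provider": "gemini", "prompt": prompt, "out": out_path}


def _openai_generate_image(prompt, out_path):
    # Stand-in for connectors.openai.generate_image (hypothetical body).
    return {"provider": "openai", "prompt": prompt, "out": out_path}


# One entry per connector that can serve this use case.
PROVIDERS = {
    "gemini": _gemini_generate_image,
    "openai": _openai_generate_image,
}


def generate_image(prompt, out_path, provider="gemini"):
    """Route one use case to the connector selected by --provider."""
    try:
        backend = PROVIDERS[provider]
    except KeyError:
        known = ", ".join(sorted(PROVIDERS))
        raise ValueError(f"unknown provider {provider!r}; expected one of: {known}") from None
    return backend(prompt, out_path)
```

Adding a third image vendor under this layout would mean one new connector file and one new table entry, with no change to the CLI interface.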


Kling

Kling provides video generation and text-to-speech. Two video models are available.

Setup

montaj install connectors
montaj install credentials --provider kling --key access_key --value YOUR_ACCESS_KEY
montaj install credentials --provider kling --key secret_key --value YOUR_SECRET_KEY

Or interactively:

montaj install credentials   # select kling, enter keys

Credentials are stored in ~/.montaj/credentials.json (0600 permissions).

Credentials

| Key | Description |
| --- | --- |
| kling.access_key | Kling API access key |
| kling.secret_key | Kling API secret key |

Get your keys at app.klingai.com.

Video Generation (generate)

Text-to-Video

montaj kling-generate \
  --prompt "a calico cat walking through a sunlit kitchen, cinematic" \
  --out /tmp/cat.mp4

Image-to-Video (First Frame)

montaj kling-generate \
  --prompt "slow zoom in" \
  --first-frame frame.png \
  --out /tmp/zoom.mp4

Image-to-Video (First + Last Frame)

montaj kling-generate \
  --prompt "character walks left" \
  --first-frame start.png \
  --last-frame end.png \
  --out /tmp/walk.mp4

Style Reference

montaj kling-generate \
  --prompt "same style" \
  --ref-image style1.png \
  --ref-image style2.png \
  --out /tmp/styled.mp4

Pro Mode

montaj kling-generate \
  --prompt "cinematic scene" \
  --out /tmp/pro.mp4 \
  --mode pro \
  --duration 10 \
  --aspect-ratio 9:16

Multi-Shot Batch

montaj kling-generate \
  --multi-shot \
  --shot-type customize \
  --multi-prompt '[{"index":1,"prompt":"scene 1","duration":"5"},{"index":2,"prompt":"scene 2","duration":"5"}]' \
  --out /tmp/batch.mp4
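
Hand-writing the --multi-prompt JSON is error-prone once there are more than a couple of shots. A small helper can build it; this is a sketch that mirrors only the fields visible in the example above (index, prompt, and duration as a string), and build_multi_prompt is a hypothetical name, not part of Montaj.

```python
import json
import shlex


def build_multi_prompt(shots):
    """Build the --multi-prompt value from (prompt, seconds) pairs.

    Indexes start at 1 and durations are emitted as strings,
    matching the shape of the CLI example.
    """
    entries = [
        {"index": i, "prompt": prompt, "duration": str(seconds)}
        for i, (prompt, seconds) in enumerate(shots, start=1)
    ]
    return json.dumps(entries, separators=(",", ":"))


payload = build_multi_prompt([("scene 1", 5), ("scene 2", 5)])
command = [
    "montaj", "kling-generate",
    "--multi-shot",
    "--shot-type", "customize",
    "--multi-prompt", payload,
    "--out", "/tmp/batch.mp4",
]
# shlex.join quotes the JSON so the line can be pasted into a shell.
print(shlex.join(command))
```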

Video Parameters

| Param | Default | Description |
| --- | --- | --- |
| --prompt | required | Generation prompt |
| --out | required | Output file path |
| --first-frame <img> | | Starting image for image-to-video |
| --last-frame <img> | | Ending image (requires --first-frame) |
| --ref-image <img> | | Reference image (repeatable, max 7) |
| --duration <3-15> | | Video duration in seconds |
| --negative-prompt | | What to avoid in generation |
| --sound <on\|off> | | Enable/disable sound |
| --aspect-ratio | | 16:9, 9:16, 1:1 |
| --mode <std\|pro> | std | Standard (cheaper/faster) or Pro (higher quality) |
| --model <name> | auto | kling-v3-omni or kling-video-o1 |

Video Models

| Model | Duration | Audio | Multi-shot | Notes |
| --- | --- | --- | --- | --- |
| kling-v3-omni | 3-15s | Yes (sound: "on") | Yes | Flexible durations, audio generation |
| kling-video-o1 | 5s or 10s only | No | No | Highest visual quality. End frame requires --mode pro. |

The step auto-upgrades to kling-video-o1 when --duration is 5 or 10 and --sound is off.
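
The selection rule can be written as a small function. This is a sketch under assumptions: the function name pick_model is hypothetical, an explicit --model is assumed to override the automatic choice, and multi-shot requests are assumed to stay on kling-v3-omni because the table lists kling-video-o1 as not supporting them.

```python
def pick_model(duration=None, sound="off", multi_shot=False, requested="auto"):
    """Choose a Kling video model from the request parameters."""
    if requested != "auto":
        # Assumption: an explicit --model always wins.
        return requested
    if not multi_shot and sound == "off" and duration in (5, 10):
        # Fixed 5s/10s clips without audio qualify for the higher-quality model.
        return "kling-video-o1"
    # Everything else (other durations, audio, multi-shot) needs kling-v3-omni.
    return "kling-v3-omni"
```

For example, pick_model(duration=10) selects kling-video-o1, while pick_model(duration=10, sound="on") stays on kling-v3-omni.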

Text-to-Speech (generate_speech)

montaj generate-voiceover --text "Welcome to our farm" --vendor kling --out /tmp/vo.mp3

Uses model kling-tts-v1. See the generate_voiceover step for full parameter documentation.


Gemini

The Gemini connector wraps Google's Gemini API for four use cases: media analysis, image generation, text-to-speech, and music generation.

Setup

montaj install connectors
montaj install credentials --provider gemini --key api_key --value YOUR_API_KEY

Or interactively:

montaj install credentials   # select gemini, enter key

Credentials

| Key | Description |
| --- | --- |
| gemini.api_key | Google Gemini API key |

Get your key at ai.google.dev.

Media Analysis (analyze_media)

Analyze any media file — video, audio, or image — with a natural language prompt.

montaj analyze-media clip.mp4 --prompt "Describe the scene in 2 sentences."

montaj analyze-media song.mp3 --prompt "Transcribe with timestamps."

montaj analyze-media photo.jpg --prompt "Return JSON: {subject, mood, dominant_colors}" --json-output

montaj analyze-media clip.mp4 --prompt "..." --model gemini-2.5-pro

| Param | Default | Description |
| --- | --- | --- |
| <input> | required | Media file (video, audio, image) |
| --prompt | required | Analysis prompt |
| --model | gemini-2.5-flash | Model override |
| --json-output | | Request structured JSON from the model |
| --out | | Write output to file |

Note: Images under approximately 18 MB take a fast inline path — no Files API round-trip needed.
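
The note above implies a size-based branch inside the connector. A minimal sketch of that decision follows; the exact threshold, the function name choose_transport, and the assumption that only images ever take the inline path are illustrative guesses rather than the connector's confirmed behavior.

```python
import mimetypes
import os

INLINE_LIMIT_BYTES = 18 * 1024 * 1024  # assumption: "approximately 18 MB"


def choose_transport(path, size_bytes=None):
    """Return "inline" for small images, "files_api" for everything else."""
    if size_bytes is None:
        size_bytes = os.path.getsize(path)
    mime, _ = mimetypes.guess_type(path)
    is_image = mime is not None and mime.startswith("image/")
    if is_image and size_bytes < INLINE_LIMIT_BYTES:
        # Small image: send the bytes in the request itself.
        return "inline"
    # Video, audio, or large image: upload first, then reference the upload.
    return "files_api"
```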

Image Generation (generate_image)

Generate images from text prompts, optionally conditioned on reference images.

montaj generate-image --prompt "portrait, studio lighting" --out portrait.png

montaj generate-image --prompt "same character, profile view" --ref-image portrait.png --out profile.png

montaj generate-image --prompt "..." --provider gemini --aspect-ratio 9:16 --out tall.png

| Param | Default | Description |
| --- | --- | --- |
| --prompt | required | Generation prompt |
| --out | required | Output file path |
| --ref-image | | Reference image (repeatable) |
| --aspect-ratio | | Aspect ratio (Gemini-specific) |
| --model | gemini-3-pro-image-preview | Model override |

Text-to-Speech (generate_speech)

Generate speech audio from text.

montaj generate-voiceover --text "Welcome to our channel" --vendor gemini --voice Kore --out vo.wav

| Param | Default | Description |
| --- | --- | --- |
| --voice | Kore | Voice name (Gemini voices: Kore, Puck, Charon, etc.) |

Uses model gemini-2.5-flash-preview-tts.

Music Generation (generate_music)

Generate music from a text description using Lyria 3 Clip. Produces approximately 30 seconds of audio.

montaj generate-music --prompt "upbeat electronic, 120 bpm" --out music.wav

montaj generate-music --prompt "acoustic guitar, mellow" --with-vocals --out song.wav

Uses model lyria-3-clip-preview.

Models

| Use Case | Default Model |
| --- | --- |
| Media analysis | gemini-2.5-flash |
| Image generation | gemini-3-pro-image-preview |
| Text-to-speech | gemini-2.5-flash-preview-tts |
| Music generation | lyria-3-clip-preview |

OpenAI

The OpenAI connector wraps OpenAI's image generation API.

Setup

montaj install connectors
montaj install credentials --provider openai --key api_key --value YOUR_API_KEY

Or interactively:

montaj install credentials   # select openai, enter key

Credentials

| Key | Description |
| --- | --- |
| openai.api_key | OpenAI API key |

Get your key at platform.openai.com.

Image Generation (generate_image)

Generate images from text prompts, optionally with reference images.

montaj generate-image --prompt "red apple on white table" --provider openai --out apple.png

montaj generate-image --prompt "same scene, sunset" --provider openai --ref-image scene.png --out sunset.png

| Param | Default | Description |
| --- | --- | --- |
| --prompt | required | Generation prompt |
| --out | required | Output file path |
| --provider | | Must be openai to use this connector |
| --ref-image | | Reference image (repeatable) |
| --size <WxH> | | Image dimensions |
| --model | gpt-image-1 | Model override |

Gemini vs. OpenAI

The generate_image step supports both Gemini and OpenAI as providers. Choose based on:

  • Gemini: supports the --aspect-ratio flag, useful when you need a specific aspect ratio
  • OpenAI: supports the --size WxH flag and produces a different artistic style

# Gemini (default)
montaj generate-image --prompt "portrait" --out portrait.png

# OpenAI
montaj generate-image --prompt "portrait" --provider openai --out portrait.png

Credentials

API credentials for external connectors live in ~/.montaj/credentials.json with 0600 permissions.

Installation Methods

Interactive

montaj install credentials
# Prompts for provider selection and key entry

Scripted (CI/Automation)

montaj install credentials --provider kling --key access_key --value YOUR_KEY
montaj install credentials --provider kling --key secret_key --value YOUR_KEY
montaj install credentials --provider gemini --key api_key --value YOUR_KEY
montaj install credentials --provider openai --key api_key --value YOUR_KEY

Check Status

montaj install credentials --list
# Shows set/unset status per provider

Credential Precedence

Each connector reads credentials via lib.credentials.get_credential(provider, key). The precedence order:

  1. Environment variable, e.g., KLING_ACCESS_KEY or GEMINI_API_KEY
  2. Credentials file at ~/.montaj/credentials.json
  3. Otherwise, fail with install instructions
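
The precedence order can be sketched as follows. This is not the real lib.credentials: CredentialError is a hypothetical exception name, the path and environ parameters are added here only to make the sketch testable, and the env variable naming (PROVIDER_KEY, upper-cased) is inferred from the examples on this page.

```python
import json
import os
from pathlib import Path

DEFAULT_PATH = Path.home() / ".montaj" / "credentials.json"


class CredentialError(Exception):
    """Hypothetical error raised when a credential cannot be found."""


def get_credential(provider, key, path=DEFAULT_PATH, environ=None):
    """Look up a credential: environment first, then the file, then fail."""
    env = os.environ if environ is None else environ

    # 1. Environment variable, e.g. kling + access_key -> KLING_ACCESS_KEY
    value = env.get(f"{provider}_{key}".upper())
    if value:
        return value

    # 2. Credentials file; a missing file just means "nothing stored".
    try:
        stored = json.loads(Path(path).read_text())
    except FileNotFoundError:
        stored = {}
    value = stored.get(provider, {}).get(key)
    if value:
        return value

    # 3. Fail with install instructions.
    raise CredentialError(
        f"missing {provider}.{key}; run: montaj install credentials "
        f"--provider {provider} --key {key} --value ..."
    )
```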

Supported Providers

| Provider | Keys | Environment Variables |
| --- | --- | --- |
| kling | access_key, secret_key | KLING_ACCESS_KEY, KLING_SECRET_KEY |
| gemini | api_key | GEMINI_API_KEY |
| openai | api_key | OPENAI_API_KEY |

Credentials File Format

{
  "kling": {
    "access_key": "...",
    "secret_key": "..."
  },
  "gemini": {
    "api_key": "..."
  },
  "openai": {
    "api_key": "..."
  }
}

The file is stored at ~/.montaj/credentials.json with 0600 permissions (owner read/write only).
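
For readers curious how a file ends up with 0600 permissions, one common approach on POSIX systems is to create it with a restrictive mode from the start rather than tightening it afterwards. The sketch below illustrates that approach; save_credentials is a hypothetical helper, not Montaj's installer, and the demo writes to a temporary directory instead of the real home directory.

```python
import json
import os
import stat
import tempfile


def save_credentials(path, data):
    """Write credentials as JSON, readable and writable by the owner only."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, mode=0o700, exist_ok=True)
    # Create the file with mode 0600 so it is never world-readable, even briefly.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump(data, fh, indent=2)
    # Enforce the mode in case the file already existed with looser permissions.
    os.chmod(path, 0o600)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, ".montaj", "credentials.json")
    save_credentials(target, {"gemini": {"api_key": "..."}})
    mode = stat.S_IMODE(os.stat(target).st_mode)
    print(oct(mode))
```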

Using in CI/CD

For CI environments, use environment variables instead of the credentials file:

export KLING_ACCESS_KEY=...
export KLING_SECRET_KEY=...
export GEMINI_API_KEY=...
export OPENAI_API_KEY=...

Environment variables take precedence over the credentials file, so this works without any additional configuration.