Connectors
External API integrations — Kling, Gemini, OpenAI — and credential management.
Connectors are Python modules in connectors/ that wrap external vendor APIs. They turn a vendor's SDK or HTTP endpoints into clean Python functions that Montaj steps can call.
Overview
The Layering Rule
Connectors are organized by vendor. Steps are organized by use case.
| Layer | Organized by | Example |
|---|---|---|
| connectors/<vendor>.py | Vendor (one file per API key) | connectors/gemini.py handles video analysis + image gen + TTS + music |
| steps/<category>/<verb>_<noun>.py | Use case (one file per action) | steps/generate/generate_image.py dispatches to gemini or openai |
A vendor like Gemini unlocks multiple use cases (video analysis, image generation, TTS, music generation) through one API key and one SDK. The connector owns auth, request shape, polling, and response normalization. The step owns the agent-facing interface.
Architecture
cli/commands/<step>.py # thin argparse wrapper (agent-facing)
serve/server.py # /api/steps/{name} dispatch (agent-facing)
mcp/server.js # introspects CLI parsers (agent-facing)
│
▼
steps/<category>/<verb>_<noun>.py # argparse + fail() + stdout (one per use case)
│
▼
connectors/<vendor>.py # SDK/HTTP calls (one per vendor)
│
▼
lib/credentials.py # ~/.montaj/credentials.json + env override
Key Rules
- Connectors are never agent-callable directly: workflows, CLI, HTTP API, and MCP all dispatch to steps
- No vendor SDK at import time: imports are lazy, inside functions
- Credentials only from lib.credentials: never read env vars directly
- Errors via ConnectorError: step code catches and translates to fail()
- Long operations block: connectors return when done, they don't return job IDs
Installing
montaj install connectors # installs pyjwt, requests, google-genai, openai
montaj install credentials # interactive setup for API keys
Current Connectors
| Vendor | Functions | Steps | Model(s) | Credentials |
|---|---|---|---|---|
| kling.py | generate, generate_speech | kling_generate, generate_voiceover | kling-v3-omni, kling-video-o1, kling-tts-v1 | kling.access_key, kling.secret_key |
| gemini.py | analyze_media, generate_image, generate_speech, generate_music | analyze_media, generate_image, generate_voiceover, generate_music | gemini-2.5-flash, gemini-3-pro-image-preview, gemini-2.5-flash-preview-tts, lyria-3-clip-preview | gemini.api_key |
| openai.py | generate_image | generate_image | gpt-image-1 | openai.api_key |
Multi-Provider Steps
A single step can dispatch to multiple connectors. For example, steps/generate/generate_image.py dispatches to either connectors/gemini.py or connectors/openai.py based on a --provider flag:
montaj generate-image --prompt "portrait" --provider gemini --out portrait.png
montaj generate-image --prompt "portrait" --provider openai --out portrait.png
Same step name, same CLI interface, different backend. Similarly, generate_voiceover dispatches to Kling or Gemini TTS via --vendor.
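The dispatch could be sketched roughly as follows. This is an illustration under assumptions, not the real steps/generate/generate_image.py: the two `_..._generate_image` functions are placeholders for the connector calls, and `fail()` is written here as a simple stderr-and-exit helper. The default of gemini follows the "Gemini (default)" example later in this document.

```python
import argparse
import sys


class ConnectorError(Exception):
    """Stand-in for the error type raised by connectors."""


# Placeholders for connectors/gemini.py and connectors/openai.py.
def _gemini_generate_image(prompt, out_path):
    return f"gemini:{out_path}"


def _openai_generate_image(prompt, out_path):
    return f"openai:{out_path}"


PROVIDERS = {
    "gemini": _gemini_generate_image,
    "openai": _openai_generate_image,
}


def fail(message):
    # Hypothetical helper: report the error and exit non-zero.
    print(message, file=sys.stderr)
    sys.exit(1)


def build_parser():
    parser = argparse.ArgumentParser(prog="generate-image")
    parser.add_argument("--prompt", required=True)
    parser.add_argument("--out", required=True)
    parser.add_argument("--provider", choices=sorted(PROVIDERS), default="gemini")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    try:
        # Same interface for every provider; only the backend differs.
        result = PROVIDERS[args.provider](args.prompt, args.out)
    except ConnectorError as exc:
        fail(str(exc))
    print(result)
    return result
```

Calling `main(["--prompt", "portrait", "--provider", "openai", "--out", "portrait.png"])` routes to the OpenAI placeholder; omitting `--provider` routes to the Gemini one.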
Kling
Kling provides video generation and text-to-speech. Two video models are available.
Setup
montaj install connectors
montaj install credentials --provider kling --key access_key --value YOUR_ACCESS_KEY
montaj install credentials --provider kling --key secret_key --value YOUR_SECRET_KEY
Or interactively:
montaj install credentials # select kling, enter keys
Credentials are stored in ~/.montaj/credentials.json (0600 permissions).
Credentials
| Key | Description |
|---|---|
| kling.access_key | Kling API access key |
| kling.secret_key | Kling API secret key |
Get your keys at app.klingai.com.
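This page does not show how the Kling connector authenticates. Since it takes an access key plus a secret key, and `montaj install connectors` pulls in pyjwt, one plausible reading is that the pair is used to sign a short-lived HS256 JWT. The sketch below builds such a token with the standard library only. The function name, the claim set (`iss`, `exp`, `nbf`), and the 30-minute lifetime are all assumptions to verify against Kling's API documentation.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_kling_token(access_key, secret_key, ttl_seconds=1800, now=None):
    """Build an HS256-signed JWT (hypothetical helper, assumed claim set)."""
    now = int(time.time()) if now is None else int(now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {
        "iss": access_key,          # assumption: access key identifies the caller
        "exp": now + ttl_seconds,   # assumption: short-lived token
        "nbf": now - 5,             # small allowance for clock skew
    }
    encoded_header = _b64url(json.dumps(header, separators=(",", ":")).encode())
    encoded_payload = _b64url(json.dumps(payload, separators=(",", ":")).encode())
    signing_input = f"{encoded_header}.{encoded_payload}"
    signature = hmac.new(
        secret_key.encode(), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"
```

In a connector following the rules above, the two keys would come from lib.credentials rather than being passed around by hand.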
Video Generation (generate)
Text-to-Video
montaj kling-generate \
--prompt "a calico cat walking through a sunlit kitchen, cinematic" \
--out /tmp/cat.mp4
Image-to-Video (First Frame)
montaj kling-generate \
--prompt "slow zoom in" \
--first-frame frame.png \
--out /tmp/zoom.mp4
Image-to-Video (First + Last Frame)
montaj kling-generate \
--prompt "character walks left" \
--first-frame start.png \
--last-frame end.png \
--out /tmp/walk.mp4
Style Reference
montaj kling-generate \
--prompt "same style" \
--ref-image style1.png \
--ref-image style2.png \
--out /tmp/styled.mp4
Pro Mode
montaj kling-generate \
--prompt "cinematic scene" \
--out /tmp/pro.mp4 \
--mode pro \
--duration 10 \
--aspect-ratio 9:16
Multi-Shot Batch
montaj kling-generate \
--multi-shot \
--shot-type customize \
--multi-prompt '[{"index":1,"prompt":"scene 1","duration":"5"},{"index":2,"prompt":"scene 2","duration":"5"}]' \
--out /tmp/batch.mp4
Video Parameters
| Param | Default | Description |
|---|---|---|
| --prompt | required | Generation prompt |
| --out | required | Output file path |
| --first-frame <img> | - | Starting image for image-to-video |
| --last-frame <img> | - | Ending image (requires --first-frame) |
| --ref-image <img> | - | Reference image (repeatable, max 7) |
| --duration <3-15> | - | Video duration in seconds |
| --negative-prompt | - | What to avoid in generation |
| --sound <on\|off> | - | Enable/disable sound |
| --aspect-ratio | - | 16:9, 9:16, 1:1 |
| --mode <std\|pro> | std | Standard (cheaper/faster) or Pro (higher quality) |
| --model <name> | auto | kling-v3-omni or kling-video-o1 |
Video Models
| Model | Duration | Audio | Multi-shot | Notes |
|---|---|---|---|---|
| kling-v3-omni | 3-15s | Yes (sound: "on") | Yes | Flexible durations, audio generation |
| kling-video-o1 | 5s or 10s only | No | No | Highest visual quality. End frame requires --mode pro. |
With --model auto, the step auto-upgrades to kling-video-o1 when the duration is 5 or 10 seconds and sound is off.
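The auto-upgrade rule and the constraints in the models table can be sketched as two small functions. Both names are hypothetical, and several details are assumptions rather than documented behavior: an unset `--sound` is treated as off, multi-shot requests stay on kling-v3-omni, an explicit `--model` is never overridden, and kling-v3-omni is the fallback.

```python
def pick_kling_model(duration=None, sound=None, multi_shot=False, requested="auto"):
    """Return a model name following the auto-upgrade rule described above."""
    if requested != "auto":
        return requested  # assumption: an explicit --model always wins
    # Assumption: an unset --sound counts as "off".
    sound_off = sound in (None, "off")
    # Assumption: multi-shot stays on kling-v3-omni, because the models
    # table lists kling-video-o1 as not supporting multi-shot.
    if duration in (5, 10) and sound_off and not multi_shot:
        return "kling-video-o1"
    return "kling-v3-omni"


def check_kling_request(model, duration=None, sound=None, multi_shot=False,
                        last_frame=None, mode="std"):
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    if model == "kling-video-o1":
        if duration is not None and duration not in (5, 10):
            problems.append("kling-video-o1 supports only 5s or 10s clips")
        if sound == "on":
            problems.append("kling-video-o1 does not generate audio")
        if multi_shot:
            problems.append("kling-video-o1 does not support multi-shot")
        if last_frame and mode != "pro":
            problems.append("an end frame on kling-video-o1 requires --mode pro")
    elif model == "kling-v3-omni":
        if duration is not None and not 3 <= duration <= 15:
            problems.append("kling-v3-omni supports durations of 3-15s")
    return problems
```

Checking a request before submitting it avoids spending a long blocking generation call on parameters the chosen model cannot honor.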
Text-to-Speech (generate_speech)
montaj generate-voiceover --text "Welcome to our farm" --vendor kling --out /tmp/vo.mp3
Uses model kling-tts-v1. See the generate_voiceover step for full parameter documentation.
Gemini
The Gemini connector wraps Google's Gemini API for four use cases: media analysis, image generation, text-to-speech, and music generation.
Setup
montaj install connectors
montaj install credentials --provider gemini --key api_key --value YOUR_API_KEY
Or interactively:
montaj install credentials # select gemini, enter key
Credentials
| Key | Description |
|---|---|
| gemini.api_key | Google Gemini API key |
Get your key at ai.google.dev.
Media Analysis (analyze_media)
Analyze any media file — video, audio, or image — with a natural language prompt.
montaj analyze-media clip.mp4 --prompt "Describe the scene in 2 sentences."
montaj analyze-media song.mp3 --prompt "Transcribe with timestamps."
montaj analyze-media photo.jpg --prompt "Return JSON: {subject, mood, dominant_colors}" --json-output
montaj analyze-media clip.mp4 --prompt "..." --model gemini-2.5-pro
| Param | Default | Description |
|---|---|---|
| <input> | required | Media file (video, audio, image) |
| --prompt | required | Analysis prompt |
| --model | gemini-2.5-flash | Model override |
| --json-output | - | Request structured JSON from the model |
| --out | - | Write output to file |
Note: Images under approximately 18 MB take a fast inline path — no Files API round-trip needed.
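A rough sketch of how that size check might be made. The function name, the return labels, and reading "approximately 18 MB" as 18 MiB are assumptions; the note only mentions images, so this sketch sends every other media type through the upload path.

```python
import mimetypes
import os

# Assumption: "approximately 18 MB" is read here as 18 MiB.
INLINE_LIMIT_BYTES = 18 * 1024 * 1024


def choose_upload_path(path, limit=INLINE_LIMIT_BYTES):
    """Return "inline" for small images, otherwise "files_api"."""
    mime_type, _ = mimetypes.guess_type(path)
    is_image = bool(mime_type) and mime_type.startswith("image/")
    size = os.path.getsize(path)
    if is_image and size < limit:
        return "inline"      # bytes travel with the request itself
    return "files_api"       # upload first, then reference the uploaded file
```

The inline path saves one network round-trip, which matters most for quick single-image prompts.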
Image Generation (generate_image)
Generate images from text prompts, optionally conditioned on reference images.
montaj generate-image --prompt "portrait, studio lighting" --out portrait.png
montaj generate-image --prompt "same character, profile view" --ref-image portrait.png --out profile.png
montaj generate-image --prompt "..." --provider gemini --aspect-ratio 9:16 --out tall.png
| Param | Default | Description |
|---|---|---|
| --prompt | required | Generation prompt |
| --out | required | Output file path |
| --ref-image | - | Reference image (repeatable) |
| --aspect-ratio | - | Aspect ratio (Gemini-specific) |
| --model | gemini-3-pro-image-preview | Model override |
Text-to-Speech (generate_speech)
Generate speech audio from text.
montaj generate-voiceover --text "Welcome to our channel" --vendor gemini --voice Kore --out vo.wav
| Param | Default | Description |
|---|---|---|
| --voice | Kore | Voice name (Gemini voices: Kore, Puck, Charon, etc.) |
Uses model gemini-2.5-flash-preview-tts.
Music Generation (generate_music)
Generate music from a text description using Lyria 3 Clip. Produces approximately 30 seconds of audio.
montaj generate-music --prompt "upbeat electronic, 120 bpm" --out music.wav
montaj generate-music --prompt "acoustic guitar, mellow" --with-vocals --out song.wav
Uses model lyria-3-clip-preview.
Models
| Use Case | Default Model |
|---|---|
| Media analysis | gemini-2.5-flash |
| Image generation | gemini-3-pro-image-preview |
| Text-to-speech | gemini-2.5-flash-preview-tts |
| Music generation | lyria-3-clip-preview |
OpenAI
The OpenAI connector wraps OpenAI's image generation API.
Setup
montaj install connectors
montaj install credentials --provider openai --key api_key --value YOUR_API_KEY
Or interactively:
montaj install credentials # select openai, enter key
Credentials
| Key | Description |
|---|---|
| openai.api_key | OpenAI API key |
Get your key at platform.openai.com.
Image Generation (generate_image)
Generate images from text prompts, optionally with reference images.
montaj generate-image --prompt "red apple on white table" --provider openai --out apple.png
montaj generate-image --prompt "same scene, sunset" --provider openai --ref-image scene.png --out sunset.png
| Param | Default | Description |
|---|---|---|
| --prompt | required | Generation prompt |
| --out | required | Output file path |
| --provider | - | Must be openai to use this connector |
| --ref-image | - | Reference image (repeatable) |
| --size <WxH> | - | Image dimensions |
| --model | gpt-image-1 | Model override |
Gemini vs. OpenAI
The generate_image step supports both Gemini and OpenAI as providers. Choose based on:
- Gemini: supports the --aspect-ratio flag, good for targeting a specific aspect ratio
- OpenAI: supports the --size WxH flag for explicit dimensions, different artistic style
# Gemini (default)
montaj generate-image --prompt "portrait" --out portrait.png
# OpenAI
montaj generate-image --prompt "portrait" --provider openai --out portrait.png
Credentials
API credentials for external connectors live in ~/.montaj/credentials.json with 0600 permissions.
Installation Methods
Interactive
montaj install credentials
# Prompts for provider selection and key entry
Scripted (CI/Automation)
montaj install credentials --provider kling --key access_key --value YOUR_KEY
montaj install credentials --provider kling --key secret_key --value YOUR_KEY
montaj install credentials --provider gemini --key api_key --value YOUR_KEY
montaj install credentials --provider openai --key api_key --value YOUR_KEY
Check Status
montaj install credentials --list
# Shows set/unset status per provider
Credential Precedence
Each connector reads credentials via lib.credentials.get_credential(provider, key). The precedence order, highest first:
- Environment variable, e.g. KLING_ACCESS_KEY, GEMINI_API_KEY
- Credentials file: ~/.montaj/credentials.json
- Fail, with install instructions
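The lookup order above can be sketched as follows. This is not the real lib/credentials.py: the `path` and `environ` parameters, the `CredentialError` type, and the wording of the failure message are assumptions added to make the sketch self-contained and testable. The environment variable name is derived as `PROVIDER_KEY` in upper case, which matches the names listed in this document.

```python
import json
import os
from pathlib import Path


class CredentialError(Exception):
    """Hypothetical error raised when no credential can be found."""


def get_credential(provider, key, path=None, environ=None):
    """Resolve a credential: environment variable, then file, then fail."""
    environ = os.environ if environ is None else environ
    path = Path(path) if path else Path.home() / ".montaj" / "credentials.json"

    # 1. Environment variable, e.g. ("kling", "access_key") -> KLING_ACCESS_KEY
    env_name = f"{provider}_{key}".upper()
    value = environ.get(env_name)
    if value:
        return value

    # 2. Credentials file
    if path.is_file():
        data = json.loads(path.read_text(encoding="utf-8"))
        value = data.get(provider, {}).get(key)
        if value:
            return value

    # 3. Fail with install instructions
    raise CredentialError(
        f"missing {provider}.{key}: set {env_name} or run "
        f"'montaj install credentials --provider {provider} --key {key} --value <value>'"
    )
```

Checking the environment first is what lets CI jobs run without ever writing a credentials file.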
Supported Providers
| Provider | Keys | Environment Variables |
|---|---|---|
| kling | access_key, secret_key | KLING_ACCESS_KEY, KLING_SECRET_KEY |
| gemini | api_key | GEMINI_API_KEY |
| openai | api_key | OPENAI_API_KEY |
Credentials File Format
{
"kling": {
"access_key": "...",
"secret_key": "..."
},
"gemini": {
"api_key": "..."
},
"openai": {
"api_key": "..."
}
}
The file is stored at ~/.montaj/credentials.json with 0600 permissions (owner read/write only).
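For illustration, writing a file in this format with owner-only permissions might look like the sketch below. `save_credential` is a hypothetical helper, not the code behind `montaj install credentials`, and the permission handling assumes a POSIX system.

```python
import json
import os
from pathlib import Path


def save_credential(provider, key, value, path):
    """Merge one credential into the JSON file and keep it at mode 0600."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)

    # Preserve credentials already stored for other providers or keys.
    data = {}
    if path.is_file():
        data = json.loads(path.read_text(encoding="utf-8"))
    data.setdefault(provider, {})[key] = value

    # Create the file with 0600 from the start so it is never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2)

    # The mode passed to os.open only applies to new files, so tighten
    # permissions explicitly in case the file already existed.
    os.chmod(path, 0o600)
```

Calling it twice for `kling` with `access_key` and `secret_key` yields the nested structure shown in the file format above.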
Using in CI/CD
For CI environments, use environment variables instead of the credentials file:
export KLING_ACCESS_KEY=...
export KLING_SECRET_KEY=...
export GEMINI_API_KEY=...
export OPENAI_API_KEY=...
Environment variables take precedence over the credentials file, so this works without any additional configuration.