Overview
TwelveLabs Studio turns a folder of raw video into something you can search, understand, and edit. It pairs a native macOS app, where your library, metadata, and editing state live, with TwelveLabs models that do the heavy lifting (Marengo embeddings, Pegasus analyses, optional Jockey reasoning).
The mental model
Studio is organized around projects. A project is a workspace that holds a set of videos (its Library) plus everything derived from them: embeddings for search, Pegasus analyses, segmentations, Director runs, and a curation Timeline.
When you direct an edit, you pick a reasoning engine — Local or Jockey — on the Direct sheet. The choice is per run, not per project: the same project can run Local on one brief and Jockey on the next, in any order.
- Local — shots are detected with FFmpeg, embeddings come from TwelveLabs Embed v2 (Marengo) and live in Studio's local SQLite database, and the Director's planning runs through an LLM (Claude by default, via TwelveLabs Cloud) with each editorial agent visible step-by-step in the Job Monitor. Lower cost, more transparency.
- Jockey — reasoning runs through the Jockey Agents API in a single streaming master call over the project's TwelveLabs-hosted Knowledge Store. One-shot reasoning, fewer moving parts.
Studio's ingest pipeline is unified. Every project provisions a Knowledge Store at creation, and every asset is uploaded to TwelveLabs (the same /assets endpoint feeds Embed v2, Pegasus, and Jockey alike), so either engine can drive any run without re-processing. Whichever engine you pick, its output then passes through the same deterministic enforcers (no mid-sentence cuts, ±10% duration, chronology re-sort), so the final cut respects the same constraints either way. See Reasoning engines for the full side-by-side.
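The enforcers' internals aren't documented here. As a rough illustration only, the sketch below shows how the three rules might compose. It assumes a hypothetical `Clip` record (source asset index plus in/out points in seconds) and per-asset sentence spans taken from the transcript. "Chronology re-sort" is read here as "order by source asset, then by in-point", which is also an assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Clip:
    asset_index: int  # hypothetical: source asset's position in recording order
    start: float      # in-point within the source, in seconds
    end: float        # out-point within the source, in seconds

    @property
    def duration(self) -> float:
        return self.end - self.start


def snap_to_sentences(clip, sentences):
    """Widen a clip so neither cut point lands inside a spoken sentence.

    `sentences` holds (start, end) spans from the source asset's transcript.
    """
    start, end = clip.start, clip.end
    for s_start, s_end in sentences:
        if s_start < start < s_end:
            start = s_start  # pull the in-point back to the sentence start
        if s_start < end < s_end:
            end = s_end      # push the out-point out to the sentence end
    return replace(clip, start=start, end=end)


def resort_chronologically(clips):
    """Restore source order: by asset, then by in-point."""
    return sorted(clips, key=lambda c: (c.asset_index, c.start))


def within_duration(clips, target, tolerance=0.10):
    """True if the total runtime is within ±tolerance of the target."""
    total = sum(c.duration for c in clips)
    return (1 - tolerance) * target <= total <= (1 + tolerance) * target


def enforce(clips, sentences_by_asset, target, tolerance=0.10):
    # Snap first, because widening clips changes the total duration.
    snapped = [snap_to_sentences(c, sentences_by_asset.get(c.asset_index, []))
               for c in clips]
    ordered = resort_chronologically(snapped)
    if not within_duration(ordered, target, tolerance):
        raise ValueError(f"cut is outside ±{tolerance:.0%} of the {target}s target")
    return ordered
```

For example, `enforce([Clip(1, 4.0, 9.0), Clip(0, 2.5, 6.0)], {0: [(2.0, 6.5)]}, target=9.5)` widens the second clip to 2.0 to 6.5 s and moves it to the front. Running the duration check last means it sees the cut as it will actually play.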
The six surfaces
Studio has six core surfaces. In the order you'd use them: Library holds the footage, Search and Segmentation make it findable, Director auto-assembles edits from a brief, Timeline is where you refine (or hand-build) a cut, and Publish sends the result out.
- Library — import videos; Studio hashes, thumbnails, and indexes them.
- Search — type what you're looking for; get ranked clips you can play inline. Multi-modal (visual / audio / transcription) blending with per-modality weights you control.
- Segmentation — open an asset and break it into labeled beats (visual, spoken, structure, transcript, recap) or run a custom AI-authored profile.
- Director — write a brief, pick a template, and an editorial team of agents assembles a cut grounded by deterministic rules.
- Timeline — hand-pick clips from Search and Segmentation, then arrange, trim, split, and version your own edit. Same multi-track editor the Director uses, just driven by you.
- Publish — send a Director run or your Timeline to TwelveLabs Cloud as a public link, or export to an NLE / MP4.
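How Studio actually fuses modalities in Search isn't specified here. For illustration only, here is one common scheme: a normalized weighted sum of per-modality similarity scores. It assumes each modality already returns scores on a comparable 0-to-1 scale. The dictionary shapes and modality keys are hypothetical.

```python
def blend(scores_by_modality, weights):
    """Rank clips by a weighted sum of per-modality similarity scores.

    scores_by_modality: {"visual": {clip_id: score, ...}, "audio": {...}, ...}
    weights:            {"visual": 0.5, "audio": 0.25, ...}; normalized to sum to 1.
    A clip with no score for some modality contributes 0 for that modality.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one modality weight must be positive")

    clip_ids = set()
    for scores in scores_by_modality.values():
        clip_ids.update(scores)

    blended = {
        clip_id: sum(
            (w / total_weight) * scores_by_modality.get(m, {}).get(clip_id, 0.0)
            for m, w in weights.items()
        )
        for clip_id in clip_ids
    }
    # Highest blended score first.
    return sorted(blended.items(), key=lambda item: item[1], reverse=True)
```

Take scores `{"visual": {"a": 0.9, "b": 0.4}, "audio": {"a": 0.1, "b": 0.8}, "transcription": {"b": 0.7}}` with weights `{"visual": 0.5, "audio": 0.25, "transcription": 0.25}`. Clip b (about 0.575) ranks above clip a (about 0.475). Setting the weights to visual only flips that order, which is the kind of control the per-modality sliders give you.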
Principles
A few ideas baked into how Studio is built:
- Library-aware editing. The Director doesn't see one clip at a time — it searches the whole library when assembling a cut. A trailer of "the conference" pulls the climactic moment, the right reaction shot, and the cleanest establishing frame from across the entire footage set.
- Content-addressable assets. Every video is keyed by SHA-256 of its bytes. Re-importing the same file (in any project, under any name) collapses to one asset row. You can move files on disk without re-indexing as long as Studio can still find them at a known location.
- Editing is versioned, not destructive. Every Director run, manual gesture, or (in Milestone 2) agent suggestion creates a new immutable timeline snapshot. Switch between versions, restore an earlier one, duplicate a version to branch — nothing is ever overwritten.
- Templates carry editorial DNA, not subject knowledge. A "Trailer" template knows about Hook → Setup → Build → Climax → Payoff. It doesn't know what a goal or a wedding is. The Story Editor reads each run's footage and figures out the content category itself. See Templates.
- Jobs over background tasks. Every long-running operation (ingest, shot detection, embedding, Pegasus analysis, Director run, publish) is a first-class Job in a persistent queue. Jobs survive crashes, retry, dedupe on key, and surface in the Job Monitor, so you can always see what's in flight, cancel it, and replay the log.
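Studio's snapshot storage isn't described here. The following is a minimal in-memory sketch of the versioning behavior above. It assumes a hypothetical append-only list of frozen snapshots, and it treats "restore" as re-committing an old snapshot as a new version, so that nothing is overwritten. The class and method names are illustrative, not Studio's API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TimelineVersion:
    version_id: int
    parent_id: Optional[int]  # the version this one was derived from
    clips: Tuple[str, ...]    # immutable snapshot of the cut (clip ids here)
    note: str                 # e.g. "Director run", "trim", "restore v0"


class VersionHistory:
    """Append-only history: every change adds a snapshot; none is ever modified."""

    def __init__(self):
        self._versions = []
        self.current_id = None

    def get(self, version_id):
        if not 0 <= version_id < len(self._versions):
            raise KeyError(f"unknown version {version_id}")
        return self._versions[version_id]

    def commit(self, clips, note, parent_id=None):
        parent = self.current_id if parent_id is None else parent_id
        version = TimelineVersion(len(self._versions), parent, tuple(clips), note)
        self._versions.append(version)
        self.current_id = version.version_id
        return version

    def switch(self, version_id):
        """Point the editor at an existing version without creating a new one."""
        self.get(version_id)
        self.current_id = version_id

    def restore(self, version_id):
        """Re-commit an old snapshot on top of the current one; history stays intact."""
        return self.commit(self.get(version_id).clips, f"restore v{version_id}")

    def duplicate(self, version_id):
        """Branch: a copy whose parent is the duplicated version."""
        old = self.get(version_id)
        return self.commit(old.clips, f"duplicate v{version_id}", parent_id=version_id)
```

Because every operation goes through `commit`, restoring or branching only ever appends. The version you restored from, and the one you restored over, both remain available to `switch` back to.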
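The queue's implementation isn't described beyond the behaviors listed. One minimal way to get persistence, retry, and dedupe-on-key is a SQLite table whose primary key is the dedupe key, so a second enqueue with the same key is silently ignored. The table name, columns, and key format below are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    dedupe_key TEXT PRIMARY KEY,               -- hypothetical format, e.g. 'embed:<asset-hash>'
    kind       TEXT NOT NULL,
    status     TEXT NOT NULL DEFAULT 'queued', -- queued | done | failed
    attempts   INTEGER NOT NULL DEFAULT 0
)
"""


def open_queue(path=":memory:"):
    """Pass a file path instead of ':memory:' so queued jobs survive a crash."""
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def enqueue(db, kind, dedupe_key):
    """Add a job unless one with the same key exists. Returns True if it was new."""
    cur = db.execute(
        "INSERT OR IGNORE INTO jobs (dedupe_key, kind) VALUES (?, ?)",
        (dedupe_key, kind),
    )
    db.commit()
    return cur.rowcount == 1


def run_pending(db, handlers, max_attempts=3):
    """One pass over queued jobs; a failed job stays queued until it runs out of attempts."""
    rows = db.execute(
        "SELECT dedupe_key, kind, attempts FROM jobs WHERE status = 'queued'"
    ).fetchall()
    for key, kind, attempts in rows:
        try:
            handlers[kind](key)
            status = "done"
        except Exception:
            status = "failed" if attempts + 1 >= max_attempts else "queued"
        db.execute(
            "UPDATE jobs SET status = ?, attempts = attempts + 1 WHERE dedupe_key = ?",
            (status, key),
        )
        db.commit()
```

Committing after every state change is what lets a restarted process pick up exactly where the previous one stopped. A job that was mid-flight when the app crashed is still `queued` and simply runs again on the next pass.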
Where your data lives
~/Library/Application Support/TwelveLabsStudio/
  studio.db            ← all metadata (projects, assets, jobs, timelines, ...)
  assets/{xx}/{hash}/  ← per-asset artifacts (content-addressable)
    thumbnail.jpg
    shots/             ← per-shot frames
    embeddings/        ← reserved for future on-disk vector store
  renders/             ← cached MP4 renders for publish / export
  templates/           ← your editorial template overrides (JSON)
  credentials.plist    ← file-backed Keychain replacement (ad-hoc signing rules out the real Keychain)
- Original video files stay wherever you put them. Studio only records their paths and a content hash. Moving or renaming a file is safe as long as the new path is reachable.
- Secrets live in credentials.plist, the file-backed Keychain replacement. API keys (your Anthropic / OpenAI keys, the cached cloud-issued TL key, Cognito refresh tokens) never touch studio.db.
- Schema is validated on every launch. Studio compares the SQLite schema against an expected shape; any drift throws, and the app refuses to run rather than corrupt data. Don't run an older build against a database that a newer build has upgraded; install the latest version instead.
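The hashing itself is standard SHA-256. Below is a sketch of how a file might map to its artifact folder. It assumes {xx} in assets/{xx}/{hash}/ is the first two hex characters of the digest, which is a common sharding scheme but is not confirmed here. The function names are hypothetical.

```python
import hashlib
from pathlib import Path

# Studio's support directory, from the layout above.
SUPPORT_DIR = Path.home() / "Library" / "Application Support" / "TwelveLabsStudio"


def content_hash(path, chunk_size=1 << 20):
    """Hex SHA-256 of a file's bytes, read in 1 MiB chunks so large videos aren't loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def artifact_dir(hex_digest, root=SUPPORT_DIR):
    """assets/{xx}/{hash}/, assuming {xx} is the first two hex characters (hypothetical)."""
    return root / "assets" / hex_digest[:2] / hex_digest
```

The key depends only on the bytes, not the name or path. Two copies of the same file, say `clip.mov` and `renamed copy.mov`, therefore resolve to the same folder. That is why a re-import collapses to one asset row and why moving a file doesn't force re-indexing.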
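The exact comparison Studio performs isn't documented. The sketch below is one way to express "compare against an expected shape and refuse to run on drift" with Python's standard sqlite3 module. The two-table `EXPECTED_SCHEMA` and the `SchemaDriftError` name are hypothetical.

```python
import sqlite3

# Hypothetical expected shape: table name -> ordered column names.
EXPECTED_SCHEMA = {
    "projects": ["id", "name", "created_at"],
    "assets": ["hash", "path", "imported_at"],
}


class SchemaDriftError(RuntimeError):
    """Raised so the app refuses to start instead of writing to an unexpected schema."""


def actual_schema(db):
    tables = [row[0] for row in db.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    )]
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    return {t: [col[1] for col in db.execute(f'PRAGMA table_info("{t}")')] for t in tables}


def validate_schema(db, expected=EXPECTED_SCHEMA):
    actual = actual_schema(db)
    problems = [f"unexpected table {t}" for t in sorted(actual.keys() - expected.keys())]
    for table, columns in expected.items():
        if table not in actual:
            problems.append(f"missing table {table}")
        elif actual[table] != columns:
            problems.append(f"{table}: expected {columns}, found {actual[table]}")
    if problems:
        raise SchemaDriftError("; ".join(problems))
```

Extra tables and columns count as drift, not just missing ones. Suppose a newer build added a column: an older build running this check raises instead of writing rows that ignore that column.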
What touches the network
Worth knowing the exact surface area:
- Model inference — embeddings (Marengo) and Pegasus analysis call TwelveLabs APIs. LLM planning calls Claude by default via TwelveLabs Cloud's Bedrock proxy (no Anthropic key needed); if you provide your own Anthropic or OpenAI key, calls go directly to that provider instead. Footage is uploaded to TwelveLabs at ingest time for every project (the unified /assets upload feeds Embed v2, Pegasus, and the Knowledge Store), and the Jockey engine reasons over that same Knowledge Store at run time.
- Sign-in & telemetry — Studio authenticates against TwelveLabs Cloud at startup and sends pseudonymous usage events (event types + numerics + enums; no video, no file paths, no PII).
- Publishing — only when you explicitly publish: the rendered MP4 is uploaded to S3 and a public URL is created.
- Updates — a once-a-day check against the release portal for a new version. Manual checks are also possible (Settings → Account → About).
Beyond those four, nothing else leaves your Mac. The cloud-side analytics dashboard only ever sees aggregate event types — never your library or your edits.
Studio vs. a traditional NLE
Studio isn't trying to replace Final Cut Pro or DaVinci Resolve. It's a front-of-pipeline tool that turns a library into a draft. The typical workflow is to run Studio to get to a first cut, then export FCPXML / EDL into your NLE for finishing — color, sound design, motion graphics.
The framing is editing over your library, not editing over a timeline. Studio's job is reading a deep footage library and producing the short cut you actually want; refining that cut is what finishing tools are for.