Section 03
Library & semantic search
The Library is your project's set of videos and the home of semantic search. Behind it is a content-addressable index that makes every clip in your footage searchable down to the segment level.
Importing
- Drag & drop files or a folder onto the Library grid, or use the import button.
- Assets are content-addressable: each file is identified by the SHA-256 hash of its bytes. The same file imported twice (under a different name, in another project, or by another user on the same machine) collapses to one asset row. Re-importing is safe and free.
- Each asset records every disk path it's been seen at. Moving a file is fine as long as Studio can still find it at one known location. If every known path becomes invalid, the asset shows as "missing" until you re-add it from any path.
- The same asset can sit in multiple projects simultaneously. One content hash, one unified TwelveLabs `asset_id`, one set of local embeddings — referenced from each project via the `project_assets` junction (with a per-(project, asset) Knowledge Store item id). Either reasoning engine can drive runs against any project on the same shared corpus.
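The import rules above can be sketched in a few lines. This is a minimal illustration, not Studio's implementation: `AssetIndex`, `import_file`, and `is_missing` are hypothetical names, and the index lives in memory rather than in the real asset tables.

```python
import hashlib
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class AssetIndex:
    """Hypothetical in-memory index: one entry per content hash."""

    def __init__(self) -> None:
        self.known_paths: dict[str, set[Path]] = {}

    def import_file(self, path: Path) -> str:
        key = content_hash(path)
        # The same bytes under a new name land on the existing entry;
        # only the set of known paths grows.
        self.known_paths.setdefault(key, set()).add(path.resolve())
        return key

    def is_missing(self, key: str) -> bool:
        # "Missing" only when none of the recorded paths still exists.
        return not any(p.exists() for p in self.known_paths.get(key, ()))
```

Because the key depends only on the bytes, renaming or copying a file never creates a second asset; it only adds another known location.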
The ingest pipeline
With auto-process on (Settings → Embedding, default), each new asset moves through a defined sequence of jobs. The pipeline is unified — it doesn't branch on engine, because both the Local and Jockey reasoning engines need the same prep substrate to run. Every project provisions a Knowledge Store at creation and every asset is uploaded to TwelveLabs, so the engine choice (made per-run on the Direct sheet) has no effect on what runs at ingest.
You can watch each step in the Job Monitor — cancel, retry, or replay the per-job log inline.
- `INGEST_FILE` — hash, extract video metadata (duration, codec, resolution), generate a thumbnail via AVFoundation, write rows into `assets` + `asset_locations` + `project_assets`. Posts `.assetDidIngest`.
- `DETECT_SHOTS` — FFmpeg's `scdet` filter finds cut boundaries. Per-shot 240px JPEG thumbnails are written under `assets/{xx}/{hash}/shots/`. Thresholds tunable in Settings → Processing.
- `JOCKEY_UPLOAD_ASSET` — uploads to TwelveLabs via the unified `POST /assets` endpoint (multipart above 200 MB), polls until ready, writes back the unified `asset_id` on the asset row. The same `asset_id` feeds embeddings, Pegasus, and Jockey — one upload, reused everywhere. Content-hash dedupe means re-importing the same file in any other project is a cache hit.
- `EMBED_ASSET` — TwelveLabs Embed v2 (Marengo 3.0) via `/embed-v2/tasks`, referencing the unified `asset_id`. Three modalities (visual, audio, transcription) per segment, stored locally in SQLite.
- `ANALYZE_ASSET` — Pegasus 1.5 runs once per asset per analysis kind (recap, structure, visual, spoken, transcript). Five separate jobs per asset.
- `JOCKEY_ADD_TO_KS` — adds the asset to the project's Knowledge Store (`POST /knowledge-stores/{ksId}/items`). One row per (project, asset); idempotent. Required for the Jockey engine to reason over the asset; harmless if you only ever run the Local engine on this project. Entity extraction (used by both engines) also iterates over KS items, so this step is also a prerequisite for entity-aware Local-engine runs.
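To make the job count concrete, the sketch below lists the jobs one new asset would generate, in the order documented above. `Job` and `plan_ingest_jobs` are hypothetical names, reading "200 MB" as 200 × 10^6 bytes is an assumption, and the real scheduler may overlap or reorder steps.

```python
from dataclasses import dataclass

# Analysis kinds named in the list above; one ANALYZE_ASSET job per kind.
ANALYSIS_KINDS = ("recap", "structure", "visual", "spoken", "transcript")

# Assumption: "200 MB" is read as 200 * 10**6 bytes.
MULTIPART_THRESHOLD_BYTES = 200 * 10**6


@dataclass(frozen=True)
class Job:
    kind: str
    detail: str = ""


def plan_ingest_jobs(size_bytes: int) -> list[Job]:
    """List the jobs for one new asset in the documented order."""
    # Multipart applies only above the threshold, not at it.
    upload_mode = "multipart" if size_bytes > MULTIPART_THRESHOLD_BYTES else "single"
    jobs = [
        Job("INGEST_FILE"),
        Job("DETECT_SHOTS"),
        Job("JOCKEY_UPLOAD_ASSET", upload_mode),
        Job("EMBED_ASSET"),
    ]
    # Five analysis jobs, one per kind.
    jobs += [Job("ANALYZE_ASSET", kind) for kind in ANALYSIS_KINDS]
    jobs.append(Job("JOCKEY_ADD_TO_KS"))
    return jobs
```

Under these assumptions a single import yields ten entries in the Job Monitor: four prep jobs, five analyses, and the Knowledge Store step.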
With auto-enrich on (default), the post-prep enrichment layer then runs for every project:
- Asset insights — a per-(project, asset) web-grounded research dossier (named entities, references, surrounding context). Feeds into both the Story Editor (Local engine) and the Jockey master prompt.
- Entity extraction + web enrichment — named people, brands, locations across the library, with web-resolved canonical names + summary + image where confidence is high. Consumed by both engines.
The pipeline badge
Every asset card carries a small pipeline badge showing where the asset is in the post-ingest pipeline. The badge has 7 distinct states — shots, upload, embed, analyze, index, entities, insights — so you can scan a Library and see at a glance which assets are search-ready vs. fully enriched.
You don't have to wait for everything. The moment an asset has embeddings, it's searchable. The moment it has Pegasus analyses, it's segmentable. The moment it has insights, the Director has richer context to work from.
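The relationship between badge states and what you can do with an asset can be sketched as a simple lookup. The function and key names are hypothetical, and treating "fully enriched" as all seven states complete is an assumption.

```python
# The seven badge states named above, in pipeline order.
BADGE_STATES = ("shots", "upload", "embed", "analyze", "index", "entities", "insights")


def capabilities(done: set[str]) -> dict[str, bool]:
    """Map the set of completed badge states to what the asset supports."""
    unknown = done - set(BADGE_STATES)
    if unknown:
        raise ValueError(f"unknown badge states: {sorted(unknown)}")
    return {
        "searchable": "embed" in done,         # embeddings exist
        "segmentable": "analyze" in done,      # Pegasus analyses exist
        "richer_context": "insights" in done,  # insights dossier exists
        # Assumption: fully enriched means every state has completed.
        "fully_enriched": done == set(BADGE_STATES),
    }
```

For example, an asset that has finished `shots`, `upload`, and `embed` is already searchable even though it is not yet segmentable.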
Cross-project reuse
Because assets are global (content-hash keyed) but projects are filters (through `project_assets`), the same expensive processing is shared across projects:
- Embedding a 90-minute conference talk once means it's searchable in every project you add it to.
- Pegasus analyses are per-asset, not per-project — the same `recap` you generated for project A is reused in project B.
- Only the per-(project, asset) bits — like the Jockey KS-item id, or a display-name override — are re-derived per project.
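One way to picture the split between global assets and per-project rows is a small relational sketch. Only the table names `assets` and `project_assets` come from this guide; the `analyses` table, every column name, and the helper functions are assumptions made for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE assets (
    content_hash TEXT PRIMARY KEY,   -- SHA-256 of the file bytes
    asset_id     TEXT                -- unified TwelveLabs asset_id, set after upload
);
CREATE TABLE analyses (              -- hypothetical: per-asset, never per-project
    content_hash TEXT NOT NULL REFERENCES assets(content_hash),
    kind         TEXT NOT NULL,
    body         TEXT NOT NULL,
    PRIMARY KEY (content_hash, kind)
);
CREATE TABLE project_assets (        -- junction: one row per (project, asset)
    project_id   TEXT NOT NULL,
    content_hash TEXT NOT NULL REFERENCES assets(content_hash),
    ks_item_id   TEXT,               -- per-(project, asset) Knowledge Store item id
    display_name TEXT,               -- optional per-project override
    PRIMARY KEY (project_id, content_hash)
);
"""


def open_library() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def link_asset(conn: sqlite3.Connection, project_id: str, content_hash: str, ks_item_id: str) -> None:
    # The asset row is created at most once; each project only adds a junction row.
    conn.execute("INSERT OR IGNORE INTO assets (content_hash) VALUES (?)", (content_hash,))
    conn.execute(
        "INSERT OR IGNORE INTO project_assets (project_id, content_hash, ks_item_id) "
        "VALUES (?, ?, ?)",
        (project_id, content_hash, ks_item_id),
    )


def analyses_for_project(conn: sqlite3.Connection, project_id: str, kind: str) -> list[tuple[str, str]]:
    # Analyses are found through the junction, so every linked project sees them.
    rows = conn.execute(
        "SELECT a.content_hash, a.body FROM analyses a "
        "JOIN project_assets pa ON pa.content_hash = a.content_hash "
        "WHERE pa.project_id = ? AND a.kind = ?",
        (project_id, kind),
    )
    return rows.fetchall()
```

In this layout, adding an already-analyzed asset to a second project writes one junction row and nothing else; the recap is reachable immediately through the join.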
Semantic search
Open Search (below the Library header). Type what you're looking for in plain language — "a wide shot of the stadium at sunset", "someone saying the word sustainability", "the crowd celebrating".
How search works (under the hood)
- Your query is embedded by the same Marengo model that embedded your library (visual + audio + transcription embedding vectors).
- Studio computes cosine similarity between the query vector and every segment vector in your project, per modality.
- Per-modality scores are blended using the weights in Settings → Embedding (defaults skew visual; transcription and audio fill in the rest, normalized).
- Results are sorted by the blended score and returned as clip cards — each card is one segment, playable inline.
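The scoring steps above can be sketched in plain Python. This is an illustration under stated assumptions, not Studio's code: the real kernel uses vDSP, the function names and data layout are hypothetical, and the weights in the usage note are made-up values rather than the shipped defaults.

```python
import math

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def normalize(weights: dict[str, float]) -> dict[str, float]:
    """Scale modality weights so they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: value / total for name, value in weights.items()}


def search(
    query: Vector,
    segments: dict[str, dict[str, Vector]],
    weights: dict[str, float],
    limit: int = 24,
) -> list[tuple[str, float]]:
    """Rank segments by the weighted blend of per-modality cosine scores."""
    w = normalize(weights)
    scored = []
    for segment_id, vectors in segments.items():
        # One cosine score per modality, blended with the normalized weights.
        score = sum(
            w[modality] * cosine(query, vectors[modality])
            for modality in w
            if modality in vectors
        )
        scored.append((segment_id, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]
```

With a toy query `[1.0, 0.0]`, a segment whose visual vector matches the query ranks first under visual-heavy weights such as `{"visual": 0.6, "audio": 0.2, "transcription": 0.2}`, while a segment whose transcription vector matches takes the lead once the transcription weight is raised.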
Search always runs locally, regardless of which Director engine you'd use to assemble: the only network round-trip is the initial query-embedding call. The vDSP cosine kernel returns near-instantly for libraries of tens of thousands of segments.
Search scopes
The Advanced button opens a panel with the controls that focus the search:
- Per-modality scope toggles (visual / audio / transcript) — turn off the modalities you don't want to search over for this query.
- Per-page result count — 12, 24 (default), 48, or 96.
Working with results
- Each result is a clip card — a thumbnail you can play inline without leaving the page.
- Click to preview — a single click on a card starts inline silent playback; click again to stop. Double-click opens the full asset at that moment.
- The + on a card adds the clip straight to your project Timeline. A floating dock at the bottom shows your current assembly (count + total duration + a filmstrip) as you add clips.
Tuning search for your footage
Different content types favor different modality weights:
- Dialogue-heavy footage (interviews, conferences) — push transcription up, visual down. Most queries describe what's said.
- Action footage (sports, b-roll) — visual dominates; transcription can stay low.
- Music-driven content (montages, performance) — audio matters more than the defaults give it.
Adjust these weights in Settings → Embedding; they apply to every search.
If results feel off and weights aren't the issue, the most likely cause is that the assets you expect aren't fully embedded yet — check the pipeline badge or the Job Monitor.