Optimize multimodal search using the TwelveLabs Embed API and Amazon OpenSearch Service


This blog is co-authored by James Le, Head of Developer Experience – TwelveLabs

The exponential growth of video content has created both opportunities and challenges. Content creators, marketers, and researchers are now faced with the daunting task of efficiently searching, analyzing, and extracting valuable insights from vast video libraries. Traditional search methods, such as keyword-based text search, often fall short with video because they cannot analyze the visual content, spoken words, or contextual elements within the video itself, leaving organizations struggling to effectively search through and unlock the full potential of their multimedia assets.

The integration of TwelveLabs' Embed API and Amazon OpenSearch Service changes how we interact with and derive value from video content. By combining TwelveLabs' advanced AI-powered video understanding technology with OpenSearch Service's search and analytics capabilities, we can now perform advanced video discovery and gain deeper insights.

In this blog post, we show you the process of integrating TwelveLabs Embed API with OpenSearch Service to create a multimodal search solution. You’ll learn how to generate rich, contextual embeddings from video content and use OpenSearch Service’s vector database capabilities to enable search functionalities. By the end of this post, you’ll be equipped with the knowledge to implement a system that can transform the way your organization handles and extracts value from video content.

TwelveLabs’ multimodal embeddings process visual, audio, and text signals together to create unified representations, capturing the direct relationships between these modalities. This unified approach delivers precise, context-aware video search that matches human understanding of video content. Whether you’re a developer looking to enhance your applications with advanced video search capabilities, or a business leader seeking to optimize your content management strategies, this post will provide you with the tools and steps to implement multimodal search for your organizational data.

About TwelveLabs

TwelveLabs is an Advanced AWS Partner and AWS Marketplace Seller that offers video understanding solutions. Its Embed API is designed to revolutionize how you interact with and extract value from video content.

At its core, the Embed API transforms raw video content into meaningful, searchable data by using state-of-the-art machine learning models. These models extract and represent complex video information in the form of dense vector embeddings, each a standard 1024-dimensional vector that captures the essence of the video content across multiple modalities (image, text, and audio).

Key features of TwelveLabs Embed API

Below are the key features of the TwelveLabs Embed API:

  • Multimodal understanding: The API generates embeddings that encapsulate various aspects of the video, including visual expressions, body language, spoken words, and overall context.
  • Temporal coherence: Unlike static image-based models, TwelveLabs’ embeddings capture the interrelations between different modalities over time, providing a more accurate representation of video content.
  • Flexibility: The API supports native processing of all modalities present in videos, eliminating the need for separate text-only or image-only models.
  • High performance: By using a video-native approach, the Embed API provides more accurate and temporally coherent interpretation of video content compared to traditional CLIP-like models.

Benefits and use cases

The Embed API offers numerous advantages for developers and businesses working with video content:

  • Enhanced search capabilities: Enable powerful multimodal search across video libraries, allowing users to find relevant content based on visual, audio, or textual queries.
  • Content recommendation: Improve content recommendation systems by understanding the deep contextual similarities between videos.
  • Scene detection and segmentation: Automatically detect and segment different scenes within videos for easier navigation and analysis.
  • Content moderation: Efficiently identify and flag inappropriate content across large video datasets.

Use cases include:

  • Anomaly detection
  • Diversity sorting
  • Sentiment analysis
  • Recommendations

Architecture overview

The architecture for using TwelveLabs Embed API and OpenSearch Service for advanced video search consists of the following components:

  • TwelveLabs Embed API: This API generates 1024-dimensional vector embeddings from video content, capturing visual, audio, and textual elements.
  • OpenSearch Vector Database: Stores and indexes the video embeddings generated by TwelveLabs.
  • AWS Secrets Manager: Stores secrets such as the TwelveLabs API key and the Amazon OpenSearch Service username and password.
  • TwelveLabs SDK and OpenSearch Service client: Process videos, generate embeddings, and index them in OpenSearch Service.

The following diagram illustrates the workflow:

  1. A video file is stored in Amazon Simple Storage Service (Amazon S3). Embeddings of the video file are created using the TwelveLabs Embed API.
  2. The embeddings generated by the TwelveLabs Embed API are ingested into Amazon OpenSearch Service.
  3. Users can query the video library using text, audio, or images. The query is converted into an embedding using the TwelveLabs Embed API.
  4. The query embedding is used to search the video embeddings in Amazon OpenSearch Service and retrieve the most similar segments.

The use case

For the demo, you will work with two videos: Robin bird forest Video by Federico Maderno from Pixabay and Island Video by Bellergy RC from Pixabay.

However, the use case can be expanded to various other segments. For example, a news organization might struggle with:

  1. Needle-in-haystack searches through thousands of hours of archival footage
  2. Manual metadata tagging that misses nuanced visual and audio context
  3. Cross-modal queries such as querying a video collection using text or audio descriptions
  4. Rapid content retrieval for breaking news tie-ins

By integrating TwelveLabs Embed API with OpenSearch Service, you can:

  • Generate 1024-dimensional embeddings that capture each video's visual concepts along with spoken narration, on-screen text, and audio cues.
  • Enable multimodal search capabilities allowing users to:
    • Find specific demonstrations using text-based queries.
    • Locate activities through image-based queries.
    • Identify segments using audio pattern matching.
  • Reduce search time from hours to seconds for complex queries.

Solution walkthrough

The GitHub repository contains a notebook with detailed walkthrough instructions for implementing advanced video search capabilities by combining the TwelveLabs Embed API with Amazon OpenSearch Service.

Prerequisites

Before you proceed further, verify that the following prerequisites are met:

  • Confirm that you have an AWS account. Sign in to the AWS account.
  • Create a TwelveLabs account; you will need it to get an API key. TwelveLabs offers a free tier, and you can upgrade if necessary to meet your requirements.
  • Have an Amazon OpenSearch Service domain. If you don't have an existing domain, you can create one using the steps outlined in the public documentation for Creating and managing Amazon OpenSearch Service domains. Make sure that the OpenSearch Service domain is accessible from your Python environment. You can also use Amazon OpenSearch Serverless for this use case by updating the client configuration to use AWS SDK (SigV4) authentication, as sketched after this list.
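If you choose OpenSearch Serverless, the following is a minimal connection sketch, assuming a hypothetical collection endpoint and the opensearch-py and requests-aws4auth packages installed in Step 3; the rest of the walkthrough otherwise stays the same.

import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth

# SigV4-signed client for OpenSearch Serverless (service name "aoss")
region = "us-east-1"
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, "aoss",
                   session_token=credentials.token)

serverless_client = OpenSearch(
    hosts=[{"host": "your-collection-id.us-east-1.aoss.amazonaws.com", "port": 443}],  # hypothetical endpoint
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)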

Step 1: Set up the TwelveLabs SDK

Start by setting up the TwelveLabs SDK in your Python environment:

  1. Obtain your API key from the TwelveLabs dashboard.
  2. Follow the steps here to create a secret in AWS Secrets Manager. For example, name the secret TL_API_Key. Note down the ARN or name of the secret so you can retrieve it. To retrieve a secret from another account, you must use an ARN. For an ARN, we recommend that you specify a complete ARN rather than a partial ARN (see Finding a secret from a partial ARN). Use this value for the SecretId in the code block below.
import boto3
import json

secrets_manager_client = boto3.client("secretsmanager")

# Retrieve the TwelveLabs API key stored in AWS Secrets Manager
API_secret = secrets_manager_client.get_secret_value(
    SecretId="TL_API_Key"
)
TL_API_KEY = json.loads(API_secret["SecretString"])["TL_API_Key"]
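If you prefer to create the secret programmatically rather than through the console, here is a minimal sketch that assumes the secret name TL_API_Key used above; replace the placeholder with your actual key.

# One-time setup: store the TwelveLabs API key as a JSON secret
secrets_manager_client.create_secret(
    Name="TL_API_Key",
    SecretString=json.dumps({"TL_API_Key": "<your-twelvelabs-api-key>"})
)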

Step 2: Generate video embeddings

Use the Embed API to create multimodal embeddings that are contextual vector representations for your videos and texts. TwelveLabs video embeddings capture all the subtle cues and interactions between different modalities, including the visual expressions, body language, spoken words, and the overall context of the video, encapsulating the essence of all these modalities and their interrelations over time.

To create video embeddings, you must first upload your videos, and the platform must finish processing them. Uploading and processing videos take some time. Consequently, creating embeddings is an asynchronous process composed of three steps:

  1. Upload and process a video: When you start uploading a video, the platform creates a video embedding task and returns its unique task identifier.
  2. Monitor the status of your video embedding task: Use the unique identifier of your task to check its status periodically until it’s completed.
  3. Retrieve the embeddings: After the video embedding task is completed, retrieve the video embeddings by providing the task identifier. Learn more in the docs.

Video processing implementation

This demo depends on some video data. You will download two MP4 files and upload them to an Amazon S3 bucket.

  1. Open the links for the Robin bird forest Video by Federico Maderno from Pixabay and the Island Video by Bellergy RC from Pixabay.
  2. Download the 21723-320725678_small.mp4 and 2946-164933125_small.mp4 files.
  3. Create an S3 bucket if you don't have one already. Follow the steps in the Creating a bucket doc. Note down the name of the bucket and replace it in the code block below (for example, MYS3BUCKET).
  4. Upload the 21723-320725678_small.mp4 and 2946-164933125_small.mp4 video files to the S3 bucket created in the previous step by following the steps in the Uploading objects doc. Note down the names of the objects and replace them in the code block below (for example, 21723-320725678_small.mp4 and 2946-164933125_small.mp4).
from typing import List
from twelvelabs import TwelveLabs
from twelvelabs.models.embed import EmbeddingsTask, SegmentEmbedding

s3_client = boto3.client("s3")

# Download the demo videos from your S3 bucket to the local environment
s3_client.download_file(Bucket="MYS3BUCKET", Key="21723-320725678_small.mp4", Filename="robin-bird.mp4")
s3_client.download_file(Bucket="MYS3BUCKET", Key="2946-164933125_small.mp4", Filename="island.mp4")

def print_segments(segments: List[SegmentEmbedding], max_elements: int = 1024):
    for segment in segments:
        print(
            f"  embedding_scope={segment.embedding_scope} start_offset_sec={segment.start_offset_sec} end_offset_sec={segment.end_offset_sec}"
        )
        print(f"  embeddings: {segment.embeddings_float[:max_elements]}")

# Initialize client with API key
twelvelabs_client = TwelveLabs(api_key=TL_API_KEY)

video_files=["robin-bird.mp4", "island.mp4"]
tasks=[]

Embedding generation process

With the SDK configured, generate embeddings for your video and monitor task completion with real-time updates. Here you use the Marengo 2.7 model to generate the embeddings:

for video in video_files:
    # Create embedding task
    task = twelvelabs_client.embed.task.create(
        model_name="Marengo-retrieval-2.7",
        video_file=video
    )
    print(
        f"Created task: id={task.id} engine_name={task.model_name} status={task.status}"
    )
    
    def on_task_update(task: EmbeddingsTask):
        print(f"  Status={task.status}")
    
    status = task.wait_for_done(
        sleep_interval=2,
        callback=on_task_update
    )
    print(f"Embedding done: {status}")
    
    # Retrieve and inspect results
    task = task.retrieve()
    if task.video_embedding is not None and task.video_embedding.segments is not None:
        print_segments(task.video_embedding.segments)
    tasks.append(task)

Key features demonstrated include:

  • Multimodal capture: 1024-dimensional vectors encoding visual, audio, and textual features
  • Model specificity: Using Marengo-retrieval-2.7, a model optimized for multimodal retrieval
  • Progress tracking: Real-time status updates during embedding generation

Expected output

Created task: id=67ca93a989d8a564e80dc3ba engine_name=Marengo-retrieval-2.7 status=processing
  Status=processing
  Status=processing
  Status=processing
  Status=processing
  Status=processing
  Status=processing
  Status=processing
  Status=processing
  Status=processing
  Status=ready
Embedding done: ready
  embedding_scope=clip start_offset_sec=0.0 end_offset_sec=6.0
  embeddings: [0.022429451, 0.00040668788, -0.01825908, -0.005862708, -0.03371106, 
-6.357456e-05, -0.015320076, -0.042556215, -0.02782445, -0.00019097517, 0.03258314, 
-0.0061399476, -0.00049206393, 0.035632476, 0.028209884, 0.02875258, -0.035486065, 
-0.11288028, -0.040782217, -0.0359422, 0.015908664, -0.021092793, 0.016303983, 
0.06351931, ...]
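As an optional sanity check, the following small sketch reuses the tasks and video_files lists from above to confirm how many segments were generated for each video before indexing.

# Optional: count the segments produced for each video
for video, task in zip(video_files, tasks):
    segment_count = len(task.video_embedding.segments) if (task.video_embedding and task.video_embedding.segments) else 0
    print(f"{video}: {segment_count} segments")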

Step 3: Set up OpenSearch

To enable vector search capabilities, you first need to set up an OpenSearch client and test the connection. Follow these steps:

Install the required libraries

Install the necessary Python packages for working with OpenSearch:

!pip install opensearch-py
!pip install botocore
!pip install requests-aws4auth

Configure the OpenSearch client

Set up the OpenSearch client with your host details and authentication credentials:

from opensearchpy import OpenSearch, RequestsHttpConnection, helpers
from requests_aws4auth import AWS4Auth
from requests.auth import HTTPBasicAuth

# OpenSearch connection configuration
# host="your-host.aos.us-east-1.on.aws"
host="search-new-domain-mbgs7wth6r5w6hwmjofntiqcge.aos.us-east-1.on.aws"
port = 443  # Default HTTPS port

# Get OpenSearch username secret from Secrets Manager
opensearch_username=secrets_manager_client.get_secret_value(
    SecretId="AOS_username"
)
opensearch_username_string=json.loads(opensearch_username["SecretString"])["AOS_username"]

# Get OpenSearch password secret from Secrets Manager
opensearch_password = secrets_manager_client.get_secret_value(
    SecretId="AOS_password"
)
opensearch_password_string=json.loads(opensearch_password["SecretString"])["AOS_password"]

auth=(opensearch_username_string, opensearch_password_string)

# Create the client configuration
client_aos = OpenSearch(
    hosts=[{'host': host, 'port': port}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection
)

# Test the connection
try:
    # Get cluster information
    cluster_info = client_aos.info()
    print("Successfully connected to OpenSearch")
    print(f"Cluster info: {cluster_info}")
except Exception as e:
    print(f"Connection failed: {str(e)}")

Expected output

If the connection is successful, you should see a message like the following:

Successfully connected to OpenSearch
Cluster info: {'name': 'bb36e8d98ee7bd517891ecd714bfb9d7', ...}

This confirms that your OpenSearch client is properly configured and ready for use.

Step 4: Create an index in OpenSearch Service

Next, you create an index optimized for vector search to store the embeddings generated by the TwelveLabs Embed API.

Define the index configuration

The index is configured to support k-nearest neighbor (kNN) search with a 1024-dimensional vector field. You will use these values for this demo, but follow these best practices to find appropriate values for your application. Here's the code:

# Define the enhanced index configuration
index_name="twelvelabs_index"
new_vector_index_definition = {
    "settings": {
        "index": {
            "knn": "true",
            "number_of_shards": 1,
            "number_of_replicas": 0
        }
    },
    "mappings": {
        "properties": {
            "embedding_field": {
                "type": "knn_vector",
                "dimension": 1024
            },
            "video_title": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword"
                    }
                }
            },
            "segment_start": {
                "type": "date"
            },
            "segment_end": {
                "type": "date"
            },
            "segment_id": {
                "type": "text"
            }
        }
    }
}
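If you want more control over the approximate kNN behavior, you can define the method explicitly on the vector field. The following is an illustrative sketch rather than a recommendation from this post; tune the engine, space type, m, and ef_construction for your own latency and recall requirements.

# Illustrative alternative mapping for the vector field with an explicit HNSW method
tuned_embedding_field = {
    "type": "knn_vector",
    "dimension": 1024,
    "method": {
        "name": "hnsw",
        "engine": "lucene",
        "space_type": "cosinesimil",
        "parameters": {"m": 16, "ef_construction": 128}
    }
}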

Create the index

Use the following code to create the index in OpenSearch Service:

# Create the index in OpenSearch
response = client_aos.indices.create(index=index_name, body=new_vector_index_definition, ignore=400)

# Retrieve and display index details to confirm creation
index_info = client_aos.indices.get(index=index_name)
print(index_info)

Expected output

After running this code, you should see details of the newly created index. For example:

{'twelvelabs_index': {'aliases': {}, 'mappings': {'properties': {'embedding_field': {'type': 'knn_vector', 'dimension': 1024}}}, 'settings': {...}}}

This confirms that an index named twelvelabs_index has been successfully created with a knn_vector field of dimension 1024 and the other specified settings. With these steps completed, you now have an operational OpenSearch Service domain configured for vector search. This index will serve as the repository for storing embeddings generated from video content, enabling advanced multimodal search capabilities.

Step 5: Ingest embeddings to the created index in OpenSearch Service

With the TwelveLabs Embed API successfully generating video embeddings and the OpenSearch Service index configured, the next step is to ingest these embeddings into the index. This process helps ensure that the embeddings are stored in OpenSearch Service and made searchable for multimodal queries.

Embedding ingestion process

The following code demonstrates how to process and index the embeddings into OpenSearch Service:

from opensearchpy.helpers import bulk

def generate_actions(tasks, video_files):
    count = 0
    for task in tasks:
        # Check if video embeddings are available
        if task.video_embedding is not None and task.video_embedding.segments is not None:
            embeddings_doc = task.video_embedding.segments
            
            # Generate actions for bulk indexing
            for doc_id, elt in enumerate(embeddings_doc):
                yield {
                    '_index': index_name,
                    # Build a unique ID per video so segments from one video
                    # don't overwrite segments from another
                    '_id': f"{video_files[count]}_{doc_id}",
                    '_source': {
                        'embedding_field': elt.embeddings_float,
                        'video_title': video_files[count],
                        'segment_start': elt.start_offset_sec,
                        'segment_end': elt.end_offset_sec,
                        'segment_id': doc_id
                    }
                }
        print(f"Prepared bulk indexing data for task {count}")
        count += 1

# Perform bulk indexing
try:
    success, failed = bulk(client_aos, generate_actions(tasks, video_files))
    print(f"Successfully indexed {success} documents")
    if failed:
        print(f"Failed to index {len(failed)} documents")
except Exception as e:
    print(f"Error during bulk indexing: {e}")

Explanation of the code

  1. Embedding extraction: The video_embedding.segments object contains a list of segment embeddings generated by the TwelveLabs Embed API. Each segment represents a specific portion of the video.
  2. Document creation: For each segment, a document is created with a key (embedding_field) that stores its 1024-dimensional vector, video_title with the title of the video, segment_start and segment_end indicating the timestamp of the video segment, and a segment_id.
  3. Indexing in OpenSearch: The bulk() helper uploads the documents to the twelvelabs_index created earlier. Each document is assigned a unique ID that combines the video title and the segment's position (doc_id) within that video.

Expected output

After the script runs successfully, you will see:

  • A progress message as each task's embeddings are prepared for indexing.
  • A confirmation message:
Prepared bulk indexing data for task 0
Prepared bulk indexing data for task 1
Successfully indexed 6 documents

Result

At this stage, all video segment embeddings are now stored in OpenSearch and ready for advanced multimodal search operations, such as text-to-video or image-to-video queries. This sets up the foundation for performing efficient and scalable searches across your video content.
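To verify the ingestion, a quick optional check using the same client can confirm the number of documents in the index; this is a small sketch.

# Refresh the index and confirm the number of indexed segment documents
client_aos.indices.refresh(index=index_name)
doc_count = client_aos.count(index=index_name)["count"]
print(f"Documents in {index_name}: {doc_count}")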

Step 6: Perform vector search in OpenSearch Service

After a query embedding is generated, you use it as the query vector to perform a kNN search against the OpenSearch Service index. Below are the functions to perform the vector search and format the search results:

# Function to perform vector search
def search_similar_segments(query_vector, k=5):
    query = {
        "size": k,
        "_source": ["video_title", "segment_start", "segment_end", "segment_id"],
        "query": {
            "knn": {
                "embedding_field": {
                    "vector": query_vector,
                    "k": k
                }
            }
        }
    }
    
    response = client_aos.search(
        index=index_name,
        body=query
    )

    results = []
    for hit in response['hits']['hits']:
        result = {
            'score': hit['_score'],
            'title': hit['_source']['video_title'],
            'start_time': hit['_source']['segment_start'],
            'end_time': hit['_source']['segment_end'],
            'segment_id': hit['_source']['segment_id']
        }
        results.append(result)

    return (results)

# Function to format search results
def print_search_results(results):
    print("\nSearch Results:")
    print("-" * 50)
    for i, result in enumerate(results, 1):
        print(f"\nResult {i}:")
        print(f"Video: {result['title']}")
        print(f"Time Range: {result['start_time']} - {result['end_time']}")
        print(f"Similarity Score: {result['score']:.4f}")

Key points:

  • The _source field contains the video title, segment start, segment end, and segment id corresponding to the video embeddings.
  • The embedding_field in the query corresponds to the field where video embeddings are stored.
  • The k parameter specifies how many top results to retrieve based on similarity.
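If you also need to restrict results to a particular video, you can combine the kNN clause with a Boolean filter. The following is a minimal sketch (the robin-bird.mp4 value is just an example); note that in this pattern the filter is applied after the approximate kNN retrieval.

# Sketch: kNN search restricted to a single video via a Boolean filter
def search_similar_segments_in_video(query_vector, video_title, k=5):
    query = {
        "size": k,
        "_source": ["video_title", "segment_start", "segment_end", "segment_id"],
        "query": {
            "bool": {
                "filter": [{"term": {"video_title.keyword": video_title}}],
                "must": [{"knn": {"embedding_field": {"vector": query_vector, "k": k}}}]
            }
        }
    }
    return client_aos.search(index=index_name, body=query)

# Example usage with a hypothetical filter value
# response = search_similar_segments_in_video(query_vector, "robin-bird.mp4")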

Step 7: Perform text-to-video search

You can use text-to-video search to retrieve video segments that are most relevant to a given textual query. In this solution, you will do this by using TwelveLabs’ text embedding capabilities and OpenSearch’s vector search functionality. Here’s how you can implement this step:

Generate text embeddings

To perform a search, you first need to convert the text query into a vector representation using the TwelveLabs Embed API:

from typing import List
from twelvelabs.models.embed import SegmentEmbedding

def print_segments(segments: List[SegmentEmbedding], max_elements: int = 1024):
    for segment in segments:
        print(
            f"  embedding_scope={segment.embedding_scope} start_offset_sec={segment.start_offset_sec} end_offset_sec={segment.end_offset_sec}"
        )
        print(f"  embeddings: {segment.embeddings_float[:max_elements]}")

# Create text embeddings for the query
text_res = twelvelabs_client.embed.create(
    model_name="Marengo-retrieval-2.7",
    text="Bird eating food",  # Replace with your desired query
)

print("Created a text embedding")
print(f" Model: {text_res.model_name}")

# Extract and inspect the generated text embeddings
if text_res.text_embedding is not None and text_res.text_embedding.segments is not None:
    print_segments(text_res.text_embedding.segments)

vector_search = text_res.text_embedding.segments[0].embeddings_float
print("Generated Text Embedding Vector:", vector_search)

Key points:

  • The Marengo-retrieval-2.7 model is used to generate a dense vector embedding for the query.
  • The embedding captures the semantic meaning of the input text, enabling effective matching with video embeddings.

Perform vector search in OpenSearch Service

After the text embedding is generated, you use it as a query vector to perform a kNN search in the OpenSearch index:

# Define the vector search query
query_vector = vector_search
text_to_video_search = search_similar_segments(query_vector)
print_search_results(text_to_video_search)

Expected output

The following shows the most similar segments retrieved from OpenSearch Service.

Search Results:
--------------------------------------------------

Result 1:
Video: robin-bird.mp4
Time Range: 18.0 - 21.087732
Similarity Score: 0.4409

Result 2:
Video: robin-bird.mp4
Time Range: 12.0 - 18.0
Similarity Score: 0.4300

Result 3:
Video: island.mp4
Time Range: 0.0 - 6.0
Similarity Score: 0.3624

Insights from results

  • Each result includes a similarity score indicating how closely it matches the query, a time range indicating the start and end offset in seconds, and the video title.
  • Observe that the top two results correspond to the robin bird video segments, matching the Bird eating food query.

This process demonstrates how textual queries such as Bird eating food can effectively retrieve relevant video segments from an indexed library using TwelveLabs’ multimodal embeddings and OpenSearch’s powerful vector search capabilities.

Step 8: Perform audio-to-video search

You can use audio-to-video search to retrieve video segments that are most relevant to a given audio input. By using TwelveLabs’ audio embedding capabilities and OpenSearch’s vector search functionality, you can match audio features with video embeddings in the index. Here’s how to implement this step:

Generate audio embeddings

To perform the search, you first convert the audio input into a vector representation using the TwelveLabs Embed API:

# Create audio embeddings for the input audio file
audio_res = twelvelabs_client.embed.create(
    model_name="Marengo-retrieval-2.7",
    audio_file="audio-data.mp3",  # Replace with your desired audio file
)

# Print details of the generated embedding
print(f"Created audio embedding: model_name={audio_res.model_name}")
print(f" Model: {audio_res.model_name}")

# Extract and inspect the generated audio embeddings
if audio_res.audio_embedding is not None and audio_res.audio_embedding.segments is not None:
    print_segments(audio_res.audio_embedding.segments)

# Store the embedding vector for search
vector_search = audio_res.audio_embedding.segments[0].embeddings_float
print("Generated Audio Embedding Vector:", vector_search)

Key points:

  • The Marengo-retrieval-2.7 model is used to generate a dense vector embedding for the input audio.
  • The embedding captures the semantic features of the audio, such as rhythm, tone, and patterns, enabling effective matching with video embeddings.

Perform vector search in OpenSearch Service

After the audio embedding is generated, you use it as a query vector to perform a k-nearest neighbor (kNN) search in OpenSearch:

# Perform vector search
query_vector = vector_search
audio_to_video_search = search_similar_segments(query_vector)
print_search_results(audio_to_video_search)

Expected output

The following shows video segments retrieved from OpenSearch Service based on their similarity to the input audio.

Search Results:
--------------------------------------------------

Result 1:
Video: island.mp4
Time Range: 6.0 - 12.0
Similarity Score: 0.2855

Result 2:
Video: robin-bird.mp4
Time Range: 18.0 - 21.087732
Similarity Score: 0.2841

Result 3:
Video: robin-bird.mp4
Time Range: 12.0 - 18.0
Similarity Score: 0.2837

Result 4:
Video: island.mp4
Time Range: 0.0 - 6.0
Similarity Score: 0.2835

Notice that segments from both videos are returned, each with a relatively low similarity score, indicating weaker matches to the audio query.
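One practical way to handle such weak matches is to drop results below a minimum similarity score before presenting them; the threshold in the following sketch is illustrative and should be tuned for your data and model.

# Sketch: filter out weak matches below an illustrative score threshold
MIN_SCORE = 0.35  # hypothetical value; tune for your data
strong_matches = [r for r in audio_to_video_search if r["score"] >= MIN_SCORE]
print_search_results(strong_matches)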

Step 9: Perform image-to-video search

You can use image-to-video search to retrieve video segments that are visually similar to a given image. By using TwelveLabs’ image embedding capabilities and OpenSearch Service’s vector search functionality, you can match visual features from an image with video embeddings in the index. Here’s how to implement this step:

Generate image embeddings

To perform the search, you first convert the input image into a vector representation using the TwelveLabs Embed API:

# Create image embeddings for the input image file
image_res = twelvelabs_client.embed.create(
    model_name="Marengo-retrieval-2.7",
    image_file="image-data.jpg",  # Replace with your desired image file
)

# Print details of the generated embedding
print(f"Created image embedding: model_name={image_res.model_name}")
print(f" Model: {image_res.model_name}")

# Extract and inspect the generated image embeddings
if image_res.image_embedding is not None and image_res.image_embedding.segments is not None:
    print_segments(image_res.image_embedding.segments)

# Store the embedding vector for search
vector_search = image_res.image_embedding.segments[0].embeddings_float
print("Generated Image Embedding Vector:", vector_search)

Key points:

  • The Marengo-retrieval-2.7 model is used to generate a dense vector embedding for the input image.
  • The embedding captures visual features such as shapes, colors, and patterns, enabling effective matching with video embeddings.

Perform vector search in OpenSearch

After the image embedding is generated, you use it as a query vector to perform a k-nearest neighbor (kNN) search in OpenSearch:

# Perform vector search
query_vector = vector_search
image_to_video_search = search_similar_segments(query_vector)
print_search_results(image_to_video_search)

Expected output

The following shows video segments retrieved from OpenSearch based on their similarity to the input image.

Search Results:
--------------------------------------------------

Result 1:
Video: island.mp4
Time Range: 6.0 - 12.0
Similarity Score: 0.5616

Result 2:
Video: island.mp4
Time Range: 0.0 - 6.0
Similarity Score: 0.5576

Result 3:
Video: robin-bird.mp4
Time Range: 12.0 - 18.0
Similarity Score: 0.4592

Result 4:
Video: robin-bird.mp4
Time Range: 18.0 - 21.087732
Similarity Score: 0.4540

Observe that an image of an ocean was used to search the videos. Clips from the island video are retrieved with higher similarity scores in the first two results.

Clean up

To avoid incurring charges, delete the resources created while following this post. For the Amazon OpenSearch Service domain, navigate to the Amazon OpenSearch Service console and delete the domain.
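If you prefer to clean up programmatically, the following minimal sketch assumes the resource names used earlier in this post; adjust them to match what you actually created.

# Delete the demo index
client_aos.indices.delete(index="twelvelabs_index", ignore=[400, 404])

# Delete the demo video objects from the S3 bucket
s3_client.delete_object(Bucket="MYS3BUCKET", Key="21723-320725678_small.mp4")
s3_client.delete_object(Bucket="MYS3BUCKET", Key="2946-164933125_small.mp4")

# Schedule deletion of the secrets created for this demo
for secret_id in ["TL_API_Key", "AOS_username", "AOS_password"]:
    secrets_manager_client.delete_secret(SecretId=secret_id, RecoveryWindowInDays=7)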

Conclusion

The integration of TwelveLabs Embed API with OpenSearch Service provides a cutting-edge solution for advanced video search and analysis, unlocking new possibilities for content discovery and insights. By using TwelveLabs’ multimodal embeddings, which capture the intricate interplay of visual, audio, and textual elements in videos, and combining them with OpenSearch Service’s robust vector search capabilities, this solution enables highly nuanced and contextually relevant video search.

As industries increasingly rely on video content for communication, education, marketing, and research, this advanced search solution becomes indispensable. It empowers businesses to extract hidden insights from their video content, enhance user experiences in video-centric applications, and make data-driven decisions based on comprehensive video analysis.

This integration not only addresses current challenges in managing video content but also lays the foundation for future innovations in how we interact with and derive value from video data.

Get started

Ready to explore the power of TwelveLabs Embed API? Start your free trial today by visiting TwelveLabs Playground to sign up and receive your API key.

For developers looking to implement this solution, follow our detailed step-by-step guide on GitHub to integrate TwelveLabs Embed API with OpenSearch Service and build your own advanced video search application.

Unlock the full potential of your video content today!


About the Authors

James Le runs the Developer Experience function at TwelveLabs. He works with partners, developers, and researchers to bring state-of-the-art video foundation models to various multimodal video understanding use cases.

Gitika is a Senior WW Data & AI Partner Solutions Architect at Amazon Web Services (AWS). She works with partners on technical projects, providing architectural guidance and enablement to build their analytics practice.

Kruthi is a Senior Partner Solutions Architect specializing in AI and ML. She provides technical guidance to AWS Partners in following best practices to build secure, resilient, and highly available solutions in the AWS Cloud.
