Llama 4 models from Meta now available in Amazon Bedrock serverless

The newest AI models from Meta, Llama 4 Scout 17B and Llama 4 Maverick 17B, are now available as a fully managed, serverless option in Amazon Bedrock. These new foundation models (FMs) deliver natively multimodal capabilities with early fusion technology that you can use for precise image grounding and extended context processing in your applications.

Llama 4 uses an innovative mixture-of-experts (MoE) architecture that provides enhanced performance across reasoning and image understanding tasks while optimizing for both cost and speed. This architectural approach enables Llama 4 to offer improved performance at lower cost compared to Llama 3, with expanded language support for global applications.

The models were already available on Amazon SageMaker JumpStart, and you can now use them in Amazon Bedrock to streamline building and scaling generative AI applications with enterprise-grade security and privacy. Two models are available:

Llama 4 Maverick 17B – A natively multimodal model featuring 128 experts and 400 billion total parameters. It excels in image and text understanding, making it suitable for versatile assistant and chat applications. The model supports a 1 million token context window, giving you the flexibility to process lengthy documents and complex inputs.

Llama 4 Scout 17B – A general-purpose multimodal model with 16 experts, 17 billion active parameters, and 109 billion total parameters that delivers superior performance compared to all previous Llama models. Amazon Bedrock currently supports a 3.5 million token context window for Llama 4 Scout, with plans to expand in the near future.

Use cases for Llama 4 models
You can use the advanced capabilities of Llama 4 models for a wide range of use cases across industries:

Enterprise applications – Build intelligent agents that can reason across tools and workflows, process multimodal inputs, and deliver high-quality responses for business applications.

Multilingual assistants – Create chat applications that understand images and provide high-quality responses across multiple languages, making them accessible to global audiences.

Code and document intelligence – Develop applications that can understand code, extract structured data from documents, and provide insightful analysis across large volumes of text and code.

Customer support – Enhance support systems with image analysis capabilities, enabling more effective problem resolution when customers share screenshots or photos.

Content creation – Generate creative content across multiple languages, with the ability to understand and respond to visual inputs.

Research – Build research applications that can integrate and analyze multimodal data, providing insights across text and images.

Using Llama 4 models in Amazon Bedrock
To use these new serverless models in Amazon Bedrock, I first need to request access. In the Amazon Bedrock console, I choose Model access from the navigation pane to toggle access to the Llama 4 Maverick 17B and Llama 4 Scout 17B models.

Console screenshot.
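Once access is granted, you can confirm that the models are available in your Region before using them. Here's a minimal sketch using the Amazon Bedrock control plane client and its ListFoundationModels API; the "Meta" provider filter and the substring check on the model ID are assumptions based on the model IDs used later in this post:

import boto3

# List Meta foundation models available in this Region to confirm
# that the Llama 4 models show up after access is granted.
bedrock = boto3.client("bedrock", region_name="us-west-2")

response = bedrock.list_foundation_models(byProvider="Meta")
for model in response["modelSummaries"]:
    if "llama4" in model["modelId"]:
        print(model["modelId"])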

You can integrate the Llama 4 models into your applications using the Amazon Bedrock Converse API, which provides a unified interface for conversational AI interactions.

Here’s an example of how to use the AWS SDK for Python (Boto3) with Llama 4 Maverick for a multimodal conversation:

import boto3
import json
import os

AWS_REGION = "us-west-2"
MODEL_ID = "us.meta.llama4-maverick-17b-instruct-v1:0"
IMAGE_PATH = "image.jpg"


def get_file_extension(filename: str) -> str:
    """Get the file extension in the format expected by the Converse API."""
    extension = os.path.splitext(filename)[1].lower()[1:] or 'txt'
    # The Converse API expects 'jpeg' rather than 'jpg' as the image format.
    if extension == 'jpg':
        extension = 'jpeg'
    return extension


def read_file(file_path: str) -> bytes:
    """Read a file in binary mode."""
    try:
        with open(file_path, 'rb') as file:
            return file.read()
    except OSError as e:
        raise RuntimeError(f"Error reading file {file_path}: {e}") from e

# Create the Amazon Bedrock Runtime client used to invoke the model.
bedrock_runtime = boto3.client(
    service_name="bedrock-runtime",
    region_name=AWS_REGION
)

# Build a multimodal message containing both a text prompt and the image bytes.
request_body = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "text": "What can you tell me about this image?"
                },
                {
                    "image": {
                        "format": get_file_extension(IMAGE_PATH),
                        "source": {"bytes": read_file(IMAGE_PATH)},
                    }
                },
            ],
        }
    ]
}

# Send the multimodal message to the model using the Converse API.
response = bedrock_runtime.converse(
    modelId=MODEL_ID,
    messages=request_body["messages"]
)

print(response["output"]["message"]["content"][-1]["text"])

This example demonstrates how to send both text and image inputs to the model and receive a conversational response. The Converse API abstracts away the complexity of working with different model input formats, providing a consistent interface across models in Amazon Bedrock.
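The Converse API also accepts an optional system prompt and inference parameters on the same call. Here's a minimal sketch, reusing the client and messages from the previous example; the system prompt text and parameter values are illustrative, not recommendations:

# Same call as above, with an optional system prompt and inference parameters.
response = bedrock_runtime.converse(
    modelId=MODEL_ID,
    messages=request_body["messages"],
    system=[{"text": "You are a concise assistant that describes images."}],
    inferenceConfig={
        "maxTokens": 512,    # Illustrative upper bound on the response length.
        "temperature": 0.5,  # Illustrative sampling settings.
        "topP": 0.9,
    },
)

print(response["output"]["message"]["content"][-1]["text"])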

For more interactive use cases, you can also use the streaming capabilities of the Converse API:

response_stream = bedrock_runtime.converse_stream(
    modelId=MODEL_ID,
    messages=request_body['messages']
)

stream = response_stream.get('stream')
if stream:
    for event in stream:

        if 'messageStart' in event:
            print(f"\nRole: {event['messageStart']['role']}")

        if 'contentBlockDelta' in event:
            # Flush so partial output appears on screen as soon as it streams in.
            print(event['contentBlockDelta']['delta']['text'], end="", flush=True)

        if 'messageStop' in event:
            print(f"\nStop reason: {event['messageStop']['stopReason']}")

        if 'metadata' in event:
            metadata = event['metadata']
            if 'usage' in metadata:
                print(f"Usage: {json.dumps(metadata['usage'], indent=4)}")
            if 'metrics' in metadata:
                print(f"Metrics: {json.dumps(metadata['metrics'], indent=4)}")

With streaming, your applications can provide a more responsive experience by displaying model outputs as they are generated.
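At higher request volumes, calls to these APIs can occasionally be throttled. One minimal mitigation, sketched here under the assumption of default service quotas, is to enable botocore's built-in retry handling when you create the client:

from botocore.config import Config

# Configure automatic retries with backoff so transient throttling
# errors are retried instead of failing the request immediately.
retry_config = Config(retries={"max_attempts": 5, "mode": "adaptive"})

bedrock_runtime = boto3.client(
    "bedrock-runtime",
    region_name=AWS_REGION,
    config=retry_config,
)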

Things to know
The Llama 4 models are available today with a fully managed, serverless experience in Amazon Bedrock in the US East (N. Virginia) and US West (Oregon) AWS Regions. You can also access Llama 4 in US East (Ohio) via cross-region inference.

As usual with Amazon Bedrock, you pay for what you use. For more information, see Amazon Bedrock pricing.

These models support 12 languages for text (English, French, German, Hindi, Italian, Portuguese, Spanish, Thai, Arabic, Indonesian, Tagalog, and Vietnamese) and English when processing images.

To start using these new models today, visit the Meta Llama models section in the Amazon Bedrock User Guide. You can also explore how our Builder communities are using Amazon Bedrock in their solutions in the generative AI section of our community.aws site.

Danilo

