Saturday, January 18, 2025
Automate topic provisioning and configuration using Terraform with Amazon MSK


As organizations deploy Amazon Managed Streaming for Apache Kafka (Amazon MSK) clusters across multiple use cases, the manual management of topic configurations can be challenging. This can lead to several issues:

  • Inefficiency – Manual configuration is time-consuming and error-prone, especially for large deployments. Maintaining consistency across multiple configurations can be difficult. To avoid this, Kafka administrators often enable the auto.create.topics.enable property on brokers, which creates topics with broker default settings on first use and leads to cluster operation inefficiency.
  • Human error – Manual configuration increases the risk of mistakes that can disrupt data flow and impact applications relying on Amazon MSK.
  • Scalability challenges – Scaling an Amazon MSK environment with manual configuration is cumbersome. Adding new topics or modifying existing ones requires manual intervention, hindering agility.

These challenges highlight the need for a more automated and robust approach to MSK topic configuration management.

In this post, we address this problem by using Terraform to optimize the configuration of MSK topics. This solution supports both provisioned and serverless MSK clusters.

Solution overview

Customers want a better way to manage the overhead of topics and their configurations. Manually handling topic configurations can be cumbersome and error-prone, making it difficult to keep track of changes and updates.

To address these challenges, you can use Terraform, an infrastructure as code (IaC) tool by HashiCorp. Terraform allows you to manage and provision infrastructure declaratively. It uses human-readable configuration files written in HashiCorp Configuration Language (HCL) to define the desired state of infrastructure resources. These resources can span virtual machines, networks, databases, and a vast array of cloud provider-specific offerings.

Terraform offers a compelling solution to the challenges of manual Kafka topic configuration because it lets you define and manage your Kafka topics through code. This approach provides several key benefits:

  • Automation – Terraform automates the creation, modification, and deletion of MSK topics.
  • Consistency and repeatability – Terraform configurations provide consistent topic structures and settings across your entire Amazon MSK environment. This simplifies management and reduces the likelihood of configuration drift.
  • Scalability – Terraform enables you to provision and manage large numbers of MSK topics, facilitating the growth of your Amazon MSK environment.
  • Version control – Terraform configurations are stored in version control systems, allowing you to track changes, roll back if needed, and collaborate effectively on your Amazon MSK infrastructure.

By using Terraform for MSK topic configuration management, you can streamline your operations, minimize errors, and have a robust and scalable Amazon MSK environment.

In this post, we provide a comprehensive guide for using Terraform to manage Amazon MSK configurations. We explore the process of installing Terraform on Amazon Elastic Compute Cloud (Amazon EC2), defining and decentralizing topic configurations, and deploying and updating configurations in an automated manner.

Prerequisites

Before proceeding with the solution, make sure you have the following resources and access:

By making sure you have these prerequisites in place, you will be ready to streamline your topic configurations with Terraform.

Install Terraform on your client machine

When your cluster and client machine are ready, SSH to your client machine (Amazon EC2) and install Terraform.

  1. Run the following commands to install Terraform:
    sudo yum update -y
    sudo yum install -y yum-utils shadow-utils
    sudo yum-config-manager --add-repo https://rpm.releases.hashicorp.com/AmazonLinux/hashicorp.repo
    sudo yum -y install terraform

  2. Run the command terraform -version to check the installation.

If the command prints the installed Terraform version, the installation is successful and you are ready to automate your MSK topic configuration.

Provision an MSK topic using Terraform

To provision the MSK topic, complete the following steps:

  1. Create a new file called main.tf and copy the following code into this file, replacing the BOOTSTRAP_SERVERS and AWS_REGION placeholders with the details for your cluster. For instructions on retrieving the bootstrap_servers information for IAM authentication from your MSK cluster, see Getting the bootstrap brokers for an Amazon MSK cluster. This configuration works for both Amazon MSK provisioned and MSK Serverless clusters.
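    # Declare the Kafka provider used to manage topics
    terraform {
      required_providers {
        kafka = {
          source = "Mongey/kafka"
        }
      }
    }

    # Connect to the MSK cluster over TLS using IAM authentication.
    # Replace {BOOTSTRAP_SERVERS} and {AWS_REGION} with your cluster details.
    provider "kafka" {
      bootstrap_servers = [{BOOTSTRAP_SERVERS}]
      tls_enabled       = true
      sasl_mechanism    = "aws-iam"
      sasl_aws_region   = {AWS_REGION}
      sasl_aws_profile  = "dev"
    }

    # Topic to create on the cluster
    resource "kafka_topic" "sampleTopic" {
      name               = "sampleTopic"
      replication_factor = 1
      partitions         = 50
    }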

  2. Add the IAM bootstrap server endpoints as a comma-separated list:
    BOOTSTRAP_SERVERS = ["b-2.mskcluster…. ","b-3.mskcluster…. ","b-1.mskcluster…. "]

  3. Run the command terraform init to initialize Terraform and download the required providers.

The terraform init command initializes a working directory containing Terraform configuration files (main.tf). This is the first command to run after writing a new Terraform configuration.
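As an alternative to editing the placeholders in main.tf directly, the cluster details could be passed in as input variables. The following is a minimal sketch, not part of the original walkthrough. The variable names bootstrap_servers and aws_region are our own choice, and the block would take the place of the provider block shown earlier.

```hcl
# Hypothetical variable names, chosen for illustration
variable "bootstrap_servers" {
  description = "IAM bootstrap broker endpoints of the MSK cluster (host:port)"
  type        = list(string)
}

variable "aws_region" {
  description = "AWS Region of the MSK cluster"
  type        = string
}

# Same provider settings as before, with the placeholders replaced by variables
provider "kafka" {
  bootstrap_servers = var.bootstrap_servers
  tls_enabled       = true
  sasl_mechanism    = "aws-iam"
  sasl_aws_region   = var.aws_region
  sasl_aws_profile  = "dev"
}
```

With this layout, the values can be supplied through a terraform.tfvars file or the -var command line option, so the same configuration can be reused across clusters.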

  4. Run the command terraform plan to review the run plan.

This command shows the changes that Terraform will make to the infrastructure based on the provided configuration. This step is optional but is often used as a preview of the changes Terraform will make.

  5. If the plan looks correct, run the command terraform apply to apply the configuration.
  6. When prompted for confirmation before proceeding, enter yes.

The terraform apply command runs the actions proposed in a Terraform plan. Terraform will create the sampleTopic topic in your MSK cluster.

  7. After the terraform apply command is complete, verify that the topic has been created by using the kafka-topics.sh utility:
    kafka/bin/kafka-topics.sh \
      --bootstrap-server "b-1…..amazonaws.com:9098" \
      --command-config ./kafka/bin/client.properties \
      --list

You can use the kafka-topics.sh tool with the --list option to retrieve a list of topics associated with your MSK cluster. For more information, refer to the create topic documentation.
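The single-topic resource above can be extended to many topics without repeating the resource block. The following sketch assumes a local map of topic settings and uses the Terraform for_each meta-argument. The topic names, partition counts, and replication factors are hypothetical values for illustration only.

```hcl
locals {
  # Hypothetical topics and settings, for illustration only
  topics = {
    orders   = { partitions = 12, replication_factor = 3 }
    payments = { partitions = 6, replication_factor = 3 }
  }
}

# One kafka_topic instance is created per entry in local.topics
resource "kafka_topic" "managed" {
  for_each = local.topics

  name               = each.key
  partitions         = each.value.partitions
  replication_factor = each.value.replication_factor
}
```

Adding or removing a topic then becomes a one-line change to the map, which is straightforward to review in version control.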

Update the MSK topic configuration using Terraform

To update the MSK topic configuration, let’s assume we want to change the number of partitions on our topic from 50 to 10. Complete the following steps:

  1. Verify the number of partitions on the topic using the --describe option:
    kafka/bin/kafka-topics.sh \
      --bootstrap-server "b-1…...amazonaws.com:9098" \
      --command-config ./kafka/bin/client.properties \
      --describe \
      --topic sampleTopic

This command will show 50 partitions on the sampleTopic topic.

  2. Modify the Terraform file main.tf and change the value of the partitions parameter to 10:
    resource "kafka_topic" "sampleTopic" {
      name               = "sampleTopic"
      replication_factor = 1
      partitions         = 10
    }

  3. Run the command terraform plan to review the run plan.
  4. If the plan shows the changes, run the command terraform apply to apply the configuration.
  5. When prompted for confirmation before proceeding, enter yes.

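Because Kafka does not support reducing the partition count of an existing topic, Terraform will drop and recreate the sampleTopic topic with the changed configuration. Any data stored in the topic is deleted in the process.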

  6. Verify the changed number of partitions on the topic by rerunning the --describe command:
    kafka/bin/kafka-topics.sh \
      --bootstrap-server "b-1…...amazonaws.com:9098" \
      --command-config ./kafka/bin/client.properties \
      --describe --topic sampleTopic

Now, this command will show 10 partitions on the sampleTopic topic.
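The same workflow applies to topic-level settings other than the partition count. As a sketch, the kafka_topic resource accepts a config map of Kafka topic properties. The retention and cleanup values below are assumptions chosen for illustration, not settings from this walkthrough.

```hcl
resource "kafka_topic" "sampleTopic" {
  name               = "sampleTopic"
  replication_factor = 1
  partitions         = 10

  # Topic-level Kafka properties; the values are illustrative assumptions
  config = {
    "retention.ms"   = "604800000" # 7 days in milliseconds
    "cleanup.policy" = "delete"
  }
}
```

After editing the map, review the terraform plan output to confirm whether the change is applied in place or forces the topic to be replaced before you run terraform apply.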

Delete the MSK topic using Terraform

When you no longer need the infrastructure, you can remove all resources created by your Terraform file.

  1. Run the command terraform destroy to remove the topic.
  2. When prompted for confirmation before proceeding, enter yes.

Terraform will delete the sampleTopic topic from your MSK cluster.

  3. To verify, rerun the --list command:
    kafka/bin/kafka-topics.sh \
      --bootstrap-server "b-1…..amazonaws.com:9098" \
      --command-config ./kafka/bin/client.properties \
      --list

Now, this command will not show the sampleTopic topic.
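For topics whose data must not be lost, you might want a guard against accidental deletion. One option, shown here as a sketch rather than as part of the original walkthrough, is the Terraform lifecycle meta-argument prevent_destroy. The topic name criticalTopic is hypothetical.

```hcl
resource "kafka_topic" "criticalTopic" {
  name               = "criticalTopic" # hypothetical topic name
  replication_factor = 1
  partitions         = 10

  lifecycle {
    # Terraform rejects any plan that would destroy this topic,
    # including terraform destroy and changes that force replacement
    prevent_destroy = true
  }
}
```

With this setting, Terraform returns an error instead of deleting the topic, so you have to remove the guard deliberately before the topic can be dropped.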

Conclusion

In this post, we addressed the common challenges associated with manual MSK topic configuration management and presented a robust Terraform-based solution. Using Terraform for automated topic provisioning and configuration streamlines your processes, fosters scalability, and enhances flexibility. Additionally, it facilitates automated deployments and centralized management.

We encourage you to explore Terraform as a means to optimize Amazon MSK configurations and unlock further efficiencies within your streaming data pipelines.


About the author

Vijay Kardile is a Sr. Technical Account Manager with Enterprise Support, India. With over two decades of experience in IT Consulting and Engineering, he specializes in Analytics services, particularly Amazon EMR and Amazon MSK. He has empowered numerous enterprise clients by facilitating their adoption of various AWS services and offering expert guidance on attaining operational excellence.
