As organizations deploy Amazon Managed Streaming for Apache Kafka (Amazon MSK) clusters across multiple use cases, the manual management of topic configurations can be challenging. This can lead to several issues:
- Inefficiency – Manual configuration is time-consuming and error-prone, especially for large deployments. Maintaining consistency across multiple configurations can be difficult. To avoid this, Kafka administrators often set the auto.create.topics.enable property on brokers, which leads to cluster operation inefficiency.
- Human error – Manual configuration increases the risk of mistakes that can disrupt data flow and impact applications relying on Amazon MSK.
- Scalability challenges – Scaling an Amazon MSK environment with manual configuration is cumbersome. Adding new topics or modifying existing ones requires manual intervention, hindering agility.
These challenges highlight the need for a more automated and robust approach to MSK topic configuration management.
In this post, we address this problem by using Terraform to optimize the configuration of MSK topics. This solution supports both provisioned and serverless MSK clusters.
Solution overview
Customers want a better way to manage the overhead of topics and their configurations. Manually handling topic configurations can be cumbersome and error-prone, making it difficult to keep track of changes and updates.
To address these challenges, you can use Terraform, an infrastructure as code (IaC) tool by HashiCorp. Terraform allows you to manage and provision infrastructure declaratively. It uses human-readable configuration files written in HashiCorp Configuration Language (HCL) to define the desired state of infrastructure resources. These resources can span virtual machines, networks, databases, and a vast array of cloud provider-specific offerings.
Terraform offers a compelling solution to the challenges of manual Kafka topic configuration. Terraform allows you to define and manage your Kafka topics through code. This approach provides several key benefits:
- Automation – Terraform automates the creation, modification, and deletion of MSK topics.
- Consistency and repeatability – Terraform configurations provide consistent topic structures and settings across your entire Amazon MSK environment. This simplifies management and reduces the likelihood of configuration drift.
- Scalability – Terraform enables you to provision and manage large numbers of MSK topics, facilitating the growth of your Amazon MSK environment.
- Version control – Terraform configurations are stored in version control systems, allowing you to track changes, roll back if needed, and collaborate effectively on your Amazon MSK infrastructure.
By using Terraform for MSK topic configuration management, you can streamline your operations, minimize errors, and have a robust and scalable Amazon MSK environment.
In this post, we provide a comprehensive guide for using Terraform to manage Amazon MSK configurations. We explore the process of installing Terraform on Amazon Elastic Compute Cloud (Amazon EC2), defining and decentralizing topic configurations, and deploying and updating configurations in an automated manner.
Prerequisites
Before proceeding with the solution, make sure you have the following resources and access:
- To simplify the setup, use the provided AWS CloudFormation template. This template will create the necessary Amazon MSK provisioned cluster and required resources for this post. Alternatively, you can create an MSK Serverless cluster using the Amazon MSK console and use it in this solution. This is a sample template and is not production ready; AWS Identity and Access Management (IAM) policies should be implemented using best practices and the principle of least privilege. For more details, see Get started with AWS managed policies and move toward least-privilege permissions. An EC2 instance will be created as part of this template. The MSK cluster and EC2 instance will be created in a single virtual private cloud (VPC); however, you can install Terraform in a different account or in a different VPC. For more details, see Connect Kafka client applications securely to your Amazon MSK cluster from different VPCs and AWS accounts.
- For this post, we use the latest Terraform version (1.10.x) and Terraform plugins – Mongey/Kafka provider. In Terraform, plugins are binary executables responsible for implementing resource types and providers. The plugins are installed automatically when we initialize a Terraform configuration using the terraform init command.
- You need access to an AWS account with sufficient permissions to create and manage resources, including IAM roles and MSK clusters. For more information, see IAM access control.
By making sure you have these prerequisites in place, you will be ready to streamline your topic configurations with Terraform.
Install Terraform on your client machine
When your cluster and client machine are ready, SSH to your client machine (Amazon EC2) and install Terraform.
- Run the following commands to install Terraform:
- Run the following command to check the installation:
This indicates that Terraform installation is successful and you are ready to automate your MSK topic configuration.
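As a rough sketch of these two steps, assuming the client machine runs Amazon Linux with yum and sudo access, Terraform can be installed from the HashiCorp RPM repository. The function names install_terraform and check_terraform are hypothetical wrappers used only for illustration; nothing runs until you call them.

```shell
#!/usr/bin/env bash
# Sketch only: assumes Amazon Linux with yum and sudo available.

# Add the HashiCorp yum repository and install Terraform from it.
install_terraform() {
  sudo yum install -y yum-utils
  sudo yum-config-manager --add-repo https://rpm.releases.hashicorp.com/AmazonLinux/hashicorp.repo
  sudo yum -y install terraform
}

# Print the installed Terraform version to confirm the binary is on PATH.
check_terraform() {
  terraform -version
}
```

Call install_terraform once, then check_terraform; a printed version string means the binary is installed and reachable.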
Provision an MSK topic using Terraform
To provision the MSK topic, complete the following steps:
- Create a new file called main.tf and copy the following code into this file, replacing the BOOTSTRAP_SERVERS and AWS_REGION information with the details for your cluster. For instructions on retrieving the bootstrap_servers information for IAM authentication from your MSK cluster, see Getting the bootstrap brokers for an Amazon MSK cluster. This script is common for Amazon MSK provisioned and MSK Serverless.
- Add IAM bootstrap servers endpoints in a comma separated list format:
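A minimal sketch of such a main.tf, written from the shell with a here-document. The provider arguments (tls_enabled, sasl_mechanism, sasl_aws_region) follow the way the Mongey/kafka provider is commonly configured for IAM authentication. Treat the broker endpoints, port 9098, Region placeholder, and replication factor as assumptions to replace with your own values.

```shell
# Sketch only: writes a minimal main.tf into the current directory.
# Replace the placeholder broker endpoints and Region with your own values.
cat > main.tf <<'EOF'
terraform {
  required_providers {
    kafka = {
      source = "Mongey/kafka"
    }
  }
}

provider "kafka" {
  # IAM bootstrap server endpoints, one list element per broker (placeholders)
  bootstrap_servers = ["BOOTSTRAP_SERVER_1:9098", "BOOTSTRAP_SERVER_2:9098"]
  tls_enabled       = true
  sasl_mechanism    = "aws-iam"
  sasl_aws_region   = "AWS_REGION"
}

resource "kafka_topic" "sampleTopic" {
  name               = "sampleTopic"
  replication_factor = 3 # assumed value; adjust for your cluster
  partitions         = 50
}
EOF
```

Keeping the topic definition in a single resource block means that later changes, such as a new partition count, are one-line edits that can be reviewed in version control.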
- Run the command terraform init to initialize Terraform and download the required providers.
The terraform init command initializes a working directory containing Terraform configuration files (main.tf). This is the first command that should be run after writing a new Terraform configuration.
- Run the command terraform plan to review the run plan.
This command shows the changes that Terraform will make to the infrastructure based on the provided configuration. This step is optional but is often used as a preview of the changes Terraform will make.
- If the plan looks correct, run the command terraform apply to apply the configuration.
- When prompted for confirmation before proceeding, enter yes.
The terraform apply command runs the actions proposed in a Terraform plan. Terraform will create the sampleTopic topic in your MSK cluster.
- After the terraform apply command is complete, verify the infrastructure has been created with the help of the kafka-topics.sh utility:
You can use the kafka-topics.sh tool with the --list option to retrieve a list of topics associated with your MSK cluster. For more information, refer to the createtopic documentation.
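The provisioning steps above might look like the following on the client machine. This is a sketch under several assumptions: KAFKA_HOME, BOOTSTRAP_SERVERS, and CLIENT_PROPERTIES are hypothetical environment variables, the Kafka command line tools are installed locally, and the aws-msk-iam-auth library is on the Kafka classpath. The function names are illustrative only.

```shell
# Hypothetical locations; override them for your environment.
KAFKA_HOME="${KAFKA_HOME:-$HOME/kafka}"
BOOTSTRAP_SERVERS="${BOOTSTRAP_SERVERS:-BOOTSTRAP_SERVER_1:9098}"
CLIENT_PROPERTIES="${CLIENT_PROPERTIES:-./client.properties}"

# Write the Kafka client settings used for IAM authentication.
write_client_properties() {
  cat > "$CLIENT_PROPERTIES" <<'EOF'
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
EOF
}

# Initialize the working directory, preview the plan, then apply it.
# terraform apply still prompts for confirmation.
provision_topic() {
  terraform init && terraform plan && terraform apply
}

# List the topics on the cluster; sampleTopic should appear after apply.
list_topics() {
  "$KAFKA_HOME/bin/kafka-topics.sh" \
    --bootstrap-server "$BOOTSTRAP_SERVERS" \
    --command-config "$CLIENT_PROPERTIES" \
    --list
}
```

A typical sequence would be write_client_properties, then provision_topic from the directory containing main.tf, then list_topics.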
Update the MSK topic configuration using Terraform
To update the MSK topic configuration, let’s assume we want to change the number of partitions from 50 to 10 on our topic. We need to perform the following steps:
- Verify the number of partitions on the topic using the --describe command:
This command will show 50 partitions on the sampleTopic topic.
- Modify the Terraform file main.tf and change the value of the partitions parameter to 10:
- Run the command terraform plan to review the run plan.
- If the plan shows the changes, run the command terraform apply to apply the configuration.
- When prompted for confirmation before proceeding, enter yes.
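Because Kafka does not support reducing the partition count of an existing topic, Terraform will drop and recreate the sampleTopic topic with the changed configuration. Any data already in the topic is deleted along with it.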
- To verify the changed number of partitions on the topic, rerun the --describe command:
Now, this command will show 10 partitions on the sampleTopic topic.
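One way to script the partition change and the before/after check is sketched here. set_partitions and describe_topic are hypothetical helpers, and KAFKA_HOME, BOOTSTRAP_SERVERS, and CLIENT_PROPERTIES are assumed environment variables pointing at your Kafka tools, IAM bootstrap endpoints, and IAM client settings file.

```shell
# Hypothetical helper: set the partitions value in a Terraform file.
# Usage: set_partitions <count> <file>
set_partitions() {
  count="$1"
  file="$2"
  # Rewrite any line of the form "partitions = <number>", keeping its indentation.
  sed -E "s/^([[:space:]]*partitions[[:space:]]*=[[:space:]]*)[0-9]+/\1${count}/" "$file" > "$file.tmp" \
    && mv "$file.tmp" "$file"
}

# Hypothetical helper: show partition details for one topic.
# Usage: describe_topic <topic-name>
describe_topic() {
  "${KAFKA_HOME:-$HOME/kafka}/bin/kafka-topics.sh" \
    --bootstrap-server "$BOOTSTRAP_SERVERS" \
    --command-config "${CLIENT_PROPERTIES:-./client.properties}" \
    --describe --topic "$1"
}
```

For example, run describe_topic sampleTopic, then set_partitions 10 main.tf, then terraform plan and terraform apply, and finally describe_topic sampleTopic again to compare the partition counts.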
Delete the MSK topic using Terraform
When you no longer need the infrastructure, you can remove all resources created by your Terraform file.
- Run the command terraform destroy to remove the topic.
- When prompted for confirmation before proceeding, enter yes.
Terraform will delete the sampleTopic topic from your MSK cluster.
- To verify, rerun the --list command:
Now, this command will not show the sampleTopic topic.
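The teardown and its check could be scripted along these lines. This is a sketch: topic_absent and destroy_and_verify are hypothetical helpers, and KAFKA_HOME, BOOTSTRAP_SERVERS, and CLIENT_PROPERTIES are assumed environment variables for your Kafka tools, IAM bootstrap endpoints, and IAM client settings file.

```shell
# Hypothetical helper: reads topic names on stdin (one per line) and
# succeeds only if the named topic is NOT among them.
topic_absent() {
  ! grep -qxF -- "$1"
}

# Destroy the Terraform-managed topic, then confirm it no longer appears
# in the cluster's topic list. terraform destroy still prompts for confirmation.
destroy_and_verify() {
  terraform destroy || return 1
  "${KAFKA_HOME:-$HOME/kafka}/bin/kafka-topics.sh" \
    --bootstrap-server "$BOOTSTRAP_SERVERS" \
    --command-config "${CLIENT_PROPERTIES:-./client.properties}" \
    --list | topic_absent sampleTopic
}
```

destroy_and_verify returns a zero exit status only when the destroy succeeds and sampleTopic is missing from the listing, which makes it usable as a gate in a cleanup script.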
Conclusion
In this post, we addressed the common challenges associated with manual MSK topic configuration management and presented a robust Terraform-based solution. Using Terraform for automated topic provisioning and configuration streamlines your processes, fosters scalability, and enhances flexibility. Additionally, it facilitates automated deployments and centralized management.
We encourage you to explore Terraform as a means to optimize Amazon MSK configurations and unlock further efficiencies within your streaming data pipelines.
About the author
Vijay Kardile is a Sr. Technical Account Manager with Enterprise Support, India. With over two decades of experience in IT Consulting and Engineering, he specializes in Analytics services, particularly Amazon EMR and Amazon MSK. He has empowered numerous enterprise clients by facilitating their adoption of various AWS services and offering expert guidance on attaining operational excellence.