In the rapidly evolving world of data and analytics, organizations are constantly seeking new ways to optimize their data infrastructure and unlock valuable insights. Amazon Redshift is changing the game for thousands of businesses every day by making analytics straightforward and more impactful. Fully managed, AI-powered, and built on massively parallel processing, Amazon Redshift helps companies uncover insights faster than ever. Whether you’re a small startup or a large enterprise, Amazon Redshift helps you make smart decisions quickly and with the best price-performance at scale. Amazon Redshift Serverless is a pay-per-use serverless data warehousing service that eliminates the need for manual cluster provisioning and management, a significant benefit for organizations of all sizes, whether their workloads are predictable or unpredictable.
The key innovation of Redshift Serverless is its ability to automatically scale compute up or down based on your workload demands, maintaining optimal performance and cost-efficiency without manual intervention. Redshift Serverless lets you either specify the base data warehouse capacity the service uses to handle your queries, which provides a steady level of performance for a well-known workload, or set a price-performance target (AI-driven scaling and optimization), which is better suited to fluctuating demands because it optimizes cost while maintaining performance. Base capacity is measured in Redshift Processing Units (RPUs), where one RPU provides 16 GB of memory. Redshift Serverless defaults to 128 RPUs, enough to analyze petabytes of data, and you can scale up for more power or down for cost optimization so that your data warehouse is sized for your unique needs. By setting a higher base capacity, you can improve the overall performance of your queries, especially for data processing jobs that tend to consume a lot of compute resources. The more RPUs you allocate as the base capacity, the more memory and processing power Redshift Serverless has available to tackle your most demanding workloads. This setting gives you the flexibility to optimize Redshift Serverless for your specific needs: if you have many complex, resource-intensive queries, increasing the base capacity helps those queries run efficiently, with little to no bottlenecks or delays.
In this post, we explore the new higher base capacity of 1024 RPUs in Redshift Serverless, which doubles the previous maximum of 512 RPUs. This enhancement helps you achieve high performance for workloads with highly complex queries, as well as write-intensive workloads with concurrent data ingestion and transformation tasks that require high throughput and low latency. Redshift Serverless can also automatically scale up to 10 times the base capacity. The focus is on helping you find the right balance between performance and cost to meet your organization’s unique data warehousing needs. By adjusting the base capacity, you can fine-tune Redshift Serverless to deliver the right combination of speed and efficiency for your workloads.
The need for 1024 RPUs
Data warehousing workloads are increasingly demanding high-performance computing resources to meet the challenges of modern data processing requirements. The need for 1024 RPUs is driven by several key factors. First, many data warehousing use cases involve processing petabyte-sized historical datasets, whether for initial data loading or periodic reprocessing and querying. This is particularly prevalent in industries like healthcare, financial services, manufacturing, retail, and engineering, where third-party data sources can deliver petabytes of information that must be ingested in a timely manner. Additionally, the seasonal nature of many business processes, such as month-end or quarter-end reporting, creates periodic spikes in computational needs that require substantial scalable resources.
The complexity of the queries and analytics run against data warehouses has also grown exponentially, with many workloads now scanning and processing multi-petabyte datasets. This level of complex data processing requires substantial memory and parallel processing capabilities that can be effectively provided by a 1024 RPU configuration. Furthermore, the increasing integration of data warehouses with data lakes and other distributed data sources adds to the overall computational burden, necessitating high-performing, scalable solutions.
Also, many data warehousing environments are characterized by heavy write-intensive workloads, with concurrent data ingestion and transformation tasks that require a high-throughput, low-latency processing architecture. For workloads that access extremely large volumes of data with complex joins, aggregations, and numerous columns that require substantial memory, the 1024 RPU configuration can deliver the performance needed to meet demanding service level agreements (SLAs) and provide timely data availability for downstream business intelligence and decision-making processes. To control costs, you can set the maximum capacity (on the Limits tab of the workgroup configuration) to cap the resources the workgroup can consume. The following screenshot shows an example.
During the tests discussed later in this post, we compare workgroups configured with a base capacity of 512 RPUs and 1024 RPUs.
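If you prefer to configure these limits programmatically rather than on the console, the following sketch uses the AWS SDK for Python (Boto3) to set the maximum capacity of a workgroup; the base capacity can be adjusted the same way with the baseCapacity parameter. The workgroup name and capacity value are illustrative, and the maxCapacity parameter assumes a reasonably recent Boto3 version.

```python
import boto3

# Sketch: cap how far a Redshift Serverless workgroup can scale by setting
# its maximum RPU capacity. The workgroup name and the value are placeholders.
client = boto3.client("redshift-serverless")

response = client.update_workgroup(
    workgroupName="my-benchmark-workgroup",  # hypothetical workgroup name
    maxCapacity=2048,  # upper bound for automatic scaling, in RPUs
)

print(response["workgroup"]["status"])  # for example, MODIFYING
```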
When to consider using 1024 RPUs
Consider using 1024 RPUs in the following scenarios:
- Complex and long-running queries – Large warehouses provide the compute power needed to process complex queries that involve multiple joins, aggregations, and calculations. For workloads analyzing terabytes or petabytes of data, the 1024 RPU capacity can significantly improve query completion times.
- Data lake queries scanning large datasets – Queries that scan extensive data in external data lakes benefit from the additional compute resources. This provides faster processing and reduced latency, even for large-scale analytics.
- High-memory queries – Queries requiring substantial memory—such as those with many columns, large intermediate results, or temporary tables—perform better with the increased capacity of a larger warehouse.
- Accelerated data loading – Large-capacity warehouses improve the performance of data ingestion tasks, such as loading massive datasets into the data warehouse. This is particularly beneficial for workloads involving frequent or high-volume data loads.
- Performance-critical use cases – For applications or systems that demand low latency and high responsiveness, a 1024 RPU warehouse provides smooth operation by allocating sufficient compute resources to handle peak loads efficiently.
Balancing performance and cost
Choosing the right warehouse size requires evaluating your workload’s complexity and performance requirements. A larger warehouse size, such as 1024 RPUs, excels at handling computationally intensive tasks but should be balanced against cost-effectiveness. Consider testing your workload on different base capacities or using the Redshift Serverless price-performance slider to find the optimal setting.
When to avoid larger base capacity
Although larger warehouses offer powerful performance benefits, they might not always be the most cost-effective solution. Consider the following scenarios where a smaller base capacity might be more suitable:
- Basic or small queries – Simple queries that process small datasets or involve minimal computation don’t require the high capacity of a 1024 RPU warehouse. In such cases, smaller warehouses can handle the workload effectively, avoiding unnecessary costs.
- Cost-sensitive workloads – For workloads with predictable and moderate complexity, a smaller warehouse can deliver sufficient performance while keeping costs under control. Selecting a larger capacity might lead to overspending without proportional performance gains.
Comparison and cost-effectiveness
The previous maximum of 512 RPUs should suffice for most use cases, but some situations need more. At 512 RPUs, you get 8 TB of memory on your workgroup; with 1024 RPUs, that doubles to 16 TB. Consider a scenario where you are ingesting large volumes of data with the COPY command, such as healthcare datasets that reach 30 TB or more.
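As a minimal sketch of that kind of bulk load, the following example issues a COPY command through the Redshift Data API. The table, S3 path, IAM role, and workgroup names are hypothetical placeholders, and the pipe-delimited, gzip-compressed format is an assumption; match the options to your actual files.

```python
import boto3

# Sketch: load a large table with COPY via the Redshift Data API.
# Bucket, role ARN, table, and workgroup names are placeholders, and the
# pipe-delimited/GZIP options are assumptions; adjust them to your files.
copy_sql = """
COPY tpch.lineitem
FROM 's3://my-tpch-bucket/30tb/lineitem/'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftCopyRole'
DELIMITER '|'
GZIP
REGION 'us-east-2';
"""

client = boto3.client("redshift-data")
stmt = client.execute_statement(
    WorkgroupName="my-benchmark-workgroup",
    Database="dev",
    Sql=copy_sql,
)
print(stmt["Id"])  # poll describe_statement(Id=...) until the load finishes
```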
To illustrate, we ingested the TPC-H 30 TB dataset available in the AWS Labs GitHub repository amazon-redshift-utils into both the 512 RPU workgroup and the 1024 RPU workgroup.
The following graph provides detailed runtimes. We see an overall 44% performance improvement on 1024 RPUs vs. 512 RPUs. You will notice that the larger ingestion workloads show a greater performance improvement.
The cost for running 6,809 seconds at 512 RPUs in the US East (Ohio) AWS Region at $0.36 per RPU-hour is calculated as 6809 * 512 * 0.36 / 60 / 60 = $348.62.
The cost for running 3,811 seconds at 1024 RPUs in the US East (Ohio) Region at $0.36 per RPU-hour is calculated as 3811 * 1024 * 0.36 / 60 / 60 = $390.25.
The 1024 RPU workgroup ingests the 30 TB of data 44% faster, at a 12% higher cost, compared to 512 RPUs.
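The arithmetic behind these figures is easy to reproduce. The following small helper, assuming the US East (Ohio) on-demand price of $0.36 per RPU-hour, computes the cost of a run from its duration and RPU count.

```python
def rpu_cost(seconds: float, rpus: int, price_per_rpu_hour: float = 0.36) -> float:
    """Cost in dollars = RPU-hours consumed * price per RPU-hour."""
    return seconds * rpus * price_per_rpu_hour / 3600

print(round(rpu_cost(6809, 512), 2))   # 348.62 -> ingestion on 512 RPUs
print(round(rpu_cost(3811, 1024), 2))  # 390.25 -> ingestion on 1024 RPUs
```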
Next, we ran the 22 TPC-H queries available in the AWS Samples GitHub repository redshift-benchmarks on the same two workgroups to compare query performance.
The following graph provides detailed runtimes for each of the 22 TPC-H queries. We see an overall 17% performance improvement on 1024 RPUs vs. 512 RPUs for single-session sequential query execution, with performance improving for some queries and regressing for others.
When running 20 sessions concurrently, we see 62% performance improvement, from 6,903 seconds on 512 RPUs down to 2,592 seconds on 1024 RPUs, with each concurrent session running the 22 TPC-H queries in a different order.
Notice the stark difference in performance improvement seen for concurrent execution (62%) vs. serial execution (17%). The concurrent executions represent a typical production system where multiple concurrent sessions are running queries against the database. It’s important to base your proof of concept decisions on production-like scenarios with concurrent executions, and not only on sequential executions, which typically come from a single user running the proof of concept. The following table compares both tests.
|  | 512 RPU | 1024 RPU |
| --- | --- | --- |
| Sequential (seconds) | 1,276 | 1,065 |
| Concurrent executions (seconds) | 6,903 | 2,592 |
| Total (seconds) | 8,179 | 3,657 |
| Total ($) | $418.76 | $374.48 |
The total ($) is calculated as seconds * RPUs * $0.36 / 3,600.
The 1024 RPU workgroup runs the TPC-H queries against the 30 TB benchmark data 55% faster, and at 11% lower cost, compared to 512 RPUs.
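If you want to reproduce a production-like concurrency test such as the one above, you can drive multiple sessions that each run the 22 TPC-H queries in a shuffled order. The following sketch uses the redshift_connector driver with 20 worker threads; the connection details and the local layout of the query files are assumptions to adapt to your environment.

```python
import glob
import random
from concurrent.futures import ThreadPoolExecutor

import redshift_connector

# Hypothetical connection details; point these at your own workgroup endpoint.
CONN_ARGS = dict(
    host="my-benchmark-workgroup.123456789012.us-east-2.redshift-serverless.amazonaws.com",
    database="dev",
    user="awsuser",
    password="...",
)

# Assumes the 22 TPC-H queries are saved locally as query01.sql ... query22.sql.
QUERIES = [open(path).read() for path in sorted(glob.glob("tpch_queries/query*.sql"))]

def run_session(session_id: int) -> None:
    """Run all 22 queries in a random order on a dedicated connection."""
    order = random.sample(QUERIES, len(QUERIES))
    conn = redshift_connector.connect(**CONN_ARGS)
    cursor = conn.cursor()
    try:
        for sql in order:
            cursor.execute(sql)
            cursor.fetchall()  # drain the result set before the next query
    finally:
        cursor.close()
        conn.close()

# 20 concurrent sessions, mirroring the concurrency test in this post.
with ThreadPoolExecutor(max_workers=20) as pool:
    list(pool.map(run_session, range(20)))  # list() surfaces any worker errors
```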
Amazon Redshift offers system metadata views and system views that are useful for tracking resource utilization. We analyzed additional metrics from the sys_query_history and sys_query_detail views to identify which specific parts of query execution improved or regressed. With 16 TB of memory, the 1024 RPU workgroup can hold more data blocks in memory, so it fetched 35% fewer local SSD blocks than the 512 RPU workgroup with 8 TB of memory. It also handled the larger workloads better, fetching 71% fewer remote Amazon S3 blocks than 512 RPUs. Finally, local disk spill to SSD (which occurs when a query can’t be allocated more memory) was reduced by 63%, and remote disk spill to Amazon S3 (which occurs when the SSD cache is fully occupied) was eliminated entirely on 1024 RPUs. The following table summarizes the improvements.
| Metric | Improvement (percentage) |
| --- | --- |
| Elapsed time | 60% |
| Queue time | 23% |
| Runtime | 59% |
| Compile time | -8% |
| Planning time | 64% |
| Lock wait time | -31% |
| Local SSD blocks read | 35% |
| Remote S3 blocks read | 71% |
| Local disk spill to SSD | 63% |
| Remote disk spill to S3 | 100% |
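A query along the following lines can pull comparable timing numbers for your own runs. It is a sketch run through the Redshift Data API; the time filter and workgroup name are placeholders, and SYS_QUERY_HISTORY reports its timings in microseconds.

```python
import boto3

# Sketch: average elapsed, queue, and execution time (in seconds) per query
# type, taken from SYS_QUERY_HISTORY. Adjust the filter so it covers only
# your benchmark window; timings in the view are in microseconds.
metrics_sql = """
SELECT query_type,
       COUNT(*)                          AS queries,
       AVG(elapsed_time)   / 1000000.0   AS avg_elapsed_s,
       AVG(queue_time)     / 1000000.0   AS avg_queue_s,
       AVG(execution_time) / 1000000.0   AS avg_execution_s
FROM sys_query_history
WHERE start_time >= '2024-01-01 00:00:00'  -- placeholder benchmark window
GROUP BY query_type
ORDER BY queries DESC;
"""

client = boto3.client("redshift-data")
stmt = client.execute_statement(
    WorkgroupName="my-benchmark-workgroup",  # placeholder workgroup name
    Database="dev",
    Sql=metrics_sql,
)
print(stmt["Id"])  # retrieve rows later with get_statement_result(Id=...)
```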
The following are some run characteristic graphs captured from the Amazon Redshift console. To find these, choose Query and database monitoring and Resource monitoring under Monitoring in the navigation pane.
Thanks to the performance enhancement, queries completed sooner with 1024 RPUs than with 512 RPUs, resulting in connections ending sooner as well.
The following graph illustrates the database connection with 512 RPUs.
The following graph illustrates the database connection with 1024 RPUs.
Regarding query classification, there are three categories: short queries (less than 10 seconds), medium queries (10 seconds to 10 minutes), and long queries (more than 10 minutes). We observed that due to performance improvements, the 1024 RPU configuration resulted in fewer long queries compared to the 512 RPU configuration.
The following graph illustrates the queries duration with 512 RPUs.
The following graph illustrates the queries duration with 1024 RPUs.
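To reproduce this short/medium/long classification for your own workload, a bucketing query similar to the following sketch works against SYS_QUERY_HISTORY, whose elapsed_time column is in microseconds; the thresholds below mirror the categories described above, and the workgroup name is a placeholder.

```python
import boto3

# Sketch: bucket queries by duration the same way the console does
# (less than 10 seconds, 10 seconds to 10 minutes, more than 10 minutes).
bucket_sql = """
SELECT CASE
         WHEN elapsed_time < 10 * 1000000  THEN 'short (<10s)'
         WHEN elapsed_time < 600 * 1000000 THEN 'medium (10s-10min)'
         ELSE 'long (>10min)'
       END      AS duration_bucket,
       COUNT(*) AS query_count
FROM sys_query_history
GROUP BY 1
ORDER BY 1;
"""

boto3.client("redshift-data").execute_statement(
    WorkgroupName="my-benchmark-workgroup",  # placeholder workgroup name
    Database="dev",
    Sql=bucket_sql,
)
```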
Due to the better performance, we noticed that the number of queries handled per second is higher on 1024 RPUs.
The following graph illustrates the queries completed per second with 512 RPUs.
The following graph illustrates the queries completed per second with 1024 RPUs.
In the following graphs, we see that although the number of running queries looks similar, the 1024 RPU workgroup completes the queries faster, which means a smaller window to run the same number of queries.
The following graph illustrates the queries running with 512 RPUs.
The following graph illustrates the queries running with 1024 RPUs.
There was no queuing when we compared both tests.
The following graph illustrates the queries queued with 512 RPUs.
The following graph illustrates the queries queued with 1024 RPUs.
The following graph illustrates the query runtime breakdown with 512 RPUs.
The following graph illustrates the query runtime breakdown with 1024 RPUs.
Queuing was largely avoided due to the automatic scaling feature offered by Redshift Serverless. By dynamically adding more resources, we can keep queries running and match the expected performance levels, even during usage peaks. You are able to set a maximum capacity to help prevent automatic scaling from exceeding your desired resource limits.
The following graph illustrates workgroup scaling with 512 RPUs. Redshift Serverless automatically scaled to 2x/1024 RPUs and peaked at 2.5x/1280 RPUs.
The following graph illustrates workgroup scaling with 1024 RPUs. Redshift Serverless automatically scaled to 2x/2048 RPUs and peaked at 3x/3072 RPUs.
The following graph illustrates compute consumed with 512 RPUs.
The following graph illustrates compute consumed with 1024 RPUs.
Conclusion
The introduction of the 1024 RPU capacity for Redshift Serverless marks a significant advancement in data warehousing capabilities, offering substantial benefits for organizations handling large-scale, complex data processing tasks. Redshift Serverless scales up ingestion performance with the higher capacity. As evidenced by the benchmark tests in this post using the TPC-H dataset, this higher base capacity not only accelerates processing times, but can also prove more cost-effective for workloads like those described in this post, demonstrating improvements such as 44% faster data ingestion, 62% better performance in concurrent query execution, and overall cost savings of 11% for combined workloads.
Given these results, it’s worth evaluating your current data warehousing needs and running a proof of concept with the 1024 RPU configuration. Analyze your workload patterns using the Amazon Redshift monitoring tools, optimize your configurations accordingly, and don’t hesitate to engage with AWS experts for personalized advice. If your company is covered by an AWS account team, ask them for a meeting. If not, post your analysis and questions to AWS re:Post.
By taking these steps and staying informed about future developments, you can make sure that your organization fully takes advantage of Redshift Serverless, potentially unlocking new levels of performance and cost-efficiency in your data warehousing operations.
About the authors
Ricardo Serafim is a Senior Analytics Specialist Solutions Architect at AWS.
Harshida Patel is an Analytics Specialist Principal Solutions Architect with AWS.
Milind Oke is a Data Warehouse Specialist Solutions Architect based out of New York. He has been building data warehouse solutions for over 15 years and specializes in Amazon Redshift.