
Zero-copy, Coordination-free approach to OpenSearch Snapshots


Amazon OpenSearch Service provides automated hourly snapshots as a critical backup and recovery mechanism for customer data. These snapshots serve as point-in-time backups that you can use to restore your OpenSearch domains to a previous state, helping to ensure data durability and business continuity. While this functionality is essential, it’s equally important that the snapshot process operates seamlessly without impacting the domain’s core operations. The snapshot workflow must be efficient enough to maintain optimal performance of search and indexing operations, preserve the domain’s ability to scale with growing workloads, and support overall cluster stability.

In this blog post, we describe how we enhanced snapshot efficiency in Amazon OpenSearch Service while carefully maintaining these critical operational aspects. These snapshot optimizations are enabled for all OpenSearch optimized instance family (OR1, OR2, OM2) domains from version 2.17 onwards.

Background

In the traditional snapshot mechanism of OpenSearch, the process involves uploading incremental segment files from each shard to Amazon Simple Storage Service (Amazon S3). The workflow begins when the cluster manager node initiates the snapshot creation and coordinates with the nodes holding primary shards to capture their respective snapshots. Throughout this process, data nodes continuously communicate with the cluster manager node to report their snapshot progress. To provide resilience against leader failures, the cluster state maintains detailed tracking of all in-progress snapshots. This state is shared with all data nodes. However, this approach introduces significant communication overhead, especially in large-scale deployments.

Consider a cluster with M nodes and N primary shards. Each snapshot operation requires at least N cluster state updates (one for each primary shard), and each update fans out in M transport calls between the cluster manager node and the data nodes, for M×N transport calls in total, as shown in the following diagram. In large domains with hundreds of nodes and thousands of shards, this intensive communication pattern can potentially overwhelm the cluster manager node, impacting its ability to handle other critical cluster management tasks.
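The coordination cost above can be sketched with a simplified model (illustrative only, not OpenSearch code): each of the N primary shards triggers one cluster state update, and each update is propagated to all M nodes.

```python
def traditional_snapshot_cost(m_nodes: int, n_shards: int) -> dict:
    """Estimate coordination work for one traditional snapshot.

    Simplified model: one cluster state update per primary shard,
    and each update fans out to every node in the cluster.
    """
    cluster_state_updates = n_shards          # one per primary shard
    transport_calls = m_nodes * n_shards      # each update reaches every node
    return {
        "cluster_state_updates": cluster_state_updates,
        "transport_calls": transport_calls,
    }

# A large domain: 100 nodes, 100,000 primary shards.
cost = traditional_snapshot_cost(100, 100_000)
print(cost["transport_calls"])  # 10000000 transport calls for a single snapshot
```

Even this back-of-the-envelope model shows why snapshot traffic alone can saturate the cluster manager at large scale.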

Traditional Snapshot

The OpenSearch optimized instance family introduced a significant advancement in data durability and snapshot efficiency. Built to deliver high throughput with 11 nines of durability, OpenSearch optimized instances maintain a copy of all indexed data in Amazon S3. This architectural design eliminated the need to re-upload data during snapshot creation. Instead, the system references the existing data checkpoint in the snapshot metadata. Data checkpoints track the state of data on shards at a given point in time to help ensure consistency and durability. We also prevent cleaning up data from Amazon S3 that is referenced in the snapshot metadata. This approach made snapshots substantially more lightweight and faster compared to the conventional method.

The improved snapshot flow with OpenSearch optimized instances, also called a shallow snapshot v1, manages checkpoint referencing by creating explicit lock files for each checkpoint of a given shard. This flow is illustrated in the following diagram where in the fourth step, instead of uploading segments data, we upload a checkpoint lock file.

Shallow Snapshot V1

While this approach successfully addressed the data redundancy issue by replacing segment data uploads with checkpoint lock file creation, it introduced its own set of challenges. The communication overhead between nodes remained unchanged during snapshot creation and deletion operations. Additionally, the system creates lock files for every shard in each snapshot, regardless of whether the shard receives active traffic. Creating one lock file per shard per snapshot generates an excessive number of remote store calls, which is particularly problematic for larger OpenSearch domains.
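The per-shard lock files can be sketched as follows. The key layout and function name here are hypothetical, purely to illustrate why remote store calls grow linearly with shard count; they are not the actual OpenSearch remote-store layout.

```python
def v1_lock_file_keys(snapshot_id: str, shard_ids: list[str]) -> list[str]:
    """Return the lock-file keys a v1-style snapshot would write.

    One lock file per shard per snapshot, so the number of remote
    store writes scales with the shard count (hypothetical layout).
    """
    return [f"{shard}/locks/{snapshot_id}.lock" for shard in shard_ids]

# 10,000 shards means 10,000 remote store PUTs for a single snapshot.
keys = v1_lock_file_keys("snap-001", [f"shard-{i}" for i in range(10_000)])
print(len(keys))  # 10000
```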

Revised shallow snapshot (v2)

At its core, shallow snapshot v2 reimagines how OpenSearch handles data backup: it implements a timestamp-based referencing system that reduces data duplication while eliminating the communication overhead. As shown in the following diagram, instead of putting an explicit lock on a shard's remote store checkpoint file, shallow snapshot v2 places an implicit lock based on the timestamps of the snapshot and of the checkpoint file. These snapshot timestamps are tracked in pinned timestamp files that are uploaded to the remote store, and checkpoints whose timestamps match an entry in a pinned timestamp file aren't cleaned up from Amazon S3. With this architectural change, data nodes don't need to send shard updates to the cluster manager, avoiding the subsequent cluster state updates. Snapshot restoration works by reading the pinned timestamp file corresponding to your snapshot, which helps the data node locate and download the correct version of the data from Amazon S3.
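The implicit-lock idea can be illustrated with a small sketch of the cleanup side: checkpoints whose timestamps appear in the pinned set are retained, everything else is eligible for deletion. The names and the exact matching rule are simplified assumptions, not the OpenSearch implementation.

```python
def cleanup_candidates(checkpoints: dict[str, int],
                       pinned_timestamps: set[int]) -> list[str]:
    """Return checkpoint names safe to delete.

    A checkpoint is implicitly locked (and therefore retained) if its
    timestamp appears in any pinned timestamp file; simplified matching.
    """
    return [name for name, ts in checkpoints.items()
            if ts not in pinned_timestamps]

# Three checkpoints; a snapshot pinned the middle one's timestamp.
checkpoints = {"ckpt-a": 1700000000, "ckpt-b": 1700003600, "ckpt-c": 1700007200}
pinned = {1700003600}
print(cleanup_candidates(checkpoints, pinned))  # ['ckpt-a', 'ckpt-c']
```

No per-shard lock files are written: pinning one timestamp protects the matching checkpoint of every shard at once.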

Key benefits

Let’s explore the major advantages of using shallow snapshot v2.

Performance improvements

The performance benefits of shallow snapshot v2 are substantial and multifaceted. By minimizing the amount of data that needs to be uploaded to the remote store and the number of cluster state updates that need to be communicated between nodes during snapshot creation, the system significantly reduces I/O and network operations. This reduction translates to faster snapshot creation times and lower system resource utilization during backup operations.

The evaluations shown in the following table were performed to assess the influence on snapshot operations when the domain experiences significant load.

| Number of nodes | Number of shards | Traditional | Shallow snapshot v1 | Shallow snapshot v2 |
|---|---|---|---|---|
| 10 | 100 | 15–20 minutes | 1–2 minutes | <1 second |
| 10 | 10,000 | 30–40 minutes | 5–10 minutes | <5 seconds |
| 100 | 100,000 | >1 hour | >1 hour | <10 seconds |

Scalability

With a fixed number of inter-node communication calls during snapshot creation, snapshot creation time remains in the single-digit seconds even as the node, index, and shard counts grow. When tested on a 1,000-node Amazon OpenSearch Service domain, shallow snapshot v2 creation time was observed between 10–20 seconds. For organizations managing large Amazon OpenSearch Service domains, shallow snapshot v2 offers particular advantages: the reduced storage cost from shallow snapshots and the faster creation times from shallow snapshot v2 make it possible to maintain more frequent backups without overwhelming storage resources or impacting system performance.
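Contrast this with the earlier model of the traditional flow: under the same simplified assumptions (illustrative only), a v2 snapshot writes one pinned timestamp file and records one snapshot entry, independent of node and shard count.

```python
def v2_snapshot_cost(m_nodes: int, n_shards: int) -> dict:
    """Estimate coordination work for one shallow snapshot v2.

    Simplified assumption: a single pinned-timestamp write and a single
    cluster state entry for the snapshot itself, regardless of scale.
    The m_nodes and n_shards arguments deliberately do not affect the
    result, which is the point of the coordination-free design.
    """
    return {
        "cluster_state_updates": 1,   # one entry recording the snapshot
        "remote_store_writes": 1,     # the pinned timestamp file
    }

print(v2_snapshot_cost(1_000, 100_000))  # same cost at any scale
```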

Architectural simplification

The architectural improvements in shallow snapshot v2 go beyond performance optimization. The new implementation features a more streamlined and maintainable codebase, reducing the effort needed to debug issues and implement future enhancements. The simplified architecture reduces the complexity of the snapshot and restore process, leading to more reliable operations and fewer potential points of failure. For use cases that require frequent backups, such as compliance-driven scenarios or development environments, this means that you can establish a lower recovery point objective for disaster recovery: shallow snapshot v2's efficient handling of incremental changes makes it possible to maintain more granular backup schedules without performance penalties.

Storage efficiency

The cornerstone of shallow snapshot v2 is its innovative approach to storage management. Instead of creating multiple copies of unchanged data, the system maintains smart references to existing data blocks. This implicit timestamp-based reference-counting mechanism avoids creating explicit locks per shard. In environments where storage resources are at a premium, the storage efficiency of shallow snapshot v2 can lead to significant cost savings. The reference-based approach helps ensure optimal use of available storage space while maintaining comprehensive backup coverage.

Looking ahead

The introduction of shallow snapshot v2 marks the beginning of our journey toward more efficient data backup solutions. Building upon the framework created by shallow snapshot v2, we can implement additional features such as point-in-time recovery (PITR), better cluster state integration, and various performance optimizations.

Conclusion

Shallow snapshot v2 represents a significant advancement in OpenSearch's backup capabilities. By combining storage efficiency, improved performance, and architectural simplification, it provides a robust solution for modern data backup challenges. If you're using an instance type from the optimized instance family, shallow snapshot v2 is already enabled for you. Whether you're using a large-scale domain or working within storage constraints, shallow snapshot v2 offers tangible benefits for your Amazon OpenSearch Service domains.


About the Authors

Sachin Kale is a senior software development engineer at AWS working on OpenSearch.

Bukhtawar Khan is a Principal Engineer working on Amazon OpenSearch Service. He is interested in building distributed and autonomous systems. He is a maintainer and an active contributor to OpenSearch.

Gaurav Bafna is a Senior Software Engineer working on OpenSearch at Amazon Web Services. He is fascinated about solving problems in distributed systems. He is a maintainer and an active contributor to OpenSearch.
