Cloudera, a hybrid platform for data, analytics, and AI, has announced an integration with Snowflake, an AI-powered cloud data platform, aimed at providing enterprises with an open and unified hybrid data lakehouse.
At the heart of this new lakehouse is the Iceberg REST Catalog, which builds on Apache Iceberg, an open table format designed for large-scale data management, to make data easier to manage across different data engines and compute environments.
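To illustrate why a shared REST catalog lets different engines work from the same tables, here is a minimal sketch of how a client might build the table-lookup URL described in the open Iceberg REST catalog specification. The base URL, the namespace, and the table name are hypothetical, and the helper `table_endpoint` is not part of any Cloudera or Snowflake API; the announcement does not describe the actual endpoints.

```python
from urllib.parse import quote

# Hypothetical catalog address; the announcement does not give a real endpoint.
CATALOG_URL = "https://catalog.example.com/api"


def table_endpoint(base_url: str, namespace: list[str], table: str, prefix: str = "") -> str:
    """Build the load-table URL in the shape used by the Iceberg REST catalog spec."""
    # Multi-level namespaces are joined with the 0x1F unit separator, then URL-encoded.
    ns = quote("\x1f".join(namespace), safe="")
    parts = [base_url.rstrip("/"), "v1"]
    if prefix:
        # Optional catalog prefix, if the server's configuration supplies one.
        parts.append(quote(prefix, safe=""))
    parts += ["namespaces", ns, "tables", quote(table, safe="")]
    return "/".join(parts)


# Any engine pointed at the same catalog resolves the same table through the
# same URL, so no engine needs its own copy of the data or the metadata.
url = table_endpoint(CATALOG_URL, ["sales", "emea"], "orders")
print(url)
```

In practice an engine would send an authenticated GET request to this URL and read the table metadata from the response; that step is left out here because it depends on the deployment.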
The collaboration allows joint users to combine Cloudera’s data management capabilities with Snowflake’s cloud architecture, potentially improving data agility and facilitating deeper insights across organizations.
Cloudera shared findings from a 2022 study in which 80% of companies surveyed reported revenue increases due to real-time data analytics, while 98% noted improved customer satisfaction as a result of leveraging data. However, Cloudera emphasizes that to fully harness the potential of data, enterprises require a single, unified platform for storing, managing, and governing all their data.
With the new Cloudera and Snowflake integration, organizations can combine structured and unstructured data into a unified data lakehouse, removing the complexities associated with transferring data between different systems.
Snowflake users can now directly access data stored in Cloudera’s Ozone, an on-premises object storage solution compatible with AWS S3. This integration enables customers to utilize various deployment options, including on-premises, platform-as-a-service (PaaS), and software-as-a-service (SaaS) solutions, enhancing their data management capabilities.
“By extending our open data lakehouse capabilities through Apache Iceberg to Snowflake, we’re enabling our customers to not only optimize their data workflows but also unlock new opportunities for innovation, efficiency, and growth,” said Abhas Ricky, Chief Strategy Officer of Cloudera.
“This will help customers simplify their data architecture, minimize data pipelines, and reduce total cost of ownership of their data estate while reducing security risks. Together, Snowflake and Cloudera are bringing about the next era of data-driven decision-making for every modern organization.”
Because Apache Iceberg frees data from proprietary constraints, organizations can access it more uniformly across different platforms, which simplifies management and enables more comprehensive analysis of their data assets.
A key aspect of the collaboration is that Cloudera users can access data in Cloudera’s Open Data Lakehouse through Snowflake’s Business Intelligence engine without transferring or duplicating it. This setup simplifies data access while preserving data integrity. The integration also aims to reduce the total cost of ownership for businesses using the combined stack by eliminating data and metadata silos and rationalizing data pipelines.
The collaboration features Managed Iceberg Tables, which are intended to enhance data performance and reliability through improved organization and faster query execution. New “Best-of-Breed Engines” have also been introduced to support both AI and business intelligence workloads.
Cloudera reports that clients using this integration have achieved more efficient resource usage and reduced maintenance burdens. Clients have also run multiple workloads, such as AI training, reporting, and analytics, against a single dataset, allowing them to derive more insights and value from their data.
“Through this collaboration, customers gain access to a unified, robust data management platform that provides a single source of truth for all of their data, whether in the cloud or on-premises,” said Sanjeev Mohan, analyst at SanjMo.
“This enables them to streamline and secure their data operations while efficiently analyzing and extracting insights across the entire data lifecycle – from ingestion to AI and analytics. It’s a strategic move from two industry giants to partner in a way that will deliver immediate value to businesses.”
Along with the integration, Cloudera announced a technical preview of Lakehouse Optimizer, designed to autonomously optimize Iceberg tables. The goal is to reduce the total cost of ownership (TCO), decrease data management efforts, and improve the performance of the Lakehouse.