Windward (LSE:WNWD) is the leading Maritime AI™ company, providing an all-in-one platform for risk management and maritime domain awareness to accelerate global trade. Windward monitors and analyzes what 500k+ vessels around the world are doing every day, including where they go, what cargo they carry, how they handle inclement weather and what ports they frequent. With 90% of trade being transported via sea, this data is crucial to keeping the global supply chain on track but can be difficult to disentangle and act on. Windward fills this niche by providing actionable intelligence with real-time ETA tracking, carrier performance insights, risk monitoring and mitigation and more.
In 2022, Windward embarked on several changes to its application, prompting a reconsideration of its underlying data stack. For one, the company decided to invest in an API Insights Lab where customers and partners across suppliers, carriers, governments and insurance companies could use maritime data as part of their internal systems and workflows. This enabled each of the players to use the maritime data in distinct ways: insurance companies to determine price and assess risk, and governments to monitor illegal activities. As a result, Windward wanted an underlying data stack that took an API first approach.
Windward also expanded its AI insights to cover risks related to illegal, unreported and unregulated (IUU) fishing and to identify shadow fleets that obscure the transport of sanctioned Russian oil/wet cargo. To support this, Windward’s data platform needed to enable rapid iteration so the team could quickly innovate and build more AI capabilities.
Lastly, Windward wanted to move its entire platform from batch-based data infrastructure to streaming. This transition supports new use cases that require faster analysis of events than was needed until now.
In this blog, we’ll describe the new data platform for Windward and how it is API first, enables rapid product iteration and is architected for real-time, streaming data.
Data Challenges
Windward tracks vessel positions generated by AIS transmissions in the ocean. Over 100M AIS transmissions are added every day to track each vessel’s location at any given point in time. If a vessel makes a turn, Windward can use a minimal number of AIS transmissions to chart its path. This data can also be used to derive the speed, ports visited and other variables that are part of the journey. AIS transmission data is unreliable, however, which makes it challenging to associate a transmission with the right vessel. As a result, about 30% of all data ends up triggering data changes and deletions.
In addition to the AIS transmissions data, there are other data sources for enrichment including weather, nautical charts, ownership and more. This enrichment data has changing schemas and new data providers are constantly being added to enhance the insights, making it challenging for Windward to support using relational databases with strict schemas.
Using real-time and historical data, Windward runs behavioral analysis to examine maritime activities, economic performance and deceptive shipping practices. They also create AI models that are used to determine environmental risk, sanctions compliance risk, operational risk and more. All of these assessments go back to the AI insights initiative that led Windward to re-examine its data stack.
Because Windward operated on a batch-based data stack, they stored raw data in S3. They used MongoDB as their metadata store to capture vessel and company data. The vessel positions data, which is by nature a time series geospatial data set, was stored in both PostgreSQL and Cassandra to support different use cases. Windward also used specialized databases like Elasticsearch for specific functionality like text search. When Windward took inventory of their data architecture, they had 5 different databases, making it challenging to support new use cases, achieve performant contextual queries and scale the database systems.
Furthermore, as Windward introduced new use cases they started to hit limitations with their data stack. In the words of Benny Keinan, Vice President of R&D at Windward, “We were stuck on feature development and working too hard on features that should have been easy to build. The data stack and model that we started Windward with twelve years ago was not ideal for the search and analytical features needed to digitally and intelligently transform the maritime industry.”
Benny and team decided to embark on a new data stack that could better support the logistics tracking needs of their customers and the maritime industry. They started by considering new product requests from prospects and customers that would be hard to support in the current stack, limiting the opportunity to generate significant new revenue. These included:
- Geo queries: Customers wanted to generate personalized polygons to monitor particular maritime areas of interest. Their goal was to have the capability to perform searches on past data for recently defined polygons and obtain results within seconds.
- Vessel search: Customers wanted to search for a specific vessel and see all of its contextual information, including AIS transmissions, ownership, activities and the relations between activities (for example, the sequence of activities). Search and join queries were hard to support in a timely manner in the application experience.
- Partial and fuzzy word search: The customer might only have the partial vessel name and so the database needs to support partial word searches.
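To make the first and last requests concrete, here is a minimal Python sketch of the two underlying operations: testing whether a historical position falls inside a customer-defined polygon, and matching a partial vessel name. This is only an illustration under simplifying assumptions, not how Windward or any database implements these features. The function names and the `lon`/`lat` field names are hypothetical, the geometry is treated as planar (ignoring great-circle effects and the antimeridian), and the name match is a plain substring test rather than true fuzzy matching.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar coordinates


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: toggle 'inside' each time a ray cast to the right crosses an edge."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def positions_in_area(positions: List[Dict[str, float]], polygon: List[Point]) -> List[Dict[str, float]]:
    """Filter historical positions (hypothetical 'lon'/'lat' fields) down to those inside the polygon."""
    return [p for p in positions if point_in_polygon((p["lon"], p["lat"]), polygon)]


def partial_name_match(query: str, vessel_names: List[str]) -> List[str]:
    """Case-insensitive substring match, a simple stand-in for partial word search."""
    q = query.casefold()
    return [name for name in vessel_names if q in name.casefold()]
```

A linear scan like this is exactly what becomes too slow at billions of records, which is why the requests above translate into a need for geospatial and search indexes rather than application-side filtering.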
Windward realized that the database should support both search and analytics on streaming data to meet their current and future product development needs.
Requirements for Next-Generation Database
The number of databases under management and the challenges of supporting new use case requirements prompted Windward to consolidate their data stack. Taking a use case centric approach, Windward identified a set of requirements for its next-generation database.
After coming up with the requirements, Windward evaluated more than 10 different databases, out of which only Rockset and Snowflake were capable of supporting the main use cases for search and analytics in their application.
Rockset was short-listed for the evaluation as it’s designed for fast search and analytics on streaming data and takes an API first approach. Furthermore, Rockset supports in-place updates making it efficient to process changes to AIS transmissions and their associated vessels. With support for SQL on deeply nested semi-structured data, Windward saw the potential to consolidate geo data and time series data into one system and query using SQL. As one of the limitations of the existing systems was their inability to perform fast searches, Windward liked Rockset’s Converged Index which indexes the data in a search index, columnar store and row store to support a wide range of query patterns out-of-the-box.
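The idea of indexing the same data three ways can be illustrated with a toy in-memory structure. The sketch below is not Rockset's implementation or API; the class and method names are hypothetical, and it assumes flat documents with hashable field values. It shows how a row store, a column store and a search index each serve a different query pattern, and how an in-place update (for example, reassigning an AIS transmission to a different vessel) has to keep all three in sync.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Set, Tuple


class ToyConvergedIndex:
    """Toy illustration: every document is written to three structures at once."""

    def __init__(self) -> None:
        # Row store: document id -> full document (fast point lookups)
        self.rows: Dict[str, Dict[str, Any]] = {}
        # Column store: field -> {document id: value} (fast scans and aggregations)
        self.columns: Dict[str, Dict[str, Any]] = defaultdict(dict)
        # Search index: (field, value) -> document ids (fast selective filters)
        self.search: Dict[Tuple[str, Any], Set[str]] = defaultdict(set)

    def delete(self, doc_id: str) -> None:
        old = self.rows.pop(doc_id, None)
        if old is None:
            return
        for field, value in old.items():
            self.columns[field].pop(doc_id, None)
            self.search[(field, value)].discard(doc_id)

    def upsert(self, doc_id: str, doc: Dict[str, Any]) -> None:
        # An update removes the old entries from all three structures, then rewrites them.
        self.delete(doc_id)
        self.rows[doc_id] = dict(doc)
        for field, value in doc.items():
            self.columns[field][doc_id] = value
            self.search[(field, value)].add(doc_id)

    def get(self, doc_id: str) -> Optional[Dict[str, Any]]:
        return self.rows.get(doc_id)

    def find(self, field: str, value: Any) -> List[str]:
        return sorted(self.search.get((field, value), set()))

    def average(self, field: str) -> Optional[float]:
        values = list(self.columns.get(field, {}).values())
        return sum(values) / len(values) if values else None
```

For instance, after `upsert("t1", {"mmsi": "111", "speed": 10.0})`, a later `upsert("t1", {"mmsi": "222", "speed": 10.0})` moves transmission `t1` from one vessel to another without rewriting any other document.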
Snowflake was evaluated for its columnar store and ability to support large-scale aggregations and joins on historical data. Both Snowflake and Rockset are cloud-native and fully managed, minimizing the infrastructure operations burden on the Windward engineering team so that they can focus on building new AI insights and capabilities into their maritime application.
Performance Evaluation of Rockset and Snowflake
Windward evaluated the query performance of the systems on a suite of 6 typical queries, including search, geosearch, fuzzy matching and large-scale aggregations, on a dataset of ~2B records.
The performance of Rockset was evaluated on an XL Virtual Instance, an allocation of 32 vCPU and 256 GB RAM, that is $7.3496/hr in the AWS US-West region. The performance of Snowflake was evaluated on a Large virtual data warehouse that is $16/hr in AWS US-West.
The performance tests show that Rockset is able to achieve faster query performance at less than half the price of Snowflake. Rockset saw up to a 30.91x price-performance advantage over Snowflake for Windward’s use case. The query speed gains over Snowflake are due to Rockset’s Converged Indexing technology where a number of indexes are leveraged in parallel to achieve fast performance on large-scale data.
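A price-performance figure combines the hourly price difference with the query speed difference. The sketch below shows one common way to compute it, assuming price-performance is defined as the ratio of per-query cost (latency multiplied by hourly price) between the two systems. That definition is an assumption, since the exact formula used in the evaluation is not stated here, and the latencies in the usage note are made-up placeholders, not measured results.

```python
# Hourly prices quoted in the evaluation above (AWS US-West).
ROCKSET_PRICE_PER_HOUR = 7.3496
SNOWFLAKE_PRICE_PER_HOUR = 16.0

SECONDS_PER_HOUR = 3600.0


def price_ratio() -> float:
    """How many times more expensive per hour the Snowflake warehouse is."""
    return SNOWFLAKE_PRICE_PER_HOUR / ROCKSET_PRICE_PER_HOUR


def cost_per_query(latency_s: float, price_per_hour: float) -> float:
    """Cost of keeping the instance busy for the duration of one query."""
    return latency_s * price_per_hour / SECONDS_PER_HOUR


def price_performance_advantage(rockset_latency_s: float, snowflake_latency_s: float) -> float:
    """Assumed definition: Snowflake cost per query divided by Rockset cost per query."""
    rockset_cost = cost_per_query(rockset_latency_s, ROCKSET_PRICE_PER_HOUR)
    snowflake_cost = cost_per_query(snowflake_latency_s, SNOWFLAKE_PRICE_PER_HOUR)
    return snowflake_cost / rockset_cost
```

Under this definition the price difference alone contributes a factor of about 2.18x (`price_ratio()`), and any query speedup multiplies it. With hypothetical latencies of 1 second on Rockset and 5 seconds on Snowflake, `price_performance_advantage(1.0, 5.0)` would come out to roughly 10.9x.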
This performance testing made Windward confident that Rockset could deliver the query latency of seconds that the application requires while staying within budget today and into the future.
Iterating in an Ocean of Data
With Rockset, Windward is able to support the rapidly shifting needs of the maritime ecosystem, giving its customers the visibility and AI insights to respond and stay compliant.
Analytic capabilities that used to take down Windward’s PostgreSQL database or, at a minimum, take 40 minutes to load are now provided to customers within seconds. Furthermore, Windward is consolidating three databases into Rockset to simplify operations and make it easier to support new product requirements. This gives Windward’s engineering team time back to develop new AI insights.
Benny Keinan describes how product development shifted with Rockset: “We are able to offer new capabilities to our customers that were not possible before Rockset. As a result, maritime leaders leverage AI insights to navigate their supply chains through the Coronavirus pandemic, War in the Ukraine, decarbonization initiatives and more. Rockset has helped us address the changing needs of the maritime industry, all in real time.”
You can learn more about the foundational pieces and principles of Windward’s AI on their blog: A Look into the “Engine Room” of Windward’s AI.