Change Data Capture: The Critical Link for Airbnb, Netflix, and Uber


The modern data stack (MDS) is critical for digital disruptors. Consider Netflix. The company pioneered a new business model around video as a service, but much of its success is based on streaming data in real time.

They are using analytics to send highly relevant recommendations to viewers. They are monitoring data in real time to maintain constant visibility into network performance. They are syncing their database of movies and shows with Elasticsearch to allow users to quickly and easily find what they are looking for.

All of this has to happen in real time, and it has to be 100% accurate. Old-school extract, transform, and load (ETL) is simply too slow. To meet this need, Netflix created a change data capture (CDC) tool called DBLog that captures changes in MySQL, PostgreSQL, and other data sources, and then streams those changes to target data stores for lookup, retrieval, and analysis.

Netflix required high availability and real-time synchronization. They also needed to minimize the impact on operational databases. CDC extracts keys from database records and replicates changes to target databases in the order they occur, capturing changes as they happen without locking records or bogging down the source database.
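To make the idea of replicating changes "in the order they occur" concrete, here is a minimal sketch in Python. It is not how DBLog works internally; the `ChangeEvent` type, its `seq` field, and the `apply_changes` function are hypothetical names invented for illustration, and the target store is just an in-memory dictionary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One captured change (hypothetical shape, for illustration only)."""
    seq: int                    # position of the change in the source's change stream
    op: str                     # "insert", "update", or "delete"
    key: str                    # primary key of the affected row
    row: Optional[dict] = None  # new row contents; None for deletes


def apply_changes(target: dict, events: list) -> dict:
    """Replay change events against a target store in source order."""
    for event in sorted(events, key=lambda e: e.seq):
        if event.op == "delete":
            target.pop(event.key, None)
        elif event.op in ("insert", "update"):
            target[event.key] = dict(event.row or {})
        else:
            raise ValueError(f"unknown operation: {event.op}")
    return target


# Events may arrive out of order; the sequence number restores source order.
events = [
    ChangeEvent(2, "update", "title-1", {"name": "Example Show", "seasons": 4}),
    ChangeEvent(1, "insert", "title-1", {"name": "Example Show", "seasons": 3}),
    ChangeEvent(3, "insert", "title-2", {"name": "Another Show", "seasons": 1}),
    ChangeEvent(4, "delete", "title-2"),
]
replica = apply_changes({}, events)
print(replica)  # {'title-1': {'name': 'Example Show', 'seasons': 4}}
```

Because the replica only replays captured events, the source database is never queried or locked in this sketch. Applying the update before the insert would leave the replica with stale data, which is why ordering matters.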

Data is central to what Netflix does, but they’re not alone in that regard. Companies like Uber, Amazon, Airbnb, and Meta are thriving because they really understand how to make data work for them. Data management and data analysis are strategic pillars for these organizations, and CDC technology plays a central role in their ability to carry out their core missions.

The same can be said for just about any company operating at the top of its game in today’s business environment. If you want your business to operate like an A player, you need to modernize and master your data. Your competitors are definitely already doing it.

Integration in seconds is the new standard at Airbnb and Uber

In today’s world, a strong customer experience demands real-time data flows. Airbnb recognized the value of CDC technology in creating a great customer experience for its guests and hosts. Like Netflix, Airbnb built its own CDC platform, called SpinalTap. Airbnb’s dynamic pricing, listing availability, and booking status require impeccable accuracy and consistency across systems. When a guest books a stay, they expect workflows to be very fast and 100% accurate.

For Uber, immediacy is arguably even more important. Whether a customer is waiting for a ride to the airport or ordering a food delivery, time is of the essence. Like Netflix and Airbnb, they developed their own CDC platform to sync data across multiple data stores in real time. Once again, a common set of requirements emerged. Uber needed its solution to be extremely fast and fault tolerant, with no data loss. They also needed a solution that would not slow down the performance of their source databases.

Change data capture for the rest of us

Once again, CDC meets the requirements. In the old days, running ETL in batch mode overnight might have been adequate for providing a daily executive update or operational reports. Today, real time is increasingly the norm. If information is power, then immediate access to information is turbo power.

That’s why CDC is quickly becoming a critical requirement for the modern data stack. Big companies like Netflix, Airbnb, and Uber have the resources to build custom CDC platforms, but what about everyone else?

Out-of-the-box CDC solutions are filling that gap, providing the same high-quality, low-latency streaming pipelines without the need to build from scratch.

Unfortunately, they are not all the same. Most companies operate a collection of systems that handle enterprise resource planning (ERP), customer relationship management (CRM), or specialized operational functions such as purchasing or human resources. These run on different database platforms, with inconsistent data models. If a business operates mainframe systems, it’s likely dealing with arcane data structures that don’t fit easily with modern relational data.

This makes heterogeneous integration especially important. It requires connecting to multiple data sources and destinations, including transactional systems and databases such as SAP, Oracle, IBM Db2, and Salesforce. It means delivering real-time streaming data to platforms like Databricks, Kafka, Snowflake, Amazon DocumentDB, and Azure Synapse Analytics.

Real-time CDC automation

To drive artificial intelligence (AI) and advanced analytics, companies need to feed their data into a common MDS platform. That means ingesting information from a variety of sources, transforming it to fit a unified analytics model, and delivering it to a modern cloud-based data platform.

Change data capture technology serves as a critical link in the data-driven value chain, first by automating the ingestion of data from source systems, then transforming it on the fly and delivering it to a cloud data platform. Real-time CDC automation ensures the right information gets to the right place right away.
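The ingest, transform, and deliver stages described above can be sketched as a chain of Python generators. This is an illustrative toy, not any vendor's API: the function names, the `field_mapping` dictionary, the sample records, and the list standing in for a cloud data platform are all assumptions.

```python
from typing import Iterable, Iterator


def ingest(source_log: Iterable[dict]) -> Iterator[dict]:
    """Yield change records one at a time as they appear in the source log."""
    for change in source_log:
        yield change


def transform(changes: Iterable[dict], mapping: dict) -> Iterator[dict]:
    """Rename source fields on the fly to match a unified analytics model."""
    for change in changes:
        yield {mapping.get(field, field): value for field, value in change.items()}


def deliver(changes: Iterable[dict], sink: list) -> int:
    """Append each transformed change to the sink and return the count delivered."""
    count = 0
    for change in changes:
        sink.append(change)
        count += 1
    return count


# Hypothetical CRM changes with source-specific field names.
crm_changes = [{"cust_id": 7, "nm": "Ada"}, {"cust_id": 8, "nm": "Grace"}]
field_mapping = {"cust_id": "customer_id", "nm": "name"}

warehouse = []  # stands in for a cloud data platform
delivered = deliver(transform(ingest(crm_changes), field_mapping), warehouse)
print(delivered, warehouse)
```

Because each stage is a generator, a record flows through all three steps as soon as it is read, rather than waiting for a full batch to accumulate.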

Because they focus only on the data that has changed, streaming CDC pipelines offer huge efficiency advantages over the batch-mode operations of the past. The best CDC solutions can deliver more than 100 terabytes of data from source to destination in less than 30 minutes, with no data loss.
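For a sense of scale, the figure above can be turned into a sustained transfer rate. The short calculation below assumes decimal units (1 TB = 1,000 GB) and a constant rate over the full 30 minutes; the function name is made up for illustration.

```python
def implied_throughput_gb_per_s(terabytes: float, minutes: float) -> float:
    """Average rate in GB/s needed to move `terabytes` in `minutes`."""
    gigabytes = terabytes * 1000  # decimal units: 1 TB = 1,000 GB
    seconds = minutes * 60
    return gigabytes / seconds


rate = implied_throughput_gb_per_s(100, 30)
print(f"{rate:.1f} GB/s, or about {rate * 8:.0f} Gbit/s")
# prints: 55.6 GB/s, or about 444 Gbit/s
```

Since the claim is "more than" 100 terabytes in "less than" 30 minutes, this rate is a lower bound on what such a pipeline would need to sustain end to end.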

The shift to cloud computing is underway. Cloud analytics, in particular, offers clear advantages for companies that truly understand the transformative role of data. Leading companies across industries are aligning their strategic visions around data analytics. They are digitizing their interactions with customers and using algorithms to study data, extract insights and take action. AI and machine learning are incorporating vast amounts of information, discovering correlations and identifying anomalies.

Whether you’re leading the way in digital disruption or just trying to keep up with the rest, CDC technology will play a critical role in bringing the modern data stack to life and opening the door to digital transformation.

Gary Hagmueller is CEO of Arcion.
