Cloud Archives - Kai Waehner

Virta’s Electric Vehicle (EV) Charging Platform with Real-Time Data Streaming: Scalability for Large Charging Businesses

Kai Waehner — Tue, 22 Apr 2025 11:53:00 +0000

The Electric Vehicle (EV) revolution is here, but scaling charging infrastructure and integration with the energy system presents challenges— rapid power supply and demand fluctuations, billing complexity, and real-time availability updates. Virta, a global leader in smart EV charging, is leveraging real-time data streaming to optimize operations, improve user experience, and drive sustainability. By integrating Apache Kafka and Confluent Cloud, Virta ensures seamless energy distribution, predictive maintenance, and dynamic pricing for a smarter, greener future. Read how data streaming is transforming EV charging and enabling scalable, intelligent infrastructure.

I spoke with Jussi Ahtikari (Chief AI Officer at Virta) at a HotTopics C-Suite Exchange about Virta business model around EV charging networks and how they leverage data streaming. The following is a summary of this excellent success story about an innovative EV charging platform.

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter and follow me on LinkedIn or X (former Twitter) to stay in touch. And make sure to download my free book about data streaming use cases, including several success stories around Kafka and Flink to improve ESG.

The Evolution and Challenges of Electric Vehicle (EV) Charging

The global shift towards electric vehicles (EVs) is accelerating, driven by the surge in variable renewable energy (wind, solar) production, need for sustainable and more cost-efficient transportation solutions, government incentives, and rapid advancements in battery technology. EV charging infrastructure plays a critical role in making this transition successful. It ensures that drivers have access to reliable and efficient charging options while keeping the costs of energy and charging operations in check and energy system in balance.

The innovation in EV charging goes beyond simply providing power to vehicles. Intelligent charging networks, dynamic pricing models, and energy management solutions are transforming the industry. Sustainability is also a key factor, as efficient energy consumption and integration with renewable energy system contribute to environmental, social, and governance (ESG) goals.

While the user and charged energy volumes grow, the real time interplay with the energy system, demand fluctuations, complex billing systems, and real-time station availability updates require a scalable and resilient data infrastructure. Delays in processing real-time data can lead to inefficient energy distribution, poor user experience, and lost revenue.

Virta: Innovating the Future of EV Charging

Virta is a digital cloud platform for electric vehicle (EV) charging businesses and a global leader in connecting of smart charging infrastructure and EV battery capacity with the renewable energy system via bi-directional charging (V2G) and demand response (V1G).

The digital Virta EV Energy platform provides a comprehensive suite of solutions for charging businesses to launch and manage their own EV charging networks. Virta full-service charging platform enables Charging Network and Business Management, Transactions, Pricing, Payments and Invoicing, EV Driver and Fleet Services, Roaming, Energy Management, and Virtual Power Plant services.

Its Charge Point Management System (CPMS) supports over 450 charger models, allowing seamless integration with third-party infrastructure. Virta is the only provider combining CPMS with energy flexibility platform.

Source: Virta

Virta Platform Connecting 100,000+ Charging Stations Serving Millions of EV Drivers

The Virta platform is utilised by professional charge point operators (CPOs) and e-mobility service providers (EMPs) across energy, petrol, retail, automotive and real estate industries in 36 countries in Europe and South-East Asia. Virta is headquartered in Helsinki, Finland.

Virta manages real-time data from well over 100,000 EV charging stations, serving millions of EV drivers, and processes approximately 40 GB of real-time data every hour. Including roaming partnerships, the platform offers EV drivers access to in total over 620,000 public charging stations in over 60 countries.

With this scale, real-time responsiveness is critical. Each time a charging station sends a signal—for example, when a driver starts charging—the platform must immediately trigger a series of actions:

Start billing
Update real-time status in mobile apps
Notify roaming networks
Update metrics and statistics
Conduct fraud checks

At the early days of electric mobility all of these operations could be handled in a monolithic system using tightly coupled and synchronized code. According to Jussi Ahtikari, Chief AI Officer at Virta, this would have made the system “complex, difficult to maintain, and hard to scale” as data volumes grew. Therefore the team identified early a need for a more modular, scalable, and real-time architecture to support its rapid growth and evolving service portfolio.

Innovative Industry Partnerships: Virta and Valeo

Virta is also exploring new opportunities in the EV ecosystem through its partnership with Valeo, a leader in automotive and energy solutions. The companies are working on integrating Valeo’s Ineez charging technology with Virta’s CPMS platform to enhance fleet charging, leasing services, and vehicle-to-grid (V2G) capabilities.

Vehicle-to-grid technology enables EVs to act as distributed energy storage, feeding excess power back into the grid during peak demand. This innovation is expected to play a critical role in balancing electricity supply and demand, contributing to cheaper electricity and more stable renewables based energy system.

The Role of Data Streaming in ESG and EV Charging

Sustainability and environmental responsibility are key drivers of ESG initiatives in industries such as energy, transportation, and manufacturing. Data streaming plays a crucial role in achieving ESG goals by enabling real-time monitoring, predictive maintenance, and energy efficiency improvements.

In the EV charging industry, real-time data streaming supports:

Grid load balancing: Preventing energy spikes and ensuring optimal distribution of power. Example: Tesla’s Energy Platform.
Dynamic pricing: Adjusting charging costs based on demand and electricity availability. Example: Dynamic pricing for road tolling.
Fraud prevention: Detecting unauthorized access or energy theft. Example: Fraud prevention in under 60 seconds.
Predictive maintenance: Identifying potential failures before they occur, reducing downtime. Example: Condition monitoring in manufacturing.
User experience improvements: Providing real-time station availability updates and billing transparency. Example: Improved user experience in mobile apps.

Foreseeing the growing need for these real-time insights led Virta to adopt a data streaming approach with Confluent.

Virta’s Data Streaming Transformation

To maintain its rapid growth and provide an exceptional customer experience, Virta needed a scalable, real-time data streaming solution. The company turned to Confluent’s data streaming platform (DSP), powered by Apache Kafka, to process millions of messages per hour and ensure seamless operations.

Scaling Challenges and the Need for Real-Time Processing

Virta’s rapid growth to scale of millions of charging events and tens of gigawatt hours of charged energy on a monthly basis in Europe and South-East Asia resulted in massive volumes of data that needed to be processed instantly. Something legacy systems, based on sequential authorization, would have struggled with.

Without real-time updates, large scale charging operations would face issues such as:

Unclear station availability
Slow transaction processing
Inaccurate billing information

Initially, Virta worked with open-source Apache Kafka but found managing high-volume data streams at scale to be increasingly resource-intensive. Therefore the team sought an enterprise-grade solution that would remove operational complexities while providing robust real-time capabilities.

Deploying A Data Streaming Platform for Scalable EV Charging

Confluent has become the backbone of Virta’s real-time data architecture. With Confluent’s event streaming platform, Virta is able to maintain a modern event-driven microservices architecture. Instead of tightly coupling all business logic into one system, each charging event—such as a driver starting a session—is published as a single, centralized event. Independent microservices subscribe to that event to trigger specific actions like billing, mobile app updates, roaming notifications, fraud detection, and more.

Here is a diagram of Virta’s cloud-Native microservices architecture powered by AWS, Confluent Cloud, Snowflake, Redis, OpenSearch, and other technologies:

Source: Virta

This architectural shift with an event-driven architecture and the data streaming platform as central nervous system has significantly improved scalability, maintainability, and fault isolation. It has also accelerated innovation with fast roll-out times of new services, including audit trails, improved data governance through schemas, and the foundation for AI-powered capabilities—all built on clean, real-time data streams.

Key Benefits of a SaaS Data Streaming Platform for Virta

As a fully managed data streaming platform, Confluent Cloud has eliminated the need for Virta to maintain Kafka clusters manually, allowing its engineering teams to focus on innovation rather than infrastructure management:

Elastic scalability: Automatically scales up to handle peak loads, ensuring uninterrupted service.
Real-time processing: Supports 45 million messages per hour, enabling immediate updates on charging status and availability.
Simplified development: Tools such as Schema Registry and pre-built APIs provide a standardized approach for developers, speeding up feature deployment.

Data Streaming Landscape: Spoilt for Choice – Open Source Kafka, Confluent, and many other Vendors

To navigate the evolving data streaming landscape, Virta chose a cloud-native, enterprise-grade platform that balances reliability, scalability, cost-efficiency, and ease of use. While many streaming technologies exist, Confluent offered the right trade-offs between operational simplicity and real-time performance at scale.

Read more about the different data streaming frameworks, platforms and cloud services in the data streaming landscape overview:

Business Impact of a Data Streaming Platform

By leveraging Confluent Cloud as its cloud-native and serverless data streaming platform, Virta has realized significant business benefits:

1. Faster Time to Market

Virta’s teams can now deploy new app features, charge points, and business services more quickly. The company has regained the agility of a startup, rolling out improvements without infrastructure bottlenecks.

2. Instant Updates for Customers and Operators

With real-time data streaming, Virta can update station availability and configuration changes in less than a second. This ensures that customers always have the latest information at their fingertips.

3. Cost Savings through Usage-Based Pricing

Virta’s shift to a usage-based pricing model has optimized its operational expenses. Instead of maintaining excess capacity, the company only pays for the resources it consumes.

4. Future-Ready Infrastructure for Advanced Analytics

Virta is building the future of real-time analytics, predictive maintenance, and smart billing by integrating Confluent with Snowflake’s AI-powered data cloud.

By decoupling data streams with Kafka, Virta ensures data consistency, scalability, and agility—enabling advanced analytics without operational bottlenecks.

Beyond EV Charging: Broader Energy and ESG Use Cases

Virta’s success with real-time data streaming highlights broader applications across the energy and ESG sectors. Similar data-driven solutions are being deployed for:

Smart grids: Real-time monitoring of electricity distribution to optimize supply and demand.
Renewable energy integration: Managing wind and solar power fluctuations with predictive analytics.
Industrial sustainability: Tracking carbon emissions and optimizing resource utilization.

The Future of EV Charging with Real-Time Data Streaming using Kafka and Flink

The transition to electric mobility requires more than just an increase in charging stations. The ability to process and act on data in real time is critical to optimizing the use and costs of energy and infrastructure, enhancing user experience, and driving sustainability.

Virta’s usage of a serverless data streaming platform demonstrates the power of real-time data streaming in enabling scalable, efficient, and future-ready EV charging solutions. By eliminating infrastructure constraints, improving responsiveness, and reducing operational costs, Virta is setting new industry standards for innovation in mobility and energy management.

The EV charging landscape will tenfold within the next ten years, and especially with the mass adoption of bi-directional charging (V2G), integrate seamlessly with the energy system. Real-time data streaming will serve as the cornerstone for this evolution, helping businesses navigate challenges while unlocking new opportunities for sustainability and profitability.

For more data streaming success stories and use cases, make sure to download my free ebook. Please let me know your thoughts, feedback and use cases on LinkedIn and stay in touch via my newsletter.

The post Virta’s Electric Vehicle (EV) Charging Platform with Real-Time Data Streaming: Scalability for Large Charging Businesses appeared first on Kai Waehner.

The Importance of Focus: Why Software Vendors Should Specialize Instead of Doing Everything (Example: Data Streaming)

Kai Waehner — Mon, 07 Apr 2025 03:31:55 +0000

As technology landscapes evolve, software vendors must decide whether to specialize in a core area or offer a broad suite of services. Some companies take a highly focused approach, investing deeply in a specific technology, while others attempt to cover multiple use cases by integrating various tools and frameworks. Both strategies have trade-offs, but history has shown that specialization leads to deeper innovation, better performance, and stronger customer trust. This blog explores why focus matters in the context of data streaming software, the challenges of trying to do everything, and how companies that prioritize one thing—data streaming—can build best-in-class solutions that work everywhere.

Specialization vs. Generalization: Why Data Streaming Requires a Focused Approach

Data streaming enables real-time processing of continuous data flows, allowing businesses to act instantly rather than relying on batch updates. This shift from traditional databases and APIs to event-driven architectures has become essential for modern IT landscapes.

Data streaming is no longer just a technique—it is a new software category. The 2023 Forrester Wave for Streaming Data Platforms confirms its role as a core component of scalable, real-time architectures. Technologies like Apache Kafka and Apache Flink have become industry standards. They power cloud, hybrid, and on-premise environments for real-time data movement and analytics.

Businesses increasingly adopt streaming-first architectures, focusing on:

Hybrid and multi-cloud streaming for real-time edge-to-cloud integration
AI-driven analytics powered by continuous optimization and inference using machine learning models
Streaming data contracts to ensure governance and reliability across the entire data pipeline
Converging operational and analytical workloads to replace inefficient batch processing and Lambda architecture with multiple data pipelines

The Data Streaming Landscape

As data streaming becomes a core part of modern IT, businesses must choose the right approach: adopt a purpose-built data streaming platform or piece together multiple tools with limitations. Event-driven architectures demand scalability, low latency, cost efficiency, and strict SLAs to ensure real-time data processing meets business needs.

Some solutions may be “good enough” for specific use cases, but they often lack the performance, reliability, and flexibility required for large-scale, mission-critical applications.

The Data Streaming Landscape highlights the differences—while some vendors provide basic capabilities, others offer a complete Data Streaming Platform (DSP)designed to handle complex, high-throughput workloads with enterprise-grade security, governance, and real-time analytics. Choosing the right platform is essential for staying competitive in an increasingly data-driven world.

The Challenge of Doing Everything

Many software vendors and cloud providers attempt to build a comprehensive technology stack, covering everything from data lakes and AI to real-time data streaming. While this offers customers flexibility, it often leads to overlapping services, inconsistent long-term investment, and complexity in adoption.

A few examples (from the perspective of data streaming solutions).

Amazon AWS: Multiple Data Streaming Services, Multiple Choices

AWS has built the most extensive cloud ecosystem, offering services for nearly every aspect of modern IT, including data lakes, AI, analytics, and real-time data streaming. While this breadth provides flexibility, it also leads to overlapping services, evolving strategies, and complexity in decision-making for customers, resulting in frequent solution ambiguity.

Amazon provides several options for real-time data streaming and event processing, each with different capabilities:

Amazon SQS (Simple Queue Service): One of AWS’s oldest and most widely adopted messaging services. It’s reliable for basic decoupling and asynchronous workloads, but it lacks native support for real-time stream processing, ordering, replayability, and event-time semantics.
Amazon Kinesis Data Streams: A managed service for real-time data ingestion and simple event processing, but lacks the full event streaming capabilities of a complete data streaming platform.
Amazon MSK (Managed Streaming for Apache Kafka): A partially managed Kafka service that mainly focuses on Kafka infrastructure management. It leaves customers to handle critical operational support (MSK does NOT provide SLAs or support for Kafka itself) and misses capabilities such as stream processing, schema management, and governance.
AWS Glue Streaming ETL: A stream processing service built for data transformations but not designed for high-throughput, real-time event streaming.
Amazon Flink (formerly Kinesis Data Analytics): AWS’s attempt to offer a fully managed Apache Flink service for real-time event processing, competing directly with open-source Flink offerings.

Each of these services targets different real-time use cases, but they lack a unified, end-to-end data streaming platform. Customers must decide which combination of AWS services to use, increasing integration complexity, operational overhead, and costs.

Strategy Shift and Rebranding with Multiple Product Portfolios

AWS has introduced, rebranded, and developed its real-time streaming services over time:

Kinesis Data Analytics was originally AWS’s solution for stream processing but was later rebranded as Amazon Flink, acknowledging Flink’s dominance in modern stream processing.
MSK Serverless was introduced to simplify Kafka adoption but also introduces various additional product limitations and cost challenges.
AWS Glue Streaming ETL overlaps with Flink’s capabilities, adding confusion about the best choice for real-time data transformations.

As AWS expands its cloud-native services, customers must navigate a complex mix of technologies—often requiring third-party solutions to fill gaps—while assessing whether AWS’s flexible but fragmented approach meets their real-time data streaming needs or if a specialized, fully integrated platform is a better fit.

Google Cloud: Multiple Approaches to Streaming Analytics

Google Cloud is known for its powerful analytics and AI/ML tools, but its strategy in real-time stream processing has been inconsistent:

Customers looking for stream processing in Google Cloud now have three competing services:

Google Managed Service for Apache Kafka (Google MSK) (a managed Kafka offering). Google MSK is very early stage in the maturity curve and has many limitations.
Google Dataflow (built on Apache Beam)
Google Pub/Sub (event messaging)
Apache Flink on Dataproc (a managed service)

While each of these services has its use cases, they introduce complexity for customers who must decide which option is best for their workloads.

BigQuery Flink was introduced to extend Google’s analytics capabilities into real-time processing but was later discontinued before exiting the preview.

Microsoft Azure: Shifting Strategies in Data Streaming

Microsoft Azure has taken multiple approaches to real-time data streaming and analytics, with an evolving strategy that integrates various tools and services.

Azure Event Hubs has been a core event streaming service within Azure, designed for high-throughput data ingestion. It supports the Apache Kafka protocol (through Kafka version 3.0, so its feature set lags considerably), making it a flexible choice for (some) real-time workloads. However, it primarily focuses on event ingestion rather than event storage, data processing and integration–additional capabilities of a complete data streaming platform.
Azure Stream Analytics was introduced as a serverless stream processing solution, allowing customers to analyze data in motion. Despite its capabilities, its adoption has remained limited, particularly as enterprises seek more scalable, open-source alternatives like Apache Flink.
Microsoft Fabric is now positioned as an all-in-one data platform, integrating business intelligence, data engineering, real-time streaming, and AI. While this brings together multiple analytics tools, it also shifts the focus away from dedicated, specialized solutions like Stream Analytics.

While Microsoft Fabric aims to simplify enterprise data infrastructure, its broad scope means that customers must adapt to yet another new platform rather than continuing to rely on long-standing, specialized services. The combination of Azure Event Hubs, Stream Analytics, and Fabric presents multiple options for stream processing, but also introduces complexity, limitations and increased cost for a combined solution.

Microsoft’s approach highlights the challenge of balancing broad platform integration with long-term stability in real-time streaming technologies. Organizations using Azure must evaluate whether their streaming workloads require deep, specialized solutions or can fit within a broader, integrated analytics ecosystem.

I wrote an entire blog series to demystify what Microsoft Fabric really is.

Instaclustr: Too Many Technologies, Not Enough Depth

Instaclustr has positioned itself as a managed platform provider for a wide array of open-source technologies, including Apache Cassandra, Apache Kafka, Apache Spark, Apache ZooKeeper, OpenSearch, PostgreSQL, Redis, and more. While this broad portfolio offers customers choices, it reflects a horizontal expansion strategy that lacks deep specialization in any one domain.

For organizations seeking help with real-time data streaming, Instaclustr’s Kafka offering may appear to be a viable managed service. However, unlike purpose-built data streaming platforms, Instaclustr’s Kafka solution is just one of many services, with limited investment in stream processing, schema governance, or advanced event-driven architectures.

Because Instaclustr splits its engineering and support resources across so many technologies, customers often face challenges in:

Getting deep technical expertise for Kafka-specific issues
Relying on long-term roadmaps and support for evolving Kafka features
Building integrated event streaming pipelines that require more than basic Kafka infrastructure

This generalist model may be appealing for companies looking for low-cost, basic managed services—but it falls short when mission-critical workloads demand real-time reliability, zero data loss, SLAs, and advanced stream processing capabilities. Without a singular focus, platforms like Instaclustr risk becoming jacks-of-all-trades but masters of none—especially in the demanding world of real-time data streaming.

Cloudera: A Broad Portfolio Without a Clear Focus

Cloudera has adopted a distinct strategy by incorporating various open-source frameworks into its platform, including:

Apache Kafka (event streaming)
Apache Flink (stream processing)
Apache Iceberg (data lake table format)
Apache Hadoop (big data storage and batch processing)
Apache Hive (SQL querying)
Apache Spark (batch and near real-time processing and analytics)
Apache NiFi (data flow management)
Apache HBase (NoSQL database)
Apache Impala (real-time SQL engine)
Apache Pulsar (event streaming, via a partnership with StreamNative)

While this provides flexibility, it also introduces significant complexity:

Customers must determine which tools to use for specific workloads.
Integration between different components is not always seamless.
The broad scope makes it difficult to maintain deep expertise in each area.

Rather than focusing on one core area, Cloudera’s strategy appears to be adding whatever is trending in open source, which can create challenges in long-term support and roadmap clarity.

Splunk: Repeated Attempts at Data Streaming

Splunk, known for log analytics, has tried multiple times to enter the data streaming market:

Initially, Splunk built a proprietary streaming solution that never gained widespread adoption.

Later, Splunk acquired Streamlio to leverage Apache Pulsar as its streaming backbone.This Pulsar-based strategy ultimately failed, leading to a lack of a clear real-time streaming offering.

Splunk’s challenges highlight a key lesson: successful data streaming requires long-term investment and specialization, not just acquisitions or technology integrations.

Why a Focused Approach Works Better for Data Streaming

Some vendors take a more specialized approach, focusing on one core capability and doing it better than anyone else. For data streaming, Confluent became the leader in this space by focusing on improving the vision of a complete data streaming platform.

Confluent: Focused on Data Streaming, Built for Everywhere

At Confluent, the focus is clear: real-time data streaming. Unlike many other vendors and the cloud providers that offer fragmented or overlapping services, Confluent specializes in one thing and ensures it works everywhere:

Cloud: Deploy across AWS, Azure, and Google Cloud with deep native integrations.
On-Premise: Enterprise-grade deployments with full control over infrastructure.
Edge Computing: Real-time streaming at the edge for IoT, manufacturing, and remote environments.
Hybrid Cloud: Seamless data streaming across edge, on-prem, and cloud environments.
Multi-Region: Built-in disaster recovery and globally distributed architectures.

More Than Just “The Kafka Company”

While Confluent is often recognized as “the Kafka company,” it has grown far beyond that. Today, Confluent is a complete data streaming platform, combining Apache Kafka for event streaming, Apache Flink for stream processing, and many additional components for data integration, governance and security to power critical workloads.

However, Confluent remains laser-focused on data streaming—it does NOT compete with BI, AI model training, search platforms, or databases. Instead, it integrates and partners with best-in-class solutions in these domains to ensure businesses can seamlessly connect, process, and analyze real-time data within their broader IT ecosystem.

The Right Data Streaming Platform for Every Use Case

Confluent is not just one product—it matches the specific needs, SLAs, and cost considerations of different streaming workloads:

Fully Managed Cloud (SaaS)
- Dedicated and multi-tenant Enterprise Clusters: Low latency, strict SLAs for mission-critical workloads.
- Freight Clusters: Optimized for high-volume, relaxed latency requirements.
Bring Your Own Cloud (BYOC)
- WarpStream: Bring Your Own Cloud for flexibility and cost efficiency.
Self-Managed
- Confluent Platform: Deploy anywhere—customer cloud VPC, on-premise, at the edge, or across multi-region environments.

Confluent is built for organizations that require more than just “some” data streaming—it is for businesses that need a scalable, reliable, and deeply integrated event-driven architecture. Whether operating in a cloud, hybrid, or on-premise environment, Confluent ensures real-time data can be moved, processed, and analyzed seamlessly across the enterprise.

By focusing only on data streaming, Confluent ensures seamless integration with best-in-class solutions across both operational and analytical workloads. Instead of competing across multiple domains, Confluent partners with industry leaders to provide a best-of-breed architecture that avoids the trade-offs of an all-in-one compromise.

Deep Integrations Across Key Ecosystems

A purpose-built data streaming platform plays well with cloud providers and other data platforms. A few examples:

Cloud Providers (AWS, Azure, Google Cloud): While all major cloud providers offer some data streaming capabilities, Confluent takes a different approach by deeply integrating into their ecosystems. Confluent’s managed services can be:
- Consumed via cloud credits through the cloud provider marketplace
- Integrated natively into cloud provider’s security and networking services
- Fully-managed out-of-the-box connectivity to cloud provider services like object storage, lakehouses, and databases
MongoDB: A leader in NoSQL and operational workloads, MongoDB integrates with Confluent via Kafka-based change data capture (CDC), enabling real-time event streaming between transactional databases and event-driven applications.
Databricks: A powerhouse in AI and analytics, Databricks integrates bi-directionally with Confluent via Kafka and Apache Spark, or object storage and the open table format from Iceberg / Delta Lake via Tableflow. This enables businesses to stream data for AI model training in Databricks and perform real-time model inference directly within the streaming platform.

Rather than attempting to own the entire data stack, Confluent specializes in data streaming and integrates seamlessly with the best cloud, AI, and database solutions.

Beyond the Leader: Specialized Vendors Shaping Data Streaming

Confluent is not alone in recognizing the power of focus. A handful of other vendors have also chosen to specialize in data streaming—each with their own vision, strengths, and approaches.

WarpStream, recently acquired by Confluent, is a Kafka-compatible infrastructure solution designed for Bring Your Own Cloud (BYOC) environments. It re-architects Kafka by running the protocol directly on cloud object storage like Amazon S3, removing the need for traditional brokers or persistent compute. This model dramatically reduces operational complexity and cost—especially for high-ingest, elastic workloads. While WarpStream is now part of the Confluent portfolio, it remains a distinct offering focused on lightweight, cost-efficient Kafka infrastructure.

StreamNative is the commercial steward of Apache Pulsar, aiming to provide a unified messaging and streaming platform. Built for multi-tenancy and geo-replication, it offers some architectural differentiators, particularly in use cases where separation of compute and storage is a must. However, adoption remains niche, and the surrounding ecosystem still lacks maturity and standardization.

Redpanda positions itself as a Kafka-compatible alternative with a focus on performance, especially in low-latency and resource-constrained environments. Its C++ foundation and single-binary architecture make it appealing for edge and latency-sensitive workloads. Yet, Redpanda still needs to mature in areas like stream processing, integrations, and ecosystem support to serve as a true platform.

AutoMQ re-architects Apache Kafka for the cloud by separating compute and storage using object storage like S3. It aims to simplify operations and reduce costs for high-throughput workloads. Though fully Kafka-compatible, AutoMQ concentrates on infrastructure optimization and currently lacks broader platform capabilities like governance, processing, or hybrid deployment support.

Bufstream is experimenting with lightweight approaches to real-time data movement using modern developer tooling and APIs. While promising in niche developer-first scenarios, it has yet to demonstrate scalability, production maturity, or a robust ecosystem around complex stream processing and governance.

Ververica focuses on stream processing with Apache Flink. It offers Ververica Platform to manage Flink deployments at scale, especially on Kubernetes. While it brings deep expertise in Flink operations, it does not provide a full data streaming platform and must be paired with other components, like Kafka for ingestion and delivery.

Great Ideas Are Born From Market Pressure

Each of these companies brings interesting ideas to the space. But building and scaling a complete, enterprise-grade data streaming platform is no small feat. It requires not just infrastructure, but capabilities for processing, governance, security, global scale, and integrations across complex environments.

That’s where Confluent continues to lead—by combining deep technical expertise, a relentless focus on one problem space, and the ability to deliver a full platform experience across cloud, on-prem, and hybrid deployments.

In the long run, the data streaming market will reward not just technical innovation, but consistency, trust, and end-to-end excellence. For now, the message is clear: specialization matters—but execution matters even more. Let’s see where the others go.

How Customers Benefit from Specialization

A well-defined focus provides several advantages for customers, ensuring they get the right tool for each job without the complexity of navigating overlapping services.

Clarity in technology selection: No need to evaluate multiple competing services; purpose-built solutions ensure the right tool for each use case.
Deep technical investment: Continuous innovation focused on solving specific challenges rather than spreading resources thin.
Predictable long-term roadmap: Stability and reliability with no sudden service retirements or shifting priorities.
Better performance and reliability: Architectures optimized for the right workloads through the deep experience in the software category.
Seamless ecosystem integration: Works natively with leading cloud providers and other data platforms for a best-of-breed approach.
Deployment flexibility: Not bound to a single environment like one cloud provider; businesses can run workloads on-premise, in any cloud, at the edge, or across hybrid environments.

Rather than adopting a broad but shallow set of solutions, businesses can achieve stronger outcomes by choosing vendors that specialize in one core competency and deliver it everywhere.

Why Deep Expertise Matters: Supporting 24/7, Mission-Critical Data Streaming

For mission-critical workloads—where downtime, data loss, and compliance failures are not an option—deep expertise is not just an advantage, it is a necessity.

Data streaming is a high-performance, real-time infrastructure that requires continuous reliability, strict SLAs, and rapid response to critical issues. When something goes wrong at the core of an event-driven architecture—whether in Apache Kafka, Apache Flink, or the surrounding ecosystem—only specialized vendors with proven expertise can ensure immediate, effective solutions.

The Challenge with Generalist Cloud Services

Many cloud providers offer some level of data streaming, but their approach is different from a dedicated data streaming platform. Take Amazon MSK as an example:

Amazon MSK provides managed Kafka clusters, but does NOT offer Kafka support itself. If an issue arises deep within Kafka, customers are responsible for troubleshooting it—or must find external experts to resolve the problem.
The terms and conditions of Amazon MSK explicitly exclude Kafka support, meaning that, for mission-critical applications requiring uptime guarantees, compliance, and regulatory alignment, MSK is not a viable choice.
This lack of core Kafka support poses a serious risk for enterprises relying on event streaming for financial transactions, real-time analytics, AI inference, fraud detection, and other high-stakes applications.

For companies that cannot afford failure, a data streaming vendor with direct expertise in the underlying technology is essential.

Why Specialized Vendors Are Essential for Mission-Critical Workloads

A complete data streaming platform is much more than a hosted Kafka cluster or a managed Flink service. Specialized vendors like Confluent offer end-to-end operational expertise, covering:

24/7 Critical Support: Direct access to Kafka and Flink experts, ensuring immediate troubleshooting for core-level issues.
Guaranteed SLAs: Strict uptime commitments, ensuring that mission-critical applications are always running.
No Data Loss Architecture: Built-in replication, failover, and durability to prevent business-critical data loss.
Security & Compliance: Encryption, access control, and governance features designed for regulated industries.
Professional Services & Advisory: Best practices, architecture reviews, and operational guidance tailored for real-time streaming at scale.

This level of deep, continuous investment in operational excellence separates a general-purpose cloud service from a true data streaming platform.

The Power of Specialization: Deep Expertise Beats Broad Offerings

Software vendors will continue expanding their offerings, integrating new technologies, and launching new services. However, focus remains a key differentiator in delivering best-in-class solutions, especially for operational systems with critical SLAs—where low latency, 24/7 uptime, no data loss, and real-time reliability are non-negotiable.

For companies investing in strategic data architectures, choosing a vendor with deep expertise in one core technology—rather than one that spreads across multiple domains—ensures stability, predictable performance, and long-term success.

In a rapidly evolving technology landscape, clarity, specialization, and seamless integration are the foundations of lasting innovation. Businesses that prioritize proven, mission-critical solutions will be better equipped to handle the demands of real-time, event-driven architectures at scale.

How do you see the world of software? Better to specialize or become an allrounder? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter. And download my free book about data streaming use cases.

The post The Importance of Focus: Why Software Vendors Should Specialize Instead of Doing Everything (Example: Data Streaming) appeared first on Kai Waehner.

Data Streaming as the Technical Foundation for a B2B Marketplace

Kai Waehner — Wed, 05 Mar 2025 06:26:59 +0000

A B2B data marketplace is a groundbreaking platform enabling businesses to exchange, monetize, and use data in real time. Beyond the basic promise of data sharing, these marketplaces are evolving into self-service platforms with features such as subscription management, usage-based billing, and secure data monetization. This post explores the core technical and functional aspects of building a data marketplace for subscription commerce using data streaming technologies like Apache Kafka. Drawing inspiration from real-world implementations like AppDirect, the post examines how these capabilities translate into a robust and scalable architecture.

Subscription Commerce with a Digital Marketplace

Subscription commerce refers to business models where customers pay recurring fees—monthly, annually, or usage-based—for access to products or services, such as SaaS, streaming platforms, or subscription boxes.

Digital marketplaces are online platforms where multiple vendors can sell their products or services to customers, often incorporating features like catalog management, payment processing, and partner integrations.

Together, subscription commerce and digital marketplaces enable businesses to monetize recurring offerings efficiently, manage customer relationships, and scale through multi-vendor ecosystems. These solutions enables organizations to sell own or third-party recurring technology services through a white-labeled marketplace, or streamline procurement with an internal IT marketplace to manage and acquire services. The platform empowers digital growth for businesses of all sizes across direct and indirect go-to-market channels.

The Competitive Landscape for Subscription Commerce

The subscription commerce and digital marketplace space includes several prominent players offering specialized solutions.

Zuora leads in enterprise-grade subscription billing and revenue management, while Chargebee and Recurly focus on flexible billing and automation for SaaS and SMBs. Paddle provides global payment and subscription management tailored to SaaS businesses. AppDirect stands out for enabling SaaS providers and enterprises to manage subscriptions, monetize offerings, and build partner ecosystems through a unified platform.

For marketplace platforms, CloudBlue (from Ingram Micro) enables as-a-service ecosystems for telcos and cloud providers, and Mirakl excels at building enterprise-level B2B and B2C marketplaces.

Solutions like ChannelAdvisor and Vendasta cater to resellers and localized businesses with marketplace and subscription tools. Each platform offers unique capabilities, making the choice dependent on specific needs like scalability, industry focus, and integration requirements.

What Makes a B2B Data Marketplace Technically Unique?

A data marketplace is more than a repository; it is a dynamic, decentralized platform that enables the continuous exchange of data streams across organizations. Its key distinguishing features include:

Real-Time Data Sharing: Enables instantaneous exchange and consumption of data streams.
Decentralized Design: Avoids reliance on centralized data hubs, reducing latency and risk of single points of failure.
Fine-Grained Access Control: Ensures secure and compliant data sharing.
Self-Service Capabilities: Simplifies the discovery and consumption of data through APIs and portals.
Usage-Based Billing and Monetization: Tracks data usage in real time to enable flexible pricing models.

These characteristics require a scalable, fault-tolerant, and real-time data processing backbone. Enter data streaming with the de facto standard Apache Kafka.

Data Streaming as the Backbone of a B2B Data Marketplace

At the heart of a B2B data marketplace lies data streaming, a technology paradigm enabling continuous data flow and processing. Kafka’s publish-subscribe architecture aligns seamlessly with the marketplace model, where data producers publish streams that consumers can subscribe to in real time.

Why Apache Kafka for a Data Marketplace?

A data streaming platform uniquely combines different characteristics and capabilities:

Scalability and Fault Tolerance: Kafka’s distributed architecture allows for handling large volumes of data streams, ensuring high availability even during failures.
Event-Driven Design: Kafka provides a natural fit for event-driven architectures, where data exchanges trigger workflows, such as subscription activation or billing.
Stream Processing with Kafka Streams or ksqlDB: Real-time transformation, filtering, and enrichment of data streams can be performed natively, ensuring the data is actionable as it flows.
Integration with Ecosystem: Kafka’s connectors enable seamless integration with external systems such as billing platforms, monitoring tools, and data lakes.
Security and Compliance: Built-in features like TLS encryption, SASL authentication, and fine-grained ACLs ensure the marketplace adheres to strict security standards.

I wrote a separate article that explores how an Event-driven Architecture (EDA) and Apache Kafka build the foundation of a streaming data exchange.

Architecture Overview

Modern architectures for data marketplaces are often inspired by Domain-Driven Design (DDD), microservices, and the principles of a data mesh.

Domain-Driven Design helps structure the platform around distinct business domains, ensuring each part of the marketplace aligns with its core functionality, such as subscription management or billing.
Microservices decompose the marketplace into independently deployable services, promoting scalability and modularity.
A Data mesh decentralizes data ownership, empowering individual teams or providers to manage and share their datasets while adhering to shared governance policies.

Together, these principles create a flexible, scalable, and business-aligned architecture. A high-level architecture for such a marketplace involves:

Data Providers: Publish real-time data streams to Kafka Topics. Use Kafka Connect to ingest data from external sources.
Data Marketplace Platform: A front-end portal backed by Kafka for subscription management, search, and discovery. Kafka Streams or Apache Flink for real-time processing (e.g., billing, transformation). Integration with billing systems, identity management, and analytics platforms.
Data Consumers: Subscribe to Kafka Topics, consuming data tailored to their needs. Integrate the marketplace streams into their own analytics or operational workflows.

A data streaming platoform enable simple and secure data sharing within or across organizations with chargeback capabilities built-in to build cost APIs and new business models. The following is an implementation leveraging Confluent’s Stream Sharing functionality in Confluent Cloud:

Source: Confluent

Data Marketplace Features and Their Technical Implementation

A robust B2B data marketplace should offer the following vendor-agnostic features:

Self-Service Data Discovery

Functionality: Allows users to browse available datasets or streams, explore metadata, and request subscriptions.
Technical Implementation: Build a front-end portal or API leveraging Kafka’s schema registry for metadata management. Use Kafka Topics to represent each data stream or dataset in the data mesh.

Real-Time Subscription Management

Functionality: Enables users to subscribe to data streams with customizable preferences, such as data filters or frequency of updates.
Technical Implementation: Use Kafka’s consumer groups to manage subscriptions. Implement filtering logic with Kafka Streams or ksqlDB to tailor streams to user preferences.

Usage-Based Billing

Functionality: Tracks the volume or type of data consumed by each user and generates invoices dynamically.
Technical Implementation: Use Kafka’s log retention and monitoring tools to track data consumption. Integrate with a billing engine via Kafka Connect or RESTful APIs for real-time invoice generation.

Functionality: Facilitates revenue sharing between data providers and marketplace operators.
Technical Implementation: Build a revenue-sharing logic layer using Kafka Streams or Apache Flink, processing data usage metrics. Store provider-specific pricing models in a database connected via Kafka Connect.

Compliance and Data Governance

Functionality: Ensures data sharing complies with regulations (e.g., GDPR, HIPAA) and provides an audit trail.
Technical Implementation: Leverage Kafka’s immutable event log as an auditable record of all data exchanges. Implement data contracts for Kafka Topics with policies, role-based access control (RBAC), and encryption for secure sharing.

Dynamic Pricing Models

Functionality: Enables data providers to offer pricing models based on volume, frequency, or quality of data.
Technical Implementation: Deploy stream processing applications to calculate pricing dynamically. Use APIs to display pricing updates in real time on the marketplace portal.

Marketplace Analytics

Functionality: Offers insights into usage patterns, revenue streams, and operational metrics.
Technical Implementation: Aggregate Kafka stream data into analytics platforms such as Snowflake, Databricks, Elasticsearch or Microsoft Fabri.

Real-World Success Story: AppDirect’s Subscription Commerce Platform Powered by a Data Streaming Platform

AppDirect is a leading subscription commerce platform that helps businesses monetize and manage software, services, and data through a unified digital marketplace. It provides tools for subscription billing, usage tracking, partner management, and revenue sharing, enabling seamless B2B transactions.

Source: AppDirect

AppDirect serves customers across industries such as telecommunications (e.g., Telstra, Deutsche Telekom), technology (e.g., Google, Microsoft), and cloud services, powering ecosystems for software distribution and partner-driven monetization.

The Challenge

AppDirect enables SaaS providers to monetize their offerings, but faced significant challenges in scaling its platform to handle the growing complexity of real-time subscription billing and data flow management.

As the number of vendors and consumers on the platform increased, ensuring accurate, real-time tracking of usage and billing became increasingly difficult. Additionally, the legacy systems struggled to support seamless integration, dynamic pricing models, and real-time updates required for a competitive marketplace experience.

The Solution

AppDirect implemented a data streaming backbone with Apache Kafka leveraging Confluent’s data streaming platform. This enabled:

Real-time billing for subscription services.
Accurate usage tracking and monetization.
Improved scalability with a distributed, event-driven architecture.

The Outcome

90% reduction in time-to-market for new features.
Enhanced customer experience with real-time updates.
Seamless scaling to handle increasing vendor participation and data loads.

Advantages Over Competitors in the Subscription Commerce and Data Marketplace Business

Powered by the event-driven architecture and a data streaming platform, AppDirect distinguishes itself with from competitors in the subscription commerce and data marketplace business:

A unified approach to subscription management, billing, and partner ecosystem enablement.
Strong focus on the telecommunications and technology sectors.
Deep integrations for vendor and reseller ecosystems.

The technical backbone of a B2B data marketplace relies on data streaming to deliver real-time data sharing, scalable subscription management, and secure monetization. Platforms like Apache Kafka and Confluent enable these features through their distributed, event-driven architecture, ensuring resilience, compliance, and operational efficiency.

By implementing these principles, organizations can build a modern, self-service data marketplace that fosters innovation and collaboration. The success of AppDirect highlights the potential of this approach, offering a blueprint for businesses looking to capitalize on the power of data streaming.

Whether you’re a data provider seeking additional revenue streams or a business aiming to harness external insights, a well-designed data marketplace is your gateway to unlocking value in the digital economy.

Stay ahead of the curve! Subscribe to my newsletter for insights into data streaming and connect with me on LinkedIn to continue the conversation. And make sure to download my free book about data streaming use cases.

The post Data Streaming as the Technical Foundation for a B2B Marketplace appeared first on Kai Waehner.

Fully Managed (SaaS) vs. Partially Managed (PaaS) Cloud Services for Data Streaming with Kafka and Flink

Kai Waehner — Sat, 18 Jan 2025 11:33:44 +0000

The cloud revolution has transformed how businesses deploy, scale, and manage data streaming solutions. While Software-as-a-Service (SaaS) and Platform-as-a-Service (PaaS) cloud models are often used interchangeably in marketing, their distinctions have significant implications for operational efficiency, cost, and scalability. In the context of data streaming around Apache Kafka and Flink, understanding these differences and recognizing common misconceptions—such as the overuse of the term “serverless”—can help you make an informed decision. Additionally, the emergence of Bring Your Own Cloud (BYOC) offers yet another option, providing organizations with enhanced control and flexibility in their cloud environments.

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter and follow me on LinkedIn or X (former Twitter) to stay in touch.

The Data Streaming Landscape: Kafka, Flink, Cloud, and More

The Data Streaming Landscape 2025 highlights how data streaming has evolved into a key software category, moving from niche adoption to a fundamental part of modern data architecture.

With frameworks like Apache Kafka and Flink at its core, the landscape now spans self-managed, BYOC, and fully managed SaaS solutions, driving real-time use cases, unifying transactional and analytical workloads, and enabling innovation across industries.

If you’re still grappling with the fundamentals of stream processing, this article is a must-read: “Stateless vs. Stateful Stream Processing with Kafka Streams and Apache Flink“.

What is SaaS in Data Streaming?

SaaS data streaming solutions are fully managed services where the provider handles all aspects of deployment, maintenance, scaling, and updates. SaaS offerings are designed for ease of use, providing a serverless experience where developers focus solely on building applications rather than managing infrastructure.

Characteristics of SaaS in Data Streaming

Serverless Architecture: Resources scale automatically based on demand. True SaaS solutions eliminate the need to provision or manage servers.
Low Operational Overhead: The provider manages hardware, software, and runtime configurations, including upgrades and security patches.
Pay-As-You-Go Pricing: Consumption-based pricing aligns costs directly with usage, reducing waste during low-demand periods.
Rapid Deployment: SaaS enables users to start processing streams within minutes, accelerating time-to-value.

Examples of SaaS in Data Streaming:

Confluent Cloud: A fully managed Kafka platform offering serverless scaling, multi-tenancy, and a broad feature set for both stateless and stateful processing.
Amazon Kinesis Data Analytics: A managed service for real-time analytics with automatic scaling.

What is PaaS in Data Streaming?

PaaS offerings sit between fully managed SaaS and infrastructure-as-a-service (IaaS). These solutions provide a platform to deploy and manage applications but still require significant user involvement for infrastructure management.

Characteristics of PaaS in Data Streaming

Partial Management: The provider offers tools and frameworks, but users must manage servers, clusters, and scaling policies.
Manual Configuration: Deployment involves provisioning VMs or containers, tuning parameters, and monitoring resource usage.
Complex Scaling: Scaling is not always automatic; users may need to adjust resource allocation based on workload changes.
Higher Overhead: PaaS requires more expertise and operational involvement, making it less accessible to teams without dedicated DevOps resources.

Examples of PaaS in Data Streaming (Kafka, Flink)

PaaS offerings in data streaming, while simplifying some infrastructure tasks, still require significant user involvement compared to fully serverless SaaS solutions. Below are three common examples, along with their benefits and pain points compared to serverless SaaS:

Apache Flink (Self-Managed on Kubernetes Cloud Service like EKS, AKS or GKE)
- Benefits: Full control over deployment and infrastructure customization.
- Pain Points: High operational overhead for managing Kubernetes clusters, manual scaling, and complex resource tuning.
Amazon Managed Service for Apache Flink (Amazon MSF)
- Benefits: Simplifies infrastructure management and integrates with some other AWS services.
- Pain Points: Users still handle job configuration, scaling optimization, and monitoring, making it less user-friendly than serverless SaaS solutions.
Amazon MSK (Managed Streaming for Apache Kafka)
- Benefits: Eases Kafka cluster maintenance and integrates with the AWS ecosystem.
- Pain Points: Requires users to design and manage producers/consumers, manually configure scaling, and handle monitoring responsibilities. MSK also excludes support for Kafka if you have any operational issues with the Kafka piece of the infrastructure.

SaaS vs. PaaS: Key Differences

SaaS and PaaS differ in the level of management and user responsibility, with SaaS offering fully managed services for simplicity and PaaS requiring more user involvement for customization and control.

Feature	SaaS	PaaS
Infrastructure	Fully managed by the provider	Partially managed; user controls clusters
Scaling	Automatic and server less	Manual or semi-automatic scaling
Deployment Speed	Immediate, ready to use	Slower; requires configuration
Operational Complexity	Minimal	Moderate to high
Cost Model	Consumption-based, no idle costs	May incur idle resource costs

The big benefit of PaaS over SaaS is greater flexibility and control, allowing users to customize the platform, integrate with specific infrastructure, and optimize configurations to meet unique business or technical requirements. This level of control is often essential for organizations with strict compliance, security, or data sovereignty requirements.

SaaS is NOT Always Better than PaaS!

Be careful: The limitations and pain points of PaaS do NOT mean that SaaS is always better.

A concrete example: Amazon MSK Serverless simplifies Apache Kafka operations with automated scaling and infrastructure management but comes with significant limitations, including the lack of fully-managed connectors, advanced data governance tools, and native multi-language client support.

Amazon MSK also excludes Kafka engine support, leading to potential operational risks and cost unpredictability, especially when integrating with additional AWS services for a complete data streaming pipeline. I explored these challenges in more detail in my article “When NOT to choose Amazon MSK Serverless for Apache Kafka?“.

Bring Your Own Cloud (BYOC) as Alternative to PaaS

BYOC (Bring Your Own Cloud) offers a middle ground between fully managed SaaS and self-managed PaaS solutions, allowing organizations to host applications in their own VPCs.

BYOC provides enhanced control, security, and compliance while reducing operational complexity. This makes BYOC a strong alternative to PaaS for companies with strict regulatory or cost requirements.

As an example, here are the options of Confluent for deploying the data streaming platform: Serverless Confluent Cloud, Self-managed Confluent Platform (some consider this a PaaS if you leverage Confluent’s Kubernetes Operator and other automation / DevOps tooling) and WarpStream as BYOC offering:

Source: Confluent

While BYOC complements SaaS and PaaS, it can be a better choice when fully managed solutions don’t align with specific business needs. I wrote a detailed article about this topic: “Deployment Options for Apache Kafka: Self-Managed, Fully-Managed / Serverless and BYOC (Bring Your Own Cloud)“.

“Serverless” Claims: Don’t Trust the Marketing

Many cloud data streaming solutions are marketed as “serverless,” but this term is often misused. A truly serverless solution should:

Abstract Infrastructure: Users should never need to worry about provisioning, upgrading, or cluster sizing.
Scale Transparently: Resources should scale up or down automatically based on workload.
Eliminate Idle Costs: There should be no cost for unused capacity.

However, many products marketed as serverless still require some degree of infrastructure management or provisioning, making them closer to PaaS. For example:

A so-called “serverless” PaaS solution may still require setting initial cluster sizes or monitoring node health.
Some products charge for pre-provisioned capacity, regardless of actual usage.

Do Your Own Research

When evaluating data streaming solutions, dive into the technical documentation and ask pointed questions:

Does the solution truly abstract infrastructure management?
Are scaling policies automatic, or do they require manual configuration?
Is there a minimum cost even during idle periods?

By scrutinizing these factors, you can avoid falling for misleading “serverless” claims and choose a solution that genuinely meets your needs.

Choosing the Right Model for Your Data Streaming Business: SaaS, PaaS, or BYOC

When adopting a data streaming platform, selecting the right model is crucial for aligning technology with your business strategy:

Use SaaS (Software as a Service) if you prioritize ease of use, rapid deployment, and operational simplicity. SaaS is ideal for teams looking to focus entirely on application development without worrying about infrastructure.
Use PaaS (Platform as a Service) if you require deep customization, control over resource allocation, or have unique workloads that SaaS offerings cannot address.
Use BYOC (Bring Your Own Cloud) if your organization demands full control over its data but sees benefits in fully managed services. BYOC enables you to run the data plane within your cloud VPC, ensuring compliance, security, and architectural flexibility while leveraging SaaS functionality for the control plane .

In the rapidly evolving world of data streaming around Apache Kafka and Flink, SaaS data streaming platforms like Confluent Cloud provide the best of both worlds: the advanced features of tools like Apache Kafka and Flink, combined with the simplicity of a fully managed serverless experience. Whether you’re handling stateless stream processing or complex stateful analytics, SaaS ensures you’re scaling efficiently without operational headaches.

What deployment option do you use today for Kafka and Flink? Any changes planned in the future? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Fully Managed (SaaS) vs. Partially Managed (PaaS) Cloud Services for Data Streaming with Kafka and Flink appeared first on Kai Waehner.

IoT and Data Streaming with Kafka for a Tolling Traffic System with Dynamic Pricing

Kai Waehner — Fri, 01 Nov 2024 08:13:05 +0000

In the rapidly evolving landscape of intelligent traffic systems, innovative software provides real-time processing capabilities, dynamic pricing and new customer experiences, particularly in the domains of tolling, payments and safety inspection. With the increasing complexity of road networks and the need for efficient traffic management, these organizations are embracing cutting-edge technology to revolutionize traffic and logistics systems. This blog post explores success stories from Quarterhill and DKV Mobility providing traffic and payment systems for tolls. Data streaming powered by Apache Kafka has been pivotal in the journey towards building intelligent traffic systems in the cloud.

Traffic System for Tolls: Use Case, Challenges, and Business Models

Tolling systems are integral to modern infrastructure by providing a mechanism for funding road maintenance and expansion. The primary use case for tolling systems is to efficiently manage and collect tolls from vehicles using roadways. This involves roadside tracking, back-office accounting, and payment processing. However, the implementation of such systems is loaded with challenges.

Use Cases and Business Models for Tolling

Various business models have emerged to provide comprehensive tolling and payment solutions that integrate technology and data-driven strategies to optimize operations and revenue generation:

Roadside Tracking and Data Collection: At the core of modern tolling systems is the integration of IoT devices for roadside tracking. These devices capture essential data, such as vehicle identification, speed, and lane usage. This data is crucial for calculating tolls accurately and in real-time. The business model here involves deploying and maintaining a network of sensors and cameras that ensure seamless data collection across toll points.
Back-Office Accounting and Payment Processing: A robust back-office system is essential for processing toll transactions, managing accounts, and handling payments. This includes integrating with financial institutions for payment processing and ensuring compliance with financial regulations. The business model focuses on providing a secure and efficient platform for managing financial transactions, reducing administrative overhead, and enhancing customer satisfaction through streamlined payment processes.
Dynamic Pricing Models: To optimize revenue and manage traffic flow, tolling systems can implement dynamic pricing models. These models adjust toll rates based on real-time traffic conditions, time of day, and demand. By leveraging data analytics and machine learning, toll operators can predict traffic patterns and set prices that encourage optimal road usage. The business model here involves using data-driven insights to maximize revenue while minimizing congestion and improving the overall driving experience.
Interoperability and Cross-Agency Collaboration: Vehicles often travel across multiple tolling jurisdictions, causing interoperability between different toll agencies. Business models in this area focus on creating partnerships and agreements that allow for seamless data exchange and revenue sharing. This ensures that tolls are accurately attributed and collected, regardless of jurisdiction. This enhances the user experience and operational efficiency.
Subscription and Membership Models: Some tolling systems offer subscription or membership models that provide users with benefits such as discounted rates, priority access to express lanes, or bundled services. This business model aims to build customer loyalty and generate steady revenue streams by offering value-added services and personalized experiences.
Public-Private Partnerships (PPPs): Many tolling systems are developed and operated through collaborations. These leverage the strengths of both sectors, with the public sector providing regulatory oversight and the private sector offering technological expertise and investment. The business model focuses on sharing risks and rewards. This strategy ensures sustainable and efficient tolling operations.

Challenges of Traffic Systems

Intelligent tolling systems create lots of challenges for the project teams:

Integration with IoT Devices: Tolling systems rely heavily on IoT devices for roadside tracking. These devices generate vast amounts of data that need to be processed in real-time to ensure accurate toll collection.
Interoperability: Ensuring interoperability between different systems is crucial with vehicles crossing state lines and using multiple toll agencies.
Data Management: Managing and processing the data generated by IoT devices and various backend IT systems such as a CRM in a scalable and reliable manner is complex.
Static Pricing: Implementing innovative revenue-generating use cases such as dynamic pricing on express lanes requires real-time data processing to adjust toll rates based on current traffic conditions.

As you might expect, implementing and deploying intelligent tolling systems requires the use of modern, cloud-natives technologies. Conventional data integration and processing solutions, like databases, data lakes, ETL tools, or API platforms, lack the necessary capabilities. Therefore, data streaming becomes essential…

Data Streaming with Apache Kafka and Flink for Intelligent Traffic Systems

The ability to process data in real-time is crucial for ensuring efficient and accurate toll collection. Data streaming has emerged as a transformative technology that addresses the unique challenges faced by tolling systems, particularly in integrating IoT devices and implementing dynamic pricing models.

Apache Kafka became the de facto standard for data streaming. Apache Flink emerges as standard for stream processing. These technologies can help to implement tolling use cases:

Real-Time Toll Collection: Tolling systems rely on IoT devices to capture data from vehicles as they pass through toll points. This data includes vehicle identification, time of passage, and lane usage. Real-time processing of this data is essential to ensure that tolls are accurately calculated and collected without delay from IoT devices.
Dynamic Pricing Models: To optimize traffic flow and revenue, tolling systems can implement dynamic pricing models. These models adjust toll rates based on current traffic conditions, time of day, and other factors. Data streaming enables the continuous analysis of traffic data, allowing for real-time adjustments to pricing.
Interoperability Across Agencies: Vehicles often travel across multiple tolling jurisdictions, requiring seamless interoperability between different toll agencies. Data streaming facilitates the real-time exchange of data between agencies, ensuring that tolls are accurately attributed and collected regardless of jurisdiction.

Key Benefits of Data Streaming with Apache Kafka and Flink

Data streaming with Kafka and Flink makes a tremendous difference for building a next generation traffic system:

Real-Time Processing: Data streaming technologies like Apache Kafka and Apache Flink enable the real-time processing of data from IoT devices. Kafka acts as the backbone for data ingestion, capturing and storing streams of data from roadside sensors and devices. Flink provides the capability to process and analyze these data streams in real-time, ensuring that tolls are calculated and collected accurately and promptly.
Scalability: Tolling systems must handle large volumes of data, especially during peak traffic hours. Kafka’s distributed architecture allows it to scale horizontally, accommodating the growing data demands of expanding traffic networks. This scalability ensures that the system can handle increased data loads without compromising performance.
Reliability: Kafka’s robust architecture provides a reliable mechanism for tracking and processing data. It ensures that every message from IoT devices is captured and processed, reducing the risk of errors in toll collection. Kafka’s ability to replay messages also allows for recovery from potential data loss, ensuring data integrity.
Flexibility: By decoupling data processing from the underlying infrastructure, data streaming offers the flexibility to adapt to changing business needs. Kafka’s integration capabilities allow it to connect with various data sources and sinks. Flink’s stream processing capabilities enable complex event processing and real-time analytics. This flexibility allows tolling systems to develop and incorporate new technologies and business models as needed.

Quarterhill – Tolling and Enforcement with Dynamic Pricing

Quarterhill is a company that specializes in intelligent traffic systems, focusing on two main areas: tolling and safety/inspection. The company provides comprehensive solutions for managing tolling systems, which include roadside tracking, back-office accounting, and payment processing.

Source: Quarterhill

By integrating IoT devices and leveraging data streaming technologies, Quarterhill optimizes toll collection processes, implements dynamic pricing models, and ensures interoperability across different toll agencies to optimize revenue generation while ensuring smooth traffic flow.

I had the pleasure of doing a panel conversation with Josh LittleSun, VP Delivery of Quarterhill at Confluent’s Data in Motion Tour Chicago 2024.

Quarterhill’s Product Portfolio

Quarterhill’s product portfolio encompasses a comprehensive range of solutions designed to enhance traffic management and transportation systems. As you can see, many of these products are inherently designed for data streaming.

Roadside technologies include tools for congestion charging, performance management, insights and analytics, processing systems, and lane configuration, all aimed at optimizing road usage and efficiency.
Commerce and mobility platforms offer analytics, toll interoperability, a mobility marketplace, back-office solutions, and performance management, facilitating seamless transactions and mobility services.
Safety and enforcement solutions focus on ensuring compliance and safety for commercial vehicles, with features like maintenance, e-screening, tire anomaly detection, weight compliance, and commercial roadside technologies.
Smart Transportation solutions provide multi-modal data and intersection management, improving the coordination and flow of various transportation modes.
Data Solutions feature video-based systems, traffic recording systems, in-road sensor systems, and cloud-based solutions, offering advanced data collection and analysis capabilities for informed decision-making and maintenance.

How Quarterhill Built an Intelligent Traffic System in the Cloud with Data Streaming and IoT

Quarterhill’s journey towards building an intelligent traffic system began with the realization that traditional monolithic architectures did not meet the demands of modern tolling systems. The company embarked on a transformation journey, moving from monolith to microservices and adopting data streaming as a core component of their architecture.

Key Components of Quarterhill’s Intelligent Traffic System

Fully Managed Confluent Cloud on GCP: By leveraging Confluent Cloud on Google Cloud Platform (GCP) as its data streaming platform, Quarterhill could focus on solving business problems rather than managing infrastructure. This shift allowed for greater agility and reduced operational overhead.
Data Streaming Instead of Google Pub/Sub: Quarterhill chose data streaming over Google Pub/Sub because of its ability to provide various use cases beyond ingestion into the data lake, including real-time processing of transactional workloads and integration with IoT devices.
Direct Connection to the Cloud via MQTT, HTTP, and Connectors: IoT devices connect directly to Kafka using protocols like MQTT and HTTP. Connectors facilitate data integration and processing.
Edge Servers for Data Aggregation: In some cases, edge servers are used to aggregate data before sending it to the cloud. This option optimizes bandwidth usage and ensuring low-latency processing.
Consumers: Elastic, BigQuery, Custom Connectors: Data is consumed by various systems, including Elastic for search and analytics, Google BigQuery for data warehousing, and custom connectors for specific use cases.

Benefits Realized with a Fully Managed Data Streaming Platform

Elasticity and high throughput: The ability to scale with traffic volume ensures that tolling systems can handle peak loads without degradation in performance.
Resiliency and accuracy: The reliability of data streaming ensures that toll collection is accurate and resilient to failures.
Cost savings and efficiency: By moving to a fully managed cloud solution for the data streaming platform with Confluent Cloud, Quarterhill achieved significant cost savings (TCO) and reduced the demand for in-house resources.

DKV Mobility: On-the-Road Payments and Solutions

DKV Mobility stands as a leading European B2B platform specializing in on-the-road payments and solutions. With a robust customer base of over 300,000 active clients spanning over 50 service countries, DKV Mobility has revolutionized the way businesses manage their on-the-road expenses. The platform enables real-time payments and transaction processing, providing valuable insights for businesses on the move.

Source: DKV Mobility

DKV Mobility’s comprehensive services cover a wide range of needs, including refueling, electric vehicle (EV) charging, toll solutions, and vehicle services. The platform supports approximately 468,000 EV charge points, 63,000 fuel service stations, and 30,000 vehicle service stations, ensuring that businesses have access to essential services wherever they operate. Through its innovative solutions, DKV Mobility enhances operational efficiency and cost management for businesses across Europe.

If you are interested in how DKV Mobility transitioned from open source Kafka to fully managed SaaS and how they leverage stream processing with Kafka Streams, check out the DKV Mobility success story.

IoT Connectivity with MQTT and HTTP + Data Streaming with Apache Kafka = Next Generation Traffic System

Quarterhill’s intelligent traffic system for tolls and DKV Mobility’s real time on-the-road payment solution exemplify the transformative power of a data streaming platform using Apache Kafka in modern infrastructure to solve a specific business problem. Related scenarios, such as logistics and supply chain, can benefit from such a foundation and connect to existing data products for new business models or B2B data exchanges with partners.

By embracing a cloud-native, microservices-based architecture, Quarterhill and DKV Mobility have not only overcome the challenges of traditional tolling and payment systems but have also set a new standard for efficiency and innovation in the industry. Use cases such as IoT sensor integration and dynamic pricing are only possible with data streaming.

As these companies continue to leverage stream processing with Kafka Streams and explore new technologies like Apache Flink and data governance solutions, the future of intelligent traffic systems looks promising. The potential is huge to further enhance safety, efficiency of payments and customer experiences, and revenue generation on roadways.

How do you leverage data streaming in your enterprise architecture? How do you connect to IoT interfaces? What is your data processing strategy? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post IoT and Data Streaming with Kafka for a Tolling Traffic System with Dynamic Pricing appeared first on Kai Waehner.

Fraud Prevention in Under 60 Seconds with Apache Kafka: How A Bank in Thailand is Leading the Charge

Kai Waehner — Sat, 26 Oct 2024 06:31:56 +0000

In financial services, the ability to prevent fraud in real-time is not just a competitive advantage – it is a necessity. For one of the largest banks in Thailand Krungsri (Bank of Ayudhya), with its vast assets, loans, and deposits, the challenge of fraud prevention has taken center stage. This blog post explores how the bank is leveraging data streaming with Apache Kafka to detect and block fraudulent transactions in under 60 seconds to ensure the safety and trust of its customers.

Fraud Prevention with Data Streaming using Apache Kafka and Flink

Fraud detection has become a critical focus across industries as digital transactions continue to rise, bringing with them increased opportunities for fraudulent activities. Traditional methods of fraud detection, often reliant on batch processing, struggle to keep pace with the speed and sophistication of modern scams. Data streaming offers a transformative solution to enable real-time analysis and immediate response to suspicious activities.

Data streaming technologies such as Apache Kafka and Flink enable businesses to continuously monitor transactions, identify anomalies, and prevent fraud before it affects customers. This shift to real-time fraud detection not only enhances security, but also builds trust and confidence among consumers.

I already explored “Fraud Detection with Apache Kafka, KSQL and Apache Flink” in its own blog post covering case studies across industries from companies such as Paypal, Capital One, ING Bank, Grab, and Kakao Games. And another blog post focusing on “Apache Kafka in Crypto and Financial Services for Cybersecurity and Fraud Detection“.

Kafka is an excellent foundation for fraud prevention and many other use cases across all industries. If you wonder when to choose Apache Flink or Kafka Streams for stream processing, I also got you covered.

Apache Kafka for Fraud Prevention at Krungsri Bank

Krungsri, also known as the Bank of Ayudhya, is one of Thailand’s largest banks. The company offers a range of financial services including personal and business banking, loans, credit cards, insurance, investment solutions, and wealth management.

I had the pleasure to do a panel conversation with Tul Roteseree, Executive Vice President and Head of the Data and Analytics Division from Krungsri at Confluent’s Data in Motion Tour 2024 in Bangkok, Thailand.

One of the most pressing concern for Krungsri is fraud prevention. In today’s digital landscape, scammers often trick consumers into transferring money to mule accounts within a mere 60 seconds. The bank’s data streaming platform allows analyzing payment transactions in real-time, detecting and blocking fraudulent activities before they can affect customers.

While fraud prevention is a primary focus, the bank’s data streaming initiatives encompass a range of use cases that enhance its overall operations. One of the other strategic areas is mainframe offloading. This involves transitioning data from legacy systems to more agile, real-time platforms. This shift not only reduces operational costs but also improves data accessibility and processing speed.

Another critical use case is the enhancement of customer notifications through the bank’s mobile app. By moving from batch processing to real-time data streaming, the bank can provide instant account movement alerts, keeping customers informed and engaged.

The Business Value of Data Streaming with Apache Kafka for Fraud Prevention

Krungsri bank’s decision to adopt data streaming is driven by the need for an event-driven architecture that can handle high-throughput data streams efficiently. Apache Kafka, the leading open source data streaming framework for building real-time data pipelines, was chosen for its scalability and reliability. Kafka’s ability to process vast amounts of data in real-time makes it an ideal choice for the bank’s fraud prevention efforts.

Confluent, a trusted provider of Kafka-based solutions, was selected for its stability and proven track record. The bank valued Confluent’s ability to deliver significant cost savings and speed up project timelines. By leveraging Confluent, the bank reduced its project go-live time from 4-6 months to just 6-8 weeks, ensuring a faster time to market.

Compliance is another critical factor: The bank’s operations are regulated by the Bank of Thailand. The data streaming architecture meets stringent regulatory requirements while ensuring data security and privacy.

From Mainframe to Hybrid Cloud at Krungsri Bank with Change Data Capture (CDC)

The bank’s data streaming architecture is built on a hybrid environment with core banking operations on-premises and mobile applications in the cloud. This setup provides the flexibility needed to adapt to changing business needs and regulatory landscapes.

Data ingestion and transformation occur across various environments, including cloud-to-cloud, cloud-to-on-premise, and on-premise-to-cloud. IBM’s Change Data Capture (CDC) technology is used for data capture. The data streaming platform acts as the intermediary between the mainframe and consumer applications. This “subscribe once, publish many” approach significantly reduces the mainframe’s burden, cutting costs and processing time.

Stream processing is a key component of the bank’s architecture, serving as the primary tool for real-time data transformations and analytics. This capability allows the bank to respond swiftly to emerging trends and threats. The continuous processing of data ensures that fraudulent activities are detected and blocked in under 60 seconds.

The bank’s move to the cloud also facilitates the integration of machine learning and AI models. The cloud transition enables more sophisticated data analysis and personalized services. Events generated through stream processing trigger AI models in the cloud to provide insights that drive decision-making and enhance customer experiences.

Fraud Detection with Stream Processing in Under 60 Seconds

In the fight against fraud, time is of the essence. By leveraging a data streaming platform, one of Thailand’s largest banks is setting a new standard for fraud prevention and ensures that payment transactions are continuously analyzed and blocked in under 60 seconds. With a robust event-driven architecture built on Kafka and Confluent, the bank is not only protecting its customers but also paving the way for a more secure and efficient financial future.

Do you also leverage data streaming for fraud prevention or any other critical use cases? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Fraud Prevention in Under 60 Seconds with Apache Kafka: How A Bank in Thailand is Leading the Charge appeared first on Kai Waehner.

Deployment Options for Apache Kafka: Self-Managed, Fully-Managed / Serverless and BYOC (Bring Your Own Cloud)

Kai Waehner — Thu, 12 Sep 2024 13:43:31 +0000

BYOC (Bring Your Own Cloud) is an emerging deployment model for organizations looking to maintain greater control over their cloud environments. Unlike traditional SaaS models, BYOC allows businesses to host applications within their own VPCs to provide enhanced data privacy, security, and compliance. This approach leverages existing cloud infrastructure. It offers more flexibility for custom configurations, particularly for companies with stringent security needs. In the data streaming sector around Apache Kafka, BYOC is changing how platforms are deployed. Organizations get more control and adaptability for various use cases. But it is clearly NOT the right choice for everyone!

BYOC (Bring Your Own Cloud) – A New Deployment Model for Cloud Infrastructure

BYOC (Bring Your Own Cloud) is a deployment model where organizations choose their preferred cloud infrastructure to host applications or services, rather than using a serverless / fully managed cloud solution selected by a software vendor; typically known as Software as a Service (SaaS). This model gives businesses flexibility to leverage their existing cloud services (like AWS, Google Cloud, Microsoft Azure, or Alibaba) while integrating third-party applications that are compatible with multiple cloud environments.

BYOC helps companies maintain control over their cloud infrastructure, optimize costs, ensure compliance with security standards. BYOC is typically implemented within an organization’s own cloud VPC. Unlike SaaS models, BYOC offers enhanced privacy and compliance by maintaining control over network architecture and data management.

However, BYOC also has some serious drawbacks! The main challenge is scaling a fleet of co-managed clusters running in customer environments with all the reliability expectations of a cloud service. Confluent has shied away from offering a BYOC deployment model for Apache Kafka based on Confluent Platform because doing BYOC at scale requires a different architecture. WarpStream has built this architecture, with a BYOC-native platform that was designed from the ground up to avoid the pitfalls of traditional BYOC.

The Data Streaming Landscape

Data Streaming is a separate software category of data platforms. Many software vendors built their entire businesses around this category. The data streaming landscape shows that most vendors use Kafka or implement its protocol because Apache Kafka has become the de facto standard for data streaming.

New software companies have emerged in this category in the last few years. And several mature players in the data market added support for data streaming in their platforms or cloud service ecosystem. Most software vendors use Kafka for their data streaming platforms. However, there is more than Kafka. Some vendors only use the Kafka protocol (Azure Event Hubs) or utterly different APIs (like Amazon Kinesis).

The following Data Streaming Landscape 2024 summarizes the current status of relevant products and cloud services.

The Data Streaming Landscape evolves. Last year, I added WarpStream as a new entrant into the market. WarpStream uses the Kafka protocol and provides a BYOC offering for Kafka in the cloud. In my next update of the data streaming landscape, I need to do yet another update: WarpStream is now part of Confluent. There are also many other new entrants. Stay tuned for a new “Data Streaming Landscape 2025” in a few weeks (subscribe to my newsletter to stay up-to-date with all the things data streaming).

Confluent Acquisition of WarpStream

Confluent had two product offerings:

Confluent Platform: A self-managed data streaming platform powered by Kafka, Flink, and much more that you can deploy everywhere (on-premise data center, public cloud VPC, edge like factory or retail store, and even stretched across multiple regions or clouds).
Confluent Cloud: A fully managed data streaming platform powered by Kafka, Flink, and much more that you can leverage as a serverless offering in all major public cloud providers (Amazon AWS, Microsoft Azure, Google Cloud Platform).

Why did Confluent acquire WarpStream? Because many customers requested a third deployment option: BYOC for Apache Kafka.

As Jay Kreps described in the acquisition announcement: “Why add another flavor of streaming? After all, we’ve long offered two major form factors–Confluent Cloud, a fully managed serverless offering, and Confluent Platform, a self-managed software offering–why complicate things? Well, our goal is to make data streaming the central nervous system of every company, and to do that we need to make it something that is a great fit for a vast array of use cases and companies.”

Read more details about the acquisition of WarpStream by Confluent in Jay’s blog post: Confluent + WarpStream = Large-Scale Streaming in your Cloud. In summary, WarpStream is not dead. The WarpStream team clarified the status quo and roadmap of this BYOC product for Kafka in its blog post: “WarpStream is Dead, Long Live WarpStream“.

Let’s dig deeper into the three deployment options and their trade-offs.

Deployment Options for Apache Kafka

Apache Kafka can be deployed in three primary ways: self-managed, fully managed/serverless, and BYOC (Bring Your Own Cloud).

In self-managed deployments, organizations handle the entire infrastructure, including setup, maintenance, and scaling. This provides full control but requires significant operational effort.
Fully managed or serverless Kafka is offered by providers like Confluent Cloud or Azure Event Hubs. The service is hosted and managed by a third-party, reducing operational overhead but with limited control over the underlying infrastructure.
BYOC deployments allow organizations to host Kafka within their own cloud VPC. BYOC combines some of the benefits of cloud flexibility with enhanced security and control, while it outsources most of Kafka’s management to specialized vendors.

Confluent’s Kafka Products: Self-Managed Platform vs. BYOC vs. Serverless Cloud

Using the example of Confluent’s product offerings, we can see why there are three product categories for data streaming around Apache Kafka.

There is no silver bullet. Each deployment option for Apache Kafka has its pros and cons. The key differences are related to the trade-offs between “ease of management” and “level of control”.

Source: Confluent

If we go into more detail, we see that different use cases require different configurations, security setups, and levels of control while also focusing on being cost effective and providing the right SLA and latency for each use case.

Trade-Offs of Confluent’s Deployment Options for Apache Kafka

On a high level, you need to figure out if you want or have to managed the data plane(s) and control plane of your data streaming infrastructure:

Source: Confluent

If you follow my blog, you know that a key focus is exploring various use cases, architectures and success stories across all industries. And use cases such as log aggregation or IoT sensor analytics required very different deployment characteristics than an instant payment platform or fraud detection and prevention.

Choose the right Kafka deployment model for your use case. Even within one organization, you will probably need different deployments because of security, data privacy and compliance requirements, but also staying cost efficient for high-volume workloads.

BYOC for Apache Kafka with WarpStream

Self-managed Kafka and fully managed Kafka are pretty well understood in the meantime. However, why is BYOC needed as a third option and how to do it right?

I had plenty of customer conversations across industries. Common feedback is that most organizations have a cloud-first strategy, but many also (have to) stay hybrid for security, latency or cost reasons.

And let’s be clear: If a data streaming project goes to the cloud, fully managed Kafka (and Flink) should always be the first option as it is much easier to manage and operate to focus on fast time to market and business innovation. Having said that, sometimes, security, cost or other reasons require BYOC.

How Is BYOC Implemented in WarpStream?

Let’s explore why WarpStream is an excellent option for Kafka as BYOC deployment and when to use it instead of serverless Kafka in the cloud:

WarpStream provides BYOC, meaning single-tenant service so that a customer has its own “instance” of Kafka (to use the protocol, it is not Apache Kafka under the hood).
However, under the hood, the system still uses cloud-native serverless systems like Amazon S3 for scalability, cost-efficiency and high availability (but the customer does not see this complexity and does not have to care about it).
As a result, the data plane is still customer managed (that’s what they need for security or other reasons), but in contrary to self-managed Kafka, the customer does not need to worry about the complexity under the hood (like rebalancing, rolling upgrades, backups) – that is what S3 and other magic code of the WarpStream service takes over.
The magic is the stateless agents in the customer VPC. It makes this solution scalable and still easy to operate (compared to the self-managed deployment option) while the customer has its own instance.
Many use cases are around lift and shift of existing Kafka deployments (like self-managed Apache Kafka or another vendor like Kafka as part of Cloudera or Red Hat). Some companies want to “lift and shift” and keep the feeling of control they are used to, while still offloading most of the management to the vendor.

I wrote this summary after reading the excellent article of my colleague Jack Vanlightly: BYOC, Not “The Future Of Cloud Services” But A Pillar Of An Everywhere Platform. This article goes into much more technical detail and is a highly recommended read for any architect and developer.

Benefits of WarpStream’s BYOC Implementation for Kafka

Most vendors have dubios BYOC implementations.

For instance, if the vendor needs to access the VPC of the customercheaper than AK self managed because cloud native (zero disks, zero interzone networking fees) and headaches for responsibilities in the case of failures.

WarpStream’s BYOC-native implementation differs from other vendors and provides various benefits because of its novel architecture:

WarpStream does not need access to the customer VPC. The data plane (i.e., the brokers in the customer VPC) are stateless. The metadata/consensus is in the control plane (i.e., the cloud service in the WarpStream VPC).
The architecture solves sovereignty challenges and is a great fit for security and compliance requirements.
The cost of WarpStream’s BYOC offering is cheaper than self-managed Apache Kafka because it is built with cloud-native concepts and technologies in mind (e.g., zero disks and zero interzone networking fees, leveraging cloud object storage such as Amazon S3).
The stateless architecture in the customer VPC makes autoscaling and elasticity very easy to implement/configure.

The Main Drawbacks of BYOC for Apache Kafka

BYOC is an excellent choice if you have specific security, compliance or cost requirements that need this deployment option. However, there are some drawbacks:

The latency is worse than self-managed Kafka or serverless Kafka as WarpStream directly touches the Amazon S3 object storage (in contrast to “normal Kafka”).
Kafka using BYOC is NOT fully managed, like e.g. Confluent Cloud, so you have more efforts to operate it. Also, keep in mind that most Kafka cloud services are NOT serverless but just provision Kafka for you and you still need to operate it.
Additional components of the data streaming platform (such as Kafka Connect connectors and stream processors such as Kafka Streams or Apache Flink) are not part of the BYOC offering (yet). This adds some complexity to operations and development.

Therefore, once again, I recommend to only look at BYOC options for Apache Kafka in the public cloud if a fully managed and serverless data streaming platform does NOT work for you because of cost, security or compliance reasons!

BYOC Complements Self-Managed and Serverless Apache Kafka – But BYOC Should NOT be the First Choice!

BYOC (Bring Your Own Cloud) offers a flexible and powerful deployment model, particularly beneficial for businesses with specific security or compliance needs. By allowing organizations to manage applications within their own cloud VPCs, BYOC combines the advantages of cloud infrastructure control with the flexibility of third-party service integration.

But once again: If a data streaming project goes to the cloud, fully managed Kafka (and Flink) should always be the first option as it is much easier to manage and operate to focus on fast time to market and business innovation. Choose BYOC only if fully managed does not work for you, e.g. because of security requirements.

In the data streaming domain around Apache Kafka, the BYOC model complements existing self-managed and fully managed options. It offers a middle ground that balances ease of operation with enhanced privacy and security. Ultimately, BYOC helps companies tailor their cloud environments to meet diverse and developing business requirements.

What is your deployment option for Apache Kafka? A self-managed deployment in the data center or at the edge? Serverless Cloud with a service such as Confluent Cloud? Or did you (have to) choose BYOC? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Deployment Options for Apache Kafka: Self-Managed, Fully-Managed / Serverless and BYOC (Bring Your Own Cloud) appeared first on Kai Waehner.

Multi-Cloud Replication in Real-Time with Apache Kafka and Cluster Linking

Kai Waehner — Wed, 14 Aug 2024 06:07:28 +0000

Multiple Apache Kafka clusters are the norm; not an exception anymore. Hybrid integration and multi-cloud replication for migration or disaster recovery are common use cases. This blog post explores a real-world success story from financial services around the transition of a large traditional bank from on-premise data centers into the public cloud for multi-cloud data sharing between AWS and Azure.

What is Multi-Cloud and How Does Apache Kafka Help?

Multi-cloud refers to the use of multiple cloud computing services from different providers in a single heterogeneous IT environment. This approach enhances flexibility, performance, and reliability while avoiding vendor lock-in.

Here are the key benefits of multi-cloud:

Avoidance of Vendor Lock-In: By utilizing multiple cloud providers, organizations can avoid dependency on a single vendor, reducing the risk associated with vendor-specific outages and price changes.
Optimization of Performance and Cost: Different cloud providers offer varying strengths, pricing models, and geographic availability. Multi-cloud strategies enable organizations to choose the best provider for each workload to optimize performance and cost.
Enhanced Redundancy and Resilience: Multi-cloud setups can provide higher availability and disaster recovery capabilities by distributing workloads across multiple cloud environments, thus reducing the impact of localized outages.
Regulatory and Compliance Benefits: Some industries and regions have specific regulatory requirements that may be easier to meet using a multi-cloud approach, ensuring data residency and compliance.

The real-time capabilities of Apache Kafka are a perfect match for multi-cloud architectures. Information is replicated and synchronized directly after the creation of an event. Apache Kafka’s combination of high throughput, low latency, durability provides strong data consistency guarantees across multiple cloud providers like AWS, Azure, GCP and Alibaba; no matter if the data sources or data sinks are real time, batch or API-driven request

One Apache Kafka Cluster Does NOT Fit All Use Cases

Organizations require multiple Kafka cluster strategies for various use cases: Hybrid integration, aggregation, migration and disaster recovery. I explored the architecture options and trade-offs in a dedicated blog post: “Apache Kafka Cluster Type Deployment Strategies“.

Multi-cloud is a special case with even higher challenges regarding security, cost, and latency. Nevertheless, all larger organizations have multi-cloud infrastructure and integration needs. Let’s explore the multi-cloud journey of fidelity investments and how data streaming with Apache Kafka helps.

Fidelity’s Hybrid Cloud Data Streaming Journey

Fidelity Investments is a leading financial services company that provides a wide range of investment management, retirement planning, brokerage, and wealth management services to individuals and institutions. Founded in 1946, Fidelity has grown to become one of the largest asset managers in the world, with trillions of dollars in assets under management. The company is known for its comprehensive research tools, innovative technology platforms, and commitment to customer service, helping clients achieve their financial goals.

Fidelity Investments presented at Kafka Summit events about their data streaming journey transitioning from on-premise to hybrid cloud infrastructure.

Source: Fidelity Investments

Fidelity’s Event Streaming Platform

Fidelity Investments built a streaming platform that hosts business application events for event driven architectures and integration with other business applications through events and streams.

Source: Fidelity Investments

A few impressive numbers about from Fidelity’s event streaming platform infrastructure (presented at Kafka Summit London in 2023):

4 years in public cloud
16k+ producer and consumer applications
6B+ events per day
72+ self-service APIs
300+ observability metrics

One of the first critical use cases was the integration and offloading from IBM z Systems mainframe via IBM MQ, Kafka Connect (deployed on the mainframe) and Confluent Platform.

For architectures and best practices around mainframe modernization, check out my article “Mainframe Integration, Offloading and Replacement with Apache Kafka“.

Fidelity’s Cloud Journey: From Point-to-Point to Decoupling Applications with Kafka and Cluster Linking between AWS and Azure

The Kafka Summit talk “Multi-Cloud Data Sharing: Make the Data Move for you Across CSPs using Cluster Linking” explored Fidelity Investment’s transition to the cloud.

Fidelity Investments designed its multi-cloud event streaming platform to enable applications residing in different cloud service providers to seamlessly share data between them.

BEFORE: Point-to-Point Multi-Cloud Replication

Many architects call this the spaghetti integration architecture. All applications do a point-to-point connection to each other application:

Source: Fidelity Investments

This setup is costly, error-prone and hard to maintain or innovate.

One of Apache Kafka’s unique values is truly decoupling between applications. The event-based durable commit log guarantees data consistency but also allows choosing the right technology or API in each business unit.

Replication between the Kafka clusters running in different cloud infrastructures and regions on AWS and Azure is implemented with Confluent Cluster Linking.

Source: Fidelity Investments

Confluent Cluster Linking plays a crucial role in this design for real-time replication between Kafka clusters. It uses the Kafka protocol for replication to provide all the benefits of Kafka and no needed infrastructure and operations overhead with tools like MirrorMaker. Using the Kafka protocol for multi-cloud replication also affects, i.e., reduces the network cost significantly because it avoids many translations and compression tasks required by MirrorMaker.

Fidelity’s Multi-Cloud Requirements: Data Ownership, Data Contracts and Self-Service API

Besides reliable real-time replication across multi-cloud environments, Fidelity Investments’ other important aspects include for the multi-cloud Kafka enterprise architecture include:

Data ownership in a multi-cloud environment
Schema Registry to provide common data contracts (often called data products in a data mesh architecture) across Kafka clusters in different cloud providers
Self-service management API plane allowing teams to manage their multi-cloud topic replications with as little as a single configuration change.

Transition to Cloud with Data Streaming Across Industries

Multi-cloud use cases include data integration, migration, aggregation and disaster recovery scenarios. Here are a few real-world examples from the financial services, healthcare and telecom sector:

Even if you do not plan multi-cloud infrastructure because focusing on a single service provider across regions, you can be sure: The next merger and acquisition (M&A) comes for sure… Multi-cloud scenarios are not an exception, but the norm in larger organizations.

Do you already deploy across multiple cloud providers? What are the use cases? How do you efficiently and reliably integrate these environments? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Multi-Cloud Replication in Real-Time with Apache Kafka and Cluster Linking appeared first on Kai Waehner.

Apache Kafka Cluster Type Deployment Strategies

Kai Waehner — Mon, 29 Jul 2024 06:34:49 +0000

Organizations start their data streaming adoption with a single Apache Kafka cluster to deploy the first use cases. The need for group-wide data governance and security but different SLAs, latency, and infrastructure requirements introduce new Kafka clusters. Multiple Kafka clusters are the norm, not an exception. Use cases include hybrid integration, aggregation, migration, and disaster recovery. This blog post explores real-world success stories and cluster strategies for different Kafka deployments across industries.

Apache Kafka – The De Facto Standard for Event-Driven Architectures and Data Streaming

Apache Kafka is an open-source, distributed event streaming platform designed for high-throughput, low-latency data processing. It allows you to publish, subscribe to, store, and process streams of records in real time.

Kafka serves as a popular choice for building real-time data pipelines and streaming applications. The Kafka protocol became the de facto standard for event streaming across various frameworks, solutions, and cloud services. It supports operational and analytical workloads with features like persistent storage, scalability, and fault tolerance. Kafka includes components like Kafka Connect for integration and Kafka Streams for stream processing, making it a versatile tool for various data-driven use cases.

While Kafka is famous for real-time use cases, many projects leverage the data streaming platform for data consistency across the entire enterprise architecture, including databases, data lakes, legacy systems, Open APIs, and cloud-native applications.

Different Apache Kafka Cluster Types

Kafka is a distributed system. A production setup usually requires at least four brokers. Hence, most people automatically assume that all you need is a single distributed cluster you scale up when you add throughput and use cases. This is not wrong in the beginning. But…

One Kafka cluster is NOT the right answer for every use case. Various characteristics influence the architecture of a Kafka cluster:

Availability: Zero downtime? 99.99% uptime SLA? Non-critical analytics?
Latency: End-to-end processing in <100ms (including processing)? 10-minute end-to-end data warehouse pipeline? Time travel for re-processing historical events?
Cost: Value vs. cost? Total Cost of Ownership (TCO) matters! For instance, in the public cloud, networking can be up to 80% of the total Kafka cost!
Security and Data Privacy: Data privacy (PCI data, GDPR, etc.)? Data governance and compliance? End-to-end encryption on the attribute level? Bring your own key? Public access and data sharing? Air-gapped edge environment?
Throughput and Data Size: Critical transactions (typically low volume)? Big data feeds (clickstream, IoT sensors, security logs, etc.)?

Related topics like on-premise vs. public cloud, regional vs. global, and many other requirements also affect the Kafka architecture.

Apache Kafka Cluster Strategies and Architectures

A single Kafka cluster is often the right starting point for your data streaming journey. It can onboard multiple use cases from different business domains and process gigabytes per second (if operated and scaled the right way). However, depending on your project requirements, you need an enterprise architecture with multiple Kafka clusters. Here are a few common examples:

Hybrid Architecture: Data integration and uni- or bi-directional data synchronization between multiple data centers. Often, connectivity between an on-premise data center and a public cloud service provider. Offloading from legacy into cloud analytics is one of the most common scenarios. But command & control communication is also possible, i.e., sending decisions/recommendations/transactions into a regional environment (e.g., storing a payment or order from a mobile app in the mainframe).
Multi-Region / Multi-Cloud: Data replication for compliance, cost, or data privacy reasons. Data sharing usually only includes a fraction of the events, not all Kafka Topics. Healthcare is one of many industries that goes this direction.
Disaster Recovery: Replication of critical data in active-active or active-passive mode between different data centers or cloud regions. Includes strategies and tooling for fail-over and fallback mechanisms in the case of a disaster to guarantee business continuity and compliance.
Aggregation: Regional clusters for local processing (e.g., pre-processing, streaming ETL, stream processing business applications) and replication of curated data to the big data center or cloud. Retail stores are an excellent example.
Migration: IT modernization with a migration from on-premise into the cloud or from self-managed open source into a fully managed SaaS. Such migrations can be done with zero downtime or data loss while the business continues during the cut-over.
Edge (Disconnected / Air-Gapped): Security, cost, or latency require edge deployments, e.g. in a factory or retail store. Some industries deploy in safety-critical environments with unidirectional hardware gateway and data diode.
Single Broker: Not resilient, but sufficient for scenarios like embedding a Kafka broker into a machine or on an Industrial PC (IPC) and replicating aggregated data into a large cloud analytics Kafka cluster. One nice example is the installation of data streaming (including integration and processing) on a computer of a soldier on the battlefield.

Bridging Hybrid Kafka Clusters

These options can be combined. For instance, a single broker at the edge typically replicates some curated data to a remote data center. And hybrid clusters have such different architectures depending on how they are bridged: connections over public internet, private link, VPC peering, and transit gateway, etc.

Having seen the development of Confluent Cloud over the years, I totally underestimated how much engineering time needs to be spent on security and connectivity. However, missing security bridges are the main blocker for the adoption of a Kafka cloud service. So, there is no way around providing various security bridges between Kafka clusters beyond just public internet.

There are even use cases where organizations need to replicate data from the data center to the cloud but the cloud service is NOT allowed to initiative the connection. Confluent built a specific feature “source-initiated link” for such security requirements where the source (i.e., the on-premise Kafka cluster) always initiates the connection – even though the cloud Kafka clusters is consuming the data:

Source: Confluent

As you see, it gets complex quickly. Find the right experts to help you from the beginning; not after you already deployed the first clusters and applications.

A long time ago, I already described in a detailed presentation of the architecture patterns for distributed, hybrid, edge, and global Apache Kafka deployments. Look at that slide deck and video recording for more details about the deployment options and trade-offs.

RPO vs. RTO = Data Loss vs. Downtime

RPO and RTO are two critical KPIs you need to discuss before deciding for a Kafka cluster strategy:

RPO (Recovery Point Objective) is the maximum acceptable amount of data loss measured in time, indicating how frequently backups should occur to minimize data loss.
RTO (Recovery Time Objective) is the maximum acceptable duration of time it takes to restore business operations after a disruption. Together, they help organizations plan their data backup and disaster recovery strategies to balance cost and operational impact.

While people often start with the goal of RPO = 0 and RTO = 0, they quickly realize how hard (but not impossible) it is to get this. You need to decide how much data are you okay to lose in a disaster? You need a disaster recovery plan if disaster strikes. The legal and compliance teams will have to tell you if it is okay to lose a few data sets in case of disaster or not. These any many other challenges need to be discussed when evaluating your Kafka cluster strategy.

The replication between Kafka clusters with tools like MIrrorMaker or Cluster Linking is asynchronous and RPO > 0. Only a stretched Kafka cluster provides RPO = 0.

Stretched Kafka Cluster – Zero Data Loss with Synchronous Replication across Data Centers

Most deployments with multiple Kafka clusters use asynchronous replication across data centers or clouds via tools like MirrorMaker or Confluent Cluster Linking. This is good enough for most use cases. But in case of a disaster, you lose a few messages. The RPO is > 0.

A stretched Kafka cluster deploys Kafka brokers of ONE SINGLE CLUSTER across three data centers. The replication is synchronous (as this is how Kafka replicates data within one cluster) and guarantees zero data loss (RPO = 0) – even in the case of a disaster!

Why shouldn’t you always do stretched clusters?

Low latency (<~50ms) and stable connection required between data centers
Three (!) data centers are needed, two is not enough as a majority (quorum) must acknowledge writes and reads to ensure the system’s reliability
Hard to set up, operate, and monitor – much harder than a cluster running in one data center
Cost vs. value is not worth it in many use cases – during a real disaster, most organizations and use cases have bigger problems than losing a few messages (even if it is critical data like a payment or order).

To be clear: In the public cloud, a region usually has three data centers (= availability zones). Hence, in the cloud, it depends on your SLAs if one cloud region counts as a stretched cluster or not. Most SaaS Kafka offerings deploy in a stretched cluster here. However, many compliance scenarios do NOT see a Kafka cluster in one cloud region as good enough for guaranteeing SLAs and business continuity if a disaster strikes.

Confluent built a dedicated product to solve (some of) these challenges: Multi-Region Clusters (MRC). It provides capabilities to do synchronous and asynchrounous replication within a stretched Kafka cluster.

For example, in a financial services scenario, MRC replicates low-volume critical transactions synchronously, but high-volume logs asynchronously:

handles ‘Payment’ transactions enter from us-east and us-west with fully synchronous replication
‘Log’ and ‘Location’ information in the same cluster use async – optimized for latency
Automated disaster recovery (zero downtime, zero data loss)

More details about stretched Kafka clusters vs. active-active / active-passive replication between two Kafka clusters in my global Kafka presentation.

Pricing of Kafka Cloud Offerings (vs. Self-Managed)

The above sections explain why you need to consider different Kafka architectures depending on your project requirements. Self-managed Kafka clusters can be configured the way you need. In the public cloud, fully managed offerings look different (the same way as any other fully managed SaaS). Pricing is different because SaaS vendors need to configure reasonable limits. The vendor has to provide specific SLAs.

The data streaming landscape includes various Kafka cloud offerings. Here is an example of Confluent’s current cloud offerings, including multi-tenant and dedicated environments with different SLAs, security features, and cost models.

Source: Confluent

Make sure to evaluate and understand the various cluster types from different vendors available in the public cloud, including TCO, provided uptime SLAs, replication costs across regions or cloud providers, and so on. The gaps and limitations are often intentionally hidden in the details.

For instance, if you use Amazon Managed Streaming for Apache Kafka (MSK), you should be aware that the terms and conditions tell you that “The service commitment does not apply to any unavailability, suspension or termination … caused by the underlying Apache Kafka or Apache Zookeeper engine software that leads to request failures”.

But pricing and support SLAs are just one critical piece of such a comparison. There are lots of “build vs. buy” decisions you have to make as part of evaluating a data streaming platform, as I pointed out in my detailed article comparing Confluent to Amazon MSK Serverless.

Kafka Storage – Tiered Storage and Iceberg Table Format to Store Data Only Once

Apache Kafka added Tiered Storage to separate compute and storage. The capability enables more scalable, reliable, and cost-efficient enterprise architectures. Tiered Storage for Kafka enables a new Kafka cluster type: Storing Petabytes of data in the Kafka commit log in a cost-efficient way (like in your data lake) with timestamps and guaranteed ordering to travel back in time for re-processing historical data. KOR Financial is a nice example of using Apache Kafka as a database for long-term persistence.

Kafka enables a Shift Left Architecture to store data only once for operational and analytical datasets:

With this in mind, think again about the use cases I described above for multiple Kafka clusters. Should you still replicate data in batch at rest in the database, data lake, or lakehouse from one data center or cloud region to another? No. You should synchronize data in real-time, store the data once (usually in an object store like Amazon S3), and then connect all analytical engines like Snowflake, Databricks, Amazon Athena, Google Cloud BigQuery, and so on to this standard table format.

Learn more about the unification of operational and analytical data in my article “Apache Iceberg – The Open Table Format for Lakehouse AND Data Streaming“.

Real-World Success Stories for Multiple Kafka Clusters

Most organizations have multiple Kafka clusters. This section explores four success stories across different industries:

Paypal (Financial Services) – US: Instant payments, fraud prevention.
JioCinema (Telco/Media) – APAC: Data integration, clickstream analytics, advertisement, personalization.
Audi (Automotive/Manufacturing) – EMEA: Connected cars with critical and analytical requirements.
New Relic (Software/Cloud) – US: Observability and application performance management (APM) across the world.

Paypal – Separation by Security Zone

PayPal is a digital payment platform that allows users to send and receive money online securely and conveniently around the world in real time. This requires a scalable, secure and compliant Kafka infrastructure.

During the 2022 Black Friday, Kafka traffic volume peaked at about 1.3 trillion messages daily! At present, PayPal has 85+ Kafka clusters, and every holiday season they flex up their Kafka infrastructure to handle the traffic surge. The Kafka platform continues to seamlessly scale to support this traffic growth without any impact on their business.

Today, PayPal’s Kafka fleet consists of over 1,500 brokers that host over 20,000 topics. The events are replicated among the clusters, offering 99.99% availability.

Kafka cluster deployments are separated into different security zones within a data center:

Source: Paypal

The Kafka clusters are deployed across these security zones, based on data classification and business requirements. Real-time replication with tools such as MirrorMaker (in this example, running on Kafka Connect infrastructure) or Confluent Cluster Linking (using a simpler and less error-prone approach directly using the Kafka protocol for replication) is used to mirror the data across the data centers, which helps with disaster recovery and to achieve inter-security zone communication.

JioCinema – Separation by Use Case and SLA

JioCinema is a rapidly growing video streaming platform in India. The telco OTT service is known for its expansive content offerings, including live sports like the Indian Premier League (IPL) for cricket, a newly launched Anime Hub, and comprehensive plans for covering major events like the Paris 2024 Olympics.

The data architecture leverages Apache Kafka, Flink, and Spark for data processing, as presented at Kafka Summit India 2024 in Bangalore:

Source: JioCinema

Data streaming plays a pivotal role in various use cases to transform user experiences and content delivery. Over ten million messages per second enhance analytics, user insights, and content delivery mechanisms.

JioCinema’s use cases include:

Inter Service Communication
Clickstream / Analytics
Ad Tracker
Machine Learning and Personalization

Kushal Khandelwal, Head of Data Platform, Analytics, and Consumption at JioCinema, explained that not all data is equal and the priorities and SLAs differ per use case:

Source: JioCinema

Data streaming is a journey. Like so many other organizations worldwide, JioCinema started with one large Kafka cluster using 1000+ Kafka Topics and 100,000+ Kafka Partitions for various use cases. Over time, a separation of concerns regarding use cases and SLAs developed into multiple Kafka clusters:

Source: JioCinema

The success story of JioCinema shows the common evolution of a data streaming organization. Let’s now explore another example where two very different Kafka clusters were deployed from the beginning for one use case.

Audi – Operations vs. Analytics for Connected Cars

The car manufacturer Audi provides connected cars featuring advanced technology that integrates internet connectivity and intelligent systems. Audi’s cars enable real-time navigation, remote diagnostics, and enhanced in-car entertainment. These vehicles are equipped with Audi Connect services. Features include emergency calls, online traffic information, and integration with smart home devices, to enhance convenience and safety for drivers.

Source: Audi

Audi presented their connected car architecture in the keynote of Kafka Summit in 2018. The Audi enterprise architecture relies on two Kafka clusters with very different SLAs and use cases.

Source: Audi

The Data Ingestion Kafka cluster is very critical. It needs to run 24/7 at scale. It provides last-mile connectivity to millions of cars using Kafka and MQTT. Backchannels from the IT side to the vehicle help with service communication and over-the-air updates (OTA).

ACDC Cloud is the analytics Kafka cluster of Audi’s connected car architecture. The cluster is the foundation of many analytical workloads. These process enormous volumes of IoT and log data at scale with batch processing frameworks, like Apache Spark.

This architecture was already presented in 2018. Audi’s slogan “Progress through Technology” shows how the company applied new technology for innovation long before most car manufacturers deployed similar scenarios. All sensor data from the connected cars is processed in real time and stored for historical analysis and reporting.

New Relic – Worldwide Multi-Cloud Observability

New Relic is a cloud-based observability platform that provides real-time performance monitoring and analytics for applications and infrastructure to customers around the world.

Andrew Hartnett, VP of Software Engineering, at New Relic explains how data streaming is crucial for the entire business model of New Relic:

“Kafka is our central nervous system. It is a part of everything that we do. Most services across 110 different engineering teams with hundreds of services touch Kafka in some way, shape, or form in our company, so it really is mission-critical. What we were looking for is the ability to grow, and Confluent Cloud provided that.”

New Relic ingested up to 7 billion data points per minute; on track to ingest 2.5 exabytes of data in 2023. As New Relic expands its multi-cloud strategies, teams will use Confluent Cloud for a single pane of glass view across all environments.

“New Relic is multi-cloud. We want to be where our customers are. We want to be in those same environments, in those same regions, and we wanted to have our Kafka there with us.” says Artnett in a Confluent case study.

Multiple Kafka Clusters are the Norm; Not an Exception!

Event-driven architectures and stream processing have existed for decades. The adoption grows with open source frameworks like Apache Kafka and Flink in combination with fully managed cloud services. More and more organizations struggle with their Kafka scale. Enterprise-wide data governance, center of excellence, automation of deployment and operations, and enterprise architecture best practices help to successfully provide data streaming with multiple Kafka clusters for independent or collaborating business domains.

Multiple Kafka clusters are the norm, not an exception. Use cases such as hybrid integration, disaster recovery, migration or aggregation enable real-time data streaming everywhere with the needed SLAs.

How does your enterprise architecture look like? How many Kafka clusters do you have? And how do you decide about data governance, separation of concerns, multi-tenancy, security, and similar challenges in your data streaming organization? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Apache Kafka Cluster Type Deployment Strategies appeared first on Kai Waehner.

My Data Streaming Journey with Kafka & Flink: 7 Years at Confluent

Kai Waehner — Fri, 03 May 2024 01:31:02 +0000

Time flies… I joined Confluent seven years ago when Apache Kafka was mainly used by a few tech giants and the company had ~100 employees. This blog post explores my data streaming journey, including Kafka becoming a de facto standard for over 100,000 organizations, Confluent doing an IPO on the NASDAQ stock exchange, 5000+ customers adopting a data streaming platform, and emerging new design approaches and technologies like data mesh, GenAI, and Apache Flink. I look at the past, present and future of my personal data streaming journey. Both, from the evolution of technology trends and the journey as a Confluent employee that started in a Silicon Valley startup and is now part of a global software and cloud company.

Disclaimer: Everything in this article reflects my personal opinions. This is particularly important when you talk about the outlook of a publicly listed company.

PAST: Apache Kafka is pretty much unknown outside of Silicon Valley in 2017

When I joined Confluent in 2017, most people did not know about Apache Kafka. Confluent was in the early stage with ~100 employees.

Tech: Big data with Hadoop and Spark as “game changer”; Kafka only the ingestion layer

2017 was a time where most companies installed Cloudera or Hortonworks. Hadoop + Spark and Kafka as ingestion layer. That was the starting point of using Apache Kafka. The predominant use case for Kafka at that time was data ingestion into Hadoop’s storage system HDFS. Map Reduce and later Apache Spark batch processes analyzed big data sets.

“The cloud” was not that big thing yet and the container wars were still going on (Kubernetes vs. Cloud Foundry vs. Mesosphere).

I announced my departure from TIBCO and the fact that I will join Confluent in a blog post in May 2017: “Why I move (back) to open source for messaging, integration and stream processing“. If you look at my predictions, my outlook was not too bad.

I was right about disruptive trends:

Massive adoption of open source (beyond Linux)
Companies moved from batch to real-time because real-time data beats slow data in almost all use cases across industries
Adoption of machine learning for improving existing business processes and innovation
From the Enterprise Service Bus (ESB) – called iPaaS in the cloud today – to more cloud-native middleware, i.e., Apache Kafka (many Kafka projects today are integration projects)
Kafka being complementary to other data platforms, including data warehouse, data lake, analytics engines, etc.

What I did not see coming:

The massive transition to the public cloud
Apache Kafka being an event store for out-of-the-box capabilities like true decoupling of applications (what is the foundation and de facto standard for event-based microservices and data mesh today) and replayability of historical events in guaranteed ordering with timestamps
Generative AI (GenAI) as a specific pillar of AI / ML (did anyone see this coming seven years ago?)

Company: Confluent is a silicon valley startup with ~100 people making Kafka enterprise-ready

Confluent was a traditional Silicon Valley startup in 2017 with ~100 employees and backed by venture capital. The initial HR conversation and first interview (and company pitch) was done by our CEO Jay Kreps. I still have Jay’s initial email in my mailbox. It started with this:

“Hi Kai, I’m the CEO of Confluent and one of the co-creators of Apache Kafka. Given your experience at Tibco maybe you’ve run into Kafka before? Confluent is the company we created that is taking Kafka to the enterprise. […] We aim to go beyond just messaging and really make the infrastructure for real-time data streams a foundational element of a modern company’s data architecture.“

In 2017, Confluent was just starting to kick off its global business. The United Kingdom and Germany are usually the first two countries outside the US for Silicon Valley startups because of their large economy and no language barrier in the UK.

I will not publish my response to Jay’s email with respect to my former employer, but I still can quote the following sentence I responded: “I really like to hear from you that you want to go beyond just messaging and really make the infrastructure for real-time data streams a foundational element of a modern company’s data architecture“. Mission accomplished. That’s where Confluent is today.

Working in an international overlay role for Confluent…

I also pointed out in the very first email to Confluent that I am already in an overlay role and work internationally. While I was officially starting as the first German employee for presales and solution consulting, I only signed the contract because everybody in Confluent’s executive team understood and agreed on supporting me to continue my career in an overlay role doing a mix of sales, marketing, enablement, evangelism, etc. internationally.

A few weeks later, I was already in the Confluent headquarters in Palo Alto, California:

I am still more or less in the same role today. However, with much more focus on executive conversations and business perspective instead of “just” technology. Like the technology developed, so did the conversations. Check out my career analysis describing what I do as Field CTO if you want to learn more.

While Confluent moved to a bigger office in Mountain View, California, the Kafka tree still exists today:

PRESENT: Everyone knows Kafka (and Confluent); most companies use it already in 2024

Apache Kafka is used in most organizations in 2024. Many enterprises are still in the early stage of the adoption curve and are building some streaming data pipelines. However, several companies, not just in Silicon Valley but worldwide and across industries, already matured and leverage stream processing with Kafka Streams or Apache Flink for advanced and critical use cases like fraud detection, context-specific customer personalization or predictive maintenance.

Tech: Apache Kafka is the de facto standard for data streaming

Over 100,000 organizations use Apache Kafka in 2024. This is an insane number. Apache Kafka became the de facto standard for data streaming. Data streaming is much more than just real-time messaging for transactional and analytics workloads. Most customers leverage Kafka Connect for data integration scenarios to legacy and cloud-native data sources and sinks. Confluent provides an entire ecosystem of integration capabilities like 100+ connectors, clients for any programming language, Confluent Stream Sharing, any many other integration alternatives. Always with security and data governance in mind. Enterprise-ready, as most people call it. And in the cloud, all of this is fully managed.

In the meantime, Apache Flink establishes itself as de facto standard for stream processing. Here is an interesting diagram showing the analogy of growth:

Source: Confluent

Various vendors build products and cloud services around the two successful open source data streaming frameworks: Apache Kafka and/or Flink. Some vendors leverage the open source frameworks, while others only rely on the open protocol to implement their own solutions to differentiate:

All major cloud providers provide Kafka as a service (AWS, Azure, GCP, Alibaba).
Many of the largest traditional software players include a Kafka product (including IBM, Oracle, and many more).
Established data companies support Kafka and/or Flink, like Confluent, Cloudera, Red Hat, Ververica, etc.
New startups emerge, including Redpanda, WarpStream, Decodable, and so on.

Data Streaming Landscape (vs. Data Lake, Data Warehouse and Lakehouse)

The current “Data Streaming Landscape 2024” provides a much more detailed overview. There will probably be some consolidation in the market. But it is great to see such an adoption and growth in the data streaming market.

While new concepts (e.g., data mesh) and technologies (e.g., GenAI) emerged in the past few years, one thing is clear: The fundamental value of event-driven architectures using data streaming with Kafka and Flink does not change: data in motion is much more valuable for most use cases and business instead of just storing and analyzing data at rest in a data warehouse, data lake, or in 2024 using an innovative lakehouse.

It is worth reading my blog series comparing data streaming with data lakes and data warehouses. These technologies are complementary, (mostly) not competitive.

I also published a blog series recently exploring how data streaming changes the view on Snowflake (and cloud data platforms in general) from an enterprise architecture perspective:

Company: Confluent is a global player with 3000+ employees and listed on the NASDAQ

Confluent is a global player in the software and cloud industry employing ~3000 reaching $1 Billion ARR soon. As announced a few earnings calls back, the company now also focuses on profit instead of just growth, and the last quarter was the first profitable quarter in the company’s history. This is a tremendous achievement looking into Confluent’s future.

Unfortunately, even in 2024, many people still struggle to understand event-driven architectures and stream processing. One of my major tasks in 2024 at Confluent is to educate people – internal, partners, customers/prospects, and the broader community – about data streaming:

What is data streaming?
What are the benefits, differences, and trade-offs compared to traditional software design patterns (like APIs, databases, message brokers, etc.) and related products/cloud services?
What use cases do companies use data streaming for?
How do industry-specific deployments look like (this differs a lot in financial services vs. retail vs. manufacturing vs. telco vs. gaming vs. public sector)?
What is the business value (reduced cost, increased revenue, reduced disk, improved customer experience, faster time to market)?

“The Past, Present and Future of Stream Processing” shows the evolution and looks at new concepts like emerging streaming databases and the unification of operational and analytical systems using Apache Iceberg or similar table formats.

Confluent is a well-known software and cloud company today. As part of my job, I present at international conferences, give press interviews, brief research analysts like Gartner/Forrester, and write public articles to let people know (in as simple as possible words) what data streaming is and why the adoption is so massive across all regions and industries.

Confluent Partners: Cloud Service Providers, 3rd Party Vendors, System Integrators

Confluent strategically works with cloud service providers (AWS, Azure, GCP, Alibaba), software / cloud vendors (the list is too long to name everyone), and system integrators. While some people still think about a company like AWS as an enemy, it is much more a friend to co-sell data streaming in combination with other cloud services via the Amazon marketplace.

The list of strategic partners grows year by year. One of the most exciting announcements of 2023 was the strategic partnership between SAP and Confluent to connect S/4Hana ERP and other systems with the rest of the software and cloud world using Confluent.

Confluent Customers: From Open Source Kafka to Hybrid Multi-Cloud

Confluent has over 5000 customers already. I talk about many of these customer journeys in by blogs. Just search for your favorite industry to learn more. One exciting example is the evolution of the data streaming adoption at BMW. Coming from a wild zoo of deployments, BMW has standardized on Confluent, including self-service, data governance, and global rollouts for smart factory, logistics, direct-to-consumer sales and marketing, and many other use cases.

BMW hosts an internal Kafka Wiesn (= Oktoberfest) every year where we sponsor some pretzels and internal teams and external partners like Michelin present new projects, use cases, success stories, architectures and best practices around the data streaming world for transactional and analytical workloads. Here is a picture of our event in 2023 where my colleague Evi Schneider and I visited the BMW headquarters:

FUTURE: Data streaming is a new software category in 2024+

Thinking about Gartner’s famous hype cycle, we are reaching the “plateau of productivity”. Thanks to mature open source frameworks, sophisticated (but far from perfect) products, and fully managed SaaS cloud offerings make the mass adoption of data streaming possible in the next few years.

Tech: Standards and (multi-cloud) SaaS are the new black

Data streaming is much more than just a better or more scalable messaging solution, a new integration platform, or a cloud-native processing platform. Data streaming is a new software category. Period. Even open source Kafka provides so many capabilities people don’t know, for instance, exactly-once semantics (EOS) for transactions, tiered storage API for separation of compute and storage, Kafka Connect for data integration and Kafka Streams for stream processing (both natively using the Kafka protocol), and so much more.

In December 2023, the research company Forrester published “The Forrester Wave: Streaming Data Platforms, Q4 2023“. Get free access to the report here. The leaders are Microsoft, Google, and Confluent, followed by Oracle, Amazon, Cloudera, and a few others. IDC followed in early 2024 with a similar report. This is a huge proof that the category of data streaming is attested. A Gartner Magic Quadrant for Data Streaming will hopefully (and likely) follow soon, too…

Cloud services make mass adoption very easy and affordable. Consumption-based pricing allows cost-sensitive exploration and adoption. I won’t take a look at different competing offerings in this blog post, check out the “Data Streaming Landscape 2024” and the “Comparison of Open Source Apache Kafka and Vendor Solutions / Cloud Service” for more details. I just want to say one thing: Make sure to evaluate open source frameworks and different products correctly. Read the terms and conditions. Understand the support agreements and expertise of a vendor. If a product offers you “Kafka as Windows .exe file download” or a cloud provider “excludes Kafka support in the support contract from its Kafka cloud offering”, then something is wrong with the offering. Both examples are true and available to be paid for in today’s Kafka landscape!

Company: Confluent is more than just Kafka and Flink; it is becoming the provider of a true data streaming platform

In the past years, the company transitioned from “the Kafka vendor” into a data streaming platform. Confluent still does only one thing (data streaming), but better than everyone else regarding product, support and expertise. I am huge fan of this approach compared to vendors with a similar number of employees that try to (!) solve every (!) problem.

As Confluent is a public company, it is possible to attend the quarterly earnings calls to learn about the product strategy and revenue/growth.

From a career perspective, I still enjoy doing the same thing I did when I started at Confluent seven years ago. I transitioned into the job role of a Global Field CTO, focusing more on executive and business conversations, not just focusing on the technology itself. This is a job role that comes up more and more in software companies. There is no standard definition for this job role. As I regularly get the question about what a Field CTO does, I summarized the tasks in my “Daily Life As A Field CTO“. The post concludes with the answer to how you can also become a Field CTO at a software company in your career.

Data streaming is still in an early stage…

Where are we today with data streaming as a paradigm and Confluent as a company? We are still early. This is comparable to Oracle where they started with a database. Data streaming is accepted as a new software category by many experts and research analysts. But education about the paradigm shift and business value is still one of the biggest challenges. Data streaming is a journey – learn from various companies across industries that already went through this in the past years.

I was really excited to start at Confluent in May 2017. I visited Confluent’s London and Palo Alto headquarters in the first weeks and also attended Kafka Summit in New York. It was an exciting month to get started in an outstanding Silicon Valley startup. Today, I still visit our headquarters regularly for executive briefings, and Kafka Summit or similar events from Confluent like Current and the Data in Motion Tour around the world.

I hope this was an interesting report about my past seven years in the data streaming world at Confluent. What is your opinion about the future of open source technologies like Apache Kafka and Flink, the transition to the cloud, and the outlook for Confluent as a company? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post My Data Streaming Journey with Kafka & Flink: 7 Years at Confluent appeared first on Kai Waehner.

Cloud Archives - Kai Waehner

Virta’s Electric Vehicle (EV) Charging Platform with Real-Time Data Streaming: Scalability for Large Charging Businesses

The Evolution and Challenges of Electric Vehicle (EV) Charging

Virta: Innovating the Future of EV Charging

Virta Platform Connecting 100,000+ Charging Stations Serving Millions of EV Drivers

Innovative Industry Partnerships: Virta and Valeo

The Role of Data Streaming in ESG and EV Charging

Virta’s Data Streaming Transformation

Scaling Challenges and the Need for Real-Time Processing

Deploying A Data Streaming Platform for Scalable EV Charging

Key Benefits of a SaaS Data Streaming Platform for Virta

Data Streaming Landscape: Spoilt for Choice – Open Source Kafka, Confluent, and many other Vendors

Business Impact of a Data Streaming Platform

1. Faster Time to Market

2. Instant Updates for Customers and Operators

3. Cost Savings through Usage-Based Pricing

4. Future-Ready Infrastructure for Advanced Analytics

Beyond EV Charging: Broader Energy and ESG Use Cases

The Future of EV Charging with Real-Time Data Streaming using Kafka and Flink

The Importance of Focus: Why Software Vendors Should Specialize Instead of Doing Everything (Example: Data Streaming)

Specialization vs. Generalization: Why Data Streaming Requires a Focused Approach

The Data Streaming Landscape

The Challenge of Doing Everything

Amazon AWS: Multiple Data Streaming Services, Multiple Choices

Strategy Shift and Rebranding with Multiple Product Portfolios

Google Cloud: Multiple Approaches to Streaming Analytics

Microsoft Azure: Shifting Strategies in Data Streaming

Instaclustr: Too Many Technologies, Not Enough Depth

Cloudera: A Broad Portfolio Without a Clear Focus

Splunk: Repeated Attempts at Data Streaming

Why a Focused Approach Works Better for Data Streaming

Confluent: Focused on Data Streaming, Built for Everywhere

More Than Just “The Kafka Company”

The Right Data Streaming Platform for Every Use Case

Deep Integrations Across Key Ecosystems

Beyond the Leader: Specialized Vendors Shaping Data Streaming

Great Ideas Are Born From Market Pressure

How Customers Benefit from Specialization

Why Deep Expertise Matters: Supporting 24/7, Mission-Critical Data Streaming

The Challenge with Generalist Cloud Services

Why Specialized Vendors Are Essential for Mission-Critical Workloads

The Power of Specialization: Deep Expertise Beats Broad Offerings

Data Streaming as the Technical Foundation for a B2B Marketplace

Subscription Commerce with a Digital Marketplace

The Competitive Landscape for Subscription Commerce

What Makes a B2B Data Marketplace Technically Unique?

Data Streaming as the Backbone of a B2B Data Marketplace

Why Apache Kafka for a Data Marketplace?

Architecture Overview

Data Sharing Beyond Kafka with Stream Sharing and Self-Service Data Portal

Data Marketplace Features and Their Technical Implementation

Self-Service Data Discovery

Real-Time Subscription Management

Usage-Based Billing

Monetization and Revenue Sharing

Compliance and Data Governance

Dynamic Pricing Models

Marketplace Analytics

Real-World Success Story: AppDirect’s Subscription Commerce Platform Powered by a Data Streaming Platform

The Challenge

The Solution

The Outcome

Advantages Over Competitors in the Subscription Commerce and Data Marketplace Business

Data Streaming Revolutionizes B2B Data Sharing

Fully Managed (SaaS) vs. Partially Managed (PaaS) Cloud Services for Data Streaming with Kafka and Flink

The Data Streaming Landscape: Kafka, Flink, Cloud, and More

What is SaaS in Data Streaming?

Characteristics of SaaS in Data Streaming

Examples of SaaS in Data Streaming:

What is PaaS in Data Streaming?

Characteristics of PaaS in Data Streaming

Examples of PaaS in Data Streaming (Kafka, Flink)

SaaS vs. PaaS: Key Differences

SaaS is NOT Always Better than PaaS!

Bring Your Own Cloud (BYOC) as Alternative to PaaS

“Serverless” Claims: Don’t Trust the Marketing

Do Your Own Research

Choosing the Right Model for Your Data Streaming Business: SaaS, PaaS, or BYOC

IoT and Data Streaming with Kafka for a Tolling Traffic System with Dynamic Pricing

Traffic System for Tolls: Use Case, Challenges, and Business Models