Virta’s Electric Vehicle (EV) Charging Platform with Real-Time Data Streaming: Scalability for Large Charging Businesses https://www.kai-waehner.de/blog/2025/04/22/virtas-electric-vehicle-ev-charging-platform-with-real-time-data-streaming-scalability-for-large-charging-businesses/ (22 Apr 2025)

The Electric Vehicle (EV) revolution is here, but scaling charging infrastructure and integrating it with the energy system presents challenges: rapid power supply and demand fluctuations, billing complexity, and real-time availability updates. Virta, a global leader in smart EV charging, is leveraging real-time data streaming to optimize operations, improve user experience, and drive sustainability. By integrating Apache Kafka and Confluent Cloud, Virta ensures seamless energy distribution, predictive maintenance, and dynamic pricing for a smarter, greener future. Read how data streaming is transforming EV charging and enabling scalable, intelligent infrastructure.

Electric Vehicle (EV) Charging - Automotive and ESG with Data Streaming at Virta

I spoke with Jussi Ahtikari (Chief AI Officer at Virta) at a HotTopics C-Suite Exchange about Virta’s business model around EV charging networks and how the company leverages data streaming. The following is a summary of this excellent success story about an innovative EV charging platform.

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter, and follow me on LinkedIn or X (formerly Twitter) to stay in touch. And make sure to download my free book about data streaming use cases, including several success stories around Kafka and Flink to improve ESG.

The Evolution and Challenges of Electric Vehicle (EV) Charging

The global shift towards electric vehicles (EVs) is accelerating, driven by the surge in variable renewable energy (wind, solar) production, the need for sustainable and more cost-efficient transportation solutions, government incentives, and rapid advancements in battery technology. EV charging infrastructure plays a critical role in making this transition successful. It ensures that drivers have access to reliable and efficient charging options while keeping the costs of energy and charging operations in check and the energy system in balance.

The innovation in EV charging goes beyond simply providing power to vehicles. Intelligent charging networks, dynamic pricing models, and energy management solutions are transforming the industry. Sustainability is also a key factor, as efficient energy consumption and integration with the renewable energy system contribute to environmental, social, and governance (ESG) goals.

As user numbers and charged energy volumes grow, the real-time interplay with the energy system, demand fluctuations, complex billing systems, and real-time station availability updates require a scalable and resilient data infrastructure. Delays in processing real-time data can lead to inefficient energy distribution, poor user experience, and lost revenue.

Virta: Innovating the Future of EV Charging

Virta is a digital cloud platform for electric vehicle (EV) charging businesses and a global leader in connecting smart charging infrastructure and EV battery capacity with the renewable energy system via bi-directional charging (V2G) and demand response (V1G).

The digital Virta EV Energy platform provides a comprehensive suite of solutions for charging businesses to launch and manage their own EV charging networks. Virta’s full-service charging platform enables Charging Network and Business Management, Transactions, Pricing, Payments and Invoicing, EV Driver and Fleet Services, Roaming, Energy Management, and Virtual Power Plant services.

Its Charge Point Management System (CPMS) supports over 450 charger models, allowing seamless integration with third-party infrastructure. Virta is the only provider that combines a CPMS with an energy flexibility platform.

Virta EV Charging Platform
Source: Virta

Virta Platform Connecting 100,000+ Charging Stations Serving Millions of EV Drivers

The Virta platform is utilised by professional charge point operators (CPOs) and e-mobility service providers (EMPs) across energy, petrol, retail, automotive and real estate industries in 36 countries in Europe and South-East Asia. Virta is headquartered in Helsinki, Finland.

Virta manages real-time data from well over 100,000 EV charging stations, serving millions of EV drivers, and processes approximately 40 GB of real-time data every hour. Including roaming partnerships, the platform offers EV drivers access to over 620,000 public charging stations in total, across more than 60 countries.

With this scale, real-time responsiveness is critical. Each time a charging station sends a signal—for example, when a driver starts charging—the platform must immediately trigger a series of actions:

  • Start billing
  • Update real-time status in mobile apps
  • Notify roaming networks
  • Update metrics and statistics
  • Conduct fraud checks

In the early days of electric mobility, all of these operations could be handled in a monolithic system using tightly coupled and synchronized code. According to Jussi Ahtikari, Chief AI Officer at Virta, this would have made the system “complex, difficult to maintain, and hard to scale” as data volumes grew. The team therefore identified early on the need for a more modular, scalable, and real-time architecture to support its rapid growth and evolving service portfolio.
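To make the event-driven alternative concrete, here is a minimal, hypothetical sketch using the confluent-kafka Python client. The topic and field names are illustrative assumptions, not Virta’s actual schema: one charging event is published once, and each service consumes it independently via its own consumer group.

```python
# Hypothetical fan-out sketch with the confluent-kafka Python client.
# Topic and field names are illustrative, not Virta's actual schema.
import json
from confluent_kafka import Consumer, Producer

# A charging-station signal is published exactly once as an event...
producer = Producer({"bootstrap.servers": "localhost:9092"})
event = {"station_id": "st-4711", "session_id": "s-1", "type": "CHARGING_STARTED"}
producer.produce("charging-events", key=event["station_id"], value=json.dumps(event))
producer.flush()

# ...and every microservice consumes it independently via its own consumer
# group, so billing, app updates, roaming, and fraud checks stay decoupled.
def run_service(group_id: str):
    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": group_id,  # e.g. "billing", "mobile-status", "fraud-check"
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["charging-events"])
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        evt = json.loads(msg.value())
        print(f"{group_id} processing {evt['type']} for {evt['station_id']}")
```

Because each consumer group tracks its own offsets, adding a new capability such as an audit trail means deploying one more consumer, not touching the producer.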

Innovative Industry Partnerships: Virta and Valeo

Virta is also exploring new opportunities in the EV ecosystem through its partnership with Valeo, a leader in automotive and energy solutions. The companies are working on integrating Valeo’s Ineez charging technology with Virta’s CPMS platform to enhance fleet charging, leasing services, and vehicle-to-grid (V2G) capabilities.

Vehicle-to-grid technology enables EVs to act as distributed energy storage, feeding excess power back into the grid during peak demand. This innovation is expected to play a critical role in balancing electricity supply and demand, contributing to cheaper electricity and a more stable renewables-based energy system.

The Role of Data Streaming in ESG and EV Charging

Sustainability and environmental responsibility are key drivers of ESG initiatives in industries such as energy, transportation, and manufacturing. Data streaming plays a crucial role in achieving ESG goals by enabling real-time monitoring, predictive maintenance, and energy efficiency improvements.

In the EV charging industry, real-time data streaming supports intelligent energy distribution, predictive maintenance, dynamic pricing, and up-to-date station availability.

Foreseeing the growing need for these real-time insights led Virta to adopt a data streaming approach with Confluent.

Virta’s Data Streaming Transformation

To maintain its rapid growth and provide an exceptional customer experience, Virta needed a scalable, real-time data streaming solution. The company turned to Confluent’s data streaming platform (DSP), powered by Apache Kafka, to process millions of messages per hour and ensure seamless operations.

Scaling Challenges and the Need for Real-Time Processing

Virta’s rapid growth to millions of charging events and tens of gigawatt hours of charged energy per month across Europe and South-East Asia resulted in massive volumes of data that need to be processed instantly, something that legacy systems based on sequential authorization would have struggled with.

Without real-time updates, large scale charging operations would face issues such as:

  • Unclear station availability
  • Slow transaction processing
  • Inaccurate billing information

Initially, Virta worked with open-source Apache Kafka but found managing high-volume data streams at scale to be increasingly resource-intensive. The team therefore sought an enterprise-grade solution that would remove operational complexities while providing robust real-time capabilities.

Deploying A Data Streaming Platform for Scalable EV Charging

Confluent has become the backbone of Virta’s real-time data architecture. With Confluent’s event streaming platform, Virta is able to maintain a modern event-driven microservices architecture. Instead of tightly coupling all business logic into one system, each charging event—such as a driver starting a session—is published as a single, centralized event. Independent microservices subscribe to that event to trigger specific actions like billing, mobile app updates, roaming notifications, fraud detection, and more.

Here is a diagram of Virta’s cloud-native microservices architecture powered by AWS, Confluent Cloud, Snowflake, Redis, OpenSearch, and other technologies:

Virta Cloud-Native Microservices Architecture for EV Charging Platform powered by AWS, Confluent Cloud, Snowflake, Redis, OpenSearch
Source: Virta

This architectural shift with an event-driven architecture and the data streaming platform as central nervous system has significantly improved scalability, maintainability, and fault isolation. It has also accelerated innovation with fast roll-out times of new services, including audit trails, improved data governance through schemas, and the foundation for AI-powered capabilities—all built on clean, real-time data streams.

Key Benefits of a SaaS Data Streaming Platform for Virta

As a fully managed data streaming platform, Confluent Cloud has eliminated the need for Virta to maintain Kafka clusters manually, allowing its engineering teams to focus on innovation rather than infrastructure management:

  • Elastic scalability: Automatically scales up to handle peak loads, ensuring uninterrupted service.
  • Real-time processing: Supports 45 million messages per hour, enabling immediate updates on charging status and availability.
  • Simplified development: Tools such as Schema Registry and pre-built APIs provide a standardized approach for developers, speeding up feature deployment (a schema-enforcement sketch follows this list).
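In production, this kind of enforcement typically happens via Confluent Schema Registry. As a local stand-in, the following hedged sketch uses the `jsonschema` Python package to show the underlying idea of validating events against a contract before they are published; the schema and field names are invented for illustration.

```python
# Local stand-in for schema governance. Real deployments would register the
# schema in Confluent Schema Registry; this only illustrates the principle.
import json
from jsonschema import ValidationError, validate

charging_event_schema = {
    "type": "object",
    "properties": {
        "station_id": {"type": "string"},
        "kwh": {"type": "number", "minimum": 0},
        "status": {"enum": ["AVAILABLE", "CHARGING", "FAULTED"]},
    },
    "required": ["station_id", "status"],
}

def publish_if_valid(producer, event: dict):
    """Validate an event against the contract before it reaches the topic."""
    try:
        validate(instance=event, schema=charging_event_schema)
    except ValidationError as err:
        print(f"rejected event: {err.message}")  # dead-letter topic in practice
        return
    producer.produce("charging-events", key=event["station_id"],
                     value=json.dumps(event))
```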

Data Streaming Landscape: Spoilt for Choice – Open Source Kafka, Confluent, and many other Vendors

To navigate the evolving data streaming landscape, Virta chose a cloud-native, enterprise-grade platform that balances reliability, scalability, cost-efficiency, and ease of use. While many streaming technologies exist, Confluent offered the right trade-offs between operational simplicity and real-time performance at scale.

Read more about the different data streaming frameworks, platforms, and cloud services in the data streaming landscape overview: “The Data Streaming Landscape 2025 with Kafka, Flink, Confluent, Amazon MSK, Cloudera, Event Hubs, and Other Platforms”.

Business Impact of a Data Streaming Platform

By leveraging Confluent Cloud as its cloud-native and serverless data streaming platform, Virta has realized significant business benefits:

1. Faster Time to Market

Virta’s teams can now deploy new app features, charge points, and business services more quickly. The company has regained the agility of a startup, rolling out improvements without infrastructure bottlenecks.

2. Instant Updates for Customers and Operators

With real-time data streaming, Virta can update station availability and configuration changes in less than a second. This ensures that customers always have the latest information at their fingertips.

3. Cost Savings through Usage-Based Pricing

Virta’s shift to a usage-based pricing model has optimized its operational expenses. Instead of maintaining excess capacity, the company only pays for the resources it consumes.

4. Future-Ready Infrastructure for Advanced Analytics

Virta is building the future of real-time analytics, predictive maintenance, and smart billing by integrating Confluent with Snowflake’s AI-powered data cloud.

By decoupling data streams with Kafka, Virta ensures data consistency, scalability, and agility—enabling advanced analytics without operational bottlenecks.

Beyond EV Charging: Broader Energy and ESG Use Cases

Virta’s success with real-time data streaming highlights broader applications across the energy and ESG sectors. Similar data-driven solutions are being deployed for:

  • Smart grids: Real-time monitoring of electricity distribution to optimize supply and demand.
  • Renewable energy integration: Managing wind and solar power fluctuations with predictive analytics.
  • Industrial sustainability: Tracking carbon emissions and optimizing resource utilization.

The transition to electric mobility requires more than just an increase in charging stations. The ability to process and act on data in real time is critical to optimizing the use and costs of energy and infrastructure, enhancing user experience, and driving sustainability.

Virta’s usage of a serverless data streaming platform demonstrates the power of real-time data streaming in enabling scalable, efficient, and future-ready EV charging solutions. By eliminating infrastructure constraints, improving responsiveness, and reducing operational costs, Virta is setting new industry standards for innovation in mobility and energy management.

The EV charging landscape will grow tenfold within the next ten years and, especially with the mass adoption of bi-directional charging (V2G), will integrate seamlessly with the energy system. Real-time data streaming will serve as the cornerstone for this evolution, helping businesses navigate challenges while unlocking new opportunities for sustainability and profitability.

For more data streaming success stories and use cases, make sure to download my free ebook. Please let me know your thoughts, feedback and use cases on LinkedIn and stay in touch via my newsletter.

Tesla Energy Platform – The Power of Data Streaming with Apache Kafka https://www.kai-waehner.de/blog/2025/02/14/tesla-energy-platform-the-power-of-data-streaming-with-apache-kafka/ (14 Feb 2025)

Tesla’s Virtual Power Plant (VPP) is revolutionizing the energy sector by connecting home batteries, solar panels, and grid-scale storage into a real-time, intelligent energy network. Powered by Apache Kafka for event streaming and WebSockets for last-mile IoT integration, Tesla’s Energy Platform enables real-time energy trading, grid stabilization, and seamless market participation. By leveraging data streaming and automation, Tesla optimizes battery efficiency, prevents blackouts, and allows homeowners to monetize excess energy—all while making renewable energy more reliable and scalable. This software-driven approach showcases the power of real-time data in building the future of sustainable energy.

Tesla Energy Platform - The Power of Data Streaming with Apache Kafka

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter, and follow me on LinkedIn or X (formerly Twitter) to stay in touch. And make sure to download my free book about data streaming use cases across all industries.

What is a Virtual Power Plant?

A Virtual Power Plant (VPP) is a network of decentralized energy resources—such as home batteries, solar panels, and smart grid systems—that function as a single unit. Unlike a traditional power plant that generates electricity from a centralized location, a VPP aggregates power from many small, distributed sources. This allows energy to be dynamically stored and shared, helping to balance supply and demand in real time.

VPPs are crucial in the shift to renewable energy. The traditional power grid was designed around fossil fuel plants that could easily adjust output. Renewable energy sources like solar and wind are intermittent—they don’t generate power on demand. By connecting thousands of batteries and solar panels in homes and businesses, a VPP can smooth out fluctuations in power generation and consumption. This prevents blackouts, reduces energy costs, and enables homes and businesses to participate in energy markets.
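To make the balancing idea tangible, here is a toy Python sketch (my own illustration, not Tesla’s implementation) that aggregates distributed home batteries and splits a grid discharge request proportionally to the energy each battery can actually deliver:

```python
# Toy model of VPP aggregation (illustrative only, not Tesla's implementation).
from dataclasses import dataclass

@dataclass
class Battery:
    device_id: str
    capacity_kwh: float
    state_of_charge: float  # 0.0 .. 1.0

def dispatch(batteries: list[Battery], grid_request_kwh: float) -> dict[str, float]:
    """Split a grid discharge request across batteries, proportional to
    the energy each one can deliver right now."""
    available = {b.device_id: b.capacity_kwh * b.state_of_charge for b in batteries}
    total = sum(available.values())
    if total == 0:
        return {}
    served = min(grid_request_kwh, total)  # the fleet may not cover the full request
    return {dev: served * energy / total for dev, energy in available.items()}

fleet = [Battery("pw-1", 13.5, 0.8), Battery("pw-2", 13.5, 0.4), Battery("pw-3", 13.5, 0.9)]
print(dispatch(fleet, 20.0))  # roughly {'pw-1': 7.6, 'pw-2': 3.8, 'pw-3': 8.6}
```

The real platform must run this kind of decision continuously, for thousands of devices, under tight latency constraints, which is exactly where event streaming comes in.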

How Tesla’s Virtual Power Plant Fits Its Business Model

Tesla is not just an automaker. It is a sustainable energy company. Tesla’s product ecosystem includes electric vehicles, solar panels, home batteries (Powerwall), grid-scale energy storage (Megapack), and energy management software (Autobidder).

The Tesla Virtual Power Plant (VPP) ties all these elements together. Homeowners with Powerwalls store excess solar power during the day and feed it back to the grid when needed. Tesla’s Autobidder software automatically optimizes energy use and market participation, turning home batteries into revenue-generating assets.

For Tesla, the VPP strengthens its energy business, creating a scalable model that maximizes battery efficiency, stabilizes grids, and expands the role of software in energy markets. Tesla is not just selling batteries; it is selling energy intelligence.

Virtual Energy Platform and ESG (Environmental, Social, and Governance) Goals

Tesla’s energy platform is a perfect example of how data streaming and real-time decision-making align with ESG principles:

  • Environmental Impact: VPPs reduce reliance on fossil fuels by making renewable energy more reliable.
  • Social Benefit: By enabling energy independence, VPPs provide power during outages and extreme weather conditions.
  • Governance & Regulation: VPPs allow consumers to participate in energy markets, fostering decentralized energy ownership.

Tesla’s approach is smart grid innovation at scale: real-time data makes the grid more dynamic, efficient, and resilient.

My article “Green Data, Clean Insights: How Apache Kafka and Flink Power ESG Transformations” covers other real-world data streaming deployments in the energy sector like EON.

Tesla’s Energy Platform: A Network of Connected Home Energy Systems

Tesla’s VPP connects thousands of homes with Powerwalls, solar panels, and grid services. These systems work together to provide electricity on demand, reacting to supply fluctuations in real-time.

Key Functions of Tesla’s VPP:

  1. Energy Storage & Redistribution: Batteries store solar energy during the day and discharge at night or during peak demand.
  2. Grid Stabilization: The VPP balances energy supply and demand to prevent outages and fluctuations.
  3. Market Participation: Homeowners can sell excess power back to the grid, monetizing their batteries.
  4. Disaster Resilience: The VPP provides backup power during blackouts, storms, and grid failures.

This requires real-time data processing at massive scale—something traditional batch-based data architectures cannot handle.

Apache Kafka and Real-Time Data Streaming at Tesla

Tesla operates in many domains—automotive, energy, and AI. Across all these areas, Apache Kafka plays a critical role in enabling real-time data movement and stream processing.

In 2018, Tesla already processed trillions of IoT messages with Apache Kafka:

Tesla Automotive Journey from RabbitMQ to Apache Kafka for IoT Events
Source: Tesla

Tesla leverages stream processing to handle trillions of IoT events daily, using Apache Kafka to ingest, process, and analyze data from its vehicle fleet in real time. By implementing efficient data partitioning, fast and slow data lanes, and scalable infrastructure, Tesla optimizes vehicle performance, predicts failures, and enhances manufacturing efficiency.

These strategies demonstrate how real-time data streaming is essential for managing large-scale IoT ecosystems, ensuring low-latency insights while maintaining operational stability. To learn more about these use cases, read Tesla’s blog post “Stream Processing with IoT Data: Challenges, Best Practices, and Techniques”.

The following sections explore Tesla’s innovation for its virtual power plant, as discussed in an excellent presentation at QCon.

Tesla Energy Platform: Architecture of the Virtual Power Plant Powered by Apache Kafka

Tesla’s VPP uses Apache Kafka for:

  1. Telemetry Ingestion: Streaming data from millions of Powerwalls, solar panels, and Megapacks into the cloud.
  2. Command & Control: Sending real-time control commands to batteries and grid services.
  3. Market Participation: Autobidder analyzes real-time data and adjusts energy prices dynamically.

The event-driven architecture allows Tesla to react to energy demand in milliseconds—critical for balancing the grid.
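As a hedged illustration of the telemetry-ingestion leg, the following sketch uses the confluent-kafka Python client; the topic name, fields, and sign convention are assumptions for the example, not Tesla’s actual schema:

```python
# Hedged telemetry-ingestion sketch (topic, fields, and sign convention
# are assumptions for the example).
import json
import time
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def publish_telemetry(device_id: str, soc: float, power_kw: float):
    reading = {"device_id": device_id, "soc": soc,
               "power_kw": power_kw, "ts": time.time()}
    # Keying by device ID keeps each Powerwall's readings ordered within
    # one partition, which downstream aggregation can rely on.
    producer.produce("powerwall-telemetry", key=device_id,
                     value=json.dumps(reading))

publish_telemetry("pw-1", soc=0.82, power_kw=-3.4)  # negative = discharging (assumed)
producer.flush()
```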

Tesla’s Energy Platform is the software foundation of the VPP. It integrates OT (Operational Technology), IoT (Internet of Things), and IT (Information Technology) to control distributed energy assets.

Tesla Applications Built on the Energy Platform

Tesla’s Energy Platform powers a suite of applications that optimize energy management, market participation, and grid stability through real-time data streaming and automation.

Autobidder

  • Optimizes energy trading in real time.
  • Automatically bids into energy markets.

I wrote about other data streaming success stories for energy trading with Apache Kafka and Flink, including Uniper, re.alto, and Powerledger.

Distributed Virtual Power Plant

  • Aggregates thousands of Powerwalls into a single energy asset.
  • Provides grid stabilization and peak load balancing.

If you are interested in other smart grid infrastructures, check out “Apache Kafka for Smart Grid, Utilities and Energy Production”. The article covers how data streaming realizes IT/OT integration and describes some hybrid cloud IoT deployments.

Battery Control (Command & Control)

  • Ensures optimal charging and discharging of batteries.
  • Minimizes costs while maximizing energy efficiency.

Market Participation

  • Allows homeowners and businesses to profit from energy markets.
  • Ensures seamless grid integration of Tesla’s energy products.

Key Components of Tesla’s Energy Platform: Apache Kafka, WebSockets, Akka Streams

The combination of data streaming with Apache Kafka and the last-mile IoT integration via WebSockets builds the central nervous system of Tesla’s Energy Platform:

  1. Apache Kafka (Event Streaming):
    • Streams telemetry data from Powerwalls every second.
    • Ensures durability and reliability of data streams.
    • Allows real-time energy aggregation across thousands of homes.
  2. WebSockets (Last-Mile IoT Integration):
    • Provides low-latency bidirectional communication with Powerwalls.
    • Used to send real-time commands to home batteries.
  3. Pub/Sub (Command & Control):
    • Enables distributed energy resource coordination.
    • Ensures resilient messaging between systems.
  4. Business Logic (Applications & Microservices):
    • Tesla’s services are built with Scala and Python.
    • Uses gRPC & HTTP for inter-service communication.
  5. Digital Twins (Real-Time State Management):
    • Digital models of physical assets ensure real-time decision-making.
    • Tesla uses Akka Streams for stateful event processing.
  6. Kubernetes (Cloud Infrastructure):
    • Ensures scalability and resilience of Tesla’s energy microservices.

Tesla Virtual Power Plant Energy Architecture Using Apache Kafka WebSockets and Akka Streams
Source: Tesla

Interesting side note: While most energy companies I have seen rely on Kafka Streams or Apache Flink for stateful event processing, Tesla takes an interesting approach by leveraging Akka Streams (based on Akka’s Actor Model) to manage real-time digital twins of its energy assets. This choice provides fine-grained control over streaming workflows, but unlike Kafka Streams or Flink, Akka lacks widespread community adoption, making it a less common choice for many large-scale energy platforms. Kafka and Flink are a match made in heaven for most data streaming use cases.
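For readers unfamiliar with the pattern, here is a loose Python analogue of a real-time digital twin. Tesla implements this with Akka Streams in Scala; this sketch only conveys the idea of per-device state kept current by an event stream, with invented event fields:

```python
# Loose Python analogue of the digital-twin idea (Tesla uses Akka Streams in
# Scala; this only sketches per-device state updated by events).
class PowerwallTwin:
    def __init__(self, device_id: str):
        self.device_id = device_id
        self.soc = None
        self.online = False

    def apply(self, event: dict):
        # Each event advances the twin's view of the real device.
        if event["type"] == "telemetry":
            self.soc = event["soc"]
            self.online = True
        elif event["type"] == "offline":
            self.online = False

twins: dict[str, PowerwallTwin] = {}

def on_event(event: dict):
    twin = twins.setdefault(event["device_id"], PowerwallTwin(event["device_id"]))
    twin.apply(event)
    # Decisions (e.g. whether a battery can accept a discharge command)
    # are made against the twin's state, not by querying the device.

on_event({"device_id": "pw-1", "type": "telemetry", "soc": 0.77})
```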

Best Practice: Shift Left Architecture with Data Products for High-Volume IoT Data

Tesla leverages several data processing best practices to improve efficiency and consistency:

  • Canonical Kafka Topics: Data is filtered and structured at the source.
  • Consistent Downstream Services: Every consumer gets clean, structured data.
  • Real-Time Aggregation of Thousands of Batteries: A unique challenge that forms the foundation of the virtual power plant.

This data-first approach ensures Tesla’s energy platform can scale to millions of distributed assets.

Today, many people refer to the Shift Left Architecture when applying these best practices for processing data efficiently and continuously to provide data products in real-time and with good quality:

Shift Left Architecture with Data Streaming into Data Lake Warehouse Lakehouse

In Tesla’s Energy Platform, the data comes from IoT interfaces. WebSockets provide the last-mile integration and feed the events into the data streaming platform for continuous processing before the ingestion into the operational and analytical applications.
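A hedged sketch of that last-mile bridge in Python: WebSocket messages from devices are normalized at the source and produced to a canonical Kafka topic, so downstream consumers only ever see clean, structured events. It assumes the `websockets` (>= 10.1, single-argument handler) and `confluent-kafka` packages; topic names and payload fields are invented.

```python
# Hedged last-mile bridge: WebSocket messages are normalized at the source
# and produced to a canonical Kafka topic (the shift-left idea).
import asyncio
import json
import websockets
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

async def handle_device(websocket):  # websockets < 10.1 also passes a `path` arg
    async for raw in websocket:
        payload = json.loads(raw)
        # Filter and structure here, before anything reaches the canonical topic.
        event = {"device_id": payload["id"], "soc": float(payload["soc"])}
        producer.produce("canonical-telemetry", key=event["device_id"],
                         value=json.dumps(event))

async def main():
    async with websockets.serve(handle_device, "0.0.0.0", 8765):
        await asyncio.Future()  # run forever

asyncio.run(main())
```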

Tesla’s Energy Vision: How Streaming Data Will Shape Tomorrow’s Power Grids

Tesla’s Virtual Power Plant is not just about batteries—it’s about software, real-time data, and automation.

Why Data Streaming Matters for Tesla’s Energy Platform:

  1. Scalability: Can handle millions of energy devices.
  2. Resilience: Works even when devices go offline.
  3. Real-Time Decision Making: Adjusts energy distribution within milliseconds.
  4. Market Optimization: Autobidder ensures maximum revenue for battery owners.

Tesla’s VPP is a blueprint for the future of energy—one where real-time data streaming and intelligent software optimize renewable energy. By leveraging Apache Kafka, WebSockets, and stream processing, Tesla is redefining how energy is generated, distributed, and consumed.

This is not just an innovation in power generation—it’s an AI-driven energy revolution.

How do you leverage data streaming in the energy and automotive sector? Follow me on LinkedIn or X (formerly Twitter) to stay in touch and discuss. Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter. And make sure to download my free book about data streaming use cases across all industries.

How Michelin improves Aftermarket Sales and Customer Service with Data Streaming https://www.kai-waehner.de/blog/2023/10/02/how-michelin-improves-aftermarket-sales-and-customer-service-with-data-streaming/ (2 Oct 2023)

Aftermarket sales and customer service require the right information at the right time to make context-specific decisions. This post explains the modernization of supply chain business process choreography based on the real-life use case of Michelin, a tire manufacturer in France. Data Streaming with Apache Kafka enables true decoupling, domain-driven design, and data consistency across real-time and batch systems. Common business goals drove them: Increase customer retention, increase revenue, reduce costs, and improve time to market for innovation.

Aftermarket Sales and Customer Service with Data Streaming and Apache Kafka at Michelin

The State of Data Streaming for Manufacturing in 2023

The evolution of industrial IoT, manufacturing 4.0, and digitalized B2B and customer relations require modern, open, and scalable information sharing. Data streaming allows integrating and correlating data in real-time at any scale. Trends like software-defined manufacturing and data streaming help modernize and innovate the entire engineering and sales lifecycle.

I have recently presented an overview of trending enterprise architectures in the manufacturing industry and data streaming customer stories from BMW, Mercedes, Michelin, and Siemens. A complete slide deck and on-demand video recording are included:

This blog post explores one of the enterprise architectures and case studies in more detail: Context-specific aftersales and service management in real-time with data streaming.

What is Aftermarket Sales and Service? And how does Data Streaming help?

The aftermarket is the secondary market of the manufacturing industry, concerned with the production, distribution, retailing, and installation of all parts, chemicals, equipment, and accessories after the product’s sale by the original equipment manufacturer (OEM) to the consumer. The term ‘aftermarket’ is mainly used in the automotive industry but is just as relevant in other manufacturing industries.

Aftermarket sales and service

“Aftermarket sales and service are vital to manufacturers’ strategies,” according to McKinsey.

Enterprises leverage data streaming to collect data from cars, dealerships, customers, and many other backend systems, enabling automated, context-specific decision-making in real-time when it is relevant (predictive maintenance) or valuable (cross-/upselling).

Challenges with Aftermarket Customer Communication

Manufacturers face many challenges when seeking to implement digital tools for aftermarket services. McKinsey’s research points to five central priorities – all grounded in data – for improving aftermarket services: people, operations, offers, a network of external partners, and digital tools.

While these priorities are related, digitalization is relevant across all business processes in aftermarket services:

McKinsey - Key Challenges for Aftersales
Source: McKinsey & Company

Disclaimer: The McKinsey research focuses on aerospace and defense, but the challenges look very similar in other industries, in my experience from customer conversations.

Data Streaming to make Context-specific Decisions in Real-Time

“The newest aftermarket frontier features the robust use of modern technological developments such as advanced sensors, big data, and artificial intelligence,” says McKinsey.

Data streaming helps transform the global supply chain, including aftermarket business processes, with real-time data integration and correlation to make context-specific decisions.

McKinsey mentioned various digital tools that are valuable for aftermarket services:

McKinsey - Digital Tools for Aftermarket Sales and Services
Source: McKinsey & Company

Interestingly, this coincides with what I see from applications built with data streaming. One key reason is that data streaming with Apache Kafka enables data consistency across real-time and non-real-time applications.

Omnichannel retail and aftersales are very challenging for most enterprises. That’s why many enterprise architectures rely on data streaming for their context-specific customer 360 infrastructure and real-time applications.

Michelin: Context-specific Aftermarket Sales and Customer Service

Michelin is a French multinational tire manufacturing company for almost every type of vehicle. The company sells a broad spectrum of tires. They manufacture products for automobiles, motorcycles, bicycles, aircraft, space shuttles, and heavy equipment.

Michelin’s many inventions include the removable tire, the ‘pneurail’ (a tire for rubber-tired metros), and the radial tire.

Michelin Tire Manufacturing
Source: Michelin

Michelin presented at Kafka Summit how they moved from a monolithic orchestrator to data streaming with microservices. This project was all about replacing a huge and complex Business Process Management tool (Oracle BPM), the orchestrator of their internal logistic flows.

And when Michelin says huge, they really mean it: over 24 processes and 150 million tires moved, representing €10 billion of Michelin’s turnover. So why replace such a critical component in their information system? Mainly because “it was built like a monolithic ERP and became difficult to maintain, not to say a potential single point of failure”. Michelin replaced it with a choreography of microservices around their Kafka cluster.

From spaghetti integration to decoupled microservices

Michelin faced the same challenges as most manufacturers: Slow data processing, conflicting information, and complex supply chains. Hence, Michelin moved from a spaghetti integration architecture and batch processing to decoupled microservices and real-time event-driven architecture.

Business Process Choreography with Kafka Streams Microservices and Domain Driven Design at Michelin
Source: Michelin

They optimized unreliable and outdated reporting on inventory, especially for raw and semi-finished materials, by connecting various systems across the supply chain, including DRP, TMS, ERP, WMS, and more. Apache Kafka provides the data fabric for data integration and ensures truly decoupled and independent microservices.

Workflow Orchestration and Choreography for Aftermarket Sales at Michelin with Data Streaming using Kafka
Source: Michelin
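To show what choreography means in practice, here is a generic, hedged Python sketch (not Michelin’s actual code): each service subscribes to the events it cares about and emits the next fact in the flow, so the business process emerges without a central BPM orchestrator. Topic names and fields are invented.

```python
# Generic choreography sketch: services react to events and emit new events,
# so the process emerges without a central orchestrator (illustrative only).
import json
from confluent_kafka import Consumer, Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def inventory_service():
    c = Consumer({"bootstrap.servers": "localhost:9092",
                  "group.id": "inventory", "auto.offset.reset": "earliest"})
    c.subscribe(["orders"])
    while True:
        msg = c.poll(1.0)
        if msg is None or msg.error():
            continue
        order = json.loads(msg.value())
        # React locally, then publish the next fact in the business flow.
        producer.produce("stock-reserved", value=json.dumps(
            {"order_id": order["order_id"], "warehouse": "lyon-01"}))

# A shipping service would subscribe to "stock-reserved" in the same way;
# adding a new step means adding a new consumer, not changing an orchestrator.
```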

From human processes to predictive mobility services

However, the supply chain does not end with manufacturing the best tires. Michelin aims to provide the best services and customer experience via data-driven analytics. As part of this initiative, Michelin migrated from orchestration and a single point of failure with a legacy BPM engine to a flexible choreography and true decoupling with an event-driven architecture leveraging Apache Kafka:

Michelin Architecture for Orchestration Choreography with Apache Kafka for Manufacturing and Aftermarket
Source: Michelin

Michelin implemented mobility solutions to provide mobility assistance and fleet services to its diverse customer base. For instance, predictive insights notify customers to replace tires or show the best routes to optimize fuel. The new business process choreography enables proactive marketing and aftersales. Context-specific customer service is possible as the event-driven architecture gives access to the right data at the right time (e.g. when the customer calls the service hotline).
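As an illustration of such a predictive trigger, here is a tiny rule sketch in Python. The threshold and field names are invented for the example; a real deployment would run this logic in a stream processor and publish the result to a notification topic.

```python
# Illustrative rule for context-specific aftersales (threshold and fields
# are invented): emit a notification event when tread depth runs low.
import json

TREAD_LIMIT_MM = 3.0  # assumed alerting threshold, above the legal minimum

def check_tire(reading: dict):
    if reading["tread_depth_mm"] < TREAD_LIMIT_MM:
        return {"customer_id": reading["customer_id"],
                "vehicle_id": reading["vehicle_id"],
                "action": "RECOMMEND_TIRE_REPLACEMENT"}
    return None

alert = check_tire({"customer_id": "c-42", "vehicle_id": "v-7",
                    "tread_depth_mm": 2.4})
print(json.dumps(alert))  # downstream: produce to a "customer-notifications" topic
```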

The technical infrastructure is based on cloud-native technologies such as Kubernetes (elastic infrastructure), Apache Kafka (data streaming with components like Kafka Connect and Kafka Streams), and Zeebe (a modern BPM and workflow engine).

From self-managed operations to fully managed cloud

Michelin’s commercial supply chain spans 170 countries. Michelin relies on a real-time inventory system to efficiently manage the flow of products and materials within their massive network.

A strategic decision was the move to a fully managed data streaming service to focus on business logic and innovation in manufacturing, after-sales, and service management. The migration of self-managed Kafka to Confluent Cloud cut operations costs by 35%.

Many companies replace existing legacy BPM engines with workflow orchestration powered by Apache Kafka.

Lightboard Video: How Data Streaming improves Aftermarket Sales and Customer Service

Here is a five-minute lightboard video that describes how data streaming helps with modernizing non-scalable and inflexible data infrastructure for improving the end-to-end supply chain, including aftermarket sales and customer service:

If you liked this video, make sure to follow the Confluent YouTube channel for many more lightboard videos across all industries.

Apache Kafka for automated business processes and improved aftermarket

The Michelin case study explored how a manufacturer improved the end-to-end supply chain from production to aftermarket sales and customer service. For more case studies, check out the free “The State of Data Streaming in Manufacturing” on-demand recording or read the related blog post.

Critical aftermarket sales and customer services challenges are missing information, rising costs, customer churn, and decreasing revenue. Real-time monitoring and context-specific decision-making improve the customer journey and retention. Learn more by reading how data streaming enables building a control tower for real-time supply chain operations.

How do you leverage data streaming in your aftermarket use cases for sales and service management? Did you already build a real-time infrastructure across your supply chain? Let’s connect on LinkedIn and discuss it! Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter.

Apache Kafka Landscape for Automotive and Manufacturing https://www.kai-waehner.de/blog/2022/01/12/apache-kafka-landscape-for-automotive-and-manufacturing/ (12 Jan 2022)

Before the Covid pandemic, I had the pleasure of visiting “Motor City” Detroit in November 2019. I met with several automotive companies, suppliers, startups, and cloud providers to discuss use cases and architectures around Apache Kafka. A lot has happened since then, and I have also met several OEMs and suppliers in Europe and Asia. As I finally return to Detroit in January 2022 to meet customers again, I thought it would be a good time to update the status quo of event streaming and Apache Kafka in the automotive and manufacturing industry.

Today, in 2022, Apache Kafka is the central nervous system of many applications in various areas related to the automotive and manufacturing industry for processing analytical and transactional data in motion across edge, hybrid, and multi-cloud deployments. This article explores the automotive event streaming landscape, including connected vehicles, smart manufacturing, supply chain optimization, aftersales, mobility services, and innovative new business models.

Automotive and Manufacturing Landscape for Apache Kafka

The Event Streaming Landscape for Automotive and Manufacturing

Every business domain leverages Event Streaming with Apache Kafka in the automotive and manufacturing industry. Data in motion helps everywhere. The infrastructure and deployment differ depending on the use case and requirements. I have seen everything at carmakers and manufacturers across the globe:

  • Cloud-first strategy with all new business applications in the public cloud deployed and connected across regions and even continents
  • Hybrid integration scenarios between legacy applications in the data center and modern cloud-native services in the public cloud
  • Edge computing in a smart factory for low latency, cost-efficient data processing, and cybersecurity
  • Embedded Kafka brokers in machines and vehicles at the disconnected edge

This spread of use cases is impressive. The following diagram depicts a high-level overview:

Automotive and Manufacturing Landscape for Apache Kafka with Edge and Hybrid Cloud

The following sections describe the automotive and manufacturing landscape for event streaming in more detail:

  • Manufacturing 4.0
  • Supply Chain Optimization
  • Mobility Services
  • New Business Models

If you are mainly interested in real-world Kafka deployments with examples from BMW, Porsche, Audi, Tesla, and other OEMs, check out the article “Real-World Deployments of Kafka in the Automotive Industry“.

If you want to understand why Kafka makes such a difference in automotive and manufacturing, check out the article “Apache Kafka in the Automotive Industry“. This article explores the business motivation for these game-changing concepts of data in motion for the digitalization of the automotive industry.

Before you start reading the below section, I want to clearly emphasize that Kafka is not the silver bullet for every problem. “When NOT to use Apache Kafka?” digs deep into this discussion.

I keep the following sections relatively short to give a high-level overview. Each section contains links to more deep-dive articles about the topics.

Manufacturing 4.0

Industrial IoT (IIoT), also known as Industry 4.0, changes how the shop floor and production lines produce goods. Automation, process efficiency, and a much better Overall Equipment Effectiveness (OEE) enable cost reduction and flexibility in the production process:

Manufacturing and Industrial IoT with Apache Kafka

Smart Factory

A smart factory is not necessarily a newly built building like a Tesla Gigafactory. Many enterprises install smart technology, like networked sensors for temperature or vibration measurements, into old factories. Improving the Overall Equipment Effectiveness (OEE) is the primary goal of most use cases. Many scenarios leverage Kafka for continuously processing sensor and telemetry data in motion.

Legacy Modernization with Open APIs and Hybrid Cloud

Factories exist for decades after they are built. Digitalization and the modernization of legacy technologies are some of the biggest challenges in IIoT projects. Such an initiative usually includes several tasks, from opening up proprietary legacy interfaces with open APIs to integrating the shop floor with hybrid cloud services.

Continuous Data-driven Engineering and Product Development

Last but not least, an opportunity many people underestimate: Continuous data streaming with Kafka enables new possibilities in software engineering and product development for IoT and automotive projects.

For instance, developing and deploying the “big loop” for machine learning of advanced driver-assistance systems (ADAS) or self-driving functions based on sensor data from the fleet is a new way of software engineering. Tesla’s Kafka-based data platform is a fantastic example. A related use case in engineering is the ingest of sensor data during and after test drives.

Supply Chain Optimization

Supply chain processes and solutions are very complex. The Covid pandemic showed how only flexible enterprises could survive, stay profitable, and provide a great customer experience, even in disastrous external events.

Here are the top 5 critical challenges of supply chains:

  • Time Frames are Shorter
  • Rapid Change
  • Zoo of Technologies and Products
  • Historical Models are No Longer Viable
  • Lack of visibility

Only real-time data streaming and correlation solve these supply chain challenges end-to-end across regions and companies:

Supply Chain Optimization in Automotive at the Edge and in the Cloud with Apache Kafka

In a detailed blog post, I covered Supply Chain Optimization (SCM) with Apache Kafka. Check it out to learn about real-world supply chain use cases from Bosch, BMW, Walmart, and other companies.

Intra-logistics and Global Distribution Networks

Logistics and supply chains within a factory, distribution center, or store require real-time data integration and processing to provide efficient processing of goods and a great customer experience. Batch processes or manual interaction by human workers cannot implement these use cases. Examples include track & trace and fleet management, covered in the next section.

Track & Trace and Fleet Management

Real-time logistics is a game-changer for fleet management and track & trace use cases.

  • Commercial motor vehicles such as cars, vans, trucks, specialist vehicles (such as mobile construction machinery), forklifts, and trailers
  • Private vehicles used for work (the ‘grey fleet’)
  • Aviation machinery such as aircraft (planes and helicopters)
  • Ships
  • Rail cars
  • Non-powered assets such as generators, tanks, gearboxes

None of the following aspects is new. The difference is that event streaming allows these tasks to be executed continuously in real-time to act on new information in motion:

  • Visualization
  • Location-based services
  • Routing and navigation
  • Estimated time of arrival
  • Alerting
  • Proactive recalculation
  • Monitoring of the assets and mechanical components of a vehicle

Most companies have a cloud-first strategy for building such a platform. However, some cases require edge computing either via local 5G location for low latency use cases or embedded Kafka brokers for disconnected data collection and analytics within the vehicles.
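Here is a toy Python sketch of one of these tasks: proactive ETA recalculation from streaming vehicle positions. The formulas are simplified; a production system would use a routing engine and run this logic in a stream processor on every position event.

```python
# Toy ETA recalculation for fleet tracking (simplified great-circle math;
# real systems use routing engines and live traffic data).
import math

def haversine_km(lat1, lon1, lat2, lon2):
    r = 6371.0  # mean earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def eta_hours(position, destination, speed_kmh):
    if speed_kmh <= 0:
        return None  # vehicle stopped; keep the last published estimate
    return haversine_km(*position, *destination) / speed_kmh

# On every telemetry event, recompute and publish the new ETA downstream.
print(eta_hours((48.137, 11.575), (50.110, 8.682), speed_kmh=90))  # ~3.3 hours
```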

Streaming Data Exchange for B2B Collaboration with Partners

Real-time data is not just relevant within a company. OEMs as well as Tier 1 and Tier 2 suppliers benefit in the same way from data streams. The same is true for car dealerships, end customers, and any other consumer of the data. Hence, a clear trend in the market is the emergence of a Kafka-based streaming data exchange across companies to build a data mesh.

I have often seen this situation in the past: The OEM leverages event streaming. The Tier 1 supplier leverages event streaming. The ERP solution they use is built on Kafka, too. All leverage the capabilities of scalable real-time data streaming. It makes little sense to integrate with partners and software vendors via web service APIs such as SOAP or HTTP/REST. Instead, a streaming interface is the natural choice to hand streaming data over to partners.

The following example from the automotive industry shows how independent stakeholders (= domains in different enterprises) use a cross-company streaming data exchange:

Streaming Data Exchange with Data Mesh in Motion using Apache Kafka and Cluster Linking

Mobility Services

Every OEM, supplier, or innovative startup in the automotive space thinks about providing a mobility service either on top of the goods they sell or as an independent service.

Most mobility services used today on mobile apps, whether for business or private purposes, are only possible because of a scalable real-time backbone powered by event streaming:

Mobility Services and Connected Cars with Event Streaming and Apache Kafka

The possibilities for mobility services are endless. A few examples that are mainstream today already:

  • Omnichannel retail and aftersales to buy additional car features online, for instance, more power, a seat heater, up-to-date navigation, or self-driving software (okay, the latter is not mainstream yet, but Tesla shows where it goes)
  • Connected Cars for ride-hailing, scooter rental, taxi services, food delivery
  • 3rd party integration for adding services that a company does not want to build by themselves

Today’s most successful and widely adopted mobility services are independent of a specific carmaker or supplier.

Examples of prominent Kafka-powered consumer mobility services are Uber and Lyft in the US, Grab in Asia, and FREENOW in Europe. Here Technologies is an excellent example of a B2B mobility service, providing mapping information so that companies can build new applications or improve existing ones on top of it.

A good starting point to learn more is my blog post about Apache Kafka and MQTT for mobility services and transportation.

New Business Models

The access to real-time data enables companies to build entirely new business models on top of their existing products:

New Automotive Business Models enabled by Event Streaming with Apache Kafka

A few examples:

  • Next-generation car rental with excellent customer experience, context-specific coupons, loyalty platform, and car rental fleets with other services from the carmaker.
  • Reinventing car insurance with driver-specific pricing based on real-time analysis of each driver’s behavior, instead of legacy approaches using statistical models with attributes like driver age, number of past accidents, etc.
  • Data provider for monetization enables other companies to build new business models with your car data – for instance, working with a government to make a smart city traffic system or a mobility service startup to analyze and correlate car data across OEMs.

This evolution is just the beginning of the usage of streaming data. I have seen many customers build a first streaming pipeline for one use case. However, new business divisions will leverage the data for innovations when the platform is there.

The Data is in Motion in Automotive and Manufacturing

The landscape for Apache Kafka in the automotive and manufacturing industry showed that Apache Kafka is the central nervous system of many applications in various areas for processing analytical and transactional data in motion.

This article explored use cases such as connected vehicles, smart manufacturing, supply chain optimization, aftersales, mobility services, and innovative new business models. The possibilities for data in motion are almost endless. The automotive and manufacturing industry is still in the very early stages of leveraging data in motion.

Where do you use Apache Kafka and its ecosystem in the automotive and manufacturing industry? Do you deploy in the public cloud, in your data center, or at the edge outside a data center? What other technologies do you combine with Kafka? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

When NOT to use Apache Kafka? https://www.kai-waehner.de/blog/2022/01/04/when-not-to-use-apache-kafka/ (4 Jan 2022)

Apache Kafka is the de facto standard for event streaming to process data in motion. With its significant adoption growth across all industries, I get a very valid question every week: When NOT to use Apache Kafka? What limitations does the event streaming platform have? When does Kafka simply not provide the needed capabilities? How do you qualify Kafka out as not the right tool for the job? This blog post explores the DOs and DON’Ts. Separate sections explain when to use Kafka, when NOT to use Kafka, and when to MAYBE use Kafka.

When not to use Apache Kafka

Let’s begin with understanding why Kafka comes up everywhere in the meantime. This clarifies the huge market demand for event streaming but also shows that there is no silver bullet solving all problems. Kafka is NOT the silver bullet for a connected world, but a crucial component!

The world gets more and more connected. Vast volumes of data are generated and need to be correlated in real-time to increase revenue, reduce costs, and reduce risks. I could pick almost any industry. Some are faster. Others are slower. But the connected world is coming everywhere. Think about manufacturing, smart cities, gaming, retail, banking, insurance, and so on. If you look at my past blogs, you can find relevant Kafka use cases for any industry.

I picked two market trends that show this insane growth of data and the creation of innovation and new cutting-edge use cases (and why Kafka’s adoption is insane across industries, too).

Connected Cars – Insane volume of telemetry data and aftersales

Here is the “Global Opportunity Analysis and Industry Forecast, 2020–2027” by Allied Market Research:

Connected Car Market Statistics – 2027

The Connected Car market includes a much wider variety of use cases and industries than most people think. A few examples: Network infrastructure and connectivity, safety, entertainment, retail, aftermarket, vehicle insurance, 3rd party data usage (e.g., smart city), and so much more.

Gaming – Billions of players and massive revenues

The gaming industry is already bigger than all other media categories combined, and this is still just the beginning of a new era – as Bitkraft depicts:

The growing Gaming market

Millions of new players join the gaming community every month across the globe, as connectivity improves and cheap smartphones are sold in less wealthy countries. New business models like “play to earn” change how the next generation of gamers plays a game. More scalable and low-latency technologies like 5G enable new use cases. Blockchain and NFTs (Non-Fungible Tokens) are changing the monetization and collection market forever.

These market trends across industries clarify why the need for real-time data processing increases significantly quarter by quarter. Apache Kafka established itself as the de facto standard for processing analytical and transactional data streams at scale. However, it is crucial to understand when (not) to use Apache Kafka and its ecosystem in your projects.

What is Apache Kafka, and what is it NOT?

Kafka is often misunderstood. For instance, I still hear way too often that Kafka is a message queue. Part of the reason is that some vendors only pitch it for a specific problem (such as data ingestion into a data lake or data warehouse) to sell their products. So, in short:

Kafka is…

  • a scalable real-time messaging platform to process millions of messages per second.
  • an event streaming platform for massive volumes of big data analytics and small volumes of transactional data processing.
  • a distributed storage layer that provides true decoupling for backpressure handling, support of various communication protocols, and replayability of events with guaranteed ordering.
  • a data integration framework for streaming ETL.
  • a data processing framework for continuous stateless or stateful stream processing.

This combination of characteristics in a single platform makes Kafka unique (and successful).

Kafka is NOT…

  • a proxy for millions of clients (like mobile apps) – but Kafka-native proxies (like REST or MQTT) exist for some use cases.
  • an API Management platform – but these tools are usually complementary and used for the creation, life cycle management, or the monetization of Kafka APIs.
  • a database for complex queries and batch analytics workloads – but good enough for transactional queries and relatively simple aggregations (especially with ksqlDB).
  • an IoT platform with features such as device management – but direct Kafka-native integration with (some) IoT protocols such as MQTT or OPC-UA is possible and the appropriate approach for (some) use cases.
  • a technology for hard real-time applications such as safety-critical or deterministic systems – but that’s true for any other IT framework, too. Embedded systems are a different kind of software!

For these reasons, Kafka is complementary, not competitive, to these other technologies. Choose the right tool for the job and combine them!

Case studies for Apache Kafka in a connected world

This section shows a few examples of fantastic success stories where Kafka is combined with other technologies because it makes sense and solves the business problem. The focus here is case studies that need more than just Kafka for the end-to-end data flow.

No matter whether you follow my blog, Kafka Summit conferences, online platforms like Medium or DZone, or any other tech-related news, you will find plenty of success stories around real-time data streaming with Apache Kafka for high volumes of analytics and transactional data from connected cars, IoT edge devices, or gaming apps on smartphones.

A few examples across industries and use cases:

  • Audi: Connected car platform rolled out across regions and cloud providers
  • BMW: Smart factories for the optimization of the supply chain and logistics
  • SolarPower: Complete solar energy solutions and services across the globe
  • Royal Caribbean: Entertainment on cruise ships with disconnected edge services and hybrid cloud aggregation
  • Disney+ Hotstar: Interactive media content and gaming/betting for millions of fans on their smartphone
  • The list goes on and on and on.

So what is the problem with all these great IoT success stories? Well, there is no problem. But some clarification is needed to explain when to use event streaming with the Apache Kafka ecosystem and where it is usually complemented by other solutions.

When to use Apache Kafka?

Before we discuss when NOT to use Kafka, let’s understand where to use it. That makes it much clearer how and when to complement it with other technologies if needed.

I will add real-world examples to each section. In my experience, this makes it much easier to understand the added value.

Kafka consumes and processes high volumes of IoT and mobile data in real-time

Processing massive volumes of data in real-time is one of the critical capabilities of Kafka.

Tesla is not just a car maker. Tesla is a tech company writing a lot of innovative and cutting-edge software. They provide an energy infrastructure for cars with their Tesla Superchargers, solar energy production at their Gigafactories, and much more. Processing and analyzing the data from their vehicles, smart grids, and factories and integrating with the rest of the IT backend services in real-time is a crucial piece of their success.

Tesla has built a Kafka-based data platform infrastructure “to support millions of devices and trillions of data points per day”. Tesla showed an exciting history and evolution of their Kafka usage at a Kafka Summit in 2019:

Keep in mind that Kafka is much more than just messaging. I repeat this in almost every blog post as too many people still don’t get it. Kafka is a distributed storage layer that truly decouples producers and consumers. Additionally, Kafka-native processing tools like Kafka Streams and ksqlDB enable real-time processing.
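
To illustrate the processing part, here is a minimal Kafka Streams sketch in Java. It is not Tesla’s implementation – topic names and the filter logic are hypothetical – but it shows how a few lines of code give you continuous stream processing on top of the Kafka storage layer:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class TelemetryFilter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "telemetry-filter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> telemetry = builder.stream("vehicle-telemetry");
        // Stateless processing: filter and route critical events in real-time
        telemetry
            .filter((vehicleId, json) -> json.contains("\"alert\""))
            .to("critical-alerts");

        new KafkaStreams(builder.build(), props).start();
    }
}
```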

Kafka correlates IoT data with transactional data from the MES and ERP systems

Data integration in real-time at scale is relevant both for analytics and for transactional systems like ERP or MES. Kafka Connect and non-Kafka middleware complement the core of event streaming for this task.

BMW operates mission-critical Kafka workloads across the edge (i.e., in the smart factories) and public cloud. Kafka enables decoupling, transparency, and innovation. The products and expertise from Confluent add stability. The latter is vital for success in manufacturing. Each minute of downtime costs a fortune. Read my related article “Apache Kafka as Data Historian – an IIoT / Industry 4.0 Real-Time Data Lake” to understand how Kafka improves the Overall Equipment Effectiveness (OEE) in manufacturing.

BMW optimizes its supply chain management in real-time. The solution provides information about the right stock in place, both physically and in transactional systems like BMW’s ERP powered by SAP. “Just in time, just in sequence” is crucial for many critical applications. The integration between Kafka and SAP is required for almost 50% of customers I talk to in this space. Beyond the integration, many next-generation transactional ERP and MES platforms are powered by Kafka, too.

Kafka integrates with all the non-IoT IT in the enterprise at the edge and hybrid or multi-cloud

Multi-cluster and cross-data center deployments of Apache Kafka have become the norm rather than an exception. Learn about several scenarios that may require multi-cluster solutions and see real-world examples with their specific requirements and trade-offs, including disaster recovery, aggregation for analytics, cloud migration, mission-critical stretched deployments, and global Kafka.

The true decoupling between different interfaces is a unique advantage of Kafka vs. other messaging platforms such as IBM MQ, RabbitMQ, or MQTT brokers. I also explored this in detail in my article about Domain-driven Design (DDD) with Kafka.

Infrastructure modernization and hybrid cloud architectures with Apache Kafka are typical across industries.

One of my favorite examples is the success story from Unity. The company provides a real-time 3D development platform focusing on gaming and getting into other industries like manufacturing with their Augmented Reality (AR) / Virtual Reality (VR) features.

The data-driven company already had content installed 33 billion times in 2019, reaching 3 billion devices worldwide. Unity operates one of the largest monetization networks in the world. They migrated this platform from self-managed Kafka to fully-managed Confluent Cloud. The cutover was executed by the project team without downtime or data loss. Read Unity’s post on the Confluent Blog: “How Unity uses Confluent for real-time event streaming at scale”.

Kafka is the scalable real-time backend for mobility services and gaming/betting platforms

Many gaming and mobility services leverage event streaming as the backbone of their infrastructure. Use cases include the processing of telemetry data, location-based services, payments, fraud detection, user/player retention, loyalty platform, and so much more. Almost all innovative applications in this sector require real-time data streaming at scale.

A few examples:

  • Mobility services: Uber, Lyft, FREE NOW, Grab, Otonomo, Here Technologies, …
  • Gaming services: Disney+ Hotstar, Sony Playstation, Tencent, Big Fish Games, …
  • Betting services: William Hill, Sky Betting, …

Just look at the job portals of any mobility or gaming service. Not everybody is talking about their Kafka usage in public. But almost everyone is looking for Kafka experts to develop and operate their platform.

These use cases are just as critical as a payment process in a core banking platform. Regulatory compliance and zero data loss are crucial. Multi-Region Clusters (i.e., a Kafka cluster stretched across regions like US East, Central, and West) enable high availability with zero downtime and no data loss even in the case of a disaster.

Multi Region Kafka Cluster in Gaming for Automated Disaster Recovery

Vehicles, machines, or IoT devices embed a single Kafka broker

The edge is here to stay and grow. Some use cases require the deployment of a Kafka cluster or single broker outside a data center. Reasons for operating a Kafka infrastructure at the edge include low latency, cost efficiency, cybersecurity, or no internet connectivity.

Examples for Kafka at the edge:

  • Disconnected edge in logistics to store logs, sensor data, and images while offline (e.g., a truck on the street or a drone flying around a ship) until a good internet connection is available in the distribution center
  • Vehicle-to-Everything (V2X) communication in a local small data center like AWS Outposts (via a gateway like MQTT in the case of a large area, a considerable number of vehicles, or a lousy network; or via a direct Kafka client connection for a few hundred machines, e.g., in a smart factory)
  • Offline mobility services like integrating a car infrastructure with gaming, maps, or a recommendation engine with locally processed partner services (e.g., the next McDonald’s is 10 miles ahead; here is a coupon).

The cruise line Royal Caribbean is a great success story for this scenario. It operates the four largest passenger ships in the world. As of January 2021, the line had twenty-four ships in operation and six additional ships on order.

Royal Caribbean implemented one of Kafka’s most famous use cases at the edge. Each cruise ship has a Kafka cluster running locally for use cases such as payment processing, loyalty information, customer recommendations, etc.:

Swimming Retail Stores at Royal Caribbean with Apache Kafka

I covered this example and other Kafka edge deployments in various blogs. I talked about use cases for Kafka at the edge, showed architectures for Kafka at the edge, and explored low latency 5G deployments powered by Kafka.

When NOT to use Apache Kafka?

Finally, we are coming to the section everybody was looking for, right? However, it is crucial first to understand when to use Kafka. Now, it is easy to explain when NOT to use Kafka.

For this section, let’s assume that we talk about production scenarios, not quick-and-dirty workarounds to connect Kafka directly to something for a proof of concept; there is always a quick and dirty option to test something – and that’s fine for that goal. But things change when you need to scale and roll out your infrastructure globally, be compliant with the law, and guarantee no data loss for transactional workloads.

With this in mind, it is relatively easy to qualify out Kafka as an option for some use cases and problems:

Kafka is NOT hard real-time

The definition of the term “real-time” is difficult. It is often a marketing term. Real-time programs must guarantee a response within specified time constraints.

Kafka – and all other frameworks, products, and cloud services used in this context – is only soft real-time and built for the IT world. Many OT and IoT applications require hard real-time with zero latency spikes.

Apache Kafka is NOT hard real time for cars and robots

Soft real-time is used for applications such as

  • Point-to-point messaging between IT applications
  • Data ingestion from various data sources into one or more data sinks
  • Data processing and data correlation (often called event streaming or event stream processing)

If your application requires sub-millisecond latency, Kafka is not the right technology. For instance, high-frequency trading is usually implemented with purpose-built proprietary commercial solutions.

Always keep in mind: The lowest latency would be to not use a messaging system at all and just use shared memory. In a race to the lowest latency, Kafka will lose every time. However, for the audit log, transaction log, or persistence engine parts of an exchange, zero data loss becomes more important than latency – and there Kafka wins.

Most real-time use cases “only” require data processing in the millisecond to second range. In that case, Kafka is a perfect solution. Many FinTechs, such as Robinhood, rely on Kafka for mission-critical transactional workloads, even financial trading. Multi-access edge computing (MEC) is another excellent example of low latency data streaming with Apache Kafka and cloud-native 5G infrastructure.

Kafka is NOT deterministic for embedded and safety-critical systems

This one is pretty straightforward and related to the above section. Kafka is not a deterministic system. Safety-critical applications cannot use it for a car engine control system, a medical system such as a heart pacemaker, or an industrial process controller.

A few examples where Kafka CANNOT be used for:

  • Safety-critical data processing in the car or vehicle. That’s AUTOSAR / MISRA C / Assembler and similar technologies.
  • CAN Bus communication between ECUs.
  • Robotics. That’s C / C++ or similar low-level languages combined with frameworks such as ROS-Industrial (based on the Robot Operating System).
  • Safety-critical machine learning / deep learning (e.g., for autonomous driving)
  • Vehicle-to-Vehicle (V2V) communication. That’s 5G sidelink without an intermediary like Kafka.

My post “Apache Kafka is NOT Hard Real-Time BUT Used Everywhere in Automotive and Industrial IoT” explores this discussion in more detail.

TL;DR: Safety-related data processing must be implemented with dedicated low-level programming languages and solutions. That’s not Kafka! The same is true for any other IT software, too. Hence, don’t replace Kafka with IBM MQ, Flink, Spark, Snowflake, or any other similar IT software.

Kafka is NOT built for bad networks

Kafka requires good stable network connectivity between the Kafka clients and the Kafka brokers. Hence, if the network is unstable and clients need to reconnect to the brokers all the time, then operations are challenging, and SLAs are hard to reach.

There are some exceptions, but the basic rule of thumb is that other technologies are built specifically to solve the problem of bad networks. MQTT is the most prominent example. Hence, Kafka and MQTT are friends, not enemies. The combination is super powerful and used a lot across industries. For that reason, I wrote a whole blog series about Kafka and MQTT.

We built a connected car infrastructure that processes 100,000 data streams for real-time predictions using MQTT, Kafka, and TensorFlow in a Kappa architecture.
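
Conceptually, such a bridge looks like the following sketch using the Eclipse Paho MQTT client and a Kafka producer. In production you would typically use a Kafka Connect MQTT connector or an MQTT proxy instead of hand-rolling this; broker addresses and topic names here are assumptions for illustration only:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.eclipse.paho.client.mqttv3.IMqttDeliveryToken;
import org.eclipse.paho.client.mqttv3.MqttCallback;
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttMessage;

public class MqttToKafkaBridge {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props);

        MqttClient mqtt = new MqttClient("tcp://localhost:1883", "kafka-bridge");
        mqtt.setCallback(new MqttCallback() {
            @Override
            public void messageArrived(String topic, MqttMessage message) {
                // Forward each MQTT message; the MQTT topic becomes the Kafka record key
                producer.send(new ProducerRecord<>("car-telemetry", topic, message.getPayload()));
            }
            @Override
            public void connectionLost(Throwable cause) { /* reconnect logic omitted */ }
            @Override
            public void deliveryComplete(IMqttDeliveryToken token) { }
        });
        mqtt.connect();
        mqtt.subscribe("car/+/telemetry"); // '+' is an MQTT single-level wildcard
    }
}
```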

Kafka does NOT provide connectivity to tens of thousands of client applications

Another specific point to qualify Kafka out as an integration solution is that Kafka cannot connect to tens of thousands of clients. If you need to build a connected car infrastructure or gaming platform for mobile players, the clients (i.e., cars or smartphones) will not directly connect to Kafka.

A dedicated proxy such as an HTTP gateway or MQTT broker is the right intermediary between thousands of clients and Kafka for real-time backend processing and the integration with further data sinks such as a data lake, data warehouse, or custom real-time applications.

Where are the limits of Kafka client connections? As so often, this is hard to say. I have seen customers connect directly from the shop floor in their plants via .NET and Java Kafka clients to the cloud where the Kafka cluster is running. Direct hybrid connections usually work well if the number of machines, PLCs, IoT gateways, and IoT devices is in the hundreds. For higher numbers of client applications, you need to evaluate whether you a) need a proxy in the middle or b) deploy “edge computing” with or without Kafka at the edge for lower latency and cost-efficient workloads.

When to MAYBE use Apache Kafka?

The last section covered scenarios where it is relatively easy to qualify Kafka out as it simply cannot provide the required capabilities. Next, I want to explore a few less apparent topics where it depends on several factors whether Kafka is a good choice or not.

Kafka does (usually) NOT replace another database

Apache Kafka is a database. It provides ACID guarantees and is used in hundreds of companies for mission-critical deployments. However, in most cases, Kafka is not competitive with other databases. Kafka is an event streaming platform for messaging, storage, processing, and integration at scale in real-time with zero downtime and zero data loss.

Kafka is often used as a central streaming integration layer with these characteristics. Other databases can build materialized views for their specific use cases like real-time time-series analytics, near real-time ingestion into a text search infrastructure, or long-term storage in a data lake.

In summary, when you get asked if Kafka can replace a database, then there are several answers to consider:

  • Kafka can store data forever in a durable and highly available manner, providing ACID guarantees
  • Further options to query historical data are available in Kafka
  • Kafka-native add-ons like ksqlDB or Tiered Storage make Kafka more potent than ever before for data processing and event-based long-term storage
  • Stateful applications can be built leveraging Kafka clients (microservices, business applications) with no other external database (see the sketch after this list)
  • Not a replacement for existing databases, data warehouses, or data lakes like MySQL, MongoDB, Elasticsearch, Hadoop, Snowflake, Google BigQuery, etc.
  • Other databases and Kafka complement each other; the right solution has to be selected for a problem; often, purpose-built materialized views are created and updated in real-time from the central event-based infrastructure
  • Different options are available for bi-directional pull and push-based integration between Kafka and databases to complement each other
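
To illustrate the point about stateful applications without an external database, here is a minimal Kafka Streams sketch. Topic and store names are hypothetical; the local state store is backed by a Kafka changelog topic, so the application can rebuild its “materialized view” after a restart without any other database:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class PaymentsView {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "payments-view");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();
        // Count payments per customer into a local, fault-tolerant state store.
        // The store's changelog lives in Kafka, so the view survives restarts.
        builder.stream("payments", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey()
               .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("payments-per-customer"));

        new KafkaStreams(builder.build(), props).start();
    }
}
```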

My blog post “Can Apache Kafka replace a database, data warehouse, or data lake?” discusses the usage of Kafka as a database in much more detail.

Kafka does (usually) NOT process large messages

Kafka was not built for large messages. Period.

Nevertheless, more and more projects send and process 1 MB, 10 MB, and even much bigger files and other large payloads via Kafka. One reason is that Kafka was designed for large volume/throughput – which is required for large messages. A very common example that comes up regularly is the ingestion and processing of large files from legacy systems with Kafka before ingesting the processed data into a data warehouse.

However, not all large messages should be processed with Kafka. Often you should use the right storage system and just leverage Kafka for the orchestration. Reference-based messaging (i.e. storing the file in another storage system and sending the link and metadata) is often the better design pattern:

Apache Kafka for large message payloads and files - Alternatives and Trade-offs

Know the different design patterns and choose the right technology for your problem.
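
As a sketch of the reference-based pattern: the large file goes into an object store built for big payloads, and only a small reference event travels through Kafka. Bucket, object key, topic, and file path are hypothetical; the example assumes the AWS SDK v2 for Java with default credentials, but any object store works the same way conceptually:

```java
import java.nio.file.Path;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class ClaimCheckProducer {
    public static void main(String[] args) {
        // 1) Store the large payload in an object store built for big files...
        S3Client s3 = S3Client.create();
        String bucket = "raw-video-files";             // hypothetical bucket
        String key = "2021/10/camera-42/clip-001.mp4"; // hypothetical object key
        s3.putObject(PutObjectRequest.builder().bucket(bucket).key(key).build(),
                     RequestBody.fromFile(Path.of("/tmp/clip-001.mp4")));

        // 2) ...and send only a small reference event (the "claim check") via Kafka
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String reference = String.format("{\"bucket\":\"%s\",\"key\":\"%s\"}", bucket, key);
            producer.send(new ProducerRecord<>("video-metadata", key, reference));
        }
    }
}
```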

For more details and use cases about handling large files with Kafka, check out this blog post: “Handling Large Messages with Apache Kafka (CSV, XML, Image, Video, Audio, Files)“.

Kafka is (usually) NOT the IoT gateway for the last-mile integration of industrial protocols…

The last-mile integration with IoT interfaces and mobile apps is a tricky space. As discussed above, Kafka cannot connect to tens of thousands of clients directly. However, many IoT and mobile applications only require tens or hundreds of connections. In that case, a Kafka-native connection is straightforward using one of the various Kafka clients available for almost any programming language on the planet.

If a connection on the TCP level with a Kafka client makes little sense or is not possible, a very prevalent workaround is the REST Proxy as the intermediary between the clients and the Kafka cluster. The clients communicate via synchronous HTTP(S) with the streaming platform.

Use cases for HTTP and REST APIs with Apache Kafka include the control plane (= management), the data plane (= producing and consuming messages), and automation / DevOps tasks.
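
For illustration, producing a record via the Confluent REST Proxy might look like the following Java sketch. Host, port, and topic are assumptions (the proxy listens on port 8082 in many default setups):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestProxyProduce {
    public static void main(String[] args) throws Exception {
        // Confluent REST Proxy v2 produce API; endpoint and topic are hypothetical
        String body = "{\"records\":[{\"value\":{\"vehicleId\":\"v-123\",\"speedKmh\":87}}]}";
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:8082/topics/vehicle-telemetry"))
            .header("Content-Type", "application/vnd.kafka.json.v2+json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```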

Unfortunately, many IoT projects require much more complex integrations. I am not just talking about a relatively straightforward integration via an MQTT or OPC-UA connector. Challenges in Industrial IoT projects include:

  • The automation industry often does not use open standards; solutions are slow, insecure, not scalable, and proprietary.
  • Product lifecycles are very long (tens of years), with no simple changes or upgrades.
  • IIoT usually uses incompatible protocols, typically proprietary and built for one specific vendor.
  • Proprietary and expensive monoliths that are not scalable and not extensible.

Therefore, many IoT projects complement Kafka with a purpose-built IoT platform. Most IoT products and cloud services are proprietary but provide open interfaces and architectures. The open-source space is small in this industry. A great alternative (for some use cases) is Apache PLC4X. The framework integrates with many proprietary legacy protocols, such as Siemens S7, Modbus, Allen Bradley, Beckhoff ADS, etc. PLC4X also provides a Kafka Connect connector for native and scalable Kafka integration.

A modern data historian is open and flexible. The foundation of many strategic IoT modernization projects across the shop floor and hybrid cloud is powered by event streaming:

Apache Kafka as Data Historian in Industrial IoT IIoT

Kafka is NOT a blockchain (but relevant for web3, crypto trading, NFT, off-chain, sidechain, oracles)

Kafka is a distributed commit log. The concepts and foundations are very similar to a blockchain. I explored this in more detail in my post “Apache Kafka and Blockchain – Comparison and a Kafka-native Implementation“.

A blockchain should be used ONLY if different untrusted parties need to collaborate. For most enterprise projects, a blockchain is unnecessary added complexity. A distributed commit log (= Kafka) or a tamper-proof distributed ledger (= enhanced Kafka) is sufficient.

Having said this, I see more and more companies using Kafka within their crypto trading platforms, market exchanges, and NFT token trading marketplaces.

To be clear: Kafka is NOT the blockchain on these platforms. The blockchain is a cryptocurrency like Bitcoin or a platform providing smart contracts like Ethereum, where people build new distributed applications (dApps) like NFTs for the gaming or art industry. Kafka is the streaming platform that connects these blockchains with the oracles (= the non-blockchain apps) like the CRM, data lake, data warehouse, and so on:

Apache Kafka and Blockchain - DLT - Use Cases and Architectures

TokenAnalyst is an excellent example that leverages Kafka to integrate blockchain data from Bitcoin and Ethereum with their analytics tools. Kafka Streams provides a stateful streaming application to prevent invalid blocks from being used in downstream aggregate calculations. For example, TokenAnalyst developed a block confirmer component that resolves reorganization scenarios by temporarily holding back blocks and only propagating them when a threshold number of confirmations (children mined on top of that block) is reached.

In some advanced use cases, Kafka is used to implement a sidechain or off-chain platform, as the original blockchain does not scale well enough (blockchain data is known as on-chain data). Bitcoin is not the only blockchain with the problem of processing only single-digit (!) transactions per second. Most modern blockchain solutions cannot scale even close to the workloads Kafka processes in real-time.

From DAOs to blue-chip companies, measuring the health of blockchain infrastructure and IoT components is still necessary, even in a distributed network, to avoid downtime, secure the infrastructure, and make the blockchain data accessible. Kafka provides an agentless and scalable way to present that data to the parties involved and make sure that the relevant data is exposed to the right teams before a node is lost. This is relevant for cutting-edge Web3 IoT projects like Helium, or simpler closed distributed ledgers (DLT) like R3 Corda.

My recent post about live commerce powered by event streaming and Kafka transforming the retail metaverse shows how the retail and gaming industry connects virtual and physical things. The retail business process and customer communication happen in real-time; no matter if you want to sell clothes, a smartphone, or a blockchain-based NFT token for your collectible or video game.

TL;DR: Kafka is NOT…

… a replacement for your favorite database or data warehouse.

… hard real-time for safety-critical embedded workloads.

… a proxy for thousands of clients in bad networks.

… an API Management solution.

… an IoT gateway.

… a blockchain.

It is easy to qualify Kafka out for some use cases and requirements.

However, analytical and transactional workloads across all industries use Kafka. It is the de facto standard for event streaming everywhere. Hence, Kafka is often combined with other technologies and platforms.

Where do you (not) use Apache Kafka? What other technologies do you combine Kafka with? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post When NOT to use Apache Kafka? appeared first on Kai Waehner.

]]>
Apache Kafka in the Public Sector – Blog Series about Use Cases and Architectures https://www.kai-waehner.de/blog/2021/10/07/apache-kafka-public-sector-part-1-data-in-motion-use-cases-architectures-examples/ Thu, 07 Oct 2021 14:13:24 +0000 https://www.kai-waehner.de/?p=3790 The public sector includes many different areas. Some groups, like the military, leverage cutting-edge technology. Others, like the public administration, are years or even decades behind. This blog series explores both edges to show how data in motion powered by Apache Kafka adds value for innovative new applications and the modernization of legacy IT infrastructures. Examples include a broad spectrum of use cases across smart cities, citizen services, energy and utilities, and national security.

The post Apache Kafka in the Public Sector – Blog Series about Use Cases and Architectures appeared first on Kai Waehner.

]]>
The public sector includes many different areas. Some groups, like the military, leverage cutting-edge technology. Others, like the public administration, are years or even decades behind. This blog series explores how the public sector leverages data in motion powered by Apache Kafka to add value for innovative new applications and to modernize legacy IT infrastructures. Life is a stream of events. Therefore, examples include a broad spectrum of use cases across smart cities, citizen services, energy and utilities, and national security, deployed across edge, hybrid, and multi-cloud scenarios.

Apache Kafka in the Public Sector and Government for Data in Motion

Blog series: Apache Kafka in the Public Sector and Government

This blog series explores why many governments and public infrastructure sectors leverage event streaming for various use cases. Learn about real-world deployments and different architectures for Kafka in the public sector:

  1. Life is a Stream of Events (THIS POST)
  2. Smart City
  3. Citizen Services
  4. Energy and Utilities
  5. National Security

Subscribe to my newsletter to get updates immediately after publication. Besides, I will also update the above list with direct links to this blog series’ posts once they are published.

As a side note: If you wonder why healthcare is not on the above list – healthcare deserves a blog series of its own. While the government can provide public health care through national healthcare systems, it is part of the private sector in many other cases.

The Public Sector is a Broad Spectrum of Use Cases

Real-time Data Beats Slow Data in the Public Sector

I won’t do yet another long introduction about the added value of real-time data. Check out my blog about “Use Cases across Industries for Data in Motion powered by Apache Kafka” to understand the broad spectrum and benefits. The public sector is not different: Real-time data beats slow data in almost every use case! Here are a few examples:

Real time data beats slow data in the public sector

But think about your use cases! How often can you say that getting data late (like in one hour or the following day) is better than getting data when it happens (now, in a few milliseconds or seconds)? Probably not very often.

An important fact is that the added business value comes from correlating the events from different data sources. As an example, let’s look at the processes in a smart city:

Data in Motion in the Public Sector powered by Apache Kafka

The sensor data from the car is only valuable if an application correlates it with data from other vehicles in the traffic planning system. Intelligent parking is only reasonable if it integrates with the overall city planning. Emergency services need to receive an alert in real-time if a crash happens. All of that needs to happen in real-time! It does not matter whether the use case is about transactional workloads (usually smaller data sets) or analytical workloads (usually more extensive data sets).

Open API and Partnerships are Mandatory

Governments can build great applications. At least in theory. In practice, they rely on external data from partners and 3rd party applications for many potential use cases:

Data in Motion as Foundation of a Smart City powered by Apache Kafka

Governments and cities need to work with several other stakeholders, including carmakers, suppliers, telcos, mobility services, cloud providers, software providers, etc. Standards and open APIs are mandatory for successful cross-cutting projects. The foundation of such an enterprise architecture is an open, reliable, scalable platform that can process data in real-time. Apache Kafka became the de facto standard for event streaming.

Data Mesh for Sharing Events between Government and 3rd Party Applications and Services

An example that shows the added value of data integration across stakeholders and processing the data in real-time: transportation services. A mobile app needs context. Think about hailing a taxi ride. It doesn’t help you to just see the position of each taxi on the city map in real-time. You want to know the estimated pickup time, the estimated cost, the estimated time of arrival at your destination, the car model that will pick you up, and so much more.

This use case – like many others – is only possible if you integrate and correlate the data from many different interfaces like a mapping service, all taxi drivers, all customers in a city, the weather service, backend analytics services, and much more:

Data in Motion with Kafka across the Public and Private Sector

The left side of the picture shows a dashboard built with a real-time message queue like RabbitMQ. The right side shows the correlation of data from different sources in real-time with an event streaming platform like Apache Kafka.

I hope you agree on the added value of the event streaming platform. Just sending data from A to B in real-time is not enough. Only the data processing in real-time adds true value.
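
To make the difference concrete, here is a Kafka Streams sketch that correlates two sources instead of just forwarding one: each taxi position event is enriched with the latest driver profile. Topic names and the JSON-in-string values are simplifying assumptions for illustration:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class TaxiEnrichment {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "taxi-enrichment");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();
        // Stream of position events, keyed by taxi ID (hypothetical topic)
        KStream<String, String> positions = builder.stream("taxi-positions",
                Consumed.with(Serdes.String(), Serdes.String()));
        // Table with the latest driver profile per taxi ID, built from a changelog topic
        KTable<String, String> drivers = builder.table("driver-profiles",
                Consumed.with(Serdes.String(), Serdes.String()));

        // Correlate both sources: each position event gets enriched with driver context
        positions.join(drivers, (position, profile) ->
                        "{\"position\":" + position + ",\"driver\":" + profile + "}")
                 .to("enriched-taxi-positions", Produced.with(Serdes.String(), Serdes.String()));

        new KafkaStreams(builder.build(), props).start();
    }
}
```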

Data in Motion as Paradigm Shift in the Public Sector

Real-time beats slow data. No matter if you think about cutting-edge use cases in national security or modernizing the IT infrastructure in the public administration. Event Streaming is the foundation of this paradigm shift moving towards real-time data processing in the public sector. The upcoming posts of this blog series explore many different use cases and architectures. If you also want to learn more about Apache Kafka offerings on the market, check out my comparison of Apache Kafka products and cloud services.

How do you leverage event streaming in the public sector? What technologies and architectures do you use? What projects did you already work on or are in the planning? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Apache Kafka in the Public Sector – Blog Series about Use Cases and Architectures appeared first on Kai Waehner.

]]>
Real-World Deployments of Kafka in the Automotive Industry https://www.kai-waehner.de/blog/2021/07/19/kafka-automotive-industry-use-cases-examples-bmw-porsche-tesla-audi-connected-cars-industrial-iot-manufacturing-customer-360-mobility-services/ Mon, 19 Jul 2021 06:58:58 +0000 https://www.kai-waehner.de/?p=3563 Apache Kafka became the central nervous system of many applications in various different areas related to the automotive industry. This blog post explores various real-world deployments across several fields including connected vehicles, smart manufacturing, and innovative mobility services. Examples include car makers such as Audi, BMW, Porsche, and Tesla, plus a few mobility services such as Uber, Lyft, and Here Technologies.

The post Real-World Deployments of Kafka in the Automotive Industry appeared first on Kai Waehner.

]]>
Apache Kafka became the central nervous system of many applications in various areas related to the automotive industry (and it’s not only connected cars). This blog post explores various real-world deployments across several fields, including connected vehicles, smart manufacturing, and innovative mobility services. Examples include car makers such as Audi, BMW, Porsche, and Tesla, plus a few mobility services such as Uber, Lyft, and Here Technologies.

Apache Kafka in the Automotive Industry including Car Makers Tier 1 Suppliers Manufacturing Connected Cars Mobility Services Smart City

I wrote about use cases for event streaming with Apache Kafka in the automotive industry shortly before the pandemic when I visited a few customers in Detroit. I recommend reading that post to learn about the general shift and innovation in the automotive industry. That post explores various use cases for data in motion, such as:

  • Connected vehicle infrastructures for fleet management, emergency systems, and smart driving
  • Smart manufacturing for improved supply chain management and analytics on the shop floor level
  • Context-specific customer interactions including great customer experiences, aftersales, and data monetization

Real-World Deployments of Kafka in the Automotive Industry

This blog post explores various real-world success stories of Apache Kafka and its ecosystem in the automotive industry. Learn how carmakers, suppliers, and mobility services built infrastructure to integrate and process data in motion across various fields and business units.

BMW – Smart Shop Floor and Industry 4.0

Felix Böhm, responsible for BMW Plant Digitalization and Cloud Transformation, spoke with Confluent CEO Jay Kreps at the Kafka Summit EU 2021 about their journey towards data in motion in manufacturing. The following are my notes. For more details, feel free to watch the complete conversation on YouTube.

Decoupled IoT Data and Manufacturing

BMW operates mission-critical workloads at the edge (i.e., in the factories) and in the public cloud. Kafka provides decoupling, transparency, and innovation. Confluent adds stability [via products and expertise]. The latter is key for success in manufacturing. Each minute of downtime costs a fortune. Read my related article “Apache Kafka as Data Historian – an IIoT / Industry 4.0 Real-Time Data Lake” to understand how Kafka improves the Overall Equipment Effectiveness (OEE) in manufacturing.

Logistics and supply chain in global plants

The discussed use case covered optimized supply chain management in real-time.

The solution provides information about the right stock in place, both physically and in ERP systems like SAP. “Just in time, just in sequence” is a key concept for many critical applications.

Things BMW couldn’t do before

  • Get IoT data without interfering with others, and get it to the right place
  • Collect once, process, and consume several times (by different consumers at different times with different communication paradigms like real-time, batch, request-response)
  • Enable scalable real-time processing and improve time-to-market with new applications

The true decoupling between different interfaces is a unique advantage of Kafka vs. other messaging platforms such as IBM MQ, RabbitMQ, or MQTT brokers. I also explored this in detail in my article about Domain-driven Design (DDD) with Kafka.

BMW – Machine Learning and Natural Language Processing

Unrelated to the above, another exciting project: BMW has built an “Industry-ready NLP Service Framework Based on Kafka”. The implementation leverages the Kafka ecosystem as an orchestration and processing layer for different NLP (= natural language processing) services:

Kafka Streams applications do the preprocessing to consolidate and enrich the incoming text and speech data sets. Various machine learning / deep learning platforms and technologies consume events for language processing tasks such as speech-to-text translation.

That’s very similar to what I see at other enterprises: There is no single Machine Learning silver bullet. Different use cases – even in one category like NLP – require different technologies. The problem space does not just include model training but also model deployment and monitoring. Python is not the right choice for every data science problem. I explored how to solve the impedance mismatch between the data scientists and production engineers in the post “Apache Kafka + ksqlDB + TensorFlow for Data Scientists via Python + Jupyter Notebook“.

BMW leverages the NLP platform for various use cases, including digital contract intelligence, workplace assistance, machine translation, and service desk automation:

Use Cases for Deep Learning and NLP at BMW powered by Apache Kafka

Check out the details in BMW’s Kafka Summit talk about their use cases for Kafka and Deep Learning / NLP.

Audi – Connected Vehicles

Audi has built a connected car infrastructure with Apache Kafka. Their Kafka Summit keynote explored the use cases and architecture:

Use cases include real-time data analysis, swarm intelligence, collaboration with partners, and predictive AI.

Depending on how you define the term and buzzword “Digital Twin”, this is a perfect example: All sensor data from the connected cars are processed in real-time and stored for historical analysis and reporting. Read more about “Kafka for Digital Twin Architectures” here.

The architecture with the Kafka clusters is called “Audi Data Collector”:

Audi Data Collector for Connected Cars and Vehicles powered by Kafka

Tesla – Connected Everything – Industrial IoT, Cars, Energy

Tesla is not just a car maker. Tesla is a tech company writing a lot of innovative and cutting-edge software. They provide an energy infrastructure for cars with their Tesla Superchargers, solar energy production at their Gigafactories, and much more. Processing and analyzing the data from their cars, smart grids, and factories and integrating with the rest of the IT backend services in real-time is a key piece of their success.

Tesla has built a Kafka-based data platform infrastructure “to support millions of devices and trillions of data points per day”. Tesla showed an interesting history and evolution of their Kafka usage at a Kafka Summit in 2019:

Why Kafka at Tesla?

Tesla chose Kafka to solve the following design requirements (quote from their Kafka Summit presentation):

  • “Just works”
  • Flexible batching
  • One-stream, one-app
  • Scale with multiple degrees of freedom

Once again, another proof that Kafka is battle-tested, scalable, and reliable for massive IoT workloads across verticals and use cases.

Porsche – Customer 360 and Personalized Experience

‘My Porsche’ is the innovative and modern digital omnichannel platform from Porsche for keeping a great relationship with their customers. Porsche describes it better than I can:

The way automotive customers interact with brands has changed, accompanied by a major transformation of customer needs and requirements. Today’s brand experience expands beyond the car and other offline touchpoints to include various digital touchpoints. Automotive customers expect a seamless brand experience across all channels — both offline and online.

My Porsche Digital Service Platform Omnichannel

The ‘Porsche Dev‘ group from Porsche published a few great posts about their architecture. Here is a good overview:

My Porsche Architecture with Apache Kafka

Kafka provides real decoupling between applications. Hence, Kafka became the de facto standard for microservices and Domain-driven Design (DDD) in many companies. It allows building independent and loosely coupled, but scalable, highly available, and reliable applications.

That’s exactly what Porsche describes for their usage of Apache Kafka through its supply chain:

“The recent rise of data streaming has opened new possibilities for real-time analytics. At Porsche, data streaming technologies are increasingly applied across a range of contexts, including warranty and sales, manufacturing and supply chain, connected vehicles, and charging stations,” writes Sridhar Mamella (Platform Manager Data Streaming at Porsche).

As you can see in the above architecture, there is no need for a fight between REST/HTTP and event streaming/Kafka enthusiasts! As I explained in detail before, most microservice architectures need both Apache Kafka and API Management for REST. HTTP and Kafka complement each other very well!

Porsche – A Central platform as the Backbone of a Data-driven Company

Porsche built a central platform strategy across data centers, clouds, and regions (with the really cool, innovative name Streamzilla) to enable the data-driven company. Their Kafka Summit talk shows this platform in more detail:

Porsche Streamzilla powered by Apache Kafka

One interesting solution is the “Over The Air (OTA) update mechanism” powered by Apache Kafka to enable digital aftersales and other use cases:

Over the Air Update OTA at Porsche powered by Apache Kafka

Last but not least, a great podcast link: The Porsche guys talked about “Apache Kafka and Porsche: Fast Cars and Fast Data” to explain Streamzilla.

DriveCentric – CRM for Automotive Dealerships 

While I showed a lot of great use cases from OEMs, that’s not all. Many Tier 1 and Tier 2 suppliers and other third-party software providers leverage Apache Kafka to build innovative offerings. One example is DriveCentric, a scalable real-time CRM for automotive dealerships.

The solution provides a 360-degree customer experience with effective customer engagement across all channels. A few benefits: Boost engagement, shorten sales cycles, and spur growth. This is yet another great example of why Kafka became the de facto standard for decoupled microservice architectures and omnichannel scenarios.

DriveCentric wanted to focus on business, not infrastructure. For that reason, they started with Confluent Cloud, the (only) truly serverless offering for Kafka. If you don’t understand the difference between a partially managed Kafka offering and a truly fully-managed Kafka offering, check out this discussion about serverless Kafka offerings on the cloud market. A more general comparison of Kafka vendors is also available.

Uber / Lyft / Otonomo / Here Technologies – Innovative Mobility Services

The spectrum of mobility services is huge and still growing at an unbelievable speed. This is where I see most of the innovation for the improvement of customer experiences. This topic is worth its own blog post, but let me share a quick overview of a few examples:

  • Uber uses Kafka at an extreme scale – trillions of messages and multiple petabytes of data per day – for “Building the World’s Real-time Transit Infrastructure”.
  • Lyft, similarly using Kafka everywhere, talked about a specific example for doing streaming analytics to implement map-matching, ETA, and cost calculation in real-time.
  • Here Technologies, majority-owned by a consortium of German automotive companies (namely Audi, BMW, Daimler) and American semiconductor company Intel, captures location content such as road networks, buildings, parks, and traffic patterns. Its public API exposes a Kafka-native interface instead of just non-scalable REST/HTTP. For this reason, streaming API Management for Kafka gets more and more relevant.
  • Otonomo is an open API platform for car data that enables you to accelerate time to market for new services. Kafka is part of their central infrastructure to integrate and process the high volumes of vehicle data.
  • FREE NOW (former mytaxi) is a mobility as a service provider (a joint venture between BMW and Daimler Mobility). You can see them as the “European version of Uber”.

FREE NOW – Real-time Streaming Analytics in the Cloud

Let’s talk about FREE NOW in more detail to explore at least one scenario for mobility services. The use case is very similar to other ride-hailing apps: data correlation for huge volumes of events from various data sources. Real-time. 24/7. A perfect fit for the Kafka ecosystem. The following example shows how they leverage stream processing to calculate the surge factor (i.e., the price of the ride depending on the current demand in the region) in real-time:

FREE NOW my taxi Data in Motion with Kafka and Confluent Cloud for Stateful Streaming Analytics
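
The following is NOT FREE NOW’s actual code, just a hedged Kafka Streams sketch of how a (naive) surge factor per region could be derived from windowed demand counts. Topic names and the pricing formula are made up for illustration:

```java
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;

public class SurgeFactor {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "surge-factor");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();
        // Count ride requests per region in 1-minute tumbling windows; the count
        // per window is a (deliberately naive) proxy for current demand in a region.
        builder.stream("ride-requests", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey() // records are assumed to be keyed by region ID
               .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
               .count()
               .toStream()
               .map((windowedRegion, demand) ->
                       KeyValue.pair(windowedRegion.key(), String.valueOf(1.0 + demand / 100.0)))
               .to("surge-factor-by-region", Produced.with(Serdes.String(), Serdes.String()));

        new KafkaStreams(builder.build(), props).start();
    }
}
```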

A few notes from FREE NOW’s Kafka summit talk:

  • Stateful stream processing with Confluent Cloud, Kafka Connect, Kafka Streams, Schema Registry
  • Cloud-native application elasticity and scalability leveraging Kafka and Kubernetes capabilities
  • Use cases: Dynamic pricing, fraud detection, real-time analytics for marketing campaigns, etc.
  • Various information about the trip, location and business performance

Where Kafka can (not) help in the automotive industry

The various real-world deployments show how well Kafka and its ecosystem fit into the automotive industry. Let’s conclude the post with a few notes about aspects that Kafka is NOT built for:

  • Hard real-time and safety-critical car IT: Kafka has latency spikes and no deterministic network behavior (like almost all IT frameworks). Kafka is soft real-time with 10+ milliseconds of end-to-end processing. That’s sufficient for most use cases and scales for high volumes. But it is not built for safety-critical logic in the vehicle, robot, or machine. Learn more in the post “Apache Kafka is NOT Hard Real Time BUT Used Everywhere in Automotive and Industrial IoT“.
  • Last-mile integration: Kafka can integrate with the OT world (machines, PLCs, sensors, etc.). Frameworks like PLC4X provide a Kafka Connect connector. Some customers also use Eclipse Kura for IoT integration. The Confluent REST Proxy and other gateways can connect to smart devices and mobile apps. Having said this, in most cases, the last-mile integration is implemented with dedicated IoT platforms or HTTP proxies. Kafka itself cannot connect to hundreds of thousands of devices. It also does not speak low-level proprietary legacy protocols. Additionally, Kafka does not work well in unreliable networks. I have several posts covering this discussion, including PLC4X integration, MQTT integration, Kafka as a modern Data Historian, and many more.

Slides and Video for Kafka in the Automotive Industry

I summarized the content in a presentation. Here is the slide deck:

And here is the on-demand video recording walking you through the above slides:

Data in Motion is the new Black in the Automotive Industry

Apache Kafka became the central nervous system of many applications in various areas related to the automotive industry. You have seen real-world deployments across different fields, including connected vehicles, smart manufacturing, and innovative mobility services. Exciting examples covered car makers such as Audi, BMW, Porsche, and Tesla, plus a few mobility services such as Uber, Lyft, and Here Technologies.

How do you leverage Apache Kafka and its ecosystem in the automotive industry? What projects did you already deploy? What technologies do you combine with Kafka and why? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Real-World Deployments of Kafka in the Automotive Industry appeared first on Kai Waehner.

]]>
Apache Kafka for Industrial IoT and Manufacturing 4.0 https://www.kai-waehner.de/blog/2021/05/19/apache-kafka-industrial-iot-manufacturing-4-0-automotive-energy-logistics/ Wed, 19 May 2021 08:47:24 +0000 https://www.kai-waehner.de/?p=3422 This post explores use cases and architectures for processing data in motion with Apache Kafka in Industrial IoT (IIoT) across verticals such as automotive, energy, steel manufacturing, oil&gas, cybersecurity, shipping, logistics. Use cases include predictive maintenance, quality assurance, track and trace, real-time locating system (RTLS), asset tracking, customer 360, and more. Examples include BMW, Bosch, Baader, Intel, Porsche, and Devon.

The post Apache Kafka for Industrial IoT and Manufacturing 4.0 appeared first on Kai Waehner.

]]>
This post explores use cases and architectures for processing data in motion with Apache Kafka in Industrial IoT (IIoT) across verticals such as automotive, energy, steel manufacturing, oil&gas, cybersecurity, shipping, logistics. Use cases include predictive maintenance, quality assurance, track and trace, real-time locating system (RTLS), asset tracking, customer 360, and more. Examples include BMW, Bosch, Baader, Intel, Porsche, and Devon.

Apache Kafka for Industrial IoT and Manufacturing 4.0

Why Kafka Is a Key Piece of the Evolution for Industrial IoT and Manufacturing

Industrial IoT was a mess of monolithic and proprietary technologies in the last decades. Modbus, Siemens S7, SCADA, and similar “concepts” controlled the industry. Vendors locked in enterprises by intentionally building incompatible products without open interfaces. These systems often still run on Windows XP or similarly unsupported, outdated operating systems, and were built without security in mind.

Fortunately, this is completely changing. Apache Kafka and its ecosystem play a key role in the IIoT evolution. System integration and data processing get an open architecture with a scalable, reliable infrastructure.

I speak to customers in this industry every week across the globe. Very different challenges, use cases, and innovative ideas come up. I have covered this topic a lot in the past already.

Check out my other related blog posts for Kafka in IIoT and Manufacturing. Learn about use cases and architecture for deployments at the edge (i.e., outside the data center), the relation between Kafka and other IoT standards like MQTT or OPC-UA, and how to build a modern, open and scalable data historian.

I want to highlight one post as it is super important for any discussion around shop floors, PLCs, machines, robots, cars, and any other embedded systems: Kafka and other IT software are NOT hard real-time.

This post here “just” shares my latest presentation on this topic, including the slide deck and on-demand video recording. Before we get there, let’s summarize the current scenarios for Kafka in Industrial IoT in one concrete example.

Requirements for Industrial IoT: Everywhere, Complete, Cloud-native!

Let’s take a look at one specific example. The following picture depicts the usage of event streaming in combination with other OT and IT technologies in the shipping industry:

Apache Kafka in the Shipping Industry for Marine, Oil Transport, Vessel Fleet, Shipping Line, Drones

This is an interesting example because it shows many challenges and requirements of many Industrial IoT real-world scenarios across verticals:

  • Everywhere: Industrial IoT is not possible only in the cloud. The edge is impossible to avoid because manufacturing produces tangible goods. Integration between the (often disconnected) edge and the data center is essential for many use cases.
  • Complete: Industrial IoT is mission-critical. Stability with zero downtime, security, and safety are crucial across verticals. The only realistic option is a robust, battle-tested enterprise-grade solution to realize IIoT use cases.
  • Cloud-native: Automation, scalability, decoupled agile applications, and flexibility regarding technologies and products are required for enterprises to stay competitive. Not just in the cloud, but also at the edge! Not all use cases require a critical, scalable solution, though. For instance, a single broker for data processing and storage is sufficient in a disconnected drone.

A unique value of Kafka is that you can use one single technology for scalable real-time messaging, storage and caching, continuous stateless and stateful data processing, and data integration with the OT and IT world. This is especially important at the edge where the hardware is constrained and the network is limited. It is much easier to operate and much more cost-efficient to deploy one single infrastructure instead of gluing together a best-of-breed stack as you often do in the cloud.

With this introduction, let’s now share the slide deck and video recording to talk about all these points in much more detail.

Slide Deck: Kafka for Industrial IoT and Manufacturing 4.0

Here is the slide deck:

Video Recording: Connect All the Things

Here is the video recording:

Video - Apache Kafka for Industrial IoT and Manufacturing 4.0


Apache Kafka for an open, scalable, flexible IIoT Architecture

Industrial IoT was a mess of monolithic and proprietary technologies in the last decades. Fortunately, Apache Kafka is completely changing many industrial environments. An open architecture with a scalable, reliable infrastructure changes how systems are integrated and how data is processed in the future.

What are your experiences and plans in IIoT projects? What use case and architecture did you implement? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Apache Kafka for Industrial IoT and Manufacturing 4.0 appeared first on Kai Waehner.

]]>
Apache Kafka and MQTT (Part 2 of 5) – V2X and Connected Vehicles https://www.kai-waehner.de/blog/2021/03/19/apache-kafka-mqtt-part-2-of-5-v2x-connected-vehicles-edge-hybrid-cloud/ Fri, 19 Mar 2021 08:00:31 +0000 https://www.kai-waehner.de/?p=3250 Apache Kafka and MQTT are a perfect combination for many IoT use cases. This blog series covers the pros and cons of both technologies. Various use cases across industries, including connected vehicles, manufacturing, mobility services, and smart city are explored. The examples use different architectures, including lightweight edge scenarios, hybrid integrations, and serverless cloud solutions. This post is part two: Connected Vehicles and V2X applications.

The post Apache Kafka and MQTT (Part 2 of 5) – V2X and Connected Vehicles appeared first on Kai Waehner.

]]>
Apache Kafka and MQTT are a perfect combination for many IoT use cases. This blog series covers the pros and cons of both technologies. Various use cases across industries, including connected vehicles, manufacturing, mobility services, and smart city are explored. The examples use different architectures, including lightweight edge scenarios, hybrid integrations, and serverless cloud solutions. This post is part two: Connected Vehicles and V2X applications.

MQTT and Kafka for Connected Vehicles and V2X Use Cases

Apache Kafka + MQTT Blog Series

The first blog post explores the relation between MQTT and Apache Kafka. Afterward, the other four blog posts discuss various use cases, architectures, and reference deployments.

  • Part 1 – Overview: Relation between Kafka and MQTT, pros and cons, architectures
  • Part 2 – Connected Vehicles (THIS POST): MQTT and Kafka in a private cloud on Kubernetes; use case: remote control and command of a car
  • Part 3 – Manufacturing: MQTT and Kafka at the edge in a smart factory; use case: Bidirectional OT-IT integration with Sparkplug between PLCs, IoT Gateways, Data Historian, MES, ERP, Data Lake, etc.
  • Part 4 – Mobility Services: MQTT and Kafka leveraging serverless cloud infrastructure; use case: Traffic jam prediction service using machine learning
  • Part 5 – Smart City: MQTT at the edge connected to fully-managed Kafka in the public cloud; use case: Intelligent traffic routing by combining and correlating 3rd party services

Subscribe to my newsletter to get updates immediately after publication. Besides, I will also update the above list with direct links to this blog series’ posts as soon as they are published.

Use Case: Connected Vehicles and V2X

Vehicle-to-everything (V2X) is communication between a vehicle and any entity that may affect, or may be affected by, the vehicle. It is a vehicular communication system that incorporates other, more specific types of communication such as V2I (vehicle-to-infrastructure), V2N (vehicle-to-network), V2V (vehicle-to-vehicle), V2P (vehicle-to-pedestrian), V2D (vehicle-to-device), and V2G (vehicle-to-grid). The main motivations for V2X are road safety, traffic efficiency, energy savings, and a better driver experience.

V2X includes various use cases. The following picture from 3G4G shows some examples:

V2X Use Cases for Kafka and MQTT

Business Point of View for Connected Vehicles

From a business perspective, the following diagram from Frost & Sullivan explains the use cases for connected vehicles very well:

Use Cases for Connected Vehicles

Technical Point of View for V2X and Connected Vehicles

A few things to point out from a technical perspective:

  • MQTT + Kafka provides a scalable real-time infrastructure for high volumes of data in motion, with end-to-end processing latency between 10 and 20 ms. This is good enough for the integration with backend IT systems and almost all mobility services.
  • MQTT and Kafka are not suited for hard real-time, deterministic embedded systems.
  • Some safety-critical V2X use cases require other communication technologies, such as 5G New Radio (NR) / NR C-V2X sidelink, to connect vehicles directly to each other or to local infrastructure (e.g. traffic lights) without an intermediary cellular network or radio access network (RAN).
  • Example: A self-driving car executes all its algorithms, like image processing and decision-making, within the car in embedded systems. These use cases require deterministic behavior and hard real-time. Communication with 3rd parties such as emergency services, traffic routing, or parking connects to backend systems for data correlation (close to the edge or far away in a cloud data center). Real-time in milliseconds, or sometimes even seconds, is good enough in these cases.
  • Not every application involves tens or hundreds of thousands of connected vehicles. For instance, a real-time locating system (RTLS) is a perfect example of realizing use cases in logistics and transportation. This can be geofencing within a plant or regional or global track & trace. “Real-Time Locating System (RTLS) with Apache Kafka for Transportation and Logistics” explores this use case in more detail.

The following sections focus on use cases that require real-time (but not hard real-time) data integration and processing at scale with 24/7 uptime between vehicles, networks, infrastructure, and applications.

Architecture: MQTT and Kafka for Connected Vehicles

Let’s take a look at an example: remote control and command of a car. This can be a simple scenario, like remotely opening your car trunk with your digital key for the mailman, or a more sophisticated use case, like the payment process for buying a new feature via an over-the-air (OTA) update.

The following diagram shows an architecture for V2X leveraging MQTT and Kafka:

Connected Vehicles - Kafka and MQTT Reference Architecture

A few notes on the above architecture:

  • The MQTT and Kafka clusters run in a Kubernetes environment.
  • Kubernetes allows the deployment across data centers and multiple cloud providers with a single “template”.
  • Bidirectional communication is guaranteed end-to-end, in real-time, on reliable and scalable infrastructure.
  • The MQTT clients in cars and on mobile devices communicate with the MQTT cluster. This allows connecting hundreds of thousands of interfaces and supports unreliable networks.
  • Kafka is the integration backbone for connected vehicles and mobile devices. Use cases include streaming ETL, correlation of the data in stateful business applications, and ingestion into other IT applications, databases, and cloud services. A code sketch of this bridging pattern follows after this list.
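To make the remote control and command example more concrete, here is a minimal sketch of the command path: a small bridge service consumes command events from Kafka and publishes each one to the addressed car's individual MQTT topic. The sketch uses the standard Apache Kafka Java client and the Eclipse Paho MQTT client. All broker addresses and topic names ("vehicle-commands", "vehicles/<vin>/commands") are hypothetical placeholders, not details of a specific production deployment.

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttMessage;

import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class CommandBridge {

    public static void main(String[] args) throws Exception {
        // Connect to the MQTT cluster the vehicles are attached to (placeholder URI).
        MqttClient mqtt = new MqttClient("ssl://mqtt.example.com:8883", "command-bridge-1");
        mqtt.connect();

        // Consume remote-command events produced by backend applications.
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka.example.com:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "command-bridge");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("vehicle-commands"));
            while (true) {
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(500))) {
                    // The record key identifies the vehicle; the value is the command,
                    // e.g. "OPEN_TRUNK". Route it to that car's individual MQTT topic.
                    MqttMessage message = new MqttMessage(rec.value().getBytes(StandardCharsets.UTF_8));
                    message.setQos(1); // at-least-once delivery toward the vehicle
                    mqtt.publish("vehicles/" + rec.key() + "/commands", message);
                }
            }
        }
    }
}
```

In a real deployment, this bridging is usually handled by a Kafka Connect MQTT connector or the MQTT broker's native Kafka integration instead of hand-written code; the sketch only illustrates the end-to-end data flow.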

V2X with MQTT and Kafka in a 5G Infrastructure

The following diagram shows the above use cases around connected vehicles from the V2X perspective:

V2X Infrastructure with Kafka MQTT HTTP Edge Hybrid Cloud

The infrastructure is separated into three categories and networks:

  • The edge (vehicles, devices) using local processing and remote integration via 5G.
  • MEC (multi-access edge computing) region for low-latency use cases. This example leverages AWS Wavelength to combine the power of 5G with cloud services, and Confluent Platform to process data in motion at scale.
  • The public cloud infrastructure using AWS and Confluent Cloud for all other cloud-native applications.

The integration between the edge and the IT world depends on the requirements. In this example, we use mostly MQTT but also HTTP for the integration with the Kafka cluster. The connectivity to other IT applications happens via Kafka-native interfaces such as Kafka clients, Kafka Connect, or Confluent’s Cluster Linking (for the bi-directional replication between the AWS Wavelength zone and the AWS cloud region).
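To sketch the HTTP side of this integration, the following snippet posts a telemetry record to a Kafka topic through the Confluent REST Proxy (v2 API), a standard Confluent component for HTTP-based access to Kafka. The host, topic name, and payload are assumptions for illustration only:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class HttpIngest {

    public static void main(String[] args) throws Exception {
        // One telemetry record in the REST Proxy v2 embedded JSON format.
        // Vehicle ID and metrics are made-up example values.
        String body = "{\"records\":[{\"key\":\"vin-12345\",\"value\":{\"speedKmh\":87,\"batterySoc\":0.64}}]}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://rest-proxy.example.com:8082/topics/vehicle-telemetry"))
                .header("Content-Type", "application/vnd.kafka.json.v2+json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // On success, the proxy returns the partition and offset of the written record.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```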

Direct communication between vehicles or vehicles and pedestrians requires deterministic behavior and ultra-low latency. Hence, this communication does not use technologies like MQTT or Kafka. Technologies like 5G Sidelink were invented for these requirements.

Let’s now look at two real-world examples for connected vehicles.

Example: MQTT and Kafka for Millions of Connected Cars @ Autonomic

Autonomic built the Transportation Mobility Cloud (TMC), a standard way of accessing connected vehicle data and sending remote commands. This platform provides the foundation to build smart mobility applications related to driver safety, preventive maintenance, and fleet management.

Autonomic built a solution with MQTT and Kafka to connect millions of cars. MQTT forwards the car data in real-time to Kafka to distribute the messages to the different microservices and applications in the platform.

This is a great example of combining the benefits of MQTT and Kafka. Read the complete case study from HiveMQ for more details.
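The following is a minimal sketch of this ingestion pattern (not Autonomic's actual code): an MQTT subscriber receives telemetry from all vehicles and forwards each message to a Kafka topic, keyed by vehicle ID. Broker addresses and topic names are hypothetical placeholders.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.eclipse.paho.client.mqttv3.MqttClient;

import java.util.Properties;

public class TelemetryIngest {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka.example.com:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props);

        MqttClient mqtt = new MqttClient("ssl://mqtt.example.com:8883", "telemetry-ingest-1");
        mqtt.connect();

        // Subscribe to the telemetry topics of all vehicles; '+' matches one
        // topic level, here the vehicle identifier (VIN).
        mqtt.subscribe("vehicles/+/telemetry", 1, (topic, message) -> {
            String vin = topic.split("/")[1];
            // Key by VIN so all events of one vehicle land in the same Kafka
            // partition and downstream consumers see them in order.
            producer.send(new ProducerRecord<>("vehicle-telemetry", vin, message.getPayload()));
        });

        Thread.currentThread().join(); // keep the bridge process running
    }
}
```

Each microservice in the platform can then consume the same "vehicle-telemetry" topic with its own consumer group, so one stream of car data independently feeds driver safety, preventive maintenance, and fleet management applications.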

Example: Kafka as Car Data Collector @ Audi

Audi started its connected car journey a long time ago. The data from hundreds of thousands of cars is collected and processed in real-time with Apache Kafka. The following diagram shows the idea:

Audi Data Collector

As you can imagine, dozens of potential use cases exist to reduce cost, improve the customer experience, and increase revenue. The following is an example of a real-time service to find a free parking spot:

Audi Data Collector for Mobility Services Built with Apache Kafka

Watch Audi’s Kafka Summit keynote for more details about the infrastructure and use cases.
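To illustrate what such a mobility service could look like, here is a minimal Kafka Streams sketch (not Audi's actual implementation; topic names and the status format are assumptions) that filters a stream of parking-spot status events down to the free spots:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class FreeParkingService {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "free-parking-service");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka.example.com:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Status events keyed by parking spot ID, with a value like "FREE" or "OCCUPIED".
        KStream<String, String> status = builder.stream("parking-spot-status");

        // Keep only the free spots and publish them for the mobility service / app.
        status.filter((spotId, state) -> "FREE".equals(state))
              .to("free-parking-spots");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```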

Slide Deck – Kafka for Connected Vehicles and V2X

Here is a slide deck covering this topic in more detail:

Kafka + MQTT = Connected Vehicles and V2X

In conclusion, Apache Kafka and MQTT are a perfect combination for V2X and connected vehicles. Together, they make so many new IoT use cases possible!

Follow this blog series to learn about use cases such as connected vehicles, manufacturing, mobility services, and smart city. Every blog post also includes real-world deployments from companies across industries. It is key to understand the different architectural options to make the right choice for your project.

What are your experiences and plans in IoT projects? What use case and architecture did you implement? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.


The post Apache Kafka and MQTT (Part 2 of 5) – V2X and Connected Vehicles appeared first on Kai Waehner.

Apache Kafka for the Connected World – Vehicles, Factories, Cities, Digital Services https://www.kai-waehner.de/blog/2021/03/01/apache-kafka-connected-world-people-vehicles-factories-smart-cities-digital-services/ Mon, 01 Mar 2021 11:30:05 +0000 https://www.kai-waehner.de/?p=3215 The digital transformation connects the world. People, vehicles, factories, cities, digital services, and other "things" communicate with each other in real-time to provide a safe environment, efficient processes, and a fantastic user experience. This scenario only works well with data processing in real-time at scale. This blog post shares a presentation that explains why Apache Kafka plays a key role not just in one of these industries or use cases, but also in connecting the different stakeholders to each other.

The digital transformation enables a connected world. People, vehicles, factories, cities, digital services, and other “things” communicate with each other in real-time to provide a safe environment, efficient processes, and a fantastic user experience. This scenario only works well with data processing in real-time at scale. This blog post shares a presentation that explains why Apache Kafka plays a key role not only in these industries and use cases, but also in connecting the different stakeholders.

A Connected World with Apache Kafka for Smart City Connected Vehicles Telco Cloud Mobility Services

Software is Changing and Connecting the World

Event Streaming with Apache Kafka plays a key role in processing massive volumes of data in real-time in a reliable, scalable, and flexible way, integrating with various legacy and modern data sources and sinks.

I want to give you an overview of existing use cases for event streaming technology in a connected world, spanning supply chains, industries, and customer experiences, together with the interdisciplinary data intersections they create:

  • The Automotive Industry (and it’s not only Connected Cars)
  • Mobility Services across verticals (transportation, logistics, travel industry, retailing, …)
  • Smart Cities (including citizen health services, communication infrastructure, …)
  • Technology Providers (including cloud hyperscaler, software vendors, telco infrastructure, …)

A Connected World with MQ, ETL, ESB, and Kafka

None of these industries and sectors have fundamentally new characteristics or requirements. They require data integration, data correlation, and real decoupling. The difference is the massively increased volume of data.

Real-time messaging solutions have existed for many years. Hundreds of platforms exist for data integration (including ETL and ESB tooling or specific IIoT platforms). Proprietary monoliths have monitored plants, telco networks, and other infrastructure in real-time for decades. But now, Kafka combines all of the above characteristics in an open, scalable, and flexible infrastructure to operate mission-critical workloads at scale in real-time, and it is taking over the world of connecting data.

“Apache Kafka vs. MQ/ETL/ESB” goes into more detail about this discussion.

Streaming Data Exchange with Apache Kafka

Before we jump into the presentation, I want to cover one key trend I see across industries: a streaming data exchange with Apache Kafka:

Streaming Data Exchange for a Connected World with Apache Kafka

TL;DR: If you use event streaming with Kafka in your projects (for reasons like real-time processing, scalability, and decoupling) and your partner does the same, then it does NOT make sense to put a REST / HTTP API in the middle. Instead, the partners should be integrated in a streaming way.

APIs and API Management still have their value for some use cases, of course. Check out the comparison of “Event Streaming with Apache Kafka vs. API Gateway / API Management with Mulesoft or Kong” for more details.

Slide Deck

Here is the slide deck covering various use cases and architectures to realize a connected world with Apache Kafka from different perspectives:

On-Demand Video Recording

The on-demand video recording walks you through the above presentation:

Apache Kafka for the Connected World in Automotive Manufacturing Mobility Services Smart City

Apache Kafka for the Connected World

Connecting the world is a key requirement across industries. Many innovative digital services are only possible through collaboration between stakeholders. Real-time messaging, integration, continuous stream processing, and replication between partners are required. Event Streaming with Apache Kafka helps with the implementation of these use cases.

What are your experiences and plans for event streaming to connect the world? Did you already build applications with Apache Kafka to connect your products and services to partners? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Apache Kafka for the Connected World – Vehicles, Factories, Cities, Digital Services appeared first on Kai Waehner.
