Modernizing OT Middleware: The Shift to Open Industrial IoT Architectures with Data Streaming
https://www.kai-waehner.de/blog/2025/03/17/modernizing-ot-middleware-the-shift-to-open-industrial-iot-architectures-with-data-streaming/

Legacy OT middleware is struggling to keep up with real-time, scalable, and cloud-native demands. As industries shift toward event-driven architectures, companies are replacing vendor-locked, polling-based systems with Apache Kafka, MQTT, and OPC-UA for seamless OT-IT integration. Kafka serves as the central event backbone, MQTT enables lightweight device communication, and OPC-UA ensures secure industrial data exchange. This approach enhances real-time processing, predictive analytics, and AI-driven automation, reducing costs and unlocking scalable, future-proof architectures.

Operational Technology (OT) has traditionally relied on legacy middleware to connect industrial systems, manage data flows, and integrate with enterprise IT. However, these monolithic, proprietary, and expensive middleware solutions struggle to keep up with real-time, scalable, and cloud-native architectures.

Just as mainframe offloading modernized enterprise IT, offloading and replacing legacy OT middleware is the next wave of digital transformation. Companies are shifting from vendor-locked, heavyweight OT middleware to real-time, event-driven architectures using Apache Kafka and Apache Flink—enabling cost efficiency, agility, and seamless edge-to-cloud integration.

This blog explores why and how organizations are replacing traditional OT middleware with data streaming, the benefits of this shift, and architectural patterns for hybrid and edge deployments.

Replacing OT Middleware with Data Streaming using Kafka and Flink for Cloud-Native Industrial IoT with MQTT and OPC-UA

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter, and follow me on LinkedIn or X (formerly Twitter) to stay in touch. And make sure to download my free book about data streaming use cases, including architectures and customer stories for hybrid IT/OT integration scenarios.

Why Replace Legacy OT Middleware?

Industrial environments have long relied on OT middleware like OSIsoft PI, proprietary SCADA systems, and industry-specific data buses. These solutions were designed for polling-based communication, siloed data storage, and batch integration. But today’s real-time, AI-driven, and cloud-native use cases demand more.

Challenges: Proprietary, Monolithic, Expensive

  • High Costs – Licensing, maintenance, and scaling expenses grow exponentially.
  • Proprietary & Rigid – Vendor lock-in restricts flexibility and data sharing.
  • Batch & Polling-Based – Limited ability to process and act on real-time events.
  • Complex Integration – Difficult to connect with cloud and modern IT systems.
  • Limited Scalability – Not built for the massive data volumes of IoT and edge computing.

Just as PLCs are transitioning to virtual PLCs, eliminating hardware constraints and enabling software-defined industrial control, OT middleware is undergoing a similar shift. Moving from monolithic, proprietary middleware to event-driven, streaming architectures with Kafka and Flink allows organizations to scale dynamically, integrate seamlessly with IT, and process industrial data in real time—without vendor lock-in or infrastructure bottlenecks.

Data streaming is NOT a direct replacement for OT middleware, but it serves as the foundation for modernizing industrial data architectures. With Kafka and Flink, enterprises can offload or replace OT middleware to achieve real-time processing, edge-to-cloud integration, and open interoperability.

Event-driven Architecture with Data Streaming using Kafka and Flink in Industrial IoT and Manufacturing

While Kafka and Flink provide real-time, scalable, and event-driven capabilities, last-mile integration with PLCs, sensors, and industrial equipment still requires OT-specific SDKs, open interfaces, or lightweight middleware. This includes support for MQTT, OPC UA or open-source solutions like Apache PLC4X to ensure seamless connectivity with OT systems.
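To make this last-mile integration tangible, here is a minimal Python sketch that bridges MQTT telemetry into Kafka with the paho-mqtt client and the confluent-kafka producer. Broker addresses, topic names, and the wildcard subscription are illustrative placeholders, not a reference implementation; a production setup would more likely use an off-the-shelf connector or a dedicated MQTT broker bridge.

```python
import paho.mqtt.client as mqtt
from confluent_kafka import Producer

# Kafka producer pointing at a (placeholder) local broker.
producer = Producer({"bootstrap.servers": "localhost:9092"})

def on_message(client, userdata, msg):
    # Forward each MQTT message to a Kafka topic, keyed by the MQTT topic path
    # so all events from one device land in the same partition.
    producer.produce("iot.telemetry", key=msg.topic, value=msg.payload)
    producer.poll(0)  # serve delivery callbacks without blocking

# Note: paho-mqtt 2.x requires mqtt.Client(mqtt.CallbackAPIVersion.VERSION1) instead.
client = mqtt.Client()
client.on_message = on_message
client.connect("mqtt-broker.local", 1883)
client.subscribe("factory/+/sensors/#")  # wildcard subscription to all sensor topics
client.loop_forever()
```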

Apache Kafka: The Backbone of Real-Time OT Data Streaming

Kafka acts as the central nervous system for industrial data to ensure low-latency, scalable, and fault-tolerant event streaming between OT and IT systems.

  • Aggregates and normalizes OT data from sensors, PLCs, SCADA, and edge devices.
  • Bridges OT and IT by integrating with ERP, MES, cloud analytics, and AI/ML platforms.
  • Operates seamlessly in hybrid, multi-cloud, and edge environments, ensuring real-time data flow.
  • Works with open OT standards like MQTT and OPC UA, reducing reliance on proprietary middleware solutions.

And just to be clear: Apache Kafka and similar technologies support “IT real-time” (meaning milliseconds of latency and sometimes latency spikes). This is NOT about hard real-time in the OT world for embedded systems or safety-critical applications.

Flink powers real-time analytics, complex event processing, and anomaly detection for streaming industrial data.
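As a hedged illustration of what such stream processing can look like, the following PyFlink sketch defines a Kafka-backed table and computes one-minute tumbling-window aggregates per machine, flagging readings above a threshold. It assumes the Flink Kafka SQL connector jar is available on the classpath; the topic, field names, and threshold are placeholders.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source: machine telemetry from a Kafka topic as JSON.
# Requires the flink-sql-connector-kafka jar (e.g., via the pipeline.jars config).
t_env.execute_sql("""
    CREATE TABLE machine_telemetry (
        machine_id STRING,
        temperature DOUBLE,
        event_time TIMESTAMP(3),
        WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'machine.telemetry',
        'properties.bootstrap.servers' = 'localhost:9092',
        'properties.group.id' = 'condition-monitoring',
        'scan.startup.mode' = 'latest-offset',
        'format' = 'json'
    )
""")

# Continuous query: 1-minute tumbling-window statistics per machine,
# only emitting windows where the machine overheated (placeholder threshold).
result = t_env.execute_sql("""
    SELECT
        machine_id,
        TUMBLE_START(event_time, INTERVAL '1' MINUTE) AS window_start,
        AVG(temperature) AS avg_temp,
        MAX(temperature) AS max_temp
    FROM machine_telemetry
    GROUP BY machine_id, TUMBLE(event_time, INTERVAL '1' MINUTE)
    HAVING MAX(temperature) > 90.0
""")
result.print()
```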

Condition Monitoring and Predictive Maintenance with Data Streaming using Apache Kafka and Flink

By leveraging Kafka and Flink, enterprises can process OT and IT data only once, ensuring a real-time, unified data architecture that eliminates redundant processing across separate systems. This approach enhances operational efficiency, reduces costs, and accelerates digital transformation while still integrating seamlessly with existing industrial protocols and interfaces.

Unifying Operational (OT) and Analytical (IT) Workloads

As industries modernize, a shift-left architecture approach ensures that operational data is not just consumed for real-time operational OT workloads but is also made available for transactional and analytical IT use cases—without unnecessary duplication or transformation overhead.

The Shift-Left Architecture: Bringing Advanced Analytics Closer to Industrial IoT

In traditional architectures, OT data is first collected, processed, and stored in proprietary or siloed middleware systems before being moved later to IT systems for analysis. This delayed, multi-step process leads to inefficiencies, including:

  • High latency between data collection and actionable insights.
  • Redundant data storage and transformations, increasing complexity and cost.
  • Disjointed AI/ML pipelines, where models are trained on outdated, pre-processed data rather than real-time information.

A shift-left approach eliminates these inefficiencies by bringing analytics, AI/ML, and data science closer to the raw, real-time data streams from the OT environments.

Shift Left Architecture with Data Streaming into Data Lake Warehouse Lakehouse

Instead of waiting for batch pipelines to extract and move data for analysis, a modern architecture integrates real-time streaming with open table formats to ensure immediate usability across both operational and analytical workloads.

Open Table Format with Apache Iceberg / Delta Lake for Unified Workloads and Single Storage Layer

By integrating open table formats like Apache Iceberg and Delta Lake, organizations can:

  • Unify operational and analytical workloads to enable both real-time data streaming and batch analytics in a single architecture.
  • Eliminate data silos, ensuring that OT and IT teams access the same high-quality, time-series data without duplication.
  • Ensure schema evolution and ACID transactions to enable robust and flexible long-term data storage and retrieval.
  • Enable real-time and historical analytics, allowing engineers, business users, and AI/ML models to query both fresh and historical data efficiently.
  • Reduce the need for complex ETL pipelines, as data is written once and made available for multiple workloads simultaneously. This also avoids the anti-pattern of Reverse ETL.

The Result: An Open, Cloud-Native, Future-Proof Data Historian for Industrial IoT

This open, hybrid OT/IT architecture allows organizations to maintain real-time industrial automation and monitoring with Kafka and Flink, while ensuring structured, queryable, and analytics-ready data with Iceberg or Delta Lake. The shift-left approach ensures that data streams remain useful beyond their initial OT function, powering AI-driven automation, predictive maintenance, and business intelligence in near real-time rather than relying on outdated and inconsistent batch processes.
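As a hedged sketch of how curated streams can land in such an open table format, the snippet below appends micro-batches from a Kafka topic to an Apache Iceberg table via pyiceberg (write support requires a recent pyiceberg release plus pyarrow). The catalog URI, namespace, table name, and topic are assumptions for illustration; in practice this step is often handled by a managed connector or a Flink/Spark Iceberg sink instead.

```python
import json
import pyarrow as pa
from confluent_kafka import Consumer
from pyiceberg.catalog import load_catalog

# Hypothetical REST catalog and table; the JSON fields must match the Iceberg schema.
catalog = load_catalog("lakehouse", **{"type": "rest", "uri": "http://iceberg-catalog:8181"})
table = catalog.load_table("ot.sensor_readings")

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "iceberg-sink",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["machine.telemetry.curated"])

batch = []
while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    batch.append(json.loads(msg.value()))
    if len(batch) >= 1000:  # micro-batch appends keep Iceberg file sizes reasonable
        table.append(pa.Table.from_pylist(batch))
        consumer.commit()
        batch.clear()
```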

Open and Cloud Native Data Historian in Industrial IoT and Manufacturing with Data Streaming using Apache Kafka and Flink

By adopting this unified, streaming-first architecture to build an open and cloud-native data historian, organizations can:

  • Process data once and make it available for both real-time decisions and long-term analytics.
  • Reduce costs and complexity by eliminating unnecessary data duplication and movement.
  • Improve AI/ML effectiveness by feeding models with real-time, high-fidelity OT data.
  • Ensure compliance and historical traceability without compromising real-time performance.

This approach future-proofs industrial data infrastructures, allowing enterprises to seamlessly integrate IT and OT, while supporting cloud, edge, and hybrid environments for maximum scalability and resilience.

Key Benefits of Offloading OT Middleware to Data Streaming

  • Lower Costs – Reduce licensing fees and maintenance overhead.
  • Real-Time Insights – No more waiting for batch updates; analyze events as they happen.
  • One Unified Data Pipeline – Process data once and make it available for both OT and IT use cases.
  • Edge and Hybrid Cloud Flexibility – Run analytics at the edge, on-premise, or in the cloud.
  • Open Standards & Interoperability – Support MQTT, OPC UA, REST/HTTP, Kafka, and Flink, avoiding vendor lock-in.
  • Scalability & Reliability – Handle massive sensor and machine data streams continuously without performance degradation.

A Step-by-Step Approach: Offloading vs. Replacing OT Middleware with Data Streaming

Companies transitioning from legacy OT middleware have several strategies for leveraging data streaming as an integration and migration platform:

  1. Hybrid Data Processing
  2. Lift-and-Shift
  3. Full OT Middleware Replacement

1. Hybrid Data Streaming: Process Once for OT and IT

Why?

Traditional OT architectures often duplicate data processing across multiple siloed systems, leading to higher costs, slower insights, and operational inefficiencies. Many enterprises still process data inside expensive legacy OT middleware, only to extract and reprocess it again for IT, analytics, and cloud applications.

A hybrid approach using Kafka and Flink enables organizations to offload processing from legacy middleware while ensuring real-time, scalable, and cost-efficient data streaming across OT, IT, cloud, and edge environments.

Offloading from OT Middleware like OSISoft PI to Data Streaming with Kafka and Flink

How?

Connect to the existing OT middleware via:

  • A Kafka Connector (if available).
  • HTTP APIs, OPC UA, or MQTT for data extraction.
  • Custom integrations for proprietary OT protocols.
  • Lightweight edge processing to pre-filter data before ingestion.

Use Kafka for real-time ingestion, ensuring all OT data is available in a scalable, event-driven pipeline.
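For the connector-based option above, ingestion is typically configured rather than coded. The sketch below registers a hypothetical MQTT source connector through the Kafka Connect REST API; the connector class and property names follow Confluent's MQTT source connector and have to be adapted to whichever connector your middleware actually provides. All values are placeholders.

```python
import requests

# Hypothetical connector configuration; adjust class and properties to your connector.
connector_config = {
    "name": "ot-mqtt-source",
    "config": {
        "connector.class": "io.confluent.connect.mqtt.MqttSourceConnector",
        "tasks.max": "1",
        "mqtt.server.uri": "tcp://mqtt-broker.local:1883",
        "mqtt.topics": "factory/+/sensors/#",
        "kafka.topic": "iot.telemetry",
    },
}

# Register the connector on a (placeholder) Kafka Connect cluster.
resp = requests.post("http://connect.local:8083/connectors", json=connector_config, timeout=10)
resp.raise_for_status()
print(resp.json())
```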

Process data once with Flink to:

  • Apply real-time transformations, aggregations, and filtering at scale.
  • Perform predictive analytics and anomaly detection before storing or forwarding data.
  • Enrich OT data with IT context (e.g., adding metadata from ERP or MES).

Distribute processed data to the right destinations, such as:

  • Time-series databases for historical analysis and monitoring.
  • Enterprise IT systems (ERP, MES, CMMS, BI tools) for decision-making.
  • Cloud analytics and AI platforms for advanced insights.
  • Edge and on-prem applications that need real-time operational intelligence.

Result?

  • Eliminate redundant processing across OT and IT, reducing costs.
  • Real-time data availability for analytics, automation, and AI-driven decision-making.
  • Unified, event-driven architecture that integrates seamlessly with on-premise, edge, hybrid, and cloud environments.
  • Flexibility to migrate OT workloads over time, without disrupting current operations.

By offloading costly data processing from legacy OT middleware, enterprises can modernize their industrial data infrastructure while maintaining interoperability, efficiency, and scalability.

2. Lift-and-Shift: Reduce Costs While Keeping Existing OT Integrations

Why?

Many enterprises rely on legacy OT middleware like OSIsoft PI, proprietary SCADA systems, or industry-specific data hubs for storing and processing industrial data. However, these solutions come with high licensing costs, limited scalability, and an inflexible architecture.

A lift-and-shift approach provides an immediate cost reduction by offloading data ingestion and storage to Apache Kafka while keeping existing integrations intact. This allows organizations to modernize their infrastructure without disrupting current operations.

How?

Use the Strangler Fig Design Pattern as a gradual modernization approach where new systems incrementally replace legacy components, reducing risk and ensuring a seamless transition:

Strangler Fig Pattern to Integrate, Migrate, Replace

“The most important reason to consider a strangler fig application over a cut-over rewrite is reduced risk.” Martin Fowler

Replace expensive OT middleware for ingestion and storage:

  • Deploy Kafka as a scalable, real-time event backbone to collect and distribute data.
  • Offload sensor, PLC, and SCADA data from OSIsoft PI, legacy brokers, or proprietary middleware.
  • Maintain the connectivity with existing OT applications to prevent workflow disruption.

Streamline OT data processing:

  • Store and distribute data in Kafka instead of proprietary, high-cost middleware storage.
  • Leverage schema-based data governance to ensure compatibility across IT and OT systems (see the sketch after this list).
  • Reduce data duplication by ingesting once and distributing to all required systems.
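Here is a minimal sketch of that schema-based governance: producing Avro-encoded OT events whose schema is registered in a Schema Registry, using the confluent-kafka Python client. The registry URL, topic, and schema are placeholders for illustration.

```python
from confluent_kafka import SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer

# Placeholder Avro schema describing a sensor reading.
schema_str = """
{
  "type": "record",
  "name": "SensorReading",
  "fields": [
    {"name": "machine_id", "type": "string"},
    {"name": "temperature", "type": "double"},
    {"name": "ts", "type": "long"}
  ]
}
"""

sr_client = SchemaRegistryClient({"url": "http://schema-registry.local:8081"})
avro_serializer = AvroSerializer(sr_client, schema_str)

producer = SerializingProducer({
    "bootstrap.servers": "localhost:9092",
    "value.serializer": avro_serializer,
})

# The schema is registered on first use; incompatible changes are rejected by the registry.
producer.produce("machine.telemetry", value={"machine_id": "press-07", "temperature": 71.5, "ts": 1700000000000})
producer.flush()
```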

Maintain existing IT and analytics integrations:

  • Keep connections to ERP, MES, and BI platforms via Kafka connectors.
  • Continue using existing dashboards and reports while transitioning to modern analytics platforms.
  • Avoid vendor lock-in and enable future migration to cloud or hybrid solutions.

Result?

  • Immediate cost savings by reducing reliance on expensive middleware storage and licensing fees.
  • No disruption to existing workflows, ensuring continued operational efficiency.
  • Scalable, future-ready architecture with the flexibility to expand to edge, cloud, or hybrid environments over time.
  • Real-time data streaming capabilities, paving the way for predictive analytics, AI-driven automation, and IoT-driven optimizations.

A lift-and-shift approach serves as a stepping stone toward full OT modernization, allowing enterprises to gradually transition to a fully event-driven, real-time architecture.

3. Full OT Middleware Replacement: Cloud-Native, Scalable, and Future-Proof

Why?

Legacy OT middleware systems were designed for on-premise, batch-based, and proprietary environments, making them expensive, inflexible, and difficult to scale. As industries embrace cloud-native architectures, edge computing, and real-time analytics, replacing traditional OT middleware with event-driven streaming platforms enables greater flexibility, cost efficiency, and real-time operational intelligence.

A full OT middleware replacement eliminates vendor lock-in, outdated integration methods, and high-maintenance costs while enabling scalable, event-driven data processing that works across edge, on-premise, and cloud environments.

How?

Use Kafka and Flink as the Core Data Streaming Platform

  • Kafka replaces legacy data brokers and middleware storage by handling high-throughput event ingestion and real-time data distribution.
  • Flink provides advanced real-time analytics, anomaly detection, and predictive maintenance capabilities.
  • Process OT and IT data in real-time, eliminating batch-based limitations.

Replace Proprietary Connectors with Lightweight, Open Standards

  • Deploy MQTT or OPC UA gateways to enable seamless communication with sensors, PLCs, SCADA, and industrial controllers.
  • Eliminate complex, costly middleware like OSIsoft PI with low-latency, open-source integration.
  • Leverage Apache PLC4X for industrial protocol connectivity, avoiding proprietary vendor constraints.

Adopt a Cloud-Native, Hybrid, or On-Premise Storage Strategy

  • Store time-series data in scalable, purpose-built databases like InfluxDB or TimescaleDB.
  • Enable real-time query capabilities for monitoring, analytics, and AI-driven automation.
  • Ensure data availability across on-premise infrastructure, hybrid cloud, and multi-cloud deployments.

Journey from Legacy OT Middleware to Hybrid Cloud
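As a hedged sketch of the time-series storage step described above, the following consumer writes curated Kafka events into a TimescaleDB hypertable with psycopg2. The DSN, table, and topic names are placeholders, and the hypertable is assumed to exist already; a Kafka Connect JDBC sink would be the more common production route.

```python
import json
import psycopg2
from confluent_kafka import Consumer

# Assumed schema, created beforehand, e.g.:
#   CREATE TABLE sensor_data (ts TIMESTAMPTZ, machine_id TEXT, temperature DOUBLE PRECISION);
#   SELECT create_hypertable('sensor_data', 'ts');
conn = psycopg2.connect("postgresql://tsdb:tsdb@timescale.local:5432/iot")

consumer = Consumer({"bootstrap.servers": "localhost:9092", "group.id": "tsdb-sink"})
consumer.subscribe(["machine.telemetry.curated"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO sensor_data (ts, machine_id, temperature) VALUES (%s, %s, %s)",
            (event["ts"], event["machine_id"], event["temperature"]),
        )
    conn.commit()
    consumer.commit()
```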

Modernize IT and Business Integrations

  • Enable seamless OT-to-IT integration with ERP, MES, BI, and AI/ML platforms.
  • Stream data directly into cloud-based analytics services, digital twins, and AI models.
  • Build real-time dashboards and event-driven applications for operators, engineers, and business stakeholders.

OT Middleware Integration, Offloading and Replacement with Data Streaming for IoT and IT/OT

Result?

  • Fully event-driven and cloud-native OT architecture that eliminates legacy bottlenecks.
  • Real-time data streaming and processing across all industrial environments.
  • Scalability for high-throughput workloads, supporting edge, hybrid, and multi-cloud use cases.
  • Lower operational costs and reduced maintenance overhead by replacing proprietary, heavyweight OT middleware.
  • Future-ready, open, and extensible architecture built on Kafka, Flink, and industry-standard protocols.

By fully replacing OT middleware, organizations gain real-time visibility, predictive analytics, and scalable industrial automation, unlocking new business value while ensuring seamless IT/OT integration.

Helin is an excellent example of a cloud-native IT/OT data solution powered by Kafka and Flink, focusing on real-time data integration and analytics in industrial and operational environments. Its industry focus is on the maritime and energy sectors, but the approach is relevant across all IIoT industries.

Why This Matters: The Future of OT is Real-Time & Open for Data Sharing

The next generation of OT architectures is being built on open standards, real-time streaming, and hybrid cloud.

  • Most new industrial sensors, machines, and control systems are now designed with Kafka, MQTT, and OPC UA compatibility.
  • Modern IT architectures demand event-driven data pipelines for AI, analytics, and automation.
  • Edge and hybrid computing require scalable, fault-tolerant, real-time processing.

Industrial IoT Data Streaming Everywhere Edge Hybrid Cloud with Apache Kafka and Flink

Use Kafka Cluster Linking for seamless bi-directional data replication and command & control, ensuring low-latency, high-availability data synchronization across on-premise, edge, and cloud environments.

Enable multi-region and hybrid edge to cloud architectures with real-time data mirroring to allow organizations to maintain data consistency across global deployments while ensuring business continuity and failover capabilities.

It’s Time to Move Beyond Legacy OT Middleware to Open Standards like MQTT, OPC-UA, Kafka

The days of expensive, proprietary, and rigid OT middleware are numbered (at least for new deployments). Industrial enterprises need real-time, scalable, and open architectures to meet the growing demands of automation, predictive maintenance, and industrial IoT. By embracing open IoT and data streaming technologies, companies can seamlessly bridge the gap between Operational Technology (OT) and IT, ensuring efficient, event-driven communication across industrial systems.

MQTT, OPC-UA and Apache Kafka are a match made in heaven for industrial IoT:

  • MQTT enables lightweight, publish-subscribe messaging for industrial sensors and edge devices.
  • OPC-UA provides secure, interoperable communication between industrial control systems and modern applications.
  • Kafka acts as the high-performance event backbone, allowing data from OT systems to be streamed, processed, and analyzed in real time.

Whether lifting and shifting, optimizing hybrid processing, or fully replacing legacy middleware, data streaming is the foundation for the next generation of OT and IT integration. With Kafka at the core, enterprises can decouple systems, enhance scalability, and unlock real-time analytics across the entire industrial landscape.

Stay ahead of the curve! Subscribe to my newsletter for insights into data streaming and connect with me on LinkedIn to continue the conversation. And make sure to download my free book about data streaming use cases and industry success stories.

OPC UA, MQTT, and Apache Kafka – The Trinity of Data Streaming in IoT
https://www.kai-waehner.de/blog/2022/02/11/opc-ua-mqtt-apache-kafka-the-trinity-of-data-streaming-in-industrial-iot/

In the IoT world, MQTT (Message Queuing Telemetry Transport) and OPC UA (OPC Unified Architecture) have established themselves as open and platform-independent standards for data exchange in Industrial IoT (IIoT) and Industry 4.0 use cases. Data Streaming with Apache Kafka is the data hub for integrating and processing massive volumes of data at any scale in real-time. This blog post explores the relationship between Kafka and the IoT protocols, when to use which technology, and why sometimes HTTP/REST is the better choice. The end explores real-world case studies from Audi and BMW.

The Trinity of Data Streaming in Industrial IoT - Apache Kafka MQTT OPC UA

Industry 4.0: Data streaming platforms increase overall plant effectiveness and connect equipment

Machine data must be transformed and made available across the enterprise as soon as it is generated to extract the most value from the data. As a result, operations can avoid critical failures and increase the effectiveness of their overall plant.

Automotive manufacturers such as BMW and Tesla have already recognized the potential of data streaming platforms to get their data moving with the power of the Apache Kafka ecosystem. Let’s explore the benefits of data streaming and how this technology enriches data-driven manufacturing companies.

The goals of increasing digitization and automation of the manufacturing sector are many:

  • To make production processes more efficient
  • To make production faster and cheaper overall
  • To minimize error rates

Manufacturers are also striving to increase overall equipment effectiveness (OEE) in their production facilities – from product design and manufacturing to maintenance operations. This confronts them with equally diverse challenges. Industry 4.0, or Industrial IoT (IIoT), means that the amount of data generated daily is increasing and needs to be transported, processed, analyzed, and made available through systems in near real-time.

Complicating matters further is that legacy IT environments continue to live in today’s manufacturing facilities. This limits manufacturers’ ability to efficiently integrate data across operations. Therefore, most manufacturers require a hybrid data replication and synchronization strategy.

An adaptive manufacturing strategy starts with real-time data

Automation.com published an excellent article explaining the need for real-time processes and monitoring to provide a flexible production line. TL;DR: Processes should be real-time when possible, but real-time is not always possible, even within an application. Think about just-in-time production struggling with supply chain issues caused by the Covid pandemic and the Suez Canal blockage in 2021.

The theory of just-in-time production does not work with supply chain issues! You need to provide flexibility and be able to switch between different approaches:

  • Just-in-time (JIT) vs. make to forecast
  • Fixed vs. variable price contracts
  • Build vs. buy plant capacity
  • Staffed vs. lights-out third shift
  • Linking vs. not linking prices for materials and finished goods

Kappa architecture for a real-time IoT data hub

Real-time production and process monitoring data are essential for success! This evolution is only possible with a real-time Kappa architecture. A Lambda architecture with batch workloads either completely fails or makes things much more complex and costly from an IT infrastructure and OEE perspective.

For clarification, when I speak about real-time, I talk about millisecond latency. This is not hard real-time and deterministic like in safety-critical and embedded environments. The post “Apache Kafka is NOT Hard Real-Time BUT Used Everywhere in Automotive and Industrial IoT” elaborates on this topic.

In IoT, MQTT and OPC UA have established themselves as platform-independent open standards for data exchange. See what this combination of IoT protocols and Kafka looks like in a smart factory.

When to use Kafka vs. MQTT and OPC UA?

Kafka is a fantastic data streaming platform for messaging, storage, data integration, and data processing in real-time at scale. However, it is not a silver bullet for every problem!

Kafka is NOT…

  • A proxy for millions of clients (like mobile apps) – but Kafka-native proxies (like REST or MQTT) exist for some use cases.
  • An API Management platform – but these tools are usually complementary and used for the creation, life cycle management, or the monetization of Kafka APIs.
  • A database for complex queries and batch analytics workloads – but good enough for transactional queries and relatively simple aggregations (especially with ksqlDB).
  • An IoT platform with features such as device management  – but direct Kafka-native integration with (some) IoT protocols such as MQTT or OPC-UA is possible and the appropriate approach for (some) use cases.
  • A technology for hard real-time applications such as safety-critical or deterministic systems – but that’s true for any other IT framework, too. Embedded systems are different software!

For these reasons, Kafka is complementary, not competitive, to MQTT and OPC UA. Choose the right tool for the job and combine them! I wrote a detailed blog post exploring when NOT to use Apache Kafka. The above was just the summary.

You should also think about this question from the other side to understand when a message broker is not the right choice. For instance, United Manufacturing Hub is an open-source manufacturing data infrastructure that recently migrated from MQTT as messaging infrastructure to Kafka as the central nervous system because of its storage capabilities, higher throughput, and guaranteed ordering. However, to be clear, this update is not replacing but complementing MQTT with Kafka.

Meeting the challenges of Industry 4.0 through data streaming and data mesh

Machine-to-machine communications and the (Industrial) Internet of Things enable automation, data-driven monitoring, and the use of intelligent machines that can, for example, identify defects and vulnerabilities on their own.

For all these scenarios, large volumes of data must be processed in near real-time and made available across plants, companies, and, under certain circumstances, worldwide via a stream data exchange:

Hybrid and Global Apache Kafka and Event Streaming Use Case

This novel design approach is often implemented with Apache Kafka as decentralized data streaming data mesh.

The essential requirement here is integrating various systems, such as edge and IoT devices and business software, and execution independent of the underlying infrastructure (edge, on-premises as well as public, multi-, and hybrid cloud).

Therefore, an open, elastic, and flexible architecture is essential to integrate with the legacy environment while taking advantage of modern cloud-native applications.

Event-driven, open, and elastic data streaming platforms such as Apache Kafka serve precisely these requirements. They collect relevant sensor and telemetry data alongside data from information technology systems and process it while it is in motion. That concept is called “data in motion“. The new fundamental change differs significantly from processing “data at rest“, meaning you store events in a database and wait until someone else looks at them later. The latter is a “too late architecture” in many IoT use cases.

Separation of concerns in the OT/IT world with domain-driven design and true decoupling

Data integration with legacy and modern systems takes place in near real-time – target systems can use relevant data immediately. It doesn’t matter what infrastructure the plant’s IT landscape is built on. Besides the continuous flow of data, the decoupling of systems also allows messages to be stored until the target systems need them.

That feature of true decoupling with backpressure handling and replayability of data is a unique differentiator compared to other messaging systems like RabbitMQ in the IT space or MQTT in the IoT space. Kafka is also highly available and fail-safe, which is critical in the production environment. “Domain-driven design (DDD) with Apache Kafka” dives deeper into this benefit:

Domain Driven Design DDD with Kafka for Industrial IoT MQTT and OPC UA

How to choose between OPC UA and MQTT with Kafka?

Three de facto standards for open and standardized IoT architectures. Two IoT-specific protocols and REST / HTTP as simple (and often good enough) options. Modern proprietary protocols compete in the space, too:

  • OPC UA (Open Platform Communications Unified Architecture)
  • MQTT (Message Queuing Telemetry Transport)
  • REST / HTTP
  • Proprietary protocols and IoT platforms

These alternatives are great vs. the legacy proprietary monolith world of the last decades in the OT/IT and IoT space.

MQTT vs. OPC UA (vs. HTTP vs. Proprietary)

First of all, this discussion is only relevant if you have the choice. If you buy and install a new machine or PLC on your shop floor and that one only offers a specific interface, then you have to use it. However, new software like IoT gateways provides different options to choose from.

How to compare these communication protocols?

Well, frankly, it is challenging as most literature is opinionated and often includes FUD about the “competing protocols”. Every alternative has its sweet spots. Hence, it is more of an apples and oranges comparison.

More or less randomly, I googled “OPC UA vs MQTT” and found the following interesting comparison from Skynet’s proprietary DataHub Transfer Protocol (DHTP). The vendor pitches its commercial product against the open standards (and added AMQP as an additional alternative):

IIoT protocol comparison

Each comparison on the web differs. The above comparison is valid (and some people will disagree with some points). And sometimes, proprietary solutions provide the better choice from a TCO and ROI perspective, too.

Hint: Look at different comparisons. Understand if the publication is related to a specific vendor and standard. Evaluate several solutions and vendors to understand the differences and added value.

Decision tree for evaluating IoT protocols

My recommendation for comparing the different IoT protocols is to use open standards whenever possible. Choose the right tool for the job and combine them in a best-of-breed approach as needed.

Let’s take a look at a simple decision tree to decide between OPC UA, MQTT, HTTP, and other proprietary IIoT protocols (note: this is just a very simplified point of view, and you can of course build your own opinion with different decisions):

Decision Tree for Industrial IoT - MQTT, OPC UA, HTTP REST

A few notes on the reasoning for how I built this decision tree:

  • HTTP / REST is perfect for simple use cases (keep it as simple as possible). HTTP is supported almost everywhere, well understood, and simple to use. No additional tooling, APIs, or middleware is needed. Communication is synchronous request-response. Conversations with security teams are much easier if you just need to open port 80 or 443 for HTTP(S) instead of TCP ports, like most other protocols. HTTP is unidirectional communication (e.g., a connected car needs an HTTP server to get data pushed from the cloud – pub/sub is the right choice instead of HTTP here).
  • MQTT is perfect for intermittent networks or limited bandwidth and/or for connecting tens or hundreds of thousands of devices (e.g., connected car infrastructure). Communication is asynchronous publish/subscribe using an MQTT broker as the middleman. MQTT uses no standard data format. But developers can use Sparkplug as an add-on built for this purpose. MQTT is incredibly lightweight. Features like Quality of Service (QoS) and last will and testament solve many requirements for IoT use cases out-of-the-box. MQTT is excellent for IoT use cases and can easily be used for bidirectional communication (e.g., connected cars <–> cloud communication). LoRaWAN and other low-power wide-area networks are great for MQTT, too.
  • OPC UA is perfect for industrial automation (e.g., machines at the production line). Communication is usually client/server today, but publish/subscribe is also supported. It uses standard data formats and provides a rich (= powerful but also complex) set of features, components, and industry-specific data formats. OPC UA is excellent for OT/IT integration scenarios. OPC UA TSN (time-sensitive networking), one optional component, is an Ethernet communication standard that provides open, deterministic, hard real-time communication.
  • Proprietary protocols suit specific problems that standard-based implementations cannot solve similarly. These protocols have various trade-offs. Often powerful and performant, but also expensive and proprietary.

Choosing between OPC UA, MQTT, and other protocols isn’t an either/or decision. Each protocol plays its role and excels at certain use cases. An optimal modern industrial network uses OPC UA and MQTT for modern applications. Both together combine the strengths of each and mitigate their downsides. Legacy applications and proprietary SCADA systems or other data historians are usually integrated with other existing proprietary middleware.

Many IIoT platforms, such as Siemens, OSIsoft, or Inductive Automation, support various modern and legacy protocols. Some smaller vendors focus on a specific sweet spot, like HiveMQ for MQTT or OPC Router for OPC UA.

Integration between MQTT / OPC UA and Kafka

A few integration options between equipment, machines, and devices that support MQTT or OPC UA and Kafka are:

  • Kafka Connect connectors: Native Kafka integration on protocol level. Check Confluent Hub for a few alternatives. Some enterprises built their custom Kafka Connect connectors.
  • Custom integration: Integration via a low level MQTT / OPC UA API (e.g. using Kafka’s HTTP / REST Proxy) or Kafka client (e.g. .NET / C++ for Windows environments).
  • Modern and open 3rd party IoT middleware: Generic open source integration middleware (e.g., Apache Camel with its IoT connectors), IoT-specific frameworks (like Apache PLC4X or Eclipse Ditto), or proprietary 3rd party IoT middleware with open and standards-based APIs
  • Commercial IoT platforms: Best fit for existing historical deployments and glue code with legacy protocols such as Modbus, Siemens S7, et al. Traditional data historians with proprietary protocols, monolithic architectures, limited scalability, and batch ETL work well for these workloads to connect the past with the future of the OT/IT world and to create a bridge between on-premise and cloud. Almost all IoT platforms have added connectors for MQTT, OPC UA, and Kafka in the meantime.

OEE scenarios that benefit from data streaming

Data streaming platforms apply in various use cases to increase overall plant effectiveness as the central nervous system. These include connectivity via industry standards such as OPC UA or MQTT, visualization of multiple devices and assets in digital twins, and modern maintenance in the form of condition monitoring and predictive maintenance.

Connectivity to machines and equipment with OPC UA or MQTT

OPC UA and MQTT are not designed for data processing and integration. Instead, their strength is establishing bidirectional “last mile communication” to devices, machines, PLCs, IoT gateways, or vehicles in real-time.

As discussed above, both standards have different “sweet spots” and can also be combined: OPC UA is supported by almost all modern machines, PLCs, and IoT gateways for the smart factory. MQTT is used primarily in poor networks and/or for connecting thousands to hundreds of thousands of devices.
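For the OPC UA side of this last-mile connectivity, a minimal sketch could poll a tag from an OPC UA server with the community asyncua client and forward the readings to Kafka. The endpoint URL and node id are hypothetical; real deployments usually rely on OPC UA subscriptions and dedicated connectors rather than simple polling.

```python
import asyncio
import json
import time
from asyncua import Client
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

async def main():
    # Placeholder OPC UA endpoint and node id of a temperature tag.
    async with Client(url="opc.tcp://plc.local:4840/freeopcua/server/") as client:
        node = client.get_node("ns=2;i=2")
        while True:
            value = await node.read_value()
            event = {"tag": "line1.temperature", "value": value, "ts": time.time()}
            producer.produce("opcua.telemetry", value=json.dumps(event))
            producer.poll(0)
            await asyncio.sleep(1)  # simple 1 Hz polling; OPC UA subscriptions are also possible

asyncio.run(main())
```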

These data streams are then streamed into the data streaming platforms via connectors. The streaming platform can either be deployed in parallel with an IoT platform ‘at the edge’ or combined in hybrid or cloud scenarios.

The data streaming platform is a flexible data hub for data integration and processing between OT and IT applications. Besides OPC UA and MQTT on the OT side, various IT applications such as MES, ERP, CRM, data warehouse, or data lake are connected in real-time, regardless of whether they are operated ‘at the edge’, on-premise, or in the cloud.

Apache Kafka as open scalable Data Historian for IIoT with MQTT and OPC UA

More details: Apache Kafka as Data Historian – an IIoT / Industry 4.0 Real-Time Data Lake.

Digital twins for development and predictive simulation

By continuously streaming data and processing and integrating sensor data, data streaming platforms enable the creation of an open, scalable, and highly available infrastructure for the deployment of Digital Twins.

Digital Twins combine IoT, artificial intelligence, machine learning, and other technologies to create a virtual simulation of, for example, physical components, devices, and processes. They can also consider historical data and update themselves as soon as the data generated by the physical counterpart changes.

Kafka is the leading system in the following digital twin example:

 

Apache Kafka as Digital Twin for Industry 4 0 and Industrial IoT

Most of the time, Kafka is combined with other technologies to build a digital twin. For instance, Eclipse Ditto is a project combining Kafka with IoT protocols. And some teams built a custom digital twin with Kafka and a database like MongoDB.
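As a hedged example of such a custom approach, the sketch below keeps a simple digital-twin state store by upserting each machine's latest reported values from Kafka into MongoDB with pymongo. All names are illustrative, and a real digital twin would track far richer state and history.

```python
import json
from confluent_kafka import Consumer
from pymongo import MongoClient

# One document per machine holds its latest reported state (the "twin").
twins = MongoClient("mongodb://mongo.local:27017")["factory"]["machine_twins"]

consumer = Consumer({"bootstrap.servers": "localhost:9092", "group.id": "digital-twin"})
consumer.subscribe(["machine.telemetry"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    twins.update_one(
        {"_id": event["machine_id"]},
        {"$set": {"temperature": event["temperature"], "updated_at": event["ts"]}},
        upsert=True,
    )
    consumer.commit()
```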

IoT Architectures for Digital Twin with Apache Kafka provide more details about different digital twin architectures.

Industry 4.0 benefits from digital twins, as they allow detailed insight into the lifecycle of the elements they simulate or monitor. For example, product and process optimization can be carried out, individual parts or entire systems can be tested for their functionality and performance, or forecasts can be made about energy consumption and wear and tear.

Condition monitoring and predictive maintenance

For modern maintenance, machine operators mainly ask themselves the following questions: Are all devices functioning as intended? How long will these devices usually function before maintenance work is necessary? What are the causes of anomalies and errors?

On the one hand, Digital Twins can also be used here for monitoring and diagnostics. They correlate current sensor data with historical data, which makes it possible to identify the causes of faults and anticipate maintenance measures.

On the other hand, production facilities can also benefit from data streaming in this area. A prerequisite for Modern Maintenance is a reliable and scalable infrastructure that enables the processing, analysis, and integration of data streams. This allows the detection of critical changes in plants, such as severe temperature fluctuations or vibrations, in near real-time, after which operators can initiate measures to maintain plant effectiveness.
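A hedged sketch of such near real-time condition monitoring with ksqlDB: the statements below, submitted via the ksqlDB REST API, derive a continuously updated table of machines whose average temperature exceeds a threshold. Stream, table, and topic names as well as the threshold are illustrative assumptions.

```python
import requests

KSQL_URL = "http://ksqldb.local:8088/ksql"  # placeholder ksqlDB server endpoint

statements = """
    CREATE STREAM machine_telemetry (machine_id VARCHAR, temperature DOUBLE)
        WITH (KAFKA_TOPIC='machine.telemetry', VALUE_FORMAT='JSON');
    CREATE TABLE machine_overheating AS
        SELECT machine_id, AVG(temperature) AS avg_temp
        FROM machine_telemetry
        WINDOW TUMBLING (SIZE 60 SECONDS)
        GROUP BY machine_id
        HAVING AVG(temperature) > 90
        EMIT CHANGES;
"""

resp = requests.post(KSQL_URL, json={"ksql": statements, "streamsProperties": {}}, timeout=30)
resp.raise_for_status()
print(resp.json())
```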

Above all, more efficient predictive maintenance scheduling saves manufacturing companies valuable resources by ensuring equipment and facilities are serviced only when necessary. In addition, operators avoid costly downtime periods when machines are not productive for a while.

Stateless Condition Monitoring and Stateful and Predictive Maintenance with Apache Kafka ksqlDB and TensorFlow

More details: Condition Monitoring and Predictive Maintenance with Apache Kafka.

Connected cars and streaming machine learning

A connected car is a car that can communicate bidirectionally with other systems outside of the vehicle. This allows the car to share internet access and data with other devices and applications inside and outside the car. The possibilities are endless! MQTT in conjunction with Kafka is more or less a de facto standard architecture for connected car use cases and infrastructures.

The following shows how to integrate with tens or hundreds of thousands of IoT devices and process the data in real-time. The demo use case is predictive maintenance (i.e., anomaly detection) in a connected car infrastructure to predict motor engine failures:

Kappa Architecture with Apache Kafka MQTT Kubernetes and Tensorflow for Streaming Machine Learning

The blog post “IoT Live Demo – 100.000 Connected Cars with Kubernetes, Kafka, MQTT, TensorFlow” explores the architecture and implementation in more detail. The source code is available on GitHub.

BMW case study: Manufacturing 4.0 with smart factory and cloud

I spoke with Felix Böhm, responsible for BMW Plant Digitalization and Cloud Transformation, at our Data in Motion tour in Germany in 2021. We talked about their journey towards data in motion in manufacturing and the use cases and architectures. He also talked to Confluent CEO Jay Kreps at the Kafka Summit EU 2021.

Kafka and OPC UA as real-time data hub between equipment at the edge and applications in the cloud

Let’s explore this BMW success story from a technical perspective.

Decoupled IoT Data and Manufacturing

BMW connects workloads from their global smart factories and replicates them in real-time in the public cloud. The team uses an OPC UA connector to directly communicate with Confluent Cloud in Azure.

Kafka provides decoupling, transparency, and innovation. Confluent adds stability via products and expertise. The latter is critical for success in manufacturing. Each minute of downtime costs a fortune. Read my related article “Apache Kafka as Data Historian – an IIoT / Industry 4.0 Real-Time Data Lake” to understand how Kafka improves the Overall Equipment Effectiveness (OEE) in manufacturing.

Logistics and supply chain in global plants

The discussed use case covered optimized supply chain management in real-time.

The solution provides information about the right stock in place, both physically and in ERP systems like SAP. “Just in time, just in sequence” is crucial for many critical applications.

Things BMW couldn’t do before

  • Get IoT data without interfering with others, and get it to the right place
  • Collect once, process, and consume several times (by different consumers at different times with varying paradigms of communication like real-time, batch, request-response)
  • Enable scalable real-time processing and improve time-to-market with new applications

The true decoupling between different interfaces is a unique advantage of Kafka vs. other messaging platforms such as IBM MQ, RabbitMQ, or MQTT brokers. I also explored this in my article about Domain-driven Design (DDD) with Kafka.

Check out “Apache Kafka Landscape for Automotive and Manufacturing” for more Kafka architectures and use cases in this industry.

Audi case study – Connected cars for swarm intelligence

Audi has built a connected car infrastructure with Apache Kafka. Their Kafka Summit keynote explored the use cases and architecture:

Use cases include real-time data analysis, swarm intelligence, collaboration with partners, and predictive AI.

Depending on how you define the term and buzzword “Digital Twin“, this is a perfect example: All sensor data from the connected cars are processed in real-time and stored for historical analysis and reporting. Read more about “Kafka for Digital Twin Architectures” here.

To learn more, I wrote a whole blog series about Apache Kafka and MQTT with many more practical use cases and architectures.

Serverless data streaming enables focusing on IoT business applications and improving OEE

An event-driven data streaming platform is elastic and highly available. It represents an opportunity to increase production facilities’ overall asset effectiveness significantly.

With its data processing and integration capabilities, data streaming complements machine connectivity via MQTT, OPC UA, HTTP, and other protocols. This allows streams of sensor data to be transported throughout the plant and to the cloud in near real-time. This is the basis for the use of Digital Twins as well as Modern Maintenance such as Condition Monitoring and Predictive Maintenance. The increased overall plant effectiveness not only enables manufacturing companies to work more productively and avoid potential disruptions, but also to save time and costs.

I did not talk about operating the infrastructure for data streaming and IoT. TL;DR: Go serverless if you can. That enables you to focus on solving business problems. The above example of BMW had exactly this motivation and leverages Confluent Cloud for this reason to roll out their smart factory use cases across the globe. “Serverless Kafka” is your best choice for data streaming if connectivity and the network infrastructure allow it in your IoT projects.

Do you use MQTT or OPC UA with Apache Kafka today? What use cases? Or do you rely on the HTTP protocol because it is good enough and simpler to integrate? How do you decide which protocol to choose? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

Apache Kafka and MQTT (Part 1 of 5) – Overview and Comparison
https://www.kai-waehner.de/blog/2021/03/15/apache-kafka-mqtt-sparkplug-iot-blog-series-part-1-of-5-overview-comparison/

Apache Kafka and MQTT are a perfect combination for many IoT use cases. This blog series covers the pros and cons of both technologies. Various use cases across industries, including connected vehicles, manufacturing, mobility services, and smart city are explored. The examples use different architectures, including lightweight edge scenarios, hybrid integrations, and serverless cloud solutions. This post is part one: Overview and Comparison.

Apache Kafka and MQTT - Match Made in Heaven

Gartner predicts: “Around 10% of enterprise-generated data is created and processed outside a traditional centralized data center or cloud. By 2025, this figure will reach 75%“. Hence, looking at the combination of MQTT and Kafka makes a lot of sense!

Apache Kafka + MQTT Blog Series

The first blog post explores the relation between MQTT and Apache Kafka. Afterward, the other four blog posts discuss various use cases, architectures, and reference deployments.

  • Part 1 – Overview (THIS POST): Relation between Kafka and MQTT, pros and cons, architectures
  • Part 2 – Connected Vehicles: MQTT and Kafka in a private cloud on Kubernetes; use case: remote control and command of a car
  • Part 3 – Manufacturing: MQTT and Kafka at the edge in a smart factory; use case: Bidirectional OT-IT integration with Sparkplug B between PLCs, IoT Gateways, Data Historian, MES, ERP, Data Lake, etc.
  • Part 4 – Mobility Services: MQTT and Kafka leveraging serverless cloud infrastructure; use case: Traffic jam prediction service using machine learning
  • Part 5 – Smart City: MQTT at the edge connected to fully-managed Kafka in the public cloud; use case: Intelligent traffic routing by combining and correlating different 1st and 3rd party services

Subscribe to my newsletter to get updates immediately after the publication. Besides, I will also update the above list with direct links to the posts of this blog series as soon as they are published.

Apache Kafka vs. MQTT

MQTT is an open standard for a publish/subscribe messaging protocol. Open source and commercial solutions provide implementations of different MQTT standard versions. MQTT was built for IoT use cases, including constrained devices and unreliable networks. However, it was not built for data integration and data processing.

Contrary to the above, Apache Kafka is not an IoT platform. Instead, Kafka is an event streaming platform, used as the underpinning of event-driven architectures for various use cases across industries. It provides a scalable, reliable, and elastic real-time platform for messaging, storage, data integration, and stream processing.

To clarify, MQTT and Kafka complement each other. Both have their strengths and weaknesses. They typically do not compete with each other.

When (not) to use MQTT?

This section explores the trade-offs of both technologies.

Pros of MQTT

  • Lightweight
  • Built for poor connectivity / high latency scenarios (e.g., mobile networks!)
  • High scalability and availability (with the right MQTT broker)
  • ISO Standard
  • Most popular IoT protocol
  • Deployable on all infrastructures (edge, data center, public cloud)

Cons of MQTT

  • Only pub/sub messaging, no stream processing, no data integration
  • Asynchronous processing (clients can be offline for a long time)
  • No reprocessing of events

When (not) to use Apache Kafka?

Pros of Kafka

  • Processing data in motion – not just pub/sub messaging, but also including stream processing and data integration
  • High throughput
  • Large scale
  • High availability
  • Long term storage and buffering for real decoupling of producers and consumers
  • Reprocessing of events
  • Good integration to the rest of the enterprise
  • Deployable on all infrastructures (edge, data center, public cloud)

Cons of Kafka

  • Not built for tens of thousands of connections
  • Requires stable network and good infrastructure
  • No IoT-specific features like keep alive or last will and testament

TL;DR

If you have a bad network, tens of thousands of clients, or the need for a lightweight push-based messaging solution, then MQTT is the right choice for messaging. Otherwise, Kafka, a powerful event streaming platform, is probably a great choice for messaging, data integration, and data processing. In many IoT use cases, the architecture combines both technologies.

Example:  Predictive Maintenance with 100,000 Connected Cars

We have built a “simple demo” with Confluent and HiveMQ running on cloud-native Kubernetes infrastructure. The use case implements data processing from 100,000 connected cars and real-time predictive maintenance with TensorFlow:

Apache Kafka and MQTT for Real Time Analytics and Machine Learning with TensorFlow

More details, a video of the demo, and the code are available on GitHub: 100,000 Connected Cars and Predictive Maintenance in Real-Time with MQTT, Kafka, Kubernetes, and TensorFlow.

Additionally, I blogged a lot about Kafka and Machine Learning. For instance, take a look at streaming machine learning without a data lake with Kafka and TensorFlow.

Further Slides, Articles, and Demos

I already created a lot of material (including blogs, slides, videos) around Kafka and MQTT. Check this out if you need to learn about the different broker alternatives, integration options, and best practices for the combination.

The following slide deck covers the content of this blog series on a high level:

Kafka + MQTT = Match Made in Heaven

In conclusion, Apache Kafka and MQTT are a perfect combination for many IoT use cases. Follow the blog series to learn about use cases such as connected vehicles, manufacturing, mobility services, and smart city. Every blog post also includes real-world deployments from companies across industries. It is key to understand the different architectural options to make the right choice for your project.

What are your experiences and plans in IoT projects? What use case and architecture did you implement? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

Infrastructure Checklist for Apache Kafka at the Edge
https://www.kai-waehner.de/blog/2021/02/03/kafka-edge-infrastructure-checklist-deployment-outside-data-center/

Event streaming with Apache Kafka at the edge is getting more and more traction these days. It is a common approach to providing the same open, flexible, and scalable architecture in the cloud and at the edge outside the data center. Possible locations for Kafka edge deployments include retail stores, cell towers, trains, small factories, restaurants, hospitals, stadiums, etc. This post explores a checklist with infrastructure questions you need to check and evaluate if you want to deploy Kafka at the edge.

Infrastructure Checklist for Apache Kafka at the Edge

Apache Kafka at the Edge == Outside the Data Center

I already discussed the concepts and architectures of Kafka at the edge in detail in the past:

This blog post explores a checklist of common infrastructure questions you need to answer and double-check before planning to deploy Kafka at the edge.

What is the Edge?

The term ‘edge’ needs to be defined to have the same understanding. When I talk about the edge in the context of Kafka, it means:

  • Edge is NOT a data center, i.e., limited compute, storage, network bandwidth
  • Kafka clients AND the Kafka broker(s) are deployed here, not just the client applications
  • Offline business continuity, i.e., the workloads continue to work even if there is no connection to the cloud
  • Often 100+ locations, like restaurants, coffee shops, or retail stores, or even embedded into 1000s of devices or machines
  • Low-footprint and low-touch, i.e., Kafka can run as a normal highly available cluster or as a single broker (no cluster, no high availability); often shipped “as a preconfigured box” in OEM hardware (e.g., Hivecell)
  • Hybrid integration, i.e., most use cases require uni- or bidirectional communication with a remote Kafka cluster in a data center or the cloud

Let’s recap one architecture example that deploys Kafka in the cloud and at the edge: A hybrid event streaming architecture for real-time omnichannel retail and customer 360:

Hybrid Edge to Global Retail Architecture with Apache Kafka

This definition of a ‘Kafka edge deployment’ can also be summarized as the ‘autonomous edge’ or ‘disconnected edge’. In contrast, the ‘connected edge’ means that Kafka clients at the edge connect directly to a remote data center or cloud.

Infrastructure Checklist: How to Deploy Apache Kafka at the Edge?

I talked to 100+ customers and prospects across industries with the need to do edge computing for different reasons, including bad internet connection, reduced cost, low latency requirements, and security implications.

The following discussion points and questions come up all the time. Make sure to discuss them with your project team:

  • What are the use cases for Kafka at the edge? For instance, edge processing (e.g., business logic/analytics), replication to the cloud (uni- or bi-directional), or data integration (e.g., with devices, IoT gateways, local databases)?

  • What is the data model, and what are the replication scenarios and SLAs (aggregation to “just gather data”, command & control to send data back to the edge, local analytics, etc.)? Check out Kafka-native replication tools, especially MirrorMaker 2 and Confluent’s Cluster Linking.

  • What is the main motivation for doing edge processing (vs. ingestion into a DC/cloud for all processing)? Examples: Low latency requirements, cost-efficiency, business continuity even when offline / disconnected from the cloud, etc.

  • How many “edge sites” do you plan to deploy to (e.g., retail stores, factories, restaurants, trains, …)? This needs to be considered from the beginning. If you want to roll out edge computing to thousands of restaurants, you need a different hardware and automation strategy than deploying to just ten smart factories worldwide.

  • What hardware do you use at the edge (e.g., hardware specifications)? How much memory, disk, CPU, etc., is available? Do you work with a specific hardware vendor? What are the support model and monitoring setup for the edge computers?

  • What network do you use? Is it stable? What is the connection to the cloud? If it is a stable connection (like AWS DirectConnect or Azure ExpressRoute), do you still need Kafka at the edge?

  • What is the infrastructure you plan to run Kafka on at the edge (e.g., operating system, container, Kubernetes, etc.)?

  • Do you need high availability and a ‘real’ Kafka cluster with 3+ brokers? Or is a single broker good enough? In many cases, the latter is good enough to decouple edge and cloud, handle backpressure, and enable business continuity even if the internet connection is gone for some time.

  • What edge protocols do you need to integrate with? Is Kafka Connect sufficient with its connectors, or do you need a 3rd-party IoT gateway? Common integration points at the edge are OPC UA, MQTT, proprietary PLCs, traditional relational databases, files, IoT gateways, etc.

  • Do you need to process the data at the edge? Kafka-native stream processing with Kafka Streams or ksqlDB is usually a straightforward and lightweight, but still scalable and reliable option. Almost all use cases I have seen need at least some streaming ETL at the edge. For instance, preprocess and filter data so that you only send relevant, aggregated data over the network to the cloud (see the sketch after this checklist). However, many customers also deploy business applications at the edge, for instance, for real-time model inference.
  • How will fleet management work? Which part of the infrastructure or tooling handles the management and operations of the edge machines? In most cases, this is not specific to Kafka but handled on the infrastructure level. For instance, if you run a Kubernetes cluster, Rancher might be used to provision and manage the edge clusters, including the Kafka ecosystem. Of course, specific Kafka metrics are also integrated here, for instance via Prometheus if you are using Kubernetes.
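To illustrate the single-broker and streaming ETL questions from the checklist, here is a hypothetical edge-to-cloud forwarder in Python: it consumes from a local single-broker Kafka, filters out irrelevant readings, and replicates the rest to a remote cluster. Broker addresses, topics, and the filter threshold are assumptions; MirrorMaker 2 or Cluster Linking are the production-grade replication options mentioned above, so treat this only as a sketch of the decoupling and backpressure idea:

```python
# Hypothetical edge-to-cloud forwarder; all names and thresholds are placeholders.
import json
from confluent_kafka import Consumer, Producer

edge = Consumer({
    "bootstrap.servers": "edge-broker:9092",
    "group.id": "edge-to-cloud-forwarder",
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,  # commit only after the cloud producer confirms
})
cloud = Producer({"bootstrap.servers": "cloud-cluster:9092"})

edge.subscribe(["machine.sensors"])
while True:
    msg = edge.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    if event.get("temperature", 0) < 90:   # only forward relevant readings
        continue
    cloud.produce("machine.sensors.alerts", key=msg.key(), value=msg.value())
    cloud.flush()     # blocks while offline, so data keeps buffering on the edge broker
    edge.commit(msg)  # at-least-once hand-off to the cloud cluster
```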

Discussing and answering these questions will help you with your planning for Kafka at the edge. Are there any key questions missing? Please let me know and I will update the list.

Kafka at the Edge is the new Black!

Apache Kafka at the edge is a common approach to providing the same open, flexible, and scalable architecture in the cloud and outside the data center. A huge benefit is that the same technology and architecture can be deployed everywhere across regions, sites, and clouds. This is a real hybrid architecture combining edge sites, data centers, and multiple clouds! Discuss the above infrastructure checklist with your team to be successful.

What are your experiences and plans for event streaming with Apache Kafka at the edge? Did you already deploy Apache Kafka on a small node somewhere, maybe even as a single broker setup? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Infrastructure Checklist for Apache Kafka at the Edge appeared first on Kai Waehner.

Use Cases and Architectures for Kafka at the Edge https://www.kai-waehner.de/blog/2020/10/14/use-cases-architectures-apache-kafka-edge-computing-industrial-iot-retail-store-cell-tower-train-factory/ Wed, 14 Oct 2020 12:54:32 +0000 https://www.kai-waehner.de/?p=2723 Use cases and architectures for Kafka deployments at the edge, including retail stores, cell towers, trains, small factories, restaurants... Hardware and software components to realize edge and hybrid Kafka infrastructures.

The post Use Cases and Architectures for Kafka at the Edge appeared first on Kai Waehner.

Event streaming with Apache Kafka at the edge is not cutting edge anymore. It is a common approach to providing the same open, flexible, and scalable architecture at the edge as in the cloud or data center. Possible locations for a Kafka edge deployment include retail stores, cell towers, trains, small factories, restaurants, etc. I already discussed the concepts and architectures in detail in the past: “Apache Kafka is the New Black at the Edge” and “Architecture patterns for distributed, hybrid, edge and global Apache Kafka deployments“. This blog post is an add-on focusing on use cases across industries for Kafka at the edge.

To be clear before you read on: Edge is NOT a data center.

And “Edge Kafka” is not simply yet another IoT project using Kafka in a remote location. Edge Kafka is actually an essential component of a streaming nervous system that spans IoT (or OT in Industrial IoT) and non-IoT (traditional data-center / cloud infrastructures).

The post’s focus is scenarios where the Kafka clients AND the Kafka brokers are running on the edge. This enables edge processing, integration, decoupling, low latency, and cost-efficient data processing.

Kafka at the Edge - Use Cases and Architectures

Categories and Architectures for Kafka at the Edge

Some IoT projects are built like “normal Kafka projects”, i.e., built in the (edge) data center or cloud. For instance, bigger factories can provide infrastructure to deploy a reliable Kafka cluster with stable network connectivity to the cloud. Unfortunately, many IoT projects require real edge capabilities.

What’s different at the edge?

  • Offline business continuity is important even if the connection to the central data center or cloud is not available. Disconnected / offline sites often do not require or provide high availability (because it is not worth the effort): local pre-processing, real-time analytics with low latency, and only occasional or low-bandwidth connectivity to the data center or cloud.

  • Often these projects need to deploy Kafka brokers across hundreds of locations. A single broker without high availability is frequently good enough for handling backpressure and local processing. Use cases exist across industries, including retail stores, trains, restaurants, cell towers, small factories, etc.

  • Low-footprint, low-touch, little-or-no-DevOps-required installations of Kafka brokers (not just clients) are mandatory for many of these use-cases. In these cases, no IT experts are available “on-site” to operate Kafka. Hence, using certified OEM hardware is a great option to install and operate Kafka at the edge.

  • Many edge use cases are all around sensor and telemetry data. This is not transactional data where every single message counts. An application that processes millions of messages per second is fine with losing a few of the messages as it does not affect the outcome of the calculation.
  • Hybrid and not cloud-only: Consumer IoT (CIoT) always includes the users (in their smart home, ride-share, retail store, etc.), while Industrial IoT (IIoT) always includes tangible goods (cars, food, energy, …)

  • Thousands and tens of thousands of connected interfaces: Sensors, machines, mobile devices, etc.

Using one single technical infrastructure enables building edge and hybrid architectures. There is no need for a ton of different frameworks and products. This is a huge benefit from a development, testing, operations, and support point of view!

Use Cases Across Industries

Industries for Kafka at the Edge include manufacturing, pharma, carmakers, telecommunications, retailing, energy, restaurants, gaming, healthcare, public sector, aerospace, transportation, and others.

Architectures and use cases include data integration, pre-processing and replication to the cloud, big and small data edge processing and analytics, disconnected offline scenarios, very low footprint scenarios with hundreds of locations, scenarios without high availability, and others.

Scenarios for Edge Computing with Kafka

Various examples for Kafka deployments at the edge exist. Almost all of these use cases are related to several of the above categories and requirements, such as low hardware footprint, disconnected offline processing, hundreds of locations, and hybrid architectures.

I have worked with enterprises across industries and the globe on the following scenarios:

  • Public Sector: Local administration in each city, smart city projects incl. public transportation, traffic management, integration of various connected car platforms from different carmakers, cybersecurity (including IoT use cases such as capturing and processing camera images)
  • Transportation / Logistics / Railway / Aviation: Track&Trace, Kafka in the trains for offline and local processing / storage, traveller information (delayed or canceled flight / train / bus), real-time loyalty platforms (class upgrade, lounge access)
  • Manufacturing (Automotive, Aerospace, Semiconductors, Chemical, Food, and others): IoT aftermarket customer services, OEM in machines and vehicles, embedding into standard software such as ERP or MES systems, cybersecurity, a digital twin of devices/machines/production lines/processes, production line monitoring in factories for predictive maintenance/quality control/production efficiency, operations dashboards and line wellness (on-site for the plant manager, and aggregated global KPIs for executive management), track&trace and geofencing on the shop floor
  • Energy / Utility / Oil&Gas: Smart home, smart buildings, smart meters, monitoring of remote machines (e.g., for drilling, windmills, mining), pipeline and refinery operations (e.g., predictive failure or anomaly detection)
  • Telecommunications / Media: OSS real-time monitoring/problem analysis/metrics reporting/root cause analysis/action response of the network devices and infrastructure (routers, switches, other network devices), BSS customer experience and OTT services (mobile app integration for millions of users), 5G edge (e.g., street sensors)
  • Healthcare: Track&trace in the hospital, remote monitoring, machine sensor analytics
  • Retailing / Food / Restaurants / Banking: Customer communication, cross-/up-selling, loyalty system, payments in retail stores, perpetual inventory, Point-of-Sale (PoS) integration for (local) payments and (remote) CRM integration, EFTPOS (Electronic funds transfer at point of sale)

A great practical example of edge computing in retailing is fast-food chain Chick-fil-A. They deployed a Kubernetes cluster in each of their 2000 restaurants for real-time analytics at the edge without an internet connection. The hardware is pretty small and provides an Intel quad-core processor with 8 GB RAM and SSD:

Chick-fil-A Restaurant Edge Hardware

Example Architecture: Kafka in Transportation and Logistics

Let’s make this “Kafka at the edge” thing more clear with a specific example. In this case, I use the railway and transportation industry. But this can easily be mapped to your industry and use case.

The following example shows an edge and hybrid solution for railways to improve the customer experience and increase the revenue of the railway company. It leverages offline edge processing for customer communication, replication to the cloud for analytics, and integration with 3rd party interfaces and APIs from partners.

Hybrid Architecture – From Edge to Cloud

Local processing at the edge is happening on the train. But each train also replicates relevant data in real-time to the cloud – if there is internet connectivity and free network resources. If the train is not online, Kafka handles the backpressure and replicates to the cloud when online again:

Hybrid Architecture with Kafka at the edge and in the cloud

Event Streaming in the Train with Kafka

Kafka on the train is NOT just used for real-time messaging and handling backpressure. These are already great reasons for using Kafka at the edge. Still, the even bigger value is created when Kafka is also used for data integration (restaurant, traveler information, loyalty system, etc.) and data processing (up-/cross-selling, real-time delay information, etc.) at the edge. This way, only one single platform is required to solve all the different problems:

Event Streaming with Apache Kafka at the Edge in a Train

Kafka for Disconnected / Offline Scenarios

Trains (and many other edge locations) are offline regularly. For instance, a train drives through a tunnel or reaches an area with no cell connectivity. Local processing is still possible. Business continuity is the key to improving customer experience and increasing sales – even if the train is disconnected from the internet. Passengers can still use the mobile app to see traveler information, buy food in the restaurant, or watch movies stored on the train’s local server. As soon as the train has internet connectivity again, the purchases from passengers are transferred to the loyalty system in the cloud, the latest delay information is consumed from the cloud and stored on the edge Kafka broker in the train, and so on:

Data Processing at the Edge with Kafka in offline and disconnected mode

Cross-Company Kafka Integration

Data processing does not stop with the hybrid integration between the edge (train) and cloud (CRM, loyalty system, etc.). Different divisions or partner companies need to integrate, too. Instead of using non-scalable, synchronous REST API calls / API Management for partner integration, streaming replication with Kafka-native technologies is a much better, scalable approach:

3rd Party and Partner Kafka Replication and API Management

I hope this story about Kafka at the edge helped you better understand how you can leverage event streaming in your industry and use cases to build an end-to-end streaming infrastructure from edge to cloud.

Infrastructure and Hardware Requirement for Deployment of Kafka at the Edge

Finally, it is important to discuss how to deploy Kafka at the edge. To be clear: Kafka still needs some computing power.

Obviously, this depends on many factors: the hardware vendors and infrastructure you are working with, specific SLAs and HA requirements, and so on. The good news is that Kafka can be deployed on many infrastructures, including bare metal, VMs, containers, Kubernetes, etc. The other good news is that new hardware for computing resources (even for the “edge”) typically has 4, 8, or even 16 GB RAM because this is the smallest that chip vendors produce these days for environments such as small factories, retail stores, etc.

Minimum hardware requirements for running a very small footprint Kafka are a single-core processor and a few 100MB RAM. This already allows decent edge processing with 100+Mb/sec throughput on a single Kafka node (with replication factor = 1). However, real values depend on the number of partitions, message size, network speed, and other characteristics. Don’t expect the same performance and scalability as in the data center or cloud!

Thus, you can deploy a Kafka broker on a Raspberry Pi, but not on some small embedded device! The latter is where the Kafka clients can run.
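As a small illustration of sizing topics for such a resource-constrained, single-broker deployment, the following hedged Python sketch creates topics with replication factor 1 and bounded retention so a small disk does not fill up. The topic names and settings are assumptions for illustration, not recommendations:

```python
# Sketch: topic creation for a single-broker edge node; names and settings are assumed.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "edge-broker:9092"})
topics = [
    NewTopic("machine.sensors", num_partitions=3, replication_factor=1,
             config={"retention.ms": str(24 * 60 * 60 * 1000)}),  # keep one day locally
    NewTopic("machine.commands", num_partitions=1, replication_factor=1),
]
for topic, future in admin.create_topics(topics).items():
    try:
        future.result()  # raises on failure
        print(f"created {topic}")
    except Exception as e:
        print(f"failed to create {topic}: {e}")
```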

Check out the “Infrastructure Checklist for Apache Kafka at the Edge” if you plan to go that direction!

“The Confluent Way” to Deploy Kafka at the Edge

From a technical perspective, deployment of Kafka at the edge is the same as in a data center or cloud. However, the environment and requirements are a little bit different as we learned above. Some additional features definitely help with deploying and operating Kafka at the edge.

I work for Confluent. Hence, I provide you “The Confluent Way” of deploying Kafka at the edge in your future projects, including innovative, differentiating features:

  1. Confluent Server includes the Kafka Broker and various enhancements such as self-balancing clusters, Tiered Storage, embedded REST API, server-side schema validation, and much more. It can be deployed as a single node (very lightweight, but no high availability) or cluster (for mission-critical workloads that require high availability at the edge).
  2. In 2020, ZooKeeper is still required as an additional component. However, ZooKeeper removal is coming in 2021! Many Confluent engineers work full-time on removing the (ugly) Zookeeper dependency from the Kafka project, which means it will be possible to run a standalone, single process edge solution powered by Kafka. The whole Kafka community is looking forward to this architectural change. You can already run Kafka at the edge today, of course. It works well with ZooKeeper, too.
  3. Cluster Linking allows all these small Kafka edge sites to connect to a bigger Kafka cluster in a data center or cloud using the Kafka protocol. No need to use additional tools and infrastructure such as Confluent Replicator or MirrorMaker. This significantly reduces development efforts and infrastructure costs.
  4. Monitoring and Proactive Support with Confluent’s toolset. For instance, one Control Center can monitor several different (remote) Kafka clusters. Confluent Telemetry Reporter collects data from edge sites. Monitoring includes the technical infrastructure, but also for applications and end-to-end integration.
  5. Kubernetes is becoming the edge orchestration platform of choice for many edge deployments, for instance, using the MicroK8s Kubernetes distribution. Confluent Operator (Custom Resource Definitions, CRD) provides the capabilities to provision and operate single-broker or cluster deployments at the edge. This includes rolling upgrades, security automation, and much more. Confluent Operator is already battle-tested in Confluent Cloud and in large deployments of Confluent Platform in data centers. Chick-fil-A has a great success story about running Kubernetes in their 2000+ fast food stores to provide disconnected edge services and low latency. Kubernetes is clearly optional: only use it when it really adds value and is not too complex or resource-hungry. Many edge deployments will probably disqualify Kubernetes as too heavyweight and complex.
  6. Data Integration and data processing is key at the edge. Confluent provides connectors to legacy and modern systems (including PLC4X for PLC, OPC-UA, and IIoT integration, database CDC connectors, MQ and MQTT integration, Data Diode connectors for uni-directional UDP networks in high security or dirty environments, cloud connectors, and much more). Kafka Streams and ksqlDB allow lightweight but powerful stream processing without the need for yet another technology.
  7. Lightweight edge clients (e.g., embedded devices) leverage the C or C++ client APIs for Kafka and the REST Proxy to communicate from any programming language via HTTP(S) (see the sketch after this list). This is important for many low-power edge devices with limited computing power where Java or similar “resource-hungry technologies” cannot be deployed.
  8. Optionally, certified, pre-configured OEM hardware is available. You put the box at the edge, connect it to LAN or Wifi, and use it. That’s it. Management and monitoring of the infrastructure occur via the remote software of the hardware vendor. “Feasible video processing on Hivecell” demonstrates edge analytics with a Kafka cluster, a Kafka Streams application, and embedded machine learning models for image recognition in real-time.
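To illustrate item 7, here is a hedged sketch of a constrained edge device producing events via Confluent REST Proxy instead of a native Kafka client. Host, port, and topic are assumptions; the payload follows the REST Proxy v2 JSON embedded format, but double-check it against the proxy version you run:

```python
# Sketch: producing to Kafka over HTTP via Confluent REST Proxy (default port 8082).
import requests

url = "http://rest-proxy:8082/topics/machine.sensors"   # assumed host and topic
headers = {"Content-Type": "application/vnd.kafka.json.v2+json"}
payload = {"records": [{"key": "machine-42",
                        "value": {"temperature": 78.5, "rpm": 1200}}]}

resp = requests.post(url, json=payload, headers=headers, timeout=5)
resp.raise_for_status()
print(resp.json())  # response contains the partition and offset per record
```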

Kafka at the Edge (and in Hybrid Architectures) is the New Black

Kafka is a great solution for the edge. It enables deploying the same open, scalable, and reliable technology at the edge, data center, and the cloud. This is relevant across industries. Kafka is used in more and more places where nobody has seen it before. Edge sites include retail stores, restaurants, cell towers, trains, and many others. I hope the various use cases and architectures inspired you a little bit.

What are your experiences at the edge? What are your use cases? Did you or do you plan to use Apache Kafka and its ecosystem? What is your strategy? Let’s connect on LinkedIn and discuss it!

The post Use Cases and Architectures for Kafka at the Edge appeared first on Kai Waehner.

Apache Kafka in Manufacturing and Industry 4.0 https://www.kai-waehner.de/blog/2020/09/17/apache-kafka-manufacturing-industry-4-0-rami-iot-iiot-automation-use-cases/ Thu, 17 Sep 2020 07:00:44 +0000 https://www.kai-waehner.de/?p=2652 The Fourth Industrial Revolution (also known as Industry 4.0) is the ongoing automation of traditional manufacturing and industrial…

The post Apache Kafka in Manufacturing and Industry 4.0 appeared first on Kai Waehner.

The Fourth Industrial Revolution (also known as Industry 4.0) is the ongoing automation of traditional manufacturing and industrial practices, using modern smart technology. Event Streaming with Apache Kafka plays a massive role in processing massive volumes of data in real-time in a reliable, scalable, and flexible way using integrating with various legacy and modern data sources and sinks. This blog post covers multiple use cases for Apache Kafka in Industrial IoT and manufacturing across different industries.

10000 Foot View – Event Streaming with Apache Kafka for Manufacturing

Large-scale machine-to-machine communication (M2M) and the internet of things (IoT) are integrated for increased automation, improved communication and self-monitoring, and production of smart machines that can analyze and diagnose issues without the need for human intervention. The term “Industrie 4.0”, shortened to I4.0 or only I4, was coined in Germany, including reference architectures such as RAMI 4.0.

These scenarios in manufacturing require the processing of high volumes of data in real-time at scale. Mission-critical deployments without downtime or data loss are the norm. Integration with edge devices/machines, IoT gateways, enterprise software, and many other systems is essential for success. An open, elastic, and flexible architecture is a must to integrate with the monolithic legacy world while also being future-ready for building cloud-native, standards-based, hybrid applications. Support for dominant IoT standards such as OPC-UA and MQTT is obligatory.

Due to these requirements, Apache Kafka comes into play in many new I4.0 projects:

Apache Kafka in Manufacturing and Industry 4.0

 

I already discussed the usage of Apache Kafka for manufacturing and Industrial IoT (IIoT) from various perspectives:

The following explores the same idea, but from a different angle with the focus on use cases, business value, and real-world examples from companies such as Audi, BMW, Tesla, Bosch, and others. Of course, there are some overlaps with the above articles, but I still hope to share some additional value and perspectives.

Why Apache Kafka in Manufacturing and Industry 4.0?

Here are a few reasons why Apache Kafka gets more and more adoption in I4 projects:

  • Real-time messaging (at scale, mission-critical)
  • Global Kafka (edge, data center, multi-cloud)
  • Cloud-native (open, flexible, elastic)
  • Data integration (legacy + modern protocols, applications, communication paradigms)
  • Data correlation (real-time + historical data, omni-channel)
  • Real decoupling (not just messaging, but also infinite storage + replayability of events)
  • Real-time monitoring
  • Transactional data (MES, ERP, CRM, SCM, …)
  • Applied machine learning (model training and scoring)
  • Cybersecurity
  • Cutting edge technology (3D printing, augmented reality, …)

These are not new characteristics and requirements. Real-time messaging solutions have existed for many years. Hundreds of platforms exist for data integration (including ETL and ESB tooling or specific IIoT platforms such as OSIsoft PI). SCADA systems have monitored plants in real time for decades. And so on.

The significant difference is that Kafka combines all the above characteristics in an open, scalable, and flexible infrastructure to operate mission-critical workloads at scale in real-time.

Use Cases for Kafka in Manufacturing

The following list shows different use cases where Kafka is used as a strategic platform for mission-critical event streaming at companies I talked to in the past:

  1. Track&Trace / Production Control / Plant Logistics
  2. Quality Assurance / Yield Management
  3. Predictive Maintenance
  4. Supply Chain Management
  5. Cybersecurity
  6. Servitization leveraging Digital Twins
  7. Additive Manufacturing
  8. Augmented Reality
  9. Many more…

Stay tuned for dedicated blog posts on the above topics focusing on the use case perspective. Please let me know if you want to see any other specific use case or maybe even have implemented something else already by yourself!

Slides and Video Recording

Over the next weeks and months, I plan to write a dedicated blog post per use case and update the above list with a link to it.

For now, here is the high-level presentation covering all of the above use cases with architectures and specific implementation examples.

Slides

Video Recording

What are your experiences with Apache Kafka in Manufacturing and Industry 4.0? Which projects did you or do you plan to implement? What challenges did you face, and how did you or do you plan to solve this? What is your strategy? Let’s connect on LinkedIn and discuss it!

The post Apache Kafka in Manufacturing and Industry 4.0 appeared first on Kai Waehner.

Apache Kafka as Data Historian – an IIoT / Industry 4.0 Real Time Data Lake https://www.kai-waehner.de/blog/2020/04/21/apache-kafka-as-data-historian-an-iiot-industry-4-0-real-time-data-lake/ Tue, 21 Apr 2020 15:54:15 +0000 https://www.kai-waehner.de/?p=2195 ‘Data Historian‘ is a well-known concept in Industrial IoT (IIoT). It helps to ensure and improve the Overall…

The post Apache Kafka as Data Historian – an IIoT / Industry 4.0 Real Time Data Lake appeared first on Kai Waehner.

‘Data Historian’ is a well-known concept in Industrial IoT (IIoT). It helps to ensure and improve the Overall Equipment Effectiveness (OEE).

Data Historian has a lot in common with other industrial trends like digital twin or data lake: It is ambiguous; there is more than one definition. ‘Process Historian’ or ‘Operational Historian’ are synonyms. ‘Enterprise Historian’ is similar but more on enterprise level (plant or global infrastructure) while the ‘Operational Historian’ is closer to the edge. Historian software is often embedded or used in conjunction with standard DCS and PLC control systems.

The following is inspired by the articles ‘What is a Data Historian?’ from ‘Automated Results’ and ‘Operational Historian vs. Enterprise Historian: What’s the Difference?‘ from ‘Parasyn’ – two expert companies in the Industrial IoT space.

This blog post explores the relation between a data historian and event streaming, and why Apache Kafka might become a part of your ‘Data Historian 4.0’. This also requires discussing whether a ‘Data Historian 4.0’ is operational, enterprise-level, or a mixture of both. As you can imagine, there is no single answer to this question…

Kafka as Data Historian != Replacement of other Data Storage, Databases or Data Lake

Just a short note before we get started:

The idea is NOT to use Kafka as a single allrounder database and replace your favorite data storage! No worries 🙂 Check out the following blog post for more thoughts on this discussion:

Use Cases for a Data Historian in Industrial IoT (IIoT)

There are many uses for a Data Historian in different industries. The following is a shameless copy & paste from Automated Results’ article:

  • Manufacturing site to record instrument readings
    • Process (ex. flow rate, valve position, vessel level, temperature, pressure)
    • Production Status (ex. machine up/down, downtime reason tracking)
    • Performance Monitoring (ex. units/hour, machine utilization vs. machine capacity, scheduled vs. unscheduled outages)
    • Product Genealogy (ex. start/end times, material consumption quantity, lot # tracking, product setpoints and actual values)
    • Quality Control (ex. quality readings inline or offline in a lab for compliance to specifications)
    • Manufacturing Costing (ex. machine and material costs assignable to a production)
  • Utilities (ex. Coal, Hydro, Nuclear, and Wind power plants, transmission, and distribution)
  • Data Center to record device performance about the server environment (ex. resource utilization, temperatures, fan speeds), the network infrastructure (ex. router throughput, port status, bandwidth accounting), and applications (ex. health, execution statistics, resource consumption).
  • Heavy Equipment monitoring (ex. recording of run hours, instrument and equipment readings for predictive maintenance)
  • Racing (ex. environmental and equipment readings for Sail boats, race cars)
  • Environmental monitoring (ex. weather, sea level, atmospheric conditions, ground water contamination)

Before we talk about the capabilities of a data historian, let’s first think of why this concept exists…

Overall Equipment Effectiveness (OEE)

Overall Equipment Effectiveness (OEE) “is the gold standard for measuring manufacturing productivity. Simply put – it identifies the percentage of manufacturing time that is truly productive. An OEE score of 100% means you are manufacturing only Good Parts, as fast as possible, with no Stop Time. In the language of OEE that means 100% Quality (only Good Parts), 100% Performance (as fast as possible), and 100% Availability (no Stop Time).”

One of the major goals of OEE programs is to reduce and / or eliminate the most common causes of equipment-based productivity loss in manufacturing (called the ‘Six Big Losses‘):

OEE - Six Big Losses

How does a Data Historian help reduce and/or eliminate the Six Big Losses?

Capabilities of a Data Historian

A Data Historian supports ensuring and improving the OEE:

The data historian contains the following key components to help implement factory automation and process automation:

  • Integration: Collect data from PLCs (Programmable Logic Controllers), DCS (Distributed Control System), proprietary protocols, and other systems. Bidirectional communication to send control commands back to the actors of the machines.
  • Storage: Store data for high availability, re-processing and analytics.
  • Processing: Correlate data over time. Join information from different systems, sensors, applications and technologies. Some examples: One lot of raw material to another, one continuous production run vs. another, day shift vs. evening or night shift, one plant vs. another.
  • Access: Monitor a sensor, machine, production line, factory or global infrastructure. Real time alerting, reporting, batch analytics and machine learning / deep learning.
  • Cloud: Move data to the cloud for aggregation, analytics, backup. A combination of hybrid integration and edge computing is crucial in most use cases, though.
  • Security: Add authentication, authorization, encryption. At least for external interfaces outside the factory.

These features are often very limited and proprietary in a traditional data historian. Therefore, Kafka might be a good option for parts of this, as we will see later in this post.

Before I map these requirements to Kafka-based infrastructure, let’s think about the relation and difference between OT, IT, and Industry 4.0. We need to understand why there is so much demand to change from traditional Data Historians to modern, open, scalable IoT architectures.

The Evolution of IT-OT Convergence

For many decades, the automation industry was used to proprietary technologies, monoliths, no or very limited network connectivity, and no or very limited security enforcement (meaning authentication, authorization, and encryption; NOT meaning safety, which actually is in place and crucial).

The evolution of convergence between IT (i.e. software and information technology) and OT (i.e. factories, machines and industrial automation) is changing this:

Evolution of Convergence between IT and Industrial Automation OT

Let’s think about this convergence in more detail from a simplified point of view:

OT => Uptime

OT’s main interest is uptime. Typically 99.999+%. Operations teams are not “that much” interested in fast turnaround times. Their incentive is to keep the production lines running for 10+ years without changes or downtime.

IT => Business Value

IT’s main interest is business value. In the last decade, microservices, DevOps, CI/CD and other agile paradigms created a new way of thinking. Originated at the Silicon Valley tech giants with millions of users and petabytes of data, this is now the new normal in any industry. Yes, even in automation industry (even though you don’t want to update the software of a production line on a daily basis). This is where ‘Industry 4.0’ and similar terms come into play…

Industry 4.0 => OT + IT

Industry 4.0 is converging OT and IT. This digital transformation is asking for new characteristics of hardware and software:

  • Real time
  • Scalability
  • High availability
  • Decoupling
  • Cost reduction
  • Flexibility
  • Standards-based
  • Extensibility
  • Security
  • Infrastructure-independent
  • Multi-region / global

In 2020, the above points are normal in IT in many projects. Cloud-native infrastructures, agile DevOps, and CI/CD are used more and more to be competitive and / or innovative.

Unfortunately, we are still in very early stages in OT. At least, we are getting some open standards like OPC-UA for vendor-neutral communication.

Shop Floor, Top Floor, Data Center, Cloud, Mobile, Partner…

The more data is integrated and correlated, the more business value can be achieved. Due to this, convergence of OT and IT goes far beyond shop floor and top floor. The rest of the enterprise IT architecture gets relevant, too. This can be software running in your data center or the public cloud.

Hybrid architectures become more and more common. Integration with 3rd party applications enables quick innovation and differentiation while building sophisticated partnerships with other vendors and service providers.

As you can see, achieving the industry revolution 4.0 requires some new capabilities. This is where Event Streaming and Apache Kafka come into play.

Apache Kafka and Event Streaming in Automation Industry / IIoT

Apache Kafka can help reduce and/or eliminate the Six Big Losses in manufacturing by providing data ingestion, processing, storage, and analytics in real time at scale without downtime.

I won’t cover in detail what Apache Kafka is and why people use it a lot in automation industry and Industry 4.0 projects. I covered this in several posts, already:

Please note that Apache Kafka is not the allrounder for every problem. The above posts describe when, why and how to complement it with other IoT platforms and frameworks, and how to combine it with existing legacy data historians, proprietary protocols, DCS, SCADA, MES, ERP, and other industrial and non-industrial systems.

10 Reasons for Event Streaming with Apache Kafka in IIoT Initiatives

Why is Kafka a perfect fit for IIoT projects? Here you go with the top 10 arguments I heard from project teams in the automation industry:

  • Real Time
  • Scalable
  • Cost Reduction
  • 24/7 – Zero downtime, zero data loss
  • Decoupling – Storage, Domain-driven Design
  • Data (re-)processing and stateful client applications
  • Integration – Connectivity to IoT, legacy, big data, everything
  • Hybrid Architecture – On Premises, multi cloud, edge computing
  • Fully managed cloud
  • No vendor locking

Even several well-known vendors in this space use Kafka for their internal projects instead of their own IIoT products. Often, IIoT platforms have OEM’d several different legacy platforms for middleware, analytics, and other components. This embedded zoo of technologies does not solve the requirements of IIoT projects in the year 2020.

Architecture: Kafka as Data Historian 4.0

The following architecture shows one possible option to build a data historian with Kafka:

Apache Kafka as Data Historian in Industrial IoT IIoT
This is just one sample architecture. Obviously, individual components can be added or removed. For example, existing Data Historians, HMI (Human-Machine-Interface) or SCADA systems, or another IoT platform or stream processing engine can complement or replace existing components.

Remember: Extensibility and flexibility are two key pillars of a successful IIoT project. Unfortunately, many IoT platforms miss these characteristics, just as they often don’t provide scalability, elasticity, or high throughput.

I have also seen a few companies building an enterprise data historian using a “traditional data lake software stack”: Kafka, Hadoop, Spark, NiFi, Hive, and various other data lake technologies. Here, Kafka was just the ingestion layer into HDFS or S3. While it is still valid to build a data lake with this architecture and these technologies, the three main drawbacks are:

  1. The central data storage is data at rest instead of real time
  2. A zoo of many complex technologies
  3. Not applicable for edge deployments due to its zoo of technologies and complex infrastructure and architecture

I won’t go into more detail here; there are always trade-offs for both approaches. The blog post ‘Streaming Machine Learning with Tiered Storage and Without a Data Lake’ discusses the pros and cons of a simplified architecture without a data lake.

With this in mind, let’s now go back to the key pillars of a Data Historian in the Industrial IoT and see how these fit to the above architecture.

Data Integration / Data Ingestion

When talking about a data historian or other IoT architectures, some vendors and consultants call this component “data ingestion”. I think this is really unfortunate for three reasons:

  1. Data Ingestion often includes many more tasks than just sending data from the data source to the data sink. Think about filtering, enrichment-on-the-fly or type conversion. These things can happen in a separate process, but also as part of the “ingestion”.
  2. Data Ingestion means sending data from A to B. But the challenge is not just the data movement, but also the connectivity. Connectors provide the capability to integrate with a data source without complex coding; guaranteeing reliability, fail-over handling and correct event order.
  3. Data Ingestion means you send data from the data source to the data sink. However, in many cases the real value is created when your IoT infrastructure establishes bi-directional integration and communication; not just for analytics and monitoring, but for command & control, too.

Data Sources

A data historian often has various industrial data sources, such as PLCs, DCS (Distributed Control System), SCADA, MES, ERP, and more. Apache PLC4X, MQTT, or dedicated IIoT platforms can be used for data ingestion and integration.

Legacy vs. Standards for Industrial Integration

Legacy integration is still the main focus in 2020, unfortunately: Files, proprietary formats, old database technologies, etc.

OPC-UA is one option for standardized integration. This is only possible for modern or enhanced old production lines. Some machines support other interfaces like MQTT or REST Web Services.

No matter if you integrate via proprietary protocols or open standards: The integration typically happens on different levels. While OPC-UA, MQTT, or REST interfaces provide critical information, some customers also want to directly integrate raw Syslog streams from machine sensors. This is much higher throughput (of less important data). Some customers also want to directly integrate with their SCADA monitoring systems.

Integration with the Rest of the Enterprise

While various industrial data sources need to be integrated, this is still only half the story: The real added value is created when the data historian also integrates with the rest of the enterprise beyond IIoT machines and applications. For instance, CRM, data lake, analytics tools, machine learning solutions, etc.

Plus hybrid integration and bi-directional communication between factories in different regions and a central cluster (edge <–> data center / cloud). Local edge processing in real time plus remote replication for aggregation / analytics is one of the most common hybrid scenarios.

Kafka Connect / Kafka Clients /  REST Proxy for Data Integration

Kafka Connect is a Kafka-native integration solution providing connectors for data sources and data sinks. This includes connectivity to legacy systems, industrial interfaces like MQTT, and modern technologies like big data analytics or cloud services. Most IIoT platforms provide their own Kafka connectors, too.

If there is no connector available, you can easily connect directly via Kafka Client APIs in almost every programming language, including Java, Scala, C++, C, Go, Python, JavaScript, and more. Confluent REST Proxy is available for bidirectional HTTP(S) communication to produce and consume messages.
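As a minimal illustration of the Kafka Connect approach, the following hedged sketch registers a connector through Connect’s REST API (default port 8083). The bundled FileStreamSourceConnector stands in for the legacy file integration mentioned above; a real IIoT setup would use a PLC4X, OPC-UA, or MQTT connector with its own configuration properties. The file path and topic name are assumptions:

```python
# Sketch: registering a source connector via the Kafka Connect REST API.
import requests

connector = {
    "name": "legacy-file-source",
    "config": {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",
        "file": "/var/machine/export.csv",   # hypothetical path on the Connect worker
        "topic": "legacy.machine.export",
    },
}
resp = requests.post("http://connect:8083/connectors", json=connector, timeout=10)
resp.raise_for_status()
print(resp.json())
```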

Data Storage

Kafka is not just a messaging system. The core of Kafka is a distributed commit log to store events as long as you want or need to. The blog post “Is Kafka a Database?” covers all the details to understand when Kafka is the right storage option and when it is not.

Tiered Storage for Reduced Cost, Infinite Storage, and Elastic Scalability

In summary, Kafka can store data forever. Tiered Storage enables separation of storage and processing by using a remote object store (like AWS S3) for infinite storage, low cost and elastic scalability / operations.

Stateful Kafka Client Applications

Kafka is not just the server side. A Kafka application is (or should be) a distributed application with two or more instances to provide high availability and elastic scalability.

Kafka applications can be stateless (e.g., for Streaming ETL) or stateful. The latter is used to build materialized views of data (e.g., aggregations) or business applications (e.g., a predictive maintenance real-time app). These clients store data on the client side (either in memory or on disk) for real-time processing. Zero data loss and guaranteed processing order are still ensured because Kafka applications leverage the Kafka log as a “backup”.
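Here is a minimal sketch of such a stateful client in Python: an in-memory materialized view of the latest reading per machine, rebuilt from the Kafka log on restart. Kafka Streams or ksqlDB provide the same pattern with fault-tolerant state stores out of the box; the topic and field names below are assumptions:

```python
# Sketch: in-memory materialized view rebuilt from the Kafka log; names are assumed.
import json
from collections import defaultdict
from confluent_kafka import Consumer

latest_reading = {}                      # machine_id -> last event seen
events_per_machine = defaultdict(int)    # simple aggregation per machine

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "historian-view",
    "auto.offset.reset": "earliest",     # replay the log to rebuild state after restart
})
consumer.subscribe(["machine.sensors"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    machine_id = msg.key().decode() if msg.key() else "unknown"
    latest_reading[machine_id] = json.loads(msg.value())
    events_per_machine[machine_id] += 1
```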

Data Processing

Data Processing adds the real value to your data historian. Validation, enrichment, aggregation and correlation of different data streams from various data sources enable insightful monitoring, proactive alerting and predictive actions.

Real time Supply Chain Management

Supply Chain Management (SCM) with Just in Time (JIT) and Just in Sequence (JIS) inventory strategies is a great example: Correlate the data from the production line, MES, ERP and other backend systems like CRM. Apply the analytic model trained in the data lake for real time scoring to make the right prediction about ordering parts from a partner.

I did a webinar with Expero recently to discuss the benefits of “Apache Kafka and Machine Learning for Real Time Supply Chain Optimization in IIoT“.

Stream Processing with Kafka Streams / ksqlDB

Kafka Streams (Java, part of Apache Kafka) and ksqlDB (SQL, Confluent) are two open source projects providing Kafka-native stream processing at scale in real time.

These frameworks can be used for “simple” Streaming ETL like filtering or enrichments. However, powerful aggregations of different streams can be joined to build stateful applications. You can also add custom business logic to implement your own business rules or apply an analytic model for real time scoring.

Check out these examples on Github to see how you can implement scalable Machine Learning applications for real time predictions at scale: Kafka Streams examples with TensorFlow and KSQL with H2O.ai.
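For illustration only, here is a hedged Python sketch of the same consume–score–produce pattern with plain Kafka clients (rather than Kafka Streams or KSQL as in the linked examples), assuming a pre-trained scikit-learn classifier serialized with joblib. The model file, feature names, and topics are hypothetical:

```python
# Sketch: real-time model scoring with plain Kafka clients; model and topics are assumed.
import json
import joblib
from confluent_kafka import Consumer, Producer

model = joblib.load("predictive_maintenance.joblib")  # hypothetical pre-trained classifier

consumer = Consumer({"bootstrap.servers": "localhost:9092",
                     "group.id": "scoring-app",
                     "auto.offset.reset": "latest"})
producer = Producer({"bootstrap.servers": "localhost:9092"})
consumer.subscribe(["machine.sensors"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    features = [[event["temperature"], event["vibration"], event["rpm"]]]
    event["failure_risk"] = float(model.predict_proba(features)[0][1])  # probability of failure
    producer.produce("machine.predictions", key=msg.key(), value=json.dumps(event))
    producer.poll(0)
```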

Data Access

A modern data historian is more than HMI and SCADA. Industry 4.0 with more and more streaming data requires real time monitoring, alerting and analytics at scale with an open architecture and ecosystem.

Human-Machine Interface (HMI) and Supervisory Control and Data Acquisition (SCADA)

A Human-Machine Interface (HMI) is a user interface or dashboard that connects a person to a machine, system, or device in the context of an industrial process. An HMI allows you to

  • visually display data
  • track production time, trends, and tags
  • oversee key performance indicators (KPI)
  • monitor machine inputs and outputs
  • and more

Supervisory Control and Data Acquisition (SCADA) systems collect and record information or connect to databases to monitor and control system operation.

HMI and SCADA solutions are typically proprietary, often with monolithic, inflexible, and non-scalable characteristics.

Kafka as Next Generation HMI, Monitoring and Analytics

Kafka can be used to build new “HMIs” for real-time monitoring, alerting, and analytics. This is not a replacement of existing technologies and use cases. HMI and SCADA worked well in the last decades for what they were built for. Kafka should complement existing HMI and SCADA solutions to process big data sets and implement innovative new use cases!

Analytics, Business Intelligence and Machine Learning in Real Time at Scale

HMI and SCADA systems are limited, proprietary monoliths. Kafka enables the combination of your industrial infrastructure with modern technologies for powerful streaming analytics (streaming push vs. pull queries), traditional SQL-native business intelligence (with Tableau, Qlik, Power BI, or similar tools), and machine learning (with tools like TensorFlow or cloud ML services).

Kafka Connect is used to integrate with your favorite database or analytics tool. For some use cases, you can simplify the architecture and “just” use Kafka-native technologies like ksqlDB with its Pull Queries.
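As a small example of the pull query idea, the following hedged sketch queries a hypothetical materialized table via ksqlDB’s REST API. The table name, column names, and endpoint details are assumptions; verify the exact request format against the ksqlDB version you run:

```python
# Sketch: a ksqlDB pull query over REST (default port 8088); table and columns are assumed.
import requests

query = {
    "ksql": "SELECT machine_id, avg_temperature FROM MACHINE_STATS "
            "WHERE machine_id = 'machine-42';",
    "streamsProperties": {},
}
resp = requests.post("http://ksqldb:8088/query", json=query, timeout=10)
resp.raise_for_status()
print(resp.text)  # ksqlDB returns the result rows as (chunked) JSON
```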

Data Historian in the Cloud

The cloud is here to stay. It has huge advantages for some scenarios. However, most companies are very cautious about linking internal processes and IT systems to the cloud. Often, it is even hard to just get access to a computer at the shop floor or top floor via TeamViewer to fix problems.

The rule of thumb in the automation industry: No external access to internal processes! Companies ask themselves: Do we want to use a commercial 3rd-party cloud to store and process our proprietary and valuable data? Do we want to trust our IP to other people in a cloud on the other side of the world?

Edge and Hybrid IoT Architectures

There are a lot of trade-offs, and the cloud has many benefits, too. In reality, edge and hybrid architectures are the new black in the very conservative Industrial IoT market. This totally makes sense, as factories and production lines will stay on premises anyway. It does not make sense to send all the big data sets to the cloud; this has huge implications on cost, latency, and security.

Architecture Patterns for Distributed, Hybrid, Edge and Global Apache Kafka Deployments

Kafka is deployed in various architectures depending on the scenario and use case. Edge deployments are as common as cloud infrastructures and hybrid bidirectional replication scenarios. Check out this blog post for more details; covering several IoT architectures:

Security

No matter if you decide to move data to the cloud or not: Security is super important for Industry 4.0 initiatives.

While the use cases for a data historian are great, security is key for success! Authentication, authorization, encryption, RBAC (Role-Based Access Control), audit logs, governance (schema enforcement, data catalog, tagging, data lineage, …), etc. are required.

Many shop floors don’t have any security at all (and therefore no internet / remote connection). As machines are built to stay for 20, 30 or 40 years, it is not easy to adjust existing ones. In reality, the next years will bring factories a mix of old proprietary non-connected legacy machines and modern internet-capable new machines.

From the outside (i.e., a monitoring application or even another data center), you will probably never get access to the insecure legacy machine. The Data Historian deployed in the factory can be used as a termination point for these legacy machines, similar to SSL termination in an internet proxy. Isolating insecure legacy machines from the rest with Kafka is a pattern I see more and more.

Kafka behaves as a gateway between IT and OT systems from a cybersecurity standpoint, while providing essential information to any users who contribute to operating, designing, or managing the business more effectively. Of course, security products like a security gateway complement the data streaming of Kafka.

Secure End-to-End Communication with Kafka

Kafka supports open standards such as SSL, SASL, Kerberos, OAuth, and so on. Authentication, authorization, encryption, RBAC, audit logs, and governance can be configured / implemented.

This is the norm in most other industries today. In the automation industry, Kafka and its ecosystem can provide a secure environment for communication between different systems and for data processing. This includes edge computing, but also remote communication when replicating data between the edge and another data center or cloud.
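To make this concrete, here is a hedged sketch of an encrypted and authenticated Kafka client configuration (SASL_SSL with SASL/PLAIN) using the Python client. Broker address, credentials, and certificate paths are placeholders; mTLS, SCRAM, Kerberos, or OAuth would only change the corresponding settings:

```python
# Sketch: secure client configuration; broker, credentials, and CA file are placeholders.
from confluent_kafka import Producer

secure_producer = Producer({
    "bootstrap.servers": "kafka.factory.example:9093",
    "security.protocol": "SASL_SSL",          # TLS encryption + SASL authentication
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "edge-gateway",
    "sasl.password": "change-me",
    "ssl.ca.location": "/etc/kafka/ca.pem",    # CA certificate to verify the brokers
})
secure_producer.produce("machine.sensors", key=b"machine-42",
                        value=b'{"temperature": 78.5}')
secure_producer.flush()
```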

Kafka as Data Historian to Improve OEE and Reduce / Eliminate the Six Big Losses

Continuous real time data ingestion, processing and monitoring 24/7 at scale is a key requirement for successful Industry 4.0 initiatives. Event Streaming with Apache Kafka and its ecosystem brings huge value to implement these modern IoT architectures.

This blog post explored how Kafka can be used as a component of a Data Historian to improve the OEE and reduce / eliminate the most common causes of equipment-based productivity loss in manufacturing (aka Six Big Losses).

Kafka is not an allrounder. Understand its added value and differentiators in IoT projects compared to other technologies, and combine it in the right way with your existing and new Industrial IoT infrastructure.

The post Apache Kafka as Data Historian – an IIoT / Industry 4.0 Real Time Data Lake appeared first on Kai Waehner.

IoT Architectures for Digital Twin with Apache Kafka https://www.kai-waehner.de/blog/2020/03/25/architectures-digital-twin-digital-thread-apache-kafka-iot-platforms-machine-learning/ Wed, 25 Mar 2020 15:47:30 +0000 https://www.kai-waehner.de/?p=2144 A digital twin is a virtual representation of something else. This can be a physical thing, process or…

The post IoT Architectures for Digital Twin with Apache Kafka appeared first on Kai Waehner.

A digital twin is a virtual representation of something else. This can be a physical thing, process, or service. This post covers the benefits and IoT architectures of a Digital Twin in various industries and its relation to Apache Kafka, IoT frameworks, and Machine Learning. Kafka is often used as the central event streaming platform to build a scalable and reliable digital twin and digital thread for real-time streaming sensor data.

I already blogged about this topic recently in detail: Apache Kafka as Digital Twin for Open, Scalable, Reliable Industrial IoT (IIoT). Hence, that post covers the relation to Event Streaming and why people choose Apache Kafka to build an open, scalable, and reliable digital twin infrastructure.

This article here extends the discussion about building an open and scalable digital twin infrastructure:

  • Digital Twin vs. Digital Thread
  • Relation between Event Streaming, Digital Twin and AI / Machine Learning
  • IoT Architectures for a Digital Twin with Apache Kafka and other IoT Platforms
  • Extensive slide deck and video recording

Key Take-Aways for Building a Digital Twin

Key Take-Aways:

  • A digital twin merges the real world (often physical things) and the digital world
  • Apache Kafka enables an open, scalable and reliable infrastructure for a Digital Twin
  • Event Streaming complements IoT platforms and other backend applications / databases.
  • Machine Learning (ML) and statistical models are used in most digital twin architectures to do simulations, predictions and recommendations.

Digital Thread vs. Digital Twin

The term ‘Digital Twin’ usually means the copy of a single asset. In the real world, many digital twins exist. The term ‘Digital Thread’ spans the entire life cycle of one or more digital twins. Eurostep has a great graphic explaining this:

Digital Thread and Digital Twin

When we talk about ‘Digital Twin’ use cases, we almost always mean a ‘Digital Thread’.

Honestly, the same is true in my material. Both terms overlap, but ‘Digital Twin’ is the “agreed buzzword”. It is important to understand the relation and definition of both terms, though.

Use Cases for Digital Twin and Digital Thread

Use cases exist in many industries. Think about some examples:

  • Downtime reduction
  • Inventory management
  • Fleet management
  • What-if simulations
  • Operational planning
  • Servitization
  • Product development
  • Healthcare
  • Customer experience

The slides and lecture (Youtube video) go into more detail discussing four use cases from different industries:

  • Virtual Singapore: A Digital Twin of the Smart City
  • Smart Infrastructure: Digital Solutions for Entire Building Lifecycle
  • Connected Car Infrastructure
  • Twinning the Human Body to Enhance Medical Care

The key message here is that digital twins are not just for the automation industry. Instead, many industries and projects can add business value and innovation by building a digital twin.

Relation between Event Streaming, Digital Twin and AI / Machine Learning

Digital Twin (respectively Digital Thread) and AI / Machine Learning (ML) are complementary concepts. You need to apply ML to make accurate predictions using a digital twin.

Digital Twin and AI

Melda Ulusoy from MathWorks shows in a Youtube video how different Digital Twin implementations leverage statistical methods and analytic models:

Digital Twin Example Implementations

Examples include physics-based modeling to simulate what-if scenarios and data-driven modeling to estimate the RUL (Remaining Useful Life).

Digital Twin and Machine Learning both have the following in common:

  • Continuous learning, monitoring and acting
  • (Good) data is key for success
  • The more data the better
  • Real time, scalability and reliability are key requirements

Digital Twin, Machine Learning and Event Streaming with Apache Kafka

Real time, scalability, and reliability are key requirements to build a digital twin infrastructure. This makes clear how Event Streaming and Apache Kafka fit into this discussion. I won’t cover what Kafka is or the relation between Kafka and Machine Learning in detail here because there are so many other blog posts and videos about it. To recap, let’s take a look at a common Kafka ML architecture providing openness, real-time processing, scalability, and reliability for model training, deployment / scoring, and monitoring:

Apache Kafka Open Source Ecosystem as Infrastructure for Machine Learning

To get more details about Kafka + Machine learning, start with the blog post “How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka” and google for slides, demos, videos and more blog posts.

Characteristics of Digital Twin Technology

The following five characteristics describe common Digital Twin implementations:

  • Connectivity
    • Physical assets, enterprise software, customers
    • Bidirectional communication to ingest, command and control
  • Homogenization
    • Decoupling and standardization
    • Virtualization of information
    • Shared with multiple agents, unconstrained by physical location or time
    • Lower cost and easier testing, development and predictions
  • Reprogrammable and smart
    • Adjust and improve characteristics and develop new versions of a product
  • Digital traces
    • Go back in time and analyze historical events to diagnose problems
  • Modularity
    • Design and customization of products and production modules
    • Tweak modules of models and machines

There are plenty of options to implement these characteristics. Let’s take a look at some IoT platforms and how Event Streaming and Apache Kafka fit into the discussion.

IoT Platforms, Frameworks, Standards and Cloud Services

Plenty of IoT solutions are available on the market. IoT Analytics Research talks about over 600 IoT Platforms in 2019. All have their “right to exist” 🙂 In most cases, some of these tools are combined with each other. There is no need or good reason to choose just one single solution.

Let’s take a quick look at some offerings and their trade-offs.

Proprietary IoT Platforms

  • Sophisticated integration for related IIoT protocols (like Siemens S7, Modbus, etc.) and standards (like OPC-UA)
  • Not a single product (plenty of acquisitions, OEMs and different code bases are typically the foundation)
  • Typically very expensive
  • Proprietary (only the interfaces are open)
  • Often limited scalability
  • Examples: Siemens MindSphere, Cisco Kinetic, GE Digital and Predix

IoT Offerings from Cloud Providers

  • Sophisticated tools for IoT management (devices, shadowing, …)
  • Good integration with other cloud services (storage, analytics, …)
  • Vendor lock-in
  • No key focus on hybrid and edge deployments (but some on-premises products)
  • Limited scalability
  • Often high cost (beyond ’hello world’)
  • Examples: All major cloud providers have IoT services, including AWS, GCP, Azure and Alibaba

Standards-based / Open Source IoT Platforms

  • Open and standards-based (e.g. MQTT)
  • Open source / open core business model
  • Infrastructure-independent
  • Different vendors contribute and compete behind the core technologies (competition means innovation)
  • Sometimes less mature or non-existent connectivity (especially to legacy and proprietary protocols)
  • Examples: Open source frameworks like Eclipse IoT, Apache PLC4X or Node-RED and standards like MQTT and related vendors like HiveMQ
  • Trade-off: Solid offering for one standard (e.g. HiveMQ for MQTT) or diversity but not for mission-critical scale (e.g. Node-RED)

IoT Architectures for a Digital Twin / Digital Thread with Apache Kafka and other IoT Platforms

So, we learned that there are hundreds of IoT solutions available. How does Apache Kafka fit into this discussion?

As discussed in the other blog post and in the below slides / video recording: There is huge demand for an open, scalable and reliable infrastructure for Digital Twins. This is where Kafka comes into play to provide a mission-critical event streaming platform for real time messaging, integration and processing.

Kafka and the 5 Characteristics of a Digital Twin

Let’s take a look at a few architectures in the following. Keep in mind the five characteristics of Digital Twins discussed above and its relation to Kafka:

  • Connectivity – Kafka Connect provides connectivity at scale in real time to IoT interfaces, big data solutions and cloud services. The Kafka ecosystem is complementary, NOT competitive to other Middleware and IoT Platforms.
  • Homogenization – Real decoupling between clients (i.e. producers and consumers) is one of the key strengths of Kafka. Schema management and enforcement leveraging different technologies (JSON Schema, Avro, Protobuf, etc.) enables data awareness and standardization.
  • Reprogrammable and smart – Kafka is the de facto standard for microservice architectures for exactly this reason: Separation of concerns and domain-driven design (DDD). Deploy new decoupled applications and do versioning, A/B testing, canarying.
  • Digital traces – Kafka is a distributed commit log. Events are appended, stored as long as you want (potentially forever with retention time = -1) and immutable. It is hard to think of a technology better suited to building a digital trace for a digital twin. (A minimal topic-configuration sketch follows after this list.)
  • Modularity – The Kafka infrastructure itself is modular and scalable. This includes components like Kafka brokers, Connect, Schema Registry, REST Proxy and client applications in different languages like Java, Scala, Python, Go, .NET, C++ and others. With this modularity, you can easily build the right Digital Twin architecture for your edge, hybrid or global scenarios and also combine the Kafka components with any other IoT solutions.
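
To illustrate the “digital traces” point from the list above, the following sketch uses the Kafka AdminClient to create a topic that never deletes events by disabling time- and size-based retention. Topic name, partition count and replication factor are arbitrary example values (replication factor 3 assumes at least three brokers).

    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    public class DigitalTraceTopic {

        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            try (Admin admin = Admin.create(props)) {
                // retention.ms = -1 and retention.bytes = -1 disable deletion, so the topic
                // keeps every event forever and acts as an immutable digital trace.
                NewTopic trace = new NewTopic("machine-sensor-trace", 6, (short) 3)
                    .configs(Map.of(
                        "retention.ms", "-1",
                        "retention.bytes", "-1"));

                admin.createTopics(List.of(trace)).all().get();
            }
        }
    }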

Each of the following IoT architectures for a Digital Twin has its pros and cons. Depending on your overall enterprise architecture, project situation and many other aspects, pick and choose the right one:

Scenario 1: Digital Twin Monolith

An IoT Platform is good enough for integration and building the digital twin. No need to use another database or integrate with the rest of the enterprise.

1 - Digital Twin Monolith

Scenario 2: Digital Twin as External Database

An IoT Platform is used for integration with the IoT endpoints. The Digital Twin data is stored in an external database. This can be something like MongoDB, Elastic, InfluxDB or a cloud storage service. The database can be used just for storage, or also for additional tasks like processing, dashboards and analytics.

2 - Digital Twin as External Database

 

A combination with yet another product is also very common. For instance, a Business Intelligence (BI) tool like Tableau, Qlik or Power BI can use the SQL interface of a database for interactive queries and reports.

Scenario 3: Kafka as Backbone for the Digital Twin and the Rest of the Enterprise

The IoT Platform is used for integration with the IoT endpoints. Kafka is the central event streaming platform to provide decoupling between the other components. As a result, the central layer is open, scalable and reliable. The database is used for the digital twin (storage, dashboards, analytics). Other applications also consume parts of the data from Kafka (some real time, some batch, some request-response communication).

3 -Kafka as Backbone for the Digital Twin and the Rest of the Enterprise

Scenario 4: Kafka as IoT Platform

Kafka is the central event streaming platform to provide a mission-critical real time infrastructure and the integration layer to the IoT endpoints and other applications. The digital twin is implemented in its own solution. In this example, it does not use a database like in the examples above, but a Cloud IoT Service like Azure Digital Twins.

4 - Kafka as IoT Platform

Scenario 5: Kafka as Digital Twin

Kafka is used to implement the digital twin. No other components or databases are involved. Other consumers consume the raw data and the digital twin data.

5 - Kafka as Digital Twin

 

Like all the other architectures, this one has pros and cons. The main question in this approach is whether Kafka can really replace a database and how you can query the data. First of all, Kafka can be used as a database (check out the detailed discussion in the linked blog post), but it will not replace other databases like Oracle, MongoDB or Elasticsearch.
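
To illustrate what querying could look like in this scenario, here is a minimal Kafka Streams sketch (topic and store names are made up) that materializes the latest event per machine in a local state store and reads it via an interactive query. This does not turn Kafka into a general-purpose database, but it is often sufficient for “current state of the twin” lookups.

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.common.utils.Bytes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StoreQueryParameters;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.state.KeyValueStore;
    import org.apache.kafka.streams.state.QueryableStoreTypes;
    import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

    public class TwinStateStore {

        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "digital-twin-state");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();

            // The event stream becomes a table holding the latest value per machine id.
            KTable<String, String> latestState = builder.table("machine-sensor-trace",
                Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("latest-machine-state"));

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();

            // Wait until the application is running before querying the local state store.
            while (streams.state() != KafkaStreams.State.RUNNING) {
                Thread.sleep(100);
            }

            // Interactive query: read the current twin state of one machine directly from the store.
            ReadOnlyKeyValueStore<String, String> store = streams.store(
                StoreQueryParameters.fromNameAndType("latest-machine-state",
                    QueryableStoreTypes.keyValueStore()));
            System.out.println("Current state of machine-42: " + store.get("machine-42"));
        }
    }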

Having said this, I have already seen several deployments of Kafka for Digital Twin infrastructures in the automation, aviation, and even banking industry.

Especially with “Tiered Storage” in mind (a Kafka feature currently discussed in KIP-405 and already implemented by Confluent), Kafka gets more and more powerful for long-term storage.

Slides and Video Recording – IoT Architectures for a Digital Twin with Apache Kafka

This section provides a slide deck and video recording to discuss Digital Twin use cases, technologies and architectures in much more detail.

The agenda for the deck and lecture:

  • Digital Twin – Merging the Physical and the Digital World
  • Real World Challenges
  • IoT Platforms
  • Apache Kafka as Event Streaming Solution for IoT
  • Spoilt for Choice for a Digital Twin
  • Global IoT Architectures
  • A Digital Twin for 100000 Connected Cars

Slides

Here is the long version of the slides (with more content than the slides used for the video recording):

Video Recording

The video recording covers a “lightweight version” of the above slides:

The post IoT Architectures for Digital Twin with Apache Kafka appeared first on Kai Waehner.

Apache Kafka is the New Black at the Edge in Industrial IoT, Logistics and Retailing https://www.kai-waehner.de/blog/2020/01/01/apache-kafka-edge-computing-industrial-iot-retailing-logistics/ Wed, 01 Jan 2020 10:45:28 +0000 https://www.kai-waehner.de/?p=1969 The following question comes up almost every week in conversations with customers: Can and should I deploy Apache…

The post Apache Kafka is the New Black at the Edge in Industrial IoT, Logistics and Retailing appeared first on Kai Waehner.

The following question comes up almost every week in conversations with customers: Can and should I deploy Apache Kafka at the edge? Or should I just deploy Kafka in a “real” data center or public cloud infrastructure? I am glad that people ask because it is a valid question in various industries, including manufacturing, automation, aviation, logistics, and retailing. This blog post explains why Apache Kafka is the New Black at the Edge in Internet of Things (IoT) projects. I cover use cases and different architectures. The last section discusses how Kafka as an event streaming platform complements other IoT frameworks and products at the edge for data integration and edge processing in real time at scale.

Multiple Kafka Clusters Became the Norm, not an Exception!

Multi-cluster and cross-data center deployments of Apache Kafka have become the norm rather than an exception. A Kafka deployment at the edge can be an independent project. However, in most cases, Kafka at the edge is part of an overall Kafka architecture.

Many reasons exist to create more than just one Kafka cluster in your organization:

  • Independent Projects
  • Hybrid integration
  • Edge computing
  • Aggregation
  • Migration
  • Disaster recovery
  • Global infrastructure (regional or even cross continent communication)
  • Cross-company communication

This blog post focuses on the deployment of Apache Kafka at the edge. The relation to all the other kinds of Kafka architectures will be discussed in February 2020 at DevNexus in Atlanta. There, I will explain in detail the “architecture patterns for distributed, hybrid and global Apache Kafka deployments“. I will share the slides and a video recording in another blog post after the conference.

Before we think about running Kafka at the edge, we need to define the term “edge”.

What is “The Edge” or “Edge Computing”?

Wikipedia says “edge computing is a distributed computing paradigm which brings computation and data storage closer to the location where it is needed, to improve response times and save bandwidth”. Other benefits include cost reduction, a more flexible architecture and separation of concerns.

Edge computing infrastructure

Apache Kafka at the Edge

Different options exist to discuss “Kafka for edge computing”:

  1. Only Clients at the Edge: Kafka Clients running at the edge. Kafka Cluster deployed in a Data Center or Public Cloud environment.
  2. Everything at the Edge: Kafka Cluster and Kafka Clients (e.g. sensors in the factory) deployed at the edge.
  3. Edge and Beyond: Kafka Cluster deployed at the edge. Kafka Clients (e.g. smartphones in the region) running close to the edge.

Therefore, the range of “Kafka at the Edge” is gigantic:

  • Edge in an IIoT shop floor could be a Kafka Client written in C and deployed to a microcontroller in a sensor. Such a sensor typically has just a few kilobytes of memory. It lives for a defined time span before it dies and gets replaced.
  • Edge in telco business could be a full distributed Kafka cluster running on StarlingX. This is an open source private cloud infrastructure software stack based on Kubernetes for the edge used by the most demanding applications in industrial IoT, telecom, video delivery and other ultra-low latency use cases. The hardware requirements for such an edge deployment are higher than what some teams in a traditional bank or insurance company get after waiting six months for management approvals.
  • In most cases, edge is somewhere in the middle of the two extreme scenarios described above.

In most scenarios, “Kafka at the edge” means the deployment of a Kafka cluster on site at the edge. The Kafka Clients are either running on site or close to the site (“close” could be several miles away in some cases). Sometimes, the edge site is offline and disconnected from the cloud regularly. This is my definition for this blog post, too. Just make sure to define what you mean with the term “edge” in your conversations with colleagues and customers.

Now let’s think about use cases for running Kafka at the edge.

Use Cases for Kafka at the Edge

Use cases for Kafka deployments at the edge exist in various industries. No matter if “your things” are smartphones, machines in the shop floor, sensors, cars, robots, cash point machines, or anything else.

Let’s take a look at a few examples of what I have seen in 2019 in many different enterprises:

  • Industrial IoT (IIoT): Edge integration and processing in real time are key for success in modern IoT architectures. Various use cases exist for Industry 4.0, including predictive maintenance, quality assurance, process optimization and cyber security. Building a Digital Twin with Kafka is one of the most frequent scenarios and a perfect fit for many use cases.
  • Retailing: The digital transformation enables many new, innovative use cases, including customer 360 experiences, cross-selling and collaboration with partner vendors. This is relevant for any kind of retailer. No matter if you think about retail stores like Walmart, coffee shops like Starbucks, or cutting-edge shops like the Amazon Go store.
  • Logistics: Correlation of data in real time at scale is a game changer for any logistics scenario: Track and trace end-to-end for package delivery, communication between delivery drones (or humans using cars) and the local self-service collection booths, accelerated processing in the logistics center, coordination and planning of car sharing / ride sharing, traffic light management in a smart city, and so on.

Kafka at the Edge – High Level Architecture

No matter which use case you want to implement: The architecture for Apache Kafka at the edge typically looks more or less like the following (on a very high level):

Apache Kafka at the IoT Edge

Why and How does Kafka help at the Edge?

The ideas and use cases discussed above are nothing new. But think about your favorite consumer application, retailer or coffee shop: How many enterprises have already rolled out real time applications for context-specific push messages, customer experience or other services at scale in a reliable way?

Honestly, not many. Only some tech companies like Uber and Lyft did a great job. Though, these companies could start greenfield a few years ago. Not surprisingly, all these tech companies use the event streaming platform Kafka as the heart of their real time infrastructure.

But the situation gets better and better these days. More and more traditional enterprises already rolled out an event streaming platform in their data centers or in the cloud to process data in real time at scale to build innovative applications.

Challenges at the Edge

However, enterprises often struggle to bring these innovative real time applications into scenarios where data is distributed across local sites, like factories, retail stores, coffee shops, etc.

Challenges include:

  • Integration with all the hardware, machines and devices at the edge is hard due to bad networks and many other limitations
  • Processing at scale and in real time is mandatory for many use cases. Thus, processing should happen on site, not in a remote data center or cloud (often hundreds of miles away).
  • Various technologies and protocols have to be integrated at the edge. Often, legacy and proprietary protocols need to communicate with modern big data tools on the other side of the tunnel.
  • Limited hardware resources and human expertise at the edge. IT experts cannot go on site to every location. Teams cannot spend a lot of money on hardware and operations continuously in every site.

Data has to be stored and processed locally on site in real time at scale. Plus, data needs to be replicated to the data center or cloud to do further processing and analytics with the aggregated data from different sites. In the best case, communication is bidirectional so that smaller local sites can be controlled from one single point by sending commands and control events back to the local sites.

The Same Infrastructure and Technology at the Edge and in the Data Center / Cloud

The implementation of edge use cases requires a lot of effort. Rolling out the solution to all the sites is another big challenge. Ideally, enterprises can leverage the same infrastructure, technologies and applications BOTH on site at the edge in factories or stores AND in the big data center or cloud.

This is why Apache Kafka comes into play in more and more edge scenarios: Leverage the same infrastructure, technologies and applications everywhere. Real time stream processing, reliability and flexible scalability are core features of Kafka. This allows large scale scenarios in the cloud and small or medium scale scenarios at the edge.

Let’s take a look at different options for deploying Kafka at the edge.

Kafka Architectures for Edge Computing

The key question you have to ask yourself: Do I need high availability at the edge?

Edge Computing does not always require high availability. If it does, then you deploy a traditional Kafka cluster. If it does not, then you choose the simple and less costly option with just one single Kafka broker at the edge. A hardware appliance could make this even easier for roll out in tens or hundreds of sites.

The following example shows three edge sites. One Kafka cluster is deployed at each site including additional Kafka components:

Kafka and Confluent Platform Deployment at the Edge

Resilient Deployment at the Edge with 3+ Kafka Brokers

Kafka and its ecosystem are built for high availability and zero downtime (even if individual nodes fail). Usually, you deploy a distributed system. Kafka and ZooKeeper require at least three nodes. Other components require at least two nodes to guarantee robust operations without data loss:

Resilient Kafka Configuration at the Edge

Check out the Apache Kafka and Confluent Platform Reference Architecture for more details about deployment best practices. These best practices do not change at the edge. Having said this, the load and throughput are often lower at the edge. Therefore, less memory and smaller disks are often sufficient. It all depends on your SLAs, requirements and use cases.
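
The broker-side best practices go hand in hand with the client configuration. The following sketch shows producer settings for a resilient three-broker edge cluster; hostnames and the topic name are placeholders, and it assumes the topic was created with replication.factor=3 and min.insync.replicas=2.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ResilientEdgeProducer {

        public static void main(String[] args) {
            Properties props = new Properties();
            // The three brokers of the resilient edge cluster (hostnames are placeholders)
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                "edge-kafka-1:9092,edge-kafka-2:9092,edge-kafka-3:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            // Durability: wait for all in-sync replicas and avoid duplicates on retry.
            props.put(ProducerConfig.ACKS_CONFIG, "all");
            props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("sensor-events", "machine-42",
                    "{\"temperature\": 78.4, \"timestamp\": 1577836800000}"));
                producer.flush();
            }
        }
    }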

Non-Resilient Deployment at the Edge with One Single Kafka Broker

The demand to deploy a “lightweight Kafka cluster” at the edge and synchronize / replicate data with a bigger central Kafka cluster comes up more and more. Due to hardware limitations or lower SLAs regarding high availability, the deployment of just one single Kafka broker (plus one Zookeeper) at the edge is totally fine. You can even deploy the whole Kafka environment on one single server:

Non-Resilient Kafka Configuration at the Edge

This creates some obvious drawbacks: No replication, downtime in case of failure of the node or network, risk of data loss. However, a single-node Kafka deployment works and still provides many benefits of the Kafka fundamentals:

  • Decoupling between producers and consumers
  • Handling of back-pressure
  • High volume real time processing (even one broker has a lot of power)
  • Storage on disks
  • Ability to reprocess data
  • All Kafka-native components available (Kafka Connect for integration, Kafka Streams or ksqlDB for stream processing, Schema Registry for governance).

TL;DR: A single-broker Kafka deployment works well. Just be aware of the drawbacks and double-check if this is okay for your SLAs and requirements.

ZooKeeper Removal – A Great Help for Kafka at the Edge

Kafka is a powerful distributed infrastructure. Hence, Kafka is not the easiest piece of infrastructure to operate. A key reason is the dependency on ZooKeeper (the same is true for many other distributed systems like Hadoop or Spark). I won’t go into details about the challenges and problems with ZooKeeper here. There is enough information on the web. TL;DR: ZooKeeper makes Kafka harder to operate and less scalable. Many P1 and P2 support tickets are not about Kafka but about ZooKeeper (not because ZooKeeper is unstable, but because it is hard to operate).

The good news: Kafka will get rid of ZooKeeper in the next few releases. Check out “KIP-500: Replace ZooKeeper with a Self-Managed Metadata Quorum” for more details. KIP-500 will make Kafka much more lightweight, scalable, elastic and simple to operate. This is relevant BOTH at extreme scale in the cloud AND in small single-broker deployments at the edge.

As most IoT projects do not plan just one year ahead, but are rolled out to different sites over five or more years, remember that Kafka will be much easier and more lightweight without ZooKeeper in ~1 year from now.

Kafka as Gateway between Edge Devices and Cloud

A Kafka gateway is an additional optional component in the architecture. In some configurations you might want the edge devices to communicate with a local gateway on premises.

As an example, in an industrial plant, you might see multiple machines or production lines as edge devices. They integrate with their own Kafka cluster and send this data to a gateway Kafka cluster. At the gateway Kafka cluster, you can do some analytics directly on premises, and maybe filter the data or transform it, before sending it to a large remote aggregation Kafka cluster:

Hybrid Kafka Architecture with Single Node to Factory to Cloud

In the above example, we have two independent factories. Both use non-resilient single-broker Kafka deployments for local processing. A resilient gateway Kafka cluster with three Kafka brokers aggregates and processes the data locally in the factory. Only the important and pre-processed data is forwarded to a remote Kafka cluster; in this case Confluent Cloud. The Kafka cluster in the cloud aggregates the data from different factories to integrate with other business applications or analytics tools.
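
A sketch of the pre-processing step on the gateway cluster could look like the following Kafka Streams application. The topic names, the temperature threshold and the gateway hostname are made-up examples; only the filtered output topic would then be replicated to the cloud, e.g. via MirrorMaker 2 or Confluent Replicator.

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    public class GatewayPreprocessing {

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "factory-gateway-preprocessing");
            // Gateway cluster inside the factory (hostname is a placeholder)
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "gateway-kafka:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.Double().getClass());

            StreamsBuilder builder = new StreamsBuilder();

            // Raw temperature readings from the single-broker line deployments, keyed by machine id
            KStream<String, Double> raw = builder.stream("line-sensor-raw");

            // Only critical readings are forwarded; the filtered topic is the one
            // replicated to the cloud cluster, so most data stays on site.
            raw.filter((machineId, temperature) -> temperature != null && temperature > 90.0)
               .to("factory-critical-events");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }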

Kafka at the Edge as OEM or in Hardware Appliances

At the edge, enterprises do not have the capabilities and possibilities like in their data centers or in the public cloud. Installing hardware is a huge challenge. Operations is even harder. A standardized way of installing Kafka components at the edge reduces the efforts and risks.

Tens of hardware vendors are available to build your own OEM or hardware appliance. Or you just pick a ready-to-ship box and install all required software components with some DevOps tools via remote management.

Plenty of options exist to ease installation and operations of a Kafka cluster at the edge. Hivecell is one interesting example which you could use. “Hivecell enables companies to deploy and maintain software at the edge without an army of technicians” is the slogan of this startup. You just ship one or more boxes to the site, connect them to the local WiFi and everything else can be done remotely. The Hivecell boxes can be shipped in a pre-configured way. For instance, the hardware could already have Kubernetes, the Kafka ecosystem and other business applications installed. Tooling like Confluent Operator can run on the boxes to ease and automate operations of the Kafka environment at the edge. This way, remote management is typically sufficient.

Communication, Connectivity, Integration, Data Processing

As already shown in the diagrams above, a Kafka environment includes more than just the Kafka broker and mandatory Zookeeper. Communication, connectivity, integration and data processing are important components in a Kafka infrastructure, no matter if in the cloud, on premise or at the edge:

Communication between Kafka Brokers and Kafka Clients:

  1. Edge-Only: Device -> Kafka at the edge -> Device
  2. Edge-to-Remote: Device -> Kafka at the edge -> Replication -> Kafka (Data Center / Cloud) -> Analytics / Real Time Processing
  3. Bidirectional: A combination of 1) and 2) for bidirectional communication between the edge and remote Kafka cluster

Kafka-native Connectivity, Integration, Data Processing:

Kafka-native components leverage Kafka under the hood. This way, you have to manage just one platform for communication, integration and processing of data in real time at scale:

  • Kafka Connect: MQTT, OPC-UA, FTP, CSV, PLC4X (Legacy and proprietary IIoT protocols and PLCs like Modbus, Siemens S7, Beckhoff, Allen Bradley), etc.
  • MirrorMaker 2 / Confluent Replicator: Uni- or bidirectional replication between two Kafka clusters
  • Kafka Clients (Producers / Consumers): Java, Python, C++, C, Go, Javascript, …
  • Data Processing: Stream Processing (stateless streaming ETL or stateful applications) with Kafka Streams or ksqlDB
  • Proxies: REST Proxy for HTTP(S) communication, MQTT Proxy for MQTT integration
  • Schema Registry: Governance and schema enforcement.

Make sure to plan the whole infrastructure from the beginning. Especially at the edge where you typically have limited hardware resources… Double-check if you really need another database or external processing framework! Maybe the Kafka stack is good enough for your needs?

Kafka is NOT an IoT framework

Kafka can be used in very different scenarios. It is best for event streaming at scale and building a reliable and open infrastructure to integrate IoT at the edge with the rest of the enterprise.

However, Kafka is not the silver bullet for every problem. A specific IoT product might be the better choice for integration and processing of IoT interfaces and shop floors. This depends on the specific requirements, existing ecosystem and already used software. Complexity and cost of solutions need to be evaluated. “Build vs. buy” is always a valid question. Often, the best choice and solution is a mix of building an open, flexible, self-built central streaming infrastructure and buying COTS for specific edge integration and processing scenarios.

Hybrid Architecture – When and Why to use Additional IoT Frameworks and Products?

You should always ask yourself: Is Kafka alone sufficient? If yes, why use an additional framework or product? End-to-End integration gets harder with each additional technology. 24/7 deployments, zero data loss, and real time processing without latency spikes are requirements Kafka works best for.

If Kafka is not sufficient alone, combine it with another IoT framework or solution:

Apache Kafka and IIoT COTS Solution Mindsphere Kinetic Azure IoT

Sometimes, the shop floor connects to an IoT solution. The IoT solution is used as gateway or proxy. This can be a broad, powerful (but also more complex and expensive) solution like Siemens MindSphere. Or the choice is to deploy “just” a specific, more lightweight solution to solve one problem. For instance, HiveMQ could be deployed as a scalable MQTT cluster to connect to machines and devices. This IoT gateway or proxy connects to Kafka. Kafka is either deployed in the same infrastructure or in another data center or cloud. The Kafka cluster connects to the rest of the enterprise.

In other scenarios, Kafka is used as IoT gateway or proxy to connect to the PLCs or Distributed Control System (DCS) directly. Kafka then connects to an IoT Solution like AWS IoT or Google Cloud’s MQTT Bridge where further processing and analytics happen.

Communication is often bidirectional. No matter what architecture you choose: The data is ingested from the shop floor or other IoT devices, processed and correlated in real time, and finally control events are sent back to the machines. For instance, in predictive analytics, you first train analytic models with tools like TensorFlow in the cloud. Then you deploy the analytic model at the edge for real time predictions.
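
Here is a minimal sketch of the “control events back to the machines” pattern. The topic names, JSON fields and command payload are hypothetical; a consumer reads the predictions and a producer publishes a command event that a connector or gateway forwards to the machine.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class CommandAndControl {

        public static void main(String[] args) {
            Properties consumerProps = new Properties();
            consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "machine-controller");
            consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            Properties producerProps = new Properties();
            producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            try (KafkaConsumer<String, String> predictions = new KafkaConsumer<>(consumerProps);
                 KafkaProducer<String, String> commands = new KafkaProducer<>(producerProps)) {

                predictions.subscribe(List.of("predictions"));

                while (true) {
                    for (ConsumerRecord<String, String> prediction : predictions.poll(Duration.ofSeconds(1))) {
                        // Naive rule: if the model flags an imminent failure, publish a command
                        // that a connector or gateway forwards to the machine (OT side).
                        if (prediction.value().contains("\"failure_risk\":\"high\"")) {
                            commands.send(new ProducerRecord<>("machine-commands",
                                prediction.key(), "{\"command\":\"reduce_speed\"}"));
                        }
                    }
                }
            }
        }
    }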

Do you really need NiFi / MiNiFi or an ESB / ETL Tool?

I just explained why you might combine Kafka with other IoT frameworks or solutions. These are very complementary to the Kafka ecosystem and focus on different use cases like device management, training an analytic model, reporting or building a digital twin.

For example, the major cloud providers provide IoT services for device management, proxies to their own cloud services, and analytics tools. At the edge, Eclipse IoT alone provides various different IoT frameworks. For instance, Eclipse Ditto is a great open source framework for building a digital twin.

For integration problems, I think differently!

24/7 Uptime and Zero Data Loss Required?

This is a long discussion, but TL;DR: Most integration projects have critical SLAs. The infrastructure has to run 24/7 without downtime and without data loss. The more middleware components you combine, the harder it gets to ensure your SLAs and requirements. If you can run the Kafka infrastructure with 99.95% availability, every additional middleware component you combine with it lowers the end-to-end availability. Additionally, you have to develop, test, operate and pay for two or more middleware components instead of just focusing on one single infrastructure.

SLAs Important? Kafka Eats its Own Dog Food

This is one of the key benefits of Kafka; Kafka eats its own dog food: All components leverage the Kafka protocol and its features like offsets, replication, consumer groups, etc. under the hood:

Kafka vs. ESB ETL Nifi - Eat your own Dog Food

 

I see the added value of tools like Apache NiFi or Node-RED: You have a drag&drop UI to build pipelines. This is really nice! If you just have to build a pipeline to send data from the edge to the data lake for reporting and analytics, NiFi et al. are great tools – if you can live with the risk of downtime and data loss!

Kafka + NiFi + XYZ = Too Many Distributed Systems to Operate!

If you have to build a scalable, reliable streaming infrastructure for edge computing and hybrid architectures without downtime and without data loss: Trust me, you will have a lot of pain. I have seen many deployments where end-to-end integration combining different middleware tools did not make it through the integration tests. The idea looks nice in the beginning, but is not robust enough for mission-critical scenarios.

Think about it again: The more tools you combine with each other, the higher the risk for an outage or data loss! NiFi, as one example, runs its own distributed infrastructure. This means you have to guarantee 24/7 end-to-end uptime from the producers via NiFi and Kafka to the final consumer. Kafka-native tools like Kafka Connect or Kafka Streams use Kafka topics (including all the high availability features of Kafka) under the hood. This means you have to operate just one single infrastructure in 24/7 mode to guarantee end-to-end integration without downtime and without data loss.

My heart aches when I see architecture recommendations where you have pipelines like “Sensor ABC -> NiFi (Ingestion) -> Kafka Topic A -> NiFi (Transformation) -> Kafka Topic B -> NiFi (Load) -> Application XYZ”. Again, this is fine for batch ETL pipelines and I see the added value of a nice UI tool like NiFi. But this is NOT the right way to build a 24/7 infrastructure for real time processing at scale with zero data loss.

I have a lot of material to learn the differences between an event-driven streaming platform like Apache Kafka and middleware like NiFi, Node-RED, Message Queues (MQ), Extract-Transform-Load (ETL) and Enterprise Service Bus (ESB).

Kafka at the Edge to Consolidate the Hybrid IoT Architecture

“Kafka at the edge” is the new black. Many industries deploy Kafka in hybrid architectures. Edge computing allows innovative use cases, increases processing speed, reduces network cost and makes the overall infrastructure more scalable, reliable and robust.

Start small, roll out Kafka at the edge in one site, connect it to the remote Kafka cluster. Then, connect more and more sites step-by-step or build bidirectional use cases.

Edge computing is just a part of the overall architecture. It is okay to deploy just one single Kafka broker in small sites like a coffee shop, retail store or small plant. For mission-critical use cases, a “real” Kafka cluster has to be deployed on site. Local processing allows mission-critical processing in real time at scale without the need for remote communication. This reduces costs and increases security.

However, added value often comes from combining the data from different sites to use it for real time decisions. IoT Kafka infrastructures often combine small edge deployments with bigger Kafka deployments in the data center or public cloud. In the meantime, you can even run a single Apache Kafka cluster across multiple datacenters to build regional and global Kafka infrastructures – and connect these to the local edge Kafka clusters.

What are your thoughts about Kafka at the edge and in hybrid architectures? Please let me know. Also, check out the “Infrastructure Checklist for Apache Kafka at the Edge” if you plan to go that direction!

Let’s connect on LinkedIn and discuss your use cases, architectures, and requirements. Also, stay informed about new blog posts by subscribing to my newsletter.

The post Apache Kafka is the New Black at the Edge in Industrial IoT, Logistics and Retailing appeared first on Kai Waehner.

Apache Kafka as Digital Twin for Open, Scalable, Reliable Industrial IoT (IIoT) https://www.kai-waehner.de/blog/2019/11/28/apache-kafka-industrial-iot-iiot-build-an-open-scalable-reliable-digital-twin/ Thu, 28 Nov 2019 12:53:48 +0000 https://www.kai-waehner.de/?p=1937 This blog post discusses the benefits of a Digital Twin in Industrial IoT (IIoT) and its relation to…

The post Apache Kafka as Digital Twin for Open, Scalable, Reliable Industrial IoT (IIoT) appeared first on Kai Waehner.

This blog post discusses the benefits of a Digital Twin in Industrial IoT (IIoT) and its relation to Apache Kafka. Kafka is often used as central event streaming platform to build a scalable and reliable digital twin for real time streaming sensor data.

In November 2019, I attended the SPS Conference in Nuremberg. This is one of the most important events about Industrial IoT (IIoT). Vendors and attendees from all over the world fly in to do business and discuss new products. Hotel prices in this region go up from usually 80-100€ to over 300€ per night. Germany is still known for its excellent engineering and manufacturing industry. German companies drive a lot of innovation and standardization around Internet of Things (IoT) and Industry 4.0.

The article discusses:

  • The relation between Operational Technology (OT) and Information Technology (IT)
  • The advantages and architecture of an open event streaming platform for edge and global IIoT infrastructures
  • Examples and use cases for Digital Twin infrastructures leveraging an event streaming platform for storage and processing in real time at scale

SPS – A Trade Show in Germany for Global Industrial IoT Vendors

“SPS covers the entire spectrum of smart and digital automation – from simple sensors to intelligent solutions, from what is feasible today to the vision of a fully digitalized industrial world”. No surprise that almost all vendors show software and hardware solutions for innovative use cases like

  • Condition monitoring in real time
  • Predictive maintenance using artificial intelligence (AI) – I prefer the more realistic term “machine learning”, a subset of AI
  • Integration of legacy machines and proprietary protocols in the shop floors
  • Robotics
  • New digital services
  • Other related buzzwords.

Digital Twin as Huge Value Proposition

New software and hardware needs to improve business processes, increase revenue or cut costs. Many booths showed solutions for a digital twin as part of this buzzword bingo and value proposition. Most vendors exhibited complete solutions as hardware or software products. Nobody talked about the underlying implementation. Not all vendors could explain in detail how the infrastructure really scales and performs under the hood.

This post starts from a different direction. It begins with the definition and use cases of a digital twin infrastructure. The challenges and requirements are discussed in detail. Afterwards, possible architectures and combinations of solutions show the benefits of an open and scalable event streaming platform as part of the puzzle.

The value proposition of a digital twin always discusses the combination of OT and IT:

  • OT: Operational Technology, dealing with machines and devices
  • IT: Information Technology, dealing with information

Excursus: SPS == PLC —> A Core Component in each IoT Infrastructure

A funny, but relevant marginal note for non-German people: The event name of the trade fair and acronym “SPS” stands for “smart production solutions”. However, in Germany “SPS” actually stands for “SpeicherProgrammierbare Steuerung”. The English translation might be very familiar to you: “Programmable Logic Controller”, or “PLC” for short.

PLC is a core component in any industrial infrastructure. This industrial digital computer has been ruggedized and adapted for the control of manufacturing processes, such as:

  • Assembly lines
  • Robotic devices
  • Any activity that requires high reliability control and ease of programming and process fault diagnosis

PLCs are built to withstand extreme temperatures, strong vibrations, high humidity, and more. Furthermore, since they are not reliant on a PC or network, a PLC will continue to function independently without any connectivity. PLC is OT. This is very different from the IT hardware a software engineer knows from developing and deploying “normal” Java, .NET or Golang applications.

Digital Twin – Merging the Physical and the Digital World

A digital twin is a digital replica of a living or non-living physical entity. By bridging the physical and the virtual world, data is transmitted seamlessly allowing the virtual entity to exist simultaneously with the physical entity. The digital twin therefore interconnects OT and IT.

Digital Replica of Potential and Actual Physical Assets

Digital twin refers to a digital replica of potential and actual physical assets (physical twin), processes, people, places, systems and devices. The digital replica can be used for various purposes. The digital representation provides both the elements and the dynamics of how an IoT device operates and lives throughout its life cycle.

Definitions of digital twin technology used in prior research emphasize two important characteristics. Firstly, each definition emphasizes the connection between the physical model and the corresponding virtual model or virtual counterpart. Secondly, this connection is established by generating real time data using sensors.

Digital Twin with Apache Kafka - Simulating of car manufacturing by robots on Siemens

Digital twins connect internet of things, artificial intelligence, machine learning and software analytics to create living digital simulation models. These models update and change as their physical counterparts change. A digital twin continuously learns and updates itself from multiple sources to represent its near real-time status, working condition or position.

This learning system learns from

  • itself, using sensor data that conveys various aspects of its operating condition.
  • human experts, such as engineers with deep and relevant industry domain knowledge.
  • other similar machines
  • other similar fleets of machines
  • the larger systems and environment of which it may be a part.

A digital twin also integrates historical data from past machine usage to factor into its digital model.

Use Cases for a Digital Twin in Various Industries

In various industrial sectors, digital twins are used to optimize the operation and maintenance of physical assets, systems and manufacturing processes. They are a formative technology for the IIoT. A digital twin in the workplace is often considered part of Robotic Process Automation (RPA). Per Industry-analyst firm Gartner, a digital twin is part of the broader and emerging hyperautomation category.

Some industries that can leverage digital twins include:

  • Manufacturing Industry: Physical manufacturing objects are virtualized and represented as digital twin models seamlessly and closely integrated in both the physical and cyber spaces. Physical objects and twin models interact in a mutually beneficial manner. Therefore, the IT infrastructure also sends control commands back to the actuators of the machines (OT).
  • Automotive Industry: Digital twins in the automobile industry use existing data in order to facilitate processes and reduce marginal costs. They can also suggest incorporating new features in the car that can reduce car accidents on the road.
  • Healthcare Industry: Lives can be improved in terms of medical health, sports and education by taking a more data-driven approach to healthcare. The biggest benefit of the digital twin for the healthcare industry is the fact that healthcare can be tailored to anticipate the responses of individual patients.

Examples for Digital Twins in the Industrial IoT

Digital twins are used in various scenarios. For example, they enable the optimization of the maintenance of power generation equipment such as power generation turbines, jet engines and locomotives. Further examples of industry applications are aircraft engines, wind turbines, large structures (e.g. offshore platforms, offshore vessels), heating, ventilation, and air conditioning (HVAC) control systems, locomotives, buildings, utilities (electric, gas, water, waste water networks).

Let’s take a look at some use cases in more detail:

Monitoring, Diagnostics and Prognostics

A digital twin can be used for monitoring, diagnostics and prognostics to optimize asset performance and utilization. In this field, sensory data can be combined with historical data, human expertise and fleet and simulation learning to improve the outcome of prognostics. Therefore, complex prognostics and intelligent maintenance system platforms can use digital twins to find the root cause of issues and improve productivity.

Digital twins of autonomous vehicles and their sensor suite embedded in a traffic and environment simulation have also been proposed as a means to overcome the significant development, testing and validation challenges for the automotive application, in particular when the related algorithms are based on artificial intelligence approaches that require extensive training and validation data sets.

3D Modeling for the Creation of Digital Companions

Digital twins are used for 3D modeling to create digital companions for the physical objects. It can be used to view the status of the actual physical object. This provides a way to project physical objects into the digital world. For instance, when sensors collect data from a connected device, the sensor data can be used to update a “digital twin” copy of the device’s state in real time.

The term “device shadow” is also used for the concept of a digital twin. The digital twin is meant to be an up-to-date and accurate copy of the physical object’s properties and states. This includes information such as shape, position, gesture, status and motion.

Embedded Digital Twin

Some manufacturers embed a digital twin into their device. This improves quality, allows earlier fault detection and gives better feedback on product usage to the product designers.

How to Build a Digital Twin Infrastructure?

TL;DR: You need to have the correct information in real time at the right location to be able to analyze the data and act properly. Otherwise you will have conversations like the following:

fb_the_idealist___Three_No_Four

Challenges and Requirements for Building a Scalable, Reliable Digital Twin

The following challenges have to be solved to implement a successful digital twin infrastructure:

  • Connectivity: Different machines and sensors typically do not provide the same interface. Some use a modern standard like MQTT or OPC-UA. However, many (older) machines use proprietary interfaces. Furthermore, you also need to integrate with the rest of the enterprise.
  • Ingestion and Processing: End-to-end pipelines and correlation of all relevant information from all machines, devices, MES, ERP, SCM, and any other related enterprise software in the factories, data center or cloud.
  • Real Time: Ingestion of all information in real time (this is a tough term; in this case “real time” typically means milliseconds, sometimes even seconds or minutes are fine).
  • Long-term Storage: Storage of the data for reporting, batch analytics, correlations of data from different data sources, and other use cases you did not think at the time when the data was created.
  • Security: Trusted data pipelines using authentication, authorization and encryption.
  • Scalability: Ingestion and processing of sensor data from one or more shop floors creates a lot of data.
  • Decoupling: Sensors produce data continuously. They don’t ask if the consumers are available and if they can keep up with the input. The handling of back-pressure and the decoupling of producers and consumers is mandatory (see the consumer sketch after this list).
  • Multi-Region or Global Deployment: Analysis and correlation of data in one plant is great. But it is even better if you can correlate and analyze the data from all your plants. Maybe even plants deployed all over the world.
  • High Availability: Building a pipeline from one shop floor to the cloud for analytics is great. However, even if you just integrate a single plant, the pipeline typically has to run 24/7. In some cases without data loss and with order guarantees of the sensor events.
  • Role-Based Access Control (RBAC): Integration of tens or hundreds of machines requires capabilities to manage the relation between the hardware and its digital twin. This includes the technical integration, role-based access control, and more.
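
The decoupling requirement maps directly to Kafka’s consumer group model: producers append to the log at their own rate, while each consumer group reads at its own pace and continues where it left off after a failure. A minimal consumer sketch (topic and group names are placeholders):

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class TwinConsumer {

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "digital-twin-builder");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            // Start from the beginning of the retained trace if no offset has been committed yet
            props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("machine-sensor-trace"));

                // The consumer pulls at its own pace; Kafka buffers events on disk, so a slow
                // consumer never pushes back-pressure onto the sensors (the producers).
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("machine=%s event=%s%n", record.key(), record.value());
                    }
                }
            }
        }
    }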

The above list covers technical requirements. On top of that, business services have to be provided. This includes device management, analytics, web UI / mobile app, and other capabilities depending on your use cases.

So, how do you get there? Let’s discuss three alternatives in the following sections.

Solution #1 => IoT / IIoT COTS Product

COTS (commercial off-the-shelf) refers to software or hardware products that are ready-made and available for sale. IIoT COTS products are built for exactly one purpose: the development, deployment and operations of IoT use cases.

Many IoT COTS solutions are available on the market. This includes products like Siemens MindSphere, PTC IoT, GE Predix, Hitachi Lumada or Cisco Kinetic for deployments in the data center or in different clouds.

Major cloud providers like AWS, GCP, Azure and Alibaba provide IoT-specific services. Cloud providers are a potential alternative if you are fine with the trade-offs of vendor lock-in. Main advantage: Native integration into the ecosystem of the cloud provider. Main disadvantage: Lock-in into the ecosystem of the cloud provider.

At the SPS trade show, 100+ vendors presented their IoT solution to connect shop floors, ingest data into the data center or cloud, and do analytics. Often, many different tools, products and cloud services are combined to solve a specific problem. This can either be obvious or just under the hood with OEM partners. I already covered the discussion around using one single integration infrastructure versus many different middleware components in much more detail.

Read Gartner’s Magic Quadrant for Industrial IoT Platforms 2019. The analyst report describes the mess under the hood of most IIoT products. Plenty of acquisitions and different code bases are the foundation of many commercial products. Scalability and automated rollouts are key challenges. Download of the report is possible with a paid Gartner account or via free download from any vendor with distribution rights, like PTC.

Proprietary and Vendor-specific Features and Products

The automation industry typically uses proprietary and expensive products. All of them are marketed as flexible cloud products. The truth is that most of them are not really cloud-native. This means they are not scalable and not extensible like you would expect from a cloud-native service. Some problems I have heard in the past from end users:

  • Scalability is not given. Many products don’t use a scalable architecture. Instead, additional black boxes or monoliths are added if you need to scale. This challenges uptime SLAs and increases the burden of operations and cost.
  • Some vendors just deploy their legacy on-premises solution into AWS EC2 instances and call it a cloud product. This is far from being cloud-native.
  • Even IoT solutions from some global cloud providers do not scale well enough to implement large IoT projects (e.g. to connect to one million cars). I saw several companies that evaluated a cloud-native MQTT solution and then went to a specific MQTT provider afterwards (but still used the cloud provider for its other great services, like computing resources, analytics services, etc.).
  • Most complete IoT solutions use many different components and code bases under the hood. Even if you buy one single product, the infrastructure usually uses various technologies and (OEM) vendors. This makes development, testing, roll-out and 24/7 end-to-end operations much harder and more costly.

Long Product Lifecycles with Proprietary Protocols

In the automation industry, product lifecycles are very long (tens of years). Simple changes or upgrades are not possible.

IIoT usually uses incompatible protocols. Typically, not just the products, but also these protocols are proprietary. They are just built for hardware and machines of one specific vendor. Siemens S7, Modbus, Allen Bradley, Beckhoff ADS, to name a few of these protocols and “standards”.

OPC-UA is supported more and more. This is a real standard. However, it has all the pros and cons of an open standard. It is often poorly implemented by the vendors and requires an app server on top of the PLC.

In most scenarios I have seen, connectivity to machines and sensors from different vendors is required, too. Even in one single plant, you typically have many different technologies and communication paradigms to integrate.

There is More than just the Machines and PLCs…

No matter how you integrate with the shop floor machines, this is just the first step to get data correlated with other systems in your enterprise. Connectivity to all the machines on the shop floors is not sufficient. Integration and combination of the IoT data with other enterprise applications is crucial. You also want to provide and sell additional services in the data center or cloud. In addition, you might even want to integrate with partners and suppliers. This adds additional value to your product. You can provide features you don’t sell by yourself.

Therefore, an IoT COTS solution does not solve all the challenges. There is huge demand to build an open, flexible, scalable platform. This is the reason why Apache Kafka comes into play in many projects as event streaming platform.

Solution #2 => Apache Kafka as Event Streaming Platform for the Digital Twin Infrastructure

Apache Kafka provides many business and technical characteristics out-of-the-box:

  • Cost reduction due to open core principle
  • No vendor lock-in
  • Flexibility and Extensibility
  • Scalability
  • Standards-based Integration
  • Infrastructure-, vendor- and technology-independent
  • Decoupling of applications and machines

Apache Kafka – An Immutable, Distributed Log for Real Time Processing and Long Term Storage

I assume you already know Apache Kafka: The de-facto standard for real-time event streaming. Apache Kafka provides the following characteristics:

  • Open-source (Apache 2.0 License)
  • Global-scale
  • High volume messaging
  • Real-time
  • Persistent storage for backup and decoupling of sources and sinks
  • Connectivity (via Kafka Connect)
  • Continuous Stream processing (via Kafka Streams)

Apache Kafka with Connect and Streams API

This is all included within the Apache Kafka framework. Fully independent of any vendor. Vibrant community with thousands of contributors from many hundreds of companies. Adoption all over the world in any industry.

If you need more details about Apache Kafka, check out the Kafka website, the extensive Confluent documentation, or hundreds of free video recordings and slide decks from all Kafka Summit events to learn about the technology and use cases.

How is Kafka related to Industrial IoT, shop floors and building digital twins? Let’s take a quick look at Kafka’s capabilities for integration and continuous data processing at scale.

Connectivity to Industrial Control Systems (ICS)

Kafka can connect to different functional levels of a manufacturing control operation:

Functional levels of a Distributed Control System (DCS)

  • Integrate directly to Level 1 (PLC / DCS): Kafka Connect or any other Kafka Clients (Java, Python, .NET, Go, JavaScript, etc.) can connect directly to PLCs or an OPC-UA server. The data gets directly ingested from the machines and devices. Check out “Apache Kafka, KSQL and Apache PLC4X for IIoT Data Integration and Processing” for an example of integrating with different PLC protocols like Siemens S7 and Modbus in real time at scale. A rough PLC4X-to-Kafka sketch also follows after this list.
  • Integrate on level 2 (Plant Supervisory) or even above on 3 (Production Control) or 4 (Production Scheduling): For instance, you integrate with MES / ERP / SCM systems via REST / HTTP interfaces or any MQTT-supported plant supervisory or production control infrastructure. The blog post “IoT and Event Streaming at Scale with MQTT” discusses a few different options.
  • What about level 0 (Field level)? Today, IT typically only connects to the PLC or DCS. IT does not directly integrate with sensors and actuators in the field bus, switch or end node. This will probably change in the future. Sensors get smarter and more powerful. And new standards for the “last mile” of the network emerge, like 10BASE-T1L.
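
For Level 1 integration, a rough sketch of reading one value via Apache PLC4X and forwarding it to Kafka could look like the following. The PLC4X API, connection string and field address syntax differ between PLC4X versions and PLC vendors, so treat these details as approximations rather than exact code.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;
    import org.apache.plc4x.java.PlcDriverManager;
    import org.apache.plc4x.java.api.PlcConnection;
    import org.apache.plc4x.java.api.messages.PlcReadRequest;
    import org.apache.plc4x.java.api.messages.PlcReadResponse;

    public class PlcToKafka {

        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            // Connection string and field address are examples for a Siemens S7 PLC;
            // the exact syntax depends on the PLC4X version and driver in use.
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
                 PlcConnection plc = new PlcDriverManager().getConnection("s7://192.168.0.10")) {

                PlcReadRequest request = plc.readRequestBuilder()
                    .addItem("temperature", "%DB1.DBD0:REAL")
                    .build();

                PlcReadResponse response = request.execute().get();
                double temperature = response.getDouble("temperature");

                producer.send(new ProducerRecord<>("sensor-events", "machine-42",
                    "{\"temperature\": " + temperature + "}"));
                producer.flush();
            }
        }
    }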

We discussed different integration levels between the IT and OT infrastructure. However, connecting from IT to the PLC and shop floor is just half of the story.

Connectivity and Data Ingestion to MES, ERP, SCM, Big Data, Cloud and the Rest of the Enterprise

Kafka Connect enables reliable and scalable integration of Kafka with any other system. This is important as you need to integrate and correlate sensor data from PLCs with applications and databases from the rest of the enterprise. Kafka Connect provides sources and sinks to

  • Databases like Oracle or SAP Hana
  • Big Data Analytics like Hadoop, Spark or Google Cloud Machine Learning
  • Enterprise Applications like ERP, CRM, SCM
  • Any other application using custom connectors or Kafka clients

Continuous Stream Processing and Streaming Analytics in Real Time on the Digital Twin with Kafka

A pipeline from machines to other systems in real time at scale is just part of the full story. You also need to continuously process the data. For instance, you implement streaming ETL, real time predictions with analytic models, or execute any other business logic.

Kafka Streams allows you to write standard Java apps and microservices to continuously process your data in real time with a lightweight stream processing Java API. You could also use ksqlDB to do stream processing using SQL semantics. Both frameworks use Kafka natively under the hood. Therefore, you leverage all Kafka-native features like high throughput, high availability and zero downtime out of the box.
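
As one example of continuous stream processing on digital twin data, the following Kafka Streams sketch counts sensor readings per machine in five-minute windows (topic names are placeholders). A missing or unusually low count for a machine is a simple health indicator that can trigger further analysis.

    import java.time.Duration;
    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.TimeWindows;

    public class MachineActivityMonitor {

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "digital-twin-activity-monitor");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();

            // Count how many readings each machine sends per five-minute window.
            builder.<String, String>stream("sensor-events")
                .groupByKey()
                .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
                .count()
                .toStream()
                .map((windowedMachineId, count) ->
                    KeyValue.pair(windowedMachineId.key(), String.valueOf(count)))
                .to("machine-activity-5min");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }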

The integration with other streaming solutions like Apache Flink, Spark Streaming, AWS Kinesis or other commercial products like TIBCO StreamBase, IBM Streams or Software AG’s Apama is possible, of course.

Kafka is so widely adopted because it decouples all sources and sinks from each other through its distributed messaging and storage infrastructure. “Smart endpoints and dumb pipes” is a key design principle that Kafka applies naturally to decouple services in a Domain-Driven Design (DDD):

Kafka Domain Driven Design (DDD) with Kafka Streams KSQL and Flink Spark Streaming

Kafka + Machine Learning / Deep Learning = Real Time Predictions in Industrial IoT

The combination of Kafka and Machine Learning for digital twins is no different from other industries. However, IoT projects usually generate big data sets through real-time streaming sensor data. Therefore, Kafka + Machine Learning makes even more sense in IoT projects for building digital twins than in many other projects that lack big data sets.

Analytic models need to be applied at scale. Predictions often need to happen in real time. Data preprocessing, feature engineering and model training also need to happen as fast as possible. Monitoring the whole infrastructure in real time at scale is a key requirement, too. This includes not just the infrastructure for the machines and sensors on the shop floor, but also the overall end-to-end edge-to-cloud infrastructure.
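As a sketch of how real-time predictions can be embedded into a streaming pipeline, the following Kafka Streams snippet scores every sensor event with a model. The AnomalyModel interface is a hypothetical placeholder for whatever your training environment exports (for instance a TensorFlow SavedModel loaded via its Java API); only the Kafka Streams calls reflect the real API, and the alert threshold and topic names are invented for illustration.

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class AnomalyScoringTopology {

    /** Hypothetical wrapper around a trained model (e.g., a TensorFlow SavedModel). */
    interface AnomalyModel {
        double score(double sensorValue);
    }

    public static void buildTopology(StreamsBuilder builder, AnomalyModel model) {
        KStream<String, String> readings = builder.stream("machine.sensor.temperature.cleansed");

        // Apply the model to every event and emit an anomaly score per machine
        readings
            .mapValues(value -> model.score(Double.parseDouble(value)))
            .filter((machineId, score) -> score > 0.9) // placeholder alert threshold
            .mapValues(score -> "anomaly score " + score)
            .to("machine.alerts");
    }
}
```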

With this in mind, you quickly understand that Machine Learning / Deep Learning and Apache Kafka are very complementary. I have covered this in detail in many other blog posts and presentations. Get started here for more details:

Example for Kafka + ML + IoT: Embedded System at the Edge using an Analytic Model in the Firmware

Let’s discuss a quick example for Kafka + ML at the IoT edge: Embedded systems are very inflexible; it is hard to change code or business rules. The firmware code sits between the hardware’s inputs and outputs and implements the business rules. However, new code or business rules cannot simply be deployed with a script or continuous delivery as you know it from your favorite Java / .NET / Go / Python application and tools like Maven and Jenkins. Instead, each change requires a new, long development lifecycle including tests, certifications and a manufacturing process.

There is another option instead of writing and embedding business rules in code through this complex and costly process: analytic models can be trained on historical data, and this can happen anywhere. For instance, you ingest sensor data into a data lake in the cloud via Kafka and train the model on the elastic and flexible cloud infrastructure. Finally, the model is deployed to the embedded system, i.e. a new firmware version containing the model is created (still using the long, expensive process once). However, updating (i.e. improving) the model that is already deployed on the embedded system gets much easier because no code has to be changed. The model is “just” re-trained and improved.

This way, business rules can be updated and improved by improving the model already deployed in the embedded system. No new development lifecycle, testing, certification or manufacturing process is required. DNP/AISS1 from SSV Software is one example of a hardware starter kit with pre-installed ML algorithms.

Solution #3 => Kafka + IIoT COTS Product as Complementary Solutions for the Digital Twin

The above sections described how to use either an IoT COTS product or an event streaming platform like Apache Kafka and its ecosystem for building a digital twin infrastructure. Interestingly, Kafka and IoT COTS are actually combined in most deployments I have seen so far.

Most IoT COTS products provide out-of-the-box Kafka connectors because end users are asking for this feature all the time. Let’s discuss the combination of Kafka and other IoT products in more detail.

Kafka is Complementary to Industry Solutions such as Siemens MindSphere or Cisco Kinetic

Kafka can be used in very different scenarios. It is best suited for building a scalable, reliable and open infrastructure to integrate IoT at the edge with the rest of the enterprise.

However, Kafka is not a silver bullet for every problem. A specific IoT product might be the better choice for the integration and processing of IoT interfaces and shop floors. This depends on the specific requirements, the existing ecosystem and the software already in use. The complexity and cost of solutions need to be evaluated. “Build vs. buy” is always a valid question. Often, the best choice is a mix of building an open, flexible, self-built central streaming infrastructure and buying COTS products for specific integration and processing scenarios.

Different Combinations of Kafka and IoT Solutions

The combination of an event streaming platform like Kafka with one or more other IoT products or frameworks is very common:

Apache Kafka and IIoT COTS Solution Mindsphere Kinetic Azure IoT

Sometimes, the shop floor connects to an IoT solution that is used as a gateway or proxy. This can be a broad, powerful (but also more complex and expensive) solution like Siemens MindSphere. Or the choice is to deploy “just” a more lightweight, specific solution to solve one problem. For instance, HiveMQ could be deployed as a scalable MQTT cluster to connect to machines and devices. This IoT gateway or proxy connects to Kafka. Kafka is either deployed in the same infrastructure or in another data center or cloud. The Kafka cluster connects to the rest of the enterprise.
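To illustrate the gateway pattern, here is a minimal sketch of a bridge that subscribes to an MQTT broker (such as a HiveMQ cluster) with the Eclipse Paho client and forwards every message to Kafka. Broker URL, client ID and topic names are placeholders; in production you would more likely use Kafka Connect or Confluent's MQTT Proxy instead of hand-rolled code.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.eclipse.paho.client.mqttv3.IMqttDeliveryToken;
import org.eclipse.paho.client.mqttv3.MqttCallback;
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttMessage;

import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class MqttToKafkaBridge {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        MqttClient mqtt = new MqttClient("tcp://hivemq-host:1883", "kafka-bridge-1");
        mqtt.setCallback(new MqttCallback() {
            @Override
            public void connectionLost(Throwable cause) {
                // reconnect logic omitted in this sketch
            }

            @Override
            public void messageArrived(String topic, MqttMessage message) {
                // Forward each MQTT message to a Kafka topic, keyed by the MQTT topic
                String payload = new String(message.getPayload(), StandardCharsets.UTF_8);
                producer.send(new ProducerRecord<>("iot.machine.events", topic, payload));
            }

            @Override
            public void deliveryComplete(IMqttDeliveryToken token) {
                // not used for a pure subscriber
            }
        });

        mqtt.connect();
        mqtt.subscribe("plant/+/machine/#"); // wildcard subscription, topic structure is an example
    }
}
```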

In other scenarios, Kafka is used as IoT gateway or proxy to connect to the PLCs or Distributed Control System (DCS) directly. Kafka then connects to an IoT Solution like AWS IoT or Google Cloud’s MQTT Bridge where further processing and analytics happen.

Communication is often bidirectional, no matter which architecture you choose. This means the data is ingested from the shop floor, processed and correlated, and finally control events are sent back to the machines. For instance, in predictive analytics, you first train analytic models with tools like TensorFlow in the cloud. Then you deploy the analytic model at the edge for real-time predictions.

Eclipse ditto – An Open Source Framework Dedicated to the Digital Twin; with Kafka Integration

There are not just commercial IoT solutions on the market, of course. Kafka is complementary to COTS IoT solutions and to open source IoT frameworks. Eclipse IoT alone provides various IoT frameworks. Let’s take a look at one of them, which fits perfectly into this blog post:

Eclipse ditto is an open source framework for building a digital twin. With Kafka’s decoupling principle, it is straightforward to leverage other frameworks for specific tasks.

ditto was created to help realize digital twins with an open source API. It provides features like Device-as-a-Service, state management for digital twins, and APIs to organize your set of digital twins. Kafka integration is built into ditto out of the box. Therefore, ditto and Kafka complement each other very well for building a digital twin infrastructure.

The World is Hybrid and Polyglot

The world is hybrid and polyglot in terms of technologies and standards. Different machines use different technologies and protocols. Each plant uses its own frameworks, products and standards. Business units use different analytics tools and not always the same cloud provider. And so on…

Global Kafka Architecture for Edge / On Premises / Hybrid / Multi Cloud Deployments of the Digital Twin

Kafka is often not just the backbone to integrate IoT devices and the rest of the enterprise. More and more companies deploy multiple Kafka clusters in different regions, facilities and cloud providers.

The right architecture for Kafka deployments depends on the use cases, SLAs and many other requirements. If you want to build a digital twin architecture, you typically have to think about edge AND data centers / cloud infrastructure:

Global Kafka Architecture with Edge Deployments

Apache Kafka as Global Nervous System for Streaming Data

Using Kafka as a global nervous system for streaming data typically means you spin up different Kafka clusters. The following scenarios are very common:

  • Local edge Kafka clusters on the shop floor: Each factory has its own Kafka cluster to integrate with the machines, sensors and assembly lines, but also with ERP systems, SCADA monitoring tools, and the workers’ mobile devices. This is typically a very small Kafka cluster with e.g. three brokers (which can still process ~100+ MB/sec). Sometimes a single Kafka broker is deployed. This is fine if you do not need high availability and prefer low cost and very simple operations.
  • Central regional Kafka clusters: Kafka clusters are deployed in different regions. Each Kafka cluster is used to ingest, process and aggregate data from the different factories in that region or from all cars within that region. These Kafka clusters are bigger than the local edge clusters because they need to integrate data from several edge Kafka clusters. The integration can be realized easily and reliably with Confluent Replicator or, in the future, maybe with MirrorMaker 2 (if it matures over time). Don’t use MirrorMaker 1 at all; you can find many good reasons on the web. Another option is to connect Kafka clients deployed at the edge directly to a central regional Kafka cluster, either with a Kafka client using Java, C, C++, Python, Go or another programming language (a producer configuration sketch follows this list), or using a proxy in the middle, like Confluent REST Proxy, Confluent MQTT Proxy, or any MQTT broker outside the Kafka environment. Find out more details about comparing different MQTT and HTTP-based IoT integration options for Kafka here.
  • Multi-region or global Kafka clusters: You can deploy one Kafka cluster in each region or continent and replicate the data between them (uni- or bidirectionally) in real time using Confluent Replicator. Or you can leverage the multi-data-center replication feature of Confluent Platform to spin up one logical cluster across different regions. The latter provides automatic fail-over, zero data loss and much easier operations on both the server and client side.
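If edge clients write directly into a central regional cluster, the producer configuration matters more than usual because the traffic crosses a wide-area network. The following sketch shows a few settings one might tune for such a link; the host name, topic and values are illustrative starting points, not recommendations.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class EdgeToRegionProducer {

    public static void main(String[] args) {
        Properties props = new Properties();
        // Bootstrap servers of the central regional Kafka cluster (placeholder host name)
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-region-eu.example.com:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        // Trade a little latency for fewer, larger, compressed requests over the WAN
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
        props.put(ProducerConfig.LINGER_MS_CONFIG, "100");
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, Integer.toString(256 * 1024));

        // Survive flaky links without duplicating or reordering events
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, "300000");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("factory42.sensor.events",
                    "machine-4711", "{\"temperature\": 73.4}"));
        }
    }
}
```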

This is just a quick summary of deployment options for Kafka clusters at the edge, on premises or in the cloud. You typically combine different options to deploy a hybrid and global Kafka infrastructure. Often, you start small with one pipeline and a single Kafka cluster. Scaling up and rolling out the global expansion should be included in the planning from the beginning. I will speak in more detail about different “architecture patterns and best practices for distributed, hybrid and global Apache Kafka deployments” at DevNexus in Atlanta in February 2020. This is a good topic for another blog post in 2020.

Polyglot Infrastructure – There is no AWS, GCP or Azure Cloud in China and Russia!

For global deployments, you need to choose the right cloud providers or build your own data centers in some regions.

For example, you might leverage Confluent Cloud. This is a fully managed Kafka service with usage-based pricing and enterprise-ready SLAs in Europe and the US on Azure, AWS or GCP. Confluent Cloud is a truly serverless approach: no need to think about Kafka brokers, operations, scalability, rebalancing, security, fine-tuning or upgrades.

US cloud providers do not offer cloud services in China, where Alibaba is the leading cloud provider. Kafka can be deployed on Alibaba Cloud, or you can choose a generic cloud-native infrastructure like Kubernetes. Confluent Operator, a Kubernetes Operator including CRDs and Helm charts, is a tool to support and automate provisioning and operations on any Kubernetes infrastructure.

No public cloud is available in Russia at all, mainly because of legal restrictions. Kafka has to be deployed on premises.

In some scenarios, the data from different Kafka clusters in different regions is replicated and aggregated. Anonymized sensor data from all continents can be correlated to find new insights, while specific user data might always stay in the country or region of origin.

Standardized Infrastructure Templates and Automation

Many companies build one general infrastructure template on a specific abstraction level. This template can then be deployed to different data centers and cloud providers the same way. This standardizes and eases operations in global Kafka deployments.

Cloud-Native Kubernetes for Robust IoT and Self-Healing, Scalable Kafka Cluster

Today, Kubernetes is often chosen as the abstraction layer. Kubernetes is deployed and managed by the cloud provider (e.g. GKE on GCP) or by an operations team on premises. All required infrastructure on top of Kubernetes is scripted and automated with a template framework (e.g. Terraform). This can then be rolled out to different regions and infrastructures in the same standardized and automated way.

The same is applicable to the Kafka infrastructure on top of Kubernetes. Either you leverage existing tools like Confluent Operator or you build your own scripts and custom resource definitions. The Kafka Operator for Kubernetes has several features built in, like automated handling of failover, rolling upgrades and security configuration.

Find the right abstraction level for your Digital Twin infrastructure:

Abstraction Level VMware Cloud Kubernetes Confluent Kafka KSQL

You can also use tools like the open source Kafka Ansible scripts to deploy and operate the Kafka ecosystem (including components like Schema Registry or Kafka Connect), of course.

However, the beauty of a cloud-native infrastructure like Kubernetes is its self-healing and robust characteristics. Failure is expected. This means that your infrastructure continues to run without downtime or data loss in case of node or network failures.

This is quite important if you deploy a digital twin infrastructure and roll it out to different regions, countries and business units. Many failures are handled automatically (in terms of continuous operations without downtime or data loss).

Not every failure requires a P1 ticket and a call to the support hotline. The system continues to run while an ops team replaces the defective infrastructure in the background without time pressure. This is exactly what you need to deploy robust IoT solutions to production at scale.

Apache Kafka as the Digital Twin for 100000 Connected Cars

Let’s conclude with a specific example to build a Digital Twin infrastructure with the Kafka ecosystem. We use an implementation from the automotive industry. But this is applicable to any scenario where you want to build and leverage digital twins. Read more about “Use Cases for Apache Kafka in Automotive Industry” here.

Honestly, this demo was not built with the idea of creating a digital twin infrastructure in mind. However, think about it (and take a look at some definitions, architectures and solutions on the web): Digital Twin is just a concept and software design pattern. Remember our definition from the beginning of the article: A digital twin is a digital replica of a living or non-living physical entity. Therefore, the decision of the right architecture and technology depends on the use case(s).

We built a demo which shows how you can integrate with tens or hundreds of thousands of IoT devices and process the data in real time. The demo use case is predictive maintenance (i.e. anomaly detection) in a connected car infrastructure to predict motor engine failures: “Building a Digital Twin for Streaming Machine Learning at Scale from 100000 IoT Devices with HiveMQ, Apache Kafka and TensorFlow“.

Streaming Machine Learning - Digital Twin for IIoT with Apache Kafka and TensorFlow

In this example, the data from 100000 cars is ingested and stored in the Kafka cluster, i.e. the digital twin, for further processing and real time analytics.

Kafka client applications consume the data for different use cases and at different speeds:

  1. Real time data pre-processing and data engineering using the data from the digital twin with Kafka Streams and KSQL / ksqlDB.
  2. Streaming model training (i.e. without a data lake in the middle) with the Machine Learning / Deep Learning framework TensorFlow and its Kafka plugin (part of TensorFlow I/O). In our example, we train two neural networks: an unsupervised autoencoder for anomaly detection and a supervised LSTM.
  3. Model deployment for inference in real time on new car sensor events to predict potential failures in the motor engine.
  4. Ingestion of the data into another batch system, database or data lake (Oracle, HDFS, Elastic, AWS S3, Google Cloud Storage, whatever).
  5. Another consumer could be a real-time time series database like InfluxDB or TimescaleDB (see the sketch after this list).
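As an example for consumer #5, the following sketch consumes the car sensor events and writes them into a time series database via the InfluxDB 2.x Java client (an assumption for illustration; the original demo may use a different sink). The URL, token, organization, bucket, topic and the shape of the Kafka message value are placeholders.

```java
import com.influxdb.client.InfluxDBClient;
import com.influxdb.client.InfluxDBClientFactory;
import com.influxdb.client.WriteApiBlocking;
import com.influxdb.client.domain.WritePrecision;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class KafkaToInfluxDb {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "timeseries-sink");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // Placeholder InfluxDB 2.x connection details
        InfluxDBClient influx = InfluxDBClientFactory.create(
                "http://influxdb:8086", "my-token".toCharArray(), "my-org", "car-telemetry");
        WriteApiBlocking writeApi = influx.getWriteApiBlocking();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("car.sensor.events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Assume the value is a numeric reading; build one line protocol record per event
                    String line = String.format("engine_temperature,car=%s value=%s",
                            record.key(), record.value());
                    writeApi.writeRecord(WritePrecision.MS, line);
                }
            }
        }
    }
}
```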

Build your own Digital Twin Infrastructure with Kafka and its Open Ecosystem!

I hope this post gave you some insights and ideas. I described three options to build the infrastructure for a digital twin: 1) IoT COTS, 2) Kafka, 3) Kafka + IoT COTS. Many companies leverage Apache Kafka as the central nervous system for their IoT infrastructure in one way or the other.

Digital Twin is just one of many possible IoT use cases for an event streaming platform. Often, Kafka is “just” part of the solution. Pick and choose the right tools for your use cases. Evaluate the Kafka ecosystem and different IoT frameworks / solutions to find the best combination for your project. Don’t forget to include the vision and long-term planning in your decisions! If you plan it right from the beginning, it is (relatively) straightforward to start with a pilot or MVP. Then roll it out to the first plant in production. Over time, deploy a global Digital Twin infrastructure… 🙂

The post Apache Kafka as Digital Twin for Open, Scalable, Reliable Industrial IoT (IIoT) appeared first on Kai Waehner.
