Modernizing OT Middleware: The Shift to Open Industrial IoT Architectures with Data Streaming
https://www.kai-waehner.de/blog/2025/03/17/modernizing-ot-middleware-the-shift-to-open-industrial-iot-architectures-with-data-streaming/
Mon, 17 Mar 2025 12:45:14 +0000

Legacy OT middleware is struggling to keep up with real-time, scalable, and cloud-native demands. As industries shift toward event-driven architectures, companies are replacing vendor-locked, polling-based systems with Apache Kafka, MQTT, and OPC-UA for seamless OT-IT integration. Kafka serves as the central event backbone, MQTT enables lightweight device communication, and OPC-UA ensures secure industrial data exchange. This approach enhances real-time processing, predictive analytics, and AI-driven automation, reducing costs and unlocking scalable, future-proof architectures.

Operational Technology (OT) has traditionally relied on legacy middleware to connect industrial systems, manage data flows, and integrate with enterprise IT. However, these monolithic, proprietary, and expensive middleware solutions struggle to keep up with real-time, scalable, and cloud-native architectures.

Just as mainframe offloading modernized enterprise IT, offloading and replacing legacy OT middleware is the next wave of digital transformation. Companies are shifting from vendor-locked, heavyweight OT middleware to real-time, event-driven architectures using Apache Kafka and Apache Flink—enabling cost efficiency, agility, and seamless edge-to-cloud integration.

This blog explores why and how organizations are replacing traditional OT middleware with data streaming, the benefits of this shift, and architectural patterns for hybrid and edge deployments.

Replacing OT Middleware with Data Streaming using Kafka and Flink for Cloud-Native Industrial IoT with MQTT and OPC-UA

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter, and follow me on LinkedIn or X (formerly Twitter) to stay in touch. And make sure to download my free book about data streaming use cases, including architectures and customer stories for hybrid IT/OT integration scenarios.

Why Replace Legacy OT Middleware?

Industrial environments have long relied on OT middleware like OSIsoft PI, proprietary SCADA systems, and industry-specific data buses. These solutions were designed for polling-based communication, siloed data storage, and batch integration. But today’s real-time, AI-driven, and cloud-native use cases demand more.

Challenges: Proprietary, Monolithic, Expensive

  • High Costs – Licensing, maintenance, and scaling expenses grow exponentially.
  • Proprietary & Rigid – Vendor lock-in restricts flexibility and data sharing.
  • Batch & Polling-Based – Limited ability to process and act on real-time events.
  • Complex Integration – Difficult to connect with cloud and modern IT systems.
  • Limited Scalability – Not built for the massive data volumes of IoT and edge computing.

Just as PLCs are transitioning to virtual PLCs, eliminating hardware constraints and enabling software-defined industrial control, OT middleware is undergoing a similar shift. Moving from monolithic, proprietary middleware to event-driven, streaming architectures with Kafka and Flink allows organizations to scale dynamically, integrate seamlessly with IT, and process industrial data in real time—without vendor lock-in or infrastructure bottlenecks.

Data streaming is NOT a direct replacement for OT middleware, but it serves as the foundation for modernizing industrial data architectures. With Kafka and Flink, enterprises can offload or replace OT middleware to achieve real-time processing, edge-to-cloud integration, and open interoperability.

Event-driven Architecture with Data Streaming using Kafka and Flink in Industrial IoT and Manufacturing

While Kafka and Flink provide real-time, scalable, and event-driven capabilities, last-mile integration with PLCs, sensors, and industrial equipment still requires OT-specific SDKs, open interfaces, or lightweight middleware. This includes support for MQTT, OPC UA, or open-source solutions like Apache PLC4X to ensure seamless connectivity with OT systems.
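
To make this last-mile integration concrete, here is a minimal sketch of an MQTT-to-Kafka bridge in Python. It assumes a local Mosquitto-style MQTT broker and a reachable Kafka cluster; all broker addresses and topic names are illustrative, not taken from any specific product:

```python
# Minimal MQTT-to-Kafka bridge sketch. Broker addresses and topic names
# are illustrative assumptions.
# Requires: pip install paho-mqtt confluent-kafka
import paho.mqtt.client as mqtt
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def on_message(client, userdata, msg):
    # Forward each sensor reading from MQTT into Kafka, keyed by the MQTT
    # topic so readings from one machine stay ordered within a partition.
    producer.produce("ot.sensor.readings", key=msg.topic, value=msg.payload)
    producer.poll(0)  # serve delivery callbacks without blocking

client = mqtt.Client()  # paho-mqtt 1.x style constructor
client.on_message = on_message
client.connect("localhost", 1883)
client.subscribe("plant1/+/temperature")  # wildcard across machine IDs
client.loop_forever()
```

In production, dedicated IoT gateways or Kafka Connect connectors replace such scripts, but the data flow is the same: lightweight OT protocols at the edge, Kafka as the scalable event backbone.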

Apache Kafka: The Backbone of Real-Time OT Data Streaming

Kafka acts as the central nervous system for industrial data to ensure low-latency, scalable, and fault-tolerant event streaming between OT and IT systems.

  • Aggregates and normalizes OT data from sensors, PLCs, SCADA, and edge devices.
  • Bridges OT and IT by integrating with ERP, MES, cloud analytics, and AI/ML platforms.
  • Operates seamlessly in hybrid, multi-cloud, and edge environments, ensuring real-time data flow.
  • Works with open OT standards like MQTT and OPC UA, reducing reliance on proprietary middleware solutions.

And just to be clear: Apache Kafka and similar technologies support “IT real-time” (meaning milliseconds of latency, with occasional latency spikes). This is NOT about hard real-time in the OT world for embedded systems or safety-critical applications.

Apache Flink: Real-Time Stream Processing for Industrial Data

Flink powers real-time analytics, complex event processing, and anomaly detection for streaming industrial data.
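
As a minimal illustration of such stream processing, the following PyFlink sketch flags over-temperature events from a Kafka topic. Topic names, the JSON schema, and the threshold are assumptions for illustration only, and the Kafka SQL connector jar must be on the Flink classpath:

```python
# Minimal PyFlink sketch: flag over-temperature events in a Kafka stream.
# Topic names, schema, and threshold are illustrative assumptions.
# Requires: pip install apache-flink (plus the Kafka SQL connector jar)
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.execute_sql("""
    CREATE TABLE telemetry (
        machine_id STRING,
        temperature DOUBLE,
        event_time TIMESTAMP(3),
        WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'machine.telemetry',
        'properties.bootstrap.servers' = 'localhost:9092',
        'scan.startup.mode' = 'latest-offset',
        'format' = 'json'
    )
""")

t_env.execute_sql("""
    CREATE TABLE anomalies (
        machine_id STRING,
        temperature DOUBLE,
        event_time TIMESTAMP(3)
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'machine.anomalies',
        'properties.bootstrap.servers' = 'localhost:9092',
        'format' = 'json'
    )
""")

# Simple threshold rule for illustration; real deployments would use
# windowed aggregations, CEP patterns, or ML-based scoring.
t_env.execute_sql("""
    INSERT INTO anomalies
    SELECT machine_id, temperature, event_time
    FROM telemetry
    WHERE temperature > 90.0
""").wait()
```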

Condition Monitoring and Predictive Maintenance with Data Streaming using Apache Kafka and Flink

By leveraging Kafka and Flink, enterprises can process OT and IT data only once, ensuring a real-time, unified data architecture that eliminates redundant processing across separate systems. This approach enhances operational efficiency, reduces costs, and accelerates digital transformation while still integrating seamlessly with existing industrial protocols and interfaces.

Unifying Operational (OT) and Analytical (IT) Workloads

As industries modernize, a shift-left architecture approach ensures that operational data is not just consumed for real-time operational OT workloads but is also made available for transactional and analytical IT use cases—without unnecessary duplication or transformation overhead.

The Shift-Left Architecture: Bringing Advanced Analytics Closer to Industrial IoT

In traditional architectures, OT data is first collected, processed, and stored in proprietary or siloed middleware systems before being moved later to IT systems for analysis. This delayed, multi-step process leads to inefficiencies, including:

  • High latency between data collection and actionable insights.
  • Redundant data storage and transformations, increasing complexity and cost.
  • Disjointed AI/ML pipelines, where models are trained on outdated, pre-processed data rather than real-time information.

A shift-left approach eliminates these inefficiencies by bringing analytics, AI/ML, and data science closer to the raw, real-time data streams from the OT environments.

Shift Left Architecture with Data Streaming into Data Lake Warehouse Lakehouse

Instead of waiting for batch pipelines to extract and move data for analysis, a modern architecture integrates real-time streaming with open table formats to ensure immediate usability across both operational and analytical workloads.

Open Table Format with Apache Iceberg / Delta Lake for Unified Workloads and Single Storage Layer

By integrating open table formats like Apache Iceberg and Delta Lake, organizations can:

  • Unify operational and analytical workloads to enable both real-time data streaming and batch analytics in a single architecture.
  • Eliminate data silos, ensuring that OT and IT teams access the same high-quality, time-series data without duplication.
  • Ensure schema evolution and ACID transactions to enable robust and flexible long-term data storage and retrieval.
  • Enable real-time and historical analytics, allowing engineers, business users, and AI/ML models to query both fresh and historical data efficiently.
  • Reduce the need for complex ETL pipelines, as data is written once and made available for multiple workloads simultaneously, avoiding the Reverse ETL anti-pattern (see the sketch below).
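
The following sketch shows this write-once pattern under stated assumptions: a Kafka source table and an Iceberg catalog backed by object storage (a Hadoop-style catalog with an illustrative warehouse path), with the Iceberg Flink runtime jar on the classpath:

```python
# Shift-left sketch: write the Kafka stream once into an Iceberg table so the
# same data serves operational and analytical workloads. Catalog type,
# warehouse path, and all names are illustrative assumptions.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
# Iceberg commits happen on checkpoints in streaming mode.
t_env.get_config().set("execution.checkpointing.interval", "30 s")

t_env.execute_sql("""
    CREATE TABLE telemetry (
        machine_id STRING,
        temperature DOUBLE,
        event_time TIMESTAMP(3)
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'machine.telemetry',
        'properties.bootstrap.servers' = 'localhost:9092',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json'
    )
""")

# An Iceberg catalog backed by object storage (path is an assumption).
t_env.execute_sql("""
    CREATE CATALOG lakehouse WITH (
        'type' = 'iceberg',
        'catalog-type' = 'hadoop',
        'warehouse' = 's3a://example-bucket/warehouse'
    )
""")

t_env.execute_sql("CREATE DATABASE IF NOT EXISTS lakehouse.iot")
t_env.execute_sql("""
    CREATE TABLE IF NOT EXISTS lakehouse.iot.telemetry_history (
        machine_id STRING,
        temperature DOUBLE,
        event_time TIMESTAMP(3)
    )
""")

# Written once; analytical engines can query the same table afterwards.
t_env.execute_sql("""
    INSERT INTO lakehouse.iot.telemetry_history
    SELECT machine_id, temperature, event_time FROM telemetry
""").wait()
```

Once the stream lands in the Iceberg table, operational consumers keep reading the Kafka topic while analytical engines query the same table, so no Reverse ETL is needed.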

The Result: An Open, Cloud-Native, Future-Proof Data Historian for Industrial IoT

This open, hybrid OT/IT architecture allows organizations to maintain real-time industrial automation and monitoring with Kafka and Flink, while ensuring structured, queryable, and analytics-ready data with Iceberg or Delta Lake. The shift-left approach ensures that data streams remain useful beyond their initial OT function, powering AI-driven automation, predictive maintenance, and business intelligence in near real-time rather than relying on outdated and inconsistent batch processes.

Open and Cloud Native Data Historian in Industrial IoT and Manufacturing with Data Streaming using Apache Kafka and Flink

By adopting this unified, streaming-first architecture to build an open and cloud-native data historian, organizations can:

  • Process data once and make it available for both real-time decisions and long-term analytics.
  • Reduce costs and complexity by eliminating unnecessary data duplication and movement.
  • Improve AI/ML effectiveness by feeding models with real-time, high-fidelity OT data.
  • Ensure compliance and historical traceability without compromising real-time performance.

This approach future-proofs industrial data infrastructures, allowing enterprises to seamlessly integrate IT and OT, while supporting cloud, edge, and hybrid environments for maximum scalability and resilience.

Key Benefits of Offloading OT Middleware to Data Streaming

  • Lower Costs – Reduce licensing fees and maintenance overhead.
  • Real-Time Insights – No more waiting for batch updates; analyze events as they happen.
  • One Unified Data Pipeline – Process data once and make it available for both OT and IT use cases.
  • Edge and Hybrid Cloud Flexibility – Run analytics at the edge, on-premise, or in the cloud.
  • Open Standards & Interoperability – Support MQTT, OPC UA, REST/HTTP, Kafka, and Flink, avoiding vendor lock-in.
  • Scalability & Reliability – Handle massive sensor and machine data streams continuously without performance degradation.

A Step-by-Step Approach: Offloading vs. Replacing OT Middleware with Data Streaming

Companies transitioning from legacy OT middleware can choose from several strategies, leveraging data streaming as an integration and migration platform:

  1. Hybrid Data Processing
  2. Lift-and-Shift
  3. Full OT Middleware Replacement

1. Hybrid Data Streaming: Process Once for OT and IT

Why?

Traditional OT architectures often duplicate data processing across multiple siloed systems, leading to higher costs, slower insights, and operational inefficiencies. Many enterprises still process data inside expensive legacy OT middleware, only to extract and reprocess it again for IT, analytics, and cloud applications.

A hybrid approach using Kafka and Flink enables organizations to offload processing from legacy middleware while ensuring real-time, scalable, and cost-efficient data streaming across OT, IT, cloud, and edge environments.

Offloading from OT Middleware like OSIsoft PI to Data Streaming with Kafka and Flink

How?

Connect to the existing OT middleware via:

  • A Kafka Connector (if available).
  • HTTP APIs, OPC UA, or MQTT for data extraction.
  • Custom integrations for proprietary OT protocols.
  • Lightweight edge processing to pre-filter data before ingestion.

Use Kafka for real-time ingestion, ensuring all OT data is available in a scalable, event-driven pipeline.

Process data once with Flink to:

  • Apply real-time transformations, aggregations, and filtering at scale.
  • Perform predictive analytics and anomaly detection before storing or forwarding data.
  • Enrich OT data with IT context (e.g., adding metadata from ERP or MES).

Distribute processed data to the right destinations (a code sketch follows this list), such as:

  • Time-series databases for historical analysis and monitoring.
  • Enterprise IT systems (ERP, MES, CMMS, BI tools) for decision-making.
  • Cloud analytics and AI platforms for advanced insights.
  • Edge and on-prem applications that need real-time operational intelligence.
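
A Flink statement set ties these steps together: the stream is read and processed once, then fanned out to several destinations. Connection strings, topics, and table names below are illustrative assumptions, and the Kafka and JDBC connector jars must be on the classpath:

```python
# "Process once, distribute everywhere" sketch with a Flink statement set.
# Topics, URLs, credentials, and table names are illustrative assumptions.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.execute_sql("""
    CREATE TABLE ot_raw (
        machine_id STRING,
        metric STRING,
        val DOUBLE,
        event_time TIMESTAMP(3)
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'ot.raw',
        'properties.bootstrap.servers' = 'localhost:9092',
        'scan.startup.mode' = 'latest-offset',
        'format' = 'json'
    )
""")

t_env.execute_sql("""
    CREATE TABLE timeseries_db (
        machine_id STRING, metric STRING, val DOUBLE, event_time TIMESTAMP(3)
    ) WITH (
        'connector' = 'jdbc',
        'url' = 'jdbc:postgresql://localhost:5432/historian',
        'table-name' = 'telemetry',
        'username' = 'historian',
        'password' = 'secret'
    )
""")

t_env.execute_sql("""
    CREATE TABLE it_events (
        machine_id STRING, metric STRING, val DOUBLE, event_time TIMESTAMP(3)
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'it.curated.telemetry',
        'properties.bootstrap.servers' = 'localhost:9092',
        'format' = 'json'
    )
""")

# One statement set = the stream is ingested and processed once, then fanned
# out to a time-series store and a curated topic for IT consumers.
stmt_set = t_env.create_statement_set()
stmt_set.add_insert_sql("INSERT INTO timeseries_db SELECT * FROM ot_raw")
stmt_set.add_insert_sql(
    "INSERT INTO it_events SELECT * FROM ot_raw WHERE metric = 'temperature'"
)
stmt_set.execute().wait()
```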

Result?

  • Eliminate redundant processing across OT and IT, reducing costs.
  • Real-time data availability for analytics, automation, and AI-driven decision-making.
  • Unified, event-driven architecture that integrates seamlessly with on-premise, edge, hybrid, and cloud environments.
  • Flexibility to migrate OT workloads over time, without disrupting current operations.

By offloading costly data processing from legacy OT middleware, enterprises can modernize their industrial data infrastructure while maintaining interoperability, efficiency, and scalability.

2. Lift-and-Shift: Reduce Costs While Keeping Existing OT Integrations

Why?

Many enterprises rely on legacy OT middleware like OSIsoft PI, proprietary SCADA systems, or industry-specific data hubs for storing and processing industrial data. However, these solutions come with high licensing costs, limited scalability, and an inflexible architecture.

A lift-and-shift approach provides an immediate cost reduction by offloading data ingestion and storage to Apache Kafka while keeping existing integrations intact. This allows organizations to modernize their infrastructure without disrupting current operations.

How?

Use the Strangler Fig design pattern as a gradual modernization approach where new systems incrementally replace legacy components, reducing risk and ensuring a seamless transition:

Strangler Fig Pattern to Integrate, Migrate, Replace

“The most important reason to consider a strangler fig application over a cut-over rewrite is reduced risk.” (Martin Fowler)

Replace expensive OT middleware for ingestion and storage:

  • Deploy Kafka as a scalable, real-time event backbone to collect and distribute data.
  • Offload sensor, PLC, and SCADA data from OSIsoft PI, legacy brokers, or proprietary middleware.
  • Maintain the connectivity with existing OT applications to prevent workflow disruption.

Streamline OT data processing:

  • Store and distribute data in Kafka instead of proprietary, high-cost middleware storage.
  • Leverage schema-based data governance to ensure compatibility across IT and OT systems (see the sketch after this list).
  • Reduce data duplication by ingesting once and distributing to all required systems.
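
On the governance point, here is a sketch of a schema-aware producer: the Avro schema is registered with Schema Registry so OT producers and IT consumers share one enforced contract. URLs, topic name, and schema are illustrative assumptions:

```python
# Schema-based governance sketch: Avro schema registered in Schema Registry.
# URLs, topic, and schema are illustrative assumptions.
# Requires: pip install "confluent-kafka[avro]"
from confluent_kafka import SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import StringSerializer

schema_str = """
{
  "type": "record",
  "name": "SensorReading",
  "fields": [
    {"name": "machine_id", "type": "string"},
    {"name": "temperature", "type": "double"},
    {"name": "ts", "type": "long"}
  ]
}
"""

registry = SchemaRegistryClient({"url": "http://localhost:8081"})
producer = SerializingProducer({
    "bootstrap.servers": "localhost:9092",
    "key.serializer": StringSerializer("utf_8"),
    "value.serializer": AvroSerializer(registry, schema_str),
})

# Incompatible schema changes are rejected by the registry, not discovered
# later by a broken downstream consumer.
producer.produce(
    "ot.sensor.readings",
    key="machine-42",
    value={"machine_id": "machine-42", "temperature": 71.3, "ts": 1710000000000},
)
producer.flush()
```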

Maintain existing IT and analytics integrations:

  • Keep connections to ERP, MES, and BI platforms via Kafka connectors.
  • Continue using existing dashboards and reports while transitioning to modern analytics platforms.
  • Avoid vendor lock-in and enable future migration to cloud or hybrid solutions.

Result?

  • Immediate cost savings by reducing reliance on expensive middleware storage and licensing fees.
  • No disruption to existing workflows, ensuring continued operational efficiency.
  • Scalable, future-ready architecture with the flexibility to expand to edge, cloud, or hybrid environments over time.
  • Real-time data streaming capabilities, paving the way for predictive analytics, AI-driven automation, and IoT-driven optimizations.

A lift-and-shift approach serves as a stepping stone toward full OT modernization, allowing enterprises to gradually transition to a fully event-driven, real-time architecture.

3. Full OT Middleware Replacement: Cloud-Native, Scalable, and Future-Proof

Why?

Legacy OT middleware systems were designed for on-premise, batch-based, and proprietary environments, making them expensive, inflexible, and difficult to scale. As industries embrace cloud-native architectures, edge computing, and real-time analytics, replacing traditional OT middleware with event-driven streaming platforms enables greater flexibility, cost efficiency, and real-time operational intelligence.

A full OT middleware replacement eliminates vendor lock-in, outdated integration methods, and high-maintenance costs while enabling scalable, event-driven data processing that works across edge, on-premise, and cloud environments.

How?

Use Kafka and Flink as the Core Data Streaming Platform

  • Kafka replaces legacy data brokers and middleware storage by handling high-throughput event ingestion and real-time data distribution.
  • Flink provides advanced real-time analytics, anomaly detection, and predictive maintenance capabilities.
  • Process OT and IT data in real-time, eliminating batch-based limitations.

Replace Proprietary Connectors with Lightweight, Open Standards

  • Deploy MQTT or OPC UA gateways to enable seamless communication with sensors, PLCs, SCADA, and industrial controllers (a gateway sketch follows this list).
  • Eliminate complex, costly middleware like OSIsoft PI with low-latency, open-source integration.
  • Leverage Apache PLC4X for industrial protocol connectivity, avoiding proprietary vendor constraints.
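
As a minimal example of such a gateway, the following sketch polls one OPC UA node with the open-source asyncua (opcua-asyncio) Python library and forwards values to Kafka. The endpoint URL and node id are assumptions; a production gateway would use OPC UA subscriptions, buffering, and reconnect logic instead of naive polling:

```python
# Minimal OPC UA polling gateway sketch. Endpoint URL, node id, and topic
# are illustrative assumptions.
# Requires: pip install asyncua confluent-kafka
import asyncio

from asyncua import Client
from confluent_kafka import Producer

async def main():
    producer = Producer({"bootstrap.servers": "localhost:9092"})
    async with Client("opc.tcp://localhost:4840") as opcua:
        node = opcua.get_node("ns=2;i=2")  # hypothetical temperature node
        while True:
            value = await node.read_value()
            producer.produce("ot.opcua.temperature", value=str(value))
            producer.poll(0)
            await asyncio.sleep(1)  # 1 Hz polling for illustration

asyncio.run(main())
```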

Adopt a Cloud-Native, Hybrid, or On-Premise Storage Strategy

  • Store time-series data in scalable, purpose-built databases like InfluxDB or TimescaleDB.
  • Enable real-time query capabilities for monitoring, analytics, and AI-driven automation.
  • Ensure data availability across on-premise infrastructure, hybrid cloud, and multi-cloud deployments.

Journey from Legacy OT Middleware to Hybrid Cloud

Modernize IT and Business Integrations

  • Enable seamless OT-to-IT integration with ERP, MES, BI, and AI/ML platforms.
  • Stream data directly into cloud-based analytics services, digital twins, and AI models.
  • Build real-time dashboards and event-driven applications for operators, engineers, and business stakeholders.

OT Middleware Integration, Offloading and Replacement with Data Streaming for IoT and IT/OT

Result?

  • Fully event-driven and cloud-native OT architecture that eliminates legacy bottlenecks.
  • Real-time data streaming and processing across all industrial environments.
  • Scalability for high-throughput workloads, supporting edge, hybrid, and multi-cloud use cases.
  • Lower operational costs and reduced maintenance overhead by replacing proprietary, heavyweight OT middleware.
  • Future-ready, open, and extensible architecture built on Kafka, Flink, and industry-standard protocols.

By fully replacing OT middleware, organizations gain real-time visibility, predictive analytics, and scalable industrial automation, unlocking new business value while ensuring seamless IT/OT integration.

Helin is an excellent example of a cloud-native IT/OT data solution powered by Kafka and Flink, focusing on real-time data integration and analytics in industrial and operational environments. Its industry focus is the maritime and energy sectors, but the approach is relevant across all IIoT industries.

Why This Matters: The Future of OT is Real-Time & Open for Data Sharing

The next generation of OT architectures is being built on open standards, real-time streaming, and hybrid cloud.

  • Most new industrial sensors, machines, and control systems are now designed with Kafka, MQTT, and OPC UA compatibility.
  • Modern IT architectures demand event-driven data pipelines for AI, analytics, and automation.
  • Edge and hybrid computing require scalable, fault-tolerant, real-time processing.

Industrial IoT Data Streaming Everywhere Edge Hybrid Cloud with Apache Kafka and Flink

Use Kafka Cluster Linking for seamless bi-directional data replication and command and control, ensuring low-latency, high-availability data synchronization across on-premise, edge, and cloud environments.

Enable multi-region and hybrid edge-to-cloud architectures with real-time data mirroring, allowing organizations to maintain data consistency across global deployments while ensuring business continuity and failover capabilities.

It’s Time to Move Beyond Legacy OT Middleware to Open Standards like MQTT, OPC-UA, Kafka

The days of expensive, proprietary, and rigid OT middleware are numbered (at least for new deployments). Industrial enterprises need real-time, scalable, and open architectures to meet the growing demands of automation, predictive maintenance, and industrial IoT. By embracing open IoT and data streaming technologies, companies can seamlessly bridge the gap between Operational Technology (OT) and IT, ensuring efficient, event-driven communication across industrial systems.

MQTT, OPC-UA, and Apache Kafka are a match made in heaven for industrial IoT:

  • MQTT enables lightweight, publish-subscribe messaging for industrial sensors and edge devices.
  • OPC-UA provides secure, interoperable communication between industrial control systems and modern applications.
  • Kafka acts as the high-performance event backbone, allowing data from OT systems to be streamed, processed, and analyzed in real time.

Whether lifting and shifting, optimizing hybrid processing, or fully replacing legacy middleware, data streaming is the foundation for the next generation of OT and IT integration. With Kafka at the core, enterprises can decouple systems, enhance scalability, and unlock real-time analytics across the entire industrial landscape.

Stay ahead of the curve! Subscribe to my newsletter for insights into data streaming and connect with me on LinkedIn to continue the conversation. And make sure to download my free book about data streaming use cases and industry success stories.

Industrial IoT Middleware for Edge and Cloud OT/IT Bridge powered by Apache Kafka and Flink
https://www.kai-waehner.de/blog/2024/09/20/industrial-iot-middleware-for-edge-and-cloud-ot-it-bridge-powered-by-apache-kafka-and-flink/
Fri, 20 Sep 2024 06:48:31 +0000

As industries continue to adopt digital transformation, the convergence of Operational Technology (OT) and Information Technology (IT) has become essential. The OT/IT Bridge is a key concept in industrial automation to connect real-time operational processes with business-oriented IT systems, ensuring seamless data flow and coordination. This integration plays a critical role in the Industrial Internet of Things (IIoT). It enables industries to monitor, control, and optimize their operations through real-time data synchronization and to improve the Overall Equipment Effectiveness (OEE). By leveraging IIoT middleware and data streaming technologies like Apache Kafka and Flink, businesses can achieve a unified approach to managing both production processes and higher-level business operations to drive greater efficiency, predictive maintenance, and streamlined decision-making.

Industrial IoT Middleware OT IT Bridge between Edge and Cloud with Apache Kafka and Flink

Industrial Automation – The OT/IT Bridge

An OT/IT Bridge in industrial automation refers to the integration between Operational Technology (OT) systems, which manage real-time industrial processes, and Information Technology (IT) systems, which handle data, business operations, and analytics. This bridge is crucial for modern Industrial IoT (IIoT) environments, as it enables seamless data flow between machines, sensors, and industrial control systems (PLC, SCADA) on the OT side, and business management applications (ERP, MES) on the IT side.

The OT/IT Bridge facilitates real-time data synchronization. It allows industries to monitor and control their operations more efficiently, implement condition monitoring/predictive maintenance, and perform advanced analytics. The OT/IT bridge helps overcome the traditional siloing of OT and IT systems by integrating real-time data from production environments with business decision-making tools. Data Streaming frameworks like Kafka and Flink, often combined with specialized platforms for the last-mile IoT integration, act as intermediaries to ensure data consistency, interoperability, and secure communication across both domains.

This bridge enhances overall productivity and improves the OEE by providing actionable insights that help optimize performance and reduce downtime across industrial processes.

OT/IT Hierarchy – Different Layers based on ISA-95 and the Purdue Model

The OT/IT Levels 0-5 framework is commonly used to describe the different layers in industrial automation and control systems, often following the ISA-95 or Purdue model:

  • Level 0: Physical Process: This is the most basic level, consisting of the physical machinery, equipment, sensors, actuators, and production processes. It represents the actual processes being monitored or controlled in a factory or industrial environment.
  • Level 1: Sensing and Actuation: At this level, sensors, actuators, and field devices gather data from the physical processes. This includes things like temperature sensors, pressure gauges, motors, and valves that interact directly with the equipment at Level 0.
  • Level 2: Control Systems: Level 2 includes real-time control systems such as Programmable Logic Controllers (PLCs) and Distributed Control Systems (DCS). These systems interpret the data from Level 1 and make real-time decisions to control the physical processes.
  • Level 3: Manufacturing Operations Management (MOM): This level manages and monitors production workflows. It includes systems like Manufacturing Execution Systems (MES), which ensure that production runs smoothly and aligns with the business’s operational goals. It bridges the gap between the physical operations and higher-level business planning.
  • Level 4: Business Planning and Logistics: This is the IT layer that includes systems for business management, enterprise resource planning (ERP), and supply chain management (SCM). These systems handle business logistics such as order processing, materials procurement, and long-term planning.
  • Level 5: Enterprise Integration: This level encompasses corporate-wide IT functions such as financial systems, HR, sales, and overall business strategy. It ensures the alignment of all operations with the broader business goals.

In summary, Levels 0-2 focus on the OT (Operational Technology) side—real-time control and monitoring of industrial processes, while Levels 3-5 focus on the IT (Information Technology) side—managing data, logistics, and business operations.

While the modern, cloud-native IIoT world is not strictly hierarchical anymore (e.g., there is also a lot of edge computing, such as sensor analytics), these layers are still often used to separate functions and responsibilities. Industrial IoT data platforms, including the data streaming platform, often connect to several of these layers in a decoupled hub-and-spoke architecture.

Industrial IoT Middleware

Industrial IoT (IIoT) Middleware is a specialized software infrastructure designed to manage and facilitate the flow of data between connected industrial devices and enterprise systems. It acts as a mediator that connects various industrial assets, such as machines, sensors, and controllers, with IT applications and services such as MES or ERP, often in a cloud or on-premises environment.

This middleware provides a unified interface for managing the complexities of data integration, protocol translation, and device communication to enable seamless interoperability among heterogeneous systems. It often includes features like real-time data processing, event management, scalability to handle large volumes of data, and robust security mechanisms to protect sensitive industrial operations.

In essence, IIoT Middleware is critical for enabling the smart factory concept, where connected devices and systems can communicate effectively, allowing for automated decision-making, predictive maintenance, and optimized production processes in industrial settings.

By providing these services, IIoT Middleware enables industrial organizations to optimize operations, enhance Overall Equipment Effectiveness (OEE), and improve system efficiency through seamless integration and real-time data analytics.

Relevant Industries for IIoT Middleware

Industrial IoT Middleware is essential across various industries that rely on connected equipment, sensors or vehicles and data-driven processes to optimize operations. Some key industries where IIoT Middleware is particularly needed include:

  • Manufacturing: For smart factories, IIoT Middleware enables real-time monitoring of production lines, predictive maintenance, and automation of manufacturing processes. It supports Industry 4.0 initiatives by integrating machines, robotics, and enterprise systems.
  • Energy and Utilities: IIoT Middleware is used to manage data from smart grids, power plants, and renewable energy sources. It helps in optimizing energy distribution, monitoring infrastructure health, and improving operational efficiency.
  • Oil and Gas: In this industry, IIoT Middleware facilitates the remote monitoring of pipelines, drilling rigs, and refineries. It enables predictive maintenance, safety monitoring, and optimization of extraction and refining processes.
  • Transportation and Logistics: IIoT Middleware is critical for managing fleet operations, tracking shipments, and monitoring transportation infrastructure. It supports real-time data analysis for route optimization, fuel efficiency, and supply chain management.
  • Healthcare: In healthcare, IIoT Middleware connects medical devices, patient monitoring systems, and healthcare IT systems. It enables real-time monitoring of patient vitals, predictive diagnostics, and efficient management of medical equipment.
  • Agriculture: IIoT Middleware is used in precision agriculture to connect sensors, drones, and farm equipment. It helps in monitoring soil conditions, weather patterns, and crop health, leading to optimized farming practices and resource management.
  • Aerospace and Defense: IIoT Middleware supports the monitoring and maintenance of aircraft, drones, and defense systems. It ensures the reliability and safety of critical operations by integrating real-time data from various sources.
  • Automotive: In the automotive industry, IIoT Middleware connects smart vehicles, assembly lines, and supply chains. It enables connected car services, autonomous driving, and the optimization of manufacturing processes.
  • Building Management: For smart buildings and infrastructure, IIoT Middleware integrates systems like HVAC, lighting, and security. It enables real-time monitoring and control, energy efficiency, and enhanced occupant comfort.
  • Pharmaceuticals: In pharmaceuticals, IIoT Middleware helps monitor production processes, maintain regulatory compliance, and ensure the integrity of the supply chain.

These industries benefit from IIoT Middleware by gaining better visibility into their operations. The digitalization of shop floor and business processes improves decision-making and drives efficiency through automation and real-time data analysis.

Industrial IoT Middleware Layers in OT/IT

While modern, cloud-native IoT architectures don’t always use a hierarchical model anymore, Industrial IoT (IIoT) middleware typically operates at Level 3 (Manufacturing Operations Management) and Level 2 (Control Systems) in the OT/IT hierarchy.

At Level 3, IIoT middleware integrates data from control systems, sensors, and other devices, coordinating operations, and connecting these systems to higher-level IT layers such as MES and ERP systems. At Level 2, the middleware handles real-time data exchange between industrial control systems (like PLCs) and IT infrastructure, ensuring data flow and interoperability between the OT and IT layers.

This middleware acts as a bridge between the operational technology (OT) at Levels 0-2 and the business-oriented IT systems at Levels 4-5.

Edge and Cloud Vendors for Industrial IoT

The industrial IoT space provides many solutions from various software vendors. Let’s explore the different options and their trade-offs.

Traditional “Legacy” Solutions

Traditional Industrial IoT (IIoT) solutions are often characterized by proprietary, monolithic architectures that can be inflexible and expensive to implement and maintain. These traditional platforms, offered by established industrial vendors like PTC ThingWorx, Siemens MindSphere, GE Predix, and OSIsoft PI, are typically designed to meet specific industry needs but may lack the scalability, flexibility, and cost-efficiency required for modern industrial applications. However, while these solutions are often called “legacy”, they do a solid job integrating with proprietary PLCs, SCADA systems, and data historians. They still operate the shop floor in most factories worldwide.

Emerging Cloud Solutions

In contrast to legacy systems, emerging cloud-based IIoT solutions offer elastic, scalable, and (hopefully) cost-efficient alternatives that are fully managed by cloud service providers. These platforms, such as AWS IoT Core, enable industrial organizations to quickly deploy and scale IoT applications while benefiting from the cloud’s inherent flexibility, reduced operational overhead, and integration with other cloud services.

However, emerging cloud solutions for IIoT can face challenges:

  • Latency and real-time processing limitations, making them less suitable for time-sensitive industrial applications.
  • High network transfer cost from the edge to the cloud.
  • Security and compliance concerns arise when transferring sensitive operational data to the cloud, particularly in regulated industries.
  • Depending on reliable internet connectivity, which can be a significant drawback in remote or unstable environments.
  • Very limited connectivity to proprietary (legacy) protocols such as Siemens S7 or Modbus.

The IIoT Enterprise Architecture is a Mix of Vendors and Platforms

There is no black-and-white answer when comparing the different solutions. The current IIoT landscape in real-world deployments features a mix of traditional industrial vendors and new cloud-native solutions. Companies like Schneider Electric (with EcoStruxure) still provide robust industrial platforms, while newer entrants like AWS IoT Core are gaining traction due to their modern, cloud-centric approaches. The shift towards cloud solutions reflects the growing demand for more agile and scalable IIoT infrastructures.

The reality in the industrial space is that:

  • OT/IT is usually hybrid edge to cloud, not just cloud
  • Most cloud-only solutions do not provide the right security, SLAs, latency, cost
  • IoT is a complex space. “Just” an OPC-UA or MQTT connector is not sufficient in most scenarios.

Data streaming with Apache Kafka and Flink is a powerful approach that enables the continuous flow and processing of real-time data across various systems. However, to be clear: Data streaming is NOT a silver bullet. It is complementary to other IoT middleware. And some modern, cloud-native industrial software is built on top of data streaming technologies like Kafka and Flink under the hood.

In the context of Industrial IoT, data streaming plays a crucial role by seamlessly integrating and processing data from numerous IoT devices, equipment, PLCs, MES and ERP in real-time. This capability enhances decision-making processes and operational efficiency by providing continuous insights, allowing industries to optimize their operations and respond proactively to changing conditions. The last-mile integration is usually done by complementary IIoT technologies providing sophisticated connectivity to OPC-UA, MQTT and proprietary legacy protocols like S7 or Modbus.

In data center and cloud settings, Kafka and Flink are used to provide continuous processing and data consistency across IT applications including sales and marketing, B2B communication with partners, and eCommerce. Data streaming facilitates data integration, processing and analytics to enhance the efficiency and responsiveness of IT operations and business; no matter if data sources or sinks are real-time, batch or request-response APIs.

Apache Kafka as an OT/IT Bridge

Kafka serves as a critical bridge between Operational Technology (OT) and Information Technology (IT) by enabling real-time data synchronization at scale. This integration ensures data consistency across different systems, supporting seamless communication and coordination between industrial operations and business systems.

At the edge of operational technology, Kafka and Flink provide a robust backbone for use cases such as condition monitoring and predictive maintenance. By processing data locally and in real-time, these technologies improve the Overall Equipment Effectiveness (OEE), and support advanced analytics and decision-making directly within industrial environments.
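
As an illustration of such edge analytics, here is a PyFlink sketch that computes per-machine vibration statistics over one-minute tumbling windows, a typical building block for condition monitoring. Topics, schema, and window size are assumptions, and the Kafka SQL connector jar is required:

```python
# Condition-monitoring sketch: per-machine vibration statistics over
# one-minute tumbling windows. All names and sizes are illustrative.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.execute_sql("""
    CREATE TABLE telemetry (
        machine_id STRING,
        vibration DOUBLE,
        event_time TIMESTAMP(3),
        WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'machine.telemetry',
        'properties.bootstrap.servers' = 'localhost:9092',
        'scan.startup.mode' = 'latest-offset',
        'format' = 'json'
    )
""")

t_env.execute_sql("""
    CREATE TABLE machine_health (
        machine_id STRING,
        window_start TIMESTAMP(3),
        avg_vibration DOUBLE,
        max_vibration DOUBLE
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'machine.health',
        'properties.bootstrap.servers' = 'localhost:9092',
        'format' = 'json'
    )
""")

# Windowed aggregates feed dashboards and alerting for predictive maintenance.
t_env.execute_sql("""
    INSERT INTO machine_health
    SELECT machine_id, window_start, AVG(vibration), MAX(vibration)
    FROM TABLE(
        TUMBLE(TABLE telemetry, DESCRIPTOR(event_time), INTERVAL '1' MINUTE))
    GROUP BY machine_id, window_start, window_end
""").wait()
```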

IoT Success Story: Industrial Edge Intelligence with Helin and Confluent

Helin is a company that specializes in advanced data solutions focusing on real-time data integration and analytics, particularly in industrial and operational environments. Its industry focus is the maritime and energy sectors, but the approach is relevant across all IIoT industries.

Helin presented its Industrial Edge Intelligence Platform at Confluent’s Data in Motion Tour in Utrecht, Netherlands, in 2024. The IIoT platform includes capabilities for data streaming, processing, and visualization to help organizations leverage their data more effectively for decision-making and operational improvements.

Helin - Industrial IoT Edge Intelligence Platform
Source: Helin

Helin’s platform bridges the OT and IT worlds by seamlessly integrating industrial edge analytics with multi-tenant cloud solutions:

Helin - Edge to Cloud IIoT Architecture
Source: Helin

The above architecture diagram shows how Helin maps to the OT/IT hierarchy:

  • OT – Levels 0, 1, 2, 3
    • 1: Sensors, Actuators, Field Devices
    • 2: Remote I/O
    • 3: Controller
  • DMZ / Gateway – Level 3.5
  • BIZ (= IT) – Levels 4, 5
    • 4: OT Applications (MES, SCADA, etc.)
    • 5: (outside of Helin) IT Applications (ERP, CRM, DWH, etc.)

The strategy and value of Helin’s IoT platform is relevant for most industrial organizations: making dumb assets smart by extracting data in real time and utilizing AI to transform it into significant business value and actionable insights for the maritime and energy sectors.

Business Value: Fuel Reduction, Increased Revenue, Saving Human Lives

Helin presented three success stories with huge business value:

  • 8% fuel reduction: Helin’s platform reduced fuel consumption for Boskalis by 8% by delivering real-time insights to vessel operators offshore.
  • 20% revenue increase: Sunrock increased the revenue of its solar parks by 20% by optimizing the assets through the platform.
  • Saving human lives: Optimization of drilling operations while increasing the safety of the crew on oil rigs by reducing human errors.

Why does the Helin IoT Platform use Kafka? Helin brought up a few powerful arguments:

  • Flexibility towards the integration between the edge and the cloud
  • Different data streams at different velocities
    • Slow cold-storage data
    • Real-time streams for analytics
    • Database endpoint for visualization
  • Multi-cloud with a standardized streaming protocol
    • Reduced code overhead by not having to build adapters
    • Open platform so that customers can land their data anywhere
    • Failover baked in

Helin’s Data Streaming Journey from Self-Managed Kafka to Serverless Confluent Cloud

Helin started with self-managed Kafka and cumbersome Python scripts…

Self-Managed Apache Kafka
Source: Helin

… and transitioned to fully managed Kafka in Confluent Cloud:

Fully Managed Apache Kafka and Flink Confluent Cloud
Source: Helin

As a next step, Helin is migrating from cumbersome and unreliable Python mappings to Apache Flink for scalable and reliable data processing.

Please note that the last-mile IoT connectivity at the edge (SCADA, PLC, etc.) is implemented with technologies like OPC-UA, MQTT or custom integrations. You can see a common best practice: Choose and combine the right tools for the job.

Data streaming plays a crucial role in bridging OT and IT in industrial automation. By enabling continuous data flow between the edge and the cloud, Kafka and Flink ensure that both operational data from sensors and machinery, and IT applications like ERP and MES, remain synchronized in real-time. Additionally, data consistency with non-real-time systems like a legacy batch system or a cloud-native data lakehouse is guaranteed out-of-the-box.

The real-time integration powered by Kafka and Flink improves the Overall Equipment Effectiveness (OEE) and enables specific use cases such as predictive maintenance and condition monitoring. As industries increasingly adopt edge computing alongside cloud solutions, these data streaming tools provide the scalability, flexibility, and low-latency performance needed to drive Industrial IoT initiatives forward.

Helin’s Industrial Edge Intelligence platform is an excellent example of IIoT middleware. It leverages Apache Kafka and Flink to integrate real-time data from industrial assets, enabling predictive analytics and operational optimization. By using this platform, companies like Boskalis achieved 8% fuel savings, and Sunrock increased revenue by 20%. These real-world scenarios demonstrate the platform’s ability to drive significant business value through real-time insights and decision-making in industrial projects.

What does your OT/IT integration look like today? Do you plan to optimize the infrastructure with data streaming? What does the hybrid architecture look like? What are the use cases? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

Apache Kafka Cluster Type Deployment Strategies
https://www.kai-waehner.de/blog/2024/07/29/apache-kafka-cluster-type-deployment-strategies/
Mon, 29 Jul 2024 06:34:49 +0000

Organizations start their data streaming adoption with a single Apache Kafka cluster to deploy the first use cases. The need for group-wide data governance and security but different SLAs, latency, and infrastructure requirements introduce new Kafka clusters. Multiple Kafka clusters are the norm, not an exception. Use cases include hybrid integration, aggregation, migration, and disaster recovery. This blog post explores real-world success stories and cluster strategies for different Kafka deployments across industries.

One Apache Kafka Cluster Type Does NOT Fit All Use Cases

Apache Kafka – The De Facto Standard for Event-Driven Architectures and Data Streaming

Apache Kafka is an open-source, distributed event streaming platform designed for high-throughput, low-latency data processing. It allows you to publish, subscribe to, store, and process streams of records in real time.

Event-driven Architecture for Data Streaming with Apache Kafka and Flink

Kafka serves as a popular choice for building real-time data pipelines and streaming applications. The Kafka protocol became the de facto standard for event streaming across various frameworks, solutions, and cloud services. It supports operational and analytical workloads with features like persistent storage, scalability, and fault tolerance. Kafka includes components like Kafka Connect for integration and Kafka Streams for stream processing, making it a versatile tool for various data-driven use cases.

While Kafka is famous for real-time use cases, many projects leverage the data streaming platform for data consistency across the entire enterprise architecture, including databases, data lakes, legacy systems, Open APIs, and cloud-native applications.

Different Apache Kafka Cluster Types

Kafka is a distributed system. A production setup usually requires at least four brokers. Hence, most people automatically assume that all you need is a single distributed cluster you scale up when you add throughput and use cases. This is not wrong in the beginning. But…

One Kafka cluster is NOT the right answer for every use case. Various characteristics influence the architecture of a Kafka cluster:

  • Availability: Zero downtime? 99.99% uptime SLA? Non-critical analytics?
  • Latency: End-to-end processing in <100ms (including processing)? 10-minute end-to-end data warehouse pipeline? Time travel for re-processing historical events?
  • Cost: Value vs. cost? Total Cost of Ownership (TCO) matters! For instance, in the public cloud, networking can be up to 80% of the total Kafka cost!
  • Security and Data Privacy: Data privacy (PCI data, GDPR, etc.)? Data governance and compliance? End-to-end encryption on the attribute level? Bring your own key? Public access and data sharing? Air-gapped edge environment?
  • Throughput and Data Size: Critical transactions (typically low volume)? Big data feeds (clickstream, IoT sensors, security logs, etc.)?

Related topics like on-premise vs. public cloud, regional vs. global, and many other requirements also affect the Kafka architecture.

Apache Kafka Cluster Strategies and Architectures

A single Kafka cluster is often the right starting point for your data streaming journey. It can onboard multiple use cases from different business domains and process gigabytes per second (if operated and scaled the right way). However, depending on your project requirements, you need an enterprise architecture with multiple Kafka clusters. Here are a few common examples:

Apache Kafka Cluster Deployment and Replication Strategies

Bridging Hybrid Kafka Clusters

These options can be combined. For instance, a single broker at the edge typically replicates some curated data to a remote data center. And hybrid clusters have different architectures depending on how they are bridged: connections over public internet, private link, VPC peering, transit gateway, and so on.

Having seen the development of Confluent Cloud over the years, I totally underestimated how much engineering time needs to be spent on security and connectivity. However, missing security bridges are the main blocker for the adoption of a Kafka cloud service. So, there is no way around providing various security bridges between Kafka clusters beyond just public internet.

There are even use cases where organizations need to replicate data from the data center to the cloud, but the cloud service is NOT allowed to initiate the connection. Confluent built a specific feature, “source-initiated link”, for such security requirements, where the source (i.e., the on-premise Kafka cluster) always initiates the connection, even though the cloud Kafka cluster is consuming the data:

Source-Initiated Kafka Cluster Link for Kafka Cluster for Security and Compliance
Source: Confluent

As you see, it gets complex quickly. Find the right experts to help you from the beginning, not after you have already deployed the first clusters and applications.

A long time ago, I described the architecture patterns for distributed, hybrid, edge, and global Apache Kafka deployments in a detailed presentation. Look at that slide deck and video recording for more details about the deployment options and trade-offs.

RPO vs. RTO = Data Loss vs. Downtime

RPO and RTO are two critical KPIs you need to discuss before deciding on a Kafka cluster strategy:

  • RPO (Recovery Point Objective) is the maximum acceptable amount of data loss measured in time, indicating how frequently backups should occur to minimize data loss.
  • RTO (Recovery Time Objective) is the maximum acceptable duration of time it takes to restore business operations after a disruption. Together, they help organizations plan their data backup and disaster recovery strategies to balance cost and operational impact.

While people often start with the goal of RPO = 0 and RTO = 0, they quickly realize how hard (but not impossible) it is to achieve. You need to decide how much data you are okay losing in a disaster, and you need a disaster recovery plan if disaster strikes. The legal and compliance teams will have to tell you if it is okay to lose a few data sets in case of disaster or not. These and many other challenges need to be discussed when evaluating your Kafka cluster strategy.

The replication between Kafka clusters with tools like MirrorMaker or Cluster Linking is asynchronous, with RPO > 0. Only a stretched Kafka cluster provides RPO = 0.

Stretched Kafka Cluster – Zero Data Loss with Synchronous Replication across Data Centers

Most deployments with multiple Kafka clusters use asynchronous replication across data centers or clouds via tools like MirrorMaker or Confluent Cluster Linking. This is good enough for most use cases. But in case of a disaster, you lose a few messages. The RPO is > 0.

A stretched Kafka cluster deploys Kafka brokers of ONE SINGLE CLUSTER across three data centers. The replication is synchronous (as this is how Kafka replicates data within one cluster) and guarantees zero data loss (RPO = 0) – even in the case of a disaster!
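
On the configuration side, RPO = 0 ultimately comes down to well-known Kafka durability settings: producers with acks=all and topics whose min.insync.replicas forces acknowledgement from replicas in more than one site. A sketch with illustrative values:

```python
# Durability settings behind synchronous replication in a stretched cluster.
# Topic name, partition count, and broker address are illustrative.
# Requires: pip install confluent-kafka
from confluent_kafka import Producer
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})
futures = admin.create_topics([
    NewTopic(
        "payments",
        num_partitions=6,
        replication_factor=3,  # e.g., one replica per data center in a 3-DC stretch
        config={"min.insync.replicas": "2"},  # a write must reach two sites
    )
])
for topic, future in futures.items():
    future.result()  # raises if topic creation failed

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "acks": "all",               # wait for all in-sync replicas
    "enable.idempotence": True,  # no duplicates on retries
})
producer.produce("payments", key="order-1", value=b"payment event payload")
producer.flush()
```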

Why shouldn’t you always do stretched clusters?

  • Low latency (<~50ms) and stable connection required between data centers
  • Three (!) data centers are needed, two is not enough as a majority (quorum) must acknowledge writes and reads to ensure the system’s reliability
  • Hard to set up, operate, and monitor – much harder than a cluster running in one data center
  • Cost vs. value is not worth it in many use cases – during a real disaster, most organizations and use cases have bigger problems than losing a few messages (even if it is critical data like a payment or order).

To be clear: In the public cloud, a region usually has three data centers (= availability zones). Hence, in the cloud, it depends on your SLAs if one cloud region counts as a stretched cluster or not. Most SaaS Kafka offerings deploy in a stretched cluster here. However, many compliance scenarios do NOT see a Kafka cluster in one cloud region as good enough for guaranteeing SLAs and business continuity if a disaster strikes.

Confluent built a dedicated product to solve (some of) these challenges: Multi-Region Clusters (MRC). It provides capabilities to do synchronous and asynchronous replication within a stretched Kafka cluster.

Multi-Region Stretched Kafka Cluster in FinServ (MRC)

For example, in a financial services scenario, MRC replicates low-volume critical transactions synchronously, but high-volume logs asynchronously:

  • ‘Payment’ transactions enter from us-east and us-west with fully synchronous replication
  • ‘Log’ and ‘Location’ information in the same cluster uses asynchronous replication, optimized for latency
  • Automated disaster recovery (zero downtime, zero data loss)

More details about stretched Kafka clusters vs. active-active / active-passive replication between two Kafka clusters in my global Kafka presentation.

Pricing of Kafka Cloud Offerings (vs. Self-Managed)

The above sections explain why you need to consider different Kafka architectures depending on your project requirements. Self-managed Kafka clusters can be configured the way you need. In the public cloud, fully managed offerings look different (the same way as any other fully managed SaaS). Pricing is different because SaaS vendors need to configure reasonable limits. The vendor has to provide specific SLAs.

The data streaming landscape includes various Kafka cloud offerings. Here is an example of Confluent’s current cloud offerings, including multi-tenant and dedicated environments with different SLAs, security features, and cost models.

Confluent Cloud Cluster Types SLA and Pricing
Source: Confluent

Make sure to evaluate and understand the various cluster types from different vendors available in the public cloud, including TCO, provided uptime SLAs, replication costs across regions or cloud providers, and so on. The gaps and limitations are often intentionally hidden in the details.

For instance, if you use Amazon Managed Streaming for Apache Kafka (MSK), you should be aware that the terms and conditions tell you that “The service commitment does not apply to any unavailability, suspension or termination … caused by the underlying Apache Kafka or Apache Zookeeper engine software that leads to request failures”.

But pricing and support SLAs are just one critical piece of such a comparison. There are lots of “build vs. buy” decisions you have to make as part of evaluating a data streaming platform, as I pointed out in my detailed article comparing Confluent to Amazon MSK Serverless.

Kafka Storage – Tiered Storage and Iceberg Table Format to Store Data Only Once

Apache Kafka added Tiered Storage to separate compute and storage. The capability enables more scalable, reliable, and cost-efficient enterprise architectures. Tiered Storage for Kafka enables a new Kafka cluster type: Storing Petabytes of data in the Kafka commit log in a cost-efficient way (like in your data lake) with timestamps and guaranteed ordering to travel back in time for re-processing historical data. KOR Financial is a nice example of using Apache Kafka as a database for long-term persistence.

Kafka enables a Shift Left Architecture to store data only once for operational and analytical datasets:

Shift Left Architecture with Apache Kafka, Flink, and Iceberg

With this in mind, think again about the use cases I described above for multiple Kafka clusters. Should you still replicate data in batch at rest in the database, data lake, or lakehouse from one data center or cloud region to another? No. You should synchronize data in real-time, store the data once (usually in an object store like Amazon S3), and then connect all analytical engines like Snowflake, Databricks, Amazon Athena, Google Cloud BigQuery, and so on to this standard table format.
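
For the analytical side, here is a sketch of reading such a shared Iceberg table with the open-source PyIceberg library; the catalog configuration and table name are assumptions that depend on the actual setup:

```python
# Reading the shared Iceberg table from the analytics side (illustrative;
# catalog configuration and table name are assumptions).
# Requires: pip install "pyiceberg[pyarrow]"
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "lakehouse",
    **{"type": "glue"},  # or a REST/Hive catalog, depending on the setup
)
table = catalog.load_table("iot.telemetry_history")

# Engines like Snowflake, Databricks, or Athena attach to the same table;
# here we scan it locally into pandas for ad-hoc analysis.
df = table.scan(row_filter="temperature > 90").to_pandas()
print(df.head())
```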

Learn more about the unification of operational and analytical data in my article “Apache Iceberg – The Open Table Format for Lakehouse AND Data Streaming“.

Real-World Success Stories for Multiple Kafka Clusters

Most organizations have multiple Kafka clusters. This section explores four success stories across different industries:

  • Paypal (Financial Services) – US: Instant payments, fraud prevention.
  • JioCinema (Telco/Media) – APAC: Data integration, clickstream analytics, advertisement, personalization.
  • Audi (Automotive/Manufacturing) – EMEA: Connected cars with critical and analytical requirements.
  • New Relic (Software/Cloud) – US: Observability and application performance management (APM) across the world.

Paypal – Separation by Security Zone

PayPal is a digital payment platform that allows users to send and receive money online securely and conveniently around the world in real time. This requires a scalable, secure and compliant Kafka infrastructure.

During the 2022 Black Friday, Kafka traffic volume peaked at about 1.3 trillion messages daily! At present, PayPal has 85+ Kafka clusters, and every holiday season they flex up their Kafka infrastructure to handle the traffic surge. The Kafka platform continues to seamlessly scale to support this traffic growth without any impact on their business.

Today, PayPal’s Kafka fleet consists of over 1,500 brokers that host over 20,000 topics. The events are replicated among the clusters, offering 99.99% availability.

Kafka cluster deployments are separated into different security zones within a data center:

Paypal Multiple Kafka Cluster Deployments Separated by Security Zones in the Data Center
Source: Paypal

The Kafka clusters are deployed across these security zones, based on data classification and business requirements. Real-time replication with tools such as MirrorMaker (in this example, running on Kafka Connect infrastructure) or Confluent Cluster Linking (a simpler and less error-prone approach that uses the Kafka protocol directly for replication) mirrors the data across the data centers, which helps with disaster recovery and enables inter-security zone communication.
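
For illustration, a minimal MirrorMaker 2 configuration for one replication direction could look like the sketch below. The cluster aliases, bootstrap servers, and topic filter are hypothetical, and PayPal's actual setup is certainly more involved:

```properties
# mm2.properties - started with: connect-mirror-maker.sh mm2.properties
clusters = secure, analytics
secure.bootstrap.servers = kafka-secure:9092
analytics.bootstrap.servers = kafka-analytics:9092

# Enable one replication direction and select the topics to mirror
secure->analytics.enabled = true
secure->analytics.topics = payments.*

# Replication factors for mirrored and internal MM2 topics
replication.factor = 3
checkpoints.topic.replication.factor = 3
heartbeats.topic.replication.factor = 3
offset-syncs.topic.replication.factor = 3
```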

JioCinema – Separation by Use Case and SLA

JioCinema is a rapidly growing video streaming platform in India. The telco OTT service is known for its expansive content offerings, including live sports like the Indian Premier League (IPL) for cricket, a newly launched Anime Hub, and comprehensive plans for covering major events like the Paris 2024 Olympics.

The data architecture leverages Apache Kafka, Flink, and Spark for data processing, as presented at Kafka Summit India 2024 in Bangalore:

JioCinema Telco Cloud Enterprise Architecture with Apache Kafka Spark Flink
Source: JioCinema

Data streaming plays a pivotal role in various use cases to transform user experiences and content delivery. The platform processes over ten million messages per second to power analytics, user insights, and content delivery mechanisms.

JioCinema’s use cases include:
  • Inter Service Communication
  • Clickstream / Analytics
  • Ad Tracker
  • Machine Learning and Personalization

Kushal Khandelwal, Head of Data Platform, Analytics, and Consumption at JioCinema, explained that not all data is equal and the priorities and SLAs differ per use case:

JioCinema - Viacom18 - One Kafka Cluster does NOT fit All Use Cases Uptime SLAs and Cost
Source: JioCinema

Data streaming is a journey. Like so many other organizations worldwide, JioCinema started with one large Kafka cluster using 1000+ Kafka topics and 100,000+ Kafka partitions for various use cases. Over time, a separation of concerns regarding use cases and SLAs led to multiple Kafka clusters:

JioCinema Journey of Kafka Clusters from One to Many with different SLAs and Cost
Source: JioCinema

The success story of JioCinema shows the common evolution of a data streaming organization. Let’s now explore another example where two very different Kafka clusters were deployed from the beginning for one use case.

Audi – Operations vs. Analytics for Connected Cars

The car manufacturer Audi provides connected cars featuring advanced technology that integrates internet connectivity and intelligent systems. Audi’s cars enable real-time navigation, remote diagnostics, and enhanced in-car entertainment. These vehicles are equipped with Audi Connect services, with features including emergency calls, online traffic information, and integration with smart home devices to enhance convenience and safety for drivers.

Audi Data Collector for Mobility Services Built with Apache Kafka
Source: Audi

Audi presented their connected car architecture in the keynote of Kafka Summit in 2018. The Audi enterprise architecture relies on two Kafka clusters with very different SLAs and use cases.

Audi Connected Car Analytics Architecture with Kafka Spark Flink MQTT
Source: Audi

The Data Ingestion Kafka cluster is very critical. It needs to run 24/7 at scale. It provides last-mile connectivity to millions of cars using Kafka and MQTT. Backchannels from the IT side to the vehicle help with service communication and over-the-air updates (OTA).

ACDC Cloud is the analytics Kafka cluster of Audi’s connected car architecture. The cluster is the foundation of many analytical workloads. These process enormous volumes of IoT and log data at scale with batch processing frameworks, like Apache Spark.

This architecture was already presented in 2018. Audi’s slogan “Progress through Technology” shows how the company applied new technology for innovation long before most car manufacturers deployed similar scenarios. All sensor data from the connected cars is processed in real time and stored for historical analysis and reporting.

New Relic – Worldwide Multi-Cloud Observability

New Relic is a cloud-based observability platform that provides real-time performance monitoring and analytics for applications and infrastructure to customers around the world.

Andrew Hartnett, VP of Software Engineering at New Relic, explains how data streaming is crucial for the entire business model of New Relic:

“Kafka is our central nervous system. It is a part of everything that we do. Most services across 110 different engineering teams with hundreds of services touch Kafka in some way, shape, or form in our company, so it really is mission-critical. What we were looking for is the ability to grow, and Confluent Cloud provided that.”

New Relic ingested up to 7 billion data points per minute and was on track to ingest 2.5 exabytes of data in 2023. As New Relic expands its multi-cloud strategies, teams will use Confluent Cloud for a single pane of glass view across all environments.

“New Relic is multi-cloud. We want to be where our customers are. We want to be in those same environments, in those same regions, and we wanted to have our Kafka there with us,” says Hartnett in a Confluent case study.

Multiple Kafka Clusters are the Norm; Not an Exception!

Event-driven architectures and stream processing have existed for decades. The adoption grows with open source frameworks like Apache Kafka and Flink in combination with fully managed cloud services. More and more organizations struggle with operating Kafka at scale. Enterprise-wide data governance, a center of excellence, automation of deployment and operations, and enterprise architecture best practices help to successfully provide data streaming with multiple Kafka clusters for independent or collaborating business domains.

Multiple Kafka clusters are the norm, not an exception. Use cases such as hybrid integration, disaster recovery, migration or aggregation enable real-time data streaming everywhere with the needed SLAs.

What does your enterprise architecture look like? How many Kafka clusters do you have? And how do you decide about data governance, separation of concerns, multi-tenancy, security, and similar challenges in your data streaming organization? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Apache Kafka Cluster Type Deployment Strategies appeared first on Kai Waehner.

]]>
Energy Trading with Apache Kafka and Flink https://www.kai-waehner.de/blog/2024/06/28/energy-trading-with-apache-kafka-and-flink/ Fri, 28 Jun 2024 02:30:09 +0000 https://www.kai-waehner.de/?p=6491 Energy trading and data streaming are connected because real-time data helps traders make better decisions in the fast-moving energy markets. This data includes things like price changes, supply and demand, smart IoT meters and sensors, and weather, which help traders react quickly and plan effectively. As a result, data streaming with Apache Kafka and Apache Flink makes the market clearer, speeds up information sharing, and improves forecasting and risk management. This blog post explores the use cases and architectures for scalable and reliable real-time energy trading, including real-world deployments from Uniper, re.alto and Powerledger.

The post Energy Trading with Apache Kafka and Flink appeared first on Kai Waehner.

]]>
Energy trading and data streaming are connected because real-time data helps traders make better decisions in the fast-moving energy markets. This data includes things like price changes, supply and demand, smart IoT meters and sensors, and weather, which help traders react quickly and plan effectively. As a result, data streaming with Apache Kafka and Apache Flink makes the market clearer, speeds up information sharing, and improves forecasting and risk management. This blog post explores the use cases and architectures for scalable and reliable real-time energy trading, including real-world deployments from Uniper, re.alto and Powerledger.

Energy Trading with Apache Kafka and Flink at Uniper ReAlto Powerledger

What is Energy Trading?

Energy trading is the process of buying and selling energy commodities in order to manage risk, optimize costs, and ensure the efficient distribution of energy. Commodities traded include:

  • Electricity: Traded in wholesale markets to balance supply and demand.
  • Natural Gas: Bought and sold for heating, electricity generation, and industrial use.
  • Oil: Crude oil and refined products like gasoline and diesel are traded globally.
  • Renewable Energy Certificates (RECs): Represent proof that energy was generated from renewable sources.

Market Participants:

  • Producers: Companies that extract or generate energy.
  • Utilities: Entities that distribute energy to consumers.
  • Industrial Consumers: Large energy users that purchase energy directly.
  • Traders and Financial Institutions: Participants that buy and sell energy contracts for profit or risk management.

Objectives of Energy Trading

The objectives for energy trading are risk management (hedging against price volatility and supply disruptions), cost optimization (securing energy at the best possible prices) and revenue generation (profiting from price differences in different markets).

Market types include:

  • Spot Markets: Immediate delivery and payment of energy commodities.
  • Futures Markets: Contracts to buy or sell a commodity at a future date, helping manage price risks.
  • Over-the-Counter (OTC) Markets: Direct trades between parties, often customized contracts.
  • Exchanges: Platforms like the New York Mercantile Exchange (NYMEX) and Intercontinental Exchange (ICE) where standardized contracts are traded.

Energy trading is subject to extensive regulation to ensure fair practices, prevent market manipulation, and protect consumers.

Data streaming with Apache Kafka and Flink provides a unique combination of capabilities:

  • Real-time messaging at scale for analytical and transactional workloads.
  • Event store for durability, true decoupling and the ability to travel back in time for replayability of events with guaranteed ordering.
  • Data integration with any data source and sink (real-time, near real-time, batch, request response APIs, files, etc.)
  • Stream processing for stateless and stateful correlations of data for streaming ETL and business applications

Data Streaming Platform with Apache Kafka and Apache Flink for Energy Trading

Trading Architecture with Apache Kafka

Many trading markets use data streaming with Apache Kafka under the hood to integrate with internal systems, external exchanges and data providers, clearing houses and regulators:

Trading in Financial Services with Data Streaming using Apache Kafka
Source: Confluent

For instance, NASDAQ combines critical stock exchange trading with low-latency streaming analytics. Energy trading is not much different, even though the interfaces and challenges differ a bit because various IoT data sources are additionally involved.

Data streaming with Apache Kafka and Apache Flink is highly beneficial for energy trading for several reasons across the end-to-end business process and data pipelines.

Energy Trading Business Processing with Data in Motion

Here is why these technologies are often used in the energy sector:

Real-Time Data Processing

Real-Time Analytics: Energy trading relies on real-time data to make informed decisions. Kafka and Flink can process data streams in real-time, providing immediate insights into market conditions, energy consumption and production levels.

Immediate Response: Real-time processing allows traders to respond instantly to market changes, such as price fluctuations or sudden changes in supply and demand, optimizing trading strategies and mitigating risks.

Scalability and Performance

Scalability: Both Kafka and Flink handle high-throughput data streams. This scalability is crucial for energy markets, which generate vast amounts of data from multiple sources, including sensors, smart meters, and market feeds.

High Performance: Data streaming enables fast data processing and analysis. Kafka ensures low-latency data ingestion, while Flink provides efficient, distributed stream processing.

Fault Tolerance and Reliability

Fault Tolerance: Kafka’s distributed architecture ensures data durability and fault tolerance, essential for the continuous operation of energy trading systems.

Reliability: Flink offers exactly-once processing semantics, ensuring that each piece of data is processed accurately without loss or duplication, which is critical for maintaining data integrity in trading operations.

Integration and Flexibility

Integration Capabilities: Kafka can integrate with various data sources and sinks via Kafka Connect or client APIs for Java, C, C++, Python, JavaScript or REST/HTTP. This makes it versatile for collecting data from different energy systems. Flink can process this data in real-time and output the results to various storage systems or dashboards.
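
As an illustration of these integration capabilities, a Kafka Connect sink that writes trade events into a relational database could be configured roughly as follows (posted to the Kafka Connect REST API). The connector class requires the Confluent JDBC plugin to be installed, and all topic, table, and connection details are hypothetical:

```json
{
  "name": "energy-trades-jdbc-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "topics": "energy-trades",
    "connection.url": "jdbc:postgresql://db.internal:5432/trading",
    "connection.user": "etl",
    "connection.password": "********",
    "insert.mode": "upsert",
    "pk.mode": "record_key",
    "pk.fields": "trade_id",
    "auto.create": "true",
    "tasks.max": "2"
  }
}
```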

Flexible Data Processing: Flink supports complex event processing, windowed computations, and machine learning, allowing for advanced analytics and predictive modeling in energy trading.

Event-Driven Architecture (EDA)

Event-Driven Processing: Energy trading can benefit from an event-driven architecture where trading decisions and alerts are triggered by specific events, such as market price thresholds or changes in energy production. Kafka and Flink facilitate this approach by efficiently handling event streams.
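
To make this concrete, here is a minimal sketch of such an event-driven alert as a Kafka Streams application (the same logic could be expressed in Flink). The topic names, the EUR/MWh price encoding, and the 150.0 threshold are all hypothetical:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;

import java.time.Duration;
import java.util.Properties;

public class PriceSpikeAlerts {

  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "energy-price-alerts");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

    StreamsBuilder builder = new StreamsBuilder();

    // Spot prices keyed by market area, value = price in EUR/MWh
    KStream<String, Double> prices = builder.stream("spot-prices",
        Consumed.with(Serdes.String(), Serdes.Double()));

    prices
        .groupByKey()
        // Track the maximum price per market area in 5-minute tumbling windows
        .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
        .aggregate(() -> 0.0,
                   (area, price, max) -> Math.max(max, price),
                   Materialized.with(Serdes.String(), Serdes.Double()))
        .toStream()
        // Emit an alert event only if the windowed maximum crosses the threshold
        .filter((windowedArea, maxPrice) -> maxPrice > 150.0)
        .map((windowedArea, maxPrice) -> KeyValue.pair(windowedArea.key(), maxPrice))
        .to("price-alerts", Produced.with(Serdes.String(), Serdes.Double()));

    new KafkaStreams(builder.build(), props).start();
  }
}
```

Downstream consumers of the price-alerts topic can then trigger trading decisions or notifications without being coupled to the producing application.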

Energy Trading at Uniper

Uniper is a global energy company headquartered in Germany that focuses on power generation, energy trading, and energy storage solutions, providing electricity, natural gas, and other energy services to customers worldwide.

Uniper - The beating of energy
Source: Uniper

Uniper’s Business Value of Data Streaming

Why has Uniper chosen to use the Apache Kafka and Apache Flink ecosystem? If you look at the trade lifecycle in the energy sector, you probably can find out by yourself:

Energy Trading Lifecycle
Source: Uniper

The underlying process is much more complex than the above picture. For instance, pre-trading includes aspects like capacity management. If you trade energy between the Netherlands and Germany, the transportation of the energy needs to be planned while executing the trade. Uniper explains the process in much more detail in the webinar recording below.

Here are Uniper’s benefits of implementing the trade lifecycle with data streaming using Kafka and Flink, as they described them:

Business-driven:

  • Increase of trading volumes
  • More messages per day
  • Faster processing of data

Architecture-driven:

  • Decoupling of applications
  • Faster processing of data – batch vs. streaming data
  • Reusability of data

Uniper’s IT Landscape

Uniper’s enterprise architecture leverages data streaming as the central nervous system between various technical platforms (integrated via Kafka Connect or Apache Camel) and business applications (e.g., algorithmic trading, dispatch and invoicing systems).

Apache Kafka and Flink integrated into the Uniper IT landscape
Source: Uniper

Uniper runs mission-critical workloads through Kafka. Confluent Cloud provides the right scale, elasticity, and SLAs for such use cases. Apache Flink serves ETL use cases for continuous stream processing.

Kafka Connect provides many connectors for direct integration with (non)streaming interfaces. Apache Camel is used for some other protocols that do not fit well into a native Kafka connector. Camel is an integration framework with native Kafka integration.

Fun fact: I have a history with Apache Camel, too. I worked a lot with this open source framework as an independent consultant and at Talend with its Enterprise Service Bus (ESB) powered by Apache Camel. Hence, my blog has some articles about Apache Camel, too, including “When to use Apache Camel vs. Apache Kafka?”

The following on-demand webinar recording explores the relation between data streaming and energy trading in more detail. Uniper’s Alex Esseling (Platform & Security Architect, Sales & Trading IT) discusses Apache Kafka and Flink inside energy trading at Uniper:

Energy Trading Webinar with Confluent and Uniper
Source: Confluent

IoT Data for Energy Trading

Energy trading differs a bit from traditional trading on Nasdaq and similar financial markets as IoT data is an important additional data source for several key reasons:

1. Real-Time Market Insights

  • Live Data Feed: IoT devices, such as smart meters and sensors, provide real-time data on energy production, consumption, and grid status, enabling traders to make informed decisions based on the latest market conditions.
  • Demand Forecasting: Accurate demand forecasting relies on real-time consumption data, which IoT devices supply continuously, helping traders anticipate market movements and adjust their strategies accordingly.

2. Enhanced Decision Making

  • Predictive Analytics: IoT data allows for sophisticated predictive analytics, helping traders forecast price trends, identify potential supply disruptions, and optimize trading positions.
  • Risk Management: Continuous monitoring of energy infrastructure through IoT sensors helps in identifying and mitigating risks, such as equipment failures or grid imbalances, which could affect trading decisions.

A typical use case in energy trading might involve:

  1. Data Collection: IoT devices across the energy grid collect data on energy production from renewable sources, consumption patterns in residential and commercial areas, and grid stability metrics.
  2. Data Analysis: This data is streamed and processed in real-time using platforms like Apache Kafka and Flink, enabling immediate analysis and visualization.
  3. Trading Decisions: Traders use the insights derived from this analysis to make informed decisions about buying and selling energy, optimizing their strategies based on current and predicted market conditions.

In summary, IoT data is essential in energy trading for providing real-time insights, enhancing decision-making, optimizing grid operations, ensuring compliance, and integrating renewable energy sources, ultimately leading to a more efficient and responsive energy market.
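
A minimal sketch of step 1 from a developer's perspective: a Java producer publishing smart meter readings into Kafka. The topic name, meter ID, and JSON payload structure are hypothetical:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class SmartMeterProducer {

  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", StringSerializer.class.getName());
    props.put("value.serializer", StringSerializer.class.getName());
    props.put("acks", "all"); // durability matters for billing-relevant readings

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      // Key = meter ID, so all readings of one meter land in the same
      // partition and keep their ordering guarantees
      String reading = "{\"meterId\":\"m-4711\",\"kWh\":0.42,\"ts\":1719500000000}";
      producer.send(new ProducerRecord<>("smart-meter-readings", "m-4711", reading));
    }
  }
}
```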

Data Streaming to Ingest IoT Data into Energy Trading

Data streaming with Kafka and Flink is deployed in various edge and hybrid cloud energy use cases.

Data Streaming with Kafka at the Edge and Hybrid Cloud in the Energy Industry

As discussed above, some of the IoT data is very helpful for energy trading, not just for OT and operational workloads. Read about data streaming in the IoT space in the following articles:

Powerledger – Energy Trading with Kafka, MongoDB and Blockchain

Powerledger is another excellent success story for energy trading powered by data streaming with Apache Kafka. The technology company uses blockchain to enable decentralized energy trading. Their platform allows users to trade energy directly with each other, manage renewable energy certificates, and track the origin and movement of energy in real-time.

The platform provides:

  • Tracking, tracing and trading of renewable energy
  • Blockchain-based energy trading platform
  • Facilitating peer-to-peer (P2P) trading of excess electricity from rooftop solar power installations and virtual power plants
  • Traceability with non-fungible tokens (NFTs) representing renewable energy certificates (RECs)

Powerledger uses a decentralised market rather than the conventional unidirectional one. Benefits include reduced customer acquisition costs, increased customer satisfaction, better prices for buyers and sellers (compared with feed-in and supply tariffs), and provision for cross-retailer trading.

Powerledger leverages Apache Kafka via Confluent Cloud as a core piece of infrastructure, specifically to ingest data from smart electricity meters and feed it into the trading system.

Wondering why to combine Kafka and Blockchain? Learn more here: “Apache Kafka and Blockchain – Comparison and a Kafka-native Implementation“.

re.alto – Solar Trading: Insights into the Production of Solar Plants

re.alto is a company that provides a digital marketplace for energy data and services. Their platform connects energy providers, consumers, and developers, facilitating the exchange of data and APIs (application programming interfaces) to optimize energy usage and distribution. By enabling seamless access to diverse energy-related data, re.alto supports innovation, enhances energy efficiency, and helps create smarter, more flexible energy systems.

re.alto presented their data streaming use cases at the Data in Motion Tour in Brussels, Belgium:

  • Xenn: Real time monitoring of energy costs
  • Smart charging: Schedule the charging of an electric vehicle to reduce costs or environmental impact
  • Solar trading: Insights into the production of solar plants

Let’s explore solar trading in more detail. re.alto presented its platform at the Confluent Data in Motion Tour 2024 in Brussels, Belgium. re.alto’s platform provides connectivity and APIs for market pricing data, but also IoT integration with smart meters, grid data, batteries, SCADA systems, etc.:

re.alto Platform with Energy Market and IIoT Connectivity to Smart Meters and SCADA Systems
Source: re.alto

Solar trading includes three steps:

  1. Data collection from sources such as SMA, FIMER, Huawei, and SolarEdge
  2. Data processing with data streaming, time series analytics and overvoltage detection
  3. Providing data via EPEX Spot and APIs / marketplace

Solar Trading with Data Streaming using Apache Kafka and Timeseries Analytics at Energy Company re.alto

Energy Trading Needs Real-Time Data Feeds, Connectivity and Reliability

Energy trading requires scalable and reliable real-time data processing. In contrast to trading in financial markets, the energy sector additionally integrates IoT data sources like smart meters and sensors.

Uniper, re.alto and Powerledger are excellent examples of how to build a reliable energy trading platform powered by data streaming.

What does your enterprise architecture look like for energy trading? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Energy Trading with Apache Kafka and Flink appeared first on Kai Waehner.

]]>
Apache Kafka in Manufacturing at Automotive Supplier Brose for Industrial IoT Use Cases https://www.kai-waehner.de/blog/2024/06/13/apache-kafka-in-manufacturing-at-automotive-supplier-brose-for-industrial-iot-use-cases/ Thu, 13 Jun 2024 07:15:57 +0000 https://www.kai-waehner.de/?p=6543 Data streaming unifies OT/IT workloads by connecting information from sensors, PLCs, robotics and other manufacturing systems at the edge with business applications and the big data analytics world in the cloud. This blog post explores how the global automotive supplier Brose deploys a hybrid industrial IoT architecture using Apache Kafka in combination with Eclipse Kura, OPC-UA, MuleSoft and SAP.

The post Apache Kafka in Manufacturing at Automotive Supplier Brose for Industrial IoT Use Cases appeared first on Kai Waehner.

]]>
Data streaming unifies OT/IT workloads by connecting information from sensors, PLCs, robotics and other manufacturing systems at the edge with business applications and the big data analytics world in the cloud. This blog post explores how the global automotive supplier Brose deploys a hybrid industrial IoT architecture using Apache Kafka in combination with Eclipse Kura, OPC-UA, MuleSoft and SAP.

Data Streaming with Apache Kafka for Industrial IoT in the Automotive Industry at Brose

Data Streaming and Industrial IoT / Industry 4.0

Data streaming with Apache Kafka plays a critical role in Industrial IoT by enabling real-time data ingestion, processing, and analysis from various industrial devices and sensors. Kafka’s high throughput and scalability ensure that it can reliably handle and integrate massive streams of data from IoT devices into analytics platforms for valuable insights. This real-time capability enhances predictive maintenance, operational efficiency, and overall automation in industrial settings.

Here is an exemplary hybrid industrial IoT architecture with data streaming at the edge in the factory and 5G supply chain environments synchronizing in real-time with business applications and analytics / AI platforms in the cloud:

Brose – A Global Automotive Supplier

Brose is a global automotive supplier headquartered in beautiful Franconia, Bavaria, Germany. The company has a global presence with 70 locations in 25 countries across 5 continents and about 30,000 employees.

Brose specializes in mechatronic systems for vehicle doors, seats, and electric motors. They develop and manufacture innovative products that enhance vehicle comfort, safety, and efficiency, serving major car manufacturers worldwide.

Brose Automotive Supplier Product Portfolio
Source: Brose

Brose’s Hybrid Architecture for Industry 4.0 with Eclipse Kura, OPC UA, Kafka, SAP and MuleSoft

Brose is an excellent example of combining data streaming using Confluent with other technologies like open source Eclipse Kura and OPC-UA on the OT and edge side, and IT infrastructure and cloud software like SAP, Splunk, SQL Server, AWS Kinesis and MuleSoft:

Brose IoT Architecture with Apache Kafka Eclipse Kura OPC UA SAP Mulesoft
Source: Brose

Here is how it works according to Sven Matuschzik, Director of IT-Platforms and Databases at Brose:

Regional Kafka on-premise clusters are embedded within the IIoT and production platform, facilitating seamless data flow from the shop floor to the business world in combination with other integration tools. This hybrid IoT streaming architecture connects machines to the IT infrastructure, mastering various technologies, and ensuring zero trust security with micro-segmentation. It manages latencies between sites and central IT, enables two-way communication between machines and the IT world, and maintains high data quality from the shop floor.

For more insights from Brose (and Siemens) about IoT and data streaming with Apache Kafka, listen to the following interactive discussion.

Interactive Discussion with Siemens and Brose about Data Streaming and IoT

Brose and Siemens discussed with me

  • the practical strategies employed by Brose and Siemens to integrate data streaming in IoT for real-time data utilization.
  • the challenges faced by both companies in embracing data streaming, and reveal how they overcame barriers to maximize their potential with a hybrid cloud infrastructure.
  • how these enterprise architectures will be expanded, including real-time data sharing with customers, partners, and suppliers, and the potential impact of artificial intelligence (AI), including GenAI, on data streaming efforts, providing valuable insights to drive business outcomes and operational efficiency.
  • the significance of event-driven architectures and data streaming for enhanced manufacturing processes to improve overall equipment effectiveness (OEE) and seamlessly integrate with existing IT systems like SAP ERP and Salesforce CRM to optimize their operations.

Here is the video recording with Stefan Baer from Siemens and Sven Matuschzik from Brose:

Brose Industrial IoT Webinar with Kafka Confluent
Source: Confluent

Data Streaming with Apache Kafka to Unify Industrial IoT Workloads from Edge to Cloud

Many manufacturers leverage data streaming powered by Apache Kafka to unify the OT/IT world from edge sites like factories to the data center or public cloud for analytics and business applications.

I wrote a lot about data streaming with Apache Kafka and Flink in Industry 4.0, Industrial IoT and OT/IT modernization. Here are a few of my favorite articles:

What does your IoT architecture look like? Do you already use data streaming? What are the use cases and enterprise architecture? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Apache Kafka in Manufacturing at Automotive Supplier Brose for Industrial IoT Use Cases appeared first on Kai Waehner.

]]>
ARM CPU for Cost-Effective Apache Kafka at the Edge and Cloud https://www.kai-waehner.de/blog/2024/02/22/apache-kafka-arm-cpu-edge-hybrid-cloud/ Thu, 22 Feb 2024 13:22:35 +0000 https://www.kai-waehner.de/?p=6177 ARM CPUs often outperform x86 CPUs in scenarios requiring high energy efficiency and lower power consumption. These characteristics make ARM preferred for edge and cloud environments. This blog post discusses the benefits of using Apache Kafka alongside ARM CPUs for real-time data processing in edge and hybrid cloud setups, highlighting energy-efficiency, cost-effectiveness, and versatility. A wide range of use cases are explored across industries, including manufacturing, retail, smart cities and telco.

The post ARM CPU for Cost-Effective Apache Kafka at the Edge and Cloud appeared first on Kai Waehner.

]]>
ARM CPUs often outperform x86 CPUs in scenarios requiring high energy efficiency and lower power consumption. These characteristics make ARM preferred for edge and cloud environments. This blog post discusses the benefits of using Apache Kafka alongside ARM CPUs for real-time data processing in edge and hybrid cloud setups, highlighting energy-efficiency, cost-effectiveness, and versatility. A wide range of use cases are explored across industries, including manufacturing, retail, smart cities and telco.

Data Streaming with Apache Kafka and ARM CPU at the Edge and in the Cloud

Apache Kafka at the Edge and Hybrid Cloud

Apache Kafka is a distributed event streaming platform that enables building real-time streaming data pipelines and applications by providing capabilities for publishing, subscribing to, storing, and processing streams of records in a scalable and fault-tolerant way.

Various examples exist for Kafka deployments at the edge. These use cases relate to several of the above categories and requirements, such as low hardware footprint, disconnected offline processing, hundreds of locations, and hybrid architectures.

Data Streaming Hybrid Edge Multi Cloud for Manufacturing

Use Cases for Apache Kafka at the Edge

I have worked with enterprises across industries and the globe on the following scenarios:

  • Public Sector: Local administration in each city, smart city projects including public transportation, traffic management, integration of various connected car platforms from different carmakers, cybersecurity (including IoT use cases such as capturing and processing camera images)
  • Transportation / Logistics / Railway / Aviation: Track and trace, Kafka in the trains for offline and local processing / storage, traveller information (delayed or canceled flight / train / bus), real-time loyalty platforms (class upgrade, lounge access)
  • Manufacturing (Automotive, Aerospace, Semiconductors, Chemical, Food, and others): IoT aftermarket customer services, OEM in machines and vehicles, embedding into standard software such as ERP or MES systems, cybersecurity, a digital twin of devices/machines/production lines/processes, production line monitoring in factories for predictive maintenance/quality control/production efficiency, operations dashboards and line wellness (on-site for the plant manager, and aggregated global KPIs for executive management), track&trace and geofencing on the shop floor
  • Energy / Utility / Oil & Gas: Smart home, smart buildings, smart meters, monitoring of remote machines (e.g., for drilling, windmills, mining), pipeline and refinery operations (e.g., predictive failure or anomaly detection)
  • Telecommunications / Media: OSS real-time monitoring/problem analysis/metrics reporting/root cause analysis/action response of the network devices and infrastructure (routers, switches, other network devices), BSS customer experience and OTT services (mobile app integration for millions of users), 5G edge (e.g., street sensors)
  • Healthcare: Track & trace in the hospital, remote monitoring, machine sensor analytics
  • Retailing / Food / Restaurants / Banking: Customer communication, cross-/up-selling, loyalty system, payments in retail stores, perpetual inventory, Point-of-Sale (PoS) integration for (local) payments and (remote) CRM integration, EFTPOS (Electronic funds transfer at point of sale)

Benefits for Kafka at the Edge AND in the Cloud

Deploying the same technology in hybrid environments is not a new idea. Project teams see tremendous benefits when using Kafka at the edge and in the data center or cloud:

  • Same APIs, concepts, development tools and testing
  • Same architecture for streaming, storing, processing and connecting systems, even if at very different scale
  • Real-time synchronization between multiple environments, included out-of-the-box via the Kafka protocol

Hybrid Edge to Cloud Architecture for Low Latency with 5G Kafka and AWS Wavelength

Let’s explore how ARM CPUs fit into this discussion.

What is ARM CPU?

An ARM CPU refers to a family of CPUs based on the Advanced RISC Machine (ARM) architecture, which is a type of Reduced Instruction Set Computing (RISC) architecture. ARM CPUs have a reputation for high performance, power efficiency, and low cost. These characteristics make them particularly popular in mobile devices such as smartphones, tablets, and an increasingly wide range of other devices like IoT (Internet of Things) gadgets, servers, and even desktop computers.

The ARM architecture performs operations with a smaller number of computer instructions, allowing it to achieve high performance with lower power consumption compared to more complex instruction set computing (CISC) architectures like x86 used by Intel and AMD CPUs. This efficiency is a key advantage for battery-powered devices, where energy conservation is critical.

ARM Holdings, the company behind the ARM architecture, does not manufacture CPUs but licenses the architecture to other companies. These companies can then implement their own ARM-based processors, potentially customizing them for specific needs. This licensing model has led to a wide adoption of ARM processors across various segments of the technology industry.

ARM32 vs. ARM64

ARM architectures come in different versions, primarily distinguished by their instruction set architectures and addressing capabilities. The most commonly referenced are ARMv7 and ARMv8 (also called AArch64), which correspond to 32-bit and 64-bit processing capabilities, respectively.

Newer hardware for industrial PCs and home computers incorporates ARMv8 (64-bit). It is the foundation for smartphones, tablets, servers, and processors like Apple’s A-series chips in iPhones and iPads. Even the cloud providers use the ARM architecture to build new processors for cloud computing, like Amazon’s Graviton. ARMv8 processors can run both 32-bit and 64-bit applications, offering greater versatility and performance.

Key Features and Benefits of ARM CPUs

The key features and benefits of ARM CPUs include:

  • Power Efficiency: Their design allows for significant power savings, extending battery life in portable devices.
  • Performance: While historically seen as less powerful than their x86 counterparts, modern ARM processors offer competitive performance, especially in multi-core configurations.
  • Customization: Companies can license the ARM architecture and customize their own chips, allowing for optimized processors that meet specific product requirements.
  • Ecosystem: A broad adoption across mobile, embedded, and increasingly in server and desktop markets ensures a robust ecosystem of software and development tools.

ARM CPUs are central to the development of mobile computing and are becoming more important in other areas, including edge computing, data centers, and as part of the shift towards more energy-efficient computing solutions.

Why ARM CPUs at the Edge (e.g., for Industrial IoT)?

ARM architecture is favored for edge computing, including Industrial IoT. It provides high power efficiency and performance within compact form factors. These characteristics ensure devices can handle compute-intensive tasks locally. Only relevant data is transmitted to the cloud, which saves bandwidth and decreases latency.

The efficiency of ARM CPUs is crucial for industrial applications where real-time processing and long battery life are essential. ARM’s versatility and low power consumption make it ideal for the diverse needs of edge computing in various environments.

For instance, in manufacturing, ARM-powered sensors on machines enable predictive maintenance by monitoring conditions like vibration and temperature. These sensors process data locally, offering real-time alerts on potential failures, reducing downtime, and saving costs. ARM’s efficiency supports widespread deployment, making it ideal for continuous, autonomous monitoring in industrial environments.

Why ARM in the Cloud?

ARM’s efficiency and performance advantages are driving its adoption in cloud computing. ARM-based processors, like Amazon’s AWS Graviton, offer an attractive mix of high performance and lower power consumption compared to traditional x86 CPUs. This efficiency translates into cost savings and reduced environmental impact for cloud service providers and their customers.

AWS Graviton, specifically designed for cloud workloads, exemplifies how ARM architecture can optimize operations in data centers, enhancing the performance of web servers, containerized applications, and microservices at a lower cost. This shift towards ARM in the cloud represents a significant move towards more energy-efficient and cost-effective data center operations.

Apache Kafka on ARM – A Match Made in Heaven for Edge and Cloud Workloads

Using ARM architecture together with Apache Kafka, a distributed streaming platform, offers several advantages, especially in scenarios that demand high throughput, scalability, and energy efficiency.

  1. Energy Efficiency and Cost-Effectiveness: ARM processors are known for their low power consumption, which makes them cost-effective for running distributed systems like Kafka. Deploying Kafka on ARM-based servers can reduce operational costs, particularly in large-scale environments where energy consumption can significantly affect the budget.
  2. Scalability: Kafka handles large volumes of data and high throughput, characteristics that align well with the scalability of ARM processors in cloud environments. ARM’s efficiency enables scaling out Kafka clusters more economically, allowing for the processing of streaming data in real-time without incurring high energy or hardware costs.
  3. Edge Computing: Kafka is a common choice for real-time data processing and aggregation in edge computing scenarios. ARM’s dominance in IoT and edge devices makes it a natural fit for these use cases. Running Kafka on ARM enables efficient data processing closer to the source, reducing latency and bandwidth usage by minimizing the need to send large volumes of data to central data centers.
  4. Eco-Friendly Solutions: With growing environmental concerns, ARM’s energy efficiency contributes to more sustainable computing solutions. Deploying Kafka on ARM can be part of an eco-friendly strategy for organizations looking to minimize their carbon footprint.
  5. Innovative Use Cases: Combining Kafka with ARM opens up new possibilities for innovative applications in IoT, real-time analytics, and mobile applications. The efficiency of ARM allows for cost-effective experimentation and deployment of new services that require real-time data processing and streaming capabilities.

Examples and Case Studies for Kafka at the Edge

Overall, the combination of ARM and Apache Kafka supports the development of efficient, scalable, and sustainable data processing architectures, particularly suited for modern applications that require real-time performance with minimal energy consumption.

Data Processing at the Edge with Kafka in offline and disconnected mode

For several use cases, architectures and case studies about data streaming at the edge and hybrid cloud, check out my related articles:

Most of these blog posts are a few years old. But they are as relevant today as at the time of writing. Actually, the official support of ARM CPUs at the edge completely changes the conversations about the challenges and solutions of deploying Kafka on edge infrastructure. The deployment of Kafka at the edge has never been easier. If you buy a new Industrial PC (IPC) today, it will have enough hardware power to easily run Kafka and its ecosystem for data integration and stream processing.

Confluent Platform on ARM Infrastructure for Edge Deployments

Confluent Platform is Confluent’s data streaming platform for self-managed deployments of Apache Kafka. Most deployments operate in a traditional data center. However, more and more deployments are shifting to the edge, i.e., outside of a data center.

Since version 7.6, Confluent Platform officially supports ARM64 Linux architectures. Confluent Platform’s architecture allows you to run it wherever your IT systems are, across a global footprint. This includes running it as a mission-critical cluster in data centers, but also on edge sites like retail stores, ships or factories, or as a single broker on edge devices.
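
For the single-broker-at-the-edge scenario, a minimal KRaft-mode server configuration could look like the following sketch (combined broker and controller on one node). The hostname and paths are hypothetical, and production deployments need security hardening on top:

```properties
# server.properties - single-node Kafka in KRaft mode for an edge device
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@localhost:9093
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
advertised.listeners=PLAINTEXT://edge-gateway.local:9092
controller.listener.names=CONTROLLER
inter.broker.listener.name=PLAINTEXT
log.dirs=/var/lib/kafka

# With a single broker, internal topics cannot be replicated
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
```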

Confluent itself recognized the benefits of ARM64 CPUs: They moved the entire AWS fleet for the fully managed Confluent Cloud to ARM-based images in the past months.

The Confluent Server broker powered by Apache Kafka enables end-to-end data pipelines:

  • Collect data from any source
  • Persist the data in the event store with separate compute and storage
  • Process the events with stream processing
  • Share data with downstream applications
  • Replicate selected events across the WAN through the native Kafka protocol via Cluster Linking

Now, you can deploy this in production on low-cost, small-footprint ARM64 architecture infrastructure at the edge and also synchronize with a data center or cloud Kafka cluster.

Kafka + ARM = Cost-Effective and Sustainable

This article outlined the synergistic relationship between Apache Kafka and ARM CPUs. The combination enables efficient, scalable, and sustainable data processing architectures for edge and hybrid cloud environments.

The adoption of ARM in cloud computing marks a significant shift towards more sustainable and performance-optimized computing solutions. The combination of Kafka and ARM CPUs is poised to drive innovation in real-time analytics, IoT, and mobile applications. A few great examples:

  • AWS Graviton to operate Kafka cost-efficient in the public cloud.
  • Confluent Platform’s compatibility and support for ARM64 architectures at the edge.

The sustainability of energy-efficient ARM CPUs is a perfect segue to the data streaming article “Green Data, Clean Insights: How Kafka and Flink Power ESG Transformations“.

Do you already use ARM processors in your edge or cloud Kafka environment? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post ARM CPU for Cost-Effective Apache Kafka at the Edge and Cloud appeared first on Kai Waehner.

]]>
MQTT Market Trends: Cloud, Unified Namespace, Sparkplug, Kafka Integration https://www.kai-waehner.de/blog/2023/12/08/mqtt-market-trends-for-2024-cloud-unified-namespace-sparkplug-kafka-integration/ Fri, 08 Dec 2023 09:15:24 +0000 https://www.kai-waehner.de/?p=5951 The lightweight and open IoT messaging protocol MQTT gets adopted more widely across industries. This blog post explores relevant market trends for MQTT: cloud deployments and fully managed services, data governance with unified namespace and Sparkplug B, MQTT vs. OPC-UA debates, and the integration with Apache Kafka for OT/IT data processing in real-time.

The post MQTT Market Trends: Cloud, Unified Namespace, Sparkplug, Kafka Integration appeared first on Kai Waehner.

]]>
The lightweight and open IoT messaging protocol MQTT gets adopted more widely across industries. This blog post explores relevant market trends for MQTT: cloud deployments and fully managed services, data governance with unified namespace and Sparkplug B, MQTT vs. OPC-UA debates, and the integration with Apache Kafka for OT/IT data processing in real-time.

MQTT Market Trends for 2024 including Sparkplug Data Governance Kafka Cloud

MQTT Summit in Munich

In December 2023, I attended the MQTT Summit Connack. HiveMQ sponsored the event. The agenda included various industry experts. The talks covered industrial IoT deployments, unified namespace, Sparkplug B, security and fleet management, and use cases for Kafka combined with MQTT like connected vehicles or smart city (my talk).

It was a pleasure to meet many industry peers from the MQTT community, independent consultants, and software vendors. I learned a lot about the adoption of MQTT in the real world, best practices, and a few trade-offs of Sparkplug B. The following sections summarize the MQTT trends I took away from this event, combined with experiences from customer meetings around the world this year.

Special thanks to Kudzai Manditereza of HiveMQ for organizing this great event with many international attendees across industries:

Connack IoT Summit 2023 in Munich organized by HiveMQ

What is MQTT?

MQTT stands for Message Queuing Telemetry Transport. MQTT is a lightweight and open messaging protocol designed for small sensors and mobile devices with high-latency or unreliable networks. IBM originally developed MQTT in the late 1990s, and it later became an open standard.

MQTT follows a publish/subscribe model, where devices (or clients) communicate through a central message broker. The key components in MQTT are:

  1. Client: The device or application that connects to the MQTT broker to send or receive messages.
  2. Broker: The central hub that manages the communication between clients. It receives messages from publishing clients and routes them to subscribing clients based on topics.
  3. Topic: A hierarchical string that acts as a label for a message. Clients subscribe to topics to receive messages and publish messages to specific topics.
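
To illustrate these three roles, here is a minimal sketch using the Eclipse Paho Java client; the broker URL, client ID, and topic hierarchy are hypothetical:

```java
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttMessage;

public class MqttExample {

  public static void main(String[] args) throws Exception {
    // The client connects to the central broker
    MqttClient client = new MqttClient("tcp://broker.local:1883", "sensor-gateway-1");
    MqttConnectOptions options = new MqttConnectOptions();
    options.setCleanSession(true);
    client.connect(options);

    // Subscribe to a hierarchical topic; the '+' wildcard matches one level
    client.subscribe("plant1/+/temperature", (topic, msg) ->
        System.out.println(topic + " -> " + new String(msg.getPayload())));

    // Publish a message with QoS 1 (at-least-once delivery)
    MqttMessage message = new MqttMessage("21.7".getBytes());
    message.setQos(1);
    client.publish("plant1/line4/temperature", message);
  }
}
```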

When to use MQTT?

The publish/subscribe model allows for efficient communication between devices. When a client publishes a message to a specific topic, all other clients subscribed to that topic receive the message. This decouples the sender and receiver, enabling a scalable and flexible communication system.

The MQTT standard is known for its simplicity, low bandwidth usage, and support for unreliable networks. These characteristics make it well-suited for Internet of Things (IoT) applications, where devices often have limited resources and may operate under challenging network conditions. Good MQTT implementations provide a scalable and reliable platform for IoT projects.

MQTT has gained widespread adoption in various industries for IoT deployments, home automation, and other scenarios requiring lightweight and efficient communication.

I discuss the following four market trends for MQTT in the sections below. They have a huge impact on adoption and on the decision to choose MQTT:

  1. MQTT in the Public Cloud
  2. Data Governance for MQTT
  3. MQTT vs. OPC-UA Debates
  4. MQTT and Apache Kafka for OT/IT Data Processing

Trend 1: MQTT in the Public Cloud

Most companies have a cloud-first strategy. Go serverless if you can! The consequences are a focus on business problems, faster time-to-market, and an elastic infrastructure.

Mature MQTT cloud services exist. At Confluent, we work a lot with HiveMQ. The combination even provides a fully managed integration between both cloud offerings.

Having said that, not everything can or should go to the (public) cloud. Security, latency and cost often make a deployment in the data center or at the edge (e.g., in a smart factory) the preferred or mandatory option. Hybrid architectures allow the combination of both options for building the most cost-efficient, but also reliable and secure, IoT infrastructure. I talked about zero-trust and air-gapped environments leveraging unidirectional hardware for the most critical use cases in another blog post.

Automation and Security are the Typical Blockers for Public Cloud

Key for success, especially in hybrid architectures, is automation and fleet management with CI/CD and GitOps for multi-cluster management. Many projects leverage Kubernetes as a cloud-native infrastructure for the edge and private cloud. However, in the public cloud, the first option should always be a fully managed service (if security and other requirements allow it).

Be careful when adopting fully managed MQTT cloud services: Support for MQTT is not always equal across the cloud vendors. Many vendors do not implement the entire protocol, miss features, and impose usage limitations. HiveMQ wrote a great article showing this. The article is a bit outdated (and opinionated, of course, as a competing MQTT vendor), but it shows very well how some vendors provide offerings that are far away from a good MQTT cloud solution.

The hardest problem for public cloud adoption of MQTT is security! Double-check the requirements early. Latency, availability or specific features are usually not the problem. The deployment and integration need to be compliant and follow the cloud strategy. As Industrial IoT projects always have to include some kind of edge story, the discussion is tougher than for sales or marketing projects.

Trend 2: Data Governance for MQTT

Data governance is crucial across the enterprise. From an IoT and MQTT perspective, the two main topics are unified namespace as the concept and Sparkplug B as the technology.

Unified Namespace for Industrial IoT

In the context of Industrial Internet of Things (IIoT), a unified namespace (UNS) typically refers to a standardized and cohesive way of naming and organizing devices, data, and resources within an industrial network or ecosystem. The goal is to provide a consistent naming structure that facilitates interoperability, data sharing, and management of IIoT devices and systems.

The term Unified Namespace (in Industrial IoT) was coined and popularized by Walker Reynolds, an expert and content creator for Industrial IoT.

Concepts of Unified Namespace

Here are some key aspects of a unified namespace in Industrial IoT:

  1. Device Naming: Devices in an IIoT environment may come from various manufacturers and have different functionalities. A unified namespace ensures that devices are named consistently, making it easier for administrators, applications, and other devices to identify and interact with them.
  2. Data Naming and Tagging: IIoT involves the generation and exchange of vast amounts of data. A unified namespace includes standardized naming conventions and tagging mechanisms for data points, variables, or attributes associated with devices. This consistency is crucial for applications that need to access and interpret data across different devices.
  3. Interoperability: A unified namespace promotes interoperability by providing a common framework for devices and systems to communicate. When devices and applications follow the same naming conventions, it becomes easier to integrate new devices into existing systems or replace components without causing disruptions.
  4. Security and Access Control: A well-defined namespace contributes to security by enabling effective access control mechanisms. Security policies can be implemented based on the standardized names and hierarchies, ensuring that only authorized entities can access specific devices or data.
  5. Management and Scalability: In large-scale industrial environments, having a unified namespace simplifies device and resource management. It allows for scalable solutions where new devices can be added or replaced without requiring extensive reconfiguration.
  6. Semantic Interoperability: Beyond just naming, a unified namespace may include semantic definitions and standards. This helps in achieving semantic interoperability, ensuring that devices and systems understand the meaning and context of the data they exchange.

Overall, a unified namespace in Industrial IoT is about establishing a common and standardized structure for naming devices, data, and resources, providing a foundation for efficient, secure, and scalable IIoT deployments. Standards organizations and industry consortia often play a role in developing and promoting these standards to ensure widespread adoption and compatibility across diverse industrial ecosystems.
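
To make the concept tangible: unified namespaces are often organized along an ISA-95-style equipment hierarchy. A hypothetical MQTT topic layout could look like this (all names are illustrative):

```
<enterprise>/<site>/<area>/<line>/<cell>/<metric>

acme/stuttgart/assembly/line4/robot07/temperature
acme/stuttgart/assembly/line4/robot07/vibration
acme/stuttgart/paintshop/line2/oven01/state
```

Any application that understands the hierarchy can subscribe to exactly the branch it needs, from a single metric up to an entire site.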

Sparkplug B: Interoperability and Standardized Communication for MQTT Topics and Payloads

Unified Namespace is the theoretical concept for interoperability. The standardized implementation for payload structure enforcement is Sparkplug B, a specification created at the Eclipse Foundation and later turned into an ISO standard.

Sparkplug B provides a set of conventions for organizing data and defining a common language for devices to exchange information. Here is an example from HiveMQ depicting how a unified namespace makes communication between devices, systems, and sites easier:

HiveMQ Unified Namespace
Source: HiveMQ

Key features of Sparkplug B include:

  1. Payload Structure: Sparkplug B defines a specific format for the payload of MQTT messages. This format includes fields for information such as timestamp, data type, and value. This standardized payload structure ensures that devices can consistently understand and interpret the data being exchanged.
  2. Topic Namespace: The specification defines a standardized topic namespace for MQTT messages. This helps in organizing and categorizing messages, making it easier for devices to discover and subscribe to relevant information.
  3. Birth and Death Certificates: Sparkplug B introduces the concept of “Birth” and “Death” certificates for devices. When a device comes online, it sends a Birth certificate with information about itself. Conversely, when a device goes offline, it sends a Death certificate. This mechanism aids in monitoring the status of devices within the IIoT network.
  4. State Management: The specification includes features for managing the state of devices. Devices can publish their current state, and other devices can subscribe to receive updates. This helps in maintaining a synchronized view of device states across the network.

Sparkplug B is intended to enhance the interoperability, scalability, and efficiency of IIoT deployments by providing a standardized framework for MQTT communication in industrial environments. Its adoption can simplify the integration of diverse devices and systems within an industrial ecosystem, promoting seamless communication and data exchange.
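
Concretely, Sparkplug B prescribes the topic layout sketched below. The spBv1.0 namespace prefix and the message types come from the specification, while the group, edge node, and device identifiers are hypothetical:

```
spBv1.0/<group_id>/<message_type>/<edge_node_id>[/<device_id>]

spBv1.0/plant1/NBIRTH/gateway-04            # edge node comes online
spBv1.0/plant1/DBIRTH/gateway-04/sensor-17  # device announces itself and its metrics
spBv1.0/plant1/DDATA/gateway-04/sensor-17   # device telemetry
spBv1.0/plant1/NDEATH/gateway-04            # last-will message when the node drops off
```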

Limitations of Sparkplug B

Sparkplug B has a few limitations, such as:

  • Only supports Quality of Service (QoS) 0, providing at-most-once message delivery guarantees.
  • Limits in the structure of topic namespaces.
  • Very device-centric (but MQTT is for many kinds of “things”).

Understand the pros and cons of Sparkplug B. It is perfect for some use cases, but the above limitations are blockers for others. Especially the restriction to QoS 0 is a huge limitation for mission-critical use cases.

Trend 3: MQTT vs. OPC-UA Debates

MQTT has many benefits compared to other industrial protocols. However, OPC-UA is another standard in the IoT space that gets at least as much traction in the market as MQTT. The debate about choosing the right IoT standard is controversial, often led by emotions and opinions, and still absolutely valid to discuss.

OPC-UA (Open Platform Communications Unified Architecture) is a machine-to-machine communication protocol for industrial automation. It enables seamless and secure communication and data exchange between devices and systems in various industrial settings.

OPC UA has become a widely adopted standard in the industrial automation and control domain, providing a foundation for secure and interoperable communication between devices, machines, and systems. Its open nature and support from industry organizations contribute to its widespread use in applications ranging from manufacturing and process control to energy management and more.

If you look at the promises of MQTT and OPC-UA, a lot of overlapping exists:

  • Scalable
  • Reliable
  • Real-time
  • Open
  • Standardized

All of them are true for both standards. Still, trade-offs exist. I won’t start a flame war here. Just search for “MQTT vs. OPC-UA”: you will find many blog posts, articles, and videos. Most are very opinionated (and often driven by a vendor). The reality is that the industry has adopted both MQTT and OPC-UA widely.

And while the above characteristics might be true for both standards in general, the details make the difference in specific implementations. For instance, if you try to connect plenty of Siemens S7 PLCs via OPC-UA, you quickly realize that the number of parallel connections does not scale as well as the OPC-UA specification suggests.

When to Choose MQTT vs. OPC-UA?

The clear recommendation is to start with the business problem, not the technology. Evaluate both standards and their implementations, supported interfaces, vendors’ cloud services, etc. Then choose the right technology.

Here is what I use as a simplified rule of thumb if you have to start a technical discussion:

  • MQTT: Use cases for connected IoT devices, vehicles, and other interfaces that need lightweight infrastructure, a large number of connections, and/or operation over unreliable networks.
  • OPC-UA: Use cases for industrial automation to connect heavy equipment, PLCs, SCADA systems, data historians, etc.

This is just a rule of thumb, and the situation keeps changing. Modern PLCs and other equipment add support for multiple protocols to be more flexible. But nowadays, you rarely have a choice anyway because specific equipment, devices, or vehicles only support one or the other. And consider yourself lucky if they do: otherwise, you need yet another IIoT platform to connect to proprietary legacy protocols like Siemens S7, Modbus, et al.

MQTT and OPC-UA Gotchas

A few additional gotchas I have picked up from customer conversations around the world in the past quarters:

  • In theory, MQTT and OPC-UA work well together, i.e., MQTT as the underlying transport protocol for OPC-UA. I have not yet seen this in the real world (no statistical evidence, just my personal experience). What I do see is the combination of OPC-UA for the last-mile integration to the PLC, with the data then forwarded to other consumers via MQTT. All in a single gateway, usually a proprietary IoT platform.
  • OPC-UA defines many sub-standards for different industries or use cases. In theory, this is great. In practice, it reminds me of the WS-* hell in the SOAP/WSDL web service world, where most projects moved to much simpler HTTP/REST architectures. Similarly, most OPC-UA integrations I see use simple, custom-coded clients in Java or other programming languages – because the tools don’t support the complex standards.
  • IoT vendors pitch any possible integration scenario in their marketing. I am amazed that MQTT and OPC-UA platforms claim to integrate directly with MES and ERP systems like SAP, and with any data warehouse or data lake, like Google BigQuery, Snowflake, or Databricks. But that’s only the theory. Should you really do this? And did you ever try to connect SAP ECC to MQTT or OPC-UA? Good luck from a technical, and even harder, from an organizational perspective. And do you want tight coupling and point-to-point communication between the OT world and the ERP? In most cases, a clear separation of concerns between different business units, domains, and use cases is a good thing. Choose the right tool and enterprise architecture; not just for the POC and first pipeline, but for the entire long-term strategy and vision.

The last point brings me to another growing trend: The combination of MQTT for IoT / OT workloads and data streaming with Apache Kafka for the integration with the IT world.

Trend 4: MQTT and Apache Kafka for OT/IT Data Processing

Contrary to MQTT, Apache Kafka is NOT an IoT platform. Instead, Kafka is an event streaming platform and serves as the underpinning of event-driven architectures for various use cases across industries. It provides a scalable, reliable, and elastic real-time platform for messaging, storage, data integration, and stream processing. Apache Kafka and MQTT are a perfect combination for many IoT use cases.

Manufacturing with MQTT, Sparkplug B, Apache Kafka and SAP ERP for the Smart Factory

Let’s explore the pros and cons of both technologies from the IoT perspective.

Trade-offs of MQTT

MQTT’s pros:

  • Lightweight
  • Built for thousands of connections
  • All programming languages supported
  • Built for poor connectivity / high latency scenarios
  • High scalability and availability (depending on broker implementation)
  • ISO standard
  • Most popular IoT protocol (competing with OPC UA)

MQTT’s cons:

  • Adoption mainly in IoT use cases
  • Only pub/sub, not stream processing
  • No reprocessing of events

Trade-offs of Apache Kafka

Kafka’s pros:

  • Stream processing, not just pub/sub
  • High throughput
  • Large scale
  • High availability
  • Long-term storage and buffering
  • Reprocessing of events
  • Good integration to rest of the enterprise

Kafka’s cons:

  • Not built for tens of thousands of connections
  • Requires stable network and good infrastructure
  • No IoT-specific features like keep alive or last will and testament

Use Cases, Architectures and Case Studies for MQTT and Kafka

I wrote a blog series about MQTT in conjunction with Apache Kafka with many more technical details and real-world case studies across industries.

The first blog post explores the relation between MQTT and Apache Kafka. Afterward, the other four blog posts discuss various use cases, architectures, and reference deployments.

  • Part 1 – Overview: Relation between Kafka and MQTT, pros and cons, architectures
  • Part 2 – Connected Vehicles: MQTT and Kafka in a private cloud on Kubernetes; use case: remote control and command of a car
  • Part 3 – Manufacturing: MQTT and Kafka at the edge in a smart factory; use case: Bidirectional OT-IT integration with Sparkplug B between PLCs, IoT Gateways, Data Historian, MES, ERP, Data Lake, etc.
  • Part 4 – Mobility Services: MQTT and Kafka leveraging serverless cloud infrastructure; use case: Traffic jam prediction service using machine learning
  • Part 5 – Smart City: MQTT at the edge connected to fully-managed Kafka in the public cloud; use case: Intelligent traffic routing by combining and correlating different 1st and 3rd party services

The following presentation is from my talk at the MQTT Summit. It explores various use cases and reference architectures for MQTT and Apache Kafka.


If you have a bad network, tens of thousands of clients, or the need for a lightweight push-based messaging solution, then MQTT is the right choice. Otherwise, Kafka, a powerful event streaming platform, is probably the right choice for real-time messaging, data integration, and data processing. In many IoT use cases, the architecture combines both technologies. And even in the industrial space, various projects use Kafka for use cases like building a cloud-native data historian or real-time condition monitoring and predictive maintenance.
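
To illustrate the combination, here is a minimal sketch of an MQTT-to-Kafka bridge in Python using the paho-mqtt (1.x API) and confluent-kafka clients. All broker addresses and topic names are illustrative, and a production deployment would typically use Kafka Connect or a dedicated MQTT connector rather than custom glue code.

```python
import paho.mqtt.client as mqtt  # paho-mqtt 1.x client API
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # illustrative

def on_message(client, userdata, msg):
    # Forward each MQTT message to Kafka. Using the MQTT topic as the Kafka
    # key keeps all events of one device in the same partition (per-device order).
    producer.produce("iot-sensor-events", key=msg.topic, value=msg.payload)
    producer.poll(0)  # serve delivery callbacks without blocking

mqtt_client = mqtt.Client()
mqtt_client.on_message = on_message
mqtt_client.connect("localhost", 1883)           # MQTT broker, illustrative
mqtt_client.subscribe("factory/+/telemetry", qos=1)

try:
    mqtt_client.loop_forever()
finally:
    producer.flush()
```

The design choice worth noting: MQTT handles the last-mile device connectivity, while Kafka takes over for distribution, storage, and processing in the IT world.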

Data Governance for MQTT with Sparkplug and Kafka (and Beyond)

Unified Namespace and its concrete implementation with Sparkplug B are excellent for data governance in IoT workloads with MQTT. In a similar way, the Schema Registry defines the data contracts for Apache Kafka data pipelines.

Schema Registry should be the foundation of any Kafka project! Data contracts (aka Schemas, similar to Swagger in REST/HTTP APIs) enforce good data quality and interoperability between independent microservices in the Kafka ecosystem. Each business unit and its data products can choose any technology or API. But data sharing with others works only with good (enforced) data quality.
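
As a minimal sketch of what such a data contract looks like on the Kafka side, the following Python snippet serializes events with an Avro schema via Schema Registry. The schema, topic name, and URLs are illustrative; the point is that a producer cannot publish an event that violates the contract.

```python
from confluent_kafka import SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer

# Illustrative data contract: every sensor event must carry these fields and types.
schema_str = """
{
  "type": "record",
  "name": "SensorReading",
  "fields": [
    {"name": "device_id", "type": "string"},
    {"name": "timestamp", "type": "long"},
    {"name": "temperature", "type": "double"}
  ]
}
"""

sr_client = SchemaRegistryClient({"url": "http://localhost:8081"})  # illustrative URL
avro_serializer = AvroSerializer(sr_client, schema_str)

producer = SerializingProducer({
    "bootstrap.servers": "localhost:9092",  # illustrative
    "value.serializer": avro_serializer,
})

# Messages that violate the contract fail at serialization time, not downstream.
producer.produce(topic="sensor-readings",
                 value={"device_id": "sensor-42", "timestamp": 1700000000000,
                        "temperature": 21.5})
producer.flush()
```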

You can see the issue: each technology brings its own data governance tooling. If you add your favorite data lake, you introduce yet another concept, like Apache Iceberg, to define the data tables for analytical storage systems. And that’s okay! Each data governance suite is optimized for its workloads and requirements. Company-wide master data management failed over the last two decades precisely because each software category has different requirements.

Hence, one clear trend I see is an enterprise-wide data governance strategy across the different systems (with technologies like Collibra or Azure Purview). It has open interfaces and integrates with specific data contracts like Sparkplug B for MQTT, Schema Registry for Kafka, Swagger for HTTP/REST applications,  or Iceberg for data lakes. Don’t try to solve the entire enterprise-wide data governance strategy with a single technology. It will fail! We have seen this before…

Legacy PLC (S7, Modbus, BACnet, etc.) with MQTT or Kafka?

MQTT and Kafka enable reliable and scalable end-to-end data pipelines between IoT and IT systems. At least, if you can use modern APIs and standards. Most IoT projects today are still brownfield. A lot of legacy PLCs, SCADA systems, and data historians only support proprietary protocols like Siemens S7, Modbus, BACnet, and so on.

Neither MQTT nor Kafka supports these legacy protocols out of the box. Additional middleware is required. Usually, enterprises choose a dedicated IoT platform for this. That means more cost, more complexity, and slower projects.

In the Kafka world, Apache PLC4X is a great open source option if you want to build a modern, cloud-native data historian with Kafka. The framework provides integration with many legacy protocols and offers a Kafka Connect connector. The main issue is the lack of official vendor support behind it. Companies cannot buy 24/7 support for mission-critical applications, and that’s typically a blocker for any industrial deployment.

As MQTT is only a pub/sub message broker, it cannot help with legacy protocol integration. HiveMQ tries to solve this challenge with a new framework called HiveMQ Edge: a software-based industrial edge protocol converter. It is a young project that is just kicking off. The core is open source. The first supported legacy protocol is Modbus. I think this is an excellent product strategy. I hope the project gains traction and evolves to support many other legacy IIoT technologies to modernize the brownfield shop floor. The project also supports OPC-UA; we will see how much demand that feature creates, too.

MQTT and Sparkplug Adoption Grows Year-By-Year for IoT Use Cases

In the IoT world, MQTT and OPC UA have established themselves as open and platform-independent standards for data exchange in Industrial IoT and Industry 4.0 use cases. Data streaming with Apache Kafka is the data hub for integrating and processing massive volumes of data at any scale in real-time. The article “OPC UA, MQTT, and Apache Kafka – The Trinity of Data Streaming in IoT” explores this combination in more detail.

MQTT adoption grows year by year with the need for more scalable, reliable and open IoT communication between devices, equipment, vehicles, and the IT backend. The sweet spots of MQTT are unreliable networks, lightweight (but reliable and scalable) communication and infrastructure, and connectivity to thousands of things.

Maturing trends like the Unified Namespace with Sparkplug B, fully managed cloud services, and combined usage with Apache Kafka make MQTT one of the most relevant IoT standards across verticals like manufacturing, automotive, aviation, logistics, and smart city.

But don’t get fooled by architecture pictures and theory. For example, most diagrams for MQTT and Sparkplug show integrations with the ERP (e.g., SAP) and Data Lake (e.g., Snowflake). Should you really integrate directly from the OT world into the analytics platform? Most times, the answer is no because of cost, decoupling of business units, legal issues, and other reasons. This is where the combination of MQTT and Kafka (or another integration platform) shines.

How do you use MQTT and Sparkplug today? What are the use cases? Do you combine it with other technologies, like Apache Kafka, for end-to-end integration across the OT/IT pipeline? Let’s connect on LinkedIn and discuss it! Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter.

The post MQTT Market Trends: Cloud, Unified Namespace, Sparkplug, Kafka Integration appeared first on Kai Waehner.

]]>
Modernizing SCADA Systems and OT/IT Integration with Data Streaming https://www.kai-waehner.de/blog/2023/09/10/modernizing-scada-systems-and-ot-it-integration-with-data-streaming/ Sun, 10 Sep 2023 12:56:13 +0000 https://www.kai-waehner.de/?p=5304 SCADA control systems are a vital component of IT/OT modernization. The old IT/OT infrastructure and SCADA system are monolithic, proprietary, not scalable, and miss open APIs based on standard interfaces. This post explains the modernization of such a system based on the real-life use case of 50Hertz, a transmission system operator for electricity in Germany. A lightboard video is included.

The post Modernizing SCADA Systems and OT/IT Integration with Data Streaming appeared first on Kai Waehner.

]]>
SCADA control systems are a vital component of IT/OT modernization. The old IT/OT infrastructure and SCADA system are monolithic, proprietary, not scalable, and miss open APIs based on standard interfaces. This post explains the modernization of such a system based on the real-life use case of 50Hertz, a transmission system operator for electricity in Germany. Two common business goals drove them: Improve the Overall Equipment Effectiveness (OEE) and stay innovative. A lightboard video about the related data streaming enterprise architecture is included.

Modernization of OT IT and SCADA with Data Streaming

The State of Data Streaming for Manufacturing in 2023

The evolution of industrial IoT, manufacturing 4.0, and digitalized B2B and customer relations require modern, open, and scalable information sharing. Data streaming allows integrating and correlating data in real-time at any scale. Trends like software-defined manufacturing and data streaming help modernize and innovate the entire engineering and sales lifecycle.

I have recently presented an overview of trending enterprise architectures in the manufacturing industry and data streaming customer stories from BMW, Mercedes, Michelin, and Siemens. A complete slide deck and on-demand video recording are included.

This blog post explores one of the enterprise architectures and case studies in more detail: Modernization of legacy and proprietary monoliths and SCADA systems to a scalable, open platform with real-time data integration capabilities.

What is a SCADA System? And how does Data Streaming help?

Supervisory control and data acquisition (SCADA) is a control system architecture comprising computers, networked data communications, and graphical user interfaces for high-level supervision of machines and processes. It also covers sensors and other devices, such as programmable logic controllers, which interface with process plants or machinery.

Supervisory control and data acquisition - SCADA

Data streaming helps connect high-volume sensor data from machines, PLCs, robots, and other IoT devices. This is possible in real-time at scale with stream processing. The de facto standard for data streaming is Apache Kafka and its ecosystem, including Kafka Streams and Kafka Connect.

Enterprises leverage Apache Kafka as the next generation of Data Historians. Integrating and pre-processing the events with data streaming is a prerequisite for data correlation with information systems like the MES or ERP (which might run at the edge or, more often, in the cloud).
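
As a simple illustration of this pre-processing step, the following Python sketch consumes raw sensor events and forwards only relevant ones to a curated topic for MES/ERP correlation. Topic names, the payload format, and the threshold are illustrative; real projects would typically implement this with Kafka Streams or Apache Flink.

```python
import json

from confluent_kafka import Consumer, Producer

BOOTSTRAP = "localhost:9092"  # illustrative
consumer = Consumer({"bootstrap.servers": BOOTSTRAP,
                     "group.id": "condition-monitoring",
                     "auto.offset.reset": "earliest"})
producer = Producer({"bootstrap.servers": BOOTSTRAP})
consumer.subscribe(["machine-sensor-raw"])

TEMP_ALERT_C = 80.0  # illustrative alert threshold

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())  # e.g. {"machine_id": "m1", "temperature": 85.2}
    # Only anomalies reach the curated topic consumed by MES/ERP integrations.
    if event.get("temperature", 0.0) > TEMP_ALERT_C:
        producer.produce("machine-alerts",
                         key=event.get("machine_id", "unknown"),
                         value=msg.value())
        producer.poll(0)
```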

50hertz: A cloud-native SCADA system built with Apache Kafka

50Hertz is a transmission system operator for electricity in Germany. The company secures the electricity supply for 18 million people in northern and eastern Germany.

The infrastructure must operate 24 hours a day, seven days a week. Various shift teams and a mission-critical SCADA infrastructure supervise and control the OT systems.

50Hertz’s next-generation Modular Control Center System (MCCS) leverages a central, scalable, event-based integration platform based on Confluent:

Cloud-native SCADA system built with Apache Kafka at 50hertz
Source: 50hertz

The first four containers include the Supervisory & Control (SCADA), Load Frequency Control (LFC), and Time Series Management & Forecasting applications. Each container can have multiple services/functions that follow the event-based microservices pattern.

50Hertz provides central governance for security, protocols, and data schemas (CIM compliant) between platform containers/modules. The cloud-native 24/7 SCADA system is developed in the cloud and deployed in safety-critical edge environments.

50Hertz presented its OT/IT and SCADA modernization leveraging data streaming with Apache Kafka at the Confluent Data in Motion tour 2021. Unfortunately, the on-demand video recording is available only in German. Therefore, I wrote more about the case study in another blog post: “A cloud-native SCADA System for Industrial IoT built with Apache Kafka“.

Lightboard Video: How Data Streaming Modernizes SCADA and OT/IT

Here is a five-minute lightboard video that describes how data streaming helps with modernizing monolith and proprietary SCADA infrastructure and OT/IT environments:

If you liked this video, make sure to follow the YouTube channel for many more lightboard videos across all industries.

Apache Kafka glues together the old and new OT/IT World

The 50Hertz case study showed how to modernize an existing legacy infrastructure with cloud-native technologies, whether you deploy at the edge or in the public cloud. For more case studies, check out the free “The State of Data Streaming in Manufacturing” on-demand recording or read the related blog post.

A common question in these scenarios is the proper communication and integration protocol when you move away from proprietary legacy PLCs and OT interfaces. MQTT and OPC-UA established themselves as excellent standards with different sweet spots. Data Streaming with Apache Kafka is complementary, not competitive. Learn more by reading “OPC UA, MQTT, and Apache Kafka – The Trinity of Data Streaming in IoT“.

How do you leverage data streaming in your manufacturing use cases? Do you deploy at the edge, in the cloud, or both? Let’s connect on LinkedIn and discuss it! Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter.

The post Modernizing SCADA Systems and OT/IT Integration with Data Streaming appeared first on Kai Waehner.

]]>
The State of Data Streaming for Energy & Utilities https://www.kai-waehner.de/blog/2023/09/01/the-state-of-data-streaming-for-energy-utilities-in-2023/ Fri, 01 Sep 2023 07:14:02 +0000 https://www.kai-waehner.de/?p=5606 The evolution of utility infrastructure, energy distribution, customer services, and new business models requires real-time end-to-end visibility, reliable and intuitive B2B and B2C communication, and integration with pioneering technologies like 5G for low latency or augmented reality for innovation. I look at trends in the utilities sector to explore how data streaming helps as a business enabler, including customer stories from SunPower, 50hertz, Powerledger, and more. A complete slide deck and on-demand video recording are included.

The post The State of Data Streaming for Energy & Utilities appeared first on Kai Waehner.

]]>
This blog post explores the state of data streaming for the energy and utilities industry. The evolution of utility infrastructure, energy distribution, customer services, and new business models requires real-time end-to-end visibility, reliable and intuitive B2B and B2C communication, and integration with pioneering technologies like 5G for low latency or augmented reality for innovation. Data streaming allows integrating and correlating data in real-time at any scale to improve most workloads in the energy sector.

I look at trends in the utilities sector to explore how data streaming helps as a business enabler, including customer stories from SunPower, 50hertz, Powerledger, and more. A complete slide deck and on-demand video recording are included.

The State of Data Streaming for Energy and Utilities in 2023

The energy & utilities industry is fundamental for a sustainable future. Gartner explores the Top 10 Trends Shaping the Utility Sector in 2023: “In 2023, power and water utilities will continue to face a variety of forces that will challenge their business and operating models and shape their technology investments.

Utility technology leaders must confidently compose the future for their organizations in the midst of uncertainty during this volatile period of energy transition — the future that requires your organizations to be both agile and resilient.”

Gartner - Top 10 Trends Shaping the Utility Sector in 2023

From system-centric and large to smaller-scale and distributed

The increased use of digital tools makes the expected structural changes in the energy system possible:

Smart Grid - Energy Industry

Energy AI use cases

Artificial Intelligence (AI) with technologies like Machine Learning (ML) and Generative AI (GenAI) is a hot topic across all industries. Innovation around AI disrupts many business models, tasks, business processes, and labor.

NVIDIA created an excellent diagram showing the various opportunities for AI in the energy & utilities sector. It separates the scenarios by segment: upstream, midstream, downstream, power generation, and power distribution:

AI Use Cases in the Energy sector (Source: NVIDIA)
AI Use Cases in the Energy Sector (Source: NVIDIA)

Cybersecurity: The threat is real!

McKinsey & Company explains that “the cyberthreats facing electric-power and gas companies include the typical threats that plague other industries: data theft, billing fraud, and ransomware. However, several characteristics of the energy sector heighten the risk and impact of cyberthreats against utilities:”

McKinsey - Cybersecurity in Energy & Utilities

Data streaming in the energy & utilities industry

Adopting trends like predictive maintenance, track&trace, proactive sales and marketing, or threat intelligence is only possible if enterprises in the energy sector can provide and correlate information at the right time in the proper context. Real-time, which means using the information in milliseconds, seconds, or minutes, is almost always better than processing data later (whatever later means):

Real-Time with Data Streaming powered by Apache Kafka and Flink

Data streaming combines the power of real-time messaging at any scale with storage for true decoupling, data integration, and data correlation capabilities. Apache Kafka is the de facto standard for data streaming.

Apache Kafka for Smart Grid, Utilities and Energy Production” is a great starting point to learn more about data streaming in the industry, including a few case studies not covered in this blog post – such as

  • EON: Smart grid for energy production and distribution with Apache Kafka
  • Devon Energy: Kafka at the edge for hybrid integration and analytics in the cloud
  • Tesla: Kafka-based data platform for trillions of data points per day

5 Ways Utilities Accomplish More with Real-Time Data

“After creating a collaborative team that merged customer experience and digital capabilities, one North American utility went after a 30 percent reduction in its cost-to-serve customers in some of its core journeys.”

As the Utilities Analytics Institute explains: “Utilities need to ensure that the data they are collecting is high quality, specific to their needs, preemptive in nature, and, most importantly, real-time.” The following five characteristics are crucial to add value with real-time data:

  1. High-Quality Data
  2. Data Specific to Your Needs
  3. Make Your Data Proactive
  4. Data Redundancy
  5. Data is Constantly Changing

Real-Time Data for Smart Meters and Common Praxis

Smart meters are a perfect example of increasing business value with real-time data streaming. As Clou Global confirms: “The use of real-time data in smart grids and smart meters is a key enabler of the smart grid”.

Possible use cases include:

  1. Load Forecasting
  2. Fault Detection
  3. Demand Response
  4. Distribution Automation
  5. Smart Pricing

Processing and correlating events from smart meters with stream processing is just one IoT use case. You can leverage “Apache Kafka and Apache Flink for many Industrial IoT and Manufacturing 4.0 use cases”.
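
Here is a minimal Python sketch of such a smart meter aggregation: a tumbling-window sum of consumption per meter, which could feed load forecasting or demand response. The topic name, payload format, and window size are illustrative; a production pipeline would use Kafka Streams or Apache Flink for fault-tolerant, stateful windowing.

```python
import json
from collections import defaultdict

from confluent_kafka import Consumer

WINDOW_SECONDS = 900  # 15-minute tumbling windows, illustrative

consumer = Consumer({"bootstrap.servers": "localhost:9092",  # illustrative
                     "group.id": "meter-aggregator",
                     "auto.offset.reset": "earliest"})
consumer.subscribe(["smart-meter-readings"])

# (meter_id, window_start) -> accumulated consumption in kWh
windows = defaultdict(float)

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    reading = json.loads(msg.value())  # e.g. {"meter_id": "m7", "ts": 1700000000, "kwh": 0.12}
    # Assign each reading to its tumbling window by flooring the timestamp.
    window_start = reading["ts"] - (reading["ts"] % WINDOW_SECONDS)
    windows[(reading["meter_id"], window_start)] += reading["kwh"]
    # The per-meter, per-window aggregates feed load forecasting and demand response.
```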

And there is so much more if you expand your thinking from upstream through midstream to downstream applications to “transform the global supply chain with data streaming and IoT“.

Cloud adoption in utilities & energy sector

Accenture points out that 84% use Cloud SaaS solutions and 79% use Cloud PaaS Solutions in the energy & utilities market for various reasons:

  • New approach to IT
  • Incremental adoption
  • Improved scalability, efficiency, agility and security
  • Unlock most business value

This is a general statistic, but it applies to all components of the data-driven enterprise, including data streaming. A company does not just move one specific application to the cloud; that would be counter-intuitive from a cost and security perspective. Hence, most companies start with a hybrid architecture and bring more and more workloads to the public cloud.

The energy & utilities industry applies various trends for enterprise architectures for cost, flexibility, security, and latency reasons. The three major topics I see these days at customers are:

  • Global data streaming
  • Edge computing and hybrid cloud integration
  • OT/IT modernization

Let’s look deeper into some enterprise architectures that leverage data streaming for energy & utilities use cases.

Global data streaming across data centers, clouds and the edge

Energy and utilities require data infrastructure everywhere. While most organizations have a cloud-first strategy, there is no way around running some workloads at the edge outside a data center for cost, security, or latency reasons.

Data streaming is available everywhere:

Apache Kafka in the Shipping Industry for Marine, Oil Transport, Vessel Fleet, Shipping Line, Drones

Data synchronization across environments, regions and clouds is possible with open-source Kafka tools like MirrorMaker. However, this requires additional infrastructure and development/operations efforts. Innovative solutions like Confluent’s Cluster Linking leverage the Kafka protocol for real-time replication. This enables much easier deployments and significantly reduced network traffic.

Edge computing and hybrid cloud integration

Kafka deployments look different depending on where they need to be deployed.

Fully managed serverless offerings like Confluent Cloud are highly recommended in the public cloud to focus on business logic with reduced time-to-market and TCO.

In a private cloud, data center or edge environment, most companies deploy on Kubernetes today to provide a similar cloud-native experience.

Kafka can also be deployed on industrial PCs (IPC) and other industrial hardware. Many use cases exist for data streaming at the edge. Sometimes, a single broker (without high availability) is good enough.

No matter how you deploy data streaming workloads, a key value is the unidirectional or bidirectional synchronization between clusters. Often, only curated and relevant data is sent to the cloud for cost reasons. Also, command & control patterns can start a business process in the cloud and send events to the edge.

Event Streaming for Energy Production Upstream and Midstream at the Edge with a 5G Campus Network and Kafka

OT/IT modernization with data streaming

The energy sector operates many monolithic, inflexible, and closed software and hardware products. This is changing in this decade. OT/IT modernization and the digital transformation require open APIs, flexible scale, and decoupled applications (from different vendors).

Many companies leverage Apache Kafka to build a postmodern data historian to complement or replace existing expensive OT middleware:

Apache Kafka as open scalable Data Historian for IIoT with MQTT and OPC UA

Just to be clear: Kafka and any other IT software like Spark, Flink, Amazon Kinesis, and so on are NOT hard real-time. They cannot be used for safety-critical use cases with deterministic systems like autonomous driving or robotics. That is the domain of C, Rust, or other embedded software.

However, data streaming connects the OT and IT worlds. As part of that, connectivity with robotic systems, intelligent vehicles, and other IoT devices is the norm for improving logistics, integration with ERP and MES, aftersales, etc.

Learn more about this discussion in two articles:

New customer stories for data streaming in the energy & utilities sector

So much innovation is happening in the energy & utilities sector. Automation and digitalization change how utilities monitor infrastructure, build customer relationships, and create completely new business models.

Most energy service providers use a cloud-first approach to improve time-to-market, increase flexibility, and focus on business logic instead of operating IT infrastructure. And elastic scalability gets even more critical with all the growing networks, 5G workloads, autonomous vehicles, drones, and other innovations.

Here are a few customer stories from worldwide energy & utilities organizations:

  • 50hertz: A grid operator modernization of the legacy, monolithic and proprietary SCADA infrastructure to cloud-native microservices and a real-time data fabric powered by data streaming. More details: A cloud-native SCADA System for Industrial IoT built with Apache Kafka.
  • SunPower: Solar solutions across the globe where 6+ million devices in the field send data to the streaming platform. However, sensor data alone is not valuable! Fundamentals for delivering customer value include measurement ingestion, metadata association, storage, and analytics.
  • aedifion: Efficient management of real estate to operate buildings better and meet environmental, social, and corporate governance (ESG) goals. Secure connectivity and reliable data collection are implemented with Confluent Cloud (which replaced the existing MQTT-based pipeline).
  • Ampeers Energy: Decarbonization for real estate. The service provides district management with IoT-based forecasts and optimization, plus local energy usage accounting. The real-time analytics of time-series data is implemented with OPC-UA, Confluent Cloud, and TimescaleDB.
  • Powerledger: Green energy trading with blockchain-based tracking, tracing, and trading of renewable energy from rooftop solar power installations and virtual power plants. Non-fungible tokens (NFTs) represent renewable energy certificates (RECs) in a decentralized rather than the conventional unidirectional market. Confluent Cloud ingests data from smart electricity meters. Learn more: data streaming and blockchain.

Resources to learn more

This blog post is just the starting point. Learn more about data streaming in the energy & utilities industry in the following on-demand webinar recording, the related slide deck, and further resources, including pretty cool lightboard videos about use cases.

On-demand video recording

The video recording explores the energy & utilities industry’s trends and architectures for data streaming. The primary focus is the data streaming case studies. Check out our on-demand recording:

Confluent Video Recording about the Energy Sector

Slides

If you prefer learning from slides, check out the deck used for the above recording.


Case studies and lightboard videos for data streaming in the energy & utilities industry

The state of data streaming for energy & utilities is fascinating. New use cases and case studies come up every month. This includes better data governance across the entire organization, real-time data collection and processing data across hybrid edge and cloud infrastructures, data sharing and B2B partnerships for new business models, and many more scenarios.

We recorded lightboard videos showing the value of data streaming simply and effectively. These five-minute videos explore the business value of data streaming, related architectures, and customer stories. Stay tuned; I will update the links in the next few weeks and publish a separate blog post for each story and lightboard video.

And this is just the beginning. Every month, we will talk about the status of data streaming in a different industry. Manufacturing was the first. Financial services second, then retail, telcos, gaming, and so on… Check out my other blog posts.

Let’s connect on LinkedIn and discuss it! Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter.

The post The State of Data Streaming for Energy & Utilities appeared first on Kai Waehner.

]]>
The State of Data Streaming for Telco https://www.kai-waehner.de/blog/2023/06/02/the-state-of-data-streaming-for-telco-in-2023/ Fri, 02 Jun 2023 05:38:56 +0000 https://www.kai-waehner.de/?p=5437 This blog post explores the state of data streaming for the telco industry. The evolution of telco infrastructure, customer services, and new business models requires real-time end-to-end visibility, fancy mobile apps, and integration with pioneering technologies like 5G for low latency or augmented reality for innovation. Learn about customer stories from Dish Network, British Telecom, Globe Telecom, Swisscom, and more. A complete slide deck and on-demand video recording are included.

The post The State of Data Streaming for Telco appeared first on Kai Waehner.

]]>
This blog post explores the state of data streaming for the telco industry. The evolution of telco infrastructure, customer services, and new business models requires real-time end-to-end visibility, fancy mobile apps, and integration with pioneering technologies like 5G for low latency or augmented reality for innovation. Data streaming allows integrating and correlating data in real-time at any scale to improve most telco workloads.

I look at trends in the telecommunications sector to explore how data streaming helps as a business enabler, including customer stories from Dish Network, British Telecom, Globe Telecom, Swisscom, and more. A complete slide deck and on-demand video recording are included.

The State of Data Streaming for Telco in 2023

The Telco industry is fundamental for growth and innovation across all industries.

The global spending on telecom services is expected to reach 1.595 trillion U.S. dollars by 2024 (Source: Statista, Jul 2022).

Cloud-native infrastructure and digitalization of business processes are critical enablers. 5G network capabilities and telco marketplaces enable entirely new business models.

5G enables new business models

From a presentation by Amdocs / Mavenir:

5G Use Cases with Amdocs and Mavenir

A report from McKinsey & Company says, “74 percent of customers have a positive or neutral feeling about their operators offering different speeds to mobile users with different needs”. The potential for increasing the average revenue per user (ARPU) with 5G use cases is enormous for telcos:

Potential from 5G monetization

Telco marketplace

Many companies across industries are trying to build a marketplace these days. But especially the telecom sector might shine here because of its interface between infrastructure, B2B, partners, and end users for sales and marketing.

tmforum has a few good arguments for why communication service providers (CSPs) should build a marketplace for B2C and B2B2X:

  • Operating the marketplace keeps CSP in control of the relationship with customers
  • A marketplace is a great sales channel for additional revenue
  • Operating the marketplace helps CSPs monetize third-party (over-the-top) content
  • The only other option is to be relegated to connectivity provider
  • Enterprise customers have decided this is their preferred method of engagement
  • CSPs can take a cut of all sales
  • Participating in a marketplace prevents any one company from owning the customer

Data streaming in the telco industry

Adopting trends like network monitoring, personalized sales and cybersecurity is only possible if enterprises in the telco industry can provide and correlate information at the right time in the proper context. Real-time, which means using the information in milliseconds, seconds, or minutes, is almost always better than processing data later (whatever later means):

Real-Time Data Streaming in the Telco Industry

Data streaming combines the power of real-time messaging at any scale with storage for true decoupling, data integration, and data correlation capabilities. Apache Kafka is the de facto standard for data streaming.

“Use Cases for Apache Kafka in Telco” is a good article for starting with an industry-specific point of view on data streaming. “Apache Kafka for Telco-OTT and Media Applications” explores over-the-top B2B scenarios.

Data streaming with the Apache Kafka ecosystem and cloud services are used throughout the supply chain of the telco industry. Search my blog for various articles related to this topic: Search Kai’s blog.

From Telco to TechCo: Next-generation architecture

Deloitte describes the target architecture for telcos very well:

Requirements for the next generation telco architecture

Data streaming provides these characteristics: Open, scalable, reliable, and real-time. This unique combination of capabilities made Apache Kafka so successful and widely adopted.

Kafka decouples applications and is the perfect technology for microservices across a telco’s enterprise architecture. Deloitte’s diagram shows this transition across the entire telecom sector:

Cloud-native Microservices and Data Mesh in the Telecom Sector

This is a massive shift for telcos:

  • From purpose-built hardware to generic hardware and elastic scale
  • From monoliths to decoupled, independent services

Digitalization with modern concepts helps a lot in designing the future of telcos.

Open Digital Architecture (ODA)

tmforum describes Open Digital Architecture (ODA) as follows:

“Open Digital Architecture is a standardized cloud-native enterprise architecture blueprint for all elements of the industry from Communication Service Providers (CSPs), through vendors to system integrators. It accelerates the delivery of next-gen connectivity and beyond – unlocking agility, removing barriers to partnering, and accelerating concept-to-cash.

ODA replaces traditional operations and business support systems (OSS/BSS) with a new approach to building software for the telecoms industry, opening a market for standardized, cloud-native software components, and enabling communication service providers and suppliers to invest in IT for new and differentiated services instead of maintenance and integration.”

Open Digital Architecture (ODA) - Source: tmforum

If you look at the architecture trends and customer stories for data streaming in the next section, you realize that real-time data integration and processing at scale is required to provide most modern use cases in the telecommunications industry.

The telco industry applies various trends for enterprise architectures for cost, flexibility, security, and latency reasons. The three major topics I see these days at customers are:

  • Hybrid architectures with synchronization between edge and cloud in real-time
  • End-to-end network and infrastructure monitoring across multiple layers
  • Proactive service management and context-specific customer interactions

Let’s look deeper into some enterprise architectures that leverage data streaming for telco use cases.

Hybrid 5G architecture with data streaming

Most telcos have a cloud-first strategy to set up modern infrastructure for network monitoring, sales and marketing, loyalty, innovative new OTT services, etc. However, edge computing gets more relevant for use cases like pre-processing for cost reduction, innovative location-based 5G services, and other real-time analytics scenarios:

Hybrid 5G Telco Infrastructure with Data Streaming

Learn about architecture patterns for Apache Kafka that may require multi-cluster solutions and see real-world examples with their specific requirements and trade-offs. That blog explores scenarios such as disaster recovery, aggregation for analytics, cloud migration, mission-critical stretched deployments, and global Kafka.

Edge deployments for data streaming come with their own challenges. In separate blog posts, I covered use cases for Kafka at the edge and provided an infrastructure checklist for edge data streaming.

End-to-end network and infrastructure monitoring

Data streaming enables unifying telemetry data from various sources such as Syslog, TCP, files, REST, and other proprietary application interfaces:

Telemetry Network Monitoring with Data Streaming
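
As a minimal illustration of one such ingestion path (the addresses, port, and topic name are illustrative), the following Python sketch forwards syslog datagrams into a Kafka topic. Production setups would rather use Kafka Connect or dedicated collection agents, but the principle is the same: every telemetry source lands in a unified event stream.

```python
import socket

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # illustrative

# Listen for syslog messages arriving as plain UDP datagrams.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 5140))  # illustrative syslog port

while True:
    data, addr = sock.recvfrom(8192)
    # The source IP as key keeps all events of one network element in order.
    producer.produce("network-telemetry-syslog", key=addr[0], value=data)
    producer.poll(0)  # serve delivery callbacks without blocking
```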

End-to-end visibility into the telco networks allows massive cost reductions. And as a bonus, a better customer experience. For instance, proactive service management tells customers about a network outage:

Proactive Service Management across OSS and BSS

Context-specific sales and digital lifestyle services

Customers expect a great customer experience across devices (like a web browser or mobile app) and human interactions (e.g., in a telco store). Data streaming enables a context-specific omnichannel sales experience by correlating real-time and historical data at the right time in the proper context:

Omnichannel Retail in the Telco Industry with Data Streaming

“Omnichannel Retail and Customer 360 in Real Time with Apache Kafka” goes into more detail. But one thing is clear: most innovative use cases require both historical and real-time data. In summary, correlating historical and real-time information is possible with data streaming out-of-the-box because of the underlying append-only commit log and the replayability of events. A cloud-native Kafka infrastructure with Tiered Storage that separates compute from storage makes such an enterprise architecture more scalable and cost-efficient.
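
Here is a minimal Python sketch of that replayability (all names are illustrative): a new consumer group with a fresh group.id simply starts from the earliest offset, re-reads the retained history to bootstrap its state, and then continues seamlessly with live events.

```python
from confluent_kafka import Consumer

# A fresh group.id with auto.offset.reset=earliest has no committed offsets,
# so it replays the topic's retained history from the beginning.
consumer = Consumer({"bootstrap.servers": "localhost:9092",  # illustrative
                     "group.id": "customer-360-rebuild-v2",  # new group, no offsets yet
                     "auto.offset.reset": "earliest"})
consumer.subscribe(["customer-interactions"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    # Historical events arrive first to bootstrap state; the very same loop
    # then continues seamlessly with live, real-time events.
    event = msg.value()  # placeholder for the actual correlation logic
```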

The article “Fraud Detection with Apache Kafka, KSQL and Apache Flink” explores stream processing for real-time analytics in more detail, shows an example with embedded machine learning, and covers several real-world case studies.

New customer stories for data streaming in the telco industry

So much innovation is happening in the telecom sector. Automation and digitalization change how telcos monitor networks, build customer relationships, and create completely new business models.

Most telecommunication service providers use a cloud-first approach to improve time-to-market, increase flexibility, and focus on business logic instead of operating IT infrastructure. And elastic scalability gets even more critical with all the growing networks and 5G workloads.

Here are a few customer stories from worldwide telecom companies:

  • Dish Network: Cloud-native 5G network with Kafka as the central communications hub between all the infrastructure interfaces and IT applications. The standalone 5G infrastructure, in conjunction with data streaming, enables new business models for customers across industries such as retail, automotive, and energy.
  • Verizon: MEC use cases for low-latency 5G stream processing, such as autonomous drone-in-a-box-based monitoring and inspection solutions or vehicle-to-Everything (V2X).
  • Swisscom: Network monitoring and incident management with real-time data at scale to inform customers about outages, root cause analysis, and much more. The solution relies on Apache Kafka and Apache Druid for real-time analytics use cases.
  • British Telecom (BT): Hybrid multi-cloud data streaming architecture for proactive service management. BT extracts more value from its data and prioritizes real-time information and better customer experiences.
  • Globe Telecom: Industrialization of event streaming for various use cases. Two examples: digital personalized reward points based on customer purchases, and airtime loans that become easier to operationalize in real-time (vs. batch, where the top-up cash is often already spent again).

Resources to learn more

This blog post is just the starting point. Learn more about data streaming in the telco industry in the following on-demand webinar recording, the related slide deck, and further resources, including pretty cool lightboard videos about use cases.

On-demand video recording

The video recording explores the telecom industry’s trends and architectures for data streaming. The primary focus is the data streaming case studies. Check out our on-demand recording:

The State of Data Streaming for Telco in 2023

Slides

If you prefer learning from slides, check out the deck used for the above recording.


Case studies and lightboard videos for data streaming in telco

The state of data streaming for telco is fascinating. New use cases and case studies come up every month. This includes better data governance across the entire organization, real-time data collection and processing data from network infrastructure and mobile apps, data sharing and B2B partnerships with OTT players for new business models, and many more scenarios.

We recorded lightboard videos showing the value of data streaming simply and effectively. These five-minute videos explore the business value of data streaming, related architectures, and customer stories. Stay tuned; I will update the links in the next few weeks and publish a separate blog post for each story and lightboard video.

And this is just the beginning. Every month, we will talk about the status of data streaming in a different industry. Manufacturing was the first. Financial services second, then retail, telcos, gaming, and so on…

Let’s connect on LinkedIn and discuss it! Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter.

The post The State of Data Streaming for Telco appeared first on Kai Waehner.

]]>