OSIsoft PI Archives - Kai Waehner
https://www.kai-waehner.de/blog/category/osisoft-pi/

Modernizing OT Middleware: The Shift to Open Industrial IoT Architectures with Data Streaming
https://www.kai-waehner.de/blog/2025/03/17/modernizing-ot-middleware-the-shift-to-open-industrial-iot-architectures-with-data-streaming/
Mon, 17 Mar 2025 12:45:14 +0000

Legacy OT middleware is struggling to keep up with real-time, scalable, and cloud-native demands. As industries shift toward event-driven architectures, companies are replacing vendor-locked, polling-based systems with Apache Kafka, MQTT, and OPC-UA for seamless OT-IT integration. Kafka serves as the central event backbone, MQTT enables lightweight device communication, and OPC-UA ensures secure industrial data exchange. This approach enhances real-time processing, predictive analytics, and AI-driven automation, reducing costs and unlocking scalable, future-proof architectures.

The post Modernizing OT Middleware: The Shift to Open Industrial IoT Architectures with Data Streaming appeared first on Kai Waehner.

Operational Technology (OT) has traditionally relied on legacy middleware to connect industrial systems, manage data flows, and integrate with enterprise IT. However, these monolithic, proprietary, and expensive middleware solutions struggle to keep up with real-time, scalable, and cloud-native architectures.

Just as mainframe offloading modernized enterprise IT, offloading and replacing legacy OT middleware is the next wave of digital transformation. Companies are shifting from vendor-locked, heavyweight OT middleware to real-time, event-driven architectures using Apache Kafka and Apache Flink—enabling cost efficiency, agility, and seamless edge-to-cloud integration.

This blog explores why and how organizations are replacing traditional OT middleware with data streaming, the benefits of this shift, and architectural patterns for hybrid and edge deployments.

Replacing OT Middleware with Data Streaming using Kafka and Flink for Cloud-Native Industrial IoT with MQTT and OPC-UA

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter, and follow me on LinkedIn or X (formerly Twitter) to stay in touch. And make sure to download my free book about data streaming use cases, including architectures and customer stories for hybrid IT/OT integration scenarios.

Why Replace Legacy OT Middleware?

Industrial environments have long relied on OT middleware like OSIsoft PI, proprietary SCADA systems, and industry-specific data buses. These solutions were designed for polling-based communication, siloed data storage, and batch integration. But today’s real-time, AI-driven, and cloud-native use cases demand more.

Challenges: Proprietary, Monolithic, Expensive

  • High Costs – Licensing, maintenance, and scaling expenses grow exponentially.
  • Proprietary & Rigid – Vendor lock-in restricts flexibility and data sharing.
  • Batch & Polling-Based – Limited ability to process and act on real-time events.
  • Complex Integration – Difficult to connect with cloud and modern IT systems.
  • Limited Scalability – Not built for the massive data volumes of IoT and edge computing.

Just as PLCs are transitioning to virtual PLCs, eliminating hardware constraints and enabling software-defined industrial control, OT middleware is undergoing a similar shift. Moving from monolithic, proprietary middleware to event-driven, streaming architectures with Kafka and Flink allows organizations to scale dynamically, integrate seamlessly with IT, and process industrial data in real time—without vendor lock-in or infrastructure bottlenecks.

Data streaming is NOT a direct replacement for OT middleware, but it serves as the foundation for modernizing industrial data architectures. With Kafka and Flink, enterprises can offload or replace OT middleware to achieve real-time processing, edge-to-cloud integration, and open interoperability.

Event-driven Architecture with Data Streaming using Kafka and Flink in Industrial IoT and Manufacturing

While Kafka and Flink provide real-time, scalable, and event-driven capabilities, last-mile integration with PLCs, sensors, and industrial equipment still requires OT-specific SDKs, open interfaces, or lightweight middleware. This includes support for MQTT, OPC UA or open-source solutions like Apache PLC4X to ensure seamless connectivity with OT systems.
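To make that last-mile integration concrete, the bridging logic between a raw MQTT payload and a Kafka-ready canonical event can be sketched in a few lines. The topic layout, field names, and units below are hypothetical assumptions, and the actual broker connections (MQTT client, Kafka producer) are omitted so the sketch stays self-contained:

```python
import json
from datetime import datetime, timezone

def normalize_mqtt_payload(mqtt_topic: str, payload: bytes) -> dict:
    """Map a raw MQTT sensor message onto a canonical event schema
    suitable for a Kafka topic (field names are illustrative)."""
    raw = json.loads(payload)
    # Hypothetical topic layout: plant/<site>/<line>/<sensor_id>
    _, site, line, sensor_id = mqtt_topic.split("/")
    return {
        "sensor_id": sensor_id,
        "site": site,
        "line": line,
        "value": float(raw["v"]),
        "unit": raw.get("unit", "unknown"),
        # Normalize timestamps to ISO-8601 UTC for downstream IT systems
        "event_time": datetime.fromtimestamp(raw["ts"], tz=timezone.utc).isoformat(),
    }

event = normalize_mqtt_payload(
    "plant/hamburg/line-7/temp-042",
    b'{"v": 74.2, "ts": 1742212000, "unit": "C"}',
)
print(event["sensor_id"], event["value"])
```

In a real deployment this function would sit inside an MQTT-to-Kafka bridge, for example a Kafka Connect MQTT source connector with a transform, or a small consumer/producer service at the edge.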

Apache Kafka: The Backbone of Real-Time OT Data Streaming

Kafka acts as the central nervous system for industrial data to ensure low-latency, scalable, and fault-tolerant event streaming between OT and IT systems.

  • Aggregates and normalizes OT data from sensors, PLCs, SCADA, and edge devices.
  • Bridges OT and IT by integrating with ERP, MES, cloud analytics, and AI/ML platforms.
  • Operates seamlessly in hybrid, multi-cloud, and edge environments, ensuring real-time data flow.
  • Works with open OT standards like MQTT and OPC UA, reducing reliance on proprietary middleware solutions.

And just to be clear: Apache Kafka and similar technologies support “IT real-time” (meaning milliseconds of latency, sometimes with latency spikes). This is NOT hard real-time in the OT sense, as required for embedded systems or safety-critical applications.

Flink powers real-time analytics, complex event processing, and anomaly detection for streaming industrial data.
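The kind of logic Flink applies to these streams can be illustrated with a simple sliding-window detector. A real deployment would use Flink's keyed windows and state backends; this plain-Python stand-in only shows the idea, with made-up window size and threshold values:

```python
from collections import deque
import math

class AnomalyDetector:
    """Sliding-window z-score detector, mimicking a keyed Flink window.
    Window size and threshold are illustrative, not tuned values."""

    def __init__(self, window: int = 20, threshold: float = 3.0):
        self.values = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value: float) -> bool:
        """Return True if the reading deviates strongly from the window."""
        anomalous = False
        if len(self.values) >= 5:  # wait for a minimal baseline
            mean = sum(self.values) / len(self.values)
            var = sum((v - mean) ** 2 for v in self.values) / len(self.values)
            std = math.sqrt(var) or 1e-9  # guard against a flat window
            anomalous = abs(value - mean) / std > self.threshold
        self.values.append(value)
        return anomalous

detector = AnomalyDetector()
readings = [20.0, 20.1, 19.9, 20.0, 20.2, 20.1, 20.0, 45.0]
flags = [detector.observe(r) for r in readings]
print(flags)  # the final 45.0 spike is flagged
```

In Flink the same pattern would be expressed as a keyed stream with a sliding window aggregation, emitting alert events to a dedicated Kafka topic.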

Condition Monitoring and Predictive Maintenance with Data Streaming using Apache Kafka and Flink

By leveraging Kafka and Flink, enterprises can process OT and IT data only once, ensuring a real-time, unified data architecture that eliminates redundant processing across separate systems. This approach enhances operational efficiency, reduces costs, and accelerates digital transformation while still integrating seamlessly with existing industrial protocols and interfaces.

Unifying Operational (OT) and Analytical (IT) Workloads

As industries modernize, a shift-left architecture approach ensures that operational data is not just consumed for real-time operational OT workloads but is also made available for transactional and analytical IT use cases—without unnecessary duplication or transformation overhead.

The Shift-Left Architecture: Bringing Advanced Analytics Closer to Industrial IoT

In traditional architectures, OT data is first collected, processed, and stored in proprietary or siloed middleware systems before being moved later to IT systems for analysis. This delayed, multi-step process leads to inefficiencies, including:

  • High latency between data collection and actionable insights.
  • Redundant data storage and transformations, increasing complexity and cost.
  • Disjointed AI/ML pipelines, where models are trained on outdated, pre-processed data rather than real-time information.

A shift-left approach eliminates these inefficiencies by bringing analytics, AI/ML, and data science closer to the raw, real-time data streams from the OT environments.

Shift Left Architecture with Data Streaming into Data Lake Warehouse Lakehouse

Instead of waiting for batch pipelines to extract and move data for analysis, a modern architecture integrates real-time streaming with open table formats to ensure immediate usability across both operational and analytical workloads.

Open Table Format with Apache Iceberg / Delta Lake for Unified Workloads and Single Storage Layer

By integrating open table formats like Apache Iceberg and Delta Lake, organizations can:

  • Unify operational and analytical workloads to enable both real-time data streaming and batch analytics in a single architecture.
  • Eliminate data silos, ensuring that OT and IT teams access the same high-quality, time-series data without duplication.
  • Ensure schema evolution and ACID transactions to enable robust and flexible long-term data storage and retrieval.
  • Enable real-time and historical analytics, allowing engineers, business users, and AI/ML models to query both fresh and historical data efficiently.
  • Reduce the need for complex ETL pipelines, as data is written once and made available for multiple workloads simultaneously, eliminating the need for the Reverse ETL anti-pattern.

The Result: An Open, Cloud-Native, Future-Proof Data Historian for Industrial IoT

This open, hybrid OT/IT architecture allows organizations to maintain real-time industrial automation and monitoring with Kafka and Flink, while ensuring structured, queryable, and analytics-ready data with Iceberg or Delta Lake. The shift-left approach ensures that data streams remain useful beyond their initial OT function, powering AI-driven automation, predictive maintenance, and business intelligence in near real-time rather than relying on outdated and inconsistent batch processes.

Open and Cloud Native Data Historian in Industrial IoT and Manufacturing with Data Streaming using Apache Kafka and Flink

By adopting this unified, streaming-first architecture to build an open and cloud-native data historian, organizations can:

  • Process data once and make it available for both real-time decisions and long-term analytics.
  • Reduce costs and complexity by eliminating unnecessary data duplication and movement.
  • Improve AI/ML effectiveness by feeding models with real-time, high-fidelity OT data.
  • Ensure compliance and historical traceability without compromising real-time performance.

This approach future-proofs industrial data infrastructures, allowing enterprises to seamlessly integrate IT and OT, while supporting cloud, edge, and hybrid environments for maximum scalability and resilience.

Key Benefits of Offloading OT Middleware to Data Streaming

  • Lower Costs – Reduce licensing fees and maintenance overhead.
  • Real-Time Insights – No more waiting for batch updates; analyze events as they happen.
  • One Unified Data Pipeline – Process data once and make it available for both OT and IT use cases.
  • Edge and Hybrid Cloud Flexibility – Run analytics at the edge, on-premise, or in the cloud.
  • Open Standards & Interoperability – Support MQTT, OPC UA, REST/HTTP, Kafka, and Flink, avoiding vendor lock-in.
  • Scalability & Reliability – Handle massive sensor and machine data streams continuously without performance degradation.

A Step-by-Step Approach: Offloading vs. Replacing OT Middleware with Data Streaming

Companies transitioning from legacy OT middleware can choose between several strategies, leveraging data streaming as an integration and migration platform:

  1. Hybrid Data Streaming
  2. Lift-and-Shift
  3. Full OT Middleware Replacement

1. Hybrid Data Streaming: Process Once for OT and IT

Why?

Traditional OT architectures often duplicate data processing across multiple siloed systems, leading to higher costs, slower insights, and operational inefficiencies. Many enterprises still process data inside expensive legacy OT middleware, only to extract and reprocess it again for IT, analytics, and cloud applications.

A hybrid approach using Kafka and Flink enables organizations to offload processing from legacy middleware while ensuring real-time, scalable, and cost-efficient data streaming across OT, IT, cloud, and edge environments.

Offloading from OT Middleware like OSISoft PI to Data Streaming with Kafka and Flink

How?

Connect to the existing OT middleware via:

  • A Kafka Connector (if available).
  • HTTP APIs, OPC UA, or MQTT for data extraction.
  • Custom integrations for proprietary OT protocols.
  • Lightweight edge processing to pre-filter data before ingestion.

Use Kafka for real-time ingestion, ensuring all OT data is available in a scalable, event-driven pipeline.

Process data once with Flink to:

  • Apply real-time transformations, aggregations, and filtering at scale.
  • Perform predictive analytics and anomaly detection before storing or forwarding data.
  • Enrich OT data with IT context (e.g., adding metadata from ERP or MES).
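As a sketch of that enrichment step, joining an OT reading with ERP/MES context can look as follows. The asset IDs and metadata fields are invented for illustration; in a Flink job the lookup table would typically be broadcast state or a table backed by a compacted Kafka topic:

```python
# Hypothetical ERP/MES reference data; in a Flink job this would be
# broadcast state or a lookup join against a compacted reference topic.
ASSET_METADATA = {
    "pump-17": {
        "plant": "hamburg",
        "maintenance_due": "2025-04-01",
        "cost_center": "CC-440",
    },
}

def enrich(reading: dict, metadata: dict) -> dict:
    """Attach IT context (ERP/MES attributes) to a raw OT reading.
    Unknown assets pass through unchanged."""
    enriched = dict(reading)
    enriched.update(metadata.get(reading["asset_id"], {}))
    return enriched

reading = {"asset_id": "pump-17", "vibration_mm_s": 4.8}
print(enrich(reading, ASSET_METADATA)["cost_center"])
```

The enriched events can then be routed to the destinations listed below without each consumer repeating the lookup.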

Distribute processed data to the right destinations, such as:

  • Time-series databases for historical analysis and monitoring.
  • Enterprise IT systems (ERP, MES, CMMS, BI tools) for decision-making.
  • Cloud analytics and AI platforms for advanced insights.
  • Edge and on-prem applications that need real-time operational intelligence.

Result?

  • Eliminate redundant processing across OT and IT, reducing costs.
  • Real-time data availability for analytics, automation, and AI-driven decision-making.
  • Unified, event-driven architecture that integrates seamlessly with on-premise, edge, hybrid, and cloud environments.
  • Flexibility to migrate OT workloads over time, without disrupting current operations.

By offloading costly data processing from legacy OT middleware, enterprises can modernize their industrial data infrastructure while maintaining interoperability, efficiency, and scalability.

2. Lift-and-Shift: Reduce Costs While Keeping Existing OT Integrations

Why?

Many enterprises rely on legacy OT middleware like OSIsoft PI, proprietary SCADA systems, or industry-specific data hubs for storing and processing industrial data. However, these solutions come with high licensing costs, limited scalability, and an inflexible architecture.

A lift-and-shift approach provides an immediate cost reduction by offloading data ingestion and storage to Apache Kafka while keeping existing integrations intact. This allows organizations to modernize their infrastructure without disrupting current operations.

How?

Use the Strangler Fig design pattern as a gradual modernization approach, where new systems incrementally replace legacy components, reducing risk and ensuring a seamless transition:

Strangler Fig Pattern to Integrate, Migrate, Replace

“The most important reason to consider a strangler fig application over a cut-over rewrite is reduced risk.” (Martin Fowler)

Replace expensive OT middleware for ingestion and storage:

  • Deploy Kafka as a scalable, real-time event backbone to collect and distribute data.
  • Offload sensor, PLC, and SCADA data from OSIsoft PI, legacy brokers, or proprietary middleware.
  • Maintain the connectivity with existing OT applications to prevent workflow disruption.

Streamline OT data processing:

  • Store and distribute data in Kafka instead of proprietary, high-cost middleware storage.
  • Leverage schema-based data governance to ensure compatibility across IT and OT systems.
  • Reduce data duplication by ingesting once and distributing to all required systems.
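Schema-based governance is usually implemented with a schema registry and Avro or Protobuf contracts; the stdlib sketch below only illustrates the underlying principle of rejecting events that violate the agreed contract (the field names are assumptions):

```python
# In production this role is played by a schema registry enforcing Avro
# or Protobuf schemas; this sketch only shows the contract-checking idea.
REQUIRED_FIELDS = {"sensor_id": str, "value": float, "event_time": str}

def validate(event: dict) -> list[str]:
    """Return a list of contract violations (an empty list means valid)."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in event:
            errors.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            errors.append(f"wrong type for {field}")
    return errors

good = {"sensor_id": "temp-042", "value": 74.2, "event_time": "2025-03-17T12:00:00Z"}
bad = {"sensor_id": "temp-042", "value": "74.2"}
print(validate(good), validate(bad))
```

Rejecting malformed events at the ingestion boundary keeps every downstream IT and OT consumer working against the same guaranteed shape.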

Maintain existing IT and analytics integrations:

  • Keep connections to ERP, MES, and BI platforms via Kafka connectors.
  • Continue using existing dashboards and reports while transitioning to modern analytics platforms.
  • Avoid vendor lock-in and enable future migration to cloud or hybrid solutions.

Result?

  • Immediate cost savings by reducing reliance on expensive middleware storage and licensing fees.
  • No disruption to existing workflows, ensuring continued operational efficiency.
  • Scalable, future-ready architecture with the flexibility to expand to edge, cloud, or hybrid environments over time.
  • Real-time data streaming capabilities, paving the way for predictive analytics, AI-driven automation, and IoT-driven optimizations.

A lift-and-shift approach serves as a stepping stone toward full OT modernization, allowing enterprises to gradually transition to a fully event-driven, real-time architecture.

3. Full OT Middleware Replacement: Cloud-Native, Scalable, and Future-Proof

Why?

Legacy OT middleware systems were designed for on-premise, batch-based, and proprietary environments, making them expensive, inflexible, and difficult to scale. As industries embrace cloud-native architectures, edge computing, and real-time analytics, replacing traditional OT middleware with event-driven streaming platforms enables greater flexibility, cost efficiency, and real-time operational intelligence.

A full OT middleware replacement eliminates vendor lock-in, outdated integration methods, and high-maintenance costs while enabling scalable, event-driven data processing that works across edge, on-premise, and cloud environments.

How?

Use Kafka and Flink as the Core Data Streaming Platform

  • Kafka replaces legacy data brokers and middleware storage by handling high-throughput event ingestion and real-time data distribution.
  • Flink provides advanced real-time analytics, anomaly detection, and predictive maintenance capabilities.
  • Process OT and IT data in real-time, eliminating batch-based limitations.

Replace Proprietary Connectors with Lightweight, Open Standards

  • Deploy MQTT or OPC UA gateways to enable seamless communication with sensors, PLCs, SCADA, and industrial controllers.
  • Eliminate complex, costly middleware like OSIsoft PI with low-latency, open-source integration.
  • Leverage Apache PLC4X for industrial protocol connectivity, avoiding proprietary vendor constraints.

Adopt a Cloud-Native, Hybrid, or On-Premise Storage Strategy

  • Store time-series data in scalable, purpose-built databases like InfluxDB or TimescaleDB.
  • Enable real-time query capabilities for monitoring, analytics, and AI-driven automation.
  • Ensure data availability across on-premise infrastructure, hybrid cloud, and multi-cloud deployments.
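Before landing high-frequency readings in such a time-series database, a common pattern is to downsample at the streaming layer. The sketch below aggregates raw (timestamp, value) pairs into per-bucket min/max/avg records, the shape typically written to stores like InfluxDB or TimescaleDB; the bucket size and field names are illustrative:

```python
from collections import defaultdict

def downsample(readings, bucket_seconds=60):
    """Aggregate (unix_ts, value) pairs into per-bucket min/max/avg,
    reducing write volume before a time-series store ingests the data."""
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[ts - ts % bucket_seconds].append(value)  # tumbling window
    return {
        start: {"min": min(vals), "max": max(vals), "avg": sum(vals) / len(vals)}
        for start, vals in buckets.items()
    }

raw = [(0, 10.0), (30, 12.0), (65, 8.0)]  # spans two one-minute buckets
print(downsample(raw))
```

Raw events can still be retained in Kafka (or tiered storage) for replay, while the database only holds the cheaper aggregates.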

Journey from Legacy OT Middleware to Hybrid Cloud

Modernize IT and Business Integrations

  • Enable seamless OT-to-IT integration with ERP, MES, BI, and AI/ML platforms.
  • Stream data directly into cloud-based analytics services, digital twins, and AI models.
  • Build real-time dashboards and event-driven applications for operators, engineers, and business stakeholders.

OT Middleware Integration, Offloading and Replacement with Data Streaming for IoT and IT/OT

Result?

  • Fully event-driven and cloud-native OT architecture that eliminates legacy bottlenecks.
  • Real-time data streaming and processing across all industrial environments.
  • Scalability for high-throughput workloads, supporting edge, hybrid, and multi-cloud use cases.
  • Lower operational costs and reduced maintenance overhead by replacing proprietary, heavyweight OT middleware.
  • Future-ready, open, and extensible architecture built on Kafka, Flink, and industry-standard protocols.

By fully replacing OT middleware, organizations gain real-time visibility, predictive analytics, and scalable industrial automation, unlocking new business value while ensuring seamless IT/OT integration.

Helin is an excellent example of a cloud-native IT/OT data solution powered by Kafka and Flink, focused on real-time data integration and analytics in industrial and operational environments. Its industry focus is on the maritime and energy sectors, but the approach is relevant across all IIoT industries.

Why This Matters: The Future of OT is Real-Time & Open for Data Sharing

The next generation of OT architectures is being built on open standards, real-time streaming, and hybrid cloud.

  • Most new industrial sensors, machines, and control systems are now designed with Kafka, MQTT, and OPC UA compatibility.
  • Modern IT architectures demand event-driven data pipelines for AI, analytics, and automation.
  • Edge and hybrid computing require scalable, fault-tolerant, real-time processing.

Industrial IoT Data Streaming Everywhere Edge Hybrid Cloud with Apache Kafka and Flink

Use Kafka Cluster Linking for seamless bi-directional data replication and command and control, ensuring low-latency, high-availability data synchronization across on-premise, edge, and cloud environments.

Enable multi-region and hybrid edge-to-cloud architectures with real-time data mirroring, allowing organizations to maintain data consistency across global deployments while ensuring business continuity and failover capabilities.

It’s Time to Move Beyond Legacy OT Middleware to Open Standards like MQTT, OPC-UA, Kafka

The days of expensive, proprietary, and rigid OT middleware are numbered (at least for new deployments). Industrial enterprises need real-time, scalable, and open architectures to meet the growing demands of automation, predictive maintenance, and industrial IoT. By embracing open IoT and data streaming technologies, companies can seamlessly bridge the gap between Operational Technology (OT) and IT, ensuring efficient, event-driven communication across industrial systems.

MQTT, OPC-UA and Apache Kafka are a match made in heaven for industrial IoT:

  • MQTT enables lightweight, publish-subscribe messaging for industrial sensors and edge devices.
  • OPC-UA provides secure, interoperable communication between industrial control systems and modern applications.
  • Kafka acts as the high-performance event backbone, allowing data from OT systems to be streamed, processed, and analyzed in real time.

Whether lifting and shifting, optimizing hybrid processing, or fully replacing legacy middleware, data streaming is the foundation for the next generation of OT and IT integration. With Kafka at the core, enterprises can decouple systems, enhance scalability, and unlock real-time analytics across the entire industrial landscape.

Stay ahead of the curve! Subscribe to my newsletter for insights into data streaming and connect with me on LinkedIn to continue the conversation. And make sure to download my free book about data streaming use cases and industry success stories.

Apache Kafka for Smart Grid, Utilities and Energy Production
https://www.kai-waehner.de/blog/2021/01/14/apache-kafka-smart-grid-energy-production-edge-iot-oil-gas-green-renewable-sensor-analytics/
Thu, 14 Jan 2021 09:59:09 +0000

The post Apache Kafka for Smart Grid, Utilities and Energy Production appeared first on Kai Waehner.

The energy industry is changing from system-centric to smaller-scale and distributed smart grids and microgrids. A smart grid requires a flexible, scalable, elastic, and reliable cloud-native infrastructure for real-time data integration and processing. This post explores use cases, architectures, and real-world deployments of event streaming with Apache Kafka in the energy industry to implement a smart grid and real-time end-to-end integration.

Smart Grid Energy Production and Distribution with Apache Kafka

Smart Grid – The Energy Production and Distribution of the Future

The energy sector includes companies that are primarily in the business of producing or supplying energy, whether from fossil fuels or renewables.

What is a Smart Grid?

A smart grid is an electrical grid that includes a variety of operation and energy measures, including smart meters, smart appliances, renewable energy resources, and energy-efficient resources. Electronic power conditioning and control of the production and distribution of electricity are important aspects of the smart grid.

The European Union Commission Task Force for Smart Grids defines a smart grid as:

“A Smart Grid is an electricity network that can cost-efficiently integrate the behavior and actions of all users connected to it – generators, consumers and those that do both – to ensure economically efficient, sustainable power system with low losses and high levels of quality and security of supply and safety. A smart grid employs innovative products and services, together with intelligent monitoring, control, communication, and self-healing technologies to:

  1. Better facilitate the connection and operation of generators of all sizes and technologies.
  2. Allow consumers to play a part in optimizing the operation of the system.
  3. Provide consumers with greater information and options for how they use their supply.
  4. Significantly reduce the environmental impact of the whole electricity supply system.
  5. Maintain or even improve the existing high levels of system reliability, quality, and security of supply.
  6. Maintain and improve existing services efficiently.”

Technologies and Evolution of a Smart Grid

The roll-out of smart grid technology also implies a fundamental re-engineering of the electricity services industry, although typical usage of the term focuses on the technical infrastructure. Key smart grid technologies include solar power, smart meters, microgrids, and self-optimizing systems:

Smart Grid - Energy Industry

Requirements for a Smart Grid and Modern Energy Infrastructure

  • Reliability: The smart grid uses technologies such as state estimation which improve fault detection and allow self-healing of the network without the intervention of technicians. This will ensure a more reliable supply of electricity and reduced vulnerability to natural disasters or attack.
  • Flexibility in network topology: Next-generation transmission and distribution infrastructure will be better able to handle possible bidirectional energy flows, allowing for distributed generation such as from photovoltaic panels on building roofs, but also charging to/from the batteries of electric cars, wind turbines, pumped hydroelectric power, the use of fuel cells, and other sources.
  • Efficiency: Numerous contributions to the overall improvement of the efficiency of energy infrastructure are anticipated from the deployment of smart grid technology, in particular including demand-side management, for example, turning off air conditioners during short-term spikes in electricity price, reducing the voltage when possible on distribution lines through Voltage/VAR Optimization (VVO), eliminating truck-rolls for meter reading, and reducing truck-rolls by improved outage management using data from Advanced Metering Infrastructure systems. The overall effect is less redundancy in transmission and distribution lines and greater utilization of generators, leading to lower power prices.
  • Sustainability: The smart grid’s improved flexibility permits greater penetration of highly variable renewable energy sources such as solar power and wind power, even without the addition of energy storage.
  • Market-enabling: The smart grid allows for systematic communication between suppliers (their energy price) and consumers (their willingness-to-pay). It permits both suppliers and consumers to apply more flexible and sophisticated operational strategies.
  • Cybersecurity: Provide a secure infrastructure with encrypted and authenticated communication and real-time anomaly detection at scale across the supply chain.

Architectures with Kafka for a Smart Grid

From a technical perspective, use cases such as load adjustment/load balancing, peak curtailment/leveling, and time-of-use pricing cannot be implemented with the traditional, monolithic software used in the past in the energy industry.

A smart grid requires a cloud-native infrastructure that is flexible, scalable, elastic, and reliable. All of that in combination with real-time data integration and data processing. These requirements explain why more and more energy companies rely heavily on event streaming with Apache Kafka and its ecosystem.

Energy Production and Distribution with a Hybrid Architecture

Many energy companies have a cloud-first strategy. They build new innovative applications in the cloud. Especially in the energy industry, this makes a lot of sense. The need for elastic and scalable data processing is key to success. However, not everything can run in the cloud. Most energy-related use cases require data processing at the edge, too. This is true for both energy production and energy distribution.

Here is an example architecture leveraging Apache Kafka in the cloud and at the edge:

Smart Grid Energy Production and Distribution with Apache Kafka in a Hybrid Architecture

The integration in the cloud requires connecting to modern technologies such as Snowflake data warehouse, Google’s Machine Learning services based on TensorFlow, or 3rd party SaaS services like Salesforce. The edge is different. Connectivity is required for machines, equipment, sensors, PLCs, SCADA systems, and many other systems. Kafka Connect is a perfect, Kafka-native tool to implement these integrations in real-time at scale.

Replication in real-time between the edge sites and the cloud is another important case where tools like MirrorMaker 2 or Confluent’s Cluster Linking fit perfectly.

The continuous processing of data (aka stream processing) is possible with Kafka-native components like Kafka Streams or ksqlDB. Using an external tool such as Apache Flink is also a good fit.

Event Streaming for Energy Production at the Edge with a 5G Campus Network

Kafka at the edge is the new black. Energy production is a great example:

Event Streaming with Apache Kafka for Energy Production and Smart Grid at the Edge with a 5G Campus Network

More details about deploying Kafka at edge sites are described in the post “Building a Smart Factory with Apache Kafka and 5G Campus Networks“.

The edge is often disconnected from the cloud or remote data centers. Mission-critical applications have to run 24/7 in a decoupled way even if the internet connection is not available or not stable:

Energy Production and Smart Grid at the Disconnected Edge with Apache Kafka

Example: Real-Time Outage Management with Kafka in the Utilities Sector

Let’s take a look at an example implemented in the utilities sector: continuous processing of smart meter sensor data with Kafka and ksqlDB:

Smart Meters - High Frequency Noise Filter with Apache Kafka

The preprocessing and filtering happens at the edge, as shown in the above picture. However, the aggregation and monitoring of all the assets of the smart grid (including smart home, smart buildings, powerlines, switches, etc.) happen in the cloud:

Cloud Aggregator for Field Management and Smart Grid with Apache Kafka
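A minimal version of the edge-side pre-filter in this example can be sketched as a deadband filter: a smart-meter reading is only forwarded when it has moved more than a configured threshold since the last forwarded value. The deadband value is an assumption, and a ksqlDB implementation would express the same logic as a streaming query:

```python
def deadband_filter(readings, deadband=0.5):
    """Forward a smart-meter reading only when it moved more than the
    deadband since the last forwarded value -- a typical edge-side
    noise filter before publishing to Kafka (deadband is illustrative)."""
    forwarded, last = [], None
    for value in readings:
        if last is None or abs(value - last) > deadband:
            forwarded.append(value)
            last = value
    return forwarded

noisy_voltages = [230.0, 230.1, 229.9, 231.2, 231.3, 228.0]
print(deadband_filter(noisy_voltages))
```

Filtering at the edge like this drastically reduces the data volume replicated to the cloud aggregator, while the cloud side still sees every meaningful change.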


Real-time data processing is not just important for operations. Huge added value comes from the improved customer experience. For instance, the outage management operations tool can alert a customer in real-time:

Real-Time Outage Management for a Better Customer Experience with Apache Kafka

Let’s now take a look at a few real-world examples of energy-related use cases.

Kafka Real-World Deployments in the Energy Sector

This section explores three very different deployments and architectures of Kafka-based infrastructure to build smart grids and energy production-related use cases: EON, WPX Energy, and Tesla.

EON – Smart Grid for Energy Production and Distribution with Kafka

The EON Streaming Platform is built on the following paradigms to provide a cloud-native smart grid infrastructure:

  • IoT scale capabilities of public cloud providers
  • EON microservices that are independent of cloud providers
  • Real-time data integration and processing powered by Apache Kafka

Kafka at EON Cloud Streaming Platform

WPX Energy – Kafka at the Edge for Integration and Analytics

WPX Energy (now merged into Devon Energy) is a company in the oil and gas industry. Digital transformation creates many opportunities to improve processes and reduce costs in this vertical. WPX leverages Confluent Platform on Hivecell edge hardware to realize edge processing and replication to the cloud in real time at scale.

Concept of automation in oil and gas industry

The solution is designed for true real-time decision-making and future closed-loop control optimization. WPX conducts edge stream processing to enable true real-time decision-making at the well site. They also replicate business-relevant data streams produced by machine learning models and analytically preprocessed data from the well site to the cloud, enabling WPX to harness the full power of its real-time events.

Kafka deployments at the edge (i.e., outside a data center) come up more and more in the energy industry, but also in factories, restaurants, retail stores, banks, and hospitals.

Tesla – Kafka-based Data Platform for Trillions of Data Points per Day

Tesla is not just a car maker. Tesla is a tech company writing a lot of innovative and cutting-edge software. They provide an energy infrastructure for cars with their Tesla Superchargers, solar energy production at their Gigafactories, and much more. Processing and analyzing the data from their smart grids and integrating it with the rest of the IT backend services in real time is a key piece of their success:

Tesla Factory

Tesla has built a Kafka-based data platform infrastructure “to support millions of devices and trillions of data points per day”. Tesla showed an interesting history and evolution of their Kafka usage at a Kafka Summit in 2019:

History of Kafka Usage at Tesla

Kafka + OT Middleware (OSIsoft PI, Siemens MindSphere, et al)

A common and very relevant question is the relation between Apache Kafka and traditional OT middleware such as OSIsoft PI or Siemens MindSphere.

TL;DR: Apache Kafka and OT middleware complement each other. Kafka is NOT an IoT platform. OT middleware is not built for the integration and correlation of OT data with the rest of the enterprise IT. Kafka and OT middleware connect to each other very well. Both sides provide integration options, including REST APIs, native Kafka Connect connectors, and more. Hence, in many cases, enterprises leverage and integrate both technologies instead of choosing just one of them.
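One simple shape such an integration can take is a polling bridge that converts periodic snapshots from the OT middleware's REST API into change events for Kafka. The tag names and payload layout below are hypothetical and do not reflect any real PI or MindSphere API:

```python
def snapshot_to_events(previous: dict, snapshot: dict) -> list[dict]:
    """Turn a polled OT-middleware snapshot (tag -> value) into change
    events, the shape a REST-polling bridge would produce to Kafka.
    Tag names and payload shape are invented for this sketch."""
    return [
        {"tag": tag, "value": value}
        for tag, value in snapshot.items()
        if previous.get(tag) != value  # emit only what actually changed
    ]

prev = {"TANK1.LEVEL": 3.2, "TANK1.TEMP": 55.0}
curr = {"TANK1.LEVEL": 3.4, "TANK1.TEMP": 55.0}
print(snapshot_to_events(prev, curr))
```

Where a native Kafka Connect connector exists for the OT middleware, that is usually preferable to a hand-rolled poller; this sketch just shows why polling-to-event conversion is straightforward when no connector is available.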

Apache Kafka and OT Middleware such as OSIsoft PI or Siemens MindSphere

Please check out the following blogs/slides/videos to understand how Apache Kafka and OT middleware complement each other very well:

Slides: Kafka-based Smart Grid and Energy Use Cases and Architectures

The following slide deck goes into more details about this topic:

The Future of Kafka for the Energy Sector and Smart Grid

Kafka is relevant in many use cases for building an elastic and scalable smart grid infrastructure. Beyond the smart grid, many projects utilize Kafka heavily for customer 360, payment processing, and many other use cases. Check out the various “Use Cases and Architectures for Apache Kafka across Industries“. Energy companies can apply many of these use cases, too.

If you have read this far and wonder what “real-time” actually means in the context of Kafka and OT/IT convergence, please check out the post “Kafka is NOT hard real-time“.

What are your experiences and plans for event streaming in the energy industry? Did you already build a smart grid with Apache Kafka? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

