IoT and Data Streaming with Kafka for a Tolling Traffic System with Dynamic Pricing https://www.kai-waehner.de/blog/2024/11/01/iot-and-data-streaming-with-kafka-for-a-tolling-traffic-system-with-dynamic-pricing/ Fri, 01 Nov 2024

In the rapidly evolving landscape of intelligent traffic systems, innovative software provides real-time processing capabilities, dynamic pricing and new customer experiences, particularly in the domains of tolling, payments and safety inspection. With the increasing complexity of road networks and the need for efficient traffic management, these organizations are embracing cutting-edge technology to revolutionize traffic and logistics systems. This blog post explores success stories from Quarterhill and DKV Mobility providing traffic and payment systems for tolls. Data streaming powered by Apache Kafka has been pivotal in the journey towards building intelligent traffic systems in the cloud.

Intelligent Traffic System for Tolling with Dynamic Pricing and Enforcement with Apache Kafka

Traffic System for Tolls: Use Case, Challenges, and Business Models

Tolling systems are integral to modern infrastructure, providing a mechanism for funding road maintenance and expansion. The primary use case for tolling systems is to efficiently manage and collect tolls from vehicles using roadways. This involves roadside tracking, back-office accounting, and payment processing. However, the implementation of such systems is fraught with challenges.

Use Cases and Business Models for Tolling

Various business models have emerged to provide comprehensive tolling and payment solutions that integrate technology and data-driven strategies to optimize operations and revenue generation:

  1. Roadside Tracking and Data Collection: At the core of modern tolling systems is the integration of IoT devices for roadside tracking. These devices capture essential data, such as vehicle identification, speed, and lane usage. This data is crucial for calculating tolls accurately and in real-time. The business model here involves deploying and maintaining a network of sensors and cameras that ensure seamless data collection across toll points.
  2. Back-Office Accounting and Payment Processing: A robust back-office system is essential for processing toll transactions, managing accounts, and handling payments. This includes integrating with financial institutions for payment processing and ensuring compliance with financial regulations. The business model focuses on providing a secure and efficient platform for managing financial transactions, reducing administrative overhead, and enhancing customer satisfaction through streamlined payment processes.
  3. Dynamic Pricing Models: To optimize revenue and manage traffic flow, tolling systems can implement dynamic pricing models. These models adjust toll rates based on real-time traffic conditions, time of day, and demand. By leveraging data analytics and machine learning, toll operators can predict traffic patterns and set prices that encourage optimal road usage. The business model here involves using data-driven insights to maximize revenue while minimizing congestion and improving the overall driving experience.
  4. Interoperability and Cross-Agency Collaboration: Vehicles often travel across multiple tolling jurisdictions, requiring interoperability between different toll agencies. Business models in this area focus on creating partnerships and agreements that allow for seamless data exchange and revenue sharing. This ensures that tolls are accurately attributed and collected regardless of jurisdiction, enhancing both the user experience and operational efficiency.
  5. Subscription and Membership Models: Some tolling systems offer subscription or membership models that provide users with benefits such as discounted rates, priority access to express lanes, or bundled services. This business model aims to build customer loyalty and generate steady revenue streams by offering value-added services and personalized experiences.
  6. Public-Private Partnerships (PPPs): Many tolling systems are developed and operated through collaborations between the public and private sectors. These partnerships leverage the strengths of both sides, with the public sector providing regulatory oversight and the private sector offering technological expertise and investment. The business model focuses on sharing risks and rewards, ensuring sustainable and efficient tolling operations.

Challenges of Traffic Systems

Intelligent tolling systems pose many challenges for project teams:

  1. Integration with IoT Devices: Tolling systems rely heavily on IoT devices for roadside tracking. These devices generate vast amounts of data that need to be processed in real-time to ensure accurate toll collection.
  2. Interoperability: Ensuring interoperability between different systems is crucial, as vehicles cross state lines and use multiple toll agencies.
  3. Data Management: Managing and processing the data generated by IoT devices and various backend IT systems such as a CRM in a scalable and reliable manner is complex.
  4. Static Pricing: Conventional systems use static toll rates. Implementing innovative revenue-generating use cases such as dynamic pricing on express lanes requires real-time data processing to adjust toll rates based on current traffic conditions.

As you might expect, implementing and deploying intelligent tolling systems requires modern, cloud-native technologies. Conventional data integration and processing solutions, like databases, data lakes, ETL tools, or API platforms, lack the necessary capabilities. Therefore, data streaming becomes essential.

The ability to process data in real-time is crucial for ensuring efficient and accurate toll collection. Data streaming has emerged as a transformative technology that addresses the unique challenges faced by tolling systems, particularly in integrating IoT devices and implementing dynamic pricing models.

Apache Kafka became the de facto standard for data streaming, and Apache Flink is emerging as the standard for stream processing. These technologies help implement tolling use cases:

  1. Real-Time Toll Collection: Tolling systems rely on IoT devices to capture data from vehicles as they pass through toll points. This data includes vehicle identification, time of passage, and lane usage. Real-time processing of this data from the IoT devices is essential to ensure that tolls are accurately calculated and collected without delay (see the producer sketch after this list).
  2. Dynamic Pricing Models: To optimize traffic flow and revenue, tolling systems can implement dynamic pricing models. These models adjust toll rates based on current traffic conditions, time of day, and other factors. Data streaming enables the continuous analysis of traffic data, allowing for real-time adjustments to pricing.
  3. Interoperability Across Agencies: Vehicles often travel across multiple tolling jurisdictions, requiring seamless interoperability between different toll agencies. Data streaming facilitates the real-time exchange of data between agencies, ensuring that tolls are accurately attributed and collected regardless of jurisdiction.
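
To make the ingestion side of these use cases concrete, here is a minimal Python sketch of a roadside producer publishing toll-point events to Kafka with the confluent-kafka client. The broker address, topic, and field names are illustrative assumptions, not details from any real tolling deployment:

```python
# Minimal sketch: publish toll-point events to Kafka (confluent-kafka-python).
# Broker address, topic, and field names are illustrative assumptions.
import json
import time

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def delivery_report(err, msg):
    # Called once per message to confirm delivery or surface an error.
    if err is not None:
        print(f"Delivery failed: {err}")

event = {
    "toll_point_id": "TP-42",      # hypothetical roadside gantry ID
    "vehicle_id": "ABC-1234",      # e.g., from license plate recognition
    "lane": 3,
    "timestamp_ms": int(time.time() * 1000),
}

# Keying by toll point keeps all events from one gantry in the same partition,
# preserving their order for downstream processing.
producer.produce(
    topic="toll-events",
    key=event["toll_point_id"],
    value=json.dumps(event),
    on_delivery=delivery_report,
)
producer.flush()
```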

Data streaming with Kafka and Flink makes a tremendous difference in building a next-generation traffic system:

  • Real-Time Processing: Data streaming technologies like Apache Kafka and Apache Flink enable the real-time processing of data from IoT devices. Kafka acts as the backbone for data ingestion, capturing and storing streams of data from roadside sensors and devices. Flink provides the capability to process and analyze these data streams in real-time, ensuring that tolls are calculated and collected accurately and promptly.
  • Scalability: Tolling systems must handle large volumes of data, especially during peak traffic hours. Kafka’s distributed architecture allows it to scale horizontally, accommodating the growing data demands of expanding traffic networks. This scalability ensures that the system can handle increased data loads without compromising performance.
  • Reliability: Kafka’s robust architecture provides a reliable mechanism for tracking and processing data. It ensures that every message from IoT devices is captured and processed, reducing the risk of errors in toll collection. Kafka’s ability to replay messages also allows for recovery from potential data loss, ensuring data integrity.
  • Flexibility: By decoupling data processing from the underlying infrastructure, data streaming offers the flexibility to adapt to changing business needs. Kafka’s integration capabilities allow it to connect with various data sources and sinks. Flink’s stream processing capabilities enable complex event processing and real-time analytics. This flexibility allows tolling systems to develop and incorporate new technologies and business models as needed.
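
Building on the dynamic pricing use case above, the following sketch consumes these toll events and derives a rate with a deliberately simple congestion rule. A production system would more likely implement this in Apache Flink or Kafka Streams; the thresholds and names here are invented for illustration:

```python
# Minimal sketch: consume toll events and derive a dynamic toll rate.
# The pricing rule and all thresholds are toy assumptions for illustration;
# real deployments would use Flink or Kafka Streams for windowed analytics.
import json
from collections import deque

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "dynamic-pricing",
    "auto.offset.reset": "latest",
})
consumer.subscribe(["toll-events"])

recent = deque(maxlen=1000)  # sliding window of recent passages

def price_for(vehicles_in_window: int) -> float:
    # Toy rule: base rate plus a congestion surcharge.
    base = 2.50
    surcharge = 0.01 * max(0, vehicles_in_window - 500)
    return round(base + surcharge, 2)

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        recent.append(json.loads(msg.value()))
        print(f"Current toll rate: {price_for(len(recent))}")
finally:
    consumer.close()
```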

Quarterhill – Tolling and Enforcement with Dynamic Pricing

Quarterhill is a company that specializes in intelligent traffic systems, focusing on two main areas: tolling and safety/inspection. The company provides comprehensive solutions for managing tolling systems, which include roadside tracking, back-office accounting, and payment processing.

Quarterhill – Intelligent Roadside Enforcement and Compliance
Source: Quarterhill

By integrating IoT devices and leveraging data streaming technologies, Quarterhill optimizes toll collection processes, implements dynamic pricing models, and ensures interoperability across different toll agencies to optimize revenue generation while ensuring smooth traffic flow.

I had the pleasure of joining a panel conversation with Josh LittleSun, VP Delivery at Quarterhill, at Confluent’s Data in Motion Tour Chicago 2024.

Quarterhill’s Product Portfolio

Quarterhill’s product portfolio encompasses a comprehensive range of solutions designed to enhance traffic management and transportation systems. As you can see, many of these products are inherently designed for data streaming.

  • Roadside technologies include tools for congestion charging, performance management, insights and analytics, processing systems, and lane configuration, all aimed at optimizing road usage and efficiency.
  • Commerce and mobility platforms offer analytics, toll interoperability, a mobility marketplace, back-office solutions, and performance management, facilitating seamless transactions and mobility services.
  • Safety and enforcement solutions focus on ensuring compliance and safety for commercial vehicles, with features like maintenance, e-screening, tire anomaly detection, weight compliance, and commercial roadside technologies.
  • Smart Transportation solutions provide multi-modal data and intersection management, improving the coordination and flow of various transportation modes.
  • Data Solutions feature video-based systems, traffic recording systems, in-road sensor systems, and cloud-based solutions, offering advanced data collection and analysis capabilities for informed decision-making and maintenance.

How Quarterhill Built an Intelligent Traffic System in the Cloud with Data Streaming and IoT

Quarterhill’s journey towards building an intelligent traffic system began with the realization that traditional monolithic architectures did not meet the demands of modern tolling systems. The company embarked on a transformation journey, moving from monolith to microservices and adopting data streaming as a core component of their architecture.

Key Components of Quarterhill’s Intelligent Traffic System

  1. Fully Managed Confluent Cloud on GCP: By leveraging Confluent Cloud on Google Cloud Platform (GCP) as its data streaming platform, Quarterhill could focus on solving business problems rather than managing infrastructure. This shift allowed for greater agility and reduced operational overhead.
  2. Data Streaming Instead of Google Pub/Sub: Quarterhill chose data streaming over Google Pub/Sub because it supports various use cases beyond ingestion into the data lake, including real-time processing of transactional workloads and integration with IoT devices.
  3. Direct Connection to the Cloud via MQTT, HTTP, and Connectors: IoT devices connect directly to Kafka using protocols like MQTT and HTTP, while connectors facilitate data integration and processing (see the sketch after this list).
  4. Edge Servers for Data Aggregation: In some cases, edge servers are used to aggregate data before sending it to the cloud. This option optimizes bandwidth usage and ensures low-latency processing.
  5. Consumers: Elastic, BigQuery, Custom Connectors: Data is consumed by various systems, including Elastic for search and analytics, Google BigQuery for data warehousing, and custom connectors for specific use cases.
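
To illustrate point 3 above: in production, Quarterhill relies on managed connectors, but the principle of bridging device traffic from MQTT into Kafka can be sketched in a few lines of Python. Broker addresses and topic names are assumptions, and the callback style follows paho-mqtt 1.x:

```python
# Minimal sketch: forward MQTT messages from roadside devices into Kafka.
# Broker addresses and topic names are assumptions; error handling is omitted.
import paho.mqtt.client as mqtt  # paho-mqtt 1.x callback style
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def on_message(client, userdata, msg):
    # Keep the MQTT topic as the Kafka record key to preserve provenance.
    producer.produce("iot-ingest", key=msg.topic, value=msg.payload)
    producer.poll(0)  # serve delivery callbacks without blocking

mqtt_client = mqtt.Client()
mqtt_client.on_message = on_message
mqtt_client.connect("mqtt.example.local", 1883)
mqtt_client.subscribe("tolling/+/events")  # '+' matches one topic level
mqtt_client.loop_forever()
```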

Benefits Realized with a Fully Managed Data Streaming Platform

  • Elasticity and high throughput: The ability to scale with traffic volume ensures that tolling systems can handle peak loads without degradation in performance.
  • Resiliency and accuracy: The reliability of data streaming ensures that toll collection is accurate and resilient to failures.
  • Cost savings and efficiency: By moving to a fully managed cloud solution for the data streaming platform with Confluent Cloud, Quarterhill achieved significant cost savings (TCO) and reduced the demand for in-house resources.

DKV Mobility: On-the-Road Payments and Solutions

DKV Mobility stands as a leading European B2B platform specializing in on-the-road payments and solutions. With a robust customer base of over 300,000 active clients across more than 50 service countries, DKV Mobility has revolutionized the way businesses manage their on-the-road expenses. The platform enables real-time payments and transaction processing, providing valuable insights for businesses on the move.

DKV Mobility - On-The-Road Payments and Solutions with Confluent Cloud and Kafka Streams for Stream Processing
Source: DKV Mobility

DKV Mobility’s comprehensive services cover a wide range of needs, including refueling, electric vehicle (EV) charging, toll solutions, and vehicle services. The platform supports approximately 468,000 EV charge points, 63,000 fuel service stations, and 30,000 vehicle service stations, ensuring that businesses have access to essential services wherever they operate. Through its innovative solutions, DKV Mobility enhances operational efficiency and cost management for businesses across Europe.

If you are interested in how DKV Mobility transitioned from open source Kafka to fully managed SaaS and how they leverage stream processing with Kafka Streams, check out the DKV Mobility success story.

IoT Connectivity with MQTT and HTTP + Data Streaming with Apache Kafka = Next Generation Traffic System

Quarterhill’s intelligent traffic system for tolls and DKV Mobility’s real-time on-the-road payment solution exemplify the transformative power of a data streaming platform using Apache Kafka in modern infrastructure to solve specific business problems. Related scenarios, such as logistics and supply chain, can benefit from such a foundation and connect to existing data products for new business models or B2B data exchanges with partners.

By embracing a cloud-native, microservices-based architecture, Quarterhill and DKV Mobility have not only overcome the challenges of traditional tolling and payment systems but have also set a new standard for efficiency and innovation in the industry. Use cases such as IoT sensor integration and dynamic pricing are only possible with data streaming.

As these companies continue to leverage stream processing with Kafka Streams and explore new technologies like Apache Flink and data governance solutions, the future of intelligent traffic systems looks promising. The potential to further enhance safety, payment efficiency, customer experiences, and revenue generation on roadways is huge.

How do you leverage data streaming in your enterprise architecture? How do you connect to IoT interfaces? What is your data processing strategy? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

Apache Kafka in Manufacturing at Automotive Supplier Brose for Industrial IoT Use Cases https://www.kai-waehner.de/blog/2024/06/13/apache-kafka-in-manufacturing-at-automotive-supplier-brose-for-industrial-iot-use-cases/ Thu, 13 Jun 2024

Data streaming unifies OT/IT workloads by connecting information from sensors, PLCs, robotics and other manufacturing systems at the edge with business applications and the big data analytics world in the cloud. This blog post explores how the global automotive supplier Brose deploys a hybrid industrial IoT architecture using Apache Kafka in combination with Eclipse Kura, OPC-UA, MuleSoft and SAP.

Data Streaming with Apache Kafka for Industrial IoT in the Automotive Industry at Brose

Data Streaming and Industrial IoT / Industry 4.0

Data streaming with Apache Kafka plays a critical role in Industrial IoT by enabling real-time data ingestion, processing, and analysis from various industrial devices and sensors. Kafka’s high throughput and scalability ensure that it can reliably handle and integrate massive streams of data from IoT devices into analytics platforms for valuable insights. This real-time capability enhances predictive maintenance, operational efficiency, and overall automation in industrial settings.

Here is an exemplary hybrid industrial IoT architecture with data streaming at the edge in the factory and 5G supply chain environments, synchronizing in real-time with business applications and analytics / AI platforms in the cloud.

Brose – A Global Automotive Supplier

Brose is a global automotive supplier headquartered in beautiful Franconia, Bavaria, Germany. The company has a global presence with 70 locations across 25 countries on 5 continents and about 30,000 employees.

Brose specializes in mechatronic systems for vehicle doors, seats, and electric motors. They develop and manufacture innovative products that enhance vehicle comfort, safety, and efficiency, serving major car manufacturers worldwide.

Brose Automotive Supplier Product Portfolio
Source: Brose

Brose’s Hybrid Architecture for Industry 4.0 with Eclipse Kura, OPC UA, Kafka, SAP and MuleSoft

Brose is an excellent example of combining data streaming using Confluent with other technologies, like open source Eclipse Kura and OPC-UA on the OT and edge side, and IT infrastructure and cloud software like SAP, Splunk, SQL Server, AWS Kinesis and MuleSoft:

Brose IoT Architecture with Apache Kafka, Eclipse Kura, OPC UA, SAP and MuleSoft
Source: Brose

Here is how it works according to Sven Matuschzik, Director of IT-Platforms and Databases at Brose:

Regional Kafka on-premise clusters are embedded within the IIoT and production platform, facilitating seamless data flow from the shop floor to the business world in combination with other integration tools. This hybrid IoT streaming architecture connects machines to the IT infrastructure, mastering various technologies, and ensuring zero trust security with micro-segmentation. It manages latencies between sites and central IT, enables two-way communication between machines and the IT world, and maintains high data quality from the shop floor.

For more insights from Brose (and Siemens) about IoT and data streaming with Apache Kafka, listen to the following interactive discussion.

Interactive Discussion with Siemens and Brose about Data Streaming and IoT

Brose and Siemens discussed with me

  • the practical strategies employed by Brose and Siemens to integrate data streaming in IoT for real-time data utilization.
  • the challenges faced by both companies in embracing data streaming, and reveal how they overcame barriers to maximize their potential with a hybrid cloud infrastructure.
  • how these enterprise architectures will be expanded, including real-time data sharing with customers, partners, and suppliers, and the potential impact of artificial intelligence (AI), including GenAI, on data streaming efforts, providing valuable insights to drive business outcomes and operational efficiency.
  • the significance of event-driven architectures and data streaming for enhanced manufacturing processes to improve overall equipment effectiveness (OEE) and seamlessly integrate with existing IT systems like SAP ERP and Salesforce CRM to optimize their operations.

Here is the video recording with Stefan Baer from Siemens and Sven Matuschzik from Brose:

Brose Industrial IoT Webinar with Kafka Confluent
Source: Confluent

Data Streaming with Apache Kafka to Unify Industrial IoT Workloads from Edge to Cloud

Many manufacturers leverage data streaming powered by Apache Kafka to unify the OT/IT world from edge sites like factories to the data center or public cloud for analytics and business applications.

I wrote a lot about data streaming with Apache Kafka and Flink in Industry 4.0, Industrial IoT, and OT/IT modernization in several other articles on this blog.

What does your IoT architecture look like? Do you already use data streaming? What are the use cases and enterprise architecture? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

MQTT Market Trends: Cloud, Unified Namespace, Sparkplug, Kafka Integration https://www.kai-waehner.de/blog/2023/12/08/mqtt-market-trends-for-2024-cloud-unified-namespace-sparkplug-kafka-integration/ Fri, 08 Dec 2023

The lightweight and open IoT messaging protocol MQTT gets adopted more widely across industries. This blog post explores relevant market trends for MQTT: cloud deployments and fully managed services, data governance with unified namespace and Sparkplug B, MQTT vs. OPC-UA debates, and the integration with Apache Kafka for OT/IT data processing in real-time.

MQTT Market Trends for 2024 including Sparkplug Data Governance Kafka Cloud

MQTT Summit in Munich

In December 2023, I attended the Connack MQTT Summit in Munich. HiveMQ sponsored the event. The agenda featured various industry experts whose talks covered industrial IoT deployments, unified namespace, Sparkplug B, security and fleet management, and use cases for Kafka combined with MQTT, like connected vehicles or smart city (my talk).

It was a pleasure to meet many industry peers of the MQTT community, independent consultants, and software vendors. I learned a lot about the adoption of MQTT in the real world, best practices, and a few trade-offs of Sparkplug B. The following sections summarize the MQTT trends I took away from this event, combined with experiences from customer meetings around the world this year.

Special thanks to Kudzai Manditereza of HiveMQ for organizing this great event with many international attendees across industries:

Connack IoT Summit 2023 in Munich organized by HiveMQ

What is MQTT?

MQTT stands for Message Queuing Telemetry Transport. MQTT is a lightweight and open-source messaging protocol designed for small sensors and mobile devices with high-latency or unreliable networks. IBM originally developed MQTT in the late 1990s, and it later became an open standard.

MQTT follows a publish/subscribe model, where devices (or clients) communicate through a central message broker. The key components in MQTT are:

  1. Client: The device or application that connects to the MQTT broker to send or receive messages.
  2. Broker: The central hub that manages the communication between clients. It receives messages from publishing clients and routes them to subscribing clients based on topics.
  3. Topic: A hierarchical string that acts as a label for a message. Clients subscribe to topics to receive messages and publish messages to specific topics.
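
A minimal example makes these three components tangible. The sketch below (Python with the paho-mqtt library, 1.x callback style) connects a client to a broker, subscribes to a topic with a wildcard, and publishes a message. Host and topic names are placeholders:

```python
# Minimal sketch of the MQTT publish/subscribe model (paho-mqtt, 1.x style).
# Host and topic names are placeholders.
import paho.mqtt.client as mqtt

def on_message(client, userdata, msg):
    print(f"{msg.topic}: {msg.payload.decode()}")

client = mqtt.Client()
client.on_message = on_message
client.connect("broker.example.local", 1883)

# '+' is a single-level wildcard: this matches sensors/car1/telemetry, etc.
client.subscribe("sensors/+/telemetry")

# Publish with QoS 1 (at-least-once delivery).
client.publish("sensors/car1/telemetry", '{"speed_kmh": 87}', qos=1)

client.loop_forever()
```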

When to use MQTT?

The publish/subscribe model allows for efficient communication between devices. When a client publishes a message to a specific topic, all other clients subscribed to that topic receive the message. This decouples the sender and receiver, enabling a scalable and flexible communication system.

The MQTT standard is known for its simplicity, low bandwidth usage, and support for unreliable networks. These characteristics make it well-suited for Internet of Things (IoT) applications, where devices often have limited resources and may operate under challenging network conditions. Good MQTT implementations provide a scalable and reliable platform for IoT projects.

MQTT has gained widespread adoption in various industries for IoT deployments, home automation, and other scenarios requiring lightweight and efficient communication.

The following sections discuss four market trends for MQTT. These have a huge impact on adoption and on the decision to choose MQTT:

  1. MQTT in the Public Cloud
  2. Data Governance for MQTT
  3. MQTT vs. OPC-UA Debates
  4. MQTT and Apache Kafka for OT/IT Data Processing

Trend 1: MQTT in the Public Cloud

Most companies have a cloud-first strategy. Go serverless if you can! A focus on business problems, faster time-to-market, and an elastic infrastructure are the consequence.

Mature MQTT cloud services exist. At Confluent, we work a lot with HiveMQ. The combination even provides a fully managed integration between both cloud offerings.

Having said that, not everything can or should go to the (public) cloud. Security, latency and cost often make a deployment in the data center or at the edge (e.g., in a smart factory) the preferred or mandatory option. Hybrid architectures allow the combination of both options for building the most cost-efficient but also reliable and secure IoT infrastructure. I talked about zero-trust and air-gapped environments leveraging unidirectional hardware for the most critical use cases in another blog post.

Automation and Security are the Typical Blockers for Public Cloud

Key for success, especially in hybrid architectures, is automation and fleet management with CI/CD and GitOps for multi-cluster management. Many projects leverage Kubernetes as a cloud-native infrastructure for the edge and private cloud. However, in the public cloud, the first option should always be a fully managed service (if security and other requirements allow it).

Be careful when adopting fully managed MQTT cloud services: Support for MQTT is not always equal across the cloud vendors. Many vendors do not implement the entire protocol, miss features, and impose usage limitations. HiveMQ wrote a great article showing this. The article is a bit outdated (and opinionated, of course, coming from a competing MQTT vendor). But it shows very well how some vendors provide offerings that are far away from a good MQTT cloud solution.

The hardest problem for public cloud adoption of MQTT is security! Double-check the requirements early. Latency, availability or specific features are usually not the problem. The deployment and integration need to be compliant and follow the cloud strategy. As Industrial IoT projects always have to include some kind of edge story, it is a tougher discussion than for sales or marketing projects.

Trend 2: Data Governance for MQTT

Data governance is crucial across the enterprise. From an IoT and MQTT perspective, the two main topics are unified namespace as the concept and Sparkplug B as the technology.

Unified Namespace for Industrial IoT

In the context of Industrial Internet of Things (IIoT), a unified namespace (UNS) typically refers to a standardized and cohesive way of naming and organizing devices, data, and resources within an industrial network or ecosystem. The goal is to provide a consistent naming structure that facilitates interoperability, data sharing, and management of IIoT devices and systems.

The term Unified Namespace (in Industrial IoT) was coined and popularized by Walker Reynolds, an expert and content creator for Industrial IoT.

Concepts of Unified Namespace

Here are some key aspects of a unified namespace in Industrial IoT:

  1. Device Naming: Devices in an IIoT environment may come from various manufacturers and have different functionalities. A unified namespace ensures that devices are named consistently, making it easier for administrators, applications, and other devices to identify and interact with them.
  2. Data Naming and Tagging: IIoT involves the generation and exchange of vast amounts of data. A unified namespace includes standardized naming conventions and tagging mechanisms for data points, variables, or attributes associated with devices. This consistency is crucial for applications that need to access and interpret data across different devices.
  3. Interoperability: A unified namespace promotes interoperability by providing a common framework for devices and systems to communicate. When devices and applications follow the same naming conventions, it becomes easier to integrate new devices into existing systems or replace components without causing disruptions.
  4. Security and Access Control: A well-defined namespace contributes to security by enabling effective access control mechanisms. Security policies can be implemented based on the standardized names and hierarchies, ensuring that only authorized entities can access specific devices or data.
  5. Management and Scalability: In large-scale industrial environments, having a unified namespace simplifies device and resource management. It allows for scalable solutions where new devices can be added or replaced without requiring extensive reconfiguration.
  6. Semantic Interoperability: Beyond just naming, a unified namespace may include semantic definitions and standards. This helps in achieving semantic interoperability, ensuring that devices and systems understand the meaning and context of the data they exchange.

Overall, a unified namespace in Industrial IoT is about establishing a common and standardized structure for naming devices, data, and resources, providing a foundation for efficient, secure, and scalable IIoT deployments. Standards organizations and industry consortia often play a role in developing and promoting these standards to ensure widespread adoption and compatibility across diverse industrial ecosystems.
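
One widely used convention structures the namespace along an ISA-95-style hierarchy (enterprise/site/area/line/cell). The small helper below is a hypothetical illustration of building and validating such topic paths consistently; the chosen levels are a common convention, not a formal specification:

```python
# Hypothetical helper for an ISA-95-style unified namespace topic path.
# The hierarchy levels are a common convention, not a formal specification.
LEVELS = ("enterprise", "site", "area", "line", "cell")

def uns_topic(measurement: str, **hierarchy: str) -> str:
    missing = [level for level in LEVELS if level not in hierarchy]
    if missing:
        raise ValueError(f"missing hierarchy levels: {missing}")
    path = "/".join(hierarchy[level] for level in LEVELS)
    return f"{path}/{measurement}"

# Prints "acme/munich/body-shop/line-7/cell-3/temperature"
print(uns_topic(
    "temperature",
    enterprise="acme", site="munich", area="body-shop",
    line="line-7", cell="cell-3",
))
```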

Sparkplug B: Interoperability and Standardized Communication for MQTT Topics and Payloads

Unified Namespace is the theoretical concept for interoperability. The standardized implementation for payload structure enforcement is Sparkplug B, a specification created at the Eclipse Foundation and later turned into an ISO standard.

Sparkplug B provides a set of conventions for organizing data and defining a common language for devices to exchange information. Here is an example from HiveMQ depicting how a unified namespace makes communication between devices, systems, and sites easier:

HiveMQ Unified Namespace
Source: HiveMQ

Key features of Sparkplug B include:

  1. Payload Structure: Sparkplug B defines a specific format for the payload of MQTT messages. This format includes fields for information such as timestamp, data type, and value. This standardized payload structure ensures that devices can consistently understand and interpret the data being exchanged.
  2. Topic Namespace: The specification defines a standardized topic namespace for MQTT messages. This helps in organizing and categorizing messages, making it easier for devices to discover and subscribe to relevant information.
  3. Birth and Death Certificates: Sparkplug B introduces the concept of “Birth” and “Death” certificates for devices. When a device comes online, it sends a Birth certificate with information about itself. Conversely, when a device goes offline, it sends a Death certificate. This mechanism aids in monitoring the status of devices within the IIoT network.
  4. State Management: The specification includes features for managing the state of devices. Devices can publish their current state, and other devices can subscribe to receive updates. This helps in maintaining a synchronized view of device states across the network.

Sparkplug B is intended to enhance the interoperability, scalability, and efficiency of IIoT deployments by providing a standardized framework for MQTT communication in industrial environments. Its adoption can simplify the integration of diverse devices and systems within an industrial ecosystem, promoting seamless communication and data exchange.
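
To make this more tangible: Sparkplug B fixes the topic namespace to spBv1.0/{group_id}/{message_type}/{edge_node_id}/{device_id}. Real Sparkplug payloads are Protobuf-encoded; the sketch below uses a JSON stand-in purely to show the payload structure (timestamp, sequence number, typed metrics), and all identifiers are invented:

```python
# Sketch of a Sparkplug-B-style DDATA message. Note: real Sparkplug B payloads
# are Protobuf-encoded; JSON is used here only to make the structure visible.
import json
import time

import paho.mqtt.client as mqtt  # paho-mqtt 1.x callback style

# Fixed namespace: spBv1.0/{group_id}/{message_type}/{edge_node_id}/{device_id}
topic = "spBv1.0/PlantA/DDATA/EdgeNode1/Press42"  # identifiers are invented

payload = {
    "timestamp": int(time.time() * 1000),
    "seq": 7,  # sequence number (0-255), incremented per message
    "metrics": [
        {"name": "Hydraulic/Pressure", "datatype": "Double", "value": 182.4},
        {"name": "Cycle/Count", "datatype": "Int64", "value": 10342},
    ],
}

client = mqtt.Client()
client.connect("broker.example.local", 1883)
client.publish(topic, json.dumps(payload), qos=0)  # Sparkplug mandates QoS 0
client.disconnect()
```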

Limitations of Sparkplug B

Sparkplug B has a few limitations, such as:

  • It only supports Quality of Service (QoS) 0, providing at-most-once message delivery guarantees.
  • It limits the structure of topic namespaces.
  • It is very device-centric (but MQTT is for many “things”).

Understand the pros and cons of Sparkplug B. It is perfect for some use cases, but the above limitations are blockers for others. In particular, supporting only QoS 0 is a huge limitation for mission-critical use cases.

Trend 3: MQTT vs. OPC-UA Debates

MQTT has many benefits compared to other industrial protocols. However, OPC-UA is another standard in the IoT space that gets at least as much traction in the market as MQTT. The debate about choosing the right IoT standard is controversial, often led by emotions and opinions, and still absolutely valid to discuss.

OPC-UA (Open Platform Communications Unified Architecture) is a machine-to-machine communication protocol for industrial automation. It enables seamless and secure communication and data exchange between devices and systems in various industrial settings.

OPC UA has become a widely adopted standard in the industrial automation and control domain, providing a foundation for secure and interoperable communication between devices, machines, and systems. Its open nature and support from industry organizations contribute to its widespread use in applications ranging from manufacturing and process control to energy management and more.

If you look at the promises of MQTT and OPC-UA, a lot of overlapping exists:

  • Scalable
  • Reliable
  • Real-time
  • Open
  • Standardized

All of them are true for both standards. Still, trade-offs exist. I won’t start a flame war here. Just search for “MQTT vs. OPC-UA”. You will find many blog posts, articles and videos. Most are very opinionated (and often driven by a vendor). Reality is that the industry adopted both MQTT and OPC-UA widely.

And while the above characteristics might all be true for both standards in general, the details make the difference for specific implementations. For instance, if you try to connect plenty of Siemens S7 PLCs via OPC-UA, you quickly realize that the number of parallel connections does not scale as well as the OPC-UA standard specification suggests.

When to Choose MQTT vs. OPC-UA?

The clear recommendation is to start with the business problem, not the technology. Evaluate both standards and their implementations, supported interfaces, vendors’ cloud services, etc. Then choose the right technology.

Here is what I use as a simplified rule of thumb if you have to start a technical discussion:

  • MQTT: Use cases for connected IoT devices, vehicles, and other interfaces with support for lightweight infrastructure, a large number of connections, and/or bad networks.
  • OPC-UA: Use cases for industrial automation to connect heavy equipment, PLCs, SCADA systems, data historians, etc.

This is just a rule of thumb. And the situation changes. Modern PLCs and other equipment add support for multiple protocols to be more flexible. But, nowadays, you rarely have an option anyway because specific equipment, devices, or vehicles only support one or the other. And you can still be happy: Otherwise, you need to use another IIoT platform to connect to proprietary legacy protocols like Siemens S7, Modbus, et al.

MQTT and OPC-UA Gotchas

A few additional gotchas I realized from various customer conversations around the world in the past quarters:

  • In theory, MQTT and OPC-UA work well together, i.e., MQTT as the underlying transport protocol for OPC-UA. I have not seen this in the real world yet (no statistical evidence, just my personal experience). What I do see is the combination of OPC-UA for the last-mile integration to the PLC and then forwarding the data to other consumers via MQTT – all in a single gateway, usually a proprietary IoT platform.
  • OPC-UA defines many sub-standards for different industries or use cases. In theory, this is great. In practice, I see this more like the WS-* hell in the SOAP/WSDL web service world, where most projects moved to much simpler HTTP/REST architectures. Similarly, most integrations with OPC-UA that I see use simple, custom-coded clients in Java or other programming languages – because the tools don’t support the complex standards.
  • IoT vendors pitch any possible integration scenario in marketing. I am amazed that MQTT and OPC-UA platforms directly integrate with MES and ERP systems like SAP, and with any data warehouse or data lake, like Google BigQuery, Snowflake, or Databricks. But that’s only the theory. Should you really do this? And did you ever try to connect SAP ECC to MQTT or OPC-UA? Good luck from a technical, and even harder, from an organizational perspective. And do you want tight coupling and point-to-point communication between the OT world and the ERP? In most cases, it is better to have a clear separation of concerns between different business units, domains, and use cases. Choose the right tool and enterprise architecture – not just for the POC and first pipeline, but for the entire long-term strategy and vision.

The last point brings me to another growing trend: The combination of MQTT for IoT / OT workloads and data streaming with Apache Kafka for the integration with the IT world.

Trend 4: MQTT and Apache Kafka for OT/IT Data Processing

Contrary to MQTT, Apache Kafka is NOT an IoT platform. Instead, Kafka is an event streaming platform and used as the underpinning of an event-driven architecture for various use cases across industries. It provides a scalable, reliable, and elastic real-time platform for messaging, storage, data integration, and stream processing. Apache Kafka and MQTT are a perfect combination for many IoT use cases.

Manufacturing with MQTT, Sparkplug B, Apache Kafka and SAP ERP for the Smart Factory

Let’s explore the pros and cons of both technologies from the IoT perspective.

Trade-offs of MQTT

MQTT’s pros:

  • Lightweight
  • Built for thousands of connections
  • All programming languages supported
  • Built for poor connectivity / high latency scenarios
  • High scalability and availability (depending on broker implementation)
  • ISO standard
  • Most popular IoT protocol (competing with OPC UA)

MQTT’s cons:

  • Adoption mainly in IoT use cases
  • Only pub/sub, not stream processing
  • No reprocessing of events

Trade-offs of Apache Kafka

Kafka’s pros:

  • Stream processing, not just pub/sub
  • High throughput
  • Large scale
  • High availability
  • Long-term storage and buffering
  • Reprocessing of events
  • Good integration to rest of the enterprise

Kafka’s cons:

  • Not built for tens of thousands of connections
  • Requires stable network and good infrastructure
  • No IoT-specific features like keep alive or last will and testament
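
One differentiator from the pros list deserves a concrete illustration: because Kafka retains the event log, a new consumer group can replay history from the beginning, which a pure pub/sub broker cannot do. A minimal sketch (broker and topic names are placeholders):

```python
# Minimal sketch: replay a topic's retained history with a fresh consumer group.
# Broker and topic names are placeholders.
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "reprocessing-job-1",   # a new group has no committed offsets...
    "auto.offset.reset": "earliest",    # ...so it starts at the oldest record
})
consumer.subscribe(["machine-telemetry"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    # Re-run improved business logic over historical events here.
    print(msg.offset(), msg.value())
```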

Use Cases, Architectures and Case Studies for MQTT and Kafka

I wrote a blog series about MQTT in conjunction with Apache Kafka with many more technical details and real-world case studies across industries.

The first blog post explores the relation between MQTT and Apache Kafka. Afterward, the other four blog posts discuss various use cases, architectures, and reference deployments.

  • Part 1 – Overview: Relation between Kafka and MQTT, pros and cons, architectures
  • Part 2 – Connected Vehicles: MQTT and Kafka in a private cloud on Kubernetes; use case: remote control and command of a car
  • Part 3 – Manufacturing: MQTT and Kafka at the edge in a smart factory; use case: Bidirectional OT-IT integration with Sparkplug B between PLCs, IoT Gateways, Data Historian, MES, ERP, Data Lake, etc.
  • Part 4 – Mobility Services: MQTT and Kafka leveraging serverless cloud infrastructure; use case: Traffic jam prediction service using machine learning
  • Part 5 – Smart City: MQTT at the edge connected to fully-managed Kafka in the public cloud; use case: Intelligent traffic routing by combining and correlating different 1st and 3rd party services

My talk at the MQTT Summit explored various use cases and reference architectures for MQTT and Apache Kafka.

If you have a bad network, tens of thousands of clients, or the need for a lightweight push-based messaging solution, then MQTT is the right choice. Otherwise, Kafka, a powerful event streaming platform, is probably the right choice for real-time messaging, data integration, and data processing. In many IoT use cases, the architecture combines both technologies. And even in the industrial space, various projects use Kafka for use cases like building a cloud-native data historian or real-time condition monitoring and predictive maintenance.

Data Governance for MQTT with Sparkplug and Kafka (and Beyond)

Unified Namespace and its concrete implementation, Sparkplug B, are excellent for data governance in IoT workloads with MQTT. In a similar way, the Schema Registry defines the data contracts for Apache Kafka data pipelines.

Schema Registry should be the foundation of any Kafka project! Data contracts (aka Schemas, similar to Swagger in REST/HTTP APIs) enforce good data quality and interoperability between independent microservices in the Kafka ecosystem. Each business unit and its data products can choose any technology or API. But data sharing with others works only with good (enforced) data quality.
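
As a sketch of what such a data contract looks like in practice with Confluent Schema Registry and Avro (the URL, topic, and field names below are placeholder assumptions):

```python
# Sketch: enforce a data contract with Schema Registry and Avro serialization.
# The URL, topic, and field names are placeholder assumptions.
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import MessageField, SerializationContext

schema_str = """
{
  "type": "record",
  "name": "TollEvent",
  "fields": [
    {"name": "vehicle_id", "type": "string"},
    {"name": "amount", "type": "double"}
  ]
}
"""

sr_client = SchemaRegistryClient({"url": "http://localhost:8081"})
serializer = AvroSerializer(sr_client, schema_str)

producer = Producer({"bootstrap.servers": "localhost:9092"})
event = {"vehicle_id": "ABC-1234", "amount": 3.20}

# Serialization fails fast if the event violates the registered schema.
producer.produce(
    topic="toll-events",
    value=serializer(event, SerializationContext("toll-events", MessageField.VALUE)),
)
producer.flush()
```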

You can see the issue: Each technology uses its own data governance technology. If you add your favorite data lake, you will add another concept, like Apache Iceberg, to define the data tables for analytics storage systems. And that’s okay! Each data governance suite is optimized for its workloads and requirements. Company-wide master data management failed in the last two decades because each software category has different requirements.

Hence, one clear trend I see is an enterprise-wide data governance strategy across the different systems (with technologies like Collibra or Azure Purview). It has open interfaces and integrates with specific data contracts like Sparkplug B for MQTT, Schema Registry for Kafka, Swagger for HTTP/REST applications, or Iceberg for data lakes. Don’t try to solve the entire enterprise-wide data governance strategy with a single technology. It will fail! We have seen this before…

Legacy PLC (S7, Modbus, BACnet, etc.) with MQTT or Kafka?

MQTT and Kafka enable reliable and scalable end-to-end data pipelines between IoT and IT systems. At least, if you can use modern APIs and standards. Most IoT projects today are still brownfield. A lot of legacy PLCs, SCADA systems, and data historians only support proprietary protocols like Siemens S7, Modbus, BACnet, and so on.

Neither MQTT nor Kafka supports these legacy protocols out of the box. Another middleware is required. Usually, enterprises choose a dedicated IoT platform for this. That means more cost, more complexity, and slower projects.

In the Kafka world, Apache PLC4X is a great open source option if you want to build a modern, cloud-native data historian with Kafka. The framework provides integration with many legacy protocols, and it offers a Kafka Connect connector. The main issue is that there is no official vendor support behind it. Companies cannot buy 24/7 support for mission-critical applications, and that is typically a blocker for any industrial deployment.
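
For orientation, such a PLC4X source connector would be deployed through the Kafka Connect REST API. The sketch below shows the mechanics of registering a connector; the connector class and property names are assumptions based on the PLC4X Kafka integration and must be verified against the project documentation:

```python
# Sketch: register a PLC4X source connector via the Kafka Connect REST API.
# The connector class and property names are assumptions -- verify them
# against the Apache PLC4X documentation before use.
import json

import requests

connector = {
    "name": "plc4x-s7-source",
    "config": {
        "connector.class": "org.apache.plc4x.kafka.Plc4xSourceConnector",  # assumed
        "default.topic": "plc-data",                                       # assumed
        "sources": "machineA",                                             # assumed
        "sources.machineA.connectionString": "s7://10.0.0.20",             # assumed
    },
}

response = requests.post(
    "http://localhost:8083/connectors",  # standard Kafka Connect REST endpoint
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector),
)
response.raise_for_status()
```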

As MQTT is only a pub/sub message broker, it cannot help with legacy protocol integration. HiveMQ tries to solve this challenge with a new framework called HiveMQ Edge: a software-based industrial edge protocol converter. It is a young project that is just kicking off. The core is open source. The first supported legacy protocol is Modbus. I think this is an excellent product strategy. I hope the project gets traction and evolves to support many other legacy IIoT technologies to modernize the brownfield shop floor. The project also supports OPC-UA. We will see how much demand that feature creates, too.

MQTT and Sparkplug Adoption Grows Year-By-Year for IoT Use Cases

In the IoT world, MQTT and OPC UA have established themselves as open and platform-independent standards for data exchange in Industrial IoT and Industry 4.0 use cases. Data Streaming with Apache Kafka is the data hub for integrating and processing massive volumes of data at any scale in real-time. “The Trinity of Data Streaming in IoT” explores the combination of MQTT, OPC-UA, and Apache Kafka in more detail.

MQTT adoption grows year by year with the need for more scalable, reliable and open IoT communication between devices, equipment, vehicles, and the IT backend. The sweet spots of MQTT are unreliable networks, lightweight (but reliable and scalable) communication and infrastructure, and connectivity to thousands of things.

Maturing trends like the Unified Namespace with Sparkplug B, fully managed cloud services, and combined usage with Apache Kafka make MQTT one of the most relevant IoT standards across verticals like manufacturing, automotive, aviation, logistics, and smart city.

But don’t get fooled by architecture pictures and theory. For example, most diagrams for MQTT and Sparkplug show integrations with the ERP (e.g., SAP) and Data Lake (e.g., Snowflake). Should you really integrate directly from the OT world into the analytics platform? Most times, the answer is no because of cost, decoupling of business units, legal issues, and other reasons. This is where the combination of MQTT and Kafka (or another integration platform) shines.

How do you use MQTT and Sparkplug today? What are the use cases? Do you combine it with other technologies, like Apache Kafka, for end-to-end integration across the OT/IT pipeline? Let’s connect on LinkedIn and discuss it! Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter.

Data Streaming from Smart Factory to Cloud https://www.kai-waehner.de/blog/2023/05/22/data-streaming-from-smart-factory-to-cloud/ Mon, 22 May 2023

A smart factory organizes itself without human intervention to produce the desired products. Data integration of IoT protocols, data correlation with other standard software like MES or ERP, and sharing data with independent business units for reporting or analytics is crucial for generating business value and improving the OEE. This blog post explores how data streaming powered by Apache Kafka helps connect and move data to the cloud at scale in real-time, including a case study from BMW and a simple lightboard video about the related enterprise architecture.

From Smart Factory to Cloud with Data Streaming

The State of Data Streaming for Manufacturing in 2023

The evolution of industrial IoT, manufacturing 4.0, and digitalized B2B and customer relations require modern, open, and scalable information sharing. Data streaming allows integrating and correlating data in real-time at any scale. Trends like software-defined manufacturing and data streaming help modernize and innovate the entire engineering and sales lifecycle.

I have recently presented an overview of trending enterprise architectures in the manufacturing industry and data streaming customer stories from BMW, Mercedes, Michelin, and Siemens. A complete slide deck and an on-demand video recording are available.

This blog post explores one of the enterprise architectures and case studies in more detail: Data streaming between edge infrastructure (like a smart factory) and applications in the data center or public cloud.

What is a Smart Factory? And how does Data Streaming help?

Smart Factory is a term from research in manufacturing technology. It refers to the vision of a production environment in which manufacturing plants and logistics systems primarily organize themselves without human intervention to produce the desired products.

Smart Factory with Automation and Robots at the Shop Floor

The technical basis is cyber-physical systems, i.e., physical manufacturing objects and virtual images in a centralized system. Digital Twins often play a crucial role in smart factories for simulation, engineering, condition monitoring, predictive maintenance, and other tasks.

In the broader context, the Internet of Things (IoT) is the foundation of a smart factory. Communication between the product (e.g., workpiece) and the manufacturing system continues to be part of this future scenario: The product brings its manufacturing information in machine-readable form, e.g., on an RFID chip. This data controls the product’s path through the production system and the individual production steps. Other transmission technologies, such as WLAN, Bluetooth, color coding, or QR codes, are also being experimented with.

Data streaming helps connect high-volume sensor data from machines, PLCs, robots, and other IoT devices. Integrating and pre-processing the events with data streaming is a prerequisite for data correlation with information systems like the MES or ERP (which might run at the edge or, more often, in the cloud). The latter is possible in real-time at scale with stream processing. The de facto standard for data streaming is Apache Kafka and its ecosystem, including Kafka Streams and Kafka Connect.
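
As a toy illustration of such pre-processing, the sketch below aggregates raw sensor events from Kafka into per-machine averages over a tumbling window before they would be correlated with MES or ERP data downstream. In production this logic would typically live in Kafka Streams or Flink; all names and fields are invented:

```python
# Toy sketch: tumbling-window average of machine sensor values from Kafka.
# In production this would run in Kafka Streams or Flink; all names invented.
import json
import time
from collections import defaultdict

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "condition-monitoring",
    "auto.offset.reset": "latest",
})
consumer.subscribe(["machine-sensors"])

WINDOW_SECONDS = 60
window_start = time.time()
sums, counts = defaultdict(float), defaultdict(int)

while True:
    msg = consumer.poll(1.0)
    if msg is not None and not msg.error():
        event = json.loads(msg.value())  # e.g. {"machine": "m1", "temp": 71.3}
        sums[event["machine"]] += event["temp"]
        counts[event["machine"]] += 1
    if time.time() - window_start >= WINDOW_SECONDS:
        for machine, total in sums.items():
            print(machine, round(total / counts[machine], 2))
        sums.clear(); counts.clear()
        window_start = time.time()
```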

BMW Group: Data from 30 Smart Factories Streamed to the Cloud

BMW Group needed to make all data generated by its 30+ production facilities and worldwide sales network available in real-time to anyone across the global business.

The data ingested by BMW from its smart factories into the cloud with data streaming enables simple access to the data for visibility and new automation applications by any business unit.

The Apache Kafka ecosystem facilitates the decoupling between logistics and production systems. Transparent data flows and the flexibility of building innovative new services are possible with this access to events from everywhere in the company.

BMW Smart Factory

Stability is vital in manufacturing across the supply chain, from Tier 1 and Tier 2 suppliers to aftersales and service management. Direct integration from the shop floor to serverless Confluent Cloud on Azure ensures a mission-critical data streaming environment for data pipelines between edge and cloud.

The use case enables reliable data sharing across the logistics and supply chain processes for BMW’s global plants.

Read more about what data streaming enables in BMW’s success story for IoT and cloud-native data streaming.

Lightboard Video: How Data Streaming Connects Smart Factory and Cloud

Here is a five-minute lightboard video that describes how data streaming helps with the integration between production facilities (or any other edge environments) and the cloud:

If you liked this video, make sure to follow the YouTube channel for many more lightboard videos across all industries.

IoT and Edge are not contradictory to Cloud and Data Streaming

The BMW case study shows how you can build reliable real-time synchronization between smart factories and cloud applications. However, there are more options. For more case studies, check out the free “The State of Data Streaming in Manufacturing” on-demand recording or read the related blog post.

MQTT is combined with Kafka regularly if the use case requires supporting bad networks or millions of IoT clients. Another alternative is data streaming at the edge with highly available Kafka clusters on industrial PCs (e.g., for air-gapped environments) or embedded single Kafka brokers (e.g., deployed inside a machine).

Humans are still crucial for the success of a smart factory. Improving the OEE requires a smart combination of software, robots, and people. Augmented Reality leveraging Data Streaming is an excellent example. VR/AR platforms like Unity enable remote services, training, or simulation. Apache Kafka is the foundation for real-time data sharing across these different technologies and interfaces.

How do you leverage data streaming in your manufacturing use cases? Do you deploy at the edge, in the cloud, or both? Let’s connect on LinkedIn and discuss it! Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter.

OPC UA, MQTT, and Apache Kafka – The Trinity of Data Streaming in IoT https://www.kai-waehner.de/blog/2022/02/11/opc-ua-mqtt-apache-kafka-the-trinity-of-data-streaming-in-industrial-iot/ Fri, 11 Feb 2022

In the IoT world, MQTT (Message Queuing Telemetry Transport) and OPC UA (OPC Unified Architecture) have established themselves as open and platform-independent standards for data exchange in Industrial IoT (IIoT) and Industry 4.0 use cases. Data Streaming with Apache Kafka is the data hub for integrating and processing massive volumes of data at any scale in real-time. This blog post explores the relationship between Kafka and the IoT protocols, when to use which technology, and why sometimes HTTP/REST is the better choice. The end explores real-world case studies from Audi and BMW.

The Trinity of Data Streaming in Industrial IoT - Apache Kafka MQTT OPC UA

Industry 4.0: Data streaming platforms increase overall plant effectiveness and connect equipment

Machine data must be transformed and made available across the enterprise as soon as it is generated to extract the most value from it. As a result, operations can avoid critical failures and increase overall plant effectiveness.

Automotive manufacturers such as BMW and Tesla have already recognized the potential of data streaming platforms to get their data moving with the power of the Apache Kafka ecosystem. Let’s explore the benefits of data streaming and how this technology enriches data-driven manufacturing companies.

The goals of increasing digitization and automation in the manufacturing sector are many:

  • Make production processes more efficient
  • Speed up production and reduce costs overall
  • Minimize error rates.

Manufacturers are also striving to increase overall equipment effectiveness (OEE) in their production facilities – from product design and manufacturing to maintenance operations. This confronts them with equally diverse challenges. Industry 4.0, or Industrial IoT (IIoT), means that the amount of data generated daily is increasing and needs to be transported, processed, analyzed, and made available to other systems in near real-time.

Complicating matters further is that legacy IT environments continue to live in today’s manufacturing facilities. This limits manufacturers’ ability to efficiently integrate data across operations. Therefore, most manufacturers require a hybrid data replication and synchronization strategy.

An adaptive manufacturing strategy starts with real-time data

Automation.com published an excellent article explaining the need for real-time processes and monitoring to provide a flexible production line. TL;DR: Processes should be real-time when possible, but real-time is not always possible, even within an application. Think about just-in-time production struggling with supply chain issues caused by the Covid pandemic and the Suez Canal blockage in 2021.

The theory of just-in-time production does not work with supply chain issues! You need to provide flexibility and be able to switch between different approaches:

  • Just-in-time (JIT) vs. make to forecast
  • Fixed vs. variable price contracts
  • Build vs. buy plant capacity
  • Staffed vs. lights-out third shift
  • Linking vs. not linking prices for materials and finished goods

Kappa architecture for a real-time IoT data hub

Real-time production and process monitoring data are essential for success! This evolution is only possible with a real-time Kappa architecture. A Lambda architecture with batch workloads either fails completely or makes things much more complex and costly from an IT infrastructure and OEE perspective.

For clarification, when I speak about real-time, I talk about millisecond latency. This is not hard real-time and deterministic like in safety-critical and embedded environments. The post “Apache Kafka is NOT Hard Real-Time BUT Used Everywhere in Automotive and Industrial IoT” elaborates on this topic.

In IoT, MQTT and OPC UA have established themselves as platform-independent open standards for data exchange. See what the combination of these IoT protocols and Kafka looks like in a smart factory.

When to use Kafka vs. MQTT and OPC UA?

Kafka is a fantastic data streaming platform for messaging, storage, data integration, and data processing in real-time at scale. However, it is not a silver bullet for every problem!

Kafka is NOT…

  • A proxy for millions of clients (like mobile apps) – but Kafka-native proxies (like REST or MQTT) exist for some use cases.
  • An API Management platform – but these tools are usually complementary and used for the creation, life cycle management, or the monetization of Kafka APIs.
  • A database for complex queries and batch analytics workloads – but good enough for transactional queries and relatively simple aggregations (especially with ksqlDB).
  • An IoT platform with features such as device management  – but direct Kafka-native integration with (some) IoT protocols such as MQTT or OPC-UA is possible and the appropriate approach for (some) use cases.
  • A technology for hard real-time applications such as safety-critical or deterministic systems – but that’s true for any other IT framework, too. Embedded systems are different software!

For these reasons, Kafka is complementary, not competitive, to MQTT and OPC UA. Choose the right tool for the job and combine them! I wrote a detailed blog post exploring when NOT to use Apache Kafka. The above was just the summary.

You should also think about this question from the other side to understand when a message broker is not the right choice. For instance, United Manufacturing Hub is an open-source manufacturing data infrastructure that recently migrated from MQTT as messaging infrastructure to Kafka as the central nervous system because of Kafka's storage capabilities, higher throughput, and guaranteed ordering. However, to be clear, this update does not replace MQTT but complements it with Kafka.

Meeting the challenges of Industry 4.0 through data streaming and data mesh

Machine-to-machine communications and the (Industrial) Internet of Things enable automation, data-driven monitoring, and the use of intelligent machines that can, for example, identify defects and vulnerabilities on their own.

For all these scenarios, large volumes of data must be processed in near real-time and made available across plants, companies, and, under certain circumstances, worldwide via a stream data exchange:

Hybrid and Global Apache Kafka and Event Streaming Use Case

This novel design approach is often implemented with Apache Kafka as a decentralized data mesh for data streaming.

The essential requirement here is to integrate various systems, such as edge and IoT devices and business software, and to run them independently of the underlying infrastructure (edge, on-premises, as well as public, multi-, and hybrid cloud).

Therefore, an open, elastic, and flexible architecture is essential to integrate with the legacy environment while taking advantage of modern cloud-native applications.

Event-driven, open, and elastic data streaming platforms such as Apache Kafka serve precisely these requirements. They collect relevant sensor and telemetry data alongside data from information technology systems and process it while it is in motion. That concept is called “data in motion“. This fundamental change differs significantly from processing “data at rest“, meaning you store events in a database and wait until someone looks at them later. The latter is a “too late architecture” in many IoT use cases.

Separation of concerns in the OT/IT world with domain-driven design and true decoupling

Data integration with legacy and modern systems takes place in near real-time – target systems can use relevant data immediately. It doesn’t matter what infrastructure the plant’s IT landscape is built on. Besides the continuous flow of data, the decoupling of systems also allows messages to be stored until the target systems need them.

That feature of true decoupling with backpressure handling and replayability of data is a unique differentiator compared to other messaging systems like RabbitMQ in the IT space or MQTT in the IoT space. Kafka is also highly available and fail-safe, which is critical in the production environment. “Domain-driven design (DDD) with Apache Kafka” dives deeper into this benefit:

Domain Driven Design DDD with Kafka for Industrial IoT MQTT and OPC UA

How to choose between OPC UA and MQTT with Kafka?

Three de facto standards exist for open and standardized IoT architectures: two IoT-specific protocols, plus REST / HTTP as a simple (and often good enough) option. Modern proprietary protocols compete in the space, too:

  • OPC UA (Open Platform Communications Unified Architecture)
  • MQTT (Message Queuing Telemetry Transport)
  • REST / HTTP
  • Proprietary protocols and IoT platforms

These alternatives are a great improvement over the legacy proprietary monolith world of the last decades in the OT/IT and IoT space.

MQTT vs. OPC UA (vs. HTTP vs. Proprietary)

First of all, this discussion is only relevant if you have the choice. If you buy and install a new machine or PLC on your shop floor and that one only offers a specific interface, then you have to use it. However, new software like IoT gateways provides different options to choose from.

How to compare these communication protocols?

Well, frankly, it is challenging, as most literature is opinionated and often includes FUD about the “competing protocols”. Every alternative has its sweet spots. Hence, it is more of an apples-and-oranges comparison.

More or less randomly, I googled “OPC UA vs MQTT” and found the following interesting comparison from Skkynet’s proprietary DataHub Transfer Protocol (DHTP). The vendor pitches its commercial product against the open standards (and added AMQP as an additional alternative):

IIoT protocol comparison

Each comparison on the web differs. The above comparison is valid (and some people will disagree with some points). And sometimes, proprietary solutions are the better choice from a TCO and ROI perspective, too.

Hint: Look at different comparisons. Understand if the publication is related to a specific vendor and standard. Evaluate several solutions and vendors to understand the differences and added value.

Decision tree for evaluating IoT protocols

My recommendation for comparing the different IoT protocols is to use open standards whenever possible. Choose the right tool for the job and combine them in a best-of-breed approach as needed.

Let’s take a look at a simple decision tree to decide between OPC UA, MQTT, HTTP, and other proprietary IIoT protocols (note: this is just a very simplified point of view, and you can build your own opinion with different decisions, of course):

Decision Tree for Industrial IoT - MQTT, OPC UA, HTTP REST

A few notes on the reasoning for how I built this decision tree:

  • HTTP / REST is perfect for simple use cases (keep it as simple as possible). HTTP is supported almost everywhere, well understood, and simple to use. No additional tooling, APIs, or middleware is needed. Communication is synchronous request-response. Conversations with security teams are much easier if you just need to open port 80 or 443 for HTTP(S) instead of the custom TCP ports most other protocols require. HTTP is unidirectional (e.g., a connected car would need to run an HTTP server to get data pushed from the cloud – pub/sub is the right choice instead of HTTP here).
  • MQTT is perfect for intermittent networks with limited bandwidth and/or for connecting tens or hundreds of thousands of devices (e.g., connected car infrastructure). Communication is asynchronous publish/subscribe using an MQTT broker as the middleman. MQTT uses no standard data format, but developers can use Sparkplug as an add-on built for this purpose. MQTT is incredibly lightweight. Features like Quality of Service (QoS) and Last Will and Testament solve many requirements for IoT use cases out-of-the-box (see the publish sketch after this list). MQTT is excellent for IoT use cases and can easily be used for bidirectional communication (e.g., connected cars <–> cloud communication). LoRaWAN and other low-power wide-area networks are great for MQTT, too.
  • OPC UA is perfect for industrial automation (e.g., machines at the production line). Communication is usually client/server today, but publish/subscribe is also supported. It uses standard data formats and provides a rich (= powerful but also complex) set of features, components, and industry-specific data formats. OPC UA is excellent for OT/IT integration scenarios. OPC UA TSN (time-sensitive networking), one optional component, is an Ethernet communication standard that provides open, deterministic, hard real-time communication.
  • Proprietary protocols suit specific problems that standard-based implementations cannot solve similarly. These protocols have various trade-offs. Often powerful and performant, but also expensive and proprietary.
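
As a small illustration of these MQTT features, here is a minimal publish sketch with the Eclipse Paho Java client (broker address, client ID, and topic names are hypothetical):

```java
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttException;
import org.eclipse.paho.client.mqttv3.MqttMessage;

public class SensorPublisher {
    public static void main(String[] args) throws MqttException {
        // Hypothetical broker and client ID - adjust for your environment
        MqttClient client = new MqttClient("tcp://mqtt-broker:1883", "machine-42");

        MqttConnectOptions options = new MqttConnectOptions();
        options.setCleanSession(true);
        // Last Will and Testament: the broker publishes this if the client disconnects ungracefully
        options.setWill("machines/machine-42/status", "offline".getBytes(), 1, true);
        client.connect(options);

        // Publish one sensor reading with QoS 1 (at-least-once delivery)
        MqttMessage message = new MqttMessage("{\"temperature\": 87.5}".getBytes());
        message.setQos(1);
        client.publish("machines/machine-42/temperature", message);

        client.disconnect();
    }
}
```

QoS 1 guarantees at-least-once delivery to the broker, and the Last Will message lets subscribers detect a device that disappeared without a clean disconnect.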

Choosing between OPC UA, MQTT, and other protocols isn’t an either/or decision. Each protocol plays its role and excels at certain use cases. An optimal modern industrial network uses OPC UA and MQTT for modern applications. Both together combine the strengths of each and mitigate their downsides. Legacy applications and proprietary SCADA systems or other data historians are usually integrated with other existing proprietary middleware.

Many IIoT platforms, such as Siemens, OSIsoft, or Inductive Automation, support various modern and legacy protocols. Some smaller vendors focus on a specific sweet spot, like HiveMQ for MQTT or OPC Router for OPC UA.

Integration between MQTT / OPC UA and Kafka

A few integration options between equipment, machines, and devices that support MQTT or OPC UA and Kafka are:

  • Kafka Connect connectors: Native Kafka integration on protocol level. Check Confluent Hub for a few alternatives. Some enterprises built their custom Kafka Connect connectors.
  • Custom integration: Integration via a low-level MQTT / OPC UA API (e.g., using Kafka’s HTTP / REST Proxy) or a Kafka client (e.g., .NET / C++ for Windows environments) – see the bridge sketch after this list.
  • Modern and open 3rd party IoT middleware: Generic open source integration middleware (e.g., Apache Camel with its IoT connectors), IoT-specific frameworks (like Apache PLC4X or Eclipse Ditto), or proprietary 3rd party IoT middleware with open and standards-based APIs
  • Commercial IoT platforms: Best fit for existing historical deployments and glue code with legacy protocols such as Modbus, Siemens S7, et al. Traditional data historians with proprietary protocols, monolithic architectures, limited scalability, and batch ETL work well for these workloads to connect the past with the future of the OT/IT world and to create a bridge between on-premise and cloud. Almost all IoT platforms added connectors for MQTT, OPC UA, and Kafka in the meantime.
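
As a rough illustration of the “custom integration” option above, here is a minimal sketch that subscribes to MQTT topics with the Eclipse Paho client and forwards every message to Kafka (broker addresses and topic names are hypothetical; for production deployments, a Kafka Connect MQTT connector is usually the more robust choice):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttException;

public class MqttToKafkaBridge {
    public static void main(String[] args) throws MqttException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        MqttClient mqtt = new MqttClient("tcp://mqtt-broker:1883", "kafka-bridge");
        mqtt.connect();

        // Subscribe to all machine topics and forward each message to Kafka,
        // keyed by the originating MQTT topic for downstream routing
        mqtt.subscribe("machines/#", (topic, message) ->
            producer.send(new ProducerRecord<>("machine-sensors", topic, new String(message.getPayload()))));
    }
}
```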

OEE scenarios that benefit from data streaming

Data streaming platforms apply in various use cases to increase overall plant effectiveness as the central nervous system. These include connectivity via industry standards such as OPC UA or MQTT, visualization of multiple devices and assets in digital twins, and modern maintenance in the form of condition monitoring and predictive maintenance.

Connectivity to machines and equipment with OPC UA or MQTT

OPC UA and MQTT are not designed for data processing and integration. Instead, their strength is establishing bidirectional “last mile communication” to devices, machines, PLCs, IoT gateways, or vehicles in real-time.

As discussed above, both standards have different “sweet spots” and can also be combined: OPC UA is supported by almost all modern machines, PLCs, and IoT gateways for the smart factory. MQTT is used primarily in poor networks and/or for tens or hundreds of thousands of devices.

These data streams are then streamed into the data streaming platform via connectors. The streaming platform can be deployed either in parallel with an IoT platform ‘at the edge’ or combined with it in hybrid or cloud scenarios.

The data streaming platform is a flexible data hub for data integration and processing between OT and IT applications. Besides OPC UA and MQTT on the OT side, various IT applications such as MES, ERP, CRM, data warehouse, or data lake are connected in real-time, regardless of whether they are operated ‘at the edge’, on-premise, or in the cloud.

Apache Kafka as open scalable Data Historian for IIoT with MQTT and OPC UA

More details: Apache Kafka as Data Historian – an IIoT / Industry 4.0 Real-Time Data Lake.

Digital twins for development and predictive simulation

By continuously streaming, processing, and integrating sensor data, data streaming platforms enable an open, scalable, and highly available infrastructure for deploying Digital Twins.

Digital Twins combine IoT, artificial intelligence, machine learning, and other technologies to create a virtual simulation of, for example, physical components, devices, and processes. They can also consider historical data and update themselves as soon as the data generated by the physical counterpart changes.

Kafka is the leading system in the following digital twin example:

Apache Kafka as Digital Twin for Industry 4 0 and Industrial IoT

Most of the time, Kafka is combined with other technologies to build a digital twin. For instance, Eclipse Ditto is a project combining Kafka with IoT protocols. And some teams built a custom digital twin with Kafka and a database like MongoDB.

The post “IoT Architectures for Digital Twin with Apache Kafka” provides more details about different digital twin architectures.

Industry 4.0 benefits from digital twins, as they allow detailed insight into the lifecycle of the elements they simulate or monitor. For example, product and process optimization can be carried out, individual parts or entire systems can be tested for their functionality and performance, or forecasts can be made about energy consumption and wear and tear.

Condition monitoring and predictive maintenance

For modern maintenance, machine operators mainly ask themselves the following questions: Are all devices functioning as intended? How long will these devices usually function before maintenance work is necessary? What are the causes of anomalies and errors?

On the one hand, Digital Twins can also be used here for monitoring and diagnostics. They correlate current sensor data with historical data, which makes it possible to identify the causes of faults and anticipate maintenance measures.

On the other hand, production facilities can also benefit from data streaming in this area. A prerequisite for Modern Maintenance is a reliable and scalable infrastructure that enables the processing, analysis, and integration of data streams. This allows the detection of critical changes in plants, such as severe temperature fluctuations or vibrations, in near real-time, after which operators can initiate measures to maintain plant effectiveness.

Above all, more efficient predictive maintenance scheduling saves manufacturing companies valuable resources by ensuring equipment and facilities are serviced only when necessary. In addition, operators avoid costly downtime periods when machines are not productive for a while.

Stateless Condition Monitoring and Stateful and Predictive Maintenance with Apache Kafka ksqlDB and TensorFlow

More details: Condition Monitoring and Predictive Maintenance with Apache Kafka.

Connected cars and streaming machine learning

A connected car is a car that can communicate bidirectionally with other systems outside of the vehicle. This allows the car to share internet access and data with other devices and applications inside and outside the car. The possibilities are endless! MQTT in conjunction with Kafka is more or less a de facto standard architecture for connected car use cases and infrastructures.

The following shows how to integrate with tens or hundreds of thousands of IoT devices and process the data in real-time. The demo use case is predictive maintenance (i.e., anomaly detection) in a connected car infrastructure to predict motor engine failures:

Kappa Architecture with Apache Kafka MQTT Kubernetes and Tensorflow for Streaming Machine Learning

The blog post “IoT Live Demo – 100.000 Connected Cars with Kubernetes, Kafka, MQTT, TensorFlow” explores the architecture and implementation in more detail. The source code is available on Github.

BMW case study: Manufacturing 4.0 with smart factory and cloud

I spoke with Felix Böhm, responsible for BMW Plant Digitalization and Cloud Transformation, at our Data in Motion tour in Germany in 2021. We talked about their journey towards data in motion in manufacturing and the use cases and architectures. He also talked to Confluent CEO Jay Kreps at the Kafka Summit EU 2021.

Kafka and OPC UA as real-time data hub between equipment at the edge and applications in the cloud

Let’s explore this BMW success story from a technical perspective.

Decoupled IoT Data and Manufacturing

BMW connects workloads from their global smart factories and replicates them in real-time in the public cloud. The team uses an OPC UA connector to directly communicate with Confluent Cloud in Azure.

Kafka provides decoupling, transparency, and innovation. Confluent adds stability via products and expertise. The latter is critical for success in manufacturing. Each minute of downtime costs a fortune. Read my related article “Apache Kafka as Data Historian – an IIoT / Industry 4.0 Real-Time Data Lake” to understand how Kafka improves the Overall Equipment Effectiveness (OEE) in manufacturing.

Logistics and supply chain in global plants

The discussed use case covered optimized supply chain management in real-time.

The solution provides information about the right stock in place, both physically and in ERP systems like SAP. “Just in time, just in sequence” is crucial for many critical applications.

Things BMW couldn’t do before

  • Get IoT data without interfering with others, and get it to the right place
  • Collect once, process, and consume several times (by different consumers at different times with varying communication paradigms like real-time, batch, and request-response) – see the consumer sketch after this list
  • Enable scalable real-time processing and improve time-to-market with new applications
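
To illustrate the “collect once, consume several times” pattern, here is a minimal Kafka consumer sketch. Each consumer group maintains its own offsets on the same topic, so a real-time dashboard and a batch report can read all events independently (broker address and topic name are hypothetical):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PlantLogisticsConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // Each consumer group gets its own cursor on the same topic:
        // another group (e.g., a nightly batch job) reads all events independently.
        props.put("group.id", "logistics-dashboard");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("auto.offset.reset", "earliest"); // replay history when the group starts fresh

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("plant-iot-events")); // hypothetical topic name
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("machine=%s event=%s%n", record.key(), record.value());
                }
            }
        }
    }
}
```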

The true decoupling between different interfaces is a unique advantage of Kafka vs. other messaging platforms such as IBM MQ, RabbitMQ, or MQTT brokers. I also explored this in my article about Domain-driven Design (DDD) with Kafka.

Check out “Apache Kafka Landscape for Automotive and Manufacturing” for more Kafka architectures and use cases in this industry.

Audi case study – Connected cars for swarm intelligence

Audi has built a connected car infrastructure with Apache Kafka. Their Kafka Summit keynote explored the use cases and architecture.

Use cases include real-time data analysis, swarm intelligence, collaboration with partners, and predictive AI.

Depending on how you define the term and buzzword “Digital Twin“, this is a perfect example: All sensor data from the connected cars are processed in real-time and stored for historical analysis and reporting. Read more about “Kafka for Digital Twin Architectures” here.

I wrote a whole blog series about Apache Kafka and MQTT with many more practical use cases and architectures. Check it out to learn more.

Serverless data streaming enables focusing on IoT business applications and improving OEE

An event-driven data streaming platform is elastic and highly available. It represents an opportunity to increase production facilities’ overall asset effectiveness significantly.

With its data processing and integration capabilities, data streaming complements machine connectivity via MQTT, OPC UA, HTTP, and other protocols. This allows streams of sensor data to be transported throughout the plant and to the cloud in near real-time. This is the basis for the use of Digital Twins as well as Modern Maintenance such as Condition Monitoring and Predictive Maintenance. The increased overall plant effectiveness not only enables manufacturing companies to work more productively and avoid potential disruptions, but also saves time and costs.

I did not talk about operating the infrastructure for data streaming and IoT. TL;DR: Go serverless if you can. That enables you to focus on solving business problems. The BMW example above had exactly this motivation and leverages Confluent Cloud to roll out its smart factory use cases across the globe. “Serverless Kafka” is your best choice for data streaming if connectivity and the network infrastructure allow it in your IoT projects.

Do you use MQTT or OPC UA with Apache Kafka today? What use cases? Or do you rely on the HTTP protocol because it is good enough and simpler to integrate? How do you decide which protocol to choose? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

Condition Monitoring and Predictive Maintenance with Apache Kafka https://www.kai-waehner.de/blog/2021/10/25/apache-kafka-condition-monitoring-predictive-maintenance-industrial-iot-digital-twin/ Mon, 25 Oct 2021 14:46:26 +0000 https://www.kai-waehner.de/?p=3888

The manufacturing industry is moving away from just selling machinery, devices, and other hardware. Software and services increase revenue and margins. A former cost center becomes a profit center for innovation. Equipment-as-a-Service (EaaS) even outsources the maintenance to the vendor. This paradigm shift is only possible with reliable and scalable real-time data processing leveraging an event streaming platform such as Apache Kafka. This post explores how the next generation of software for Condition Monitoring and Predictive Maintenance can help build new innovative products and improve the OEE for customers.

Apache Kafka for Condition Monitoring and Predictive Maintenance in Industrial IoT

Condition Monitoring and Predictive Maintenance

Let’s define the two terms first, as no standard definition exists. Some literature sees condition monitoring as a major component of predictive maintenance. However, others interpret the latter as more modern software leveraging machine learning. Both terms are sometimes used as synonyms, too.

Modern Maintenance Strategies and Goals

The main goal of modern maintenance strategies is a more efficient and optimized usage of machines and resources. Reactive maintenance and time-based or usage-based preventive measures are suboptimal. Therefore, modern condition-based maintenance strategies take over.

Industrial IoT / Industry 4.0 enable several benefits on the shop floor level:

  • Maintain instead of repair
  • No (un)planned downtime
  • Maintenance optimizations and no unnecessary work
  • No negative financial impact
  • Optimized productivity
  • Improved overall equipment effectiveness (OEE)
  • Move from an isolated to a company-wide view

The machine operator is interested in the following questions:

  • Is the machine running normally? (Detect anomalies, classify errors)
  • How long can the engine still run? (Remaining useful life – RUL, time to the first failure)
  • Why does the machine run abnormally? (Sensor monitoring, root cause analysis)

Condition Monitoring and Predictive Maintenance

Condition Monitoring is the process of monitoring a parameter of condition in machinery (vibration, temperature, etc.) to identify a significant change indicative of a developing fault. It is a substantial component of predictive maintenance. Condition monitoring allows maintenance to be scheduled, or other actions to be taken, to prevent consequential damage. It has a unique benefit: It addresses conditions that shorten the expected lifespan before they develop into a major failure.

Predictive maintenance techniques help determine the condition of in-service equipment to estimate when maintenance is necessary. The central promise of predictive maintenance is to allow convenient scheduling of corrective maintenance and prevent unexpected equipment failures.

TL;DR: Both approaches promise cost savings over routine or time-based preventive maintenance because maintenance tasks are only performed when warranted. However, modern maintenance means digitalization. That does not come for free.

Condition monitoring and predictive maintenance only work well if the infrastructure and software are reliable, scalable, and real-time. The main trade-off is a reasonable risk and costs analysis to plan the total cost of ownership (TCO) and return on investment (ROI).

Equipment as a Service (EaaS) as new Business Model

Equipment-as-a-Service (EaaS) is a business model that involves renting out equipment to end-users and collecting periodic subscription payments for using the equipment.

This service-driven business model, also known as Machine-as-a-Service, provides a variety of benefits to both sides:

  • The EaaS provider (OEMs and machine builders) can improve the product design (R&D, digital twin, etc.), plan recurring revenue, and provide predictive maintenance services.
  • The customer (manufacturers) can optimize machine utilization and productivity (with the help of the EaaS software) and reduce the overall cost (moving Capital Expenditures (CapEx) to Operating Expenses (OpEx) and reducing operations costs).

EaaS is only a successful business model if condition monitoring and predictive maintenance are stable 24/7 and continuously collect, process, and analyze incoming data streams.

Apache Kafka for Industrial IoT / Industry 4.0

Apache Kafka is the de facto standard for event streaming. Industrial IoT / Industry 4.0 deployments across the globe use event streaming in edge and hybrid cloud deployments. Here is an example of a smart factory architecture that combines event streaming in the public cloud, factories, and at the edge:

Hybrid Edge to Cloud Architecture for Low Latency with 5G Kafka and AWS Wavelength

Kafka is an information technology (IT). It collects data from operational technology (OT) devices and machines at the edge. Kafka is soft real-time and not suitable for embedded systems or robotics. If you wonder about the relation, read the post “Apache Kafka is NOT Hard Real-Time BUT Used Everywhere in Automotive and Industrial IoT“.

Nevertheless, Kafka is suitable for mission-critical low-latency use cases such as condition monitoring and predictive maintenance where the end-to-end latency is a few milliseconds. Here is an example leveraging 5G together with Kafka and ksqlDB on Kubernetes:

Low Latency 5G Use Cases with AWS Wavelength based on AWS Outposts and Confluent

Data in Motion with Event Streaming and Stream Processing

Condition monitoring and predictive maintenance require an event-based architecture to collect, process, and analyze data in motion. Traditional IIoT platforms are proprietary, inflexible, often not scalable, and hard to integrate across different vendors and various standards. On the contrary, Kafka-native stream processing is an open, flexible, and scalable technology to implement data integration and processing across IoT interfaces.

Let’s look at two examples: Stateless condition monitoring with Kafka Streams and predictive maintenance with ksqlDB and TensorFlow. To be clear: These are just examples. Any other technology can be integrated (with its pros and cons), like Apache Flink for stream processing, cloud-based ML platforms, proprietary IoT edge platforms for the last-mile integration, etc.

Here is the basic setup to build condition monitoring and predictive maintenance with Kafka:

Sensor Events from Machines PLCs Scada IoT

On the left side, we see the Kafka log that stores and forwards events. On the right side, various machines ingest sensor data in real-time. This architecture works at any scale and in real-time. Some Confluent customers leverage Confluent Cloud to process 10GB and more per second.
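
For illustration, here is a minimal sketch of how a gateway application could produce a sensor reading into this setup using the plain Kafka producer API (broker address, topic, and key are assumptions; in practice, the integration usually happens via Kafka Connect, as described below):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SensorIngestion {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.DoubleSerializer");

        try (KafkaProducer<String, Double> producer = new KafkaProducer<>(props)) {
            // One temperature reading, keyed by machine ID so readings per machine stay ordered
            producer.send(new ProducerRecord<>("machine-temperatures", "machine-42", 87.5));
        }
    }
}
```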

The IoT integration between machines, PLCs, sensors, etc., is implemented either with Kafka Connect or via other APIs for MQTT, OPC-UA, REST/HTTP, files, or any other open or proprietary interface. That integration is not the topic of this post; “Kafka and PLC4x for Industrial IoT Integration” and “Kafka as a Modern Data Historian” are great resources to learn more. Let’s now explore the two examples.

Stateless Condition Monitoring with Kafka Streams

The following diagram shows Kafka-native condition monitoring analyzing temperature spikes in real-time:

Stateless Condition Monitoring with Kafka Streams

The example is implemented with Kafka Streams, a Java-based library that can be embedded into any application. The business logic continuously monitors the sensor data. High volumes of data are processed in real-time. However, only relevant events showing temperature spikes over 100 degrees are forwarded to another Kafka topic. Any interested consumer gets it, for instance, a real-time alerting system or a batch report.

The application is stateless. It processes event by event. This capability is already compelling to realize streaming ETL for filtering or transformations. Any complex business logic is also possible within the application.
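
Here is a minimal sketch of what such a stateless Kafka Streams application could look like (topic names, serdes, and the broker address are assumptions for illustration, not taken from the referenced example):

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class ConditionMonitoringApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "condition-monitoring");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();
        // Raw sensor readings: machine ID -> temperature in degrees
        KStream<String, Double> temperatures =
            builder.stream("machine-temperatures", Consumed.with(Serdes.String(), Serdes.Double()));

        // Stateless filter: only forward spikes above 100 degrees to a dedicated topic
        temperatures.filter((machineId, temperature) -> temperature != null && temperature > 100.0)
                    .to("temperature-spikes", Produced.with(Serdes.String(), Serdes.Double()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```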

Stateful Predictive Maintenance with ksqlDB

While stateless stream processing is already powerful, stateful stream processing solves even more business problems. The following example shows how a Kafka-native ksqlDB microservice implements stateful stream processing to detect anomalies continuously:

Stateful Predictive Maintenance with Kafka and ksqlDB

A one-hour sliding window continuously aggregates the temperature spikes from sensors. Consumers use the data in real-time to proactively act on defined thresholds. For instance, the data science team could have analyzed historical data to determine that more than ten temperature spikes with an average of over 100 degrees significantly increase the risk of an outage. In that case, the machine operator is alerted in real-time to do maintenance.
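
The example in the diagram uses ksqlDB. As a hedged sketch, equivalent stateful logic can also be expressed with Kafka Streams sliding windows (topic names and serde wiring are assumptions; the more-than-ten-spikes threshold follows the scenario described above):

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.SlidingWindows;

public class PredictiveMaintenanceTopology {

    public static StreamsBuilder buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();

        // Stream of pre-filtered temperature spikes (machine ID -> temperature)
        KStream<String, Double> spikes =
            builder.stream("temperature-spikes", Consumed.with(Serdes.String(), Serdes.Double()));

        spikes.groupByKey()
              // One-hour sliding window with a short grace period for late events
              .windowedBy(SlidingWindows.ofTimeDifferenceAndGrace(Duration.ofHours(1), Duration.ofMinutes(5)))
              .count()
              .toStream()
              // Threshold derived from historical analysis: more than ten spikes per hour
              .filter((windowedMachineId, spikeCount) -> spikeCount > 10)
              // Strip the window from the key before producing the alert
              .map((windowedMachineId, spikeCount) -> KeyValue.pair(windowedMachineId.key(), spikeCount))
              // Alert consumers (e.g., the maintenance scheduling system) subscribe to this topic
              .to("maintenance-alerts", Produced.with(Serdes.String(), Serdes.Long()));

        return builder;
    }
}
```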

Applied Machine Learning in Real-time with Kafka and TensorFlow

Simple business logic already solves many problems and improves the OEE and maintenance processes. Machine Learning adds additional “magic” to make condition monitoring and predictive maintenance even better.

The great news is that the architecture does not need to change. Analytic models can be embedded into a Kafka application like any other business logic. I have talked a lot about Kafka and Artificial Intelligence (AI) / Machine Learning (ML) / Deep Learning (DL) in the past.

Here is an example with ksqlDB and an embedded TensorFlow model:

Real Time Machine Learning with Kafka KSQL and TensorFlow

A ksqlDB user-defined function (UDF) embeds the model. This model uses an unsupervised autoencoder for anomaly detection in real-time within the Kafka application. Supervised algorithms are possible the same way.
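
For illustration, here is a minimal sketch of how such a ksqlDB UDF is structured (the class, function name, and scoring logic are hypothetical; the actual demo embeds a trained TensorFlow autoencoder where this sketch uses a placeholder):

```java
import io.confluent.ksql.function.udf.Udf;
import io.confluent.ksql.function.udf.UdfDescription;
import io.confluent.ksql.function.udf.UdfParameter;

@UdfDescription(name = "anomaly", description = "Scores a sensor value with an embedded analytic model")
public class AnomalyUdf {

    // The referenced demo loads a trained TensorFlow autoencoder here;
    // this sketch stubs the scoring to stay self-contained.
    @Udf(description = "Returns an anomaly score; higher values indicate anomalies")
    public double anomaly(@UdfParameter("reading") final Double reading) {
        if (reading == null) {
            return 0.0;
        }
        // Placeholder standing in for model inference (e.g., reconstruction error)
        return Math.abs(reading - 100.0);
    }
}
```

A continuous query can then call the function, e.g., SELECT machine_id, anomaly(temperature) FROM sensor_stream EMIT CHANGES; (stream and column names are again hypothetical).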

This architecture intelligently solves the impedance mismatch between the data science team and the production engineers. Data scientists use Python and a Jupyter notebook for rapid prototyping and model development. The production team deploys the ksqlDB query in a cluster for real-time scoring at scale. An excellent Github project implements this separation of concerns with a Kappa architecture for a connected car infrastructure to do predictive maintenance with MQTT and Kafka:

Kappa Architecture with Apache Kafka MQTT Kubernetes and Tensorflow for Streaming Machine Learning

Equipment-as-a-Service with Fully-Managed Kafka

Many manufacturers created a new business model: Equipment-as-a-Service (EaaS). Think about it: Many buyers do not want to operate machines and worry about maintenance. McKinsey published an excellent report about industry trends that shows why manufacturers want to provide machinery and devices as a service and get good margins:

McKinsey Report about Equipment as a Service

EaaS takes over this burden from the buyer. The machine vendor continuously monitors if the engine or other components need maintenance. Late maintenance means an irreparable engine. Early maintenance means higher costs. The solution is to determine the service life of the engine and use optimal maintenance times. Hence, the machine vendor has to provide this subscription maintenance service the best way it can, in its own interest and for a better customer experience.

Many manufacturers use Kafka and event streaming for their next-generation software solutions that run on top of the machinery or in the cloud connecting to it. Many modern IIoT services leverage a fully-managed and truly serverless Kafka solution like Confluent Cloud. The vendors want and need to focus on the business problems, not on operating the infrastructure for event streaming.

Digital Twins play a vital role in this discussion, no matter if you use the buzzword or just the concepts behind it 🙂 Several of my past articles cover fully-managed Kafka for building machine-as-a-service offerings with Digital Twins.

Video Recording – Apache Kafka in Industrial IoT

Here is a video recording walking you through the use case of condition monitoring and predictive maintenance with the Kafka ecosystem:

Event Streaming for Next-Generation IoT Platforms and Equipment Services

This post showed how event streaming with the Kafka ecosystem enables new business models for manufacturers selling machinery. Kafka-native stream processing allows using a single technology for different use cases such as condition monitoring or predictive maintenance. Stateless and stateful streaming analytics is beneficial to make proactive and predictive decisions in real-time at scale. This architecture is possible everywhere: in one or multiple clouds and regions, on-premise in data centers, at the edge outside the data center, or any combination of hybrid architectures.

Of course, other necessary use cases not covered here include integration with ERP and MES systems, like direct connectivity between Kafka and SAP. Also, when you think about condition monitoring and predictive maintenance, not all data comes from sensors and interfaces such as OPC-UA or MQTT. Image, video, and sound processing are part of many scenarios. Kafka can handle large messages (with some trade-offs). Learn how and where this makes sense in a dedicated blog post.

How do you leverage event streaming at the shop floor level for condition monitoring and predictive maintenance? What technologies and architectures do you use? What projects did you already work on or are in the planning? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

Apache Kafka in the Public Sector – Part 2: Smart City https://www.kai-waehner.de/blog/2021/10/12/apache-kafka-public-sector-government-part-2-smart-city-iot-transportation-mobility-services/ Tue, 12 Oct 2021 07:48:48 +0000 https://www.kai-waehner.de/?p=3805

The public sector includes many different areas. Some groups, like the military, leverage cutting-edge technology. Others, like public administration, are years or even decades behind. This blog series explores how the public sector leverages data in motion powered by Apache Kafka to add value for innovative new applications and to modernize legacy IT infrastructures. This post is part 2: Use cases and architectures for a Smart City.

Apache Kafka in the Public Sector for Smart City Infrastructure

Blog series: Apache Kafka in the Public Sector and Government

This blog series explores why many governments and public infrastructure sectors leverage event streaming for various use cases. Learn about real-world deployments and different architectures for Kafka in the public sector:

  1. Life is a Stream of Events
  2. Smart City (THIS POST)
  3. Citizen Services
  4. Energy and Utilities
  5. National Security

Subscribe to my newsletter to get updates immediately after the publication. Besides, I will also update the above list with direct links to this blog series’s posts once published.

As a side note: if you wonder why healthcare is not on the above list, it is because healthcare deserves a blog series of its own. While the government can provide public health care through national healthcare systems, it is part of the private sector in many other cases.

Real-time is Mandatory for a Smart City Everywhere

I wrote a lot about event streaming and Apache Kafka for smart city infrastructure and use cases. I won’t repeat myself here. Check out the posts “Event Streaming with Kafka as Foundation for a Smart City” and “Apache Kafka and MQTT for the Last Mile IoT Integration in a Smart City”.

This post dives deeper into architectural questions and into how collaboration with 3rd party services can look from the perspective of a smart city’s government and public administration.

The Need for Real-time Data Processing Everywhere in a Smart City and how Kafka helps

A smart city is a very complex beast. I am glad that I only cover technology and not regulatory or political discussions. However, even the technology standpoint is not straightforward. A smart city needs to correlate data across data centers, devices, vehicles, and many other things. This scenario is a true Internet of Things (IoT) deployment and therefore includes plenty of different technologies, communication paradigms, and infrastructures:

Hybrid Edge Cloud Architecture for a Smart City with Apache Kafka

Smart city projects require the integration of various 1st party and 3rd party services. Most use cases only work well if that data is correlated in real-time; think about traffic routing, emergency alerts, predictive monitoring and maintenance, mobility services such as ride-hailing, and other fancy smart city use cases. Without real-time data processing, the use case either provides a bad user experience or is not cost-efficient. Hence, Kafka is adopted more and more for these scenarios.

Low Latency and 5G Networks for (some) Data Streaming Use Cases

The term “real-time” needs to be defined. Processing data in a few seconds is good enough in most use cases and a significant game-changer compared to hourly, daily, or weekly batch processing.

Having said this, some use cases like location-based upselling in retail or condition monitoring in equipment and manufacturing require lower latency, meaning sub-second end-to-end data processing.

Here is an example of leveraging 5G networks for low latency. The demo was built by the AWS Wavelength team, Verizon, and Confluent:

Connected Hybrid Services and Low Latency via Open API

Most real-world deployments use separation of concerns: Low-latency use cases run at the edge and everything else in the regular data center or public cloud region. Read the article “Low Latency Data Streaming with Apache Kafka and Cloud-Native 5G Infrastructure” for more details.

At this point, it is important to remind everybody that Kafka (and any IT software) is not hard real-time and not built for the OT world and embedded systems. Learn more in the article “Kafka is NOT hard real-time but soft real-time“. Also, (soft) real-time does not compete with batch processing and data warehouse/data lake architectures. As you can learn in “Serverless Kafka in a Cloud-native Data Lake Architecture“, it is complementary.

Collaboration between Government, City, and 3rd Party via Open API

Real-time data processing is crucial in implementing smart city use cases. Additionally, most smart city projects require collaboration between different teams, infrastructures, and 3rd party services.

Let’s take a look at three very different real-world event streaming deployments to see the broad spectrum of use cases and integration challenges:

  • Ohio Department of Transportation’s government-owned event streaming platform
  • Deutsche Bahn’s single source of truth for customer communication in real-time and 3rd party integration with the Google Maps API
  • Free Now’s mobility service in the cloud for real-time data correlation in compliance with regional laws and independent vehicles/drivers.

Ohio Department of Transportation (ODOT) – A Government-Owned Event Streaming Platform

Ohio Department of Transportation (ODOT) has an exciting initiative: DriveOhio. It aims to organize and accelerate smart vehicle and connected vehicle projects in the State of Ohio. DriveOhio serves as the single point of contact for policymakers, agencies, researchers, and private companies to collaborate with one another on intelligent transportation efforts around the state.

ODOT presented their real-time transportation data platform at the last Kafka Summit Americas:

Apache in Public Sector Government and Smart City at Ohio Department of Transportation

The whole Kafka ecosystem powers ODOT’s cloud-native Event Streaming Platform (ESP). The platform enables continuous data integration and stream processing for transactional and analytical workloads. The ESP runs on Kubernetes to provide an elastic, flexible, and scalable infrastructure for real-time data processing.

Deutsche Bahn – Single Source of Truth and Google Maps Integration in Real-time

Deutsche Bahn is a German railway company. It is a private joint-stock company (AG), with the Federal Republic of Germany being its single shareholder. I already talked about their real-time traveler information system in another blog post: “Mobility Services and Transportation powered by Apache Kafka“.

They leverage the Apache Kafka ecosystem powered by Confluent because it combines several characteristics that you would otherwise have to integrate using different technologies:

  • Real-time messaging
  • Data integration
  • Data correlation
  • Storage and caching
  • Replication and high availability
  • Elastic scalability

This example is excellent for this blog. It shows how an existing solution needs connectivity to other internal applications and 3rd party services to provide a better customer experience and expand the customer base.

Recently, Deutsche Bahn integrated its platform with Google Maps via Google’s Open API. In addition to a better customer experience, the railway company can reach many new end-users and expand its business. Railway-News has a good article about this integration. Here is my summary:

Mobility Service for Traveler Information at Deutsche Bahn with Apache Kafka and Google Maps Integration

Free Now – Mobility Service in the Cloud Connected to Regional Laws and Vehicles

Free Now (formerly MyTaxi) is a mobility service. Their app uses mobile and GPS technology to match taxi drivers with passengers based on availability and proximity. Mobility services need to integrate with other 3rd party services for routing, payment, tax implications, and many different use cases.

Here is one example from Free Now’s Kafka Summit talk where they explain the added value of continuous stream processing for calculating context-specific dynamic pricing:

FREE NOW my taxi Data in Motion with Kafka and Confluent Cloud for Stateful Streaming Analytics

The public administration is always involved when a new mobility service is released to the public. While some cities build their mobility services, the reality is that most governments provide the infrastructure together with the Telco providers, and 3rd party vendors provide the mobility service. The specific relationship between the government, city, and mobility service provider differs across regions, countries, and continents.

Almost every mobility service uses Kafka as its backbone. Google for your favorite mobility service across the globe and add “Kafka” to the search. Chances are very high that you will find some excellent blog posts, conference talks, or at least job offers from the mobility service’s recruiting page. Here are just a few examples that posted great content about their Kafka usage: Uber, Lyft, Grab, Otonomo, Here Technologies, and many more.

Data in Motion with Kafka for a Connected and Innovative Smart City

Smart City is a vast topic. Many stakeholders are involved. Collaboration and Open APIs are critical for success. In most cases, governments work together with telco providers, infrastructure providers such as the cloud hyperscalers, and software vendors (including an event streaming platform like Kafka).

Most valuable and innovative smart city use cases require data processing in real-time. The use cases require data integration, storage, backpressure handling, and data correlation. Event Streaming is the ideal technology for these use cases. Examples from the Ohio Department of Transportation, Deutsche Bahn with its Google Maps integration, and Free Now showed a few different angles to realize successful smart city projects.

How do you leverage event streaming in the public sector? Are you working on smart city projects? What technologies and architectures do you use? What projects did you already work on or are in the planning? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

Apache Kafka for Industrial IoT and Manufacturing 4.0 https://www.kai-waehner.de/blog/2021/05/19/apache-kafka-industrial-iot-manufacturing-4-0-automotive-energy-logistics/ Wed, 19 May 2021 08:47:24 +0000 https://www.kai-waehner.de/?p=3422

This post explores use cases and architectures for processing data in motion with Apache Kafka in Industrial IoT (IIoT) across verticals such as automotive, energy, steel manufacturing, oil & gas, cybersecurity, shipping, and logistics. Use cases include predictive maintenance, quality assurance, track and trace, real-time locating systems (RTLS), asset tracking, customer 360, and more. Examples include BMW, Bosch, Baader, Intel, Porsche, and Devon.

Apache Kafka for Industrial IoT and Manufacturing 4.0

Why Kafka Is a Key Piece of the Evolution for Industrial IoT and Manufacturing

Industrial IoT was a mess of monolithic and proprietary technologies in the last decades. Modbus, Siemens S7, SCADA, and similar “concepts” controlled the industry. Vendors locked in enterprises by intentionally building incompatible products without open interfaces. Many of these systems still run on Windows XP or similarly unsupported, outdated operating systems, and without security in mind.

Fortunately, this is completely changing. Apache Kafka and its ecosystem play a key role in the IIoT evolution. System integration and data processing get an open architecture with a scalable, reliable infrastructure.

I speak to customers in this industry every week across the globe. Very different challenges, use cases, and innovative ideas come up. I have already covered this topic a lot in the past.

Check out my other related blog posts for Kafka in IIoT and Manufacturing. Learn about use cases and architecture for deployments at the edge (i.e., outside the data center), the relation between Kafka and other IoT standards like MQTT or OPC-UA, and how to build a modern, open and scalable data historian.

I want to highlight one post, as it is super important for any discussion around shop floors, PLCs, machines, robots, cars, and any other embedded systems: Kafka and other IT software are NOT hard real-time.

This post here “just” shares my latest presentation on this topic, including the slide deck and on-demand video recording. Before we get there, let’s summarize the current scenarios for Kafka in Industrial IoT in one concrete example.

Requirements for Industrial IoT: Everywhere, Complete, Cloud-native!

Let’s take a look at one specific example. The following picture depicts the usage of event streaming in combination with other OT and IT technologies in the shipping industry:

Apache Kafka in the Shipping Industry for Marine, Oil Transport, Vessel Fleet, Shipping Line, Drones

This is an interesting example because it shows many challenges and requirements of many Industrial IoT real-world scenarios across verticals:

  • Everywhere: Industrial IoT is not possible only in the cloud. The edge is impossible to avoid because manufacturing produces tangible goods. Integration between the (often disconnected) edge and the data center is essential for many use cases.
  • Complete: Industrial IoT is mission-critical. Stability with zero downtime, security, and safety are crucial across verticals. The only realistic option is a robust, battle-tested enterprise-grade solution to realize IIoT use cases.
  • Cloud-native: Automation, scalability, decoupled agile applications, and flexibility regarding technologies and products are required for enterprises to stay competitive. Not just in the cloud, but also at the edge! Not all use cases require a critical, scalable solution, though. For instance, a single broker for data processing and storage is sufficient in a disconnected drone.

A unique value of Kafka is that you can use one single technology for scalable real-time messaging, storage and caching, continuous stateless and stateful data processing, and data integration with the OT and IT world. This is especially important at the edge, where the hardware is constrained and the network is limited. It is much easier to operate and much more cost-efficient to deploy one single infrastructure instead of gluing together a best-of-breed stack like you often do in the cloud.

With this introduction, let’s now share the slide deck and video recording to talk about all these points in much more detail.

Slide Deck: Kafka for Industrial IoT and Manufacturing 4.0

Here is the slide deck:

Video Recording: Connect All the Things

Here is the video recording:

Video - Apache Kafka for Industrial IoT and Manufacturing 4.0

Apache Kafka for an open, scalable, flexible IIoT Architecture

Industrial IoT was a mess of monolithic and proprietary technologies in the last decades. Fortunately, Apache Kafka is completely changing many industrial environments. An open architecture with a scalable, reliable infrastructure changes how systems are integrated and how data is processed in the future.

What are your experiences and plans in IIoT projects? What use case and architecture did you implement? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

Apache Kafka and MQTT (Part 3 of 5) – Manufacturing 4.0 and Industrial IoT https://www.kai-waehner.de/blog/2021/03/22/apache-kafka-mqtt-part-3-of-5-manufacturing-industrial-iot-industry-4-0/ Mon, 22 Mar 2021 09:18:19 +0000 https://www.kai-waehner.de/?p=3270

Apache Kafka and MQTT are a perfect combination for many Industrial IoT use cases. This blog series covers the pros and cons of both technologies. Various use cases across industries, including connected vehicles, manufacturing, mobility services, and smart city are explored. The examples use different architectures, including lightweight edge scenarios, hybrid integrations, and serverless cloud solutions. This post is part three: Manufacturing, Industrial IoT, and Industry 4.0.

MQTT and Kafka for Manufacturing, Industrial IoT, and Industry 4.0

Apache Kafka + MQTT Blog Series

The first blog post explores the relationship between MQTT and Apache Kafka. Afterward, the other four blog posts discuss various use cases, architectures, and reference deployments.

  • Part 1 – Overview: Relation between Kafka and MQTT, pros and cons, architectures
  • Part 2 – Connected Vehicles: MQTT and Kafka in a private cloud on Kubernetes; use case: remote control and command of a car
  • Part 3 – Manufacturing (THIS POST): MQTT and Kafka at the edge in a smart factory; use case: Bidirectional OT-IT integration with Sparkplug between PLCs, IoT Gateways, Data Historian, MES, ERP, Data Lake, etc.
  • Part 4 – Mobility Services: MQTT and Kafka leveraging serverless cloud infrastructure; use case: Traffic jam prediction service using machine learning
  • Part 5 – Smart City: MQTT at the edge connected to fully-managed Kafka in the public cloud; use case: Intelligent traffic routing by combining and correlating 3rd party services

Subscribe to my newsletter to get updates immediately after publication. I will also update the list above with direct links to each post in this blog series as soon as it is published.

Use Case: Manufacturing 4.0 and Industrial IoT with Kafka

The following list shows different examples where Kafka is used as a strategic platform for various manufacturing use cases to implement Industry 4.0 initiatives:

  1. Track&Trace / Production Control / Plant Logistics
  2. Quality Assurance / Yield Management
  3. Predictive Maintenance
  4. Supply Chain Management
  5. Cybersecurity
  6. Servitization leveraging Digital Twins
  7. Additive Manufacturing
  8. Augmented Reality
  9. Many more…

I already covered this topic in detail recently. Hence, check out the blog post “Apache Kafka for Manufacturing and Industrial IoT“. That post includes a detailed slide deck and video recording.

Let’s look at specific examples for Kafka and MQTT in Manufacturing 4.0 in the following sections.

Architecture: Smart Factory and Industry 4.0 with Kafka

The following diagram shows the architecture of a smart factory. Both MQTT and Kafka infrastructure are deployed at the edge in the factory for security, latency, and cost reasons:

MQTT and Kafka in a Smart Factory for Manufacturing 4.0 and Industrial IoT

This example connects to modern PLCs and other gateways via MQTT and Sparkplug B. The benefit of this technology is a lightweight communication protocol based on an open standard.
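
To make this more concrete, here is a minimal sketch of an MQTT subscriber for Sparkplug B topics using the Eclipse Paho Java client. The broker address, client ID, and Sparkplug group ID are placeholder assumptions; a real implementation would also decode the protobuf-encoded Sparkplug payloads and add reconnect logic:

  import org.eclipse.paho.client.mqttv3.IMqttDeliveryToken;
  import org.eclipse.paho.client.mqttv3.MqttCallback;
  import org.eclipse.paho.client.mqttv3.MqttClient;
  import org.eclipse.paho.client.mqttv3.MqttMessage;

  public class SparkplugSubscriber {
      public static void main(String[] args) throws Exception {
          // Placeholder broker address and client ID; adjust for your plant network.
          MqttClient client = new MqttClient("tcp://factory-broker:1883", "iiot-gateway-1");
          client.setCallback(new MqttCallback() {
              @Override public void connectionLost(Throwable cause) {
                  // MQTT is built for flaky OT networks; reconnect logic goes here.
              }
              @Override public void messageArrived(String topic, MqttMessage message) {
                  // Sparkplug B payloads are protobuf-encoded; decode before forwarding.
                  System.out.println(topic + " -> " + message.getPayload().length + " bytes");
              }
              @Override public void deliveryComplete(IMqttDeliveryToken token) { }
          });
          client.connect();
          // Sparkplug B topic namespace: spBv1.0/<group_id>/<message_type>/<edge_node_id>/[device_id]
          client.subscribe("spBv1.0/plant-1/#");
      }
  }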

An OT middleware such as OSIsoft PI is required if legacy and proprietary protocols such as Modbus or Siemens S7 need to be integrated. Most plants today are brownfield. Hence, both proprietary integration platforms and open MQTT or OPC-UA integrations must communicate with Kafka from the OT side. “Apache Kafka as Data Historian” explores the integration options in more detail.
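
On the Kafka side, this OT connectivity typically requires no custom code: a Kafka Connect connector can subscribe to the MQTT broker and produce the events into a Kafka topic. As a sketch, a configuration for Confluent's MQTT Source Connector could look like the following; all names, URIs, and topics are placeholder assumptions:

  name=mqtt-source-plant-1
  connector.class=io.confluent.connect.mqtt.MqttSourceConnector
  tasks.max=1
  mqtt.server.uri=tcp://factory-broker:1883
  mqtt.topics=spBv1.0/plant-1/#
  # Target Kafka topic for the raw Sparkplug messages
  kafka.topic=plant-1.sparkplug.raw
  confluent.topic.bootstrap.servers=kafka-edge:9092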

The IT components (such as SAP ERP) and the integration platforms (HiveMQ, Confluent) run in the factory. Obviously, many other architectures are possible if latency and security allow it. Check out “Building a Smart Factory with Apache Kafka and 5G Campus Networks” for a hybrid cloud architecture. Additionally, most smart factories are not completely independent from the central IT world running in a remote data center or public cloud. Various “architecture patterns for distributed, hybrid, edge and global Apache Kafka deployments” exist to replicate data bi-directionally between smart factories and data centers.
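
As one sketch of such bi-directional replication, Apache Kafka's MirrorMaker 2 can mirror telemetry topics from the factory to the data center and command topics back. The cluster names, bootstrap servers, and topic patterns below are illustrative assumptions, not a recommendation over alternatives such as Confluent's Cluster Linking or Replicator:

  # connect-mirror-maker.properties (sketch)
  clusters = factory, datacenter
  factory.bootstrap.servers = kafka-edge:9092
  datacenter.bootstrap.servers = kafka-dc:9092

  # Telemetry flows from the smart factory to the central data center...
  factory->datacenter.enabled = true
  factory->datacenter.topics = plant-1.*

  # ...while commands and master data flow back to the edge.
  datacenter->factory.enabled = true
  datacenter->factory.topics = commands.*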

Example: MQTT for Critical Manufacturing @ Daimler

Manufacturing processes in the automotive industry cannot go down. Hence, Daimler built a Vehicle Diagnostic System (VDS) to efficiently share information between test devices on the factory floor and enterprise IT systems. The VDS fulfills some core functionality in the manufacturing process for E/E components, such as calibrating sensors controlled by an ECU, flashing new firmware, personalizing the key to the car, and testing to make sure each ECU works properly.

MQTT works well over unreliable networks. Therefore, test devices behave properly even if the network connection drops and reconnects.

The system is rolled out to 24 factories around the world, with 10,000 testing devices connected. The devices generate 470 million messages per month, which averages out to roughly 180 messages per second across the entire fleet.

The complete case study from HiveMQ explores the use case in more detail.

Example: Kafka in the Cloud for Business Critical Supply Chain Operations @ Baader

BAADER is a worldwide manufacturer of innovative machinery for the food processing industry. They run an IoT-based and data-driven food value chain on Confluent Cloud:

Food Supply Chain at Baader with Apache Kafka and Confluent Cloud

The Kafka-based infrastructure provides a single source of truth across the factories and regions of the food value chain. Business-critical operations are available 24/7 for tracking, calculations, alerts, etc.

The event streaming platform runs on Confluent Cloud. Hence, Baader can focus on building new innovative business applications. The serverless Kafka infrastructure provides mission-critical SLAs and consumption-based pricing for all required capabilities: Messaging, storage, data integration, and data processing.

MQTT provides connectivity to machines and GPS data from vehicles at the edge. Kafka Connect connectors integrate MQTT and other IT systems such as Elasticsearch, MongoDB, and AWS S3. ksqlDB processes the data in motion continuously. Stream processing or streaming analytics are other terms for this concept.
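
In ksqlDB, such continuous processing is expressed as SQL statements. Conceptually, the same streaming logic can be sketched with Kafka Streams in Java, as below; the topic names and the simple CSV payload are assumptions for illustration, not Baader's actual data model:

  import java.util.Properties;
  import org.apache.kafka.common.serialization.Serdes;
  import org.apache.kafka.streams.KafkaStreams;
  import org.apache.kafka.streams.StreamsBuilder;
  import org.apache.kafka.streams.StreamsConfig;
  import org.apache.kafka.streams.kstream.Consumed;
  import org.apache.kafka.streams.kstream.KStream;
  import org.apache.kafka.streams.kstream.Produced;

  public class MachineAlerting {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put(StreamsConfig.APPLICATION_ID_CONFIG, "machine-alerting");
          props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");

          StreamsBuilder builder = new StreamsBuilder();
          // Hypothetical input topic with values like "machine-42,93.5" (ID, temperature).
          KStream<String, String> telemetry =
              builder.stream("machine.telemetry", Consumed.with(Serdes.String(), Serdes.String()));

          // Continuously filter readings above a threshold into an alert topic.
          telemetry
              .filter((machineId, csv) -> Double.parseDouble(csv.split(",")[1]) > 90.0)
              .to("machine.alerts", Produced.with(Serdes.String(), Serdes.String()));

          new KafkaStreams(builder.build(), props).start();
      }
  }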

Kafka + MQTT = Manufacturing 4.0

In conclusion, Apache Kafka and MQTT are a perfect combination for Manufacturing, Industrial IoT, and Industry 4.0.

Follow the blog series to learn about use cases such as connected vehicles, manufacturing, mobility services, and smart city. Every blog post also includes real-world deployments from companies across industries. It is key to understand the different architectural options to make the right choice for your project.

What are your experiences and plans in IoT projects? What use case and architecture did you implement? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

Apache Kafka and MQTT (Part 2 of 5) – V2X and Connected Vehicles https://www.kai-waehner.de/blog/2021/03/19/apache-kafka-mqtt-part-2-of-5-v2x-connected-vehicles-edge-hybrid-cloud/ Fri, 19 Mar 2021 08:00:31 +0000 https://www.kai-waehner.de/?p=3250

Apache Kafka and MQTT are a perfect combination for many IoT use cases. This blog series covers the pros and cons of both technologies. Various use cases across industries, including connected vehicles, manufacturing, mobility services, and smart city are explored. The examples use different architectures, including lightweight edge scenarios, hybrid integrations, and serverless cloud solutions. This post is part two: Connected Vehicles and V2X applications.

MQTT and Kafka for Connected Vehicles and V2X Use Cases

Apache Kafka + MQTT Blog Series

The first blog post explores the relationship between MQTT and Apache Kafka. Afterward, the other four blog posts discuss various use cases, architectures, and reference deployments.

  • Part 1 – Overview: Relation between Kafka and MQTT, pros and cons, architectures
  • Part 2 – Connected Vehicles (THIS POST): MQTT and Kafka in a private cloud on Kubernetes; use case: remote control and command of a car
  • Part 3 – Manufacturing: MQTT and Kafka at the edge in a smart factory; use case: Bidirectional OT-IT integration with Sparkplug between PLCs, IoT Gateways, Data Historian, MES, ERP, Data Lake, etc.
  • Part 4 – Mobility Services: MQTT and Kafka leveraging serverless cloud infrastructure; use case: Traffic jam prediction service using machine learning
  • Part 5 – Smart City: MQTT at the edge connected to fully-managed Kafka in the public cloud; use case: Intelligent traffic routing by combining and correlating 3rd party services

Subscribe to my newsletter to get updates immediately after publication. I will also update the list above with direct links to each post in this blog series as soon as it is published.

Use Case: Connected Vehicles and V2X

Vehicle-to-everything (V2X) is communication between a vehicle and any entity that may affect, or may be affected by, the vehicle. It is a vehicular communication system that incorporates other, more specific types of communication such as V2I (vehicle-to-infrastructure), V2N (vehicle-to-network), V2V (vehicle-to-vehicle), V2P (vehicle-to-pedestrian), V2D (vehicle-to-device), and V2G (vehicle-to-grid). The main motivations for V2X are road safety, traffic efficiency, energy savings, and a better driver experience.

V2X includes various use cases. The following picture from 3G4G shows some examples:

V2X Use Cases for Kafka and MQTT

Business Point of View for Connected Vehicles

From a business perspective, the following diagram from Frost & Sullivan explains the use cases for connected vehicles very well:

Use Cases for Connected Vehicles

Technical Point of View for V2X and Connected Vehicles

A few things to point out from a technical perspective:

  • MQTT + Kafka provides a scalable real-time infrastructure for high volumes of data in motion, with end-to-end processing latency between 10 and 20 ms. This is good enough for the integration with backend IT systems and almost all mobility services.
  • MQTT and Kafka are not used for hard real-time and deterministic embedded systems.
  • Some safety-critical V2X use cases require other communication technologies such as 5G New Radio (NR) / NR C-V2X sidelink to directly connect vehicles or vehicles and local infrastructure (e.g. traffic lights). There is no need for an intermediary cellular network or radio access network (RAN).
  • Example: A self-driving car executes all its algorithms, like image processing and decision making, within the car in embedded systems. These use cases require deterministic behavior and hard real-time. Communication with 3rd parties such as emergency services, traffic routing, parking, etc. connects to backend systems for data correlation (close to the edge or far away in a cloud data center). Real-time in milliseconds – or sometimes even seconds – is good enough in these cases.
  • Not every application is for tens or hundreds of thousands of connected vehicles. For instance, a real-time locating system (RTLS) is a perfect example for realizing use cases in logistics and transportation. This can be geofencing within a plant or regional and global track&trace. “Real-Time Locating System (RTLS) with Apache Kafka for Transportation and Logistics” explores this use case in more detail.

The following sections focus on use cases that require real-time (but not hard real-time) data integration and processing at scale with 24/7 uptime between vehicles, networks, infrastructure, and applications.

Architecture: MQTT and Kafka for Connected Vehicles

Let’s take a look at an example: remote control and command of a car. This covers simple scenarios, like opening your car trunk from a remote location with your digital key so the mailman can deliver a package, as well as more sophisticated use cases, like the payment process for buying a new feature via an OTA (over-the-air) update.
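
As a rough sketch of the backend side of such a command flow, a service could publish a command event to Kafka, keyed by the vehicle ID, which an MQTT bridge would then forward to the car's individual MQTT topic. All topic names and the payload here are hypothetical:

  import java.util.Properties;
  import org.apache.kafka.clients.producer.KafkaProducer;
  import org.apache.kafka.clients.producer.ProducerRecord;

  public class RemoteCommandService {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put("bootstrap.servers", "kafka:9092");
          props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
          props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

          try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
              // Keyed by vehicle ID so all commands for one car land on one partition, in order.
              producer.send(new ProducerRecord<>("vehicle.commands", "VIN-1234567890",
                  "{\"command\":\"OPEN_TRUNK\",\"requestedBy\":\"digital-key\"}"));
          }
          // Downstream, an MQTT bridge would publish this to a per-vehicle topic,
          // e.g. vehicles/VIN-1234567890/commands.
      }
  }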

The following diagram shows an architecture for V2X leveraging MQTT and Kafka:

Connected Vehicles - Kafka and MQTT Reference Architecture

A few notes on the above architecture:

  • The MQTT and Kafka clusters run in a Kubernetes environment.
  • Kubernetes allows the deployment across data centers and multiple cloud providers with a single “template”.
  • Bi-directional communication happens end-to-end in real-time on a reliable, scalable infrastructure.
  • The MQTT clients from cars and mobile devices communicate with the MQTT cluster. This allows connecting hundreds of thousands of interfaces and supports bad networks (see the client sketch after this list).
  • Kafka is the integration backbone for connected vehicles and mobile devices. Use cases include streaming ETL, correlation of the data in stateful business applications, or ingestion into other IT applications, databases, and cloud services.
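
The following is a minimal sketch of what the car-side MQTT client could look like, again using the Eclipse Paho Java client; the broker URL, topic layout, and payload are assumptions for illustration:

  import org.eclipse.paho.client.mqttv3.MqttClient;
  import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
  import org.eclipse.paho.client.mqttv3.MqttMessage;

  public class VehicleTelemetryClient {
      public static void main(String[] args) throws Exception {
          MqttClient client = new MqttClient("ssl://mqtt.example.com:8883", "VIN-1234567890");
          MqttConnectOptions options = new MqttConnectOptions();
          options.setAutomaticReconnect(true); // tolerate dropped cellular connections
          options.setCleanSession(false);      // broker buffers QoS 1 messages while offline
          client.connect(options);

          // Publish a position update with QoS 1 (at-least-once) to a per-vehicle topic.
          MqttMessage msg = new MqttMessage("{\"lat\":48.14,\"lon\":11.58,\"speedKmh\":87}".getBytes());
          msg.setQos(1);
          client.publish("vehicles/VIN-1234567890/telemetry", msg);
          client.disconnect();
      }
  }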

V2X with MQTT and Kafka in a 5G Infrastructure

The following diagram shows the above use cases around connected vehicles from the V2X perspective:

V2X Infrastructure with Kafka MQTT HTTP Edge Hybrid Cloud

The infrastructure is separated into three categories and networks:

  • The edge (vehicles, devices) using local processing and remote integration via 5G.
  • MEC (multi-access edge computing) region for low-latency use cases. This example leverages AWS Wavelength for combining the power of 5G with cloud services and Confluent Platform for processing data in motion at scale.
  • The public cloud infrastructure using AWS and Confluent Cloud for all other cloud-native applications.

The integration between the edge and the IT world depends on the requirements. In this example, we use mostly MQTT but also HTTP for the integration with the Kafka cluster. The connectivity to other IT applications happens via Kafka-native interfaces such as Kafka clients, Kafka Connect, or Confluent’s Cluster Linking (for the bi-directional replication between the AWS Wavelength zone and the AWS cloud region).
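
For the HTTP path, devices without an MQTT or Kafka client can post events into Kafka through an HTTP gateway such as the Confluent REST Proxy. The following sketch uses Java's built-in HttpClient; the proxy URL and topic name are placeholders:

  import java.net.URI;
  import java.net.http.HttpClient;
  import java.net.http.HttpRequest;
  import java.net.http.HttpResponse;

  public class HttpIngest {
      public static void main(String[] args) throws Exception {
          // REST Proxy v2 embedded-format payload: a batch of records for one topic.
          String body = "{\"records\":[{\"key\":\"VIN-1234567890\","
              + "\"value\":{\"lat\":48.14,\"lon\":11.58}}]}";
          HttpRequest request = HttpRequest.newBuilder()
              .uri(URI.create("http://rest-proxy:8082/topics/vehicle.telemetry"))
              .header("Content-Type", "application/vnd.kafka.json.v2+json")
              .POST(HttpRequest.BodyPublishers.ofString(body))
              .build();
          HttpResponse<String> response = HttpClient.newHttpClient()
              .send(request, HttpResponse.BodyHandlers.ofString());
          System.out.println(response.statusCode() + " " + response.body());
      }
  }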

Direct communication between vehicles or vehicles and pedestrians requires deterministic behavior and ultra-low latency. Hence, this communication does not use technologies like MQTT or Kafka. Technologies like 5G Sidelink were invented for these requirements.

Let’s now look at two real-world examples for connected vehicles.

Example: MQTT and Kafka for Millions of Connected Cars @ Autonomic

Autonomic built the Transportation Mobility Cloud (TMC), a standard way of accessing connected vehicle data and sending remote commands. This platform provides the foundation to build smart mobility applications related to driver safety, preventive maintenance, and fleet management.

Autonomic built a solution with MQTT and Kafka to connect millions of cars. MQTT forwards the car data in real-time to Kafka to distribute the messages to the different microservices and applications in the platform.
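
A sketch of one such microservice consuming the vehicle stream from Kafka is shown below. Each microservice uses its own consumer group ID, so every service independently reads the full stream; the topic and group names are placeholders:

  import java.time.Duration;
  import java.util.List;
  import java.util.Properties;
  import org.apache.kafka.clients.consumer.ConsumerRecord;
  import org.apache.kafka.clients.consumer.ConsumerRecords;
  import org.apache.kafka.clients.consumer.KafkaConsumer;

  public class FleetManagementService {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put("bootstrap.servers", "kafka:9092");
          // A distinct group.id per microservice: each one gets its own copy of the stream.
          props.put("group.id", "fleet-management");
          props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
          props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

          try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
              consumer.subscribe(List.of("vehicle.telemetry"));
              while (true) {
                  ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                  for (ConsumerRecord<String, String> record : records) {
                      // Apply fleet-management logic here; safety and maintenance services
                      // consume the same topic with their own group IDs.
                      System.out.println(record.key() + ": " + record.value());
                  }
              }
          }
      }
  }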

This is a great example of combining the benefits of MQTT and Kafka. Read the complete case study from HiveMQ for more details.

Example: Kafka as Car Data Collector @ Audi

Audi started its journey for connected cars a long time ago to collect data from hundreds of thousands of cars in real-time. The car data is collected and processed in real-time with Apache Kafka. The following diagram shows the idea:

Audi Data Collector

As you can imagine, dozens of potential use cases exist to reduce cost, improve the customer experience, and increase revenue. The following is an example of a real-time service for finding a free parking spot:

Audi Data Collector for Mobility Services Built with Apache Kafka

Watch Audi’s Kafka Summit keynote for more details about the infrastructure and use cases.

Slide Deck – Kafka for Connected Vehicles and V2X

Here is a slide deck covering this topic in more detail:

Kafka + MQTT = Connected Vehicles and V2X

In conclusion, Apache Kafka and MQTT are a perfect combination for V2X and connected vehicles. It makes so many new IoT use cases possible!

Follow this blog series to learn about use cases such as connected vehicles, manufacturing, mobility services, and smart city. Every blog post also includes real-world deployments from companies across industries. It is key to understand the different architectural options to make the right choice for your project.

What are your experiences and plans in IoT projects? What use case and architecture did you implement? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

 
