Apache Kafka, KSQL and Apache PLC4X for IIoT Data Integration and Processing

Data integration and processing is a huge challenge in Industrial IoT (IIoT, aka Industry 4.0 or Automation Industry) due to monolithic systems and proprietary protocols. Apache Kafka, its ecosystem (Kafka Connect, KSQL) and Apache PLC4X are a great open source choice to implement this IIoT integration end to end in a scalable, reliable and flexible way.

This blog post gives a high-level overview of the challenges and presents a flexible architecture to solve them. At the end, I share a video recording and the corresponding slide deck, which provide many more details and insights.

Challenges in IIoT / Industry 4.0

Here are some of the key challenges in IIoT / Industry 4.0:

  • IoT != IIoT: The automation industry typically does not use MQTT or other open standards; its protocols are slow, insecure, not scalable and proprietary.
  • Product life cycles are very long (tens of years), with no simple changes or upgrades.
  • IIoT usually relies on incompatible protocols, typically proprietary and built for one specific vendor.
  • The automation industry uses proprietary and expensive monoliths which are neither scalable nor extendible.
  • Machines and PLCs are insecure by nature: no authentication, no authorization, no encryption.

This is still the state of the art in the automation industry. Given such long product life cycles this is no surprise, but it is still very concerning.

Evolution of Convergence between IT and Automation Industry

Today, everybody talks about cloud, big data analytics, machine learning and real-time processing at scale. The convergence between IT and the automation industry is coming, as this report from the IoT research company IOT Analytics shows:

Evolution of convergence between IT and Automation Industry

There is huge demand for an open, flexible, scalable platform, which creates many opportunities from both a business and a technical perspective:

  • Cost reduction
  • Flexibility
  • Standards-based
  • Scalability
  • Extendibility
  • Security
  • Infrastructure-independent

So, how do you get from legacy technologies and proprietary IIoT protocols to the cloud, big data, machine learning and real-time processing? How do you build a reliable, scalable and flexible architecture and infrastructure?

Apache Kafka and Apache PLC4X for End-to-End IIoT Integration

I assume you already know it: Apache Kafka is the de-facto standard for real-time event streaming. Its key characteristics:

  • Open Source (Apache 2.0 License)
  • Global-scale
  • Real-time
  • Persistent Storage
  • Stream Processing

If you need more details about Apache Kafka, check out the Kafka website, the extensive Confluent documentation or some free video recordings and slides from any Kafka Summit to learn about the technology and use cases.

One very important thing I want to point out: Apache Kafka includes Kafka Connect and Kafka Streams:

Apache Kafka includes Kafka Connect and Kafka Streams

Kafka Connect enables reliable and scalable integration of Kafka with other systems. Kafka Streams allows you to write standard Java applications and microservices that continuously process your data in real time with a lightweight stream processing API. And finally, KSQL enables stream processing with SQL-like semantics.
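
To make this more concrete, here is a minimal Kafka Streams sketch that filters IIoT sensor readings in real time. It assumes String-serialized readings; the topic names and the alert threshold are made-up examples, not taken from any of the projects below:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class SensorAlertApp {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "iiot-sensor-alerts");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Continuously read raw machine readings (key = machine id, value = numeric
        // reading as String), keep only critical values, write them to an alert topic.
        KStream<String, String> sensorData = builder.stream("machine-sensors");
        sensorData
                .filter((machineId, value) -> Double.parseDouble(value) > 100.0)
                .to("machine-alerts");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```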

Apache PLC4X for PLC Integration (Siemens S7, Modbus, Allen Bradley, Beckhoff ADS, etc.)

Apache PLC4X is less established on the market than Apache Kafka. It also “just covers a niche” (a big one, of course) compared to Kafka, which is used in every industry for many different use cases. However, PLC4X is a very interesting top-level Apache project for the automation industry.

The goal is to open up PLC interfaces from the IIoT world to the outside world. PLC4X enables vertical integration and lets you write software independent of the PLC vendor, using JDBC-like adapters for various protocols such as Siemens S7, Modbus, Allen Bradley, Beckhoff ADS, OPC-UA, Emerson, Profinet, BACnet and Ethernet.
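
To give you a feeling for this JDBC-like API, here is a minimal sketch of reading one value from a Siemens S7 PLC with PLC4X’s Java API. It is based on the PLC4X 0.x API; the connection string, field address and field name are placeholders you would adjust for your own PLC:

```java
import org.apache.plc4x.java.PlcDriverManager;
import org.apache.plc4x.java.api.PlcConnection;
import org.apache.plc4x.java.api.messages.PlcReadRequest;
import org.apache.plc4x.java.api.messages.PlcReadResponse;

public class S7ReadExample {

    public static void main(String[] args) throws Exception {
        // Connect to a Siemens S7 PLC via its native protocol (IP, rack and slot
        // in the URL are placeholders).
        try (PlcConnection connection = new PlcDriverManager()
                .getConnection("s7://192.168.0.1/0/0")) {

            // Read one field; "temperature" is just a logical alias for the PLC address.
            PlcReadRequest request = connection.readRequestBuilder()
                    .addItem("temperature", "%DB1.DBW100:INT")
                    .build();

            PlcReadResponse response = request.execute().get();
            System.out.println("temperature = " + response.getInteger("temperature"));
        }
    }
}
```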

PLC4X provides a Kafka Connect connector. Therefore, you can leverage the benefits of Apache Kafka (high availability, high throughput, high scalability, reliability, real-time processing) to deploy PLC4X integration pipelines. With this, you can build one single architecture and infrastructure for

  • legacy IIoT connectivity using PLC4X and Kafka Connect
  • data processing using Kafka Streams / KSQL
  • integration with the rest of the enterprise using Kafka Connect and any other sink (database, big data analytics, machine learning, ERP, CRM, cloud services, custom business applications, etc.)

Apache Kafka and PLC4X Architecture for IIoT Automation Industry

As Kafka decouples the producers from the consumers, you can consume the IIoT machine sensor data from any application – some might be real time, some might be batch, and some might be request-response communication for human interaction on a web or mobile app.
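
For example, a second application can read the very same sensor topic with its own consumer group, completely decoupled from the stream processing app shown earlier. A minimal sketch; topic and group names are assumptions:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class SensorDashboardConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Each application uses its own consumer group and reads at its own pace,
        // fully independent of all other consumers of the same topic.
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "dashboard-app");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("machine-sensors"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                records.forEach(r -> System.out.printf("%s -> %s%n", r.key(), r.value()));
            }
        }
    }
}
```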

Apache PLC4X vs. OPC-UA

A little bit off-topic: how do you choose between Apache PLC4X (an open source framework for IIoT) and OPC-UA (an open standard for IIoT)? In short, they are different things and can also be complementary. Here is a comparison:

OPC-UA

  • Open standard
  • All the pros and cons of an open standard (works with different vendors; slow adoption; inflexible, etc.)
  • Often poorly implemented by the vendors
  • Requires an app server on top of the PLC
  • Every device has to be retrofitted with the ability to speak a new protocol, and a common client is needed to speak with these devices
  • Often over-engineered if you just want to read the data
  • Activating OPC-UA support on existing PLCs greatly increases the load on the PLCs
  • Licensing costs for every machine

Apache PLC4X

  • Open source framework (Apache 2.0 license)
  • Provides a unified API by implementing drivers that communicate with most industrial controllers in the protocols they natively understand
  • No need to modify existing hardware
  • No increased load on the PLCs
  • No need to pay for licenses to activate OPC-UA support
  • Drivers are implemented from the specs or by reverse engineering the protocols, so that they can be fully Apache 2.0 licensed
  • PLC4X adapter for OPC-UA available -> Both can be used together!

As you can see, both have their pros and cons. To me, and this is clearly my subjective opinion, PLC4X provides a great alternative with high flexibility and a low footprint.

Confluent and IoT Platform Solutions

Many IoT Platform Solutions are available on the market. This includes products like Siemens MindSphere or Cisco Kinetic, and cloud services from the major cloud providers like AWS, GCP or Azure. And you have Kafka + PLC4X, as you just learned above. Often, this is not an “either … or” decision:

Confluent and IoT Platform Solutions

You can use

  • just Kafka and PLC4X for lightweight and flexible IIoT integration based on a scalable, reliable and open event streaming platform
  • just an IoT Platform Solution if the pros of such a specific product (dedicated support for a specific vendor protocol, a nice GUI, etc.) outweigh the cons (such as high cost and a proprietary, inflexible solution)
  • both together, where you use the IoT Platform Solution to integrate with the PLCs and then send the data to Kafka to integrate with the rest of the enterprise (with all the benefits and added value Kafka brings)
  • both together, where you use Kafka and PLC4X for PLC integration and one of the consumers is the IoT Platform Solution (while other consumers can also get the data from Kafka, fully decoupled from the IoT Platform Solution)

All alternatives have their pros and cons. There is no single solution which fits every use case! Therefore, it is no surprise that most IoT Platform Solutions provide Kafka source and sink connectors.

Apache Kafka and Apache PLC4X – Slides / Video Recording / Github Code Example

If you are curious about more details and insights, check out my video recording and slide deck.

Slide Deck – Apache Kafka and PLC4X:

Video Recording – Apache Kafka and PLC4X:

Github Code Example – Apache Kafka and PLC4X:

We are also building a nice and simple demo on Github these days:

Kafka-native end-to-end IIoT Data Integration and Processing with Kafka Connect, KSQL and Apache PLC4X

PLC4X gets most exciting when you try it out yourself and connect it to your own machines or tools. So check out the example and adjust it to connect to your infrastructure.

Feedback and Questions?

Please let me know your feedback and questions about Kafka, its ecosystem and PLC4X for IIoT integration. Let’s also connect on LinkedIn to discuss interesting IIoT use cases and technologies in the future.

Apache Kafka vs. Middleware (MQ, ETL, ESB) – Slides + Video

Learn the differences between an event-driven streaming platform like Apache Kafka and middleware like Message Queues (MQ), Extract-Transform-Load (ETL) and Enterprise Service Bus (ESB). This includes best practices and anti-patterns, but also how these concepts and tools complement each other in an enterprise architecture.

This blog post shares my slide deck and video recording. I discuss the differences between Apache Kafka as an event streaming platform and integration middleware. Learn whether they are friends, enemies or frenemies.

Problems of Legacy Middleware

Extract-Transform-Load (ETL) is still a widely-used pattern for moving data between different systems via batch processing. Because batch processing falls short in today’s world, where real time is the new standard, an Enterprise Service Bus (ESB) is used in many enterprises as the integration backbone between any kind of microservice, legacy application or cloud service, moving data via SOAP / REST web services or other technologies. Stream processing is often added as its own component in the enterprise architecture to correlate different events, implement contextual rules and run stateful analytics. Using all these components introduces challenges and complexities in development and operations.

Legacy Middleware (MQ, ESB, ETL, etc)

Apache Kafka – An Open Source Event Streaming Platform

This session discusses how teams in different industries solve these challenges by building a native event streaming platform from the ground up instead of using ETL and ESB tools in their architecture. This allows them to build and deploy independent, mission-critical, real-time streaming applications and microservices. The architecture leverages distributed processing and fault tolerance with fast failover, no downtime, rolling deployments and the ability to reprocess events, so you can recalculate output when your code changes. Integration and stream processing are still key functionality, but can be realized natively in real time instead of using additional ETL, ESB or stream processing tools.

A concrete example architecture shows how to leverage the widely-adopted open source framework Apache Kafka to build a complete, mission-critical, scalable, highly performant streaming platform. Messaging, integration and stream processing are all built on top of the same strong foundation of Kafka, deployed on premise, in the cloud or in hybrid environments. In addition, the open source Confluent projects, built on top of Apache Kafka, add features like a Schema Registry, additional clients for programming languages like Go or C, and many pre-built connectors for various technologies.

Apache Kafka Middleware

Slides: Apache Kafka vs. Integration Middleware

Here is the slide deck:

Video Recording: Kafka vs. MQ / ETL / ESB – Friends, Enemies or Frenemies?

Here is the video recording where I walk you through the above slide deck:

Article: Apache Kafka vs. Enterprise Service Bus (ESB)

I also published a detailed blog post on the Confluent blog about this topic in 2018:

Apache Kafka vs. Enterprise Service Bus (ESB)

Talk and Slides from Kafka Summit London 2019

The slides and video recording from Kafka Summit London 2019 (which are similar to the ones above) are also available for free.

Why Apache Kafka instead of Traditional Middleware?

If you don’t want to spend a lot of time on the slides and recording, here is a short summary of how Apache Kafka differs from traditional middleware:

Why Apache Kafka instead of Integration Middleware

Questions and Discussion…

Please share your thoughts, too!

Do you see similar architectures in your infrastructure? Do you face similar challenges? Do you like the concepts behind an event streaming platform (aka Apache Kafka)? How do you combine legacy middleware with Kafka? What’s your strategy for integrating the modern and the old (technology) world? Is Kafka part of that architecture?

Please let me know either via a comment or via LinkedIn, Twitter, email, etc. I am curious about other opinions and experiences (and people who disagree with my presentation).

Deep Learning Example: Apache Kafka + Python + Keras + TensorFlow + Deeplearning4j

I added a new example to my “Machine Learning + Kafka Streams Examples” Github project:

“Python + Keras + TensorFlow + DeepLearning4j + Apache Kafka + Kafka Streams”.

This blog post discusses the motivation and why this is a great combination of technologies for scalable, reliable Machine Learning infrastructures. For more details about building Machine Learning / Deep Learning infrastructures leveraging the Apache Kafka open source ecosystem, check out these two blog posts:

Deep Learning with Python and Keras

The goal was to show how easily you can deploy a model developed with Python and Keras to a Java / Kafka ecosystem. Keras lets you use different Deep Learning backends under the hood: TensorFlow, CNTK or Theano. It acts as a high-level wrapper and is very easy to use, even for people new to Machine Learning. For these reasons, Keras is getting a lot of traction these days.

Deployment of Keras Models to Java / Kafka Ecosystem with Deeplearning4J (DL4J)

Machine Learning frameworks offer different options for combining them with the Java platform (and therefore with the Apache Kafka ecosystem), like native Java APIs to load models or RPC interfaces to framework-specific model servers.

Deeplearning4j seems to be becoming the de facto standard for deployment of Keras models (if you trust Google search). This Deep Learning framework, built specifically for the Java platform, has added many features and bug fixes to its Keras Model Import and already supports many Keras concepts, getting better with every new (beta) release. I used 1.0.0-beta3 for my Github example. Please check here for a complete list of Keras features supported in Deeplearning4j, like layers, loss functions, activation functions, initializers and optimizers.
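
As a minimal sketch of what this model import looks like in Java (the file name and the input shape are placeholders; based on the DL4J beta API for a simple Sequential model):

```java
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class KerasImportExample {

    public static void main(String[] args) throws Exception {
        // Load a Keras Sequential model that was saved in Python via model.save("model.h5").
        MultiLayerNetwork model =
                KerasModelImport.importKerasSequentialModelAndWeights("model.h5");

        // Run a single prediction; the input shape must match what the model was trained on.
        INDArray input = Nd4j.create(new double[]{1.0, 2.0, 3.0, 4.0}, new int[]{1, 4});
        INDArray prediction = model.output(input);
        System.out.println("Prediction: " + prediction);
    }
}
```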

You can either fully train a model with Keras and “just” embed it into a Java application for model inference, or re-use the model (or parts of it) and improve it with DL4J’s Java API (aka transfer learning).

Example: Python + Keras + TensorFlow + Apache Kafka + DL4J

I implemented a simple but still impressive example:

An analytic model is developed and trained with Python, Keras and TensorFlow, then deployed to the Java and Kafka ecosystem. It is simple to implement (as you can see in the source code), but powerful, scalable and reliable.

There is no business case this time; it is just a technical demonstration of a simple ‘Hello World’ Keras model. Feel free to replace the model with any other Keras model trained with your backend of choice. You just need to replace the model binary (and use a model which is compatible with Deeplearning4j’s model importer). Then you can embed it into your Kafka application for real-time model inference.

Machine Learning Technologies

  • Python
  • DeepLearning4J
  • Keras – a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano.
  • TensorFlow – used as backend under the hood of Keras
  • Deeplearning4j’s KerasModelImport feature is used to import the Keras / TensorFlow model into Java. The model used is its ‘Hello World’ model example.
  • The Keras model was trained with this Python script.

Apache Kafka and DL4J for Real Time Predictions

The trained model is embedded into a Kafka Streams application for real-time predictions. Here is the core Kafka Streams logic where I use the Deeplearning4j API to do predictions:

Kafka Streams and Deeplearning4j Deployment of Keras / TensorFlow Model
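
In case the embedded code screenshot is hard to read, here is a minimal sketch of what such logic can look like. This is not the exact code from the Github project; the topic names and the CSV payload format are my own assumptions:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

import java.util.Arrays;
import java.util.Properties;

public class Keras2KafkaStreamsApp {

    public static void main(String[] args) throws Exception {
        // Import the trained Keras / TensorFlow model once at startup.
        final MultiLayerNetwork model =
                KerasModelImport.importKerasSequentialModelAndWeights("model.h5");

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "keras-model-inference");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Input topic with comma-separated feature values, e.g. "1.0,2.0,3.0,4.0".
        KStream<String, String> input = builder.stream("keras-input-topic");

        // For every event: parse the CSV payload, run model inference, emit the prediction.
        input.mapValues(value -> {
            double[] features = Arrays.stream(value.split(","))
                    .mapToDouble(Double::parseDouble)
                    .toArray();
            INDArray featureVector = Nd4j.create(features, new int[]{1, features.length});
            INDArray prediction = model.output(featureVector);
            return prediction.toString();
        }).to("keras-output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```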

The full source code of my unit test is here: Kafka Streams + DeepLearning4j + Keras + TensorFlow. Just “git clone” the Github project and run the Maven build; it should work out of the box without any configuration.
