Java / JEE Archives - Kai Waehner
https://www.kai-waehner.de/blog/category/java-jee/
Technology Evangelist - Big Data Analytics - Middleware - Apache Kafka

Apache Kafka, KSQL and Apache PLC4X for IIoT Data Integration and Processing
https://www.kai-waehner.de/blog/2019/09/02/apache-kafka-ksql-and-apache-plc4x-for-iiot-data-integration-and-processing/
Mon, 02 Sep 2019 09:35:03 +0000

Data integration and processing is a huge challenge in Industrial IoT (IIoT, aka Industry 4.0 or Automation Industry) due to monolithic systems and proprietary protocols. Apache Kafka, its ecosystem (Kafka Connect, KSQL) and Apache PLC4X are a great open source choice to implement this IIoT integration end to end in a scalable, reliable and flexible way.

This blog post gives a high-level overview of the challenges and a good, flexible architecture to solve them. At the end, I share a video recording and the corresponding slide deck, which provide many more details and insights.

Challenges in IIoT / Industry 4.0

Here are some of the key challenges in IIoT / Industry 4.0:

  • IoT != IIoT: The automation industry typically does not use MQTT or other open standards; its protocols are slow, insecure, not scalable and proprietary.
  • Product lifecycles are very long (tens of years), with no simple changes or upgrades.
  • IIoT usually uses incompatible protocols, typically proprietary and built for just one specific vendor.
  • The automation industry uses proprietary and expensive monoliths which are neither scalable nor extensible.
  • Machines and PLCs are insecure by nature, with no authentication, no authorization and no encryption.

This is still the state of the art in the automation industry. That is no surprise given such long product life cycles, but it is still very concerning.

Evolution of Convergence between IT and Automation Industry

Today, everybody talks about cloud, big data analytics, machine learning and real-time processing at scale. The convergence between IT and the automation industry is coming, as this report from the IoT research firm IoT Analytics shows:

Evolution of convergence between IT and Automation Industry

There is huge demand to build an open, flexible, scalable platform. It creates many opportunities from a business and technical perspective:

  • Cost reduction
  • Flexibility
  • Standards-based
  • Scalability
  • Extendibility
  • Security
  • Infrastructure-independent

So, how do you get from legacy technologies and proprietary IIoT protocols to cloud, big data, machine learning and real-time processing? How do you build a reliable, scalable and flexible architecture and infrastructure?

Apache Kafka and Apache PLC4X for End-to-End IIoT Integration

I assume you already know it: Apache Kafka is the de-facto standard for real-time event streaming. It provides:

  • Open Source (Apache 2.0 License)
  • Global-scale
  • Real-time
  • Persistent Storage
  • Stream Processing


If you need more details about Apache Kafka, check out the Kafka website, the extensive Confluent documentation or some free video recordings and slides from any Kafka Summit to learn about the technology and use cases.

The only very important thing I want to point out is that Apache Kafka includes Kafka Connect and Kafka Streams:

Apache Kafka includes Kafka Connect and Kafka Streams

Kafka Connect enables reliable and scalable integration of Kafka with other systems. Kafka Streams allows you to write standard Java apps and microservices that continuously process your data in real time with a lightweight stream processing API. And finally, KSQL enables stream processing with SQL-like semantics.
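
To make this concrete, here is a minimal Kafka Streams sketch in Java. It is not taken from a specific project; the topic names and the filter threshold are made-up placeholders, but it shows how little code a complete, continuously running streaming app needs:

    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;

    public class SensorFilterApp {

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sensor-filter");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            // Continuously read raw sensor values, keep only the critical ones
            // and write them to an output topic. This is a complete streaming app,
            // deployable as a plain Java process without a separate cluster.
            builder.<String, String>stream("machine-sensors")
                   .filter((machineId, value) -> Double.parseDouble(value) > 100.0)
                   .to("critical-sensors");

            new KafkaStreams(builder.build(), props).start();
        }
    }

The equivalent logic in KSQL would be a single continuous query (a SELECT with a WHERE clause) instead of a Java application.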

Apache PLC4X for PLC Integration (Siemens S7, Modbus, Allen Bradley, Beckhoff ADS, etc.)

Apache PLC4X is less established on the market than Apache Kafka. It also “just” covers a niche (a big one, of course) compared to Kafka, which is used in every industry for many different use cases. However, PLC4X is a very interesting top-level Apache project for the automation industry.

The goal is to open up PLC interfaces from the IIoT world to the outside world. PLC4X enables vertical integration and lets you write software independent of the PLCs, using JDBC-like adapters for various protocols like Siemens S7, Modbus, Allen Bradley, Beckhoff ADS, OPC-UA, Emerson, Profinet, BACnet and Ethernet.
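
As an illustration, reading a value from a Siemens S7 PLC with the PLC4X Java API looks roughly like the sketch below. The connection string and the field address are placeholders for your PLC’s IP address, rack/slot and memory layout, and the exact API details may differ between PLC4X releases:

    import org.apache.plc4x.java.PlcDriverManager;
    import org.apache.plc4x.java.api.PlcConnection;
    import org.apache.plc4x.java.api.messages.PlcReadRequest;
    import org.apache.plc4x.java.api.messages.PlcReadResponse;

    public class PlcReadExample {

        public static void main(String[] args) throws Exception {
            // JDBC-like: one connection string selects the driver, transport and device
            try (PlcConnection connection =
                     new PlcDriverManager().getConnection("s7://192.168.0.1/0/0")) {
                PlcReadRequest request = connection.readRequestBuilder()
                    .addItem("motor-current", "%DB1.DBD0:REAL") // placeholder address
                    .build();
                PlcReadResponse response = request.execute().get();
                System.out.println("motor-current = " + response.getFloat("motor-current"));
            }
        }
    }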

PLC4X also provides a Kafka Connect connector (see the configuration sketch after this list). Therefore, you can leverage the benefits of Apache Kafka (high availability, high throughput, high scalability, reliability, real-time processing) to deploy PLC4X integration pipelines. With this, you can build one single architecture and infrastructure for

  • legacy IIoT connectivity using PLC4X and Kafka Connect
  • data processing using Kafka Streams / KSQL
  • integration with the rest of the enterprise using Kafka Connect and any other sink (database, big data analytics, machine learning, ERP, CRM, cloud services, custom business applications, etc.)
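
For the first building block, a Kafka Connect source configuration for the PLC4X connector could look roughly like the following. The connector class follows the PLC4X project’s naming, but the individual property keys, the connection string and the field address are illustrative assumptions; check the connector documentation for the exact names:

    name=plc4x-s7-source
    connector.class=org.apache.plc4x.kafka.Plc4xSourceConnector
    tasks.max=1
    # Placeholders - adjust the connection string, field address and topic to your setup
    url=s7://192.168.0.1/0/0
    queries=%DB1.DBD0:REAL
    topic=machine-sensors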

Apache Kafka and PLC4X Architecture for IIoT Automation Industry

As Kafka decouples the producers from the consumers, you can consume the IIoT machine sensor data from any application – some might be real time, some might be batch, and some might use request-response communication for human interaction on a web or mobile app.

Apache PLC4X vs. OPC-UA

A little bit off-topic: how do you choose between Apache PLC4X (an open source framework for IIoT) and OPC-UA (an open standard for IIoT)? In short, they are different things and can even be complementary. Here is a comparison:

OPC-UA

  • Open standard
  • All the pros and cons of an open standard (works with different vendors; slow adoption; inflexible, etc.)
  • Often poorly implemented by the vendors
  • Requires app server on top of PLC
  • Every device has to be retrofitted with the ability to speak a new protocol, and a common client is needed to speak with these devices
  • Often over-engineering for just reading the data
  • Activating OPC-UA support on existing PLCs greatly increases the load on the PLCs
  • Licensing costs for every machine

Apache PLC4X

  • Open source framework (Apache 2.0 license)
  • Provides unified API by implementing drivers for communicating with most industrial controllers in the protocols they natively understand
  • No need to modify existing hardware
  • No increased load on the PLCs
  • No need to pay for licenses to activate OPC-UA support
  • Drivers are implemented from the specs or by reverse engineering protocols in order to be fully Apache 2.0 licensed
  • PLC4X adapter for OPC-UA available -> Both can be used together!

As you see, both have their pros and cons. To me (and this is clearly my subjective opinion), PLC4X provides a great alternative with high flexibility and a low footprint.

Confluent and IoT Platform Solutions

Many IoT Platform Solutions are available on the market. This includes products like Siemens MindSphere or Cisco Kinetic, and cloud services from the major cloud providers like AWS, GCP or Azure. And you have Kafka + PLC4X, as you just learned above. Often, this is not an “either … or” decision:

Confluent and IoT Platform Solutions

You can either use

  • just Kafka and PLC4X for lightweight and flexible IIoT integration based on a scalable, reliable and open event streaming platform
  • just an IoT Platform Solution, if the pros of such a specific product (dedicated support for a specific vendor protocol, a nice GUI, etc.) outweigh the cons (high cost, a proprietary and inflexible solution)
  • both together, where you use the IoT Platform Solution to integrate with the PLCs and then send the data to Kafka to integrate with the rest of the enterprise (with all the benefits and added value Kafka brings)
  • both together, where you use Kafka and PLC4X for PLC integration and one of the consumers is the IoT Platform Solution (while other consumers can also get the data from Kafka, fully decoupled from the IoT Platform Solution)

All alternatives have their pros and cons. There is no single solution which fits every use case! Therefore, it is no surprise that most IoT Platform Solutions provide Kafka source and sink connectors.

Apache Kafka and Apache PLC4X – Slides / Video Recording / Github Code Example

If you are curious about more details and insights, please check out my video recording and slide deck.

Slide Deck – Apache Kafka and PLC4X:

Video Recording – Apache Kafka and PLC4X:

Github Code Example – Apache Kafka and PLC4X:

We are also building a nice and simple demo on Github these days:

Kafka-native end-to-end IIoT Data Integration and Processing with Kafka Connect, KSQL and Apache PLC4X

PLC4X gets most exciting when you try it out yourself and connect it to your own machines or tools. So, check out the example and adjust it to connect to your infrastructure.

Feedback and Questions?

Please let me know your feedback and questions about Kafka, its ecosystem and PLC4X for IIoT integration. Let’s also connect on LinkedIn to discuss interesting IIoT use cases and technologies in the future.

Deep Learning Example: Apache Kafka + Python + Keras + TensorFlow + Deeplearning4j
https://www.kai-waehner.de/blog/2018/11/27/deep-learning-example-apache-kafka-python-keras-tensorflow-deeplearning4j/
Tue, 27 Nov 2018 16:29:04 +0000

I added a new example to my “Machine Learning + Kafka Streams Examples” Github project:

“Python + Keras + TensorFlow + DeepLearning4j + Apache Kafka + Kafka Streams”.

This blog post discusses the motivation and why this is a great combination of technologies for scalable, reliable Machine Learning infrastructures. For more details about building Machine Learning / Deep Learning infrastructures leveraging the Apache Kafka open source ecosystem, check out these two blog posts:

Deep Learning with Python and Keras

The goal was to show how easily you can deploy a model developed with Python and Keras to a Java / Kafka ecosystem. Keras allows you to use different deep learning backends under the hood: TensorFlow, CNTK, or Theano. It acts as a high-level wrapper and is very easy to use, even for people new to machine learning. For these reasons, Keras is getting a lot of traction these days.

Deployment of Keras Models to Java / Kafka Ecosystem with Deeplearning4J (DL4J)

Machine learning frameworks offer different options for combining them with the Java platform (and therefore with the Apache Kafka ecosystem), like native Java APIs to load models, or RPC interfaces to framework-specific model servers.

Deeplearning4j seems to be becoming the de facto standard for the deployment of Keras models (if you trust Google search). This deep learning framework, built specifically for the Java platform, has added many features and bug fixes to its Keras model import and already supports many Keras concepts, getting better with every new (beta) release. I used 1.0.0-beta3 for my Github example. Please check here for a complete list of supported Keras features in Deeplearning4j, like layers, loss functions, activation functions, initializers and optimizers.

You can either fully train a model with Keras and “just” embed it into a Java application for model inference, or re-use the model (or parts of it) and improve it with DL4J’s Java API (aka transfer learning).

Example: Python + Keras + TensorFlow + Apache Kafka + DL4J

I implemented a simple but still impressive example:

Development of an analytic model trained with Python, Keras and TensorFlow and deployment to Java and Kafka ecosystem. Simple to implement (as you see in the source code), but powerful, scalable and reliable.

There is no business case this time, just a technical demonstration of a simple ‘Hello World’ Keras model. Feel free to replace the model with any other Keras model trained with your backend of choice. You just need to replace the model binary (and use a model which is compatible with DeepLearning4j’s model importer). Then you can embed it into your Kafka application for real-time model inference.

Machine Learning Technologies

  • Python
  • DeepLearning4J
  • Keras – a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano.
  • TensorFlow – used as backend under the hood of Keras
  • DeepLearning4j’s KerasModelImport feature is used for importing the Keras / TensorFlow model into Java. The model used is its ‘Hello World’ model example.
  • The Keras model was trained with this Python script.

Apache Kafka and DL4J for Real Time Predictions

The trained model is embedded into a Kafka Streams application for real time predictions. Here is the core Kafka Streams logic where I use the Deeplearning4j API to do predictions:

Kafka Streams and Deeplearning4j Deployment of Keras / TensorFlow Model
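
Since the core logic is shown as an image above, here is a condensed sketch of the idea in Java. The topic names, the model path and the CSV feature parsing are placeholders rather than the exact code from the project; the essential pieces are DL4J’s KerasModelImport for loading the model and model.output() for inference inside the Kafka Streams topology:

    import java.util.Arrays;
    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;
    import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
    import org.nd4j.linalg.api.ndarray.INDArray;
    import org.nd4j.linalg.factory.Nd4j;

    public class KerasInferenceStream {

        public static void main(String[] args) throws Exception {
            // Import the Keras / TensorFlow model once at startup via DL4J
            MultiLayerNetwork model =
                KerasModelImport.importKerasSequentialModelAndWeights("model.h5");

            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "keras-inference");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            // For every incoming event: parse the features, run model inference
            // and forward the prediction to the output topic
            builder.<String, String>stream("InputTopic")
                .mapValues(value -> {
                    double[] features = Arrays.stream(value.split(","))
                                              .mapToDouble(Double::parseDouble)
                                              .toArray();
                    INDArray input = Nd4j.create(new double[][] { features });
                    INDArray prediction = model.output(input);
                    return prediction.toString();
                })
                .to("OutputTopic");

            new KafkaStreams(builder.build(), props).start();
        }
    }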

The full source code of my unit test is here: Kafka Streams + DeepLearning4j + Keras + TensorFlow. Just do a “git clone” of the Github project, run the Maven build, and it should work out-of-the-box without any configuration.

Deep Learning KSQL UDF for Streaming Anomaly Detection of MQTT IoT Sensor Data
https://www.kai-waehner.de/blog/2018/08/02/deep-learning-kafka-ksql-udf-anomaly-detection-mqtt-iot-sensor/
Thu, 02 Aug 2018 08:12:02 +0000

I built a scenario for a hybrid machine learning infrastructure, leveraging Apache Kafka as the scalable central nervous system. The public cloud is used for training analytic models at extreme scale (e.g. using TensorFlow and TPUs on Google Cloud Platform (GCP) via Google ML Engine). The predictions (i.e. model inference) are executed on premise at the edge, in a local Kafka infrastructure (e.g. leveraging Kafka Streams or KSQL for streaming analytics).

This post focuses on the on-premise deployment. I created a Github project with a KSQL UDF for sensor analytics. It leverages the new API features of KSQL to easily build UDF / UDAF functions in Java for continuous stream processing on incoming events.

Use Case: Connected Cars – Real Time Streaming Analytics using Deep Learning

Continuously process millions of events from connected devices (in this example, car sensors):

Connected_Cars_IoT_Deep_Learning

I built different analytic models for this. They are trained in the public cloud, leveraging TensorFlow, H2O and Google ML Engine. Model creation is not the focus of this example. The final model is already production-ready and can be deployed for doing predictions in real time.

Model serving can be done via a model server or natively embedded into the stream processing application. See the trade-offs of RPC vs. Stream Processing for model deployment and a “TensorFlow + gRPC + Kafka Streams” example here.

Demo: Model Inference at the Edge with MQTT, Kafka and KSQL

The Github project generates car sensor data and forwards it via Confluent MQTT Proxy to a Kafka cluster for KSQL processing and real-time analytics.

This project focuses on the ingestion of data into Kafka via MQTT and the processing of data via KSQL:

MQTT_Proxy_Confluent_Cloud

A great benefit of Confluent MQTT Proxy is the simplicity of realizing IoT scenarios without the need for an MQTT broker. You can forward messages directly from the MQTT devices to Kafka via the MQTT Proxy. This reduces efforts and costs significantly. It is a perfect solution if you “just” want to communicate between Kafka and MQTT devices.

If you want to see the other part of the story (integration with sink applications like Elasticsearch / Grafana), please take a look at the Github project “KSQL for streaming IoT data”. It realizes the integration with Elasticsearch and Grafana via Kafka Connect and the Elastic connector.

KSQL UDF – Source Code

It is pretty easy to develop UDFs. Just implement the function in one Java method within a UDF class:

            @Udf(description = "apply analytic model to sensor input")
            public String anomaly(String sensorinput) {
                return "YOUR LOGIC"; // replace with real model inference on the sensor input
            }

Here is the full source code for the Anomaly Detection KSQL UDF.

How to run the demo with Apache Kafka and MQTT Proxy?

All steps needed to execute the demo are described in the Github project.

You just need to install Confluent Platform and then follow these steps to deploy the UDF, create MQTT events and process them via KSQL leveraging the analytic model.

I use Mosquitto to generate MQTT messages. Of course, you can use any other MQTT client, too. That is the great benefit of an open and standardized protocol.
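
For example, publishing a single test event from the command line with Mosquitto could look like this (broker address, topic name and payload are placeholders; the demo scripts in the Github project define the real ones):

    mosquitto_pub -h localhost -p 1883 -t temperature/car1 -m '{"sensor":"engine","value":97.4}'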

Hybrid Cloud Architecture for Apache Kafka and Machine Learning

If you want to learn more about the concepts behind a scalable, vendor-agnostic Machine Learning infrastructure, take a look at my presentation on Slideshare or watch the recording of the corresponding Confluent webinar “Unleashing Apache Kafka and TensorFlow in the Cloud“.

Please share any feedback! Do you like it, or not? Any other thoughts?

Model Serving: Stream Processing vs. RPC / REST with Java, gRPC, Apache Kafka, TensorFlow
https://www.kai-waehner.de/blog/2018/07/09/model-serving-java-grpc-tensorflow-apache-kafka-streams-deeplearning-stream-processing-rpc-rest/
Mon, 09 Jul 2018 01:13:45 +0000

Machine Learning / Deep Learning models can be used in different ways to do predictions. My preferred way is to deploy an analytic model directly into a stream processing application (like Kafka Streams or KSQL). You could, for example, use the TensorFlow for Java API. This allows for the best latency and independence from external services. Several examples can be found in my Github project: Model Inference within Kafka Streams Microservices using TensorFlow, H2O.ai, Deeplearning4j (DL4J).

However, direct deployment of models is not always a feasible approach. Sometimes it makes sense, or is even required, to deploy a model in a dedicated serving infrastructure like TensorFlow Serving for TensorFlow models. Model inference is then done via RPC / request-response communication. Organisational or technical reasons might force this approach. Or you might want to leverage the built-in features for managing and versioning different models in the model server.

So you combine stream processing with RPC / Request-Response paradigm. The architecture looks like the following:

Model Serving: Stream Processing vs. Request Response with Java, gRPC, Apache Kafka, TensorFlow

Pros of an external model serving infrastructure like TensorFlow Serving:

  • Simple integration with existing technologies and organizational processes
  • Easier to understand if you come from non-streaming world
  • Later migration to real streaming is also possible
  • Model management built-in for different models and versioning

Cons:

  • Worse latency, due to a remote call instead of local inference
  • No offline inference (devices, edge processing, etc.)
  • Coupling the availability, scalability, and latency / throughput of your Kafka Streams application with the SLAs of the RPC interface
  • Side-effects (e.g. in case of failure) not covered by Kafka processing (e.g. Exactly Once)

Combination of Stream Processing and Model Server using Apache Kafka, Kafka Streams and TensorFlow Serving

I created the Github Java project “TensorFlow Serving + gRPC + Java + Kafka Streams” to demonstrate how to do model inference with Apache Kafka, Kafka Streams and a TensorFlow model deployed using TensorFlow Serving. The concepts are very similar for other ML frameworks and cloud providers; e.g. you could also use Google Cloud ML Engine for TensorFlow (which uses TensorFlow Serving under the hood) or Apache MXNet and the AWS model server.

Most ML servers for model serving are also extensible to serve other types of models and data; e.g. you could also deploy non-TensorFlow models to TensorFlow Serving. Many ML servers are available as a cloud service and for local deployment.

TensorFlow Serving

Let’s discuss TensorFlow Serving quickly. It can be used to host your trained analytic models. As with most model servers, you can do inference via the request-response paradigm. gRPC and REST / HTTP are the two common technologies and concepts used.

The blog post “How to deploy TensorFlow models to production using TF Serving” is a great explanation of how to export and deploy trained TensorFlow models to a TensorFlow Serving infrastructure. You can either deploy your own infrastructure anywhere or leverage a cloud service like Google Cloud ML Engine. A SavedModel is TensorFlow’s recommended format for saving models, and it is the required format for deploying trained TensorFlow models using TensorFlow Serving or Google Cloud ML Engine.

The core architecture is described in detail in TensorFlow Serving’s architecture overview:

TensorFlow Serving’s architecture overview

This architecture allows deployment and management of different models and versions of these models, including additional features like A/B testing. In the following demo, we just deploy one single TensorFlow model for image recognition (based on the famous Inception neural network).

Demo: Mixing Stream Processing with RPC: TensorFlow Serving + Kafka Streams

Disclaimer: The following is a shortened version of the steps. For the full example including source code and scripts, please go to my Github project “TensorFlow Serving + gRPC + Java + Kafka Streams”.

Things to do

  1. Install and start a ML Serving Engine
  2. Deploy prebuilt TensorFlow Model
  3. Create Kafka Cluster
  4. Implement Kafka Streams application
  5. Deploy Kafka Streams application (e.g. locally on laptop or to a Kubernetes cluster)
  6. Generate streaming data to test the combination of Kafka Streams and TensorFlow Serving

Step 1: Create a TensorFlow model and export it to ‘SavedModel’ format

I simply added an existing pretrained Image Recognition model built with TensorFlow. You just need to export a model using TensorFlow’s API and then use the exported folder. TensorFlow uses Protobuf to store the model graph and adds variables for the weights of the neural network.

Google ML Engine shows how to create a simple TensorFlow model for census predictions using the “ML Engine getting started guide”. In a second step, you can build a more advanced example for image recognition using transfer learning, following the guide “Image Classification using Flowers dataset”.

You can also combine cloud and local services, e.g. build the analytic model with Google ML Engine and then deploy it locally using TensorFlow Serving as we do.

Step 2: Install and start TensorFlow Serving server + deploy model

Different options are available. Installing TensorFlow Serving on a Mac is still a pain in mid-2018. apt-get works much more easily on Linux operating systems. Unfortunately, there is nothing like a ‘brew’ command or a simple zip file you can use on a Mac. Alternatives:

  • You can build the project and compile everything using the Bazel build system – which literally takes forever (on my laptop), i.e. many hours.
  • Install and run TensorFlow Serving via a Docker container. This also requires building the project. In addition, the documentation is not very good and outdated.
  • Preferred option for beginners => Use a prebuilt Docker container with TensorFlow Serving. I used an example from Thamme Gowda. Kudos to him for building a project which not only contains the TensorFlow Serving Docker image, but also shows an example of how to do gRPC communication between a Java application and TensorFlow Serving.

If you want to deploy your own model, read the guide “Deploy TensorFlow model to TensorFlow Serving”. Or, to use a cloud service, take a look at “Getting Started with Google ML Engine”.

Step 3: Create Kafka Cluster and Kafka topics

Create a local Kafka environment (Apache Kafka broker + Zookeeper). The easiest way is the open source Confluent CLI – which is also part of Confluent Open Source and Confluent Enterprise Platform. Just type “confluent start kafka”.

You can also create a cluster using Kafka as a Service. The best option is Confluent Cloud – Apache Kafka as a Service. You can choose between Confluent Cloud Professional for “playing around” or Confluent Cloud Enterprise on AWS, GCP or Azure for mission-critical deployments, including a 99.95% SLA and very large scale of up to 2 GByte/second throughput. The third option is to connect to your existing Kafka cluster on premise or in the cloud (note that you need to change the broker URL and port in the Kafka Streams Java code before building the project).

Next, create the two Kafka topics for this example (‘ImageInputTopic’ for URLs to the image and ‘ImageOutputTopic’ for the prediction result):
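
With a local 2018-era cluster (ZooKeeper-based tooling), the commands could look like the following; the partition count and replication factor are just example values:

    kafka-topics --zookeeper localhost:2181 --create --topic ImageInputTopic --partitions 3 --replication-factor 1
    kafka-topics --zookeeper localhost:2181 --create --topic ImageOutputTopic --partitions 3 --replication-factor 1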

Step 4: Build and deploy Kafka Streams app + send test messages

The Kafka Streams microservice (i.e. the Java class) “Kafka Streams TensorFlow Serving gRPC Example” is the Kafka Streams Java client. The microservice uses gRPC and Protobuf for request-response communication with the TensorFlow Serving server to do model inference to predict the content of the image. Note that the Java client does not need any TensorFlow APIs, but just gRPC interfaces.

This example executes a Java main method, i.e. it starts a local Java process running the Kafka Streams microservice. It continuously waits for new events arriving at ‘ImageInputTopic’, does model inference (via a gRPC call to TensorFlow Serving), and then sends the prediction to ‘ImageOutputTopic’ – all in real time within milliseconds.
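
Condensed, the core of this microservice looks roughly like the following sketch. The class names follow the gRPC stubs generated from the standard TensorFlow Serving protos; the port, the model name and the omitted image-tensor construction are placeholders, so please see the Github project for the real code:

    import java.util.Properties;
    import io.grpc.ManagedChannel;
    import io.grpc.ManagedChannelBuilder;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import tensorflow.serving.Model;
    import tensorflow.serving.Predict;
    import tensorflow.serving.PredictionServiceGrpc;

    public class StreamsTensorFlowServingGrpcExample {

        public static void main(String[] args) {
            // Blocking gRPC stub pointing at the TensorFlow Serving endpoint (placeholder port)
            ManagedChannel channel = ManagedChannelBuilder
                .forAddress("localhost", 8500)
                .usePlaintext()
                .build();
            PredictionServiceGrpc.PredictionServiceBlockingStub stub =
                PredictionServiceGrpc.newBlockingStub(channel);

            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "tf-serving-grpc");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            builder.<String, String>stream("ImageInputTopic")
                .mapValues(imageUrl -> {
                    Predict.PredictRequest request = Predict.PredictRequest.newBuilder()
                        .setModelSpec(Model.ModelSpec.newBuilder().setName("inception").build())
                        // ... load the image from imageUrl and attach it as the input tensor ...
                        .build();
                    Predict.PredictResponse response = stub.predict(request);
                    return response.toString(); // the real code extracts the top label
                })
                .to("ImageOutputTopic");

            new KafkaStreams(builder.build(), props).start();
        }
    }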

In the same way, you could deploy this Kafka Streams microservice anywhere – including Kubernetes (e.g. on premise OpenShift cluster or Google Kubernetes Engine), Mesosphere, Amazon ECS or even in a Java EE app – and scale it up and down dynamically.

Now send messages, e.g. with kafkacat, and use kafka-console-consumer to consume the predictions.
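
For example (broker address and image URL are placeholders):

    # Produce an image URL into the input topic
    echo "http://example.com/cat.jpg" | kafkacat -b localhost:9092 -t ImageInputTopic -P

    # Consume the resulting predictions
    kafka-console-consumer --bootstrap-server localhost:9092 --topic ImageOutputTopic --from-beginning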

Once again, if you want to see source code and scripts, then please go to my Github project “TensorFlow Serving + gRPC + Java + Kafka Streams“.

KSQL Deep Dive – The Open Source Streaming SQL Engine for Apache Kafka
https://www.kai-waehner.de/blog/2018/05/15/ksql-deep-dive-open-source-streaming-sql-engine-for-apache-kafka/
Tue, 15 May 2018 06:35:00 +0000

I gave a workshop at Kafka Meetup Tel Aviv in May 2018: “KSQL Deep Dive – The Open Source Streaming SQL Engine for Apache Kafka”.

Here are the agenda, slides and video recording.

KSQL – The Open Source Streaming SQL Engine for Apache Kafka

KSQL is the open-source, Apache 2.0 licensed streaming SQL engine on top of Apache Kafka, which aims to simplify stream processing and make it available to everyone. Even though it is simple to use, KSQL is built for mission-critical and scalable production deployments (using Kafka Streams under the hood).
Benefits of using KSQL include: no coding required; no additional analytics cluster needed; streams and tables as first-class constructs; access to the rich Kafka ecosystem. This session introduces the concepts and architecture of KSQL. Use cases such as streaming ETL, real-time stream monitoring or anomaly detection are discussed. A live demo shows how to set up and use KSQL quickly and easily on top of your Kafka ecosystem.

If you want to get started, try out the KSQL quick start guide. It gets you started in 10 minutes, locally on your laptop or alternatively in a Docker environment.

History of Apache Kafka, Confluent, and KSQL

Agenda

  1. Apache Kafka Ecosystem
  2. Kafka Streams as Foundation for KSQL
  3. Motivation for KSQL
  4. KSQL Concepts
  5. Live Demo #1 – Intro to KSQL
  6. KSQL Architecture
  7. Live Demo #2 – Clickstream Analysis
  8. Building a User Defined Function (Example: Machine Learning)
  9. Getting Started

Slides

Video Recording

There was a Youtube live stream. Unfortunately, we had some technical problems, so the audio of the first half is not really good. Sorry for that. I still want to share it. The second half has good sound quality:

I am looking forward to getting your feedback. Also, please feel free to ask questions in the Confluent Slack community (where you can also get help from the KSQL engineers) or create Github tickets if you have problems or contributions to this great open source project.

Rethinking Stream Processing with Apache Kafka, Kafka Streams and KSQL
https://www.kai-waehner.de/blog/2018/03/13/rethinking-stream-processing-with-apache-kafka-streams-and-ksql/
Tue, 13 Mar 2018 16:02:49 +0000

I presented at JavaLand 2018 in Brühl recently. A great developer conference with over 1,800 attendees. The location is also awesome! A theme park: Phantasialand. My talk: “New Era of Stream Processing with Apache Kafka’s Streams API and KSQL”. I just want to share the slide deck…

Kai Speaking at JavaLand 2018 about Kafka Streams and KSQL

Abstract

Stream processing is a concept used to act on real-time streaming data. This session shows and demos how teams in different industries leverage the innovative Streams API from Apache Kafka to build and deploy mission-critical, real-time streaming applications and microservices.

The session discusses important streaming concepts like local and distributed state management, exactly-once semantics, embedding streaming into any application, and deployment to any infrastructure. Afterwards, the session explains key advantages of Kafka’s Streams API, like distributed processing and fault tolerance with fast failover, no-downtime rolling deployments and the ability to reprocess events so you can recalculate output when your code changes.

A demo shows how to combine any custom code with your streams application, using the example of an analytic model built with any machine learning framework like Apache Spark ML or TensorFlow.

The end of the session introduces KSQL, the open source streaming SQL engine for Apache Kafka: write “simple” SQL streaming queries with the scalability, throughput and failover of Kafka Streams under the hood.

Slide Deck

Here we go:

Video Recording – Apache Kafka as Event-Driven Open Source Streaming Platform (Voxxed Zurich 2018)
https://www.kai-waehner.de/blog/2018/03/13/video-recording-apache-kafka-as-event-driven-open-source-streaming-platform-voxxed-zurich-2018/
Tue, 13 Mar 2018 07:25:43 +0000

I spoke at Voxxed Zurich 2018 about Apache Kafka as an event-driven open source streaming platform. The talk includes an intro to Apache Kafka and its open source ecosystem (Kafka Streams, Connect, KSQL, Schema Registry, etc.). I just want to share the video recording of my talk.

Abstract

This session introduces Apache Kafka, an event-driven open source streaming platform. Apache Kafka goes far beyond scalable, high-volume messaging. In addition, you can leverage Kafka Connect for integration and the Kafka Streams API for building lightweight stream processing microservices in autonomous teams. The open source Confluent Platform adds further components such as KSQL, Schema Registry, REST Proxy, clients for different programming languages and connectors for different technologies and databases. Live demos included.

Video Recording

Apache Kafka + Kafka Streams + Mesos = Highly Scalable Microservices
https://www.kai-waehner.de/blog/2018/01/12/apache-kafka-kafka-streams-mesos-highly-scalable-microservices/
Fri, 12 Jan 2018 16:34:12 +0000

My latest article about Apache Kafka, Kafka Streams and Apache Mesos was published on Confluent’s blog:

Apache Mesos, Apache Kafka and Kafka Streams for Highly Scalable Microservices

This blog post discusses how to build a highly scalable, mission-critical microservice infrastructure with Apache Kafka, Kafka Streams, and Apache Mesos respectively in their vendor-supported platforms from Confluent and Mesosphere.

https://www.confluent.io/blog/apache-mesos-apache-kafka-kafka-streams-highly-scalable-microservices/

Have fun reading it and let me know if you have any feedback…

Apache Kafka + Kafka Streams + Mesos / DCOS = Scalable Microservices
https://www.kai-waehner.de/blog/2017/10/27/mesos-kafka-streams-scalable-microservices/
Fri, 27 Oct 2017 08:05:16 +0000

I gave a talk at MesosCon 2017 Europe in Prague about building highly scalable, mission-critical microservices with Apache Kafka, Kafka Streams and Apache Mesos / DC/OS. I would like to share the slides and a video recording of the live demo.

Abstract

Microservices bring many benefits, like agile, flexible development and deployment of business logic. However, a microservice architecture also creates many new challenges. These include increased communication between distributed instances, the need for orchestration, new failover requirements, and resiliency design patterns.

This session discusses how to build a highly scalable, performant, mission-critical microservice infrastructure with Apache Kafka, Kafka Streams and Apache Mesos or DC/OS. Apache Kafka brokers are used as a powerful, scalable, distributed message backbone. Kafka’s Streams API allows you to embed stream processing directly into any external microservice or business application, without the need for a dedicated streaming cluster. Apache Mesos can be used as scalable infrastructure for both the Apache Kafka brokers and the external applications using the Kafka Streams API, to leverage the benefits of a cloud-native platform like service discovery, health checks, or failover management.

A live demo shows how to develop real-time applications for your core business with Kafka messaging brokers and the Kafka Streams API. You see how to deploy / manage / scale them on a DC/OS cluster using different deployment options.

Key takeaways

  • Successful microservice architectures require a highly scalable messaging infrastructure combined with a cloud-native platform which manages distributed microservices
  • Apache Kafka offers a highly scalable, mission-critical infrastructure for distributed messaging and integration
  • Kafka’s Streams API allows you to embed stream processing into any external application or microservice
  • Mesos or DC/OS allow management of both Kafka brokers and external applications using the Kafka Streams API, leveraging many built-in benefits like health checks, service discovery or failover control of microservices
  • See a live demo which combines the Apache Kafka streaming platform and DC/OS

Architecture: Kafka Brokers + Kafka Streams on Kubernetes and DC/OS

The following picture shows the architecture. You can either run Kafka brokers and Kafka Streams microservices natively on DC/OS via Marathon or leverage Kubernetes as a Docker container orchestration tool (which is also supported by Mesosphere in the meantime).

Architecture - Kafka Streams, Kubernetes and Mesos / DCOS

Slides

Here are the slides from my talk:

Live Demo

The following video shows the live demo. It is built on AWS using Mesosphere’s CloudFormation script to set up a DC/OS cluster in ten minutes.

Here, I deployed both Kafka brokers and Kafka Streams microservices directly to DC/OS, without leveraging Kubernetes. I expect many people to continue deploying Kafka brokers directly on DC/OS. For microservices, many teams might move to the following stack: Microservice –> Docker –> Kubernetes –> DC/OS.

Do you also use Apache Mesos or DC/OS to run Kafka? Only the brokers, or also Kafka clients (producers, consumers, Streams, Connect, KSQL, etc.)? Or do you prefer another tool like Kubernetes (maybe on DC/OS)?

Deep Learning in Real Time with TensorFlow, H2O.ai and Kafka Streams (Slides from JavaOne 2017)
https://www.kai-waehner.de/blog/2017/10/04/kafka-streams-deep-learning-tensorflow-h2o-ai/
Wed, 04 Oct 2017 16:59:04 +0000

Early October… Like every year in October, it is time for JavaOne and Oracle OpenWorld in San Francisco… I am glad to be back at this huge event again. My talk at JavaOne 2017 was all about the deployment of analytic models to scalable production systems, leveraging Apache Kafka and Kafka Streams. Let’s first look at the abstract. After that, I attach the slides and refer to further material on this topic.

Abstract “Deep Learning in Real Time with TensorFlow, H2O.ai and Kafka Streams”

Intelligent real time applications are a game changer in any industry. Deep Learning is one of the hottest buzzwords in this area. New technologies like GPUs combined with elastic cloud infrastructure enable the sophisticated usage of artificial neural networks to add business value in real world scenarios. Tech giants use it e.g. for image recognition and speech translation. This session discusses some real-world scenarios from different industries to explain when and how traditional companies can leverage deep learning in real time applications.

This session shows how to deploy Deep Learning models into real-time applications to do predictions on new events. Apache Kafka will be used to run inference on analytic models in a highly scalable and performant way.

The first part introduces the use cases and concepts behind Deep Learning. It discusses how to build Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN) and Autoencoders leveraging open source frameworks like TensorFlow, DeepLearning4J or H2O.

The second part shows how to deploy the built analytic models to real time applications leveraging Apache Kafka as streaming platform and Apache Kafka’s Streams API to embed the intelligent business logic into any external application or microservice.

Apache Kafka, Kafka Streams and Deep Learning

Key Takeaways for the Audience: Kafka Streams + Deep Learning

Here are the takeaways of this talk:

  • The focus of this talk is to discuss and show how to productionize analytic models built by data scientists – the key challenge in most companies.
  • Deep Learning allows you to build different neural networks to solve complex classification and regression scenarios and can add business value in any industry
  • Deep Learning models are built with open source frameworks like TensorFlow, DeepLearning4J or H2O.ai
  • Apache Kafka’s Streams API allows you to embed the intelligent business logic into any application or microservice
  • Apache Kafka’s Streams API leverages these Deep Learning models (without redeveloping them) to act on new events in real time

Slides and Further Material around Apache Kafka and Machine Learning

Here are the slides of my talk:

Some further material around Apache Kafka, Kafka Streams and Machine Learning:

I will post more examples and use cases around Apache Kafka and Machine Learning in the upcoming months… Stay tuned!
