Mainframe Integration, Offloading and Replacement with Apache Kafka https://www.kai-waehner.de/blog/2020/04/24/mainframe-offloading-replacement-apache-kafka-connect-ibm-db2-mq-cdc-cobol/ Fri, 24 Apr 2020 14:55:10 +0000

Time to get more innovative; even with the mainframe! This blog post covers the steps I have seen in projects where enterprises started offloading data from the mainframe to Apache Kafka with the final goal of replacing the old legacy systems.

“Mainframes are still hard at work, processing over 70 percent of the world’s most important computing transactions every day. Organizations like banks, credit card companies, medical facilities, stock brokerages, and others that can absolutely not afford downtime and errors depend on the mainframe to get the job done. Nearly three-quarters of all Fortune 500 companies still turn to the mainframe to get the critical processing work completed” (BMC).

Mainframe Offloading and Replacement with Apache Kafka and CDC

 

Cost, monolithic architectures and a shortage of experts are the key challenges for mainframe applications. Mainframes are used in several industries, but I want to start this post with the current situation at banks in Germany, to understand why so many enterprises ask for help with mainframe offloading and replacement…

Finance Industry in 2020 in Germany

Germany is where I live, so I get most of my news from the local newspapers. But the situation is very similar in other countries across the world…

Here is the current situation in Germany.

Challenges for Traditional Banks

  • Traditional banks are in trouble. More and more branches are getting closed every year.
  • Financial numbers are pretty bad. Market capitalization went down significantly (before Corona!). Deutsche Bank and Commerzbank – former flagships of the German finance industry – are in the news every week. Usually for bad news.
  • Commerzbank had to leave the DAX in 2019 (the blue chip stock market index consisting of the 30 major German companies trading on the Frankfurt Stock Exchange); replaced by Wirecard, a modern global internet technology and financial services provider offering its customers electronic payment transaction services and risk management. UPDATE Q3 2020: Wirecard is now insolvent due to the Wirecard scandal. A really horrible scandal for the German financial market and government.
  • Traditional banks talk a lot about creating a “cutting edge” blockchain consortium and distributed ledger to improve their processes in the future and to be “innovative”. For example, some banks plan to improve the ‘Know Your Customer’ (KYC) process with a joint distributed ledger to reduce costs. Unfortunately, this is years away from being implemented, and it is not clear whether it makes sense and adds real business value. Blockchain has still not proven its added value in most scenarios.

Neobanks and FinTechs Emerging

  • Neobanks like Revolut or N26 provide an innovative mobile app and great customer experience, including a great KYC process and experience (without the need for a blockchain). In my personal experience, international bank transactions typically take less than 24 hours. In contrast, the mobile app and website of my private German bank are a real pain in the ass. And it feels like they are getting worse instead of better. To be fair: Neobanks do not shine with customer service if there are problems (like fraud); they sometimes simply freeze your account and do not provide good support.
  • International FinTechs are also arriving in Germany to change the game. PayPal is already the norm everywhere. German initiatives for a “German PayPal alternative” failed. Local retail stores already accept Apple Pay (the local banks have been “forced” to support it in the meantime). Even the Chinese service Alipay is supported by more and more shops.

Traditional banks should be concerned. The same is true for other industries, of course. However, as said above, many enterprises still rely heavily on the 50+ year old mainframe while competing against innovative newcomers. It feels like these companies relying on mainframes have a harder job to change than, let’s say, airlines or the retail industry.

Digital Transformation with(out) the Mainframe

I had a meeting with an insurance company a few months ago. The team showed me an internal paper quoting their CEO: “This has to be the last 20M mainframe contract with IBM! In five years, I expect not to renew this.” That’s what I call ‘clear expectations and goals’ for the IT team… 🙂

Why does Everybody Want to Get Rid of the Mainframe?

Many large, established companies, especially those in the financial services and insurance space, still rely on mainframes for their most critical applications and data. Along with reliability, mainframes come with high operational costs, since they are traditionally charged by MIPS (million instructions per second). Reducing MIPS results in lower operational expense, sometimes dramatically.

Many of these same companies are currently undergoing architecture modernization, including cloud migration, moving from monolithic applications to microservices and embracing open systems.

Modernization with the Mainframe and Modern Technologies

This modernization doesn’t easily embrace the mainframe, but would benefit from being able to access this data. Kafka can be used to keep a more modern data store in real-time sync with the mainframe, while at the same time persisting the event data on the bus to enable microservices, and deliver the data to other systems such as data warehouses and search indexes.

This not only reduces operational expenses, but also provides a path to architecture modernization and agility that wasn’t available with the mainframe alone.

As the final step, and the ultimate vision of most enterprises, the mainframe might be replaced by new applications using modern technologies.

Kafka in Financial Services and Insurance Companies

I recently wrote about how payment applications can be improved leveraging Kafka and Machine Learning for real time scoring at scale. Here are a few concrete examples from banking and insurance companies leveraging Apache Kafka and its ecosystem for innovation, flexibility and reduced costs:

  • Capital One: Becoming truly event driven – offering a service other parts of the bank can use.
  • ING: Significantly improved customer experience – as a differentiator + Fraud detection and cost savings.
  • Freeyou: Real time risk and claim management in their auto insurance applications.
  • Generali: Connecting legacy databases and the modern world with a modern integration architecture based on event streaming.
  • Nordea: Able to meet strict regulatory requirements around real-time reporting + cost savings.
  • Paypal: Processing 400+ Billion events per day for user behavioral tracking, merchant monitoring, risk & compliance, fraud detection, and other use cases.
  • Royal Bank of Canada (RBC): Mainframe off-load, better user experience and fraud detection – brought many parts of the bank together.

The last example brings me back to the topic of this blog post: Companies can save millions of dollars “just” by offloading data from their mainframes to Kafka for further consumption and processing. Mainframe replacement might be the long term goal; but just offloading data is a huge $$$ win.

Domain-Driven Design (DDD) for Your Integration Layer

So why use Kafka for mainframe offloading and replacement? There are hundreds of middleware tools available on the market for mainframe integration and migration.

Well, in addition to being open and scalable (I won’t cover all the characteristics of Kafka in this post), one key characteristic is that Kafka enables decoupling applications better than any other messaging or integration middleware.

Legacy Migration and Cloud Journey

Legacy migration is a journey. Mainframes cannot be replaced in a single project. A big bang will fail. This has to be planned long-term. The implementation happens step by step.

By now, almost every company on this planet has a cloud strategy. The cloud has many benefits, like scalability, elasticity, innovation, and more. Of course, there are trade-offs: cloud is not always cheaper. New concepts need to be learned. Security is very different (I am NOT saying worse or better, just different). And hybrid will be the norm for most enterprises. Only enterprises younger than 10 years are cloud-only.

Therefore, I use the term ‘cloud’ in the following. But the same phases would exist if you “just” want to move away from mainframes but stay in your own on premises data centers.

On a very high level, mainframe migration could contain three phases:

  • Phase 1 – Cloud Adoption: Replicate data to the cloud for further analytics and processing. Mainframe offloading is part of this.
  • Phase 2 – Hybrid Cloud: Bidirectional replication of data in real time between the cloud and application in the data center, including mainframe applications.
  • Phase 3 – Cloud-First Development: All new applications are built in the cloud. Even the core banking system (or at least parts of it) runs in the cloud (or in a modern infrastructure in the data center).

Journey from Legacy to Hybrid and Cloud Infrastructure

How do we get there? As I said, a big bang will fail. Let’s take a look at a very common approach I have seen at various customers…

Status Quo: Mainframe Limitations and $$$

The following is the status quo: Applications are consuming data from the mainframe; creating expensive MIPS:

Direct Legacy Mainframe Communication to App

The mainframe is running and core business logic is deployed. It works well (24/7, mission-critical). But it is really tough or even impossible to make changes or add new features. Scalability is often already becoming an issue. And well, let’s not talk about cost. Mainframes cost millions.

Mainframe Offloading

Mainframe offloading means data is replicated to Kafka for further analytics and processing by other applications. Data continues to be written to the mainframe via the existing legacy applications:

Mainframe Offloading

Offloading data is the easy part as you do not have to change the code on the mainframe. Therefore, the first project is often just about reading data.

Writing to the mainframe is much harder because the business logic on the mainframe must be understood to make changes. Therefore, many projects avoid this huge (and sometimes not solvable) challenge by keeping the writes as they are… This is not ideal, of course. You cannot add new features or changes to the existing application.

People are not able to change it, or do not want to take the risk. There are no DevOps or CI/CD pipelines and no A/B testing infrastructure behind your mainframe, dear university graduates! 🙂

Mainframe Replacement

Everybody wants to replace mainframes due to their technical limitations and cost. But enterprises cannot simply shut down the mainframe because of all the mission-critical business applications.

Therefore, COBOL, the main programming language for the mainframe, is still widely used in applications for large-scale batch and transaction processing jobs. Read the blog post ‘Brush up your COBOL: Why is a 60 year old language suddenly in demand?‘ for a nice ‘year 2020 introduction to COBOL’.

Enterprises have the following options:

Option 1: Continue to develop actively on the mainframe

Train or hire more (expensive) COBOL developers to extend and maintain your legacy applications. Well, this is exactly what you want to move away from. If you forgot why: cost, cost, cost. And technical limitations. I find it amazing how mainframes even support “modern technologies” such as Web Services. Mainframes do not stand still. Having said this, even if you use more modern technologies and standards, you are still running on the mainframe with all its drawbacks.

Option 2: Replace the COBOL code with a modern application using a migration and code generation tool

Various vendors provide tools which take COBOL code and automatically migrate it to generated source code in a modern programming language (typically Java). The new source code can be refactored and changed (as Java uses static typing, errors show up in the IDE and changes can be applied to all related dependencies in the code structure). That is the theory. In practice, the more complex your COBOL code is, the harder it gets to migrate and the more custom coding the migration requires (which results in the same problem the COBOL developer had on the mainframe: if you don’t understand the code routines, you cannot change them without risking errors, data loss or downtime in the new version of the application).

Option 3: Develop a new future-ready application with modern technologies to provide an open, flexible and scalable architecture

This is the ideal approach. Keep the old legacy mainframe running as long as needed. A migration is not a big bang. Replace use cases and functionality step-by-step. The new applications will have very different feature requirements anyway; therefore, take a look at the FinTech and InsurTech companies. Starting greenfield has many advantages. The big drawbacks are high development and migration costs.

In reality, a mix of these three options can also happen. No matter what works for you, let’s now think how Apache Kafka fits into this story.

Domain-Driven Design for your Integration Layer

Kafka is not just a messaging system. It is an event streaming platform that provides storage capabilities. In contrast to MQ systems or Web Service / API based architectures, Kafka really decouples producers and consumers:

Domain-Driven Design for the Core Banking Integration Layer with Apache Kafka

With Kafka and its domain-driven design, every application / service / microservice / database / “you-name-it” is completely independent and loosely coupled from the others, yet scalable, highly available and reliable! The blog post “Apache Kafka, Microservices and Domain-Driven Design (DDD)” goes much deeper into this topic.
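
To make this decoupling tangible, here is a minimal Java sketch. The broker address and topic name are my own placeholders, not from any project described here: the same program started twice with different group.ids represents two independent applications reading the same offloaded topic, each tracking its own offsets without affecting the other.

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class IndependentConsumer {

    public static void main(String[] args) {
        // Pass a different group.id per application, e.g. "fraud-detection" or "reporting".
        String groupId = args[0];

        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // illustrative
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Both applications subscribe to the same topic; offsets are tracked per consumer group.
            consumer.subscribe(List.of("core-banking-accounts")); // topic name is an assumption
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    System.out.printf("[%s] key=%s value=%s%n", groupId, record.key(), record.value());
                }
            }
        }
    }
}
```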

Sberbank, the biggest bank in Russia, is a great example of this domain-driven approach: Sberbank built its new core banking system on top of the Apache Kafka ecosystem. This infrastructure is open and scalable, and it is ready for mission-critical payment use cases like instant payments or fraud detection, as well as for innovative applications like chatbots in customer service or robo-advisors for trading.

Why is this decoupling so important? Because mainframe integration, migration and replacement is a (long!) journey.

Event Streaming Platform and Legacy Middleware

An Event Streaming Platform gives applications independence. No matter if an application uses a new modern technology or legacy, proprietary, monolithic components. Apache Kafka provides the freedom to tap into and manage shared data, no matter if the interface is real time messaging, a synchronous REST / SOAP web service, a file based system, a SQL or NoSQL database, a Data Warehouse, a data lake or anything else:

Integration between Kafka and Legacy Middleware

Apache Kafka as Integration Middleware

The Apache Kafka ecosystem is a highly scalable, reliable infrastructure and allows high throughput in real time. Kafka Connect provides integration with any modern or legacy system, be it Mainframe, IBM MQ, Oracle Database, CSV Files, Hadoop, Spark, Flink, TensorFlow, or anything else. More details here:

Kafka Ecosystem for Security and Data Governance

The Apache Kafka ecosystem provides additional capabilities. This is important, as Kafka is not just used as a messaging or ingestion layer, but as a platform for your most mission-critical use cases. Remember the examples from the banking and insurance industries at the beginning of this blog post. These are just a few of many more mission-critical Kafka deployments across the world. Check past Kafka Summit video recordings and slides for many more mission-critical use cases and architectures.

I will not go into detail in this blog post, but the Apache Kafka ecosystem provides features for security and data governance like Schema Registry (+ Schema Evolution), Role Based Access Control (RBAC), Encryption (on message or field level), Audit Logs, Data Flow analytics tools, etc…

Mainframe Offloading and Replacement in the Next 5 Years

With all the theory in mind, let’s now take a look at a practical example. The following is a journey many enterprises walk through these days. Some enterprises are just at the beginning of this journey, while others have already saved millions by offloading or even replacing the mainframe.

I will walk you through a realistic approach which takes ~5 years in this example (but it can easily take 10+ years depending on the complexity of your deployments and organization). What matters are quick wins and a successful step-by-step approach, no matter how long it takes.

Year 0: Direct Communication between App and Mainframe

Let’s get started. Here is again the current situation: Applications directly communicate with the mainframe.

Direct Legacy Mainframe Communication to App

Let’s reduce cost and dependencies between the mainframe and other applications.

Year 1: Kafka for Decoupling between Mainframe and App

Offloading data from the mainframe enables existing applications to consume the data from Kafka instead of creating high load and cost for direct mainframe access:

Kafka for Decoupling between Mainframe and App

This saves a lot of MIPS (million instructions per second). MIPS is a way to measure the cost of computing on mainframes. Fewer MIPS, lower cost.

After offloading the data from the mainframe, new applications can also consume the mainframe data from Kafka. No additional MIPS cost! This allows building new applications on top of the mainframe data. Gone are the days when developers could not build innovative new applications because MIPS cost was the main blocker to accessing the mainframe data.

Kafka can store the data as long as it needs to be stored. This might be just 60 minutes for one Kafka topic used for log analytics, or 10 years for customer interactions in another Kafka topic. With Tiered Storage for Kafka, you can even reduce costs and increase elasticity by separating processing from storage with a back-end store like AWS S3. This discussion is out of scope of this article; check out “Is Apache Kafka a Database?” for more details.
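
As a small illustration of such per-topic retention (topic names, partition counts, replication factor and retention values are assumptions for this sketch, not recommendations), the Kafka AdminClient can create one short-lived and one long-lived topic on the same cluster:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class RetentionExample {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // illustrative

        try (AdminClient admin = AdminClient.create(props)) {
            // Short-lived topic: keep log analytics data for ~60 minutes.
            NewTopic logAnalytics = new NewTopic("log-analytics", 6, (short) 3)
                    .configs(Map.of(TopicConfig.RETENTION_MS_CONFIG,
                            String.valueOf(60L * 60 * 1000)));

            // Long-lived topic: keep customer interactions for ~10 years.
            NewTopic customerInteractions = new NewTopic("customer-interactions", 6, (short) 3)
                    .configs(Map.of(TopicConfig.RETENTION_MS_CONFIG,
                            String.valueOf(10L * 365 * 24 * 60 * 60 * 1000)));

            admin.createTopics(List.of(logAnalytics, customerInteractions)).all().get();
        }
    }
}
```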

Change Data Capture (CDC) for Mainframe Offloading to Kafka

The most common option for mainframe offloading is Change Data Capture (CDC): Transaction log-based CDC pushes data changes (insert, update, delete) from the mainframe to Kafka. The advantages:

  • Real time push updates to Kafka
  • Eliminate disruptive full loads, i.e. minimize production impact
  • Reduce MIPS consumption
  • Integrate with any mainframe technology (DB2 z/OS, VSAM, IMS/DB, CICS, etc.)
  • Full support

On the first connection, the CDC tool reads a consistent snapshot of all the whitelisted tables. When that snapshot is complete, the connector continuously streams the changes committed to the DB2 database for all whitelisted tables in capture mode, generating corresponding insert, update and delete events. All the events for each table are recorded in a separate Kafka topic, where they can be easily consumed by applications and services.
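
A minimal sketch of a downstream service consuming one of these per-table topics could look as follows. The topic name and the exact event format (before/after images, tombstones for deletes) depend on the CDC tool in use, so the handling below is an illustrative assumption rather than a description of any specific product:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class Db2ChangeEventConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // illustrative
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "account-sync");              // illustrative
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // One topic per captured DB2 table; the name is a placeholder.
            consumer.subscribe(List.of("db2.banking.ACCOUNTS"));
            while (true) {
                for (ConsumerRecord<String, String> event : consumer.poll(Duration.ofSeconds(1))) {
                    if (event.value() == null) {
                        // Some CDC tools emit a null payload (tombstone) for deletes.
                        System.out.println("DELETE for key " + event.key());
                    } else {
                        // Insert/update events typically carry the changed row as payload.
                        System.out.println("UPSERT " + event.key() + " -> " + event.value());
                    }
                }
            }
        }
    }
}
```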

The big disadvantage of CDC is high licensing costs. Therefore, other integration options exist…

Integration Options for Kafka and Mainframe

While CDC is typically the preferred choice, there are more alternatives. Here are the integration options I have seen being discussed in the field for mainframe offloading:

  1. IBM InfoSphere Data Replication (IIDR) CDC solution for Mainframe
  2. 3rd party commercial CDC solution (e.g. Attunity or HVR)
  3. Open-source CDC solution (e.g. Debezium – but you still need an IIDR license; this is the same challenge as with Oracle and GoldenGate CDC)
  4. Create interface tables + Kafka Connect + JDBC connector
  5. IBM MQ interface + Kafka Connect’s IBM MQ connector
  6. Confluent REST Proxy and HTTP(S) calls from the mainframe
  7. Kafka Clients on the mainframe

Evaluate the trade-offs and make your decision. Requiring a fully supported solution often eliminates several options quickly 🙂 In my experience, most people use CDC with IIDR or a 3rd party tool, followed by IBM MQ. Other options are more theoretical in most cases.
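
To illustrate option 6, this is roughly what a call against the Confluent REST Proxy v2 produce endpoint looks like. Host, port, topic and payload are placeholders, and on a real mainframe the HTTP(S) call would of course be issued from the existing COBOL or Java batch code rather than from a standalone program like this:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestProxyProducer {

    public static void main(String[] args) throws Exception {
        // REST Proxy v2 produce API: POST /topics/{topic} with an embedded JSON payload.
        String body = "{\"records\":[{\"key\":\"4711\","
                + "\"value\":{\"account\":\"4711\",\"amount\":42.50,\"currency\":\"EUR\"}}]}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://rest-proxy:8082/topics/mainframe-transactions")) // illustrative
                .header("Content-Type", "application/vnd.kafka.json.v2+json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // On success, the proxy returns the partition and offset per record.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```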

Year 2 to 4: New Projects and Applications

Now it is time to build new applications:

New Projects and Applications

 

These can be agile, lightweight microservices using any technology. Or they can be external solutions like a data lake or cloud service. Pick what you need. You are flexible. The heart of your infrastructure allows this: it is open, flexible and scalable. Welcome to a new, modern IT world! 🙂

Year 5: Mainframe Replacement

At some point, you might wonder: What about the old mainframe applications? Can we finally replace them (or at least some of them)? When the new application based on a modern technology is ready, switch over:

Mainframe Replacement

As this is a step-by-step approach, the risk is limited. First of all, the mainframe can keep running. If the new application does not work, switch back to the mainframe (which is still up-to-date as you did not stop inserting the updates into its database in parallel to the new application).

As soon as the new application is battle-tested and proven, the mainframe application can be shut down. The $$$ budgeted for the next mainframe renewal can be used for other innovative projects. Congratulations!

Mainframe Offloading and Replacement is a Journey… Kafka can Help!

Here is the big picture of our mainframe offloading and replacement story:

Mainframe Offloading and Replacement with Apache Kafka

Hybrid Cloud Infrastructure with Apache Kafka and Mainframe

Remember the three phases I discussed earlier: Phase 1 – Cloud Adoption, Phase 2 – Hybrid Cloud, Phase 3 – Cloud-First Development. I did not tackle them in detail in this blog post. Having said this, Kafka is an ideal candidate for this journey, which makes it a perfect fit for offloading and replacing mainframes in a hybrid and multi-cloud strategy. More details here:

Slides and Video Recording for Kafka and Mainframe Integration

Here are the slides:

And the video walking you through the slides:

What are your experiences with mainframe offloading and replacement? Did you or do you plan to use Apache Kafka and its ecosystem? What is your strategy? Let’s connect on LinkedIn and discuss! Stay informed about new blog posts by subscribing to my newsletter.

Apache Kafka is the New Black at the Edge in Industrial IoT, Logistics and Retailing https://www.kai-waehner.de/blog/2020/01/01/apache-kafka-edge-computing-industrial-iot-retailing-logistics/ Wed, 01 Jan 2020 10:45:28 +0000

The following question comes up almost every week in conversations with customers: Can and should I deploy Apache Kafka at the edge? Or should I just deploy Kafka in a “real” data center or public cloud infrastructure? I am glad that people ask, because it is a valid question in various industries, including manufacturing, automation, aviation, logistics, and retail. This blog post explains why Apache Kafka is the New Black at the Edge in Internet of Things (IoT) projects. I cover use cases and different architectures. The last section discusses how Kafka as an event streaming platform complements other IoT frameworks and products at the edge for data integration and edge processing in real time at scale.

Multiple Kafka Clusters Became the Norm, not an Exception!

Multi-cluster and cross-data center deployments of Apache Kafka have become the norm rather than an exception. A Kafka deployment at the edge can be an independent project. However, in most cases, Kafka at the edge is part of an overall Kafka architecture.

Many reasons exist to create more than just one Kafka cluster in your organization:

  • Independent Projects
  • Hybrid integration
  • Edge computing
  • Aggregation
  • Migration
  • Disaster recovery
  • Global infrastructure (regional or even cross continent communication)
  • Cross-company communication

This blog post focuses on the deployment of Apache Kafka at the edge. The relation to all the other kinds of Kafka architectures will be discussed in February 2020 at DevNexus in Atlanta. There, I will explain in detail the “architecture patterns for distributed, hybrid and global Apache Kafka deployments“. I will share the slides and a video recording in another blog post after the conference.

Before we think about running Kafka at the edge, we need to define the term “edge”.

What is “The Edge” or “Edge Computing”?

Wikipedia says “edge computing is a distributed computing paradigm which brings computation and data storage closer to the location where it is needed, to improve response times and save bandwidth”. Other benefits include cost reduction, a flexible architecture and separation of concerns.

Edge computing infrastructure

Apache Kafka at the Edge

Different options exist to discuss “Kafka for edge computing”:

  1. Only Clients at the Edge: Kafka Clients running at the edge. Kafka Cluster deployed in a Data Center or Public Cloud environment.
  2. Everything at the Edge: Kafka Cluster and Kafka Clients (e.g. sensors in the factory) deployed at the edge.
  3. Edge and Beyond: Kafka Cluster deployed at the edge. Kafka Clients (e.g. smartphones in the region) running close to the edge.

Therefore, the range of “Kafka at the Edge” is gigantic:

  • Edge in an IIoT shop floor could be a Kafka client written in C and deployed to a microcontroller in a sensor. Such a sensor typically has just a few kilobytes of memory. It lives for a defined time span before it dies and gets replaced.
  • Edge in the telco business could be a fully distributed Kafka cluster running on StarlingX, an open source private cloud infrastructure software stack based on Kubernetes for the edge, used by the most demanding applications in industrial IoT, telecom, video delivery and other ultra-low latency use cases. The hardware requirements for such an edge deployment are higher than what some teams in a traditional bank or insurance company get after waiting six months for management approvals.
  • In most cases, edge is somewhere in the middle of the two extreme scenarios described above.

In most scenarios, “Kafka at the edge” means the deployment of a Kafka cluster on site at the edge. The Kafka clients are either running on site or close to the site (“close” could be several miles away in some cases). Sometimes, the edge site is regularly offline and disconnected from the cloud. This is my definition for this blog post, too. Just make sure to define what you mean by the term “edge” in your conversations with colleagues and customers.

Now let’s think about use cases for running Kafka at the edge.

Use Cases for Kafka at the Edge

Use cases for Kafka deployments at the edge exist in various industries. No matter if “your things” are smartphones, machines in the shop floor, sensors, cars, robots, cash point machines, or anything else.

Let’s take a look at a few examples of what I have seen in 2019 in many different enterprises:

  • Industrial IoT (IIoT): Edge integration and processing in real time are key for success in modern IoT architectures. Various use cases exist for Industry 4.0, including predictive maintenance, quality assurance, process optimization and cyber security. Building a Digital Twin with Kafka is one of the most frequent scenarios and a perfect fit for many use cases.
  • Retailing: The digital transformation enables many new, innovative use cases, including customer 360 experiences, cross-selling and collaboration with partner vendors. This is relevant for any kind of retailer. No matter if you think about retail stores like Walmart, coffee shops like Starbucks, or cutting-edge shops like the Amazon Go store.
  • Logistics: Correlation of data in real time at scale is a game changer for any logistics scenario: Track and trace end-to-end for package delivery, communication between delivery drones (or humans using cars) and the local self-service collection booths, accelerated processing in the logistics center, coordination and planning of car sharing / ride sharing, traffic light management in a smart city, and so on.

Kafka at the Edge – High Level Architecture

No matter which use case you want to implement: The architecture for Apache Kafka at the edge typically looks more or less like the following (on a very high level):

Apache Kafka at the IoT Edge

Why and How does Kafka help at the Edge?

The ideas and use cases discussed above are nothing new. But think about your favorite consumer application, retailer or coffee shop: How many enterprises have already rolled out real time applications for context-specific push messages, customer experience or other services at scale in a reliable way?

Honestly, not many. Only some tech companies like Uber and Lyft did a great job. However, these companies could start on a green field a few years ago. Not surprisingly, all these tech companies use the event streaming platform Kafka as the heart of their real time infrastructure.

But the situation gets better and better these days. More and more traditional enterprises already rolled out an event streaming platform in their data centers or in the cloud to process data in real time at scale to build innovative applications.

Challenges at the Edge

However, enterprises often struggle to bring these innovative real time applications into scenarios where data is distributed across local sites, like factories, retail stores, coffee shops, etc.

Challenges include:

  • Integration with all the hardware, machines, devices at the edge is hard due to bad network and many other limitations
  • Processing at scale and in real time is mandatory for many use cases. Thus, processing should happen on site, not in a remote data center or cloud (often hundreds of miles away).
  • Various technologies and protocols have to be integrated at the edge. Often, legacy and proprietary protocols need to communicate with modern big data tools on the other side of the tunnel.
  • Limited hardware resources and human expertise at the edge. IT experts cannot go on site to every location. Teams cannot spend a lot of money on hardware and operations continuously in every site.

Data has to be stored and processed locally on site in real time at scale. In addition, data needs to be replicated to the data center or cloud for further processing and analytics with the aggregated data from different sites. In the best case, communication is bidirectional so that the smaller local sites can be controlled from one single point by sending commands and control events back to them.

The Same Infrastructure and Technology at the Edge and in the Data Center / Cloud

The implementation of edge use cases takes a lot of effort. Rolling out the solution to all the sites is another big challenge. Ideally, enterprises can leverage the same infrastructure, technologies and applications BOTH on site at the edge in factories or stores AND in the big data center or cloud.

This is why Apache Kafka comes into play in more and more edge scenarios: Leverage the same infrastructure, technologies and applications everywhere. Real time stream processing, reliability and flexible scalability are core features of Kafka. This allows large scale scenarios in the cloud and small or medium scale scenarios at the edge.

Let’s take a look at different options for deploying Kafka at the edge.

Kafka Architectures for Edge Computing

The key question you have to ask yourself: Do I need high availability at the edge?

Edge computing does not always require high availability. If it does, you deploy a traditional Kafka cluster. If it does not, you choose the simpler and less costly option with just one single Kafka broker at the edge. A hardware appliance could make the rollout to tens or hundreds of sites even easier.

The following example shows three edge sites. One Kafka cluster is deployed at each site including additional Kafka components:

Kafka and Confluent Platform Deployment at the Edge

Resilient Deployment at the Edge with 3+ Kafka Brokers

Kafka and its ecosystem are built for high availability and zero downtime (even if individual nodes fail). Usually, you deploy a distributed system. Kafka and ZooKeeper require at least three nodes. Other components require at least two nodes to guarantee robust operations without data loss:

Resilient Kafka Configuration at the Edge

Check out the Apache Kafka and Confluent Platform Reference Architecture for more details about deployment best practices. These best practices do not change at the edge. Having said this, the load and throughput are often lower at the edge. Therefore, less memory and smaller disks are often sufficient. It all depends on your SLAs, requirements and use cases.
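
On the client side, the usual no-data-loss producer settings apply at the edge just as they do in the data center. Here is a minimal sketch; broker addresses, topic name and payload are placeholders, and the topic-level min.insync.replicas setting mentioned in the comment is an assumption about how the cluster is configured:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class ReliableEdgeProducer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                "edge-broker-1:9092,edge-broker-2:9092,edge-broker-3:9092"); // illustrative
        // Wait for all in-sync replicas; combine with a topic-level min.insync.replicas=2.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Idempotence avoids duplicates when retries happen.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("machine-sensors", "machine-42", "{\"temperature\": 78.3}"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            exception.printStackTrace(); // handle/alert in a real deployment
                        }
                    });
            producer.flush();
        }
    }
}
```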

Non-Resilient Deployment at the Edge with One Single Kafka Broker

The demand to deploy a “lightweight Kafka cluster” at the edge and synchronize / replicate data with a bigger central Kafka cluster comes up more and more. Due to hardware limitations or lower SLAs regarding high availability, the deployment of just one single Kafka broker (plus one Zookeeper) at the edge is totally fine. You can even deploy the whole Kafka environment on one single server:

Non-Resilient Kafka Configuration at the Edge

This has some obvious drawbacks: no replication, downtime in case of a node or network failure, and the risk of data loss. However, a single-node Kafka deployment works and still provides many of the benefits of the Kafka fundamentals:

  • Decoupling between producers and consumers
  • Handling of back-pressure
  • High volume real time processing (even one broker has a lot of power)
  • Storage on disks
  • Ability to reprocess data
  • All Kafka-native components available (Kafka Connect for integration, Kafka Streams or ksqlDB for stream processing, Schema Registry for governance).

TL;DR: A single-broker Kafka deployment works well. Just be aware of the drawbacks and double-check whether this is okay for your SLAs and requirements.

ZooKeeper Removal – A Great Help for Kafka at the Edge

Kafka is a powerful distributed infrastructure. Hence, Kafka is not the easiest piece of infrastructure to operate. A key reason is the dependency on ZooKeeper (the same is true for many other distributed systems like Hadoop or Spark). I won’t go into details about the challenges and problems with ZooKeeper here; there is enough information on the web. TL;DR: ZooKeeper makes Kafka harder to operate and less scalable. Many P1 and P2 support tickets are not about Kafka but about ZooKeeper (not because ZooKeeper is unstable, but because it is hard to operate).

The good news: Kafka will get rid of ZooKeeper in the next few releases. Check out “KIP-500: Replace ZooKeeper with a Self-Managed Metadata Quorum” for more details. KIP-500 will make Kafka much more lightweight, scalable, elastic and simple to operate. This is relevant BOTH at extreme scale in the cloud AND in small single-broker deployments at the edge.

As most IoT projects do not plan just one year ahead but are rolled out to different sites over five or more years, remember that Kafka will be much easier and more lightweight without ZooKeeper in roughly a year from now.

Kafka as Gateway between Edge Devices and Cloud

A Kafka gateway is an additional, optional component in the architecture. In some configurations, you might want the edge devices to communicate with a local gateway on premise.

As an example, in an industrial plant, you might see multiple machines or production lines as edge devices. They integrate with their own Kafka cluster and send this data to a gateway Kafka cluster. At the gateway Kafka cluster, you can do some analytics directly on premise, and maybe filter or transform the data, before sending it to a large remote aggregation Kafka cluster:

Hybrid Kafka Architecture with Single Node to Factory to Cloud

In the above example, we have two independent factories. Both use non-resilient single-broker Kafka deployments for local processing. A resilient gateway Kafka cluster with three Kafka brokers aggregates and processes the data locally in the factory. Only the important and pre-processed data is forwarded to a remote Kafka cluster; in this case Confluent Cloud. The Kafka cluster in the cloud aggregates the data from different factories to integrate with other business applications or analytics tools.

Kafka at the Edge as OEM or in Hardware Appliances

At the edge, enterprises do not have the capabilities and possibilities they have in their data centers or in the public cloud. Installing hardware is a huge challenge. Operations are even harder. A standardized way of installing Kafka components at the edge reduces the effort and risk.

Tens of hardware vendors are available to build your own OEM or hardware appliance. Or you just pick a ready-to-ship box and install all required software components with some DevOps tools via remote management.

Plenty of options exist to ease installation and operations of a Kafka cluster at the edge. Hivecell is one interesting example you could use. “Hivecell enables companies to deploy and maintain software at the edge without an army of technicians” is the slogan of this startup. You just ship one or more boxes to the site, connect them to the local WiFi, and everything else can be done remotely. The Hivecell boxes can be shipped pre-configured. For instance, the hardware could already have Kubernetes, the Kafka ecosystem and other business applications installed. Tooling like Confluent Operator can run on the boxes to ease and automate operations of the Kafka environment at the edge. This way, remote management is typically sufficient.

Communication, Connectivity, Integration, Data Processing

As already shown in the diagrams above, a Kafka environment includes more than just the Kafka broker and mandatory Zookeeper. Communication, connectivity, integration and data processing are important components in a Kafka infrastructure, no matter if in the cloud, on premise or at the edge:

Communication between Kafka Brokers and Kafka Clients:

  1. Edge-Only: Device -> Kafka at the edge -> Device
  2. Edge-to-Remote: Device -> Kafka at the edge -> Replication -> Kafka (Data Center / Cloud) -> Analytics / Real Time Processing
  3. Bidirectional: A combination of 1) and 2) for bidirectional communication between the edge and remote Kafka cluster

Kafka-native Connectivity, Integration, Data Processing:

Kafka-native components leverage Kafka under the hood. This way, you have to manage just one platform for communication, integration and processing of data in real time at scale:

  • Kafka Connect: MQTT, OPC-UA, FTP, CSV, PLC4X (Legacy and proprietary IIoT protocols and PLCs like Modbus, Siemens S7, Beckhoff, Allen Bradley), etc.
  • Mirrormaker 2 / Confluent Replicator: Uni- or Bidirectional replication between two Kafka Clusters
  • Kafka Clients (Producers / Consumers): Java, Python, C++, C, Go, Javascript, …
  • Data Processing: Stream Processing (stateless streaming ETL or stateful applications) with Kafka Streams or ksqlDB
  • Proxies: REST Proxy for HTTP(S) communication, MQTT Proxy for MQTT integration
  • Schema Registry: Governance and schema enforcement.

Make sure to plan the whole infrastructure from the beginning, especially at the edge where you typically have limited hardware resources… Double-check if you really need another database or external processing framework! Maybe the Kafka stack is good enough for your needs?
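
As one example of keeping the edge footprint small, a lightweight Kafka Streams application can filter or downsample the raw sensor stream locally, so that only the relevant events end up in the topic that gets replicated to the data center. Topic names and the threshold below are assumptions for illustration, not part of any reference setup:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class EdgePreprocessing {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "edge-preprocessing");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // e.g. the single edge broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.Double().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, Double> rawSensorData = builder.stream("sensor-raw");

        // Forward only readings above a threshold; only this output topic is replicated to the cloud.
        rawSensorData
                .filter((machineId, temperature) -> temperature != null && temperature > 100.0)
                .to("sensor-critical");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```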

Kafka is NOT an IoT framework

Kafka can be used in very different scenarios. It is best suited for event streaming at scale and for building a reliable and open infrastructure to integrate IoT at the edge with the rest of the enterprise.

However, Kafka is not the silver bullet for every problem. A specific IoT product might be the better choice for integration and processing of IoT interfaces and shop floors. This depends on the specific requirements, the existing ecosystem and the software already in use. Complexity and cost of solutions need to be evaluated. “Build vs. buy” is always a valid question. Often, the best choice is a mix: build an open, flexible central streaming infrastructure yourself and buy COTS products for specific edge integration and processing scenarios.

Hybrid Architecture – When and Why to Use Additional IoT Frameworks and Products?

You should always ask yourself: Is Kafka alone sufficient? If yes, why use an additional framework or product? End-to-End integration gets harder with each additional technology. 24/7 deployments, zero data loss, and real time processing without latency spikes are requirements Kafka works best for.

If Kafka is not sufficient alone, combine it with another IoT framework or solution:

Apache Kafka and IIoT CTOS Solution Mindsphere Kinetic Azure IoT

Sometimes, the shop floor connects to an IoT solution. The IoT solution is used as a gateway or proxy. This can be a broad, powerful (but also more complex and expensive) solution like Siemens MindSphere. Or the choice is to deploy “just” a specific, more lightweight solution to solve one problem. For instance, HiveMQ could be deployed as a scalable MQTT cluster to connect to machines and devices. This IoT gateway or proxy connects to Kafka. Kafka is either deployed in the same infrastructure or in another data center or cloud. The Kafka cluster connects to the rest of the enterprise.

In other scenarios, Kafka is used as the IoT gateway or proxy to connect to the PLCs or the Distributed Control System (DCS) directly. Kafka then connects to an IoT solution like AWS IoT or Google Cloud’s MQTT Bridge, where further processing and analytics happen.

Communication is often bidirectional. No matter what architecture you choose: The data is ingested from the shop floor or other IoT devices, processed and correlated in real time, and finally control events are sent back to the machines. For instance, in predictive analytics, you first train analytic models with tools like TensorFlow in the cloud. Then you deploy the analytic model at the edge for real time predictions.

Do you really need NiFi / MiNiFi or an ESB / ETL Tool?

I just explained why you might combine Kafka with other IoT frameworks or solutions. These are very complementary to the Kafka ecosystem and focus on different use cases like device management, training an analytic model, reporting or building a digital twin.

For example, the major cloud providers offer IoT services for device management, proxies to their own cloud services, and analytics tools. At the edge, Eclipse IoT alone provides various different IoT frameworks. For instance, Eclipse Ditto is a great open source framework for building a digital twin.

For integration problems, I think differently!

24/7 Uptime and Zero Data Loss Required?

This is a long discussion, but TL;DR: Most integration projects have critical SLAs. The infrastructure has to run 24/7 without downtime and without data loss. The more middleware components you combine, the harder it gets to meet your SLAs and requirements. If you can run your Kafka infrastructure with 99.95% availability, every additional middleware component you combine with it lowers the end-to-end availability further (for example, two components at 99.95% each yield at most roughly 99.90% end-to-end). Additionally, you have to develop, test, operate and pay for two or more middleware components instead of focusing on one single infrastructure.

SLAs Important? Kafka Eats its Own Dog Food

This is one of the key benefits of Kafka; Kafka eats its own dog food: All components leverage the Kafka protocol and its features like offsets, replication, consumer groups, etc. under the hood:

Kafka vs. ESB ETL Nifi - Eat your own Dog Food

 

I see the added value of tools like Apache NiFi or Node-RED: you have a drag&drop UI to build pipelines. This is really nice! If you just have to build a pipeline to send data from the edge to the data lake for reporting and analytics, NiFi et al. are great tools – if you can live with the risk of downtime and data loss!

Kafka + NiFi + XYZ = Too Many Distributed Systems to Operate!

If you have to build a scalable, reliable streaming infrastructure for edge computing and hybrid architectures without downtime and without data loss: trust me, you will have a lot of pain. I have seen many deployments where end-to-end integration combining different middleware tools did not make it through the integration tests. The idea looks nice in the beginning, but it is not robust enough for mission-critical scenarios.

Think about it again: the more tools you combine with each other, the higher the risk of an outage or data loss! NiFi, as one example, runs its own distributed infrastructure. This means you have to guarantee 24/7 end-to-end uptime from the producers via NiFi and Kafka to the final consumers. Kafka-native tools like Kafka Connect or Kafka Streams use Kafka topics (including all the high availability features of Kafka) under the hood. This means you have to operate just one single infrastructure in 24/7 mode to guarantee end-to-end integration without downtime and without data loss.

My heart aches when I see architecture recommendations with pipelines like “Sensor ABC -> NiFi (Ingestion) -> Kafka Topic A -> NiFi (Transformation) -> Kafka Topic B -> NiFi (Load) -> Application XYZ”. Again, this is fine for batch ETL pipelines, and I see the added value of a nice UI tool like NiFi. But this is NOT the right way to build a 24/7 infrastructure for real time processing at scale with zero data loss.

I have a lot of material on the differences between an event-driven streaming platform like Apache Kafka and middleware like NiFi, Node-RED, message queues (MQ), Extract-Transform-Load (ETL) tools and Enterprise Service Buses (ESB).

Kafka at the Edge to Consolidate the Hybrid IoT Architecture

“Kafka at the edge” is the new black. Many industries deploy Kafka in hybrid architectures. Edge computing allows innovative use cases, increases processing speed, reduces network cost and makes the overall infrastructure more scalable, reliable and robust.

Start small: roll out Kafka at the edge in one site and connect it to the remote Kafka cluster. Then, connect more and more sites step-by-step, or build bidirectional use cases.

Edge computing is just one part of the overall architecture. It is okay to deploy just one single Kafka broker in small sites like a coffee shop, retail store or small plant. For mission-critical use cases, a “real Kafka cluster” has to be deployed on site. Local processing allows mission-critical processing in real time at scale without the need for remote communication. This reduces costs and increases security.

However, added value often comes from combining the data from different sites and using it for real time decisions. IoT Kafka infrastructures often combine small edge deployments with bigger Kafka deployments in the data center or public cloud. In the meantime, you can even run a single Apache Kafka cluster across multiple data centers to build regional and global Kafka infrastructures – and connect these to the local edge Kafka clusters.

What are your thoughts about Kafka at the edge and in hybrid architectures? Please let me know. Also, check out the “Infrastructure Checklist for Apache Kafka at the Edge” if you plan to go that direction!

Let’s connect on LinkedIn and discuss your use cases, architectures, and requirements. Also, stay informed about new blog posts by subscribing to my newsletter.

IoT Live Demo – 100.000 Connected Cars with Kubernetes, Kafka, MQTT, TensorFlow https://www.kai-waehner.de/blog/2019/11/08/live-demo-iot-100-000-connected-cars-kubernetes-kafka-mqtt-tensorflow/ Fri, 08 Nov 2019 13:38:34 +0000

You want to see an Internet of Things (IoT) example at huge scale? Not just 100 or 1,000 devices producing data, but a really scalable demo with millions of messages from tens of thousands of devices? Then this is the right demo for you! We leverage Kubernetes, Apache Kafka, MQTT and TensorFlow.

The demo shows how you can integrate with tens or hundreds of thousands of IoT devices and process the data in real time. The demo use case is predictive maintenance (i.e. anomaly detection) in a connected car infrastructure to predict motor engine failures:

IoT Use Case - Kafka MQTT TensorFlow and Kubernetes

IoT Infrastructure – MQTT and Kafka on Kubernetes

We deploy Kubernetes, Kafka, MQTT and TensorFlow in a scalable, cloud-native infrastructure to integrate and analyse sensor data from 100,000 cars in real time. The infrastructure is built with Terraform. We use GCP, but you could do the same on AWS, Azure, Alibaba or on premises.

Data processing and analytics are done in real time at scale with GCP GKE, HiveMQ, Confluent and TensorFlow I/O for streaming machine learning / deep learning and bidirectional communication in a scalable, elastic and reliable infrastructure:

IoT Architecture - Kafka MQTT TensorFlow and Kubernetes

Github Project – 100000 Connected Cars

The project is available on GitHub. You can set up the demo in ~30 minutes by just installing a few CLI tools and executing two or three shell scripts.

Check out the Github project “Streaming Machine Learning at Scale from 100000 IoT Devices with HiveMQ, Apache Kafka and TensorFlow“.

Please try out the demo. Feedback and PRs are welcome.

20min Live Demo – IoT at Scale on GCP with GKE, Confluent, HiveMQ and TensorFlow IO

Here is the video recording of the live demo:

If your area of interest is Industrial IoT (IIoT), you might also check out the following example. It covers the integration of machines and PLCs like Siemens S7, Modbus or Beckhoff in factories and shop floors:

Apache Kafka, KSQL and Apache PLC4X for IIoT Data Integration and Processing

Apache Kafka, KSQL and Apache PLC4X for IIoT Data Integration and Processing https://www.kai-waehner.de/blog/2019/09/02/apache-kafka-ksql-and-apache-plc4x-for-iiot-data-integration-and-processing/ Mon, 02 Sep 2019 09:35:03 +0000

Data integration and processing is a huge challenge in Industrial IoT (IIoT, aka Industry 4.0 or Automation Industry) due to monolithic systems and proprietary protocols. Apache Kafka, its ecosystem (Kafka Connect, KSQL) and Apache PLC4X are a great open source choice to implement this IIoT integration end to end in a scalable, reliable and flexible way.

This blog post gives a high level overview of the challenges and a good, flexible architecture to solve them. At the end, I share a video recording and the corresponding slide deck, which provide many more details and insights.

Challenges in IIoT / Industry 4.0

Here are some of the key challenges in IIoT / Industry 4.0:

  • IoT != IIoT: The automation industry does not use MQTT or other open standards; its protocols are slow, insecure, not scalable and proprietary.
  • Product lifecycles are very long (tens of years), with no simple changes or upgrades
  • IIoT usually uses incompatible protocols, typically proprietary and built for just one specific vendor.
  • The automation industry uses proprietary and expensive monoliths which are neither scalable nor extendible.
  • Machines and PLCs are insecure by nature, with no authentication, no authorization and no encryption.

This is still the state of the art in the automation industry. This is no surprise given such long product life cycles, but it is still very concerning.

Evolution of Convergence between IT and Automation Industry

Today, everybody talks about cloud, big data analytics, machine learning and real time processing at scale. The convergence between IT and the automation industry is coming, as the analyst report from IoT research company IoT Analytics shows:

Evolution of convergence between IT and Automation Industry

There is huge demand to build an open, flexible, scalable platform. Many opportunities exist from a business and technical perspective:

  • Cost reduction
  • Flexibility
  • Standards-based
  • Scalability
  • Extendibility
  • Security
  • Infrastructure-independent

So, how do you get from legacy technologies and proprietary IIoT protocols to cloud, big data, machine learning and real time processing? How do you build a reliable, scalable and flexible architecture and infrastructure?

Apache Kafka and Apache PLC4X for End-to-End IIoT Integration

I assume you already know it: Apache Kafka is the de facto standard for real-time event streaming. It provides:

  • Open Source (Apache 2.0 License)
  • Global-scale
  • Real-time
  • Persistent Storage
  • Stream Processing

Global-scale Real-time Persistent Storage Stream Processing

If you need more details about Apache Kafka, check out the Kafka website, the extensive Confluent documentation or some free video recordings and slides from any Kafka Summit to learn about the technology and use cases.

The one very important thing I want to point out is that Apache Kafka includes Kafka Connect and Kafka Streams:

Apache Kafka includes Kafka Connect and Kafka Streams

Kafka Connect enables reliable and scalable integration of Kafka with other systems. Kafka Streams allows you to write standard Java applications and microservices that continuously process your data in real time with a lightweight stream processing API. And finally, KSQL enables stream processing using SQL-like semantics.

Apache PLC4X for PLC Integration (Siemens S7, Modbus, Allen Bradley, Beckhoff ADS, etc.)

Apache PLC4X is less established on the market than Apache Kafka. It also “just covers a niche” (a big one, of course) compared to Kafka, which is used in every industry for many different use cases. However, PLC4X is a very interesting top-level Apache project for the automation industry.

The goal is to open up PLC interfaces from the IIoT world to the outside world. PLC4X enables vertical integration and allows you to write software independent of the PLCs, using JDBC-like adapters for various protocols like Siemens S7, Modbus, Allen Bradley, Beckhoff ADS, OPC-UA, Emerson, Profinet, BACnet and Ethernet.
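
To give a feel for this programming model, here is a rough sketch that reads one field via the PLC4J API (as it looked in the 0.x releases of that era) and forwards the value to Kafka. The connection string, field address, topic and key are purely illustrative assumptions:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.plc4x.java.PlcDriverManager;
import org.apache.plc4x.java.api.PlcConnection;
import org.apache.plc4x.java.api.messages.PlcReadRequest;
import org.apache.plc4x.java.api.messages.PlcReadResponse;

import java.util.Properties;

public class PlcToKafka {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // illustrative
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");

        // Connection string and field address use Siemens S7 syntax and are placeholders.
        try (PlcConnection plc = new PlcDriverManager().getConnection("s7://10.10.64.20/1/1");
             KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {

            PlcReadRequest request = plc.readRequestBuilder()
                    .addItem("temperature", "%DB1.DBW0:INT")
                    .build();
            PlcReadResponse response = request.execute().get();

            // Forward the raw reading to a Kafka topic for further processing downstream.
            producer.send(new ProducerRecord<>("plc-sensor-data", "machine-1",
                    String.valueOf(response.getObject("temperature"))));
            producer.flush();
        }
    }
}
```

In a production setup, the PLC4X Kafka Connect connector described below removes the need to write and operate this kind of custom bridge code yourself.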

PLC4X provides a Kafka Connect connector. Therefore, you can leverage the benefits of Apache Kafka (high availability, high throughput, high scalability, reliability, real time processing) to deploy PLC4X integration pipelines. With this, you can build one single architecture and infrastructure for

  • legacy IIoT connectivity using PLC4X and Kafka Connect
  • data processing using Kafka Streams / KSQL
  • integration with the rest of the enterprise using Kafka Connect and any other sink (database, big data analytics, machine learning, ERP, CRM, cloud services, custom business applications, etc.)

Apache Kafka and PLC4X Architecture for IIoT Automation Industry

As Kafka decouples the producers from the consumers, you can consume the IIoT machine sensor data from any application – some might be real time, some might be batch, and some might be request-response communication for human interaction on a web or mobile app.
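
To illustrate this decoupling, here is a minimal Kafka consumer sketch that reads the machine sensor data written by the PLC4X pipeline. The topic name and group id are assumptions; any number of such consumers (real-time, batch or request-response applications) can read the same data independently of each other and of the producers.

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class SensorDashboardConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "sensor-dashboard");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("machine-sensors"));

            while (true) {
                // Fully decoupled from the producers: the PLC4X pipeline keeps writing
                // regardless of how many consumers read the data, and at which pace.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("machine=%s value=%s%n", record.key(), record.value());
                }
            }
        }
    }
}
```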

Apache PLC4X vs. OPC-UA

A little bit off-topic: how do you choose between Apache PLC4X (an open source framework for IIoT) and OPC-UA (an open standard for IIoT)? In short, they are different things and can also be complementary. Here is a comparison:

OPC-UA

  • Open standard
  • All the pros and cons of an open standard (works with different vendors; slow adoption; inflexible, etc.)
  • Often poorly implemented by the vendors
  • Requires app server on top of PLC
  • Every device has to be retrofitted with the ability to speak a new protocol and use a common client to speak with these devices
  • Often over-engineered if you just want to read the data
  • Activating OPC-UA support on existing PLCs greatly increases the load on the PLCs
  • Licensing costs for every machine

Apache PLC4X

  • Open source framework (Apache 2.0 license)
  • Provides a unified API by implementing drivers for communicating with most industrial controllers in the protocols they natively understand
  • No need to modify existing hardware
  • No increased load on the PLCs
  • No need to pay for licenses to activate OPC-UA support
  • Drivers are implemented from the specs or by reverse-engineering protocols in order to be fully Apache 2.0 licensed
  • PLC4X adapter for OPC-UA available -> Both can be used together!

As you can see, both have their pros and cons. To me (and this is clearly my subjective opinion), PLC4X provides a great alternative with high flexibility and a low footprint.

Confluent and IoT Platform Solutions

Many IoT Platform Solutions are available on the market. This includes products like Siemens MindSphere or Cisco Kinetic, and cloud services from the major cloud providers like AWS, GCP or Azure. And you have Kafka + PLC4X, as you just learned above. Often, this is not an “either … or” decision:

Confluent and IoT Platform Solutions

You can either use

  • just Kafka and PLC4X for lightweight and flexible IIoT integration based on a scalable, reliable and open event streaming platform
  • just an IoT Platform Solution if the pros of such a specific product (dedicated support for a specific vendor protocol, nice GUI, etc.) outweigh the cons (such as high cost and a proprietary, inflexible solution)
  • both together where you use the IoT Platform Solution to integrate with the PLCs and then send the data to Kafka to integrate with the rest of the enterprise (with all the benefits and added value Kafka brings)
  • both together where you use Kafka and PLC4X for PLC integration and one of the consumers is the IoT Platform Solution (while other consumers can also get the data from Kafka – fully decoupled from the IoT Platform Solution)

All alternatives have their pros and cons. There is no single solution which fits every use case! Therefore, it is no surprise that most IoT Platform Solutions provide Kafka source and sink connectors.

Apache Kafka and Apache PLC4X – Slides / Video Recording / Github Code Example

If you are curious about more details and insights, please check out my video recording and slide deck.

Slide Deck – Apache Kafka and PLC4X:

Video Recording – Apache Kafka and PLC4X:

Github Code Example – Apache Kafka and PLC4X:

We are also building a nice and simple demo on Github these days:

Kafka-native end-to-end IIoT Data Integration and Processing with Kafka Connect, KSQL and Apache PLC4X

PLC4X gets most exciting when you try it out yourself and connect it to your own machines or tools. So check out the example and adjust it to connect to your infrastructure.

Feedback and Questions?

Please let me know your feedback and questions about Kafka, its ecosystem and PLC4X for IIoT integration. Let’s also connect on LinkedIn to discuss interesting IIoT use cases and technologies in the future.

 

The post Apache Kafka, KSQL and Apache PLC4X for IIoT Data Integration and Processing appeared first on Kai Waehner.

IoT Integration with Kafka Connect, REST / HTTP, MQTT, OPC-UA – Lightboard Video https://www.kai-waehner.de/blog/2019/07/26/iot-integration-apache-kafka-rest-http-mqtt-opcua-lightboard-video/ Fri, 26 Jul 2019 08:06:35 +0000 http://www.kai-waehner.de/blog/?p=1532 I just want to share my lightboard video recording. I talk about IoT integration and processing with Apache…

The post IoT Integration with Kafka Connect, REST / HTTP, MQTT, OPC-UA – Lightboard Video appeared first on Kai Waehner.

I just want to share my lightboard video recording. I talk about IoT integration and processing with Apache Kafka using Kafka Connect, Kafka Streams, KSQL, REST / HTTP, MQTT and OPC-UA. Use cases, alternative architectures and different integration options are discussed on the whiteboard.

End-to-End IoT Integration from Edge to Confluent Cloud

In this lightboard video, Confluent’s Kai Waehner (Technology Evangelist) and Konstantin Karantasis (Software Engineer) discuss use cases leveraging the Apache Kafka open source ecosystem as a streaming platform to process IoT data. The session shows architectural alternatives of how devices like cars, machines or mobile devices connect to Apache Kafka via IoT standards like MQTT or OPC-UA.

Learn how to analyze the IoT data either natively on Apache Kafka with Kafka Streams / KSQL or other tools leveraging Kafka Connect. Kai Waehner will also discuss the benefits of Confluent Cloud and other tools like Confluent Replicator or MQTT Proxy to build bidirectional real time integration from the edge to the cloud.

This presentation may help you:

  • Understand end-to-end use cases from different industries where you integrate IoT devices with enterprise IT using open source technologies and standards
  • See how Apache Kafka enables bidirectional end-to-end integration processing from IoT data to various backend applications in the cloud
  • Compare different architectural alternatives and see their benefits and caveats
  • Learn about various standards, APIs and tools of integrating and processing IoT data with different open source components of the Apache Kafka ecosystem
  • Understand the benefits of Confluent Cloud, which provides a highly available and scalable Apache Kafka ecosystem as a managed service

IoT Edge to Confluent Cloud

Here is the video recording:

Please let me know if you have any comments or questions.

Integration between Kafka and Siemens S7, Modbus, Beckhoff ADS, OPC-UA using PLC4X

Stay tuned for a dedicated blog post, slide deck and video recording about a core theme in the IoT space: integration with Industrial IoT (IIoT). I will talk about integration between Apache Kafka and technologies / proprietary standards from the automation industry. I will discuss how to integrate with proprietary protocols like Siemens S7, Modbus, Beckhoff ADS, OPC-UA, etc. leveraging the Kafka Connect connector for PLC4X. This content is in the works these days and I will probably publish it in August or September 2019.

IoT Integration Streaming Platform using Apache Kafka, Connect, MQTT, OPC-UA, PLC4X

Please also check out the following related material about IoT Integration:

The post IoT Integration with Kafka Connect, REST / HTTP, MQTT, OPC-UA – Lightboard Video appeared first on Kai Waehner.

Apache Kafka vs. Middleware (MQ, ETL, ESB) – Slides + Video https://www.kai-waehner.de/blog/2019/03/07/apache-kafka-middleware-mq-etl-esb-comparison/ Thu, 07 Mar 2019 15:45:15 +0000 http://www.kai-waehner.de/blog/?p=1423 This post shares a slide deck and video recording of the differences between an event-driven streaming platform like Apache Kafka and middleware like Message Queues (MQ), Extract-Transform-Load (ETL) and Enterprise Service Bus (ESB).

The post Apache Kafka vs. Middleware (MQ, ETL, ESB) – Slides + Video appeared first on Kai Waehner.

Learn the differences between an event-driven streaming platform like Apache Kafka and middleware like Message Queues (MQ), Extract-Transform-Load (ETL) and Enterprise Service Bus (ESB). This includes best practices and anti-patterns, but also how these concepts and tools complement each other in an enterprise architecture.

This blog post shares my slide deck and video recording. I discuss the differences between Apache Kafka as an event streaming platform and integration middleware. Learn whether they are friends, enemies or frenemies.

Problems of Legacy Middleware

Extract-Transform-Load (ETL) is still a widely-used pattern to move data between different systems via batch processing. Due to its challenges in today’s world where real time is the new standard, an Enterprise Service Bus (ESB) is used in many enterprises as integration backbone between any kind of microservice, legacy application or cloud service to move data via SOAP / REST Web Services or other technologies. Stream Processing is often added as its own component in the enterprise architecture for correlation of different events to implement contextual rules and stateful analytics. Using all these components introduces challenges and complexities in development and operations.

Legacy Middleware (MQ, ESB, ETL, etc)

Apache Kafka – An Open Source Event Streaming Platform

This session discusses how teams in different industries solve these challenges by building a native event streaming platform from the ground up instead of using ETL and ESB tools in their architecture. This allows them to build and deploy independent, mission-critical, real-time streaming applications and microservices. The architecture leverages distributed processing and fault tolerance with fast failover, zero-downtime rolling deployments and the ability to reprocess events, so you can recalculate output when your code changes. Integration and stream processing are still key functionality, but they can be realized natively in real time instead of using additional ETL, ESB or stream processing tools.

A concrete example architecture shows how to leverage the widely adopted open source framework Apache Kafka to build a mission-critical, scalable, highly performant streaming platform. Messaging, integration and stream processing are all built on top of the same strong foundation of Kafka, deployed on premise, in the cloud or in hybrid environments. In addition, the open source Confluent projects on top of Apache Kafka add features such as a Schema Registry, additional clients for programming languages like Go or C, and many pre-built connectors for various technologies.
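
As a small illustration of what “built on top of the same foundation” looks like in code, here is a hedged sketch of a producer that uses Confluent’s Avro serializer together with Schema Registry. The topic name, the schema and the Schema Registry URL are assumptions for illustration only, not part of the talk or slides.

```java
import io.confluent.kafka.serializers.KafkaAvroSerializer;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class AvroOrderProducer {

    private static final String ORDER_SCHEMA =
        "{\"type\":\"record\",\"name\":\"Order\",\"fields\":["
        + "{\"name\":\"orderId\",\"type\":\"string\"},"
        + "{\"name\":\"amount\",\"type\":\"double\"}]}";

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaAvroSerializer.class.getName());
        // Schema Registry keeps producers and consumers compatible over time.
        props.put("schema.registry.url", "http://localhost:8081");

        Schema schema = new Schema.Parser().parse(ORDER_SCHEMA);
        GenericRecord order = new GenericData.Record(schema);
        order.put("orderId", "4711");
        order.put("amount", 99.95);

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "4711", order));
            producer.flush();
        }
    }
}
```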

Apache Kafka Middleware

Slides: Apache Kafka vs. Integration Middleware

Here is the slide deck:

Video Recording: Kafka vs. MQ / ETL / ESB – Friends, Enemies or Frenemies?

Here is the video recording where I walk you through the above slide deck:

Article: Apache Kafka vs. Enterprise Service Bus (ESB)

I also published a detailed blog post on Confluent blog about this topic in 2018:

Apache Kafka vs. Enterprise Service Bus (ESB)

Talk and Slides from Kafka Summit London 2019

The slides and video recording from Kafka Summit London 2019 (which are similar to above) are also available for free.

Why Apache Kafka instead of Traditional Middleware?

If you don’t want to spend a lot of time on the slides and recording, here is a short summary of the differences between Apache Kafka and traditional middleware:

Why Apache Kafka instead of Integration Middleware

Questions and Discussion…

Please share your thoughts, too!

Do you see similar architectures in your infrastructure? Do you face similar challenges? Do you like the concepts behind an Event Streaming Platform (aka Apache Kafka)? How do you combine legacy middleware with Kafka? What’s your strategy to integrate the modern and the old (technology) world? Is Kafka part of that architecture?

Please let me know either via a comment or via LinkedIn, Twitter, email, etc. I am curious about other opinions and experiences (and people who disagree with my presentation).

The post Apache Kafka vs. Middleware (MQ, ETL, ESB) – Slides + Video appeared first on Kai Waehner.

Apache Kafka + MQTT = End-to-End IoT Integration (Code, Slides, Video) https://www.kai-waehner.de/blog/2018/09/10/apache-kafka-mqtt-end-to-end-iot-integration-demo-scripts-github/ Mon, 10 Sep 2018 12:52:33 +0000 http://www.kai-waehner.de/blog/?p=1346 MQTT and Apache Kafka are a perfect combination for end-to-end IoT integration from edge to data center. This post discusses two different approaches and refers to implementations on Github using Apache Kafka, Kafka Connect, Confluent MQTT Proxy and Mosquitto.

The post Apache Kafka + MQTT = End-to-End IoT Integration (Code, Slides, Video) appeared first on Kai Waehner.

Kafka and MQTT are two complementary technologies. Together they allow you to build end-to-end IoT integration from the edge to the data center, no matter if on premise or in the public cloud. I talked about this topic at Kafka Summit SF in San Francisco in October 2018: “Processing IoT Data from End to End with MQTT and Apache Kafka“. The main goal was to discuss different Kafka-native approaches and their trade-offs for integrating Kafka and MQTT.

As preparation for the talk, I created two demo projects on Github. I want to share these here, plus a video recording demoing them. The slide deck from my talk is also shared below. The full video recording will be available for free on the website of Kafka Summit.

Motivation for using Apache Kafka and MQTT together

MQTT is a widely used ISO standard (ISO/IEC PRF 20922) publish-subscribe-based messaging protocol. MQTT has many implementations such as Mosquitto or HiveMQ. For a good overview, see Wikipedia’s comparison of MQTT brokers. MQTT is mainly used in Internet of Things scenarios (like connected cars or smart home). But it is also used more and more in mobile devices due to its support for WebSockets.

However, MQTT is not built for high scalability, long-term storage or easy integration with legacy systems. Apache Kafka is a highly scalable distributed streaming platform. Kafka ingests, stores, processes and forwards high volumes of data from thousands of IoT devices.

Therefore, MQTT and Apache Kafka are a perfect combination for end-to-end IoT integration from edge to data center (and back, of course, i.e. bi-directional).

Let’s take a look at two different Kafka-native alternatives for building this integration. Both solutions allow highly scalable and mission-critical integration of IoT scenarios, leveraging the features and benefits of the Apache Kafka open source ecosystem. However, they use different concepts with different trade-offs.

MQTT Broker + Apache Kafka + Kafka Connect MQTT Connector

This project focuses on the integration of MQTT sensor data into Kafka via MQTT Broker and Kafka Connect for further processing:

Apache Kafka Connect MQTT Broker Mosquitto Integration

In this approach, you pull the data from the MQTT broker into the Kafka broker via Kafka Connect. You can leverage all the features of Kafka Connect, such as built-in fault tolerance, load balancing, converters, Single Message Transformations (SMT) for routing / filtering / etc., scaling different connectors in one Connect worker instance, and other Kafka Connect related benefits.
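
To try this approach, you need something that publishes MQTT messages to the broker in the first place. The following sketch uses the Eclipse Paho Java client to simulate a sensor publishing to Mosquitto; the broker URL, topic and payload are assumptions for illustration, and the Kafka Connect MQTT source connector (configured separately) would then pull these messages into a Kafka topic.

```java
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttMessage;

import java.nio.charset.StandardCharsets;

public class SensorSimulator {

    public static void main(String[] args) throws Exception {
        // Local Mosquitto broker; the Kafka Connect MQTT source connector subscribes to the same topic.
        MqttClient client = new MqttClient("tcp://localhost:1883", "sensor-simulator");

        MqttConnectOptions options = new MqttConnectOptions();
        options.setCleanSession(true);
        client.connect(options);

        for (int i = 0; i < 10; i++) {
            String payload = String.format("{\"machineId\":\"m-1\",\"temperature\":%.1f}", 20.0 + i);
            MqttMessage message = new MqttMessage(payload.getBytes(StandardCharsets.UTF_8));
            message.setQos(1);
            client.publish("sensors/temperature", message);
            Thread.sleep(1000);
        }

        client.disconnect();
        client.close();
    }
}
```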

Here is the Github project including demo script to try it out by yourself:

Apache Kafka + Kafka Connect + MQTT Connector + Sensor Data

MQTT Proxy + Apache Kafka (no MQTT Broker)

As an alternative to using Kafka Connect, you can leverage Confluent MQTT Proxy: integrate data from IoT devices directly, without the need for an MQTT broker:

Apache Kafka MQTT Proxy Confluent

In this approach, you push the data directly to the Kafka broker via the Confluent MQTT Proxy instead of using the pull approach of Kafka Connect (where the connector pulls the data from the MQTT broker and forwards it to Kafka under the hood). You can scale MQTT Proxy easily with a load balancer (similar to Confluent REST Proxy). The huge advantage is that you do not need an additional MQTT broker in the middle. This reduces efforts and costs significantly. It is a better approach for pushing data into Kafka.
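
From the device’s point of view, the push approach looks almost identical: the device still speaks plain MQTT, it just connects to the MQTT Proxy endpoint instead of a broker. The host, port and topic below are assumptions; the proxy’s listener address and the MQTT-to-Kafka topic mapping depend on your Confluent MQTT Proxy configuration.

```java
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttMessage;

import java.nio.charset.StandardCharsets;

public class SensorToMqttProxy {

    public static void main(String[] args) throws Exception {
        // Hypothetical MQTT Proxy listener; there is no MQTT broker in the middle.
        MqttClient client = new MqttClient("tcp://mqtt-proxy.example.com:1883", "sensor-42");
        client.connect();

        MqttMessage message = new MqttMessage(
            "{\"machineId\":\"m-42\",\"temperature\":23.5}".getBytes(StandardCharsets.UTF_8));
        message.setQos(1);

        // The proxy writes the record directly into the mapped Kafka topic.
        client.publish("sensors/temperature", message);

        client.disconnect();
        client.close();
    }
}
```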

Here is the Github project including demo script to try it out by yourself:

Deep Learning UDF for KSQL for Streaming Anomaly Detection of MQTT IoT Sensor Data

Both approaches have their trade-offs. The good thing is that you have the choice. Choose the right tool and architecture for your use case.

Slide Deck from Kafka Summit SF 2018

Here you see the slide deck from my presentation at Kafka Summit 2018 in San Francisco. The focus lies on discussing different Kafka-native approaches and their trade-offs for implementing MQTT-Kafka integration leveraging Kafka Connect, MQTT Proxy or REST / HTTP.

Video Recording: Kafka MQTT Integration via Kafka Connector / Confluent MQTT Proxy

Here is a 15min live demo showing both options:

Conclusion: MQTT and Apache Kafka are complementary and integrate well

Both MQTT and Apache Kafka have great benefits for their own use cases, but neither of them is a single all-rounder for everything. The combination of both is very powerful and a great solution for building end-to-end IoT scenarios from the edge to the data center and back.

Different approaches exist to integrate MQTT and Apache Kafka end-to-end. I am not just talking about connectivity, but also about data processing, filtering, routing, etc. You should compare Kafka Connect + MQTT Broker vs. MQTT Proxy without MQTT Broker vs. REST / HTTP integration. All have their trade-offs. However, the key advantage they all share is their Kafka-native integration: they leverage the features of Kafka under the hood, enabling high throughput, scalability and reliability.

By the way: if you want to see an IoT example with Kafka sink applications like Elasticsearch / Grafana, please take a look at the project “Kafka and KSQL for Streaming IoT Data from MQTT to the Real Time UI“. This example shows how to realize the integration with Elasticsearch and Grafana via Kafka Connect.

 

The post Apache Kafka + MQTT = End-to-End IoT Integration (Code, Slides, Video) appeared first on Kai Waehner.

Apache Kafka vs. ESB / ETL / MQ https://www.kai-waehner.de/blog/2018/07/18/apache-kafka-vs-esb-etl-mq/ Wed, 18 Jul 2018 18:33:56 +0000 http://www.kai-waehner.de/blog/?p=1320 Read why enterprises leverage the open source ecosystem of Apache Kafka for successful integration of different legacy and modern applications instead of ESB, ETL or MQ.

The post Apache Kafka vs. ESB / ETL / MQ appeared first on Kai Waehner.

Apache Kafka and Enterprise Service Bus (ESB) are complementary, not competitive!

By now, Apache Kafka is much more than messaging. It has evolved into a streaming platform including Kafka Connect, Kafka Streams, KSQL and many other open source components. Kafka leverages events as a core principle. You think in data flows of events and process the data while it is in motion. Many concepts, such as event sourcing, and design patterns, such as Enterprise Integration Patterns (EIPs), are based on event-driven architecture.

Kafka is unique because it combines messaging, storage, and processing of events all in one platform. It does this in a distributed architecture using a distributed commit log and topics divided into multiple partitions.
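
As a small, hedged illustration of the commit-log and partitioning model, the following sketch creates a topic with several partitions via Kafka’s AdminClient. The topic name, partition count, replication factor and broker address are arbitrary example values; a replication factor of 3 assumes a cluster with at least three brokers.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreatePartitionedTopic {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions allow parallel consumption; replication factor 3 keeps the log fault-tolerant.
            NewTopic ordersTopic = new NewTopic("orders", 6, (short) 3);
            admin.createTopics(Collections.singleton(ordersTopic)).all().get();
        }
    }
}
```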

ETL and ESB have excellent tooling, including graphical mappings for doing complex integration with legacy systems and technologies such as SOAP, EDIFACT, SAP BAPI, COBOL, etc. (Trust me, you don’t want to write the code for this.)

Therefore, existing MQ and ESB solutions, which already integrate with your legacy world, are not in competition with Apache Kafka. Rather, they are complementary!

Read more details about this question in my Confluent blog post:

Apache Kafka® vs. Enterprise Service Bus (ESB) | Confluent

As always, I appreciate any feedback, comments or criticism.

The post Apache Kafka vs. ESB / ETL / MQ appeared first on Kai Waehner.

Video Recording – Apache Kafka as Event-Driven Open Source Streaming Platform (Voxxed Zurich 2018) https://www.kai-waehner.de/blog/2018/03/13/video-recording-apache-kafka-as-event-driven-open-source-streaming-platform-voxxed-zurich-2018/ Tue, 13 Mar 2018 07:25:43 +0000 http://www.kai-waehner.de/blog/?p=1264 I spoke at Voxxed Zurich 2018 about Apache Kafka as Event-Driven Open Source Streaming Platform. The talk includes…

The post Video Recording – Apache Kafka as Event-Driven Open Source Streaming Platform (Voxxed Zurich 2018) appeared first on Kai Waehner.

I spoke at Voxxed Zurich 2018 about Apache Kafka as Event-Driven Open Source Streaming Platform. The talk includes an intro to Apache Kafka and its open source ecosystem (Kafka Streams, Connect, KSQL, Schema Registry, etc.). Just want to share the video recording of my talk.

Abstract

This session introduces Apache Kafka, an event-driven open source streaming platform. Apache Kafka goes far beyond scalable, high-volume messaging. In addition, you can leverage Kafka Connect for integration and the Kafka Streams API for building lightweight stream processing microservices in autonomous teams. The open source Confluent Platform adds further components such as KSQL, Schema Registry, REST Proxy, clients for different programming languages and connectors for different technologies and databases. Live demos included.

Video Recording

The post Video Recording – Apache Kafka as Event-Driven Open Source Streaming Platform (Voxxed Zurich 2018) appeared first on Kai Waehner.

Agile Cloud-to-Cloud Integration with iPaaS, API Management and Blockchain https://www.kai-waehner.de/blog/2017/04/23/agile-cloud-cloud-integration-ipaas-api-management-blockchain/ Sun, 23 Apr 2017 18:41:06 +0000 http://www.kai-waehner.de/blog/?p=1152 Agile Cloud-to-Cloud Integration with iPaaS, API Management and Blockchain. Scenario use case using IBM's open source Hyperledger Fabric on BlueMix, TIBCO Cloud Integration (TCI) and Mashery.

The post Agile Cloud-to-Cloud Integration with iPaaS, API Management and Blockchain appeared first on Kai Waehner.

Cloud-to-cloud integration is part of a hybrid integration architecture. It enables you to implement quick and agile integration scenarios without the burden of setting up complex VM- or container-based infrastructures. One key use case for cloud-to-cloud integration is innovation using a fail-fast methodology where you realize new ideas quickly. You typically think in days or weeks, not in months. If an idea fails, you throw it away and start another new idea. If the idea works well, you scale it out and bring it into production on an on-premise, cloud or hybrid infrastructure. Finally, you expose the idea and make it easily available to any interested service consumer in your enterprise, to partners or to public end users.

A great example where you need agile, fail-fast development is blockchain, because it is a very hot topic, but frameworks are immature and change very frequently these days. Note that blockchain is not just about Bitcoin and the finance industry. This is just the tip of the iceberg. Blockchain, which is the underlying infrastructure of Bitcoin, will change various industries, beginning with banking, manufacturing, supply chain management, energy, and others.

Middleware and Integration as Key for Success in Blockchain Projects

A key to successful blockchain projects is middleware, as it allows integration with the rest of an enterprise architecture. Blockchain only adds value if it works together with your other applications and services. See an example of how to combine streaming analytics with TIBCO StreamBase and Ethereum to correlate blockchain events in real time and act proactively, e.g. for fraud detection.

The drawback of blockchain is its immaturity today. APIs change with every minor release, development tools are buggy and miss many required features for serious development, and new blockchain cloud services, frameworks and programming languages come and go every quarter.

This blog post shows how to leverage cloud integration with iPaaS and API Management to realize innovative projects quickly, fail fast, and adopt changing technologies and business requirements easily. We will use TIBCO Cloud Integration (TCI), TIBCO Mashery and Hyperledger Fabric running as IBM Bluemix cloud service.

IBM Hyperledger Fabric as Bluemix Cloud Service

Hyperledger is a vendor-independent open source blockchain project which consists of various components. Many enterprises and software vendors have committed to it to build different solutions for various problems and use cases. One example is IBM’s Hyperledger Fabric. Being a very flexible construction kit makes Hyperledger much more complex to get started with than other blockchain platforms, like Ethereum, which are less flexible but therefore easier to set up and use.

This is the reason why I use the Bluemix BaaS (Blockchain as a Service) to get started quickly without spending days on just setting up my own Hyperledger network. You can start a Hyperledger Fabric blockchain infrastructure with four peers and a membership service with just a few clicks. It takes around two minutes. Hyperledger Fabric is evolving fast, which makes it a great example of quickly changing features, evolving requirements and (potentially) fast-failing projects.

My project uses Hyperledger Fabric 0.6 as a free beta cloud service. I leverage its Swagger-based REST API in my middleware microservice to interconnect other cloud services with the blockchain:
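
For illustration, here is a hedged sketch of how such a REST call from a middleware service could look in plain Java. The endpoint path and the JSON-RPC payload structure are reconstructed from memory of the Fabric 0.6 REST API and should be treated as assumptions; in the actual project the call is made from a TCI REST Client activity, not hand-written Java.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FabricRestQuery {

    public static void main(String[] args) throws Exception {
        // Hypothetical Bluemix peer endpoint and chaincode name.
        String peerUrl = "https://my-fabric-peer.example.com:443/chaincode";

        // JSON-RPC style body as the Fabric 0.6 REST API expected it (field names are assumptions).
        String body = "{"
            + "\"jsonrpc\":\"2.0\","
            + "\"method\":\"query\","
            + "\"params\":{"
            + "\"type\":1,"
            + "\"chaincodeID\":{\"name\":\"customer_cc\"},"
            + "\"ctorMsg\":{\"function\":\"query\",\"args\":[\"customer-4711\"]}"
            + "},"
            + "\"id\":1}";

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(peerUrl))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println("Status: " + response.statusCode());
        System.out.println("Body:   " + response.body());
    }
}
```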

 

However, when I began the project, it was already clear that the REST interface was deprecated and would not be included in the upcoming 1.0 release anymore. Thus, we were aware that an easy move to another API would be needed within a few weeks or months.

Cloud-to-Cloud Integration, iPaaS and API Management for Agile and Fail-Fast Development

As mentioned in the introduction, middleware is key for success in blockchain projects to interconnect it with the existing enterprise architecture. This example leverages the following middleware components:

  • iPaaS: TIBCO Cloud Integration (TCI) is hosted and managed by TIBCO. It can be used without any setup or installation to quickly build a REST microservice which integrates with Hyperledger Fabric. TCI also allows you to configure caching, throttling and security to ensure controlled communication between the blockchain and other applications and services.
  • API Management: TIBCO Mashery is used to expose the TCI REST microservice to other service consumers. These can be internal, partners or public end users, depending on the use case.
  • On Premise / Cloud-Native Integration Infrastructure: TIBCO BusinessWorks 6 (BW6) or TIBCO BusinessWorks Container Edition (BWCE) can be used to deploy and productionize successful TCI microservices in your existing infrastructure, on premise or on a cloud-native platform like Cloud Foundry, Kubernetes or any other Docker-based platform. Of course, you can also continue to use the services within TCI itself to run and scale the production services in the public cloud, hosted and managed by TIBCO.

Scenario: TIBCO Cloud Integration + Mashery + Hyperledger Fabric Blockchain + Salesforce

I implemented the following technical scenario with the goal of showing agile cloud-to-cloud integration with a fail-fast methodology from a technical perspective (and not to build a great business case): the scenario implements a new REST microservice with TCI via visual coding and configuration. This REST service connects to two cloud services: Salesforce and Hyperledger Fabric. The following picture shows the architecture:

 

Here are the steps to realize this scenario:

  • Create a REST service which receives customer ID and customer name as parameters.
  • Enhance the input data from the REST call with additional data from Salesforce CRM. In this case, we get a block hash which is stored in Salesforce as a reference to the blockchain data. The block hash allows us to double-check whether Salesforce has the most up-to-date information about a customer, while the blockchain itself ensures that customer updates from various systems like CRM, ERP or mainframe are stored and updated in a correct, secure, distributed chain – which can be accessed by all related systems, not just Salesforce.
  • Use the Hyperledger REST API to validate via Salesforce’s block hash that the current information about the customer in Salesforce is up to date. If Salesforce has an older block hash, then update Salesforce with the up-to-date values (both the most recent block hash and the current customer data stored on the blockchain). A sketch of this validation logic follows below the list.
  • Return the up-to-date customer data to the service consumer of the TCI REST service.
  • Configure caching, throttling and security requirements so that service consumers cannot trigger unexpected behavior.
  • Leverage API Management to expose the TCI REST service as a public API so that external service consumers can subscribe to a payment plan and use it in other applications or microservices.
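
Stripped of the TCI specifics, the core of the validation step boils down to a simple comparison. The following Java sketch is purely illustrative; SalesforceClient, BlockchainClient and CustomerRecord are hypothetical helpers standing in for the Salesforce and Hyperledger activities that are wired together graphically in TCI.

```java
public class CustomerSyncService {

    // Hypothetical abstractions standing in for the Salesforce and Hyperledger activities in TCI.
    public interface SalesforceClient {
        String getBlockHash(String customerId);
        CustomerRecord getCustomer(String customerId);
        void updateCustomer(String customerId, CustomerRecord record, String blockHash);
    }

    public interface BlockchainClient {
        String getLatestBlockHash(String customerId);
        CustomerRecord getCustomer(String customerId);
    }

    public static class CustomerRecord {
        public final String customerId;
        public final String name;

        public CustomerRecord(String customerId, String name) {
            this.customerId = customerId;
            this.name = name;
        }
    }

    private final SalesforceClient salesforce;
    private final BlockchainClient blockchain;

    public CustomerSyncService(SalesforceClient salesforce, BlockchainClient blockchain) {
        this.salesforce = salesforce;
        this.blockchain = blockchain;
    }

    public CustomerRecord getUpToDateCustomer(String customerId) {
        String hashInSalesforce = salesforce.getBlockHash(customerId);
        String latestHash = blockchain.getLatestBlockHash(customerId);

        if (!latestHash.equals(hashInSalesforce)) {
            // Salesforce is stale: refresh it with the current state and block hash from the blockchain.
            CustomerRecord current = blockchain.getCustomer(customerId);
            salesforce.updateCustomer(customerId, current, latestHash);
            return current;
        }

        // Salesforce already references the latest block, so its data can be returned directly.
        return salesforce.getCustomer(customerId);
    }
}
```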

Implementation: Visual Coding, Web Configuration and Automation via DevOps

This section discusses the relevant steps of the implementation in more detail, including development and deployment of the chain code (Hyperledger’s term for smart contracts on the blockchain), implementation of the REST integration service with visual coding in the TCI IDE, and exposing the service as an API via TIBCO Mashery.

Chain Code for Hyperledger Fabric on IBM Bluemix Cloud

Hyperledger Fabric has some detailed “getting started” tutorials. I used a “Hello World” tutorial and adapted it to our use case so that you can store, update and query customer data on the blockchain network. Hyperledger Fabric uses Go for chain code and will allow other programming languages like Java in future releases. This is both a pro and a con. The benefit is that you can leverage your existing expertise in these programming languages. However, you also have to be careful not to use “wrong” features like threads, indefinite loops or other functions which cause unexpected behavior on a blockchain. Personally, I prefer the concept of Ethereum. This platform uses Solidity, a programming language explicitly designed for developing smart contracts (i.e. chain code).

Cloud-to-Cloud Integration via REST Service with TIBCO Cloud Integration

The following steps were necessary to implement the REST microservice:

  • Create the REST service interface with the API Modeler web UI leveraging Swagger
  • Import the Swagger interface into TCI’s IDE
  • Use the Salesforce activity to read the block hash from Salesforce’s cloud interface
  • Use the REST Client activity to make an HTTP request to Hyperledger Fabric’s REST interface
  • Configure graphical mappings between the activities (REST Request → Salesforce → Hyperledger Fabric → REST Response)
  • Deploy the microservice to TCI’s runtime in the cloud and test it from the Swagger UI
  • Configure caching and throttling in TCI’s web UI

The last point is very important in blockchain projects. A blockchain infrastructure does not have millisecond response latency. Furthermore, every blockchain update costs money – Bitcoin, Ether, or whatever currency you use to “pay for mining” and to reach consensus in your blockchain infrastructure. Therefore, you can leverage caching and throttling in blockchain projects much more than in some other projects (only if it fits into your business scenario, of course).

Here is the implemented REST service in TCI’s graphical development environment:

 

 

And the caching configuration in TCI’s web user interface:

Exposing the REST Service to Service Consumers through API Management via TIBCO Mashery

The deployed TCI microservice can either be accessed directly by service consumers, or you can expose it via API Management, e.g. with TIBCO Mashery. The latter option allows you to define rules, payment plans and security configuration in a centralized manner with easy-to-use tooling.

Note that the deployment to TCI and exposing its API to Mashery can also be done in one single step. Both products are loosely coupled, but highly integrated. Also note that this is typically not done manually (like in this technical demonstration), but integrated into a DevOps infrastructure leveraging frameworks like Maven or Jenkins to automate the deployment and testing steps.

Agile Cloud-to-Cloud Integration with Fail-Fast Methodology

Our scenario is now implemented successfully. However, it is already clear that the REST interface is deprecated and not included in the upcoming 1.0 release anymore. In blockchain frameworks, you can expect changes very frequently.

While I was implementing this example, IBM announced at its big user conference IBM Interconnect in Las Vegas that Hyperledger Fabric 1.0 is now publicly available. Well, it has really been available on Bluemix since that day, but if you try to start the service you get the following message: “Due to high demand of our limited beta, we have reached capacity.” A strange error, as I thought we use public cloud infrastructures like Amazon AWS, Microsoft Azure, Google Cloud Platform or IBM Bluemix for exactly this reason: to scale out easily… :-)

Anyway, the consequence is that we still have to work with our 0.6 version and do not know yet when we will be able to (or have to) migrate to version 1.0. The good news is that iPaaS and cloud-to-cloud integration allow you to realize changes quickly. In the case of our TCI REST service, we just need to replace the REST activity calling the Hyperledger REST API with a Java activity and leverage Hyperledger’s Java SDK – as soon as it is available. Right now only a client SDK for Node.js is available, which is not really an option for “enterprise projects” where you want to leverage the JVM / Java platform instead of JavaScript. Side note: the topic of using Java vs. JavaScript in blockchain projects is also well discussed in “Static Type Safety for DApps without JavaScript”.

This blog post focuses just on a small example, and Hyperledger Fabric 1.0 will of course bring other new features and concept changes with it. The same is true for SaaS cloud offerings such as Salesforce. You cannot control when they change their API and what exactly they change. But you have to adapt within a relatively short timeframe, or your service will not work anymore. An iPaaS is the perfect solution for these scenarios, as you do not have a lot of complex setup in your private data center or on a public cloud platform. You just use it “as a service”, change it, replace logic, or stop it and throw it away to start a new project. The implicit integration with API Management and DevOps support also allows you to expose new versions to your service consumers easily and quickly.

Conclusion: Be Innovative by Failing Fast with Cloud Solutions

Blockchain is at a very early stage. This is true for platforms, tooling, and even the basic concepts and theories (like consensus algorithms or security enforcement). Don’t trust the vendors when they say blockchain is now 1.0 and ready for prime time. It will still feel more like a 0.5 beta version these days. This is not just true for Hyperledger and its related projects such as IBM’s Hyperledger Fabric, but also for all others, including Ethereum and all the interesting frameworks and startups emerging around it.

Therefore, you need to be flexible and agile in blockchain development today. You need to be able to fail fast. This means setting up a project quickly, trying out new innovative ideas, and throwing them away if they do not work – to start the next one. The same is true for other innovative ideas, not just for blockchain.

Middleware helps integrate new innovative ideas with existing applications and enterprise architectures. It is used to interconnect everything, correlate events in real time, and find insights and patterns in correlated information to create new added value. To support innovative projects, middleware also needs to be flexible and agile, and it needs to support fail-fast methodologies. This post showed how you can leverage iPaaS with TIBCO Cloud Integration and API Management with TIBCO Mashery to build middleware microservices in innovative cloud-to-cloud integration projects.

The post Agile Cloud-to-Cloud Integration with iPaaS, API Management and Blockchain appeared first on Kai Waehner.
