How Lufthansa uses Apache Kafka for Middleware and Analytics

Aviation and travel are notoriously vulnerable to social, economic, and political events, as well as the ever-changing expectations of consumers. The coronavirus pandemic was just one piece of the challenge. This post explores how Lufthansa leverages data streaming powered by Apache Kafka as cloud-native middleware for mission-critical data integration projects and as a data fabric for AI/machine learning scenarios such as real-time predictions in fleet management. As a highlight, an interactive conversation with Lufthansa is included as an on-demand video at the end if you want to learn more.

Data Streaming with Apache Kafka at Airlines - Lufthansa Case Study

Data streaming in the aviation industry

The future business of airlines and airports will be digitally integrated into the ecosystem of partners and suppliers. Companies will provide more personalized customer experiences, enabled by a suite of the latest technologies, including automation, robotics, and biometrics.

The entire aviation industry already leverages data streaming powered by Apache Kafka. This includes airlines, airports, global distribution systems (GDS), aircraft manufacturers, travel agencies, and more. Why? Because real-time data beats slow data in almost all use cases.

Real-time data streaming in aviation and airline industry

Learn more in my blog about “Apache Kafka in the Airline, Aviation and Travel Industry” covering companies like Singapore Airlines, Air France, and Amadeus.

This article focuses on data streaming in critical Lufthansa projects. Lufthansa is a major German airline and one of the largest in Europe. It is known for its extensive network of domestic and international flights. Lufthansa offers services ranging from passenger transportation to cargo logistics and is a member of the Star Alliance, one of the world’s largest airline alliances.

Apache Kafka as next-generation middleware replacing ETL, ESB, and iPaaS

Typically, an enterprise service bus (ESB) or other integration solutions like extract-transform-load (ETL) tools have been used to try to decouple systems. However, the sheer number of connectors, as well as the requirement that applications publish and subscribe to the data at the same time, means that systems are always intertwined. As a result, development projects depend on other systems, and nothing can be truly decoupled.

Many enterprises leverage the ecosystem of Apache Kafka for the successful integration of different legacy and modern applications. Data streaming differs from, but also complements, existing integration solutions like ESB or ETL tools. Apache Kafka is unique because it combines the following characteristics into a single middleware platform:

  • Real-time messaging at any scale
  • Event store for true decoupling, backpressure handling, and replayability of historical events (see the sketch after this list)
  • Data integration eliminating the need for additional integration tools
  • Stream processing for stateless and stateful data correlation of real-time and historical data
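The event store aspect is what separates Kafka from a classic message broker. Here is a minimal Java sketch of replaying retained history by rewinding a consumer to the earliest offset; the broker address and topic name are hypothetical placeholders:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class ReplayDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "replay-demo");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("flight-events"));    // hypothetical topic
            consumer.poll(Duration.ofMillis(500));           // join the group, get partitions assigned
            consumer.seekToBeginning(consumer.assignment()); // rewind: replay all retained events
            ConsumerRecords<String, String> history = consumer.poll(Duration.ofSeconds(5));
            history.forEach(r -> System.out.println(r.key() + " -> " + r.value()));
        }
    }
}
```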

Event Streaming and Event Driven Architecture for a Smart City with Apache Kafka

“Apache Kafka vs. Enterprise Service Bus (ESB) – Friends, Enemies or Frenemies?” explores how data streaming with Kafka complements legacy middleware. If your workloads run mostly in the public cloud, you need to understand the difference between Integration Platform as a Service (iPaaS) and data streaming powered by fully-managed Kafka infrastructure.

Lufthansa uses Apache Kafka as cloud-native middleware for mission-critical integrations

Lufthansa leverages data streaming with Confluent as cloud-native middleware for its strategic integration project KUSCO (Kafka Unified Streaming Cloud Operations).

The team discussed the benefits of using Apache Kafka instead of traditional message queues (TIBCO EMS, IBM MQ) for data processing. My two favorite statements:

  • “Scaling Kafka is really inexpensive”
  • “Kafka adopted and integrated within 3 months”

Lufthansa’s Kafka architecture does not have any surprises. A key lesson learned from many companies: The real added value is created when you leverage not just Kafka’s messaging, but its entire ecosystem, including different clients/proxies, connectors, stream processing, and data governance.

The result at Lufthansa: A better, cheaper, and faster infrastructure for real-time data processing at scale.

Watch the full talk by Marcos Carballeira Rodríguez from Lufthansa Group, recorded at the Confluent Streaming Days 2020, to see all the architectures and quotes from Lufthansa. More and more projects are onboarded onto the KUSCO platform. Here are a few statistics on the adoption of the KUSCO project from 2022 to 2023, presented by System Architect Krzysztof Torunski of Lufthansa Group:

Lufthansa KUSCO - cloud-native middleware platform using Apache Kafka and Confluent

I see this typical pattern in customers across industries: The first use case is the hardest to get live. Afterward, new business units tap into the data feeds and build their own projects. It has never been easier to access data feeds in real time, with good data quality, at any scale. Just build a downstream application (with your favorite programming language, tool, or SaaS) and start innovating.

Apache Kafka for analytics and AI/machine learning

Apache Kafka serves thousands of enterprises as the mission-critical and scalable real-time data fabric for machine learning infrastructures. The evolution of Generative AI (GenAI) with large language models (LLM) like ChatGPT changed how people think about intelligent software and automation. In various blog posts, I explored the relationship between data streaming with the Kafka ecosystem and AI/machine learning.

Kafka Machine Learning Architecture for GenAI

My latest article shows the enormous opportunities and some early adopters combining Kafka and GenAI beyond the buzz.

Lufthansa uses Apache Kafka with AI/machine learning for real-time predictions

Lufthansa leverages the KUSCO platform to build new analytics use cases with real-time data for critical workloads. In the webinar, we learned about the following two projects from Lufthansa Group’s Domain Architect Sebastian Weber: anomaly detection for alerts and fleet management for aircraft operations.

Anomaly detection with Apache Kafka and ksqlDB

Data is fed into the streaming platform from various data sources. Lufthansa consolidates and aggregates the data with stream processing before the analytics applications do real-time alerting.

Anomaly Detection with Apache Kafka and Machine Learning at Lufthansa
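Lufthansa’s implementation details are not public, and the heading names ksqlDB; purely as an illustration of the pattern, here is a minimal sketch of an equivalent windowed aggregation written with Kafka Streams in Java. Topic names and the alert threshold are hypothetical:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.*;
import org.apache.kafka.streams.kstream.*;
import java.time.Duration;
import java.util.Properties;

public class AnomalyDetector {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        builder.stream("aircraft-error-events", // hypothetical input topic
                Consumed.with(Serdes.String(), Serdes.String()))
            .groupByKey()
            // Count error events per aircraft in 1-minute tumbling windows (Kafka Streams 3.x API)
            .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
            .count()
            .toStream()
            // Flag windows that exceed a hypothetical threshold as anomalies
            .filter((windowedAircraftId, count) -> count > 10)
            .map((windowedKey, count) ->
                KeyValue.pair(windowedKey.key(), "ALERT: " + count + " errors/min"))
            .to("anomaly-alerts", Produced.with(Serdes.String(), Serdes.String()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "anomaly-detector");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        new KafkaStreams(builder.build(), props).start();
    }
}
```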

Machine learning and Apache Kafka for real-time fleet management

Lufthansa leverages the streaming platform as data fabric for data ingestion, data processing, and model scoring.

Machine Learning and Stream Processing for Real Time Fleet Management at Lufthansa

Embedding analytic models into a Kafka application is a standard best practice. While the data lake or lakehouse (that receives data via Kafka) trains the model in batch, many use cases require real-time model scoring and predictions at scale with critical SLAs and low latency. That’s exactly the sweet spot of the Kafka ecosystem.

You can either directly embed a model into the Kafka application or leverage a model server that supports streaming interfaces. I blogged about the trade-offs and use cases: “Streaming Machine Learning with Kafka-native Model Deployment“.
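Here is a minimal sketch of the first option, embedding a model directly into a Kafka Streams application. `FlightDelayModel` and its `predict()` method are hypothetical stand-ins for a model trained by the batch pipeline, not Lufthansa’s actual code:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.*;
import org.apache.kafka.streams.kstream.*;
import java.util.Properties;

public class FleetPredictionApp {
    public static void main(String[] args) {
        // Hypothetical model class, e.g., loaded from a file exported by the
        // batch training pipeline in the data lake / lakehouse.
        FlightDelayModel model = FlightDelayModel.load("/models/fleet-model.bin");

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("aircraft-telemetry", // hypothetical topic
                Consumed.with(Serdes.String(), Serdes.String()))
            // Score every event locally inside the stream processor:
            // no remote call to a model server per event, so latency stays low.
            .mapValues(telemetry -> model.predict(telemetry))
            .to("delay-predictions", Produced.with(Serdes.String(), Serdes.String()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "fleet-prediction");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        new KafkaStreams(builder.build(), props).start();
    }
}
```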

Interactive conversation with Lufthansa

Here is an on-demand video of my conversation with Lufthansa. We talk about use cases for data streaming in the aviation industry and how Lufthansa leverages Apache Kafka as cloud-native middleware and as the data fabric for analytics and machine learning:

Data Streaming at Lufthansa Video Recording

Data streaming as cloud-native middleware and for mission-critical analytics

Lufthansa showed us how you can innovate in the airline industry with a fast time-to-market while still integrating with traditional technologies. The two projects show very different challenges and use cases solved with data streaming powered by the Apache Kafka ecosystem.

The aviation industry is changing rapidly. A good customer experience, valuable loyalty platforms, and competitive pricing (or better hard and soft products) require digitalization of the end-to-end supply chain. This includes topics like Industrial IoT (e.g., predictive maintenance), B2B communication with partners (like GDS, airports, and retailers), and customer 360 (including great mobile apps and omnichannel experiences).

How do you leverage data streaming with Apache Kafka in your projects and enterprise architecture? Let’s connect on LinkedIn and discuss it! Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter.

When to use Apache Camel vs. Apache Kafka?

Should I use Apache Camel or Apache Kafka for my next integration project? The question is very valid and comes up regularly. This blog post explores both open-source frameworks and explains the difference between application integration and event streaming. The comparison discusses when to use Kafka or Camel, when to combine them, and when not to use them at all. A decision tree shows how you can quickly qualify out one for the other.

Apache Camel vs Apache Kafka Comparison


The history of application integration and event streaming

Here is my personal history and experience in application integration and event streaming. It shows my background and how I see the integration and data streaming markets.

A discussion that started over a decade ago…

With my background of working over the last decade at Talend, TIBCO, and Confluent, the comparison between Camel and Kafka is very exciting, as I have spent a lot of time with both open-source frameworks:

Apache Camel powered Talend ESB. Talend had a visual coding tool to design Camel routes with code generation. Unfortunately, the tool’s primary focus was Talend Data Integration (ETL and batch). The Camel-powered ESB code was integrated, but it was neither perfect nor complete.

TIBCO BusinessWorks competed with Talend ESB while TIBCO StreamBase competed with other stream processing solutions. The Kafka ecosystem came up more and more in conversations with customers.

CamelOne Kai Waehner Conference Speaker Apache Camel Open Source

I posted about “When to use Apache Camel” in 2011 already. In 2012, I gave my first talk at an international software conference in the US. The name of the conference? CamelOne! A forum only about Apache Camel. What an exciting time. Claus Ibsen, THE Camel guy, wrote an excellent summary of CamelOne 2012 in Boston.

In my conference summary, I talked about my two talks. One of them covered a comparison between Apache Camel, Spring Integration, and Mulesoft ESB. The presentation has over 35,000 views, and the number still goes up today.


… from application integration to event streaming

Over time, the buzzword “big data” came up more and more. I spent some time at Talend and TIBCO learning new programming concepts such as MapReduce and shuffling, mainly powered by Apache Hadoop and Apache Spark. The big data ecosystem snowballed, with tens of frameworks such as Hive, HBase, Pig, and many more.

However, people soon realized that real-time data beats slow data in almost all use cases. The Lambda architecture was invented to separate real-time workloads from batch workloads. Event streaming was born. Apache Kafka became the de facto standard for data streaming. Like CamelOne a decade ago, Kafka Summit is the one-stop show for Kafka use cases, architectures, and success stories. Contrary to the small CamelOne, Kafka Summit is a global series with events across the globe, plus online events.

Data in motion with the Kappa architecture replacing Lambda

In 2014, a guy called Jay Kreps (few people knew him then) was already questioning the Lambda architecture. Instead, he proposed providing a single real-time layer that serves both real-time and batch consumers. The Kappa architecture was born. Today, the Kappa architecture is mainstream, replacing Lambda. Various vendors have adopted Kafka in the meantime.

Kappa Architecture with one Pipeline for Real Time and Batch

Confluent became the clear leader in the event streaming software category. Confluent Platform is powered by Apache Kafka. The focus is on event streaming. That’s different from most other vendors like Cloudera; they focus on 10-20 frameworks or products and try to combine and integrate them somehow. Today, Confluent Cloud is a complete game-changer, providing Apache Kafka and its ecosystem for application integration and stream processing as a serverless cloud offering.

This is where we are today in 2022. Application integration (= Camel) and event streaming (= Kafka) play a critical role in every modern enterprise architecture. Open source is widely adopted and usually preferred over proprietary solutions for various reasons, including avoiding vendor lock-in. That’s true for self-managed and serverless cloud offerings.

Hence, the question arises: Should I use Apache Camel for application integration or Apache Kafka for event streaming? Or both? Or does one solve the other, too? These questions will be answered in the following sections, concluding with a decision tree to help you make the right choice for your project.

Let’s look at the similarities between Camel and Kafka, when to use which framework, when and how to combine them, and when not to use them at all.

Features in Apache Camel AND Apache Kafka

Camel and Kafka have many positive and negative characteristics in common. Hence, it is no surprise that people compare the two frameworks:

  • Open source under Apache 2.0 license
  • Vibrant community and adoption in the industry
  • Mature framework with deployments in enterprises across the globe
  • Fixing point-to-point spaghetti architectures with a central integration backbone
  • Open architecture and extensibility with custom functions and connectors
  • Small and big deployments possible, plus single-node deployments for non-mission-critical use cases
  • Re-engineered and optimized for cloud-native deployments (container, Kubernetes, cloud)
  • Connectivity to any technology, API, communication paradigm, and SaaS
  • Transformation of any data types and formats
  • Processes transactional and analytical workloads
  • Domain-specific language (DSL) for message-at-a-time processing, with similar logic such as aggregation, filtering, and conditional processing
  • Relatively complex frameworks because of their robust feature sets, hence not suitable for solving minor problems
  • Not a replacement for a database, data warehouse, or data lake

Beyond the similarities, Kafka and Camel have very different sweet spots built to solve distinct problems. Hence, comparing these two tools is a bit like comparing apples and oranges. Some minor projects might use one or the other to solve a problem, but critical enterprise projects show the differences more quickly.

When to use Apache Camel?

The mission of Camel

Apache Camel is an integration framework. It solves a particular problem: Data integration between different applications, APIs, protocols, and communication paradigms. This concept is often called application integration or enterprise integration. Camel implements the famous Enterprise Integration Patterns (EIP). EIPs are based on messaging principles.
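To make the EIP concept concrete, here is a minimal sketch of a Camel route in the Java DSL implementing the classic content-based router pattern; the endpoint URIs and the XPath expression are hypothetical:

```java
import org.apache.camel.builder.RouteBuilder;

public class OrderRoutingRoute extends RouteBuilder {
    @Override
    public void configure() {
        // Content-based router EIP: consume XML orders from a directory
        // and route them by priority (endpoints and XPath are hypothetical).
        from("file:orders/inbox")
            .choice()
                .when(xpath("/order/@priority = 'high'"))
                    .to("jms:queue:high-priority-orders")
                .otherwise()
                    .to("jms:queue:standard-orders");
    }
}
```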

Camel’s strengths

  • Event-based backbone based on well-known and adopted EIP concepts
  • Connectivity to almost any API
  • Integration, processing, and routing of information with an intuitive domain-specific language (DSL) focused on integration; composability in a programming context gives finer-grained control in code for conditional logic or transformations/reformatting
  • Powerful routing capabilities with many built-in EIPs
  • Many deployment options (standalone, web container, application server, Spring, OSGi, Kubernetes via the Camel K sub-project) – okay, I guess some options are not relevant in this decade anymore 🙂
  • Lightweight alternative to proprietary ETL and ESB tools

Camel’s weaknesses

  • Only a “routing machine”, i.e., not built for long-term storage (an additional cache or storage is needed); for that reason, Camel is not the right choice for a central nervous system like Kafka
  • No stream processing (like you know it from Kafka Streams or Apache Flink)
  • Limited scalability, not built for massive volumes of data
  • No powerful visual coding like you know it from proprietary ETL/ESB/iPaaS tools
  • No serverless cloud offering, and hence no competition with other iPaaS offerings
  • Red Hat is the only vendor supporting it
  • Built to be deployed in a single data center or cloud region, not across hybrid or multi-cloud scenarios

The evolution of Apache Camel

Camel is widely adopted and has a strong community. Unfortunately, from a vendor and support perspective, the offerings have declined in the last few years. One of the most significant pain points: I still don’t see a serverless cloud offering anywhere today:

The Evolution of Apache Camel 2

Camel TL;DR

Camel is an application integration framework to connect different applications and interfaces. Camel is NOT built for continuously processing data in motion, i.e., stream processing. Hence, it should be compared to ETL and ESB tools, not to data streaming technologies like Kafka, Kinesis, or Flink. If you are looking for a serverless cloud offering, you are out of luck. If you are looking for vendor support, Red Hat is the only option.

When to use Apache Kafka?

The mission of Kafka

Real-time data beats slow data at any scale. The event streaming platform enables processing data in motion. Kafka is the de facto standard for event streaming, including messaging, data integration, stream processing, and storage. Kafka provides all these capabilities in one infrastructure at scale. It is reliable and can process both analytical and transactional workloads.
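As a minimal illustration of the messaging side, here is a sketch of a Java producer appending an event to Kafka; the broker address, topic, and payload are hypothetical:

```java
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class BookingEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // The event is appended to a distributed, replicated commit log and
            // retained, so any number of consumers can read it independently,
            // now or later (messaging + storage in one platform).
            producer.send(new ProducerRecord<>("booking-events", "booking-4711", "CONFIRMED"));
        }
    }
}
```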

Kafka’s strengths

  • Event-based streaming platform
  • A unique combination of pub/sub messaging, data processing, data integration, and storage in a single framework
  • Built for massive volumes of data and extreme scale from the beginning, so that a single framework can be used for transactional (low volume) and analytics (high volume) use cases
  • True decoupling between producers and consumers because of its storage component makes it the de facto standard for microservice architectures
  • Guaranteed ordering of events in the distributed commit log
  • Distributed data processing with fault-tolerance and recoverability built-in
  • Replayability of events
  • The de facto standard for event streaming
  • Built with hybrid and multi-cloud data replication in mind (with included tools like MirrorMaker and separate, more advanced, and more straightforward tools like Confluent Cluster Linking)
  • Support from many vendors, including Confluent, Cloudera, IBM, Red Hat, Amazon, Microsoft, and many more
  • Paradigm shift: Built to process data in motion end-to-end from source to one or more sinks

Kafka’s weaknesses

  • Paradigm shift: Enterprises need to learn and understand the added value of event streaming, a new software category that enables new use cases but also requires different design patterns and operations approaches
  • No powerful visual coding like you know it from proprietary ETL/ESB/iPaaS tools
  • Limited out-of-the-box routing capabilities (Kafka Connect SMTs or a Kafka Streams / ksqlDB app do the job very well, but not as simply as Camel)
  • Complex operations (if you run it by yourself instead of using 3rd-party tools or, even better, a serverless cloud offering)

The evolution of Apache Kafka

Kafka was built at LinkedIn to process high volumes of data, as no other open-source framework could do this. Kafka found quick adoption after LinkedIn open-sourced it. Several vendors adopted Kafka and added it to their product portfolio. Some vendors just added Kafka for the sake of having it. Others innovated and used additional tools to make Kafka cloud-native for the next generation of event streaming. Kafka as a serverless cloud offering is a critical piece of many modern enterprise architectures today:

The Evolution of Apache Kafka 2

Kafka TL;DR

Kafka is an event streaming platform to process data in motion continuously. If you “just” need an integration framework to route data from a source to one or more sinks (= ETL / ESB), then Camel can be used, too. However, Kafka kills two birds with one stone (= integrating data AND processing it in motion where needed).

Plenty of Kafka offerings are available on the market. Check out the Apache Kafka landscape and comparison to understand the differences between offerings from Confluent, Cloudera, IBM, Red Hat, Amazon, Microsoft, and others.

Decision tree – Camel or Kafka?

The above sections explored when to use Camel and Kafka. So far, so good. Nevertheless, both frameworks overlap in their capabilities. Let’s get some help choosing the right one in that case.

Qualify out – the easiest way to start an evaluation!

The easiest way to decide on a specific option is to qualify out other frameworks that cannot fulfill the requirements.

Therefore, do you need

  • Big data processing?
  • A storage component for true decoupling and replayability of events?
  • Stateless or stateful stream processing?
  • A serverless cloud offering?

The above section discussed these differentiators of Kafka. In all these cases, you can qualify out Camel. It does not fulfill these requirements. This list is not necessarily complete. And you might also find a few aspects that qualify out Kafka from the beginning. Hence, you could also start from the Camel perspective and ask yourself: When should I not use Kafka? But I think it is easier the other way round.

Qualifying out solutions because of their limitations makes the decision tree and evaluation process much easier from the beginning.

Decision Tree for Camel and Kafka

Here is my decision tree to find out if Camel or Kafka is the right choice and what vendors you could evaluate:

Decision Tree Apache Camel vs Apache Kafka Comparison

When to use Camel and Kafka together?

It is possible to use Camel and Kafka together in a single integration architecture. Should you do that? Two options exist. One makes more sense than the other:

Kafka for event streaming and Camel for ETL

Camel and Kafka integrate well with each other. The native Kafka component of Camel is the best native integration point as a bridge between both environments:

Apache Camel and Apache Kafka in the Enterprise Architecture

The above architecture shows how Camel and Kafka live next to each other. Camel is used in a business domain for application integration. Kafka is the central nervous system between the Camel integration application and many other applications. I also added Kong as an API gateway to clarify that neither Camel nor Kafka is a silver bullet for every problem.
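A sketch of such a bridge, using Camel’s Kafka component to forward events from a legacy endpoint into a Kafka topic (the endpoint URIs are hypothetical):

```java
import org.apache.camel.builder.RouteBuilder;

public class LegacyToKafkaBridge extends RouteBuilder {
    @Override
    public void configure() {
        // Bridge a legacy JMS queue into Kafka via Camel's Kafka component.
        // Queue name, topic name, and broker address are hypothetical.
        from("jms:queue:legacy-orders")
            .to("kafka:orders?brokers=localhost:9092");
    }
}
```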

Once again, the vast advantage of Kafka as the central integration layer is its unique combination of characteristics within a single infrastructure, including:

  • Real-time messaging at any scale
  • Storage for true decoupling between different applications and communication paradigms
  • Built-in backpressure handling and replayability of events
  • Data integration
  • Stream processing

Real-time data replication across hybrid and multi-cloud environments is not shown in the above picture but is also part of the enterprise architecture out-of-the-box, leveraging the Kafka protocol.

With true decoupling within a modern microservice architecture, each business team can decide whether they need application integration (using Camel) or event streaming (using Kafka). Often, both could be used. Additional questions around single vs. multiple frameworks and APIs, vendor support, scalability needs, and other characteristics need to be evaluated to make the right choice for your business problem.

Camel connectors embedded into Kafka Connect

There is another way to combine Kafka and Camel: the “Camel Kafka Connector” sub-project of Apache Camel. Don’t get confused. This feature is not the Kafka component (= connector) of Camel! Instead, it is a relatively new initiative to deploy Camel components into the Kafka Connect infrastructure.

The obvious benefit: This way, you get hundreds of new connectors “for free” within the Kafka ecosystem. This capability sounds excellent. And it is!

However, consider the total cost of ownership and the overall efforts using this approach. Application integration is one of the most challenging problems in computer science – especially if you talk about transactional data sets that require zero data loss, exactly-once semantics, and no downtime. The more components you combine in the end-to-end data flow, the harder it gets to keep your performance and reliability SLAs.

Hence, using Camel components within Kafka Connect has a considerable disadvantage: Combining two frameworks with complexities and different design concepts. Just a few examples:

  • Kafka world: Partitions, Offsets, Leader and Follower, Key/Value/Header, connectors (based on Kafka Connect), Bootstrap Server, ConsumerRecord, Retention Time, etc.
  • Camel world: Routes, RouteBuilder, CamelContext, Exchange, Processor, components (Camel connectors), Endpoints, Type Converters, Registry, etc.

Please think twice before mixing two integration tools that are powerful but complex on their own. Getting this running is just one piece of the puzzle (the simple part). Don’t forget end-to-end testing, resiliency, SLAs, and support across technologies and APIs. Even buying support for Camel and Kafka from Red Hat (i.e., a single vendor) does not improve this approach.

It is likely better to take the business logic and API calls out of the Camel component and copy them into a Kafka Connect connector template to run the integration natively with only Kafka code. This workaround allows a clean architecture, end-to-end integration with a single framework, a single vendor behind it, and much easier testing/debugging/monitoring.
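For illustration, here is a skeleton of such a Kafka Connect source task based on the real `SourceTask` API; the class name and the external system it would poll are hypothetical:

```java
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;
import java.util.List;
import java.util.Map;

// Skeleton of a Kafka Connect source task - the place where business logic
// and API calls ported from a Camel component would live. The external
// system it polls is hypothetical.
public class MyApiSourceTask extends SourceTask {
    @Override
    public String version() {
        return "0.1.0";
    }

    @Override
    public void start(Map<String, String> config) {
        // Initialize the client for the external API here.
    }

    @Override
    public List<SourceRecord> poll() throws InterruptedException {
        // Fetch data from the external system and map it to SourceRecords.
        // Kafka Connect handles offsets, retries, scaling, and delivery.
        return List.of();
    }

    @Override
    public void stop() {
        // Release resources (close clients, connections, etc.).
    }
}
```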

TL;DR: I recommend only using the “Camel Kafka Connector” sub-project if the following options do not work:

  • Use only Apache Camel for application integration
  • Leverage Apache Kafka for event streaming and application integration
  • Choose separate deployments of Camel and Kafka and use the Camel-Kafka-Bridge

When NOT to use Camel or Kafka at all?

Once again, the easiest way for your evaluation to start is qualifying out tools that do not work to solve the problem.

Both Camel and Kafka are NOT built for the following scenarios:

  • A proxy for millions of clients (like mobile apps) – but native proxies (like a REST or MQTT Proxy for Kafka) exist for some use cases.
  • An API Management platform – but these tools are usually complementary and used to create life cycle management or monetize APIs deployed with Camel or Kafka.
  • A database for complex queries and batch analytics workloads
  • An IoT platform with features such as device management – but direct native integration with (some) IoT protocols such as MQTT or OPC UA is possible and a valid approach for (some) use cases.
  • A technology for hard real-time applications such as safety-critical or deterministic systems – but that’s true for any other IT framework, too. Embedded systems are a different kind of software than Camel or Kafka!

I wrote a very detailed post about this topic from a Kafka perspective. It maps almost 1:1 to the Camel world, too (and to any related technology such as Flink, Spark, Pulsar, etc.): “When NOT to use Apache Kafka?”

Apache Camel vs. Apache Kafka – Who is the winner?

Simple answer: Both!

When you compare apples and oranges, you might be happy with either when you are hungry, as both are good to eat. The same is true for Camel and Kafka. Both can do application integration. But they serve very different needs.

Many integration scenarios can use Camel or Kafka.

Camel is the right tool if you need to integrate data within an application context or business unit (with no need for stream processing, true decoupling, replayability, large scale, or replication across data centers or cloud regions).

Kafka is the central event-based nervous system across business units, regions, and hybrid clouds. Kafka is all about event streaming. Application integration is just a piece of this puzzle. On the other hand, I have seen plenty of integration projects powered by Apache Kafka, often replacing other middleware. That’s true for ETL/ESB legacy modernization and in discussions about using a cloud-native iPaaS.

Do you use Camel or Kafka today? What use cases? How do you decide which one to choose? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

Is Apache Kafka an iPaaS or is Event Streaming its own Software Category?

Enterprise integration is more challenging than ever before. The IT evolution requires the integration of more and more technologies. Applications are deployed across edge, hybrid, and multi-cloud architectures. Traditional middleware such as MQ, ETL, and ESB does not scale well enough or only processes data in batch instead of real-time. This post explores why Apache Kafka is the new black for integration projects, how Kafka fits into the discussion around cloud-native iPaaS solutions, and why event streaming is a new software category. A concrete real-world example shows the difference between event streaming and traditional integration platforms or iPaaS.

Apache Kafka as iPaaS or Event Streaming as new Software Category

What is iPaaS (Enterprise Integration Platform as a Service)?

iPaaS (Enterprise Integration Platform as a Service) is a term coined by Gartner. Here is the official Gartner definition: “Integration Platform as a Service (iPaaS) is a suite of cloud services enabling development, execution, and governance of integration flows connecting any combination of on-premises and cloud-based processes, services, applications, and data within individual or across multiple organizations.” The acronym eiPaaS (Enterprise Integration Platform as a Service) is used in some reports as a replacement for iPaaS.

The Gartner Magic Quadrant for iPaaS shows various vendors:

Gartner Magic Quadrant for iPaaS 2021

Three points stand out for me:

  • Many very different vendors provide a broad spectrum of integration solutions.
  • Many vendors (have to) list various products to provide an iPaaS; this means different technologies, codebases, support teams, etc.
  • No Kafka offering (like Confluent, Cloudera, Amazon MSK) is in the magic quadrant.

The last bullet point makes me wonder whether Kafka-based solutions should be considered an iPaaS or not.

Is Apache Kafka an iPaaS?

I don’t know. It depends on the definition of the term “iPaaS”. Yes, Kafka solutions fit into the iPaaS category, but that is just one piece of the event streaming success story.

Kafka is an event streaming platform. Use cases differ from traditional middleware like MQ, ETL, ESB, or iPaaS. Check out real-world Kafka deployments across industries if you don’t know the use cases yet.

Kafka does not directly compete with ETL tools like Talend or Informatica, MQ frameworks like IBM MQ or RabbitMQ, API management platforms like MuleSoft, or cloud-based iPaaS like Boomi or TIBCO. At least not if people understand the differences between Kafka and traditional middleware. For that reason, many people (including me) think that event streaming deserves its own Magic Quadrant.

Having said this, all these very different vendors are in the iPaaS Magic Quadrant. So, should Kafka, respectively its vendors, be in there? I think so, because I have seen hundreds of customers leverage the Kafka ecosystem as a cloud-native, scalable, event-driven integration platform, often in hybrid and multi-cloud architectures. And that’s an iPaaS.

What’s different with Kafka as an Integration Platform?

If you are new to this discussion, check out the article “Apache Kafka vs. MQ, ETL, ESB” or the related slides and video. Here is my summary on why Kafka is unique for integration scenarios and therefore adopted everywhere:

Why Kafka as iPaaS instead of Traditional Middleware like MQ ETL ESB

A unique selling point for event streaming is the ability to leverage a single platform. In contrast, other iPaaS solutions require different products (including codebases, support teams, integration between the additional tech, etc.).

Kafka as Cloud-native and Serverless iPaaS

Fundamental differences exist between modern iPaaS solutions and traditional legacy middleware; this includes the software architecture, the scalability and operations of the platform, and the data processing capabilities. On a high level, a “Kafka iPaaS” requires the following characteristics:

  • Cloud-native Infrastructure: Elasticity is vital for success in the modern IT world. The ability to scale up and down (= expanding and shrinking Kafka clusters) is mandatory. This capability enables starting with a small pilot project and scaling up or handling load spikes (like Christmas business in retail).
  • Automated Operations: Truly serverless SaaS should always be the preferred option if the software runs in a public cloud. Orchestration tools like Kubernetes and related tools (like a Kafka operator for Kubernetes) are the next best option in a self-managed data center or at the edge outside a data center.
  • Complete Platform: An integration platform requires real-time messaging and storage for backpressure handling and long-running processes. Data integration and continuous data processing are mandatory, too. Hence, a “Kafka iPaaS” is only possible if you have access to various pre-built Kafka-native connectors to open standards, legacy systems, and modern SaaS interfaces. Otherwise, Kafka needs to be combined with another middleware like an iPaaS or an ETL tool like Apache NiFi.
  • Single Solution: It sounds trivial, but most other middleware solutions use several codebases and products under the hood. Just look at stacks from traditional players such as IBM and Oracle or open-source-driven Cloudera. The complex software stack makes it much harder to provide end-to-end integration, 24/7 operations, and mission-critical support. Don’t get me wrong: Kafka-native solutions like Confluent Cloud also include different products with additional pricing (like a fully-managed connector or data governance add-ons), but all run on a single Kafka-native platform.

From this perspective, some Kafka solutions are modern, cloud-native, scalable, iPaaS. Having said this, would I be happy if you consider some Kafka solutions as an iPaaS on your technology radar? No, not really!

Event Streaming is its own Software Category!

While some Kafka solutions can be used as an iPaaS, this is only one of many usage scenarios for event streaming. However, as explained above, Kafka-based solutions differ greatly from other iPaaS solutions in the Gartner Magic Quadrant. Hence, event streaming deserves its own software category.

If you still wonder what I mean, check out event streaming use cases across industries to understand the difference between Kafka and traditional iPaaS, MQ, ETL, ESB, and API tools. Here is a relatively old but still fantastic diagram that summarizes the broad spectrum of use cases for event streaming:

Use Cases for Apache Kafka and Event Streaming

TL;DR: Kafka provides capabilities for various use cases, not just integration scenarios. Many new innovative business applications are built with Kafka. It is not just an integration platform but a unique suite of capabilities for end-to-end data processing in real-time at scale.

New concepts like Data Mesh also prove this evolution. The basic principles are not unique: domain-driven design, microservices, true decoupling of services, but now with much more focus on data as a product. The latter means data is turning from a cost center into a profit center and enabling innovative new services. Event streaming is a critical component of a data mesh-powered enterprise architecture, as real-time data almost always beats slow data across use cases.

The Non-Existing Event Streaming Gartner Magic Quadrant or Forrester Wave

Unfortunately, a Gartner Magic Quadrant or Forrester Wave for Event Streaming does NOT exist today. While some event streaming solutions fit into some of these reports (like the Gartner iPaaS Magic Quadrant or the Forrester Wave for Streaming Analytics), it is still an apples-to-oranges comparison.

Event streaming is its own category. Many software vendors have built their entire business around this category. Confluent is the leader in this space – note that I am biased as a Confluent employee, but I guess there is no question about this statement 🙂 Many other companies emerge around Kafka, use the Kafka protocol in related ways, or offer competitive event streaming products such as Amazon Kinesis or Apache Pulsar.

The following Event Streaming Landscape 2021 summarizes the current status:

Event Streaming Landscape with Apache Kafka Pulsar Kinesis Event Hubs

I hope to see a Gartner Magic Quadrant for Event Streaming and a Forrester Wave for Event Streaming soon, too.

Open Source vs. Partially Managed vs. Fully-Managed Event Streaming

One more aspect to point out: You might have noticed that I said, “some event streaming solutions can be considered an iPaaS”. The word “some” is a crucial detail. Just providing an open-source framework is not enough.

iPaaS requires a complete offering, ideally as a fully managed service. Many vendors for event streaming use Kafka, Pulsar, or another framework but do NOT provide a complete offering with operations tooling, commercial 24/7 support, user interfaces, etc. The following resources should help you learn more about the event streaming landscape in 2021:

TL;DR: Evaluate the various offerings. A lot of capabilities are just marketing! Many “fully-managed services” are only partially managed instead of serverless, with very limited SLAs and support. Some other offerings provide plenty of cool features but are more alpha versions and overselling than mature, battle-tested solutions. A counterexample is the complexity in T-Mobile’s report about upgrading Amazon MSK. This shows the difference between “promoting and selling a fully managed service” and the “not at all fully-managed reality”. A truly fully-managed offering does NOT require the end user to upgrade the infrastructure.

Kafka as Event Streaming iPaaS at Deutsche Bahn (German Railway)

Let’s now look at a practical example to understand why a traditional iPaaS cannot help in use cases that require event streaming, and why this combination of capabilities in a single technology defines a new software category.

This section explores a real-world use case: the journey of Deutsche Bahn (German Railway) in providing a great customer experience to its customers. This example uses event streaming as an iPaaS (regarding my definition of these terms).

Use Case: Improve Customer Experience with Real-time Notifications

The use case sounds very simple: Improve the customer experience by providing real-time information and notifications to customers across various channels like websites, smartphones, and displays at train stations.

Delays and cancellations happen in a complex rail network like Germany’s. Frequent travelers accept this downside. Nobody can do anything about lousy weather, suicides on the tracks, or technical defects.

However, customers at least expect real-time information and notifications so that they can wait in a coffee shop or lounge instead of freezing at the station platform for minutes or even hours. The reality at Deutsche Bahn was a different sequence of notifications: 10min delay, 20min delay, 30min delay, train canceled – take the next train.

The goal of the project Reisendeninformation (= traveler information system) was to improve the day-to-day experience of millions of travelers across Germany by delivering up-to-date, accurate, and consistent travel information in any location.

Initial Project: A mess of different Integration Technologies

Again, the use case sounds simple. Let’s send real-time notifications to customers if a train is delayed or canceled. Every magic black iPaaS box can do this:

Deutsche Bahn Reisendeninformation Kafka

Is this true or just the marketing of all the integration vendors? Can each black box integrate all the different systems to correlate events in real-time?

Deutsche Bahn started with an open-source messaging platform to send real-time notifications. Unfortunately, the team quickly found out that not every piece of information was coming in in real-time. So, a caching and storage system was added to the project to handle the late-arriving information from some legacy batch or file-based systems. Now, the data from the legacy systems needed to be integrated. Hence, an integration framework was installed to connect to files, databases, and other applications. Now, the data needed to be processed, correlating real-time and non-real-time data from various systems. A stream processing engine can do this.

The pilot project included several different frameworks. A few skeptical questions came up:

  • How to scale this combination of very different technologies?
  • How to get end-to-end support across various platforms?
  • Is this cost-efficient?
  • What is the time-to-market for new features?

Deutsche Bahn re-evaluated their tech stack and found out that Apache Kafka provides all the required capabilities out-of-the-box within one platform.

The Migration to Cloud-native Kafka

The team at Deutsche Bahn re-architected their pilot project. Here is the new solution leveraging Kafka as the single point of truth between various systems, technologies, and communication paradigms:

A traditional iPaaS can also implement this scenario. But only with several codebases, technologies, and clusters, even if you select a single software vendor! Some iPaaS might even do well in the beginning but struggle to scale up. Only event streaming allows you to start small and scale up without re-architecting the infrastructure.

Today, the project is in production. Check your DB Navigator mobile app to get real-time updates about all trains in Germany. Living in Germany, I appreciate this new service, which provides a much better traveling experience.

Learn more about the project from the Deutsche Bahn team. They gave several public talks at different conferences and wrote on the Confluent Blog about their Kafka journey. However, the journey did not end here 🙂 As described in their blog post, Deutsche Bahn is now evaluating the migration from a self-managed Confluent Platform deployment in the public cloud to the fully-managed, truly serverless Confluent Cloud offering to reduce TCO and improve time-to-market.

Complete Platform: Kafka is more than just Messaging

A project like the one described above is only possible with a complete platform. Many people still think about Kafka as an ingestion layer into a data lake or data warehouse, as this was one of the first prominent Kafka use cases. Data ingestion is still an excellent use case today. Many projects already use more than just the core of Kafka to implement it. Kafka Connect provides out-of-the-box connectivity between Kafka and the data store. If you are in the public cloud, you even get this integration in a fully-managed, serverless manner, whether you need to integrate with a 1st-party cloud service like Amazon S3, Google Cloud BigQuery, or Azure Cosmos DB, or a 3rd-party SaaS like MongoDB Atlas, Snowflake, or Databricks.

Continuous Kafka-native stream processing is the next level of a complete platform. For instance, Deutsche Bahn leverages Kafka Streams a lot for its real-time data correlations at scale. Other companies use ksqlDB as a fully-managed function in Confluent Cloud. The enormous benefit is that you don’t need yet another platform or service for streaming analytics. A complete platform makes the architecture more cost-efficient, and end-to-end integration is easier from an SLA and support perspective.
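To illustrate what such a real-time data correlation can look like (a generic sketch with hypothetical topic names, not Deutsche Bahn’s actual code), a Kafka Streams stream-table join enriches each delay event with reference schedule data:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.*;

public class TravelerInformationTopology {
    public static StreamsBuilder build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Reference data as a continuously updated table (hypothetical topic)
        KTable<String, String> schedules = builder.table("train-schedules",
                Consumed.with(Serdes.String(), Serdes.String()));

        // Correlate real-time delay events with the schedule, keyed by train id
        builder.stream("delay-events", Consumed.with(Serdes.String(), Serdes.String()))
            .join(schedules,
                (delay, schedule) -> schedule + " | " + delay) // simplistic enrichment
            .to("traveler-notifications", Produced.with(Serdes.String(), Serdes.String()));

        return builder;
    }
}
```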

A complete platform requires many additional services “on top”, like visualization of data flows, data governance, role-based access control, audit logs, self-service capabilities, user interfaces, visual coding/code generation tools, etc. Visual coding is the point where traditional middleware and iPaaS tools are stronger today than event streaming offerings.

3rd Party integration via Open API and non-Kafka Tools

So far, you have learned why event streaming is its own software category and how Deutsche Bahn is an excellent example of it. However, event streaming is NOT the silver bullet for every problem! When exploring whether Kafka and MQ/ETL/ESB are friends, enemies, or frenemies, I already pointed this out. For instance, MQ or an ESB can complement event streaming in an integration project, depending on your project requirements.

Let’s go back to Deutsche Bahn. As mentioned, their real-time traveler information platform is live, with Kafka as the single point of truth. Recently, Deutsche Bahn announced a partnership with Google and a 3rd-party integration with Google Maps:

Deutsche Bahn Google Maps Open API Integration

Real-time Schedule Updates to 3rd Party Google Maps API

The integration provides real-time train schedule updates to Google Maps users:

Google Maps Deutsche Bahn Real-time Integration

The integration allows Deutsche Bahn to reach new people and expand the business. Users can buy train tickets with one click from the Google Maps page.

I don’t know what technology or product this 3rd-party integration uses. Either way, the heart of Deutsche Bahn’s real-time infrastructure enables innovative new business models and collaboration with partners.

Likely, this integration between Deutsche Bahn and Google Maps does not directly use the Kafka protocol (even though this is done sometimes; for instance, see Here Technologies’ Open API for their mapping service).

Event Streaming is complementary to other services. In this example, the project team might have used an API Management platform to provide internal APIs to external consumers, including access control, billing, and reporting. The article “Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?” explores the relationship between event streaming and API Management.

Event Streaming Everywhere – Cloud, Data Center, Edge

Real-time data beats slow data everywhere. This is a new software category because we don’t just send events into another database via a messaging system. Instead, we use and correlate data from different data sources in real-time. That’s the real added value and game changer in innovative projects.

Hence, event streaming must be possible in every location. While cloud-first is a viable strategy for many IT projects, edge and hybrid scenarios are and will continue to be very relevant.

Think about a project related to the Deutsche Bahn example above (but completely fictive): a hybrid architecture with real-time applications in the cloud and edge computing within the trains:

Hybrid Architecture with Kafka at the edge and in the cloud

I covered this in other articles, including “Edge Use Cases for Apache Kafka Across Industries“. TL;DR: Leverage the open architecture of event streaming for real-time data processing everywhere, including multi-cloud, data centers, and edge deployments (i.e., outside a data center). The enterprise architecture does not need various technologies and products to implement real-time data processing and integration with separate iPaaS, ETL tools, ESBs, MQ systems.

However, once again, it is crucial to understand how event streaming fits into the enterprise architecture. For instance, Kafka is often combined with IoT technologies such as MQTT for the last mile integration with IoT devices in these edge scenarios.

Slide Deck and Video for “Apache Kafka vs. Cloud-native iPaaS Middleware”

Here are the related slides for this topic:

And the video recording of this presentation:

Kafka is a cloud-native iPaaS, and much more!

Kafka is the new black for integration projects across industries because of its unique combination of capabilities. Some Kafka solutions are part of the iPaaS category, with trade-offs like any other integration platform.

However, event streaming is its own software category. Hence, iPaaS is just one usage of Kafka or other similar event streaming platforms. Real-time data beats slow data. For that reason, event streaming is the backbone for many projects to process data in motion (but also to integrate with other systems that store data at rest for reporting, model training, and other use cases).

How do you leverage event streaming and Kafka as an integration platform? What projects did you already work on or are in the planning? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

Kafka and XML Messages – Transformation, Connector, Middleware

XML messages and XML Schema are not very common in the Apache Kafka and event streaming world! Why? Many people call XML legacy. It is complex, verbose, and often associated with the ugly WS-* hell (SOAP, WSDL, etc.). On the other hand, every company older than five years uses XML. It is well understood, provides a good structure, and is human- and machine-readable.

This post does not want to start another flame war between XML and other technologies such as JSON (which also provides JSON Schema now), Avro, or Protobuf. Instead, I will walk you through the three main approaches to integrating Kafka and XML messages, as there is still a vast demand for implementing this integration today (often for integrating legacy applications and middleware).

XML and XML Schema

Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium’s XML 1.0 Specification of 1998 and several other related specifications – all of them free open standards – define XML.

The design goals of XML emphasize simplicity, generality, and usability across the Internet. It is a textual data format with strong support via Unicode for different human languages. Although the design of XML focuses on documents, the language is widely used for the representation of arbitrary data structures such as those used in web services. Several schema systems exist to aid in defining XML-based languages, while programmers have developed many application programming interfaces (APIs) to assist the processing of XML data.

SOAP / WSDL Web Services – The WS-* Hell

Web Services use XML messages that follow the SOAP standard and have been popular with traditional enterprises for many years. In such systems, there is often a machine-readable description of the operations offered by the service written in the Web Services Description Language (WSDL). Web Services are one of the predominant use cases for XML integration. Some people call this the “WS-* Hell”:

XML Web Service Hell - WS-*

Kafka for any Data Format (JSON, XML, Avro, Protobuf, …)

Kafka can store and process anything, including XML. The Kafka brokers are dumb. They don’t care about data formats. The implementation of Kafka under the hood stores and processes only byte arrays. This approach follows the design principle of dumb pipes and smart endpoints (coined by Martin Fowler for microservice architectures). Dumb brokers are one of the architectural reasons why Kafka scales and performs so well.

As Kafka supports any data format, XML is no problem at all. It accepts any serializable input format. XML is just text, so plain string serializers can be used. However, if you want additional validation before pushing messages into Kafka (like checking that the content is actually XML or doing schema validation using XML Schema), then you need to write your own XML serializer/deserializer implementation.
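Here is a minimal sketch of what such a validating serializer could look like in Java; it only checks well-formedness, and a real implementation might validate against an XSD instead:

```java
import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Serializer;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

// Serializer that rejects payloads that are not well-formed XML before they
// reach the broker. A full implementation could validate against an XML
// Schema (XSD) instead of only checking well-formedness.
public class ValidatingXmlSerializer implements Serializer<String> {
    @Override
    public byte[] serialize(String topic, String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new SerializationException("Payload is not well-formed XML", e);
        }
        return xml.getBytes(StandardCharsets.UTF_8);
    }
}
```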

XML mappings can be very complex, including referencing other documents and using plenty of ugly open standards. XML includes generic standards such as WS-Security and industry-specific standards such as XBRL for regulatory processing or HL7 for healthcare. It is a mess. Period. Even mature tools struggle as soon as any part of such a standard is adjusted to one’s own needs (even though the standards support customization).

Kafka works with any programming language, and Confluent also provides a REST Proxy for HTTP(S) communication with Kafka. But these clients require the developer to implement all the complex XML mapping and processing. Hence, let’s now talk about the most common approaches to implementing the integration between any XML-based application and Apache Kafka: 3rd-party middleware (ETL / ESB tools) and Kafka-native Kafka Connect.

Kafka and XML via Middleware (ETL, ESB)

Apache Kafka and traditional middleware (ETL, ESB) are frenemies (friends and enemies). Check out this blog and video recording/slide deck for a more in-depth discussion and comparison. If these legacy integration tools such as TIBCO BusinessWorks or Software AG webMethods do one thing well, it is the graphical mapping of complex XML structures, including good and mature (but not 100%) support for the ugly WS-* web service standards.

XML Kafka Integration with 3rd Party Middleware - ESB ETL Tools

Pros of 3rd Party Middleware for XML-Kafka Integration
  • Visual coding for a more straightforward mapping experience (especially crucial for very complex structures) – for all coders: Trust me, this is really easier and more time-efficient than writing, testing, and debugging source code!
  • Mature (10-20 years old technologies don’t have many bugs anymore – if they are still alive)
  • Support of complex XML Schema structures (though often with issues around import, UI, and export)
  • (Often) already in place (implemented, tested, and deployed)
  • Kafka integration exists for any middleware which is still alive (i.e., maintained and supported by the vendor)
Cons of 3rd Party Middleware for XML-Kafka Integration
  • Products are legacy – as old as the source systems
  • Monolithic, inflexible architecture
  • Separate infrastructure to operate, test, maintain, and pay for
  • End-to-end integration is more challenging (from a technical and support perspective) as there are two systems in the middle instead of just one
  • Licensing
  • Point-to-point and tight coupling, and not event-based streaming with real decoupling
  • (Often) proprietary solution

Traditional middleware (such as TIBCO, IBM, Software AG, or Mulesoft) complements Kafka deployments. If you have the middleware already running and licensed (and do not plan to migrate away from it), then this is a viable approach to integrate between XML messages from legacy systems and Kafka.

Kafka and XML via Kafka Connect

The open Kafka ecosystem provides Kafka-native support for XML integration leveraging Kafka Connect. Kafka Connect is a Kafka-native tool for scalably and reliably streaming data between Apache Kafka and other data systems. It makes it simple to quickly define connectors that move large data sets into and out of Kafka. Think about it as ESB-on-Kafka.

Use cases include:

  • Messaging integration (MQ)
  • Mainframe offloading
  • File outputs from batch processes
  • Web services (SOAP / WSDL / WS-*)
  • Legacy applications
Pros
  • Kafka-native (leveraging Kafka under the hood for scalability, throughput, high availability, exactly-once semantics, low latency, etc.)
  • Decoupled design (Domain-driven Microservice approach instead of tight coupling)
  • Open ecosystem and flexible integration with any data sources and sinks – check out Confluent Hub for hundreds of open source and commercial connectors
  • Cloud-native to be deployed in any edge / on-premise or cloud infrastructure such as Kubernetes
Cons
  • Limited support for complex XML Schemas and standards – not all ugly documents will work well
  • No visual coding – unfortunately, no Kafka-native visual coding tools exist in 2020. Let’s go, Confluent! 🙂

Let’s take a look at two Kafka Connect approaches in more detail: A dedicated XML Connector and an SMT (Single Message Transformation) embedded into any Kafka Connect source or sink connector.

Kafka Connect Connector for XML Files

An XML connector directly accesses the XML file to parse and transform the content:

Kafka XML Integration with Kafka Connect XML Connector

Connect FilePulse is an open-source Kafka Connect connector built by streamthoughts to parse, transform, and stream any XML file. Other file formats are also supported. But since many other tools already support modern data formats such as JSON, CSV, Avro, or Protobuf, the real highlight of this connector is its XML support.

Features:

  • Support for recursive scanning of local directories
  • Reading and writing files into Kafka line by line
  • Support for multiple input file formats (e.g., CSV, JSON, Avro, XML)
  • Parsing and transforming data using built-in or custom processing filters
  • Error handler definition
  • Monitoring files while they are written into Kafka
  • Support for pluggable strategies to clean up completed files

Here is an excellent tutorial for using this XML connector for Kafka Connect: Streaming data into Kafka – Loading an XML file.

The Connect FilePulse Kafka Connector is the right choice for direct integration between XML files and Kafka.
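As a rough sketch, a source connector configuration could look like the following. The property and class names follow the FilePulse 1.x line of the connector and have changed in later releases, so treat them as assumptions to verify against the version you deploy:

    name=xml-filepulse-source
    connector.class=io.streamthoughts.kafka.connect.filepulse.source.FilePulseSourceConnector
    tasks.max=1
    # Directory scanned for new XML files (placeholder path)
    fs.scan.directory.path=/data/xml-input
    # Reader that parses each file as an XML document
    task.reader.class=io.streamthoughts.kafka.connect.filepulse.reader.XMLFileInputReader
    topic=xml-events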

SMT for Embedding XML Transformations into ANY Kafka Connect Connector

An SMT (Single Message Transformation) is part of the Kafka Connect framework. SMTs are applied to messages as they flow through Kafka Connect. They transform inbound messages after a source connector has produced them, but before they are written to Kafka. SMTs transform outbound messages before they are sent to a sink connector.

An SMT can be embedded into any Kafka Connect source or sink connector. Hence, the XML SMT for Kafka Connect allows direct integration with any interface and mapping of XML messages without needing to store the file or use a specific XML connector.

Kafka XML Integration with SMT and ANY Source Sink Connector

SMTs even allow you to add or change metadata, e.g., by adding a new header in addition to the key and value of the message.

Here is an example: Receive XML messages from JMS-based messaging platforms and convert the XML payload to JSON, Avro, or Protobuf for further processing and integration into the rest of the (modern) enterprise architecture. For instance, Confluent provides a generic JMS connector but also dedicated connectors for various legacy MQ products such as IBM MQ (often running on the mainframe), TIBCO EMS, and ActiveMQ. The XML SMT allows on-the-fly transformation of the incoming XML messages. I have seen integration and later replacement of these MQ tools across the globe in all kinds of industries.
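Below is a hedged sketch of this pattern, combining Confluent's IBM MQ source connector with the open-source kafka-connect-transform-xml SMT. The connector and SMT class names, queue, schema path, and topic are illustrative assumptions – verify them against the documentation of the versions you use:

    name=ibmmq-xml-to-kafka
    connector.class=io.confluent.connect.ibm.mq.IbmMQSourceConnector
    mq.hostname=mq.internal.example.com
    jms.destination.name=CUSTOMER.UPDATES
    kafka.topic=customer-updates
    # Parse the XML payload into a structured Connect record before it is written to Kafka
    transforms=fromXml
    transforms.fromXml.type=com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value
    transforms.fromXml.schema.path=file:///schemas/customer.xsd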

Dead Letter Queue (DLQ) for Handling Bad XML Messages

Just transforming messages is often not sufficient. A Dead Letter Queue (DLQ), aka Dead Letter Channel, is an Enterprise Integration Pattern (EIP) to handle bad messages. This design pattern complements XML integration. For instance, a DLQ can store badly processed XML that didn't fit the XSD in the transform. Here is an example of how to implement the rerouting to a DLQ using the above SMT.
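For sink connectors, Kafka Connect ships this capability out of the box via its error-handling options. A minimal sketch (the topic name is a placeholder) that tolerates bad records and reroutes them to a DLQ topic, including failure context in the record headers:

    errors.tolerance=all
    errors.deadletterqueue.topic.name=dlq-bad-xml-messages
    errors.deadletterqueue.topic.replication.factor=3
    # Add headers describing why each record failed (connector, stage, exception)
    errors.deadletterqueue.context.headers.enable=true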

Confluent Schema Registry for Data Governance

The Confluent Schema Registry is a complementary (optional) tool. It provides a smart implementation of data format and content validation (including enforcement, versioning, and other features). I see it used in ~70% of Kafka projects across the globe. As soon as you do more than just data ingestion into a data lake like HDFS or S3, the added value is enormous. Today, Confluent Schema Registry supports JSON Schema, Avro, and Protobuf.

The Schema Registry provides an open interface and is pluggable. For example, some users have asked for Schema Registry to support XML. Now, you can add XML support to Schema Registry directly, and use the Schema Registry to store both XML and Avro at the same time. For more on how to add your schema formats, please refer to the documentation. The workaround is to do the discussed XML-to-another-format transformation first and start your event streaming data governance from that point.
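For example, once XML has been converted to Avro (e.g., via the SMT discussed above), enforcing the contract is mostly a matter of producer configuration. A minimal sketch with placeholder addresses:

    import java.util.Properties;

    import org.apache.avro.generic.GenericRecord;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;

    public class AvroProducerConfigExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            // Confluent's Avro serializer registers the schema and validates every record on produce
            props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
            props.put("schema.registry.url", "http://localhost:8081");
            try (Producer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
                // Build GenericRecord instances from the transformed XML and send them here.
            }
        }
    }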

Summary

XML is predominant in most enterprises and mostly used for legacy applications, batch processing, and SOAP / WSDL web services. A digital transformation can only be successful if the old world is connected well to the new world (as you might have learned in my example about how to integrate between Kafka and Mainframes).

This post explored the three most common options for integration between Kafka and XML:

  • XML integration with a 3rd party middleware
  • Kafka Connect connector for integration with XML files
  • Kafka Connect SMT to embed the transformation into ANY other Kafka Connect source or sink connector to transform XML messages on the fly

What are your experiences with XML integration for Kafka? Which implementation did you choose? What challenges did you face, and how did you or do you plan to solve this? What is your strategy? Let’s connect on LinkedIn and discuss it!

The post Kafka and XML Messages – Transformation, Connector, Middleware appeared first on Kai Waehner.

Kafka SAP Integration – APIs, Tools, Connector, ERP et al https://www.kai-waehner.de/blog/2020/08/25/kafka-sap-integration-alternatives-connectors-erp-r3-ecc-s4-hana-soap-rest-http-web-service-api-sdk-java/ Tue, 25 Aug 2020 13:58:42 +0000 https://www.kai-waehner.de/?p=2623 A question I get every week from customers across the globe: How can I integrate my SAP system…

The post Kafka SAP Integration – APIs, Tools, Connector, ERP et al appeared first on Kai Waehner.

A question I get every week from customers across the globe: How can I integrate my SAP system with Apache Kafka? This post explores various alternatives, including connectors, 3rd party tools, custom glue code, and trade-offs between the different options.

After exploring what SAP is, I will discuss several integration options between Apache Kafka and SAP systems:

  • Traditional middleware (ETL/ESB)
  • Web services (SOAP/REST)
  • 3rd party turnkey solutions
  • Kafka-native connectivity with Kafka Connect
  • Custom glue code using SAP SDKs

Disclaimer before you read on:

I am not an SAP expert. It is tough to stay up-to-date with the vast and complex ecosystem of SAP products, (re-)brands, versions, services, SDKs, and APIs. I am sorry if some of the information below is inaccurate or outdated. Always double-check on the SAP website (if the links from Google still work – I had issues with some pages being “no longer available” while researching this blog post). If you see any inaccurate or missing information, please let me know, and I will update the blog post.

What is SAP?

SAP is a German multinational software corporation that makes enterprise software to manage business operations and customer relations. In 2019, SAP had revenue of €27.553 billion, a net income of €3.387 billion, and ~100,000 employees.

It is quite interesting: Nobody asks how to integrate with IBM or Oracle. Instead, people more specifically ask how to integrate with IBM MQ, IBM DB2, IBM Mainframe (still very ambiguous), or any other of the 100s of IBM products.

For SAP, people ask: How can I integrate with SAP? Let’s clarify what SAP is before exploring integration options.

The company is primarily known for its ERP software. But if you check out the official “What is SAP?” page, you find out that SAP offers solutions across a wide range of areas:

  • ERP and Finance
  • CRM and Customer Experience
  • Network and Spend Management
  • Digital Supply Chain
  • HR and People Engagement
  • Experience Management
  • Business Technology Platform
  • Digital Transformation
  • Small and Midsize Enterprises
  • Industry Solutions

SAP’s Software Portfolio

SAP’s stack includes homegrown products like SAP ERP and acquisitions with their own codebase, including Ariba for supplier network, hybris for e-commerce solutions, Concur for travel & expense management, and Qualtrics for experience management.

Even if you talk about SAP ERP, the situation is still not that easy. Most companies still run SAP ERP Central Component (ECC, formerly called SAP R/3), SAP’s sophisticated (and aged) ERP product. ECC runs on a third-party relational database from Oracle, IBM, or Microsoft, while HANA is SAP’s in-memory database. The new ERP product is SAP S4/Hana (no, this is not just the famous in-memory database). Oh, and there is SAP S4/Hana Cloud. And before you wonder: No, this is not the same feature set as the on-premise version!

Various interfaces exist depending on your product. An interface can be an (awful) proprietary technology like BAPI or iDoc, an (okayish) standards-based web service API using SOAP or REST / HTTP, (non-scalable) JDBC database connectivity, or, if you are lucky, even a (scalable and real-time) event / messaging API. The article “The ERP is Dead. Long live the Distributed Planning System” from the SAP blog describes the situation very well.

And sorry, we are still not done yet. Even if you talk about ERP systems, this can mean anything from a zoo of products or components, depending on who you are talking to:

SAP ERP System - Zoo of Products including MES CRM PLM WMS LMS

So, before you want to discuss the integration of your SAP product with Kafka, please please please find out the product, version, and deployment infrastructure of your SAP components.

Different Integration Options between Kafka and SAP

After this introduction, you hopefully understand that there is no silver bullet for SAP integration. The following will explore different integration options between Kafka and SAP and their trade-offs. The main focus is on SAP ERP (old ECC and new S4/Hana), but the overview is more generic, including integration capabilities with other components and products.

SAP Integration with Apache Kafka - R3 ERP S4 Hana Ariba Concur BAPI iDoc REST SOAP Web Services Java

Also, keep in mind that you typically need or want to integrate with a function or service. Direct integration with the data objects does not make much sense in most cases, as you would have to re-implement the mapping and denormalization between the data objects. This is especially true for source integration, i.e., building pipelines from SAP to Kafka. In the case of SAP ERP, you typically integrate with RFC/BAPI/iDoc or any other web service interface for this reason.

Traditional Middleware (ETL / ESB) for SAP Integration

Integration tools exist just for the sake of integrating different sources and sinks:

  • Extract-Transform-Load (ETL) for batch integration, like Informatica, Talend or SAP NetWeaver Process Integration
  • Enterprise Service Bus (ESB) for integration via web services and messaging, like TIBCO BusinessWorks or Software AG webMethods
  • Integration Platform as a Service (iPaaS) for cloud-native integration, similar to ETL/ESB tools, but provided as a fully managed service, such as Boomi, Mulesoft, or SAP Cloud Integration (and some cloud-washed products from legacy middleware vendors).

Most traditional middleware products were built to integrate with complex, proprietary systems from the last 20+ years, such as IBM Mainframe, EDIFACT, and, guess what, ERP systems like SAP ECC. In the meantime, all of them also have a Kafka connector. There are plenty of good reasons why many companies chose Kafka as a modern integration platform instead of legacy traditional middleware.

Most traditional ETL and ESB tools provide SAP connectivity. SAP Cloud Platform Integration (SAP CPI) is SAP’s own “modern” middleware solution. CPI includes a Kafka adapter to send and receive Kafka messages.

Pros:
  • In place: Typically already in place, no new project is required.
  • Maturity: Built over the years (because of the complexity), running in production for a long time already
  • Tooling: Visual coding for the integration (required because of the complexity), directly map iDoc / BAPI / Hana / SOAP schemas to other data structures
  • Integration: Not just connectors to the legacy systems but also Kafka for producing and consuming messages (due to market pressure)
Cons:
  • Legacy: Products are as old as the source and sink systems
  • Scalability: Monolithic, inflexible architecture
  • Tight coupling: Integration has to be developed and runs on the middleware; no real decoupling and domain-driven design (DDD) like with Kafka
  • Licensing: High cost per server, often already planned to be replaced (e.g., you can replace 100+ IBM MQ or TIBCO EMS servers with a single Kafka cluster)
  • Point-to-point: No streaming architecture, most integrations are based on web services (even if the core under the hood is based on a messaging system)
TL;DR:

Traditional integration tools are mature and have great tooling, but limited scalability/flexibility and high licensing cost. Often a quick win as it is already running, and you just need to add the Kafka connector.

Custom Glue Code for Kafka Integration using SAP SDKs

Writing your custom integration between SAP systems and Kafka is a viable option. This glue code typically leverages the same SDKs as 3rd party tools use under the hood:

  • Legacy: SAP NetWeaver RFC SDK – a C/C++ interface for connecting to SAP systems from release R/3 4.6C up to today’s SAP S/4HANA systems.
  • Legacy: SAP Java Connector (SAP JCo) – the famous JCO.jar library – is a Java SDK for integration with SAP ECC / ERP (this is just a wrapper around the C/C++ SDK).
  • Legacy: SAP ACO is an integrated ABAP component that is designed to consume RFC Services on remote ABAP systems.
  • Legacy: SAP ABAP TCP Push Channel if you are forced to use custom ABAP code and need or want to use TCP instead of the Confluent REST Proxy for HTTP communication.
  • Legacy: JMS Adapter to integrate via the standard messaging protocol. Great option (if you get it running and working for your use case and functions). For instance, JMS integration can be done via SAP PI.
  • Modern: SAP Cloud SDK allows developing applications with Java or JavaScript that communicate with SAP solutions and services such as SAP S/4 Hana Cloud, SAP SuccessFactors, and others (the term ‘Cloud’ actually means ‘Cloud-native’ in this case, i.e., this SDK also works with SAP’s on-premise products).
  • Modern: SAP Cloud Platform Enterprise Messaging: S4/Hana provides an asynchronous messaging interface (running on Solace on CloudFoundry under the hood). Different messaging standards are supported, including AMQP 1.0 and JMS (depending on the specific product you look at). Some examples demonstrate how to connect via the Java Client using the JMS API.
  • Modern: SAP ODP (Operational Data Provisioning): Technical infrastructure for operational analytics, and data extraction + replication. Some kind of CDC (Change Data Capture) with out-of-the-box support for various SAP products, including SAP BW, SAP BW/4HANA, SAP Data Services, and SAP HANA Smart Data Integration. ODP is not just for SAP interfaces but also integrates with 3rd party technologies (via a custom connector, not out-of-the-box) such as HDFS or Kafka.
Pros:
  • Flexibility: Custom coding allows you to implement precisely what you need.
Cons:
  • Maintenance: No vendor support – develop, maintain, operate, support by yourself.
  • Point-to-point: No streaming architecture, most integrations are based on web services (even if the core under the hood is based on a messaging system).
TL;DR:

“Build vs. Buy” always has trade-offs. I have only seen custom glue code for SAP integration in the field if no solution from a vendor was available and affordable. SAP Cloud Platform Enterprise Messaging is a possible integration pattern for Kafka, but it also adds yet another messaging layer to the architecture.
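To make the “build” option more concrete, here is a minimal sketch of glue code using SAP JCo and the Kafka producer API. The JCo destination name, BAPI name, and parameter names are hypothetical, and a real implementation needs proper error handling, delta detection, and transactional semantics:

    import com.sap.conn.jco.JCoDestination;
    import com.sap.conn.jco.JCoDestinationManager;
    import com.sap.conn.jco.JCoFunction;

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    // Calls an RFC-enabled function module via JCo and forwards the result to a Kafka topic.
    public class SapToKafkaBridge {

        public static void main(String[] args) throws Exception {
            // "MY_SAP_SYSTEM" refers to a JCo destination configured outside of this code
            JCoDestination dest = JCoDestinationManager.getDestination("MY_SAP_SYSTEM");
            JCoFunction function = dest.getRepository().getFunction("BAPI_CUSTOMER_GETDETAIL"); // hypothetical BAPI
            function.getImportParameterList().setValue("CUSTOMERNO", "0000001234"); // hypothetical parameter
            function.execute(dest);
            String payload = function.getExportParameterList().getString("NAME"); // hypothetical export field

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(kafkaProps())) {
                producer.send(new ProducerRecord<>("sap-customer-details", "0000001234", payload));
            }
        }

        private static Properties kafkaProps() {
            Properties p = new Properties();
            p.put("bootstrap.servers", "localhost:9092");
            p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            return p;
        }
    }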

 

SOAP / REST Web Services for SAP Integration

The last 15 years brought us web services for building a Service-oriented Architecture (SOA) to integrate applications. A web service typically uses SOAP or REST / HTTP as technology. I will not start yet another FUD war here. Both have their use cases and trade-offs.

Pros:
  • Standards-based: Different SDKs, products, and services talk the same language (at least in theory; true for HTTP, not so true for SOAP); most middleware tools have proper support for building HTTP services.
  • Simplicity (HTTP): Well-understood, supported by most programming languages and APIs, established for many use cases – middleware is just yet another one.
Cons:
  • Point-to-point: No streaming architecture, most integrations are based on web services (even if the core under the hood is based on a messaging system).
  • Tight coupling: Integration has to be developed and runs on the middleware, no real decoupling, and domain-driven design DDD like in Kafka.
  • Complexity (SOAP): SOAP/WSDL is just the tip of the iceberg! Check out the list of WS-* standards to understand why this is often called the “WS star hell”. The AXIS framework (Apache extensible Interaction System) is one example of SAP’s SOAP integration using an open framework. While the Apache project was last updated in 2006, SAP still recommends using this interface in 2020.
  • Missing features (REST / HTTP): Representational state transfer (REST) is a concept, but most people mean synchronous HTTP communication. Most middleware tools (and most other applications) implement only a small fraction of the full standard. HTTP is an excellent standard, but all the tooling and features need to be built on top of it.
  • Only indirect support: Several SAP products do not provide open interfaces. While using SOAP or HTTP under the hood, you are forced to use the licensed tooling to create web services. For instance, SAP Business Connector (restricted license version of webMethods Integration Server), SAP NetWeaver Process Integration (PI), SAP Process Orchestration (PO), Cloud Platform Integration (CPI), or SAP Cloud Integration.
TL;DR:

SOAP and REST web services work well for point-to-point communication and have good tool support. Both have their trade-offs, make sure to choose the right one – if your SAP product provides both interfaces. Unfortunately, you will often not have a choice. Even worse: You cannot use any tool but are forced to use the right licensed SAP tool or wrapper interface. Large scale, high volume, and continuous processing of data are not ideal requirements for these (legacy) integration products.

For direct HTTP(S) communication with Kafka, Confluent REST Proxy is an excellent option for producing, consuming, and administrating from any Kafka client (including custom SAP applications). For instance, SAP Cloud Platform Integration (CPI) can use this integration pattern to integrate between SAP and Kafka.
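As a small sketch of what such a call looks like from any HTTP-capable application (host, port, and topic are placeholders; the media type is REST Proxy's v2 JSON format):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    // Produces one JSON record to a Kafka topic through Confluent REST Proxy's v2 API.
    public class RestProxyProduceExample {
        public static void main(String[] args) throws Exception {
            String body = "{\"records\":[{\"key\":\"0000001234\",\"value\":{\"customer\":\"ACME\"}}]}";
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://rest-proxy:8082/topics/sap-customer-details"))
                    .header("Content-Type", "application/vnd.kafka.json.v2+json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }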

SAP-specific 3rd Party Tools for Kafka

SAP integration is a huge market globally. SAP provides several tools for data integration (some legacy, some modern – honestly, I don’t have a full overview of their complex product and API portfolio). Additionally, plenty of other software vendors have built specific integration software for SAP systems.

A few examples I have seen in the field recently:

These are just a few examples. Many more exist for on-premise, cloud, and hybrid integration with different SAP products and interfaces.

Some of these tools are natively integrated into SAP’s integration tools instead of being completely independent runtimes. This can be good or bad. An advantage of this approach is that you can leverage the SAP-native features for complex iDoc / BAPI mappings and the integrated 3rd party connector for Kafka communication.

Pros:
  • Turnkey solution: Built for SAP integration, often combined with other additional helpful features beyond just doing the connectivity, more lightweight than traditional generic middleware.
  • Focus: Many 3rd party solutions focus on a few specific use cases and/or products and technologies. It is much harder to integrate with “SAP in general” than focusing on a particular niche, e.g., Human Resources processes and related HTTP interfaces.
  • Maturity: Built over the years
  • Tooling: Visual coding for the integration (required because of the complexity), directly map iDoc / BAPI / Hana / SOAP schemas to other data structures
  • Integration: Not just connectors to the legacy systems but also modern technologies such as Kafka
Cons:
  • Scalability: Often monolithic, inflexible architecture (but focusing on SAP integration only, therefore often “okayish”)
  • Tight coupling: Integration has to be developed and runs on the tool; however, it is separated from other middleware, which still allows decoupling and domain-driven design (DDD) in conjunction with Kafka
  • Licensing: Moderate cost per server (typically cheaper than the traditional generic middleware)
  • Point-to-point: No streaming architecture, most integrations are based on web services (even if the core under the hood is based on a messaging system)
TL;DR:

A turnkey solution is an excellent choice in many scenarios. I see this pattern of combining Kafka with a dedicated 3rd party solution for SAP integration very often. I like it as the architecture is still decoupled, but no vast effort is required for the (complex) SAP integration. And there is still hope that SAP itself releases a nice Kafka-native integration platform. 🙂

Kafka-native SAP Integration with Kafka Connect

Kafka Connect, an open-source component of Apache Kafka, is a framework for connecting Kafka with external systems such as databases, key-value stores, search indexes, and file systems.

Kafka Connect connectors are available for SAP ERP databases; at the time of writing, this means generic JDBC connectivity to the underlying database.

Pros:
  • Kafka-native: Kafka under the hood, providing real-time processing for high volumes of data with high scalability and reliability.
  • Simplicity: Just one infrastructure for messaging and data integration, much easier to develop, test, operate, scale, and license than using different frameworks or products (e.g., Kafka for messaging plus an ESB for data integration).
  • Real decoupling: Kafka’s architecture uses smart endpoints and dumb pipes by design, one of the key design principles of microservices. Not just for the applications, but also for the integration components. Leverage all the benefits of a domain-driven architecture for your Kafka-native middleware.
  • Custom connectors: Kafka Connect provides an open template. If no connector is available, you (or your favorite system integrator or Kafka-vendor) can build an SAP-specific connector once, and you can roll it out everywhere.
Cons:
  • Only database connectors: No connectors beyond the native JDBC database integration are available at the time of writing this.
  • Anti-pattern of direct database access: In most cases, you want or need to integrate with a function or service, not with the data objects. In most cases, you don't even get direct database access from the database admin anyway.
  • Efforts: Build your own SAP-native (i.e., non-JDBC) connector or ask (and pay) your favorite SI or Kafka vendor.

UPDATE January 2021: A Kafka-native integration is available with INIT’s ODP connector (as discussed in section “3rd party tools”). It eliminates the above cons and might be a great 3rd party option for some use cases.

TL;DR:

Kafka Connect is a great framework and used in most Kafka architectures for various good reasons. For SAP integration, the situation is different because no connectors are available (beyond direct database access). It took 3rd party vendors many years to implement RFC/BAPI/iDoc integration with their tools. Such an implementation will probably not happen again for Kafka because it is very complex, and these proprietary legacy interfaces are dying anyway. The situation is different for modern SAP interfaces: Some 3rd party providers leverage Kafka Connect for their product. For instance, INIT Software’s Kafka Connect ODP connector.

A Kafka Connect connector for SAP Cloud Platform Enterprise Messaging using its Java client would be a feasible and probably the best option. I assume we will see such a connector on the market sooner or later.
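Until then, direct database access via the generic JDBC source connector remains the main Kafka Connect option. A minimal sketch against an SAP HANA instance (connection details, table, and column names are placeholders; the SAP HANA JDBC driver must be on the Connect worker's classpath; and remember the direct-database-access anti-pattern caveat above):

    name=sap-hana-jdbc-source
    connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
    connection.url=jdbc:sap://hana.internal.example.com:39015/
    connection.user=KAFKA_READER
    connection.password=********
    table.whitelist=MYSCHEMA.CUSTOMERS
    # Capture new and changed rows based on a timestamp plus an incrementing ID column
    mode=timestamp+incrementing
    timestamp.column.name=CHANGED_AT
    incrementing.column.name=ID
    topic.prefix=sap-hana-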

Embedded Kafka in SAP Products

We have seen various integration options between SAP and Kafka. Unfortunately, all of them are based on the principle of “data at rest”, in contrast to Kafka processing “data in motion”. The closest fit so far is the integration via SAP Cloud Platform Enterprise Messaging because you can at least leverage an asynchronous messaging API.

The real added value comes when Kafka is leveraged not just for real-time messaging but for event streaming. Kafka provides a combination of messaging, data integration, data processing, and real decoupling using its distributed storage infrastructure.

Native Event Streaming with Kafka in SAP Products

Interestingly, some of SAP's acquisitions leverage Kafka under the hood for event streaming. Two public examples:

Obviously, people are also waiting for the Kafka-native SAP S4/Hana interface so that they can leverage events in real-time for processing data in motion and correlate real-time and historical data together. A native Kafka integration with SAP S4/Hana should be the next step for SAP! HERE Technologies provides a great example of how to provide a Kafka-native interface (and an alternative REST option) for their product.

Having said this, current SAP blogs (from mid-2019) still talk about replacing the 20+ years old BAPI and RFC integration style with SOAP and OData (Open Data Protocol, an open protocol that allows the creation and consumption of queryable and interoperable REST APIs) in SAP S/4HANA Public Cloud.

My personal feeling and hope are that a native Kafka interface is just a matter of time as the market demand is everywhere across the globe (I talk to many customers in EMEA, US, and APAC), and even several non-S4/Hana SAP products use Kafka internally.

I have also seen a two-fold approach from some other vendors: Provide a Kafka-native interface to the outside world first (in SAP terms, you could e.g. provide a Kafka interface on top of BAPIs). At a later point, reengineer the internal architecture away from the non-scalable technology to Kafka under the hood (in SAP terms, you could replace RFC / BAPI functions with a more scalable Kafka-native version – even using the same API interface and message structure).

Native Streaming Replication between Products, Departments, and Companies

Native Kafka integration does not just happen within a product or company. A widespread trend I see on the market in different industries is to integrate with partners via Kafka-native streaming replication instead of REST APIs:

Cross Company Streaming Kafka API Integration with MirrorMaker and Cluster Linking

Think about it: If you use Kafka in different application infrastructures, but the interface is just a web service or database, then most of the benefits are lost because scalability and real-time data correlation capabilities go away.

More and more vendors of standard software use Kafka as the backbone of their internal architecture. If the interface between products (say, SAP's ERP system, SAP's MES system, and the SCM application of an OEM customer) is just a SOAP or REST API, then this does not scale and perform well for the requirements of use cases in the digital transformation and Industry 4.0.

Hence, more and more companies leverage Kafka not just internally but also between departments or even different companies. Streaming replication between companies is possible with tools like MirrorMaker 2.0 or Confluent Replicator. Or you use the much simpler Cluster Linking from Confluent, which enables hybrid, multi-cloud, or 3rd party integration using the Kafka protocol under the hood.
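As a rough sketch, a one-directional MirrorMaker 2.0 replication flow between two companies boils down to a small properties file (cluster aliases, addresses, and topic names are placeholders), started with Kafka's connect-mirror-maker.sh:

    clusters = supplier, oem
    supplier.bootstrap.servers = kafka.supplier.example.com:9092
    oem.bootstrap.servers = kafka.oem.example.com:9092
    # Replicate the supplier's order topics into the OEM's cluster
    supplier->oem.enabled = true
    supplier->oem.topics = erp-orders.*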

SAP + Apache Kafka = The Future for ERP et al

There is huge demand across the globe to integrate SAP applications with Apache Kafka for real-time messaging, data integration, and data processing at scale. The demand is true for SAP ERP (ECC and S4/Hana) but also for most other products from the vast SAP portfolio.

Kafka is deployed in many modern and innovative use cases for supply chain management, manufacturing, customer experience, and so on. Edge, hybrid, and multi-cloud Kafka deployments are the norm, not the exception.

Kafka integrates well with SAP systems. Different integration options are available via SAP SDKs and 3rd party products for proprietary interfaces, open standards, and modern messaging and event streaming concepts. Choose the right option for your needs and get started with your Kafka SAP integration… 

If you want to modernize your existing ERP infrastructure (no matter if SAP or any other vendor), also check out the article “Building a Postmodern ERP with Apache Kafka“.

What are your experiences with SAP Kafka integration? How did it work? What challenges did you face and how did you or do you plan to solve this? What is your strategy? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Kafka SAP Integration – APIs, Tools, Connector, ERP et al appeared first on Kai Waehner.

IoT Integration with Kafka Connect, REST / HTTP, MQTT, OPC-UA – Lightboard Video https://www.kai-waehner.de/blog/2019/07/26/iot-integration-apache-kafka-rest-http-mqtt-opcua-lightboard-video/ Fri, 26 Jul 2019 08:06:35 +0000 http://www.kai-waehner.de/blog/?p=1532 I just want to share my lightboard video recording. I talk about IoT integration and processing with Apache…

The post IoT Integration with Kafka Connect, REST / HTTP, MQTT, OPC-UA – Lightboard Video appeared first on Kai Waehner.

I just want to share my lightboard video recording. I talk about IoT integration and processing with Apache Kafka using Kafka Connect, Kafka Streams, KSQL, REST / HTTP, MQTT, and OPC-UA. Use cases, alternative architectures, and different integration options are discussed on the whiteboard.

End-to-End IoT Integration from Edge to Confluent Cloud

In this lightboard, Confluent’s Kai Waehner (Technology Evangelist) and Konstantin Karantasis (Software Engineer) discuss use cases leveraging the Apache Kafka open source ecosystem as a streaming platform to process IoT data. The session shows architectural alternatives of how devices like cars, machines or mobile devices connect to Apache Kafka via IoT standards like MQTT or OPC-UA.

Learn how to analyze the IoT data either natively on Apache Kafka with Kafka Streams / KSQL or other tools leveraging Kafka Connect. Kai Waehner will also discuss the benefits of Confluent Cloud and other tools like Confluent Replicator or MQTT Proxy to build bidirectional real time integration from the edge to the cloud.

This presentation may help you:

  • Understand end-to-end use cases from different industries where you integrate IoT devices with enterprise IT using open source technologies and standards
  • See how Apache Kafka enables bidirectional end-to-end integration processing from IoT data to various backend applications in the cloud
  • Compare different architectural alternatives and see their benefits and caveats
  • Learn about various standards, APIs and tools of integrating and processing IoT data with different open source components of the Apache Kafka ecosystem
  • Understand the benefits of Confluent Cloud, which provides a highly available and scalable Apache Kafka ecosystem as a managed service

IoT Edge to Confluent Cloud

Here is the video recording:

Please let me know if you have any comments or questions.

Integration between Kafka and Siemens S7, Modbus, Beckhoff ADS, OPC-UA using PLC4X

Stay tuned for a dedicated blog post, slide deck, and video recording about a core theme in the IoT space: integration with Industrial IoT (IIoT). I will talk about integration between Apache Kafka and technologies / proprietary standards from the automation industry. I will discuss how to integrate with proprietary protocols like Siemens S7, Modbus, Beckhoff ADS, OPC-UA, etc., leveraging the Kafka Connect connector for PLC4X. This content is in the works these days, and I will probably publish it in August or September 2019.

IoT Integration Streaming Platform using Apache Kafka, Connect, MQTT, OPC-UA, PLC4X

Please also check out the following related material about IoT Integration:

The post IoT Integration with Kafka Connect, REST / HTTP, MQTT, OPC-UA – Lightboard Video appeared first on Kai Waehner.

Apache Kafka vs. Middleware (MQ, ETL, ESB) – Slides + Video https://www.kai-waehner.de/blog/2019/03/07/apache-kafka-middleware-mq-etl-esb-comparison/ Thu, 07 Mar 2019 15:45:15 +0000 http://www.kai-waehner.de/blog/?p=1423 This post shares a slide deck and video recording of the differences between an event-driven streaming platform like Apache Kafka and middleware like Message Queues (MQ), Extract-Transform-Load (ETL) and Enterprise Service Bus (ESB).

The post Apache Kafka vs. Middleware (MQ, ETL, ESB) – Slides + Video appeared first on Kai Waehner.

Learn the differences between an event-driven streaming platform like Apache Kafka and middleware like Message Queues (MQ), Extract-Transform-Load (ETL) and Enterprise Service Bus (ESB). Including best practices and anti-patterns, but also how these concepts and tools complement each other in an enterprise architecture.

This blog post shares my slide deck and video recording. I discuss the differences between Apache Kafka as Event Streaming Platform and integration middleware. Learn if they are friends, enemies or frenemies.

Problems of Legacy Middleware

Extract-Transform-Load (ETL) is still a widely-used pattern to move data between different systems via batch processing. Due to its challenges in today's world, where real time is the new standard, an Enterprise Service Bus (ESB) is used in many enterprises as the integration backbone between any kind of microservice, legacy application, or cloud service to move data via SOAP / REST web services or other technologies. Stream processing is often added as its own component in the enterprise architecture for correlation of different events to implement contextual rules and stateful analytics. Using all these components introduces challenges and complexities in development and operations.

Legacy Middleware (MQ, ESB, ETL, etc)

Apache Kafka – An Open Source Event Streaming Platform

This session discusses how teams in different industries solve these challenges by building a native event streaming platform from the ground up instead of using ETL and ESB tools in their architecture. This allows them to build and deploy independent, mission-critical, real-time streaming applications and microservices. The architecture leverages distributed processing and fault-tolerance with fast failover, no-downtime, rolling deployments, and the ability to reprocess events, so you can recalculate output when your code changes. Integration and stream processing are still key functionality but can be realized in real time natively instead of using additional ETL, ESB, or stream processing tools.

A concrete example architecture shows how to build a complete streaming platform leveraging the widely-adopted open source framework Apache Kafka to build a mission-critical, scalable, highly performant streaming platform. Messaging, integration, and stream processing are all built on top of the same strong foundation of Kafka; deployed on premise, in the cloud, or in hybrid environments. In addition, the open source Confluent projects, based on top of Apache Kafka, add additional features like a Schema Registry, additional clients for programming languages like Go or C, or many pre-built connectors for various technologies.

Apache Kafka Middleware

Slides: Apache Kafka vs. Integration Middleware

Here is the slide deck:

Video Recording: Kafka vs. MQ / ETL / ESB – Friends, Enemies or Frenemies?

Here is the video recording where I walk you through the above slide deck:

Article: Apache Kafka vs. Enterprise Service Bus (ESB)

I also published a detailed blog post on Confluent blog about this topic in 2018:

Apache Kafka vs. Enterprise Service Bus (ESB)

Talk and Slides from Kafka Summit London 2019

The slides and video recording from Kafka Summit London 2019 (which are similar to above) are also available for free.

Why Apache Kafka instead of Traditional Middleware?

If you don’t want to spend a lot of time on the slides and recording, here is a short summary of the differences between Apache Kafka and traditional middleware:

Why Apache Kafka instead of Integration Middleware

Questions and Discussion…

Please share your thoughts, too!

Does your infrastructure see similar architectures? Do you face similar challenges? Do you like the concepts behind an Event Streaming Platform (aka Apache Kafka)? How do you combine legacy middleware with Kafka? What’s your strategy to integrate the modern and the old (technology) world? Is Kafka part of that architecture?

Please let me know either via a comment or via LinkedIn, Twitter, email, etc. I am curious about other opinions and experiences (and people who disagree with my presentation).

The post Apache Kafka vs. Middleware (MQ, ETL, ESB) – Slides + Video appeared first on Kai Waehner.

Apache Kafka vs. ESB / ETL / MQ https://www.kai-waehner.de/blog/2018/07/18/apache-kafka-vs-esb-etl-mq/ Wed, 18 Jul 2018 18:33:56 +0000 http://www.kai-waehner.de/blog/?p=1320 Read why enterprises leverage the open source ecosystem of Apache Kafka for successful integration of different legacy and modern applications instead of ESB, ETL or MQ.

The post Apache Kafka vs. ESB / ETL / MQ appeared first on Kai Waehner.

Apache Kafka and Enterprise Service Bus (ESB) are complementary, not competitive!

In the meantime, Apache Kafka has become much more than messaging. It has evolved into a streaming platform, including Kafka Connect, Kafka Streams, KSQL, and many other open source components. Kafka leverages events as a core principle. You think in data flows of events and process the data while it is in motion. Many concepts, such as event sourcing, or design patterns such as Enterprise Integration Patterns (EIPs), are based on event-driven architecture.

Kafka is unique because it combines messaging, storage, and processing of events all in one platform. It does this in a distributed architecture using a distributed commit log and topics divided into multiple partitions.

ETL and ESB have excellent tooling, including graphical mappings for doing complex integration with legacy systems and technologies such as SOAP, EDIFACT, SAP BAPI, COBOL, etc. (Trust me, you don’t want to write the code for this.)

Therefore, existing MQ and ESB solutions, which already integrate with your legacy world, are not competitive to Apache Kafka. Rather, they are complementary!

Read more details about this question in my Confluent blog post:

Apache Kafka® vs. Enterprise Service Bus (ESB) | Confluent

As always, I appreciate any feedback, comments or criticism.

The post Apache Kafka vs. ESB / ETL / MQ appeared first on Kai Waehner.

Agile Cloud-to-Cloud Integration with iPaaS, API Management and Blockchain https://www.kai-waehner.de/blog/2017/04/23/agile-cloud-cloud-integration-ipaas-api-management-blockchain/ Sun, 23 Apr 2017 18:41:06 +0000 http://www.kai-waehner.de/blog/?p=1152 Agile Cloud-to-Cloud Integration with iPaaS, API Management and Blockchain. Scenario use case using IBM's open source Hyperledger Fabric on BlueMix, TIBCO Cloud Integration (TCI) and Mashery.

The post Agile Cloud-to-Cloud Integration with iPaaS, API Management and Blockchain appeared first on Kai Waehner.

Cloud-to-cloud integration is part of a hybrid integration architecture. It enables you to implement quick and agile integration scenarios without the burden of setting up complex VM- or container-based infrastructures. One key use case for cloud-to-cloud integration is innovation using a fail-fast methodology where you realize new ideas quickly. You typically think in days or weeks, not in months. If an idea fails, you throw it away and start another new idea. If the idea works well, you scale it out and bring it into production on an on-premise, cloud, or hybrid infrastructure. Finally, you expose the idea and make it easily available to any interested service consumer in your enterprise, partners, or public end users.

A great example where you need agile, fail-fast development is blockchain because it is a very hot topic, but frameworks are immature and change very frequently these days. Note that blockchain is not just about Bitcoin and the finance industry. This is just the tip of the iceberg. Blockchain, which is the underlying infrastructure of Bitcoin, will change various industries, beginning with banking, manufacturing, supply chain management, energy, and others.

Middleware and Integration as Key for Success in Blockchain Projects

A key for successful blockchain projects is middleware as it allows integration with the rest of an enterprise architecture. Blockchain only adds value if it works together with your other applications and services. See an example of how to combine streaming analytics with TIBCO StreamBase and Ethereum to correlate blockchain events in real time to act proactively, e.g. for fraud detection.

The drawback of blockchain is its immaturity today. APIs change with every minor release, development tools are buggy and miss many required features for serious development, and new blockchain cloud services, frameworks and programming languages come and go every quarter.

This blog post shows how to leverage cloud integration with iPaaS and API Management to realize innovative projects quickly, fail fast, and adopt changing technologies and business requirements easily. We will use TIBCO Cloud Integration (TCI), TIBCO Mashery and Hyperledger Fabric running as IBM Bluemix cloud service.

IBM Hyperledger Fabric as Bluemix Cloud Service

Hyperledger is a vendor-independent open source blockchain project which consists of various components. Many enterprises and software vendors committed to it for building different solutions for various problems and use cases. One example is IBM’s Hyperledger Fabric. Being a very flexible construction kit makes Hyperledger much harder to get started with than other blockchain platforms, like Ethereum, which are less flexible but therefore easier to set up and use.

This is the reason why I use the Bluemix BaaS (Blockchain as a Service) to get started quickly without spending days on just setting up my own Hyperledger network. You can start a Hyperledger Fabric blockchain infrastructure with four peers and a membership service with just a few clicks. It takes around two minutes. Hyperledger Fabric is evolving fast. Therefore, it is a great example of quickly changing features, evolving requirements, and (potentially) fast-failing projects.

My project uses Hyperledger Fabric 0.6 as a free beta cloud service. I leverage its Swagger-based REST API in my middleware microservice to interconnect other cloud services with the blockchain.

However, when I began the project, it was already clear that the REST interface is deprecated and will not be included in the upcoming 1.0 release anymore. Thus, we are aware that an easy move to another API is needed in a few weeks or months.

Cloud-to-Cloud Integration, iPaaS and API Management for Agile and Fail-Fast Development

As mentioned in the introduction, middleware is key for success in blockchain projects to interconnect it with the existing enterprise architecture. This example leverages the following middleware components:

  • iPaaS: TIBCO Cloud Integration (TCI) is hosted and managed by TIBCO. It can be used without any setup or installation to quickly build a REST microservice which integrates with Hyperledger Fabric. TCI also allows you to configure caching, throttling, and security to ensure controlled communication between the blockchain and other applications and services.
  • API Management: TIBCO Mashery is used to expose the TCI REST microservice for other services consumers. This can be internal, partners or public end users, depending on the use case.
  • On Premise / Cloud-Native Integration Infrastructure: TIBCO BusinessWorks 6 (BW6) respectively TIBCO BusinessWorks Container Edition (BWCE) can be used to deploy and productionize successful TCI microservices in your existing infrastructure on premise or on a cloud-native platform like CloudFoundry, Kubernetes or any other Docker-based platform. Of course, you can also continue to use the services within TCI itself to run and scale the production services in the public cloud, hosted and managed by TIBCO.

Scenario: TIBCO Cloud Integration + Mashery + Hyperledger Fabric Blockchain + Salesforce

I implemented the following technical scenario with the goal of showing agile cloud-to-cloud integration with fail-fast methodology from a technical perspective (and not to build a great business case): The scenario implements a new REST microservice with TCI via visual coding and configuration. This REST service connects to two cloud services: Salesforce and Hyperledger Fabric. The following picture shows the architecture.

Here are the steps to realize this scenario:

  • Create a REST service which receives customer ID and customer name as parameters.
  • Enhance the input data from the REST call with additional data from Salesforce CRM. In this case, we get a block hash which is stored in Salesforce as a reference to the blockchain data. The block hash allows us to double-check whether Salesforce has the most up-to-date information about a customer, while the blockchain itself ensures that the customer updates from various systems like CRM, ERP, and mainframe are stored and updated in a correct, secure, distributed chain – which can be accessed by all related systems, not just Salesforce.
  • Use the Hyperledger REST API to validate via Salesforce’s block hash that the current information about the customer in Salesforce is up-to-date. If Salesforce has an older block hash, then update Salesforce to the up-to-date values (both the most recent block hash and the current customer data stored on the blockchain).
  • Return the up-to-date customer data to the service consumer of the TCI REST service.
  • Configure caching, throttling, and security requirements so that service consumers cannot cause unexpected behavior.
  • Leverage API Management to expose the TCI REST service as public API so that external service consumers can subscribe to a payment plan and use it in other applications or microservices.

Implementation: Visual Coding, Web Configuration and Automation via DevOps

This section discusses the relevant steps of the implementation in more detail, including development and deployment of the chain code (Hyperledger’s term for smart contracts in blockchain), implementation of the REST integration service with visual coding in the IDE of TCI, and exposing the service as API via TIBCO Mashery.

Chain Code for Hyperledger Fabric on IBM Bluemix Cloud

Hyperledger Fabric has some detailed “getting started” tutorials. I used a “Hello World” tutorial and adapted it to our use case so that you can store, update, and query customer data on the blockchain network. Hyperledger Fabric uses Go for chain code and will allow other programming languages like Java in future releases. This is both a pro and con. The benefit is that you can leverage your existing expertise in these programming languages. However, you also have to be careful not to use “wrong” features like threads, infinite loops, or other functions which cause unexpected behavior on a blockchain. Personally, I prefer the concept of Ethereum. This platform uses Solidity, a programming language explicitly designed to develop smart contracts (i.e. chain code).

Cloud-to-Cloud Integration via REST Service with TIBCO Cloud Integration

The following steps were necessary to implement the REST microservice:

  • Create REST service interface with API Modeler web UI leveraging Swagger
  • Import Swagger interface into TCI’s IDE
  • Use Salesforce activity to read block hash from Salesforce’ cloud interface
  • Use Rest Client activity to do an HTTP Request to Hyperledger Fabric’s REST interface
  • Configure graphical mappings between the activities (REST Request → Salesforce → Hyperledger Fabric → REST Response)
  • Deploy microservice to TCI’s runtime in the cloud and test it from Swagger UI
  • Configure caching and throttling in TCI’s web UI

The last point is very important in blockchain projects. A blockchain infrastructure does not have millisecond response latency. Furthermore, every blockchain update costs money – Bitcoin, Ether, or whatever currency you use to “pay for mining” and to reach consensus in your blockchain infrastructure. Therefore, you can leverage caching and throttling in blockchain projects much more than in some other projects (only if it fits into your business scenario, of course).

Here is the implemented REST service in TCI’s graphical development environment:

And the caching configuration in TCI’s web user interface:

Exposing the REST Service to Service Consumers through API Management via TIBCO Mashery

The deployed TCI microservice can either be accessed directly by service consumers, or you expose it via API Management, e.g. with TIBCO Mashery. The latter option allows you to define rules, payment plans, and security configuration in a centralized manner with easy-to-use tooling.

Note that the deployment to TCI and exposing its API to Mashery can also be done in one single step. Both products are loosely coupled, but highly integrated. Also note that this is typically not done manually (like in this technical demonstration), but integrated into a DevOps infrastructure leveraging frameworks like Maven or Jenkins to automate the deployment and testing steps.

Agile Cloud-to-Cloud Integration with Fail-Fast Methodology

Our scenario is now implemented successfully. However, it is already clear that the REST interface is deprecated and not included in the upcoming 1.0 release anymore. In blockchain frameworks, you can expect changes very frequently.

While implementing this example, IBM announced at its big user conference IBM Interconnect in Las Vegas that Hyperledger Fabric 1.0 is now publicly available. Well, it has really been available on Bluemix since that day, but if you try to start the service, you get the following message: “Due to high demand of our limited beta, we have reached capacity.” A strange error, as I thought we were using public cloud infrastructures like Amazon AWS, Microsoft Azure, Google Cloud Platform, or IBM Bluemix for exactly this reason: to scale out easily… 🙂

Anyway, the consequence is that we still have to work with our 0.6 version and do not know yet when we will be able to, or have to, migrate to version 1.0. The good news is that iPaaS and cloud-to-cloud integration can realize changes quickly. In the case of our TCI REST service, we just need to replace the REST activity calling the Hyperledger REST API with a Java activity and leverage Hyperledger’s Java SDK – as soon as it is available. Right now, only a client SDK for Node.js is available – not really an option for “enterprise projects” where you want to leverage the JVM / Java platform instead of JavaScript. Side note: This topic of using Java vs. JavaScript in blockchain projects is also well discussed in “Static Type Safety for DApps without JavaScript”.

This blog post focuses just on a small example, and Hyperledger Fabric 1.0 will bring other new features and concept changes with it, of course. The same is true for SaaS cloud offerings such as Salesforce. You cannot control when they change their API and what exactly they change. But you have to adopt it within a relatively short timeframe, or your service will not work anymore. An iPaaS is the perfect solution for these scenarios, as you do not need a lot of complex setup in your private data center or on a public cloud platform. You just use it “as a service”, change it, replace logic, or stop it and throw it away to start a new project. The implicit integration with API Management and DevOps support also allows you to expose new versions to your service consumers easily and quickly.

Conclusion: Be Innovative by Failing Fast with Cloud Solutions

Blockchain is at a very early stage. This is true for platforms, tooling, and even the basic concepts and theories (like consensus algorithms or security enforcement). Don’t trust the vendors if they say blockchain is now 1.0 and ready for prime time. It will still feel more like a 0.5 beta version these days. This is not just true for Hyperledger and its related projects such as IBM’s Hyperledger Fabric, but also for all others, including Ethereum and all the interesting frameworks and startups emerging around it.

Therefore, you need to be flexible and agile in blockchain development today. You need to be able to fail fast. This means setting up a project quickly, trying out new innovative ideas, and throwing them away if they do not work, to start the next one. The same is true for other innovative ideas – not just for blockchain.

Middleware helps integrate new innovative ideas with existing applications and enterprise architectures. It is used to interconnect everything, correlate events in real time, and find insights and patterns in correlated information to create new added value. To support innovative projects, middleware also needs to be flexible, agile, and support fail-fast methodologies. This post showed how you can leverage iPaaS with TIBCO Cloud Integration and API Management with TIBCO Mashery to build innovative middleware microservices in cloud-to-cloud integration projects.

The post Agile Cloud-to-Cloud Integration with iPaaS, API Management and Blockchain appeared first on Kai Waehner.

Case Study: From a Monolith to Cloud, Containers, Microservices https://www.kai-waehner.de/blog/2017/02/24/case-study-monolith-cloud-containers-microservices/ Fri, 24 Feb 2017 15:14:12 +0000 http://www.kai-waehner.de/blog/?p=1142 Case Study: How to Move from a (Middleware) Monolith to Cloud, Containers and Microservices leveraging Docker, Cloud Foundry, Kubernetes, Consul, Hystrix, API Management, and others cool things.

The post Case Study: From a Monolith to Cloud, Containers, Microservices appeared first on Kai Waehner.

The following shows a case study about successfully moving from a very complex monolith system to a cloud-native architecture. The architecture leverages containers and microservices. This solves issues such as high effort for extending the system and a very slow deployment process. The old system included a few huge Java applications and a complex integration middleware deployment.

The new architecture allows flexible development, deployment and operations of business and integration services. Besides, it is vendor-agnostic so that you can leverage on-premise hardware, different public cloud infrastructures, and cloud-native PaaS platforms.

The session describes the challenges of the existing monolith system and the step-by-step procedure to move to the new cloud-native microservices architecture. It also explains why containers such as Docker play a key role in this scenario.

A live demo shows how container solutions such as Docker, PaaS cloud platforms such as CloudFoundry, cluster managers such as Kubernetes or Mesos, and different programming languages are used to implement, deploy and scale cloud-native Microservices in a vendor-agnostic way.

Key Takeaways

Key takeaways for the audience:

– Best practices for moving to a cloud-native architecture

– How to leverage microservices and containers for flexible development, deployment and operations

– How to solve challenges in real world projects

– Understand key technologies, which are recommended

– How to stay vendor-agnostic

– See a live demo of how cloud-native applications respectively services differ from monolith applications regarding development and runtime

Slides and Video from Microservices Meetup Mumbai

Here are the slides and video recording. Presented in February 2017 at Microservices Meetup Mumbai, India.

The post Case Study: From a Monolith to Cloud, Containers, Microservices appeared first on Kai Waehner.
