Integration Archives - Kai Waehner

Replacing Legacy Systems, One Step at a Time with Data Streaming: The Strangler Fig Approach

Kai Waehner — Thu, 27 Mar 2025 06:37:24 +0000

Organizations looking to modernize legacy applications often face a high-stakes dilemma: Do they attempt a complete rewrite or find a more gradual, low-risk approach? Enter the Strangler Fig Pattern, a method that systematically replaces legacy components while keeping the existing system running. Unlike the “Big Bang” approach, where companies try to rewrite everything at once, the Strangler Fig Pattern ensures smooth transitions, minimizes disruptions, and allows businesses to modernize at their own pace. Data streaming transforms the Strangler Fig Pattern into a more powerful, scalable, and truly decoupled approach. Let’s explore why this approach is superior to traditional migration strategies and how real-world enterprises like Allianz are leveraging it successfully.

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter and follow me on LinkedIn or X (former Twitter) to stay in touch. And download my free book about data streaming architectures and use cases, including various use cases from the retail industry.

What is the Strangler Fig Design Pattern?

The Strangler Fig Pattern is a gradual modernization approach that allows organizations to replace legacy systems incrementally. The pattern was coined and popularized by Martin Fowler to avoid risky “big bang” system rewrites.

Inspired by the way strangler fig trees grow around and eventually replace their host, this pattern surrounds the old system with new services until the legacy components are no longer needed. By decoupling functionalities and migrating them piece by piece, businesses can minimize disruptions, reduce risk, and ensure a seamless transition to modern architectures.

When combined with data streaming, this approach enables real-time synchronization between old and new systems, making the migration even smoother.

Why Strangler Fig is Better than a Big Bang Migration or Rewrite

Many organizations have learned the hard way that rewriting an entire system in one go is risky. A Big Bang migration or rewrite often:

Takes years to complete, leading to stale requirements by the time it’s done
Disrupts business operations, frustrating customers and teams
Requires a high upfront investment with unclear ROI
Involves hidden dependencies, making the transition unpredictable

The Strangler Fig Pattern takes a more incremental approach:

It allows gradual replacement of legacy components one service at a time
Reduces business risk by keeping critical systems operational during migration
Enables continuous feedback loops, ensuring early wins
Keeps costs under control, as teams modernize based on priorities

Here is an example from the Industrial IoT space for the strangler fig pattern leveraging data streaming to modernizing OT Middleware:

If you come from the traditional IT world (banking, retail, etc.) and don’t care about IoT, then you can explore my article about mainframe integration, offloading and replacement with data streaming.

Instead of replacing everything at once, this method surrounds the old system with new components until the legacy system is fully replaced—just like a strangler fig tree growing around its host.

Better Than Reverse ETL: A Migration with Data Consistency across Operational and Analytical Applications

Some companies attempt to work around legacy constraints using Reverse ETL—extracting data from analytical systems and pushing it back into modern operational applications. On paper, this looks like a clever workaround. In reality, Reverse ETL is a fragile, high-maintenance anti-pattern that introduces more complexity than it solves.

Reverse ETL carries several critical flaws:

Batch-based by nature: Data remains at rest in analytical lakehouses like Snowflake, Databricks, Google BigQuery, Microsoft Fabric or Amazon Redshift. It is then periodically moved—usually every few hours or once a day—back into operational systems. This results in outdated and stale data, which is dangerous for real-time business processes.
Tightly coupled to legacy systems: Reverse ETL pipelines still depend on the availability and structure of the original legacy systems. A schema change, performance issue, or outage upstream can break downstream workflows—just like with traditional ETL.
Slow and inefficient: It introduces latency at multiple stages, limiting the ability to react to real-world events at the right moment. Decision-making, personalization, fraud detection, and automation all suffer.
Not cost-efficient: Reverse ETL tools often introduce double processing costs—you pay to compute and store the data in the warehouse, then again to extract, transform, and sync it back into operational systems. This increases both financial overhead and operational burden, especially as data volumes scale.

In short, Reverse ETL is a short-term fix for a long-term challenge. It’s a temporary bridge over the widening gap between real-time operational needs and legacy infrastructure.

Why Data Streaming with Apache Kafka and Flink is an Excellent Fit for the Strangler Fig Pattern

Many modernization efforts fail because they tightly couple old and new systems, making transitions painful. Data streaming with Apache Kafka and Flink changes the game by enabling real-time, event-driven communication.

1. True Decoupling of Old and New Systems

Data streaming using Apache Kafka with its event-driven streaming and persistence layer enables organizations to:

Decouple producers (legacy apps) from consumers (modern apps)
Process real-time and historical data without direct database dependencies
Allow new applications to consume events at their own pace

This avoids dependency on old databases and enables teams to move forward without breaking existing workflows.

2. Real-Time Replication with Persistence

Unlike traditional migration methods, data streaming supports:

Real-time synchronization of data between legacy and modern systems
Durable persistence to handle retries, reprocessing, and recovery
Scalability across multiple environments (on-prem, cloud, hybrid)

This ensures data consistency without downtime, making migrations smooth and reliable.

3. Supports Any Technology and Communication Paradigm

Data streaming’s power lies in its event-driven architecture, which supports any integration style—without compromising scalability or real-time capabilities.

The data product approach using Kafka Topics with data contracts handles:

Real-time messaging for low-latency communication and operational systems
Batch processing for analytics, reporting and AI/ML model training
Request-response for APIs and point-to-point integration with external systems
Hybrid integration—syncing legacy databases with cloud apps, uni- or bi-directionally

This flexibility lets organizations modernize at their own pace, using the right communication pattern for each use case while unifying operational and analytical workloads on a single platform.

4. No Time Pressure – Migrate at Your Own Pace

One of the biggest advantages of the Strangler Fig Pattern with data streaming is flexibility in timing.

No need for overnight cutovers—migrate module by module
Adjust pace based on business needs—modernization can align with other priorities
Handle delays without data loss—thanks to durable event storage

This ensures that companies don’t rush into risky migrations but instead execute transitions with confidence.

5. Intelligent Processing in the Data Migration Pipeline

In a Strangler Fig Pattern, it’s not enough to just move data from old to new systems—you also need to transform, validate, and enrich it in motion.

While Apache Kafka provides the backbone for real-time event streaming and durable storage, Apache Flink adds powerful stream processing capabilities that elevate the modernization journey.

With Apache Flink, organizations can:

Real-time preprocessing: Clean, filter, and enrich legacy data before it’s consumed by modern systems.
Data product migration: Transform old formats into modern, domain-driven event models.
Improved data quality: Validate, deduplicate, and standardize data in motion.
Reusable business logic: Centralize processing logic across services without embedding it in application code.
Unified streaming and batch: Support hybrid workloads through one engine.

Apache Flink allows you to roll out trusted, enriched, and well-governed data products—gradually, without disruption. Together with Kafka, it provides the real-time foundation for a smooth, scalable transition from legacy to modern.

Allianz’s Digital Transformation and Transition to Hybrid Cloud: An IT Modernization Success Story

Allianz, one of the world’s largest insurers, set out to modernize its core insurance systems while maintaining business continuity. A full rewrite was too risky, given the critical nature of insurance claims and regulatory requirements. Instead, Allianz embraced the Strangler Fig Pattern with data streaming.

This approach allows an incremental, low-risk transition. By implementing data streaming with Kafka as an event backbone, Allianz could gradually migrate from legacy mainframes to a modern, scalable cloud architecture. This ensured that new microservices could process real-time claims data, improving speed and efficiency without disrupting existing operations.

Source: Allianz at Data in Motion Tour Frankfurt 2022

To achieve this IT modernization, Allianz leveraged automated microservices, real-time analytics, and event-driven processing to enhance key operations, such as pricing, fraud detection, and customer interactions.

A crucial component of this shift was the Core Insurance Service Layer (CISL), which enabled decoupling applications via a data streaming platform to ensure seamless integration across various backend systems.

With the goal of migrating over 75% of applications to the cloud, Allianz significantly increases agility, reduced operational complexity, and minimized technical debt, positioning itself for long-term digital success by transitioning many applications to the cloud, as you can read (in German) in the CIO.de article:

“As one of the largest insurers in the world, Allianz plans to migrate over 75 percent of its applications to the cloud and modernize its core insurance system.”

Beyond technology, Allianz recognized that successful modernization also required cultural transformation. To drive internal adoption of event-driven architectures, the company launched Allianz Eventing Day—an initiative to educate teams on the benefits of Kafka-based streaming.

Hosted in partnership with Confluent, the event brought together Allianz experts and industry leaders to discuss real-world implementations, best practices, and future innovations. This collaborative environment reinforced Allianz’s commitment to continuous learning and agile development, ensuring that data streaming became a core pillar of its IT strategy.

Allianz also extended this engagement to the broader insurance industry, organizing the first Insurance Roundtable on Event-Driven Architectures with top insurers from across Germany and Switzerland. Experts from Allianz, Generali, HDI, Swiss Re, and others exchanged insights on topics like real-time data analytics, API decoupling, and event discovery. The discussions highlighted how streaming technologies drive competitive advantage, allowing insurers to react faster to customer needs and continuously improve their services. By embracing domain-driven design (DDD) and event storming, Allianz ensured that event-driven architectures were not just a technical upgrade, but a fundamental shift in how insurance operates in the digital age.

The Future of IT Modernization and Legacy Migrations with Strangler Fig using Data Streaming

The Strangler Fig Pattern is the most pragmatic approach to modernizing enterprise systems, and data streaming makes it even more powerful.

By decoupling old and new systems, enabling real-time synchronization, and supporting multiple architectures, data streaming with Apache Kafka and Flink provides the flexibility, reliability, and scalability needed for successful migrations and long-term integrations between legacy and cloud-native applications.

As enterprises continue to strengthen, this approach ensures that modernization doesn’t become a bottleneck, but a business enabler. If your organization is considering legacy modernization, data streaming is the key to making the transition seamless and risk-free.

Are you ready to transform your legacy systems without the risks of a big bang rewrite? What’s one part of your legacy system you’d “strangle” first—and why? If you could modernize just one workflow with real-time data tomorrow, what would it be?

Subscribe to my newsletter for insights into data streaming and connect with me on LinkedIn to continue the conversation. And download my free book about data streaming architectures, use cases and success stories in the retail industry.

The post Replacing Legacy Systems, One Step at a Time with Data Streaming: The Strangler Fig Approach appeared first on Kai Waehner.

When to use Request-Response with Apache Kafka?

Kai Waehner — Fri, 03 Jun 2022 07:35:00 +0000

How can I do request-response communication with Apache Kafka? That’s one of the most common questions I get regularly. This blog post explores when (not) to use this message exchange pattern, the differences between synchronous and asynchronous communication, the pros and cons compared to CQRS and event sourcing, and how to implement request-response within the data streaming infrastructure.

Message Queue Patterns in Data Streaming with Apache Kafka

Before I go into this post, I want to make you aware that this content is part of a blog series about “JMS, Message Queues, and Apache Kafka”:

10 Comparison Criteria for JMS Message Broker vs. Apache Kafka Data Streaming
Alternatives for Error Handling via a Dead Letter Queue (DLQ) in Apache Kafka
THIS POST – Implementing the Request-Reply Pattern with Apache Kafka
UPCOMING – A Decision Tree for Choosing the Right Messaging System (JMS vs. Apache Kafka)
UPCOMING – From JMS Message Broker to Apache Kafka: Integration, Migration, and/or Replacement

I will link the other posts here as soon as they are available. Please follow my newsletter to get updated in real-time abo t new posts. (no spam or ads)

What is the Request-Response (Request-Reply) Message Exchange Pattern?

Request-response (sometimes called request-reply) is one of the primary methods computers use to communicate with each other in a network.

The first application sends a request for some data. The second application responds to the request. It is a message exchange pattern in which a requestor sends a request message to a replier system, which receives and processes the request, ultimately returning a message in response.

Request-reply is inefficient and can suffer a lot of latency depending on the use case. HTTP or better gRPC is suitable for some use cases. Request-reply is “replaced” by the CQRS (Command and Query Responsibility Segregation) pattern with Kafka for streaming data. CQRS is not possible with JMS API, since JMS provides no state capabilities and lacks event sourcing capability. Let’s dig deeper into these statements.

Request-Response (HTTP) vs. Data Streaming (Kafka)

Prior to discussing synchronous and asynchronous communication, let’s explore the concepts behind request-response and data streaming. Traditionally, these are two different paradigms:

Request-response (HTTP):

Typically synchronous
Point to point
High latency (compared to data streaming)
Pre-defined API

Data streaming (Kafka):

Continuous processing
Often asynchronous
Event-driven
Low latency
General-purpose events

Most architectures need request-response for point-to-point communication (e.g., between a server and mobile app) and data streaming for continuous data processing. With this in mind, let’s look at use cases where HTTP is used with Kafka.

Synchronous vs. Asynchronous Communication

The request-response message exchange pattern is often implemented purely synchronously. However, request-response may also be implemented asynchronously, with a response being returned at some unknown later time.

Let’s look at the most prevalent examples of message exchanges: REST, message queues, and data streaming.

Synchronous Restful APIs (HTTP)

A web service is the primary technology behind synchronous communication in application development and enterprise application integration. While WSDL and SOAP were dominant many years ago, REST / HTTP is the communication standard in almost all web services today.

I won’t go into the “HTTP vs. REST” debate in this post. In short, REST (Representational state transfer) has been employed throughout the software industry and is a widely accepted set of guidelines for creating stateless, reliable web APIs. A web API that obeys the REST constraints is informally described as RESTful. RESTful web APIs are typically loosely based on HTTP methods.

Synchronous web service calls over HTTP hold a connection open and wait until the response is delivered or the timeout period expires.

The latency of HTTP web services is relatively high. It requires setting up and tearing down a TCP connection for each request-response iteration when using HTTP. To be clear: The latency is still good enough for many use cases.

Another possible drawback is that the HTTP requests might block waiting for the head of the queue request to be processed and may require HTTP circuit breakers set up on the server if there are too many outstanding HTTP requests.

Asynchronous Message Queue (IBM MQ, RabbitMQ)

The message queue paradigm is a sibling of the publisher/subscriber design pattern and is typically one part of a more extensive message-oriented middleware system. Most messaging systems support both the publisher/subscriber and message queue models in their API, e.g., Java Message Service (JMS). Read the “JMS Message Queue vs. Apache Kafka” article if you are new to this discussion.

Producers and consumers are decoupled from each other and communicate asynchronously. The message queue stores events until they are consumed successfully.

Most message queue middleware products provide built-in request-response APIs. Its communication is asynchronous. The implementation uses correlation IDs.

The request-response API (for example, in JMS) creates a temporary queue or topic that is referenced in the request to be used by the consumer by taking the reply-to endpoint from the request. The ID is used to separate the requests from the single requestor. These queues or topics are also only available while the requestor is alive with a session to the reply.

Such an implementation with a temporary queue or topic does not make sense in Kafka. I have actually seen enterprises trying to do this. Kafka does not work like that. The consequence was way too many partitions and topics in the Kafka cluster. Scalability and performance issues were the consequence.

Asynchronous Data Streaming (Apache Kafka)

Data streaming continuously processes ingested events from data sources. Such data should be processed incrementally using stream processing techniques without having access to all the data.

The asynchronous communication paradigm is like message queues. Contrary to message queues, data streaming provides long-term storage of events and replayability of historical information. The consequence is a true decoupling between producers and consumers. In most Apache Kafka deployments, several producers and consumers with very different communication paradigms and latency capabilities send and read events.

Apache Kafka does not provide request-response APIs built-in. This is not necessarily a bad thing, as some people think. Data streaming provides different design patterns. That’s the main reason for this blog post! Let’s explore trade-offs of the request-response pattern in messaging systems and understand alternative approaches that suit better into a data streaming world. But this post also explores how to implement asynchronous or synchronous request-reply with Kafka.

But keep in mind: Don’t re-use your “legacy knowledge” about HTTP and MQ and try to re-build the same patterns with Apache Kafka. Having said this, request-response is possible with Kafka, too. More on this in the following sections.

Request-Reply vs. CQRS and Event Sourcing

CQRS (Command Query Responsibility Segregation) states that every method should either be a command that performs an action or a query that returns data to the caller, but not both. Services become truly decoupled from each other.

Martin Fowler has a nice diagram for CQRS:

Event sourcing is an architectural pattern in which entities do not track their internal state using direct serialization or object-relational mapping, but by reading and committing events to an event store.

When event sourcing is combined with CQRS and domain-driven design, aggregate roots are responsible for validating and applying commands (often by having their instance methods invoked from a Command Handler) and then publishing events.

With CQRS, the state is updated against every relevant event message. Therefore, the state is always known. Querying the state that is stored in the materialized view (for example, a KTable in Kafka Streams) is efficient. With request-response, the server must calculate or determine the state of every request. With CQRS, it is calculated/updated only once regardless of the number of state queries at the time the relevant occurs.

These principles fit perfectly into the data streaming world. Apache Kafka is a distributed storage that appends incoming events to the immutable commit log. True decoupling and replayability of events are built into the Kafka infrastructure out-of-the-box. Most modern microservice architectures with domain-driven design are built with Apache Kafka, not REST or MQ.

Don’t use Request-Response in Kafka if not needed!

If you build a modern enterprise architecture and new applications, apply the natural design patterns that work best with the technology. Remember: Data streaming is a different technology than web services and message queues! CQRS with event sourcing is the best pattern for most use cases in the Kafka world:

Do not use the request-response concept with Kafka if you do not really have to! Kafka was built for streaming data and true decoupling between various producers and consumers.

This is true even for transactional workloads. A transaction does NOT need synchronous communication. The Kafka API supports mission-critical transactions (although it is asynchronous and distributed by nature). Think about making a bank payment. It is never synchronous, but a complex business process with many independent events within and across organizations.

Synchronous vs. Asynchronous Request-Response with Apache Kafka

After I explained that request-response should not be the first idea when building a new Kafka application, it does not mean it is not possible. And sometimes, it is the better, simpler, or faster approach to solve a problem. Hence, let’s look at examples of synchronous and asynchronous request-response implementations with Kafka.

The request-reply pattern can be implemented with Kafka. But differently. Trying to do it like in a JMS message broker (with temporary queues etc.) will ultimately kill the Kafka cluster (because it works differently). Nevertheless, the used concepts are the same under the hood as in the JMS API, like a correlation ID.

Asynchronous Request-Response with Apache Kafka

The Spring project and its Kafka Spring Boot Kafka Template libraries have a great example of the asynchronous request-reply pattern built with Kafka.

Check out “org.springframework.kafka.requestreply.ReplyingKafkaTemplate“. It creates request/reply applications using the Kafka API easily. The example is interesting since it implements the asynchronous request/reply, which is more complicated to write if you are using, for example, JMS API).

What’s great about Spring is the availability of easy-to-use templates and method signatures. The framework enables using design patterns without custom code to implement them. For instance, here are the two Java methods for request-reply with Kafka:

RequestReplyFuture sendAndReceive(ProducerRecord record);

RequestReplyFuture sendAndReceive(ProducerRecord record, Duration replyTimeout);

The result is a ListenableFuture that is asynchronously populated with the result (or an exception, for a timeout). The result also has a sendFuture property, which is the result of calling KafkaTemplate.send(). You can use this future to determine the result of the send operation.

Synchronous Request-Response with Apache Kafka

Another excellent DZone article talks about synchronous request/reply using Spring Kafka templates. The example shows a Kafka service to calculate the sum of two numbers with synchronous request-response behavior to return the result:

Spring automatically sets a correlation ID in the producer record. This correlation ID is returned as-is by the @SendTo annotation at the consumer end.

Check out the DZone post for the complete code example.

The Spring documentation for Kafka Templates has a lot of details and code examples about the Request/Reply pattern for Kafka. Using Spring, the request/reply pattern is pretty simple to implement with Kafka. If you are not using Spring, you can learn how to do request-reply with Kafka in your framework. That’s the beauty of open-source…

Combination of Data Streaming and REST API

The above examples showed how you can implement the request-response pattern with Apache Kafka. Nevertheless, it is still only the second-best approach and is often an anti-pattern for streaming data.

Data streaming and request-response REST APIs are often combined to get the best out of both worlds. I wrote a dedicated blog post about “Use Cases and Architectures for HTTP and REST APIs with Apache Kafka“.

Apache Kafka and API Management

A very common approach is to implement applications in real-time at scale with the Kafka ecosystem, but then put an API Management layer on top to expose the events as API to the outside world (either another internal business domain or a B2B 3rd party application).

Here is an example of connecting SAP data. SAP has tens of options for integrating with Kafka, including Kafka Connect connectors, REST / HTTP, proprietary API, or 3rd party middleware.

No matter how you get data into the streaming data hub, on the right side, the Kafka REST API is used to expose events via HTTP. An API Management solution handles the security and monetization/billing requirements on top of the Kafka interface:

Read more about this discussion in the blog post “Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?“. It covers the relation between Apache Kafka and API Management platforms like Kong, MuleSoft Anypoint, or Google’s Apigee.

After discussing the relationship between APIs, request-response communication, and Kafka, let’s explore a significant trend in the market: Data Mesh (the buzzword) and stream exchange for real-time data sharing (the problem solver).

Data Mesh is a new architecture paradigm that gets a lot of buzz these days. No single technology is the perfect fit to build a Data Mesh. An open and scalable decentralized real-time platform like Apache Kafka is often the heart of the Data Mesh infrastructure, complemented by many other data platforms to solve business problems.

Stream-native data sharing instead of using request-response and REST APIs in the middle is the natural evolution for many use cases:

Learn more in the post “Streaming Data Exchange with Kafka and a Data Mesh in Motion“.

Use Data Streaming and Request-Response Together!

Most architectures need request-response for point-to-point communication (e.g., between a server and mobile app) and data streaming for continuous data processing.

Synchronous and asynchronous request-response communication can be implemented with Apache Kafka. However, CQRS and event sourcing is the better and more natural approach for data streaming most times. Understand the different options and their trade-offs, and use the right tool (in this case, the correct design pattern) for the job.

What is your strategy for using request-response with data streaming? How do you implement the communication in your Apache Kafka applications? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post When to use Request-Response with Apache Kafka? appeared first on Kai Waehner.

Streaming ETL with Apache Kafka in the Healthcare Industry

Kai Waehner — Fri, 01 Apr 2022 05:47:00 +0000

IT modernization and innovative new technologies change the healthcare industry significantly. This blog series explores how data streaming with Apache Kafka enables real-time data processing and business process automation. Real-world examples show how traditional enterprises and startups increase efficiency, reduce cost, and improve the human experience across the healthcare value chain, including pharma, insurance, providers, retail, and manufacturing. This is part three: Streaming ETL. Examples include Babylon Health and Bayer.

Blog Series – Kafka in Healthcare

Many healthcare companies leverage Kafka today. Use cases exist in every domain across the healthcare value chain. Most companies deploy data streaming in different business domains. Use cases often overlap. I tried to categorize a few real-world deployments into different technical scenarios and added a few real-world examples:

Overview – Data Streaming Use Cases and Architectures for Healthcare (including Slide Deck)
Legacy Modernization and Hybrid Cloud (Optum / UnitedHealth Group, Centene, Bayer)
THIS POST: Streaming ETL (Bayer, Babylon Health)
Real-time Analytics (Cerner, Celmatix, CDC/Centers for Disease Control and Prevention)
Machine Learning and Data Science (Recursion, Humana)
Open API and Omnichannel (Care.com, Invitae)

Stay tuned for a dedicated blog post for each of these topics as part of this blog series. I will link the blogs here as soon as they are available (in the next few weeks). Subscribe to my newsletter to get an email after each publication (no spam or ads).

Streaming ETL with Apache Kafka

Streaming ETL is similar to concepts you might know from traditional ETL tools. I have already explored how data streaming with Kafka differs from data integration tools and iPaaS cloud services. The critical difference is that you leverage a single platform for data integration and processing at scale in real-time. There is no need to combine several platforms to achieve this. The result is a Kappa architecture that enables real-time but also batch workloads with a single integration architecture.

Streaming ETL with Kafka combines different components and features:

Kafka Connect as Kafka-native integration framework
Kafka Connect source and sink connectors to consume and produce data from/to any other database, application, or API
Single Message Transform (SMT) – an optional Kafka Connect feature – to process (filter, change, remove, etc.) incoming or outgoing messages within the connector deployment
Kafka Streams or ksqlDB for continuous data processing in real-time at scale for stateless or stateful ETL jobs
Data governance via schema management, enforcement and versioning using the Schema Registry
Security and access control using features like role-based access control, audit logs, and end-to-end encryption

In the cloud, you can leverage a serverless Kafka offering for the whole Streaming ETL pipeline. Confluent Cloud fully manages Kafka’s end-to-end infrastructure, including connectors, ksqlDB workloads, data governance, and security.

One last general note: Don’t Design for Data at Rest to Reverse it! Learn more here: “When to Use Reverse ETL and when it is an Anti-Pattern“. Instead, use real-time Streaming ETL for Data in Motion and the Kappa architecture from scratch.

Let’s look at a few real-world deployments in the healthcare sector.

Babylon Health – PII and GDRP compliant Security

Babylon Health is a digital-first health service provider and value-based care company that combines an artificial intelligence-powered platform with virtual clinical operations for patients. Patients are connected with health care professionals through its web and mobile application.

Babylon’s mission is to put an accessible and affordable health service in the hands of every person on earth. For that mission, Babylon built an agile microservice architecture with the Kafka ecosystem:

Here are the “wonders of working” in Healthcare for Babylon (= reasons to choose Kafka):

Real-time data processing
Replayability of historical information
Order matters and is ensured with guaranteed ordering
GDPR and data ownership for PII compliant security
Data governance via the schema registry to provide true decoupling and access via many programming languages like Java, Python, and Ruby

Bayer – Data Integration and Processing in R&D

Bayer AG is a German multinational pharmaceutical and life sciences company and one of the largest pharmaceutical companies in the world. They leverage Kafka in various use cases and business domains.

The following scenario is from the research and development department of the pharma business unit. Their focus areas are cardiovascular diseases, oncology, and women’s health. The division employs over 7,500 R&D people and expenses over 2.75 billion euros for R&D.

The use case Bayer presented at a recent Kafka Summit is about analyzing clinical trials, patents, reports, news, and literature leveraging the Kafka ecosystem. The R&D team processes 250 Million documents from 30+ individual data sources. The data includes 7 TB of raw text-rich data with daily updates, additions, and deletions. Algorithms and data evolve. Bayer needs to completely reprocess the data regularly. Various document streams with different formats and schemas flow through several text processing and enrichment steps.

Scalable, reliable Kafka pipelines with Kafka Streams (Java) and Faust (Python) replaced custom, error-prone, non-scalable scripts. Schemas are used as the data interface to ensure data governance. Avro is the first-class citizen data format to enable compression and better throughput.

The true decoupling of Kafka in conjunction with the Schema Registry guarantees interoperability among different components and technologies (java, python, commercial tools, open-source, scientific, proprietary).

Streaming ETL with Kafka for Real-Time Data Integration at any Scale

Think about IoT sensor analytics, cybersecurity, patient communication, insurance, research, and many other domains. Real-time data beats slow data in the healthcare supply chain almost everywhere.

This blog post explored the capabilities of the Apache Kafka Ecosystem for Streaming ETL. Real-world deployments from Babylon Health and Bayer showed how enterprises successfully deploy Kafka for different enterprise architecture use cases.

How do you leverage data streaming with Apache Kafka in the healthcare industry? What architecture does your platform use? Which products do you combine with data streaming? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Streaming ETL with Apache Kafka in the Healthcare Industry appeared first on Kai Waehner.

When to use Apache Camel vs. Apache Kafka?

Kai Waehner — Fri, 28 Jan 2022 06:31:02 +0000

Should I use Apache Camel or Apache Kafka for my next integration project? The question is very valid and comes up regularly. This blog post explores both open-source frameworks and explains the difference between application integration and event streaming. The comparison discusses when to use Kafka or Camel, when to combine them, when not to use them at all. A decision tree shows how you can quickly qualify out one for the other.

The history of application integration and event streaming

My personal history and experience in application integration and event streaming are the following. It shows my background and how I see the integration and data streaming markets.

A discussion that started over a decade ago…

With my background of work in the last decade at Talend, TIBCO, and Confluent, the comparison between Camel and Kafka is very exciting as I have spent a lot of time with both open-source frameworks:

Apache Camel powered Talend ESB. Talend had a visual coding tool to design Camel routes with code generation. Unfortunately, the tool’s primary focus was Talend Data Integration (ETL and batch). The Camel-powered ESB code was integrated, but it was neither perfect nor complete.

TIBCO BusinessWorks competed with Talend ESB while TIBCO StreamBase competed with other stream processing solutions. The Kafka ecosystem came up more and more in conversations with customers.

I posted about “When to use Apache Camel” in 2011 already. In 2012, I did my first talk at an international software conference in the US. The name of the conference? CamelOne! A forum only about Apache Camel. What an exciting time. Claus Ibsen, THE Camel guy, wrote an excellent summary of CamelOne 2012 in Boston.

In my conference summary, I talked about my two talks. One of them covered a comparison between Apache Camel, Spring Integration, and Mulesoft ESB. The presentation has over 35000 views, and the number still goes up today.

… from application integration to event streaming

Over time, the buzzword “big data” came up more and more. I spent some time at Talend and TIBCO to learn new programming concepts such as Map-Reduce and Shuffling, mainly powered by Apache Hadoop and Apache Spark. The big data ecosystem snowballed with tens of frameworks such as Hive, HBase, Pig, and many more.

However, the first people realized that real-time data beats slow data in almost all use cases. The Lambda architecture was invented to separate real-time workloads from batch workloads. Event Streaming was born. Apache Kafka became the de facto standard for data streaming. Like CamelOne a decade ago, Kafka Summit is the one-stop-show for Kafka use cases, architectures, and success stories. Contrary to the small CamelOne, Kafka Summit is a global event with events across the globe, plus online events.

Data in motion with the Kappa architecture replacing Lambda

In 2014, a guy called Jay Kreps (few people knew him) was already questioning the Lambda architecture. Instead, he proposed to provide a single real-time layer to provide data for real-time and batch consumers. The Kappa architecture was born. Today, the Kappa architecture is mainstream, replacing Lambda. Various vendors adopt Kafka in the meantime.

Confluent became the clear leader in the event streaming software category. Confluent Platform is powered by Apache Kafka. The focus is on event streaming. That’s different from most other vendors like Cloudera; they focus on 10-20 frameworks or products and try to combine and integrate them somehow. Today, Confluent Cloud is a complete game-changer providing Apache Kafka and its ecosystem for application integration and stream processing as a serverless cloud offering.

This is where we are today in 2022. Application integration (= Camel) and event streaming (= Kafka) play a critical role in every modern enterprise architecture. Open-source is widely adopted and usually preferred compared to proprietary solutions for various reasons, including avoiding vendor lock-in. That’s true for self-managed and serverless cloud offerings.

Hence, the question arises: Should I use Apache Camel for application integration or Apache Kafka for event streaming? Or both? Or does one solve the other, too? These questions will be answered in the following sections, concluding with a decision tree to help you make the right choice for your project.

Let’s look at the similarities between Camel and Kafka, when to use which framework, when and how to combine them, and when not to use them at all.

Features in Apache Camel AND Apache Kafka

Camel and Kafka have many positive and negative characteristics in common. Hence, it is no surprise that people compare the two frameworks:

Open source under Apache 2.0 license
Vibrant community and adoption in the industry
Mature framework with deployments in enterprises across the globe
Fixing point-to-point spaghetti architectures with a central integration backbone
Open architecture and extensibility with custom functions and connectors
Small and big deployments possible, plus single-node deployments for non-mission-critical use cases
Re-engineered and optimized for cloud-native deployments (container, Kubernetes, cloud)
Connectivity to any technology, API, communication paradigm, and SaaS
Transformation of any data types and formats
Processes transactional and analytical workloads
Domain-specific language (DSL) for message at a time processing, with similar logic such as aggregation, filtering, conditional processing
Relative complex frameworks because of their robust feature set, hence not suitable for solving a minor problem
Not a replacement of a database, data warehouse, or data lake

Beyond the similarities, Kafka and Camel have very different sweet spots built to solve distinct problems. Hence, comparing these two tools is a bit comparison of apples and oranges. Some minor projects might use one or the other to solve the problem, but critical enterprise projects show the differences more quickly.

When to use Apache Camel?

The mission of Camel

Apache Camel is an integration framework. It solves a particular problem: Data integration between different applications, APIs, protocols, and communication paradigms. This concept is often called application integration or enterprise integration. Camel implements the famous Enterprise Integration Patterns (EIP). EIPs are based on messaging principles.

Camel’s strengths

Event-based backbone based on well-known and adopted EIP concepts
Connectivity to almost any API
Integration, processing, and routing of information with an intuitive domain-specific language (DSL) with a focus on integration; providing the ability of composability in a programming context for finer grain control in code for doing conditional logic or transformations/reformatting
Powerful routing capabilities with many built-in EIPs
Many deployment options (standalone, web container, application server, Spring, OSGi, Kubernetes via the Camel K sub-project) – okay, I guess some options are not relevant in this decade anymore
Lightweight alternative to proprietary ETL and ESB tools

Camel’s weaknesses

Only a “routing machine”, i.e., not built for long-term storage (additional cache or storage needed), for that reason, Camel is not the right choice for a central nervous system like Kafka
No stream processing (like you know it from Kafka Streams or Apache Flink)
Limited scalability, not built for massive volumes of data
No powerful visual coding like you know it from proprietary ETL/ESB/iPaaS tools
No serverless cloud offering, with that also not competing with other iPaaS offerings
Red Hat is the only vendor supporting it
Built to be deployed in a single data center or cloud region, not across hybrid or multi-cloud scenarios

The evolution of Apache Camel

Camel is widely adopted and has a strong community. Unfortunately, from a vendor and support perspective, the offerings declined in the last few years. One of the most significant pain points: I still don’t see a serverless cloud offering anywhere today:

Camel TL;DR

Camel is an application integration framework to connect different applications and interfaces. Camel is NOT built for processing data in motion continuously, i.e., stream processing. Hence, it should be compared to ETL and ESB tools, not data streaming technologies like Kafka, Kinesis, or Flink. If you look for a serverless cloud offering, you are out of luck. If you look for vendor support, Red Hat is the only option.

When to use Apache Kafka?

The mission of Kafka

Real-time data beats slow data at any scale. The event streaming platform enables processing data in motion. Kafka is the de facto standard for event streaming, including messaging, data integration, stream processing, and storage. Kafka provides all capabilities in one infrastructure at scale. It is reliable and allows to process analytics and transactional workloads.

Kafka’s strengths

Event-based streaming platform
A unique combination of pub/sub messaging, data processing, data integration, and storage in a single framework
Built for massive volumes of data and extreme scale from the beginning, with that a single framework can be used for transactional (low volume) and analytics (high volume) use cases
True decoupling between producers and consumers because of its storage component makes it the de facto standard for microservice architectures
Guaranteed ordering of events in the distributed commit log
Distributed data processing with fault-tolerance and recoverability built-in
Replayability of events
The de facto standard for event streaming
Built with hybrid and multi-cloud data replication in mind (with included tools like MirrorMaker and separate, more advanced, and more straightforward tools like Confluent Cluster Linking)
Support from many vendors, including Confluent, Cloudera, IBM, Red Hat, Amazon, Microsoft, and many more
Paradigm shift: Built to process data in motion end-to-end from source to one or more sinks

Kafka’s weaknesses

Paradigm shift: Enterprises need to learn and understand the added value of event streaming, a new software category that enables new use cases but also requires different design patterns and operations approaches
No powerful visual coding like you know it from proprietary ETL/ESB/iPaaS tools
Limited out-of-the-box routing capabilities (Kafka Connect SMT or Kafka Streams / ksqlDB app do the job very well, but not as simple as Camel)
Complex operations (if you run it by yourself instead of using 3rd party tools or even better a serverless cloud offering)

The evolution of Apache Kafka

Kafka was built at LinkedIn to process high volumes of data, as no other open-source framework could do this. Kafka found quick adoption after LinkedIn open-sourced it. Several vendors adopted Kafka and added it to their product portfolio. Some vendors just added Kafka for the sake of having it. Others innovated and used additional tools to make Kafka cloud-native for the next generation of event streaming. Kafka as a serverless cloud offering is a critical piece of many modern enterprise architectures today:

Kafka TL;DR

Kafka is an event streaming platform to process data in motion continuously. If you “just” need an integration framework to route data from a source to one or more sinks (= ETL / ESB), then Camel can be used, too. However, Kafka kills two birds with one stone (= integrating data AND processing it in motion where needed).

Plenty of Kafka offerings are available on the market. Check out the Apache Kafka landscape and comparison to understand the differences between offerings from Confluent, Cloudera, IBM, Red Hat, Amazon, Microsoft, and others.

Decision tree – Camel or Kafka?

The above sections explored when to use Camel and Kafka. So far, so good. Nevertheless, both frameworks overlap with their capabilities. Let’s get some help to choose the right one in that case.

Qualify out – the easiest way to start an evaluation!

The easiest way to decide on a specific option is to qualify out other frameworks that cannot fulfill the requirements.

Therefore, do you need

Big data processing?
A storage component for true decoupling and replayability of events?
Stateless or stateful stream processing?
A serverless cloud offering?

The above section discussed these differentiators of Kafka. In all these cases, you can qualify out Camel. It does not fulfill these requirements. These requirements are not necessarily a complete list. And you might also find a few aspects to qualify out Kafka from the beginning. Hence, you could also start from the Camel perspective and ask yourself: When should I not use Kafka. But I think it is easier the other way round.

Qualifying out solutions because of their limitations makes the decision tree and evaluation process much easier from the beginning.

Decision Tree for Camel and Kafka

Here is my decision tree to find out if Camel or Kafka is the right choice and what vendors you could evaluate:

When to use Camel and Kafka together?

It is possible to use Camel and Kafka together in a single integration architecture. Should you do that? Two options exist. One makes more sense than the other:

Kafka for event streaming and Camel for ETL

Camel and Kafka integrate well with each other. The native Kafka component of Camel is the best native integration point as a bridge between both environments:

The above architecture shows how Camel and Kafka live next to each other. Camel is used in a business domain for application integration. Kafka is the central nervous system between the Camel integration application and many other applications. I also added Kong as API Gateway to clarify that Camel or Kafka is not a silver bullet to solve every problem.

Once again, the vast advantage of Kafka as central integration layer is its unique combination of characteristics within a single infrastructure, including:

Real-time messaging at any scale
Storage for true decoupling between different applications and communication paradigms
Built-in backpressure handling and replayability of events
Data integration
Stream processing

Real-time data replication across hybrid and multi-cloud is not shown in the above picture but is also part of the enterprise architecture out-of-the-box leveraging take Kafka protocol.

With true decoupling within modern microservice architecture, each business team can decide whether they need application integration (using Camel) or event streaming (using Kafka). Often, both could be used. Additional questions around single vs. multi frameworks and APIs, vendor support, scalability needs, and other characteristics need to be evaluated to make the right choice for your business problem.

Camel connectors embedded into Kafka Connect

There is another way to combine Kafka and Camel: The “Camel Kafka Connector” sub-project of Apache Camel. Don’t get confused. This feature is not the Kafka component (= connector) of Camel! Instead, it is a relatively new initiative to deploy camel components into the Kafka Connect infrastructure.

The obvious benefit: This way, you get hundreds of new connectors “for free” within the Kafka ecosystem. This capability sounds excellent. And it is!

However, consider the total cost of ownership and the overall efforts using this approach. Application integration is one of the most challenging problems in computer science – especially if you talk about transactional data sets that require zero data loss, exactly-once semantics, and no downtime. The more components you combine in the end-to-end data flow, the harder it gets to keep your performance and reliability SLAs.

Hence, using Camel components within Kafka Connect has a considerable disadvantage: Combining two frameworks with complexities and different design concepts. Just a few examples:

Kafka world: Partitions, Offsets, Leader and Follower, Key/Value/Header, connectors (based on Kafka Connect), Bootstrap Server, ConsumerRecord, Retention Time, etc.
Camel world: Routes, RouteBuilder CamelContext, Exchange, Processor, components (Camel connectors), Endpoints, Type Converters, Registry, etc.

Please think twice before mixing two integration tools that are powerful but complex on their own. Getting this running is just one piece of the puzzle (the simple part). Don’t forget end-to-end testing, resiliency, SLAs, support across technologies and APIs. Even buying support for Camel and Kafka from Red Hat (i.e., a single vendor) does not improve this approach.

It is likely better to take the business logic and API calls out of the Camel component and copy it into a Kafka Connect connector template to run the integration natively with only Kafka code. This workaround allows a clean architecture, end-to-end integration with a single framework, a single vendor behind it, and much easier testing / debugging / monitoring.

TL;DR: I recommend only using the “Camel Kafka Connector” sub-project if the following options do not work:

Use only Apache Camel for application integration
Leverage Apache Kafka for event streaming and application integration
Choose separate deployments of Camel and Kafka and use the Camel-Kafka-Bridge

When NOT to use Camel or Kafka at all?

Once again, the easiest way for your evaluation to start is qualifying out tools that do not work to solve the problem.

Both Camel and Kafka are NOT built for the following scenarios:

A proxy for millions of clients (like mobile apps) – but native proxies (like a REST or MQTT Proxy for Kafka) exist for some use cases.
An API Management platform – but these tools are usually complementary and used to create life cycle management or monetize APIs deployed with Camel or Kafka.
A database for complex queries and batch analytics workloads
an IoT platform with features such as device management – but direct native integration with (some) IoT protocols such as MQTT or OPC-UA is possible and the approach for (some) use cases.
A technology for hard real-time applications such as safety-critical or deterministic systems – but that’s true for any other IT framework, too. Embedded systems are a different software than Camel or Kafka!

I wrote a very detailed post about this topic from a Kafka perspective. It maps almost 1:1 to the Camel world, too (and any related technology such as Flink, Spark, Pulsar, etc.): “When NOT to use Apache Kafka?”

Apache Camel vs. Apache Kafka – Who is the winner?

Simple answer: Both!

When you compare apples and oranges, you might become happy when you are hungry as both are good to eat. The same is true for Camel and Kafka. Both can do application integration. But they serve very different needs.

Many integration scenarios can use Camel or Kafka.

Camel is the right tool if you need to integrate data within an application context or business unit (with no need for stream processing, true decoupling, replayability, large scale, replication across data centers or cloud regions).

Kafka is the central event-based nervous system across business units, regions, and hybrid clouds. Kafka is all about event streaming. Application integration is just a piece of this puzzle. On the other side, I have seen plenty of integration projects powered by Apache Kafka. It is often replacing other middleware. That’s true for ETL/ESB legacy modernization and in discussions about using a cloud-native iPaaS.

Do you use Camel or Kafka today? What use cases? How do you decide which one to choose? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post When to use Apache Camel vs. Apache Kafka? appeared first on Kai Waehner.

Apache Kafka in the Insurance Industry

Kai Waehner — Mon, 07 Jun 2021 12:54:20 +0000

The rise of data in motion in the insurance industry is visible across all lines of business, including life, healthcare, travel, vehicle, and others. Apache Kafka changes how enterprises rethink data. This blog post explores use cases and architectures for event streaming. Real-world examples from Generali, Centene, Humana, and Tesla show innovative insurance-related data integration and stream processing in real-time.

Digital Transformation in the Insurance Industry

Most insurance companies have similar challenges:

Challenging market environments
Stagnating economy
Regulatory pressure
Changes in customer expectations
Proprietary and monolithic legacy applications
Emerging competition from innovative insurtechs
Emerging competition from other verticals that add insurance products

Only a good transformation strategy guarantees a successful future for traditional insurance companies. Nobody wants to become the next Nokia (mobile phone), Kodak (photo camera), or BlockBuster (video rental). If you fail to innovate in time, you are done.

Real-time beats slow data. Automation beats manual processes. The combination of these two game changers creates completely new business models in the insurance industry. Some examples:

Claims processing including review, investigation, adjustment, remittance or denial of the claim
Claim fraud detection by leveraging analytic models trained with historical data
Omnichannel customer interactions including a self-service portal and automated tools like NLP-powered chatbots
Risk prediction based on lab testing, biometric data, claims data, patient-generated health data (depending on the laws of a specific country)

These are just a few examples.

The shift to real-time data processing and automation is key for many other use cases, too. Machine learning and deep learning enable the automation of many manual and error-prone processes like document and text processing.

The Need for Brownfield Integration

Traditional insurance companies usually (have to) start with brownfield integration before building new use cases. The integration of legacy systems with modern application infrastructures and the replication between data centers and public or private cloud-native infrastructures are a key piece of the puzzle.

Common integration scenarios use traditional middleware that is already in place. This includes MQ, ETL, ESB, and API tools. Kafka is complementary to these middleware tools:

More details about this topic are available in the following two posts:

Greenfield Applications at Insurtech Companies

Insurtechs have a huge advantage: They can start greenfield. There is no need to integrate with legacy applications and monolithic architectures. Hence, some traditional insurance companies go the same way. They start from scratch with new applications instead of trying to integrate old and new systems.

This setup has a huge architectural advantage: There is no need for traditional middleware as only modern protocols and APIs need to be integrated. No monolithic and proprietary interfaces such as Cobol, EDI, or SAP BAPI/iDoc exist in this scenario. Kafka makes new applications agile, scalable, and flexible with open interfaces and real-time capabilities.

Here is an example of an event streaming architecture for claim processing and fraud detection with the Kafka ecosystem:

Real-World Deployments of Kafka in the Insurance Industry

Let’s take a look at a few examples of real-world deployments of Kafka in the insurance industry.

Generali – Kafka as Integration Platform

Generali is one of the top ten largest insurance companies in the world. The digital transformation from Generali Switzerland started with Confluent as a strategic integration platform. They started their journey by integrating with hundreds of legacy systems like relational databases. Change Data Capture (CDC) pushes changes into Kafka in real-time. Kafka is the central nervous system and integration platform for the data. Other real-time and batch applications consume the events.

From here, other applications consume the data for further processing. All applications are decoupled from each other. This is one of the unique benefits of Kafka compared to other messaging systems. Real decoupling and domain-driven design (DDD) are not possible with traditional MQ systems or SOAP / REST web services.

Design Principles of Generali’s Cloud-Native Architecture

The key design principles for the next-generation platform at Generali include agility, scalability, cloud-native, governance, data, and event processing. Hence, Generali’s architecture is powered by a cloud-native infrastructure leveraging Kubernetes and Apache Kafka:

The following integration flow shows the scalable microservice architecture of Generali. The streaming ETL process includes data integration and data processing decoupled environments:

Centene – Integration and Data Processing at Scale in Real-Time

Centene is the largest Medicaid and Medicare Managed Care Provider in the US. Their mission statement is “transforming the health of the community, one person at a time”. The healthcare insurer acts as an intermediary for both government-sponsored and privately insured health care programs.

Centene’s key challenge is growth. Many mergers and acquisitions require a scalable and reliable data integration platform. Centene chose Kafka due to the following capabilities:

highly scalable
high autonomy and decoupling
high availability and data resiliency
real-time data transfer
complex stream processing

Centene’s architecture uses Kafka for data integration and orchestration. Legacy databases, MongoDB, and other applications and APIs leverage the data in real-time, batch, and request-response:

Swiss Mobiliar – Decoupling and Orchestration

Swiss Mobiliar (Schweizerische Mobiliar aka Die Mobiliar) is is the oldest private insurer in Switzerland.

Event Streaming with Kafka supports various use cases at Swiss Mobiliar:

Orchestrator application to track the state of a billing process
Kafka as database and Kafka Streams for data processing
Complex stateful aggregations across contracts and re-calculations
Continuous monitoring in real-time

Their architecture shows the decoupling of applications and orchestration of events:

Also, check out the on-demand webinar with Mobiliar and Spoud to learn more about their Kafka usage.

Humana – Real-Time Integration and Analytics

Humana Inc. is a for-profit American health insurance. In 2020, the company ranked 52 on the Fortune 500 list.

Humana leverages Kafka for real-time integration and analytics. They built an interoperability platform to transition from an insurance company with elements of health to truly a health company with elements of insurance.

Here are the key characteristics of their Kafka-based platform:

Consumer-centric
Health plan agnostic
Provider agnostic
Cloud resilient and elastic
Event-driven and real-time

Kafka integrates conversations between the users and the AI platform powered by IBM Watson. The platform captures conversational flows and processes them with natural language processing (NLP) – a deep learning concept.

Some benefits of the platform:

Adoption of open standards
Standardized integration partners
In-workflow integration
Event-driven for real-time patient interactions
Highly scalable

freeyou – Stateful Streaming Analytics

freeyou is an insurtech for vehicle insurance. Streaming analytics for real-time price adjustments powered by Kafka and ksqlDB enable new business models. Their marketing slogan shows how they innovate and differentiate from traditional competitors:

“We make insurance simple. With our car insurance, we make sure that you stay mobile in everyday life – always and everywhere. You can take out the policy online in just a few minutes and manage it easily in your freeyou customer account. And if something should happen to your vehicle, we’ll take care of it quickly and easily.”

A key piece of freeyou’s strategy is a great user experience and automatic price adjustments in real-time in the backend. Obviously, Kafka and its stream processing ecosystem are a perfect fit here.

As discussed above, the huge advantage of an insurtech is the possibility to start from the greenfield. No surprise that freeyou’s architectures leverage cutting-edge design and technology. Kafka and KQL enable streaming analytics within the pricing engine, recalculation modules, and other applications:

Tesla – Carmaker and Utility Company, now also Car Insurer

Everybody knows: Tesla is an automotive company that sells cars, maintenance, and software upgrades.

More and more people know: Tesla is a utility company that sells energy infrastructure, solar panels, and smart home integration.

Almost nobody knows: Tesla is a car insurer for their car fleet (limited to a few regions in the early phase). That is the obvious next step if you already collect all the telemetry data from all your cars on the street.

Tesla has built a Kafka-based data platform infrastructure “to support millions of devices and trillions of data points per day”. Tesla showed an interesting history and evolution of their Kafka usage at a Kafka Summit in 2019:

Tesla’s infrastructure heavily relies on Kafka.

There is no public information about Telsa using Kafka for their specific insurance applications. But at a minimum, the data collection from the cars and parts of the data processing relies on Kafka. Hence, I thought this is a great example to think about innovation in car insurance.

Tesla: “Much Better Feedback Loop”

Elon Musk made clear: “We have a much better feedback loop” instead of being statistical like other insurers. This is a key differentiator!

There is no doubt that many vehicle insurers will use fleet data to calculate insurance quotes and provide better insurance services. For sure, some traditional insurers will partner with vehicle manufacturers and fleet providers. This is similar to smart city development, where several enterprises partner to build new innovative use cases.

Connected vehicles and V2X (Vehicle to X) integrations are the starting point for many new business models. No surprise: Kafka plays a key role in the connected vehicles space (not just for Tesla).

Many benefits are created by a real-time integration pipeline:

Shift from human experts to automation driven by big data and machine learning
Real-time telematics data from all its drivers’ behavior and the performance of its vehicle technology (cameras, sensors, …)
Better risk estimation of accidents and repair costs of vehicles
Evaluation of risk reduction through new technologies (autopilot, stability control, anti-theft systems, bullet-resistant steel)

For these reasons, event streaming should be a strategic component of any next-generation insurance platform.

Slide Deck: Kafka in the Insurance Industry

The following slide deck covers the above discussion in more detail:

Kafka Changes How to Think Insurance

Apache Kafka changes how enterprises rethink data in the insurance industry. This includes brownfield data integration scenarios and greenfield cutting-edge applications. The success stories from traditional insurance companies such as Generali and insurtechs such as freeyou prove that Kafka is the right choice everywhere.

Kafka and its ecosystem enable data processing at scale in real-time. Real decoupling allows the integration between monolith legacy systems and modern cloud-native infrastructure. Kafka runs everywhere, from edge deployments to multi-cloud scenarios.

What are your experiences and plans for low latency use cases? What use case and architecture did you implement? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Apache Kafka in the Insurance Industry appeared first on Kai Waehner.

Apache Kafka and MQTT (Part 4 of 5) – Mobility Services and Transportation

Kai Waehner — Thu, 25 Mar 2021 10:48:00 +0000

Apache Kafka and MQTT are a perfect combination for many IoT use cases. This blog series covers the pros and cons of both technologies. Various use cases across industries, including connected vehicles, manufacturing, mobility services, and smart city are explored. The examples use different architectures, including lightweight edge scenarios, hybrid integrations, and serverless cloud solutions. This post is part four: Mobility Services and Transportation.

Apache Kafka + MQTT Blog Series

The first blog post explores the relation between MQTT and Apache Kafka. Afterward, the other four blog posts discuss various use cases, architectures, and reference deployments.

Part 1 – Overview: Relation between Kafka and MQTT, pros and cons, architectures
Part 2 – Connected Vehicles: MQTT and Kafka in a private cloud on Kubernetes; use case: remote control and command of a car
Part 3 – Manufacturing: MQTT and Kafka at the edge in a smart factory; use case: Bidirectional OT-IT integration with Sparkplug between PLCs, IoT Gateways, Data Historian, MES, ERP, Data Lake, etc.
Part 4 – Mobility Services (THIS POST): MQTT and Kafka leveraging serverless cloud infrastructure; use case: Traffic jam prediction service using machine learning
Part 5 – Smart City: MQTT at the edge connected to fully-managed Kafka in the public cloud; use case: Intelligent traffic routing by combining and correlating 3rd party services

Subscribe to my newsletter to get updates immediately after the publication. Besides, I will also update the above list with direct links to this blog series’s posts as soon as published.

Use Case: Mobility as a Service (MaaS) and Transportation

Transportation is changing significantly these days. Mobility Services – often called Mobility-as-a-Service (MaaS) – is a type of service that through a joint digital channel enables users to plan, book, and pay for multiple types of mobility services.

The concept describes a shift away from personally-owned modes of transportation and towards mobility provided as a service. This is enabled by combining transportation services from public and private transportation providers through a unified gateway that creates and manages the trip, which users can pay for with a single account. Users can pay per trip or a monthly fee for a limited distance. The key concept behind MaaS is to offer travelers mobility solutions based on their travel needs. Specialist urban mobility applications are also expanding their offerings to enable MaaS, such as Transit, Uber, and Lyft.

Travel planning typically begins in a journey planner. For example, a trip planner can show that the user can get from one destination to another by using a train/bus combination. The user can then choose their preferred trip based on cost, time, and convenience. At that point, any necessary bookings (e.g. calling a taxi, reserving a seat on a long-distance train) would be performed as a unit. It is expected that this service should allow roaming, that is, the same end-user app should work in different cities, without the user needing to become familiar with a new app or to sign up for new services.

As you can hopefully already imagine, plenty of innovative new use cases are possible by combining Apache Kafka and MQTT for MaaS. And most of these scenarios require data integration and data processing at scale in real-time.

Architecture: MQTT and Kafka for Mobility Services

Mobility services are often separated from other core IT infrastructure. MaaS – as the term says – is just consumed as a service. Hence, most mobility services I have seen run in the cloud. The following diagram shows an intelligent navigation service built with MQTT, Kafka, and Machine Learning:

A few notes on the architecture:

As mobility services connect to moving vehicles, smartphones, or other things, the cloud is perfect. No need to operate the infrastructure. Just focus on building applications.
Many mobility services integrate other 1st or 3rd party services. For instance, there is no need to build yet another mapping service. If you need one for building your innovative new application, just embed HERE Technologies (that actually provides a public Kafka interface as the preferred integration option instead of HTTP!) or any other available mapping service.
Regional services with low latency are often very relevant for mobility services. Hence, multiple MQTT and Kafka clusters are the norm, not an exception.

Let’s take a look at some real-world examples for cutting-edge mobility services in the transportation industry.

Example: Cloud Ecosystem for Next-Generation Mobility @ ZF

ZF Friedrichshafen AG is a global automotive supplier that enables vehicles to see, think, and act. With a broad range of systems for passenger cars, commercial vehicles, and industrial technology, ZF offers comprehensive solutions for established vehicle manufacturers as well as newly emerging transport and mobility service providers.

ZFs Connectivity Suite enables new business models for mobility as a service (MaaS) and transportation as a service (TaaS). The ProCV gateway device allows each vehicle to communicate using MQTT. The gateway provides a secure and reliable channel for transferring telemetry data from the car to the cloud and remote commands from the cloud to each vehicle.

Applications can exchange data such as real-time positioning information, remote commands to the vehicle, and vehicle-generated alerts. Some possible use cases:

Remote diagnostics for technical insight & management of vehicle performance
Fleet monitoring
Secure & reliable middleware between connected vehicles & cloud services

Read the case study from HiveMQ for more details about ZF’s IoT gateway.

Example: Real-Time Traveler Information @ Deutsche Bahn

Deutsche Bahn (the German railway) has a very complex network of short-distance and long-distance trains. Hence, delays and cancellations are common, not an exception. Hence, at least the traveler information should work well to send real-time notifications to customers.

For that reason, Deutsche Bahn has built a single source of truth traveler information platform with Confluent:

The mobility service integrates via Kafka with many legacy and modern applications. The mobile app shows real-time status updates about each train. While train delays and cancellations cannot be avoided completely, the app at least allows you to get to a lounge or grab a coffee if the delay is more than just a few minutes. I use the app every week myself and can confirm that the customer experience improved significantly.

Not every interface is or will be real-time. Kafka helps!

Fun fact: The first proof of concept to build this traveler information app used a traditional messaging queue. In theory, this is is sufficient as you “just” need to send status updates in real-time to the mobile app. Unfortunately, a few issues came up quickly:

Not every interface is real-time! In addition to messaging sources such as JMS or MQTT, the platform needed to integrate with databases, files, web services, and other legacy systems. Hence, data integration and is a key piece of the puzzle.
Data storage is important to handle backpressure and decouple applications. Slow consumers fall behind. Analytics workloads take data in batches, not in real-time. Web applications consume specific events via request-response queries.
Sending events from A to B is just part of the problem! The real added value comes by correlating the data from streaming and non-streaming applications and databases in real-time. Kafka-native Stream processing frameworks such as Kafka Streams or ksqlDB help to process data in motion.

For the above reasons, Deutsche Bahn re-started their proof of concept. Their existing project used different frameworks for messaging, caching, integration, and processing. This setup was replaced with Kafka. Kafka Connect integrates applications and databases. Kafka Streams processes the data in motion. The Kafka storage handles backpressure and slow consumers. All of this is built into Kafka out-of-the-box. And it scales much better. Today, the traveler information system is live and creates a much better customer experience.

Find more details about the traveler information system from Deutsche Bahn in their Confluent blog post.

Kafka + MQTT = Mobility Services and Transportation

In conclusion, Apache Kafka and MQTT are a perfect combination for mobility services and transportation. Follow the blog series to learn about use cases such as connected vehicles, manufacturing, mobility services, and smart city. Every blog post also includes real-world deployments from companies across industries. It is key to understand the different architectural options to make the right choice for your project.

What are your experiences and plans in IoT projects? What use case and architecture did you implement? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Apache Kafka and MQTT (Part 4 of 5) – Mobility Services and Transportation appeared first on Kai Waehner.

Apache Kafka for the Connected World – Vehicles, Factories, Cities, Digital Services

Kai Waehner — Mon, 01 Mar 2021 11:30:05 +0000

The digital transformation enables a connected world. People, vehicles, factories, cities, digital services, and other “things” communicate with each other in real-time to provide a safe environment, efficient processes, and a fantastic user experience. This scenario only works well with data processing in real-time at scale. This blog post shares a presentation that explains why Apache Kafka plays a key role in these industries and use cases but also to connect the different stakeholders.

Software is Changing and Connecting the World

Event Streaming with Apache Kafka plays a massive role in processing massive volumes of data in real-time in a reliable, scalable, and flexible way integrating with various legacy and modern data sources and sinks.

I want to give you an overview of existing use cases for event streaming technology in a connected world across supply chains, industries, and customer experiences that come along with these interdisciplinary data intersections:

The Automotive Industry (and it’s not only Connected Cars)
Mobility Services across verticals (transportation, logistics, travel industry, retailing, …)
Smart Cities (including citizen health services, communication infrastructure, …)
Technology Providers (including cloud hyperscaler, software vendors, telco infrastructure, …)

A Connected World with MQ, ETL, ESB, and Kafka

All these industries and sectors do not have new characteristics and requirements. They require data integration, data correlation, and real decoupling. The difference is the massively increased volumes of data.

Real-time messaging solutions have existed for many years. Hundreds of platforms exist for data integration (including ETL and ESB tooling or specific IIoT platforms). Proprietary monoliths monitor plants, telco networks, and other infrastructures for decades in real-time. But now, Kafka combines all the above characteristics in an open, scalable, and flexible infrastructure to operate mission-critical workloads at scale in real-time. And is taking over the world of connecting data.

“Apache vs. MQ/ETL/ESB” goes into more detail about this discussion.

Streaming Data Exchange with Apache Kafka

Before we jump into the presentation, I want to cover one key trend I see across industries: A streaming data exchange with Apache Kafka:

TL;DR: If you use event streaming with Kafka in your projects (for reasons like real-time processing, scalability, decoupling), and your partner does the same, well, then it does NOT make sense to put a REST / HTTP API in the middle. Instead, the partners should be integrated in a streaming way.

APIs and API Management still have their value for some use cases, of course. Check out the comparison of “Event Streaming with Apache Kafka vs. API Gateway / API Management with Mulesoft or Kong” for more details.

Slide Deck

Here is the slide deck covering various use cases and architectures to realize a connected world with Apache Kafka from different perspectives:

On-Demand Video Recording

The on-demand video recording walks you through the above presentation:

Apache Kafka for the Connected World

Connecting the world is a key requirement across industries. Many innovative digital services are only possible through collaboration between stakeholders. Real-time messaging, integration, continuous stream processing, and replication between partners are required. Event Streaming with Apache Kafka helps with the implementation of these use cases.

What are your experiences and plans for event streaming to connect the world? Did you already build applications with Apache Kafka to connect your products and services to partners? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Apache Kafka for the Connected World – Vehicles, Factories, Cities, Digital Services appeared first on Kai Waehner.

Infrastructure Checklist for Apache Kafka at the Edge

Kai Waehner — Wed, 03 Feb 2021 12:39:30 +0000

Event streaming with Apache Kafka at the edge is getting more and more traction these days. It is a common approach to providing the same open, flexible, and scalable architecture in the cloud and at the edge outside the data center. Possible locations for Kafka edge deployments include retail stores, cell towers, trains, small factories, restaurants, hospitals, stadiums, etc. This post explores a checklist with infrastructure questions you need to check and evaluate if you want to deploy Kafka at the edge.

Apache Kafka at the Edge == Outside the Data Center

I already discussed the concepts and architectures of Kafka at the edge in detail in the past:

This blog post explores a checklist of common infrastructure questions you need to answer and doublecheck before planning to deploy Kafka at the edge.

What is the Edge?

The term ‘edge’ needs to be defined to have the same understanding. When I talk about the edge in the context of Kafka, it means:

Edge is NOT a data center, i.e., limited compute, storage, network bandwidth
Kafka clients AND the Kafka broker(s) deployed here, not just the client applications
Offline business continuity, i.e., the workloads continue to work even if there is no connection to the cloud
Often 100+ locations, like restaurants, coffee shops, or retail stores, or even embedded into 1000s of devices or machines
Low-footprint and low-touch, i.e., Kafka can run as a normal highly available cluster or as a single broker (no cluster, no high availability); often shipped “as a preconfigured box” in OEM hardware (e.g., Hivecell)
Hybrid integration, i.e., most use cases require uni- or bidirectional communication with a remote Kafka cluster in a data center or the cloud

Let’s recap one architecture example that deploys Kafka in the cloud and at the edge: A hybrid event streaming architecture for real-time omnichannel retail and customer 360:

This definition of a ‘Kafka edge deployment‘ can also be summarized as an ‘autonomous edge‘ or ‘disconnected edge‘. On the other side, the ‘connected edge’ means that Kafka clients at the edge connect directly to a remote data center or cloud.

Infrastructure Checklist: How to Deploy Apache Kafka at the Edge?

I talked to 100+ customers and prospects across industries with the need to do edge computing for different reasons, including bad internet connection, reduced cost, low latency requirements, and security implications.

The following discussion points and questions come up all the time. Make sure to discuss them with your project team:

What are the use cases for Kafka at the edge? For instance, edge processing (e.g., business logic/analytics), replication to the cloud (uni- or bi-directional), data integration (e.g., 0 to devices, IoT gateways, local databases)?
What is the data model, and what the replication scenarios and SLAs (aggregation to “just gather data”, command&control to send data back to the edge, local analytics, etc.)? Check out Kafka-native replication tools, especially MirrorMaker 2 and Confluent’s Cluster Linking.
What is the main motivation for doing edge processing (vs. ingestion into a DC/cloud for all processing)? Examples: Low latency requirements, cost-efficiency, business continuity even when offline / disconnected from the cloud, etc.
How many “edge sites” do you plan to deploy to (e.g., retail stores, factories, restaurants, trains, …)? This needs to be considered from the beginning. If you want to roll out edge computing to thousands of restaurants, you need a different hardware and automation strategy than deploying to just ten smart factories worldwide.
What hardware do you use at the edge (e.g., hardware specifications)? How much memory, disk, CPU, etc., is available? Do you work with a specific hardware vendor? What are the support model and monitoring setup for the edge computers?
What network do you use? Is it stable? What is the connection to the cloud? If it is a stable connection (like AWS DirectConnect or Azure ExpressRoute), do you still need Kafka at the edge?
What is the infrastructure you plan to run Kafka on at the edge (e.g., operating system, container, Kubernetes, etc.)?
Do you need high availability and a ‘real’ Kafka cluster with 3+ brokers? Or is a single broker good enough? In many cases, the latter is good enough to decouple edge and cloud, handle backpressure, and enable business continuity even if the internet connection is gone for some time.
What edge protocols do you need to integrate with? is Kafka Connect sufficient with its connectors, or do you need a 3rd party IoT gateway? Common integration points at the edge are OPC UA, MQTT, proprietary PLC, traditional relational databases, files, IoT Gateways, etc.
Do you need to process the data at the edge? Kafka-native stream processing with Kafka Streams or ksqlDB is usually a straightforward and lightweight, but still scalable and reliable option. Almost all use cases I have seen at least need some streaming ETL at the edge. For instance, preprocess and filter data so that you only send relevant, aggregated data over the network to the cloud. However, many customers also deploy business applications at the edge, for instance, for real-time model inference.

How will fleet management work? Which part of the infrastructure or tool handles the management and operations of the edge machines. In most cases, this is not specific for Kafka but instead handled on the infrastructure level. For instance, if you run a Kubernetes cluster, Rancher might be used to provision and manage the edge clusters, including the Kafka ecosystem. Of course, specific Kafka metrics are also integrated here, for instance via Prometheus if you are using Kubernetes.

Discussing and answering these questions will help you with your planning for Kafka at the edge. Are there any key questions missing? Please let me know and I will update the list.

Kafka at the Edge is the new Black!

Apache Kafka at the edge is a common approach to providing the same open, flexible, and scalable architecture in the cloud and outside the data center. A huge benefit is that the same technology and architecture and be deployed everywhere across regions, sites, and clouds. This is a real hybrid architecture combing edge sites, data centers, and multiple clouds! Discuss the above infrastructure checklist with your team to be successful.

What are your experiences and plans for event streaming with Apache Kafka at the edge? Did you already deploy Apache Kafka on a small node somewhere, maybe even as a single broker setup? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Infrastructure Checklist for Apache Kafka at the Edge appeared first on Kai Waehner.

Use Cases and Architectures for Kafka at the Edge

Kai Waehner — Wed, 14 Oct 2020 12:54:32 +0000

Event streaming with Apache Kafka at the edge is not cutting edge anymore. It is a common approach to providing the same open, flexible, and scalable architecture at the edge as in the cloud or data center. Possible locations for a Kafka edge deployment include retail stores, cell towers, trains, small factories, restaurants, etc. I already discussed the concepts and architectures in detail in the past: “Apache Kafka is the New Black at the Edge” and “Architecture patterns for distributed, hybrid, edge and global Apache Kafka deployments“. This blog post is an add-on focusing on use cases across industries for Kafka at the edge.

To be clear before you read on: Edge is NOT a data center.

And “Edge Kafka” is not simply yet another IoT project using Kafka in a remote location. Edge Kafka is actually an essential component of a streaming nervous system that spans IoT (or OT in Industrial IoT) and non-IoT (traditional data-center / cloud infrastructures).

The post’s focus is scenarios where the Kafka clients AND the Kafka brokers are running on the edge. This enables edge processing, integration, decoupling, low latency, and cost-efficient data processing.

Categories and Architectures for Kafka at the Edge

Some IoT projects are built like “normal Kafka projects”, i.e., built in the (edge) data center or cloud. For instance, bigger factories can provide infrastructure to deploy a reliable Kafka cluster with stable network connectivity to the cloud. Unfortunately, many IoT projects require real edge capabilities.

What’s different at the edge?

Offline business continuity is important even if the connection to the central data center or cloud is not available. Disconnected / offline sites do often not require or provide high availability (because it is not worth the efforts): Local pre-processing, real-time analytics with low latency, only online (i.e., connection to the data center or cloud) from time to time or with low bandwidth.
Often these projects need to deploy Kafka brokers across hundreds of locations. A single broker is often good enough, without high availability, but for back pressure and local processing. Use cases exist across industries, including retail stores, trains, restaurants, cell towers, small factories, etc.
Low-footprint, low-touch, little-or-no-DevOps-required installations of Kafka brokers (not just clients) are mandatory for many of these use-cases. In these cases, no IT experts are available “on-site” to operate Kafka. Hence, using certified OEM hardware is a great option to install and operate Kafka at the edge.
Many edge use cases are all around sensor and telemetry data. This is not transactional data where every single message counts. An application that processes millions of messages per second is fine with losing a few of the messages as it does not affect the outcome of the calculation.
Hybrid and not cloud-only: Consumer IoT (CIoT) always includes the users in their smart home, ride-share, retail store, etc.), Industrial IoT (IIoT) always includes tangible good (cars, food, energy, …)
Thousands and tens of thousands of connected interfaces: Sensors, machines, mobile devices, etc.

Using one single technical infrastructure enables building edge and hybrid architectures. No need for a ton of different frameworks and products is required. This is a huge benefit from a development, testing, operations, support point of view!

Use Cases Across Industries

Industries for Kafka at the Edge include manufacturing, pharma, carmakers, telecommunications, retailing, energy, restaurants, gaming, healthcare, public sector, aerospace, transportation, and others.

Architectures and use cases include data integration, pre-processing and replication to the cloud, big and small data edge processing, and analytics, disconnected offline scenarios, very low footprint scenarios with hundreds of locations, scenarios without the high-availability, and others.

Scenarios for Edge Computing with Kafka

Various examples for Kafka deployments at the edge exist. Almost all of these use cases are related to several of the above categories and requirements, such as low hardware footprint, disconnected offline processing, hundred of locations, and hybrid architectures.

I have worked with enterprises across industries and the globe on the following scenarios:

Public Sector: Local administration in each city, smart city projects incl. public transportation, traffic management, integration of various connected car platforms from different carmakers, cybersecurity (including IoT use cases such as capturing and processing camera images)
Transportation / Logistics / Railway / Aviation: Track&Trace, Kafka in the trains for offline and local processing / storage, traveller information (delayed or canceled flight / train / bus), real-time loyalty platforms (class upgrade, lounge access)
Manufacturing (Automotive, Aerospace, Semiconductors, Chemical, Food, and others): IoT aftermarket customer services, OEM in machines and vehicles, embedding into standard software such as ERP or MES systems, cybersecurity, a digital twin of devices/machines/production lines/processes, production line monitoring in factories for predictive maintenance/quality control/production efficiency, operations dashboards and line wellness (on-site for the plant manager, and aggregated global KPIs for executive management), track&trace and geofencing on the shop floor
Energy / Utility / Oil&Gas: Smart home, smart buildings, smart meters, monitoring of remote machines (e.g., for drilling, windmills, mining), pipeline and refinery operations (e.g., predictive failure or anomaly detection)
Telecommunications / Media: OSS real-time monitoring/problem analysis/metrics reporting/root cause analysis/action response of the network devices and infrastructure (routers, switches, other network devices), BSS customer experience and OTT services (mobile app integration for millions of users), 5G edge (e.g., street sensors)
Healthcare: Track&trace in the hospital, remote monitoring, machine sensor analytics
Retailing / Food / Restaurants / Banking: Customer communication, cross-/up-selling, loyalty system, payments in retail stores, perpetual inventory, Point-of-Sale (PoS) integration for (local) payments and (remote) CRM integration, EFTPOS (Electronic funds transfer at point of sale)

A great practical example of edge computing in retailing is fast-food chain Chick-fil-A. They deployed a Kubernetes cluster in each of their 2000 restaurants for real-time analytics at the edge without an internet connection. The hardware is pretty small and provides an Intel quadcore processor with 8 GB RAM and SSD:

Example Architecture: Kafka in Transportation and Logistics

Let’s make this “Kafka at the edge” thing more clear with a specific example. In this case, I use the railway and transportation industry. But this can easily be mapped to your industry and use case.

The following example shows an edge and hybrid solution for railways to improve the customer experience and increase the revenue of the railway company. It leverages offline edge processing for customer communication, replication to the cloud for analytics, and integration with 3rd party interfaces and APIs from partners.

Hybrid Architecture – From Edge to Cloud

Local processing at the edge is happening on the train. But each train also replicates relevant data in real-time to the cloud – if there are internet connectivity and free network resources. If the train is not online, Kafka is handling the backpressure and replicating to the cloud when online again:

Event Streaming in the Train with Kafka

Kafka on the train is NOT just used for real-time messaging and handling backpressure. These are already great reasons for using Kafka at the edge. Still, the even bigger value is created when Kafka is also used for data integration (restaurant, traveler information, loyalty system, etc.) and data processing (up-/cross-selling, real-time delay information, etc.) at the edge. This way, only one single platform is required to solve all the different problems:

Kafka for Disconnected / Offline Scenarios

Trains (and many other edge locations) are offline regularly. For instance, a train drives through a tunnel or reaches an area with no cell connectivity. Local processing is still possible. Business continuity is the key to improve customer experience and increase sales processes – even if the train disconnected from the internet. Passengers can still use the mobile app to see traveler information, buy food in the restaurant, or watch movies stores on the train’s local server. As soon as the train has internet connectivity again, the purchases from passengers are transferred to the loyalty system in the cloud, the latest delay information is consumed from the cloud and stored on the edge Kafka broker in the train, etc. etc. etc.:

Cross-Company Kafka Integration

Data processing does not stop with the hybrid integration between the edge (train) and cloud (CRM, loyalty system, etc.). Different divisions or partner companies need to integrate, too. Instead of using non-scalable, synchronous REST API calls / API Management for partner integration, streaming replication with Kafka-native technologies is a much better, scalable approach:

I hope this story about Kafka at the edge helped you better understand how you can leverage event streaming in your industry and use cases to build an end-to-end streaming infrastructure from edge to cloud.

Infrastructure and Hardware Requirement for Deployment of Kafka at the Edge

Finally, it is important to discuss how to deploy Kafka at the edge. To be clear: Kafka still needs some computing power.

Obviously, this depends on many factors: The hardware vendors and infrastructure you are working with, specific SLAs and HA requirements, and so on. The good news is that Kafka can be deployed in many infrastructures, including bare metal, VMs, containers, Kubernetes, etc. The other good news is that new hardware for computing resources (even for the “edge”) typically has 4, 8, or even 16GB RAM because this is the smallest chip vendors produce these days (for these environments such as small factories, retail stores, etc).

Minimum hardware requirements for running a very small footprint Kafka are a single-core processor and a few 100MB RAM. This already allows decent edge processing with 100+Mb/sec throughput on a single Kafka node (with replication factor = 1). However, real values depend on the number of partitions, message size, network speed, and other characteristics. Don’t expect the same performance and scalability as in the data center or cloud!

Thus, you can deploy a Kafka broker on a Raspberry Pi, but not on some small embedded device! The latter is where the Kafka clients can run.

Check out the “Infrastructure Checklist for Apache Kafka at the Edge” if you plan to go that direction!

“The Confluent Way” to Deploy Kafka at the Edge

From a technical perspective, deployment of Kafka at the edge is the same as in a data center or cloud. However, the environment and requirements are a little bit different as we learned above. Some additional features definitely help with deploying and operating Kafka at the edge.

I work for Confluent. Hence, I provide you “The Confluent Way” of deploying Kafka at the edge in your future projects, including innovative, differentiating features:

Confluent Server includes the Kafka Broker and various enhancements such as self-balancing clusters, Tiered Storage, embedded REST API, server-side schema validation, and much more. It can be deployed as a single node (very lightweight, but no high availability) or cluster (for mission-critical workloads that require high availability at the edge).
In 2020, ZooKeeper is still required as an additional component. However, ZooKeeper removal is coming in 2021! Many Confluent engineers work full-time on removing the (ugly) Zookeeper dependency from the Kafka project, which means it will be possible to run a standalone, single process edge solution powered by Kafka. The whole Kafka community is looking forward to this architectural change. You can already run Kafka at the edge today, of course. It works well with ZooKeeper, too.
Cluster Linking allows all these small Kafka edge sites to connect to a bigger Kafka cluster in a data center or cloud using the Kafka protocol. No need to use additional tools and infrastructure such as Confluent Replicator or MirrorMaker. This significantly reduces development efforts and infrastructure costs.
Monitoring and Proactive Support with Confluent’s toolset. For instance, one Control Center can monitor several different (remote) Kafka clusters. Confluent Telemetry Reporter collects data from edge sites. Monitoring includes the technical infrastructure, but also for applications and end-to-end integration.
Kubernetes is becoming the edge orchestration platform of choice for many edge deployments. For instance, using the MicroK8s Kubernetes distribution. Confluent Operator (Custom Resource Definitions, CRD) provides the capabilities to provision and operate single-broker or cluster deployments at the edge. This includes rolling upgrades, security automation, and much more. Confluent Operator is battle-tested in Confluent Cloud and large deployments of Confluent Platform in data centers, already. Chick-fil-A has a great success story about running Kubernetes in their 2000+ fast food stores to provide disconnected edge services and low latency. Kubernetes is clearly optional. Only use it when it really adds value, and not too complex and resource-hungry. Many edge deployments will probably disqualify Kubernetes as too heavyweight and complex.
Data Integration and data processing is key at the edge. Confluent provides connectors to legacy and modern systems (including PLC4X for PLC, OPC-UA, and IIoT integration, database CDC connectors, MQ and MQTT integration, Data Diode connectors for uni-directional UDP networks in high security or dirty environments, cloud connectors, and much more). Kafka Streams and ksqlDB allow lightweight but powerful stream processing without the need for yet another technology.
Lightweight edge clients (e.g., embedded devices) leverage the C or C++ client APIs for Kafka and the REST Proxy to communicate from any programming language via HTTP(S). This is important for many low-power edge devices with limited computing power where Java or similar “resource-hungry technologies” cannot be deployed.
Optionally, certified, pre-configured OEM hardware is available. You put the box at the edge, connect it to LAN or Wifi, and use it. That’s it. Management and monitoring of the infrastructure occur via the remote software of the hardware vendor. “Feasible video processing on Hivecell” demonstrates edge analytics with a Kafka cluster, a Kafka Streams application, and embedded machine learning models for image recognition in real-time.

Kafka at the Edge (and in Hybrid Architectures) is the New Black

Kafka is a great solution for the edge. It enables deploying the same open, scalable, and reliable technology at the edge, data center, and the cloud. This is relevant across industries. Kafka is used in more and more places where nobody has seen it before. Edge sites include retail stores, restaurants, cell towers, trains, and many others. I hope the various use cases and architectures inspired you a little bit.

What are your experiences at the edge? What are your use cases? Did you or do you plan to use Apache Kafka and its ecosystem? What is your strategy? Let’s connect on LinkedIn and discuss it!

The post Use Cases and Architectures for Kafka at the Edge appeared first on Kai Waehner.

Mainframe Integration, Offloading and Replacement with Apache Kafka

Kai Waehner — Fri, 24 Apr 2020 14:55:10 +0000

Time to get more innovative; even with the mainframe! This blog post covers the steps I have seen in projects where enterprises started offloading data from the mainframe to Apache Kafka with the final goal of replacing the old legacy systems.

“Mainframes are still hard at work, processing over 70 percent of the world’s most important computing transactions every day. Organizations like banks, credit card companies, medical facilities, stock brokerages, and others that can absolutely not afford downtime and errors depend on the mainframe to get the job done. Nearly three-quarters of all Fortune 500 companies still turn to the mainframe to get the critical processing work completed” (BMC).

Cost, monolithic architectures and missing experts are the key challenges for mainframe applications. Mainframes are used in several industries, but I want to start this post by thinking about the current situation at banks in Germany; to understand the motivation why so many enterprises ask for help with mainframe offloading and replacement…

Finance Industry in 2020 in Germany

Germany is where I live, so I get the most news from the local newspapers. But the situation is very similar in other countries across the world…

Here is the current situation in Germany.

Challenges for Traditional Banks

Traditional banks are in trouble. More and more branches are getting closed every year.
Financial numbers are pretty bad. Market capitalization went down significantly (before Corona!). Deutsche Bank and Commerzbank – former flagships of the German finance industry – are in the news every week. Usually for bad news.
Commerzbank had to leave the DAX in 2019 (the blue chip stock market index consisting of the 30 major German companies trading on the Frankfurt Stock Exchange); replaced by Wirecard, a modern global internet technology and financial services provider offering its customers electronic payment transaction services and risk management. UPDATE Q3 2020: Wirecard is now insolvent due to the Wirecard scandal. A really horrible scandal for the German financial market and government.
Traditional banks talk a lot about creating a “cutting edge” blockchain consortium and distributed ledger to improve their processes in the future and to be “innovative”. For example, some banks plan to improve the ‘Know Your Customer’ (KYC) process with a joint distributed ledger to reduce costs. Unfortunately, this is years away from being implemented; and not clear if it makes sense and adds real business value. Blockchain has still not proven its added value in most scenarios.

Neobanks and FinTechs Emerging

Neobanks like Revolut or N26 provide an innovative mobile app and great customer experience, including a great KYC process and experience (without the need for a blockchain). In my personal International bank transactions typically take less than 24 hours. In contrary, the mobile app and website of my private German bank is a real pain in the ass. And it feels like getting worse instead of better. To be fair: Neobanks do not shine with customers service if there are problems (like fraud; they sometimes simply freeze your account and do not provide good support).
International Fintechs are also arriving in Germany to change the game. Paypal is the normal everywhere, already. German initiatives for a “German Paypal alternative” failed. Local retail stores already accept Apple Pay (the local banks are “forced” to support it in the meantime). Even the Chinese service Alipay is supported by more and more shops.

Traditional banks should be concerned. The same is true for other industries, of course. However, as said above, many enterprises still rely heavily on the 50+ year old mainframe while competing with innovative competitors. It feels like these companies relying on mainframes have a harder job to change than let’s say airlines or the retail industry.

Digital Transformation with(out) the Mainframe

I had a meeting with an insurance company a few months ago. The team presented me an internal paper quoting their CEO: “This has to be the last 20M mainframe contract with IBM! In five years, I expect to not renew this.” That’s what I call ‘clear expectations and goals’ for the IT team…

Why does Everybody Want to Get Rid of the Mainframe?

Many large, established companies, especially those in the financial services and insurance space still rely on mainframes for their most critical applications and data. Along with reliability, mainframes come with high operational costs, since they are traditionally charged by MIPS (million instructions per second). Reducing MIPS results in lowering operational expense, sometimes dramatically.

Many of these same companies are currently undergoing architecture modernization including cloud migration, moving from monolithic applications to micro services and embracing open systems.

Modernization with the Mainframe and Modern Technologies

This modernization doesn’t easily embrace the mainframe, but would benefit from being able to access this data. Kafka can be used to keep a more modern data store in real-time sync with the mainframe, while at the same time persisting the event data on the bus to enable microservices, and deliver the data to other systems such as data warehouses and search indexes.

This not only will reduce operational expenses, but will provide a path for architecture modernization and agility that wasn’t available from the mainframe alone.

As final step and the ultimate vision of most enterprises, the mainframe might be replaced by new applications using modern technologies.

Kafka in Financial Services and Insurance Companies

I recently wrote about how payment applications can be improved leveraging Kafka and Machine Learning for real time scoring at scale. Here are a few concrete examples from banking and insurance companies leveraging Apache Kafka and its ecosystem for innovation, flexibility and reduced costs:

Capital One: Becoming truly event driven – offering a service other parts of the bank can use.
ING: Significantly improved customer experience – as a differentiator + Fraud detection and cost savings.
Freeyou: Real time risk and claim management in their auto insurance applications.
Generali: Connecting legacy databases and the modern world with a modern integration architecture based on event streaming.
Nordea: Able to meet strict regulatory requirements around real-time reporting + cost savings.
Paypal: Processing 400+ Billion events per day for user behavioral tracking, merchant monitoring, risk & compliance, fraud detection, and other use cases.
Royal Bank of Canada (RBC): Mainframe off-load, better user experience and fraud detection – brought many parts of the bank together.

The last example brings me back to the topic of this blog post: Companies can save millions of dollars “just” by offloading data from their mainframes to Kafka for further consumption and processing. Mainframe replacement might be the long term goal; but just offloading data is a huge $$$ win.

Domain-Driven Design (DDD) for Your Integration Layer

So why use Kafka for mainframe offloading and replacement? There are hundreds of middleware tools available on the market for mainframe integration and migration.

Well, in addition to being open and scalable (I won’t cover all the characteristics of Kafka in this post), one key characteristic is that Kafka enables decoupling applications better than any other messaging or integration middleware.

Legacy Migration and Cloud Journey

Legacy migration is a journey. Mainframes cannot be replaced in a single project. A big bang will fail. This has to be planned long-term. The implementation happens step by step.

Almost every company on this planet has a cloud strategy in the meantime. The cloud has many benefits, like scalability, elasticity, innovation, and more. Of course, there are trade-offs: Cloud is not always cheaper. New concepts need to be learned. Security is very different (I am NOT saying worse or better, just different). And hybrid will be the normal for most enterprises. Only enterprises younger than 10 years are cloud-only.

Therefore, I use the term ‘cloud’ in the following. But the same phases would exist if you “just” want to move away from mainframes but stay in your own on premises data centers.

On a very high level, mainframe migration could contain three phases:

Phase 1 – Cloud Adoption: Replicate data to the cloud for further analytics and processing. Mainframe offloading is part of this.
Phase 2 – Hybrid Cloud: Bidirectional replication of data in real time between the cloud and application in the data center, including mainframe applications.
Phase 3 – Cloud-First Development: All new applications are build in the cloud. Even the core banking system (or at least parts of it) is running in the cloud (or in a modern infrastructure in the data center).

How do we get there? As I said, a big bang will fail. Let’s take a look at a very common approach I have seen at various customers…

Status Quo: Mainframe Limitations and $$$

The following is the status quo: Applications are consuming data from the mainframe; creating expensive MIPS:

The mainframe is running and core business logic is deployed. It works well (24/7, mission-critical). But it is really tough or even impossible to make changes or add new features. Scalability is often becoming an issue, already. And well, let’s not talk about cost. Mainframes cost millions.

Mainframe Offloading

Mainframe offloading means data is replicated to Kafka for further analytics and processing by other applications. Data continues to be written to the mainframe via the existing legacy applications:

Offloading data is the easy part as you do not have to change the code on the mainframe. Therefore, the first project is often just about reading data.

Writing to the mainframe is much harder because the business logic on the mainframe must be understood to make changes. Therefore, many projects avoid this huge (and sometimes not solvable) challenge by keeping the writes as it is… This is not ideal, of course. You cannot add new features or changes to the existing application.

People are not able or do not want to take the risk to change it. There is no DevOps or CI/CD pipelines and no A/B testing infrastructure behind your mainframe, dear university graduates!

Mainframe Replacement

Everybody wants to replace Mainframes due to its technical limitations and cost. But enterprises cannot simply shut down the mainframe because of all the mission-critical business applications.

Therefore, COBOL, the main programming language for the mainframe is still widely used in applications to do large-scale batch and transaction processing jobs. Read the blog post ‘Brush up your COBOL: Why is a 60 year old language suddenly in demand?‘ for a nice ‘year 2020 introduction to COBOL’.

Enterprises have the following options:

Option 1: Continue to develop actively on the mainframe

Train or hire more (expensive) Cobol developers to extend and maintain your legacy applications. Well, this is where you want to go away from. If you forgot why: Cost, cost, cost. And technical limitations. I find it amazing how mainframes even support “modern technologies” such as Web Services. Mainframes do not stand still. Having said this, even if you use more modern technologies and standards, you are still running on the mainframe with all its drawbacks.

Option 2: Replace the Cobol code with a modern application using a migration and code generation tool

Various vendors provide tools which take COBOL code and automatically migrate it to generated source code in a modern programming language (typically Java). The new source code can be refactored and changed (as Java uses static typing so that any errors can be shown in the IDE and changes can be apply to all related dependencies in the code structure). This is the theory. The more complex your COBOL code is, the harder it gets to migrate it and the more custom coding is required for the migration (which results in the same problem the COBOL developer had on the mainframe: If you don’t understand the code routines, you cannot change it without risking errors, data loss or downtime in the new version of the application).

Option 3: Develop a new future-ready application with modern technologies to provide an open, flexible and scalable architecture

This is the ideal approach. Keep the old legacy mainframe running as long as needed. A migration is not a big bang. Replace use cases and functionality step-by-step. The new applications will have very different feature requirements anyway. Therefore, take a look at the Fintech and Insurtech companies. Starting green field has many advantages. The big drawback is high development and migration costs.

In reality, a mix of these three options can also happen. No matter what works for you, let’s now think how Apache Kafka fits into this story.

Domain-Driven Design for your Integration Layer

Kafka is not just a messaging system. It is an event streaming platform that provides storage capabities. In contrary to MQ systems or Web Service / API based architectures, Kafka really decouples producers and consumers:

With Kafka and its Domain-driven Design, every application / service / microservice / database / “you-name-it” is completely independent and loosely coupled from each other. But scalable, highly available and reliable! The blog post “Apache Kafka, Microservices and Domain-Driven Design (DDD)” goes much deeper into this topic.

Sberbank, the biggest bank in Russia, is a great example for their domain-driven approach: Sberbank built their new core banking system on top of the Apache Kafka ecosystem. This infrastructure is open and scalable, but also ready for mission-critical payment use cases like instant-payment or fraud detection, but also for innovative applications like Chat Bots in customer service or Robo-advising for trading markets.

Why is this decoupling so important? Because the mainframe integration, migration and replacement is a (long!) journey…

Event Streaming Platform and Legacy Middleware

An Event Streaming Platform gives applications independence. No matter if an application uses a new modern technology or legacy, proprietary, monolithic components. Apache Kafka provides the freedom to tap into and manage shared data, no matter if the interface is real time messaging, a synchronous REST / SOAP web service, a file based system, a SQL or NoSQL database, a Data Warehouse, a data lake or anything else:

Apache Kafka as Integration Middleware

The Apache Kafka ecosystem is a highly scalable, reliable infrastructure and allows high throughput in real time. Kafka Connect provides integration with any modern or legacy system, be it Mainframe, IBM MQ, Oracle Database, CSV Files, Hadoop, Spark, Flink, TensorFlow, or anything else. More details here:

Kafka Ecosystem for Security and Data Governance

The Apache Kafka ecosystem provides additional capabilities. This is important as Kafka is not just used as messaging or ingestion layer, but a platform for your most mission-critical use cases. Remember the examples from banking and insurance industry I showed in the beginning of this blog post. These are just a few of many more mission-critical Kafka deployments across the world. Check past Kafka Summit video recordings and slides for many more mission-critical use cases and architectures.

I will not go into detail in this blog post, but the Apache Kafka ecosystem provides features for security and data governance like Schema Registry (+ Schema Evolution), Role Based Access Control (RBAC), Encryption (on message or field level), Audit Logs, Data Flow analytics tools, etc…

Mainframe Offloading and Replacement in the Next 5 Years

With all the theory in mind, let’s now take a look at a practical example. The following is a journey many enterprises walk through these days. Some enterprises are just in the beginning of this journey, while others already saved millions by offloading or even replacing the mainframe.

I will walk you through a realistic approach which takes ~5 years in this example (but this can easily take 10+ years depending on the complexity of your deployments and organization). The importance is quick wins and a successful step-by-step approach; no matter how long it takes.

Year 0: Direct Communication between App and Mainframe

Let’s get started. Here is again the current situation: Applications directly communicate with the mainframe.

Let’s reduce cost and dependencies between the mainframe and other applications.

Year 1: Kafka for Decoupling between Mainframe and App

Offloading data from the mainframe enables existing applications to consume the data from Kafka instead of creating high load and cost for direct mainframe access:

This saves a lot of MIPS (million instructions per second). MIPS is a way to measure the cost of computing on mainframes. Less MIPS, less cost.

After offloading the data from the mainframe, new applications can also consume the mainframe data from Kafka. No additional MIPS cost! This allows building new applications on the mainframe data. Gone is the time where developers cannot build new innovative applications because the MIPS cost was the main blocker to access the mainframe data.

Kafka can store the data as long as it needs to be stored. This might be just 60 minutes for one Kafka Topic for log analytics or 10 years for customer interactions for another Kafka Topic. With Tiered Storage for Kafka, you can even reduce costs and increase elasticity by separating processing from storage with a back-end storage like AWS S3. This discussion is out of scope of this article. Check out “Is Apache Kafka a Database?” for more details.

Change Data Capture (CDC) for Mainframe Offloading to Kafka

The most common option for mainframe offloading is Change Data Capture (CDC): Transaction log-based CDC pushes data changes (insert, update, delete) from the mainframe to Kafka. The advantages:

Real time push updates to Kafka
Eliminate disruptive full loads, i.e. minimize production impact
Reduce MIPS consumption
Integrate with any mainframe technology (DB2 Z/OS, VSAM, IMS/DB, CISC, etc.)
Full support

On the first connection, the CDC tool reads a consistent snapshot of all of the tables that are whitelisted. When that snapshot is complete, the connector continuously streams the changes that were committed to the DB2 database for all whitelisted tables in capture mode. This generates corresponding insert, update and delete events. All of the events for each table are recorded in a separate Kafka topic, where they can be easily consumed by applications and services.

The big disadvantage of CDC is high licensing costs. Therefore, other integration options exist…

Integration Options for Kafka and Mainframe

While CDC is typically the preferred choice, there are more alternatives. Here are the integration options I have seen being discussed in the field for mainframe offloading:

IBM InfoSphere Data Replication (IIDR) CDC solution for Mainframe
3rd Party commercial CDC solution (e.g. Attunity or HLR)
Open-source CDC solution (e.g Debezium – but you still need an IIDC license, this is the same challenge as with Oracle and GoldenGate CDC)
Create interface tables + Kafka Connect + JDBC connector
IBM MQ interface + Kafka Connect’s IBM MQ connector
Confluent REST Proxy and HTTP(S) calls from the mainframe
Kafka Clients on the mainframe

Evaluate the trade-offs and make your decision. Requiring a fully supported solution often eliminates several options quickly In my experience, most people use CDC with IIDR or a 3rd party tool, followed by IBM MQ. Other options are more theoretical in most cases.

Year 2 to 4: New Projects and Applications

Now it is time to build new applications:

This can be agile, lightweight Microservices using any technology. Or this can be external solutions like a data lake or cloud service. Pick what you need. Your are flexible. The heart of your infrastructure allows this. It is open, flexible and scalable. Welcome to a new modern IT world!

Year 5: Mainframe Replacement

At some point, you might wonder: What about the old mainframe applications? Can we finally replace them (or at least some of them)? When the new application based on a modern technology is ready, switch over:

As this is a step-by-step approach, the risk is limited. First of all, the mainframe can keep running. If the new application does not work, switch back to the mainframe (which is still up-to-date as you did not stop inserting the updates into its database in parallel to the new application).

As soon as the new application is battle-tested and proven, the mainframe application can be shut down. The $$$ budgeted for the next mainframe renewal can be used for other innovative projects. Congratulations!

Mainframe Offloading and Replacement is a Journey… Kafka can Help!

Here is the big picture of our mainframe offloading and replacement story:

Hybrid Cloud Infrastructure with Apache Kafka and Mainframe

Remember the three phases I discussed earlier: Phase 1 – Cloud Adoption, Phase 2 – Hybrid Cloud, Phase 3 – Cloud-First Development. I did not tackle this in detail in this blog post. Having said this, Kafka is an ideal candidate for this journey. Therefore, this is a perfect combination for offloading and replacing mainframes in a hybrid and multi-cloud strategy. More details here:

Slides and Video Recording for Kafka and Mainframe Integration

Here are the slides:

And the video walking you through the slides:

What are your experiences with mainframe offloading and replacement? Did you or do you plan to use Apache Kafka and its ecosystem? What is your strategy? Let’s connect on LinkedIn and discuss! Stay informed about new blog posts by subscribing to my newsletter.

The post Mainframe Integration, Offloading and Replacement with Apache Kafka appeared first on Kai Waehner.