Machine Learning Archives - Kai Waehner

Fraud Detection in Mobility Services (Ride-Hailing, Food Delivery) with Data Streaming using Apache Kafka and Flink

Kai Waehner — Mon, 28 Apr 2025 06:29:25 +0000

Mobility services like Uber, Grab, FREE NOW (Lyft), and DoorDash are built on real-time data. Every trip, delivery, and payment relies on accurate, instant decision-making. But as these services scale, they become prime targets for sophisticated fraud—GPS spoofing, fake accounts, payment abuse, and more. Traditional, batch-based fraud detection can’t keep up. It reacts too late, misses complex patterns, and creates blind spots that fraudsters exploit. To stop fraud before it happens, mobility platforms need data streaming technologies like Apache Kafka and Apache Flink for fraud detection. This blog explores how leading platforms are using real-time event processing to detect and block fraud as it happens—protecting revenue, user trust, and platform integrity at scale.

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter and follow me on LinkedIn or X (former Twitter) to stay in touch. And make sure to download my free book about data streaming use cases.

The Business of Mobility Services (Ride-Hailing, Food Delivery, Taxi Aggregators, etc.)

Mobility services have become an essential part of modern urban life. They offer convenience and efficiency through ride-hailing, food delivery, car-sharing, e-scooters, taxi aggregators, and micro-mobility options. Companies such as Uber, Lyft, FREE NOW (former MyTaxi; acquired by Lyft recently), Grab, Careem, and DoorDash connect millions of passengers, drivers, restaurants, retailers, and logistics partners to enable seamless transactions through digital platforms.

These platforms operate in highly dynamic environments where real-time data is crucial for pricing, route optimization, customer experience, and fraud detection. However, this very nature of mobility services also makes them prime targets for fraudulent activities. Fraud in this sector can lead to financial losses, reputational damage, and deteriorating customer trust.

To effectively combat fraud, mobility services must rely on real-time data streaming with technologies such as Apache Kafka and Apache Flink. These technologies enable continuous event processing and allow platforms to detect and prevent fraud before transactions are finalized.

Why Fraud is a Major Challenge in Mobility Services

Fraudsters continually exploit weaknesses in digital mobility platforms. Some of the most common fraud types include:

Fake Rides and GPS Spoofing: Drivers manipulate GPS data to simulate trips that never occurred. Passengers use location spoofing to receive cheaper fares or exploit promotions.

Payment Fraud and Stolen Credit Cards: Fraudsters use stolen payment methods to book rides or order food.

Fake Drivers and Passengers: Fraudsters create multiple accounts and pretend to be both the driver and passenger to collect incentives. Some drivers manipulate fares by manually adjusting distances in their favor.

Promo Abuse: Users create multiple fake accounts to exploit referral bonuses and promo discounts.

Account Takeovers and Identity Fraud: Hackers gain access to legitimate accounts, misusing stored payment information. Fraudsters use fake identities to bypass security measures.

Fraud not only impacts revenue but also creates risks for legitimate users and drivers. Without proper fraud prevention measures, ride-hailing and delivery companies could face serious losses, both financially and operationally.

The Unseen Enemy: Core Challenges in Mobility Fraud
Detection

Traditional fraud detection relies on batch processing and manual rule-based systems. However, these approaches are no longer effective due to the speed and complexity of modern mobile apps with real-time experiences combined with modern fraud schemes.

Payment Fraud – The Hidden Enemy in a Digital World

Key challenges in mobility fraud detection include:

Fraud occurs in real-time, requiring instant detection and prevention before transactions are completed.
Millions of events per second must be processed, requiring scalable and efficient systems.
Fraud patterns constantly evolve, making static rule-based approaches ineffective.
Platforms operate across hybrid and multi-cloud environments, requiring seamless integration of fraud detection systems.

How Data Streaming with Apache Kafka and Flink Enables Real-Time Fraud Detection

To overcome these challenges, real-time streaming analytics powered by Apache Kafka and Apache Flink provide an effective solution.

Apache Kafka: The Backbone of Event-Driven Fraud Detection

Kafka serves as the core event streaming platform. It captures and processes real-time data from multiple sources such as:

GPS location data
Payment transactions
User and driver behavior analytics
Device fingerprints and network metadata

Kafka provides:

High-throughput data streaming, capable of processing millions of events per second to support real-time decision-making.
An event-driven architecture that enables decoupled, flexible systems—ideal for scalable and maintainable mobility platforms.
Seamless scalability across hybrid and multi-cloud environments to meet growing demand and regional expansion.
Always-on reliability, ensuring 24/7 data availability and consistency for mission-critical services such as fraud detection, pricing, and trip orchestration.

An excellent success story about the transition to data streaming comes from DoorDash: Why DoorDash migrated from Cloud-native Amazon SQS and Kinesis to Apache Kafka and Flink.

Apache Flink: Continuous Stream Processing for Fraud Detection in Real-Time

Apache Flink enables real-time fraud detection through advanced event correlation and applied AI:

Detects anomalies in GPS data, such as sudden jumps, route manipulation, or unrealistic movement patterns.
Analyzes historical user behavior to surface signs of account takeovers or other forms of identity misuse.
Joins multiple real-time streams—including payment events, location updates, and account interactions—to generate accurate, low-latency fraud scores.
Applies machine learning models in-stream, enabling the system to flag and stop suspicious transactions before they are processed.
Continuously adapts to new fraud patterns, updating models with fresh data in near real-time to reflect evolving user behavior and emerging threats.

With Kafka and Flink, fraud detection can shift from reactive to proactive to stop fraudulent transactions before they are completed.

I already covered various data streaming success stories from financial services companies such as Paypal, Capital One and ING Bank in a dedicated blog post. And a separate case study from about “Fraud Prevention in Under 60 Seconds with Apache Kafka: How A Bank in Thailand is Leading the Charge“.

Real-World Fraud Prevention Stories from Mobility Leaders

Fraud is not just a technical issue—it’s a business-critical challenge that impacts trust, revenue, and operational stability in mobility services. The following real-world examples from industry leaders like FREE NOW (Lyft), Grab, and Uber show how data streaming with advanced stream processing and AI are used around the world to detect and stop fraud in real time, at massive scale.

FREE NOW (Lyft): Detecting Fraudulent Trips in Real Time by Analyzing GPS Data of Cars

FREE NOW operates in more than 150 cities across Europe with 48 million users. It integrates multiple mobility services, including taxis, private vehicles, car-sharing, e-scooters, and bikes.

The company was recently acquired by Lyft, the U.S.-based ride-hailing giant known for its focus on multimodal urban transport and strong presence in North America. This acquisition marks Lyft’s strategic entry into the European mobility ecosystem, expanding its footprint beyond the U.S. and Canada.

Source: FREE NOW

Fraud Prevention Approach leveraging Data Streaming (presented at Kafka Summit)

Uses Kafka Streams and Kafka Connect to analyze GPS trip data in real-time.
Deploys fraud detection models that identify anomalies in trip routes and fare calculations.
Operates data streaming on fully managed Confluent Cloud and applications on Kubernetes for scalable fraud detection.

Source: FREE NOW

Example: Detecting Fake Rides

A driver inputs trip details into the app.
Kafka Streams predicts expected trip fare based on distance and duration.
GPS anomalies and unexpected route changes are flagged.
Fraud alerts are triggered for suspicious transactions.

By implementing real-time fraud detection with Kafka and Flink, FREE NOW (Lyft) has significantly reduced fraudulent trips and improved platform security.

Grab: AI-Powered Fraud Detection for Ride-Hailing and Delivery with Data Streaming and AI/ML

Grab is a leading mobility platform in Southeast Asia, handling millions of transactions daily. Fraud accounts for 1.6 percent of total revenue loss in the region.

To address these significant fraud numbers, Grab developed GrabDefence—an AI-powered fraud detection engine that leverages real-time data and machine learning to detect and block suspicious activity across its platform.

Source: Grab

Fraud Detection Approach

Uses Kafka Streams and machine learning for fraud risk scoring.
Leverages Flink for feature aggregation and anomaly detection.
Detects fraudulent transactions before they are completed.

Source: Grab

Example: Fake Driver and Passenger Fraud

Fraudsters create accounts as both driver and passenger to claim rewards.
Kafka ingests device fingerprints, payment transactions, and ride data.
Flink aggregates historical fraud behavior and assigns risk scores.
High-risk transactions are blocked instantly.

With GrabDefence built with data streaming, Grab reduced fraud rates to 0.2 percent, well below the industry average. Learn more about GrabDefence in the Kafka Summit talk.

Uber: Project RADAR – AI-Powered Fraud Detection with Human Oversight

Uber processes millions of payments per second globally. Fraud detection is complex due to chargebacks and uncollected payments.

To combat this, Uber launched Project RADAR—a hybrid system that combines machine learning with human reviewers to continuously detect, investigate, and adapt to evolving fraud patterns in near real time. Low latency is not required in this scenario. And humans are in the loop of the business process. Hence, Apache Spark is sufficient for Uber.

Source: Uber

Fraud Prevention Approach

Uses Kafka and Spark for multi-layered fraud detection.
Implements machine learning models to detect chargeback fraud.
Incorporates human analysts for rule validation.

Source: Uber

Example: Chargeback Fraud Detection

Kafka collects all ride transactions in real time.
Stream processing detects anomalies in payment patterns and disputes.
AI-based fraud scoring identifies high-risk transactions.
Uber’s RADAR system allows human analysts to validate fraud alerts.

Uber’s combination of AI-driven detection and human oversight has significantly reduced chargeback-related fraud.

Data Streaming with Kafka and Flink Provides Real-Time Fraud Detection in Mobility Services

Fraud in mobility services is a real-time challenge that requires real-time solutions that work 24/7, even at extreme scale for millions of events. Traditional batch processing systems are too slow, and static rule-based approaches cannot keep up with evolving fraud tactics.

By leveraging data streaming with Apache Kafka in conjunction with Kafka Streams or Apache Flink, mobility platforms can:

Process millions of events per second to detect fraud in real time.
Prevent fraudulent transactions before they occur.
Use AI-driven real-time fraud scoring for accurate risk assessment.
Adapt dynamically through continuous learning to evolving fraud patterns.

Mobility platforms such as Uber, Grab, and FREE NOW (Lyft) are leading the way in using real-time streaming analytics to protect their platforms from fraud. By implementing similar approaches, other mobility businesses can enhance security, reduce financial losses, and maintain customer trust.

Real-time fraud prevention in mobility services is not an option; it is a necessity. The ability to detect and stop fraud in real time will define the future success of ride-hailing, food delivery, and urban mobility platforms.

Stay ahead of the curve! Subscribe to my newsletter for insights into data streaming and connect with me on LinkedIn to continue the conversation. And download my free book about data streaming use cases.

The post Fraud Detection in Mobility Services (Ride-Hailing, Food Delivery) with Data Streaming using Apache Kafka and Flink appeared first on Kai Waehner.

Online Model Training and Model Drift in Machine Learning with Apache Kafka and Flink

Kai Waehner — Sun, 23 Feb 2025 05:08:20 +0000

The landscape of artificial intelligence (AI) and machine learning (ML) is transforming rapidly. Online model training and model drift management become essential for businesses to maintain competitive edges. Data streaming with Apache Kafka and Apache Flink plays crucial roles in this evolution, enabling real-time updates and seamless integration into modern data infrastructures. This blog explores the challenges of model drift, investigates TikTok’s groundbreaking architecture, and highlights the business value and complementary nature of data streaming with other platforms.

Understanding Model Drift: The Achilles’ Heel of Static Models

Real-time model inference with a data streaming platform using Apache Kafka and Flink is a powerful solution for delivering fast and accurate predictions, as detailed in my model inference blog post, but it’s not enough to sustain long-term model accuracy.

Machine learning models degrade in accuracy over time due to shifts in data or concepts—a phenomenon known as model drift.

This can take several forms:

Concept Drift: Changing relationships between input and output variables, such as shifting user behavior.
Data Drift: Variations in data distribution, e.g., demographic shifts.
Upstream Data Changes: Pipeline modifications, e.g., new logging formats or unavailable sources.

Unchecked, model drift leads to poor predictions and missed opportunities. Addressing it requires continuous updates, which online machine learning enables through data streaming platforms like Kafka and Flink.

TikTok: Revolutionizing Real-Time Personalization with Kafka and Flink

TikTok’s recommendation system, detailed in ByteDance’s whitepaper, leverages a cutting-edge, real-time machine learning architecture powered by data streaming technologies like Kafka and Flink to deliver personalized content at scale, seamlessly integrating user behavior data, dynamic feature processing, and online model updates for unparalleled user engagement and platform efficiency.

What is ByteDance and TikTok?

ByteDance, TikTok’s parent company, is a Chinese technology giant renowned for its innovative use of AI and real-time ML. TikTok, its most famous product, has redefined user engagement through hyper-personalized video recommendations. TikTok employs real-time online machine learning, ensuring recommendations are dynamic, accurate, and engaging.

Why TikTok Outshines Competitors

While other social video platforms also leverage advanced machine learning for recommendations, TikTok’s architecture distinguishes itself by prioritizing real-time adaptability and hyper-personalization, ensuring it can respond to user behavior faster and more effectively than its competitors.

User Engagement: TikTok’s recommendation engine adapts in real-time, delivering hyper-relevant content that increases user retention.
Scalability: Unlike many platforms relying on periodic retraining, TikTok continuously updates its models, handling massive data streams with ease.
Speed: Real-time processing reduces latency in adapting to user behavior, a stark contrast to Facebook or YouTube’s delayed batch processes.

Technical Implementation using Apache Kafka, Flink and Machine Learning for Continuous Online Model Training

TikTok’s real-time recommendation system is built on a robust streaming data architecture:

Source: Bytedance

Data Ingestion:

User interactions like views, likes, and shares are streamed in real-time via Kafka.
Kafka ensures reliable collection and distribution of high-velocity event data.

Feature Engineering:

Flink processes raw data streams, performing real-time feature extraction and enrichment.
Techniques like point-in-time lookups prevent training-inference skew, ensuring the same features are used in both phases.

Online Model Training:

Lightweight models are continuously updated with fresh data.
This approach mitigates model drift, ensuring predictions stay relevant and accurate.

Real-Time Inference:

Updated models are deployed immediately to serve predictions.
TikTok’s architecture ensures latency is minimal, with recommendations delivered almost instantly.

This dynamic infrastructure has made TikTok a leader in real-time AI, setting a benchmark for others.

Data Streaming with Kafka and Flink: The Backbone of Modern Machine Learning and AI

Apache Kafka and Flink are indispensable for organizations embracing online ML.

Data streaming addresses key challenges:

Training-Inference Data Skew: By streaming real-time features into models, Flink ensures consistency in model training and inference data.
Multi-Model Governance: Kafka and Flink enable the data integration with small models for enrichment and large models for complex decision-making, ensuring governance and modularity.
Scalability and Efficiency: Data streaming pipelines handle massive volumes with low latency, enabling real-time decision-making.

Complementing Other Data Platforms: Streaming Meets Analytics

Data streaming complements platforms like Databricks, Snowflake, and Microsoft Fabric, creating a seamless ecosystem for AI/ML workflows:

Databricks: While Databricks excels in large-scale batch processing and AI model training, Kafka adds real-time data ingestion and pre-processing capabilities.
Snowflake: Zero-ETL integration with Kafka and Flink allows for real-time analytics alongside Snowflake’s strong data warehousing and AI features.
Microsoft Fabric: Fabric’s AI-powered analytics gain agility from Kafka’s event-driven architecture, ensuring near-instant data availability.

The Shift Left Architecture emphasizes moving from traditional batch processing and lakehouse-centric approaches to real-time data products, empowering businesses to act on data faster and with greater agility. Learn more about this transformative approach in my Shift Left Architecture blog post.

Meanwhile, Apache Iceberg, an open table format for lakehouses and streaming, ensures seamless data sharing across real-time and batch workflows by providing a unified view of data. Dive deeper into its capabilities in my Apache Iceberg blog post.

This complementary relationship enables businesses to leverage best-in-class tools without trade-offs, providing both real-time and batch capabilities. Learn more in my comparison blog series “Data Streaming with Kafka and Flink vs. Snowflake” and “Microsoft Fabric and Apache Kafka“.

Business Value: A Compelling Case for Real-Time AI/ML with Data Streaming using Kafka and Flink

The adoption of real-time ML with Kafka and Flink drives tangible business outcomes:

Enhanced User Engagement: Personalized recommendations lead to improved customer retention.
Faster Time to Market: Real-time data pipelines reduce the lead time for deploying ML solutions.
Improved ROI: Real-time adaptability ensures models deliver consistent business value.
Freedom of Choice: Kafka acts as the backbone, enabling seamless integration with diverse tools and platforms.

This translates to a flexible, scalable, and high-performing ML infrastructure capable of handling evolving business demands.

Real-Time AI/ML with Kafka and Flink: Transforming Data into Business Value

Online machine learning with Apache Kafka and Flink is the future of adaptive, real-time AI. TikTok’s success story is a testament to the power of dynamic AI/ML systems in driving engagement and staying competitive. By complementing platforms like Snowflake, Databricks, and Microsoft Fabric, data streaming enables a holistic, future-proof data strategy.

Organizations must embrace these technologies to unlock faster time to market, unparalleled user experiences, and sustained business growth.

Let’s connect on LinkedIn and discuss how to implement these ideas in your organization. Stay informed about new developments by subscribing to my newsletter. And make sure to download my free book about data streaming use cases.

The post Online Model Training and Model Drift in Machine Learning with Apache Kafka and Flink appeared first on Kai Waehner.

Stateless vs. Stateful Stream Processing with Kafka Streams and Apache Flink

Kai Waehner — Fri, 27 Dec 2024 08:48:54 +0000

In the world of data-driven applications, the rise of stream processing has changed how we handle and act on data. While traditional databases, data lakes, and warehouses are effective for many batch-based use cases, they fall short in scenarios demanding low latency, scalability, and real-time decision-making. This post explores the key concepts of stateless and stateful stream processing, using Kafka Streams and Apache Flink as examples. These principles apply to any stream processing engine, whether open-source or a cloud service. Let’s break down the differences, practical use cases, the relation to AI/ML, and the immense value stream processing offers compared to traditional data-at-rest methods.

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter and follow me on LinkedIn or X (former Twitter) to stay in touch.

Rethinking Data Processing: From Static to Dynamic

In traditional systems, data is typically stored first in a database or data lake and queried later for computation. This works well for batch processing tasks, like generating reports or dashboards. The process usually looks something like this:

Store Data: Data arrives and is stored in a database or data lake.
Query & Compute: Applications request data for analysis or processing at a later time with a web service, request-response API or SQL script.

However, this approach fails when you need:

Immediate Action: Real-time responses to events, such as fraud detection.
Scalability: Handling thousands or millions of events per second.
Continuous Insights: Ongoing analysis of data in motion.

Enter stream processing—a paradigm where data is continuously processed as it flows through the system. Instead of waiting to store data first, stream processing engines like Kafka Streams and Apache Flink enable you to act on data instantly as it arrives.

Use Case: Fraud Prevention in Real-Time

The blog post uses a fraud prevention scenario to illustrate the power of stream processing. In this example, transactions from various sources (e.g., credit card payments, mobile app purchases) are monitored in real time.

The system flags suspicious activities using three methods:

Stateless Processing: Each transaction is evaluated independently, and high-value payments are flagged immediately.
Stateful Processing: Transactions are analyzed over a time window (e.g., 1 hour) to detect patterns, such as an unusually high number of transactions.
AI Integration: A pre-trained machine learning model is used for real-time fraud detection by predicting the likelihood of fraudulent activity.

This example highlights how stream processing enables instant, scalable, and intelligent fraud detection, something not achievable with traditional batch processing.

To avoid confusion: while I use Kafka Streams for stateless and Apache Flink for stateful in the example, both frameworks are capable of handling both types of processing.

Other Industry Examples of Stream Processing

Predictive Maintenance (Industrial IoT): Continuously monitor sensor data to predict equipment failures and schedule proactive maintenance.
Real-Time Advertisement (Retail): Deliver personalized ads based on real-time user interactions and behavior patterns.
Real-Time Portfolio Monitoring (Finance): Continuously analyze market data and portfolio performance to trigger instant alerts or automated trades during market fluctuations.
Supply Chain Optimization (Logistics): Track shipments in real time to optimize routing, reduce delays, and improve efficiency.
Condition Monitoring (Healthcare): Analyze patient vitals continuously to detect anomalies and trigger immediate alerts.
Network Monitoring (Telecom): Detect outages or performance issues in real time to improve service reliability.

These examples highlight how stream processing drives real-time insights and actions across diverse industries.

What is Stateless Stream Processing?

Stateless stream processing focuses on processing each event independently. In this approach, the system does not need to maintain any context or memory of previous events. Each incoming event is handled in isolation, meaning the logic applied depends solely on the data within that specific event.

This makes stateless processing highly efficient and easy to scale, as it doesn’t require state management or coordination between events. It is ideal for use cases such as filtering, transformations, and simple ETL operations where individual events can be processed with no need of historical data or context.

1. Example: Real-Time Payment Monitoring

Imagine a fraud prevention system that monitors transactions in real time to detect and prevent suspicious activities. Each transaction, whether from a credit card, mobile app, or payment gateway, is evaluated as it occurs. The system checks for anomalies such as unusually high amounts, transactions from unfamiliar locations, or rapid sequences of purchases.

By analyzing these attributes instantly, the system can flag high-risk transactions for further inspection or automatically block them. This real-time evaluation ensures potential fraud is caught immediately, reducing the likelihood of financial loss and enhancing overall security.

You want to flag high-value payments for further inspection. In the following Kafka Streams example:

Each transaction is evaluated as it arrives.
If the transaction amount exceeds 100 (in your chosen currency), it’s sent to a separate topic for further review.

Java Example (Kafka Streams):

KStream payments = builder.stream(“payments”);

payments.filter((key, payment) -> payment.getAmount() > 100)
.to(“high-risk-payments”);

Benefits of Stateless Processing

Low Latency: Immediate processing of individual events.
Simplicity: No need to track or manage past events.
Scalability: Handles large volumes of data efficiently.

This approach is ideal for use cases like filtering, data enrichment, and simple ETL tasks.

What is Stateful Stream Processing?

Stateful stream processing takes it a step further by considering multiple events together. The system maintains state across events, allowing for complex operations like aggregations, joins, and windowed analyses. This means the system can correlate data over a defined period, track patterns, and detect anomalies that emerge across multiple transactions or data points.

2. Example: Fraud Prevention through Continuous Pattern Detection

In fraud prevention, individual transactions may appear normal, but patterns over time can reveal suspicious behavior.

For example, a fraud prevention system might identify suspicious behavior by analyzing all transactions from a specific credit card within a one-hour window, rather than evaluating each transaction in isolation.

Let’s detect anomalies by analyzing transactions with Apache Flink using Flink SQL. In this example:

The system monitors transactions for each credit card within a 1-hour window.
If a card is used over 10 times in an hour, it flags potential fraud.

SQL Example (Apache Flink):

SELECT card_number, COUNT(*) AS transaction_count
FROM payments
GROUP BY TUMBLE(transaction_time, INTERVAL ‘1’ HOUR), card_number
HAVING transaction_count > 10;

Key Concepts in Stateful Processing

Stateful processing relies on maintaining context across multiple events, enabling the system to perform more sophisticated analyses. Here are the key concepts that make stateful stream processing possible:

Windows: Define a time range to group events (e.g., sliding windows, tumbling windows).
State Management: The system remembers past events within the defined window.
Joins: Combine data from multiple sources for enriched analysis.

Benefits of Stateful Processing

Stateful processing is essential for advanced use cases like anomaly detection, real-time monitoring, and predictive analytics:

Complex Analysis: Detect patterns over time.
Event Correlation: Combine events from different sources.
Real-Time Decision-Making: Continuous monitoring without reprocessing data.

Bringing AI and Machine Learning into Stream Processing

Stream processing engines like Kafka Streams and Apache Flink also enable real-time AI and machine learning model inference. This allows you to integrate pre-trained models directly into your data processing pipelines.

3. Example: Real-Time Fraud Detection with AI/ML Models

Consider a payment fraud detection system that uses a TensorFlow model for real-time inference. In this system, transactions from various sources — such as credit cards, mobile apps, and payment gateways — are streamed continuously. Each incoming transaction is preprocessed and sent to the TensorFlow model, which evaluates it based on patterns learned during training.

The model analyzes features like transaction amount, location, device ID, and frequency to predict the likelihood of fraud. If the model identifies a high probability of fraud, the system can trigger immediate actions, such as flagging the transaction, blocking it, or alerting security teams. This real-time inference ensures that potential fraud is detected and addressed instantly, reducing risk and enhancing security.

Here is a code example using Apache Flink’s Python API for predictive AI:

Python Example (Apache Flink):

def predict_fraud(payment):
prediction = model.predict(payment.features)
return prediction > 0.5

stream = payments.map(predict_fraud)

Why Combine AI with Stream Processing?

Integrating AI with stream processing unlocks powerful capabilities for real-time decision-making, enabling businesses to respond instantly to data as it flows through their systems. Here are some key benefits of combining AI with stream processing:

Real-Time Predictions: Immediate fraud detection and prevention.
Automated Decisions: Integrate AI into critical business processes.
Scalability: Handle millions of predictions per second.

Apache Kafka and Flink deliver low-latency, scalable, and robust predictions. My article “Real-Time Model Inference with Apache Kafka and Flink for Predictive AI and GenAI” compares remote inference (via APIs) and embedded inference (within the stream processing application).

For large AI models (e.g., generative AI or large language models), inference is often done via remote calls to avoid embedding large models within the stream processor.

Stateless vs. Stateful Stream Processing: When to Use Each

Choosing between stateless and stateful stream processing depends on the complexity of your use case and whether you need to maintain context across multiple events. The following table outlines the key differences to help you determine the best approach for your specific needs.

Feature	Stateless	Stateful
Use Case	Simple Filtering, ETL	Aggregations, Joins
Latency	Very Low Latency	Slightly Higher Latency due o State Management
Complexity	Simple Logic	Complex Logic Involving Multiple Events
State Management	Not Required	Required for Context-aware Processing
Scalability	High	Depends on the Framework

Read my article “Apache Kafka (including Kafka Streams) + Apache Flink = Match Made in Heaven” to learn more about choosing the right stream processing engine for your use case.

And to clarify again: while this article uses Kafka Streams for stateless and Flink for stateful stream processing, both frameworks are capable of handling both types.

Video Recording

Below, I summarize this content as a ten-minute video on my YouTube channel:

Why Stream Processing is a Fundamental Change

Whether stateless or stateful, stream processing with Kafka Streams, Apache Flink, and similar technologies unlocks real-time capabilities that traditional databases simply cannot offer. From simple ETL tasks to complex fraud detection and AI integration, stream processing empowers organizations to build scalable, low-latency applications.

Investing in stream processing means:

Faster Innovation: Real-time insights drive competitive advantage.
Operational Efficiency: Automate decisions and reduce latency.
Scalability: Handle millions of events seamlessly.

Stream processing isn’t just an evolution of data handling—it’s a revolution. If you’re not leveraging it yet, now is the time to explore this powerful paradigm. If you want to learn more, check out my light board video exploring the core value of Apache Flink:

Stay ahead of the curve! Subscribe to my newsletter for insights into data streaming and connect with me on LinkedIn to continue the conversation.

The post Stateless vs. Stateful Stream Processing with Kafka Streams and Apache Flink appeared first on Kai Waehner.

How Siemens Healthineers Leverages Data Streaming with Apache Kafka and Flink in Manufacturing and Healthcare

Kai Waehner — Tue, 17 Dec 2024 05:58:17 +0000

Siemens Healthineers, a global leader in medical technology, delivers solutions that improve patient outcomes and empower healthcare professionals. As part of the Siemens AG family, Siemens Healthineers stands out with innovative products, data-driven solutions, and services designed to optimize workflows, improve precision, and enhance efficiency in healthcare systems worldwide. A significant aspect of their technological prowess lies in their use of data streaming to unlock real-time insights and optimize processes. This blog post explores how Siemens Healthineers uses data streaming with Apache Kafka and Flink, their cloud-focused technology stack, and the use cases that drive tangible business value such as real-time logistics, robotics, SAP ERP integration, AI/ML, and more.

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter and follow me on LinkedIn or X (former Twitter) to stay in touch.

Siemens Healthineers: Shaping the Future of Healthcare Technology

Who They Are

Siemens AG, a global powerhouse in industrial manufacturing, energy, and technology, has been a leader in innovation for over 170 years. Known for its groundbreaking contributions across sectors, Siemens combines engineering expertise with digitalization to shape industries worldwide. Within this ecosystem, Siemens Healthineers stands out as a pivotal player in healthcare technology.

Source: Siemens Healthineers

With over 71,000 employees operating in 70+ countries, Siemens Healthineers supports critical clinical decisions in healthcare. Over 90% of leading hospitals worldwide collaborate with them, and their technologies influence over 70% of critical clinical decisions.

Their Vision

Siemens Healthineers focuses on innovation through data and AI, aiming to streamline healthcare delivery. With more than 24,000 technical intellectual property rights, including 15,000 granted patents, their technological foundation enables precision medicine, enhanced diagnostics, and patient-centric solutions.

Source: Siemens Healthineers

Siemens Healthineers and Data Streaming for Healthcare and Manufacturing

Siemens is a large conglomerate. I already covered a few data streaming use cases at other Siemens divisions. For instance, the integration project from SAP ERP on-premise to Salesforce CRM in the cloud.

At the Data in Motion Tour 2024 in Frankfurt, Arash Attarzadeh (“Apache Kafka Jedi“) from Siemens Heathineers presented various very interesting success stories that leverage data streaming using Apache Kafka, Flink, Confluent, and its entire ecosystem.

Why Data Streaming with Apache Kafka and Flink?

Healthcare and manufacturing processes generate massive volumes of real-time data. Whether it’s monitoring devices on production floors, analyzing telemetry data from hospitals, or optimizing logistics, Siemens Healthineers recognizes that data streaming enables:

Real-time insights: Immediate and continuously action on events as they happen.
Improved decision-making: Faster and more accurate responses.
Cost efficiency: Reduced downtime and optimized operations.

Healthineers Data Cloud

The Siemens Healthineers Data Cloud serves as the backbone of their data strategy. Built on a robust technology stack, it facilitates real-time data ingestion, transformation, and analytics using tools like Confluent Cloud (including Apache Kafka and Flink) and Snowflake.

Source: Siemens Healthineers

This combination of leading SaaS solutions enables seamless integration of streaming data with batch processes and diverse analytics platforms.

Technology Stack: Healthineers Data Cloud

Key Components

Confluent Cloud (Apache Kafka): For real-time data ingestion, data integration and stream processing.
Snowflake: A centralized warehouse for analytics and reporting.
Matillion: Batch ETL processes for structured and semi-structured data.
IoT Data Integration: Sensors and PLCs collect data from manufacturing floors, often via MQTT.

Source: Siemens Healthineers

Many other solutions are critical for some use cases. Siemens Healthineers also uses Databricks, dbt, OPC-UA, and many other systems for the end-to-end data pipelines.

Diverse Data Ingestion

Real-Time Streaming: IoT data (sensors, PLCs) is ingested within minutes.
Batch Processing: Structured and semi-structured data from SAP systems.
Change Data Capture (CDC): Data changes in SAP sources are captured and available in under 30 minutes.

Not every data integration process is or can be real-time. Data consistency is still one of the most underrated capabilities of data streaming. Apache Kafka supports real-time, batch and request-response APIs communicating with each other in a consistent way.

Use Cases for Data Streaming at Siemens Healthineers

Siemens Healthineers described six different use cases that leverage data streaming together with various other IoT, software and cloud services:

Machine monitoring and predictive maintenance
Data integration layer for analytics
Machine and robot integration
Telemetry data processing for improved diagnostics
Real-time logistics with SAP events for better supply chain efficiency
Track and Trace Orders for improved customer satisfaction and ensured compliance

Let’s take a look at them in the following subsections.

1. Machine Monitoring and Predictive Maintenance in Manufacturing

Goal: To ensure the smooth operation of production devices through predictive maintenance.

Using data streaming, real-time IoT data from drill machines is ingested into Kafka topics, where it’s analyzed to predict maintenance needs. By using a TensorFlow machine learning model for infererence with Apache Kafka, Siemens Healthineers can:

Reduce machine downtime.
Optimize maintenance schedules.
Increase productivity in manufacturing CT scanners.

Business Value: Predictive maintenance reduces operational costs and prevents production halts, ensuring timely delivery of critical medical equipment.

2. IQ-Data Intelligence from IoT and SAP to Cloud

Goal: Develop an end-to-end data integration layer for analytics.

Data from various lifecycle phases (e.g., SAP systems, IoT interfaces via MQTT using Mosquitto, external sources) is streamed into a consistent model using stream processing with ksqlDB. The resulting data backend supports the development of MLOps architectures and enables advanced analytics.

Source: Siemens Healthineers

Business Value: Streamlined data integration accelerates the development of AI applications, helping data scientists and analysts make quicker, more informed decisions.

3. Machine Integration with SAP and KUKA Robots

Goal: Integrate machine data for analytics and real-time insights.

Data from SAP systems (such as SAP ME and SAP PCO) and machines like KUKA robots is streamed into Snowflake for analytics. MQTT brokers and Apache Kafka manage real-time data ingestion and facilitate predictive analytics.

Source: Siemens Healthineers

Business Value: Enhanced machine integration improves production quality and supports the shift toward smart manufacturing processes.

4. Digital Healthcare Service Operations using Data Streaming

Goal: Stream telemetry data from Siemens Healthineers products for analytics.

Telemetry data from hospital devices is streamed via WebSockets to Kafka and combined with ksqlDB for continuous stream processing. Insights are fed back to clients for improved diagnostics.

Business Value: By leveraging real-time device data, Siemens Healthineers enhances the reliability of its medical equipment and improves patient outcomes.

5. Real-Time Logistics with SAP Events and Confluent Cloud

Goal: Stream SAP logistics event data for real-time packaging and shipping updates.

Using Confluent Cloud, Siemens Healthineers reduces delays in packaging and shipping by enabling real-time insights into logistics processes.

Source: Siemens Healthineers

Business Value: Improved packaging planning reduces delivery times and enhances supply chain efficiency, ensuring faster deployment of medical devices.

6. Track and Trace Orders with Apache Kafka and Snowflake

Goal: Real-time order tracking using streaming data.

Data from Siemens Healthineers orders is streamed into Snowflake using Kafka for real-time monitoring. This enables detailed tracking of orders throughout the supply chain.

Business Value: Enhanced order visibility improves customer satisfaction and ensures compliance with regulatory requirements.

Real-Time Data as a Catalyst for Healthcare and Manufacturing Innovation at Siemens Healthineers

Siemens Healthineers’ innovative use of data streaming exemplifies how real-time insights can drive efficiency, reliability, and innovation in healthcare and manufacturing. By leveraging tools like Confluent (including Apache Kafka and Flink), MQTT and Snowflake and transitiing some workloads to the cloud, they’ve built a robust infrastructure to handle diverse data streams, improve decision-making, and deliver tangible business outcomes.

From predictive maintenance to enhanced supply chain visibility, the adoption of data streaming unlocks value at every stage of the production and service lifecycle. For Siemens Healthineers, these advancements translate into better patient care, streamlined operations, and a competitive edge in the dynamic healthcare industry.

To learn more about the relationship between these key technologies and their applications in different use cases, explore the articles below:

Do you have similar use cases and architectures like Siemens Healthineers to leverage data streaming with Apache Kafka and Flink in the healthcare and manufacturing sector? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post How Siemens Healthineers Leverages Data Streaming with Apache Kafka and Flink in Manufacturing and Healthcare appeared first on Kai Waehner.

Real-Time Model Inference with Apache Kafka and Flink for Predictive AI and GenAI

Kai Waehner — Tue, 01 Oct 2024 05:26:11 +0000

Artificial Intelligence (AI) and Machine Learning (ML) are transforming business operations by enabling systems to learn from data and make intelligent decisions for predictive and generative AI use cases. Two essential components of AI/ML are model training and inference. Models are developed and refined using historical data. Model inference is the process of using a trained machine learning models to make predictions or generate outputs based on new, unseen data. This blog post covers the basics of model inference, comparing different approaches like remote and embedded inference. It also explores how data streaming with Apache Kafka and Flink enhances the performance and reliability of these predictions. Whether for real-time fraud detection, smart customer service applications, or predictive maintenance, understanding the value of data streaming for model inference is crucial for leveraging AI/ML effectively.

Artificial Intelligence (AI) and Machine Learning (ML)

Artificial Intelligence (AI) and Machine Learning (ML) are pivotal in transforming how businesses operate by enabling systems to learn from data and make informed decisions. AI is a broad field that includes various technologies aimed at mimicking human intelligence, while ML is a subset focused on developing algorithms that allow systems to learn from data and improve over time without being explicitly programmed. The major use cases are predictive AI and generative AI.

AI/ML = Model Training, Model Deployment and Model Inference

In AI/ML workflows, model training, model deployment and model inference are distinct yet interconnected processes:

Model Training: Using historical data or credible synthetic to build a model that can recognize patterns and make predictions. It involves selecting the right algorithm, tuning parameters, and validating the model’s performance. Model training is typically resource intensive and performed in a long-running batch process, but can be via online learning or incremental learning, too.
Model Deployment: The trained model is deployed to the production environment, which could be cloud (e.g., AWS, Google Cloud, Azure or purpose-built SaaS offerings), edge devices (local devices or IoT for embedded inference), or on-premises servers (local servers for sensitive data or compliance reasons). If the demand is high, load balancers distribute requests across multiple instances to ensure smooth operation.
Model Inference: Once a model is trained, it is deployed to make predictions on new, unseen data. Model inference, often just called making a prediction, refers to this process. During inference, the model applies the patterns and knowledge it learned during model training to provide results.

For the terminology, keep in mind that model inference is generating predictions using a trained model, while model scoring (which is sometimes wrongly used as a synonym) involves evaluating the accuracy or performance of those predictions.

Batch vs. Real-Time Model Inference

Model inference can be done in real-time or batch mode, depending on the application’s requirements. When making predictions in production environments, the requirements often differ from model training because timely, accurate, and robust predictions are needed. The inference process involves feeding input data into the model and getting an output, which could be a classification, regression value, or other prediction types.

There are two primary delivery approaches to model inference: Remote Model Inference and Embedded Model Inference. Each deployment option has its trade-offs. The right choice depends on requirements like latency (real-time vs. batch) but also on other characteristics like robustness, scalability, cost, etc.

Remote Model Inference

Remote Model Inference involves making a request-response call to a model server via RPC, API, or HTTP. While this approach allows for centralized model management and easier updates, it can introduce latency because of network communication. It is suitable for scenarios where model updates are frequent, and the overhead of network calls is acceptable.

The service creation exposes the model through an API, so applications or other systems can interact with it for predictions. It can be a technical interface with all the details about the model or a function that hides the AI/ML capabilities under the hood of a business service.

Pros:

Centralized Model Management: Models are deployed and managed on a central server, making it easier to update, A/B test, monitor, and version them with no need to change the application.
Scalability: Remote inference can leverage the scalability of cloud infrastructure. This allows services to handle large volumes of requests by distributing the load across multiple servers.
Resource Efficiency: The client or edge devices do not need to have the computational resources to run the model, which is beneficial for devices with limited processing power.
Security: Sensitive models and data remain on the server, which can be more secure than distributing them to potentially insecure or compromised edge devices.
Ease of Integration: Remote models can be accessed via APIs, making it easier to integrate with different applications or services

Cons:

Latency: Remote inference typically involves network communication, which can introduce latency, especially if the client and server are geographically distant.
Dependency on Network Availability: The inference depends on the availability and reliability of the network. Any disruption can cause failed predictions or delays.
Higher Operational Costs: Running and maintaining a remote server or cloud service can be expensive, particularly for high-traffic applications.
Data Privacy Concerns: Sending data to the server for inference may raise privacy concerns, especially in regulated industries or when dealing with sensitive information.

Embedded Model Inference

In this approach, the model is embedded within the stream processing application. This reduces latency as predictions are made locally within the application, but it may require more resources on the processing nodes. Embedded inference is ideal for applications where low latency is critical, and model updates are less frequent.

Pros:

Low Latency: Since the model runs directly on the device, there is a minimal delay in processing, leading to near real-time predictions.
Offline Availability: Embedded models do not rely on a network connection, making them ideal for scenarios where connectivity is intermittent or unavailable.
Cost Efficiency: Once deployed, there are no ongoing costs related to server maintenance or cloud usage, making it more cost-effective.
Privacy: Data stays local to the device, which can help in adhering to privacy regulations and minimizing the risk of data breaches.
Independence from Central Infrastructure: Embedded models are not reliant on a central server, reducing the risk of a single point of failure.

Cons:

Resource Intensive: Embedded scoring requires sufficient computational resources. While hosting and running the model on servers or containers is the expensive part, models also need to be adjusted and optimized for a more lightwight deployment on devices with limited processing power, memory, or battery life.
Complex Deployment: Updating models across many devices can be complex and require robust version management strategies.
Model Size Limitations: There may be constraints on model complexity and size because of the limited resources on the edge device, potentially leading to the need for model compression or simplification.
Security Risks: Deploying models on devices can expose them to reverse engineering, tampering, or unauthorized access, potentially compromising the model’s intellectual property or functionality.

Hidden Technical Debt in AI/ML Systems

The Google paper “Hidden Technical Debt in Machine Learning Systems” sheds light on the complexities involved in maintaining AI/ML systems. It argues that, while the focus is often on the model itself, the surrounding infrastructure, data dependencies, and system integration can introduce significant technical debt. This debt manifests as increased maintenance costs, reduced system reliability, and challenges in scaling and adapting the system.

Source: Google

Important points from the paper include:

Complexity in Data Dependencies: AI/ML systems often rely on multiple data sources, each with its own schema and update frequency. Managing these dependencies can be challenging and error-prone.
Systems Integration Challenges: Integrating ML models into existing systems requires careful consideration of interfaces, data formats, and communication protocols.
Monitoring and Maintenance: Continuous monitoring and maintenance are essential to ensure model performance does not degrade over time because of changes in data distribution or system behavior.

The Impedance Mismatch within AI/ML between Analytics and Operations

The impedance mismatch between the operational estate (production engineers) and the analytical estate (data scientists/data engineers) primarily stems from their differing toolsets, workflows and SLA requirements regarding uptime, latency and scalability.

Production engineers often use Java or other JVM-based languages to build robust, scalable applications, focusing on performance and reliability. They work in environments that emphasize code stability, using tools like IntelliJ IDEA and frameworks that support CI/CD and containerization.

In contrast, data scientists and data engineers typically use Python because of its simplicity and the rich ecosystem of data science libraries. They often work in interactive environments like Jupyter Notebooks, which are geared towards experimentation and rapid prototyping rather than production-level code quality.

This mismatch can create challenges in integrating machine learning models into production environments. Production engineers prioritize performance optimization and scalability, while data scientists focus on model accuracy and experimentation. To bridge this gap, organizations can form cross-functional teams, adopt a data streaming platform like Apache Kafka, develop standardized APIs for model deployment, and provide training to align the skills and priorities of both groups. By doing so, they can streamline the deployment of machine learning models, ensuring they deliver business value effectively.

AI/ML in Practice: Use Cases across Industries for Model Inference

Many use cases for model inference are critical and require real-time processing and high reliability to ensure timely and accurate decision-making in various industries. A few examples of critical predictive AI and generative AI use cases are:

Use Cases for Predictive AI

Many predictive AI use cases are already in production across industries. For instance:

Fraud Detection: Real-time model inference can identify fraudulent transactions as they occur, allowing for immediate intervention. By analyzing transaction data in real-time, businesses can detect anomalies and flag suspicious activities before they result in financial loss.
Predictive Maintenance: By analyzing sensor data in real-time, organizations can predict equipment failures and schedule maintenance proactively. This approach reduces downtime and maintenance costs by addressing issues before they lead to equipment failure.
Customer Promotions: Retailers can offer personalized promotions to customers while they are still in the store or using a mobile app, enhancing the shopping experience. Real-time inference allows businesses to analyze customer behavior and preferences on the fly, delivering targeted offers that increase engagement and sales.

Use Cases for Generative AI

Early adoption use cases with user-facing value:

Semantic Search: Generative AI enhances semantic search by understanding the context and intent behind user queries, enabling more accurate and relevant search results. It leverages advanced language models to interpret nuanced language patterns, improving the search experience by delivering content that closely aligns with user needs.
Content Generation: GenAI, exemplified by tools like Microsoft Co-pilot, assists users by automatically creating text, code, or other content based on user prompts, significantly boosting productivity. It utilizes machine learning models to generate human-like content, streamlining tasks such as writing, coding, and creative projects, thereby reducing the time and effort required for content creation.

More advanced use cases with transactional implications that take a bit longer to adopt because of its business impact and technical complexity:

Ticket Rebooking: In the airline industry, generative AI can assist customer service agents in rebooking tickets by providing real-time, context-specific recommendations based on flight availability, customer preferences, and loyalty status. This transactional use case ensures that agents can offer personalized and efficient solutions, enhancing customer satisfaction and operational efficiency.
Customer Support: For a SaaS product, generative AI can analyze customer support interactions to identify common issues and generate insightful reports that highlight trends and potential areas for improvement. This analysis assists companies in resolving common issues, refining their support procedures, and enhancing the overall user satisfaction.

So, after all the discussions about AI/ML, what is the relation to data streaming specifically for model inference?

Data Streaming with Apache Kafka and Flink for Model Inference

A data streaming platform helps to enhance model inference capabilities. Apache Kafka and Flink provide a robust infrastructure for processing data in motion, enabling real-time predictions with low latency.

The benefits of using data streaming for model inference include:

Low Latency: Real-time stream processing ensures that predictions are made quickly, which is crucial for time-sensitive applications. Kafka and Flink handle high-throughput, low-latency data streams. This makes them ideal for real-time inference.
Scalability: Kafka and Flink can handle large volumes of data, making them suitable for applications with high throughput requirements. They can scale horizontally by adding more nodes to the cluster to ensure that the system can handle increasing data loads. A serverless data streaming cloud service like Confluent Cloud even provides complete elasticity and takes over the (complex) operations burden.
Robustness: These platforms are fault-tolerant, ensuring continuous operation even in the face of failures. They provide mechanisms for data replication, failover, and recovery, which are essential for maintaining system reliability. This can even span multiple regions or different public clouds like AWS, Azure, GCP, and Alibaba.
Critical SLAs: Kafka and Flink support stringent service level agreements (SLAs) for uptime and performance, which are essential for critical applications. They offer features like exactly-once processing semantics with a Transaction API and stateful stream processing. These capabilities are crucial for maintaining data integrity and consistency.

Remote vs. Embedded Model Inference with Kafka and Flink

Let’s explore concrete examples for model inference with the embedded and remote call approaches.

Remote Model Inference with Kafka, Flink and OpenAI

Here is an example with Kafka, Flink and OpenAI using the ChatGPT large language model (LLM) for generative AI. The process involves using Apache Kafka and Flink for stream processing to correlate real-time and historical data, which is then fed into the OpenAI API via a Flink SQL User Defined Function (UDF) to generate context-specific responses using the ChatGPT large language model. The generated responses are sent to another Kafka topic for downstream applications, such as ticket rebooking or updating loyalty platforms, ensuring seamless integration and real-time data processing.

Trade-offs using Kafka with an RPC-based model server and HTTP/gRPC:

Simple integration with existing technologies and organizational processes
Easiest to understand if you come from a non-streaming world
Tight coupling of the availability, scalability, and latency/throughput between application and model server
Separation of concerns (e.g. Python model + Java streaming app)
Limited scalability and robustness
Later migration to real streaming is also possible
Model management built-in for different models, versioning, and A/B testing
Model monitoring built-in (include real-time tracking of model performance metrics (e.g., accuracy, latency), resource utilization, data drift detection, and logging of predictions for auditing and troubleshooting)

In the meantime, some model servers like Seldon or Dataiku also provide remote model inference natively via the Kafka API. A Kafka-native streaming model server enables the separation of concerns by providing a model server with all the expected features. But the model server does not use RPC communication via HTTP/gRPC and all the drawbacks this creates for a streaming architecture. Instead, the model server communicates via the native Kafka protocol and Kafka topics with the client application. Therefore, the stream processing application built with Flink has an option to do event-driven integration for model inference.

Embedded Model Inference with Kafka, Flink and TensorFlow

Here is an example with Kafka, Flink and TensorFlow where the model is embedded into the stream processing application. Apache Kafka is used to ingest and stream data, while Apache Flink processes the data in real-time, embedding a TensorFlow model directly within the Flink application for immediate model inference. This integration allows for low-latency predictions and actions on streaming data, leveraging the model’s capabilities with no external service calls, thus enhancing efficiency and scalability.

Trade-offs of embedding analytic models into a Flink application:

Best latency as local inference instead of remote call
No coupling of the availability, scalability, and latency/throughput of your Kafka Streams application
Offline inference (devices, edge processing, etc.)
No side-effects (e.g., in case of failure), all covered by Kafka processing (e.g., exactly once)
No built-in model management and monitoring

I showed examples and use cases for embedding TensorFlow and H2O.ai models into Kafka Streams and KSQL many years ago already. With Apache Flink becoming the de facto standard for many stream processing scenarios, it is just natural that we see more adoption of Flink for AI/ML use cases.

The Impact of Kafka and Flink for Model Inference – Predictive AI vs. Generative AI (GenAI)

Predictive AI and Generative AI (GenAI) represent two distinct paradigms within the field of artificial intelligence, each with unique capabilities and architectural requirements. Understanding these differences is crucial for leveraging their potential in data streaming applications.

Predictive AI and Data Streaming

Predictive AI focuses on forecasting future events or outcomes based on historical data. It employs machine learning models that are trained to recognize patterns and correlations within datasets. These models are typically used for tasks like predicting customer behavior, detecting fraud, or forecasting demand.

Architecture: The architecture for predictive AI often involves a batch processing approach. Data is collected, pre-processed, and then fed into a machine learning model to generate predictions. The model is usually trained offline and then deployed to make predictions on new data. This process can be integrated into applications via APIs or embedded directly into systems for real-time inference.
Impact on Data Streaming: In predictive AI, data streaming can enhance the timeliness and relevance of predictions by providing real-time data inputs. Technologies like Apache Kafka and Flink can ingest and process data streams, enabling continuous model inference and updating predictions as new data arrives. This real-time capability is valuable in scenarios where immediate insights are critical, such as fraud detection or dynamic pricing.
Concrete Examples:
- Condition Monitoring and Predictive Maintenance in Manufacturing
- Fraud Detection and Prevention in FinServ, Gaming and Transportation

Generative AI (GenAI) and Data Streaming

Generative AI creates new content, such as text, images, or music, that mimics human behaviour or creativity. It uses advanced models such as large language models (LLMs) to generate outputs based on input prompts. Just keep in mind that GenAI is still predictive based on historical data; it just makes a lot of small predictions to generate something. For instance, with text, it predicts a word at a time, etc.

Architecture: The architecture for GenAI is more complex and requires real-time, contextualized data to produce accurate and relevant outputs. This is where Retrieval Augmented Generation (RAG) comes into play. RAG combines LLMs with vector databases and semantic search to provide the context for generation tasks. The architecture involves two major steps: data augmentation and retrieval. Data is first processed to create embeddings, which are stored in a vector database. When a prompt is received, the system retrieves relevant context from the database to inform the generation process.
Impact on Data Streaming: Data streaming is integral to GenAI architectures, particularly those employing RAG. Real-time data streaming platforms like Apache Kafka and Flink facilitate the ingestion and processing of data streams, ensuring that the LLMs have access to the most current and relevant information. This capability is crucial for preventing hallucinations (i.e., generating false or misleading information) and ensuring the reliability of GenAI outputs. By integrating data streaming with GenAI, organizations can create dynamic, context-aware applications that respond to real-time data inputs.
Concrete Examples:
- Customer Service for Travel Booking in Aviation
- Supporting Sales with Customer 360 Generating Cold Cold Proposals

Data Streaming as Data Pipeline for Model Training in Lakehouses AND for Robust Low-Latency Model Inference

Data streaming technologies play a pivotal role in both predictive AI and generative AI. Kafka and Flink improve the data quality and latency for data ingestion into data warehouses, data lakes, lakehouses for model training. And data streaming enhances model inference by improving the timeliness and accuracy of predictions in predictive AI and providing the context for content generation in GenAI.

By leveraging data streaming with Kafka and Flink, organizations can achieve real-time predictions with low latency, scalability, and robustness, meeting critical SLAs for various use cases. The choice between remote and embedded model inference depends on the specific requirements and constraints of the application, such as latency tolerance, model update frequency, and resource availability. Overall, data streaming provides a powerful foundation for deploying AI/ML solutions that deliver timely and actionable insights.

How do you leverage data streaming with Kafka and Flink in your AI/ML projects? Only as data ingestion layer into the lakehouse? Or also for more robust and performant model inference? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Real-Time Model Inference with Apache Kafka and Flink for Predictive AI and GenAI appeared first on Kai Waehner.

GenAI Demo with Kafka, Flink, LangChain and OpenAI

Kai Waehner — Mon, 29 Jan 2024 14:32:13 +0000

Generative AI (GenAI) enables automation and innovation across industries. This blog post explores a simple but powerful architecture and demo for the combination of Python, and LangChain with OpenAI LLM, Apache Kafka for event streaming and data integration, and Apache Flink for stream processing. The use case shows how data streaming and GenAI help to correlate data from Salesforce CRM, searching for lead information in public datasets like Google and LinkedIn, and recommending ice-breaker conversations for sales reps.

The Emergence of Generative AI

Generative AI (GenAI) refers to a class of artificial intelligence (AI) systems and models that generate new content, often as images, text, audio, or other types of data. These models can understand and learn the underlying patterns, styles, and structures present in the training data and then generate new, similar content on their own.

Generative AI has applications in various domains, including:

Image Generation: Generating realistic images, art, or graphics.
Text Generation: Creating human-like text, including natural language generation.
Music Composition: Generating new musical compositions or styles.
Video Synthesis: Creating realistic video content.
Data Augmentation: Generating additional training data for machine learning models.
Drug Discovery: Generating molecular structures for new drugs.

A key challenge of Generative AI is the deployment in production infrastructure with context, scalability, and data privacy in mind. Let’s explore an example of using CRM and customer data to integrate GenAI into an enterprise architecture to support sales and marketing.

Demo: LangChain + Kafka + Flink = Automated Cold Calls of Sales Leads with Salesforce CRM and LinkedIn Data

This article shows a demo that combines real-time data streaming powered by Apache Kafka and Flink with a large language model from OpenAI within LangChain. If you want to learn more about data streaming with Kafka and Flink in conjunction with Generative AI, check out these two articles:

The following demo is about supporting sales reps or automated tools with Generative AI:

The Salesforce CRM creates new leads through other interfaces or by the human manually.
The sales rep / SDR receives lead information in real time to call the prospect.
A special GenAI service leverages the lead information (name and company) to search the web (mainly LinkedIn) to generate helpful content for the cold call of the lead, including: Summary, two interesting facts, topic of interest, and two creative ice-breaker for initiating a conversation.

Kudos to my colleague Carsten Muetzlitz who built the demo. The code is available on Github. Here is the architecture of the demo:

Technologies and Infrastructure in the Demo

The following technologies and infrastructure are used to implement and deploy the GenAI demo.

Python: The programming language almost every data engineer and data scientist uses.
LangChain: The Python framework implements the application to support sales conversations.
OpenAI: The language model and API help to build simple but powerful GenAI applications.
Salesforce: The cloud CRM tool stores the lead information and other sales and marketing data.
Apache Kafka: Scalable real-time data hub decoupling the data sources (CRM) and data sinks (GenAI application and other services).
Kafka Connect: Data integration via Change Data Capture (CDC) from Salesforce CRM.
Apache Flink: Stream processing for enrichment and data quality improvements of the CRM data.
Confluent Cloud: Fully managed Kafka (stream and store), Flink (process), and Salesforce connector (integrate).
SerpAPI: Scrape Google and other search engines with the lead information.
proxyCurl: Pull rich data about the lead from LinkedIn without worrying about scaling a web scraping and data-science team.

Live Demo of Python + LangChain + OpenAI + Kafka + Flink

Here is a 15 minute video walking you through the demo:

Use case
Technical architecture
GitHub project with Python code using Kafka and LangChain
Fully managed Kafka and Flink in the Confluent Cloud UI
Push new leads in real-time from Salesforce CRM via CDC using Kafka Connect
Streaming ETL with Apache Flink
Generative AI with Python, LangChain and OpenAI

Missing: No Vector DB and RAG with Model Embeddings in the LangChain Demo

This demo does NOT use advanced GenAI technologies for RAG (retrieval augmented generation), model embeddings, or vector search via a Vector database (Vector DB) like Pinecone, Weaviate, MongoDB or Oracle.

The principle of the demo is KISS (“keep it as simple as possible”). These technologies can and will be integrated into many real-world architectures.

The demo has limitations regarding latency and scale. Kafka and Flink run as fully managed and elastic SaaS. But the AI/ML part around LangChain could have improved latency, using a SaaS for hosting, and integration with other dedicated AI platforms. Especially data-intensive applications will need a vector database and advanced retrieval and semantic search technologies like RAG.

Fun fact: The demo breaks when I search for my name instead of Carsten’s. Because the web scraper finds too much content in the web about me and as a result the LangChain app crashes… This is a compelling event for complementary technologies like Pinecone or MongoDB that can do indexing, RAG and semantic search at scale. These technologies provide fully managed integration with Confluent Cloud so the demo could easily be extended.

The Role of LangChain in GenAI

LangChain is an open-source framework for developing applications powered by language models. LangChain is also the name of the commercial vendor behind the framework. The tool provides the needed “glue code” for data engineers to build GenAI applications with intuitive APIs for chaining together large language models (LLM), prompts with context, agents that drive decision making with stateful conversations, and tools that integrate with external interfaces.

LangChain supports:

Context-awareness: connect a language model to sources of context (prompt instructions, few shot examples, content to ground its response in, etc.)
Reason: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)

The main value props of the LangChain packages are:

Components: composable tools and integrations for working with language models. Components are modular and easy-to-use, whether you are using the rest of the LangChain framework or not.
Off-the-shelf chains: built-in assemblages of components for accomplishing higher-level tasks.

Together, these products simplify the entire application lifecycle:

Develop: Write your applications in LangChain/LangChain.js. Hit the ground running using Templates for reference.
Productionize: Use LangSmith to inspect, test and monitor your chains, so that you can constantly improve and deploy with confidence.
Deploy: Turn any chain into an API with LangServe.

LangChain in the Demo

The demo uses several LangChain concepts such as Prompts, Chat Models, Chains using the LangChain Expression Language (LCEL), Agents using a language model to choose a sequence of actions to take

Here is the logical flow of the LangChain business process:

Get new leads: Collect full name and company of the lead from Salesforce CRM in real-time from a Kafka Topic.
Find LinkedIn profile: Use the Google Search API “SerpAPI” to search for the URL of the lead’s LinkedIn profile.
Collect information about the lead: Use Proxycurl to collect the required information about the lead from LinkedIn.
Create cold call recommendations for the sales rep or automated script: Ingest all information into the ChatGPT LLM via OpenAI API and send the generated text to a Kafka Topic.

The following screenshot shows a snippet of the generated content. It includes context-specific icebreaker conversations based on the LinkedIn profile. For the context, Carsten worked at Oracle for 24 years before joining Confluent. The LLM uses this context of the LangChain prompt to generate related content:

The Role of Apache Kafka in GenAI

Apache Kafka is a distributed streaming platform used for building real-time data pipelines and streaming applications. It plays a crucial role in handling and managing large volumes of data streams efficiently and reliably.

Generative AI typically involves models and algorithms for creating new data, such as images, text, or other types of content. Apache Kafka supports Generative AI by providing a scalable and resilient infrastructure for managing data streams. In a Generative AI context, Kafka can be used for:

Data Ingestion: Kafka can handle the ingestion of large datasets, including the diverse and potentially high-volume data needed to train Generative AI models.
Real-time Data Processing: Kafka’s real-time data processing capabilities help in scenarios where data is constantly changing, allowing for the rapid updating and training of Generative AI models.
Event Sourcing: Event sourcing with Kafka captures and stores events that occur over time, providing a historical record of data changes. This historical data is valuable for training and improving Generative AI models.
Integration with other Tools: Kafka can be integrated into larger data processing and machine learning pipelines, facilitating the flow of data between different components and tools involved in Generative AI workflows.

While Apache Kafka itself is a tool specifically designed for Generative AI, its features and capabilities contribute to the overall efficiency and scalability of the data infrastructure. Kafka’s capabilities are crucial when working with large datasets and complex machine learning models, including those used in Generative AI applications.

Apache Kafka in the Demo

Kafka is the data fabric connecting all the different applications. Ensuring data consistency is a sweet spot of Kafka. No matter if a data source or sink is real time, batch or a request-response API.

In this demo, Kafka consumes events from Salesforce CRM as the main data source of customer data. Different applications (Flink, LangChain, Salesforce) consume the data in different steps of the business process. Kafka Connect provides the capability for data integration with no need for another ETL, ESB or iPaaS tool. This demo uses Confluent’s Change Data Capture (CDC) connector to consume changes from the Salesforce database in real-time for further processing.

Fully managed Confluent Cloud is the infrastructure for the entire Kafka and Flink ecosystem in this demo. The focus of the developer should always build business logic, not worrying about operating infrastructure.

While the heart of Kafka is event-based, real-time and scalable, it also enables domain-driven design and data mesh enterprise architectures out-of-the-box.

The Role of Apache Flink in GenAI

Apache Flink is an open-source distributed stream processing framework for real-time analytics and event-driven applications. Its primary focus is on processing continuous streams of data efficiently and at scale. While Apache Flink itself is not a specific tool for Generative AI, it plays a role in supporting certain aspects of Generative AI workflows. Here are a few ways in which Apache Flink is relevant:

Real-time Data Processing: Apache Flink can process and analyze data in real-time, which can be useful for scenarios where Generative AI models need to operate on streaming data, adapting to changes and generating responses in real-time.
Event Time Processing: Flink has built-in support for event time processing, allowing for the handling of events in the order they occurred, even if they arrive out of order. This can be beneficial in scenarios where temporal order is crucial, such as in sequences of data used for training or applying Generative AI models.
Stateful Processing: Flink supports stateful processing, enabling the maintenance of state across events. This can be useful in scenarios where the Generative AI business process needs to maintain context or memory of past events to generate coherent and context-aware outputs.
Integration with Machine Learning Libraries: While Flink itself is not a machine learning framework, it can be integrated with other tools and libraries that are used in machine learning, including those relevant to Generative AI. This integration can facilitate the deployment and execution of machine learning models within Flink-based streaming applications.

The specific role of Apache Flink in Generative AI depends on the particular use case and the architecture of the overall system.

Apache Flink in the Demo

This demo leverages Apache Flink for streaming ETL (enrichment, data quality improvements) of the incoming Salesforce CRM events.

FlinkSQL provides a simple and intuitive way to implement ETL with any Java or Python code. Fully managed Confluent Cloud is the infrastructure for Kafka and Flink in this demo. Serverless FlinkSQL allows you to scale up as much as needed, but also scale down to zero if no events are consumed and processed.

The demo is just the starting point. Many powerful applications can be built with Apache Flink. This includes streaming ETL, but also business applications like you find them at Netflix, Uber and many other tech giants.

LangChain + Fully-Managed Kafka and Flink = Simple, Powerful Real-Time GenAI

LangChain is an easy-to-use AI/ML framework to connect large language models to other data sources and create valuable applications. The flexibility and open approach enables developers and data engineers to build all sorts of applications, from chatbots to smart systems that answer your questions.

Data streaming with Apache Kafka and Flink provide a reliable and scalable data fabric for data pipelines and stream processing. The event store of Kafka ensures data consistency across real-time, batch, and request-response APIs. Domain-driven design, microservice architectures and data products build in a data mesh more and more leverage on Kafka for these reasons.

The combination of LangChain, GenAI technologies like OpenAI and data streaming with Kafka and Flink make a powerful combination for context-specific decision in real-time powered by AI.

Most enterprises have a cloud-first strategy for AI use cases. Data streaming infrastructure is available in SaaS like Confluent Cloud so that the developers can focus on business logic with much faster time-to-market. Plenty of alternatives exist for building AI applications with Python (the de facto standard for AI). For instance, you could build a user-defined function (UDF) in a FlinkSQL application executing the Python code and consuming from Kafka. Or use a separate application development framework and cloud platform like Quix Streams or Bytewax for Python apps instead of a framework like LangChain.

How do you combine Python, LangChain and LLMs with data streaming technologies like Kafka and Flink? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post GenAI Demo with Kafka, Flink, LangChain and OpenAI appeared first on Kai Waehner.

The State of Data Streaming for Healthcare with Apache Kafka and Flink

Kai Waehner — Mon, 27 Nov 2023 13:52:35 +0000

This blog post explores the state of data streaming for the healthcare industry. The digital disruption combined with growing regulatory requirements and IT modernization efforts require a reliable data infrastructure, real-time end-to-end observability, fast time-to-market for new features, and integration with pioneering technologies like sensors, telemedicine, or AI/machine learning. Data streaming allows integrating and correlating legacy and modern interfaces in real-time at any scale to improve most business processes in the healthcare sector much more cost-efficiently.

I look at trends in the healthcare industry to explore how data streaming helps as a business enabler, including customer stories from Humana, Recursion, BHG (former Bankers Healthcare Group), Evernorth Health Services, and more. A complete slide deck and on-demand video recording are included.

General trends in the healthcare industry

The digitalization of the healthcare sector and disruptive use cases is exciting. Countries where healthcare is not part of the public administration innovate quickly. However, regulation and data privacy are crucial across the world. And even innovative technologies and cloud services need to comply with law and in parallel connect to legacy platforms and protocols.

Regulation and interoperability

Healthcare does often not have a choice. Regulations by the government must be implemented by a specific deadline. IT modernization, adoption of new technologies, and integration with the legacy world are mandatory. Many regulations demand Open APIs and interfaces. But even if not enforced, the public sector does itself a favour adopting open technologies for data sharing between different sectors and the members.

A concrete example: Interoperability and Patient Access final rule (CMS-9115-F), as explained by a US government, “aims to put patients first, giving them access to their health information when they need it most and, in a way, they can best use it.

Interoperability = Interoperability is the ability of two or more systems to exchange health information and use the information once it is received.
Patient Access = Patient Access refers to the ability of consumers to access their health care records.

Lack of seamless data exchange in healthcare has historically detracted from patient care, leading to poor health outcomes, and higher costs. The CMS Interoperability and Patient Access final rule establishes policies that break down barriers in the nation’s health system to enable better patient access to their health information, improve interoperability and unleash innovation while reducing the burden on payers and providers.

Patients and their healthcare providers will be more informed, which can lead to better care and improved patient outcomes, while reducing burden. In a future where data flows freely and securely between payers, providers, and patients, we can achieve truly coordinated care, improved health outcomes, and reduced costs.”

Digital disruption and automated workflows

Gartner has a few interesting insights about the evolution of the healthcare sector. The digital disruption is required to handle revenue reduction and revenue reinvention because of economic pressure, scarce and extensive talent, and supply challenges:

Gartner points out that real-time workflows and automation are critical across the entire health process to enable an optimal experience:

Therefore, data streaming is very helpful in implementing new digitalized healthcare processes.

Data streaming in the healthcare industry

Adopting healthcare trends like telemedicine, automated member service with Generative AI (GenAI), or automated claim processing are only possible if enterprises in the games sector can provide and correlate information at the right time in the proper context. Real-time, which means using the information in milliseconds, seconds, or minutes, is almost always better than processing data later:

Data streaming combines the power of real-time messaging at any scale with storage for true decoupling, data integration, and data correlation capabilities. Apache Kafka is the de facto standard for data streaming.

The following blog series about data streaming with Apache Kafka in the healthcare industry is a great starting point to learn more about data streaming in the health sector, including a few industry-specific and Kafka-powered case studies:

Data Streaming Use Cases and Architectures for Healthcare (including Slide Deck)
Legacy Modernization and Hybrid Cloud (Optum / UnitedHealth Group, Centene, Bayer)
Streaming ETL (Bayer, Babylon Health)
Real-time Analytics (Cerner, Celmatix, CDC/Centers for Disease Control and Prevention)
Machine Learning and Data Science (Recursion, Humana)
Open API and Omnichannel (Care.com, Invitae)

Architecture trends for data streaming

The healthcare industry applies various software development and enterprise architecture trends for cost, elasticity, security, and latency reasons. The three major topics I see these days at customers are:

Event-driven architectures (in combination with request-response communication) to enable domain-driven design and flexible technology choices
Data mesh for building new data products and real-time data sharing with internal platforms and partner APIs
Fully managed SaaS (whenever doable from compliance and security perspective) to focus on business logic and faster time-to-market

Let’s look deeper into some enterprise architectures that leverage data streaming for healthcare use cases.

Event-driven architecture for integration and IT modernization

IT modernization requires integration between legacy and modern applications. The integration challenges include different protocols (often proprietary and complex), various communication paradigms (asynchronous, request-response, batch), and SLAs (transactions, analytics, reporting).

Here is an example of a data integration workflow combining clinical health data and claims in EDI / EDIFACT format, data from legacy databases, and modern microservices:

One of the biggest problems in IT modernization is data consistency between files, databases, messaging platforms, and APIs. That is a sweet spot for Apache Kafka: Providing data consistency between applications no matter what technology, interface or API they use.

Data sharing across business units is important for any organization. The healthcare industry has to combine very interesting (different) data sets, like big data game telemetry, monetization and advertisement transactions, and 3rd party interfaces.

Data consistency is one of the most challenging problems in the games sector. Apache Kafka ensures data consistency across all applications and databases, whether these systems operate in real-time, near-real-time, or batch.

One sweet spot of data streaming is that you can easily connect new applications to the existing infrastructure or modernize existing interfaces, like migrating from an on-premise data warehouse to a cloud SaaS offering.

New customer stories for data streaming in the healthcare sector

The innovation is often slower in the healthcare sector. Automation and digitalization change how healthcare companies process member data, execute claim processing, integrate payment processors, or create new business models with telemedicine or sensor data in hospitals.

Most healthcare companies use a hybrid cloud approach to improve time-to-market, increase flexibility, and focus on business logic instead of operating all IT infrastructure on premises. The integration between legacy protocols like EDIFACT and modern applications is still one of the toughest challenges.

Here are a few customer stories from healthcare organizations for IT modernization and innovation with new technologies:

BHG Financial (formerly: Bankers Healthcare Group): Direct lender for healthcare professionals offering loans, credit card, insurance
Evernorth Health Services: Hybrid integration between on-premise mainframe and microservices on AWS cloud
Humana: Data integration and analytics at the point of care
Recursion: Accelerating drug discovery with a hybrid machine learning architecture

Resources to learn more

This blog post is just the starting point. Learn more about data streaming with Apache Kafka and Apache Flink in the healthcare industry in the following on-demand webinar recording, the related slide deck, and further resources, including pretty cool lightboard videos about use cases.

On-demand video recording

The video recording explores the healthcare industry’s trends and architectures for data streaming. The primary focus is the data streaming architectures and case studies.

I am excited to have presented this webinar in my interactive light board studio:

This creates a much better experience, especially in a time after the pandemic, where many people are “Zoom fatigue”.

Check out our on-demand recording:

Slides

If you prefer learning from slides, check out the deck used for the above recording:

Fullscreen Mode

Case studies and lightboard videos for data streaming in the healthcare industry

The state of data streaming for healthcare in 2023 is interesting. IT modernization is the most important initiative across most healthcare companies and organizations. This includes cost reduction by migrating from legacy infrastructure like the mainframe, hybrid cloud architectures with bi-directional data sharing, and innovative new use cases like telehealth.

We recorded lightboard videos showing the value of data streaming simply and effectively. These five-minute videos explore the business value of data streaming, related architectures, and customer stories. Here is an example of cost reduction through mainframe offloading.

Healthcare is just one of many industries that leverages data streaming with Apache Kafka and Apache Flink.. Every month, we talk about the status of data streaming in a different industry. Manufacturing was the first. Financial services second, then retail, telcos, gaming, and so on… Check out my other blog posts.

How do you modernize IT infrastructure in the healthcare sector? Do you already leverage data streaming with Apache Kafka and Apache Flink? Maybe even in the cloud as a serverless offering? Let’s connect on LinkedIn and discuss it! Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter.

The post The State of Data Streaming for Healthcare with Apache Kafka and Flink appeared first on Kai Waehner.

Apache Kafka + Vector Database + LLM = Real-Time GenAI

Kai Waehner — Wed, 08 Nov 2023 06:29:56 +0000

Generative AI (GenAI) enables advanced AI use cases and innovation but also changes how the enterprise architecture looks like. Large Language Models (LLM), Vector Databases, and Retrieval Augmentation Generation (RAG) require new data integration patterns and data engineering best practices. Data streaming with Apache Kafka and Apache Flink play a key role to ingest and curate incoming data sets in real-time at scale, connecting various databases and analytics platforms, and decouple independent business units and data products. This blog post explores possible architectures, examples, and trade-offs between event streaming and traditional request-response APIs and databases.

Use Cases for Apache Kafka and GenAI

Generative AI (GenAI) is the next-generation AI engine for natural language processing (NLP), image generation, code optimization, and other tasks. It helps many projects in the real world for service desk automation, customer conversation with a chatbot, content moderation in social networks, and many other use cases.

Apache Kafka became the predominant orchestration layer in these machine learning platforms for integrating various data sources, processing at scale, and real-time model inference.

Data streaming with Kafka already powers many GenAI infrastructures and software products. Very different scenarios are possible:

Data streaming as data fabric for the entire machine learning infrastructure
Model scoring with stream processing for real-time predictions and generation of content
Generation of streaming data pipelines with input text, speech, or images
Real-time online training of large language models

I explored these use cases, including real-world examples like Expedia, BMW and Tinder, in the blog post “Apache Kafka as Mission Critical Data Fabric for GenAI“.

The following looks at a concrete architecture for the combination of large language models (LLM), retrieval augmented generation (RAG) with vector databases and semantic search, and data streaming with Apache Kafka and Flink.

Why Generative AI Differs from Traditional Machine Learning Architectures?

Machine Learning (ML) allows computers to find hidden insights without being programmed where to look. This is called model training, a batch process analyzing big data sets. The output is a binary file, the analytic model.

Applications apply these models to new incoming events to make predictions. This is called model scoring, and can happen in real time or batch by embedding the model into an application or by doing a request-response API call to a model server (that deployed the model).

However, LLMs and GenAI have different requirements and patterns compared to traditional ML processes, as my former colleague Michael Drogalis explained in two simple, clear diagrams.

Traditional Predictive Machine Learning with Complex Data Engineering

Predictive Artificial Intelligence makes predictions. Purpose-built models. Offline training. That is how we did machine learning for the last decade or so.

In traditional ML, most of the data engineering work happens at model creation time. A lot of expertise and efforts are required for feature engineering and model training:

New use cases require a new model built by the data engineers and data scientists.

The Democratization of AI with Generative AI using Large Language Models (LLM)

Generative Artificial Intelligence (GenAI) creates content. Reusable models. In-context learning.

But with large language models, data engineering happens with every query. Different applications re-use the same model:

Challenges of Large Language Models for GenAI Use Cases

Large Language Models (LLM) are reusable. That enables democratization of AI, as not every team needs AI expertise. Instead, low AI expertise suffices to use existing LLMs.

However, a few huge trade-offs of LLMs exist:

Expensive training: LLMs like ChatGPT cost millions of dollars in compute resources (this does not include the required expertise to build the model)
Static data: LLMs are “frozen in time” meaning the model does not have up-to-date information.
Lack of domain knowledge: LLMs usually learn from public data sets. Hence, data engineers scrape the worldwide web and feed it into the model training. However, enterprises need to use the LMM in their own context to provide business value.
Stupidity: LLMs are not intelligent like a human. For instance, ChatGPT cannot even count the number of words in a sentence you prompt to it.

These challenges create so-called hallucination…

Avoiding Hallucination to Generate Reliable Answers

Hallucination, i.e. best guess answers, is the consequence; and the LLM does not tell you it is making things up. Hallucination is a phenomenon where the AI model generates content that is not based on real data or information, but creates entirely fictional or unrealistic outputs. Hallucinations can occur when a generative model, such as a text or image generator, generates content that is not coherent, factual, or relevant to the input data or context. These hallucinations can manifest as text, images, or other types of content that seem to be plausible but are entirely fabricated by the model.

Hallucinations can be problematic in generative AI because they can lead to the generation of misleading or false information.

For these reasons, a new design pattern emerged for Generative AI: Retrieval Augmented Generation (RAG). Let’s first look at this new best practice, and then explore why data streaming with technologies like Apache Kafka and Flink is a fundamental requirement for GenAI enterprise architectures.

Semantic Search and Retrieval Augmented Generation (RAG)

Many GenAI-enabled applications follow the design pattern of Retrieval Augmented Generation (RAG) to combine the LLM with accurate and up-to-date context. The team behind Pinecone, a fully managed vector database, has a great explanation using this diagram:

Source: Pinecone

At a high level, RAG is typically two separate steps. The first is a data augmentation step where disparate (often unstructured) operational data is chunked and then embeddings are created using an embedding model. The embeddings are indexed into a vector database. The vector database is a tool for enabling semantic search to find relevant context for prompts that do not require exact keyword matching.

Second comes the inference step, where the GenAI model receives the question and context to generate a reliable answer (without hallucination). RAG does not update embeddings, but retrieves relevant information to send to the LLM along with the prompt.

Vector Databases for Semantic Search with Embeddings

A vector database, also known as a vector storage or vector index, is a type of database that is specifically designed to store and retrieve vector data efficiently. In this context, vector data refers to collections of numerical vectors, which can represent a wide range of data types, such as embeddings of text, images, audio, or any other structured or unstructured data. Vector databases are useful in applications related to machine learning, data retrieval, recommendation systems, similarity search, and more.

Vector databases excel at performing similarity searches, often called semantic search. They can quickly find vectors that are similar or close to a given query vector based on various similarity metrics, such as cosine similarity or Euclidean distance.

Vector database is not (necessarily) a separate database category. Gradient Flow explains in its best practices for Retrieval Augmented Generation:

“Vector search is no longer limited to vector databases. Many data management systems – including PostgreSQL – now support vector search. Depending on your specific application, you might find a system that meets your specific needs. Is near real-time or streaming a priority? Check Rockset’s offering. Are you already using a knowledge graph? Neo4j’s support for vector search means your RAG results will be easier to explain and visualize.”

For another concrete example, look at MongoDB’s tutorial for “Building Generative AI Applications Using MongoDB: Harnessing the Power of Atlas Vector Search and Open Source Models“. There are various options for combining a vector database for GenAI use cases with Apache Kafka. The following is a possible architecture in an event-driven world.

Event-driven Architecture: Data Streaming + Vector DB + LLM

Event-driven applications can make both steps of Retrieval Augment Generation (RAG), data augmentation and model inference, more effectively implemented. Data Streaming with Apache Kafka and Apache Flink enables consistent synchronization of data at any scale (in real-time, if the application or database can handle it) and data curation (= streaming ETL).

The following diagram shows an enterprise architecture leveraging event-driven data streaming for data ingestion and processing across the entire GenAI pipeline:

This example uses data streaming to ingest flight bookings and changes in real-time into Kafka’s event store for later processing with GenAI technologies. Flink preprocesses the data before it is calling an embedding model to generate embeddings for the vector database. In parallel, a real-time customer service application built with Python consumes all relevant contextual data (e.g., flight data, customer data, embeddings, etc.) to prompt the large language model. The LLM creates a reliable prediction, e.g. a recommendation to rebook a passenger to another flight.

In most enterprise scenarios, all the processing operates behind the enterprise firewall for security and data privacy reasons. The LLM can even be integrated with transactional systems, like the booking engine, to execute the rebooking and feed the outcome into relevant applications and databases.

Request-Response with API vs. Event-driven Data Streaming

In an ideal world, everything is event-based and streaming data. The real-world is different. Hence, API calls with request-response with HTTP/REST or SQL are totally fine in some parts of the enterprise architecture. As Kafka truly decouples systems, each application chooses its own communication paradigm and processing speed. Therefore, it is important to understand the Trade-offs between HTTP/REST API and Apache Kafka.

When to use Request-Response with Apache Kafka? – this decision is often made on trade-offs like latency, decoupling, or security. However, with large LLMs, the situation changes. As an LLM is very expensive to train, the reusability of existing LLMs is crucial. And embedding an LLM into a Kafka Streams or Flink application makes little sense in contrary to other models built with algorithms like decision trees, clustering, or even small neural networks.

Similarly, augmentation models are usually integrated via an RPC/API call. By embedding it into the Kafka Streams microservice or Flink job, the augmentation model becomes tightly coupled. And experts host many of them today because operating and optimizing them is not trivial.

Solutions hosting LLMs and augmentations models typically only provide an RPC interface like HTTP. This will probably change in the future as request-response is an anti-pattern for streaming data. An excellent example of the evolution of model servers is Seldon; providing a Kafka-native interface in the meantime. Read more about request-response vs. streaming model serving in the article Streaming Machine Learning with Kafka-native Model Deployment.

Direct Integration between an LLM and the Rest of the Enterprise?

While writing this article, OpenAI announced GPTs to create custom versions of ChatGPT that combine instructions, extra knowledge, and any combination of skills. For enterprise usage, the most interesting capability is that developers can connect OpenAI’s GPTs to the real world, i.e. other software applications, databases, and cloud services:

“In addition to using our built-in capabilities, you can also define custom actions by making one or more APIs available to the GPT. Like plugins, actions allow GPTs to integrate external data or interact with the real-world. Connect GPTs to databases, plug them into emails, or make them your shopping assistant. For example, you could integrate a travel listings database, connect a user’s email inbox, or facilitate e-commerce orders.”

The trade-offs using direct integration are tight coupling and point-to-point communication. If you already use Kafka, you understand the value of domain-driven design with true decoupling.

For more details about the importance and value of true decoupling in a microservice or data mesh architecture, check out these two articles to learn the difference between tight coupling with traditional middleware and decoupled data products with event-based data streaming:

Last but not least: Public GenAI APIs and LLMs have a weak security and governance strategy. As AI data needs emerge and the number of point-to-point integrations increase, data access, lineage, and security challenges escalate.

Data Streaming with Kafka, Flink and GenAI in Practice

After a lot of theory, let’s look at a concrete example, a demo, and a real-world case study combining data streaming with GenAI:

Example: Flink SQL + OpenAI API
Demo: ChatGPT 4 + Confluent Cloud + MongoDB Atlas for RAG and Vector Search
Success story: Elemental Cognition – A Real-Time AI Platform powered by Confluent Cloud

Example: Flink SQL + OpenAI API

Stream processing with Kafka and Flink enables data correlation of real-time and historical data. An excellent example, especially for Generative AI, is context-specific customer service. We stay with the airline example and flight cancellations here.

A stateful stream processor takes existing customer information from the CRM, loyalty platform, and other applications, correlates it with the query from the customer into the chatbot, and makes an RPC call to an LLM.

The following diagram uses Apache Flink with a Flink SQL User Defined Function (UDF). The SQL query feeds the preprocessed data into the OpenAI API to get a reliable answer. The answer is sent to another Kafka topic from where downstream applications use it, e.g. for ticket rebooking, updating the loyalty platform, and also storing the data in a data lake for later batch processing and analytics.

Demo: ChatGPT 4 + Confluent Cloud + MongoDB Atlas for RAG and Vector Search

My colleague Britton LaRoche built a fantastic retail demo showing the combination of Kafka for data integration and processing and MongoDB for storage and semantic vector search. D-ID, an AI video creation platform, makes the demo more beautiful by replacing a command-line interface with a visual AI avatars.

The fully managed and deeply integrated services Confluent Cloud and MongoDB Atlas allow to focus on building business logic.

The architecture differs from my above event-based streaming example. The core is still Kafka to truly decouple applications. Most services are integrated via request-response APIs. This is simple, well understood, and often good enough. And you can easily migrate later to event-based patterns using a Python Kafka API, change data capture (CDC) from Kafka, embedding a LangChain Python UDF into Apache Flink, or using an asynchronous interface like AsyncAPI.

Here is a short five-minute demo walking you through the demo with RAG and semantic search using MongoDB Atlas, Confluent as integration hub, and D-ID as the communication interface with the end user:

Success Story: Elemental Cognition – A Real-Time GenAI Platform powered by Kafka and Confluent Cloud

Dr. David Ferrucci, the renowned AI researcher and inventor of IBM’s breakthrough Watson technology, founded Elemental Cognition in 2015. The company leverages GenAI to accelerate and improve critical decision-making, where trust, precision, and transparency matter.

Elemental Cognition technology can be used across industries and use cases. Primary targets are Healthcare / Life Science, Investment Management, Intelligence, Logistics & Scheduling, and Contact Centers.

The AI platform develops responsible and transparent AI that helps solve problems and deliver expertise that can be understood and trusted.

Elemental Cognition’s approach combines different AI strategies in a novel architecture that acquires and reasons over human-readable knowledge to solve problems collaboratively and dynamically. The result is more transparent and cost-effective delivery of expert problem-solving intelligence into conversational and discovery applications.

Confluent Cloud powers the AI platform to enable scalable real-time data and data integration use cases. I recommend looking at their website to learn from various impressive use cases.

Apache Kafka as Central Nervous System for GenAI Enterprise Architectures

Generative AI (GenAI) requires changes to the AI/ML enterprise architecture. Augmentation Models, LLMs, RAG with vector databases and semantic search need data integration, correlation, and decoupling. Data Streaming with Kafka and Flink is here to help.

Many applications and databases use a request-response communication with REST/HTTP, SQL, or other interfaces. That is totally fine. Choose the right technology and integration layer for your data products and applications. But secure data consistency.

Data streaming with Apache Kafka and Apache Flink enables developers and data engineers focusing on business problems in their data products or integration projects because it truly decoupled different domains. Integration with Kafka is possible with HTTP, Kafka APIs, AsyncAPI, CDC from a database, SaaS interfaces, and many other options.

Kafka enables to connect the systems with any communication paradigm. Its event store shares data in milliseconds (even at extreme scale) but also persists the data for slower downstream applications and replaying historical data. The heart of the data mesh must beat in real time. That’s true for any good enterprise architecture. GenAI is no difference.

How do you build conversational AI, chatbots, and other GenAI applications leveraging Apache Kafka? Did you build RAG in real-time with Flink and a Vector Database to provide the right context for the LLM? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Apache Kafka + Vector Database + LLM = Real-Time GenAI appeared first on Kai Waehner.

How Lufthansa uses Apache Kafka for Middleware and Analytics

Kai Waehner — Sun, 24 Sep 2023 16:30:22 +0000

Aviation and travel are notoriously vulnerable to social, economic, and political events, as well as the ever-changing expectations of consumers. The coronavirus was just a piece of the challenge. This post explores how Lufthansa leverages data streaming powered by Apache Kafka as cloud-native middleware for mission-critical data integration projects and as data fabric for AI/machine learning scenarios such as real-time predictions in fleet management. An interactive conversation with Lufthansa as an on-demand video is added at the end as a highlight if you want to learn more.

Data streaming in the aviation industry

The future business of airlines and airports will be digitally integrated into the ecosystem of partners and suppliers. Companies will provide more personalized customer experiences and be enabled by a new suite of the latest technologies, including automation, robotics, and biometrics.

The entire aviation industry leverages data streaming powered by Apache Kafka already. This includes airlines, airports, global distribution systems (GDS), aircraft manufacturers, travel agencies, etc. Why? Because real-time data beats slow data across almost all use cases.

Learn more in my blog about “Apache Kafka in the Airline, Aviation and Travel Industry” covering companies like Singapore Airlines, Air France, and Amadeus.

This article focuses on data streaming in critical Lufthansa projects. Lufthansa is a major German airline and one of the largest in Europe. It is known for its extensive network of domestic and international flights. Lufthansa offers services ranging from passenger transportation to cargo logistics and is a member of the Star Alliance, one of the world’s largest airline alliances.

Apache Kafka as next-generation middleware replacing ETL, ESB, and iPaaS

Typically, an enterprise service bus (ESB) or other integration solutions like extract-transform-load (ETL) tools have been used trying to decouple systems. However, the sheer number of connectors, as well as the requirement that applications publish and subscribe to the data at the same time, mean that systems are always intertwined. As a result, development projects depend on other systems, and nothing can be truly decoupled.

Many enterprises leverage the ecosystem of Apache Kafka for successful integration of different legacy and modern applications. Data streaming differs but also complements existing integration solutions like ESB or ETL tools. Apache Kafka is unique because it combines the following characteristics into a single middleware platform:

Real-time messaging at any scale
Event store for true decoupling, backpressure handling, and replayability of historical events
Data integration eliminating the need for additional integration tools
Stream processing for stateless and stateful data correlation of real-time and historical data

“Apache Kafka vs. Enterprise Service Bus (ESB) – Friends, Enemies or Frenemies?” explores how data streaming with Kafka complements legacy middleware. If your workloads run mostly in the public cloud, you need to understand the difference between Integration Platform as a Service (iPaaS) and data streaming powered by fully-managed Kafka infrastructure.

Lufthansa uses Apache Kafka as cloud-native middleware for mission-critical integrations

Lufthansa leverages data streaming with Confluent as cloud-native middleware for its strategic integration project KUSCO (Kafka Unified Streaming Cloud Operations).

The team discussed the benefits of using Apache Kafka instead of traditional messaging queues (TIBCO EMS, IBM MQ) for data processing. My two favorite statements:

“Scaling Kafka is really inexpensive”
“Kafka adopted and integrated within 3 months”

Lufthansa’s Kafka architecture does not have any surprises. A key lesson learned from many companies: The real added value is created when you leverage Kafka not just for messaging, but its entire ecosystem, including different clients/proxies, connectors, stream processing, and data governance.

The result at Lufthansa: A better, cheaper, and faster infrastructure for real-time data processing at scale.

Watch the full talk from Marcos Carballeira Rodríguez from Lufthansa Group recorded at the Confluent Streaming Days 2020 to see all the architectures and quotes from Lufthansa. More and more projects are onboarded on the KUSCO platform. Here are a few statistics on the adoption from 2022 to 2023 of the KUSCO project that System Architect Krzysztof Torunski of Lufthansa Group presented:

I see this typical pattern in customers across industries: The first use case is the hardest to get live. Afterward, new business units tap into the data feeds and build their projects. It has never been easier to access data feeds in real-time and with good data quality at any scale. Just build a downstream application (with your favorite programming language, tool, or SaaS) and start innovating.

Apache Kafka for analytics and AI/machine learning

Apache Kafka serves thousands of enterprises as the mission-critical and scalable real-time data fabric for machine learning infrastructures. The evolution of Generative AI (GenAI) with large language models (LLM) like ChatGPT changed how people think about intelligent software and automation. In various blog posts, I explored the relationship between data streaming with the Kafka ecosystem and AI/machine learning.

My latest article shows the enormous opportunities and some early adopters combining Kafka and GenAI beyond the buzz.

Lufthansa uses Apache Kafka with AI/machine learning for real-time predictions

Lufthansa leverages the KUSCO platform to build new analytics use cases with real-time data for critical workloads. In the webinar, we learned about the following two projects from Lufthansa Groups’s Domain Architect Sebastian Weber: anomaly detection for alerts and fleet management for aircraft operations.

Anomaly detection with Apache Kafka and ksqlDB

Data is fed into the streaming platform from various data sources. Lufthansa consolidates and aggregates the data with stream processing before the analytics applications do real-time alerting.

Machine learning and Apache Kafka for real-time fleet management

Lufthansa leverages the streaming platform as data fabric for data ingestion, data processing, and model scoring.

Embedding analytic models into a Kafka application is a standard best practice. While the data lake or lakehouse (that receives data via Kafka) trains the model in batch, many use cases require real-time model scoring and predictions at scale with critical SLAs and low latency. That’s exactly the sweet spot of the Kafka ecosystem.

You can either directly embed a model into the Kafka app or leverage a model server that supporting streaming interfaces. I blogged about the trade-offs and use cases: “Streaming Machine Learning with Kafka-native Model Deployment“.

Interactive conversation with Lufthansa

Here is an on-demand video of my conversation with Lufthansa. We talk about use cases for data streaming in the aviation industry and how Lufthansa leverages Apache Kafka as cloud-native middleware and as the data fabric for analytics and machine learning:

Data streaming as cloud-native middleware and for mission-critical analytics

Lufthansa showed us how you can innovate in the airline industry with a fast time-to-market while still integrating with traditional technologies. The two projects show very different challenges and use cases solved with data streaming powered by the Apache Kafka ecosystem.

The aviation industry is changing rapidly. A good customer experience, valuable loyalty platforms, and competitive pricing (or better hard and soft products) require digitalization of the end-to-end supply chain. This includes topics like Industrial IoT (e.g., predictive maintenance), B2B communication with partners (like GDS, airports, and retailers), and customer 360 (including great mobile apps and omnichannel experiences).

How do you leverage data streaming with Apache Kafka in your projects and enterprise architecture? Let’s connect on LinkedIn and discuss it! Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter.

The post How Lufthansa uses Apache Kafka for Middleware and Analytics appeared first on Kai Waehner.

Apache Kafka as Mission Critical Data Fabric for GenAI

Kai Waehner — Sat, 22 Jul 2023 10:40:59 +0000

Apache Kafka serves thousands of enterprises as the mission-critical and scalable real-time data fabric for machine learning infrastructures. The evolution of Generative AI (GenAI) with large language models (LLM) like ChatGPT changed how people think about intelligent software and automation. This blog post explains the relation between data streaming and GenAI and shows the enormous opportunities and some early adopters of GenAI beyond the buzz.

Generative AI (GenAI) and Data Streaming

Let’s set the context first to have the same understanding of the buzzwords.

[Note: My ChatGPT iPhone app generated this section. I slightly shortened and adjusted the content afterward. GenAI is perfect for summarizing existing content so that authors can spend time on new content (that ChatGPT does not know yet).]

Natural Language Processing (NLP)

ChatGPT, what is NLP?

NLP stands for Natural Language Processing. It is a subfield of artificial intelligence (AI) that focuses on interactions between computers and human language. NLP enables computers to understand, interpret, and generate human language in a meaningful and valuable way.

Natural Language Processing involves a range of tasks, including:

Text Parsing and Tokenization: Breaking down text into individual words or tokens.
Named Entity Recognition (NER): Identifying and classifying named entities such as people, organizations, locations, and dates in the text.
Sentiment Analysis: Determining the sentiment or emotional tone expressed in text, whether positive, negative, or neutral.
Machine Translation: Translating text from one language to another.
Question Answering: Building systems that can understand and answer questions posed in natural language.
Text Generation: Creating human-like text or generating responses to prompts.

NLP is crucial in applications such as chatbots, virtual assistants, language translation, information retrieval, sentiment analysis, and more.

GenAI = Next Generation NLP (and more)

ChatGPT, what is Generative AI?

Generative AI is a branch of artificial intelligence focused on creating models and systems capable of generating new content, such as images, text, music, or even entire virtual worlds. These models are trained on large datasets and learn patterns and structures to generate new outputs similar to the training data. That’s why the widespread buzzword is Large Language Model (LLM).

Generative AI is used for next-generation NLP and uses techniques such as generative adversarial networks (GANs), variational autoencoders (VAEs), and recurrent neural networks (RNNs). Generative AI has applications in various fields and industries, including art, design, entertainment, and scientific research.

Apache Kafka for Data Streaming

ChatGPT, what is Apache Kafka?

Apache Kafka is an open-source distributed streaming platform and became the de facto standard for event streaming. It was developed by the Apache Software Foundation and is widely used for building real-time data streaming applications and event-driven architectures. Kafka provides a scalable and fault-tolerant system for handling high volumes of streaming data.

Kafka has a thriving ecosystem with various tools and frameworks that integrate with it, such as Apache Spark, Apache Flink, and others.

Apache Kafka is widely adopted in use cases that require real-time data streaming, such as data pipelines, event sourcing, log aggregation, messaging systems, and more.

Why Apache Kafka and GenAI?

Generative AI (GenAI) is the next-generation NLP engine that helps many projects in the real world for service desk automation, customer conversation with a chatbot, content moderation in social networks, and many other use cases.

Apache Kafka became the predominant orchestration layer in these machine learning platforms for integrating various data sources, processing at scale, and real-time model inference.

Data streaming with Kafka already powers many GenAI infrastructures and software products. Very different scenarios are possible:

Data streaming as data fabric for the entire machine learning infrastructure
Model scoring with stream processing for real-time productions
Generation of streaming data pipelines with input text or speech
Real-time online training of large language models

Let’s explore these opportunities for data streaming with Kafka and GenAI in more detail.

Real-time Kafka Data Hub for GenAI and other Microservices in the Enterprise Architecture

I already explored in 2017 (!) how “How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka“. At that time, real-world examples came from tech giants like Uber, Netflix, and Paypal.

Today, Apache Kafka is the de facto standard for building scalable and reliable machine learning infrastructures across any enterprise and industry, including:

Data integration from various sources (sensors, logs, databases, message brokers, APIs, etc.) using Kafka Connect connectors, fully-managed SaaS integrations, or any kind of HTTP REST API or programming language.
Data processing leveraging stream processing for cost-efficient streaming ETL such as filtering, aggregations, and more advanced calculations while the data is in motion (so that any downstream application gets accurate information)
Data ingestion for near real-time data sharing with various data warehouses and data lakes so that each analytics platform can use its product and tools.

Building scalable and reliable end-to-end pipelines is today’s sweet spot of data streaming with Apache Kafka in the AI and Machine Learning space.

Model Scoring with Stream Processing for Real-Time Predictions at any Scale

Deploying an analytic model in a Kafka application is the solution to provide real-time predictions at any scale with low latency. This is one of the biggest problems in the AI space, as data scientists primarily focus on historical data and batch model training in data lakes.

However, the model scoring for predictions needs to provide much better SLAs regarding scalability, reliability, and latency. Hence, more and more companies separate model training from model scoring and deploy the analytic model within a stream processor such as Kafka Streams, KSQL, or Apache Flink:

Check out my article “Machine Learning and Real-Time Analytics in Apache Kafka Applications” for more details.

Dedicated model servers usually only support batch and request-response (e.g., via HTTP or gRPC). Fortunately, many solutions now also provide native integration with the Kafka protocol.

I explored this innovation in my blog post “Streaming Machine Learning with Kafka-native Model Deployment“.

Development Tools for Generating Kafka-native Data Pipelines from Input Text or Speech

Almost every software vendor discusses GenAI to enhance its development environments and user interfaces.

For instance, GitHub is a platform and cloud-based service for software development and version control using Git. But their latest innovation is “the AI-Powered Developer Platform to Build, Scale, and Deliver Secure Software”: Github CoPilot X. Cloud providers like AWS provide similar tools.

Similarly, look at any data infrastructure vendor like Databricks or Snowflake. The latest conferences and announcements focus on embedded capabilities around large language models and GenAI in their solutions.

The same will be true for many data streaming platforms and cloud services. Low-code/no-code tools will add capabilities to generate data pipelines from input text. One of the most straightforward applications that I see coming is generating SQL code out of user text.

For instance, “Consume data from Oracle table customer, aggregate the payments by customer, and ingest it into Snowflake”. This could create SQL code for stream processing technologies like KSQL or FlinkSQL.

Developer experience, faster time-to-market, and support less technical personas are enormous advantages for embedding GenAI into Kafka development environments.

Real-time Training of Large Language Models (LLM)

AI and Machine Learning are still batch-based systems almost all of the time. Model training takes at least hours. This is not ideal, as many GenAI use cases require accurate and updated information. Imagine googling for information today, and you could not find data from the past week. Impossible to use such a service in many scenarios!

Similarly, if I ask ChatGPT today (July 2023): “What is GenAI?” – I get the following response:

As of my last update in September 2021, there is no specific information on an entity called “GenAi.” It’s possible that something new has emerged since then. Could you provide more context or clarify your question so I can better assist you?

The faster your machine learning infrastructure ingests data into model training, the better. My colleague Michael Drogalis wrote an excellent deep-technical blog post: “GPT-4 + Streaming Data = Real-Time Generative AI” to explore this topic more thoroughly.

This architecture is compelling because the chatbot will always have your latest information whenever you prompt it. For instance, if your flight gets delayed or your terminal changes, the chatbot will know about it during your chat session. This is entirely distinct from current approaches where the chat session must be reloaded or wait a few hours/days for new data to arrive.

LLM + Vector Database + Kafka = Real-Time GenAI

Real-time model training is still a novel approach. Many machine learning algorithms are not ready for continuous online model training today. But combining Kafka with a vector database enables using a batch-trained LLM together with real-time updates feeding up-to-date information into the LLM.

Nobody will accept an LLM like ChatGPT in a few years, giving you answers like “I don’t have this information; my model was trained a week ago”. It does not matter if you choose a brand new vector database like Pinecone or leverage new vector capabilities of your installed Oracle or MongoDB storage.

Feed data into the vector database in real-time with Kafka Connect and combine with with a mature LLM to enable real-time GenAI with context-specific recommendations.

Real-World Case Studies for Kafka and GenAI

This section explores how companies across different industries, such as the carmaker BMW, the online travel and booking Expedia, and the dating app Tinder leverage the combination of data streaming with GenAI for reliable real-time conversational AI, NLP and chatbots leveraging Kafka.

Two years ago, I wrote about this topic: “Apache Kafka for Conversational AI, NLP and Chatbot“. But technologies like ChatGPT make it much easier to adopt GenAI in real-world projects with much faster time-to-market and less cost and risk. Let’s explore a few of these success stories for embedding NLP and GenAI into data streaming enterprise architectures.

Disclaimer: As I want to show real-world case studies instead of visionary outlooks, I show several examples deployed in production in the last few years. Hence, the analytic models do not use GenAI, LLM, or ChatGPT as we know it from the press today. But the principles are precisely the same. The only difference is that you could use a cutting-edge model like ChatGPT with much improved and context-specific responses today.

Expedia – Conversations Platform for Better Customer Experience

Expedia is a leading online travel and booking. They have many use cases for machine learning. One of my favorite examples is their Conversations Platform built on Kafka and Confluent Cloud to provide an elastic cloud-native application.

The goal of Expedia’s Conversations Platform was simple: Enable millions of travelers to have natural language conversations with an automated agent via text, Facebook, or their channel of choice. Let them book trips, make changes or cancellations, and ask questions:

“How long is my layover?”
“Does my hotel have a pool?”
“How much will I get charged to bring my golf clubs?”

Then take all that is known about that customer across all of Expedia’s brands and apply machine learning models to immediately give customers what they are looking for in real-time and automatically, whether a straightforward answer or a complex new itinerary.

Real-time Orchestration realized in four Months

Such a platform is no place for batch jobs, back-end processing, or offline APIs. To quickly make decisions that incorporate contextual information, the platform needs data in near real-time, and it needs it from a wide range of services and systems. Meeting these needs meant architecting the Conversations Platform around a central nervous system based on Confluent Cloud and Apache Kafka. Kafka made it possible to orchestrate data from loosely coupled systems, enrich data as it flows between them so that by the time it reaches its destination, it is ready to be acted upon, and surface aggregated data for analytics and reporting.

Expedia built this platform from zero to production in four months. That’s the tremendous advantage of using a fully managed serverless event streaming platform as the foundation. The project team can focus on the business logic.

The Covid pandemic proved the idea of an elastic platform: Companies were hit with a tidal wave of customer questions, cancellations, and re-bookings. Throughout this once-in-a-lifetime event, the Conversations Platform proved up to the challenge, auto-scaling as necessary and taking off much of the load of live agents.

Expedia’s Migration from MQ to Kafka as Foundation for Real-time Machine Learning and Chatbots

As part of their conversations platform, Expedia needed to modernize their IT infrastructure, as Ravi Vankamamidi, Director of Technology at Expedia Group, explained in a Kafka Summit keynote.

Expedia’s old legacy chatbot service relied on a legacy messaging system. This service was a question-and-answer board with very limited scope for booking scenarios. This service could handle two-party conversations. It could not scale to bring all different systems into one architecture to build a powerful chatbot that is helpful for customer conversations.

I explored several times that event streaming is more than just a (scalable) message queue. Check out my old (but still accurate and relevant) Comparison between MQ and Kafka, or the newer comparison between cloud-native iPaaS and Kafka.

Expedia needed a service that was closer to travel assistance. It needed to handle context-specific, multi-party, multi-channel conversations. Hence, features such as natural language processing, translation, and real-time analytics are required. The full service needs to be scalable across multiple brands. Therefore, a fast and highly scalable platform with order guarantees, exactly-once-semantics (EOS), and real-time data processing were needed.

The Kafka-native event streaming platform powered by Confluent was the best choice and met all requirements. The new conversations platform doubled the Net Promoter Score (NPS) one year after the rollout. The new platform proved the business value of the new platform quickly.

BMW – GenAI for Contract Intelligence, Workplace Assistance and Machine Translation

The automotive company BMW presented innovative NLP services at Kafka Summit in 2019. It is no surprise that a carmaker has various NLP scenarios. These include digital contract intelligence, workplace assistance, machine translation, and customer conversations. The latter contains multiple use cases for conversational AI:

Service desk automation
Speech analysis of customer interaction center (CIC) calls to improve the quality
Self-service using smart knowledge bases
Agent support
Chatbots

The text and speech data is structured, enriched, contextualized, summarized, and translated to build real-time decision support applications. Kafka is a crucial component of BMW’s ML and NLP architecture. The real-time integration and data correlation enable interactive and interoperable data consumption and usage:

BMW explained the key advantages of leveraging Kafka and its streaming processing library Kafka Streams as the real-time integration and orchestration platform:

Flexible integration: Multiple supported interfaces for different deployment scenarios, including various machine learning technologies, programming languages, and cloud providers
Modular end-to-end pipelines: Services can be connected to provide full-fledged NLP applications.
Configurability: High agility for each deployment scenario

Tinder – Intelligent Content Moderation, Matching and Recommendations with Kafka and GenAI

The dating app Tinder is a great example where I can think of tens of use cases for NLP. Tinder talked at a past Kafka Summit about their Kafka-powered machine learning platform.

Tinder is a massive user of Kafka and its ecosystem for various use cases, including content moderation, matching, recommendations, reminders, and user reactivation. They used Kafka Streams as a Kafka-native stream processing engine for metadata processing and correlation in real-time at scale:

A critical use case in any dating or social platform is content moderation for detecting fakes, filtering sexual content, and other inappropriate things. Content moderation combines NLP and text processing (e.g., for chat messages) with image processing (e.g., selfie uploads) or processes the metadata with Kafka and stores the linked content in a data lake. Both leverage Deep Learning to process high volumes of text and images. Here is what content moderation looks like in Tinder’s Kafka architecture:

Plenty of ways exist to process text, images, and videos with the Kafka ecosystem. I wrote a detailed article about handling large messages and files with Apache Kafka to explore the options and trade-offs.

Chatbots could also play a key role “in the other way round”. More and more dating apps (and other social networks) fight against spam, fraud, and automated chatbots. Like building a chatbot, a chatbot detection system can analyze the data streams to block a dating app’s chatbot.

Kafka as Real-Time Data Fabric for Future GenAI Initiatives

Real-time data beats slow data. Generative AI only adds value if it provides accurate and up-to-date information. Data streaming technologies such as Apache Kafka and Apache Flink enable building a reliable, scalable real-time infrastructure for GenAI. Additionally, the event-based heart of the enterprise architecture guarantees data consistency between real-time and non-real-time systems (near real-time, batch, request-response).

The early adopters like BWM, Expedia, and Tinder proved that Generative AI integrated into a Kafka architecture adds enormous business value. The evolution of AI models with ChatGPT et al. makes the use case even more compelling across every industry.

How do you build conversational AI, chatbots, and other GenAI applications leveraging Apache Kafka? What technologies and architectures do you use? Are data streaming and Kafka part of the architecture? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Apache Kafka as Mission Critical Data Fabric for GenAI appeared first on Kai Waehner.

Machine Learning Archives - Kai Waehner

Fraud Detection in Mobility Services (Ride-Hailing, Food Delivery) with Data Streaming using Apache Kafka and Flink

The Business of Mobility Services (Ride-Hailing, Food Delivery, Taxi Aggregators, etc.)

Why Fraud is a Major Challenge in Mobility Services

The Unseen Enemy: Core Challenges in Mobility Fraud Detection

How Data Streaming with Apache Kafka and Flink Enables Real-Time Fraud Detection

Apache Kafka: The Backbone of Event-Driven Fraud Detection

Apache Flink: Continuous Stream Processing for Fraud Detection in Real-Time

Real-World Fraud Prevention Stories from Mobility Leaders

FREE NOW (Lyft): Detecting Fraudulent Trips in Real Time by Analyzing GPS Data of Cars

Grab: AI-Powered Fraud Detection for Ride-Hailing and Delivery with Data Streaming and AI/ML

Uber: Project RADAR – AI-Powered Fraud Detection with Human Oversight

Data Streaming with Kafka and Flink Provides Real-Time Fraud Detection in Mobility Services

Online Model Training and Model Drift in Machine Learning with Apache Kafka and Flink

Understanding Model Drift: The Achilles’ Heel of Static Models

TikTok: Revolutionizing Real-Time Personalization with Kafka and Flink

What is ByteDance and TikTok?

Why TikTok Outshines Competitors

Technical Implementation using Apache Kafka, Flink and Machine Learning for Continuous Online Model Training

Data Streaming with Kafka and Flink: The Backbone of Modern Machine Learning and AI

Complementing Other Data Platforms: Streaming Meets Analytics

Shift Left Architecture with Apache Iceberg as Open Table Format for Data Sharing

Business Value: A Compelling Case for Real-Time AI/ML with Data Streaming using Kafka and Flink

Real-Time AI/ML with Kafka and Flink: Transforming Data into Business Value

Stateless vs. Stateful Stream Processing with Kafka Streams and Apache Flink

Rethinking Data Processing: From Static to Dynamic

Use Case: Fraud Prevention in Real-Time

Other Industry Examples of Stream Processing

What is Stateless Stream Processing?

1. Example: Real-Time Payment Monitoring

Benefits of Stateless Processing

What is Stateful Stream Processing?

2. Example: Fraud Prevention through Continuous Pattern Detection

Key Concepts in Stateful Processing

Benefits of Stateful Processing

Bringing AI and Machine Learning into Stream Processing

3. Example: Real-Time Fraud Detection with AI/ML Models

Why Combine AI with Stream Processing?

Stateless vs. Stateful Stream Processing: When to Use Each

Video Recording

Why Stream Processing is a Fundamental Change

How Siemens Healthineers Leverages Data Streaming with Apache Kafka and Flink in Manufacturing and Healthcare

Siemens Healthineers: Shaping the Future of Healthcare Technology

Who They Are

Their Vision

Siemens Healthineers and Data Streaming for Healthcare and Manufacturing

Why Data Streaming with Apache Kafka and Flink?

Healthineers Data Cloud

Technology Stack: Healthineers Data Cloud

Use Cases for Data Streaming at Siemens Healthineers

1. Machine Monitoring and Predictive Maintenance in Manufacturing

2. IQ-Data Intelligence from IoT and SAP to Cloud

3. Machine Integration with SAP and KUKA Robots

4. Digital Healthcare Service Operations using Data Streaming

5. Real-Time Logistics with SAP Events and Confluent Cloud

6. Track and Trace Orders with Apache Kafka and Snowflake

Real-Time Data as a Catalyst for Healthcare and Manufacturing Innovation at Siemens Healthineers

Real-Time Model Inference with Apache Kafka and Flink for Predictive AI and GenAI

Artificial Intelligence (AI) and Machine Learning (ML)

AI/ML = Model Training, Model Deployment and Model Inference

Batch vs. Real-Time Model Inference

Remote Model Inference

Embedded Model Inference

Hidden Technical Debt in AI/ML Systems

The Impedance Mismatch within AI/ML between Analytics and Operations

AI/ML in Practice: Use Cases across Industries for Model Inference

Use Cases for Predictive AI

Use Cases for Generative AI

Data Streaming with Apache Kafka and Flink for Model Inference

Remote vs. Embedded Model Inference with Kafka and Flink

Remote Model Inference with Kafka, Flink and OpenAI

Embedded Model Inference with Kafka, Flink and TensorFlow

The Impact of Kafka and Flink for Model Inference – Predictive AI vs. Generative AI (GenAI)

Predictive AI and Data Streaming

Generative AI (GenAI) and Data Streaming

Data Streaming as Data Pipeline for Model Training in Lakehouses AND for Robust Low-Latency Model Inference

GenAI Demo with Kafka, Flink, LangChain and OpenAI

The Emergence of Generative AI

Demo: LangChain + Kafka + Flink = Automated Cold Calls of Sales Leads with Salesforce CRM and LinkedIn Data

Technologies and Infrastructure in the Demo

The Unseen Enemy: Core Challenges in Mobility Fraud
Detection