GenAI Archives - Kai Waehner
https://www.kai-waehner.de/blog/category/genai/
Technology Evangelist - Big Data Analytics - Middleware - Apache Kafka

Agentic AI with the Agent2Agent Protocol (A2A) and MCP using Apache Kafka as Event Broker
https://www.kai-waehner.de/blog/2025/05/26/agentic-ai-with-the-agent2agent-protocol-a2a-and-mcp-using-apache-kafka-as-event-broker/
Mon, 26 May 2025

Agentic AI is emerging as a powerful pattern for building autonomous, intelligent, and collaborative systems. To move beyond isolated models and task-based automation, enterprises need a scalable integration architecture that supports real-time interaction, coordination, and decision-making across agents and services. This blog explores how the combination of Apache Kafka, Model Context Protocol (MCP), and Google’s Agent2Agent (A2A) protocol forms the foundation for Agentic AI in production. By replacing point-to-point APIs with event-driven communication as the integration layer, enterprises can achieve decoupling, flexibility, and observability—unlocking the full potential of AI agents in modern enterprise environments.

Agentic AI is gaining traction as a design pattern for building more intelligent, autonomous, and collaborative systems. Unlike traditional task-based automation, agentic AI involves intelligent agents that operate independently, make contextual decisions, and collaborate with other agents or systems—across domains, departments, and even enterprises.

In the enterprise world, agentic AI is more than just a technical concept. It represents a shift in how systems interact, learn, and evolve. But unlocking its full potential requires more than AI models and point-to-point APIs—it demands the right integration backbone.

That’s where Apache Kafka as an event broker for true decoupling comes into play, together with two emerging AI standards: Google’s Agent2Agent (A2A) protocol and Anthropic’s Model Context Protocol (MCP), in an enterprise architecture for Agentic AI.

Agentic AI with Apache Kafka as Event Broker Combined with MCP and A2A Protocol

Inspired by my colleague Sean Falconer’s blog post, Why Google’s Agent2Agent Protocol Needs Apache Kafka, this blog post explores Agentic AI adoption in enterprises and how an event-driven architecture with Apache Kafka fits into the AI architecture.

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter and following me on LinkedIn or X (formerly Twitter) to stay in touch. And make sure to download my free book about data streaming use cases, including various AI examples across industries.

Business Value of Agentic AI in the Enterprise

For enterprises, the promise of agentic AI is compelling:

  • Smarter automation through self-directed, context-aware agents
  • Improved customer experience with faster and more personalized responses
  • Operational efficiency by connecting internal and external systems more intelligently
  • Scalable B2B interactions that span suppliers, partners, and digital ecosystems

But none of this works if systems are coupled by brittle point-to-point APIs, slow batch jobs, or disconnected data pipelines. Autonomous agents need continuous, real-time access to events, shared state, and a common communication fabric that scales across use cases.

Model Context Protocol (MCP) + Agent2Agent (A2A): New Standards for Agentic AI

The Model Context Protocol (MCP), introduced by Anthropic, offers a standardized, model-agnostic interface for context exchange between AI agents and external systems. Whether the interaction is streaming, batch, or API-based, MCP abstracts how agents retrieve inputs, send outputs, and trigger actions across services. This enables real-time coordination between models and tools—improving autonomy, reusability, and interoperability in distributed AI systems.

Model Context Protocol MCP by Anthropic
Source: Anthropic
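To make the protocol concrete: MCP messages are based on JSON-RPC 2.0. The following is a minimal sketch of a tool invocation, shown as a Python dict for readability; the tool name and arguments are hypothetical placeholders, not part of the specification.

```python
# Hypothetical MCP tool invocation (MCP uses JSON-RPC 2.0 as its wire format).
# The tool name "get_customer_context" and its arguments are illustrative only.
mcp_tool_call = {
    "jsonrpc": "2.0",
    "id": 42,
    "method": "tools/call",
    "params": {
        "name": "get_customer_context",          # assumed tool name
        "arguments": {"customer_id": "C-1337"},  # assumed arguments
    },
}
```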

Google’s Agent2Agent (A2A) protocol complements this by defining how autonomous software agents can interact with one another in a standard way. A2A enables scalable agent-to-agent collaboration—where agents discover each other, share state, and delegate tasks without predefined integrations. It’s foundational for building open, multi-agent ecosystems that work across departments, companies, and platforms.

Agent2Agent A2A Protocol by Google and MCP
Source: Google

Why Apache Kafka Is a Better Fit Than an API (HTTP/REST) for A2A and MCP

Most enterprises today use HTTP-based APIs to connect services—ideal for simple, synchronous request-response interactions.

In contrast, Apache Kafka is a distributed event streaming platform designed for asynchronous, high-throughput, and loosely coupled communication—making it a much better fit for multi-agent (A2A) and agentic AI architectures.

API-Based Integration             Kafka-Based Integration
Synchronous, blocking             Asynchronous, event-driven
Point-to-point coupling           Loose coupling with pub/sub topics
Hard to scale to many agents      Supports multiple consumers natively
No shared memory                  Kafka retains and replays event history
Limited observability             Full traceability with schema registry & DLQs

Kafka serves as the decoupling layer. It becomes the place where agents publish their state, subscribe to updates, and communicate changes—independently and asynchronously. This enables multi-agent coordination, resilience, and extensibility.

MCP + Kafka = Open, Flexible Communication

As the adoption of Agentic AI accelerates, there’s a growing need for scalable communication between AI agents, services, and operational systems. The Model Context Protocol (MCP) is emerging as a standard to structure these interactions—defining how agents access tools, send inputs, and receive results. But a protocol alone doesn’t solve the challenges of integration, scaling, or observability.

This is where Apache Kafka comes in.

By combining MCP with Kafka, agents can interact through a Kafka topic—fully decoupled, asynchronous, and in real time. Instead of direct, synchronous calls between agents and services, all communication happens through Kafka topics, using structured events based on the MCP format.

This model supports a wide range of implementations and tech stacks. For instance:

  • A Python-based AI agent deployed in a SaaS environment
  • A Spring Boot Java microservice running inside a transactional core system
  • A Flink application deployed at the edge performing low-latency stream processing
  • An API gateway translating HTTP requests into MCP-compliant Kafka events

Regardless of where or how an agent is implemented, it can participate in the same event-driven system. Kafka ensures durability, replayability, and scalability. MCP provides the semantic structure for requests and responses.
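As a minimal sketch of this pattern, the following Python snippet uses the confluent-kafka client to consume MCP-structured request events from one topic and publish results to another. The topic names, the consumer group, and the event payloads are illustrative assumptions, not a fixed standard.

```python
import json
from confluent_kafka import Consumer, Producer

# Topic names and the event envelope are illustrative assumptions.
REQUESTS_TOPIC = "agent.requests"
RESULTS_TOPIC = "agent.results"

producer = Producer({"bootstrap.servers": "localhost:9092"})
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "fraud-detection-agent",  # each agent type is its own consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe([REQUESTS_TOPIC])

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue
    request = json.loads(msg.value())  # MCP-structured request event
    # Placeholder for the agent's actual reasoning or tool execution.
    result = {"request_id": request.get("id"), "status": "done"}
    producer.produce(RESULTS_TOPIC, key=msg.key(), value=json.dumps(result))
    producer.flush()
```

Because the agent never calls another service directly, it can be redeployed, scaled, or replaced without touching any of its peers; the topics remain the stable interface.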

Agentic AI with Apache Kafka as Event Broker

The result is a highly flexible, loosely coupled architecture for Agentic AI—one that supports real-time processing, cross-system coordination, and long-term observability. This combination is already being explored in early enterprise projects and will be a key building block for agent-based systems moving into production.

Stream Processing as the Agent’s Companion

Stream processing technologies like Apache Flink or Kafka Streams allow agents to:

  • Filter, join, and enrich events in motion
  • Maintain stateful context for decisions (e.g., real-time credit risk)
  • Trigger new downstream actions based on complex event patterns
  • Apply AI directly within the stream processing logic, enabling real-time inference and contextual decision-making with embedded models or external calls to a model server, vector database, or any other AI platform

Agents don’t need to manage all logic themselves. The data streaming platform can pre-process information, enforce policies, and even trigger fallback or compensating workflows—making agents simpler and more focused.
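As an illustration of such pre-processing, here is a minimal PyFlink SQL sketch that filters a payment stream before an agent ever sees it. The topic names, fields, and threshold are assumptions, and the Flink Kafka connector must be available on the classpath.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source: raw payment events from Kafka (topic name and schema are assumptions).
t_env.execute_sql("""
    CREATE TABLE payments (
        payment_id STRING,
        amount     DOUBLE,
        country    STRING
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'payments',
        'properties.bootstrap.servers' = 'localhost:9092',
        'format' = 'json',
        'scan.startup.mode' = 'latest-offset'
    )
""")

# Sink: a pre-filtered topic that the risk agent consumes instead of the raw feed.
t_env.execute_sql("""
    CREATE TABLE payments_high_value (
        payment_id STRING,
        amount     DOUBLE,
        country    STRING
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'payments.high-value',
        'properties.bootstrap.servers' = 'localhost:9092',
        'format' = 'json'
    )
""")

# Continuously route only high-value payments to the agent (threshold is illustrative).
t_env.execute_sql("""
    INSERT INTO payments_high_value
    SELECT payment_id, amount, country FROM payments WHERE amount > 10000
""").wait()
```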

Technology Flexibility for Agentic AI Design with Data Contracts

One of the biggest advantages of a Kafka-based, event-driven, and decoupled backend for agentic systems is that agents can be implemented in any stack:

  • Languages: Python, Java, Go, etc.
  • Environments: Containers, serverless, JVM apps, SaaS tools
  • Communication styles: Event streaming, REST APIs, scheduled jobs

The Kafka topic is the stable data contract for quality and policy enforcement. Agents can evolve independently, be deployed incrementally, and interoperate without tight dependencies.
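For illustration, such a data contract can be expressed as an Avro schema registered in Schema Registry. The sketch below uses the confluent-kafka client; the subject name and fields are assumptions.

```python
from confluent_kafka.schema_registry import Schema, SchemaRegistryClient

# Illustrative Avro data contract for an "orders" topic; fields are assumptions.
order_schema = """
{
  "type": "record",
  "name": "Order",
  "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "amount",   "type": "double"},
    {"name": "status",   "type": "string"}
  ]
}
"""

client = SchemaRegistryClient({"url": "http://localhost:8081"})
# Registering under the "<topic>-value" subject makes the contract enforceable
# for every producer and consumer on the topic, regardless of language.
schema_id = client.register_schema("orders-value", Schema(order_schema, "AVRO"))
```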

Microservices, Data Products, and Reusability – Agentic AI Is Just One Piece of the Puzzle

To be effective, Agentic AI needs to connect seamlessly with existing operational systems and business workflows.

Kafka topics enable the creation of reusable data products that serve multiple consumers—AI agents, dashboards, services, or external partners. This aligns perfectly with data mesh and microservice principles, where ownership, scalability, and interoperability are key.

Agent2Agent Protocol (A2A) and MCP via Apache Kafka as Event Broker for Truly Decoupled Agentic AI

A single stream of enriched order events might be consumed via a single data product by:

  • A fraud detection agent
  • A real-time alerting system
  • An agent triggering SAP workflow updates
  • A lakehouse for reporting and batch analytics

This one-to-many model is the opposite of traditional REST designs and crucial for enabling agentic orchestration at scale.
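In Kafka, this one-to-many consumption requires nothing more than distinct consumer group IDs: each group tracks its own offsets and reads the full stream independently. A minimal sketch follows; topic and group names are illustrative.

```python
from confluent_kafka import Consumer

def make_consumer(group_id: str) -> Consumer:
    # Each consumer group keeps its own offsets, so every group independently
    # reads the full stream of enriched order events.
    c = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": group_id,
        "auto.offset.reset": "earliest",
    })
    c.subscribe(["orders.enriched"])  # illustrative topic name
    return c

fraud_agent = make_consumer("fraud-detection-agent")
alerting = make_consumer("real-time-alerting")
sap_agent = make_consumer("sap-workflow-agent")
lakehouse_sink = make_consumer("lakehouse-ingestion")
```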

Agentic AI Needs Integration with Core Enterprise Systems

Agentic AI is not a standalone trend—it’s becoming an integral part of broader enterprise AI strategies. While this post focuses on architectural foundations like Kafka, MCP, and A2A, it’s important to recognize how this infrastructure complements the evolution of major AI platforms.

Leading vendors such as Databricks, Snowflake, and others are building scalable foundations for machine learning, analytics, and generative AI. These platforms often handle model training and serving. But to bring agentic capabilities into production—especially for real-time, autonomous workflows—they must connect with operational, transactional systems and other agents at runtime. (See also: Confluent + Databricks blog series | Apache Kafka + Snowflake blog series)

This is where Kafka as the event broker becomes essential: it links these analytical backends with AI agents, transactional systems, and streaming pipelines across the enterprise.

At the same time, enterprise application vendors are embedding AI assistants and agents directly into their platforms:

  • SAP Joule / Business AI – Embedded AI for finance, supply chain, and operations
  • Salesforce Einstein / Copilot Studio – Generative AI for CRM and sales automation
  • ServiceNow Now Assist – Predictive automation across IT and employee services
  • Oracle Fusion AI / OCI – ML for ERP, HCM, and procurement
  • Microsoft Copilot – Integrated AI across Dynamics and Power Platform
  • IBM watsonx, Adobe Sensei, Infor Coleman AI – Governed, domain-specific AI agents

Each of these solutions benefits from the same architectural foundation: real-time data access, decoupled integration, and standardized agent communication.

Whether deployed internally or sourced from vendors, agents need reliable event-driven infrastructure to coordinate with each other and with backend systems. Apache Kafka provides this core integration layer—supporting a consistent, scalable, and open foundation for agentic AI across the enterprise.

Agentic AI Requires Decoupling – Apache Kafka Supports A2A and MCP as an Event Broker

To deliver on the promise of agentic AI, enterprises must move beyond point-to-point APIs and batch integrations. They need a shared, event-driven foundation that enables agents (and other enterprise software) to work independently and together—with shared context, consistent data, and scalable interactions.

Apache Kafka provides exactly that. Combined with MCP and A2A for standardized Agentic AI communication, Kafka unlocks the flexibility, resilience, and openness needed for next-generation enterprise AI.

It’s not about picking one agent platform—it’s about giving every agent the same, reliable interface to the rest of the world. Kafka is that interface.

The Past, Present, and Future of Confluent (The Kafka Company) and Databricks (The Spark Company)
https://www.kai-waehner.de/blog/2025/05/02/the-past-present-and-future-of-confluent-the-kafka-company-and-databricks-the-spark-company/
Fri, 02 May 2025

Confluent and Databricks have redefined modern data architectures, growing beyond their Kafka and Spark roots. Confluent drives real-time operational workloads; Databricks powers analytical and AI-driven applications. As operational and analytical boundaries blur, native integrations like Tableflow and Delta Lake unify streaming and batch processing across hybrid and multi-cloud environments. This blog explores the platforms’ evolution and how, together, they enable enterprises to build scalable, data-driven architectures. The Michelin success story shows how combining real-time data and AI unlocks innovation and resilience.

Confluent and Databricks are two of the most influential platforms in modern data architectures. Both have roots in open source. Both focus on enabling organizations to work with data at scale. And both have expanded their mission well beyond their original scope.

Confluent and Databricks are often described as serving different parts of the data architecture—real-time vs. batch, operational vs. analytical, data streaming vs. artificial intelligence (AI). But the lines are not always clear. Confluent can run batch workloads and embed AI. Databricks can handle (near) real-time pipelines. With Flink, Confluent supports both operational and analytical processing. Databricks can run operational workloads, too—if latency, availability, and delivery guarantees meet the project’s requirements. 

This blog explores where these platforms came from, where they are now, how they complement each other in modern enterprise architectures—and why their roles are future-proof in a data- and AI-driven world.

Data Streaming and Lakehouse - Comparison of Confluent with Apache Kafka and Flink and Databricks with Spark

About the Confluent and Databricks Blog Series

This article is part of a blog series exploring the growing roles of Confluent and Databricks in modern data and AI architectures:

Stay tuned for deep dives into how these platforms are shaping the future of data-driven enterprises. Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter and following me on LinkedIn or X (formerly Twitter) to stay in touch. And download my free book about data streaming use cases, including technical architectures and the relation to analytical platforms like Databricks.

Operational vs. Analytical Workloads

Confluent and Databricks were designed for different workloads, but the boundaries are not always strict.

Confluent was built for operational workloads—moving and processing data in real time as it flows through systems. This includes use cases like real-time payments, fraud detection, system monitoring, and streaming pipelines.

Databricks focuses on analytical workloads—enabling large-scale data processing, machine learning, and business intelligence.

That said, there is no clear black and white separation. Confluent, especially with the addition of Apache Flink, can support analytical processing on streaming data. Databricks can handle operational workloads too, provided the SLAs—such as latency, uptime, and delivery guarantees—are sufficient for the use case.

With Tableflow and Delta Lake, both platforms can now be natively connected, allowing real-time operational data to flow into analytical environments, and AI insights to flow back into real-time systems—effectively bridging operational and analytical workloads in a unified architecture.

From Apache Kafka and Spark to (Hybrid) Cloud Platforms

Confluent and Databricks both have strong open source roots—Kafka and Spark, respectively—but have taken different branding paths.

Confluent: From Apache Kafka to a Data Streaming Platform (DSP)

Confluent is well known as “The Kafka Company.” It was founded by the original creators of Apache Kafka over ten years ago. Kafka is now widely adopted for event streaming in over 150,000 organizations worldwide. Confluent operates tens of thousands of clusters with Confluent Cloud across all major cloud providers, and its platform also runs in customers’ data centers and edge locations.

But Confluent has become much more than just Kafka. It offers a complete data streaming platform (DSP).

Confluent Data Streaming Platform (DSP) Powered by Apache Kafka and Flink
Source: Confluent

This includes:

  • Apache Kafka as the core messaging and persistence layer
  • Data integration via Kafka Connect for databases and business applications, a REST/HTTP proxy for request-response APIs, and clients for all relevant programming languages
  • Stream processing via Apache Flink and Kafka Streams (read more about the past, present and future of stream processing)
  • Tableflow for native integration with lakehouses that support the open table format standard via Delta Lake and Apache Iceberg
  • 24/7 SLAs, security, data governance, disaster recovery – for the most critical workloads companies run
  • Deployment options: Everywhere (not just cloud) – SaaS, on-prem, edge, hybrid, stretched across data centers, multi-cloud, BYOC (bring your own cloud)

Databricks: From Apache Spark to a Data Intelligence Platform

Databricks has followed a similar evolution. Known initially as “The Spark Company,” it is the original force behind Apache Spark. But Databricks no longer emphasizes Spark in its branding. Spark is still there under the hood, but it’s no longer the dominant story.

Today, it positions itself as the Data Intelligence Platform, focused on AI and analytics.

Databricks Data Intelligence Platform and Lakehouse
Source: Databricks

Key components include:

  • Fully cloud-native deployment model—Databricks is now a cloud-only platform providing BYOC and Serverless products
  • Delta Lake and Unity Catalog for table format standardization and governance
  • Model development and AI/ML tools
  • Data warehouse workloads
  • Tools for data scientists and data engineers

Together, Confluent and Databricks meet a wide range of enterprise needs and often complement each other in shared customer environments from the edge to multi-cloud data replication and analytics.

Real-Time vs. Batch Processing

A major point of comparison between Confluent and Databricks lies in how they handle data processing—real-time versus batch—and how they increasingly converge through shared formats and integrations.

Data Processing and Data Sharing “In Motion” vs. “At Rest”

A key difference between the platforms lies in how they process and share data.

Confluent focuses on data in motion—real-time streams that can be filtered, transformed, and shared across systems as they happen.

Databricks focuses on data at rest—data that has landed in a lakehouse, where it can be queried, aggregated, and used for analysis and modeling.

Data Streaming versus Lakehouse

Both platforms offer native capabilities for data sharing. Confluent provides Stream Sharing, which enables secure, real-time sharing of Kafka topics across organizations and environments. Databricks offers Delta Sharing, an open protocol for sharing data from Delta Lake tables with internal and external consumers.

In many enterprise architectures, the two vendors work together. Kafka and Flink handle continuous real-time processing for operational workloads and data ingestion into the lakehouse. Databricks handles AI workloads (model training and some of the model inference), business intelligence (BI), and reporting. Both handle data integration: ETL (Confluent) and ELT (Databricks).

Many organizations still use Databricks’ Apache Spark Structured Streaming to connect Kafka and Databricks. That’s a valid pattern, especially for teams with Spark expertise.
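For teams staying on that pattern, the shape of the pipeline is roughly the following PySpark sketch; the topic, broker address, and storage paths are illustrative assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-to-delta").getOrCreate()

# Read the Kafka topic as an unbounded stream (topic and brokers are assumptions).
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .load()
    .select(col("value").cast("string").alias("payload"))
)

# Continuously append to a Delta table; the checkpoint enables exactly-once sink semantics.
query = (
    events.writeStream.format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/orders")
    .start("/delta/orders")
)
query.awaitTermination()
```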

Flink is available as a serverless offering in Confluent Cloud that can scale down to zero when idle, yet remains highly scalable—even for complex stateful workloads. It supports multiple languages, including Python, Java, and SQL. 

For self-managed environments, Kafka Streams offers a lightweight alternative to running Flink in a self-managed Confluent Platform. But be aware that Kafka Streams is limited to Java and operates as a client library embedded directly within the application. Read my dedicated article to learn about the trade-offs between Apache Flink and Kafka Streams.

Stream and Batch Data Processing with Kafka Streams, Apache Flink and Spark

In short: use what works. If Spark Structured Streaming is already in place and meets your needs, keep it. For new use cases, however, Apache Flink or Kafka Streams might be the better choice for stream processing workloads. Just make sure to understand the concepts and value of stateless and stateful stream processing before building batch pipelines.

Confluent Tableflow: Unify Operational and Analytic Workloads with Open Table Formats (such as Apache Iceberg and Delta Lake)

Databricks is actively investing in Delta Lake and Unity Catalog to structure, govern, and secure data for analytical applications. The acquisition of Tabular—founded by the original creators of Apache Iceberg—demonstrates Databricks’ commitment to supporting open standards.

Confluent’s Tableflow materializes Kafka streams into Apache Iceberg or Delta Lake tables—automatically, reliably, and efficiently. This native integration between Confluent and Databricks is faster, simpler, and more cost-effective than using a Spark connector or other ETL tools.

Tableflow reads the Kafka segments, validates the schema against Schema Registry, and creates Parquet files and table metadata.

Confluent Tableflow Architecture to Integrate Apache Kafka with Iceberg and Delta Lake for Databricks
Source: Confluent

Native stream processing with Apache Flink also plays a growing role. It enables unified real-time and batch stream processing in a single engine. Flink’s ability to “shift left” data processing (closer to the source) supports early validation, enrichment, and transformation. This simplifies the architecture and reduces the need for always-on Spark clusters, which can drive up cost.

These developments highlight how Databricks and Confluent address different but complementary layers of the data ecosystem.

Confluent + Databricks = A Strategic Partnership for Future-Proof AI Architectures

Confluent and Databricks are not competing platforms—they’re complementary. While they serve different core purposes, there are areas where their capabilities overlap. In those cases, it’s less about which is better and more about which fits best for your architecture, team expertise, SLA or latency requirements. The real value comes from understanding how they work together and where you can confidently choose the platform that serves your use case most effectively.

Confluent and Databricks recently deepened their partnership with Tableflow integration with Delta Lake and Unity Catalog. This integration makes real-time Kafka data available inside Databricks as Delta tables. It reduces the need for custom pipelines and enables fast access to trusted operational data.

The architecture supports AI end to end—from ingesting real-time operational data to training and deploying models—all with built-in governance and flexibility. Importantly, data can originate from anywhere: mainframes, on-premise databases, ERP systems, IoT and edge environments, or SaaS cloud applications.

With this setup, you can:

  • Feed data from 100+ Confluent sources (Mainframe, Oracle, SAP, Salesforce, IoT, HTTP/REST applications, and so on) into Delta Lake
  • Use Databricks for AI model development and business intelligence
  • Push models back into Kafka and Flink for real-time model inference with critical, operational SLAs and latency

Both directions will be supported. Governance and security metadata flows alongside the data.

Confluent and Databricks Partnership and Bidirectional Integration for AI and Analytics
Source: Confluent

Michelin: Real-Time Data Streaming and AI Innovation with Confluent and Databricks

A great example of how Confluent and Databricks complement each other in practice is Michelin’s digital transformation journey. As one of the world’s largest tire manufacturers, Michelin set out to become a data-first and digital enterprise. To achieve this, the company needed a foundation for real-time operational data movement and a scalable analytical platform to unlock business insights and drive AI initiatives.

Confluent @ Michelin: Real-Time Data Streaming Pipelines

Confluent Cloud plays a critical role at Michelin by powering real-time data pipelines across their global operations. Migrating from self-managed Kafka to Confluent Cloud on Microsoft Azure enabled Michelin to reduce operational complexity by 35%, meet strict 99.99% SLAs, and speed up time to market by up to nine months. Real-time inventory management, order orchestration, and event-driven supply chain processes are now possible thanks to a fully managed data streaming platform.

Databricks @ Michelin: Centralized Lakehouse

Meanwhile, Databricks empowers Michelin to democratize data access across the organization. By building a centralized lakehouse architecture, Michelin enabled business users and IT teams to independently access, analyze, and develop their own analytical use cases—from predicting stock outages to reducing carbon emissions in logistics. With Databricks’ lakehouse capabilities, they scaled to support hundreds of use cases without central bottlenecks, fostering a vibrant community of innovators across the enterprise.

The synergy between Confluent and Databricks at Michelin is clear:

  • Confluent moves operational data in real time, ensuring fresh, trusted information flows across systems (including Databricks).
  • Databricks transforms data into actionable insights, using powerful AI, machine learning, and analytics capabilities.

Confluent + Databricks @ Michelin = Cloud-Native Data-Driven Enterprise

Together, Confluent and Databricks allow Michelin to shift from batch-driven, siloed legacy systems to a cloud-native, real-time, data-driven enterprise—paving the road toward higher agility, efficiency, and customer satisfaction.

As Yves Caseau, Group Chief Digital & Information Officer at Michelin, summarized: “Confluent plays an integral role in accelerating our journey to becoming a data-first and digital business.”

And as Joris Nurit, Head of Data Transformation, added: “Databricks enables our business users to better serve themselves and empowers IT teams to be autonomous.”

The Michelin success story perfectly illustrates how Confluent and Databricks, when used together, bridge operational and analytical workloads to unlock the full value of real-time, AI-powered enterprise architectures.

Confluent and Databricks: Better Together!

Confluent and Databricks are both leaders in different – but connected – layers of the modern data stack.

If you want real-time, event-driven data pipelines, Confluent is the right platform. If you want powerful analytics, AI, and ML, Databricks is a great fit.

Together, they allow enterprises to bridge operational and analytical workloads—and to power AI systems with live, trusted data.

In the next post, I will explore how Confluent’s Data Streaming Platform compares to the Databricks Data Intelligence Platform for data integration and processing.

How Apache Kafka and Flink Power Event-Driven Agentic AI in Real Time
https://www.kai-waehner.de/blog/2025/04/14/how-apache-kafka-and-flink-power-event-driven-agentic-ai-in-real-time/
Mon, 14 Apr 2025

Agentic AI marks a major evolution in artificial intelligence—shifting from passive analytics to autonomous, goal-driven systems capable of planning and executing complex tasks in real time. To function effectively, these intelligent agents require immediate access to consistent, trustworthy data. Traditional batch processing architectures fall short of this need, introducing delays, data staleness, and rigid workflows. This blog post explores why event-driven architecture (EDA)—powered by Apache Kafka and Apache Flink—is essential for building scalable, reliable, and adaptive AI systems. It introduces key concepts such as Model Context Protocol (MCP) and Google’s Agent-to-Agent (A2A) protocol, which are redefining interoperability and context management in multi-agent environments. Real-world use cases from finance, healthcare, manufacturing, and more illustrate how Kafka and Flink provide the real-time backbone needed for production-grade Agentic AI. The post also highlights why popular frameworks like LangChain and LlamaIndex must be complemented by robust streaming infrastructure to support stateful, event-driven AI at scale.

Artificial Intelligence is evolving beyond passive analytics and reactive automation. Agentic AI represents a new wave of autonomous, goal-driven AI systems that can think, plan, and execute complex workflows without human intervention. However, for these AI agents to be effective, they must operate on real-time, consistent, and trustworthy data—a challenge that traditional batch processing architectures simply cannot meet.

This is where data streaming with Apache Kafka and Apache Flink, coupled with an event-driven architecture (EDA), forms the backbone of Agentic AI. By enabling real-time and continuous decision-making, EDA ensures that AI systems can act instantly and reliably in dynamic, high-speed environments. Emerging standards like the Model Context Protocol (MCP) and Google’s Agent-to-Agent (A2A) protocol are now complementing this foundation, providing structured, interoperable layers for managing context and coordination across intelligent agents—making AI not just event-driven, but also context-aware and collaborative.

Event-Driven Agentic AI with Data Streaming using Apache Kafka and Flink

In this post, I will explore:

  • How Agentic AI works and why it needs real-time data
  • Why event-driven architectures are the best choice for AI automation
  • Key use cases across industries
  • How Kafka and Flink provide the necessary data consistency and real-time intelligence for AI-driven decision-making
  • The role of MCP, A2A, and frameworks like LangChain and LlamaIndex in enabling scalable, context-aware, and collaborative AI systems

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter and following me on LinkedIn or X (formerly Twitter) to stay in touch. And make sure to download my free book about data streaming use cases.

What is Agentic AI?

Agentic AI refers to AI systems that exhibit autonomous, goal-driven decision-making and execution. Unlike traditional automation tools that follow rigid workflows, Agentic AI can:

  • Understand and interpret natural language instructions
  • Set objectives, create strategies, and prioritize actions
  • Adapt to changing conditions and make real-time decisions
  • Execute multi-step tasks with minimal human supervision
  • Integrate with multiple operational and analytical systems and data sources to complete workflows

Here is an example AI Agent dependency graph from Sean Falconer’s article “Event-Driven AI: Building a Research Assistant with Kafka and Flink”:

Example AI Agent Dependency Graph
Source: Sean Falconer

Instead of merely analyzing data, Agentic AI acts on data, making it invaluable for operational and transactional use cases—far beyond traditional analytics.

However, without real-time, high-integrity data, these systems cannot function effectively. If AI is working with stale, incomplete, or inconsistent information, its decisions become unreliable and even counterproductive. This is where Kafka, Flink, and event-driven architectures become indispensable.

Why Batch Processing Fails for Agentic AI

Traditional AI and analytics systems have relied heavily on batch processing, where data is collected, stored, and processed in predefined intervals. This approach may work for generating historical reports or training machine learning models offline, but it completely breaks down when applied to operational and transactional AI use cases—which are at the core of Agentic AI.

Why Batch Processing Fails for Agentic AI

I recently explored the Top 20 Problems with Batch Processing (and How to Fix Them with Data Streaming). And here’s why batch processing is fundamentally incompatible with Agentic AI and the real-world challenges it creates:

1. Delayed Decision-Making Slows AI Reactions

Agentic AI systems are designed to autonomously respond to real-time changes in the environment, whether it’s optimizing a telecommunications network, detecting fraud in banking, or dynamically adjusting supply chains.

In a batch-driven system, data is processed hours or even days later, making AI responses obsolete before they even reach the decision-making phase. For example:

  • Fraud detection: If a bank processes transactions in nightly batches, fraudulent activities may go unnoticed for hours, leading to financial losses.
  • E-commerce recommendations: If a retailer updates product recommendations only once per day, it fails to capture real-time shifts in customer behavior.
  • Network optimization: If a telecom company analyzes network traffic in batch mode, it cannot prevent congestion or outages before it affects users.

Agentic AI requires instantaneous decision-making based on streaming data, not delayed insights from batch reports.

2. Data Staleness Creates Inaccurate AI Decisions

AI agents must act on fresh, real-world data, but batch processing inherently means working with outdated information. If an AI agent is making decisions based on yesterday’s or last hour’s data, those decisions are no longer reliable.

Consider a self-healing IT infrastructure that uses AI to detect and mitigate outages. If logs and system metrics are processed in batch mode, the AI agent will be acting on old incident reports, missing live system failures that need immediate attention.

In contrast, an event-driven system powered by Kafka and Flink ensures that AI agents receive live system logs as they occur, allowing for proactive self-healing before customers are impacted.

3. High Latency Kills Operational AI

In industries like finance, healthcare, and manufacturing, even a few seconds of delay can lead to severe consequences. Batch processing introduces significant latency, making real-time automation impossible.

For example:

  • Healthcare monitoring: A real-time AI system should detect abnormal heart rates from a patient’s wearable device and alert doctors immediately. If health data is only processed in hourly batches, a critical deterioration could be missed, leading to life-threatening situations.
  • Automated trading in finance: AI-driven trading systems must respond to market fluctuations within milliseconds. Batch-based analysis would mean losing high-value trading opportunities to faster competitors.

Agentic AI must operate on a live data stream, where every event is processed instantly, allowing decisions to be made in real-time, not retrospectively.
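To make the contrast tangible, the streaming variant of the heart-rate example fits in a few lines. This sketch uses the confluent-kafka client; the topic name, the event shape (a `heart_rate` field), the threshold, and the `notify_medical_staff` helper are all illustrative assumptions.

```python
import json
from confluent_kafka import Consumer

def notify_medical_staff(vitals: dict) -> None:
    # Placeholder for a real alerting integration (pager, dashboard, agent, ...).
    print(f"ALERT: abnormal heart rate for patient {vitals['patient_id']}")

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "patient-monitoring",
    "auto.offset.reset": "latest",  # a live monitor only cares about current vitals
})
consumer.subscribe(["patient.vitals"])  # illustrative topic and event shape

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue
    vitals = json.loads(msg.value())
    if vitals["heart_rate"] > 140:   # illustrative threshold
        notify_medical_staff(vitals)  # fires within milliseconds of the event
```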

4. Rigid Workflows Increase Complexity and Costs

Batch processing forces businesses to predefine rigid workflows that do not adapt well to changing conditions. In a batch-driven world:

  • Data must be manually scheduled for ingestion.
  • Systems must wait for the entire dataset to be processed before making decisions.
  • Business logic is hard-coded, requiring expensive engineering effort to update workflows.

Agentic AI, on the other hand, is designed for continuous, adaptive decision-making. By leveraging an event-driven architecture, AI agents listen to streams of real-time data, dynamically adjusting workflows on the fly instead of relying on predefined batch jobs.

This flexibility is especially critical in industries with rapidly changing conditions, such as supply chain logistics, cybersecurity, and IoT-based smart cities.

5. Batch Processing Cannot Support Continuous Learning

A key advantage of Agentic AI is its ability to learn from past experiences and self-improve over time. However, this is only possible if AI models are continuously updated with real-time feedback loops.

Batch-driven architectures limit AI’s ability to learn because:

  • Models are retrained infrequently, leading to outdated insights.
  • Feedback loops are slow, preventing AI from adjusting strategies in real time.
  • Drift in data patterns is not immediately detected, causing AI performance degradation.

For instance, in customer service chatbots, an AI-powered agent should adapt to customer sentiment in real time. If a chatbot is trained on stale customer interactions from last month, it won’t understand emerging trends or newly common issues.

By contrast, a real-time data streaming architecture ensures that AI agents continuously receive live customer interactions, retrain in real time, and evolve dynamically.

Agentic AI Requires an Event-Driven Architecture

Agentic AI must act in real time and integrate operational and analytical information. Whether it’s an AI-driven fraud detection system, an autonomous network optimization agent, or a customer service chatbot, acting on outdated information is not an option.

The Event-Driven Approach

An Event-Driven Architecture (EDA) enables continuous processing of real-time data streams, ensuring that AI agents always have the latest information available. By decoupling applications and processing events asynchronously, EDA allows AI to respond dynamically to changes in the environment without being constrained by rigid workflows.

Event-driven Architecture for Data Streaming with Apache Kafka and Flink

AI can also be seamlessly integrated into existing business processes leveraging an EDA, bridging modern and legacy technologies without requiring a complete system overhaul. Not every data source may be real-time, but EDA ensures data consistency across all consumers—if an application processes data, it sees exactly what every other application sees. This guarantees synchronized decision-making, even in hybrid environments combining historical data with real-time event streams.

Why Apache Kafka is Essential for Agentic AI

For AI to be truly autonomous and effective, it must operate in real time, adapt to changing conditions, and ensure consistency across all applications. An Event-Driven Architecture (EDA) built with Apache Kafka provides the foundation for this by enabling:

  • Immediate Responsiveness → AI agents receive and act on events as they occur.
  • High Scalability → Components are decoupled and can scale independently.
  • Fault Tolerance → AI processes continue running even if some services fail.
  • Improved Data Consistency → Ensures AI agents are working with accurate, real-time data.

To build truly autonomous AI systems, organizations need a real-time data infrastructure that can process, analyze, and act on events as they happen.

Building Event-Driven Multi-Agents with Data Streaming using Apache Kafka and Flink
Source: Sean Falconer

Apache Kafka: The Real-Time Data Streaming Backbone

Apache Kafka provides a scalable, event-driven messaging infrastructure that ensures AI agents receive a constant, real-time stream of events. By acting as a central nervous system, Kafka enables:

  • Decoupled AI components that communicate through event streams.
  • Efficient data ingestion from multiple sources (IoT devices, applications, databases).
  • Guaranteed event delivery with fault tolerance and durability.
  • High-throughput processing to support real-time AI workloads.

Apache Flink complements Kafka by providing stateful stream processing for AI-driven workflows. With Flink, AI agents can (see the sketch after this list):

  • Analyze real-time data streams for anomaly detection, predictions, and decision-making.
  • Perform complex event processing to detect patterns and trigger automated responses.
  • Continuously learn and adapt based on evolving real-time data.
  • Orchestrate multi-agent workflows dynamically.
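As an example of such a stateful pattern, the following PyFlink SQL sketch flags users with five or more failed logins within one minute, a signal a security agent could subscribe to. All topic names, fields, and thresholds are illustrative assumptions.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Login events from Kafka; the watermark lets Flink reason about event time.
t_env.execute_sql("""
    CREATE TABLE logins (
        user_id STRING,
        success BOOLEAN,
        ts      TIMESTAMP(3),
        WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'logins',
        'properties.bootstrap.servers' = 'localhost:9092',
        'format' = 'json',
        'scan.startup.mode' = 'latest-offset'
    )
""")

# Five or more failures per user within a one-minute tumbling window.
# In production the result would feed another Kafka topic; printed here for brevity.
t_env.execute_sql("""
    SELECT user_id,
           COUNT(*) AS failures,
           TUMBLE_END(ts, INTERVAL '1' MINUTE) AS window_end
    FROM logins
    WHERE success = FALSE
    GROUP BY user_id, TUMBLE(ts, INTERVAL '1' MINUTE)
    HAVING COUNT(*) >= 5
""").print()
```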

Across industries, Agentic AI is redefining how businesses and governments operate. By leveraging event-driven architectures and real-time data streaming, organizations can unlock the full potential of AI-driven automation, improving efficiency, reducing costs, and delivering better experiences.

Here are key use cases across different industries:

Financial Services: Real-Time Fraud Detection and Risk Management

Traditional fraud detection systems rely on batch processing, leading to delayed responses and financial losses.

Agentic AI enables real-time transaction monitoring, detecting anomalies as they occur and blocking fraudulent activities instantly.

AI agents continuously learn from evolving fraud patterns, reducing false positives and improving security. In risk management, AI analyzes market trends, adjusts investment strategies, and automates compliance processes to ensure financial institutions stay ahead of threats and regulatory requirements.

Telecommunications: Autonomous Network Optimization

Telecom networks require constant tuning to maintain service quality, but traditional network management is reactive and expensive.

Agentic AI can proactively monitor network traffic, predict congestion, and automatically reconfigure network resources in real time. AI-powered agents optimize bandwidth allocation, detect outages before they impact customers, and enable self-healing networks, reducing operational costs and improving service reliability.

Retail: AI-Powered Personalization and Dynamic Pricing

Retailers struggle with static recommendation engines that fail to capture real-time customer intent.

Agentic AI analyzes customer interactions, adjusts recommendations dynamically, and personalizes promotions based on live purchasing behavior. AI-driven pricing strategies adapt to supply chain fluctuations, competitor pricing, and demand changes in real time, maximizing revenue while maintaining customer satisfaction.

AI agents also enhance logistics by optimizing inventory management and reducing stock shortages.

Healthcare: Real-Time Patient Monitoring and Predictive Care

Hospitals and healthcare providers require real-time insights to deliver proactive care, but batch processing delays critical decisions.

Agentic AI continuously streams patient vitals from medical devices to detect early signs of deterioration and trigger instant alerts to medical staff. AI-driven predictive analytics optimize hospital resource allocation, improve diagnosis accuracy, and enable remote patient monitoring, reducing emergency incidents and improving patient outcomes.

Gaming: Dynamic Content Generation and Adaptive AI Opponents

Modern games need to provide immersive, evolving experiences, but static game mechanics limit engagement.

Agentic AI enables real-time adaptation of gameplay, generating dynamic environments and personalizing challenges based on a player’s behavior. AI-driven opponents can learn and adapt to individual playstyles, keeping games engaging over time. AI agents also manage server performance, detect cheating, and optimize in-game economies for a better gaming experience.

Manufacturing & Automotive: Smart Factories and Autonomous Systems

Manufacturing relies on precision and efficiency, yet traditional production lines struggle with downtime and defects.

Agentic AI monitors production processes in real time to detect quality issues early and adjust machine parameters autonomously. This directly improves Overall Equipment Effectiveness (OEE) by reducing downtime, minimizing defects, and optimizing machine performance to ensure higher productivity and operational efficiency.

In automotive, AI-driven agents analyze real-time sensor data from self-driving cars to make instant navigation decisions, predict maintenance needs, and optimize fleet operations for logistics companies.

Public Sector: AI-Powered Smart Cities and Citizen Services

Governments face challenges in managing infrastructure, public safety, and citizen services efficiently.

Agentic AI can optimize traffic flow by analyzing real-time data from sensors and adjusting signals dynamically. AI-powered public safety systems detect threats from surveillance data and dispatch emergency services instantly. AI-driven chatbots handle citizen inquiries, automate document processing, and improve response times for government services.

The Business Value of Real-Time AI using Autonomous Agents

By leveraging Kafka and Flink in an event-driven AI architecture, organizations can achieve:

  • Better Decision-Making → AI operates on fresh, accurate data.
  • Faster Time-to-Action → AI agents respond to events immediately.
  • Reduced Costs → Less reliance on expensive batch processing and manual intervention by humans.
  • Greater Scalability → AI systems can handle massive workloads in real time.
  • Vendor Independence → Kafka and Flink support open standards and hybrid/multi-cloud deployments, preventing vendor lock-in.

Why LangChain, LlamaIndex, and Similar Frameworks Are Not Enough for Agentic AI in Production

Frameworks like LangChain, LlamaIndex, and others have gained popularity for making it easy to prototype AI agents by chaining prompts, tools, and external APIs. They provide useful abstractions for reasoning steps, retrieval-augmented generation (RAG), and basic tool use—ideal for experimentation and lightweight applications.

However, when building agentic AI for operational, business-critical environments, these frameworks fall short on several fronts:

  • Many frameworks like LangChain are inherently synchronous and follow a request-response model, which limits their ability to handle real-time, event-driven inputs at scale. In contrast, LlamaIndex takes an event-driven approach, using a message broker—including support for Apache Kafka—for inter-agent communication.
  • Debugging, observability, and reproducibility are weak—there’s often no persistent, structured record of agent decisions or tool interactions.
  • State is ephemeral and in-memory, making long-running tasks, retries, or rollback logic difficult to implement reliably.
  • Most Agentic AI frameworks lack support for distributed, fault-tolerant execution and scalable orchestration, which are essential for production systems.

That said, frameworks like LangChain and LlamaIndex can still play a valuable, complementary role when integrated into an event-driven architecture. For example, an agent might use LangChain for planning or decision logic within a single task, while Apache Kafka and Apache Flink handle the real-time flow of events, coordination between agents, persistence, and system-level guarantees.

LangChain and similar toolkits help define how an agent thinks. But to run that thinking at scale, in real time, and with full traceability, you need a robust data streaming foundation. That’s where Kafka and Flink come in.

Model Context Protocol (MCP) and Agent-to-Agent (A2A) for Scalable, Composable Agentic AI Architectures

Model Context Protocol (MCP) is one of the hottest topics in AI right now. Created by Anthropic, with early support emerging from OpenAI, Google, and other leading AI infrastructure providers, MCP is rapidly becoming a foundational layer for managing context in agentic systems. MCP enables systems to define, manage, and exchange structured context windows—making AI interactions consistent, portable, and state-aware across tools, sessions, and environments.

Google’s recently announced Agent-to-Agent (A2A) protocol adds further momentum to this movement, setting the groundwork for standardized interaction across autonomous agents. These advancements signal a new era of AI interoperability and composability.

Together with Kafka and Flink, MCP and protocols like A2A help bridge the gap between stateless LLM calls and stateful, event-driven agent architectures. Naturally, event-driven architecture is the perfect foundation for all this. The key now is to build enough product functionality and keep pushing the boundaries of innovation.

A dedicated blog post is coming soon to explore how MCP and A2A connect data streaming and request-response APIs in modern AI systems.

Agentic AI is poised to revolutionize industries by enabling fully autonomous, goal-driven AI systems that perceive, decide, and act continuously. But to function reliably in dynamic, production-grade environments, these agents require real-time, event-driven architectures—not outdated, batch-oriented pipelines.

Apache Kafka and Apache Flink form the foundation of this shift. Kafka ensures agents receive reliable, ordered event streams, while Flink provides stateful, low-latency stream processing for real-time reactions and long-lived context management. This architecture enables AI agents to process structured events as they happen, react to changes in the environment, and coordinate with other services or agents through durable, replayable data flows.

If your organization is serious about AI, the path forward is clear:

Move from batch to real-time, from passive analytics to autonomous action, and from isolated prompts to event-driven, context-aware agents—enabled by Kafka and Flink.

As a next step, learn more about “Online Model Training and Model Drift in Machine Learning with Apache Kafka and Flink”.

Let’s connect on LinkedIn and discuss how to implement these ideas in your organization. Stay informed about new developments by subscribing to my newsletter. And make sure to download my free book about data streaming use cases.

How Data Streaming and AI Help Telcos to Innovate: Top 5 Trends from MWC 2025
https://www.kai-waehner.de/blog/2025/03/07/how-data-streaming-and-ai-help-telcos-to-innovate-top-5-trends-from-mwc-2025/
Fri, 07 Mar 2025

As the telecom and tech industries rapidly evolve, real-time data streaming is emerging as the backbone of digital transformation. For MWC 2025, McKinsey outlined five key trends defining the future: IT excellence, sustainability, 6G, generative AI, and AI-driven software development. This blog explores how data streaming powers each of these trends, enabling real-time observability, AI-driven automation, energy efficiency, ultra-low latency networks, and faster software innovation. From Dish Wireless’ cloud-native 5G network to Verizon’s edge AI deployments, leading companies are leveraging event-driven architectures to gain a competitive advantage. Whether you’re tackling network automation, sustainability challenges, or AI monetization, data streaming is the strategic enabler for 2025 and beyond. Read on to explore the latest use cases, industry insights, and how to future-proof your telecom strategy.

The telecommunications and technology industries are at a pivotal moment. As innovation accelerates, businesses must leverage cutting-edge technologies to stay ahead. For MWC 2025, McKinsey highlighted five crucial themes shaping the future: IT excellence in telecom, sustainability challenges, the evolution of 6G, the rise of generative AI, and AI-driven software development.

MWC (Mobile World Congress) 2025 serves as the global stage where industry leaders, telecom operators, and technology pioneers converge to discuss the next wave of connectivity and digital transformation. As organizations gear up for a data-driven future, real-time data streaming emerges as the critical enabler of efficiency, agility, and value creation.

This blog explores each of McKinsey’s key themes from MWC 2025 and how data streaming helps businesses innovate and gain a competitive advantage in the hyper-connected world ahead.

How Apache Kafka, Flink and AI Help Telecom Providers - Top 5 Trends from MWC 2025

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter and following me on LinkedIn or X (formerly Twitter) to stay in touch. And make sure to download my free book about data streaming use cases.

1. IT Excellence: Driving Telecom Innovation and Cost Efficiency

Telecom operators are under immense pressure to monetize massive infrastructure investments while maintaining cost efficiency. McKinsey’s benchmarking study shows that leading telecom tech players spend less on IT while achieving superior cost efficiency and innovation. Successful operators integrate business and IT transformations holistically, optimizing cloud strategies, IT architectures, and AI-driven processes.

How Data Streaming Powers IT Excellence

  • Real-Time IT Monitoring: Streaming data pipelines provide continuous observability into IT performance, reducing downtime and optimizing infrastructure costs (see the sketch after this list)
  • Automated Network Operations: Event-driven architectures allow operators to dynamically allocate resources, minimizing network congestion and improving service quality.
  • Cloud-Native AI Models: By continuously feeding AI models with live data, telecom leaders ensure optimal network performance and predictive maintenance.
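As a minimal illustration of the monitoring point above, the following Python sketch tracks the error rate over the last 1,000 network telemetry events; the topic name, the event shape (a `status` field), and the alert threshold are assumptions.

```python
import json
from collections import deque
from confluent_kafka import Consumer

# Sketch of real-time IT monitoring: share of error events in a rolling window.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "network-observability",
    "auto.offset.reset": "latest",
})
consumer.subscribe(["network.telemetry"])  # illustrative topic name

window = deque(maxlen=1000)
while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    window.append(1 if event.get("status") == "error" else 0)
    error_rate = sum(window) / len(window)
    if len(window) == window.maxlen and error_rate > 0.05:  # illustrative threshold
        print(f"Degradation detected: {error_rate:.1%} errors in last 1,000 events")
```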

🔹 Business Impact: Faster time-to-market, lower IT costs, and improved network reliability.

A great example of this transformation is Dish Wireless, which built a fully cloud-native, software-driven 5G network powered by Apache Kafka. By leveraging real-time data streaming, Dish ensures low-latency, scalable, and event-driven operations, allowing it to optimize network performance, automate infrastructure management, and provide next-generation connectivity for enterprise applications.

Dish’s data-first approach demonstrates how streaming technologies are redefining telecom infrastructure and unlocking new business models.

📌 Read more about how Apache Kafka powers Dish Wireless’ 5G infrastructure or watch the following webinar with Dish:

Confluent and Dish about Cloud-Native 5G Infrastructure and Apache Kafka

 

2. Tackling Telecom Emissions: A Sustainable Future

The telecom industry faces increasing regulatory pressure and consumer expectations to decarbonize operations. While many companies have reduced Scope 1 (direct emissions) and Scope 2 (energy consumption) emissions, the real challenge lies in Scope 3 emissions from supply chains. McKinsey’s research suggests that 60% of an integrated operator’s emissions can be reduced for less than $100 per ton of CO₂.

How Data Streaming Supports Sustainability Efforts

  • Energy Optimization in Real Time: Streaming analytics continuously monitor energy usage across network infrastructure, automatically adjusting power consumption.
  • Carbon Footprint Tracking: Data pipelines aggregate real-time emissions data, enabling operators to meet sustainability goals efficiently.
  • Predictive Maintenance for Energy Efficiency: AI-driven insights help optimize network hardware lifespan, reducing waste and unnecessary energy consumption.

🔹 Business Impact: Reduced carbon footprint, cost savings on energy consumption, and regulatory compliance.

Data Streaming with Apache Kafka and Flink for ESG and Sustainability

Beyond telecom, data streaming is transforming sustainability efforts across industries. For example, in manufacturing and real estate, companies like Ampeers Energy and PAUL Tech AG use Apache Kafka and Flink to optimize energy distribution, reduce emissions, and improve ESG ratings.

These real-time data platforms analyze IoT sensor data, weather forecasts, and energy consumption patterns to automate decision-making and lower energy waste. Similarly, EverySens leverages streaming data to decarbonize freight transport, eliminating hundreds of thousands of unnecessary truck rides each year. These use cases demonstrate how data-driven sustainability strategies can be scaled across sectors to achieve meaningful environmental impact.

📌 Read more about how data streaming with Kafka and Flink power ESG transformations.

3. Shaping the Future of 6G: Beyond Connectivity

6G is expected to revolutionize industries by enabling ultra-low latency, ubiquitous connectivity, and AI-driven network optimization. However, the transition from 5G to 6G requires overcoming legacy infrastructure challenges and developing multi-capability platforms that go beyond simple connectivity.

How Data Streaming Powers the 6G Revolution

  • Network Sensing and Intelligent Routing: Streaming architectures process real-time network telemetry, enabling adaptive, self-optimizing networks.
  • AI-Enhanced Edge Computing: Real-time analytics ensure minimal latency for mission-critical applications such as autonomous vehicles and smart cities.
  • Cross-Sector Data Monetization: Operators can leverage streaming data to offer network-as-a-service (NaaS) solutions, opening new revenue streams.

🔹 Business Impact: New monetization opportunities, improved network efficiency, and enhanced customer experience.

Use Cases for 5G and Data Streaming with Apache Kafka
Source: Dish Wireless

As the 6G era approaches, real-time data streaming is already proving its value in 5G deployments, unlocking low-latency, high-bandwidth use cases.

A great example is Verizon’s Mobile Edge Computing (MEC) initiative, which uses data streaming and AI-powered analytics to support real-time applications like autonomous drone monitoring, vehicle-to-everything (V2X) communication, and predictive maintenance in industrial settings. By processing data at the network edge, telcos minimize latency and optimize bandwidth—capabilities that will be even more critical in 6G.

With cloud-native, event-driven architectures, data streaming enables telcos to evolve from traditional connectivity providers to technology leaders. As 6G advances, expect faster network automation, more sophisticated AI integration, and deeper partnerships between telecom operators and enterprise customers.

📌 Read more about how data streaming is shaping the future of telco.

4. Generative AI: A Profitability Game-Changer for Telcos

McKinsey highlights generative AI’s potential to boost telco profitability by up to 10% in annual EBITDA through automation, hyper-personalization, and AI-driven customer engagement. Leading telcos are already leveraging AI to improve customer service, marketing, and network operations.

How Data Streaming Enhances Gen AI in Telecom

  • Real-Time Customer Insights: AI-powered recommendation engines deliver personalized offers and dynamic pricing in milliseconds.
  • Automated Call Center Operations: Real-time transcription and sentiment analysis improve chatbot accuracy and agent productivity.
  • Proactive Network Management: AI models trained on continuous streaming data predict and prevent network failures before they occur.

🔹 Business Impact: Higher customer satisfaction, reduced operational costs, and increased revenue per user.

As telecom providers integrate Generative AI (GenAI) into their business models, real-time data streaming is a foundational technology that enables efficient AI inference and model retraining. One compelling example is the GenAI Demo with Kafka, Flink, LangChain, and OpenAI, which illustrates how streaming architectures power AI-driven sales and customer interactions.

Stream Processing with Apache Flink SQL UDF and GenAI with OpenAI LLM

This demo showcases how real-time CRM data from Salesforce is enriched with web and LinkedIn data via streaming ETL using Apache Flink. Then, AI models process this context using LangChain and OpenAI, generating personalized, context-specific sales recommendations—a workflow that can be extended to telecom call centers and customer engagement platforms.
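To make the pattern tangible, here is a minimal sketch of what such a streaming enrichment step can look like in Flink SQL, submitted via PyFlink. The table names, columns, and the ASK_LLM function are illustrative assumptions, not the demo's actual code:

from pyflink.table import EnvironmentSettings, TableEnvironment

# Source/sink tables (e.g., backed by Kafka topics) and the ASK_LLM UDF
# are assumed to be registered already.
t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.execute_sql("""
    INSERT INTO sales_recommendations
    SELECT
        c.customer_id,
        ASK_LLM(CONCAT('Suggest a next-best action for: ', c.crm_notes,
                       ' | recent pages: ', w.visited_pages)) AS recommendation
    FROM crm_updates AS c
    JOIN web_activity AS w
      ON c.customer_id = w.customer_id
""")

The key design point: the enrichment and the LLM call run continuously on the stream, so every CRM change immediately produces an updated recommendation event for downstream consumers.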

Expedia’s success story further highlights how GenAI combined with data streaming improves customer interactions. Facing a massive surge in support requests during COVID-19, Expedia automated responses with AI-driven chatbots, significantly reducing agent workloads. By integrating Apache Kafka with AI models, 60% of travelers began self-servicing their inquiries, resulting in over 40% cost savings in customer support operations.

Expedia GenAI in the Travel Industry with Data Streaming Kafka and Machine Learning AI
Source: Confluent

For telecom providers, similar AI-driven automation can optimize call centers, personalized customer offers, fraud detection, and even predictive maintenance for network infrastructure. Data streaming ensures that AI models continuously learn from fresh data, making GenAI solutions more accurate, responsive, and cost-effective.

5. AI-Driven Software Development: Faster, Smarter, Better

AI is fundamentally transforming software development, accelerating the product development lifecycle (PDLC) and improving product quality. AI-assisted coding, automated testing, and real-time feedback loops are enabling companies to deliver customer-centric solutions at unprecedented speed.

How Data Streaming Transforms AI-Driven Software Development

  • Continuous Feedback and Iteration: Streaming analytics provide instant feedback from user behavior, enabling faster iterations and bug fixes.
  • Automated Code Quality Checks: AI-driven continuous integration (CI/CD) pipelines validate new code in real-time, ensuring seamless software deployments.
  • Live Performance Monitoring: Streaming data enables real-time anomaly detection, ensuring optimal application performance.

🔹 Business Impact: Faster time-to-market, higher software reliability, and reduced development costs.

For telecom providers, AI-driven software development is key to maintaining a reliable, scalable, and secure network infrastructure while rolling out new customer-facing services at speed. Data streaming accelerates software development by enabling real-time feedback loops, automated testing, and AI-powered observability—bringing the industry closer to a true “Shift Left” approach.

The Shift Left Architecture in software development means moving testing, security, and quality assurance earlier in the development lifecycle, reducing costly errors and vulnerabilities late in production. Data streaming enables this shift by continuously feeding AI-driven CI/CD pipelines with real-time insights, allowing developers to detect issues earlier, optimize network performance, and iterate faster on new services.

Shift Left Architecture with Data Streaming into Data Lake Warehouse Lakehouse

A relevant AI-powered automation example comes from the GenAI for Development vs. Visual Coding article, which discusses how automation is shifting from traditional code-based development to AI-assisted software engineering. Instead of manual coding, AI-driven workflows help telcos streamline DevOps, automate CI/CD pipelines, and enhance software quality in real time.

For telecom providers, this transformation means proactive issue detection, faster rollouts of network upgrades, and automated AI-driven security monitoring—all powered by real-time data streaming and a Shift Left mindset.

Data Streaming as the Ultimate Competitive Advantage for Telcos

Across all five of McKinsey’s key trends, real-time data streaming is the backbone of transformation. Whether optimizing IT efficiency, reducing emissions, unlocking 6G’s potential, enabling generative AI and Agentic AI, or accelerating software development, streaming technologies provide the agility and intelligence businesses need to win in 2025 and beyond.

The path forward isn’t just about adopting AI or cloud-native infrastructure—it’s about embracing real-time, event-driven architectures to drive innovation at scale.

As organizations take bold steps to lead the future, those who harness the power of data streaming will emerge as the industry’s true pioneers.

Stay ahead of the curve! Subscribe to my newsletter for insights into data streaming and connect with me on LinkedIn to continue the conversation. And make sure to download my free book about data streaming use cases.

The post How Data Streaming and AI Help Telcos to Innovate: Top 5 Trends from MWC 2025 appeared first on Kai Waehner.

]]>
Why Generative AI and Data Streaming Are Replacing Visual Coding with Low-Code / No-Code Platforms https://www.kai-waehner.de/blog/2025/02/02/why-generative-ai-and-data-streaming-are-replacing-visual-coding-with-low-code-no-code-platforms/ Sun, 02 Feb 2025 16:02:46 +0000 https://www.kai-waehner.de/?p=7237 Low-code/no-code tools have revolutionized software development and data engineering by providing visual interfaces that empower non-technical users. However, their limitations in scalability, consistency, and integration pose significant challenges in modern, real-time architectures. Generative AI is emerging as a game-changer, offering unprecedented flexibility and customization, addressing many of the pitfalls of traditional low-code/no-code platforms. Simultaneously, the data ecosystem is evolving with Apache Kafka and Flink, enabling real-time, event-driven architectures that resolve inefficiencies of fragmented, batch-driven systems. This blog explores the evolution of low-code/no-code tools, their challenges, when (not) to use visual coding, and how generative AI and data streaming are reshaping the landscape.

The post Why Generative AI and Data Streaming Are Replacing Visual Coding with Low-Code / No-Code Platforms appeared first on Kai Waehner.

]]>

This blog explores the evolution of low-code/no-code tools, their challenges, when (not) to use visual coding, and how generative AI and data streaming with Apache Kafka and Flink are reshaping the software and data engineering landscape.

Low-code/no-code tools have been praised as transformative for software development and data engineering, providing visual interfaces that democratize technology access for non-technical users. However, the low-code / no-code space—saturated with hundreds of vendors and tools—faces challenges in scalability, consistency, and integration.

Generative AI is emerging as a powerful alternative, offering unprecedented flexibility and customization while addressing the limitations of traditional low-code/no-code solutions.

At the same time, the data ecosystem is undergoing a broader transformation, where tools like Apache Kafka and Flink are enabling real-time, consistent data streaming architectures that resolve long-standing inefficiencies inherent in batch-driven, tool-fragmented systems.

Data Streaming with Apache Kafka and Flink vs Visual Coding with Low-Code No-Code

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter and follow me on LinkedIn or X (former Twitter) to stay in touch.

What Are Low-Code/No-Code Tools?

Low-code and no-code platforms aim to simplify application development, data engineering, and system integration by offering graphical drag-and-drop visual coding interfaces. Instead of requiring coding expertise, users can configure workflows using pre-built components.

Visual Coding with Low Code No Code IDE
Source: https://commons.wikimedia.org/wiki/File:Godot_VisualScript_Example.png

These tools are used for:

  • Enterprise Application Integration (EAI): Connecting business systems like CRM, ERP, and databases.
  • ETL Pipelines: Extracting, transforming, and loading data for analytics and reporting.
  • AI/ML Workflows: Automating data preparation and machine learning pipelines.
  • Process Automation: Streamlining repetitive tasks in business processes.

While the idea is to empower “citizen developers” or non-technical users, these tools often target a broad range of use cases and industries. The market is vast, encompassing self-managed tools for on-premises environments, Software-as-a-Service (SaaS) platforms, and cloud provider-specific solutions like AWS Step Functions or Google Cloud Dataflow.

The History of Visual Software Development and Data Engineering

The concept of visual development in software traces its roots back to the 1990s when tools like Informatica and TIBCO introduced graphical interfaces for enterprise integration. These platforms offered drag-and-drop components to simplify the creation of workflows for data movement, transformation, and application integration. Their ease of use attracted enterprises seeking to streamline operations without building large software engineering teams.

In the 2000s, visual coding evolved further with the rise of specialized tools tailored to emerging domains. Open-source platforms like Apache NiFi and Apache Airflow built on this concept, providing powerful visual interfaces for data flow automation, enabling organizations to design pipelines for extracting, transforming, and loading (ETL) data across distributed systems. Meanwhile, vendors like IBM Streams and Software AG Apama pioneered visual stream processing tools, addressing the need for real-time analytics in domains such as financial trading and telecommunications.

As Internet of Things (IoT) technologies gained traction, visual coding tools like Node-RED emerged, offering accessible interfaces for developers and engineers to create and orchestrate IoT workflows. By combining simplicity with modular design, Node-RED became a popular choice for IoT applications, allowing developers to integrate devices, sensors, and APIs without extensive coding knowledge.

The Role of Visual Coding in Data Science and Cloud Transformation

Visual development also made inroads into data science and data engineering, with frameworks offering simplified interfaces for complex tasks:

  • Tools such as KNIME, RapidMiner, and Azure Machine Learning Studio introduced visual workflows for building, training, and deploying machine learning models.
  • Cloud-based platforms like Google Cloud Dataflow, AWS Glue, and Databricks brought visual development partly to the cloud (less sophisticated than on-premises IDEs from Informatica et al.), allowing users to harness scalable computing resources for data engineering tasks with minimal coding effort.

The cloud revolution fundamentally changed the landscape for visual coding by democratizing access to computational power and enabling real-time collaboration. Developers could now design, deploy, and monitor pipelines across globally distributed environments, leveraging pre-built templates and visual interfaces. This shift expanded the reach of visual coding, making it accessible to a broader range of users and enabling integration across hybrid and multi-cloud architectures.

Challenges of Low-Code/No-Code Tools

Despite their promise, low-code/no-code tools face significant challenges that limit their utility for modern, complex environments. Let’s look into these challenges:

1. Fragmentation and the Data Mess

The low-code/no-code market is overwhelmed with hundreds of vendors and tools, each targeting specific use cases. From SaaS platforms to on-premises solutions, different business units often adopt their own tools for similar tasks. This creates a fragmented ecosystem where:

  • Data Silos Form: Different teams use different tools, resulting in disconnected data pipelines.
  • Duplicated Processing: Each unit reprocesses the same data with different tools, increasing operational inefficiencies.
  • Inconsistent Standards: Variations in tool capabilities and configurations lead to inconsistent data quality and formats.

2. The Technical Limitations of Visual Development

While drag-and-drop interfaces simplify basic workflows, they stumble with more complex requirements:

  • Advanced data transformations often require writing custom User-Defined Functions (UDFs).
  • External libraries and API integrations still demand traditional coding expertise.
  • Debugging and troubleshooting visual workflows can be more cumbersome than inspecting raw code.

3. Software Engineering Overhead

Low-code/no-code solutions cannot eliminate the underlying complexities of software development. Teams still need to:

  • Integrate workflows with version control systems like Git.
  • Implement DevOps practices for CI/CD pipelines, deployment automation, and monitoring.
  • Ensure scalability and fault tolerance, often necessitating custom workarounds beyond what the tools provide.

4. Batch and Bursty Data Processing

Many low-code/no-code tools focus on batch data processing, which introduces latency and fails to address real-time requirements. This is particularly problematic in environments with:

  • Bursty Data Pipelines: Sudden spikes in data volume that overwhelm batch workflows.
  • Data Consistency Needs: Delays in batch processing can lead to out-of-sync data across systems.

Tools like Apache Kafka excel here by decoupling systems with event-driven architectures that ensure consistent, accurate, and timely data delivery. Unlike batch systems, Kafka’s immutable logs and distributed processing model provide a unified approach to ingest, process, and stream data with consistency across pipelines.
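A minimal sketch of that decoupling with the Python Kafka client (broker address and topic name are placeholders): the producer appends events to the log once, and any number of independent consumer groups read the same topic at their own pace.

from confluent_kafka import Consumer, Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})
producer.produce("orders", key="order-1", value='{"amount": 42.0}')
producer.flush()

# An independent consumer group reads the same immutable log at its own pace;
# another team could attach with group.id "billing" without touching the producer.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "analytics",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders"])
msg = consumer.poll(timeout=5.0)
if msg is not None and msg.error() is None:
    print("analytics consumer read:", msg.value())
consumer.close()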

How Generative AI Changes the Game for Visual Coding and Low-Code/No-Code Tools

Generative AI is emerging as a more effective alternative to low-code/no-code tools by enabling dynamic, scalable, and highly customizable workflows. English is the next generation programming language. Here’s how:

1. Customizable Code Generation

Generative AI can create tailored code for specific use cases based on natural language prompts, making it far more adaptable than pre-built drag-and-drop components. Examples include:

  • Writing SQL queries, Python scripts, or API integrations (see the example after this list).
  • Automating infrastructure setup with Terraform or Kubernetes scripts.
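As a toy illustration, a prompt like “write a Python function that returns the top customers by revenue from a CSV file” might yield something along these lines (the file layout and column names are assumptions):

import pandas as pd

def top_customers(path: str, n: int = 10) -> pd.DataFrame:
    """Return the n customers with the highest total revenue."""
    df = pd.read_csv(path)  # expects 'customer_id' and 'revenue' columns
    return (df.groupby("customer_id")["revenue"]
              .sum()
              .nlargest(n)
              .reset_index())

Unlike a pre-built drag-and-drop component, the generated code can be reviewed, versioned in Git, and adapted to any edge case the business needs.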

2. Empowering All Users

Generative AI bridges the gap between non-technical users and software engineers:

  • Business analysts can generate prototypes with natural language instructions.
  • Data engineers can refine AI-generated scripts for performance and scalability.

3. Reducing Tool Fragmentation

Generative AI works across platforms, frameworks, and ecosystems, mitigating the vendor lock-in associated with low-code/no-code tools. It unifies workflows by generating code that can run in any environment, from on-premises systems to cloud-native architectures.

4. Accelerating Complex Tasks

Unlike low-code/no-code tools, generative AI excels in creating sophisticated logic, such as:

  • Real-time data pipelines using Kafka and Flink.
  • Event-driven architectures that handle high throughput and low latency.

When to Use Visual Coding?

Visual coding tools shine in specific scenarios, particularly when simplicity and clarity are key. While they may struggle with complex, production-grade systems, they offer undeniable value in the following areas:

  • Demos and Stakeholder Engagement: Visual tools are perfect for demos, proof of concepts (POCs), and pre-sales discussions, allowing teams to quickly build prototypes and illustrate workflows. Their simplicity helps align stakeholders and communicate ideas effectively, making them invaluable during early project stages.
  • Collaboration and Communication: Visual coding enables better collaboration between technical and non-technical teams by providing clear, graphical workflows. These tools facilitate discussions with business users and executives, making it easier to align on project goals and workflow designs without diving into technical details.
  • Simple Tasks and Quick Wins: For straightforward workflows, such as basic data transformations or integrations, visual tools offer rapid results. They simplify tasks that don’t require heavy customization, making them ideal for quick automations or small-scale projects.

Despite their strengths, visual coding tools face challenges with complex logic, DevOps needs (e.g., versioning, automation), and scalability. They’re often best suited for prototyping, with the final workflows transitioning to robust, code-based solutions using tools like Apache Kafka and Flink.

By leveraging visual coding where it excels—early discussions, stakeholder alignment, and simple workflows—teams can accelerate initial development while relying on scalable, code-based systems for production.

The Shift Left Architecture To Solve the Inefficiencies of Tool Fragmentation

Modern organizations are moving toward an event-driven architecture leveraging the Shift Left Architecture to address the inefficiencies of batch-driven systems and tool fragmentation.

Shift Left Architecture with Data Streaming into Data Lake Warehouse Lakehouse

This approach emphasizes:

  • Real-Time Data Products: Using Kafka and Flink to build pipelines that process and deliver insights continuously.
  • Unified Processing: Consolidating tools into a cohesive, event-driven framework that eliminates data silos and duplicate processing.
  • Decentralized Ownership: Empowering teams to build pipelines with reusable components that align with organizational standards.
  • Freedom to Choose Tools: Allowing teams to select their preferred tools, whether low-code, no-code, or custom-coded solutions, while ensuring seamless integration through a unified, event-driven architecture.

By shifting left, businesses can achieve faster insights, better data quality, and reduced operational overhead, but still choose the favorite low-code/no-code tool or programming language.

Bridging the Gap with No-Code and Low-Code in Data Streaming

In modern organizations, each business unit operates with unique tools and expertise. Some teams write custom code, others leverage generative AI for tailored solutions, and many prefer the simplicity of no-code or low-code tools. Data streaming, powered by event-driven architectures like Apache Kafka and Flink, serves as the critical backbone to unify these approaches, enabling seamless integration and real-time data sharing across the enterprise.

A prime example is Confluent Cloud’s Flink Actions, which simplifies stream processing for non-technical users while also providing value to experienced engineers. Similar to term completion, auto-imports, and automation features in IDEs, Flink Actions eliminates unnecessary effort—allowing engineers to bypass repetitive coding tasks with just a few clicks and focus on higher-value problem-solving instead.

Confluent Cloud - Apache Flink Action UI for No Code Low Code Streaming ETL Integration
Source: Confluent

With an intuitive interface, business analysts and operational teams can design and deploy data streaming workflows—such as ETL transformations, deduplication, or data masking—without writing a single line of code, leveraging the low-code / no-code web UI. These tools empower citizen integrators to contribute directly to real-time data initiatives, reducing time-to-value while maintaining the scalability and performance of Apache Kafka and Flink.

By decoupling workflows through Kafka and democratizing access with low-code/no-code solutions like Flink Actions, organizations bridge the gap between technical and non-technical teams. This inclusive approach fosters collaboration, ensures data consistency, and accelerates innovation in real-time data processing.

The Evolution of Software Development and Data Engineering: Generative AI and Apache Kafka Redefine Low-Code/No-Code

Low-code/no-code tools have democratized certain aspects of software development but struggle to scale and integrate into modern, real-time architectures. Generative AI, by contrast, empowers both citizen developers and seasoned engineers to create robust, flexible workflows with unprecedented ease.

Data streaming with Apache Kafka and Flink addresses key inefficiencies and fragmentation caused by low-code/no-code tools, ensuring better integration, consistency, and performance:

  • Data Consistency Across Pipelines: Kafka’s immutable logs ensure data remains accurate and reliable, even during high-volume or bursty events. It seamlessly handles a mix of real-time streams, batch processes, and request-response APIs, making it ideal for diverse data sources and sinks.
  • Data Quality Across Systems: Kafka acts as a unified backbone for data movement, eliminating the silos created by fragmented ecosystems of low-code/no-code tools. Its scalable, event-driven architecture ensures that all data platforms, regardless of independence, are aligned and reliable.
  • Improved Performance: Kafka’s event-driven design, paired with Flink’s processing capabilities, delivers low-latency, real-time insights. For modern use cases and projects, real-time data streaming outperforms traditional, slower batch methods, offering a competitive edge in today’s quickly changing environments.
  • Cost Efficiency at Scale: By consolidating workflows into a single, scalable data streaming platform, Kafka reduces the need for multiple, often redundant low-code/no-code tools. This streamlined approach minimizes maintenance, licensing, and infrastructure costs, providing a more efficient path to long-term ROI.

Generative AI and data streaming are complementary forces that enable organizations to simplify development while ensuring real-time, consistent, and scalable data architectures. Together, they mark the next evolution in how businesses approach technology: moving beyond visual coding to truly intelligent, integrated systems.

How do you use visual coding with low-code/no-code tools? Or do you prefer writing source code in your favorite programming language? How do GenAI and data streaming change your perspective? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Why Generative AI and Data Streaming Are Replacing Visual Coding with Low-Code / No-Code Platforms appeared first on Kai Waehner.

]]>
Real-Time Model Inference with Apache Kafka and Flink for Predictive AI and GenAI https://www.kai-waehner.de/blog/2024/10/01/real-time-model-inference-with-apache-kafka-and-flink-for-predictive-ai-and-genai/ Tue, 01 Oct 2024 05:26:11 +0000 https://www.kai-waehner.de/?p=6771 Artificial Intelligence (AI) and Machine Learning (ML) are transforming business operations by enabling systems to learn from data and make intelligent decisions for predictive and generative AI use cases. Two essential components of AI/ML are model training and inference. This blog post explores how data streaming with Apache Kafka and Flink enhances the performance and reliability of model predictions. Whether for real-time fraud detection, smart customer service applications or predictive maintenance, understanding the value of data streaming for model inference is crucial for leveraging AI/ML effectively.

The post Real-Time Model Inference with Apache Kafka and Flink for Predictive AI and GenAI appeared first on Kai Waehner.

]]>
Artificial Intelligence (AI) and Machine Learning (ML) are transforming business operations by enabling systems to learn from data and make intelligent decisions for predictive and generative AI use cases. Two essential components of AI/ML are model training and inference. Models are developed and refined using historical data. Model inference is the process of using a trained machine learning model to make predictions or generate outputs based on new, unseen data. This blog post covers the basics of model inference, comparing different approaches like remote and embedded inference. It also explores how data streaming with Apache Kafka and Flink enhances the performance and reliability of these predictions. Whether for real-time fraud detection, smart customer service applications, or predictive maintenance, understanding the value of data streaming for model inference is crucial for leveraging AI/ML effectively.

Real-Time AI ML Model Inference Predictive AI and Generative AI with Data Streaming using Apache Kafka and Flink

Artificial Intelligence (AI) and Machine Learning (ML)

Artificial Intelligence (AI) and Machine Learning (ML) are pivotal in transforming how businesses operate by enabling systems to learn from data and make informed decisions. AI is a broad field that includes various technologies aimed at mimicking human intelligence, while ML is a subset focused on developing algorithms that allow systems to learn from data and improve over time without being explicitly programmed. The major use cases are predictive AI and generative AI.

AI/ML = Model Training, Model Deployment and Model Inference

In AI/ML workflows, model training, model deployment and model inference are distinct yet interconnected processes:

  • Model Training: Using historical data or credible synthetic data to build a model that can recognize patterns and make predictions. It involves selecting the right algorithm, tuning parameters, and validating the model’s performance. Model training is typically resource intensive and performed in a long-running batch process, but it can also be done via online learning or incremental learning.
  • Model Deployment: The trained model is deployed to the production environment, which could be cloud (e.g., AWS, Google Cloud, Azure or purpose-built SaaS offerings), edge devices (local devices or IoT for embedded inference), or on-premises servers (local servers for sensitive data or compliance reasons). If the demand is high, load balancers distribute requests across multiple instances to ensure smooth operation.
  • Model Inference: Once a model is trained, it is deployed to make predictions on new, unseen data. Model inference, often just called making a prediction, refers to this process. During inference, the model applies the patterns and knowledge it learned during model training to provide results.

For the terminology, keep in mind that model inference is generating predictions using a trained model, while model scoring (which is sometimes wrongly used as a synonym) involves evaluating the accuracy or performance of those predictions.
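A tiny scikit-learn sketch makes the distinction concrete: predict() performs inference on unseen data, while computing a metric on those predictions is scoring.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X[:400], y[:400])  # model training on historical data

predictions = model.predict(X[400:])  # model inference on new, unseen data
print("accuracy:", accuracy_score(y[400:], predictions))  # model scoring: evaluating those predictions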

Batch vs. Real-Time Model Inference

Model inference can be done in real-time or batch mode, depending on the application’s requirements. When making predictions in production environments, the requirements often differ from model training because timely, accurate, and robust predictions are needed. The inference process involves feeding input data into the model and getting an output, which could be a classification, regression value, or other prediction types.

There are two primary delivery approaches to model inference: Remote Model Inference and Embedded Model Inference. Each deployment option has its trade-offs. The right choice depends on requirements like latency (real-time vs. batch) but also on other characteristics like robustness, scalability, cost, etc.

Remote Model Inference

Remote Model Inference involves making a request-response call to a model server via RPC, API, or HTTP. While this approach allows for centralized model management and easier updates, it can introduce latency because of network communication. It is suitable for scenarios where model updates are frequent, and the overhead of network calls is acceptable.

The service creation exposes the model through an API, so applications or other systems can interact with it for predictions. It can be a technical interface with all the details about the model or a function that hides the AI/ML capabilities under the hood of a business service.
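Conceptually, a remote inference call from an application boils down to a request-response exchange like the following. The endpoint, path, and JSON contract are hypothetical; real model servers such as TensorFlow Serving or Seldon define their own APIs:

import requests

response = requests.post(
    "http://model-server.internal:8080/v1/models/fraud-detector:predict",  # hypothetical endpoint
    json={"instances": [{"amount": 9312.50, "country": "DE", "merchant_id": "m-4711"}]},
    timeout=2.0,  # the network latency budget matters for real-time scoring
)
response.raise_for_status()
prediction = response.json()["predictions"][0]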

Pros:

  • Centralized Model Management: Models are deployed and managed on a central server, making it easier to update, A/B test, monitor, and version them with no need to change the application.
  • Scalability: Remote inference can leverage the scalability of cloud infrastructure. This allows services to handle large volumes of requests by distributing the load across multiple servers.
  • Resource Efficiency: The client or edge devices do not need to have the computational resources to run the model, which is beneficial for devices with limited processing power.
  • Security: Sensitive models and data remain on the server, which can be more secure than distributing them to potentially insecure or compromised edge devices.
  • Ease of Integration: Remote models can be accessed via APIs, making it easier to integrate with different applications or services.

Cons:

  • Latency: Remote inference typically involves network communication, which can introduce latency, especially if the client and server are geographically distant.
  • Dependency on Network Availability: The inference depends on the availability and reliability of the network. Any disruption can cause failed predictions or delays.
  • Higher Operational Costs: Running and maintaining a remote server or cloud service can be expensive, particularly for high-traffic applications.
  • Data Privacy Concerns: Sending data to the server for inference may raise privacy concerns, especially in regulated industries or when dealing with sensitive information.

Embedded Model Inference

In this approach, the model is embedded within the stream processing application. This reduces latency as predictions are made locally within the application, but it may require more resources on the processing nodes. Embedded inference is ideal for applications where low latency is critical, and model updates are less frequent.

Pros:

  • Low Latency: Since the model runs directly on the device, there is a minimal delay in processing, leading to near real-time predictions.
  • Offline Availability: Embedded models do not rely on a network connection, making them ideal for scenarios where connectivity is intermittent or unavailable.
  • Cost Efficiency: Once deployed, there are no ongoing costs related to server maintenance or cloud usage, making it more cost-effective.
  • Privacy: Data stays local to the device, which can help in adhering to privacy regulations and minimizing the risk of data breaches.
  • Independence from Central Infrastructure: Embedded models are not reliant on a central server, reducing the risk of a single point of failure.

Cons:

  • Resource Intensive: Embedded scoring requires sufficient computational resources. While hosting and running the model on servers or containers is the expensive part, models also need to be adjusted and optimized for a more lightweight deployment on devices with limited processing power, memory, or battery life.
  • Complex Deployment: Updating models across many devices can be complex and require robust version management strategies.
  • Model Size Limitations: There may be constraints on model complexity and size because of the limited resources on the edge device, potentially leading to the need for model compression or simplification.
  • Security Risks: Deploying models on devices can expose them to reverse engineering, tampering, or unauthorized access, potentially compromising the model’s intellectual property or functionality.

Hidden Technical Debt in AI/ML Systems

The Google paper “Hidden Technical Debt in Machine Learning Systems” sheds light on the complexities involved in maintaining AI/ML systems. It argues that, while the focus is often on the model itself, the surrounding infrastructure, data dependencies, and system integration can introduce significant technical debt. This debt manifests as increased maintenance costs, reduced system reliability, and challenges in scaling and adapting the system.

Hidden Technical Debt in Machine Learning Systems (Google Paper)
Source: Google

Important points from the paper include:

  • Complexity in Data Dependencies: AI/ML systems often rely on multiple data sources, each with its own schema and update frequency. Managing these dependencies can be challenging and error-prone.
  • Systems Integration Challenges: Integrating ML models into existing systems requires careful consideration of interfaces, data formats, and communication protocols.
  • Monitoring and Maintenance: Continuous monitoring and maintenance are essential to ensure model performance does not degrade over time because of changes in data distribution or system behavior.

The Impedance Mismatch within AI/ML between Analytics and Operations

The impedance mismatch between the operational estate (production engineers) and the analytical estate (data scientists/data engineers) primarily stems from their differing toolsets, workflows and SLA requirements regarding uptime, latency and scalability.

Production engineers often use Java or other JVM-based languages to build robust, scalable applications, focusing on performance and reliability. They work in environments that emphasize code stability, using tools like IntelliJ IDEA and frameworks that support CI/CD and containerization.

In contrast, data scientists and data engineers typically use Python because of its simplicity and the rich ecosystem of data science libraries. They often work in interactive environments like Jupyter Notebooks, which are geared towards experimentation and rapid prototyping rather than production-level code quality.

This mismatch can create challenges in integrating machine learning models into production environments. Production engineers prioritize performance optimization and scalability, while data scientists focus on model accuracy and experimentation. To bridge this gap, organizations can form cross-functional teams, adopt a data streaming platform like Apache Kafka, develop standardized APIs for model deployment, and provide training to align the skills and priorities of both groups. By doing so, they can streamline the deployment of machine learning models, ensuring they deliver business value effectively.

AI/ML in Practice: Use Cases across Industries for Model Inference

Many use cases for model inference are critical and require real-time processing and high reliability to ensure timely and accurate decision-making in various industries. A few examples of critical predictive AI and generative AI use cases are:

Use Cases for Predictive AI

Many predictive AI use cases are already in production across industries. For instance:

  • Fraud Detection: Real-time model inference can identify fraudulent transactions as they occur, allowing for immediate intervention. By analyzing transaction data in real-time, businesses can detect anomalies and flag suspicious activities before they result in financial loss.
  • Predictive Maintenance: By analyzing sensor data in real-time, organizations can predict equipment failures and schedule maintenance proactively. This approach reduces downtime and maintenance costs by addressing issues before they lead to equipment failure.
  • Customer Promotions: Retailers can offer personalized promotions to customers while they are still in the store or using a mobile app, enhancing the shopping experience. Real-time inference allows businesses to analyze customer behavior and preferences on the fly, delivering targeted offers that increase engagement and sales.

Use Cases for Generative AI

Early adoption use cases with user-facing value:

  • Semantic Search: Generative AI enhances semantic search by understanding the context and intent behind user queries, enabling more accurate and relevant search results. It leverages advanced language models to interpret nuanced language patterns, improving the search experience by delivering content that closely aligns with user needs.
  • Content Generation: GenAI, exemplified by tools like Microsoft Co-pilot, assists users by automatically creating text, code, or other content based on user prompts, significantly boosting productivity. It utilizes machine learning models to generate human-like content, streamlining tasks such as writing, coding, and creative projects, thereby reducing the time and effort required for content creation.

More advanced use cases with transactional implications that take a bit longer to adopt because of their business impact and technical complexity:

  • Ticket Rebooking: In the airline industry, generative AI can assist customer service agents in rebooking tickets by providing real-time, context-specific recommendations based on flight availability, customer preferences, and loyalty status. This transactional use case ensures that agents can offer personalized and efficient solutions, enhancing customer satisfaction and operational efficiency.
  • Customer Support: For a SaaS product, generative AI can analyze customer support interactions to identify common issues and generate insightful reports that highlight trends and potential areas for improvement. This analysis assists companies in resolving common issues, refining their support procedures, and enhancing the overall user satisfaction.

So, after all the discussions about AI/ML, what is the relation to data streaming specifically for model inference?

A data streaming platform helps to enhance model inference capabilities. Apache Kafka and Flink provide a robust infrastructure for processing data in motion, enabling real-time predictions with low latency.

Data Streaming Ecosystem for AI Machine Learning with Apache Kafka and Flink

The benefits of using data streaming for model inference include:

  • Low Latency: Real-time stream processing ensures that predictions are made quickly, which is crucial for time-sensitive applications. Kafka and Flink handle high-throughput, low-latency data streams. This makes them ideal for real-time inference.
  • Scalability: Kafka and Flink can handle large volumes of data, making them suitable for applications with high throughput requirements. They can scale horizontally by adding more nodes to the cluster to ensure that the system can handle increasing data loads. A serverless data streaming cloud service like Confluent Cloud even provides complete elasticity and takes over the (complex) operations burden.
  • Robustness: These platforms are fault-tolerant, ensuring continuous operation even in the face of failures. They provide mechanisms for data replication, failover, and recovery, which are essential for maintaining system reliability. This can even span multiple regions or different public clouds like AWS, Azure, GCP, and Alibaba.
  • Critical SLAs: Kafka and Flink support stringent service level agreements (SLAs) for uptime and performance, which are essential for critical applications. They offer features like exactly-once processing semantics with a Transaction API and stateful stream processing. These capabilities are crucial for maintaining data integrity and consistency.

Let’s explore concrete examples for model inference with the embedded and remote call approaches.

Here is an example with Kafka, Flink and OpenAI using the ChatGPT large language model (LLM) for generative AI. The process involves using Apache Kafka and Flink for stream processing to correlate real-time and historical data, which is then fed into the OpenAI API via a Flink SQL User Defined Function (UDF) to generate context-specific responses using the ChatGPT large language model. The generated responses are sent to another Kafka topic for downstream applications, such as ticket rebooking or updating loyalty platforms, ensuring seamless integration and real-time data processing.

GenAI Remote Model Inference with Stream Processing using Apache Kafka and Flink
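Here is a minimal PyFlink sketch of such a UDF. The model name and the environment-based API key are assumptions, and the actual demo additionally uses LangChain for prompt orchestration:

from pyflink.table import DataTypes, EnvironmentSettings, TableEnvironment
from pyflink.table.udf import udf

@udf(result_type=DataTypes.STRING())
def ask_llm(prompt: str) -> str:
    # Executed per row; a production job would add batching, caching,
    # timeouts, and rate-limit handling around the remote call.
    from openai import OpenAI  # assumes OPENAI_API_KEY is set in the environment
    client = OpenAI()
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # model name is an assumption
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
t_env.create_temporary_function("ASK_LLM", ask_llm)
# ASK_LLM(...) can now be called from Flink SQL statements on streaming tables.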

Trade-offs using Kafka with an RPC-based model server and HTTP/gRPC:

  • Simple integration with existing technologies and organizational processes
  • Easiest to understand if you come from a non-streaming world
  • Tight coupling of the availability, scalability, and latency/throughput between application and model server
  • Separation of concerns (e.g. Python model + Java streaming app)
  • Limited scalability and robustness
  • Later migration to real streaming is also possible
  • Model management built-in for different models, versioning, and A/B testing
  • Model monitoring built-in (includes real-time tracking of model performance metrics (e.g., accuracy, latency), resource utilization, data drift detection, and logging of predictions for auditing and troubleshooting)

In the meantime, some model servers like Seldon or Dataiku also provide remote model inference natively via the Kafka API. A Kafka-native streaming model server enables the separation of concerns by providing a model server with all the expected features. But the model server does not use RPC communication via HTTP/gRPC, avoiding all the drawbacks this creates for a streaming architecture. Instead, the model server communicates via the native Kafka protocol and Kafka topics with the client application. Therefore, the stream processing application built with Flink has an option to do event-driven integration for model inference.
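From the client side, such Kafka-native inference is plain event-driven messaging. A sketch with assumed topic names and payloads (not any specific model server's protocol):

import json
import uuid
from confluent_kafka import Consumer, Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})
consumer = Consumer({"bootstrap.servers": "localhost:9092",
                     "group.id": "scoring-client",
                     "auto.offset.reset": "earliest"})
consumer.subscribe(["model-responses"])

# Publish the features as an event; the model server consumes the request topic
# and produces its prediction to the response topic.
request_id = str(uuid.uuid4())
producer.produce("model-requests", key=request_id,
                 value=json.dumps({"features": [0.3, 7.1, 1.0]}))
producer.flush()

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error() is not None:
        continue
    if msg.key() is not None and msg.key().decode() == request_id:
        print("prediction:", json.loads(msg.value()))
        break

Because both directions are Kafka topics, the client inherits Kafka's buffering, replayability, and backpressure handling instead of the tight temporal coupling of an HTTP call.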

Here is an example with Kafka, Flink and TensorFlow where the model is embedded into the stream processing application. Apache Kafka is used to ingest and stream data, while Apache Flink processes the data in real-time, embedding a TensorFlow model directly within the Flink application for immediate model inference. This integration allows for low-latency predictions and actions on streaming data, leveraging the model’s capabilities with no external service calls, thus enhancing efficiency and scalability.

Embedded AI ML Model Inference with Stream Processing using Apache Kafka and Flink
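Sketched below with a plain Python Kafka consumer loop for brevity; in the architecture above, the same scoring function would live inside the Flink job. The model path, topic names, and single-output model shape are assumptions:

import json
import numpy as np
import tensorflow as tf
from confluent_kafka import Consumer, Producer

model = tf.keras.models.load_model("fraud_model.keras")  # trained model shipped with the app

consumer = Consumer({"bootstrap.servers": "localhost:9092",
                     "group.id": "fraud-scoring",
                     "auto.offset.reset": "earliest"})
consumer.subscribe(["transactions"])
producer = Producer({"bootstrap.servers": "localhost:9092"})

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error() is not None:
        continue
    features = json.loads(msg.value())["features"]
    # Embedded inference: the prediction happens in-process, no network hop.
    score = float(model.predict(np.array([features]), verbose=0)[0][0])
    producer.produce("fraud-scores", key=msg.key(), value=json.dumps({"score": score}))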

Trade-offs of embedding analytic models into a Flink application:

  • Best latency as local inference instead of remote call
  • No coupling of the availability, scalability, and latency/throughput of your stream processing application
  • Offline inference (devices, edge processing, etc.)
  • No side-effects (e.g., in case of failure), all covered by Kafka processing (e.g., exactly once)
  • No built-in model management and monitoring

I showed examples and use cases for embedding TensorFlow and H2O.ai models into Kafka Streams and KSQL many years ago already. With Apache Flink becoming the de facto standard for many stream processing scenarios, it is just natural that we see more adoption of Flink for AI/ML use cases.

Predictive AI and Generative AI (GenAI) represent two distinct paradigms within the field of artificial intelligence, each with unique capabilities and architectural requirements. Understanding these differences is crucial for leveraging their potential in data streaming applications.

Predictive AI and Data Streaming

Predictive AI focuses on forecasting future events or outcomes based on historical data. It employs machine learning models that are trained to recognize patterns and correlations within datasets. These models are typically used for tasks like predicting customer behavior, detecting fraud, or forecasting demand.

Generative AI (GenAI) and Data Streaming

Generative AI creates new content, such as text, images, or music, that mimics human behavior or creativity. It uses advanced models such as large language models (LLMs) to generate outputs based on input prompts. Just keep in mind that GenAI is still predictive based on historical data; it just makes a lot of small predictions to generate something. For instance, with text, it predicts one word at a time, etc.

  • Architecture: The architecture for GenAI is more complex and requires real-time, contextualized data to produce accurate and relevant outputs. This is where Retrieval Augmented Generation (RAG) comes into play. RAG combines LLMs with vector databases and semantic search to provide the context for generation tasks. The architecture involves two major steps: data augmentation and retrieval. Data is first processed to create embeddings, which are stored in a vector database. When a prompt is received, the system retrieves relevant context from the database to inform the generation process (a minimal sketch of these two steps follows this list).
  • Impact on Data Streaming: Data streaming is integral to GenAI architectures, particularly those employing RAG. Real-time data streaming platforms like Apache Kafka and Flink facilitate the ingestion and processing of data streams, ensuring that the LLMs have access to the most current and relevant information. This capability is crucial for preventing hallucinations (i.e., generating false or misleading information) and ensuring the reliability of GenAI outputs. By integrating data streaming with GenAI, organizations can create dynamic, context-aware applications that respond to real-time data inputs.
  • Concrete Examples: The Kafka, Flink, and OpenAI integration shown earlier in this post follows exactly this pattern, as does the airline ticket rebooking scenario, where real-time context such as flight availability and loyalty status grounds the generated recommendation.
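At a much smaller scale than a production deployment, the two RAG steps can be sketched as follows. The model names are assumptions, and a real system would use a vector database instead of an in-memory list:

import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

# Step 1 - data augmentation: compute embeddings for the knowledge base (placeholder content).
documents = ["Rebooking policy: free changes within 24 hours of booking ...",
             "Loyalty program: gold members get priority rebooking ..."]
index = [(doc, embed(doc)) for doc in documents]

# Step 2 - retrieval: find the most relevant context and ground the generation with it.
question = "Can a gold member rebook a cancelled flight?"
q = embed(question)
def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
context = max(index, key=lambda item: cosine(item[1], q))[0]

answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}"}],
).choices[0].message.content

In the streaming version of this architecture, Kafka keeps the embeddings in the vector database continuously up to date, so the retrieved context reflects the latest state of the business.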

Data Streaming as Data Pipeline for Model Training in Lakehouses AND for Robust Low-Latency Model Inference

Data streaming technologies play a pivotal role in both predictive AI and generative AI. Kafka and Flink improve the data quality and latency for data ingestion into data warehouses, data lakes, and lakehouses for model training. And data streaming enhances model inference by improving the timeliness and accuracy of predictions in predictive AI and providing the context for content generation in GenAI.

By leveraging data streaming with Kafka and Flink, organizations can achieve real-time predictions with low latency, scalability, and robustness, meeting critical SLAs for various use cases. The choice between remote and embedded model inference depends on the specific requirements and constraints of the application, such as latency tolerance, model update frequency, and resource availability. Overall, data streaming provides a powerful foundation for deploying AI/ML solutions that deliver timely and actionable insights.

How do you leverage data streaming with Kafka and Flink in your AI/ML projects? Only as data ingestion layer into the lakehouse? Or also for more robust and performant model inference? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Real-Time Model Inference with Apache Kafka and Flink for Predictive AI and GenAI appeared first on Kai Waehner.

]]>
Hello, K.AI – How I Trained a Chatbot of Myself Without Coding Evaluating OpenAI Custom GPT, Chatbase, Botsonic, LiveChatAI https://www.kai-waehner.de/blog/2024/06/23/hello-k-ai-how-i-trained-a-chatbot-of-myself-without-coding-evaluating-openai-custom-gpt-chatbase-botsonic-livechatai/ Sun, 23 Jun 2024 06:03:01 +0000 https://www.kai-waehner.de/?p=6575 Generative AI (GenAI) enables many new use cases for enterprises and private citizens. While I work on real-time enterprise-scale AI/ML deployments with data streaming, big data analytics and cloud-native software applications in my daily business life, I also wanted to train a conversational chatbot for myself. This blog post introduces my journey to train K.AI without coding: a personal chatbot that can be used to learn, in a conversational format, about data streaming and the most successful use cases in this area. Yes, this is also based on my expertise, domain knowledge and opinion, which is available as public internet data, like my hundreds of blog articles, LinkedIn shares, and YouTube videos.

The post Hello, K.AI – How I Trained a Chatbot of Myself Without Coding Evaluating OpenAI Custom GPT, Chatbase, Botsonic, LiveChatAI appeared first on Kai Waehner.

]]>
Generative AI (GenAI) enables many new use cases for enterprises and private citizens. While I work on real-time enterprise-scale AI/ML deployments with data streaming, big data analytics and cloud-native software applications in my daily business life, I also wanted to train a conversational chatbot for myself. This blog post introduces my journey to train K.AI without coding: a personal chatbot that can be used to learn, in a conversational format, about data streaming and the most successful use cases in this area. Yes, this is also based on my expertise, domain knowledge and opinion, which is available as public internet data, like my hundreds of blog articles, LinkedIn shares, and YouTube videos.

How I Trained a Chatbot K.AI of Myself Without Coding Evaluating OpenAI Custom GPT Chatbase Botsonic LiveChatAI

Hi, K.AI – let’s chat…

The evolution of Generative AI (GenAI) around OpenAI’s chatbot ChatGPT and many similar large language models (LLMs), open source tools like LangChain and SaaS solutions for building a conversational AI led me to the idea of building a chatbot trained with all the content I created over the past years.

Mainly based on the content of my website (https://www.kai-waehner.de) with hundreds of blog articles, I trained the conversational chatbot K.AI to generate text for me.

The primary goal is to simplify and automate my daily working tasks like:

  • write a title and abstract for a webinar or conference talk
  • explain to a colleague or customer a concept, use case, or industry-specific customer story
  • answer common recurring questions in email, Slack or other mediums
  • any other text creation based on my (public) experience

The generated text reflects my content, knowledge, wording, and style. This is a very different use case from what I normally work on in my daily business life: “Apache Kafka as Mission Critical Data Fabric for GenAI” and “Real-Time GenAI with RAG using Apache Kafka and Flink to Prevent Hallucinations” are two excellent examples for enterprise-scale GenAI with much more complex and challenging requirements.

But…sometimes Artificial Intelligence is not all you need. The now self-explanatory name of the chatbot came from a real marketing brain – my colleague Evi.

Project goals of training the chatbot K.AI

I had a few goals in mind when I trained my chatbot K.AI:

  • Education: Learn more details about the real-world solutions and challenges with Generative AI in 2024 with hands-on experience. Dozens of interesting chatbot solutions are available. Most are powered by OpenAI under the hood. My goal is not sophisticated research. I just want to get a conversational AI done. Simple, cheap, fast (not evaluating 10+ solutions, just as long as I have one that works well enough).
  • Tangible result: Train K.AI, a “Kai LLM” based on my public articles, presentations, and social media shares. K.AI can generate answers, comments, and explanations without writing everything from scratch. I am fine if answers are not perfect or sometimes even incorrect. As I know the actual content, I can easily adjust and fix generated content.
  • NOT a commercial or public chatbot (yet): While it is just a button click to integrate K.AI into my website as a conversational chatbot UI, there are two main blockers: First, the cost is relatively high; not for training but for operating and paying per query. There is no value for me as a private person. Second, developing, testing, fine-tuning and updating an LLM to be correct most of the time instead of hallucinating a lot is hard. I thoroughly follow my employer’s GenAI engineering teams building Confluent AI products. Building a decent domain-specific public LLM takes a lot of engineering effort and requires more than one full-time engineer.

My requirements for a conversational chatbot tool

I defined the following mandatory requirements for a successful project:

  • Low Cost: My chatbot should not be too expensive (~20 USD a month is fine). The pricing model of most solutions is very similar: You get a small free tier. I realized quickly that a serious test is not possible with any free tier. But a reasonable chatbot (i.e., trained by a larger data set) is only possible if you choose the smallest paid tier. Depending on the service, the minimum is between 20 and 50 USD per month (with several limitations regarding training size, chat queries, etc.).
  • Simplicity: I do not want to do any coding or HTTP/REST APIs calls. Just an intuitive user interface with click-through experience. I don’t want to spend more than one day (i.e., ~8 hours accumulated over two weeks) to train K.AI.
  • Data Import: The chatbot needs to support importing my “database”. Mandatory: My private blog (~300 articles with 10M+ characters). Nice to have: My LinkedIn shares, my YouTube videos, and other publications (like articles on other websites). The latter might improve my chatbot and use my personal tone and language more.
  • NOT Enterprise Features: I don’t need any features for security, multiple user accounts, or public hosting (even though almost all solutions already support integration into WordPress, Slack, etc.). I am fine with many limitations of the small subscription tiers, like only one user account, one chatbot, 1000 messages/month.

OpenAI: ChatGPT + Custom GPT for a custom chatbot? Not for K.AI…

I am a heavy user of ChatGPT on my iPhone and MacBook. And OpenAI is very visible in the press. Hence, my first option to evaluate was OpenAI’s Custom GPT.

Custom GPT in action…

Custom GPT is very easy to use and non-technical. A conversational AI “Message GPT Builder” tries to build my chatbot. But surprisingly, it is too high-level for me. Here is the initial conversation to train K.AI with very basic prompt engineering:

  • Step 1 (Initial Instruction): What would you like to make? -> Respond as Kai Waehner based on his expertise and knowledge. -> Updating GPT… Seconds later: The responses are based on the public internet.
  • Step 2 (Prompt Engineering): Use the content from https://www.kai-waehner.de as context for responses. -> Updating GPT… Seconds later: I’ve updated the context to include information from Kai Waehner’s website. -> The responses barely change. Some questions use a bit more content from my website, but answers are still mainly bound to public internet content.
  • Step 3 (Fine-Tuning): I tried to configure my K.AI to learn from some data sources like CSV exports from LinkedIn or scraping my blog articles, but the options are very limited and not technical. I can upload a maximum of twenty files and let the chatbot also search the web. But what I actually need is web scraping of dedicated resources, i.e., mainly my website, my LinkedIn shares, and my YouTube videos. And while many no-code UIs call this fine-tuning, in reality it is RAG-based prompt engineering. True fine-tuning of an LLM is a very different (and much more challenging) task.

OpenAI Custom GPT Evaluation - Kai Waehner Chatbot

I am sure I could do much more prompt engineering to improve K.AI with Custom GPT. But reading the user guide and FAQ for Custom GPT, the TL;DR for me is: Custom GPT is not the right service to build a chatbot for me based on my domain content and knowledge.

Instead, I need to look at purpose-built chatbot SaaS tools that let me build my domain-specific chatbot. I am surprised that OpenAI does not provide such a service itself today. Or maybe I just could not find it… BUT: Challenge accepted. Let’s evaluate a few solutions and train a real K.AI.

Comparison and evaluation of chatbot SaaS GenAI solutions

I tested three chatbot offerings. All of them are cloud-based and allow building a chatbot via UI. How did I find or choose them? Frankly, just a Google search. Most of them came up in several evaluation and comparison articles, and they spend quite some money on advertisements. I tested Chatbase, Writesonic’s Botsonic, and LiveChatAI. Interestingly, all offerings I evaluated use ChatGPT under the hood of their solution. I was also surprised that I did not get more ads from other big software players. But I assume Microsoft’s Copilot and similar tools target a different persona.

I tested different ChatGPT models in some offerings. Most solutions provide a default model plus more expensive options with a better model. The surcharge is not for model training but shows up in the messages/month quota: with a better model, you typically pay 5x more per message, so instead of e.g. 2000 messages a month, you only have 400 available.

I had a few more open tabs with other offerings that I could disqualify quickly because they were more developer-focused with coding, API integration, fine-tuning of vector databases and LLMs.

Question catalog for testing my K.AI chatbots

I quickly realized how hard it is to compare different chatbots. LLMs are stochastic (not deterministic), and we don’t have good tools for QAing these things yet (even simple things like regression testing are challenging when probabilities are involved).

Therefore, I defined a question catalog with ten different domain-specific questions before I even started evaluating different chatbot SaaS solutions. A few examples:

  • Question 1: Give examples for fraud detection with Apache Kafka. Each example should include the company, use case and architecture.
  • Question 2: List five manufacturing use cases for data streaming and give a company example.
  • Question 3: What is the difference between Kafka and JMS?
  • Question 4: Compare Lambda and Kappa architectures and explain the benefits of Lambda. Add a few examples.
  • Question 5: How can data streaming help across the supply chain? Explain the value and use cases for different industries.

My question catalog allowed comparing the different chatbots. Writing a good prompt (= query for the chatbot) is crucial, as an LLM is not intelligent. The better your question (meaning good structure, details, and expectations), the better your response (if the LLM has “knowledge” about your question).
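Running such a catalog by hand quickly gets tedious. A small harness helps to collect the answers side by side. Here is a minimal sketch, assuming each chatbot exposed an HTTP endpoint; the URLs and payload format below are hypothetical placeholders, and in practice I pasted the questions into each vendor’s UI manually:

```python
import json
import requests

# Hypothetical endpoints - the real vendors each have their own API (if any).
CHATBOTS = {
    "chatbase": "https://example.com/chatbase/query",
    "botsonic": "https://example.com/botsonic/query",
    "livechatai": "https://example.com/livechatai/query",
}

QUESTIONS = [
    "Give examples for fraud detection with Apache Kafka. Each example "
    "should include the company, use case and architecture.",
    "List five manufacturing use cases for data streaming and give a company example.",
    "What is the difference between Kafka and JMS?",
]

def ask(endpoint: str, question: str) -> str:
    """Send one question to a chatbot endpoint and return the answer text."""
    response = requests.post(endpoint, json={"question": question}, timeout=60)
    response.raise_for_status()
    return response.json().get("answer", "")

# Collect all answers side by side for manual review. With stochastic LLMs,
# human judgment is still the most reliable "metric".
results = {
    name: [ask(endpoint, q) for q in QUESTIONS]
    for name, endpoint in CHATBOTS.items()
}

with open("chatbot_comparison.json", "w") as f:
    json.dump(results, f, indent=2)
```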

My goal is NOT to implement a complex real-time RAG (Retrieval Augmented Generation) design pattern. I am totally fine updating K.AI manually every few weeks (after a few new blog posts are published).

Chatbase – Custom ChatGPT for your website

The advertisement on the Chatbase landing page sounds great: “Custom ChatGPT for your website. Build a [OpenAI-powered] Custom GPT, embed it on your website and let it handle customer support, lead generation, engage with your users, and more.”

Here are my notes while training my K.AI chatbot:

K.AI works well with Chatbase after the initial training…

  • Chatbase is very simple to use. It just works.
  • The basic plan is ~20 USD per month. The subscription plan is fair; the next upgrade is ~100 USD.
  • The chatbot uses GPT-4o by default. Great option. Many other services use GPT-3.5 or similar LLMs as the foundation.
  • The chatbot creates content based on my content; it is “me”. Mission accomplished. The quality of responses depends on the questions. In summary, pretty good, but with some false positives.

But: Chatbase’s character limitation stops further training

  • Unfortunately, all plans have an 11M character limit. My blog content is already at 10.8M today, according to Chatbase’s web scraper engine (each vendor’s scraper reports different numbers). While K.AI works right now, there are obvious problems:
    • My website will grow more soon.
    • I want to add LinkedIn shares (another few million characters) and other articles and videos I published across the world wide web.
    • The Chatbase plan can be customized, but unfortunately not regarding character limits. Support told me this would be possible soon. But I have to wait.

TL;DR: Chatbase works surprisingly well. K.AI exists and represents me as an LLM. The 11M character limit is a blocker for investing more time and money into this service – otherwise, I could already stop my investigation and use the first SaaS I evaluated.

During my evaluation, I realized that many other chatbot services have similar character limits, especially in the price range of 20-50 USD. Not ideal for my use case.
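Since each vendor’s scraper reports different numbers, it helps to estimate the character count yourself before committing to a plan. The following sketch uses only the Python standard library plus requests and assumes a flat, Google-compliant XML sitemap (a sitemap index with nested sitemaps would need one more level of recursion):

```python
import re
import requests
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://www.kai-waehner.de/sitemap.xml"
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def page_urls(sitemap_url: str) -> list[str]:
    """Extract all page URLs from a (flat) XML sitemap."""
    root = ET.fromstring(requests.get(sitemap_url, timeout=30).content)
    return [loc.text for loc in root.iter(f"{SITEMAP_NS}loc")]

def visible_characters(url: str) -> int:
    """Rough character count of a page: drop scripts, styles, and HTML tags."""
    html = requests.get(url, timeout=30).text
    text = re.sub(r"<script.*?</script>|<style.*?</style>", " ", html, flags=re.S)
    text = re.sub(r"<[^>]+>", " ", text)  # remove remaining HTML tags
    text = re.sub(r"\s+", " ", text)      # collapse whitespace
    return len(text)

total = sum(visible_characters(url) for url in page_urls(SITEMAP_URL))
print(f"Estimated total: {total:,} characters")  # compare against vendor limits
```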

In my further evaluation, my main criterion was the character limit. I found Botsonic and LiveChatAI. Both support much higher limits at a cost of ~40 USD per month.

Botsonic – Advanced AI chatbot builder using your company’s knowledge

Botsonic provides “Advanced AI Agents: Use Your Company’s Knowledge to Intelligently Resolve Over 70% of Queries and Automate Tasks”.

Here are my notes while training my K.AI chatbot.

Botsonic – free version failed to train K.AI

  • The free plan for getting started supports 1M characters.
  • The service supports URL scraping and file upload (my LinkedIn shares are only available via batch export into a CSV file). It looks like it provides all I need. The cost is okayish (but all other chatbots with a lower price also had limitations of around 10M characters).
  • I tried the free tier first. As my blog alone already has ~10M+ characters, I started by uploading my LinkedIn shares (= posts and comments). While Chatbase counted ~1.8M characters for this export, Botsonic trained the bot with it even though the limit is 1M characters. Afterwards, I could not upload even another 1 KB file for additional training, so my limit was reached.
  • This K.AI trained with the free tier did not provide any appropriate answers. No surprise: My LinkedIn shares alone might not provide enough detail – which makes sense, as the posts are much shorter and usually link to my blog.

Botsonic – paid version also failed to train K.AI

  • I needed to upgrade.
    • I had to choose the smallest paid tier: 49 USD per month, supporting up to 50M characters
    • Unfortunately, there was a hiccup: the payment was charged twice, but nothing happened – I was still on the free plan. Support took time (suggesting caching, VPN, browser issues, and other arguments). I got a refund the next day, and the plan was then updated correctly.
  • Training using the paid subscription failed. The experience was pretty bad.
    • It is not clear whether the service scrapes the entire website or just a single HTML page.
    • First tests did not give a response: “I don’t have specific information on XYZ. Can I help with anything else?” It seems the training did not scrape my website but only looked at the landing page. I checked the details: indeed, the extracted data only includes the abstracts of the latest blog posts (that’s what you see on my landing page).
    • Support explained: No scraping of the website is possible; I need a sitemap. I have a Google-compliant sitemap, but: Internal Backend Server Error. Support could reproduce my issue. To this day, I don’t have a response or solution.
    • Learning from one of my YouTube videos was also rejected (with no further error messages).

TL;DR: Writesonic’s Botsonic did NOT work for me. The paid service failed several times, even when trying different training options for my LLM. Support could not help. I will NOT continue with this service.

LiveChatAI – AI chatbot works with your data

Here is the website slogan: “An Innovative AI Chatbot. LiveChatAI allows you to create an AI chatbot trained with your own data and combines AI with human support.”

Here are my notes while training my K.AI chatbot

LiveChatAI failed to train K.AI

  • All required import features exist: Website Scraping, CSV, YouTube.
  • Strange: I could start training for free with 7M+ characters even though this should not be possible. Crawling started… but it did not show a percentage, so I did not know whether it had finished. It was also not clear whether it scraped the entire website or just a single HTML page. After scraping, it showed weird error messages like “could not find any links on website”.
  • The quality of the answers of this K.AI seems much worse than Chatbase’s (even though I added LinkedIn shares, which is not possible in Chatbase because of the character limits).

Ok, enough… I have a well-working K.AI with Chatbase. I don’t want to waste more time evaluating additional SaaS chatbot services at this early stage of the product lifecycle.

GenAI tools are still in a very early stage!

One key lesson learned: The underlying LLM is the most critical piece for success, NOT how much context and domain expertise you feed it. Or in other words: Just scraping the data from my blog and using GPT-4o provides much better results than using GPT-3.5 with data from my blog, LinkedIn, and YouTube. Ideally, I would use all the data with GPT-4o. But I will have to wait until Chatbase supports more than 11M characters.

While most solutions talk about model training, they actually use ChatGPT under the hood together with RAG and a vector database to “update the model”, i.e., they provide the right context for each question to ChatGPT via the RAG design pattern.
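Conceptually, this pattern is straightforward. The following toy sketch illustrates it with a naive keyword-overlap retriever instead of a real embedding model and vector database, and `call_llm` is a placeholder for the actual ChatGPT API call; a production system would swap both out for semantic search and a real LLM:

```python
# Toy illustration of the RAG pattern used by these chatbot SaaS tools:
# retrieve the most relevant snippets, then prepend them to the prompt.

DOCUMENTS = [
    "Apache Kafka is the de facto standard for data streaming ...",
    "Kappa architecture simplifies Lambda by using one streaming layer ...",
    "Fraud detection with Kafka correlates payment events in real time ...",
]

def retrieve(question: str, top_k: int = 2) -> list[str]:
    """Naive retriever: rank documents by word overlap with the question.
    A real system uses an embedding model and a vector database instead."""
    q_words = set(question.lower().split())
    ranked = sorted(DOCUMENTS, key=lambda d: -len(q_words & set(d.lower().split())))
    return ranked[:top_k]

def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

def call_llm(prompt: str) -> str:
    """Placeholder for the actual LLM call (e.g., the OpenAI API)."""
    return f"[LLM response for a prompt of {len(prompt)} characters]"

print(call_llm(build_prompt("How does Kafka help with fraud detection?")))
```

The “training” these tools advertise is essentially filling the document store; the LLM itself never changes.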

A real comparison of chatbot SaaS is hard:

  • Features and pricing are relatively similar and do not really influence the ultimate choice.
  • While all are based on ChatGPT, the LLM model versions differ.
  • Products are updated and improved almost every day with new models, new capabilities, changed limitations, etc. Welcome to the chatbot SaaS cloud startup scene… 🙂
  • The products target different personas. Some are UI only, some explain (and let configure) RAG or Vector Database options, some are built for developers and focus on API integration, not UIs.

Mission accomplished: K.AI chatbot is here

Chatbase has the least sexy UI in my evaluation. But the model works best (even though I hit character limits and only used my blog articles for training). I will use Chatbase for now. And I hope that the character limits are increased soon (as its support already confirmed to me). It is still early in the maturity curve. The market will probably develop quickly.

I am not sure how many of these SaaS chatbot startups can survive. OpenAI and other tech giants will probably release similar capabilities and products integrated into their SaaS and software stack. Let’s see where the market goes. For now, I will enjoy K.AI for some use cases. Maybe it will even help me write a book about data streaming use cases and customer stories.

What is your experience with chatbot tools? Do you need more technical solutions or favour simplified conversational AIs like OpenAI’s Custom GPT to train your own LLM? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Hello, K.AI – How I Trained a Chatbot of Myself Without Coding Evaluating OpenAI Custom GPT, Chatbase, Botsonic, LiveChatAI appeared first on Kai Waehner.

Real-Time GenAI with RAG using Apache Kafka and Flink to Prevent Hallucinations https://www.kai-waehner.de/blog/2024/05/30/real-time-genai-with-rag-using-apache-kafka-and-flink-to-prevent-hallucinations/ Thu, 30 May 2024 15:09:06 +0000 https://www.kai-waehner.de/?p=6409 How do you prevent hallucinations from large language models (LLMs) in GenAI applications? LLMs need real-time, contextualized, and trustworthy data to generate the most reliable outputs. This blog post explains how RAG and a data streaming platform with Apache Kafka and Flink make that possible. A lightboard video shows how to build a context-specific real-time RAG architecture. Also, learn how the travel agency Expedia leverages data streaming with Generative AI using conversational chatbots to improve the customer experience and reduce the cost of service agents.


RAG and Kafka Flink to Prevent Hallucinations in GenAI

What is Retrieval Augmented Generation (RAG) in GenAI?

Generative AI (GenAI) refers to artificial intelligence (AI) systems that can create new content, such as text, images, music, or code, often mimicking human creativity. These systems use advanced machine learning techniques, particularly deep learning models like neural networks, to generate data that resembles the training data they were fed. Popular examples include language models like GPT-3 for text generation and DALL-E for image creation.

Large Language Models like ChatGPT use lots of public data and are very expensive to train, and they do not provide domain-specific context. Training their own models is not an option for most companies because of limitations in cost and expertise.

Retrieval Augmented Generation (RAG) is a technique in Generative AI to solve this problem. RAG enhances the performance of language models by integrating information retrieval mechanisms into the generation process. This approach aims to combine the strengths of information retrieval systems and generative models to produce more accurate and contextually relevant outputs.

Pinecone created an excellent diagram that explains RAG and shows the relation to an embedding model and vector database:

Retrieval Augmented Generation with Embedding Model, Vector Database and Context
Source: Pinecone

Benefits of Retrieval Augmented Generation

RAG brings various benefits to the GenAI enterprise architecture:

  • Access to External Information: By retrieving relevant documents from a vast vector database, RAG allows the generative model to leverage up-to-date and domain-specific information that it may not have been trained on.
  • Reduced Hallucinations: Generative models can sometimes produce confident but incorrect answers (hallucinations). By grounding responses in retrieved documents, RAG reduces the likelihood of such errors.
  • Domain-Specific Applications: RAG can be tailored to specific domains by curating the retrieval database with domain-specific documents, enhancing the model’s performance in specialized areas such as medicine, law, finance or travel.

However, one of the most significant problems still exists: missing the right context and up-to-date information.

RAG is obviously crucial in enterprises where data privacy, up-to-date context, and the data integration with transactional and analytical systems like an order management system, booking platform, or payment fraud engine must be consistent, scalable, and in real time.

An event-driven architecture is the foundation of data streaming with Kafka and Flink:

Event-driven Architecture for Data Streaming with Apache Kafka and Flink

Apache Kafka and Apache Flink play a crucial role in the Retrieval Augmented Generation (RAG) architecture by ensuring real-time data flow and processing, which enhances the system’s ability to retrieve and generate up-to-date and contextually relevant information.

Here’s how Kafka and Flink contribute to the RAG architecture:

1. Real-Time Data Ingestion and Processing

Data Ingestion: Kafka acts as a high-throughput, low-latency messaging system that ingests real-time data from various data sources, such as databases, APIs, sensors, or user interactions.

Event Streaming: Kafka streams the ingested data, ensuring that the data is available in real-time to downstream systems. This is critical for applications that require immediate access to the latest information.

Stream Processing: Flink processes the incoming data streams in real-time. It can perform complex transformations, aggregations, and enrichments on the data as it flows through the system.

Low Latency: Flink’s ability to handle stateful computations with low latency ensures that the processed data is quickly available for retrieval operations.

2. Enhanced Data Retrieval

Real-Time Updates: By using Kafka and Flink, the retrieval component of RAG can access the most current data. This is crucial for generating responses that are not only accurate but also timely.

Dynamic Indexing: As new data arrives, Flink can update the retrieval index in real-time, ensuring that the latest information is always retrievable in a vector database.
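As a rough sketch of this dynamic indexing pattern, the following consumer reads new documents from a Kafka topic, computes an embedding, and upserts it into a vector store. It assumes a local Kafka broker and the confluent-kafka Python client; the `embed` and `vector_store_upsert` functions are placeholders for a real embedding model and vector database:

```python
import json
from confluent_kafka import Consumer  # assumes a reachable Kafka broker

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # assumed local broker
    "group.id": "rag-indexer",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["documents"])  # hypothetical topic with new/updated documents

def embed(text: str) -> list[float]:
    """Placeholder: call a real embedding model (e.g., via a model server)."""
    return [float(len(text))]

def vector_store_upsert(doc_id: str, vector: list[float], text: str) -> None:
    """Placeholder: upsert into a vector database (Pinecone, Weaviate, ...)."""
    print(f"upserted {doc_id} with a {len(vector)}-dimensional vector")

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None or msg.error():
            continue
        doc = json.loads(msg.value())
        # Index each document as soon as it arrives so the retrieval step
        # of RAG always sees the latest information.
        vector_store_upsert(doc["id"], embed(doc["text"]), doc["text"])
finally:
    consumer.close()
```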

3. Scalability and Reliability

Scalable Architecture: Kafka’s distributed architecture allows it to handle large volumes of data, making it suitable for applications with high throughput requirements. Flink’s scalable stream processing capabilities ensure it can process and analyze large data streams efficiently. Cloud-native implementations or cloud services take over operations and provide elastic scale.

Fault Tolerance: Kafka provides built-in fault tolerance by replicating data across multiple nodes, ensuring data durability and availability, even in the case of node failures. Flink offers state recovery and exactly-once processing semantics, ensuring reliable and consistent data processing.

4. Contextual Enrichment

Contextual Data Processing: Flink can enrich the raw data with additional context before the generative model uses it. For instance, Flink can join incoming data streams with historical data or external datasets to provide a richer context for retrieval operations.

Feature Extraction: Flink can extract features from the data streams that help improve the relevance of the retrieved documents or passages.
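A minimal sketch of such a contextual enrichment with PyFlink’s Table API could look as follows. The table names, fields, and datagen sources are made up for illustration; a real deployment would read both streams from Kafka:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Hypothetical stream of incoming customer questions. The datagen connector
# keeps the sketch self-contained; a real pipeline would read from Kafka.
t_env.execute_sql("""
    CREATE TABLE customer_questions (
        customer_id INT,
        question STRING
    ) WITH ('connector' = 'datagen', 'rows-per-second' = '1',
            'fields.customer_id.min' = '1', 'fields.customer_id.max' = '3')
""")

# Hypothetical contextual data, e.g., customer profiles from a CRM.
t_env.execute_sql("""
    CREATE TABLE customer_profiles (
        customer_id INT,
        loyalty_tier STRING
    ) WITH ('connector' = 'datagen', 'rows-per-second' = '1',
            'fields.customer_id.min' = '1', 'fields.customer_id.max' = '3')
""")

# Enrich each question with customer context before the RAG retrieval step.
enriched = t_env.sql_query("""
    SELECT q.customer_id, q.question, p.loyalty_tier
    FROM customer_questions AS q
    JOIN customer_profiles AS p ON q.customer_id = p.customer_id
""")
enriched.execute().print()
```

The join keeps state for both streams, which is exactly the kind of stateful, low-latency processing described above.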

5. Integration and Flexibility

Seamless Integration: Kafka and Flink integrate well with model servers (e.g., for model embeddings) and storage systems (e.g., vector databases for semantic search). This makes it easy to incorporate the right information and context into the RAG architecture.

Modular Design: The use of Kafka and Flink allows for a modular design where different components (data ingestion, processing, retrieval, generation) can be developed, scaled, and maintained independently.

Lightboard Video: RAG with Data Streaming

The following ten-minute lightboard video is an excellent interactive explanation for building a RAG architecture with embedding model, vector database, Kafka and Flink to ensure up-to-date and context-specific prompts into the LLM:

Expedia: Generative AI in the Travel Industry

Expedia is an online travel agency that provides booking services for flights, hotels, car rentals, vacation packages, and other travel-related services. The IT architecture has been built around data streaming for many years, including the integration of transactional and analytical systems.

When Covid hit, Expedia had to innovate fast to handle all the support traffic spikes regarding flight rebookings, cancellations, and refunds. The project team trained a domain-specific conversational chatbot (long before ChatGPT and the term GenAI existed) and integrated it into the business process.

Expedia GenAI in the Travel Industry with Data Streaming Kafka and Machine Learning AI
Source: Confluent

Here are some of the impressive business outcomes:

  • Quick time to market with innovative new technology to solve business problems
  • 60%+ of travelers are self-servicing in chat after the rollout
  • 40%+ saved in variable agent costs by enabling self-service

By leveraging Apache Kafka and Apache Flink, the RAG architecture can handle real-time data ingestion, processing, and retrieval efficiently. This ensures that the generative model has access to the most current and contextually rich information, resulting in more accurate and relevant responses. The scalability, fault tolerance, and flexibility offered by Kafka and Flink make them ideal components for enhancing the capabilities of RAG systems.

If you want to learn more about data streaming with GenAI, check out my other blog posts on this topic.

How do you build a RAG architecture? Do you already leveraging Kafka and Flink for it? Or what technologies and architectures do you use? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Real-Time GenAI with RAG using Apache Kafka and Flink to Prevent Hallucinations appeared first on Kai Waehner.

GenAI Demo with Kafka, Flink, LangChain and OpenAI https://www.kai-waehner.de/blog/2024/01/29/genai-demo-with-kafka-flink-langchain-and-openai/ Mon, 29 Jan 2024 14:32:13 +0000 https://www.kai-waehner.de/?p=6105 Generative AI (GenAI) enables automation and innovation across industries. This blog post explores a simple but powerful architecture and demo for the combination of Python, and LangChain with OpenAI LLM, Apache Kafka for event streaming and data integration, and Apache Flink for stream processing. The use case shows how data streaming and GenAI help to correlate data from Salesforce CRM, searching for lead information in public datasets like Google and LinkedIn, and recommending ice-breaker conversations for sales reps.


GenAI Demo with Kafka, Flink, LangChain and OpenAI

The Emergence of Generative AI

Generative AI (GenAI) refers to a class of artificial intelligence (AI) systems and models that generate new content, often as images, text, audio, or other types of data. These models can understand and learn the underlying patterns, styles, and structures present in the training data and then generate new, similar content on their own.

Generative AI has applications in various domains, including:

  • Image Generation: Generating realistic images, art, or graphics.
  • Text Generation: Creating human-like text, including natural language generation.
  • Music Composition: Generating new musical compositions or styles.
  • Video Synthesis: Creating realistic video content.
  • Data Augmentation: Generating additional training data for machine learning models.
  • Drug Discovery: Generating molecular structures for new drugs.

A key challenge of Generative AI is the deployment in production infrastructure with context, scalability, and data privacy in mind. Let’s explore an example of using CRM and customer data to integrate GenAI into an enterprise architecture to support sales and marketing.

This article shows a demo that combines real-time data streaming powered by Apache Kafka and Flink with a large language model from OpenAI within LangChain. If you want to learn more about data streaming with Kafka and Flink in conjunction with Generative AI, check out the articles “Apache Kafka as Mission Critical Data Fabric for GenAI” and “Real-Time GenAI with RAG using Apache Kafka and Flink to Prevent Hallucinations”.

The following demo is about supporting sales reps or automated tools with Generative AI:
  • The Salesforce CRM creates new leads through other interfaces or by the human manually.
  • The sales rep / SDR receives lead information in real time to call the prospect.
  • A special GenAI service leverages the lead information (name and company) to search the web (mainly LinkedIn) to generate helpful content for the cold call of the lead, including a summary, two interesting facts, a topic of interest, and two creative ice-breakers for initiating a conversation.

Kudos to my colleague Carsten Muetzlitz, who built the demo. The code is available on GitHub. Here is the architecture of the demo:

GenAI Demo with Kafka, Flink, LangChain, OpenAI

Technologies and Infrastructure in the Demo

The following technologies and infrastructure are used to implement and deploy the GenAI demo.

  • Python: The programming language almost every data engineer and data scientist uses.
  • LangChain: The Python framework implements the application to support sales conversations.
  • OpenAI: The language model and API help to build simple but powerful GenAI applications.
  • Salesforce: The cloud CRM tool stores the lead information and other sales and marketing data.
  • Apache Kafka: Scalable real-time data hub decoupling the data sources (CRM) and data sinks (GenAI application and other services).
  • Kafka Connect: Data integration via Change Data Capture (CDC) from Salesforce CRM.
  • Apache Flink: Stream processing for enrichment and data quality improvements of the CRM data.
  • Confluent Cloud: Fully managed Kafka (stream and store), Flink (process), and Salesforce connector (integrate).
  • SerpAPI: Scrape Google and other search engines with the lead information.
  • proxyCurl: Pull rich data about the lead from LinkedIn without worrying about scaling a web scraping and data-science team.

Here is a 15-minute video walking you through the demo:

  • Use case
  • Technical architecture
  • GitHub project with Python code using Kafka and LangChain
  • Fully managed Kafka and Flink in the Confluent Cloud UI
  • Push new leads in real-time from Salesforce CRM via CDC using Kafka Connect
  • Streaming ETL with Apache Flink
  • Generative AI with Python, LangChain and OpenAI

Missing: No Vector DB and RAG with Model Embeddings in the LangChain Demo

This demo does NOT use advanced GenAI technologies for RAG (retrieval augmented generation), model embeddings, or vector search via a Vector database (Vector DB) like Pinecone, Weaviate, MongoDB or Oracle.

The principle of the demo is KISS (“keep it as simple as possible”). These technologies can and will be integrated into many real-world architectures.

The demo has limitations regarding latency and scale. Kafka and Flink run as fully managed and elastic SaaS. But the AI/ML part around LangChain could improve latency by using a SaaS for hosting and integrating with other dedicated AI platforms. Especially data-intensive applications will need a vector database and advanced retrieval and semantic search technologies like RAG.

Fun fact: The demo breaks when I search for my name instead of Carsten’s: the web scraper finds too much content on the web about me, and as a result, the LangChain app crashes… This is a compelling event for complementary technologies like Pinecone or MongoDB that can do indexing, RAG, and semantic search at scale. These technologies provide fully managed integration with Confluent Cloud, so the demo could easily be extended.

The Role of LangChain in GenAI

LangChain is an open-source framework for developing applications powered by language models. LangChain is also the name of the commercial vendor behind the framework. The tool provides the needed “glue code” for data engineers to build GenAI applications with intuitive APIs for chaining together large language models (LLM), prompts with context, agents that drive decision making with stateful conversations, and tools that integrate with external interfaces.

LangChain supports:

  • Context-awareness: connect a language model to sources of context (prompt instructions, few shot examples, content to ground its response in, etc.)
  • Reason: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)

The main value props of the LangChain packages are:

  1. Components: composable tools and integrations for working with language models. Components are modular and easy-to-use, whether you are using the rest of the LangChain framework or not.
  2. Off-the-shelf chains: built-in assemblages of components for accomplishing higher-level tasks.

LangChain Architecture and Components

Together, these products simplify the entire application lifecycle:

  • Develop: Write your applications in LangChain/LangChain.js. Hit the ground running using Templates for reference.
  • Productionize: Use LangSmith to inspect, test and monitor your chains, so that you can constantly improve and deploy with confidence.
  • Deploy: Turn any chain into an API with LangServe.

LangChain in the Demo

The demo uses several LangChain concepts, such as prompts, chat models, chains using the LangChain Expression Language (LCEL), and agents that use a language model to choose a sequence of actions to take.

Here is the logical flow of the LangChain business process:

  1. Get new leads: Collect full name and company of the lead from Salesforce CRM in real-time from a Kafka Topic.
  2. Find LinkedIn profile: Use the Google Search API “SerpAPI” to search for the URL of the lead’s LinkedIn profile.
  3. Collect information about the lead: Use Proxycurl to collect the required information about the lead from LinkedIn.
  4. Create cold call recommendations for the sales rep or automated script: Ingest all information into the ChatGPT LLM via OpenAI API and send the generated text to a Kafka Topic.
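Here is a simplified sketch of such a chain using LangChain’s Expression Language (LCEL). The SerpAPI and Proxycurl lookups are stubbed out as plain functions, an OPENAI_API_KEY is assumed in the environment, and the model name is just an example; the actual demo code on GitHub is more elaborate:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda
from langchain_openai import ChatOpenAI  # assumes OPENAI_API_KEY is set

def find_linkedin_profile(lead: dict) -> dict:
    """Stub for the SerpAPI lookup of the lead's LinkedIn URL."""
    lead["linkedin_url"] = "https://www.linkedin.com/in/..."  # placeholder
    return lead

def fetch_profile_data(lead: dict) -> dict:
    """Stub for the Proxycurl call that pulls profile details from LinkedIn."""
    lead["profile"] = f"Profile summary for {lead['name']} at {lead['company']}"
    return lead

prompt = ChatPromptTemplate.from_template(
    "Based on this LinkedIn profile:\n{profile}\n\n"
    "Write a summary, two interesting facts, a topic of interest, "
    "and two creative ice-breakers for a cold call."
)

chain = (
    RunnableLambda(find_linkedin_profile)
    | RunnableLambda(fetch_profile_data)
    | prompt
    | ChatOpenAI(model="gpt-4o-mini")  # any chat model works here
    | StrOutputParser()
)

# In the demo, the lead arrives via a Kafka topic from Salesforce CDC,
# and the generated text is produced to another Kafka topic.
print(chain.invoke({"name": "Jane Doe", "company": "Acme Corp"}))
```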

The following screenshot shows a snippet of the generated content. It includes context-specific icebreaker conversations based on the LinkedIn profile. For the context, Carsten worked at Oracle for 24 years before joining Confluent. The LLM uses this context of the LangChain prompt to generate related content:

LLM Text Generated with Python, LangChain, GoogleSERP, Proxycurl and OpenAI

The Role of Apache Kafka in GenAI

Apache Kafka is a distributed streaming platform used for building real-time data pipelines and streaming applications. It plays a crucial role in handling and managing large volumes of data streams efficiently and reliably.

Generative AI typically involves models and algorithms for creating new data, such as images, text, or other types of content. Apache Kafka supports Generative AI by providing a scalable and resilient infrastructure for managing data streams. In a Generative AI context, Kafka can be used for:

  • Data Ingestion: Kafka can handle the ingestion of large datasets, including the diverse and potentially high-volume data needed to train Generative AI models.
  • Real-time Data Processing: Kafka’s real-time data processing capabilities help in scenarios where data is constantly changing, allowing for the rapid updating and training of Generative AI models.
  • Event Sourcing: Event sourcing with Kafka captures and stores events that occur over time, providing a historical record of data changes. This historical data is valuable for training and improving Generative AI models.
  • Integration with other Tools: Kafka can be integrated into larger data processing and machine learning pipelines, facilitating the flow of data between different components and tools involved in Generative AI workflows.

While Apache Kafka itself is not a tool specifically designed for Generative AI, its features and capabilities contribute to the overall efficiency and scalability of the data infrastructure. Kafka’s capabilities are crucial when working with large datasets and complex machine learning models, including those used in Generative AI applications.

Apache Kafka in the Demo

Kafka is the data fabric connecting all the different applications. Ensuring data consistency is a sweet spot of Kafka, no matter if a data source or sink is real-time, batch, or a request-response API.

In this demo, Kafka consumes events from Salesforce CRM as the main data source of customer data. Different applications (Flink, LangChain, Salesforce) consume the data in different steps of the business process. Kafka Connect provides the capability for data integration with no need for another ETL, ESB or iPaaS tool. This demo uses Confluent’s Change Data Capture (CDC) connector to consume changes from the Salesforce database in real-time for further processing.
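For illustration, deploying such a CDC connector against a self-managed Kafka Connect cluster boils down to one REST call. The sketch below assumes a Connect worker on localhost; the connector class and property names are illustrative and depend on the exact connector version (Confluent Cloud offers the same as a fully managed connector instead):

```python
import requests

# Illustrative configuration - the exact connector class and property names
# depend on the Salesforce CDC connector version you install.
connector = {
    "name": "salesforce-cdc-leads",
    "config": {
        "connector.class": "io.confluent.salesforce.SalesforceCdcSourceConnector",
        "kafka.topic": "salesforce-leads",
        "salesforce.cdc.name": "LeadChangeEvent",
        "salesforce.username": "<user>",
        "salesforce.password": "<password>",
        "tasks.max": "1",
    },
}

# The Kafka Connect REST API registers a new connector via POST /connectors.
response = requests.post("http://localhost:8083/connectors", json=connector, timeout=30)
response.raise_for_status()
print(response.json())
```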

Fully managed Confluent Cloud is the infrastructure for the entire Kafka and Flink ecosystem in this demo. The developer’s focus should always be on building business logic, not on operating infrastructure.

While the heart of Kafka is event-based, real-time and scalable, it also enables domain-driven design and data mesh enterprise architectures out-of-the-box.

The Role of Apache Flink in GenAI

Apache Flink is an open-source distributed stream processing framework for real-time analytics and event-driven applications. Its primary focus is on processing continuous streams of data efficiently and at scale. While Apache Flink itself is not a specific tool for Generative AI, it plays a role in supporting certain aspects of Generative AI workflows. Here are a few ways in which Apache Flink is relevant:

  1. Real-time Data Processing: Apache Flink can process and analyze data in real-time, which can be useful for scenarios where Generative AI models need to operate on streaming data, adapting to changes and generating responses in real-time.
  2. Event Time Processing: Flink has built-in support for event time processing, allowing for the handling of events in the order they occurred, even if they arrive out of order. This can be beneficial in scenarios where temporal order is crucial, such as in sequences of data used for training or applying Generative AI models.
  3. Stateful Processing: Flink supports stateful processing, enabling the maintenance of state across events. This can be useful in scenarios where the Generative AI business process needs to maintain context or memory of past events to generate coherent and context-aware outputs.
  4. Integration with Machine Learning Libraries: While Flink itself is not a machine learning framework, it can be integrated with other tools and libraries that are used in machine learning, including those relevant to Generative AI. This integration can facilitate the deployment and execution of machine learning models within Flink-based streaming applications.

The specific role of Apache Flink in Generative AI depends on the particular use case and the architecture of the overall system.

Apache Flink in the Demo

This demo leverages Apache Flink for streaming ETL (enrichment, data quality improvements) of the incoming Salesforce CRM events.

FlinkSQL provides a simple and intuitive way to implement ETL without writing any Java or Python code. Fully managed Confluent Cloud is the infrastructure for Kafka and Flink in this demo. Serverless FlinkSQL allows you to scale up as much as needed, but also to scale down to zero if no events are consumed and processed.
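To give a flavor of such a streaming ETL statement, here is a hedged sketch submitted via PyFlink. The topics, fields, and cleansing rules are made up for illustration, and running it requires the Flink Kafka connector dependency on the classpath:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source: raw lead events from Salesforce CDC (illustrative schema and topic).
t_env.execute_sql("""
    CREATE TABLE raw_leads (
        lead_id STRING, full_name STRING, company STRING
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'salesforce-leads',
        'properties.bootstrap.servers' = 'localhost:9092',
        'format' = 'json',
        'scan.startup.mode' = 'earliest-offset'
    )
""")

# Sink: cleansed leads for the downstream GenAI service.
t_env.execute_sql("""
    CREATE TABLE clean_leads (
        lead_id STRING, full_name STRING, company STRING
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'clean-leads',
        'properties.bootstrap.servers' = 'localhost:9092',
        'format' = 'json'
    )
""")

# Streaming ETL: basic cleansing and normalization, expressed in plain SQL.
t_env.execute_sql("""
    INSERT INTO clean_leads
    SELECT lead_id, INITCAP(TRIM(full_name)), UPPER(TRIM(company))
    FROM raw_leads
    WHERE full_name IS NOT NULL AND company IS NOT NULL
""").wait()
```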

The demo is just the starting point. Many powerful applications can be built with Apache Flink. This includes streaming ETL, but also business applications like the ones you find at Netflix, Uber, and many other tech giants.

LangChain is an easy-to-use AI/ML framework to connect large language models to other data sources and create valuable applications. Its flexibility and open approach enable developers and data engineers to build all sorts of applications, from chatbots to smart systems that answer your questions.

Data streaming with Apache Kafka and Flink provides a reliable and scalable data fabric for data pipelines and stream processing. The event store of Kafka ensures data consistency across real-time, batch, and request-response APIs. Domain-driven design, microservice architectures, and data products built in a data mesh increasingly leverage Kafka for these reasons.

The combination of LangChain, GenAI technologies like OpenAI, and data streaming with Kafka and Flink makes a powerful foundation for context-specific decisions in real time, powered by AI.

Most enterprises have a cloud-first strategy for AI use cases. Data streaming infrastructure is available in SaaS like Confluent Cloud so that the developers can focus on business logic with much faster time-to-market. Plenty of alternatives exist for building AI applications with Python (the de facto standard for AI). For instance, you could build a user-defined function (UDF) in a FlinkSQL application executing the Python code and consuming from Kafka. Or use a separate application development framework and cloud platform like Quix Streams or Bytewax for Python apps instead of a framework like LangChain.

How do you combine Python, LangChain and LLMs with data streaming technologies like Kafka and Flink? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post GenAI Demo with Kafka, Flink, LangChain and OpenAI appeared first on Kai Waehner.

Top 5 Trends for Data Streaming with Kafka and Flink in 2024 https://www.kai-waehner.de/blog/2023/12/02/top-5-trends-for-data-streaming-with-apache-kafka-and-flink-in-2024/ Sat, 02 Dec 2023 10:54:38 +0000 https://www.kai-waehner.de/?p=5885 Do you wonder about my predicted TOP 5 data streaming trends with Apache Kafka and Flink in 2024 to set data in motion? Discover new technology trends and best practices for event-driven architectures, including data sharing, data contracts, serverless stream processing, multi-cloud architectures, and GenAI.

Data Streaming is one of the most relevant buzzwords in tech to build scalable real-time applications and innovative business models. Do you wonder about my predicted TOP 5 data streaming trends in 2024 to set data in motion? Learn what role Apache Kafka and Apache Flink play. Discover new technology trends and best practices for event-driven architectures, including data sharing, data contracts, serverless stream processing, multi-cloud architectures, and GenAI.

Some followers might notice that this became a series with past posts about the top 5 data streaming trends for 2021, the top 5 for 2022, and the top 5 for 2023. Trends change over time, but the huge value of having a scalable real-time infrastructure as the central data hub stays. Data streaming with Apache Kafka is a journey and evolution to set data in motion.

Top 5 Trends for Data Streaming with Apache Kafka and Flink in 2024

The research and consulting company Gartner defines the top strategic technology trends every year. This time, the trends are around building new (AI) platforms and delivering value through automation, but also protecting investments. On a higher level, it is all about automating, scaling, and pioneering. Here is what Gartner expects for 2024:

Gartner Top Strategic Technology Trends 2024

It is funny (but not surprising): Gartner’s predictions overlap and complement the five trends I focus on for data streaming with Apache Kafka looking forward to 2024. I explore how data streaming enables faster time to market, good data quality across independent data products, and innovation with technologies like Generative AI.

The top 5 data streaming trends for 2024

I see the following topics coming up more regularly in conversations with customers, prospects, and the broader data streaming community across the globe:

  1. Data sharing for faster innovation with independent data products
  2. Data contracts for better data governance and policy enforcement
  3. Serverless stream processing for easier building of scalable and elastic streaming apps
  4. Multi-cloud deployments for cost-efficiently delivering value where the customers sit
  5. Reliable Generative AI (GenAI) with embedded accurate, up-to-date information to avoid hallucination

The following sections describe each trend in more detail. The trends are relevant for many scenarios; no matter if you use open source Apache Kafka or Apache Flink, a commercial platform, or a fully managed cloud service like Confluent Cloud. I start each section with a real-world case study. The end of the article contains the complete slide deck and video recording.

Data sharing across business units and organizations

Data sharing refers to the process of exchanging or providing access to data among different individuals, organizations, or systems. This can involve sharing data within an organization or sharing data with external entities. The goal of data sharing is to make information available to those who need it, whether for collaboration, analysis, decision-making, or other purposes. Obviously, real-time data beats slow data for almost all data sharing use cases.

NASA: Real-time data sharing with Apache Kafka

NASA enables real-time data between space- and ground-based observatories. The General Coordinates Network (GCN) allows real-time alerts in the astronomy community. With this system, NASA researchers, private space companies, and even backyard astronomy enthusiasts can publish and receive information about current activity in the sky.

NASA enables real-time data from Mars with Apache Kafka

Apache Kafka plays an essential role in astronomy research for data sharing. Particularly where black holes and neutron stars are involved, astronomers are increasingly seeking out the “time domain” and want to study explosive transients and variability. In response, observatories are increasingly adopting streaming technologies to send alerts to astronomers and to get their data to their science users in real time.

The talk “General Coordinates Network: Harnessing Kafka for Real-Time Open Astronomy at NASA” explores architectural choices, challenges, and lessons learned in adapting Kafka for open science and open data sharing at NASA.

NASA’s approach to OpenID Connect / OAuth2 in Kafka is designed to securely scale Kafka from access inside a single organization to access by the general public.

Stream data exchange with Kafka using cluster linking, stream sharing, and AsyncAPI

The Kafka ecosystem provides various functions to share data in real-time at any scale. Some are vendor-specific. I look at this from the perspective of Confluent, so that you see a lot of innovative options (even if you want to build it by yourself with open source Kafka):

  • Kafka Connect connector ecosystem to integrate with other data sources and sinks out-of-the-box
  • HTTP/REST proxies and connectors for Kafka to use simple and well-understood request-response communication (HTTP is, unfortunately, also an anti-pattern for streaming data)
  • Cluster Linking for replication between Kafka clusters using the native Kafka protocol (instead of separate infrastructure like MirrorMaker)
  • Stream Sharing for exposing a Kafka Topic through a simple button click with access control, encryption, quotas, and chargeback billing APIs
  • Generation of AsyncAPI specs to share data with non-Kafka applications (like other message brokers or API gateways that support AsyncAPI, which is an open data contract for asynchronous event-based messaging, similar to Swagger for HTTP/REST APIs)

Here is an example for Cluster Linking for bi-directional replication between Kafka clusters in the automotive industry:

Stream Data Exchange with Apache Kafka and Confluent Cluster Linking

And another example of stream sharing for easy access to a Kafka Topic in financial services:

Confluent Stream Sharing for Data Sharing Beyond Apache Kafka

To learn more, check out the article “Streaming Data Exchange with Kafka and a Data Mesh in Motion“.

Data contracts for data governance and policy enforcement

A data contract is an agreement or understanding that defines the terms and conditions governing the exchange or sharing of data between parties. It is a formal arrangement specifying how data will be handled, used, protected, and shared among entities. Data contracts are crucial when multiple parties need to interact with and utilize shared data, ensuring clarity and compliance with agreed-upon rules.

Raiffeisen Bank International: Data contracts for data sharing across countries

Raiffeisen Bank International (RBI) is scaling an event-driven architecture across the group as part of a bank-wide transformation program. This includes the creation of a reference architecture and the re-use of technology and concepts across 12 countries.

Data Mesh powered by Data Streaming at Raiffeisen Bank International

Learn more in the article “Decentralized Data Mesh with Data Streaming in Financial Services“.

Policy enforcement and data quality for Apache Kafka with Schema Registry

Good data quality is one of the most critical requirements in decoupled architectures like microservices or data mesh. Apache Kafka became the de facto standard for these architectures. But Kafka is a dumb broker that only stores byte arrays. The Schema Registry for Apache Kafka enforces message structures.

This blog post examines Schema Registry enhancements to leverage data contracts for policies and rules to enforce good data quality on the field level and advanced use cases like routing malicious messages to a dead letter queue.

Data Governance and Policy Enforcement with Data Contracts for Apache Kafka
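As a small illustration of establishing such a contract, the following sketch registers an Avro schema for a topic with Confluent Schema Registry using the confluent-kafka Python client. The Schema Registry URL, subject, and schema are assumptions for the example; field-level rules and dead-letter routing are configured on top of such a registered schema:

```python
from confluent_kafka.schema_registry import Schema, SchemaRegistryClient

schema_registry = SchemaRegistryClient({"url": "http://localhost:8081"})

# The data contract: producers and consumers agree on this structure.
payment_schema = """
{
  "type": "record",
  "name": "Payment",
  "fields": [
    {"name": "payment_id", "type": "string"},
    {"name": "amount", "type": "double"},
    {"name": "currency", "type": "string"}
  ]
}
"""

# Register the schema under the topic's value subject. Serializers that look
# up this contract reject any message violating the agreed structure.
schema_id = schema_registry.register_schema(
    subject_name="payments-value",
    schema=Schema(payment_schema, schema_type="AVRO"),
)
print(f"Registered data contract with schema id {schema_id}")
```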

For more details: Building a data mesh with decoupled data products and good data quality, governance, and policy enforcement.

Serverless stream processing for scalable and elastic streaming apps

Serverless stream processing refers to a computing architecture where developers can build and deploy applications without having to manage the underlying infrastructure.

In the context of stream processing, it involves the real-time processing of data streams without the need to provision or manage servers explicitly. This approach allows developers to focus on writing code and building applications. The cloud service takes care of the operational aspects, such as scaling, provisioning, and maintaining servers.

Sencrop: Smart agriculture with Kafka and Flink

Designed to answer professional farmers’ needs, Sencrop offers a range of connected weather stations that bring you precision agricultural weather data straight from your plots.

  • Over 20,000 connected ag-weather stations throughout Europe.
  • An intuitive, user-friendly application: Access accurate, ultra-local data to optimize your daily actions.
  • Prevent risks, reduce costs: Streamline inputs and reduce your environmental impact and associated costs.

Smart Agriculture with Kafka and Flink at Sencrop

Apache Kafka and Apache Flink increasingly join forces to build innovative real-time stream processing applications.

The Rise of Open Source Streaming Processing with Apache Kafka and Apache Flink

The Y-axis in the diagram shows the monthly unique users (based on statistics of Maven downloads).

Apache Kafka + Apache Flink = Match Made in Heaven” explores the benefits of combining both open-source frameworks. The article shows unique differentiators of Flink versus Kafka, and discusses when to use a Kafka-native streaming engine like Kafka Streams instead of Flink.

Unfortunately, operating a Flink cluster is really hard. Even harder than Kafka. Flink is not just a distributed system; it also has to keep application state for hours or even longer. Hence, serverless stream processing helps take over the operations burden. And it makes the developer’s life easier, too.

Stay tuned for exciting cloud products offering serverless Flink in 2024. But be aware that some vendors use the same trick as for Kafka: Provisioning a Flink cluster and handing it over to you is NOT a serverless or fully managed offering! For that reason, I compared Kafka products as cars you still have to drive yourself vs. self-driving cars, i.e., cloud-based Kafka clusters you operate vs. truly fully managed services.

Multi-cloud for cost-efficient and reliable customer experience

Multi-cloud refers to a cloud computing strategy that uses services from multiple cloud providers to meet specific business or technical requirements. In a multi-cloud environment, organizations distribute their workloads across two or more cloud platforms, including public clouds, private clouds, or a combination of both.

The goal of a multi-cloud strategy is to avoid dependence on a single cloud provider and to leverage the strengths of different providers for various needs. Cost efficiency and regional laws (like operating in the United States or China) require different deployment strategies. Some countries do not provide a public cloud; a private cloud is the only option then.

New Relic: Multi-cloud Kafka deployments at extreme scale for real-time observability

New Relic is a software analytics company that provides monitoring and performance management solutions for applications and infrastructure. It’s designed to help organizations gain insights into the performance of their software and systems, allowing them to optimize and troubleshoot issues efficiently.

Observability has two key requirements: first, monitor data in real-time at any scale; second, deploy the monitoring solution where the applications are running. The obvious consequence for New Relic is to process data with Apache Kafka and to deploy multi-cloud, where the customers are.

Multi Cloud Observability in Real-Time at extreme Scale with Apache Kafka at New Relic

Hybrid and multi-cloud data replication for cost-efficiency, low latency, or disaster recovery

Multi-cloud deployments of Apache Kafka have become the norm rather than an exception. Several scenarios require multi-cluster solutions with specific requirements and trade-offs:

  • Regional separation because of legal requirements
  • Independence of a single cloud provider
  • Disaster recovery
  • Aggregation for analytics
  • Cloud migration
  • Mission-critical stretched deployments

Hybrid Cloud Architecture with Apache Kafka

The blog post “Architecture Patterns for Distributed, Hybrid, Edge and Global Apache Kafka Deployments” explores various architectures and best practices.

Reliable Generative AI (GenAI) with accurate context to avoid hallucination

Generative AI is a class of artificial intelligence systems that generate new content, such as images, text, or even entire datasets, often by learning patterns and structures from existing data. These systems use techniques such as neural networks to create content that is not explicitly programmed but is instead generated based on the patterns and knowledge learned during training.

Elemental Cognition: GenAI platform powered by Apache Kafka

Elemental Cognition’s AI platform develops responsible and transparent AI that helps solve problems and deliver expertise that can be understood and trusted.

Confluent Cloud powers the AI platform to enable scalable real-time data and data integration use cases. I recommend looking at their website to learn from various impressive use cases.

Elemental Cognition - Real Time GenAI Platform powered by Apache Kafka and Confluent Cloud

Apache Kafka serves thousands of enterprises as the mission-critical and scalable real-time data fabric for machine learning infrastructures. The evolution of Generative AI (GenAI) with large language models (LLM) like ChatGPT changed how people think about intelligent software and automation. The relationship between data streaming and GenAI has enormous opportunities.

Apache Kafka as Mission Critical Data Fabric for GenAI” explores the use cases for combining data streaming with Generative AI.

An excellent example, especially for Generative AI, is context-specific customer service. The following diagram shows an enterprise architecture leveraging event-driven data streaming for data ingestion and processing across the entire GenAI pipeline:

Event-driven Architecture with Apache Kafka and Flink as Data Fabric for GenAI

Stream processing with Kafka and Flink enables data correlation of real-time and historical data. A stateful stream processor takes existing customer information from the CRM, loyalty platform, and other applications, correlates it with the query from the customer into the chatbot, and makes an RPC call to an LLM.

Stream Processing with Apache Flink SQL UDF and GenAI with OpenAI LLM
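The following is a hedged sketch of this pattern with a PyFlink user-defined function, not the actual implementation from the diagram: the UDF wraps the RPC call to the LLM (stubbed out here), and Flink SQL applies it to every enriched customer event. Topic, schema, and function names are illustrative:

```python
from pyflink.table import DataTypes, EnvironmentSettings, TableEnvironment
from pyflink.table.udf import udf

@udf(result_type=DataTypes.STRING())
def ask_llm(customer_context: str, question: str) -> str:
    """Stub for the RPC call to an LLM (e.g., the OpenAI API), combining the
    customer context from CRM/loyalty platform with the chatbot question."""
    return f"[LLM answer grounded in: {customer_context[:40]}...]"

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
t_env.create_temporary_function("ask_llm", ask_llm)

# Illustrative source of already-enriched customer queries; a real pipeline
# would read this from a Kafka topic produced by the enrichment step.
t_env.execute_sql("""
    CREATE TABLE enriched_queries (
        customer_context STRING,
        question STRING
    ) WITH ('connector' = 'datagen', 'rows-per-second' = '1')
""")

# Correlate the real-time question with historical context, then call the LLM.
t_env.sql_query(
    "SELECT ask_llm(customer_context, question) AS answer FROM enriched_queries"
).execute().print()
```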

The article “Apache Kafka + Vector Database + LLM = Real-Time GenAI” explores possible architectures, examples, and trade-offs between event streaming and traditional request-response APIs and databases.

Slides and video recording for the data streaming trends in 2024 with Kafka and Flink

Do you want to look at more details? This section provides the entire slide deck and a video walking you through the content.

Slide deck

Here is the slide deck from my presentation:

Fullscreen Mode

Video recording

And here is the video recording of my presentation:

Video Recording: Top 5 Use Cases and Architectures for Data Streaming with Apache Kafka and Flink in 2024

2024 makes data streaming more mature, and Apache Flink becomes mainstream

I have two conclusions for data streaming trends in 2024:

  • Data streaming goes up in the maturity curve. More and more projects build streaming applications instead of just leveraging Apache Kafka as a dumb data pipeline between databases, data warehouses, and data lakes.
  • Apache Flink becomes mainstream. The open source framework shines with a scalable engine, multiple APIs like SQL, Java, and Python, and serverless cloud offerings from various software vendors. The latter makes building applications much more accessible.

Data sharing with data contracts is mandatory for a successful enterprise architecture with microservices or a data mesh. And data streaming is the foundation for innovation with technology trends like Generative AI. Therefore, we are just at the tipping point of adopting data streaming technologies such as Apache Kafka and Apache Flink.

What are your most relevant and exciting data streaming trends with Apache Kafka and Apache Flink in 2024 to set data in motion? What are your strategy and timeline? Do you use serverless cloud offerings or self-managed infrastructure? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Top 5 Trends for Data Streaming with Kafka and Flink in 2024 appeared first on Kai Waehner.
