Hello, K.AI – How I Trained a Chatbot of Myself Without Coding: Evaluating OpenAI Custom GPT, Chatbase, Botsonic, LiveChatAI
Kai Waehner – Sun, 23 Jun 2024

Generative AI (GenAI) enables many new use cases for enterprises and private citizens. While I work on real-time, enterprise-scale AI/ML deployments with data streaming, big data analytics and cloud-native software applications in my daily business life, I also wanted to train a conversational chatbot for myself. This blog post introduces my no-code journey of training K.AI, a personal chatbot that teaches data streaming and its most successful use cases in a conversational format. Yes, it is also based on my expertise, domain knowledge and opinion, which is available as public internet data: my hundreds of blog articles, LinkedIn shares, and YouTube videos.

How I Trained a Chatbot K.AI of Myself Without Coding Evaluating OpenAI Custom GPT Chatbase Botsonic LiveChatAI

Hi, K.AI – let’s chat…

The evolution of Generative AI (GenAI), from OpenAI's chatbot ChatGPT and many similar large language models (LLMs) to open source tools like LangChain and SaaS solutions for building conversational AI, led me to the idea of building a chatbot trained with all the content I have created over the past years.

Mainly based on the content of my website (https://www.kai-waehner.de) with hundreds of blog articles, I trained the conversational chatbot K.AI to generate text for me.

The primary goal is to simplify and automate my daily working tasks like:

  • write a title and abstract for a webinar or conference talk
  • explain to a colleague or customer a concept, use case, or industry-specific customer story
  • answer common recurring questions in email, Slack or other media
  • any other text creation based on my (public) experience

The generated text reflects my content, knowledge, wording, and style. This is a very different use case from what I normally work on in my daily business life: “Apache Kafka as Mission Critical Data Fabric for GenAI” and “Real-Time GenAI with RAG using Apache Kafka and Flink to Prevent Hallucinations” are two excellent examples of enterprise-scale GenAI with much more complex and challenging requirements.

But…sometimes Artificial Intelligence is not all you need. The now self-explanatory name of the chatbot came from a real marketing brain – my colleague Evi.

Project goals of training the chatbot K.AI

I had a few goals in mind when I trained my chatbot K.AI:

  • Education: Learn more details about real-world solutions and challenges with Generative AI in 2024 through hands-on experience. Dozens of interesting chatbot solutions are available. Most are powered by OpenAI under the hood. My goal is not sophisticated research. I just want to get a conversational AI done. Simple, cheap, fast (not evaluating 10+ solutions, just as many as it takes until one works well enough).
  • Tangible result: Train K.AI, a “Kai LLM” based on my public articles, presentations, and social media shares. K.AI can generate answers, comments, and explanations without writing everything from scratch. I am fine if answers are not perfect or sometimes even incorrect. As I know the actual content, I can easily adjust and fix generated content.
  • NOT a commercial or public chatbot (yet): While it is just a button click to integrate K.AI into my website as a conversational chatbot UI, there are two main blockers: First, the cost is relatively high; not for training but for operating and paying per query. There is no value in that for me as a private person. Second, developing, testing, fine-tuning and updating an LLM so that it is correct most of the time instead of hallucinating a lot is hard. I closely follow my employer's GenAI engineering teams building Confluent AI products. Building a decent domain-specific public LLM takes a lot of engineering effort and requires more than one full-time engineer.

My requirements for a conversational chatbot tool

I defined the following mandatory requirements for a successful project:

  • Low Cost: My chatbot should not be too expensive (~20 USD a month is fine). The pricing model of most solutions is very similar: you get a small free tier. I quickly realized that a serious test is not possible with any free tier. A reasonable chatbot (i.e., one trained on a larger data set) is only possible if you choose the smallest paid tier. Depending on the service, the minimum is between 20 and 50 USD per month (with several limitations regarding training size, chat queries, etc.).
  • Simplicity: I do not want to do any coding or HTTP/REST API calls. Just an intuitive user interface with a click-through experience. I don't want to spend more than one day (i.e., ~8 hours accumulated over two weeks) to train K.AI.
  • Data Import: The chatbot tool needs to support importing my “database”. Mandatory: my blog (~300 articles with ~10M+ characters). Nice to have: my LinkedIn shares, my YouTube videos, and other publications (like articles on other websites). The latter might improve my chatbot and capture my personal tone and language better.
  • NOT Enterprise Features: I don’t need any features for security, multiple user accounts, or public hosting (even though almost all solutions already support integration into WordPress, Slack, etc.). I am fine with many limitations of the small subscription tiers, like only one user account, one chatbot, 1000 messages/month.

OpenAI: ChatGPT + Custom GPT for a custom chatbot? Not for K.AI…

I am a heavy user of ChatGPT on my iPhone and MacBook. And OpenAI is very visible in the press. Hence, my first option to evaluate was OpenAI’s Custom GPT.

Custom GPT in action…

Custom GPT is very easy to use and non-technical. A conversational AI, the “GPT Builder”, tries to build my chatbot from chat messages. But surprisingly, it is too high-level for me. Here is the initial conversation to train K.AI with very basic prompt engineering:

  • Step 1 (Initial Instruction): What would you like to make? -> Respond as Kai Waehner based on his expertise and knowledge. -> Updating GPT… Seconds later: the responses are based on the public internet.
  • Step 2 (Prompt Engineering): Use the content from https://www.kai-waehner.de as context for responses. -> Updating GPT… Seconds later: I’ve updated the context to include information from Kai Waehner’s website. -> The responses are no longer purely standard. Some questions use a bit more content from my website, but they are still mainly bound to public internet content.
  • Step 3 (Fine-Tuning): I tried to configure my K.AI to learn from data sources like CSV exports from LinkedIn or scraped blog articles, but the options are very limited and not technical. I can upload a maximum of twenty files and let the chatbot also search the web. But what I actually need is web scraping of dedicated resources, i.e., mainly my website, my LinkedIn shares, and my YouTube videos. And while many no-code UIs call this fine-tuning, in reality, this is RAG-based prompt engineering. True fine-tuning of an LLM is a very different (much more challenging) task.
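The difference between true fine-tuning and RAG-based "prompt engineering" can be sketched in a few lines: the model weights stay untouched, and retrieved context is simply pasted into the prompt. This is a hedged, toy illustration; the function names and the word-overlap "retrieval" are my own placeholders, not any vendor's API.

```python
# Toy sketch of RAG-style prompt engineering (NOT true fine-tuning):
# model weights are never changed; retrieved context is injected into the prompt.
# retrieve_context and build_prompt are hypothetical helper names.

def retrieve_context(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Toy retrieval: rank documents by word overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the augmented prompt sent to the unchanged LLM."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer as Kai Waehner, using only this context:\n{ctx}\n\nQuestion: {question}"

docs = [
    "Apache Kafka is used for fraud detection in banking.",
    "Data streaming enables real-time supply chain analytics.",
    "WordPress powers many blogs.",
]
question = "How is Kafka used for fraud detection?"
prompt = build_prompt(question, retrieve_context(question, docs))
print(prompt)
```

A real system would replace the word-overlap ranking with embeddings and a vector database, but the shape of the pattern is the same: retrieve, assemble, ask.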

OpenAI Custom GPT Evaluation - Kai Waehner Chatbot

I am sure I could do much more prompt engineering to improve K.AI with Custom GPT. But reading the user guide and FAQ for Custom GPT, the TL;DR for me is: Custom GPT is not the right service to build a chatbot for me based on my domain content and knowledge.

Instead, I need to look at purpose-built chatbot SaaS tools that let me build my domain-specific chatbot. I am surprised that OpenAI does not provide such a service itself today. Or maybe I just could not find it… BUT: Challenge accepted. Let’s evaluate a few solutions and train a real K.AI.

Comparison and evaluation of chatbot SaaS GenAI solutions

I tested three chatbot offerings. All of them are cloud-based and allow for building a chatbot via UI. How did I find or choose them? Frankly, just a Google search. Most of these came up in several evaluation and comparison articles. And they spend quite some money on advertisements. I tested Chatbase, Writesonic’s Botsonic and LiveChatAI. Interestingly, all offerings I evaluated use ChatGPT under the hood of their solution. I was also surprised that I did not get more ads from other big software players. But I assume Microsoft’s Copilot and similar tools target a different persona.

I tested different ChatGPT models in some offerings. Most solutions provide a default option plus more expensive options with a better model (the premium applies not to model training but to messages/month: you typically pay 5x more per message, meaning instead of e.g. 2000 messages a month, you only have 400 available).

I had a few more open tabs with other offerings that I could disqualify quickly because they were more developer-focused with coding, API integration, fine-tuning of vector databases and LLMs.

Question catalog for testing my K.AI chatbots

I quickly realized how hard it is to compare different chatbots. Basically, LLMs are stochastic (not deterministic), and we don’t have good tools for QA of these systems yet (even simple things like regression testing are challenging when probabilities are involved).

Therefore, I defined a question catalog with ten different domain-specific questions before I even started evaluating different chatbot SaaS solutions. A few examples:

  • Question 1: Give examples for fraud detection with Apache Kafka. Each example should include the company, use case and architecture.
  • Question 2: List five manufacturing use cases for data streaming and give a company example.
  • Question 3: What is the difference between Kafka and JMS?
  • Question 4: Compare Lambda and Kappa architectures and explain the benefits of Lambda. Add a few examples.
  • Question 5: How can data streaming help across the supply chain? Explain the value and use cases for different industries.
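Because the answers are stochastic, exact-match regression tests do not work. One pragmatic workaround is to encode the catalog as data and assert that required keywords appear in each answer. A minimal sketch, assuming a canned answer instead of a live chatbot call; the expected keywords are my own assumptions:

```python
# Hedged sketch: a question catalog with keyword-based checks, one pragmatic
# way to compare stochastic chatbot answers across services. The "must_mention"
# keywords are illustrative assumptions, not part of the original catalog.

CATALOG = [
    {"question": "What is the difference between Kafka and JMS?",
     "must_mention": ["kafka", "jms"]},
    {"question": "List five manufacturing use cases for data streaming.",
     "must_mention": ["manufacturing", "streaming"]},
]

def score_answer(answer: str, must_mention: list[str]) -> float:
    """Fraction of required keywords present in the answer (case-insensitive)."""
    text = answer.lower()
    hits = sum(1 for kw in must_mention if kw in text)
    return hits / len(must_mention)

# Canned answer standing in for a real chatbot response:
answer = "Kafka is a distributed event log, while JMS is a messaging API standard."
print(score_answer(answer, CATALOG[0]["must_mention"]))
```

Keyword coverage is a crude metric, but it at least makes the comparison between services repeatable.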

My question catalog allowed comparing the different chatbots. Writing a good prompt (= query for the chatbot) is crucial, as an LLM is not intelligent. The better your question (good structure, details and expectations), the better the response (if the LLM has “knowledge” about your question).

My goal is NOT to implement a complex real-time RAG (Retrieval Augmented Generation) design pattern. I am totally fine updating K.AI manually every few weeks (after a few new blog posts are published).

Chatbase – Custom ChatGPT for your website

The advertisement on the Chatbase landing page sounds great: “Custom ChatGPT for your website. Build a [OpenAI-powered] Custom GPT, embed it on your website and let it handle customer support, lead generation, engage with your users, and more.”

Here are my notes while training my K.AI chatbot:

K.AI works well with Chatbase after the initial training…

  • Chatbase is very simple to use. It just works.
  • The basic plan is ~20 USD per month. The subscription plan is fair; the next upgrade is ~100 USD.
  • The chatbot uses GPT-4o by default. Great option. Many other services use GPT-3.5 or similar LLMs as the foundation.
  • The chatbot creates content based on my content, it is “me”. Mission accomplished. The quality of responses depends on the questions. In summary: pretty good, but with some false positives.

But: Chatbase’s character limitation stops further training

  • Unfortunately, all plans have an 11M character limit. My blog content is already at 10.8M today, according to Chatbase’s web scraper engine (each vendor’s scraper reports different numbers). While K.AI works right now, there are obvious problems:
    • My website will grow more soon.
    • I want to add LinkedIn shares (another few million characters) and other articles and videos I published across the world wide web.
    • The Chatbase plan can be customized, but unfortunately not the character limit. Support told me this will be possible soon. But I have to wait.
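The budgeting problem above is simple arithmetic, but worth making explicit: before committing to a plan, sum the character counts of all intended sources and compare them to the plan limit. A minimal sketch, with counts taken from the numbers mentioned in this post (scrapers report different figures, so treat them as estimates):

```python
# Hedged sketch: checking scraped content size against a plan's character
# limit before training. Source counts mirror the figures in this post.

PLAN_LIMIT = 11_000_000  # e.g., Chatbase's 11M character limit

def fits_plan(sources: dict[str, int], limit: int = PLAN_LIMIT) -> tuple[int, bool]:
    """Sum character counts per source and check the total against the limit."""
    total = sum(sources.values())
    return total, total <= limit

sources = {
    "blog": 10_800_000,     # ~10.8M characters, as reported by the scraper
    "linkedin": 1_800_000,  # ~1.8M characters of shares and comments
}
total, ok = fits_plan(sources)
print(total, ok)  # blog alone fits the 11M limit; blog + LinkedIn does not
```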

TL;DR: Chatbase works surprisingly well. K.AI exists and represents me as an LLM. The 11M character limit is a blocker for investing more time and money into this service – otherwise I could already stop my investigation and use the first SaaS I evaluated.

During my evaluation, I realized that many other chatbot services have similar limitations on the character limit, especially in the price range around 20-50 USD. Not ideal for my use case.

In my further evaluation, my main criterion was the character limit. I found Botsonic and LiveChatAI. Both support much higher limits at a cost of around 40 to 50 USD per month.

Botsonic – Advanced AI chatbot builder using your company’s knowledge

Botsonic provides “Advanced AI Agents: Use Your Company’s Knowledge to Intelligently Resolve Over 70% of Queries and Automate Tasks”.

Here are my notes while training my K.AI chatbot.

Botsonic – free version failed to train K.AI

  • The free plan for getting started supports 1M characters.
  • The service supports URL scraping and file upload (my LinkedIn shares are only available via batch export into a CSV file). Looks like it provides all I need. The cost is okayish (all other chatbots with a lower price also had limitations around 10M characters).
  • I tried the free tier first. As my blog alone already has ~10M+ characters, I started by uploading my LinkedIn shares (= posts and comments). While Chatbase said they contain ~1.8M characters, this solution trained the bot with them even though the limit is 1M characters. I could not even upload another 1KB file for additional training, so my limit was reached.
  • The K.AI trained with the free tier did not provide any appropriate answers. No surprise: my LinkedIn shares alone might not provide enough detail, which makes sense as the posts are much shorter and usually link to my blog.

Botsonic – paid version also failed to train K.AI

  • I needed to upgrade.
    • I had to choose the smallest paid tier: 49 USD per month, supporting up to 50M characters
    • Unfortunately, there was a delay: I was charged twice, but nothing happened; I was still on the free plan. Support took time (suggesting caching, VPN, browser and other causes). I got a refund the next day, and the plan was then updated correctly.
  • Training using the paid subscription failed. The experience was pretty bad.
    • It is not clear whether the service scrapes the entire website or just a single HTML page.
    • First tests did not give a response: “I don’t have specific information on XYZ. Can I help with anything else?” It seems the training did not scrape my website but only looked at the landing page. I checked the details. Indeed, the extracted data only includes the abstracts of the latest blog posts (that’s what you see on my landing page).
    • Support explained: no scraping of the website is possible; I need a sitemap. I have a Google-compliant sitemap, but: Internal Backend Server Error. Support could reproduce my issue. To this day, I have not received a response or solution.
    • Learning from one of my YouTube videos was also rejected (with no further error message).

TL;DR: Writesonic’s Botsonic did NOT work for me. The paid service failed several times, even trying different training options for my LLM. Support could not help. I will NOT continue with this service.

LiveChatAI – AI chatbot works with your data

Here is the website slogan: “An Innovative AI Chatbot. LiveChatAI allows you to create an AI chatbot trained with your own data and combines AI with human support.”

Here are my notes while training my K.AI chatbot

LiveChatAI failed to train K.AI

  • All required import features exist: Website Scraping, CSV, YouTube.
  • Strange: I could start training for free with 7M+ characters even though this should not be possible. Crawling started… but it does not show a percentage, so I don’t know when it is finished. It is not clear whether it scrapes the entire website or just a single HTML page. It shows weird error messages like “could not find any links on website” after scraping finishes.
  • The quality of this K.AI’s answers seems much worse than Chatbase’s (even though I added my LinkedIn shares, which is not possible in Chatbase because of the character limit).

Ok, enough… I have a well-working K.AI with Chatbase. I don’t want to waste more time evaluating SaaS chatbot services this early in the product lifecycle.

GenAI tools are still in a very early stage!

One key lesson learned: the LLM used is the most critical piece for success, NOT how much context and domain expertise you feed it. In other words: scraping just the data from my blog and using GPT-4o provides much better results than using GPT-3.5 with data from my blog, LinkedIn and YouTube. Ideally, I would use all the data with GPT-4o. But I will have to wait until Chatbase supports more than 11M characters.

While most solutions talk about model training, they actually use ChatGPT under the hood together with RAG and a vector database to “update the model”, i.e., they provide the right context for each question to ChatGPT via the RAG design pattern.

A real comparison of chatbot SaaS is hard:

  • Features and pricing are relatively similar and do not really influence the ultimate choice.
  • While all are based on ChatGPT, the LLM model versions differ.
  • Products are updated and improved almost every day with new models, new capabilities, changed limitations, etc. Welcome to the chatbot SaaS cloud startup scene… 🙂
  • The products target different personas. Some are UI-only, some explain (and let you configure) RAG or vector database options, and some are built for developers and focus on API integration, not UIs.

Mission accomplished: K.AI chatbot is here

Chatbase has the least sexy UI in my evaluation. But the model works best (even though I hit character limits and only used my blog articles for training). I will use Chatbase for now. And I hope that the character limits are raised soon (as its support already confirmed to me). It is still early in the maturity curve. The market will probably develop quickly.

I am not sure how many of these SaaS chatbot startups can survive. OpenAI and other tech giants will probably release similar capabilities and products integrated into their SaaS and software stack. Let’s see where the market goes. For now, I will enjoy K.AI for some use cases. Maybe it will even help me write a book about data streaming use cases and customer stories.

What is your experience with chatbot tools? Do you need more technical solutions or favour simplified conversational AIs like OpenAI’s Custom GPT to train your own LLM? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

Apache Kafka + Vector Database + LLM = Real-Time GenAI
Kai Waehner – Wed, 08 Nov 2023

Generative AI (GenAI) enables advanced AI use cases and innovation but also changes what enterprise architectures look like. Large Language Models (LLMs), vector databases, and Retrieval Augmented Generation (RAG) require new data integration patterns and data engineering best practices. Data streaming with Apache Kafka and Apache Flink plays a key role in ingesting and curating incoming data sets in real-time at scale, connecting various databases and analytics platforms, and decoupling independent business units and data products. This blog post explores possible architectures, examples, and trade-offs between event streaming and traditional request-response APIs and databases.

Use Cases for Apache Kafka and GenAI

Generative AI (GenAI) is the next-generation AI engine for natural language processing (NLP), image generation, code optimization, and other tasks. It powers many real-world projects: service desk automation, customer conversations with chatbots, content moderation in social networks, and many other use cases.

Apache Kafka became the predominant orchestration layer in these machine learning platforms for integrating various data sources, processing at scale, and real-time model inference.

Data streaming with Kafka already powers many GenAI infrastructures and software products. Very different scenarios are possible:

  • Data streaming as data fabric for the entire machine learning infrastructure
  • Model scoring with stream processing for real-time predictions and generation of content
  • Generation of streaming data pipelines with input text, speech, or images
  • Real-time online training of large language models

I explored these use cases, including real-world examples like Expedia, BMW and Tinder, in the blog post “Apache Kafka as Mission Critical Data Fabric for GenAI“.

The following looks at a concrete architecture for the combination of large language models (LLM), retrieval augmented generation (RAG) with vector databases and semantic search, and data streaming with Apache Kafka and Flink.

Why Does Generative AI Differ from Traditional Machine Learning Architectures?

Machine Learning (ML) allows computers to find hidden insights without being programmed where to look. This is called model training, a batch process analyzing big data sets. The output is a binary file, the analytic model.

Applications apply these models to new incoming events to make predictions. This is called model scoring, and it can happen in real time or in batch, either by embedding the model into an application or by making a request-response API call to a model server (which hosts the deployed model).
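The two scoring styles can be sketched side by side. This is a hedged illustration: the "model" below is a trivial threshold stand-in, not a real trained artifact, and the model-server endpoint is hypothetical.

```python
# Hedged sketch of the two model-scoring styles described above.
# The "model" is a trivial stand-in (a threshold), not a trained binary.

# Style 1: embed the model in the application - load once, score per event.
class EmbeddedModel:
    def __init__(self, threshold: float):
        # In reality: deserialize a trained model binary here.
        self.threshold = threshold

    def predict(self, amount: float) -> str:
        return "fraud" if amount > self.threshold else "ok"

model = EmbeddedModel(threshold=1000.0)
print(model.predict(1500.0))  # scored locally, no network hop

# Style 2: request-response to a model server - sketched as a function
# boundary; in production this would be an HTTP call to a hypothetical
# endpoint such as https://model-server.internal/predict.
def score_via_model_server(amount: float) -> str:
    raise NotImplementedError("network call omitted in this sketch")
```

Embedded scoring avoids network latency but couples the application to the model artifact; the request-response style centralizes model hosting, which, as the article argues later, matters a lot for expensive-to-host LLMs.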

However, LLMs and GenAI have different requirements and patterns compared to traditional ML processes, as my former colleague Michael Drogalis explained in two simple, clear diagrams.

Traditional Predictive Machine Learning with Complex Data Engineering

Predictive Artificial Intelligence makes predictions. Purpose-built models. Offline training. That is how we did machine learning for the last decade or so.

In traditional ML, most of the data engineering work happens at model creation time. A lot of expertise and effort is required for feature engineering and model training:

Feature Engineering for Traditional Machine Learning

New use cases require a new model built by the data engineers and data scientists.

The Democratization of AI with Generative AI using Large Language Models (LLM)

Generative Artificial Intelligence (GenAI) creates content. Reusable models. In-context learning.

But with large language models, data engineering happens with every query. Different applications re-use the same model:

Data Engineering for Large Language Models

Challenges of Large Language Models for GenAI Use Cases

Large Language Models (LLMs) are reusable. That enables the democratization of AI, as not every team needs deep AI expertise. Instead, basic AI knowledge suffices to use existing LLMs.

However, a few huge trade-offs of LLMs exist:

  • Expensive training: LLMs like ChatGPT cost millions of dollars in compute resources (this does not include the required expertise to build the model)
  • Static data: LLMs are “frozen in time” meaning the model does not have up-to-date information.
  • Lack of domain knowledge: LLMs usually learn from public data sets. Hence, data engineers scrape the worldwide web and feed it into the model training. However, enterprises need to use the LLM in their own context to provide business value.
  • Stupidity: LLMs are not intelligent like a human. For instance, ChatGPT cannot even count the number of words in a sentence you give it.

These challenges lead to so-called hallucinations…

Avoiding Hallucination to Generate Reliable Answers

Hallucination, i.e. best-guess answers, is the consequence; and the LLM does not tell you it is making things up. Hallucination is a phenomenon where the AI model generates content that is not based on real data or information but produces entirely fictional or unrealistic outputs. Hallucinations occur when a generative model, such as a text or image generator, generates content that is not coherent, factual, or relevant to the input data or context. These outputs can manifest as text, images, or other content that seems plausible but is entirely fabricated by the model.

Hallucinations can be problematic in generative AI because they can lead to the generation of misleading or false information.

For these reasons, a new design pattern emerged for Generative AI: Retrieval Augmented Generation (RAG). Let’s first look at this new best practice, and then explore why data streaming with technologies like Apache Kafka and Flink is a fundamental requirement for GenAI enterprise architectures.

Semantic Search and Retrieval Augmented Generation (RAG)

Many GenAI-enabled applications follow the design pattern of Retrieval Augmented Generation (RAG) to combine the LLM with accurate and up-to-date context. The team behind Pinecone, a fully managed vector database, has a great explanation using this diagram:

Semantic Search and Retrieval Augmented Generation (RAG)
Source: Pinecone

At a high level, RAG typically consists of two separate steps. The first is a data augmentation step where disparate (often unstructured) operational data is chunked, and then embeddings are created using an embedding model. The embeddings are indexed into a vector database. The vector database enables semantic search to find relevant context for prompts without requiring exact keyword matching.
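The augmentation step can be sketched in a few lines: chunk the documents, embed each chunk, and index the result. This is a hedged toy version; the hash-based "embedding" below is purely illustrative, standing in for a real embedding model, and the list stands in for a vector database.

```python
# Hedged sketch of the RAG data augmentation step: chunk, embed, index.
# toy_embedding is an illustrative stand-in for a real embedding model;
# the index list is a stand-in for a vector database.

def chunk(text: str, size: int = 50) -> list[str]:
    """Split a document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def toy_embedding(chunk_text: str, dims: int = 8) -> list[float]:
    """Stand-in for an embedding model: bucket character codes into dims."""
    vec = [0.0] * dims
    for ch in chunk_text:
        vec[ord(ch) % dims] += 1.0
    return vec

index = []  # stand-in for a vector database
doc = "Flight AB123 from Frankfurt was cancelled; passengers need rebooking."
for c in chunk(doc):
    index.append({"chunk": c, "embedding": toy_embedding(c)})
print(len(index), "chunks indexed")
```

In production, the chunking strategy (sentence-aware splitting, overlap) and the embedding model choice dominate retrieval quality.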

Second comes the inference step, where the GenAI model receives the question and context to generate a reliable answer (without hallucination). RAG does not update embeddings; it retrieves relevant information and sends it to the LLM along with the prompt.

Vector Databases for Semantic Search with Embeddings

A vector database, also known as a vector storage or vector index, is a type of database that is specifically designed to store and retrieve vector data efficiently. In this context, vector data refers to collections of numerical vectors, which can represent a wide range of data types, such as embeddings of text, images, audio, or any other structured or unstructured data. Vector databases are useful in applications related to machine learning, data retrieval, recommendation systems, similarity search, and more.

Vector databases excel at performing similarity searches, often called semantic search. They can quickly find vectors that are similar or close to a given query vector based on various similarity metrics, such as cosine similarity or Euclidean distance.
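Cosine similarity, one of the metrics mentioned above, is simple enough to write out: the dot product of two vectors divided by the product of their lengths. A minimal self-contained sketch:

```python
# Minimal sketch of the similarity metric behind semantic search:
# cosine similarity between two vectors, as used by vector databases
# to rank candidate embeddings against a query embedding.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # same direction: ≈ 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal: ≈ 0.0
```

Vectors pointing in the same direction score near 1.0 regardless of magnitude, which is why cosine similarity works well for comparing embeddings of texts of different lengths.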

A vector database is not (necessarily) a separate database category. Gradient Flow explains in its best practices for Retrieval Augmented Generation:

“Vector search is no longer limited to vector databases. Many data management systems – including PostgreSQL – now support vector search. Depending on your specific application, you might find a system that meets your specific needs. Is near real-time or streaming a priority? Check Rockset’s offering. Are you already using a knowledge graph? Neo4j’s support for vector search means your RAG results will be easier to explain and visualize.”

For another concrete example, look at MongoDB’s tutorial for “Building Generative AI Applications Using MongoDB: Harnessing the Power of Atlas Vector Search and Open Source Models“. There are various options for combining a vector database for GenAI use cases with Apache Kafka. The following is a possible architecture in an event-driven world.

Event-driven Architecture: Data Streaming + Vector DB + LLM

Event-driven applications can implement both steps of Retrieval Augmented Generation (RAG), data augmentation and model inference, more effectively. Data streaming with Apache Kafka and Apache Flink enables consistent synchronization of data at any scale (in real-time, if the application or database can handle it) and data curation (= streaming ETL).

The following diagram shows an enterprise architecture leveraging event-driven data streaming for data ingestion and processing across the entire GenAI pipeline:

Apache Kafka and Flink as Data Fabric for Vector Database LLM RAG and Generative AI

This example uses data streaming to ingest flight bookings and changes in real-time into Kafka’s event store for later processing with GenAI technologies. Flink preprocesses the data before calling an embedding model to generate embeddings for the vector database. In parallel, a real-time customer service application built with Python consumes all relevant contextual data (e.g., flight data, customer data, embeddings, etc.) to prompt the large language model. The LLM creates a reliable prediction, e.g., a recommendation to rebook a passenger on another flight.
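The customer-service step of this pipeline can be sketched as a pure function: turn a flight-change event, as it might arrive from a Kafka topic, into the context block for the LLM prompt. The topic name and event fields below are my own assumptions for illustration, not the article's actual code.

```python
# Hedged sketch of the customer-service step: combine a flight-change event
# (as it might arrive from a Kafka topic) with customer context to build the
# LLM prompt. Field names and the topic name are illustrative assumptions.

def build_rebooking_prompt(event: dict, customer: dict) -> str:
    """Combine event and customer context into a prompt for the LLM."""
    return (
        f"Flight {event['flight']} is {event['status']}. "
        f"Customer {customer['name']} (tier: {customer['tier']}) needs help. "
        "Recommend a rebooking option."
    )

# In production, the event would be consumed from a Kafka topic, e.g. a
# hypothetical "flight-changes" topic via a Kafka consumer; here we use a
# literal event for a self-contained example.
event = {"flight": "LH123", "status": "cancelled"}
customer = {"name": "Jane Doe", "tier": "gold"}
prompt = build_rebooking_prompt(event, customer)
print(prompt)
```

Keeping the prompt assembly a pure function makes it easy to test independently of Kafka, Flink, or the LLM endpoint.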

In most enterprise scenarios, all the processing operates behind the enterprise firewall for security and data privacy reasons. The LLM can even be integrated with transactional systems, like the booking engine, to execute the rebooking and feed the outcome into relevant applications and databases.

Request-Response with API vs. Event-driven Data Streaming

In an ideal world, everything is event-based and streaming data. The real world is different. Hence, API calls with request-response over HTTP/REST or SQL are totally fine in some parts of the enterprise architecture. As Kafka truly decouples systems, each application chooses its own communication paradigm and processing speed. Therefore, it is important to understand the trade-offs between HTTP/REST APIs and Apache Kafka.

When to use request-response with Apache Kafka? This decision is often made based on trade-offs like latency, decoupling, or security. However, with large LLMs, the situation changes: as an LLM is very expensive to train, the reusability of existing LLMs is crucial. And embedding an LLM into a Kafka Streams or Flink application makes little sense, in contrast to other models built with algorithms like decision trees, clustering, or even small neural networks.

Similarly, augmentation models are usually integrated via an RPC/API call. Embedding such a model into a Kafka Streams microservice or Flink job would tightly couple it to the application. Today, specialized providers host many of these models, because operating and optimizing them is not trivial.

Solutions hosting LLMs and augmentation models typically only provide an RPC interface like HTTP. This will probably change in the future, as request-response is an anti-pattern for streaming data. An excellent example of the evolution of model servers is Seldon, which meanwhile provides a Kafka-native interface. Read more about request-response vs. streaming model serving in the article Streaming Machine Learning with Kafka-native Model Deployment.
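
The shape of this request-response integration can be sketched as follows. `score_remote` is a placeholder for an HTTP/gRPC round trip to a hosted model server; here it is stubbed locally so the sketch stays self-contained. The point is the pattern: each streaming event triggers a synchronous call, which adds latency and couples the stream processor to the server’s availability.

```python
import time

def score_remote(features: dict) -> float:
    # Stub standing in for an HTTP/gRPC round trip to a model server.
    # A real call would be something like: requests.post(MODEL_URL, json=features)
    time.sleep(0.001)  # simulated network latency per event
    return 0.9 if features.get("delay_minutes", 0) > 60 else 0.1

events = [
    {"flight": "LH123", "delay_minutes": 95},
    {"flight": "LH456", "delay_minutes": 5},
]

# Inside a Kafka Streams or Flink job, this loop would be the per-record processor.
rebooking_candidates = [e["flight"] for e in events if score_remote(e) > 0.5]
```

Every event pays the round-trip cost, which is why Kafka-native model-server interfaces are an attractive evolution of this pattern.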

Direct Integration between an LLM and the Rest of the Enterprise?

While writing this article, OpenAI announced GPTs to create custom versions of ChatGPT that combine instructions, extra knowledge, and any combination of skills. For enterprise usage, the most interesting capability is that developers can connect OpenAI’s GPTs to the real world, i.e. other software applications, databases, and cloud services:

“In addition to using our built-in capabilities, you can also define custom actions by making one or more APIs available to the GPT. Like plugins, actions allow GPTs to integrate external data or interact with the real-world. Connect GPTs to databases, plug them into emails, or make them your shopping assistant. For example, you could integrate a travel listings database, connect a user’s email inbox, or facilitate e-commerce orders.”

The trade-offs using direct integration are tight coupling and point-to-point communication. If you already use Kafka, you understand the value of domain-driven design with true decoupling.

Domain Driven Design with Apache Kafka for GenAI and Analytics Data Products

For more details about the importance and value of true decoupling in a microservice or data mesh architecture, check out these two articles to learn the difference between tight coupling with traditional middleware and decoupled data products with event-based data streaming:

Last but not least: public GenAI APIs and LLMs have a weak security and governance strategy. As AI data needs emerge and the number of point-to-point integrations increases, data access, lineage, and security challenges escalate.

Apache Kafka as Mission Critical Data Fabric for GenAI https://www.kai-waehner.de/blog/2023/07/22/apache-kafka-as-mission-critical-data-fabric-for-genai/ Sat, 22 Jul 2023 10:40:59 +0000 https://www.kai-waehner.de/?p=5548 Apache Kafka serves thousands of enterprises as the mission-critical and scalable real-time data fabric for machine learning infrastructures. The evolution of Generative AI (GenAI) with large language models (LLM) like ChatGPT changed how people think about intelligent software and automation. This blog post explains the relation between data streaming and GenAI and shows the enormous opportunities and some early adopters of GenAI beyond the buzz.

Data Streaming with Apache Kafka as Data Fabric for GenAI

Generative AI (GenAI) and Data Streaming

Let’s set the context first to have the same understanding of the buzzwords.

[Note: My ChatGPT iPhone app generated this section. I slightly shortened and adjusted the content afterward. GenAI is perfect for summarizing existing content so that authors can spend time on new content (that ChatGPT does not know yet).]

Natural Language Processing (NLP)

ChatGPT, what is NLP?

NLP stands for Natural Language Processing. It is a subfield of artificial intelligence (AI) that focuses on interactions between computers and human language. NLP enables computers to understand, interpret, and generate human language in a meaningful and valuable way.

Natural Language Processing involves a range of tasks, including:

  • Text Parsing and Tokenization: Breaking down text into individual words or tokens.
  • Named Entity Recognition (NER): Identifying and classifying named entities such as people, organizations, locations, and dates in the text.
  • Sentiment Analysis: Determining the sentiment or emotional tone expressed in text, whether positive, negative, or neutral.
  • Machine Translation: Translating text from one language to another.
  • Question Answering: Building systems that can understand and answer questions posed in natural language.
  • Text Generation: Creating human-like text or generating responses to prompts.

NLP is crucial in applications such as chatbots, virtual assistants, language translation, information retrieval, sentiment analysis, and more.
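
Two of the tasks above, tokenization and sentiment analysis, can be illustrated with a deliberately naive Python sketch. Real NLP systems use trained models (spaCy, Hugging Face transformers, or an LLM); the tiny word lists here are illustrative assumptions only.

```python
import re

def tokenize(text: str) -> list[str]:
    # Text parsing and tokenization: split text into lowercase word tokens.
    return re.findall(r"[a-z0-9']+", text.lower())

POSITIVE = {"great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "hate", "terrible", "delayed"}

def sentiment(text: str) -> str:
    # Naive lexicon-based sentiment: count positive vs. negative tokens.
    tokens = tokenize(text)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```

A trained model replaces the word lists with learned parameters, but the input/output contract (text in, label out) is the same one a chatbot or moderation pipeline builds on.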

GenAI = Next Generation NLP (and more)

ChatGPT, what is Generative AI?

Generative AI is a branch of artificial intelligence focused on creating models and systems capable of generating new content, such as images, text, music, or even entire virtual worlds. These models are trained on large datasets and learn patterns and structures to generate new outputs similar to the training data. That’s why the widespread buzzword is Large Language Model (LLM).

Generative AI powers next-generation NLP, using techniques such as generative adversarial networks (GANs), variational autoencoders (VAEs), and recurrent neural networks (RNNs). Generative AI has applications in various fields and industries, including art, design, entertainment, and scientific research.

Apache Kafka for Data Streaming

ChatGPT, what is Apache Kafka?

Apache Kafka is an open-source distributed streaming platform that has become the de facto standard for event streaming. Originally developed at LinkedIn and later donated to the Apache Software Foundation, it is widely used for building real-time data streaming applications and event-driven architectures. Kafka provides a scalable and fault-tolerant system for handling high volumes of streaming data.

Open Source Data Streaming in the Cloud

Kafka has a thriving ecosystem with various tools and frameworks that integrate with it, such as Apache Spark, Apache Flink, and others.

Apache Kafka is widely adopted in use cases that require real-time data streaming, such as data pipelines, event sourcing, log aggregation, messaging systems, and more.
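
Kafka’s core abstraction, a partitioned append-only log consumed by offset, can be illustrated in a few lines of plain Python. This is an in-memory toy, not the real client library; with an actual broker you would use a producer/consumer client against a running cluster instead.

```python
class TopicPartition:
    """Toy model of one Kafka partition: an append-only log addressed by offset."""
    def __init__(self):
        self.log = []

    def produce(self, value) -> int:
        self.log.append(value)
        return len(self.log) - 1  # offset of the newly appended record

    def consume(self, offset: int, max_records: int = 10):
        # Consumers read from a tracked offset; records are never removed,
        # which is what makes replay and multiple independent readers possible.
        return self.log[offset : offset + max_records]

p = TopicPartition()
for booking in ["book LH123", "cancel LH456", "rebook LH789"]:
    p.produce(booking)

first_batch = p.consume(offset=0, max_records=2)
replay = p.consume(offset=0)  # a second consumer replays from the beginning
```

Because the log is durable and addressed by offset, slow batch consumers and fast streaming consumers can read the same data independently, which is the decoupling the rest of this post relies on.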

Why Apache Kafka and GenAI?

Generative AI (GenAI) is the next-generation NLP engine that helps many projects in the real world for service desk automation, customer conversation with a chatbot, content moderation in social networks, and many other use cases.

Apache Kafka became the predominant orchestration layer in these machine learning platforms for integrating various data sources, processing at scale, and real-time model inference.

Data streaming with Kafka already powers many GenAI infrastructures and software products. Very different scenarios are possible:

  • Data streaming as data fabric for the entire machine learning infrastructure
  • Model scoring with stream processing for real-time productions
  • Generation of streaming data pipelines with input text or speech
  • Real-time online training of large language models

Let’s explore these opportunities for data streaming with Kafka and GenAI in more detail.

Real-time Kafka Data Hub for GenAI and other Microservices in the Enterprise Architecture

I already explored “How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka” back in 2017 (!). At that time, real-world examples came from tech giants like Uber, Netflix, and PayPal.

Today, Apache Kafka is the de facto standard for building scalable and reliable machine learning infrastructures across any enterprise and industry, including:

  • Data integration from various sources (sensors, logs, databases, message brokers, APIs, etc.) using Kafka Connect connectors, fully managed SaaS integrations, or any kind of HTTP REST API or programming language.
  • Data processing leveraging stream processing for cost-efficient streaming ETL, such as filtering, aggregations, and more advanced calculations while the data is in motion (so that any downstream application gets accurate information).
  • Data ingestion for near real-time data sharing with various data warehouses and data lakes so that each analytics platform can use its preferred products and tools.

Kafka Machine Learning Architecture for GenAI

Building scalable and reliable end-to-end pipelines is today’s sweet spot of data streaming with Apache Kafka in the AI and Machine Learning space.
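
The integration–processing–ingestion building blocks above can be sketched with a minimal stream-processing loop. In a real deployment, ingestion would use Kafka Connect and processing would run in Kafka Streams or Flink; here a Python generator stands in for the stream so the example is self-contained.

```python
def payment_stream():
    # Stand-in for records arriving from a Kafka topic (e.g., via Kafka Connect).
    yield {"customer": "alice", "amount": 120.0, "valid": True}
    yield {"customer": "bob", "amount": -1.0, "valid": False}  # bad record
    yield {"customer": "alice", "amount": 80.0, "valid": True}

def streaming_etl(stream):
    # Filter + aggregate while the data is in motion, maintaining running
    # state per key, as a Kafka Streams or Flink job would.
    totals = {}
    for record in stream:
        if not record["valid"]:  # filtering
            continue
        key = record["customer"]
        totals[key] = totals.get(key, 0.0) + record["amount"]  # aggregation
    return totals

totals = streaming_etl(payment_stream())  # {'alice': 200.0}
```

The same per-key state is what a downstream data warehouse or ML feature pipeline would consume, already cleaned and aggregated.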

Model Scoring with Stream Processing for Real-Time Predictions at any Scale

Deploying an analytic model in a Kafka application is the solution to provide real-time predictions at any scale with low latency. This is one of the biggest problems in the AI space, as data scientists primarily focus on historical data and batch model training in data lakes.

However, model scoring for predictions needs to meet much stricter SLAs regarding scalability, reliability, and latency than batch training. Hence, more and more companies separate model training from model scoring and deploy the analytic model within a stream processor such as Kafka Streams, KSQL, or Apache Flink:

Data Streaming and Machine Learning with Embedded TensorFlow Model

Check out my article “Machine Learning and Real-Time Analytics in Apache Kafka Applications” for more details.
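
Separating training from scoring looks roughly like this in code. The “model” here is a trivial threshold standing in for a trained artifact such as a TensorFlow SavedModel loaded at startup; the key point is that scoring happens in-process inside the stream processing application, with no remote call per event.

```python
class DelayModel:
    """Stand-in for a trained model artifact loaded once at processor startup."""
    def __init__(self, threshold: float):
        self.threshold = threshold  # learned offline during batch training

    def predict(self, features: dict) -> float:
        # Local, in-process inference: microseconds instead of an RPC round trip.
        return 1.0 if features["delay_minutes"] > self.threshold else 0.0

model = DelayModel(threshold=45.0)

def process(event: dict) -> dict:
    # Per-record logic of a Kafka Streams / Flink job: enrich the event
    # with a prediction and forward it downstream.
    return {**event, "rebook_probability": model.predict(event)}

out = [process(e) for e in [{"flight": "LH123", "delay_minutes": 90},
                            {"flight": "LH456", "delay_minutes": 10}]]
```

Embedding the model this way gives the scoring path the same scalability and latency characteristics as the stream processor itself, which is exactly why it suits small models but not multi-billion-parameter LLMs.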

Dedicated model servers usually only support batch and request-response (e.g., via HTTP or gRPC). Fortunately, many solutions now also provide native integration with the Kafka protocol.

Kafka-native Model Server for Machine Learning and Model Deployment

I explored this innovation in my blog post “Streaming Machine Learning with Kafka-native Model Deployment“.

Development Tools for Generating Kafka-native Data Pipelines from Input Text or Speech

Almost every software vendor discusses GenAI to enhance its development environments and user interfaces.

For instance, GitHub is a platform and cloud-based service for software development and version control using Git. Their latest innovation is “the AI-Powered Developer Platform to Build, Scale, and Deliver Secure Software”: GitHub Copilot X. Cloud providers like AWS provide similar tools.

Similarly, look at any data infrastructure vendor like Databricks or Snowflake. The latest conferences and announcements focus on embedded capabilities around large language models and GenAI in their solutions.

The same will be true for many data streaming platforms and cloud services. Low-code/no-code tools will add capabilities to generate data pipelines from input text. One of the most straightforward applications that I see coming is generating SQL code out of user text.

For instance, “Consume data from Oracle table customer, aggregate the payments by customer, and ingest it into Snowflake”. This could create SQL code for stream processing technologies like KSQL or FlinkSQL.

Developer experience, faster time-to-market, and support for less technical personas are enormous advantages of embedding GenAI into Kafka development environments.
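
A toy version of such a pipeline generator: a template-based function that maps a (very constrained) natural-language request to Flink-SQL-like text. A real product would send the request to an LLM; the parsing rule and the generated SQL here are illustrative assumptions, not the output of any actual tool.

```python
import re

def generate_pipeline_sql(request: str) -> str:
    # Hypothetical toy: expects "… from <src> table <tbl>, aggregate the
    # <field> by <key>, and ingest it into <sink>".
    m = re.search(
        r"from (\w+) table (\w+), aggregate the (\w+) by (\w+), and ingest it into (\w+)",
        request,
        re.IGNORECASE,
    )
    if not m:
        raise ValueError("request not understood")
    source, table, field, key, sink = m.groups()
    return (
        f"INSERT INTO {sink.lower()}_{table}\n"
        f"SELECT {key}, SUM({field}) AS total_{field}\n"
        f"FROM {source.lower()}_{table}\n"
        f"GROUP BY {key};"
    )

sql = generate_pipeline_sql(
    "Consume data from Oracle table customer, aggregate the payments "
    "by customer, and ingest it into Snowflake"
)
```

An LLM replaces the brittle regex with genuine language understanding, but the contract is the same: user text in, reviewable streaming SQL out.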

Real-time Training of Large Language Models (LLM)

AI and machine learning are still batch-based systems most of the time. Model training takes at least hours. This is not ideal, as many GenAI use cases require accurate and up-to-date information. Imagine googling for information today and not finding anything from the past week. Such a service would be unusable in many scenarios!

Similarly, if I ask ChatGPT today (July 2023): “What is GenAI?” – I get the following response:

As of my last update in September 2021, there is no specific information on an entity called “GenAi.” It’s possible that something new has emerged since then. Could you provide more context or clarify your question so I can better assist you?

The faster your machine learning infrastructure ingests data into model training, the better. My colleague Michael Drogalis wrote an excellent, deeply technical blog post, “GPT-4 + Streaming Data = Real-Time Generative AI”, to explore this topic more thoroughly.

Real Time GenAI with Data Streaming powered by Apache Kafka

This architecture is compelling because the chatbot will always have your latest information whenever you prompt it. For instance, if your flight gets delayed or your terminal changes, the chatbot will know about it during your chat session. This is entirely distinct from current approaches where the chat session must be reloaded or wait a few hours/days for new data to arrive.

LLM + Vector Database + Kafka = Real-Time GenAI

Real-time model training is still a novel approach. Many machine learning algorithms are not ready for continuous online model training today. But combining Kafka with a vector database enables using a batch-trained LLM together with real-time updates feeding up-to-date information into the LLM.

In a few years, nobody will accept an LLM like ChatGPT giving answers like “I don’t have this information; my model was trained a week ago”. It does not matter if you choose a brand-new vector database like Pinecone or leverage the new vector capabilities of your installed Oracle or MongoDB storage.

Feed data into the vector database in real-time with Kafka Connect and combine it with a mature LLM to enable real-time GenAI with context-specific recommendations.
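
The ingestion side of this pattern, sketched with in-memory stand-ins: a “connector” loop consumes embedding events and upserts them into a vector store keyed by document id, so repeated updates for the same key always leave the latest vector in place. With real infrastructure this would be a Kafka Connect sink writing into Pinecone, MongoDB, or Oracle’s vector capabilities.

```python
class ToyVectorDB:
    """In-memory stand-in for a vector database with upsert semantics."""
    def __init__(self):
        self.vectors = {}

    def upsert(self, doc_id: str, vector: list[float], payload: str):
        # Latest write wins, so the LLM always retrieves current context.
        self.vectors[doc_id] = (vector, payload)

def sink_loop(events, db: ToyVectorDB):
    # Stand-in for a Kafka Connect sink task: consume embedding events, upsert.
    for e in events:
        db.upsert(e["id"], e["vector"], e["text"])

events = [  # would be consumed from a Kafka topic of precomputed embeddings
    {"id": "flight-LH123", "vector": [0.1, 0.2], "text": "LH123 on time"},
    {"id": "flight-LH123", "vector": [0.9, 0.1], "text": "LH123 delayed 90 minutes"},
]

db = ToyVectorDB()
sink_loop(events, db)
```

Upsert-by-key is the detail that makes the batch-trained LLM appear “up to date”: the model itself never changes, only the retrieved context does.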

Real-World Case Studies for Kafka and GenAI

This section explores how companies across different industries, such as the carmaker BMW, the online travel platform Expedia, and the dating app Tinder, leverage the combination of data streaming and GenAI for reliable real-time conversational AI, NLP, and chatbots built on Kafka.

Two years ago, I wrote about this topic: “Apache Kafka for Conversational AI, NLP and Chatbot“. But technologies like ChatGPT make it much easier to adopt GenAI in real-world projects with much faster time-to-market and less cost and risk. Let’s explore a few of these success stories for embedding NLP and GenAI into data streaming enterprise architectures.

Disclaimer: As I want to show real-world case studies instead of visionary outlooks, I show several examples deployed in production in the last few years. Hence, the analytic models do not use GenAI, LLM, or ChatGPT as we know it from the press today. But the principles are precisely the same. The only difference is that you could use a cutting-edge model like ChatGPT with much improved and context-specific responses today.

Expedia – Conversations Platform for Better Customer Experience

Expedia is a leading online travel and booking platform with many use cases for machine learning. One of my favorite examples is their Conversations Platform, built on Kafka and Confluent Cloud to provide an elastic cloud-native application.

The goal of Expedia’s Conversations Platform was simple: Enable millions of travelers to have natural language conversations with an automated agent via text, Facebook, or their channel of choice. Let them book trips, make changes or cancellations, and ask questions:

  • “How long is my layover?”
  • “Does my hotel have a pool?”
  • “How much will I get charged to bring my golf clubs?”

Then take all that is known about that customer across all of Expedia’s brands and apply machine learning models to immediately give customers what they are looking for in real-time and automatically, whether a straightforward answer or a complex new itinerary.

Real-time Orchestration realized in four Months

Such a platform is no place for batch jobs, back-end processing, or offline APIs. To quickly make decisions that incorporate contextual information, the platform needs data in near real-time, and it needs it from a wide range of services and systems. Meeting these needs meant architecting the Conversations Platform around a central nervous system based on Confluent Cloud and Apache Kafka. Kafka made it possible to orchestrate data from loosely coupled systems, enrich data as it flows between them so that by the time it reaches its destination, it is ready to be acted upon, and surface aggregated data for analytics and reporting.

Expedia built this platform from zero to production in four months. That’s the tremendous advantage of using a fully managed serverless event streaming platform as the foundation. The project team can focus on the business logic.

The Covid pandemic proved the idea of an elastic platform: Companies were hit with a tidal wave of customer questions, cancellations, and re-bookings. Throughout this once-in-a-lifetime event, the Conversations Platform proved up to the challenge, auto-scaling as necessary and taking off much of the load of live agents.

Expedia’s Migration from MQ to Kafka as Foundation for Real-time Machine Learning and Chatbots

As part of their Conversations Platform, Expedia needed to modernize their IT infrastructure, as Ravi Vankamamidi, Director of Technology at Expedia Group, explained in a Kafka Summit keynote.

Expedia’s old chatbot service relied on a legacy messaging system. It was a question-and-answer board with a very limited scope for booking scenarios and could only handle two-party conversations. It could not scale to bring all the different systems into one architecture to build a powerful chatbot for customer conversations.

I explored several times that event streaming is more than just a (scalable) message queue. Check out my old (but still accurate and relevant) Comparison between MQ and Kafka, or the newer comparison between cloud-native iPaaS and Kafka.

Expedia needed a service that was closer to travel assistance. It had to handle context-specific, multi-party, multi-channel conversations. Hence, features such as natural language processing, translation, and real-time analytics were required, and the full service needed to scale across multiple brands. Therefore, a fast and highly scalable platform with ordering guarantees, exactly-once semantics (EOS), and real-time data processing was needed.

The Kafka-native event streaming platform powered by Confluent was the best choice and met all requirements. The new Conversations Platform doubled the Net Promoter Score (NPS) within one year of rollout, quickly proving its business value.

BMW – GenAI for Contract Intelligence, Workplace Assistance and Machine Translation

The automotive company BMW presented innovative NLP services at Kafka Summit in 2019. It is no surprise that a carmaker has various NLP scenarios. These include digital contract intelligence, workplace assistance, machine translation, and customer conversations. The latter contains multiple use cases for conversational AI:

  • Service desk automation
  • Speech analysis of customer interaction center (CIC) calls to improve the quality
  • Self-service using smart knowledge bases
  • Agent support
  • Chatbots

The text and speech data is structured, enriched, contextualized, summarized, and translated to build real-time decision support applications. Kafka is a crucial component of BMW’s ML and NLP architecture. The real-time integration and data correlation enable interactive and interoperable data consumption and usage:

NLP Service Framework Based on Data Streaming at BMW

BMW explained the key advantages of leveraging Kafka and its stream processing library Kafka Streams as the real-time integration and orchestration platform:

  • Flexible integration: Multiple supported interfaces for different deployment scenarios, including various machine learning technologies, programming languages, and cloud providers
  • Modular end-to-end pipelines: Services can be connected to provide full-fledged NLP applications
  • Configurability: High agility for each deployment scenario

Tinder – Intelligent Content Moderation, Matching and Recommendations with Kafka and GenAI

The dating app Tinder is a great example where I can think of tens of use cases for NLP. Tinder talked at a past Kafka Summit about their Kafka-powered machine learning platform.

Tinder is a massive user of Kafka and its ecosystem for various use cases, including content moderation, matching, recommendations, reminders, and user reactivation. They used Kafka Streams as a Kafka-native stream processing engine for metadata processing and correlation in real-time at scale:

Impact of Data Streaming at Tinder

A critical use case in any dating or social platform is content moderation for detecting fakes, filtering sexual content, and other inappropriate things. Content moderation combines NLP and text processing (e.g., for chat messages) with image processing (e.g., selfie uploads), or processes the metadata with Kafka and stores the linked content in a data lake. Both leverage Deep Learning to process high volumes of text and images. Here is what content moderation looks like in Tinder’s Kafka architecture:

Content Moderation at Tinder with Data Streaming and Machine Learning

Plenty of ways exist to process text, images, and videos with the Kafka ecosystem. I wrote a detailed article about handling large messages and files with Apache Kafka to explore the options and trade-offs.

Chatbots can also play a role the other way round: more and more dating apps (and other social networks) fight spam, fraud, and automated chatbots. Just as you can build a chatbot, a chatbot detection system can analyze the data streams to block bots on a dating app.

Kafka as Real-Time Data Fabric for Future GenAI Initiatives

Real-time data beats slow data. Generative AI only adds value if it provides accurate and up-to-date information. Data streaming technologies such as Apache Kafka and Apache Flink enable building a reliable, scalable real-time infrastructure for GenAI. Additionally, the event-based heart of the enterprise architecture guarantees data consistency between real-time and non-real-time systems (near real-time, batch, request-response).

Early adopters like BMW, Expedia, and Tinder proved that Generative AI integrated into a Kafka architecture adds enormous business value. The evolution of AI models with ChatGPT et al. makes the use cases even more compelling across every industry.

How do you build conversational AI, chatbots, and other GenAI applications leveraging Apache Kafka? What technologies and architectures do you use? Are data streaming and Kafka part of the architecture? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.
