Gaming Archives - Kai Waehner
https://www.kai-waehner.de/blog/category/gaming/

Powering Fantasy Sports at Scale: How Dream11 Uses Apache Kafka for Real-Time Gaming
https://www.kai-waehner.de/blog/2025/05/19/powering-fantasy-sports-at-scale-how-dream11-uses-apache-kafka-for-real-time-gaming/
Mon, 19 May 2025 06:48:27 +0000

Fantasy sports has become one of the most dynamic and data-intensive digital industries of the past decade. What started as a casual game for sports fans has evolved into a massive business, blending real-time analytics, mobile engagement, and personalized gaming experiences. At the center of this transformation is Apache Kafka—a critical enabler for platforms like Dream11, where millions of users expect live scores, instant feedback, and seamless gameplay. This post explores how fantasy sports works, why real-time data is non-negotiable, and how Dream11 has scaled its Kafka infrastructure to handle some of the world’s most demanding user traffic patterns.

Real Time Gaming with Apache Kafka Powers Dream11 Fantasy Sports

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter, and follow me on LinkedIn or X (formerly Twitter) to stay in touch. And make sure to download my free book about data streaming use cases, including several success stories around gaming, loyalty platforms, and personalized advertising.

Fantasy Sports: Real-Time Gaming Meets Real-World Sports

Fantasy sports allows users to create virtual teams based on real-life athletes. As matches unfold, players earn points based on the performance of their selected athletes. The better the team performs, the higher the user’s score—and the bigger the prize.

Key characteristics of fantasy gaming:

  • Multi-sport experience: Users can play across cricket, football, basketball, and more.
  • Live interaction: Scoring is updated in real time as matches progress.
  • Contests and leagues: Players join public or private contests, often with cash prizes.
  • Peak traffic patterns: Most activity spikes in the minutes before a match begins.

This user behavior creates a unique business and technology challenge. Millions of users make critical decisions at the same time, just before the start of each game. The result: extreme concurrency, massive request volumes, and a hard dependency on data accuracy and low latency.

Real-time infrastructure isn’t optional in this model. It’s fundamental to user trust and business success.

Dream11: A Fantasy Sports Giant with Massive Scale

Founded in India, Dream11 is the largest fantasy sports platform in the country—and one of the biggest globally. With over 230 million users, it dominates fantasy gaming across cricket and 11 other sports. The platform sees traffic that rivals the world’s largest digital services.

Dream11 Mobile App
Source: Dream11

Bipul Karnanit from Dream11 presented a very interesting overview at Current 2025 in Bangalore, India. Here are a few statistics about Dream11’s scale:

  • 230M users
  • 12 sports
  • 12,000 matches/year
  • 44TB data per day
  • 15M+ peak concurrent users
  • 43M+ peak transactions/day

During major events like the IPL, Dream11 experiences hockey-stick traffic curves, where tens of millions of users log in just minutes before a match begins—making lineup changes, joining contests, and waiting for live updates.

This creates a business-critical need for:

  • Low latency
  • Guaranteed data consistency
  • Fault tolerance
  • Real-time analytics and scoring
  • High developer productivity to iterate fast

Apache Kafka at the Heart of Dream11’s Platform

To meet these demands, Dream11 uses Apache Kafka as the foundation of its real-time data infrastructure. Kafka powers the messaging between services that manage user actions, match scores, payouts, leaderboards, and more.

Apache Kafka enables:

  • Event-driven microservices
  • Scalable ingestion and processing of user and game data
  • Loose coupling between systems with data products for operational and analytical consumers
  • High throughput with guaranteed ordering and durability

Event-driven Architecture with Data Streaming for Gaming using Apache Kafka and Flink
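
To make the event-driven pattern above concrete, here is a minimal Java producer sketch. It is a generic illustration, not Dream11's actual code: the topic name, key choice, and JSON payload are assumptions for demonstration. Keying by user ID ensures all events of one user land in the same partition and therefore arrive in order.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class UserActionProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("acks", "all"); // wait for all in-sync replicas: durability for transactional events

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String userId = "user-42"; // hypothetical user
            String event = "{\"type\":\"TEAM_UPDATED\",\"userId\":\"user-42\",\"matchId\":\"m-1001\"}";
            // Keying by user ID preserves per-user ordering within a partition
            producer.send(new ProducerRecord<>("user-actions", userId, event));
        }
    }
}
```

Downstream services such as leaderboards, payouts, and analytics can each consume the same topic independently via their own consumer groups; that independence is exactly the loose coupling described above.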

Solving Kafka Consumer Challenges at Scale

As the business grew, Dream11’s engineering team encountered challenges with Kafka’s standard consumer APIs, particularly around rebalancing, offset management, and processing guarantees under peak load.

To address these issues, Dream11 built a custom Java-based Kafka consumer library—a foundational component of its internal platform that simplifies Kafka integration across services and boosts developer productivity.

Dream11 Kafka Consumer Library:

  • Purpose: A custom-built Java library designed to handle high-volume Kafka message consumption at Dream11 scale.
  • Key Benefit: Abstracts away low-level Kafka consumer details, simplifying tasks like offset management, error handling, and multi-threading, allowing developers to focus on business logic.
  • Simple Interfaces: Provides easy-to-use interfaces for processing records.
  • Increased Developer Productivity: A standardized library leads to faster development and fewer errors.

This library plays a crucial role in enabling real-time updates and ensuring seamless gameplay—even under the most demanding user scenarios.

For deeper technical insights, including how Dream11 decoupled polling and processing, implemented at-least-once delivery, and improved throughput with custom worker pools, watch the Dream11 engineering session from Current India 2025 presented by Bipul Karnanit.
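
While Dream11's library itself is internal, the underlying pattern is well known. The sketch below shows one simplified way to decouple polling from processing with a worker pool and manual offset commits for at-least-once semantics. It is a generic illustration under assumed topic and group names, not Dream11's implementation; note that processing a batch in parallel trades per-partition ordering for throughput.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class WorkerPoolConsumer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "score-processor"); // hypothetical consumer group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("enable.auto.commit", "false"); // commit only after successful processing

        ExecutorService workers = Executors.newFixedThreadPool(8);
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("match-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(200));
                List<Future<?>> inFlight = new ArrayList<>();
                for (ConsumerRecord<String, String> record : records) {
                    inFlight.add(workers.submit(() -> process(record))); // processing off the poll thread
                }
                for (Future<?> future : inFlight) {
                    future.get(); // wait for the whole batch so the commit below is safe
                }
                if (!records.isEmpty()) {
                    consumer.commitSync(); // at-least-once: records may be reprocessed after a crash
                }
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // Business logic goes here: score updates, leaderboard writes, ...
    }
}
```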

Fantasy Sports, Real-Time Expectations, and Business Value

Dream11’s business success is built on user trust, real-time responsiveness, and high-quality gameplay. With millions of users relying on accurate, timely updates, the platform can’t afford downtime, data loss, or delays.

Data Streaming with Apache Kafka enables Dream11 to:

  • React to user interactions instantly
  • Deliver consistent data across microservices and devices
  • Scale dynamically during live events
  • Streamline the development and deployment of new features

This is not just a backend innovation—it’s a competitive advantage in a space where milliseconds matter and trust is everything.

Dream11’s Kafka Journey: The Backbone of Fantasy Sports at Scale

Fantasy sports is one of the most demanding environments for real-time data platforms. Dream11’s approach—scaling Apache Kafka to serve hundreds of millions of events with precision—is a powerful example of aligning architecture with business needs.

As more industries adopt event-driven systems, Dream11’s journey offers a clear message: Apache Kafka is not just a messaging layer—it’s a strategic platform for building reliable, low-latency digital experiences at scale.

Whether you’re in gaming, finance, telecom, or logistics, there’s much to learn from the way fantasy sports leaders like Dream11 harness data streaming to deliver world-class services.

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter, and follow me on LinkedIn or X (formerly Twitter) to stay in touch. And make sure to download my free book about data streaming use cases, including several success stories around gaming, loyalty platforms, and personalized advertising.

A New Era in Dynamic Pricing: Real-Time Data Streaming with Apache Kafka and Flink
https://www.kai-waehner.de/blog/2024/11/14/a-new-era-in-dynamic-pricing-real-time-data-streaming-with-apache-kafka-and-flink/
Thu, 14 Nov 2024 12:09:57 +0000

In the age of digitization, the concept of pricing is no longer fixed or manual. Instead, companies increasingly use dynamic pricing — a flexible model that adjusts prices based on real-time market changes. Data streaming technologies like Apache Kafka and Apache Flink have become integral to enabling this real-time responsiveness, giving companies the tools they need to respond instantly to demand, competitor prices, and customer behaviors. This blog post explores the fundamentals of dynamic pricing, its link to data streaming, and real-world examples of how different industries such as retail, logistics, gaming and the energy sector leverage this powerful approach to get ahead of the competition.

Dynamic Pricing with Data Streaming using Apache Kafka and Flink

What is Dynamic Pricing?

Dynamic pricing is a strategy where prices are adjusted automatically based on real-time data inputs, such as demand, customer behavior, supply levels, and competitor actions. This model allows companies to optimize profitability, boost sales, and better meet customer expectations.

Relevant Industries and Examples

Dynamic pricing has applications across many industries:

  • Retail and eCommerce: Dynamic pricing in eCommerce helps adjust product prices based on stock levels, competitor actions, and customer demand. Companies like Amazon frequently update prices on millions of products, using dynamic pricing to maximize revenue.
  • Transportation and Mobility: Ride-sharing companies like Uber and Grab adjust fares based on real-time demand and traffic conditions. This is commonly known as “surge pricing.”
  • Gaming: Context-specific in-game add-ons or virtual items are offered at varying prices based on player engagement, time spent in-game, and special events or levels.
  • Energy Markets: Dynamic pricing in energy adjusts rates in response to demand fluctuations, energy availability, and wholesale costs. This approach helps to stabilize the grid and manage resources.
  • Sports and Entertainment Ticketing: Ticket prices for events are adjusted based on seat availability, demand, and event timing to allow venues and ticketing platforms to balance occupancy and maximize ticket revenue.
  • Hospitality: Hotels adapt room rates and promotions in real time based on demand, seasonality, and guest behavior using dynamic pricing models.

These industries have adopted dynamic pricing to maintain profitability, manage supply-demand balance, and enhance customer satisfaction through personalized, responsive pricing.

Dynamic pricing relies on up-to-the-minute data on market and customer conditions, making real-time data streaming critical to its success. Traditional batch processing, where data is collected and processed periodically, is insufficient for dynamic pricing. It introduces delays that could mean lost revenue opportunities or suboptimal pricing. This scenario is where data streaming technologies come into play.

  • Apache Kafka serves as the real-time data pipeline, collecting and distributing data streams from diverse sources, for instance user behavior on websites, competitor pricing, social media signals, and IoT data. Kafka’s capability to handle high throughput and low latency makes it ideal for ingesting large volumes of data continuously.
  • Apache Flink processes the data in real-time, applying complex algorithms to identify pricing opportunities as conditions change. With Flink’s support for stream processing and complex event processing, businesses can apply sophisticated logic to assess and adjust prices based on multiple real-time factors.

Dynamic Pricing with Apache Kafka and Flink in Retail eCommerce

Together, Kafka and Flink create a powerful foundation for dynamic pricing, enabling real-time data ingestion, analysis, and action. This empowers companies to implement pricing models that are not only highly responsive but also resilient and scalable.
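
As a rough illustration of this combination, a demand-based pricing job in Flink's Java DataStream API could look like the sketch below. The topic names, the one-minute window, and the +10% threshold are made-up assumptions; a real pricing engine would combine many more signals such as competitor prices, stock levels, and customer segments.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class DynamicPricingJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Assumed topic: each record is a product ID representing one product view
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setTopics("product-views")
                .setGroupId("pricing-engine")
                .setStartingOffsets(OffsetsInitializer.latest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "product-views")
                .keyBy(productId -> productId)
                .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
                .aggregate(new CountViews(), new EmitPriceSignal())
                .print(); // in a real pipeline: sink the signal to a "price-updates" topic

        env.execute("dynamic-pricing");
    }

    // Counts views per product within each window
    static class CountViews implements AggregateFunction<String, Long, Long> {
        public Long createAccumulator() { return 0L; }
        public Long add(String value, Long acc) { return acc + 1; }
        public Long getResult(Long acc) { return acc; }
        public Long merge(Long a, Long b) { return a + b; }
    }

    // Turns the per-window demand count into a simple price adjustment signal
    static class EmitPriceSignal
            extends ProcessWindowFunction<Long, String, String, TimeWindow> {
        public void process(String productId, Context ctx,
                            Iterable<Long> counts, Collector<String> out) {
            long demand = counts.iterator().next();
            double factor = demand > 1000 ? 1.10 : 1.0; // +10% under high demand (illustrative)
            out.collect(productId + " -> priceFactor=" + factor);
        }
    }
}
```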

Clickstream Analytics in Real-Time with Data Streaming Replacing Batch with Hadoop and Spark

Years ago, companies relied on Hadoop and Spark to run batch-based clickstream analytics. Data engineers ingested logs from websites, online stores, and mobile apps to gather insights. Processing took hours. Therefore, any promotional offer or discount often arrived a day later — by which time the customer may have already made their purchase elsewhere, like on Amazon.

With today’s data streaming platforms like Kafka and Flink, clickstream analytics has evolved to support real-time, context-specific engagement and dynamic pricing. Instead of waiting on delayed insights, businesses can now analyze customer behavior as it happens, instantly adjusting prices and delivering personalized offers in the moment. This dynamic pricing capability allows companies to respond immediately to high-intent customers, presenting tailored prices or promotions when they’re most likely to convert. Dynamic pricing with Kafka and Flink creates a seamless and timely shopping experience that maximizes sales and customer satisfaction.

Here’s how businesses across various sectors are harnessing Kafka and Flink for dynamic pricing.

  • Retail: Hyper-Personalized Promotions and Discounts
  • Logistics and Transportation: Intelligent Tolling
  • Technology: Surge Pricing
  • Energy Markets: Manage Supply-Demand and Stabilize Grid Loads
  • Gaming: Context-Specific In-Game Add-Ons
  • Sports and Entertainment: Optimize Ticketing Sales

Learn more about data streaming with Kafka and Flink for dynamic pricing in the following success stories:

AO: Hyper-Personalized Promotions and Discounts (Retail and eCommerce)

AO, a major UK eCommerce retailer, leverages data streaming for dynamic pricing to stay competitive and drive higher customer engagement. By ingesting real-time data on competitor prices, customer demand, and inventory stock levels, AO’s system processes this information instantly to adjust prices in sync with market conditions. This approach allows AO to seize pricing opportunities and align closely with customer expectations. The result is a 30% increase in customer conversion rates.

AO Retail eCommerce Hyper Personalized Online and Mobile Experience

Dynamic pricing has also allowed AO to provide a hyper-personalized shopping experience, delivering relevant product recommendations and timely promotions. This real-time responsiveness has enhanced customer satisfaction and loyalty, as customers receive offers that feel customized to their needs. During high-traffic periods like holiday sales, AO’s dynamic pricing ensures competitiveness and optimizes margins. This drives both profitability and customer retention. The company has applied this real-time approach not just to pricing, but also to other areas like delivery to make things run smoother. The retailer is now much more efficient and provides better customer service.

Quarterhill: Intelligent Tolling (Logistics and Transportation)

Quarterhill, a leader in tolling and intelligent transportation systems, uses Kafka and Flink to implement dynamic toll pricing. Kafka ingests real-time data from traffic sensors and road usage patterns. Flink processes this data to determine congestion levels and calculate the optimal toll based on real-time conditions.

Quarterhill – Intelligent Roadside Enforcement and Compliance

This dynamic pricing strategy allows Quarterhill to manage road congestion effectively, reward off-peak travel, and optimize toll revenues. This system not only improves travel efficiency but also helps regulate traffic flows in high-density areas, providing value both to drivers and the city infrastructure.

Uber, Grab, and FreeNow: Surge Pricing (Technology)

Ride-sharing companies like Uber, Grab, and FreeNow are widely known for their dynamic pricing or “surge pricing” models. With data streaming, these platforms capture data on demand, supply (available drivers), location, and traffic in real time. This data is processed continuously by Apache Flink, Kafka Streams or other stream processing engines to calculate optimal pricing, balancing supply with demand, while considering variables like route distance and current traffic.

Dynamic Surge Pricing at Mobility Service MaaS Freenow with Kafka and Stream Processing
Source: FreeNow

Surge pricing enables these companies to provide incentives for drivers to operate in high-demand areas, maintaining service availability and ensuring customer needs are met during peak times. This real-time pricing model improves revenue while optimizing customer satisfaction through prompt service availability.
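
To illustrate the stream processing side of surge pricing, here is a minimal Kafka Streams sketch of the demand calculation. This is not Uber's or FreeNow's code; the topics, geo-cell keying, one-minute window, and threshold are assumptions for illustration only.

```java
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;

public class SurgePricingApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "surge-pricing");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Assumed topic: key = geo cell ID, value = ride request payload
        KStream<String, String> requests = builder.stream("ride-requests");

        requests
            .groupByKey()
            .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1))) // Kafka Streams 3.x API
            .count() // demand per geo cell per minute
            .toStream()
            .map((windowedCell, count) -> {
                double multiplier = count >= 500 ? 1.5 : 1.0; // illustrative threshold
                return KeyValue.pair(windowedCell.key(), String.valueOf(multiplier));
            })
            .to("surge-multipliers", Produced.with(Serdes.String(), Serdes.String()));

        new KafkaStreams(builder.build(), props).start();
    }
}
```

A real engine would also join in a supply stream (available drivers per cell) and compute the multiplier from the demand/supply ratio rather than a fixed threshold.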

Uber’s Kappa Architecture is an excellent example of how to build a data pipeline for dynamic pricing and many other use cases with Kafka and Flink:

Kappa Architecture with Apache Kafka at Mobility Service Uber
Source: Uber

2K Games / Take-Two Interactive: Context-Specific In-Game Purchases (Gaming Industry)

In the gaming industry, dynamic pricing is becoming a strategy to improve player engagement and monetize experiences. Many gaming companies use Kafka and Flink to capture real-time data on player interactions, time spent in specific game sections, and in-game events. This data enables companies to offer personalized pricing for in-game items, bonuses, or add-ons, adjusting prices based on the player’s current engagement level and recent activities.

For instance, if players are actively taking part in a particular game event, they may be offered special discounts or dynamic prices on related in-game assets. Thereby, the gaming companies improve conversion rates and player engagement while maximizing revenue.

2K Games, a leading video game publisher and a subsidiary of Take-Two Interactive, has shifted from batch to real-time analytics to enhance player engagement across popular franchises like BioShock, NBA 2K, and Borderlands. By leveraging Confluent Cloud as a fully managed data streaming platform, the publisher scales dynamically to handle high traffic, processing up to 3000 MB per second to serve 4 million concurrent users.

2K Games Take Two Interactive - Bridging the Gap And Overcoming Tech Hurdles to Activate Data
Source: 2K Games

Real-time telemetry analytics now allow them to analyze player actions and context instantly, enabling personalized, context-specific promotions and enhancing the gaming experience. Cost efficiencies are achieved through data compression, tiered storage, and reduced data transfer, making real-time engagement both effective and economical.

50hertz: Manage Supply-Demand and Stabilize Grid Loads (Energy Markets)

Dynamic pricing in energy markets is essential for managing supply-demand fluctuations and stabilizing grid loads. With Kafka, energy providers ingest data from smart meters, renewable energy sources, and weather services. Flink processes this data in real-time, adjusting energy prices based on grid conditions, demand levels, and renewable supply availability.

50Hertz, as a leading electricity transmission system operator, indirectly (!) affects dynamic pricing in the energy market by sharing real-time grid data with partners and energy providers. This allows energy providers and market operators to adjust prices dynamically based on real-time insights into supply-demand fluctuations and grid stability.

To support this, 50Hertz is modernizing its SCADA systems with data streaming technologies to enable real-time data capture and distribution that enhances grid monitoring and responsiveness.

Data Streaming with Apache Kafka and Flink to Modernize SCADA Systems

This real-time pricing approach encourages consumption when renewable energy is abundant and discourages usage during peak times, leading to optimized energy distribution, grid stability, and improved sustainability.

Ticketmaster: Optimize Ticketing Sales (Sports and Entertainment)

In ticketing, dynamic pricing allows for optimized revenue management based on demand and availability. Companies like Ticketmaster use Kafka to collect data on ticket availability, sales velocity, and even social media sentiment surrounding events. Flink processes this data to adjust prices based on real-time market conditions, such as proximity to the event date and current demand.

By dynamically pricing tickets, event organizers can maximize seat occupancy, boost revenue, and respond to last-minute demand surges, ensuring that prices reflect real-time interest while enhancing fan satisfaction.

Real-time inventory data streams allow Ticketmaster to monitor ticket availability, pricing, and demand as they change moment-to-moment. With data streaming through Apache Kafka and Confluent Platform, Ticketmaster tracks sales, venue capacity, and customer behavior in a single, live inventory stream. This enables quick responses, such as adjusting prices for high-demand events or boosting promotions where conversions lag. Teams gain actionable insights to forecast demand accurately and optimize inventory. This approach ensures fans have timely access to tickets. The result is a dynamic, data-driven approach that enhances customer experience and maximizes event success.

Conclusion: Business Value of Dynamic Pricing Built with Data Streaming

Dynamic pricing powered by data streaming with Apache Kafka and Flink brings transformative business value by:

  • Maximizing Revenue and Margins: Real-time price adjustments enable companies to capture value during demand surges, optimize for competitive conditions, and maintain healthy margins.
  • Improving Operational Efficiency: By automating pricing decisions based on real-time data, organizations can reduce manual intervention, speed up reaction times, and allocate resources more effectively.
  • Boosting Customer Satisfaction: Responsive pricing models allow companies to meet customer expectations in real time, leading to improved customer loyalty and engagement.
  • Supporting Sustainability Goals: In energy and transportation, dynamic pricing helps manage resources and reward environmentally friendly behaviors. Examples include off-peak travel and renewable energy usage.
  • Empowering Strategic Decision-Making: Real-time data insights provide business leaders with the information needed to adjust strategies and respond to developing market demands quickly.

Building a dynamic pricing system with Kafka and Flink represents a strategic investment in business agility and competitive resilience. Using data streaming to set prices instantly, businesses can stay ahead of competitors, improve customer service, and become more profitable. Dynamic pricing powered by data streaming is more than just a revenue tool; it’s a vital lever for driving growth, differentiation, and long-term success.

Have you already implemented dynamic pricing? What is your data platform and strategy? Do you use Apache Kafka and Flink? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The State of Data Streaming for Gaming
https://www.kai-waehner.de/blog/2023/11/01/the-state-of-data-streaming-for-gaming-with-apache-kafka-and-flink-in-2023/
Wed, 01 Nov 2023 06:40:45 +0000

This blog post explores the state of data streaming for the gaming industry. The evolution of casual and online games, Esports, social platforms, gambling, and new business models require a reliable global data infrastructure, real-time end-to-end observability, fast time-to-market for new features, and integration with pioneering technologies like AI/machine learning, virtual reality, and cryptocurrencies. Data streaming allows integrating and correlating data in real-time at any scale to improve most business processes in the gaming sector much more cost-efficiently.

I look at trends in the games industry to explore how data streaming helps as a business enabler, including customer stories from Kakao Games, Mobile Premier League (MPL), Demonware / Blizzard, and more. A complete slide deck and on-demand video recording are included.

Data Streaming in the Gaming Industry with Apache Kafka and Flink

The global gaming market is bigger than the music and film industries combined! Digitalization has been a huge factor in this growth over the past years. The gaming industry has various business models connecting players, fans, vendors, and other stakeholders:

  • Hardware sales: Game consoles, VR sets, glasses
  • Game sales: Physical and digital
  • Free-to-play + in-game purchases: One-time in-game purchases (skins, champions, miscellaneous), gambling (loot boxes)
  • Game-as-a-service (subscription): Seasonal in-game purchases like passes for theme events, mid-season invitational & world championship, passes for competitive play
  • Game-Infrastructure-as-a-Service: High-performance state synchronization, multiplayer, matchmaking, gaming statistics
  • Merchandise sales: T-shirts, souvenirs, fan equipment
  • Community: Esports broadcast, ticket sales, franchising fees
  • Live betting
  • Video streaming: Subscriptions, advertisements, rewards

Growth and innovation require cloud-native infrastructure

Most industries require a few specific characteristics. Instant payments must be executed in real time without data loss. Telco infrastructure monitors huge volumes of logs in near real-time. The retail industry needs to scale up for events like Christmas or Black Friday and scale down afterward. The gaming industry combines all the characteristics of other industries:

  • Real-time data processing
  • Scalability for millions of players
  • High availability, at least for transactional data
  • Decoupling for innovation and faster roll-out of new features
  • Cost efficiency because cloud networking for huge volumes is expensive
  • The flexibility of adopting various innovative technologies and APIs
  • Elasticity for critical events a few times a year
  • Standards-based integration for integration with SaaS, B2B, and mobile apps
  • Security for trusted customer data
  • Global and vendor-independent cloud infrastructure to deploy across countries

The good news is that data streaming powered by Apache Kafka and Apache Flink provides all these characteristics on a single platform, especially if you choose a fully managed SaaS offering.

Data streaming in the gaming industry

Adopting gaming trends like in-game purchases, customer-specific discounts, and massively multiplayer online games (MMOG) is only possible if enterprises in the games sector can provide and correlate information at the right time in the proper context. Real-time, which means using the information in milliseconds, seconds, or minutes, is almost always better than processing data later (whatever later means):

Use Cases for Real-Time Data Streaming in the Gaming Industry with Apache Kafka and Flink

Data streaming combines the power of real-time messaging at any scale with storage for true decoupling, data integration, and data correlation capabilities. Apache Kafka is the de facto standard for data streaming.

“Apache Kafka in the Gaming Industry” is a great starting point to learn more about data streaming in the games sector, including a few Kafka-powered case studies not covered in this blog post – such as:

  • Big Fish Games: Live operations by monitoring real-time analytics of game telemetry and context-specific recommendations for in-game purchases
  • Unity: Monetization network for player rewards, banner ads, playable advertisements, and cross-promotions.
  • William Hill: Trading platform for gambling and betting
  • Disney+ Hotstar: Gamification of live sport video streaming

The gaming industry applies various trends for enterprise architectures for cost, elasticity, security, and latency reasons. The three major topics I see these days at customers are:

  • Fully managed SaaS to focus on business logic and faster time-to-market
  • Event-driven architectures (in combination with request-response communication) to enable domain-driven design and flexible technology choices
  • Data mesh for building new data products and real-time data sharing with internal platforms and partner APIs

Let’s look deeper into some enterprise architectures that leverage data streaming for gaming use cases.

Cloud-native elasticity for seasonal spikes

The games sector has extreme spikes in workloads. For instance, specific game events increase the traffic by 10x or more. Only cloud-native infrastructure enables a cost-efficient architecture.

Epic Games already presented at an AWS Summit in 2018 how elasticity is crucial for a data-driven architecture.

Elastic cloud services at Epic Games

Make sure to use a truly cloud-native Apache Kafka service for gaming infrastructure. Adding brokers is relatively easy. Removing brokers is much harder. Hence, a fully-managed SaaS should take over the complex operational challenges of distributed systems like Kafka and Flink for you. The separation of compute and storage is another critical piece of a cloud-native Kafka architecture to ensure cost-efficient scale.

Cloud-native Apache Kafka with Tiered Storage and Separate Compute

Data mesh for real-time data sharing

Data sharing across business units is important for any organization. The gaming industry has to combine very interesting (different) data sets, like big data game telemetry, monetization and advertisement transactions, and 3rd party interfaces.

Data Mesh and data sharing with Apache Kafka and Flink

Data consistency is one of the most challenging problems in the games sector. Apache Kafka ensures data consistency across all applications and databases, whether these systems operate in real-time, near-real-time, or batch.

One sweet spot of data streaming is that you can easily connect new applications to the existing infrastructure or modernize existing interfaces, like migrating from an on-premise data warehouse to a cloud SaaS offering.

In-Game Services and Game Telemetry processing with Apache Kafka Twitch and Unity

New customer stories for data streaming in the gaming sector

So much innovation is happening in the gaming sector. Automation and digitalization change how gaming companies process game telemetry data, build communities and customer relationships with VIPs, and create new business models with enterprises of other verticals.

Most gaming companies use a cloud-first approach to improve time-to-market, increase flexibility, and focus on business logic instead of operating IT infrastructure. And elastic scalability gets even more critical with all the growing real-time expectations and mobile app capabilities.

Here are a few customer stories from worldwide gaming organizations:

  • Kakao Games: Log analytics and fraud prevention
  • Mobile Premier League (MPL): Mobile eSports and digital gaming
  • Demonware / Blizzard: Network and gaming infrastructure
  • WhatNot: Retail gamification and social commerce
  • Vimeo: Video streaming observability

Resources to learn more

This blog post is just the starting point. Learn more about data streaming in the gaming industry in the following on-demand webinar recording, the related slide deck, and further resources, including pretty cool lightboard videos about use cases.

On-demand video recording

The video recording explores the gaming industry’s trends and architectures for data streaming. The primary focus is on the data streaming case studies.

I am excited to have presented this webinar in my interactive lightboard studio:

Lightboard Webinar Apache Kafka in the Gaming Industry - Kai Waehner

This creates a much better experience, especially in a post-pandemic time when many people have “Zoom fatigue”.

Check out our on-demand recording:

Video Recording Data Streaming for Games Betting Gambling - Kai Waehner

Slides

If you prefer learning from slides, check out the deck used for the above recording.


Case studies and lightboard videos for data streaming in the gaming industry

The state of data streaming for gaming in 2023 is fascinating. New use cases and case studies come up every month. This includes better end-to-end observability in real-time across the entire organization, telemetry data collection from gamers, data sharing and B2B partnerships with engines like Unity or video platforms like Twitch, new business models for ads and in-game purchases, and many more scenarios.

We recorded lightboard videos showing the value of data streaming simply and effectively. These five-minute videos explore the business value of data streaming, related architectures, and customer stories. Here is an example for real-time fraud detection with data streaming.

Gaming is just one of many industries that leverages data streaming with Apache Kafka and Apache Flink. Every month, we talk about the status of data streaming in a different industry. Manufacturing was the first. Financial services second, then retail, telcos, gaming, and so on… Check out my other blog posts.

Let’s connect on LinkedIn and discuss it! Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter.

When NOT to use Apache Kafka?
https://www.kai-waehner.de/blog/2022/01/04/when-not-to-use-apache-kafka/
Tue, 04 Jan 2022 07:24:59 +0000

Apache Kafka is the de facto standard for event streaming to process data in motion. With its significant adoption growth across all industries, I get a very valid question every week: When NOT to use Apache Kafka? What limitations does the event streaming platform have? When does Kafka simply not provide the needed capabilities? How to qualify Kafka out as it is not the right tool for the job? This blog post explores the DOs and DON’Ts. Separate sections explain when to use Kafka, when NOT to use Kafka, and when to MAYBE use Kafka.

When not to use Apache Kafka

Let’s begin with understanding why Kafka comes up everywhere these days. This clarifies the huge market demand for event streaming but also shows that there is no silver bullet solving all problems. Kafka is NOT the silver bullet for a connected world, but a crucial component!

The world gets more and more connected. Vast volumes of data are generated and need to be correlated in real-time to increase revenue, reduce costs, and reduce risks. I could pick almost any industry. Some are faster. Others are slower. But the connected world is coming everywhere. Think about manufacturing, smart cities, gaming, retail, banking, insurance, and so on. If you look at my past blogs, you can find relevant Kafka use cases for any industry.

I picked two market trends that show this insane growth of data and the creation of innovation and new cutting-edge use cases (and why Kafka’s adoption is insane across industries, too).

Connected Cars – Insane volume of telemetry data and aftersales

Here is the “Global Opportunity Analysis and Industry Forecast, 2020–2027” by Allied Market Research:

Connected Car Market Statistics – 2027

The Connected Car market includes a much wider variety of use cases and industries than most people think. A few examples: Network infrastructure and connectivity, safety, entertainment, retail, aftermarket, vehicle insurance, 3rd party data usage (e.g., smart city), and so much more.

Gaming – Billions of players and massive revenues

The gaming industry is already bigger than all other media categories combined, and this is still just the beginning of a new era – as Bitkraft depicts:

The growing Gaming market

Millions of new players join the gaming community every month across the globe. Connectivity improves, and cheap smartphones are sold in less wealthy countries. New business models like “play to earn” change how the next generation of gamers plays a game. More scalable and low latency technologies like 5G enable new use cases. Blockchain and NFTs (Non-Fungible Tokens) are changing the monetization and collection market forever.

These market trends across industries clarify why the need for real-time data processing increases significantly quarter by quarter. Apache Kafka established itself as the de facto standard for processing analytical and transactional data streams at scale. However, it is crucial to understand when (not) to use Apache Kafka and its ecosystem in your projects.

What is Apache Kafka, and what is it NOT?

Kafka is often misunderstood. For instance, I still hear way too often that Kafka is a message queue. Part of the reason is that some vendors only pitch it for a specific problem (such as data ingestion into a data lake or data warehouse) to sell their products. So, in short:

Kafka is…

  • a scalable real-time messaging platform to process millions of messages per second.
  • an event streaming platform for massive volumes of big data analytics and small volumes of transactional data processing.
  • a distributed storage layer that provides true decoupling for backpressure handling, support of various communication protocols, and replayability of events with guaranteed ordering (illustrated in the sketch below).
  • a data integration framework for streaming ETL.
  • a data processing framework for continuous stateless or stateful stream processing.

This combination of characteristics in a single platform makes Kafka unique (and successful).
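
The replayability mentioned in the list above is easy to demonstrate: because Kafka is a durable log, any consumer can re-read history by seeking to an earlier offset. The following minimal Java sketch (topic and partition are assumptions) replays one partition from the beginning:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReplayConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition partition = new TopicPartition("orders", 0); // hypothetical topic
            consumer.assign(List.of(partition));               // manual assignment, no consumer group needed
            consumer.seekToBeginning(List.of(partition));      // replay from the earliest retained offset
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}
```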

Kafka is NOT…

  • a proxy for millions of clients (like mobile apps) – but Kafka-native proxies (like REST or MQTT) exist for some use cases.
  • an API Management platform – but these tools are usually complementary and used for the creation, life cycle management, or the monetization of Kafka APIs.
  • a database for complex queries and batch analytics workloads – but good enough for transactional queries and relatively simple aggregations (especially with ksqlDB).
  • an IoT platform with features such as device management – but direct Kafka-native integration with (some) IoT protocols such as MQTT or OPC-UA is possible and the appropriate approach for (some) use cases.
  • a technology for hard real-time applications such as safety-critical or deterministic systems – but that’s true for any other IT framework, too. Embedded systems are a different software!

For these reasons, Kafka is complementary, not competitive, to these other technologies. Choose the right tool for the job and combine them!

Case studies for Apache Kafka in a connected world

This section shows a few examples of fantastic success stories where Kafka is combined with other technologies because it makes sense and solves the business problem. The focus here is case studies that need more than just Kafka for the end-to-end data flow.

No matter whether you follow my blog, Kafka Summit conferences, online platforms like Medium or DZone, or any other tech-related news, you will find plenty of success stories around real-time data streaming with Apache Kafka for high volumes of analytics and transactional data from connected cars, IoT edge devices, or gaming apps on smartphones.

A few examples across industries and use cases:

  • Audi: Connected car platform rolled out across regions and cloud providers
  • BMW: Smart factories for the optimization of the supply chain and logistics
  • SolarPower: Complete solar energy solutions and services across the globe
  • Royal Caribbean: Entertainment on cruise ships with disconnected edge services and hybrid cloud aggregation
  • Disney+ Hotstar: Interactive media content and gaming/betting for millions of fans on their smartphone
  • The list goes on and on and on.

So what is the problem with all these great IoT success stories? Well, there is no problem. But some clarification is needed to explain when to use event streaming with the Apache Kafka ecosystem and where other complementary solutions usually complement it.

When to use Apache Kafka?

Before we discuss when NOT to use Kafka, let’s understand where to use it. That makes it clearer how and when to complement it with other technologies if needed.

I will add real-world examples to each section. In my experience, this makes it much easier to understand the added value.

Kafka consumes and processes high volumes of IoT and mobile data in real-time

Processing massive volumes of data in real-time is one of the critical capabilities of Kafka.

Tesla is not just a car maker. Tesla is a tech company writing a lot of innovative and cutting-edge software. They provide an energy infrastructure for cars with their Tesla Superchargers, solar energy production at their Gigafactories, and much more. Processing and analyzing the data from their vehicles, smart grids, and factories and integrating with the rest of the IT backend services in real-time is a crucial piece of their success.

Tesla has built a Kafka-based data platform infrastructure “to support millions of devices and trillions of data points per day”. Tesla showed an exciting history and evolution of their Kafka usage at a Kafka Summit in 2019.

Keep in mind that Kafka is much more than just messaging. I repeat this in almost every blog post as too many people still don’t get it. Kafka is a distributed storage layer that truly decouples producers and consumers. Additionally, Kafka-native processing tools like Kafka Streams and ksqlDB enable real-time processing.

Kafka correlates IoT data with transactional data from the MES and ERP systems

Data integration in real-time at scale is relevant for analytics and the usage of transactional systems like an ERP or MES system. Kafka Connect and non-Kafka middleware complement the core of event streaming for this task.

BMW operates mission-critical Kafka workloads across the edge (i.e., in the smart factories) and public cloud. Kafka enables decoupling, transparency, and innovation. The products and expertise from Confluent add stability. The latter is vital for success in manufacturing. Each minute of downtime costs a fortune. Read my related article “Apache Kafka as Data Historian – an IIoT / Industry 4.0 Real-Time Data Lake” to understand how Kafka improves the Overall Equipment Effectiveness (OEE) in manufacturing.

BMW optimizes its supply chain management in real-time. The solution provides information about the right stock in place, both physically and in transactional systems like BMW’s ERP powered by SAP. “Just in time, just in sequence” is crucial for many critical applications. The integration between Kafka and SAP is required for almost 50% of customers I talk to in this space. Beyond the integration, many next-generation transactional ERP and MES platforms are powered by Kafka, too.

Kafka integrates with all the non-IoT IT in the enterprise at the edge and hybrid or multi-cloud

Multi-cluster and cross-data center deployments of Apache Kafka have become the norm rather than an exception. Learn about several scenarios that may require multi-cluster solutions and see real-world examples with their specific requirements and trade-offs, including disaster recovery, aggregation for analytics, cloud migration, mission-critical stretched deployments, and global Kafka.

The true decoupling between different interfaces is a unique advantage of Kafka vs. other messaging platforms such as IBM MQ, RabbitMQ, or MQTT brokers. I also explored this in detail in my article about Domain-driven Design (DDD) with Kafka.

Infrastructure modernization and hybrid cloud architectures with Apache Kafka are typical across industries.

One of my favorite examples is the success story from Unity. The company provides a real-time 3D development platform focusing on gaming and getting into other industries like manufacturing with their Augmented Reality (AR) / Virtual Reality (VR) features.

The data-driven company already had content installed 33 billion times in 2019, reaching 3 billion devices worldwide. Unity operates one of the largest monetization networks in the world. They migrated this platform from self-managed Kafka to fully-managed Confluent Cloud. The cutover was executed by the project team without downtime or data loss. Read Unity’s post on the Confluent Blog: “How Unity uses Confluent for real-time event streaming at scale”.

Kafka is the scalable real-time backend for mobility services and gaming/betting platforms

Many gaming and mobility services leverage event streaming as the backbone of their infrastructure. Use cases include the processing of telemetry data, location-based services, payments, fraud detection, user/player retention, loyalty platform, and so much more. Almost all innovative applications in this sector require real-time data streaming at scale.

A few examples:

  • Mobility services: Uber, Lyft, FREE NOW, Grab, Otonomo, Here Technologies, …
  • Gaming services: Disney+ Hotstar, Sony Playstation, Tencent, Big Fish Games, …
  • Betting services: William Hill, Sky Betting, …

Just look at the job portals of any mobility or gaming service. Not everybody is talking about their Kafka usage in public. But almost everyone is looking for Kafka experts to develop and operate their platform.

These use cases are just as critical as a payment process in a core banking platform. Regulatory compliance and zero data loss are crucial. Multi-Region Clusters (i.e., a Kafka cluster stretched across regions like US East, Central, and West) enable high availability with zero downtime and no data loss even in the case of a disaster.

Multi Region Kafka Cluster in Gaming for Automated Disaster Recovery

Vehicles, machines, or IoT devices embed a single Kafka broker

The edge is here to stay and grow. Some use cases require the deployment of a Kafka cluster or single broker outside a data center. Reasons for operating a Kafka infrastructure at the edge include low latency, cost efficiency, cybersecurity, or no internet connectivity.

Examples for Kafka at the edge:

  • Disconnected edge in logistics to store logs, sensor data, and images while offline (e.g., a truck on the street or a drone flying around a ship) until a good internet connection is available in the distribution center
  • Vehicle-to-Everything (V2X) communication in a local small data center like AWS Outposts (via a gateway like MQTT for a large area, a considerable number of vehicles, or a lousy network, or via a direct Kafka client connection for a few hundred machines, e.g., in a smart factory)
  • Offline mobility services like integrating a car infrastructure with gaming, maps, or a recommendation engine with locally processed partner services (e.g., the next McDonald’s is in 10 miles, here is a coupon).

The cruise line Royal Caribbean is a great success story for this scenario. It operates the four largest passenger ships in the world. As of January 2021, the line operates twenty-four ships and has six additional ships on order.

Royal Caribbean implemented one of Kafka’s most famous use cases at the edge. Each cruise ship has a Kafka cluster running locally for use cases such as payment processing, loyalty information, customer recommendations, etc.:

Swimming Retail Stores at Royal Caribbean with Apache Kafka

I covered this example and other Kafka edge deployments in various blogs. I talked about use cases for Kafka at the edge, showed architectures for Kafka at the edge, and explored low latency 5G deployments powered by Kafka.

When NOT to use Apache Kafka?

Finally, we are coming to the section everybody was looking for, right? However, it is crucial first to understand when to use Kafka. Now, it is easy to explain when NOT to use Kafka.

For this section, let’s assume that we talk about production scenarios, not some ugly (?) workarounds to connect Kafka to something directly for a proof of concept; there is always a quick and dirty option to test something – and that’s fine for that goal. But things change when you need to scale and roll out your infrastructure globally, be compliant with the law, and guarantee no data loss for transactional workloads.

With this in mind, it is relatively easy to qualify out Kafka as an option for some use cases and problems:

Kafka is NOT hard real-time

The definition of the term “real-time” is difficult. It is often a marketing term. Real-time programs must guarantee a response within specified time constraints.

Kafka – and all other frameworks, products, and cloud services used in this context – is only soft real-time and built for the IT world. Many OT and IoT applications require hard real-time with zero latency spikes.

Apache Kafka is NOT hard real time for cars and robots

Soft real-time is used for applications such as

  • Point-to-point messaging between IT applications
  • Data ingestion from various data sources into one or more data sinks
  • Data processing and data correlation (often called event streaming or event stream processing)

If your application requires sub-millisecond latency, Kafka is not the right technology. For instance, high-frequency trading is usually implemented with purpose-built proprietary commercial solutions.

Always keep in mind: The lowest latency would be to not use a messaging system at all and just use shared memory. In a race to the lowest latency, Kafka will lose every time. However, for the audit log, transaction log, or persistence engine parts of an exchange, zero data loss becomes more important than latency, and Kafka wins.

Most real-time use cases “only” require data processing in the millisecond to second range. In that case, Kafka is a perfect solution. Many FinTechs, such as Robinhood, rely on Kafka for mission-critical transactional workloads, even financial trading. Multi-access edge computing (MEC) is another excellent example of low latency data streaming with Apache Kafka and cloud-native 5G infrastructure.

Kafka is NOT deterministic for embedded and safety-critical systems

This one is pretty straightforward and related to the above section. Kafka is not a deterministic system. Safety-critical applications cannot use it for a car engine control system, a medical system such as a heart pacemaker, or an industrial process controller.

A few examples where Kafka CANNOT be used for:

  • Safety-critical data processing in the car or vehicle. That’s AUTOSAR / MISRA C / Assembler and similar technologies.
  • CAN Bus communication between ECUs.
  • Robotics. That’s C / C++ or similar low-level languages combined with frameworks such as Industrial ROS (Robot Operating System).
  • Safety-critical machine learning / deep learning (e.g., for autonomous driving)
  • Vehicle-to-Vehicle (V2V) communication. That’s 5G sidelink without an intermediary like Kafka.

My post “Apache Kafka is NOT Hard Real-Time BUT Used Everywhere in Automotive and Industrial IoT” explores this discussion in more detail.

TL;DR: Safety-related data processing must be implemented with dedicated low-level programming languages and solutions. That’s not Kafka! The same is true for any other IT software, too. Hence, don’t replace Kafka with IBM MQ, Flink, Spark, Snowflake, or any other similar IT software.

Kafka is NOT built for bad networks

Kafka requires good stable network connectivity between the Kafka clients and the Kafka brokers. Hence, if the network is unstable and clients need to reconnect to the brokers all the time, then operations are challenging, and SLAs are hard to reach.

There are some exceptions, but the basic rule of thumb is that other technologies are built specifically to solve the problem of bad networks. MQTT is the most prominent example. Hence, Kafka and MQTT are friends, not enemies. The combination is super powerful and used a lot across industries. For that reason, I wrote a whole blog series about Kafka and MQTT.

We built a connected car infrastructure that processes 100,000 data streams for real-time predictions using MQTT, Kafka, and TensorFlow in a Kappa architecture.
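
Conceptually, such a last-mile bridge is simple: an MQTT client subscribes to device topics and forwards each message to Kafka. The Java sketch below (using the Eclipse Paho MQTT client; broker addresses and topic names are assumptions) shows the idea. In production you would use a clustered MQTT broker plus a Kafka Connect MQTT connector or a dedicated scalable bridge rather than this single-threaded forwarder.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import org.eclipse.paho.client.mqttv3.MqttClient;

public class MqttToKafkaBridge {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        MqttClient mqtt = new MqttClient("tcp://localhost:1883", "kafka-bridge");
        mqtt.connect();
        // Every telemetry message from any vehicle is forwarded to one Kafka topic,
        // keyed by the MQTT topic so per-vehicle ordering is preserved
        mqtt.subscribe("vehicles/+/telemetry", (topic, message) ->
                producer.send(new ProducerRecord<>("vehicle-telemetry",
                        topic, new String(message.getPayload()))));
    }
}
```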

Kafka does NOT provide connectivity to tens of thousands of client applications

Another specific point to qualify Kafka out as an integration solution is that Kafka cannot connect to tens of thousands of clients. If you need to build a connected car infrastructure or gaming platform for mobile players, the clients (i.e., cars or smartphones) will not directly connect to Kafka.

A dedicated proxy such as an HTTP gateway or MQTT broker is the right intermediary between thousands of clients and Kafka for real-time backend processing and the integration with further data sinks such as a data lake, data warehouse, or custom real-time applications.

Where are the limits of Kafka client connections? As so often, this is hard to say. I have seen customers connect directly from their shop floor in the plant via .NET and Java Kafka clients via a direct connection to the cloud where the Kafka cluster is running. Direct hybrid connections usually work well if the number of machines, PLCs, IoT gateways, and IoT devices is in the hundreds. For higher numbers of client applications, you need to evaluate if you a) need a proxy in the middle or b) deploy “edge computing” with or without Kafka at the edge for lower latency and cost-efficient workloads.

When to MAYBE use Apache Kafka?

The last section covered scenarios where it is relatively easy to qualify Kafka out as it simply cannot provide the required capabilities. I want to explore a few less apparent topics, where it depends on several things whether Kafka is a good choice or not.

Kafka does (usually) NOT replace another database

Apache Kafka is a database. It provides ACID guarantees and is used in hundreds of companies for mission-critical deployments. However, most times, Kafka is not competitive with other databases. Kafka is an event streaming platform for messaging, storage, processing, and integration at scale in real-time with zero downtime or data loss.

Kafka is often used as a central streaming integration layer with these characteristics. Other databases can build materialized views for their specific use cases like real-time time-series analytics, near real-time ingestion into a text search infrastructure, or long-term storage in a data lake.

In summary, when you are asked whether Kafka can replace a database, there are several answers to consider:

  • Kafka can store data forever in a durable and highly available manner, providing ACID guarantees
  • Further options to query historical data are available in Kafka
  • Kafka-native add-ons like ksqlDB or Tiered Storage make Kafka more potent than ever before for data processing and event-based long-term storage
  • Stateful applications can be built leveraging Kafka clients (microservices, business applications) with no other external database – see the sketch after this list
  • Not a replacement for existing databases, data warehouses, or data lakes like MySQL, MongoDB, Elasticsearch, Hadoop, Snowflake, Google BigQuery, etc.
  • Other databases and Kafka complement each other; the right solution has to be selected for a problem; often, purpose-built materialized views are created and updated in real-time from the central event-based infrastructure
  • Different options are available for bi-directional pull and push-based integration between Kafka and databases to complement each other
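
To make the stateful-application point tangible, here is a minimal Kafka Streams sketch (topic, store, and application names are made up for illustration). It maintains a continuously updated materialized view of total scores per player in a fault-tolerant state store; the store is backed by a compacted Kafka changelog topic, so no external database is required:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

import java.util.Properties;

public class ScoreMaterializedView {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "score-materialized-view");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.Long().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Events keyed by player id, value = points earned (hypothetical topic).
        KStream<String, Long> scoreEvents = builder.stream("score-events");

        // Continuously updated materialized view: total score per player.
        // The state store is fault-tolerant because it is backed by a compacted
        // Kafka changelog topic, so no external database is needed for recovery.
        KTable<String, Long> totalScores = scoreEvents
                .groupByKey()
                .reduce(Long::sum,
                        Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("total-scores"));

        // The "total-scores" store can be queried directly from the application
        // via Kafka Streams interactive queries.
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```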

My blog post “Can Apache Kafka replace a database, data warehouse, or data lake?” discusses the usage of Kafka as a database in much more detail.

Kafka does (usually) NOT process large messages

Kafka was not built for large messages. Period.

Nevertheless, more and more projects send and process 1 MB, 10 MB, and even much bigger files and other large payloads via Kafka. One reason is that Kafka was designed for high volume and throughput – which is required for large messages. A very common example that comes up regularly is the ingestion and processing of large files from legacy systems with Kafka before ingesting the processed data into a data warehouse.

However, not all large messages should be processed with Kafka. Often, you should use the right storage system and just leverage Kafka for the orchestration. Reference-based messaging (i.e., storing the file in another storage system and sending the link and metadata) is often the better design pattern:

Apache Kafka for large message payloads and files - Alternatives and Trade-offs

Know the different design patterns and choose the right technology for your problem.
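
Here is a minimal sketch of the reference-based pattern, assuming AWS S3 as the object store (bucket, topic, and file path are placeholders): the large payload goes to object storage, and only a small pointer event with metadata travels through Kafka:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class ReferenceBasedProducer {

    public static void main(String[] args) throws Exception {
        Path file = Path.of("/data/legacy-export.csv"); // hypothetical large file

        // 1. Store the large payload in object storage, not in Kafka.
        try (S3Client s3 = S3Client.create()) {
            s3.putObject(PutObjectRequest.builder()
                            .bucket("large-files") // hypothetical bucket
                            .key(file.getFileName().toString())
                            .build(),
                    RequestBody.fromFile(file));
        }

        // 2. Send only a small reference event (link + metadata) through Kafka.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String reference = String.format(
                    "{\"bucket\":\"large-files\",\"key\":\"%s\",\"sizeBytes\":%d}",
                    file.getFileName(), Files.size(file));
            producer.send(new ProducerRecord<>("file-events", file.getFileName().toString(), reference));
        }
    }
}
```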

For more details and use cases about handling large files with Kafka, check out this blog post: “Handling Large Messages with Apache Kafka (CSV, XML, Image, Video, Audio, Files)“.

Kafka is (usually) NOT the IoT gateway for the last-mile integration of industrial protocols…

The last-mile integration with IoT interfaces and mobile apps is a tricky space. As discussed above, Kafka cannot connect to tens of thousands of clients. However, many IoT and mobile applications only require tens or hundreds of connections. In that case, a Kafka-native connection is straightforward using one of the various Kafka clients available for almost any programming language on the planet.

If a TCP-level connection with a Kafka client makes little sense or is not possible, a very prevalent workaround is the REST Proxy as the intermediary between the clients and the Kafka cluster. The clients communicate via synchronous HTTP(S) with the streaming platform.

Use cases for HTTP and REST APIs with Apache Kafka include the control plane (= management), the data plane (= producing and consuming messages), and automation / DevOps tasks.
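
For illustration, producing an event through the Confluent REST Proxy is a single HTTP POST. The following sketch uses Java's built-in HTTP client; the endpoint, topic, and payload are assumptions:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestProxyProducer {

    public static void main(String[] args) throws Exception {
        // Hypothetical REST Proxy endpoint and topic name.
        String url = "http://localhost:8082/topics/telemetry";
        String body = "{\"records\":[{\"value\":{\"deviceId\":\"sensor-42\",\"temperature\":21.5}}]}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Content-Type", "application/vnd.kafka.json.v2+json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        // The REST Proxy translates this synchronous HTTP call into a Kafka produce request.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```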

Unfortunately, many IoT projects require much more complex integrations. I am not just talking about a relatively straightforward integration via an MQTT or OPC-UA connector. Challenges in Industrial IoT projects include:

  • The automation industry often does not use open standards; its technology is slow, insecure, not scalable, and proprietary.
  • Product life cycles are very long (tens of years), with no simple changes or upgrades.
  • IIoT usually uses incompatible protocols, typically proprietary and built for one specific vendor.
  • Proprietary and expensive monoliths that are neither scalable nor extensible.

Therefore, many IoT projects complement Kafka with a purpose-built IoT platform. Most IoT products and cloud services are proprietary but provide open interfaces and architectures. The open-source space is small in this industry. A great alternative (for some use cases) is Apache PLC4X. The framework integrates with many proprietary legacy protocols, such as Siemens S7, Modbus, Allen Bradley, Beckhoff ADS, etc. PLC4X also provides a Kafka Connect connector for native and scalable Kafka integration.
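
As a flavor of such last-mile integration, here is a hedged sketch using the classic (0.x) PLC4X Java API; the PLC connection string, field name, and address syntax are illustrative, not taken from a real deployment:

```java
import org.apache.plc4x.java.PlcDriverManager;
import org.apache.plc4x.java.api.PlcConnection;
import org.apache.plc4x.java.api.messages.PlcReadRequest;
import org.apache.plc4x.java.api.messages.PlcReadResponse;

public class PlcReader {

    public static void main(String[] args) throws Exception {
        // Connect to a Siemens S7 PLC (connection string and address are illustrative).
        try (PlcConnection plc = new PlcDriverManager().getConnection("s7://192.168.0.1")) {
            PlcReadRequest request = plc.readRequestBuilder()
                    .addItem("motor-current", "%DB1.DBD4:REAL") // hypothetical data block address
                    .build();

            PlcReadResponse response = request.execute().get();
            float current = response.getFloat("motor-current");
            System.out.printf("motor-current = %.2f%n", current);

            // From here, the value would typically be produced to a Kafka topic, or,
            // even simpler, ingested via the PLC4X Kafka Connect connector.
        }
    }
}
```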

A modern data historian is open and flexible. The foundation of many strategic IoT modernization projects across the shop floor and hybrid cloud is powered by event streaming:

Apache Kafka as Data Historian in Industrial IoT (IIoT)

Kafka is NOT a blockchain (but relevant for web3, crypto trading, NFT, off-chain, sidechain, oracles)

Kafka is a distributed commit log. The concepts and foundations are very similar to a blockchain. I explored this in more detail in my post “Apache Kafka and Blockchain – Comparison and a Kafka-native Implementation“.

A blockchain should be used ONLY if different untrusted parties need to collaborate. For most enterprise projects, a blockchain is unnecessary added complexity. A distributed commit log (= Kafka) or a tamper-proof distributed ledger (= enhanced Kafka) is sufficient.

Having said this, it is interesting that I see more and more companies using Kafka within their crypto trading platforms, market exchanges, and NFT token trading marketplaces.

To be clear: Kafka is NOT the blockchain on these platforms. The blockchain is a cryptocurrency like Bitcoin or a platform providing smart contracts like Ethereum, where people build new distributed applications (dApps) like NFTs for the gaming or art industry. Kafka is the streaming platform connecting these blockchains with oracles (= the non-blockchain apps) like the CRM, data lake, data warehouse, and so on:

Apache Kafka and Blockchain - DLT - Use Cases and Architectures

TokenAnalyst is an excellent example that leverages Kafka to integrate blockchain data from Bitcoin and Ethereum with their analytics tools. Kafka Streams provides a stateful streaming application to prevent using invalid blocks in downstream aggregate calculations. For example, TokenAnalyst developed a block confirmer component that resolves reorganization scenarios by temporarily holding back blocks and only propagating them once a threshold number of confirmations (i.e., children of that block have been mined) is reached.
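
To be clear, the following is not TokenAnalyst's actual code, but a simplified sketch of the block confirmer idea using the Kafka Streams Processor API: blocks are buffered in a state store and only forwarded once enough blocks have been mined on top of them. Reorganization handling is reduced to "a later block at the same height replaces its orphaned sibling":

```java
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

// Simplified block representation (hypothetical).
record Block(long height, String hash) {}

public class BlockConfirmer implements Processor<String, Block, String, Block> {

    private static final long CONFIRMATIONS = 6; // required children before a block is trusted

    private ProcessorContext<String, Block> context;
    private KeyValueStore<Long, Block> pending; // registered on the topology as "pending-blocks"

    @Override
    public void init(ProcessorContext<String, Block> context) {
        this.context = context;
        this.pending = context.getStateStore("pending-blocks");
    }

    @Override
    public void process(Record<String, Block> record) {
        Block block = record.value();
        // Simplified reorg handling: a later block at the same height
        // overwrites its orphaned sibling before it is ever forwarded.
        pending.put(block.height(), block);

        long confirmedUpTo = block.height() - CONFIRMATIONS;
        if (confirmedUpTo < 0) {
            return;
        }

        // Forward every buffered block that now has enough confirmations on top of it.
        try (KeyValueIterator<Long, Block> it = pending.range(0L, confirmedUpTo)) {
            while (it.hasNext()) {
                KeyValue<Long, Block> confirmed = it.next();
                context.forward(new Record<>(confirmed.value.hash(), confirmed.value, record.timestamp()));
                pending.delete(confirmed.key);
            }
        }
    }
}
```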

In some advanced use cases, Kafka is used to implement a sidechain or off-chain platform, as the original blockchain does not scale well enough (blockchain data is known as on-chain data). Bitcoin is not the only blockchain with the problem of processing only single-digit (!) transactions per second. Most modern blockchain solutions cannot scale even close to the workloads Kafka processes in real-time.

From DAOs to blue-chip companies, measuring the health of blockchain infrastructure and IoT components is still necessary, even in a distributed network, to avoid downtime, secure the infrastructure, and make the blockchain data accessible. Kafka provides an agentless and scalable way to present that data to the parties involved and to make sure the relevant data is exposed to the right teams before a node is lost. This is relevant for cutting-edge Web3 IoT projects like Helium, as well as simpler closed distributed ledgers (DLT) like R3 Corda.

My recent post about live commerce powered by event streaming and Kafka transforming the retail metaverse shows how the retail and gaming industries connect virtual and physical things. The retail business process and customer communication happen in real-time, no matter if you want to sell clothes, a smartphone, or a blockchain-based NFT token for your collectible or video game.

TL;DR: Kafka is NOT…

… a replacement for your favorite database or data warehouse.

… hard real-time for safety-critical embedded workloads.

… a proxy for thousands of clients in bad networks.

… an API Management solution.

… an IoT gateway.

… a blockchain.

It is easy to qualify Kafka out for some use cases and requirements.

However, analytical and transactional workloads across all industries use Kafka. It is the de-facto standard for event streaming everywhere. Hence, Kafka is often combined with other technologies and platforms.

Where do you (not) use Apache Kafka? What other technologies do you combine Kafka with? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post When NOT to use Apache Kafka? appeared first on Kai Waehner.

Apache Kafka in Gaming (Games Industry, Bookmaker, Betting, Gambling, Video Streaming) https://www.kai-waehner.de/blog/2020/07/16/apache-kafka-gaming-games-industry-bookmaker-betting-gambling-video-streaming/ Thu, 16 Jul 2020 06:13:54 +0000 https://www.kai-waehner.de/?p=2467 This blog post explores how event streaming with Apache Kafka provides a scalable, reliable, and efficient infrastructure to…

The post Apache Kafka in Gaming (Games Industry, Bookmaker, Betting, Gambling, Video Streaming) appeared first on Kai Waehner.

This blog post explores how event streaming with Apache Kafka provides a scalable, reliable, and efficient infrastructure to make gamers happy and Gaming companies successful. Various use cases and architectures in the gaming industry are discussed, including online and mobile games, betting, gambling, and video streaming.

Learn about:

  • Real-time analytics and data correlation of game telemetry
  • Monetization network for real-time advertising and in-app purchases
  • Payment engine for betting
  • Detection of financial fraud and cheating
  • Chat function in games and cross-games
  • Monitor the results of live operations like weekend events or limited-time offers
  • Real-time analytics on metadata and chat data for marketing campaigns

The Evolution of the Gaming Industry

The gaming industry must process billions of events per day in real-time and ensure consistent and reliable data processing and correlation across gameplay interactions and backend analytics. Deployments must run globally and work for millions of users 24/7, 365 days a year.

These requirements are valid for hardcore games and blockbusters, including massively multiplayer online role-playing games (MMORPG), first-person shooters, and multiplayer online battle arenas (MOBA), but also mid-core and casual games. Reliable and scalable real-time integration with consumer devices like smartphones and game consoles is as essential as cooperating with online streaming services like Twitch and betting providers.

The Evolution of the Games Industry

Business Models in the Gaming Industry

Gaming is not just about games anymore. Even within the games industry, the options for playing diversified from consoles and PCs to mobile games, casino games, online games, and various other formats. In addition to the games themselves, people also engage via professional eSports, big-money tournaments, live video streaming, and real-time betting.

This is a crazy evolution, isn’t it? Here are some of the business models relevant today in the gaming industry:

  • Hardware sales
  • Game sales
  • Free-to-play + in-game purchases, such as skins or champions
  • Gambling (Loot boxes)
  • Game-as-a-service (Subscription)
  • Seasonal in-game purchases like passes for theme events, mid-season invitational & world championship, passes for competitive play
  • Game-Infrastructure-as-a-Service
  • Merchandise sales
  • Communities including eSports broadcast, ticket sales, franchising fees
  • Live betting
  • Video streaming, including ads, rewards, etc.

Evolution of “AI” (Artificial Intelligence) in Gaming

Artificial Intelligence (business rules, statistical models, machine learning, deep learning) is vital for many use cases in Gaming. These use cases include:

  • In-game AI: Non-playable characters (NPC), environments, features
  • Fraud detection: Cheating, financial fraud, child abuse
  • Game analytics: Retention, game changes (real-time delivery or via next patch/update)
  • Research: Find new algorithms, improve AI, adapt to business problems

Evolution of Artificial Intelligence in Gaming

Many of the use cases I explore in the following sections use AI in conjunction with event streaming and Kafka.

Hybrid Gaming Architectures for Event Streaming with Apache Kafka

The vast demand for an open, flexible, scalable platform with real-time processing is the reason why so many gaming-related projects use Apache Kafka. I will not discuss Kafka itself here and assume you know why Kafka became the de facto standard for event streaming.

What’s more interesting is the different deployments and architectures I have seen in the wild. Infrastructures in the gaming industry are often global. Sometimes cloud-only, sometimes hybrid with local on-premises installations. Betting is usually regional (mainly because of laws and compliance reasons). Games typically are global. If a game is excellent, it gets deployed and rolled out across the world.

Hybrid Kafka Architectures and Infrastructures in Gaming Games Betting Gambling - On Premise vs Public Cloud

Let’s now take a look at several different use cases and architectures in the gaming industry. Most of these examples are relevant in all gaming-related use cases, including games, mobile, betting, gambling, and video streaming.

Infrastructure Operations – Live Monitoring and Troubleshooting

Monitoring the results of live operations is essential for every mission-critical infrastructure. Use cases include:

  • Game clients, game servers, game services
  • Service health 24/7
  • Special events such as weekend tournaments, limited-time offers, and user acquisition campaigns

Immediate and correct troubleshooting requires real-time monitoring. You need to be able to answer questions like: Who causes the problem? The client? The ISP? The game itself?

Live Operations of Kafka Applications in Gaming

Let’s take a look at a typical example in the gaming industry: A new marketing campaign:

  • “Play for free over the weekend”
  • Scalability – Huge extra traffic
  • Monitoring – Was the marketing campaign successful? How profitable is the game/business?
  • Real-time (e.g., alerting)
  • Batch (e.g., analytics and reporting of success with Snowflake)

A lot of diverse data has to be integrated, correlated, and monitored to keep the infrastructure running and to troubleshoot issues.

Elasticity Is the Key for Success in the Games Industry

A key challenge in infrastructure monitoring is the required elasticity. You cannot just provision some hardware, deploy the software, and operate it 24 hours a day, 365 days a year. Gaming infrastructures require elasticity, no matter if you care about online games, betting, or video streaming.

Chris Dyl, Director of Platform at Epic Games, pointed this out well at AWS Summit 2018: “We have an almost ten times difference in workloads between peak and low-peak. Elasticity is really, really important for us in any particular region at the cloud providers”.

Confluent provides elasticity for any Kafka deployment, no matter if the event streaming platform runs self-managed at the edge or fully managed in the cloud. Check out “Scaling Apache Kafka to 10+ GB Per Second in Confluent Cloud” to see how Kafka can be scaled automatically in the cloud. Self-managed Kafka gets elastic by using tools such as Self-Balancing Kafka, Tiered Storage, and Confluent Operator for Kubernetes.

Game Telemetry – Real-time Analytics and Data Correlation with Kafka

Game telemetry describes how the player plays the game. Player information includes business events such as user actions (button clicks, shooting, using an item) and game environment metrics (quests, leveling up), as well as technical information like the login server, IP address, and location.

Global gaming requires proxies all over the world to guarantee low latency regionally for millions of clients. In addition, a central analytics cluster (with anonymized data) correlates data from across the globe. Here are some use cases for using game telemetry:

  • Game monitoring
  • How well do players progress through the game and what problems occurred
  • Live operations – Adjust the gameplay
  • Server-side changes while the player is playing the game (e.g., time-limited event, give reward)
  • Real-time updates to improve the game or align with audience needs (or in other words: recommend an item / upgrade / skin / additional in-game purchase)

Most use cases require processing big data streams in real-time:

Real time game telemetry analytics with Apache Kafka ksqlDB Kafka Connect
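
To give a flavor of such telemetry processing, here is a hedged Kafka Streams sketch (topic names and the one-minute window are assumptions) that counts game actions per player per minute, e.g., as input for live-ops dashboards or in-game recommendations:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;

import java.time.Duration;

public class TelemetryTopology {

    public static StreamsBuilder build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Telemetry events keyed by player id, value = serialized action (hypothetical topic).
        KStream<String, String> telemetry =
                builder.stream("game-telemetry", Consumed.with(Serdes.String(), Serdes.String()));

        telemetry
                .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
                .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
                .count()
                .toStream()
                // The key is now Windowed<String>; re-key to "playerId@windowStart" for the output topic.
                .map((window, count) -> KeyValue.pair(
                        window.key() + "@" + window.window().start(), count))
                .to("player-actions-per-minute", Produced.with(Serdes.String(), Serdes.Long()));

        return builder;
    }
}
```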

Big Fish Games

Big Fish Games is an excellent example of live operations leveraging Apache Kafka and its ecosystem. The company develops casual and mid-core games: their titles were installed 2.5 billion times on smartphones and computers in 150 countries, representing over 450 unique mobile games and over 3,500 unique PC games.

Live operations use real-time analytics of game telemetry data. For instance, Big Fish Games increases revenue while the player plays the game by making context-specific recommendations for in-game purchases in real-time. Kafka Streams is used for continuous data correlation in real-time at scale.

Live Operations of Kafka events at Big Fish Games with Kafka Streams

Check out the details in the Kafka Summit Talk “How Big Fish Games developed real-time analytics“.

Monetization Network

Monetization networks are a fundamental component in most gaming companies. Use cases include:

  • In-game advertising
  • Micro-transactions and in-game purchases: sell skins, upgrade to the next level, …
  • Game-Infrastructure-as-a-Service: multi-platform-and-store integration, matchmaking, advertising, player identity and friends, cross-play, lobbies, leader boards, achievements, game analytics, …
  • Partner network: Cross-sell game data, game SDK, game analytics, …

A monetization network looks like the following:

Monetization network with Apache Kafka for In-Game Transactions and Bookmaker Gambling Payments

Unity Ads – Monetization network

Unity is a fantastic example. In 2019, Unity-based content was installed 33 billion times, reaching 3 billion devices worldwide. The company provides a real-time 3D development platform.

Unity operates one of the largest monetization networks in the world:

  • Reward players for watching ads
  • Incorporate banner ads
  • Incorporate Augmented Reality (AR) ads
  • Playable ads
  • Cross-Promotions

Unity is a data-driven company:

  • Averages about half a million events per second
  • Handles millions of dollars of monetary transactions
  • Data infrastructure based on Confluent Platform, Confluent Cloud and Apache Kafka

A single data pipeline provides the foundational infrastructure for analytics, R&D, monetization, cloud services, etc. for real-time and batch processing leveraging Apache Kafka:

  • Real-time monetization network
  • Feed machine learning models in real-time
  • Data lake went from two-day latency down to 15 minutes

If you want to learn about their success story migrating this platform from self-managed Kafka to fully-managed Confluent Cloud, read Unity’s post on the Confluent Blog: “How Unity uses Confluent for real-time event streaming at scale“.

Chat Function within Games and Cross-Platform

Building a chat platform is not a trivial task in today’s world. Chatting means sending text, in-game screenshots, in-game items, and other things. Millions of events have to be processed in real-time. Cross-platform chat platforms need to support various technologies, programming languages, and communication paradigms such as real-time, batch, and request-response:

Real-time chat function at scale within games and cross-platform using Apache Kafka

The characteristics of Kafka make it the perfect infrastructure for chat platforms due to high scalability, real-time processing, and real decoupling, including backpressure handling.
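
A minimal sketch of the producing side (topic and message format are invented for illustration): keying chat messages by channel id gives per-channel ordering via Kafka's partitioning, while any number of consumer groups read the same stream independently:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class ChatMessageProducer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String channelId = "guild-42"; // key = chat channel: same channel -> same partition -> ordered
            String message = "{\"from\":\"player-1\",\"text\":\"gg\",\"ts\":1594886400000}";
            producer.send(new ProducerRecord<>("chat-messages", channelId, message));
        }
        // Mobile clients, moderation services, and analytics pipelines each consume
        // the "chat-messages" topic with their own consumer group, fully decoupled;
        // slow consumers simply lag behind instead of applying backpressure to the game.
    }
}
```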

Payment Engine

Payment infrastructure needs to be real-time, scalable, reliable, and technology-independent, no matter if your solution is built for games, betting, casinos, 3D game engines, video streaming, or other 3rd-party services.

Most payment engines in the gaming industry are built on top of Apache Kafka. Many of these companies provide public information about their real-time betting infrastructure. Here is one example of an architecture:

Real time betting infrastructure with Apache Kafka


One example use case is the implementation of a betting delay and approval system in live bets. Stateful streaming analytics is required to improve the margin:


Betting delay and approval in live bets using streaming analytics


Kafka-native technologies like Kafka Streams or ksqlDB enable a straightforward implementation of these scenarios.
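
As an illustration, the betting delay could be sketched with the Kafka Streams Processor API as follows. The Bet type, store name, and five-second hold period are assumptions, and a production system would additionally join against odds-change and market-suspension events before approving:

```java
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

import java.time.Duration;

// Hypothetical bet representation.
record Bet(String betId, String market, double stake, long placedAtMillis) {}

public class BetDelayProcessor implements Processor<String, Bet, String, Bet> {

    private static final Duration DELAY = Duration.ofSeconds(5); // hold period before approval

    private ProcessorContext<String, Bet> context;
    private KeyValueStore<String, Bet> pendingBets; // registered on the topology as "pending-bets"

    @Override
    public void init(ProcessorContext<String, Bet> context) {
        this.context = context;
        this.pendingBets = context.getStateStore("pending-bets");

        // Every second, approve all bets whose hold period has expired. A real system
        // would reject here if the odds moved or the market was suspended in the meantime.
        context.schedule(Duration.ofSeconds(1), PunctuationType.WALL_CLOCK_TIME, now -> {
            try (KeyValueIterator<String, Bet> it = pendingBets.all()) {
                while (it.hasNext()) {
                    KeyValue<String, Bet> entry = it.next();
                    if (now - entry.value.placedAtMillis() >= DELAY.toMillis()) {
                        context.forward(new Record<>(entry.key, entry.value, now)); // approved
                        pendingBets.delete(entry.key);
                    }
                }
            }
        });
    }

    @Override
    public void process(Record<String, Bet> bet) {
        pendingBets.put(bet.key(), bet.value()); // hold the bet instead of approving instantly
    }
}
```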

William Hill – A Secure and Reliable Real-time Microservice Architecture

William Hill went from a monolith to a flexible, scalable microservice architecture:

  • Kafka as central, reliable streaming infrastructure
  • Kafka for messaging, storage, cache and processing of data
  • Independent decoupled microservices
  • Decoupling and replayability
  • Technology independence
  • High throughput + low latency + real-time

William Hill’s trading platform leverages Kafka as the heart of all events and transactions:

  • “process-to-process” execution in real-time
  • Integration with analytic models for real-time machine learning
  • Various data sources and data sinks (real-time, batch, request-response)

William Hill Kafka Betting Engine

Bookmaker business == Banking Business (including Legacy Middleware and Mainframes)

Not everyone can start from a greenfield. Legacy middleware and mainframe integration, offloading, and replacement are common scenarios.

Betting is usually a regulated market. PII data is often processed on-premises in a regional data center. Non-PII data can be offloaded to the cloud for analytics.

Legacy technologies like the mainframe are a crucial cost factor, monolithic, and inflexible. I covered the relation between Kafka and mainframes in detail in a dedicated post, and also told the story about Kafka vs. legacy middleware (MQ, ETL, ESB).

Streaming Analytics for Retention, Compliance, and Customer Experience

Data quality is critical for legal compliance, including responsible gaming regulations. Client retention is vital to keep engagement and revenue growing.

Plenty of real-time streaming analytics use cases exist in this environment. Some examples where Kafka-native frameworks like Kafka Streams or ksqlDB can provide the foundation for a reliable and scalable solution:

  • Player winning / losing streak
  • Player conversion – from registration to wager (within x min), as shown in the sketch after this list
  • Game achievement of the player
  • Fraud detection – e.g., payment windows
  • Long-running windows per player over days/months
  • Tournaments
  • Incentivize unhappy players with additional free credit
  • Reports to regulator – replay old events in a guaranteed order
  • Geolocation to enable features, limitations or commissions
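
For example, the "registration to wager within x minutes" conversion can be expressed as a windowed stream-stream join. Topic names and the ten-minute window are assumptions, and a real implementation would deduplicate repeated wagers downstream:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.StreamJoined;

import java.time.Duration;

public class ConversionTopology {

    public static StreamsBuilder build() {
        StreamsBuilder builder = new StreamsBuilder();
        Consumed<String, String> consumed = Consumed.with(Serdes.String(), Serdes.String());

        // Both streams keyed by player id (hypothetical topics).
        KStream<String, String> registrations = builder.stream("player-registrations", consumed);
        KStream<String, String> wagers = builder.stream("wagers", consumed);

        // A player "converts" if a wager arrives within 10 minutes of registration.
        registrations
                .join(wagers,
                        (registration, wager) -> "{\"converted\":true}",
                        JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(10)),
                        StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String()))
                .to("player-conversions", Produced.with(Serdes.String(), Serdes.String()));

        return builder;
    }
}
```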

Stream processing is also relevant for many other use cases, including fraud detection, as you will see in the next section.

Fraud Detection in Gaming with Kafka

Real-time analytics for detecting anomalies is a widespread scenario in any payment infrastructure. In Gaming, two different kinds of fraud exist:

  • Cheating: Fake accounts, bots, …
  • Financial fraud: match-fixing, stolen credit cards, …

Here is an example of doing streaming analytics for fraud detection with Kafka, its ecosystem, and machine learning:

Streaming Analytics for Instant Payment and Fraud Detection at Scale with Apache Kafka


Here is an example of detecting financial fraud and cheating with Jupyter notebooks and Python to analyze data pre-processed with ksqlDB:

Streaming Analytics for Instant Payment and Fraud Detection at Scale with Apache Kafka

Customer 360 – Recommendations, Loyalty System, Social Integration

Customer 360 is critical for real-time and context-specific acquisition, engagement, and retention. Use cases include:

  • Real-Time Event Streaming
    • Game event triggers
    • Personalized statistics and odds
    • Player segmentation
    • Campaign orchestration (“player journey”)
  • Loyalty system
    • Rewards, e.g., an upgrade, exclusive in-game content, beta keys for the announcement event
    • Avoid customer churn
    • Cross-selling
  • Social Network integration
    • Twitter, Facebook, …
    • Example: Candy Crush (I guess every Facebook user has seen ads for this game)
  • Partner integration
    • API Management

The following architecture depicts the relation between various internal and external components of a customer 360 solution:

Customer 360, loyalty and rewards with Apache Kafka


Customer 360 at Sky Betting & Gaming

Sky Betting & Gaming has built a real-time streaming architecture for customer 360 use cases with Kafka’s ecosystem.

Here is a quote on why they chose Kafka-native frameworks like Kafka Streams instead of a zoo of technologies like Hadoop, Spark, Storm, and others:

“Most of our streaming data is in the form of topics on a Kafka cluster. This means we can use tooling designed around Kafka instead of general streaming solutions with Kafka plugins/connectors.

Kafka itself is a fast-moving target, with client libraries constantly being updated; waiting for these new libraries to be included in an enterprise distribution of Hadoop or any off the shelf tooling is not really an option. Finally, the data in our first use-case is user-generated and needs to be presented back to the user as quickly as possible.”

Disney+ Hotstar – Telco-OTT for millions of cricket fans in India

In India, people love cricket. Millions of users watch live streams on their smartphones. But they are not just watching: gambling is also part of the story. For instance, you can bet on the result of the next play. People compete with each other and can win rewards.

This infrastructure has to run at extreme scale. Millions of actions have to be processed each second. No surprise that Disney+ Hotstar chose Kafka as the heart of this infrastructure:

Hotstar Telco OTT for millions of cricket fans in India with Apache Kafka

IoT Integration is often also part of such a customer 360 implementation. Use cases include:

  • Live eSports events, TV, video streaming and news stations
  • Fan engagement
  • Audience communication
  • Entertaining features for Alexa, Google Home or sports-specific hardware

Cross-Company Kafka Integration

Last but not least, let’s talk about a trend I see in many industries: Streaming replication across departments and companies.

Most companies in the gaming industry use event streaming with Kafka at the heart of their business. However, connecting to the outside world (i.e., other departments, partners, 3rd-party services) is typically done via HTTP / REST APIs. A total anti-pattern! Not scalable! Why not stream the data directly?

Cross-Company Apache Kafka Integration - Streaming Replication and API Management

I see more and more companies moving to this approach.

API Management is an elaborate discussion on its own. Therefore, I have written a dedicated blog post about the relation between Kafka and API Management.

Slides and Video – Kafka in the Gaming Industry

Here are the slides and on-demand video recording discussing Apache Kafka in the gaming industry in more detail:

Kafka and Big Data Streaming Use Cases in the Gaming Industry

As you learned in this post, Kafka is used everywhere in the gaming industry, no matter if you focus on games, betting, or video streaming.

What are your experiences with modernizing the infrastructure and applications in the gaming industry? Did you or do you plan to use Apache Kafka and its ecosystem? What is your strategy? Let’s connect on LinkedIn and discuss it!


The post Apache Kafka in Gaming (Games Industry, Bookmaker, Betting, Gambling, Video Streaming) appeared first on Kai Waehner.
