Retail Archives - Kai Waehner
https://www.kai-waehner.de/blog/category/retail/

Retail Media with Data Streaming: The Future of Personalized Advertising in Commerce
Published March 21, 2025: https://www.kai-waehner.de/blog/2025/03/21/retail-media-with-data-streaming-the-future-of-personalized-advertising-in-commerce/


Retail media is transforming advertising by leveraging first-party data to deliver highly targeted, real-time promotions across digital and physical channels. As traditional ad models decline, retailers are monetizing their data through retail media networks, creating additional revenue streams and improving customer engagement. However, success depends on real-time data streaming—enabling instant ad personalization, dynamic bidding, and seamless attribution. Data Streaming with Apache Kafka and Apache Flink provides the foundation for this shift, allowing retailers like Albertsons to optimize advertising strategies and drive measurable results. In this post, I explore how real-time streaming is shaping the future of retail media.

Retail Media with Data Streaming using Apache Kafka and Flink

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter, and follow me on LinkedIn or X (formerly Twitter) to stay in touch. And download my free book about data streaming use cases, including various use cases from the retail industry.

What is Retail Media?

Retail media is transforming how brands advertise by leveraging first-party data from retailers to create highly targeted ads within their ecosystems. Instead of relying solely on third-party data from traditional digital advertising platforms, retail media allows companies to reach consumers at the point of purchase—whether online, in-store, or via mobile apps.

Retail media is one of the fastest-growing and most strategic revenue streams for retailers today. It has transformed from a niche digital advertising concept into a multi-billion-dollar industry, changing how retailers monetize their data and engage with brands. The following sections explore why retail media has become crucial for retailers.

Retail Media: Display with Advertisements in the Store

Retailers like Amazon, Walmart, and Albertsons are leading the way in monetizing their digital real estate, offering brands access to sponsored product placements, banner ads, video ads, and personalized promotions based on shopping behavior. This shift has made retail media one of the fastest-growing sectors in digital advertising, expected to exceed $100 billion globally in the coming years.

The Digitalization of Retail Media

Retail media has grown from traditional in-store promotions to a fully digitized, data-driven advertising ecosystem. The rise of e-commerce, mobile apps, and connected devices has enabled retailers to:

  • Collect granular consumer behavior data in real time
  • Offer personalized promotions to drive higher conversion rates
  • Provide advertisers with measurable ROI and closed-loop attribution
  • Leverage AI and machine learning for dynamic ad targeting

By integrating digital advertising with real-time customer data and real-time inventory, retailers can provide contextually relevant promotions across multiple touchpoints. The key to success lies in seamlessly connecting online and offline shopping experiences—a challenge that data streaming with Apache Kafka and Flink helps solve.

Online, Brick-and-Mortar, and Hybrid Retail Media

Retail media strategies vary depending on whether a retailer operates online, in-store, or in a hybrid model:

  • Online-Only Retail Media: Retail giants like Amazon and eBay leverage vast amounts of digital consumer data to offer programmatic ads, sponsored products, and personalized recommendations directly on their websites and apps.
  • Brick-and-Mortar Retail Media: Traditional retailers like Target and Albertsons are integrating digital signage, in-store Wi-Fi promotions, and AI-powered shelf displays to engage customers while shopping in physical stores.
  • Hybrid Retail Media: Retailers like Walmart and Kroger are bridging the gap between digital and physical shopping experiences with omnichannel marketing strategies, personalized mobile app promotions, and AI-powered customer insights that drive both online and in-store purchases.

Omnichannel vs. Unified Commerce in Retail Media

Retailers are moving beyond omnichannel marketing, where customer interactions happen across multiple channels, to unified commerce, where all customer data, inventory, and marketing campaigns are synchronized in real time.

  • Omnichannel: Offers a seamless shopping experience across different platforms but often lacks real-time data integration.
  • Unified Commerce: Uses real-time data streaming to unify customer behavior, inventory management, and personalized advertising for a more cohesive experience.

For example, a unified commerce strategy allows a retailer to recognize a shopper across web, app, and store, check live inventory, and trigger a personalized promotion at the exact moment of purchase intent.

This level of integration is only possible with real-time data streaming using technologies such as Apache Kafka and Apache Flink.

Retail media networks require real-time data processing at scale to manage millions of customer interactions across online and offline touchpoints. Kafka and Flink provide the foundation for a scalable, event-driven infrastructure that enables retailers to:

  • Process customer behavior in real time: Tracking clicks, searches, and in-store activity instantly
  • Deliver hyper-personalized ads and promotions: AI-driven dynamic ad targeting
  • Optimize inventory and pricing: Aligning promotions with real-time stock levels
  • Measure campaign performance instantly: Providing brands with real-time attribution and insights

Event-Driven Architecture with Data Streaming for Retail Media with Apache Kafka and Flink

With Apache Kafka as the backbone for data streaming and Apache Flink for real-time analytics, retailers can ingest, analyze, and act on consumer data within milliseconds.
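To make the ingestion side tangible, here is a minimal sketch of publishing a customer interaction event to Kafka with the Java producer API. The topic name, key, and JSON payload are illustrative assumptions rather than a prescribed schema; in practice you would typically use Avro or Protobuf with a schema registry.

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import java.util.Properties;

    public class CustomerInteractionProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Key by customer ID so all events of one shopper stay ordered in the same partition
                String key = "customer-4711";
                String value = "{\"event\":\"product_view\",\"sku\":\"soda-001\",\"channel\":\"mobile-app\"}";
                producer.send(new ProducerRecord<>("customer-interactions", key, value));
                producer.flush();
            }
        }
    }

Downstream, a Flink job or Kafka Streams application subscribes to this topic and correlates the events with inventory, loyalty, and campaign data, as the sketches further below illustrate.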

Here are a few examples of input data sources, stream processing applications, and outputs for other systems:

Input Data Sources for Retail Media

  1. Customer transaction data (e.g., point-of-sale purchases, online orders)
  2. Website and app interactions (e.g., product views, searches, cart additions)
  3. Loyalty program data (e.g., customer preferences, purchase frequency)
  4. Third-party ad networks (e.g., campaign performance data, audience segments)
  5. In-store sensor and IoT data (e.g., foot traffic, digital shelf interactions)

Stream Processing Applications for Retail Media

  1. Real-time advertisement personalization engine (customizes promotions based on live behavior)
  2. Dynamic pricing optimization (adjusts ad bids and discounts in real-time)
  3. Customer segmentation & targeting (creates audience groups based on behavioral signals)
  4. Fraud detection & clickstream analysis (identifies bot traffic and fraudulent ad clicks)
  5. Omnichannel attribution modeling (correlates ads with online and offline purchases)

Output Systems for Retail Media

  1. Retail media network platforms (e.g., sponsored product listings, display ads)
  2. Programmatic ad exchanges (e.g., Google Ads, The Trade Desk, Amazon DSP)
  3. CRM & marketing automation tools (e.g., Salesforce, Adobe Experience Cloud)
  4. Business intelligence dashboards (e.g., Looker, Power BI, Tableau)
  5. In-store digital signage & kiosks (personalized promotions for physical shoppers)

Real-time data streaming with Kafka and Flink enables critical retail media use cases by processing vast amounts of data from customer interactions, inventory updates and advertising platforms. The ability to analyze and act on data instantly allows retailers to optimize ad placements, enhance personalization, and measure the effectiveness of marketing campaigns with unprecedented accuracy. Below are some of the most impactful retail media applications powered by event-driven architectures.

Personalized In-Store Promotions

Retailers can use real-time customer location data, combined with purchase history and preferences, to deliver highly personalized promotions through mobile apps or digital signage. By incorporating location-based services (LBS), the system detects when a shopper enters a specific section of a store and triggers a targeted discount or special offer. For example, a customer browsing the beverage aisle might receive a notification offering 10% off their favorite soda, increasing the likelihood of an impulse purchase.
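As a minimal sketch (written with Kafka Streams here; the same logic maps to Flink or ksqlDB), the snippet joins a stream of in-store location events, keyed by customer ID, with a table of customer preferences and emits a targeted offer. Topic names, payload handling, and the matching rule are simplified assumptions for illustration, not a production design.

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;
    import java.util.Properties;

    public class InStorePromotions {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();

            // Latest known preference per customer (compacted topic, keyed by customer ID)
            KTable<String, String> preferences = builder.table("customer-preferences");

            // Location events from the mobile app or in-store beacons, keyed by customer ID
            KStream<String, String> locations = builder.stream("in-store-locations");

            locations
                .join(preferences, (aisle, preference) ->
                        aisle.contains("beverage") && preference.contains("soda")
                                ? "{\"offer\":\"10% off your favorite soda\"}"
                                : null)
                .filter((customerId, offer) -> offer != null)
                .to("mobile-promotions");

            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "in-store-promotions");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            new KafkaStreams(builder.build(), props).start();
        }
    }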

Dynamic Ad Placement & Bidding

Kafka and Flink power real-time programmatic advertising, enabling retailers to dynamically adjust ad placements and bids based on customer activity and shopping trends. This allows advertisers to serve the most relevant ads at the optimal time, maximizing engagement and conversions. For instance, Walmart Connect continuously analyzes in-store and online behavior to adjust which ads appear on product pages or search results, ensuring brands reach the right shoppers at the right moment.

Inventory-Aware Ad Targeting

Real-time inventory tracking ensures that advertisers only bid on ads for products that are in stock and ready for fulfillment, reducing wasted ad spend and improving customer satisfaction. This integration between retail media networks and inventory systems prevents scenarios where customers click on an ad only to find the item unavailable. For example, if a popular TV model is running low in a specific store, the system can prioritize ads for a similar in-stock product, ensuring a seamless shopping experience.
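A hedged sketch of this check in Kafka Streams: ad candidates keyed by SKU are joined against a table of current stock levels, and only ads for in-stock items are forwarded. Topic names and the integer stock encoding are assumptions; the StreamsBuilder and application configuration follow the previous example.

    // Current stock level per SKU, maintained as a table from a compacted topic (assumed layout)
    KTable<String, Integer> stock = builder.table("current-stock",
            Consumed.with(Serdes.String(), Serdes.Integer()));

    // Ad candidates keyed by SKU; drop everything that is not available right now
    builder.stream("ad-candidates", Consumed.with(Serdes.String(), Serdes.String()))
            .join(stock, (ad, stockLevel) -> stockLevel > 0 ? ad : null)
            .filter((sku, ad) -> ad != null)
            .to("eligible-ads", Produced.with(Serdes.String(), Serdes.String()));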

Fraud Detection & Brand Safety

Retailers must protect their media platforms from click fraud, fake engagement, and suspicious transactions, which can distort performance metrics and drain marketing budgets.

Kafka and Flink enable real-time fraud detection by analyzing patterns in ad clicks, user behavior, and IP addresses to identify bots or fraudulent activity. For example, if an unusual spike in ad impressions originates from a single source, the system can immediately block the traffic, safeguarding advertisers’ investments.
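One common pattern, sketched below with Kafka Streams (the same windowing logic maps to Flink): count ad clicks per source within a short window and flag sources that exceed a threshold. The topics, the one-minute window, and the threshold of 1,000 clicks are illustrative assumptions, not recommendations.

    // Ad click events keyed by source identifier, e.g. IP address or device ID (assumed layout)
    builder.stream("ad-clicks", Consumed.with(Serdes.String(), Serdes.String()))
            .groupByKey()
            .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
            .count()
            .toStream()
            .filter((windowedSource, clicks) -> clicks > 1000)
            .map((windowedSource, clicks) -> KeyValue.pair(windowedSource.key(), "suspicious: " + clicks + " clicks/min"))
            .to("flagged-sources", Produced.with(Serdes.String(), Serdes.String()));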

Real-Time Attribution & Measurement

Retail media networks must provide advertisers with instant insights into ad performance by linking online interactions to in-store purchases.

Kafka enables event-driven attribution models, allowing brands to measure how digital ads drive real-world sales. For example, if a customer clicks on an ad for running shoes, visits a store, and buys them later, the platform tracks the conversion in real time, ensuring brands understand the full impact of their campaigns. Solutions like Segment (built on Kafka) provide robust customer data platforms (CDPs) that help retailers unify and analyze customer journeys.
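As an illustrative sketch, a simple attribution model can be expressed as a stream-stream join: ad clicks and purchases, both keyed by customer ID, are correlated within a 24-hour window. Topic names, the window size, and the string payloads are assumptions; real attribution logic is usually far more nuanced.

    KStream<String, String> adClicks = builder.stream("ad-clicks-by-customer",
            Consumed.with(Serdes.String(), Serdes.String()));
    KStream<String, String> purchases = builder.stream("purchases-by-customer",
            Consumed.with(Serdes.String(), Serdes.String()));

    // Emit an attribution event when a purchase happens within 24 hours of an ad click
    adClicks.join(purchases,
            (click, purchase) -> "{\"attributed\":true,\"click\":" + click + ",\"purchase\":" + purchase + "}",
            JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofHours(24)),
            StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String()))
        .to("ad-attributions", Produced.with(Serdes.String(), Serdes.String()));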

Retail Media as an Advertising Channel for Third-Party Brands

Retailers are increasingly leveraging third-party data sources to bridge the gap between retail media networks and adjacent industries, such as quick-service restaurants (QSRs).

Kafka enables seamless data exchange between grocery stores, delivery apps, and restaurant chains, optimizing cross-industry advertising. For example, a burger chain could dynamically adjust digital menu promotions based on real-time data from a retail partner—if a grocery store’s sales data shows a surge in plant-based meat purchases, the restaurant could prioritize ads for its new vegan burger, ensuring more relevant and effective marketing.

Albertsons’ New Retail Media Strategy Leveraging Data Streaming

One of the most innovative retail media success stories comes from Albertsons. Albertsons is one of the largest supermarket chains in the United States, operating over 2,200 stores under various banners, including Safeway, Vons, and Jewel-Osco, and providing groceries, pharmacy services, and household essentials.

I explored Albertsons in another article about its revamped loyalty platform to retain customers for life. Data streaming is essential and a key strategic part of Albertsons’ enterprise architecture:

Albertsons Retail Enterprise Architecture for Data Streaming powered by Apache Kafka
Source: Albertsons (Confluent Webinar)

When I hosted a webinar with Albertsons around two years ago on their data streaming strategy, retail media was one of the bullet points. But I didn’t realize back then how crucial it would become for retailers:

  • Retail Media Network Expansion: Albertsons has launched its own retail media network, leveraging first-party data to create highly targeted advertising campaigns.
  • Real-Time Personalization: With real-time data streaming, Albertsons can provide personalized promotions based on customer purchase history, in-store behavior, and digital engagement.
  • AI-Powered Insights: Albertsons uses AI and machine learning on top of streaming data pipelines to optimize ad placements, campaign effectiveness, and dynamic pricing strategies.
  • Data Monetization: By offering data-driven advertising solutions, Albertsons is monetizing its shopper data while enhancing the customer experience with relevant, timely promotions.

Business Value of Real-Time Retail Media

Retailers that adopt data streaming with Kafka and Flink for their retail media strategies unlock massive business value:

  • New Revenue Streams: Retail media monetization drives ad sales growth
  • Higher Conversion Rates: Real-time targeting improves customer engagement
  • Better Customer Insights: Streaming analytics enables deep behavioral insights
  • Competitive Advantage: Retailers with real-time personalization outperform rivals
  • Better Customer Experience: Retail media reduces friction and enhances the shopping journey through personalized promotions

The Future of Retail Media is Real-Time and Context-Specific Data Streaming

Retail media is no longer just about placing ads on retailer websites—it’s about delivering real-time, data-driven advertising experiences across every consumer touchpoint.

With Kafka and Flink powering real-time data streaming, retailers can:

  • Unify online and offline shopping experiences
  • Enhance personalization with AI-driven insights
  • Maximize ad revenue with real-time campaign optimization

As retailers like Albertsons, Walmart, and Amazon continue to innovate, the future of retail media will be hyper-personalized, data-driven, and real-time.

How is your organization using real-time data for retail media? Stay ahead of the curve in retail innovation! Subscribe to my newsletter for insights into data streaming and connect with me on LinkedIn to continue the conversation. And download my free book about data streaming use cases and success stories in the retail industry.


A New Era in Dynamic Pricing: Real-Time Data Streaming with Apache Kafka and Flink
Published November 14, 2024: https://www.kai-waehner.de/blog/2024/11/14/a-new-era-in-dynamic-pricing-real-time-data-streaming-with-apache-kafka-and-flink/


In the age of digitization, the concept of pricing is no longer fixed or manual. Instead, companies increasingly use dynamic pricing — a flexible model that adjusts prices based on real-time market changes. Data streaming technologies like Apache Kafka and Apache Flink have become integral to enabling this real-time responsiveness, giving companies the tools they need to respond instantly to demand, competitor prices, and customer behaviors. This blog post explores the fundamentals of dynamic pricing, its link to data streaming, and real-world examples of how different industries such as retail, logistics, gaming and the energy sector leverage this powerful approach to get ahead of the competition.

Dynamic Pricing with Data Streaming using Apache Kafka and Flink

What is Dynamic Pricing?

Dynamic pricing is a strategy where prices are adjusted automatically based on real-time data inputs, such as demand, customer behavior, supply levels, and competitor actions. This model allows companies to optimize profitability, boost sales, and better meet customer expectations.

Relevant Industries and Examples

Dynamic pricing has applications across many industries:

  • Retail and eCommerce: Dynamic pricing in eCommerce helps adjust product prices based on stock levels, competitor actions, and customer demand. Companies like Amazon frequently update prices on millions of products, using dynamic pricing to maximize revenue.
  • Transportation and Mobility: Ride-sharing companies like Uber and Grab adjust fares based on real-time demand and traffic conditions. This is commonly known as “surge pricing.”
  • Gaming: Context-specific in-game add-ons or virtual items are offered at varying prices based on player engagement, time spent in-game, and special events or levels.
  • Energy Markets: Dynamic pricing in energy adjusts rates in response to demand fluctuations, energy availability, and wholesale costs. This approach helps to stabilize the grid and manage resources.
  • Sports and Entertainment Ticketing: Ticket prices for events are adjusted based on seat availability, demand, and event timing to allow venues and ticketing platforms to balance occupancy and maximize ticket revenue.
  • Hospitality: Hotels adapt room rates and promotions in real time based on demand, seasonality, and guest behavior, using dynamic pricing models.

These industries have adopted dynamic pricing to maintain profitability, manage supply-demand balance, and enhance customer satisfaction through personalized, responsive pricing.

Dynamic pricing relies on up-to-the-minute data on market and customer conditions, making real-time data streaming critical to its success. Traditional batch processing, where data is collected and processed periodically, is insufficient for dynamic pricing. It introduces delays that could mean lost revenue opportunities or suboptimal pricing. This scenario is where data streaming technologies come into play.

  • Apache Kafka serves as the real-time data pipeline, collecting and distributing data streams from diverse sources, for instance user behavior on websites, competitor pricing, social media signals, IoT data, and more. Kafka’s capability to handle high throughput and low latency makes it ideal for ingesting large volumes of data continuously.
  • Apache Flink processes the data in real time, applying complex algorithms to identify pricing opportunities as conditions change. With Flink’s support for stream processing and complex event processing, businesses can apply sophisticated logic to assess and adjust prices based on multiple real-time factors.

Dynamic Pricing with Apache Kafka and Flink in Retail eCommerce

Together, Kafka and Flink create a powerful foundation for dynamic pricing, enabling real-time data ingestion, analysis, and action. This empowers companies to implement pricing models that are not only highly responsive but also resilient and scalable.
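To make the division of labor concrete, here is a compact sketch of a dynamic pricing topology (written with Kafka Streams for brevity; the Flink-based setups discussed below express the same idea with keyed windows). Demand events per product are counted in five-minute windows and joined with a table of base prices to emit an adjusted price. Topic names, the window size, and the naive 20% surge rule are purely illustrative assumptions.

    // (imports and application configuration omitted for brevity)
    StreamsBuilder builder = new StreamsBuilder();

    // Latest base price per product (compacted topic, keyed by product ID) - assumed layout
    KTable<String, Double> basePrices = builder.table("base-prices",
            Consumed.with(Serdes.String(), Serdes.Double()));

    // Demand signals such as views, add-to-cart events, or competitor updates, keyed by product ID
    KStream<String, String> demandEvents = builder.stream("demand-events",
            Consumed.with(Serdes.String(), Serdes.String()));

    // Count demand per product in 5-minute windows and re-key the result by product ID
    KStream<String, Long> demandPerProduct = demandEvents
            .groupByKey()
            .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
            .count()
            .toStream()
            .map((windowedKey, count) -> KeyValue.pair(windowedKey.key(), count));

    // Apply a simple surge factor when demand is high and publish the adjusted price
    demandPerProduct
            .join(basePrices,
                    (count, basePrice) -> String.valueOf(count > 100 ? basePrice * 1.2 : basePrice),
                    Joined.with(Serdes.String(), Serdes.Long(), Serdes.Double()))
            .to("dynamic-prices", Produced.with(Serdes.String(), Serdes.String()));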

Clickstream Analytics in Real-Time with Data Streaming Replacing Batch with Hadoop and Spark

Years ago, companies relied on Hadoop and Spark to run batch-based clickstream analytics. Data engineers ingested logs from websites, online stores, and mobile apps to gather insights. Processing took hours. Therefore, any promotional offer or discount often arrived a day later — by which time the customer may have already made their purchase elsewhere, like on Amazon.

With today’s data streaming platforms like Kafka and Flink, clickstream analytics has evolved to support real-time, context-specific engagement and dynamic pricing. Instead of waiting on delayed insights, businesses can now analyze customer behavior as it happens, instantly adjusting prices and delivering personalized offers in the moment. This dynamic pricing capability allows companies to respond immediately to high-intent customers, presenting tailored prices or promotions when they’re most likely to convert. Dynamic pricing with Kafka and Flink creates a seamless, timely shopping experience that maximizes sales and customer satisfaction.

Here’s how businesses across various sectors are harnessing Kafka and Flink for dynamic pricing.

  • Retail: Hyper-Personalized Promotions and Discounts
  • Logistics and Transportation: Intelligent Tolling
  • Technology: Surge Pricing
  • Energy Markets: Manage Supply-Demand and Stabilize Grid Loads
  • Gaming: Context-Specific In-Game Add-Ons
  • Sports and Entertainment: Optimize Ticketing Sales

Learn more about data streaming with Kafka and Flink for dynamic pricing in the following success stories:

AO: Hyper-Personalized Promotions and Discounts (Retail and eCommerce)

AO, a major UK eCommerce retailer, leverages data streaming for dynamic pricing to stay competitive and drive higher customer engagement. By ingesting real-time data on competitor prices, customer demand, and inventory stock levels, AO’s system processes this information instantly to adjust prices in sync with market conditions. This approach allows AO to seize pricing opportunities and align closely with customer expectations. The result is a 30% increase in customer conversion rates.

AO Retail eCommerce Hyper Personalized Online and Mobile Experience

Dynamic pricing has also allowed AO to provide a hyper-personalized shopping experience, delivering relevant product recommendations and timely promotions. This real-time responsiveness has enhanced customer satisfaction and loyalty, as customers receive offers that feel customized to their needs. During high-traffic periods like holiday sales, AO’s dynamic pricing ensures competitiveness and optimizes margins. This drives both profitability and customer retention. The company has applied this real-time approach not just to pricing, but also to other areas like delivery to make things run smoother. The retailer is now much more efficient and provides better customer service.

Quarterhill: Intelligent Tolling (Logistics and Transportation)

Quarterhill, a leader in tolling and intelligent transportation systems, uses Kafka and Flink to implement dynamic toll pricing. Kafka ingests real-time data from traffic sensors and road usage patterns. Flink processes this data to determine congestion levels and calculate the optimal toll based on real-time conditions.

Quarterhill – Intelligent Roadside Enforcement and Compliance

This dynamic pricing strategy allows Quarterhill to manage road congestion effectively, reward off-peak travel, and optimize toll revenues. This system not only improves travel efficiency but also helps regulate traffic flows in high-density areas, providing value both to drivers and the city infrastructure.

Uber, Grab, and FreeNow: Surge Pricing (Technology)

Ride-sharing companies like Uber, Grab, and FreeNow are widely known for their dynamic pricing or “surge pricing” models. With data streaming, these platforms capture data on demand, supply (available drivers), location, and traffic in real time. This data is processed continuously by Apache Flink, Kafka Streams or other stream processing engines to calculate optimal pricing, balancing supply with demand, while considering variables like route distance and current traffic.

Dynamic Surge Pricing at Mobility Service MaaS Freenow with Kafka and Stream Processing
Source: FreeNow

Surge pricing enables these companies to provide incentives for drivers to operate in high-demand areas, maintaining service availability and ensuring customer needs are met during peak times. This real-time pricing model improves revenue while optimizing customer satisfaction through prompt service availability.

Uber’s Kappa Architecture is an excellent example of how to build a data pipeline for dynamic pricing and many other use cases with Kafka and Flink:

Kappa Architecture with Apache Kafka at Mobility Service Uber
Source: Uber

2K Games / Take-Two Interactive: Context-Specific In-Game Purchases (Gaming Industry)

In the gaming industry, dynamic pricing is becoming a strategy to improve player engagement and monetize experiences. Many gaming companies use Kafka and Flink to capture real-time data on player interactions, time spent in specific game sections, and in-game events. This data enables companies to offer personalized pricing for in-game items, bonuses, or add-ons, adjusting prices based on the player’s current engagement level and recent activities.

For instance, if players are actively taking part in a particular game event, they may be offered special discounts or dynamic prices on related in-game assets. Thereby, the gaming companies improve conversion rates and player engagement while maximizing revenue.

2K Games, a leading video game publisher and subsidiary of Take-Two Interactive, has shifted from batch to real-time analytics to enhance player engagement across popular franchises like BioShock, NBA 2K, and Borderlands. By leveraging Confluent Cloud as a fully managed data streaming platform, the publisher scales dynamically to handle high traffic, processing up to 3000 MB per second to serve 4 million concurrent users.

2K Games Take Two Interactive - Bridging the Gap And Overcoming Tech Hurdles to Activate Data
Source: 2K Games

Real-time telemetry analytics now allow them to analyze player actions and context instantly, enabling personalized, context-specific promotions and enhancing the gaming experience. Cost efficiencies are achieved through data compression, tiered storage, and reduced data transfer, making real-time engagement both effective and economical.

50hertz: Manage Supply-Demand and Stabilize Grid Loads (Energy Markets)

Dynamic pricing in energy markets is essential for managing supply-demand fluctuations and stabilizing grid loads. With Kafka, energy providers ingest data from smart meters, renewable energy sources, and weather feeds. Flink processes this data in real time, adjusting energy prices based on grid conditions, demand levels, and renewable supply availability.

50Hertz, as a leading electricity transmission system operator, indirectly (!) affects dynamic pricing in the energy market by sharing real-time grid data with partners and energy providers. This allows energy providers and market operators to adjust prices dynamically based on real-time insights into supply-demand fluctuations and grid stability.

To support this, 50Hertz is modernizing its SCADA systems with data streaming technologies to enable real-time data capture and distribution that enhances grid monitoring and responsiveness.

Data Streaming with Apache Kafka and Flink to Modernize SCADA Systems

This real-time pricing approach helps encourage consumption when renewable energy is abundant and discourages usage during peak times, leading to optimized energy distribution, grid stability, and improved sustainability.

Ticketmaster: Optimize Ticketing Sales (Sports and Entertainment)

In ticketing, dynamic pricing allows for optimized revenue management based on demand and availability. Companies like Ticketmaster use Kafka to collect data on ticket availability, sales velocity, and even social media sentiment surrounding events. Flink processes this data to adjust prices based on real-time market conditions, such as proximity to the event date and current demand.

By dynamically pricing tickets, event organizers can maximize seat occupancy, boost revenue, and respond to last-minute demand surges, ensuring that prices reflect real-time interest while enhancing fan satisfaction.

Real-time inventory data streams allow Ticketmaster to monitor ticket availability, pricing, and demand as they change moment-to-moment. With data streaming through Apache Kafka and Confluent Platform, Ticketmaster tracks sales, venue capacity, and customer behavior in a single, live inventory stream. This enables quick responses, such as adjusting prices for high-demand events or boosting promotions where conversions lag. Teams gain actionable insights to forecast demand accurately and optimize inventory. This approach ensures fans have timely access to tickets. The result is a dynamic, data-driven approach that enhances customer experience and maximizes event success.

Conclusion: Business Value of Dynamic Pricing Built with Data Streaming

Dynamic pricing powered by data streaming with Apache Kafka and Flink brings transformative business value by:

  • Maximizing Revenue and Margins: Real-time price adjustments enable companies to capture value during demand surges, optimize for competitive conditions, and maintain healthy margins.
  • Improving Operational Efficiency: By automating pricing decisions based on real-time data, organizations can reduce manual intervention, speed up reaction times, and allocate resources more effectively.
  • Boosting Customer Satisfaction: Responsive pricing models allow companies to meet customer expectations in real time, leading to improved customer loyalty and engagement.
  • Supporting Sustainability Goals: In energy and transportation, dynamic pricing helps manage resources and reward environmentally friendly behaviors. Examples include off-peak travel and renewable energy usage.
  • Empowering Strategic Decision-Making: Real-time data insights provide business leaders with the information needed to adjust strategies and respond to developing market demands quickly.

Building a dynamic pricing system with Kafka and Flink represents a strategic investment in business agility and competitive resilience. Using data streaming to set prices instantly, businesses can stay ahead of competitors, improve customer service, and become more profitable. Dynamic pricing powered by data streaming is more than just a revenue tool; it’s a vital lever for driving growth, differentiation, and long-term success.

Did you already implement dynamic pricing? What is your data platform and strategy? Do you use Apache Kafka and Flink? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.


Unified Commerce in Retail and eCommerce with Apache Kafka and Flink for Real-Time Customer 360
Published August 30, 2024: https://www.kai-waehner.de/blog/2024/08/30/unified-commerce-in-retail-and-ecommerce-with-apache-kafka-and-flink-for-real-time-customer-360/


Delivering a seamless and personalized customer experience across all touchpoints is essential for staying competitive in today’s rapidly evolving retail and e-commerce landscape. Unified commerce integrates all sales channels and backend systems into a single platform to ensure real-time consistency in customer interactions, inventory management, and order fulfillment. Leveraging the power of data streaming with Apache Kafka and Apache Flink, businesses can harness real-time data streaming to build a comprehensive Customer 360 view and to enable instant insights and tailored experiences. This approach not only enhances operational efficiency but also drives customer loyalty by offering a truly unified and responsive shopping experience. This blog post explores how Kafka and Flink can be pivotal in achieving real-time Customer 360 in the unified commerce ecosystem and how it differs from traditional omnichannel approaches.

Unified Commerce with Data Streaming using Apache Kafka and Flink at the Edge and in the Cloud

What is Unified Commerce?

Unified commerce is an approach that integrates all customer-facing channels and backend systems into a single platform, providing a seamless, consistent experience across every touchpoint, whether online, in-store, or via mobile. Unlike traditional multichannel or omnichannel strategies, where different channels might operate independently or with partial integration, unified commerce brings everything together in real time. This includes inventory management, customer data, order fulfillment, and payment processing, all managed by a central system.

What is the Difference between Unified Commerce and Omnichannel?

While both unified commerce and omnichannel aim to provide a seamless customer experience across different channels, they differ in their approach to integration:

  • Omnichannel: In an omnichannel strategy, businesses integrate multiple channels (such as online, in-store, and mobile apps) to create a consistent customer experience. However, these channels often run on separate systems, requiring middleware or manual processes to synchronize data across platforms. This can lead to delays, inconsistencies, and a fragmented view of the customer.
  • Unified Commerce: Unified commerce takes omnichannel a step further by consolidating all channels into a single platform that operates in real-time. This means inventory, customer information, and order data are updated instantly across all touchpoints to provide a more cohesive and responsive experience. It eliminates the silos between channels, ensuring that customers have a truly unified experience, no matter where or how they interact with the brand.

Gartner: In-Store for Retail = Ground Control for Space Operations

“The retail store plays the same role as ground control for space operations,” said Joanne Joliet, Senior Director Analyst at Gartner. The following Gartner slide already implies how crucial hybrid connectivity and real-time correlation are for processing data in real time:

Gartner - Unified Commerce in Motion
Source: Gartner

The summary of a Gartner IT Symposium/Xpo quotes a few interesting statements:

  • “Even as online retail increased by 44% during the COVID-19 pandemic, physical retail stores are indispensable. 61% of shoppers prefer to return online orders to stores, which makes retail stores a key component of a retailer’s overall success.”
  • “Stores need to be connected to reduce data latency and provide situational awareness. Smart checkout, robotic process automation, algorithmic retailing, contextualized real time pricing, conversational commerce are some of the technologies that can help achieve unified commerce.”
  • Retailers who create fluidity and flexibility by building a unified commerce ecosystem will be the ones who succeed.

Relevant Industries

Unified commerce is relevant in industries where customer experience and channel integration are critical for success. These include:

  • Retail: To provide a seamless shopping experience across online stores, physical locations, and mobile apps.
  • Hospitality: Integrating booking, dining, and customer service channels to enhance the guest experience.
  • Food and Beverage: Managing orders from various sources like in-store, online, and delivery apps, with consistent inventory and customer data.
  • E-commerce: Ensuring that online sales platforms are fully integrated with inventory, logistics, and customer service.
  • Healthcare: Unifying patient management systems, appointment scheduling, telemedicine, and billing.

Data streaming with Apache Kafka and Apache Flink can significantly enhance unified commerce by enabling real-time data integration, processing, and analysis across various channels and backend systems.

  • Real-Time Inventory Management: Kafka can stream real-time inventory updates from different channels (e.g., online stores, physical stores, warehouses) into a unified platform. Flink can then process this data to ensure that inventory levels are consistently accurate across all touchpoints, reducing stockouts and overselling.
  • Personalized Customer Experience: Kafka can stream customer interaction data from various sources, like websites, mobile apps, and in-store transactions. Flink can process this data in real time to provide personalized recommendations, targeted promotions, and a consistent shopping experience across channels.
  • Order Tracking and Fulfillment: Kafka can stream order status updates from various fulfillment centers. Flink can process these streams to provide customers with real-time updates on their order status, ensuring transparency and enhancing the customer experience.
  • Fraud Detection: Kafka can stream transaction data in real time. Flink can analyze this data to detect patterns indicative of fraudulent activities. This allows businesses to respond instantly to potential threats, protecting both the business and its customers.
  • Real-Time Analytics and Reporting: Kafka can collect data from all commerce activities. Flink can process and analyze this data in real time to generate up-to-date insights for decision-making. This enables businesses to adapt quickly to market trends, customer behaviors, and operational needs.

Here is an example of a hybrid retail architecture powered by an event-driven architecture using data streaming to enable Unified Commerce:

Hybrid Edge to Global Retail Architecture with Apache Kafka

By leveraging Kafka and Flink, unified commerce platforms can achieve high responsiveness, scalability, and a seamless customer experience.
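A hedged sketch of the inventory piece with Kafka Streams (Flink expresses the same aggregation with keyed state): per-store stock movements such as sales, returns, and deliveries are aggregated into a current stock level per SKU that every channel reads from. Topic names and the signed integer delta encoding are assumptions.

    // Given a StreamsBuilder named builder, configured as in a typical Kafka Streams application.
    // Stock movements keyed by SKU; the value is the signed change (+ delivery, - sale)
    builder.stream("inventory-changes", Consumed.with(Serdes.String(), Serdes.Integer()))
            .groupByKey()
            .aggregate(
                    () -> 0,
                    (sku, delta, currentStock) -> currentStock + delta,
                    Materialized.with(Serdes.String(), Serdes.Integer()))
            .toStream()
            .to("current-stock", Produced.with(Serdes.String(), Serdes.Integer()));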

Build vs. Buy a Unified Commerce Platform?

When choosing between building a unified commerce platform or buying one, consider that building allows for tailored customization but requires significant time, resources, and expertise. Buying a pre-built solution offers quicker deployment and vendor support, though it may have limitations in flexibility, scalability, and cost efficiency.

BUY: Tools, Products and SaaS Cloud Services for Unified Commerce

Here are some tools, products, and SaaS solutions that support unified commerce:

  • Salesforce Commerce Cloud: Provides a unified platform for managing e-commerce, in-store sales, and customer data.
  • Shopify Plus: Offers a scalable platform that unifies online and offline sales, including robust POS integration.
  • Oracle Retail: Integrates retail operations, including inventory management, e-commerce, and customer experience management.
  • Lightspeed: A cloud-based platform for retail and hospitality businesses, offering POS, e-commerce, and inventory management.
  • Square for Retail: Combines POS, online sales, and customer management in a unified platform, ideal for small to medium-sized businesses.
  • Magento (Adobe Commerce): A customizable e-commerce platform that supports unified commerce by integrating online and offline sales channels.
  • BigCommerce: A SaaS platform that helps businesses unify their sales across different channels, including online, in-store, and marketplaces.
  • SAP Commerce Cloud: Part of the SAP C/4HANA suite, it integrates e-commerce, sales, marketing, and service into a unified platform.
  • Cegid Retail Y2: A retail-focused solution that unifies POS, inventory, and e-commerce with real-time updates and customer insights.

The choice depends on your existing product portfolio and relationship with vendors. But keep in mind that emerging enterprise architecture trends such as event-driven architecture, microservices, data mesh, and data products enable a true decoupling for higher flexibility, cost-efficient product selection, and consistent data integration between different technologies, APIs, and cloud services. There is no need to implement everything in a monolithic architecture with tight vendor lock-in.

And even if you buy a “complete” Unified Platform with all the features you need, you still need to integrate with the rest of the software and IT applications, systems, and IoT interfaces in your enterprise architecture. Apache Kafka is the leading integration platform providing event-driven real-time communication, reliable transactional processing, and flexible deployment options to deploy in the public cloud or at the edge in a retail store.

Nobody will build an entire unified commerce platform from scratch. That would mean excessive cost, effort, and a slow rollout. Hence, the above-listed products are an excellent starting point for implementing unified commerce.

Most organizations leverage data streaming with Kafka and Flink to connect independent products, SaaS services, and custom microservices.

By the way: Did you know that even several of the above-listed commercial Unified Commerce products and cloud services leverage data streaming under the hood of their platform? Like the end user, these platforms require flexibility, scalability, consistent integration, and real-time data processing. That’s where Apache Kafka became the de facto standard as the foundation of the enterprise architecture.

Vendors like Salesforce or Shopify heavily rely on Apache Kafka as the foundation of their internal enterprise architecture. Many public articles are available going into the details. Let’s go deeper into one example from the above list: BigCommerce.

BigCommerce: A Cloud-Native Unified Commerce Platform powered by Apache Kafka

BigCommerce is a Unified Commerce platform that enables merchants to create commerce solutions for B2B, B2C, Multi-Storefront, Omnichannel, Headless, and International.

The solution is built on top of a fully managed Confluent Cloud. BigCommerce migrated from open-source Kafka with zero downtime, no data loss, and the ability to auto-scale. The Unified Commerce platform processes 1.6 billion messages each day comprising e-commerce events such as visits, product page views, add to cart, checkouts, orders, etc.


BigCommerce implements various retail use cases in different business units with Apache Kafka using fully managed Confluent Cloud, including:

  • Real-time analytics and insights for merchants
  • Server-side event transmission for Meta’s conversion APIs for social commerce and advertisements
  • Bot filter exploration for fraud prevention using stream processing
  • Continuous data ingestion into the data lake for batch analytics and model training
  • AI and Machine Learning based real-time personalized product recommendations

Like BigCommerce, end users can either build their custom solution or leverage data streaming as the event-driven foundation to connect to a Unified Commerce platform and synchronize it with the other systems and databases in the enterprise architecture.

Unified commerce is a strategic approach that unifies all sales channels and backend systems into a single, integrated platform, enabling real-time data synchronization and a seamless customer experience across all touchpoints. It differs from omnichannel by eliminating the silos between channels, providing a more cohesive and responsive experience.

Unified commerce is relevant in industries like retail, hospitality, e-commerce, and healthcare, where customer experience is critical. A single platform cannot solve all problems. Flexibility, cost-efficiency, and the ability to innovate quickly with a fast time to market, leveraging innovative cloud services, requires the right enterprise architecture integration strategy.

An event-driven architecture powered by data streaming with Apache Kafka and Apache Flink can significantly enhance unified commerce by enabling real-time data processing and analysis across various channels and backend systems.

I published plenty of other retail articles that show data streaming use cases, hybrid edge-to-cloud architectures, and success stories from enterprises around the world.

What is your strategy to build a Unified Commerce platform? Which open-source frameworks, products, or cloud services do you use? What strategy does data streaming play? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.


How the Retailer Intersport uses Apache Kafka as Database with Compacted Topic
Published January 25, 2024: https://www.kai-waehner.de/blog/2024/01/25/how-the-retailer-intersport-uses-apache-kafka-as-database-with-compacted-topic/


Compacted Topic is a feature of Apache Kafka to persist and query the latest up-to-date event of a Kafka Topic. The log compaction and key/value search are simple, cost-efficient, and scalable. This blog post shows, in a success story from Intersport, how some use cases store data long-term in Kafka with no other database. The retailer requires accurate stock info across the supply chain, including the point of sale (POS) in all international stores.

How Intersport uses Apache Kafka as Database with Compacted Topic in Retail

What is Intersport?

Intersport International Corporation GmbH, commonly known as Intersport, is headquartered in Bern, Switzerland, but its roots trace back to Austria. Intersport is a global sporting goods retail group that operates a network of stores selling sports equipment, apparel, and related products. It is one of the world’s largest sporting goods retailers and has a presence in many countries around the world.

Intersport stores typically offer a wide range of products for various sports and outdoor activities, including sports clothing, footwear, equipment for sports such as soccer, tennis, skiing, cycling, and more. The company often partners with popular sports brands to offer a variety of products to its customers.

Intersport (Source: Wikipedia)

Intersport actively promotes sports and physical activity and frequently sponsors sports events and initiatives to encourage people to lead active and healthy lifestyles. The specific products and services offered by Intersport may vary from one location to another, depending on local market demand and trends.

The company automates and innovates continuously with software capabilities like fully automated replenishment, drop shipping, personalized recommendations for customers, and other applications.

How does Intersport leverage Data Streaming with Apache Kafka?

Intersport presented its data streaming success story together with the system integrator DCCS at the Data in Motion Tour 2023 in Vienna, Austria.

Apache Kafka and Compacted Topics in Retail with WMS SAP ERP Cash Register POS
Source: DCCS

Here is a summary about the deployment, use cases, and project lifecycle at Intersport:

  • Apache Kafka as the strategic integration hub powered by fully managed Confluent Cloud
  • Central nervous system to enable data consistency between real-time data and non-real-time data, i.e., batch systems, files, databases, and APIs.
  • Loyalty platform with real-time bonus point system
  • Personalized marketing and hybrid omnichannel customer experience across online and stores
  • Integration with SAP ERP, financial accounting (SAP FI), and third-party B2B systems like bike rental, hundreds of POS systems, and legacy interfaces such as FTP and XML
  • Fast time-to-market because of the fully managed cloud: The pilot project with 100 stores and 200 Point of Sale (POS) systems was finished in 6 months. The entire production rollout took only 12 months.

Data Streaming Architecture at Intersport with Apache Kafka, KSQL and Schema Registry
Source: DCCS

Is Apache Kafka a Database? No. But…

No, Apache Kafka is NOT a database. Apache Kafka is a distributed streaming platform that is designed for building real-time data pipelines and streaming applications. Users frequently apply it for ingesting, processing, and storing large volumes of event data in real time.

Apache Kafka does not provide the traditional features associated with databases, such as random access to stored data or support for complex queries. If you need a database for storage and retrieval of structured data, you would typically use a database system like MySQL, PostgreSQL, MongoDB, or others with Kafka to address different aspects of your data processing needs.

However, Apache Kafka is a database if you focus on cost-efficient long-term storage and the replayability of historical data. I wrote a long article about the database characteristics of Apache Kafka. Read it to understand when (not) to use Kafka as a database. The emergence of Tiered Storage for Kafka created even more use cases.

In this blog post, I want to focus on one specific feature of Apache Kafka for long-term storage and query functionality: Compacted Topics.

What is a Compacted Topic in Apache Kafka?

Kafka is a distributed event streaming platform, and topics are the primary means of organizing and categorizing data within Kafka. “Compacted Topic” in Apache Kafka refers to a specific type of Kafka Topic configuration that is used to keep only the most recent value for each key within the topic.

Apache Kafka Log Compaction
Source: Apache

In a compacted topic, Kafka ensures that, for each unique key, only the latest message (or event) associated with that key is retained. The system effectively discards older messages with the same key. A Compacted Topic is often used for scenarios where you want to maintain the latest state or record for each key. This can be useful in various applications, such as maintaining the latest user profile information, aggregating statistics, or storing configuration data.

Log Compaction in Kafka with a Compacted Topic
Source: Apache

Here are some key characteristics and use cases for compacted topics in Kafka:

  1. Key-Value Semantics: A compacted topic supports scenarios where you have a key-value data model, and you want to query the most recent value for each unique key.
  2. Log Compaction: Kafka uses a mechanism called “log compaction” to ensure that only the latest message for each key is retained in the topic. This means that Kafka will retain the entire history of changes for each key, but it will remove older versions of a key’s data once a newer version arrives.
  3. Stateful Processing: Compacted topics are often used in stream processing applications where maintaining the state is important. Stream processing frameworks like Apache Kafka Streams and ksqlDB leverage a compacted topic to perform stateful operations.
  4. Change-Data Capture (CDC): Change-data capture scenarios use compacted topics to track changes to data over time. For example, capturing changes to a database table and storing them in Kafka with the latest version of each record.
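As a minimal sketch, such a topic can be created programmatically with the Kafka AdminClient by setting cleanup.policy=compact. The topic name, partition count, and replication factor below are illustrative assumptions.

    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.NewTopic;
    import org.apache.kafka.common.config.TopicConfig;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    public class CreateCompactedTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");

            try (Admin admin = Admin.create(props)) {
                // Compaction keeps only the latest record per key instead of deleting by retention time
                NewTopic articles = new NewTopic("articles", 6, (short) 3)
                        .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                        TopicConfig.CLEANUP_POLICY_COMPACT));
                admin.createTopics(List.of(articles)).all().get();
            }
        }
    }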

Compacted Topic at Intersport to Store all Retail Articles in Apache Kafka

Intersport stores all articles in Compacted Topics, i.e., with no retention time. Article records can change several times. Topic compaction cleans out outdated records. Only the most recent version is relevant.

Master Data Flow at Intersport with Kafka Connect Compacted Topics SQL and REST API
Source: DCCS

Article Data Structure

A model comprises several SKUs as a nested array:

  • An SKU represents an article with its size and color
  • Every SKU has shop-based prices (purchase price, sales price, list price)
  • Not every SKU is available in every shop

A Compacted Topic for Retail Article in Apache Kafka
Source: DCCS

Accurate Stock Information across the Supply Chain

Intersport and DCCS presented the key requirements and benefits of leveraging Kafka. The central integration hub uses compacted topics for storing and retrieving articles:

  • Customer facing processes demand real time
  • Stock info needs to be accurate
  • Distribute master data to all relevant subsystems as soon as it changes
  • Scale flexibly under high load (e.g., shopping weekends before Christmas)

Providing the right information at the right time is crucial across the supply chain. Data consistency matters, as not every system is real-time. This is one of the most underestimated sweet spots of Apache Kafka combining real-time messaging with a persistent event store.
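To illustrate the key/value query side, here is a minimal sketch that materializes a compacted articles topic as a queryable state store with Kafka Streams and looks up the latest state of a single article by key. Topic, store, and key names are illustrative assumptions, the application configuration (props) follows a standard Kafka Streams setup, and in production you would wait until the application reaches the RUNNING state before querying.

    // Build a queryable key/value view of the compacted articles topic
    StreamsBuilder builder = new StreamsBuilder();
    builder.globalTable("articles",
            Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("articles-store")
                    .withKeySerde(Serdes.String())
                    .withValueSerde(Serdes.String()));

    KafkaStreams streams = new KafkaStreams(builder.build(), props);
    streams.start();

    // Look up the latest state of a single article by key, with no external database involved
    ReadOnlyKeyValueStore<String, String> store = streams.store(
            StoreQueryParameters.fromNameAndType("articles-store", QueryableStoreTypes.keyValueStore()));
    String latestArticle = store.get("sku-12345");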

Log Compaction in Kafka does NOT Replace BUT Complement other Databases

Intersport is an excellent example in the retail industry of persisting information long-term in Kafka Topics by leveraging Kafka’s “Compacted Topics” feature. The benefits are simple usage, a cost-efficient event store of the latest up-to-date information, fast key/value queries, and no need for another database. Hence, Kafka can replace a database for some specific scenarios, like storing and querying the inventory of each store at Intersport.

If you want to learn about other use cases and success stories for data streaming with Kafka and Flink in the retail industry, check out the other retail articles on this blog.

How do you use data streaming with Kafka and Flink? What retail use cases did you implement? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.


The State of Data Streaming for Retail
Published April 27, 2023: https://www.kai-waehner.de/blog/2023/04/27/the-state-of-data-streaming-for-retail-in-2023/


This blog post explores the state of data streaming for the retail industry. The evolution of omnichannel customer experiences, hybrid shopping models, and hyper-personalized recommendations requires an optimized end-to-end supply chain, fancy mobile apps, and integration with pioneering technologies like social commerce or metaverse. Data streaming allows integrating and correlating data in real time at any scale. I look at retail trends to explore how data streaming helps as a business enabler, including customer stories from Walmart, Albertsons, Otto, AO.com, and more. A complete slide deck and on-demand video recording are included.

The State of Data Streaming for Retail in 2023

Several disruptive trends impact innovation in the retail industry to reduce costs, improve the customer experience, and keep customer retention and revenue high:

Disruptive Trends in Retail for Data Streaming

Researchers, analysts, startups, and, last but not least, the labs and first real-world rollouts of traditional players point to a few upcoming trends in the retail industry:

  • Hybrid shopping models with digitalization and omnichannel (see a recent Gartner webinar)
  • Generative AI and automation to improve existing business processes and innovation (as discussed in an article by McKinsey)
  • Live commerce with social platforms changes the shopping experience (even beyond China and Asia, as an analysis of Grand View Research shows)

Let’s explore the goals and impact of these trends.

Hybrid shopping models with digitalization and omnichannel

Capabilities for omnichannel retail change and improve the customer experience significantly. Mobile apps enable seamless hybrid and location-based in-store experiences. Customers leverage options like “buy online and pick up in the store” more and more:

Gartner - Hybrid Shopping Models
Source: Gartner

Generative AI and automation for innovation

Generative AI and automation help retailers become more productive, get to market faster, and serve customers better. The McKinsey article explores various use cases for technologies like NLP (Natural Language Processing) with Machine Learning and Large Language Models (LLM) like ChatGPT, including:

  • Merchandising and Product: Customizing
  • Supply Chain and Logistics: Support negotiations with suppliers
  • Marketing: Generate personalized offers
  • Digital commerce: Tailor virtual product try-on
  • Store operations: Optimize store layout through simulations
  • Organization: Enable self-serve and automate support tasks

Live commerce changing the shopping experience

Live commerce with social platforms changes the shopping experience by combining instant purchasing of a featured product and audience participation. The COVID-19 pandemic sped up this trend. Live commerce emerged in China but arrived in the West across industries, whether you sell fashion, toys, cars, digital features, or anything else. A chart from Grand View Research shows the growth of social commerce in North America:

Grand View Research - Social Commerce Market North America
Source: Grand View Research

I explored some time ago how Apache Kafka transforms the retail and shopping metaverse. Let's now look at how data streaming with technologies like Kafka and Flink relates to the retail industry.

Data streaming in the retail industry

Adopting trends like hybrid shopping models, location-based services or advanced loyalty platforms is only possible if enterprises in the retail industry can provide and correlate information at the right time in the proper context. Real-time, which means using the information in milliseconds, seconds, or minutes, is almost always better than processing data later (whatever later means):

Real Time Data Streaming in Retail

Data streaming combines the power of real-time messaging at any scale with storage for true decoupling, data integration, and data correlation capabilities. Apache Kafka is the de facto standard for data streaming.

"Use Cases for Apache Kafka in Retail" is a good article for starting with an industry-specific point of view on data streaming.

This is just one example. Data streaming with the Apache Kafka ecosystem and cloud services is used throughout the supply chain of the retail industry. Search my blog for various articles related to this topic: Search Kai's blog.

Cloud adoption in retail as the foundation for innovation with data streaming

Forrester analyzed cloud adoption in the retail industry in its research report "The State Of Cloud In Retail, 2023". The results are impressive:

Forrester - The State of Cloud in Retail 2023
Source: Forrester

The cloud provides elastic scalability and shorter time-to-market cycles for innovation. Building new real-time applications is much easier in the cloud because the data streaming infrastructure is available as fully managed SaaS with critical SLAs.

Nuuly's innovative clothing rental subscription service is a great example. It differs greatly from a typical e-commerce model and requires a real-time, event-driven architecture. Nuuly uses Confluent Cloud and Kafka as the central nervous system of its business, spanning everything from customer-facing applications to distribution center operations. The entire business case was developed and brought to production in just six months because the fully managed data streaming SaaS under the hood let the team focus on business logic.

Software is eating retail, and real-time data enables innovation

CBINSIGHTS explored various use cases that optimize the retail supply chain or improve the customer experience and retention:

CBINSIGHTS - Software is Eating Retail
Source: CBINSIGHTS

If you look at the architecture trends and customer stories for data streaming in the next section, you will realize that real-time data integration and processing at scale is required for most modern retail use cases.

The retail industry adopts various enterprise architecture trends for cost, flexibility, security, and latency reasons. The three major topics I see at customers these days are:

  • Edge data synchronization to the cloud in real-time
  • Omnichannel up-/cross-selling
  • New retail concepts and strategies like augmented reality, live commerce, or metaverse

Let’s look deeper into some enterprise architectures that leverage data streaming for retail use cases.

Hybrid architecture with data streaming at the edge in retail store and cloud

Most retailers have a cloud-first strategy to set up modern e-commerce, CRM, marketing, loyalty, and payment platforms. However, edge computing gets more relevant for use cases like location-based services, hybrid shopping models, and other real-time analytics scenarios:

Hybrid Edge to Global Retail Architecture with Apache Kafka

Learn about architecture patterns for Apache Kafka that may require multi-cluster solutions and see real-world examples with their specific requirements and trade-offs. That blog explores scenarios such as disaster recovery, aggregation for analytics, cloud migration, mission-critical stretched deployments, and global Kafka.

Edge deployments for data streaming come with their own challenges. In separate blog posts, I covered use cases for Kafka at the edge and provided an infrastructure checklist for edge data streaming.

Hyper-personalized customer experience

Customers expect a great customer experience across devices (like a web browser or mobile app) and human interactions (e.g., in a bank branch). Data streaming enables a context-specific omnichannel retail experience by correlating real-time and historical data at the right time in the proper context:

Context-specific Omnichannel Retail Experience with Data Streaming

"Omnichannel Retail and Customer 360 in Real Time with Apache Kafka" goes into more detail. But one thing is clear: Most innovative use cases require both historical and real-time data. In summary, correlating historical and real-time information is possible with data streaming out-of-the-box because of the underlying append-only commit log and the replayability of events. A cloud-native Kafka infrastructure with Tiered Storage that separates compute from storage makes such an enterprise architecture more scalable and cost-efficient.
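As a hedged illustration of this replayability (topic name and consumer group are hypothetical), a new application can simply start with `auto.offset.reset=earliest` and process the complete history before seamlessly continuing with live events:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class CustomerEventReplay {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "customer-360-bootstrap");
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
    // A brand-new consumer group starts at the beginning of the log and replays all history.
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(List.of("customer-events"));
      while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, String> record : records) {
          // Historical (replayed) and live events are processed with the same code path.
          System.out.printf("key=%s value=%s%n", record.key(), record.value());
        }
      }
    }
  }
}
```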

The article “Fraud Detection with Apache Kafka, KSQL and Apache Flink” explores stream processing for real-time analytics in more detail, shows an example with embedded machine learning, and covers several real-world case studies.

Live commerce with social platforms and data streaming

Live commerce requires a great customer experience end to end. Most actions and data correlations should or even have to happen in real time. Data correlation requires connectivity to the social platforms, the live commerce sales platform, and many other backend processes and applications.

Social commerce requires the right action at the right time. Requirements include:

  • Interact with the customer during the show.
  • Recommend products that need to be sold.
  • Provide context-specific pricing.
  • All automated. In real-time. At scale.

Here is an example architecture for a decentralized, scalable, real-time live commerce infrastructure powered by Kafka and its ecosystem:

Live Commerce in Retail with Data Streaming powered by Apache Kafka

The metaverse, along with new payment and social functionality leveraging crypto platforms, could have an enormous impact on future live commerce platforms. This is its own topic, but most crypto platforms are powered by data streaming with Apache Kafka at their heart.

New customer stories for data streaming in the retail industry

So much innovation is happening in the retail sector. Automation and digitalization change how we search and buy products and services, communicate with partners and customers, provide hybrid shopping models, and more.

Most retail enterprises use a cloud-first approach to improve time-to-market, increase flexibility, and focus on business logic instead of operating IT infrastructure.

Here are a few customer stories from worldwide retail enterprises across industries:

  • Walmart: Supply chain optimization for replenishment from warehouse to retail stores with data consistency across batch and real-time applications
  • Albertsons: Central integration data hub and loyalty platform to keep customers for life with a scalable supply chain, revamped customer experience, and new retail media network
  • AO.com: Hyper-personalized retail experience with real-time clickstream analytics while the customer is in the (online) store
  • Otto: Data exchange with a domain-driven design for true decoupling, faster time-to-market, and data privacy (GDPR) compliance within a multi-cloud enterprise architecture
  • BigCommerce: Cloud-native eCommerce platform that provides services on the cloud with analytics and advice for merchants
  • WhatNot: A social live auctions platform with interactive selling and metaverse / augmented reality capabilities

Resources to learn more

This blog post is just the starting point. Learn more about data streaming in the retail industry in the following on-demand webinar recording, the related slide deck, and further resources, including pretty cool lightboard videos about use cases.

On-demand video recording

The video recording explores the retail industry’s trends and architectures for data streaming. The primary focus is the data streaming case studies. Check out our on-demand recording:

Video: The State of Data Streaming for Retail in 2023

Slides

If you prefer learning from slides, check out the deck used for the above recording:


Case studies and lightboard videos for data streaming in retail

The state of data streaming for retail is fascinating. New use cases and case studies come up every month. This includes better data governance across the entire organization, collecting and processing data from location-based services and mobile apps in real-time, data sharing and B2B partnerships with Open APIs for new business models, and many more scenarios.

We recorded lightboard videos showing the value of data streaming simply and effectively. These five-minute videos explore the business value of data streaming, related architectures, and customer stories. Stay tuned; I will update the links in the next few weeks and publish a separate blog post for each story and lightboard video.

And this is just the beginning. Every month, we will talk about the status of data streaming in a different industry. Manufacturing was the first. Financial services second, then retail, telcos, gaming, and so on…

Let’s connect on LinkedIn and discuss it! Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter.

The post The State of Data Streaming for Retail appeared first on Kai Waehner.

]]>
Real-Time Supply Chain with Apache Kafka in the Food and Retail Industry https://www.kai-waehner.de/blog/2022/02/25/real-time-supply-chain-with-apache-kafka-in-food-retail-industry/ Fri, 25 Feb 2022 08:13:14 +0000 https://www.kai-waehner.de/?p=4309 This blog post explores real-world deployments across the end-to-end supply chain powered by data streaming with Apache Kafka to improve business processes with real-time services. The examples include manufacturing, logistics, stores, delivery, restaurants, and other parts of the business. Case studies include Walmart, Albertsons, Instacart, Domino's Pizza, Migros, and more.

The post Real-Time Supply Chain with Apache Kafka in the Food and Retail Industry appeared first on Kai Waehner.

]]>
The supply chain in the food and retail industry is complex, error-prone, and slow. This blog post explores real-world deployments across the end-to-end supply chain powered by data streaming with Apache Kafka to improve business processes with real-time services. The examples include manufacturing, logistics, stores, delivery, restaurants, and other parts of the business. Case studies include Walmart, Albertsons, Instacart, Domino’s Pizza, Migros, and more.

The supply chain in the food and retail industry

The food industry is a complex, global network of diverse businesses that supplies most of the food consumed by the world’s population. It is far beyond the following simplified visualization 🙂

The Food and Retail Industry Simplified

The term food industry covers a series of industrial activities directed at the production, distribution, processing, conversion, preparation, preservation, transport, certification, and packaging of foodstuffs.

Today’s food industry has become highly diversified, with manufacturing ranging from small, traditional, family-run activities that are highly labor-intensive to large, capital-intensive, and highly mechanized industrial processes.

This post explores several case studies with use cases and architectures to improve supply chain and business processes with real-time capabilities.

Most of the following companies leverage Kafka for various use cases. They sometimes overlap and compete. I structured the next sections to choose specific examples from each company. This way, you see a complete real-time supply chain flow using data streaming examples from different industries related to the food and retail business.

Why do the food and retail supply chains need real-time data streaming using Apache Kafka?

Before I start with the real-world examples, let’s discuss why I thought this was the right time for this post to collect a few case studies.

In February 2022, I was in Florida for a week for customer meetings. Let me report on my inconvenient experience as a frequent traveler and customer. Coincidentally, all this happened within one weekend.

A horrible weekend travel experience powered by manual processes and batch systems

  • Issue #1 (hotel): Although I had canceled a hotel night a week earlier, I still received upgrade offers from the batch system of the hotel's booking engine. Conversely, I did not get promotions for upgrading my new booking at the same hotel chain.
  • Issue #2 (clothing store): The point-of-sale (POS) was down because of a power outage and no connectivity to the internet. The POS could not process payments without the internet. People left without buying items. I needed the new clothes (as my hotel also did not provide laundry service because of the limited workforce).
  • Issue #3 (restaurant): I went to a steak house to get dinner. Their POS was down, too, because of the same power outage. The waiter could not take orders. The separate software in the kitchen could not receive manual orders from the waiter either.
  • Issue #4 (restaurant): After 30 minutes of waiting, I ordered and had dinner. However, the dessert I chose was not available. The waiter explained that supply chain issues made this item unavailable, but the restaurant had never been able to update the paper menu or the online PDF.
  • Issue #5 (clothing store): After dinner, I went back to the other store to buy my clothes. It worked. The salesperson asked me for my email address to give me a 15 percent discount. I wonder why they did not see my loyalty information already, maybe via a location-based service. Or at least, I would expect to pay and log in via my mobile app.
  • Issue #6 (hotel): Back in the hotel, I checked out. The hotel did not add my loyalty bonus (a discount on food in the restaurant) as the system only supports batch integration with other applications. Also, the receipt could not display the loyalty information or the correct total. I had to accept this, even though it is not even legal under German tax law.

This story shows why the digital transformation is crucial across all parts of the food, retail, and travel industry. I have already covered use cases and architecture for data streaming with Apache Kafka in the Airline, Aviation, and Travel Industry. Hence, this post focuses more on food and retail. But the concepts do not differ across these industries.

As you will see in the following sections, optimizing business processes, automating tasks, and improving customer experience is not complicated with the right technologies.

Digital transformation with automated business processes and real-time services across the supply chain

With this story in mind, I thought it was a good time to share various real-world examples across the food and retail supply chain. Some companies have already started with innovation. They digitalized their business processes and built innovative new real-time services.

The Real-Time Food and Retail Supply Chain powered by Apache Kafka

Food processing machinery for IoT analytics at Baader

BAADER is a worldwide manufacturer of innovative machinery for the food processing industry. They run an IoT-based and data-driven food value chain on Confluent Cloud.

The Kafka-based infrastructure provides a single source of truth across the factories and regions of the food value chain. Business-critical operations are available 24/7 for tracking, calculations, alerts, etc.

Food Supply Chain at Baader with Apache Kafka and Confluent Cloud

The event streaming platform runs on Confluent Cloud. Hence, Baader can focus on building innovative business applications. The serverless Kafka infrastructure provides mission-critical SLAs and consumption-based pricing for all required capabilities: messaging, storage, data integration, and data processing.

MQTT provides connectivity to machines and GPS data from vehicles at the edge. Kafka Connect connectors integrate MQTT and other IT systems, such as Elasticsearch, MongoDB, and AWS S3. ksqlDB processes the data in motion continuously.

Check out my blog series about Kafka and MQTT for other related IoT use cases and examples.

Logistics and track & trace across the supply chain at Migros

Migros is Switzerland’s largest retail company, largest supermarket chain, and largest employer. They leverage MQTT and Kafka for real-time visualization and processing of logistics and transportation information.

As Migros explained at a Swiss Kafka meetup, they optimized their supply chain with a single data streaming pipeline, not just for real-time purposes but also for use cases that require replaying all historical events of the whole day.

The goal was to provide “one global optimum instead of four local optima”. Specific logistics use cases include forecasting the truck arrival time and rescheduling truck tours.

Optimized inventory management and replenishment at Walmart

Walmart operates Kafka across its end-to-end supply chain at a massive scale. The VP of Walmart Cloud says, “Walmart is a $500 billion in revenue company, so every second is worth millions of dollars. Having Confluent as our partner has been invaluable. Kafka and Confluent are the backbone of our digital omnichannel transformation and success at Walmart.”

The real-time supply chain includes Walmart’s inventory system. As part of that infrastructure, Walmart built a real-time replenishment system:

Walmart Replenishment System and Real Time Inventory

Here are a few notes from Walmart’s Kafka Summit talk covering this topic.

  • Caters to millions of its online and walk-in customers
  • Ensures optimal availability of needed assortment and timely delivery on online fulfillment
  • 4+ billion messages in 3 hours generate an order plan for the entire network of Walmart stores with great accuracy for ordering decisions made daily
  • Apache Kafka as the data hub and for real-time processing
  • Apache Spark for micro-batches

The business value is enormous from a cost and operations perspective: cycle time reduction, accuracy, speed, reduced complexity, elasticity, scalability, improved resiliency, and reduced cost.

And to be clear: This is just one of many Kafka-powered use cases at Walmart! Check out other Kafka Summit presentations for different use cases and architectures.

Omnichannel order management in restaurants at Domino’s Pizza

Domino’s Pizza is a multinational pizza restaurant chain with ~17,000 stores. They successfully transformed from a traditional pizza company to a data-driven e-commerce organization. They focus on a data-first approach and a relentless customer experience.

Domino’s Pizza provides real-time operation views to franchise owners. This includes KPIs, such as order volume by channel and store efficiency metrics across different ordering channels.

Domino’s Pizza mentions the following capabilities and benefits of their real-time data hub to optimize their supply chain:

  • Improve store operational real-time analytics
  • Support global expansion goals via legacy IT modernization
  • Implement more personalized marketing campaigns
  • Real-time single pane of glass

Location-based services and upselling at the point-of-sale at Royal Caribbean

Royal Caribbean is a cruise line. It operates the four largest passenger ships in the world. As of January 2021, the line operates twenty-four ships and has six additional ships on order.

Royal Caribbean implemented one of Kafka’s most famous use cases at the edge. Each cruise ship has a Kafka cluster running locally for use cases, such as payment processing, loyalty information, customer recommendations, etc.:

Swimming Retail Stores at Royal Caribbean with Apache Kafka

Edge computing on the ship is crucial for an efficient supply chain and excellent customer experience at Royal Caribbean:

  • Bad and costly connectivity to the internet
  • The requirement to do edge computing in real-time for a seamless customer experience and increased revenue
  • Aggregation of all the cruise trips in the cloud for analytics and reporting to improve the customer experience, upsell opportunities, and many other business processes

Hence, a Kafka cluster on each ship enables local processing and reliable mission-critical workloads. The Kafka storage guarantees durability, no data loss, and guaranteed ordering of events – even though they are processed later. Only very critical data is sent directly to the cloud (if there is connectivity at all). All other information is replicated to the central Kafka cluster in the cloud when the ship arrives in a harbor for a few hours. A stable internet connection and high bandwidth are available before leaving for the next trip again.

Learn more about use cases and architecture for Kafka at the edge in its dedicated blog post.

Recommendations and discounts using the loyalty platform at Albertsons

Albertsons is the second-largest American grocery company with 2200+ stores and 290,000+ employees. They also operate pharmacies in many stores.

Albertsons operates Kafka powered by Confluent as the real-time data hub for various use cases, including:

  • Inventory updates from 2200 stores to the cloud services in near real-time to maintain visibility into the health of the stores and to ensure well-managed out-of-stock and substitution recommendations for digital customers
  • Distributing offers and customer clips to each store in near real-time and improving the store checkout experience
  • Feeding data in real-time to analytics engines for supply chain order forecasts, warehouse order management, delivery, labor forecasting and management, demand planning, and inventory management
  • Ingesting transactions in near real-time to the data lake for batch workloads to generate dashboards for associates and the business, training data models, and hyper-personalization

As an example, here is how Albertsons data streaming architecture empowers their real-time loyalty platform:

Albertsons Loyalty Use Case in Grocery Stores powered by Apache Kafka

Learn more about Albertsons’ innovative real-time data hub by watching the on-demand webinar I did with two of their leading architects recently.

Payment processing and fraud detection at Grab

Grab is a mobility service in Asia, similar to Uber and Lyft in the United States or FREE NOW (formerly mytaxi) in Europe. The Kafka ecosystem heavily powers all of them. Like FREE NOW, Grab leverages serverless Confluent Cloud to focus on business logic.

As part of this, Grab has built GrabDefence. This internal service provides a mission-critical payment and fraud detection platform. The platform leverages Kafka Streams and Machine Learning for stateful stream processing and real-time fraud detection at scale.

GrabDefence - Fraud Detection with Kafka Streams and Machine Learning in Grab Mobility Service

Grab performs billions of fraud and safety detections daily for millions of transactions. Companies in Southeast Asia lose 1.6% to fraud. Therefore, Grab's data science and engineering team built a platform to search for anomalous and suspicious transactions and identify high-risk individuals.

Here is one fraud example: An individual who pretends to be both the driver and passenger and makes cashless payments to get promotions.

Delivery and pickup at Instacart

Instacart is a grocery delivery and pickup service in the United States and Canada. The service orders groceries from participating retailers, with the shopping being done by a personal shopper.

Instacart requires a platform that enables elastic scale and fast, agile internal adoption of real-time data processing. Hence, Confluent Cloud was chosen to focus on business logic and roll out new features with fast time-to-market. As Instacart said, during the Covid pandemic, they had to “handle ten years’ worth of growth in six weeks”. Very impressive and only possible with cloud-native and serverless data streaming.

Learn more in the recent Kafka Summit interview between Confluent co-founder Jun Rao and Instacart.

Real-time data streaming with Kafka beats slow data everywhere in the supply chain

This post showed various real-world deployments where real-time information increases operational efficiency, reduces the cost, or improves the customer experience.

Whether your business domain cares about manufacturing, logistics, stores, delivery, or customer experience and loyalty, data streaming with the Apache Kafka ecosystem provides the capability to be innovative at any scale. And in the cloud, serverless Kafka makes it even easier to focus on time to market and business value.

How do you optimize your supply chain? Do you leverage data streaming? Is Apache Kafka the tool of choice for real-time data processing? Or what data streaming technologies do you use? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Real-Time Supply Chain with Apache Kafka in the Food and Retail Industry appeared first on Kai Waehner.

]]>
Kafka for Live Commerce to Transform the Retail and Shopping Metaverse https://www.kai-waehner.de/blog/2021/12/17/kafka-live-commerce-transform-retail-shopping-metaverse/ Fri, 17 Dec 2021 06:46:24 +0000 https://www.kai-waehner.de/?p=3631 Live commerce combines instant purchasing of a featured product and audience participation. This blog post explores the need for real-time data streaming with Apache Kafka between applications to enable live commerce across online stores and brick & mortar stores across regions, countries, and continents in any retail business. The discussion covers several buildings blocks of a live commerce enterprise architecture, including transactional data processing, omnichannel, natural language processing, augmented reality, edge computing, and more.

The post Kafka for Live Commerce to Transform the Retail and Shopping Metaverse appeared first on Kai Waehner.

]]>
Live commerce combines instant purchasing of a featured product and audience participation. The COVID-19 pandemic accelerated this trend. Live commerce emerged in China but arrived in the West across industries, no matter if you sell fashion, toys, cars, digital features, or anything else. This blog post explores the need for real-time data streaming with Apache Kafka between applications to enable live commerce across online stores and brick & mortar stores across regions, countries, and continents.

The discussion covers several building blocks of a live commerce enterprise architecture. Retail topics include omnichannel retail, hyper-personalized customer communication, transactional data processing, and innovative entertainment with Augmented Reality. Other technical aspects cover the replayability of historical data and correlation with real-time events, AI and Machine Learning applied to real-time data, and edge analytics in the retail store.

Apache Kafka to Transform Retail and Shopping with the Live Commerce Metaverse

Live commerce transforms the retail experience

“The arrival of Alibaba’s Taobao Live in May 2016 marked the opening of a new chapter in sales. The Chinese retail giant had pioneered a powerful new approach: linking up an online live stream broadcast with an e-commerce store to allow viewers to watch and shop at the same time,” reports McKinsey in a great article about the shopping revolution. They explain: “Live commerce combines instant purchasing of a featured product and audience participation through a chat function or reaction buttons. In China, live commerce has transformed the retail industry and established itself as a major sales channel in less than five years.”

McKinsey shows the impressive growth of live commerce in China in the following diagram:

McKinsey Live Commerce Statistics 2020

In the meantime, live commerce arrived in the western world. The earliest adopters outside of China are the German beauty retailer Douglas, fashion retailer Tommy Hilfiger in Europe and the US, and the US retail giant Walmart. The global Covid pandemic was a huge driver, too.

Live commerce via social apps everywhere

Live commerce helps brands and retailers to create value and increase online revenue. Online marketplaces, live auctions, influencer streaming, and live events such as a product launch drive sales in various ways:

  • More web traffic and an increased audience
  • Increased conversion rates via interactive discussions combined with time-limited tactics such as one-off coupons
  • Context-specific upsell strategies in real-time to increase the average basket size
  • Improved brand appeal and differentiation by providing an innovative and entertaining shopping experience

AliExpress Live App

For example, AliExpress, an Alibaba subsidiary, launched a live commerce service called "AliExpress Live", which saw as many as 320,000 goods being added to the cart per one million views during a single live streaming session. The growth numbers and conversion rates are insane compared to traditional retail. It is no surprise that many retailers, auction houses, and social platforms want to get a piece of this enormous cake.

Buy now, pay later (BNPL) as an accelerator for live commerce

"Point-of-sale (POS) financing services in the United States have grown significantly over the past 24 months, especially since the onset of COVID-19. Trends fueling growth include digitization, rising merchant adoption, increasing repeat usage among younger consumers, and an expanding set of players targeting lending at point of sale, a service also known as 'buy now, pay later'," reports McKinsey.

We can see this trend across the globe. Companies like Klarna, Afterpay, and Paypal added BNPL to their primary products and apps. It is just one click away and often even set as the default payment option.

The following diagram shows the “Buy Now, Pay Later Adoption by Generation, 2019-2021” from Cornerstone Advisors:

Buy Now Pay Later BNPL Adoption by Generation 2019-2021

BNPL is an excellent combination with live commerce. People can buy cool stuff even though they cannot afford it. A scary trend for people, but a massive opportunity for retailers (moral point of view excluded).

Let’s now look at data streaming, and why this is so relevant for live commerce.

Real-time data streaming with Apache Kafka for live commerce

Real-time data beats slow data. That’s true for almost every business scenario:

Real-time Data beats Slow Data in Retail

Live commerce contains not just the active live sales activity but the whole end-to-end sales process, including payment, order fulfillment, shipping, and much more. Hence, don't expect that buying a live commerce COTS sales platform will solve all your challenges!

The live commerce retail experience is a stream of events

Live commerce requires a great customer experience end to end. Most actions and data correlations should or even have to happen in real-time. Data correlation requires connectivity to the social platforms, the live commerce sales platform, and many other backend processes and applications:

Live Commerce with Data in Motion

Several concepts play a role in live commerce to provide a good customer experience and increased conversion rate compared to traditional retail techniques:

  • Integration with backend systems such as real-time inventory, CRM, ERP, 3rd-party payment providers, the loyalty platform, and so on to provide the correct contextual information to any consumer application
  • Real-time data correlation for intelligent communication and pricing
  • Omnichannel user interfaces for cross-device experiences
  • The replayability of historical events for context-specific next best actions and recommendations
  • Automation of (some) communication with chatbots and other natural language processing (NLP) for faster response times and cost reduction
  • Enhanced and entertaining customer experiences with groundbreaking technologies such as Augmented Reality (AR) and Virtual Reality (VR)
  • Edge analytics for location-based services and deeper integration into brick-and-mortar stores while the customer is attending live events

Live commerce in motion with event streaming and Kafka

Live commerce requires the right action at the right time. Requirements include:

  • Interact with the customer during the show.
  • Recommend products that need to be sold.
  • Provide context-specific pricing.
  • All automated. In real-time. At scale.

Some businesses buy a live commerce platform. Others differentiate by building their own. Live commerce only works well if all the other applications are integrated in real-time. Hence, event streaming with Kafka plays a pivotal role in many next-generation retail architectures – no matter if you build your own live commerce platform or buy (and integrate) a 3rd-party product or cloud service.

Here is an example architecture for a decentralized, scalable, real-time live commerce infrastructure powered by Kafka and its ecosystem:

Live Commerce in Retail with Data in Motion powered by Apache Kafka
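To sketch the stream processing inside such an architecture, the following hedged Kafka Streams example joins live viewer interactions with a real-time inventory table to derive context-specific offers. The topic names and the simplistic discount rule are assumptions for illustration, not a real platform's design:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class LiveCommerceOffers {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "live-commerce-offers");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

    StreamsBuilder builder = new StreamsBuilder();

    // Viewer interactions during the live show, keyed by product id.
    KStream<String, String> interactions =
        builder.stream("live-show-interactions", Consumed.with(Serdes.String(), Serdes.String()));

    // Real-time inventory, keyed by product id (remaining stock as a string for simplicity).
    KTable<String, String> inventory =
        builder.table("inventory", Consumed.with(Serdes.String(), Serdes.String()));

    // Correlate each interaction with the current stock level to derive a context-specific
    // offer, e.g., a discount for products that need to be sold.
    interactions
        .join(inventory, (interaction, stock) ->
            Integer.parseInt(stock) > 100
                ? "OFFER_DISCOUNT_10_PERCENT"
                : "OFFER_STANDARD_PRICE")
        .to("context-specific-offers", Produced.with(Serdes.String(), Serdes.String()));

    new KafkaStreams(builder.build(), props).start();
  }
}
```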

Building blocks for a live commerce architecture powered by Kafka

From an event streaming perspective, here are some potential building blocks for a live commerce architecture (you don’t need all, and there can be others, too):

  • Omnichannel retail
  • The replayability of historical data
  • AI and Machine Learning applied to real-time data
  • Hyper-personalized customer communication
  • Transactional and analytical data processing
  • Groundbreaking entertainment with augmented reality / virtual reality
  • Edge analytics in the retail store

Let’s explore each building block in more detail in the following subsections.

Omnichannel real-time customer experience with true decoupling

One of Kafka’s key strengths is the true decoupling between producers and consumers to allow omnichannel retail architectures. As Kafka stores events as long as you want (from minutes to years), a consumer can process the data at its own pace, either real-time, near real-time, batch, or with a request-response call:

Omnichannel Retail with Apache Kafka - Customer 360 Sales and Aftersales

Domain-driven Design (DDD) and truly decoupled microservices are much easier to build with Kafka than using traditional message queues or ETL/ESB tools. Kafka enables a truly decentralized Data Mesh architecture with any combination of technologies, products, and cloud services.

Replayability to reuse and correlate historical data with real-time events

The storage capability of Kafka is helpful for many use cases. From a technical perspective, the replayability of historical events allows scenarios like:

  • New consumer application
  • Error-handling
  • Compliance / regulatory processing
  • Query and analyze existing events
  • Schema changes in an analytics platform
  • Model training

Use Cases for Replay and Reprocessing Historical Events with Apache Kafka

From a business perspective, the replay of historical events helps to

  • improve the next live event by analyzing past events (including customer reactions, Q&A, order history, etc.),
  • estimate the demand for live events and correlate it to real-time inventory
  • enable data science teams to build new algorithms for marketing and sales strategies
  • many other use cases that a retail business expert might come up with after seeing this possibility of accessing historical information in guaranteed order with timestamps, correlated to customer IDs

A long retention time and Tiered Storage for an initial bootstrap

The retention time in Kafka can be configured to be months, years, or even forever. The replay capability solves the challenge of building an initial bootstrap. Don’t underestimate this feature. Most proprietary streaming services (such as AWS Kinesis) and eventing interfaces from cloud services (such as Salesforce) only provide a few days of historical data. Limited retention time kills many replay use cases, as it does not offer the option to perform a one-time snapshot before starting the real-time CDC.

Tiered Storage for Kafka makes long-term storage in Kafka cost-efficient and scalable, even for terabytes or petabytes of data. "Can Apache Kafka replace a database, data lake, or lakehouse?" goes into more detail on this discussion.
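As a small sketch (topic name and broker address are assumptions), infinite retention can be enabled per topic by setting `retention.ms` to `-1` via the AdminClient; Tiered Storage itself is configured on the broker/cluster level and is not shown here:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import org.apache.kafka.common.config.TopicConfig;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class EnableInfiniteRetention {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

    try (AdminClient admin = AdminClient.create(props)) {
      ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "customer-events");
      AlterConfigOp keepForever = new AlterConfigOp(
          new ConfigEntry(TopicConfig.RETENTION_MS_CONFIG, "-1"), // -1 = keep events forever
          AlterConfigOp.OpType.SET);
      admin.incrementalAlterConfigs(Map.of(topic, List.of(keepForever))).all().get();
    }
  }
}
```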

Conversational AI for cost reduction with chatbots and speech translation

Natural Language Processing (NLP) helps many projects in the real world with service desk automation, customer conversations with a chatbot, content moderation in social networks, and many other use cases. Kafka is the scalable real-time orchestration layer, but it is often used for additional tasks, such as embedding an analytic model into a Kafka streaming microservice:

Conversational AI NLP and Chatbot with Apache Kafka

NLP within the streaming architecture enables massive cost reductions and shortens the response time in a live commerce infrastructure. NLP adds immense business value even if just 50% of the most fundamental questions in the chat and comments are answered automatically.
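A hedged sketch of embedding such a model in a Kafka Streams microservice could look as follows. The topics and the `answerOrNull` method are hypothetical placeholders for a real NLP model, not an actual chatbot API:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class ChatbotRouter {

  // Placeholder for an embedded NLP model: returns an answer, or null if unsure.
  static String answerOrNull(String question) {
    return question.toLowerCase().contains("shipping")
        ? "Shipping takes 2-3 business days."
        : null;
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "chatbot-router");
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

    StreamsBuilder builder = new StreamsBuilder();
    KStream<String, String> chat =
        builder.stream("live-show-chat", Consumed.with(Serdes.String(), Serdes.String()));

    // Questions the model can answer go to an auto-reply topic ...
    chat.mapValues(ChatbotRouter::answerOrNull)
        .filter((customerId, answer) -> answer != null)
        .to("chat-auto-replies", Produced.with(Serdes.String(), Serdes.String()));

    // ... everything else is routed to a human agent.
    chat.filter((customerId, question) -> answerOrNull(question) == null)
        .to("chat-human-agent", Produced.with(Serdes.String(), Serdes.String()));

    new KafkaStreams(builder.build(), props).start();
  }
}
```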

I wrote a detailed article that explores how Apache Kafka is used with Machine Learning platforms at the carmaker BMW, the online travel and booking portal Expedia, and the dating app Tinder for reliable real-time conversational AI, NLP, and chatbots.

Real-time sentiment analysis to improve live shows

Related to the above topic, NLP is also helpful to analyze the chat, comments, live surveys, and other feedback in real-time to act proactively during the live event.

Sentiment analysis uses NLP to systematically identify, extract, quantify, and study affective states and subjective information. You can make (manual or automated) real-time decisions on questions such as:

  • Do people like the product?
  • Should we present it differently?
  • Do the structure of the show and the camera view work?
  • Should we focus on other features of the product?
  • Is any immediate emergency action needed, like focusing on different parts or stopping the product presentation?

Sentiment analysis is a prevalent hello-world example for AI and Machine Learning. If you search for Kafka-powered examples with any ML framework, most examples show you how to implement sentiment analysis on Twitter data. The adaptation to your data set is pretty straightforward regarding the model training, even though the devil lies in the details, of course. Hence, the model training is only a fraction of the real-world challenges in an ML architecture.

Data integration at scale, ML infrastructure monitoring, reliable real-time model predictions, and similar challenges all benefit from Kafka's characteristics to make an ML project successful.

Sony PlayStation understands gamers' sentiment in real-time

Sony PlayStation is a great real-world example of sentiment analysis with Kafka. In a Kafka Summit talk, Sony talked about their journey from daily batch jobs to real-time data processing and analytics with Apache Kafka. This enables understanding gamers' sentiment by streaming data from social feeds and performing language processing in real-time.

I wrote a detailed article if you want to learn more about deploying any Machine Learning models in Kafka applications.

Hyper-personalized context-specific customer experience

A hyper-personalized online retail experience turns each customer visit into a one-on-one marketing opportunity. This communication technique is crucial for online stores and can significantly change live commerce, too.

AO.com is an electrical retailer in the UK that implemented a hyper-personalized real-time experience. Event Streaming applications correlate historical customer data with real-time digital signals. This capability maximizes customer satisfaction and revenue growth and increases customer conversions.

Building a hyper-personalized experience requires real-time data integration and correlation at scale. The realization is a journey that takes some time. AO presented their maturity curve of the last few years:

Kafka Journey at AO

Similar to AO.com, imagine how you could improve your live commerce use cases with hyper-personalized real-time customer communication.

Let's talk about one example: Embedding a Lead Scoring Model (LSM) into your real-time conversations with customers can speed up sales engagement and generate conversions. Speed to contact leads with the correct contextual information is critical in live commerce. Insights into the lead score, e.g., the underlying signals, are essential as well. Recommendations, product discounts, and up- and cross-selling go beyond simple business rules and are applied in real-time when it makes the most sense.

Transactional and analytical data processing in motion

Many people still think about Kafka as a system for big data workloads. That’s indeed what it was built for over a decade ago. However, in the meantime, over 50% of use cases I see at our customers are about processing transactions in real-time with the need for zero data loss. Transactional data includes integration with the point of sale (POS), payment processing, fraud detection, CRM and ERP communication, and much more in the retail industry.

eBay Korea – Multi-region Kafka for processing transactional data

Here is a brilliant case study for transactional workloads across multiple regions to ensure full disaster recovery and service stability without any data loss. eBay Korea (acquired by Shinsegae) uses Apache Kafka for live commerce and transactional event streaming:

eBay Korea uses Apache Kafka for Live Commerce and Transactional Event Streaming

More details about eBay Korea’s Kafka deployments are available in the case study.

Augmented Reality to build an entertaining live commerce metaverse

Augmented Reality (AR) and Virtual Reality (VR) get traction across industries beyond gaming – including retail, manufacturing, transportation, and healthcare. Event Streaming plays a key role as scalable real-time integration and orchestration layer for AR and VR applications:

Retail Use Case with Augmented reality and Apache Kafka

Today, most live commerce offerings "just" use standard mobile apps. However, AR and VR make the customer experience much more fun. They allow closer interaction with the salesperson (a beloved celebrity or influencer).

We built a demo that integrates an innovative AR mobile shopping experience with the backend systems via the event streaming platform Apache Kafka.

The beauty of an event-driven architecture combined with patterns like Data Mesh enables one to onboard new features or technologies step-by-step. There is no need for a big bang or integration of a monolithic proprietary product to provide such a solution.

Edge analytics in the retail store

Most retail companies have a cloud-first strategy to focus on business problems using an agile, elastic, serverless infrastructure.

However, low-latency use cases, cost-efficiency in a connected world, or lousy internet connectivity (e.g., stores in malls) require edge computing outside a data center or cloud. Hence, many retailers deploy application logic, including event streaming, at the edge:

Event Streaming with Apache Kafka at the Edge in the Smart Retail Store

A Hybrid Streaming Architecture for Smart Retail Stores with Apache Kafka” explores this use case in more detail. A key benefit is that the same architecture, technologies, APIs, and software retailers use in the cloud can be deployed on small computers in the retail store to enable edge computing. Use cases include location-based services, up-selling and discounting, integration with on-site devices (point of sale, sales machines, fun devices, whatever).

I have written plenty of articles about this already, such as use cases for event streaming at the edge and an infrastructure checklist for Apache Kafka at the edge.

Live commerce requires real-time data streaming

The building blocks in this blog post covered various concepts used in a live commerce enterprise architecture. One thing is clear: You can buy a live commerce product or build your own. But the retail innovation only works if data is moved between different applications in real-time and used for data correlation at the right time and context.

Event streaming plays a crucial role in modern retail architectures. Therefore, it is no surprise that Apache Kafka can help to build your next-generation live commerce infrastructure. eBay Korea is a great success story for deploying transactional data flows across multiple regions with zero data loss, even in the event of a disaster.

Do you already sell your products via live commerce? What technologies and architectures do you use? Are event streaming and Kafka part of the architecture? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Kafka for Live Commerce to Transform the Retail and Shopping Metaverse appeared first on Kai Waehner.

]]>
Augmented Reality Demo with Apache Kafka https://www.kai-waehner.de/blog/2021/02/23/apache-kafka-augmented-reality-ar-vr-retail-demo-arkit-unity-unreal-engine/ Tue, 23 Feb 2021 10:56:59 +0000 https://www.kai-waehner.de/?p=3201 Augmented Reality (AR) and Virtual Reality (VR) get traction across industries far beyond gaming - including retail, manufacturing, transportation, and healthcare. This blog post explores a retail demo that integrates a cutting-edge AR mobile shopping experience with the backend systems via the event streaming platform Apache Kafka.

The post Augmented Reality Demo with Apache Kafka appeared first on Kai Waehner.

]]>
Augmented Reality (AR) and Virtual Reality (VR) get traction across industries far beyond gaming. Retail, manufacturing, transportation, healthcare, and other verticals leverage them more and more. This blog post explores a retail demo that integrates a cutting-edge augmented reality mobile shopping experience with the backend systems via the event streaming platform Apache Kafka.

Augmented Reality AR VR and Apache Kafka with ARKit Unity Unreal Engine

Augmented Reality (AR) and Virtual Reality (VR)

Augmented reality (AR) is an interactive experience of a real-world environment where the objects in the real world are enhanced by computer-generated perceptual information. AR is a system that fulfills three basic features: a combination of real and virtual worlds, real-time interaction, and accurate 3D registration of virtual and real objects.

Pokemon Go, developed by Niantic in partnership with Nintendo, is one of the most famous AR applications, used by millions of people. The adoption of AR increased a lot in the last few years as modern smartphones and tablets support AR out-of-the-box. Fun fact: Pokemon Go is also a great success story for Google and Kubernetes to provide an elastic and scalable infrastructure: Bringing Pokémon GO to life on Google Cloud.

Augmented reality alters one's ongoing perception of a real-world environment, whereas Virtual Reality (VR) completely replaces the user's real-world environment with a simulated one. While the below demo is for AR, the combination with Kafka is possible for VR use cases similarly.

Apache Kafka and Augmented Reality SDKs

Before we jump to the use case, let's be clear: AR and VR are not directly related to Apache Kafka. However, both are complementary for implementing innovative use cases. Most AR and VR applications require communication with backend services and other users to provide the expected user experience. In real-time. At scale. 24/7. That's why Kafka is a perfect fit for AR and VR, similar to all the other use cases across verticals where event streaming helps.

AR and VR applications are built with specific 3D engines. This includes:

  • Game engines (that move into other markets more and more) like Unity 3D, Unreal Engine, CryEngine, Lumberyard
  • Physics engines like Bullet, Havok, PhysX
  • Vertical solutions, for instance, Industry 4.0 / Industrial IoT (IIoT) products from PTC, Siemens, or General Electric
  • AR/VR SDK that is either included in the 3D engines or separate technologies such as Apple’s ARKit for iOS devices, ARCore from Google, Vuforia, or EasyAR

Fun fact: Many of these engines and vertical solutions do not just sell the software. They also provide additional services on top of their products. For the same reasons as in other companies, these services' central nervous system is often Apache Kafka. For instance, Unity is a heavy user of Apache Kafka in Confluent Cloud, handling on average about half a million events per second to process millions of dollars of monetary transactions for its in-game purchase and advertisement services.

Use Cases for Kafka and Augmented Reality

This blog post explores a use case from the retail industry: Online shopping with an enhanced user experience and location-based services. Similar examples exist across industries:

  • Mobility services: Enhanced customer experience and location-based services for navigation, logistics, …
  • Manufacturing: Education and training on simulated machines, equipment repair
  • Gaming: Augmented real-world environments
  • Smart City: Simulated planning of urban, electricity, water, …
  • Industry-agnostic: Improved planning (before production), ergonomics simulation tests
  • Etc.

The heart of the applications is the AR/VR hardware and app.

The data’s central nervous system is Apache Kafka to integrate with other systems and correlate the aggregated data sets continuously in real-time.

The Architecture of a Cutting Edge Retail Example

Kafka is the new black in retail across various use cases. Learn more details about event streaming in retail for omnichannel customer 360 experiences and hybrid Kafka retail architectures with edge deployments in the smart retail store.

The following example demonstrates online shopping with an enhanced user experience and location-based services. Customers can buy anything they see. Anywhere. The customer takes a picture of the item. The backend processes the picture and augments it with additional information such as a digital image, product details, and different shops where it can be bought.

This example shows how important the integration of the AR app with the company’s backend services is:

Retail Use Case with Augmented reality and Apache Kafka

A few notes on the architecture:

  • The AR enhancements are possible on the client device or server-side. This depends on the business case and technical environment. For instance, the app could store images for products and only load updated details, such as the price, from the server.
  • Kafka supports the handling of large messages (like images from the phone camera). Nevertheless, comparing the trade-offs is essential.
  • The example uses Swift on the mobile app and Kafka-native stream processing tools (Kafka Streams and ksqlDB) on the backend side. However, the architecture is very flexible. Any other technology or 3rd party SaaS service is possible. That’s the beauty of Kafka’s strong support for domain-driven design (DDD).

Live Demo Video

The following video demonstrates the above use case of a smart retail experience with AR. Kudos to my colleague Carsten Muetzlitz who implemented the demo:

Augmented Reality Retail Live Demo with Apache Kafka

The demo uses Swift, ARKit, Xcode, and Confluent Cloud. Hence, it only supports Apple iOS devices. But hey, it is just a demo to share the idea of building an AR app and integrating/correlating the data in real-time at scale with other backend systems using the Kafka ecosystem. Confluent Cloud provides the serverless SaaS offering so that the developer can focus on building the AR app and business applications in the backend.

Augmented Reality + Apache Kafka = Science Fiction?!

Augmented Reality and Virtual Reality are still in the early stages. Use cases like the online shopping experience – where you can buy anything you see anywhere – are often just a vision. But it is not far away from reality.

Kafka deployments exist across industries and business units already. If you want to build an innovative AR or VR app, it is just another microservice and mobile app to connect to the Kafka infrastructure. The integration of AR/VR into the supply chain is pretty straightforward. Kafka is typically already connected to the payment infrastructure, real-time inventory, and so on.

What are your experiences and plans for event streaming together with AR/VR use cases? Did you already build applications with Apache Kafka? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Augmented Reality Demo with Apache Kafka appeared first on Kai Waehner.

]]>
Omnichannel Retail and Customer 360 in Real Time with Apache Kafka https://www.kai-waehner.de/blog/2021/02/08/omnichannel-retail-customer-360-apache-kafka-edge-cloud-manufacturing-aftersales/ Mon, 08 Feb 2021 11:03:58 +0000 https://www.kai-waehner.de/?p=3090 Event Streaming with Apache Kafka disrupts the retail industry. Walmart's real-time inventory system and Target's omnichannel distribution and logistics are two great examples. This blog post explores a key use case for postmodern retail companies: Real-time omnichannel retail and customer 360.

The post Omnichannel Retail and Customer 360 in Real Time with Apache Kafka appeared first on Kai Waehner.

]]>
Event Streaming with Apache Kafka disrupts the retail industry. Walmart’s real-time inventory system and Target’s omnichannel distribution and logistics are two great examples. This blog post explores a key use case for postmodern retail companies: Real-time omnichannel retail and customer 360.

Omnichannel Retail and Customer 360 with Apache Kafka at the Edge and in the Cloud

Disruption of the Retail Industry with Apache Kafka

Various deployments across the globe leverage event streaming with Apache Kafka for very different use cases. Consequently, Kafka is the right choice, whether you need to optimize the supply chain, disrupt the market with innovative business models, or build a context-specific customer experience.

I discussed the use cases of Apache Kafka in retail in a dedicated blog post: “The Disruption of Retail with Event Streaming and Apache Kafka“. Learn about the real-time inventory system from Walmart, omnichannel distribution and logistics at Target, context-specific customer 360 at AO.com, and much more.

This post explores a specific use case in more detail: Real-time omnichannel retail and customer 360 with Apache Kafka. Learn about a possible architecture to deploy this scenario across the whole supply chain: From design and manufacturing to sales and aftersales. The architecture is very flexible. Any infrastructure (one or multiple cloud providers and/or on-premise data centers, bare metal, containers, Kubernetes) can be used.

‘My Porsche’ – A Global Omnichannel Platform for Customers, Fans, and Enthusiasts

Let's start with a great example from the automotive industry: 'My Porsche' is Porsche's innovative and modern digital omnichannel platform for maintaining a great relationship with its customers. Porsche can describe it better than I can:

The way automotive customers interact with brands has changed, accompanied by a major transformation of customer needs and requirements. Today’s brand experience expands beyond the car and other offline touchpoints to include various digital touchpoints. Automotive customers expect a seamless brand experience across all channels — both offline and online.

My Porsche Digital Service Platform Omnichannel

The ‘Porsche Dev‘ group from Porsche published a few great posts about their architecture. Here is a good overview:

My Porsche Architecture with Apache Kafka

Kafka provides real decoupling between applications. Hence, Kafka became the de facto standard for microservices and Domain-driven Design (DDD) in many companies. It allows building independent and loosely coupled, but scalable, highly available, and reliable applications.

That's exactly what Porsche describes for its usage of Apache Kafka throughout the supply chain:

“The recent rise of data streaming has opened new possibilities for real-time analytics. At Porsche, data streaming technologies are increasingly applied across a range of contexts, including warranty and sales, manufacturing and supply chain, connected vehicles, and charging stations,” writes Sridhar Mamella (Platform Manager Data Streaming at Porsche).

As you can see in the above architecture, there is no need for a fight between REST/HTTP and event streaming/Kafka enthusiasts! As I explained in detail before, most microservice architectures need both Apache Kafka and API management for REST. HTTP and Kafka complement each other very well!
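
As a small illustration of how the two styles can coexist (a generic sketch, not Porsche's actual implementation), the following Java snippet exposes a plain HTTP endpoint with the JDK's built-in HttpServer and simply forwards each request body as an event to Kafka. The path /orders and the topic name orders are hypothetical placeholders.

```java
import com.sun.net.httpserver.HttpServer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class HttpToKafkaSketch {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/orders", exchange -> {
            // Accept the synchronous HTTP request and hand it over to the event stream
            String body = new String(exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
            producer.send(new ProducerRecord<>("orders", body));
            exchange.sendResponseHeaders(202, -1); // 202 Accepted, no response body
            exchange.close();
        });
        server.start();
    }
}
```

The synchronous request/response API stays where it makes sense, while every downstream consumer works on the event stream behind it.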

Last but not least, a great podcast link: The Porsche guys talked about “Apache Kafka and Porsche: Fast Cars and Fast Data” to explain their event streaming platform called Streamzilla.

Omnichannel Retail and Customer 360 with Kafka

After exploring an example, let’s take a look at an omnichannel architecture with Apache Kafka from different points of view: Marketing+Sales, Analytics, and Manufacturing.

The good news first: All business units can use the same Kafka cluster! That's actually pretty common and a key reason why Kafka is used so much today: Events generated in one domain are consumed for very different use cases by different departments and stakeholders. The real decoupling enabled by Kafka's storage layer is quite different from HTTP/REST web services or traditional message queues like RabbitMQ.
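
A small sketch of that decoupling, assuming a hypothetical customer-events topic: the same events are read independently by, say, a marketing consumer and a data science consumer simply by using different consumer group IDs; neither one affects the other's offsets or throughput.

```java
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class CustomerEventsConsumer {

    public static void main(String[] args) {
        // Pass e.g. "marketing-team" or "data-science-team" as the consumer group;
        // each group gets its own independent view of the same topic.
        String groupId = args.length > 0 ? args[0] : "marketing-team";

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", groupId);
        props.put("auto.offset.reset", "earliest");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("customer-events")); // hypothetical topic name
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                records.forEach(r -> System.out.printf("[%s] %s -> %s%n", groupId, r.key(), r.value()));
            }
        }
    }
}
```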

Marketing, Sales, and Aftersales (aka Customer 360)

From a marketing and sales perspective, companies need to correlate all customer actions, no matter how old they are and no matter whether they come from online or offline channels. The following shows such a retail process for selling a car:

Omnichannel Retail with Apache Kafka - Customer 360 Sales and Aftersales
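
As a rough sketch of such a correlation (assuming hypothetical topics online-actions and store-actions, both keyed by customer ID), a Kafka Streams application can merge both channels and continuously aggregate every action into a simple customer 360 profile:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;

import java.util.Properties;

public class Customer360Sketch {

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        KStream<String, String> online  = builder.stream("online-actions"); // key = customerId
        KStream<String, String> offline = builder.stream("store-actions");  // key = customerId

        // Merge both channels and append every action to a per-customer history
        KTable<String, String> profile = online.merge(offline)
                .groupByKey()
                .aggregate(
                        () -> "",
                        (customerId, action, history) ->
                                history.isEmpty() ? action : history + " | " + action,
                        Materialized.with(Serdes.String(), Serdes.String()));

        profile.toStream().to("customer-360-profile"); // hypothetical output topic

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "customer-360-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

In a real deployment the profile would be a structured record (e.g., Avro with a schema registry) rather than a concatenated string, but the pattern of merging channels and aggregating by customer ID stays the same.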

Reporting, Analytics, and Machine Learning

The data science team uses the same events as marketing and sales to create reports for management and to train analytic models that enable better decisions and recommendations in the future. The new, improved analytic model is then deployed back into the production pipeline of the marketing and sales channel:

Omnichannel Retail with Apache Kafka - Reporting Analytics and Machine Learning

In this example, Kafka is used together with TensorFlow for streaming machine learning. The reporting is done via Rockset’s native Kafka integration supporting ANSI SQL. This allows easy integration with business intelligence tools such as Tableau, Qlik, or Power BI in the same way as with other databases and batch data lakes.
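
To illustrate the streaming machine learning part (a generic sketch, not the exact TensorFlow setup described above), model inference can be embedded directly into a Kafka Streams topology. The score() method below is a hypothetical stand-in for loading and calling a trained model, and the topic names are placeholders.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class StreamingScoringSketch {

    // Hypothetical stand-in for a real model call (e.g., a loaded TensorFlow SavedModel)
    static double score(String customerEventJson) {
        return 0.42;
    }

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        KStream<String, String> events = builder.stream("customer-events");

        // Apply model inference to every event and forward the enriched result
        KStream<String, String> scored = events.mapValues(
                value -> value + ",churnRisk=" + score(value));

        scored.to("customer-events-scored");

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streaming-scoring-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```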

Obviously, a key strength of Kafka is that it works together with any other technology, no matter if open source or proprietary, SaaS or self-managed, real-time or batch. In most real-world Kafka deployments, the connectivity to other systems changes over time.

Design and Manufacturing

Here is a pretty cool capability of some postmodern omnichannel retail solutions: Customers can monitor the whole manufacturing and delivery process. When buying a car, you can track the manufacturing steps in your mobile app to know the exact status of your future car. After you finally pick up the car, all subsequent data is also stored (for you and the carmaker) for various use cases:

Omnichannel Retail with Apache Kafka - Digital Twin Design Manufacturing


Of course, this does not end once you have picked up the car. Aftersales will become super important for carmakers in the future. Check out what you can already do with a Tesla today (which is more software on wheels than just hardware).

Apache Kafka as Digital Twin for Customer 360 and Omnichannel Aftersales

A key buzzword that has to be mentioned here is the digital twin or digital thread. The automotive industry is the perfect example of building a digital twin of each car. Both the carmaker and the buyer get huge benefits: use cases like predictive maintenance or context-specific upselling of new features become possible.
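
As a minimal sketch of how such digital twin events could reach Kafka (the topic name car-telemetry and the JSON fields are hypothetical placeholders), a connected car or its gateway simply produces state updates keyed by the vehicle identifier:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class CarTelemetryProducer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String vin = "HYPOTHETICAL-VIN-0001"; // placeholder vehicle identifier
            String event = String.format(
                    "{\"vin\":\"%s\",\"odometerKm\":12345,\"batteryPct\":87,\"timestamp\":%d}",
                    vin, System.currentTimeMillis());

            // Keying by VIN keeps all events of one car in order within a partition
            producer.send(new ProducerRecord<>("car-telemetry", vin, event));
            producer.flush();
        }
    }
}
```

A sink connector, for instance the MongoDB connector mentioned below, can then persist the per-car state for the digital twin without any extra application code.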

Apache Kafka is the heart of this digital twin in many cases. Often, projects also combine Kafka with other technologies. For instance, Kafka can be the data integration pipeline into MongoDB. The latter stores the data of the cars. Check out the following two posts to learn more about this topic:

Omnichannel Customer 360 with Kafka for Happy Customers and Increased Revenue

In conclusion, Event Streaming with Apache Kafka plays a key role in this evolution of re-inventing the retail business. Walmart, Target, and many other retail companies rely on Apache Kafka and its ecosystem to provide a real-time infrastructure that makes the customer happy, increases revenue, and keeps them competitive in this tough industry. End-to-end omnichannel and customer 360 in real time are key for success in a postmodern retail company, no matter if you operate in your own data center or in one or more public clouds.

What are your experiences and plans for event streaming in the retail industry? Did you already build applications with Apache Kafka for omnichannel, aftersales, customer 360? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Omnichannel Retail and Customer 360 in Real Time with Apache Kafka appeared first on Kai Waehner.

Infrastructure Checklist for Apache Kafka at the Edge https://www.kai-waehner.de/blog/2021/02/03/kafka-edge-infrastructure-checklist-deployment-outside-data-center/ Wed, 03 Feb 2021 12:39:30 +0000 https://www.kai-waehner.de/?p=3085 This blog post explores an infrastructure checklist to build an open, flexible, and scalable event streaming architecture with Apache Kafka at the edge outside data centers.

Event streaming with Apache Kafka at the edge is getting more and more traction these days. It is a common approach to providing the same open, flexible, and scalable architecture in the cloud and at the edge outside the data center. Possible locations for Kafka edge deployments include retail stores, cell towers, trains, small factories, restaurants, hospitals, stadiums, etc. This post explores a checklist with infrastructure questions you need to check and evaluate if you want to deploy Kafka at the edge.

Infrastructure Checklist for Apache Kafka at the Edge

Apache Kafka at the Edge == Outside the Data Center

I already discussed the concepts and architectures of Kafka at the edge in detail in the past:

This blog post explores a checklist of common infrastructure questions you need to answer and double-check before planning to deploy Kafka at the edge.

What is the Edge?

The term 'edge' needs to be defined so that we share the same understanding. When I talk about the edge in the context of Kafka, it means:

  • Edge is NOT a data center, i.e., limited compute, storage, network bandwidth
  • Kafka clients AND the Kafka broker(s) are deployed here, not just the client applications
  • Offline business continuity, i.e., the workloads continue to work even if there is no connection to the cloud
  • Often 100+ locations, like restaurants, coffee shops, or retail stores, or even embedded into 1000s of devices or machines
  • Low-footprint and low-touch, i.e., Kafka can run as a normal highly available cluster or as a single broker (no cluster, no high availability); often shipped “as a preconfigured box” in OEM hardware (e.g., Hivecell)
  • Hybrid integration, i.e., most use cases require uni- or bidirectional communication with a remote Kafka cluster in a data center or the cloud

Let’s recap one architecture example that deploys Kafka in the cloud and at the edge: A hybrid event streaming architecture for real-time omnichannel retail and customer 360:

Hybrid Edge to Global Retail Architecture with Apache Kafka

This definition of a 'Kafka edge deployment' can also be summarized as an 'autonomous edge' or 'disconnected edge'. In contrast, the 'connected edge' means that Kafka clients at the edge connect directly to a remote data center or cloud.

Infrastructure Checklist: How to Deploy Apache Kafka at the Edge?

I talked to 100+ customers and prospects across industries who need edge computing for different reasons, including bad internet connections, cost reduction, low-latency requirements, and security implications.

The following discussion points and questions come up all the time. Make sure to discuss them with your project team:

  • What are the use cases for Kafka at the edge? For instance, edge processing (e.g., business logic/analytics), replication to the cloud (uni- or bidirectional), data integration (e.g., with devices, IoT gateways, local databases)?

  • What is the data model, and what are the replication scenarios and SLAs (aggregation to "just gather data", command & control to send data back to the edge, local analytics, etc.)? Check out Kafka-native replication tools, especially MirrorMaker 2 and Confluent's Cluster Linking.

  • What is the main motivation for doing edge processing (vs. ingestion into a DC/cloud for all processing)? Examples: Low latency requirements, cost-efficiency, business continuity even when offline / disconnected from the cloud, etc.

  • How many “edge sites” do you plan to deploy to (e.g., retail stores, factories, restaurants, trains, …)? This needs to be considered from the beginning. If you want to roll out edge computing to thousands of restaurants, you need a different hardware and automation strategy than deploying to just ten smart factories worldwide.

  • What hardware do you use at the edge (e.g., hardware specifications)? How much memory, disk, CPU, etc., is available? Do you work with a specific hardware vendor? What are the support model and monitoring setup for the edge computers?

  • What network do you use? Is it stable? What is the connection to the cloud? If it is a stable connection (like AWS DirectConnect or Azure ExpressRoute), do you still need Kafka at the edge?

  • What is the infrastructure you plan to run Kafka on at the edge (e.g., operating system, container, Kubernetes, etc.)?

  • Do you need high availability and a ‘real’ Kafka cluster with 3+ brokers? Or is a single broker good enough? In many cases, the latter is good enough to decouple edge and cloud, handle backpressure, and enable business continuity even if the internet connection is gone for some time.

  • What edge protocols do you need to integrate with? Is Kafka Connect sufficient with its connectors, or do you need a 3rd-party IoT gateway? Common integration points at the edge are OPC UA, MQTT, proprietary PLCs, traditional relational databases, files, IoT gateways, etc.

  • Do you need to process the data at the edge? Kafka-native stream processing with Kafka Streams or ksqlDB is usually a straightforward and lightweight, but still scalable and reliable, option. Almost all use cases I have seen need at least some streaming ETL at the edge, for instance, to preprocess and filter data so that only relevant, aggregated data is sent over the network to the cloud (see the sketch after this list). However, many customers also deploy business applications at the edge, for instance, for real-time model inference.
  • How will fleet management work? Which part of the infrastructure or which tool handles the management and operations of the edge machines? In most cases, this is not specific to Kafka but handled at the infrastructure level. For instance, if you run a Kubernetes cluster, Rancher might be used to provision and manage the edge clusters, including the Kafka ecosystem. Of course, specific Kafka metrics are also integrated here, for instance, via Prometheus if you are using Kubernetes.
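
The following Kafka Streams sketch shows such edge preprocessing under a few assumptions: a hypothetical pos-transactions topic keyed by store ID with numeric transaction amounts, a five-minute tumbling window, and a recent Kafka Streams version (3.0+ for ofSizeWithNoGrace). Only the aggregated per-store revenue lands in an output topic that could then be replicated to the cloud, for instance via MirrorMaker 2 or Cluster Linking.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;

import java.time.Duration;
import java.util.Properties;

public class EdgeAggregationSketch {

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Raw point-of-sale transactions at the edge, keyed by storeId (hypothetical topic)
        KStream<String, Double> transactions = builder.stream(
                "pos-transactions", Consumed.with(Serdes.String(), Serdes.Double()));

        transactions
                .filter((storeId, amount) -> amount != null && amount > 0) // drop invalid events locally
                .groupByKey(Grouped.with(Serdes.String(), Serdes.Double()))
                .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
                .reduce(Double::sum) // revenue per store and window
                .toStream()
                .map((window, total) -> KeyValue.pair(window.key(), total))
                .to("store-revenue-5m", Produced.with(Serdes.String(), Serdes.Double()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "edge-aggregation-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // the local edge broker
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Sending only the windowed aggregates instead of every raw transaction is exactly the kind of streaming ETL that keeps bandwidth to the cloud small while preserving business continuity at the edge.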

Discussing and answering these questions will help you with your planning for Kafka at the edge. Are there any key questions missing? Please let me know and I will update the list.

Kafka at the Edge is the new Black!

Apache Kafka at the edge is a common approach to providing the same open, flexible, and scalable architecture in the cloud and outside the data center. A huge benefit is that the same technology and architecture can be deployed everywhere across regions, sites, and clouds. This is a real hybrid architecture combining edge sites, data centers, and multiple clouds! Discuss the above infrastructure checklist with your team to be successful.

What are your experiences and plans for event streaming with Apache Kafka at the edge? Did you already deploy Apache Kafka on a small node somewhere, maybe even as a single broker setup? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Infrastructure Checklist for Apache Kafka at the Edge appeared first on Kai Waehner.
