Real-Time Data Sharing in the Telco Industry for MVNO Growth and Beyond with Data Streaming
https://www.kai-waehner.de/blog/2025/04/30/real-time-data-sharing-in-the-telco-industry-for-mvno-growth-and-beyond-with-data-streaming/ (30 Apr 2025)

The telecommunications industry is transforming rapidly as Telcos expand partnerships with MVNOs, IoT platforms, and enterprise customers. Traditional batch-driven architectures can no longer meet the demands for real-time, secure, and flexible data access. This blog explores how real-time data streaming technologies like Apache Kafka and Flink, combined with hybrid cloud architectures, enable Telcos to build trusted, scalable data ecosystems. It covers the key components of a modern data sharing platform, critical use cases across the Telco value chain, and how policy-driven governance and tailored data products drive new business opportunities, operational excellence, and regulatory compliance. Mastering real-time data sharing positions Telcos to turn raw events into strategic advantage faster and more securely than ever before.

The telecommunications industry is entering a new era. Partnerships with MVNOs, IoT platforms, and enterprise customers demand flexible, secure, and real-time access to network and customer data. Traditional batch-driven architectures are no longer sufficient. Instead, real-time data streaming combined with policy-driven data sharing provides a powerful foundation for building scalable data products for internal and external consumers. A modern Telco must manage data collection, processing, governance, data sharing, and distribution with the same rigor as its core network services. Leading Telcos now operate centralized real-time data streaming platforms to integrate and share network events, subscriber information, billing records, and telemetry from thousands of data sources across the edge and core networks.

Data Sharing for MVNO Growth and Beyond with Data Streaming in the Telco Industry

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter and following me on LinkedIn or X (formerly Twitter). And download my free book about data streaming use cases, including a dedicated chapter about the telco industry.

Data Streaming in the Telco Industry

Telecommunications networks generate vast amounts of data every second. Every call, message, internet session, device interaction, and network event produces valuable information. Historically, much of this data was processed in batches — often hours or even days after it was collected. This delayed model no longer meets the needs of modern Telcos, partners, and customers.

Data streaming transforms how Telcos handle information. Instead of storing and processing data later, it is ingested, processed, and acted upon in real time as it is generated. This enables continuous intelligence across all parts of the network and business.
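
As a minimal illustration (in Java, with hypothetical broker address, group ID, and topic name), the consumer below acts on every event within milliseconds of it being produced, instead of waiting for a nightly batch job:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class NetworkEventConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "network-monitoring");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("network-events")); // hypothetical topic
            while (true) {
                // Poll returns as soon as new events arrive, enabling continuous processing.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("Acting on event %s = %s%n", record.key(), record.value());
                }
            }
        }
    }
}
```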

Learn more about “The Top 20 Problems with Batch Processing (and How to Fix Them with Data Streaming)“.

Business Value of Data Streaming in the Telecom Sector

Key benefits of data streaming for Telcos include:

  • Real-Time Visibility: Immediate insight into network health, customer behavior, fraud attempts, and service performance.
  • Operational Efficiency: Faster detection and resolution of issues reduces downtime, improves customer satisfaction, and lowers operating costs.
  • New Revenue Opportunities: Real-time data enables new services such as dynamic pricing, personalized offers, and proactive customer support.
  • Enhanced Security and Compliance: Immediate anomaly detection and instant auditability support regulatory requirements and protect against cyber threats.

Technologies like Apache Kafka and Apache Flink are now core components of Telco IT architectures. They allow Telcos to integrate massive, distributed data flows from radio access networks (RAN), 5G core systems, IoT ecosystems, billing and support platforms, and customer devices.

Modern Telcos use data streaming to not only improve internal operations but also to deliver trusted, secure, and differentiated services to external partners such as MVNOs, IoT platforms, and enterprise customers.

Learn More about Data Streaming in Telco

Learn more about data streaming in the telecommunications sector.

Data streaming is not an all-rounder that solves every problem. Hence, a modern enterprise architecture combines data streaming with purpose-built, telco-specific platforms and SaaS solutions, plus data lakes/warehouses/lakehouses like Snowflake or Databricks for analytical workloads.

I already wrote about combining data streaming platforms like Confluent with Snowflake and Microsoft Fabric. A blog series about data streaming with Confluent combined with AI and analytics on Databricks will follow shortly after this post.

Building a Real-Time Data Sharing Platform in the Telco Industry with Data Streaming

By mastering real-time data streaming, Telcos unlock the ability to share valuable insights securely and efficiently with internal divisions, IoT platforms, and enterprise customers.

Mobile Virtual Network Operators (MVNOs) — companies that offer mobile services without owning their own network infrastructure — are an equally important group of consumers. As MVNOs deliver niche services, competitive pricing, and tailored customer experiences, real-time data sharing becomes essential to support their growth and differentiation in a highly competitive market.

Real-Time Data Sharing Between Organizations Is Necessary in the Telco Industry

A strong real-time data sharing platform in the telco industry integrates multiple types of components and stakeholders, organized into four critical areas:

Data Sources

A real-time data platform aggregates information from a wide range of technical systems across the Telco infrastructure. A producer sketch follows the list below.

  • Radio Access Network (RAN) Metrics: Capture real-time information about signal quality, handovers, and user session performance.
  • 5G Core Network Functions: Manage traffic flows, session lifecycles, and device mobility through UPF, SMF, and AMF components.
  • Operational Support Systems (OSS) and Business Support Systems (BSS): Provide data for service assurance, provisioning, customer management, and billing processes.
  • IoT Devices: Send continuous telemetry data from connected vehicles, industrial assets, healthcare monitors, and consumer electronics.
  • Customer Premises Equipment (CPE): Supply performance and operational data from routers, gateways, modems, and set-top boxes.
  • Billing Events: Stream usage records, real-time charging information, and transaction logs to support accurate billing.
  • Customer Profiles: Update subscription plans, user preferences, device types, and behavioral attributes dynamically.
  • Security Logs: Capture authentication events, threat detections, network access attempts, and audit trail information.
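
To make this concrete, here is the producer sketch referenced above, publishing a single RAN metric event to Kafka. The topic name, cell ID, and JSON layout are illustrative assumptions rather than an actual Telco schema:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class RanMetricsProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // full durability for billing-grade data

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keying by cell ID keeps all metrics of one cell in the same partition,
            // preserving per-cell ordering for downstream consumers.
            String cellId = "cell-4711"; // hypothetical
            String payload = "{\"cellId\":\"cell-4711\",\"rsrp\":-95,\"handovers\":3,\"ts\":1714464000000}";
            producer.send(new ProducerRecord<>("ran-metrics", cellId, payload),
                (metadata, exception) -> {
                    if (exception != null) exception.printStackTrace();
                });
        }
    }
}
```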

Stream Processing

Stream processing technologies ensure raw events are turned into enriched, actionable data products as they move through the system. An enrichment sketch follows the list below.

  • Real-Time Data Ingestion: Continuously collect and process events from all sources with low latency and high reliability.
  • Data Aggregation and Enrichment: Transform raw network, billing, and device data into structured, valuable datasets.
  • Actionable Data Products: Create enriched, ready-to-consume information for operational and business use cases across the ecosystem.
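
Here is the enrichment sketch referenced above: a Kafka Streams topology joins raw billing events with a table of customer profiles to produce a ready-to-consume data product. Topic names and the naive JSON concatenation are simplifying assumptions; a production pipeline would use managed schemas (e.g., Avro) and could equally be built with Apache Flink:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class BillingEnrichment {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "billing-enrichment");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Raw usage records, keyed by subscriber ID (hypothetical topic names).
        KStream<String, String> usage = builder.stream("billing-events");
        // Latest known profile per subscriber, maintained as a changelog table.
        KTable<String, String> profiles = builder.table("customer-profiles");

        // Enrich each usage record with the subscriber profile and publish the
        // result as a structured, ready-to-consume data product.
        usage.join(profiles, (usageJson, profileJson) ->
                "{\"usage\":" + usageJson + ",\"profile\":" + profileJson + "}")
             .to("enriched-billing-events");

        new KafkaStreams(builder.build(), props).start();
    }
}
```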

Data Governance

Effective governance frameworks guarantee that data sharing is secure, compliant, and aligned with commercial agreements. An access-control sketch follows the list below.

  • Policy-Based Access Control: Enforce business, regulatory, and contractual rules on how data is shared internally and externally.
  • Data Protection Techniques: Apply masking, anonymization, and encryption to secure sensitive information at every stage.
  • Compliance Assurance: Meet regulatory requirements like GDPR, CCPA, and telecom-specific standards through real-time monitoring and enforcement.
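
At the Kafka level, policy-based access control is ultimately enforced through ACLs. The access-control sketch referenced above uses Kafka's Admin API to grant a hypothetical MVNO principal read-only access to one dedicated data product topic; the principal and topic names are assumptions:

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

public class GrantMvnoReadAccess {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption

        try (Admin admin = Admin.create(props)) {
            // Allow the MVNO principal to read only its own data product topic.
            AclBinding readNetworkQuality = new AclBinding(
                new ResourcePattern(ResourceType.TOPIC, "mvno1.network-quality", PatternType.LITERAL),
                new AccessControlEntry("User:mvno1", "*", AclOperation.READ, AclPermissionType.ALLOW));
            admin.createAcls(List.of(readNetworkQuality)).all().get();
        }
    }
}
```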

Data Consumers

Multiple internal and external stakeholders rely on tailored, policy-controlled access to real-time data streams to achieve business outcomes.

  • MVNO Partners: Consume real-time network metrics, subscriber insights, and fraud alerts to offer better customer experiences and safeguard operations.
  • Internal Telco Divisions: Use operational data to improve network uptime, optimize marketing initiatives, and detect revenue leakage early.
  • IoT Platform Services: Rely on device telemetry and mobility data to improve fleet management, predictive maintenance, and automated operations.
  • Enterprise Customers: Integrate real-time network insights and SLA compliance monitoring into private network and corporate IT systems.
  • Regulatory and Compliance Bodies: Access live audit streams, security incident data, and privacy-preserving compliance reports as required by law.

Key Data Products Driving Value for Data Sharing in the Telco Industry

In modern Telco architectures, data products act as the building blocks for a data mesh approach, enabling decentralized ownership, scalable integration with microservices, and direct access for consumers across the business and partner ecosystem.

Data Sharing in Telco with a Data Mesh and Data Products using Data Streaming with Apache Kafka

The right data products accelerate time-to-insight and enable additional revenue streams. Leading Telcos typically offer:

  • Network Quality Metrics: Monitoring service degradation, latency spikes, and coverage gaps continuously.
  • Customer Behavior Analytics: Tracking app usage, mobility patterns, device types, and engagement trends.
  • Fraud and Anomaly Detection Feeds: Capturing unusual usage, SIM swaps, or suspicious roaming activities in real time.
  • Billing and Charging Data Streams: Delivering session records and consumption details instantly to billing systems or MVNO partners.
  • Device Telemetry and Health Data: Providing operational status and error signals from smartphones, CPE, and IoT devices.
  • Subscriber Profile Updates: Streaming changes in service plans, device upgrades, or user preferences.
  • Location-Aware Services Data: Powering geofencing, smart city applications, and targeted marketing efforts.
  • Churn Prediction Models: Scoring customer retention risks based on usage behavior and network experience.
  • Network Capacity and Traffic Forecasts: Helping optimize resource allocation and investment planning.
  • Policy Compliance Monitoring: Ensuring real-time validation of internal and external SLAs, privacy agreements, and regulatory requirements.

These data products can be offered via APIs, secure topics, or integrated into partner platforms for direct consumption.

How Each Data Consumer Gains Strategic Value

Real-time data streaming empowers each data consumer within the Telco ecosystem to achieve specific business outcomes, drive operational excellence, and unlock new growth opportunities based on continuous, trusted insights.

Internal Telco Divisions

Real-time insights into network behavior allow proactive incident management and customer support. Marketing teams optimize campaigns based on live subscriber data, while finance teams minimize revenue leakage by tracking billing and usage patterns instantly.

MVNO Partners

Access to live network quality indicators helps MVNOs improve customer satisfaction and loyalty. Real-time fraud monitoring protects against financial losses. Tailored subscriber insights enable MVNOs to offer personalized plans and upsells based on actual usage.

IoT Platform Services

Large-scale telemetry streaming enables better device management, predictive maintenance, and operational automation. Real-time geolocation data improves logistics, fleet management, and smart infrastructure performance. Event-driven alerts help detect and resolve device malfunctions rapidly.

Enterprise Customers

Private 5G networks and managed services depend on live analytics to meet SLA obligations. Enterprises integrate real-time network telemetry into their own systems for smarter decision-making. Data-driven optimizations ensure higher uptime, better resource utilization, and enhanced customer experiences.

Building a Trusted Data Ecosystem for Telcos with Real-Time Streaming and Hybrid Cloud

Real-time data sharing is no longer a luxury for Telcos — it is a competitive necessity. A successful platform must balance openness with control, ensuring that every data exchange respects privacy, governance, and commercial boundaries.

Hybrid cloud architectures play a critical role in this evolution. They enable Telcos to process, govern, and share real-time data across on-premises infrastructure, edge environments, and public clouds seamlessly. By combining the flexibility of cloud-native services with the security and performance of on-premises systems, hybrid cloud ensures that data remains accessible, scalable, cost-efficient and compliant wherever it is needed.

Hybrid 5G Telco Architecture with Data Streaming with AWS Cloud and Confluent Edge and Cloud

By deploying scalable data streaming solutions across a hybrid cloud environment, Telcos enable secure, real-time data sharing with MVNOs, IoT platforms, enterprise customers, and internal business units. This empowers critical use cases such as dynamic quality of service monitoring, real-time fraud detection, customer behavior analytics, predictive maintenance for connected devices, and SLA compliance reporting — all without compromising performance or regulatory requirements.

The future of telecommunications belongs to those who implement real-time data streaming and controlled data sharing — turning raw events into strategic advantage faster, more securely, and more effectively than ever before.

How do you share data in your organization? Do you already leverage data streaming or still operate in batch mode? Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter and following me on LinkedIn or X (formerly Twitter). And make sure to download my free book about data streaming use cases.

How Data Streaming and AI Help Telcos to Innovate: Top 5 Trends from MWC 2025
https://www.kai-waehner.de/blog/2025/03/07/how-data-streaming-and-ai-help-telcos-to-innovate-top-5-trends-from-mwc-2025/ (7 Mar 2025)

As the telecom and tech industries rapidly evolve, real-time data streaming is emerging as the backbone of digital transformation. For MWC 2025, McKinsey outlined five key trends defining the future: IT excellence, sustainability, 6G, generative AI, and AI-driven software development. This blog explores how data streaming powers each of these trends, enabling real-time observability, AI-driven automation, energy efficiency, ultra-low latency networks, and faster software innovation. From Dish Wireless’ cloud-native 5G network to Verizon’s edge AI deployments, leading companies are leveraging event-driven architectures to gain a competitive advantage. Whether you’re tackling network automation, sustainability challenges, or AI monetization, data streaming is the strategic enabler for 2025 and beyond. Read on to explore the latest use cases, industry insights, and how to future-proof your telecom strategy.

The telecommunications and technology industries are at a pivotal moment. As innovation accelerates, businesses must leverage cutting-edge technologies to stay ahead. For MWC 2025, McKinsey highlighted five crucial themes shaping the future: IT excellence in telecom, sustainability challenges, the evolution of 6G, the rise of generative AI, and AI-driven software development.

MWC (Mobile World Congress) 2025 serves as the global stage where industry leaders, telecom operators, and technology pioneers converge to discuss the next wave of connectivity and digital transformation. As organizations gear up for a data-driven future, real-time data streaming emerges as the critical enabler of efficiency, agility, and value creation.

This blog explores each of McKinsey’s key themes from MWC 2025 and how data streaming helps businesses innovate and gain a competitive advantage in the hyper-connected world ahead.

How Apache Kafka, Flink and AI Help Telecom Providers - Top 5 Trends from MWC 2025

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter and following me on LinkedIn or X (formerly Twitter). And make sure to download my free book about data streaming use cases.

1. IT Excellence: Driving Telecom Innovation and Cost Efficiency

Telecom operators are under immense pressure to monetize massive infrastructure investments while maintaining cost efficiency. McKinsey’s benchmarking study shows that leading telecom tech players spend less on IT while achieving superior cost efficiency and innovation. Successful operators integrate business and IT transformations holistically, optimizing cloud strategies, IT architectures, and AI-driven processes.

How Data Streaming Powers IT Excellence

  • Real-Time IT Monitoring: Streaming data pipelines provide continuous observability into IT performance, reducing downtime and optimizing infrastructure costs.
  • Automated Network Operations: Event-driven architectures allow operators to dynamically allocate resources, minimizing network congestion and improving service quality.
  • Cloud-Native AI Models: By continuously feeding AI models with live data, telecom leaders ensure optimal network performance and predictive maintenance.

🔹 Business Impact: Faster time-to-market, lower IT costs, and improved network reliability.

A great example of this transformation is Dish Wireless, which built a fully cloud-native, software-driven 5G network powered by Apache Kafka. By leveraging real-time data streaming, Dish ensures low-latency, scalable, and event-driven operations, allowing it to optimize network performance, automate infrastructure management, and provide next-generation connectivity for enterprise applications.

Dish’s data-first approach demonstrates how streaming technologies are redefining telecom infrastructure and unlocking new business models.

📌 Read more about how Apache Kafka powers Dish Wireless’ 5G infrastructure or watch the following webinar with Dish:

Confluent and Dish about Cloud-Native 5G Infrastructure and Apache Kafka

2. Tackling Telecom Emissions: A Sustainable Future

The telecom industry faces increasing regulatory pressure and consumer expectations to decarbonize operations. While many companies have reduced Scope 1 (direct emissions) and Scope 2 (energy consumption) emissions, the real challenge lies in Scope 3 emissions from supply chains. McKinsey’s research suggests that 60% of an integrated operator’s emissions can be reduced for less than $100 per ton of CO₂.

How Data Streaming Supports Sustainability Efforts

  • Energy Optimization in Real Time: Streaming analytics continuously monitor energy usage across network infrastructure, automatically adjusting power consumption.
  • Carbon Footprint Tracking: Data pipelines aggregate real-time emissions data, enabling operators to meet sustainability goals efficiently.
  • Predictive Maintenance for Energy Efficiency: AI-driven insights help optimize network hardware lifespan, reducing waste and unnecessary energy consumption.

🔹 Business Impact: Reduced carbon footprint, cost savings on energy consumption, and regulatory compliance.
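
As an illustration of the energy optimization idea above, the Kafka Streams sketch below totals power readings per cell site in five-minute windows, producing the kind of continuously updated figure an optimization loop could act on. Topic name, units, and window size are illustrative assumptions:

```java
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.TimeWindows;

public class EnergyPerSite {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "energy-monitor");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption

        StreamsBuilder builder = new StreamsBuilder();
        // Power readings in watts, keyed by cell site ID (hypothetical topic).
        builder.stream("site-power-readings", Consumed.with(Serdes.String(), Serdes.Double()))
               .groupByKey(Grouped.with(Serdes.String(), Serdes.Double()))
               .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
               .reduce(Double::sum) // total consumption per site and window
               .toStream()
               .foreach((windowedSiteId, watts) ->
                   System.out.printf("site=%s window=%s totalW=%.1f%n",
                       windowedSiteId.key(), windowedSiteId.window(), watts));

        new KafkaStreams(builder.build(), props).start();
    }
}
```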

Data Streaming with Apache Kafka and Flink for ESG and Sustainability

Beyond telecom, data streaming is transforming sustainability efforts across industries. For example, in manufacturing and real estate, companies like Ampeers Energy and PAUL Tech AG use Apache Kafka and Flink to optimize energy distribution, reduce emissions, and improve ESG ratings.

These real-time data platforms analyze IoT sensor data, weather forecasts, and energy consumption patterns to automate decision-making and lower energy waste. Similarly, EverySens leverages streaming data to decarbonize freight transport, eliminating hundreds of thousands of unnecessary truck rides each year. These use cases demonstrate how data-driven sustainability strategies can be scaled across sectors to achieve meaningful environmental impact.

📌 Read more about how data streaming with Kafka and Flink power ESG transformations.

3. Shaping the Future of 6G: Beyond Connectivity

6G is expected to revolutionize industries by enabling ultra-low latency, ubiquitous connectivity, and AI-driven network optimization. However, the transition from 5G to 6G requires overcoming legacy infrastructure challenges and developing multi-capability platforms that go beyond simple connectivity.

How Data Streaming Powers the 6G Revolution

  • Network Sensing and Intelligent Routing: Streaming architectures process real-time network telemetry, enabling adaptive, self-optimizing networks.
  • AI-Enhanced Edge Computing: Real-time analytics ensure minimal latency for mission-critical applications such as autonomous vehicles and smart cities.
  • Cross-Sector Data Monetization: Operators can leverage streaming data to offer network-as-a-service (NaaS) solutions, opening new revenue streams.

🔹 Business Impact: New monetization opportunities, improved network efficiency, and enhanced customer experience.
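
A simple form of the network sensing described above can be expressed directly on the telemetry stream. The hedged Kafka Streams sketch below routes sessions whose latency exceeds a budget to a dedicated topic that a self-optimizing control loop could consume; the topic names and 10 ms threshold are illustrative assumptions:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class LatencyRouter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "latency-router");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption

        StreamsBuilder builder = new StreamsBuilder();
        // Per-session latency measurements in milliseconds, keyed by session ID.
        builder.stream("edge-telemetry", Consumed.with(Serdes.String(), Serdes.Long()))
               // Forward only sessions violating the ultra-low-latency budget.
               .filter((sessionId, latencyMs) -> latencyMs > 10L)
               .to("latency-violations", Produced.with(Serdes.String(), Serdes.Long()));

        new KafkaStreams(builder.build(), props).start();
    }
}
```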

Use Cases for 5G and Data Streaming with Apache Kafka
Source: Dish Wireless

As the 6G era approaches, real-time data streaming is already proving its value in 5G deployments, unlocking low-latency, high-bandwidth use cases.

A great example is Verizon’s Mobile Edge Computing (MEC) initiative, which uses data streaming and AI-powered analytics to support real-time applications like autonomous drone monitoring, vehicle-to-everything (V2X) communication, and predictive maintenance in industrial settings. By processing data at the network edge, telcos minimize latency and optimize bandwidth—capabilities that will be even more critical in 6G.

With cloud-native, event-driven architectures, data streaming enables telcos to evolve from traditional connectivity providers to technology leaders. As 6G advances, expect faster network automation, more sophisticated AI integration, and deeper partnerships between telecom operators and enterprise customers.

📌 Read more about how data streaming is shaping the future of telco.

4. Generative AI: A Profitability Game-Changer for Telcos

McKinsey highlights generative AI’s potential to boost telco profitability by up to 10% in annual EBITDA through automation, hyper-personalization, and AI-driven customer engagement. Leading telcos are already leveraging AI to improve customer service, marketing, and network operations.

How Data Streaming Enhances Gen AI in Telecom

  • Real-Time Customer Insights: AI-powered recommendation engines deliver personalized offers and dynamic pricing in milliseconds.
  • Automated Call Center Operations: Real-time transcription and sentiment analysis improve chatbot accuracy and agent productivity.
  • Proactive Network Management: AI models trained on continuous streaming data predict and prevent network failures before they occur.

🔹 Business Impact: Higher customer satisfaction, reduced operational costs, and increased revenue per user.

As telecom providers integrate Generative AI (GenAI) into their business models, real-time data streaming is a foundational technology that enables efficient AI inference and model retraining. One compelling example is the GenAI Demo with Kafka, Flink, LangChain, and OpenAI, which illustrates how streaming architectures power AI-driven sales and customer interactions.

Stream Processing with Apache Flink SQL UDF and GenAI with OpenAI LLM

This demo showcases how real-time CRM data from Salesforce is enriched with web and LinkedIn data via streaming ETL using Apache Flink. Then, AI models process this context using LangChain and OpenAI, generating personalized, context-specific sales recommendations—a workflow that can be extended to telecom call centers and customer engagement platforms.

Expedia’s success story further highlights how GenAI combined with data streaming improves customer interactions. Facing a massive surge in support requests during COVID-19, Expedia automated responses with AI-driven chatbots, significantly reducing agent workloads. After integrating Apache Kafka with AI models, 60% of travelers began self-servicing their inquiries, resulting in over 40% cost savings in customer support operations.

Expedia GenAI in the Travel Industry with Data Streaming Kafka and Machine Learning AI
Source: Confluent

For telecom providers, similar AI-driven automation can optimize call centers, personalized customer offers, fraud detection, and even predictive maintenance for network infrastructure. Data streaming ensures that AI models continuously learn from fresh data, making GenAI solutions more accurate, responsive, and cost-effective.

5. AI-Driven Software Development: Faster, Smarter, Better

AI is fundamentally transforming software development, accelerating the product development lifecycle (PDLC) and improving product quality. AI-assisted coding, automated testing, and real-time feedback loops are enabling companies to deliver customer-centric solutions at unprecedented speed.

How Data Streaming Transforms AI-Driven Software Development

  • Continuous Feedback and Iteration: Streaming analytics provide instant feedback from user behavior, enabling faster iterations and bug fixes.
  • Automated Code Quality Checks: AI-driven continuous integration (CI/CD) pipelines validate new code in real-time, ensuring seamless software deployments.
  • Live Performance Monitoring: Streaming data enables real-time anomaly detection, ensuring optimal application performance.

🔹 Business Impact: Faster time-to-market, higher software reliability, and reduced development costs.

For telecom providers, AI-driven software development is key to maintaining a reliable, scalable, and secure network infrastructure while rolling out new customer-facing services at speed. Data streaming accelerates software development by enabling real-time feedback loops, automated testing, and AI-powered observability—bringing the industry closer to a true “Shift Left” approach.

The Shift Left Architecture in software development means moving testing, security, and quality assurance earlier in the development lifecycle, reducing costly errors and vulnerabilities late in production. Data streaming enables this shift by continuously feeding AI-driven CI/CD pipelines with real-time insights, allowing developers to detect issues earlier, optimize network performance, and iterate faster on new services.

Shift Left Architecture with Data Streaming into Data Lake Warehouse Lakehouse

A relevant AI-powered automation example comes from the GenAI for Development vs. Visual Coding article, which discusses how automation is shifting from traditional code-based development to AI-assisted software engineering. Instead of manual coding, AI-driven workflows help telcos streamline DevOps, automate CI/CD pipelines, and enhance software quality in real time.

For telecom providers, this transformation means proactive issue detection, faster rollouts of network upgrades, and automated AI-driven security monitoring—all powered by real-time data streaming and a Shift Left mindset.

Data Streaming as the Ultimate Competitive Advantage for Telcos

Across all five of McKinsey’s key trends, real-time data streaming is the backbone of transformation. Whether optimizing IT efficiency, reducing emissions, unlocking 6G’s potential, enabling generative AI and Agentic AI, or accelerating software development, streaming technologies provide the agility and intelligence businesses need to win in 2025 and beyond.

The path forward isn’t just about adopting AI or cloud-native infrastructure—it’s about embracing real-time, event-driven architectures to drive innovation at scale.

As organizations take bold steps to lead the future, those who harness the power of data streaming will emerge as the industry’s true pioneers.

Stay ahead of the curve! Subscribe to my newsletter for insights into data streaming and connect with me on LinkedIn to continue the conversation. And make sure to download my free book about data streaming use cases.

Data Streaming with Apache Kafka and Flink in the Media Industry: Disney+ Hotstar and JioCinema
https://www.kai-waehner.de/blog/2025/02/28/data-streaming-with-apache-kafka-and-flink-in-the-media-industry-disney-hotstar-and-jiocinema/ (28 Feb 2025)

The $8.5 billion merger of Disney+ Hotstar and Reliance’s JioCinema marks a transformative moment for India’s media industry, combining two of the most influential streaming platforms into a data streaming powerhouse. This blog explores how technologies like Apache Kafka and Flink power these platforms, enabling massive-scale content distribution, real-time analytics, and user engagement. With tools like MirrorMaker and Cluster Linking, the merger presents opportunities for seamless Kafka migrations, hybrid multi-cloud flexibility, and new innovations like multi-angle viewing and advanced personalization. The transparency of both platforms about their Kafka-based architectures highlights their technical leadership and the lessons they offer the data streaming community. The integration of their infrastructures sets the stage for redefining media streaming in India, offering exciting insights and benchmarks for organizations leveraging data streaming at scale.

The media industry in India has witnessed a seismic shift with the $8.5 billion merger of Disney+ Hotstar and Reliance’s JioCinema. This collaboration brings together two of the country’s most influential data streaming deployments under one umbrella, creating a powerhouse for entertainment delivery. Beyond the headlines, this merger underscores the critical role of data streaming technologies, particularly Apache Kafka and Flink, in enabling large-scale content distribution and real-time data processing. This blog post explores the existing data streaming infrastructures and use cases. Additionally, it explores potential migrations leveraging Kafka tools for real-time data replication and synchronization without downtime in production environments.

Data Streaming with Apache Kafka and Flink in the Media Industry at Netflix Disney Plus Hotstar and Reliance JioCinema

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter and following me on LinkedIn or X (formerly Twitter). And make sure to download my free book about data streaming use cases.

Data streaming technologies like Apache Kafka and Flink are revolutionizing the media industry by enabling real-time data processing at an unprecedented scale. Media platforms, including Over-The-Top (OTT) services operated by telcos and media companies, leverage these technologies to deliver video, audio, and other content directly to viewers over the internet. These OTT services bypass traditional cable and satellite channels.

As these platforms cater to growing audiences with diverse needs, data streaming serves as the backbone for seamless content delivery, real-time user engagement, and operational efficiency. Data streaming ensures a superior viewing experience at scale.

Event-driven Architecture with Data Streaming using Apache Kafka and Flink in the Media Industry

Netflix is a leading global media company renowned for its extensive use of Apache Kafka and Flink. It powers critical use cases such as real-time personalization, anomaly detection, and monitoring at extreme scale. Its data streaming architecture processes billions of events daily, ensuring seamless content delivery and exceptional viewer experiences for a global audience.

Use Cases for Data Streaming in the Media Industry

Data streaming with technologies like Apache Kafka and Flink is transforming the media industry by enabling real-time data processing for seamless content delivery, personalized experiences, and operational efficiency.

  1. Live Video Streaming: Data streaming with Apache Kafka serves as a central event hub for managing log data, metadata, and control signals associated with live video streaming. It processes real-time data related to user interactions, stream health, and session analytics to ensure ultra-low latency and a seamless experience for live events like concerts and sports. The actual video streams are handled by Content Delivery Networks (CDNs). (A session-analytics sketch follows this list.)
  2. On-Demand Content Delivery: Media platforms use Kafka to reliably manage data pipelines, delivering movies, TV shows, and other content to millions of users.
  3. Personalized Recommendations: By integrating Kafka with analytics tools, platforms provide tailored suggestions based on user behavior, increasing viewer engagement and satisfaction.
  4. Real-Time Ad Targeting: Kafka enables real-time ad insertion by processing user events and contextual data, ensuring ads are relevant and timely.
  5. Monitoring and Anomaly Detection: Media companies use Kafka to monitor backend systems in real time, detecting and resolving issues proactively to ensure a smooth user experience.
  6. Churn Prediction: By analyzing behavioral patterns in real time, platforms can predict user churn and take corrective actions, such as offering discounts or new content recommendations.
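
Here is the session-analytics sketch referenced in the first item above: a Kafka Streams topology groups player heartbeats into viewer sessions using session windows. The topic name, key choice, and inactivity gap are illustrative assumptions:

```java
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.SessionWindows;

public class ViewerSessions {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "viewer-sessions");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption

        StreamsBuilder builder = new StreamsBuilder();
        // Player heartbeats keyed by viewer ID; five minutes of silence closes a session.
        builder.stream("viewer-heartbeats", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
               .windowedBy(SessionWindows.ofInactivityGapWithNoGrace(Duration.ofMinutes(5)))
               .count()
               .toStream()
               .foreach((windowedViewer, heartbeats) ->
                   System.out.printf("viewer=%s session=%s heartbeats=%d%n",
                       windowedViewer.key(), windowedViewer.window(), heartbeats));

        new KafkaStreams(builder.build(), props).start();
    }
}
```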

Learn more about data streaming use cases in the telco and media industry from real-world customer stories like Dish Network, British Telecom, Globe Telecom, Swisscom, and more.

Business Value of Data Streaming in Media

Data streaming technologies like Apache Kafka and Flink drive transformative business value in the media industry by enabling real-time insights, efficiency, and innovation:

  • Enhanced User Experience: Real-time at any scale capabilities enable faster content delivery, personalized recommendations, and reduced buffering.
  • Cost Optimization: Streamlined pipelines improve infrastructure utilization and reduce operational costs. The Shift Left Architecture is adopted across business units.
  • Revenue Growth: Precision in ad targeting and churn reduction leads to higher revenues.
  • Competitive Edge: Real-time analytics and content delivery position companies as leaders in their field.

Disney+ Hotstar (Disney) and JioCinema (Viacom18): Streaming Giants Shaping India’s Media Landscape

Disney+ Hotstar revolutionized OTT streaming in India with a robust freemium model. Catering to a diverse audience, it provided an extensive library of movies, TV shows, and sports, including exclusive streaming rights for the Indian Premier League (IPL), the world’s most popular cricket league. By blending free content with premium subscriptions, it attracted millions of users, leveraging IPL viewership as a major growth driver.

JioCinema, part of Reliance Jio, employs a mass-market approach, offering free streaming supported by Reliance’s vast 5G network. It gained significant traction after taking over the IPL digital streaming rights in 2023, streaming matches in 4K resolution to over 32 million concurrent viewers and breaking records for live streaming.

Each platform has used IPL strategically—Hotstar with a premium model and JioCinema for mass-market penetration. Post-merger, the unified platform could combine these approaches, delivering enhanced IPL experiences powered by a consolidated Kafka-based streaming infrastructure.

Both platforms share a commitment to innovation, scalability, and user engagement, making them ideal candidates for heavy Apache Kafka usage.

Both Disney+ Hotstar and JioCinema (Viacom18) are renowned for their openness in discussing their technical data streaming architectures, similar to Netflix. They have frequently presented at conferences like Kafka Summit and other industry events, sharing insights about their data streaming strategies and implementations.

This transparency achieves several goals:

  • Showcasing Innovation: Highlighting their advanced use of Kafka and Flink establishes their leadership in tech innovation.
  • Talent Acquisition: Open discussions attract engineers who want to work on cutting-edge systems.
  • Industry Collaboration: Sharing experiences fosters collaboration within the streaming and open-source communities.

By examining their presentations and publications, we gain a deeper understanding of their use of Kafka to achieve extreme scalability and efficiency.

Data Streaming Solves the Challenges and Extreme Scale of OTT Services in the Media Industry

Running platforms of this scale comes with its share of challenges:

  • Massive Throughput: Kafka handles billions of messages daily, requiring extensive partitioning and scaling strategies.
  • Fault Tolerance: Platforms implement advanced disaster recovery and replication strategies to ensure zero downtime, even during critical events like IPL.
  • Cost vs. Performance Trade-Offs: Streaming 4K video for millions of users demands balancing high infrastructure costs with user expectations.

Data streaming with Apache Kafka and Flink is a key piece of the data strategy to solve these challenges.

Disney+ Hotstar: Gamification at Extreme Scale

Disney+ Hotstar’s “Watch N Play” feature transformed live sports streaming, particularly cricket, into an interactive experience. Viewers predict outcomes, answer trivia, and participate in polls, earning points for rewards or leaderboard rankings, adding a competitive and social element to the platform.

Hotstar’s presentation from Kafka Summit 2019 is still very impressive and worth watching. Here is a summary of the OTT service serving millions of cricket fans:

Disney Plus Hotstar OTT Media Service for Cricket with Apache Kafka
Source: Disney+ Hotstar

Powered by Apache Kafka, Disney+ Hotstar’s infrastructure processed millions of real-time interactions per second. The integration of data sources via Kafka Connect enables seamless analytics and rewards. This gamified approach enhances user engagement and extends to broader applications like e-sports, interactive TV, and IoT-driven fan experiences, making Hotstar a leader in innovative streaming.

Disney+ Hotstar runs ~15 different Kafka Connect clusters with more than 2,000 connectors and auto-scaling based on traffic, as presented in another Kafka Summit talk in 2021.

Disney Plus Hotstar Kafka Connect Integration Pipeline from Roku Apple Fire TV to Analytics
Source: Disney+ Hotstar

Single Message Transforms (SMT) are used within the Kafka Connect integration for stateless streaming ETL. Integration use cases include masking/filtering of PII, sampling of data, and schema validation and enforcement.

JioCinema: Multiple Kafka Clusters and Deployment Strategies

JioCinema leverages a robust enterprise architecture built on Apache Kafka, Flink, and Spark. As showcased at Kafka Summit India 2024, data streaming is central to its platform, enabling real-time analytics, personalized recommendations, and seamless content delivery.

JioCinema Telco Cloud Enterprise Architecture with Apache Kafka Spark Flink
Source: JioCinema

Initially, JioCinema operated a single Kafka cluster handling 1,000+ topics and 100,000+ partitions for diverse use cases.

Over time, the platform transitioned to multiple Kafka clusters with different SLAs and architectures, optimizing uptime, performance, and costs for specific workloads, as explained by Kushal Khandelwal, Head of Data Platform.

Jio Cinema - Viacom18 - One Kafka Cluster does NOT fit All Use Cases Uptime SLAs and Cost
Source: JioCinema

This shift from a monolithic to a segmented architecture highlights the scalability and flexibility of Kafka. This approach ensures JioCinema meets the demands of high traffic and complex SLAs. Their success reflects the common journey of organizations scaling data streaming infrastructures to achieve operational excellence.

Use Cases for Kafka in Disney+ Hotstar and JioCinema

Disney+ Hotstar and JioCinema rely on Apache Kafka to power diverse use cases, from IPL cricket streaming to real-time personalization and ad targeting.

IPL Cricket Streaming at Massive Scale

The Indian Premier League (IPL) is the crown jewel of streaming in India, drawing millions of concurrent viewers. Here’s how Kafka and Flink support IPL’s massive scale:

  • Concurrent Viewers: During IPL 2023, JioCinema hit a record of over 32 million concurrent viewers, streaming matches in 4K resolution. Disney+ Hotstar has also scaled to tens of millions of viewers in past IPL seasons.
  • Data Throughput: JioCinema and Hotstar handle millions of messages per second with Kafka, ensuring uninterrupted video delivery.
  • Kafka Infrastructure: Reports reveal that JioCinema operates over 100 Kafka clusters, managing tens of thousands of partitions. These clusters handle not only video streaming but also ancillary tasks, like ad placement and user analytics.
  • Connectors: Both platforms rely on hundreds of Kafka Connect connectors to integrate with databases, storage systems, and real-time analytics platforms.

On-Demand Streaming and Catalog Management

Both platforms use Kafka to deliver on-demand content to millions of users, ensuring quick access to movies and TV shows. Kafka’s reliable event streaming guarantees smooth playback and dynamic scaling during peak usage.

Real-Time Personalization and Recommendations

Personalization is central to user retention. Kafka streams user behavior data to machine learning systems in real time, enabling both platforms to recommend content tailored to individual preferences. Customer loyalty and rewards platforms often leverage Kafka and Flink under the hood.

Ad Targeting and Revenue Optimization

By processing user data in real time, Kafka enables precise ad targeting with context-specific advertisements. This not only improves ad effectiveness but also enhances viewer experience by ensuring ads are contextually relevant. Many real-time advertising platforms are powered by a data streaming platform using Apache Kafka and Flink.

Content Quality Monitoring

Both platforms use Kafka for continuous real-time monitoring of video stream quality, automatically adjusting bitrate or rerouting streams during disruptions to maintain a consistent viewing experience.

Data Streaming for M&A, Merger and Migrations

The merger of Disney+ Hotstar and JioCinema presents a significant opportunity to integrate their Kafka-based infrastructures, paving the way for a unified, more efficient system. Such transitions are a natural fit for Apache Kafka and its ecosystem, where migrations are a core capability. Tools like MirrorMaker and Cluster Linking allow seamless data movement between clusters for continuous replication and a later lift and shift. Using data streaming for migration projects enables zero downtime and business continuity.
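
One practical detail of such migrations: MirrorMaker 2's default replication policy prefixes mirrored topics with the source cluster alias. The hedged sketch below (cluster alias, topic, and group names are assumptions) shows a consumer on the target cluster subscribing by pattern, so it reads both the local topic and its mirrored counterpart during the cutover:

```java
import java.time.Duration;
import java.util.Properties;
import java.util.regex.Pattern;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class MigrationAwareConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "merged-cluster:9092"); // assumption
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "playback-analytics");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // "hotstar" is a hypothetical MirrorMaker 2 source cluster alias; the pattern
            // matches both the local "user-events" and the mirrored "hotstar.user-events".
            consumer.subscribe(Pattern.compile("(hotstar\\.)?user-events"));
            while (true) {
                consumer.poll(Duration.ofMillis(500)).forEach(record ->
                    System.out.printf("%s: %s%n", record.topic(), record.value()));
            }
        }
    }
}
```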

Here are some opportunities and benefits of data streaming for integrations and migrations:

  1. Integrated Pipelines: A combined Kafka architecture could streamline content delivery, reduce costs, and support advanced analytics, providing an optimized infrastructure for their vast user base.
  2. Expanded Use Cases: The merger might drive innovations such as multi-angle viewing, live interactive features, and more personalized experiences powered by real-time data.
  3. Hybrid and Multi-Cloud Flexibility: Transitions like these often span hybrid and multi-cloud environments, making Kafka’s flexibility essential for connecting and scaling across platforms.
  4. Multi-Organization Integration: Merging Kafka clusters across distinct organizations, as in this case, is a common use case where Kafka’s tools excel.
  5. Technical Leadership: Both platforms are transparent about their Kafka implementations, and we can anticipate new insights from their efforts to integrate and scale, highlighting lessons for the broader streaming industry.

In conclusion, Kafka and Flink are not just enablers but drivers of success for Disney+ Hotstar and JioCinema. Data streaming at scale creates new benchmarks for innovation and user experience in the media industry.

Do you see similar opportunities in your organization? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter. And make sure to download my free book about data streaming use cases.

The State of Data Streaming for Telco
https://www.kai-waehner.de/blog/2023/06/02/the-state-of-data-streaming-for-telco-in-2023/ (2 Jun 2023)

This blog post explores the state of data streaming for the telco industry. The evolution of telco infrastructure, customer services, and new business models requires real-time end-to-end visibility, fancy mobile apps, and integration with pioneering technologies like 5G for low latency or augmented reality for innovation. Learn about customer stories from Dish Network, British Telecom, Globe Telecom, Swisscom, and more. A complete slide deck and on-demand video recording are included.

This blog post explores the state of data streaming for the telco industry. The evolution of telco infrastructure, customer services, and new business models requires real-time end-to-end visibility, fancy mobile apps, and integration with pioneering technologies like 5G for low latency or augmented reality for innovation. Data streaming allows integrating and correlating data in real-time at any scale to improve most telco workloads.

I look at trends in the telecommunications sector to explore how data streaming helps as a business enabler, including customer stories from Dish Network, British Telecom, Globe Telecom, Swisscom, and more. A complete slide deck and on-demand video recording are included.

The State of Data Streaming for Telco in 2023

The Telco industry is fundamental for growth and innovation across all industries.

The global spending on telecom services is expected to reach 1.595 trillion U.S. dollars by 2024 (Source: Statista, Jul 2022).

Cloud-native infrastructure and digitalization of business processes are critical enablers. 5G network capabilities and telco marketplaces enable entirely new business models.

5G enables new business models

Presentation by Amdocs / Mavenir:

5G Use Cases with Amdocs and Mavenir

A report from McKinsey & Company says, “74 percent of customers have a positive or neutral feeling about their operators offering different speeds to mobile users with different needs”. The potential for increasing the average revenue per user (ARPU) with 5G use cases is enormous for telcos:

Potential from 5G monetization

Telco marketplace

Many companies across industries are trying to build a marketplace these days. But especially the telecom sector might shine here because of its interface between infrastructure, B2B, partners, and end users for sales and marketing.

tmforum has a few good arguments for why communication service providers (CSPs) should build a marketplace for B2C and B2B2X:

  • Operating the marketplace keeps CSPs in control of the relationship with customers
  • A marketplace is a great sales channel for additional revenue
  • Operating the marketplace helps CSPs monetize third-party (over-the-top) content
  • The only other option is to be relegated to being a connectivity provider
  • Enterprise customers have decided this is their preferred method of engagement
  • CSPs can take a cut of all sales
  • Participating in a marketplace prevents any one company from owning the customer

Data streaming in the telco industry

Adopting trends like network monitoring, personalized sales, and cybersecurity is only possible if enterprises in the telco industry can provide and correlate information at the right time in the proper context. Real-time, which means using the information in milliseconds, seconds, or minutes, is almost always better than processing data later (whatever later means):

Real-Time Data Streaming in the Telco Industry

Data streaming combines the power of real-time messaging at any scale with storage for true decoupling, data integration, and data correlation capabilities. Apache Kafka is the de facto standard for data streaming.

“Use Cases for Apache Kafka in Telco” is a good article to start with for an industry-specific point of view on data streaming. “Apache Kafka for Telco-OTT and Media Applications” explores over-the-top B2B scenarios.

Data streaming with the Apache Kafka ecosystem and cloud services are used throughout the supply chain of the telco industry. Search my blog for various articles related to this topic: Search Kai’s blog.

From Telco to TechCo: Next-generation architecture

Deloitte describes the target architecture for telcos very well:

Requirements for the next generation telco architecture

Data streaming provides these characteristics: Open, scalable, reliable, and real-time. This unique combination of capabilities made Apache Kafka so successful and widely adopted.

Kafka decouples applications and is the perfect technology for microservices across a telco’s enterprise architecture. Deloitte’s diagram shows this transition across the entire telecom sector:

Cloud-native Microservices and Data Mesh in the Telecom Sector

This is a massive shift for telcos:

  • From purpose-built hardware to generic hardware and elastic scale
  • From monoliths to decoupled, independent services

Digitalization with modern concepts helps a lot in designing the future of telcos.

Open Digital Architecture (ODA)

tmforum describes Open Digital Architecture (ODA) as follows:

“Open Digital Architecture is a standardized cloud-native enterprise architecture blueprint for all elements of the industry from Communication Service Providers (CSPs), through vendors to system integrators. It accelerates the delivery of next-gen connectivity and beyond – unlocking agility, removing barriers to partnering, and accelerating concept-to-cash.

ODA replaces traditional operations and business support systems (OSS/BSS) with a new approach to building software for the telecoms industry, opening a market for standardized, cloud-native software components, and enabling communication service providers and suppliers to invest in IT for new and differentiated services instead of maintenance and integration.”

Open Data Architecture ODA tmforum

If you look at the architecture trends and customer stories for data streaming in the next section, you realize that real-time data integration and processing at scale is required to provide most modern use cases in the telecommunications industry.

The telco industry applies various trends for enterprise architectures for cost, flexibility, security, and latency reasons. The three major topics I see these days at customers are:

  • Hybrid architectures with synchronization between edge and cloud in real-time
  • End-to-end network and infrastructure monitoring across multiple layers
  • Proactive service management and context-specific customer interactions

Let’s look deeper into some enterprise architectures that leverage data streaming for telco use cases.

Hybrid 5G architecture with data streaming

Most telcos have a cloud-first strategy to set up modern infrastructure for network monitoring, sales and marketing, loyalty, innovative new OTT services, etc. However, edge computing gets more relevant for use cases like pre-processing for cost reduction, innovative location-based 5G services, and other real-time analytics scenarios:

Hybrid 5G Telco Infrastructure with Data Streaming

Learn about architecture patterns for Apache Kafka that may require multi-cluster solutions and see real-world examples with their specific requirements and trade-offs. That blog explores scenarios such as disaster recovery, aggregation for analytics, cloud migration, mission-critical stretched deployments, and global Kafka.

Edge deployments for data streaming come with their own challenges. In separate blog posts, I covered use cases for Kafka at the edge and provided an infrastructure checklist for edge data streaming.

End-to-end network and infrastructure monitoring

Data streaming enables unifying telemetry data from various sources such as Syslog, TCP, files, REST, and other proprietary application interfaces:

Telemetry Network Monitoring with Data Streaming
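
A hedged sketch of such unification with Kafka Streams: several source-specific topics (names assumed; each filled by a protocol-specific collector) are merged into one stream, wrapped in a common envelope, and written to a unified telemetry topic for downstream monitoring:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class TelemetryUnifier {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "telemetry-unifier");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption

        StreamsBuilder builder = new StreamsBuilder();
        Consumed<String, String> asString = Consumed.with(Serdes.String(), Serdes.String());
        // One topic per ingest channel (illustrative names).
        KStream<String, String> syslog = builder.stream("telemetry-syslog", asString);
        KStream<String, String> rest   = builder.stream("telemetry-rest", asString);
        KStream<String, String> files  = builder.stream("telemetry-files", asString);

        syslog.merge(rest).merge(files)
              // Placeholder normalization: wrap each raw line in a common JSON envelope.
              .mapValues(raw -> "{\"raw\":\"" + raw.replace("\"", "'") + "\"}")
              .to("telemetry-unified", Produced.with(Serdes.String(), Serdes.String()));

        new KafkaStreams(builder.build(), props).start();
    }
}
```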

End-to-end visibility into the telco networks allows massive cost reductions and, as a bonus, a better customer experience. For instance, proactive service management tells customers about a network outage:

Proactive Service Management across OSS and BSS

Context-specific sales and digital lifestyle services

Customers expect a great customer experience across devices (like a web browser or mobile app) and human interactions (e.g., in a telco store). Data streaming enables a context-specific omnichannel sales experience by correlating real-time and historical data at the right time in the proper context:

Omnichannel Retail in the Telco Industry with Data Streaming

Omnichannel Retail and Customer 360 in Real Time with Apache Kafka” goes into more detail. But one thing is clear: Most innovative use cases require both historical and real-time data. In summary, correlating historical and real-time information is possible with data streaming out-of-the-box because of the underlying append-only commit log and replayability of events. A cloud-native Tiered Storage Kafka infrastructure to separate compute from storage makes such an enterprise architecture more scalable and cost-efficient.

The article “Fraud Detection with Apache Kafka, KSQL and Apache Flink” explores stream processing for real-time analytics in more detail, shows an example with embedded machine learning, and covers several real-world case studies.
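
As a toy version of such a fraud rule (thresholds and topic names are assumptions, not taken from the article), the Kafka Streams sketch below counts SIM-swap events per subscriber in ten-minute windows and emits an alert when the count looks suspicious:

```java
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;

public class SimSwapFraudDetector {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sim-swap-fraud");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption

        StreamsBuilder builder = new StreamsBuilder();
        // SIM-swap events keyed by subscriber ID (hypothetical topic).
        builder.stream("sim-swap-events", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
               .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(10)))
               .count()
               .toStream()
               // More than two swaps within ten minutes is treated as suspicious here.
               .filter((windowedSubscriber, swaps) -> swaps > 2)
               .map((windowedSubscriber, swaps) -> KeyValue.pair(windowedSubscriber.key(),
                   "possible fraud: " + swaps + " SIM swaps within 10 minutes"))
               .to("fraud-alerts", Produced.with(Serdes.String(), Serdes.String()));

        new KafkaStreams(builder.build(), props).start();
    }
}
```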

New customer stories for data streaming in the telco industry

So much innovation is happening in the telecom sector. Automation and digitalization change how telcos monitor networks, build customer relationships, and create completely new business models.

Most telecommunication service providers use a cloud-first approach to improve time-to-market, increase flexibility, and focus on business logic instead of operating IT infrastructure. And elastic scalability gets even more critical with all the growing networks and 5G workloads.

Here are a few customer stories from worldwide telecom companies:

  • Dish Network: Cloud-native 5G network with Kafka as the central communications hub between all the infrastructure interfaces and IT applications. The standalone 5G infrastructure, in conjunction with data streaming, enables new business models for customers across industries like retail, automotive, and energy.
  • Verizon: MEC use cases for low-latency 5G stream processing, such as autonomous drone-in-a-box-based monitoring and inspection solutions or vehicle-to-Everything (V2X).
  • Swisscom: Network monitoring and incident management with real-time data at scale to inform customers about outages, root cause analysis, and much more. The solution relies on Apache Kafka and Apache Druid for real-time analytics use cases.
  • British Telecom (BT): Hybrid multi-cloud data streaming architecture for proactive service management. BT extracts more value from its data and prioritizes real-time information and better customer experiences.
  • Globe Telecom: Industrialization of event streaming for various use cases. Two examples: Digital personalized rewards points based on customer purchases. Airtime loans are made easier to operationalize (vs. batch, where top-up cash is already spent again).

Resources to learn more

This blog post is just the starting point. Learn more about data streaming in the telco industry in the following on-demand webinar recording, the related slide deck, and further resources, including pretty cool lightboard videos about use cases.

On-demand video recording

The video recording explores the telecom industry’s trends and architectures for data streaming. The primary focus is the data streaming case studies. Check out our on-demand recording:

The State of Data Streaming for Telco in 2023

Slides

If you prefer learning from slides, check out the deck used for the above recording.

Case studies and lightboard videos for data streaming in telco

The state of data streaming for telco is fascinating. New use cases and case studies come up every month. This includes better data governance across the entire organization, real-time data collection and processing data from network infrastructure and mobile apps, data sharing and B2B partnerships with OTT players for new business models, and many more scenarios.

We recorded lightboard videos showing the value of data streaming simply and effectively. These five-minute videos explore the business value of data streaming, related architectures, and customer stories. Stay tuned; I will update the links in the next few weeks and publish a separate blog post for each story and lightboard video.

And this is just the beginning. Every month, we will cover the state of data streaming in a different industry. Manufacturing was first, financial services second, followed by retail, telco, gaming, and so on…

Let’s connect on LinkedIn and discuss it! Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter.

The post The State of Data Streaming for Telco appeared first on Kai Waehner.

Apache Kafka in the Public Sector – Blog Series about Use Cases and Architectures https://www.kai-waehner.de/blog/2021/10/07/apache-kafka-public-sector-part-1-data-in-motion-use-cases-architectures-examples/ Thu, 07 Oct 2021 14:13:24 +0000 https://www.kai-waehner.de/?p=3790 The public sector includes many different areas. Some groups, like the military, leverage cutting-edge technology. Others, like public administration, are years or even decades behind. This blog series explores both ends of the spectrum to show how data in motion powered by Apache Kafka adds value for innovative new applications and for modernizing legacy IT infrastructures. Examples include a broad spectrum of use cases across smart cities, citizen services, energy and utilities, and national security.

The public sector includes many different areas. Some groups, like the military, leverage cutting-edge technology. Others, like public administration, are years or even decades behind. This blog series explores how the public sector leverages data in motion powered by Apache Kafka to add value for innovative new applications and to modernize legacy IT infrastructures. Life is a stream of events. Therefore, the examples include a broad spectrum of use cases across smart cities, citizen services, energy and utilities, and national security, deployed across edge, hybrid, and multi-cloud scenarios.

Apache Kafka in the Public Sector and Government for Data in Motion

Blog series: Apache Kafka in the Public Sector and Government

This blog series explores why many governments and public infrastructure sectors leverage event streaming for various use cases. Learn about real-world deployments and different architectures for Kafka in the public sector:

  1. Life is a Stream of Events (THIS POST)
  2. Smart City
  3. Citizen Services
  4. Energy and Utilities
  5. National Security

Subscribe to my newsletter to get updates immediately after publication. I will also update the above list with direct links to the posts of this blog series once they are published.

As a side note, if you wonder why healthcare is not on the above list: Healthcare deserves a blog series of its own. While the government can provide public health care through national healthcare systems, it is part of the private sector in many other cases.

The Public Sector is a Broad Spectrum of Use Cases

Real-time Data Beats Slow Data in the Public Sector

I won’t do yet another long introduction about the added value of real-time data. Check out my blog about “Use Cases across Industries for Data in Motion powered by Apache Kafka” to understand the broad spectrum and benefits. The public sector is no different: Real-time data beats slow data in almost every use case! Here are a few examples:

Real time data beats slow data in the public sector

But think about your use cases! How often can you say that getting data late (like in one hour or the following day) is better than getting data when it happens (now, in a few milliseconds or seconds)? Probably not very often.

An important fact is that the added business value comes from correlating the events from different data sources. As an example, let’s look at the processes in a smart city:

Data in Motion in the Public Sector powered by Apache Kafka

The sensor data from a car is only valuable if an application correlates it with data from other vehicles in the traffic planning system. Intelligent parking is only reasonable if it integrates with the overall city planning. Emergency services need to receive an alert in real-time if a crash happens. All of that needs to happen in real-time! It does not matter whether the use case is about transactional workloads (usually smaller data sets) or analytical workloads (usually larger data sets).

Open API and Partnerships are Mandatory

Governments can build great applications. At least in theory. In practice, they rely on external data from partners and 3rd party applications for many potential use cases:

Data in Motion as Foundation of a Smart City powered by Apache Kafka

Governments and cities need to work with several other stakeholders, including carmakers, suppliers, telcos, mobility services, cloud providers, software providers, etc. Standards and open APIs are mandatory for successful cross-cutting projects. The foundation of such an enterprise architecture is an open, reliable, scalable platform that can process data in real-time. Apache Kafka became the de facto standard for event streaming.

Data Mesh for Sharing Events between Government and 3rd Party Applications and Services

An example that shows the added value of integrating data across stakeholders and processing it in real-time: transportation services. A mobile app needs context. Think about hailing a taxi. It doesn’t help you if you only see the position of each taxi on the city map in real-time. You want to know the estimated pickup time, the estimated cost, the estimated time of arrival at your destination, the car model that will pick you up, and so much more.

This use case – like many others – is only possible if you integrate and correlate the data from many different interfaces like a mapping service, all taxi drivers, all customers in a city, the weather service, backend analytics services, and much more:

Data in Motion with Kafka across the Public and Private Sector

The left side of the picture shows a dashboard built with a real-time message queue like RabbitMQ. The right side shows the correlation of data from different sources in real-time with an event streaming platform like Apache Kafka.

I hope you agree on the added value of the event streaming platform. Just sending data from A to B in real-time is not enough. Only the data processing in real-time adds true value.
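
For illustration, a windowed stream-stream join in Kafka Streams expresses exactly this kind of correlation. This is a sketch with hypothetical topic names, assuming String serdes are configured as defaults:

    import java.time.Duration;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.JoinWindows;
    import org.apache.kafka.streams.kstream.KStream;

    public class RideMatching {
        static void buildTopology(StreamsBuilder builder) {
            KStream<String, String> requests = builder.stream("ride-requests");   // keyed by city zone
            KStream<String, String> positions = builder.stream("taxi-positions"); // keyed by city zone

            // Correlate each ride request with taxi positions from the same zone seen
            // within the last two minutes. A plain queue only forwards single events;
            // this correlation is what produces the matching and ETA information.
            requests.join(
                    positions,
                    (request, position) -> request + " matched with " + position,
                    JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(2)))
                .to("ride-matches");
        }
    }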

Data in Motion as Paradigm Shift in the Public Sector

Real-time data beats slow data, no matter if you think about cutting-edge use cases in national security or modernizing the IT infrastructure in public administration. Event Streaming is the foundation of this paradigm shift towards real-time data processing in the public sector. The upcoming posts of this blog series explore many different use cases and architectures. If you also want to learn more about Apache Kafka offerings on the market, check out my comparison of Apache Kafka products and cloud services.

How do you leverage event streaming in the public sector? What technologies and architectures do you use? What projects did you already work on or are in the planning? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Apache Kafka in the Public Sector – Blog Series about Use Cases and Architectures appeared first on Kai Waehner.

Cloud-Native 5G, MEC and OSS/BSS/OTT Telco with Apache Kafka and Kubernetes https://www.kai-waehner.de/blog/2021/09/06/cloud-native-5g-mec-oss-bss-ott-telco-powered-by-kafka-and-kubernetes/ Mon, 06 Sep 2021 07:12:15 +0000 https://www.kai-waehner.de/?p=3723 This post shares a slide deck and video recording for architectures and use cases for event streaming with the open-source frameworks Kubernetes and Apache Kafka in the Telco sector. Demonstrated use cases include building 5G networks, NFV management and orchestration, proactive OSS network monitoring, integration with hybrid and multi-cloud BSS and OTT services.

This post shares a slide deck and video recording for architectures and use cases for event streaming with the open-source frameworks Kubernetes and Apache Kafka in the Telco sector. Telecom enterprises modernize their edge and hybrid cloud infrastructure with Kafka and Kubernetes to provide an elastic, scalable real-time infrastructure for high volumes of data. Demonstrated use cases include building 5G networks, NFV management and orchestration, proactive OSS network monitoring, integration with hybrid and multi-cloud BSS and OTT services.

Cloud Native Telecom 5G MEC OSS BSS OTT powered by Kubernetes and Apache Kafka

Video Recording – Cloud-Native Telco for 5G, MEC and OSS/BSS/OTT with Kafka and Kubernetes

Here is the video recording:

Slide Deck – Kafka in the Telecom Sector (OSS/BSS/OTT)

Here is the related slide deck for the video recording:

Use Cases and Architectures for Apache Kafka in the Telecom Sector

This section shares various other blog posts about event streaming, cloud-native architectures, and use cases in the telecom sector powered by Apache Kafka.

Topics include:

  • Use cases and real-world deployments
  • Innovative OSS, BSS, and OTT scenarios
  • Edge, hybrid, and multi-cloud architectures
  • Low-latency cloud-native MEC (multi-access edge computing)
  • Cybersecurity with situational awareness and threat intelligence
  • Comparison of different event streaming frameworks and cloud services

Real-Time Data Beats Slow Data in the Telco Industry

Think about the use cases in your project, business unit, and company: Real-time data beats slow data in almost all use cases in the telco industry. That’s why so many next-generation telco service providers and business applications leverage event streaming powered by Apache Kafka.

Do you already leverage Apache Kafka in the telecom sector? What use cases did you or do you plan to implement with Kafka and Kubernetes? What does your (future) edge or hybrid architecture look like? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Cloud-Native 5G, MEC and OSS/BSS/OTT Telco with Apache Kafka and Kubernetes appeared first on Kai Waehner.

Low Latency Data Streaming with Apache Kafka and Cloud-Native 5G Infrastructure https://www.kai-waehner.de/blog/2021/05/23/apache-kafka-cloud-native-telco-infrastructure-low-latency-data-streaming-5g-aws-wavelength/ Sun, 23 May 2021 08:06:59 +0000 https://www.kai-waehner.de/?p=3401 This blog post explores low latency data processing and edge computing with Apache Kafka, 5G telco networks, and cloud-native AWS Wavelength infrastructure. Learn about use cases and architectures across industries to combine mission-critical and analytics workloads, and a concrete hybrid implementation for energy production and distribution.

Many mission-critical use cases require low latency data processing. Running these workloads close to the edge is mandatory if the applications cannot run in the cloud. This blog post explores architectures for low latency deployments leveraging a combination of cloud-native infrastructure at the edge, such as AWS Wavelength, 5G networks from Telco providers, and event streaming with Apache Kafka to integrate and process data in motion.

The blog post is structured as follows:

  • Definition of “low latency data processing” and the relation to Apache Kafka
  • Cloud-native infrastructure for low latency computing
  • Low latency mission-critical use cases for Apache Kafka and its relation to analytical workloads
  • Example for a hybrid architecture with AWS Wavelength, Verizon 5G, and Confluent

Low Latency Data Processing and Edge Computing with Apache Kafka, 5G Telco Network and AWS Wavelength

Low Latency Data Processing

Let’s begin with a definition. “Real-time” and “low latency” are terms that different industries, vendors, and consultants use very differently.

What is real-time and low latency data processing?

For the context of this blog, real-time data processing with low latency means processing low or high volumes of data in ~5 to 50 milliseconds end-to-end. On a high level, this includes three parts:

  • Consume events from one or more data sources, either directly from a Kafka client or indirectly via a gateway or proxy.
  • Process and correlate events from one or more data sources, either stateless or stateful, with the internal state in the application and stream processing features like sliding windows.
  • Produce events to one or more data sinks, either directly from a Kafka client or indirectly via a gateway or proxy. The data sinks can include the data sources and/or other applications.

These parts are the same as for “traditional event streaming use cases”. However, for low latency use cases with zero downtime and data loss, the architecture often looks different to reach the defined goals and SLAs. A single infrastructure is usually the better choice than using a best-of-breed approach with many different frameworks or products. That’s where the Kafka ecosystem shines! The Kafka vs. MQ/ETL/ESB/API blog explores this discussion in more detail.
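
To make the three parts tangible, here is a minimal consume-process-produce sketch with the plain Java clients (topic names and addresses are hypothetical; error handling, commit strategy, and tuning are omitted for brevity):

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ConsumeProcessProduce {
        public static void main(String[] args) {
            Properties consumerProps = new Properties();
            consumerProps.put("bootstrap.servers", "kafka:9092"); // hypothetical
            consumerProps.put("group.id", "low-latency-pipeline");
            consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            Properties producerProps = new Properties();
            producerProps.put("bootstrap.servers", "kafka:9092");
            producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
                 KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
                consumer.subscribe(List.of("sensor-events")); // 1. consume from the source
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(10));
                    for (ConsumerRecord<String, String> record : records) {
                        // 2. process/correlate (here: a trivial stateless transformation)
                        String enriched = record.value().toUpperCase();
                        // 3. produce the result to the sink topic
                        producer.send(new ProducerRecord<>("enriched-events", record.key(), enriched));
                    }
                }
            }
        }
    }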

Low latency = soft real-time; NOT hard real-time

Make sure to understand that real-time in the IT world (and that includes Kafka) is not hard real-time. Latency spikes and non-deterministic network behavior exist, no matter which software or framework you choose. Hence, in the IT world, real-time means soft real-time. In contrast, in the OT world and Industrial IoT, real-time means zero latency and deterministic networks. This is embedded software for sensors, robots, or cars.

For more details, read the blog post “Kafka is NOT hard-real-time“.

Kafka support for low latency processing

Apache Kafka provides very low end-to-end latency for large volumes of data. This means the amount of time it takes for a record that is produced to Kafka to be fetched by the consumer is short.

For example, detecting fraud in online banking transactions has to happen in real-time to deliver business value, without adding more than 50–100 ms of overhead to each transaction to maintain a good customer experience.

Here is the technical architecture for end-to-end latency with Kafka:

Low Latency Data Processing with Apache Kafka

Latency objectives are expressed as both target latency and the importance of meeting this target. For instance, a latency objective says: “I would like to get 99th percentile end-to-end latency of 50 ms from Kafka.” The right Kafka configuration options need to be optimized to achieve this. The blog post “99th Percentile Latency at Scale with Apache Kafka” shares more details.
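
As a starting point, a few client settings are typically tuned for latency-sensitive deployments. The following Java fragment shows illustrative values only, not recommendations; the right settings depend on your workload and must be validated against your own percentile targets:

    import java.util.Properties;

    public class LatencyTuning {
        // Producer: prefer immediate sends over batching efficiency.
        static Properties producerProps() {
            Properties p = new Properties();
            p.put("linger.ms", "0");            // send immediately instead of filling batches
            p.put("compression.type", "none");  // skip compression CPU cost (trades bandwidth)
            p.put("acks", "1");                 // leader-only ack; trades durability for latency
            return p;
        }

        // Consumer: return fetches as soon as any data is available.
        static Properties consumerProps() {
            Properties p = new Properties();
            p.put("fetch.min.bytes", "1");      // broker answers as soon as data exists
            p.put("fetch.max.wait.ms", "10");   // cap broker-side wait per fetch request
            return p;
        }
    }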

After exploring what low latency and real-time data processing mean in Kafka’s context, let’s now discuss the infrastructure options.

Infrastructure for Low Latency Data Processing

Low latency always requires a short distance between data sources, data processing platforms, and data sinks due to physics. Latency optimization is relatively straightforward if all your applications run in the same public cloud. Low end-to-end latency gets much more difficult as soon as some software, mobile apps, sensors, machines, etc., run elsewhere. Think about connected cars, mobile apps for mobility services like ride-hailing, location-based services in retail, machines/robots in factories, etc.

The remote data center or remote cloud region cannot provide low latency data processing! The focus of this post is software that has to provide low end-to-end latency outside a central data center or public cloud. This is where edge computing and 5G networks come into play.

Edge infrastructure for low latency data processing

As for real-time and low latency, we need to define the term first, as everyone uses it differently. When I talk about the edge in the context of Kafka, it means:

  • Edge is NOT a regular data center or cloud region, but limited compute, storage, network bandwidth.
  • Edge can be a regional cloud-native infrastructure enabled for low-latency use cases – often provided by Telco enterprises in conjunction with cloud providers.
  • Kafka clients AND the Kafka broker(s) deployed here, not just the client applications.
  • Often 100+ locations, like restaurants, coffee shops, or retail stores, or even embedded into 1000s of devices or machines.
  • Offline business continuity, i.e., the workloads continue to work even if there is no connection to the cloud.
  • Low-footprint and low-touch, i.e., Kafka can run as a normal highly available cluster or as a single broker (no cluster, no high availability); often shipped “as a preconfigured box” in OEM hardware (e.g., Hivecell).
  • Hybrid integration, i.e., most use cases require uni- or bidirectional communication with a remote Kafka cluster in a data center or the cloud.

Check out my infrastructure checklist for Apache Kafka at the edge and use cases for Kafka at the edge across industries for more details.

Mobile Edge Compute / Multi-access Edge Compute (MEC)

In addition to edge computing, a few industries (especially those related to the Telco sector) use the terms Mobile Edge Compute / Multi-access Edge Compute (MEC) to describe use cases around edge computing, low latency, 5G, and data processing.

MEC is an ETSI-defined network architecture concept that enables cloud computing capabilities and an IT service environment at the edge of the cellular network and, more generally, at the edge of any network. The basic idea behind MEC is that by running applications and performing related processing tasks closer to the cellular customer, network congestion is reduced, and applications perform better.

MEC technology is designed to be implemented at the cellular base stations or other edge nodes. It enables flexible and rapid deployment of new applications and services for customers. Combining elements of information technology and telecommunications networking, MEC also allows cellular operators to open their radio access network (RAN) to authorized third parties, such as application developers and content providers.

5G and cloud-native infrastructure are key pieces of a MEC infrastructure!

Low-latency data processing outside a cloud region requires a cloud-native infrastructure and 5G networks. Let’s explore this combination in more detail.

5G infrastructure for low latency and high throughput SLAs

On a high level from a use case perspective, it is important to understand that 5G is much more than just higher speed and lower latency:

  • Public 5G telco infrastructure: That’s what Verizon, AT&T, T-Mobile, Dish, Vodafone, Telefonica, and all the other telco providers talk about in their TV spots. The end consumer gets higher download speeds and lower latency (at least in theory). This infrastructure integrates vehicles (e.g., cars) and devices (e.g., mobile phones) to the 5G network (V2N).
  • Private 5G campus networks: That’s what many enterprises are most interested in. The enterprise can set up private 5G networks with guaranteed quality of service (QoS) using acquired 5G slices from the 5G spectrum. Enterprises work with telco providers, telco hardware vendors, and sometimes also with cloud providers to provide cloud-native infrastructure (e.g., AWS Outposts, Azure Edge Zones, Google Anthos). This infrastructure is used similarly to the public 5G network but deployed, e.g., in a factory or hospital. The trade-offs are guaranteed SLAs and increased security vs. higher cost. Lufthansa Technik and Vodafone’s standalone private 5G campus network at the aircraft hangar is a great example for various use cases like maintenance via video streaming and augmented reality.
  • Direct connection between devices: That’s for interlinking the communication between two or more vehicles (V2V) or vehicles and infrastructure (V2I) via unicast or multicast. There is no need for a network hop to the cell tower due to using a 5G technique called 5G sidelink communications. This enables new use cases, especially in safety-critical environments (e.g., autonomous driving) where Bluetooth, Wi-Fi, and similar network communications do not work well for different reasons.

Cloud-native infrastructure

Cloud-native infrastructure provides capabilities to build applications in an elastic, scalable, and automated way. Software development concepts like microservices, DevOps, and containers usually play a crucial role here.

A fantastic example is Dish Network in the US. Dish builds a brand new 5G network completely on cloud-native AWS infrastructure with cloud-native 1st and 3rd party software. Thus, even the network providers – where enterprises build their applications – build the underlying infrastructure this way.

Cloud-native infrastructure is required in the public cloud (where it is the norm) and at the edge. Flexibility for agile development and deployment of applications is only possible this way. Hence, technologies such as Kubernetes and on-premise solutions from cloud providers are adopted more and more to achieve this goal.

The combination of 5G and cloud-native infrastructure enables building low latency applications for data processing everywhere.

Software for Low Latency Data Processing

5G and cloud-native infrastructure provide the foundation for building mission-critical low latency applications everywhere. Let’s now talk about the software part and with that about event streaming with Kafka.

Why event streaming with Apache Kafka for low latency?

Apache Kafka provides a complete software stack for real-time data processing, including:

  1. Messaging (real-time pub/sub)
  2. Storage (caching, backpressure handling, decoupling)
  3. Data integration (IoT data, legacy platforms, modern microservices, and databases)
  4. Stream processing (stateless/stateful correlation of data).

This is super important because simplicity and cost-efficient operations matter much more at the edge than in a public cloud infrastructure where various SaaS services can be glued together.

Hence, Kafka is uniquely positioned to run mission-critical and analytics workloads at the edge on cloud-native infrastructure via 5G networks. Bi-directional replication to “regular” data centers or public clouds for integration with other systems is also possible via the Kafka protocol.

Use Cases for Low Latency Data processing with Apache Kafka

Low latency and real-time data processing are crucial for many use cases across industries. Hence, no surprise that Kafka plays a key role in many architectures – whether the infrastructure runs at the edge or in a close data center or cloud.

Mobile Edge Compute / Multi-access Edge Compute (MEC) use cases for Kafka across industries

Let’s take a look at a few examples:

  • Telco: Infrastructure like cloud-native 5G networks, OSS applications, and integration with BSS and OTT services requires integrating, orchestrating, and correlating huge volumes of data in real-time.
  • Manufacturing: Predictive maintenance, quality assurance, real-time locating systems (RTLS), and other shop floor applications are only effective and valuable with stable, continuous data processing.
  • Mobility Services: Ride-hailing, car sharing, or parking services can only provide a great customer experience if the events from thousands of regional end-users are processed in real-time.
  • Smart City: Cars from various carmakers, infrastructures such as traffic lights, smart buildings, and many other things need to get real-time information from a central data hub to improve safety and new innovative customer experiences.
  • Media: Interactive live video streams, real-time interactions, a hyper-personalized experience, augmented reality (AR) and virtual reality (VR) applications for training/maintenance/customer experience, and real-time gaming can only work well with stable, high throughput, and low latency.
  • Energy: Utilities, oil rigs, solar parks, and other energy upstream/distribution/downstream infrastructures are supercritical environments and very expensive. Every second counts for safety and efficiency/cost reasons. Optimizations combine data from all machines in a plant to achieve greater efficiency – not just optimizing one unit but for the entire system.
  • Retail: Location-based services for better customer experience and cross-/upselling need notifications while customers are looking at a product or in front of the checkout.
  • Military: Border control, surveillance, and other location-based applications only work efficiently with low latency.
  • Cybersecurity: Continuous monitoring and signal processing for threat detection and proactive prevention are fundamental for any security operations center (SOC) and SIEM/SOAR implementation.

For a concrete example, check out my blog “Building a Smart Factory with Apache Kafka and 5G Campus Networks“.

NOT every use case requires low latency or real-time

Real-time data in motion beats data at rest in databases or data lakes in most scenarios. However, not every use case can be or needs to be real-time. Therefore, low latency networks and communication are not required. A few examples:

  • Reporting (traditional business intelligence)
  • Batch analytics (processing high volumes of data in a bundle, for instance, Hadoop and Spark’s map-reduce, shuffling, and other data processing only make sense in batch mode)
  • Model training as part of a machine learning infrastructure (while model scoring and monitoring often require real-time predictions, the model training is batch in almost all currently available ML algorithms).

These use cases can be outsourced to a remote data center or public cloud. Low latency networking in terms of milliseconds does not matter here and would only increase the infrastructure cost. For that reason, most architectures are hybrid to separate low latency workloads from analytics workloads.

Let’s now take a concrete example after all the theory in the last sections.

Hybrid Architecture for Critical Low Latency and Analytical Batch Workloads

Many enterprises I talk to don’t have and don’t want to build their own infrastructure at the edge. Cloud providers understand this pain and started rolling out offerings to provide cloud-native infrastructure close to the customer’s sites. AWS Outposts, Azure Edge Zones, Google Anthos exist for this reason. This solves the problem of providing cloud-native infrastructure.

But what about low latency?

AWS is once again the first to build a new product category: AWS Wavelength is a service that enables you to deliver ultra-low latency applications for 5G devices. It is built on top of AWS Outposts. AWS works with Telco providers like Verizon, Vodafone, KDDI, or SK Telecom to build this offering. A win-win-win: Cloud-native + low latency + no need to build own data centers at the edge.

This is the foundation for building low latency applications at the edge for mission-critical workloads, plus bi-directional integration with the regular public cloud region for analytics workloads and integration with other cloud applications.

Let’s see what this looks like in a real example.

Use case: Energy Production and Distribution

Energy production and distribution are perfect examples. They require reliability, flexibility, sustainability, efficiency, security, and safety. These are perfect ingredients for a hybrid architecture powered by cloud-native infrastructure, 5G networks, and event streaming.

The energy sector usually separates analytical capabilities (in the data center or cloud) and low-latency computing for mission-critical workloads (at the edge). Kafka became a critical component for various energy use cases.

For more details, check out the blog post “Apache Kafka for Smart Grid, Utilities and Energy Production” which also covers real-world examples from EON, Tesla, and Devon Energy.

Architecture with AWS Wavelength, Verizon 5G, and Confluent

The concrete example uses:

  • AWS Public Cloud for analytics workloads
  • Confluent Cloud for event streaming in the cloud and integration with 1st party (e.g., AWS S3 and Amazon Redshift) and 3rd party SaaS (e.g., MongoDB Atlas, Snowflake, Salesforce CRM)
  • AWS Wavelength with Verizon 5G for low latency workloads
  • Confluent Platform with Kafka Connect and ksqlDB for low latency computing in the Wavelength 5G zone
  • Confluent Cluster Linking to glue together the Wavelength zone and the public AWS region using the native Kafka protocol for bi-directional replication in real-time


Energy Production and Distribution with a Hybrid Architecture using Apache Kafka and AWS Wavelength


The following diagram shows the same architecture from the perspective of the Wavelength zone where the low latency processing happens:

Energy Production at the Edge with Apache Kafka and AWS Wavelength

Implementation: Hybrid data processing with Kafka/Confluent, AWS Wavelength, and Verizon 5G

Diagrams are nice. But a real implementation is even better to demonstrate the value of low latency computing close to the edge, plus the integration with the edge devices and public cloud. My colleague Joseph Morais had the lead in implementing a low-latency Kafka scenario with infrastructure provided by AWS and Verizon:

AWS Wavelength Kafka Confluent Cloud Verizon MEC Edge Architecture

We implemented a use case around real-time analytics with machine learning. A single data pipeline provides end-to-end integration in real-time across locations. The data comes from edge locations. The low latency processing happens in the AWS Wavelength zone. This includes data integration, preprocessing like filtering/aggregations, and model scoring for anomaly detection.

Cluster Linking (a Kafka-native built-in replication feature) replicates the relevant data to Confluent Cloud in the local AWS region. The cloud is used for batch use cases such as model training with AWS Sagemaker.

This demo demonstrates a realistic hybrid end-to-end scenario to combine mission-critical low latency and analytics batch workloads.

Curious about the relation between Kafka and Machine Learning? I wrote various blogs. One good starter: “Machine Learning and Real-Time Analytics in Apache Kafka Applications“.

Last mile integration: Direct Kafka connection vs gateway / bridge (MQTT / HTTP)?

The last mile integration is an important aspect. How do you integrate “the last mile”? Examples include mobile apps (e.g., ride-hailing), connected vehicles (e.g., predictive maintenance), or machines (e.g., quality assurance for the production line).

This is worth a longer discussion in its own blog post, but let’s do a summary here:

Kafka was not built for bad networks. And Kafka was not built for tens of thousands of connections. Hence, it is pretty straightforward to decide. Option 1 is a direct connection with a Kafka client (using Kafka client APIs for Java, C++, Go, etc.). Option 2 is a scalable gateway or bridge (like MQTT or HTTP Proxy). When to use which one?

  • Use a direct connection via a Kafka client API if you have a stable network and only a limited number of connections (usually not higher than 1000 or so).
  • Use a gateway or bridge if you have a bad network infrastructure and/or tens of thousands of connections.

The blog series “Use Cases and Architectures for Kafka and MQTT” gives you some ideas about use cases that require a bridge or gateway, for instance, connected cars and mobility services. But keep it as simple as possible: If a direct connection works for your use case, why add yet another technology with all its implications regarding complexity and cost?
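
To illustrate the bridge option, here is a minimal sketch that forwards MQTT messages into Kafka using the Eclipse Paho client. Broker addresses and topic names are hypothetical, and off-the-shelf Kafka Connect MQTT connectors or an MQTT proxy are the usual alternatives to hand-written bridge code:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.eclipse.paho.client.mqttv3.MqttClient;

    public class MqttToKafkaBridge {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka:9092"); // hypothetical
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
            KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props);

            // Devices publish via MQTT, which handles flaky networks and very high
            // connection counts; the bridge forwards each message into Kafka.
            MqttClient mqtt = new MqttClient("tcp://mqtt-broker:1883", "kafka-bridge");
            mqtt.connect();
            mqtt.subscribe("vehicles/+/telemetry", (topic, message) ->
                producer.send(new ProducerRecord<>("vehicle-telemetry", topic, message.getPayload())));
        }
    }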

Low Latency Data Processing Requires the Right Architecture

Low latency data processing is crucial for many use cases across industries. Processing data close to the edge is necessary if the applications cannot run in the cloud. Dedicated cloud-native infrastructure such as AWS Wavelength leverages 5G networks to provide the infrastructure. Event streaming with Apache Kafka provides the capabilities to implement edge computing and the integration with the cloud.

What are your experiences and plans for low latency use cases? What use case and architecture did you implement? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.


The post Low Latency Data Streaming with Apache Kafka and Cloud-Native 5G Infrastructure appeared first on Kai Waehner.

Apache Kafka for the Connected World – Vehicles, Factories, Cities, Digital Services https://www.kai-waehner.de/blog/2021/03/01/apache-kafka-connected-world-people-vehicles-factories-smart-cities-digital-services/ Mon, 01 Mar 2021 11:30:05 +0000 https://www.kai-waehner.de/?p=3215 The digital transformation connects the world. People, vehicles, factories, cities, digital services, and other "things" communicate with each other in real-time to provide a safe environment, efficient processes, and a fantastic user experience. This scenario only works well with data processing in real-time at scale. This blog post shares a presentation that explains why Apache Kafka plays a key role not just in each of these industries and use cases, but also in connecting the different stakeholders.

The digital transformation enables a connected world. People, vehicles, factories, cities, digital services, and other “things” communicate with each other in real-time to provide a safe environment, efficient processes, and a fantastic user experience. This scenario only works well with data processing in real-time at scale. This blog post shares a presentation that explains why Apache Kafka plays a key role not just in each of these industries and use cases, but also in connecting the different stakeholders.

A Connected World with Apache Kafka for Smart City Connected Vehicles Telco Cloud Mobility Services

Software is Changing and Connecting the World

Event Streaming with Apache Kafka plays a key role in processing massive volumes of data in real-time in a reliable, scalable, and flexible way, integrating with various legacy and modern data sources and sinks.

I want to give you an overview of existing use cases for event streaming technology in a connected world across supply chains, industries, and customer experiences that come along with these interdisciplinary data intersections:

  • The Automotive Industry (and it’s not only Connected Cars)
  • Mobility Services across verticals (transportation, logistics, travel industry, retailing, …)
  • Smart Cities (including citizen health services, communication infrastructure, …)
  • Technology Providers (including cloud hyperscaler, software vendors, telco infrastructure, …)

A Connected World with MQ, ETL, ESB, and Kafka

All these industries and sectors do not have fundamentally new characteristics and requirements. They require data integration, data correlation, and real decoupling. The difference is the massively increased volume of data.

Real-time messaging solutions have existed for many years. Hundreds of platforms exist for data integration (including ETL and ESB tooling or specific IIoT platforms). Proprietary monoliths have monitored plants, telco networks, and other infrastructures in real-time for decades. But now, Kafka combines all the above characteristics in an open, scalable, and flexible infrastructure to operate mission-critical workloads at scale in real-time, and it is taking over the world of connecting data.

“Apache Kafka vs. MQ/ETL/ESB” goes into more detail about this discussion.

Streaming Data Exchange with Apache Kafka

Before we jump into the presentation, I want to cover one key trend I see across industries: A streaming data exchange with Apache Kafka:

Streaming Data Exchange for a Connected World with Apache Kafka

TL;DR: If you use event streaming with Kafka in your projects (for reasons like real-time processing, scalability, decoupling), and your partner does the same, well, then it does NOT make sense to put a REST / HTTP API in the middle. Instead, the partners should be integrated in a streaming way.

APIs and API Management still have their value for some use cases, of course. Check out the comparison of “Event Streaming with Apache Kafka vs. API Gateway / API Management with Mulesoft or Kong” for more details.

Slide Deck

Here is the slide deck covering various use cases and architectures to realize a connected world with Apache Kafka from different perspectives:

On-Demand Video Recording

The on-demand video recording walks you through the above presentation:

Apache Kafka for the Connected World in Automotive Manufacturing Mobility Services Smart City

Apache Kafka for the Connected World

Connecting the world is a key requirement across industries. Many innovative digital services are only possible through collaboration between stakeholders. Real-time messaging, integration, continuous stream processing, and replication between partners are required. Event Streaming with Apache Kafka helps with the implementation of these use cases.

What are your experiences and plans for event streaming to connect the world? Did you already build applications with Apache Kafka to connect your products and services to partners? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Apache Kafka for the Connected World – Vehicles, Factories, Cities, Digital Services appeared first on Kai Waehner.

Infrastructure Checklist for Apache Kafka at the Edge https://www.kai-waehner.de/blog/2021/02/03/kafka-edge-infrastructure-checklist-deployment-outside-data-center/ Wed, 03 Feb 2021 12:39:30 +0000 https://www.kai-waehner.de/?p=3085 This blog post explores an infrastructure checklist to build an open, flexible, and scalable event streaming architecture with Apache Kafka at the edge outside data centers.

Event streaming with Apache Kafka at the edge is getting more and more traction these days. It is a common approach to providing the same open, flexible, and scalable architecture in the cloud and at the edge outside the data center. Possible locations for Kafka edge deployments include retail stores, cell towers, trains, small factories, restaurants, hospitals, stadiums, etc. This post explores a checklist with infrastructure questions you need to check and evaluate if you want to deploy Kafka at the edge.

Infrastructure Checklist for Apache Kafka at the Edge

Apache Kafka at the Edge == Outside the Data Center

I already discussed the concepts and architectures of Kafka at the edge in detail in the past:

This blog post explores a checklist of common infrastructure questions you need to answer and double-check before planning to deploy Kafka at the edge.

What is the Edge?

The term ‘edge’ needs to be defined to have the same understanding. When I talk about the edge in the context of Kafka, it means:

  • Edge is NOT a data center, i.e., limited compute, storage, network bandwidth
  • Kafka clients AND the Kafka broker(s) deployed here, not just the client applications
  • Offline business continuity, i.e., the workloads continue to work even if there is no connection to the cloud
  • Often 100+ locations, like restaurants, coffee shops, or retail stores, or even embedded into 1000s of devices or machines
  • Low-footprint and low-touch, i.e., Kafka can run as a normal highly available cluster or as a single broker (no cluster, no high availability); often shipped “as a preconfigured box” in OEM hardware (e.g., Hivecell)
  • Hybrid integration, i.e., most use cases require uni- or bidirectional communication with a remote Kafka cluster in a data center or the cloud

Let’s recap one architecture example that deploys Kafka in the cloud and at the edge: A hybrid event streaming architecture for real-time omnichannel retail and customer 360:

Hybrid Edge to Global Retail Architecture with Apache Kafka

This definition of a ‘Kafka edge deployment‘ can also be summarized as an ‘autonomous edge‘ or ‘disconnected edge‘. On the other hand, the ‘connected edge’ means that Kafka clients at the edge connect directly to a remote data center or cloud.

Infrastructure Checklist: How to Deploy Apache Kafka at the Edge?

I talked to 100+ customers and prospects across industries with the need to do edge computing for different reasons, including bad internet connection, reduced cost, low latency requirements, and security implications.

The following discussion points and questions come up all the time. Make sure to discuss them with your project team:

  • What are the use cases for Kafka at the edge? For instance, edge processing (e.g., business logic/analytics), replication to the cloud (uni- or bi-directional), data integration (e.g., with devices, IoT gateways, local databases)?

  • What is the data model, and what are the replication scenarios and SLAs (aggregation to “just gather data”, command & control to send data back to the edge, local analytics, etc.)? Check out Kafka-native replication tools, especially MirrorMaker 2 and Confluent’s Cluster Linking.

  • What is the main motivation for doing edge processing (vs. ingestion into a DC/cloud for all processing)? Examples: Low latency requirements, cost-efficiency, business continuity even when offline / disconnected from the cloud, etc.

  • How many “edge sites” do you plan to deploy to (e.g., retail stores, factories, restaurants, trains, …)? This needs to be considered from the beginning. If you want to roll out edge computing to thousands of restaurants, you need a different hardware and automation strategy than deploying to just ten smart factories worldwide.

  • What hardware do you use at the edge (e.g., hardware specifications)? How much memory, disk, CPU, etc., is available? Do you work with a specific hardware vendor? What are the support model and monitoring setup for the edge computers?

  • What network do you use? Is it stable? What is the connection to the cloud? If it is a stable connection (like AWS Direct Connect or Azure ExpressRoute), do you still need Kafka at the edge?

  • What is the infrastructure you plan to run Kafka on at the edge (e.g., operating system, container, Kubernetes, etc.)?

  • Do you need high availability and a ‘real’ Kafka cluster with 3+ brokers? Or is a single broker good enough? In many cases, the latter is good enough to decouple edge and cloud, handle backpressure, and enable business continuity even if the internet connection is gone for some time.

  • What edge protocols do you need to integrate with? Is Kafka Connect sufficient with its connectors, or do you need a 3rd party IoT gateway? Common integration points at the edge are OPC UA, MQTT, proprietary PLCs, traditional relational databases, files, IoT gateways, etc.

  • Do you need to process the data at the edge? Kafka-native stream processing with Kafka Streams or ksqlDB is usually a straightforward and lightweight, but still scalable and reliable option. Almost all use cases I have seen need at least some streaming ETL at the edge. For instance, preprocess and filter data so that you only send relevant, aggregated data over the network to the cloud (see the sketch after this list). However, many customers also deploy business applications at the edge, for instance, for real-time model inference.

  • How will fleet management work? Which part of the infrastructure or which tool handles the management and operations of the edge machines? In most cases, this is not specific to Kafka but instead handled on the infrastructure level. For instance, if you run a Kubernetes cluster, Rancher might be used to provision and manage the edge clusters, including the Kafka ecosystem. Of course, specific Kafka metrics are also integrated here, for instance via Prometheus if you are using Kubernetes.
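
Here is a sketch of such edge streaming ETL with Kafka Streams (hypothetical topic names and a made-up threshold; the idea is to filter and pre-aggregate locally so only a fraction of the raw volume has to cross the potentially unstable uplink):

    import java.time.Duration;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Produced;
    import org.apache.kafka.streams.kstream.TimeWindows;

    public class EdgeStreamingEtl {
        static void buildTopology(StreamsBuilder builder) {
            builder.stream("machine-sensors", Consumed.with(Serdes.String(), Serdes.Double()))
                // Keep only readings that are relevant outside the edge site.
                .filter((machineId, temperature) -> temperature > 90.0)
                .groupByKey()
                // Pre-aggregate per machine and minute before replication to the cloud.
                .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
                .count()
                .toStream((windowedKey, count) -> windowedKey.key()) // drop the window for the sink topic
                .to("anomalies-for-cloud", Produced.with(Serdes.String(), Serdes.Long()));
        }
    }

Only the compacted result topic is then replicated to the data center or cloud, for instance via MirrorMaker 2 or Cluster Linking as mentioned above.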

Discussing and answering these questions will help you with your planning for Kafka at the edge. Are there any key questions missing? Please let me know and I will update the list.

Kafka at the Edge is the new Black!

Apache Kafka at the edge is a common approach to providing the same open, flexible, and scalable architecture in the cloud and outside the data center. A huge benefit is that the same technology and architecture can be deployed everywhere across regions, sites, and clouds. This is a real hybrid architecture combining edge sites, data centers, and multiple clouds! Discuss the above infrastructure checklist with your team to be successful.

What are your experiences and plans for event streaming with Apache Kafka at the edge? Did you already deploy Apache Kafka on a small node somewhere, maybe even as a single broker setup? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Infrastructure Checklist for Apache Kafka at the Edge appeared first on Kai Waehner.

Building a Smart Factory with Apache Kafka and 5G Campus Networks https://www.kai-waehner.de/blog/2021/01/12/5g-apache-kafka-edge-computing-iot-smart-factory-telco-campus-networks-hybrid-cloud-industry-4-0/ Tue, 12 Jan 2021 10:35:21 +0000 https://www.kai-waehner.de/?p=2988 The Fourth Industrial Revolution (also known as Industry 4.0) is the ongoing automation of traditional manufacturing and industrial practices, using modern smart technology. Event Streaming with Apache Kafka plays a key role in processing massive volumes of data in real-time in a reliable, scalable, and flexible way. Learn about the relationship between Apache Kafka and modern telco infrastructures leveraging private 5G campus networks for Industrial IoT (IIoT) and edge computing.

The Fourth Industrial Revolution (also known as Industry 4.0) is the ongoing automation of traditional manufacturing and industrial practices using modern smart technology. Event Streaming with Apache Kafka plays a key role in processing massive volumes of data in real-time in a reliable, scalable, and flexible way, integrating with various legacy and modern data sources and sinks. This blog post explores Apache Kafka’s relationship to modern telco infrastructures that leverage private 5G campus networks for Industrial IoT (IIoT) and edge computing.

Building a Smart Factory with Apache Kafka and 5G Campus Networks

Event Streaming with Kafka at the Disconnected Edge

Apache Kafka is the new black at the edge.

This is true not just for obvious verticals such as manufacturing, oil&gas, and the automotive industry. Other industries, including retail, healthcare, government, financial services, and energy, leverage Apache Kafka to take advantage of IoT devices, sensors, smart machines, robotics, and connected data.

This post focuses on the autonomous (and sometimes disconnected) edge. This means the edge site requires good, stable network communication locally, but not necessarily stable, low latency connectivity to the remote data center or cloud. The autonomous or disconnected edge needs to operate continuously, even if the connection to the internet is broken. The example below uses smart factories, but the same use cases are deployed across many other scenarios, including restaurants, retail stores, and hospitals.

This post does NOT explore the connected edge with use cases such as V2X (vehicle-to-everything) and standards such as C-V2X (Cellular / 5G) by 5GAA. V2X and all the use cases around mobility services and smart cities will be explored in another post. This topic is very different, e.g., because there is no stable internet connection and you (have to) leverage standards such as MQTT in conjunction with Kafka. Obviously, plenty of very relevant use cases exist here, too. Subscribe to my newsletter to stay updated with new blog posts!

Why is 5G a Game Changer for Industrial IoT, Automotive, and Smart City?

5G is the fifth generation technology standard for broadband cellular networks. Many people wonder why there is so much hype around 5G.

What actually is 5G?

I cannot tell you all the technical details. But on a high level from a use case perspective, it is important to understand that 5G is much more than just higher speed and lower latency:

  • Public 5G telco infrastructure: That’s what Verizon, AT&T, T-Mobile, and all the other telco providers talk about in their TV spots. The end consumer gets higher download speeds and lower latency (at least in theory). This infrastructure integrates vehicles (e.g., cars) and devices (e.g., mobile phones) to the 5G network (V2N).
  • Private 5G campus networks: That’s what many enterprises are most interested in. The enterprise can set up private 5G networks with guaranteed quality of service (QoS) using acquired 5G slices from the 5G spectrum. Enterprises work with telco providers, telco hardware vendors, and sometimes also with cloud providers (e.g., AWS Wavelength). This infrastructure is used similarly to the public 5G network but deployed, e.g., in a factory or hospital. The trade-offs are guaranteed SLAs and increased security vs. higher cost.
  • Direct connection between devices: That’s for interlinking the communication between two or more vehicles (V2V) or vehicles and infrastructure (V2I) via unicast or multicast. There is no need for a network hop to the cell tower due to using a 5G technique called 5G sidelink communications. This enables new use cases, especially in safety-critical environments (e.g., autonomous driving) where Bluetooth, Wi-Fi, and similar network communications do not work well for different reasons.

Concept of 5g technology with floating island

As I mentioned before, this post focuses on architectures for private 5G campus networks and their relation to the public 5G infrastructure. V2X, including all the connected mobility services, will be covered in other posts.

5G for Wide-Area, Local-Area, and Personal-Area Communication

In conclusion about the 5G hype: “Instead of providing a different radio interface for every use case, device vendors could rely solely on 5G as the link for wide-area, local-area, and personal-area communications“, as explained in a great 5G blog post from Benny Vejlgaard (Nokia).

Let’s now see how 5G infrastructures are related to event streaming with Apache Kafka.

Multi-Access Edge Computing (MEC)

Multi-access edge computing (MEC) is another important term in this context. MEC was formerly called mobile edge computing. It is an ETSI-defined network architecture concept that enables cloud computing capabilities and an IT service environment at the edge of the cellular network. Hence, data processing in general moves closer to the edge of any network.

The basic idea behind MEC is that by running applications and performing related processing tasks closer to the cellular customer, network congestion is reduced and applications perform better. MEC technology is designed to be implemented at the cellular base stations or other edge nodes. It enables flexible and rapid deployment of new applications and services for customers. Combining elements of information technology and telecommunications networking, MEC also allows cellular operators to open their radio access network (RAN) to authorized third parties, such as application developers and content providers.

The use cases overlap with those discussed for 5G. Therefore, I focus on the term 5G in this blog post. However, the concept of MEC is equally relevant.

Event Streaming in a Hybrid 5G Architecture

Industry 4.0 is all about processing high volumes of data in real-time. That’s obviously a perfect fit for Apache Kafka. Please note that Apache Kafka is NOT used for “hard real-time” but only for soft real-time. If you need zero latency for embedded systems, PLCs, and robots, that’s assembler or MISRA C, not Java and Kafka. Kafka is a perfect fit for any use case where an end-to-end latency of 10+ms is good enough. This is almost all IT use cases, but not OT use cases.

The following shows a high-level hybrid 5G architecture. It combines cloud computing with edge processing in 5G campus networks installed in smart factories:


Event Streaming with Apache Kafka in a Hybrid 5G Architecture

Some notes on the picture:

  • Most enterprise applications, such as the Kafka-based real-time location system (RTLS), run in a data center or public cloud. They use public 5G networks or any other stable internet connection.
  • Each smart factory has a dedicated 5G campus network. These 5G slices provide guaranteed QoS. Various deployment options exist for 5G networks. All have their pros and cons regarding cost, bandwidth, latency, and SLAs. In this example, the combination of a Telco provider and AWS Wavelength is used to enable an edge infrastructure with stable 5G processing and compute power to deploy Apache Kafka and other applications close to the production line in the plant within AWS EC2 instances.
  • The integration between edge sites and the central data center or cloud is implemented with Kafka-native real-time technologies such as MirrorMaker 2 or Confluent’s Cluster Linking. This enables decoupled infrastructures and high throughput, guaranteed ordering, real-time replication, and out-of-the-box error handling. These are key characteristics: Each smart factory runs mission-critical workloads disconnected from the cloud.

Let’s now dig a little bit deeper into a smart factory to understand how edge computing works in this example.

Apache Kafka in Smart Factory at the Edge with a 5G Campus Network

The following picture shows the event streaming infrastructure inside a smart factory:

Event Streaming with Apache Kafka in a Smart Factory at the Edge with a 5G Campus Network

Some notes on this architecture:

  • All the mission-critical workloads on the production line at the edge can operate without a connection to the internet. This includes processing on the production lines and analytics such as predictive maintenance or real-time dashboards for the on-site plant manager. The infrastructure runs 24/7, even if the location is offline and not connected to the public internet. This is not just about the outage of a data center or cloud! Often, applications in Industrial IoT (IIoT) are disconnected intentionally to provide a more secure environment.
  • Some applications run in the remote data center or cloud. They continuously consume relevant data from the smart factory in real-time. After a disconnection, they fall behind. As soon as they get connected again, they consume all missed data and go back to real-time updates.
  • In this example, Mojix, a Kafka-native supply chain management service, is deployed in the cloud. Obviously, if these supply chain processes are critical for the production line, the architecture would either include a direct, stable connection to the cloud (e.g., AWS Direct Connect or Azure ExpressRoute) or also be deployed in the smart factory. Kafka allows a flexible, hybrid architecture where applications can live where it makes the most sense from a technical and business perspective.
  • A supply chain is complex. It includes much more than just the production lines and MES/ERP/APS systems in the smart factory. Integration to enterprise IT systems in the data center AND integration with suppliers and partners is key for success. Event Streaming with Apache Kafka plays a huge role in many postmodern supply chain architectures.

5G is the Future for many Edge and Hybrid Kafka Use Cases in Industry 4.0

Apache Kafka plays a key role in processing massive volumes of data in real-time in a reliable, scalable, and flexible way. This is relevant across industries for Industry 4.0 use cases. Public and private 5G networks enable the next generation of Industrial IoT, edge computing, and real-time use cases across verticals.

At the beginning of 2021, we are still in the early stages of 5G infrastructure. But the first enterprises are already working with telco providers to build great use cases with 5G and event streaming.

What are your experiences and plans with private and public 5G infrastructures? Do you plan to use Apache Kafka at the edge, too? Which approach works best for you? What is your strategy? Check out the “Infrastructure Checklist for Apache Kafka at the Edge” if you plan to go that direction!

Let’s connect on LinkedIn and discuss it! Also, stay informed about new blog posts by subscribing to my newsletter.

The post Building a Smart Factory with Apache Kafka and 5G Campus Networks appeared first on Kai Waehner.
