Transportation Archives - Kai Waehner
Technology Evangelist - Big Data Analytics - Middleware - Apache Kafka
https://www.kai-waehner.de/blog/category/transportation/

How Penske Logistics Transforms Fleet Intelligence with Data Streaming and AI
https://www.kai-waehner.de/blog/2025/06/02/how-penske-logistics-transforms-fleet-intelligence-with-data-streaming-and-ai/ (June 2, 2025)

Real-time visibility has become essential in logistics. As supply chains grow more complex, providers must shift from delayed, batch-based systems to event-driven architectures. Data streaming technologies like Apache Kafka and Apache Flink enable this shift by allowing continuous processing of data from telematics, inventory systems, and customer interactions. Penske Logistics is leading the way—using Confluent’s platform to stream and process 190 million IoT messages daily. This powers predictive maintenance, faster roadside assistance, and higher fleet uptime. The result: smarter operations, improved service, and a scalable foundation for the future of logistics.

Real-time visibility is no longer a competitive advantage in logistics—it’s a business necessity. As global supply chains become more complex and customer expectations rise, logistics providers must respond with agility and precision. That means shifting away from static, delayed data pipelines toward event-driven architectures built around real-time data.

Technologies like Apache Kafka and Apache Flink are at the heart of this transformation. They allow logistics companies to capture, process, and act on streaming data as it’s generated—from vehicle sensors and telematics systems to inventory platforms and customer applications. This enables new use cases in predictive maintenance, live fleet tracking, customer service automation, and much more.

A growing number of companies across the supply chain are embracing this model. Whether it’s real-time shipment tracking, automated compliance reporting, or AI-driven optimization, the ability to stream, process, and route data instantly is proving vital.

One standout example is Penske Logistics—a transportation leader using Confluent’s data streaming platform (DSP) to transform how it operates and delivers value to customers.

How Penske Logistics Transforms Fleet Intelligence with Kafka and AI

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter, and follow me on LinkedIn or X (formerly Twitter) to stay in touch. And make sure to download my free book about data streaming use cases.

Why Real-Time Data Matters in Logistics and Transportation

Transportation and logistics operate on tighter margins and stricter timelines than almost any other sector. Delays ripple through supply chains, disrupting manufacturing schedules, customer deliveries, and retail inventories. Traditional data integration methods—batch ETL, manual syncing, and siloed systems—simply can’t meet the demands of today’s global logistics networks.

Data streaming enables organizations in the logistics and transportation industry to ingest and process information while it is still fresh and valuable. Vehicle diagnostics, route updates, inventory changes, and customer interactions can all be captured and acted upon in real time. This leads to faster decisions, more responsive services, and smarter operations.

Real-time data also lays the foundation for advanced use cases in automation and AI, where outcomes depend on immediate context and up-to-date information. And for logistics providers, it unlocks a powerful competitive edge.

Apache Kafka serves as the backbone for real-time messaging—connecting thousands of data producers and consumers across enterprise systems. Apache Flink adds stateful stream processing to the mix, enabling continuous pattern recognition, enrichment, and complex business logic in real time.
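As a rough illustration of what this stateful processing can look like, here is a minimal, hypothetical Kafka Streams sketch (a Flink job would express the same logic with its DataStream or SQL API). It counts high engine-temperature readings per vehicle in five-minute windows and emits a maintenance alert when they repeat. Topic names, the threshold, and the plain numeric payload are illustrative assumptions, not taken from any real deployment:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.*;

import java.time.Duration;
import java.util.Properties;

public class EngineTemperatureAlerts {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "fleet-temperature-alerts");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();

        // Telemetry keyed by vehicle id; the value is a single temperature reading (simplified).
        KStream<String, Double> readings = builder.stream(
                "vehicle-telemetry", Consumed.with(Serdes.String(), Serdes.Double()));

        readings
                .filter((vehicleId, tempCelsius) -> tempCelsius != null && tempCelsius > 110.0)
                .groupByKey(Grouped.with(Serdes.String(), Serdes.Double()))
                // Stateful part: count suspicious readings per vehicle in 5-minute windows.
                .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
                .count()
                .toStream()
                .filter((windowedVehicleId, count) -> count >= 3) // repeated readings indicate a real issue
                .map((windowedVehicleId, count) -> KeyValue.pair(
                        windowedVehicleId.key(),
                        "ALERT: " + count + " high engine-temperature readings within 5 minutes"))
                .to("maintenance-alerts", Produced.with(Serdes.String(), Serdes.String()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}
```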

Event-driven Architecture with Data Streaming in Logistics and Transportation using Apache Kafka and Flink

In the logistics industry, this event-driven architecture supports use cases such as:

  • Continuous monitoring of vehicle health and sensor data
  • Proactive maintenance scheduling
  • Real-time fleet tracking and route optimization
  • Integration of telematics, ERP, WMS, and customer systems
  • Instant alerts for service delays or disruptions
  • Predictive analytics for capacity and demand forecasting

This isn’t just theory. Leading logistics organizations are deploying these capabilities at scale.

Data Streaming Success Stories Across the Logistics and Transportation Industry

Many transportation and logistics firms are already using Kafka-based architectures to modernize their operations. A few examples:

  • LKW Walter relies on data streaming to optimize its full truck load (FTL) freight exchanges and enable digital freight matching.
  • Uber Freight leverages real-time telematics, pricing models, and dynamic load assignment across its digital logistics platform.
  • Instacart uses event-driven systems to coordinate live order delivery, matching customer demand with available delivery slots.
  • Maersk incorporates streaming data from containers and ports to enhance shipping visibility and supply chain planning.

These examples show the diversity of value that real-time data brings—across first mile, middle mile, and last mile operations.

An increasing number of companies are using data streaming as the event-driven control tower for their supply chains. It’s not only about real-time insights—it’s also about ensuring consistent data across real-time messaging, HTTP APIs, and batch systems. Learn more in this article: A Real-Time Supply Chain Control Tower powered by Kafka.

Supply Chain Control Tower powered by Data Streaming with Apache Kafka

Penske Logistics: A Leader in Transportation, Fleet Services, and Supply Chain Innovation

Penske Transportation Solutions is one of North America’s most recognizable logistics brands. It provides commercial truck leasing, rental, and fleet maintenance services, operating a fleet of over 400,000 vehicles. Its logistics arm offers freight management, supply chain optimization, and warehousing for enterprise customers.

Penske Logistics
Source: Penske Logistics

But Penske is more than a fleet and logistics company. It’s a data-driven operation where technology plays a central role in service delivery. From vehicle telematics to customer support, Penske is leveraging data streaming and AI to meet growing demands for reliability, transparency, and speed.

Penske’s Data Streaming Success Story

Penske explored its data streaming journey at the Confluent Data in Motion Tour. Sarvant Singh, Vice President of Data and Emerging Solutions at Penske, explains the company’s motivation clearly: “We’re an information-intense business. A lot of information is getting exchanged between our customers, associates, and partners. In our business, vehicle uptime and supply chain visibility are critical.”

This focus on uptime is what drove Penske to adopt a real-time data streaming platform, powered by Confluent. Today, Penske ingests and processes around 190 million IoT messages every day from its vehicles.

Each truck contains hundreds of sensors (and thousands of sub-sensors) that monitor everything from engine performance to braking systems. With this volume of data, traditional architectures fell short. Penske turned to Confluent Cloud to leverage Apache Kafka at scale as a fully managed, elastic SaaS, eliminating the operational burden and unlocking true real-time capabilities.
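From the application’s point of view, a fully managed Kafka service mostly changes configuration, not code. The following producer sketch shows what a telemetry client against such a cluster can look like; the endpoint, API key placeholders, and topic name are illustrative and not Penske’s actual setup:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class TelemetryProducer {

    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder endpoint and credentials for a fully managed cluster.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "<broker-endpoint>:9092");
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                        + "username=\"<API_KEY>\" password=\"<API_SECRET>\";");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all"); // durability for mission-critical telemetry

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Key by vehicle id so all events of one truck stay ordered within a partition.
            String vehicleId = "truck-4711";
            String event = "{\"vehicleId\":\"truck-4711\",\"engineTempC\":96.5,\"ts\":1717300000000}";
            producer.send(new ProducerRecord<>("vehicle-telemetry", vehicleId, event),
                    (metadata, exception) -> {
                        if (exception != null) {
                            exception.printStackTrace(); // a real client would retry or raise an alert
                        }
                    });
        }
    }
}
```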

By streaming sensor data through Confluent and into a proactive diagnostics engine, Penske can now predict when a vehicle may fail—before the problem arises. Maintenance can be scheduled in advance, roadside breakdowns avoided, and customer deliveries kept on track.

This approach has already prevented over 90,000 potential roadside incidents. The business impact is enormous, saving time, money, and reputation.

Other real-time use cases include:

  • Diagnosing issues instantly to dispatch roadside assistance faster
  • Triggering preventive maintenance alerts to avoid unscheduled downtime
  • Automating compliance for IFTA reporting using telematics data
  • Streamlining repair workflows through integration with electronic DVIRs (Driver Vehicle Inspection Reports)

Why Confluent for Apache Kafka?

Managing Kafka in-house was never the goal for Penske. After initially working with a different provider, they transitioned to Confluent Cloud to avoid the complexity and cost of maintaining open-source Kafka themselves.

“We’re not going to put mission-critical applications on an open source tech,” Singh noted. “Enterprise-grade applications require enterprise-level support—and Confluent’s business value has been clear.”

Key reasons for choosing Confluent include:

  • The ability to scale rapidly without manual rebalancing
  • Enterprise tooling, including stream governance and connectors
  • Seamless integration with AI and analytics engines
  • Reduced time to market and improved uptime

Data Streaming and AI in Action at Penske

Penske’s investment in AI began in 2015, long before it became a mainstream trend. Early use cases included Erica, a virtual assistant that helps customers manage vehicle reservations. Today, AI is being used to reduce repair times, predict failures, and improve customer service experiences.

By combining real-time data with machine learning, Penske can offer more reliable services and automate decisions that previously required human intervention. AI-enabled diagnostics, proactive maintenance, and conversational assistants are already delivering measurable benefits.

The company is also exploring the role of generative AI. Singh highlighted the potential of technologies like ChatGPT for enterprise applications—but also stressed the importance of controls: “Configuration for risk tolerance is going to be the key. Traceability, explainability, and anomaly detection must be built in.”

Fleet Intelligence in Action: Measurable Business Value Through Data Streaming

For a company operating hundreds of thousands of vehicles, the stakes are high. Penske’s real-time architecture has improved uptime, accelerated response times, and empowered technicians and drivers with better tools.

The business outcomes are clear:

  • Fewer breakdowns and delays
  • Faster resolution of vehicle issues
  • Streamlined operations and reporting
  • Better customer and driver experience
  • Scalable infrastructure for new services, including electric vehicle fleets

With 165,000 vehicles already connected to Confluent and more being added as EV adoption grows, Penske is just getting started.

The Road Ahead: Agentic AI and the Next Evolution of Event-Driven Architecture Powered By Apache Kafka

The future of logistics will be defined by intelligent, real-time systems that coordinate not just vehicles, but entire networks. As Penske scales its edge computing and expands its use of remote sensing and autonomous technologies, the role of data streaming will only increase.

Agentic AI—systems that act autonomously based on real-time context—will require seamless integration of telematics, edge analytics, and cloud intelligence. This demands a resilient, flexible event-driven foundation. I explored the general idea in a dedicated article: How Apache Kafka and Flink Power Event-Driven Agentic AI in Real Time.

Agentic AI with Apache Kafka as Event Broker Combined with MCP and A2A Protocol

Penske’s journey shows that real-time data streaming is not only possible—it’s practical, scalable, and deeply transformative. The combination of a data streaming platform, sensor analytics, and AI allows the company to turn every vehicle into a smart, connected node in a global supply chain.

For logistics providers seeking to modernize, the path is clear. It starts with streaming data—and the possibilities grow from there. Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter, and follow me on LinkedIn or X (formerly Twitter) to stay in touch. And make sure to download my free book about data streaming use cases.

Real-Time Logistics, Shipping, and Transportation with Apache Kafka
https://www.kai-waehner.de/blog/2022/09/29/real-time-logistics-shipping-transportation-with-apache-kafka/ (September 29, 2022)

Logistics, shipping, and transportation require real-time information to build efficient applications and innovative business models. Data streaming enables correlated decisions, recommendations, and alerts. Kafka is everywhere across the industry. This blog post explores several real-world case studies from companies such as USPS, Swiss Post, Austrian Post, DHL, and Hermes. Use cases include cloud-native middleware modernization, track and trace, and predictive routing and ETA planning.

Real Time Logistics Transportation Shipping with Apache Kafka

Logistics and transportation

Logistics is the detailed organization and implementation of a complex operation. It manages the flow of things between the point of origin and the point of consumption to meet the requirements of customers or corporations. The resources managed in logistics may include tangible goods such as materials, equipment, and supplies, as well as food and other consumable items.

Logistics management is the part of supply chain management (SCM) and supply chain engineering that plans, implements, and controls the efficient, effective forward and reverse flow and storage of goods, services, and related information between the point of origin and the point of consumption to meet customers’ requirements.

The evolution of logistics technology

Unity created an excellent overview of the future of logistics and transportation:

Unity - Logistics Technology for Industry 4.0

The diagram shows the critical technical characteristics for innovation: Digitalization, automation, connectivity, and real-time data are must-haves for optimizing logistics and transportation infrastructure.

Data streaming with Apache Kafka in the shipping industry

Real-time data is relevant everywhere in logistics and transportation. Apache Kafka is the de facto standard for real-time data streaming. Kafka works well almost everywhere. Here is an example of enterprise architecture for transporting goods across the globe:

Apache Kafka in the Shipping Industry for Marine, Oil Transport, Vessel Fleet, Shipping Line, Drones

Most companies have a cloud-first strategy. Kafka in the cloud as a fully-managed service enables project teams to focus on building applications and scale elastically depending on the needs. Use cases like big data analytics or a real-time supply chain control tower often run in the cloud today.

On-premise Kafka deployments connect to existing IT infrastructure such as Oracle databases, SAP ERP systems, and other monolithic and often decades-old technologies.

The edge either directly connects to the data center or cloud (if the network connection is relatively stable), or operates its own mission-critical edge Kafka cluster (e.g., on a ship) or a single broker (e.g., embedded into a drone) in a semi-connected or air-gapped environment.

Case studies for real-time transportation, shipping, and logistics with Apache Kafka

The following shows several real-world deployments of the logistics, shipping, and transportation industry for real-time data streaming with the broader Kafka ecosystem.

Swiss Post: Decentralized integration using data as an asset across the shipping pipeline

Swiss Post is the national postal service of Switzerland. Data streaming is a fundamental shift in their enterprise architecture. Swiss Post had several motivations:

  • Data as an asset: Management and accessibility of strategic company data
  • New requirements regarding the amount of event throughput (new parcel center, IoT, etc.)
  • Integration is not dependent on a central development team (self-service)
  • Empowering organization and integration skill development
  • Growing demand for real-time event processing (Event-driven architecture)
  • Providing a flexible integration technology stack (no one fits all)

The Kafka-based integration layer processes small events and large legacy files and images.

The evolution of integration at Swiss Post
Source: Swiss Post

The shift from ETL/ESB integration middleware to event-based and scalable Kafka is an approach many companies use nowadays:

Shift from ETL / ESB to Kafka as Integration Middleware
Source: Swiss Post

DHL: Parcel and letter express service with cloud-native middleware

The German logistics company DHL is a subsidiary of Deutsche Post AG. DHL Express is the market leader for parcel services in Europe.

Like the Swiss Post, DHL modernized its integration architecture with data streaming. They complement MQ and ESB with data streaming powered by Kafka and Confluent. Check out the comparison between Message Queue systems and Apache Kafka to understand why adding Kafka is sometimes a better approach than trying to replace MQ with Kafka right away.

Here is the target future hybrid enterprise architecture of DHL with IBM MQ, Apache Kafka, and Spring Boot applications:

DHL Express integration architecture with IBM MQ, Apache Kafka, and Spring Boot
Source: DHL

This is a very common approach to modernizing middleware infrastructure. Here, the on-premise middleware based on IBM MQ and Oracle WebLogic struggles with the scale, even though we are “only” talking about a few thousand messages per second.
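Conceptually, the “complement MQ, don’t rip it out” pattern looks like the sketch below: existing applications keep writing to the queue, and a thin bridge republishes each message as a Kafka event for new consumers. In practice, this job is usually done by the Kafka Connect IBM MQ source connector rather than hand-written code; the queue name, topic name, and connection factory setup here are placeholders:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import javax.jms.TextMessage;
import java.util.Properties;

public class JmsToKafkaBridge {

    public static void main(String[] args) throws Exception {
        // Placeholder: obtain a vendor-specific JMS ConnectionFactory, e.g., a
        // com.ibm.mq.jms.MQConnectionFactory configured for your queue manager.
        ConnectionFactory factory = createConnectionFactory();

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (Connection jms = factory.createConnection();
             Producer<String, String> producer = new KafkaProducer<>(props)) {
            Session session = jms.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageConsumer consumer = session.createConsumer(session.createQueue("SHIPMENT.EVENTS"));
            jms.start();

            while (true) {
                Message message = consumer.receive(); // blocks until the next MQ message arrives
                if (message instanceof TextMessage) {
                    // Republish the legacy MQ message as a Kafka event for new consumers.
                    producer.send(new ProducerRecord<>("shipment-events",
                            ((TextMessage) message).getText()));
                }
            }
        }
    }

    private static ConnectionFactory createConnectionFactory() {
        // Vendor-specific setup intentionally omitted; see your MQ client documentation.
        throw new UnsupportedOperationException("configure your JMS provider here");
    }
}
```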

A few more notes about DHL’s middleware migration journey:

  • Migration to a cloud-native Kubernetes Microservices infrastructure
  • Migration to Azure Cloud planned with Cluster Linking
  • Mid-term: Replacement of the legacy ESB.

An interesting side note: DHL processes relatively large messages (70 KB) with Kafka, resulting in hundreds of MB/sec of throughput.
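Throughput with larger messages is mostly a producer and broker configuration exercise. The values below are generic illustrations (not DHL’s actual settings) of the typical knobs: compression, batching, and the per-request size cap:

```java
import java.util.Properties;

public class LargeMessageProducerConfig {

    static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // Compress batches; larger text/JSON payloads usually shrink significantly.
        props.put("compression.type", "lz4");
        // Accumulate bigger batches before sending to amortize request overhead.
        props.put("batch.size", 262144);         // 256 KB
        props.put("linger.ms", 20);              // wait briefly so batches can fill up
        // Raise the request cap if single messages approach the 1 MB default;
        // the broker-side message.max.bytes must allow it as well.
        props.put("max.request.size", 5242880);  // 5 MB
        return props;
    }
}
```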

Austrian Post: Track & trace parcels in the cloud with Kafka

Austrian Post leverages data streaming to track and trace parcels end-to-end across the delivery routes:

Parcel tracking at Austrian Post
Source: Austrian Post

Austrian Post’s data streaming infrastructure runs on Microsoft Azure. They evaluated three technologies with the following results, in their own words:

  • Azure Event Hubs (fully managed, only the Kafka protocol, not true Kafka, with various limitations): Not flexible enough, limited stream processing, no schema registry.
  • Apache Kafka (open source, self-managed): Way too much hassle.
  • Confluent Cloud on Azure (fully-managed, complete platform): Selected option.

One example use case at Austrian Post concerns problems with ident codes: they are not unique. Instead, they can (and will) be re-used. Shipments can have more than one ident code. Scan events for ident codes need to be added to the correct “digital twin” of a parcel delivery.

Stream processing enables the implementation of such a stateful business process:

Stream processing at Austrian Post for stateful parcel shipping analytics
Source: Austrian Post
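Austrian Post has not published implementation details, but the shape of such stateful logic with Kafka Streams is roughly the following sketch: scan events grouped by ident code are folded into a continuously updated parcel state. Topic names and the simplistic string-based “twin” are purely illustrative:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;

public class ParcelTwinTopology {

    public static StreamsBuilder build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Scan events keyed by ident code; the value is a scan description (placeholder format).
        builder.stream("parcel-scan-events", Consumed.with(Serdes.String(), Serdes.String()))
                .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
                // Fold every scan into the current state of the parcel's digital twin.
                // Real logic would also detect ident-code re-use, e.g., by starting a fresh
                // twin when a "label created" scan arrives for an already-delivered parcel.
                .aggregate(
                        () -> "",
                        (identCode, scan, twin) -> twin.isEmpty() ? scan : twin + " -> " + scan,
                        Materialized.with(Serdes.String(), Serdes.String()));

        return builder;
    }
}
```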

Hermes: Predictive delivery planning with CDC and Kafka

Hermes is another German delivery company. Their goal: Making business decisions more data-driven with real-time analytics. To achieve this goal, Hermes integrates, processes, and correlates data generated by machines, companies, humans, and interactions for predictive delivery planning.

They leverage Change Data Capture (CDC) with HVR and Kafka for real-time delivery and collection services. Databases like MongoDB and Redis provide long-term storage and analytical capabilities:

Real-time logistics and shipping at Hermes with Kafka HVR MongoDB and Redis
Source: Hermes

This is an excellent example of technology and architecture modernization, combining data streaming and various databases.

USPS: Digital representation of all critical assets in Kafka for real-time logistics

USPS (United States Postal Service) is, by geography and volume, the globe’s largest postal system. They started their Kafka journey in 2016. Today, USPS operates a hybrid multi-cloud environment including real-time replication across regions.

“Kafka processes every event that is important for us,” said USPS CIO Pritha Mehra at Current 2022. Kafka events provide a digital representation of all assets important for USPS, including carrier movement, vehicle movement, trucks, package scans, etc. For instance, USPS processes 900 million scans per day.

Apache Kafka use cases at USPS

One interesting use case was an immediate response to a White House directive in late 2021 to send Covid test kits to every American free of charge. Time-to-market for the project was three weeks (!). USPS processed up to 8.7 million test kits per hour with help from Kafka:

Covid Test Kit Order Flow at USPS

Baader: Real-time logistics for dynamic routing and ETA calculations

BAADER is a worldwide manufacturer of innovative machinery for the food processing industry. They run an IoT-based and data-driven food value chain on Confluent Cloud.

The Kafka-based infrastructure is running as a fully-managed service in the cloud. It provides a single source of truth across factories and regions along the food value chain. Business-critical operations are available 24/7 for tracking, calculations, alerts, etc.:

Food Supply Chain at Baader with Confluent Cloud

MQTT provides connectivity to machines and GPS data from vehicles at the edge. Kafka Connect connectors integrate MQTT and IT systems, such as Elasticsearch, MongoDB, and AWS S3. ksqlDB processes the data in motion continuously.

Check out my blog series about Kafka and MQTT for other related IoT use cases and examples.

Shippeo: A Kafka-native transportation platform for logistics providers, shippers, and carriers

Shippeo provides real-time and multimodal transportation visibility for logistics providers, shippers, and carriers. Its software uses automation and artificial intelligence to share real-time insights, enable better collaboration, and unlock a supply chain’s full potential. The platform can give instant access to predictive, real-time information for every delivery.

Shippeo integrates traditional databases (MySQL and PostgreSQL) and cloud-native data warehouses (Snowflake and BigQuery) with Apache Kafka and Debezium:

From Mysql and Postgresql to Snowflake and BigQuery with Kafka and Debezium at Shippeo

This is a terrific example of cloud-native enterprise architecture leveraging a “best of breed” approach for data warehousing and analytics. Kafka decouples the analytical workloads from the transactional systems and handles the backpressure for slow consumers.
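The decoupling works because Kafka consumers pull at their own pace instead of having data pushed at them. Below is a hedged sketch of a warehouse-loading consumer reading Debezium change events; the topic name follows Debezium’s usual server.database.table convention but is hypothetical, and the JSON envelope handling is simplified:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class CdcToWarehouseLoader {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "warehouse-loader");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Hypothetical Debezium topic for a MySQL orders table.
            consumer.subscribe(List.of("mysql.logistics.orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Each record is a change event (insert/update/delete envelope).
                    // A real loader would parse the JSON payload and micro-batch
                    // rows into the warehouse at whatever pace it can sustain.
                    System.out.printf("change event for key %s: %s%n", record.key(), record.value());
                }
            }
        }
    }
}
```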

A real-time locating system (RTLS) built with Apache Kafka

I want to end this blog post with a more concrete example of a Kafka implementation. The following picture shows a multi-purpose Kafka-native real-time locating system (RTLS) for transportation and logistics:

Data Streaming for a Real-Time Locating and Tracking System (RTLS)

The example shows three use cases of how produced events (“P”) are consumed and processed:

  • (“C1”) Real-time alerting on a single event: Monitor assets and people and send an alert to a controller, mobile app, or any other interface if an issue happens.
  • (“C2”) Continuous real-time aggregation of multiple events: Correlation of data while it is in motion. Calculate averages, enforce business rules, apply an analytic model for predictions on new events, or execute any other business logic.
  • (“C3”) Batch analytics on all historical events: Take all historical data to find insights, e.g., for analyzing past issues, planning future location requirements, or training analytic models.

The Kafka-native RTLS can run in the data center, cloud, or closer to the edge, e.g., in a factory close to the shop floor and production lines. The blog post “Real Time Locating System (RTLS) with Apache Kafka for Transportation and Logistics” explores this use case in more detail.

The logistics and transportation industry requires Kafka-native real-time data streams!

Real-time data beats slow data. That’s true almost everywhere. But logistics, shipping, and transportation cannot build efficient and innovative business models without real-time information and correlated decisions, recommendations, and alerts. Kafka is everywhere in this industry. And it is just getting started.

After writing the blog post, I realized most case studies were from European companies. This is purely coincidental. I assure you that similar companies in the US, Asia, or Australia have built or are building similar enterprise architectures.

If you still want to learn more, check out the related blog posts across this site.

What role does data streaming play in your logistics and transportation scenarios? Do you run everything around Kafka in the cloud or operate hybrid edge scenarios? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

Apache Kafka in the Public Sector – Part 2: Smart City
https://www.kai-waehner.de/blog/2021/10/12/apache-kafka-public-sector-government-part-2-smart-city-iot-transportation-mobility-services/ (October 12, 2021)

The public sector includes many different areas. Some groups, like the military, leverage cutting-edge technology. Others, like the public administration, are years or even decades behind. This blog series explores how the public sector leverages data in motion powered by Apache Kafka to add value to innovative new applications and to modernize legacy IT infrastructures. This post is part 2: Use cases and architectures for a Smart City.

Apache Kafka in the Public Sector for Smart City Infrastructure

Blog series: Apache Kafka in the Public Sector and Government

This blog series explores why many governments and public infrastructure sectors leverage event streaming for various use cases. Learn about real-world deployments and different architectures for Kafka in the public sector:

  1. Life is a Stream of Events
  2. Smart City (THIS POST)
  3. Citizen Services
  4. Energy and Utilities
  5. National Security

Subscribe to my newsletter to get updates immediately after the publication. In addition, I will update the above list with direct links to this blog series’s posts once they are published.

As a side note, if you wonder why healthcare is not on the above list: healthcare deserves a blog series of its own. While the government can provide public healthcare through national healthcare systems, it is part of the private sector in many other cases.

Real-time is Mandatory for a Smart City Everywhere

I wrote a lot about event streaming and Apache Kafka for smart city infrastructure and use cases. I won’t repeat myself here. Check out the posts Event Streaming with Kafka as the Foundation for a Smart City and Apache Kafka and MQTT for Last-Mile IoT Integration in a Smart City.

This post dives deeper into architectural questions and how collaboration with 3rd party services can look from the perspective of a smart city’s government and public administration.

The Need for Real-time Data Processing Everywhere in a Smart City and how Kafka helps

A smart city is a very complex beast. I am glad that I only cover technology and not regulatory or political discussions. However, even the technology standpoint is not straightforward. A smart city needs to correlate data across data centers, devices, vehicles, and many other things. This is a true Internet of Things (IoT) scenario and therefore includes plenty of different technologies, communication paradigms, and infrastructures:

Hybrid Edge Cloud Architecture for a Smart City with Apache Kafka

Smart city projects require the integration of various 1st party and 3rd party services. Most use cases only work well if that data is correlated in real-time; think about traffic routing, emergency alerts, predictive monitoring and maintenance, mobility services such as ride-hailing, and other fancy smart city use cases. Without real-time data processing, the use case either delivers a bad user experience or is not cost-efficient. Hence, Kafka is adopted more and more for these scenarios.

Low Latency and 5G Networks for (some) Data Streaming Use Cases

The term “real-time” needs to be defined. Processing data in a few seconds is good enough in most use cases and a significant game-changer compared to hourly, daily, or weekly batch processing.

Having said this, some use cases like location-based upselling in retail or condition monitoring in equipment and manufacturing require lower latency, meaning sub-second end-to-end data processing.

Here is an example of leveraging 5G networks for low latency. The demo was built by the AWS Wavelength team, Verizon, and Confluent:

Connected Hybrid Services and Low Latency via Open API

Most real-world deployments use separation of concerns: Low-latency use cases run at the edge and everything else in the regular data center or public cloud region. Read the article “Low Latency Data Streaming with Apache Kafka and Cloud-Native 5G Infrastructure” for more details.

At this point, it is important to remind everybody that Kafka (and any IT software) is not hard real-time and not built for the OT world and embedded systems. Learn more in the article “Kafka is NOT hard real-time but soft real-time“. Also, (soft) real-time does not compete with batch processing and data warehouse/data lake architectures. As you can learn in “Serverless Kafka in a Cloud-native Data Lake Architecture”, it is complementary.

Collaboration between Government, City, and 3rd Party via Open API

Real-time data processing is crucial in implementing smart city use cases. Additionally, most smart city projects require collaboration between different teams, infrastructures, and 3rd party services.

Let’s take a look at three very different real-world event streaming deployments to see the broad spectrum of use cases and integration challenges:

  • Ohio Department of Transportation’s government-owned event streaming platform
  • Deutsche Bahn’s single source of truth for customer communication in real-time and 3rd party integration with the Google Maps API
  • Free Now’s mobility service in the cloud for real-time data correlation in compliance with regional laws and independent vehicles/drivers.

Ohio Department of Transportation (ODOT) – A Government-Owned Event Streaming Platform

Ohio Department of Transportation (ODOT) has an exciting initiative: DriveOhio. It aims to organize and accelerate smart vehicle and connected vehicle projects in the State of Ohio. DriveOhio serves as the single point of contact for policymakers, agencies, researchers, and private companies to collaborate with one another on intelligent transportation efforts around the state.

ODOT presented their real-time transportation data platform at the last Kafka Summit Americas:

Apache Kafka in Public Sector Government and Smart City at Ohio Department of Transportation

The whole Kafka ecosystem powers ODOT’s cloud-native Event Streaming Platform (ESP). The platform enables continuous data integration and stream processing for transactional and analytical workloads. The ESP runs on Kubernetes to provide an elastic, flexible, and scalable infrastructure for real-time data processing.

Deutsche Bahn – Single Source of Truth and Google Maps Integration in Real-time

Deutsche Bahn is a German railway company. It is a private joint-stock company (AG), with the Federal Republic of Germany being its single shareholder. I already talked about their real-time traveler information system in another blog post: “Mobility Services and Transportation powered by Apache Kafka“.

They leverage the Apache Kafka ecosystem powered by Confluent because it combines several characteristics that you would have to integrate with different technologies otherwise:

  • Real-time messaging
  • Data integration
  • Data correlation
  • Storage and caching
  • Replication and high availability
  • Elastic scalability

This example is excellent for this blog. It shows how an existing solution needs connectivity to other internal applications and 3rd party services to provide a better customer experience and expand the customer base.

Recently, Deutsche Bahn integrated its platform with Google Maps via Google’s Open API. In addition to a better customer experience, the railway company can reach out to many new end-users to expand their business. The Railway-News has a good article about this integration. Here is my summary:

Mobility Service for Traveler Information at Deutsche Bahn with Apache Kafka and Google Maps Integration

Free Now – Mobility Service in the Cloud Connected to Regional Laws and Vehicles

Free Now (formerly MyTaxi) is a mobility service. Their app uses mobile and GPS technology to match taxi drivers with passengers based on availability and proximity. Mobility services need to integrate with other 3rd party services for routing, payment, tax implications, and many different use cases.

Here is one example from Free Now’s Kafka Summit talk where they explain the added value of continuous stream processing for calculating context-specific dynamic pricing:

FREE NOW my taxi Data in Motion with Kafka and Confluent Cloud for Stateful Streaming Analytics

The public administration is always involved when a new mobility service is released to the public. While some cities build their own mobility services, the reality is that most governments provide the infrastructure together with the telco providers, and 3rd party vendors provide the mobility service. The specific relationship between the government, city, and mobility service provider differs across regions, countries, and continents.

Almost every mobility service uses Kafka as its backbone. Google your favorite mobility service across the globe and add “Kafka” to the search. Chances are very high that you will find some excellent blog posts, conference talks, or at least job offers from the mobility service’s recruiting page. Here are just a few examples that posted great content about their Kafka usage: Uber, Lyft, Grab, Otonomo, Here Technologies, and many more.

Data in Motion with Kafka for a Connected and Innovative Smart City

Smart City is a vast topic. Many stakeholders are involved. Collaboration and Open APIs are critical for success. In most cases, governments work together with telco providers, infrastructure providers such as the cloud hyperscalers, and software vendors (including an event streaming platform like Kafka).

Most valuable and innovative smart city use cases require data processing in real-time. The use cases require data integration, storage, backpressure handling, and data correlation. Event streaming is the ideal technology for these use cases. Examples from the Ohio Department of Transportation, Deutsche Bahn and its Google Maps integration, and Free Now showed a few different angles to realize successful smart city projects.

How do you leverage event streaming in the public sector? Are you working on smart city projects? What technologies and architectures do you use? What projects have you already worked on, and which are in planning? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

Apache Kafka and MQTT (Part 4 of 5) – Mobility Services and Transportation
https://www.kai-waehner.de/blog/2021/03/25/apache-kafka-mqtt-part-4-of-5-transportation-mobility-as-a-service/ (March 25, 2021)

Apache Kafka and MQTT are a perfect combination for many IoT use cases. This blog series covers the pros and cons of both technologies. Various use cases across industries, including connected vehicles, manufacturing, mobility services, and smart cities, are explored. The examples use different architectures, including lightweight edge scenarios, hybrid integrations, and serverless cloud solutions. This post is part four: Mobility Services and Transportation.

MQTT and Kafka for Mobility Services, Transportation and Cloud Native Microservices

Apache Kafka + MQTT Blog Series

The first blog post explores the relation between MQTT and Apache Kafka. Afterward, the other four blog posts discuss various use cases, architectures, and reference deployments.

  • Part 1 – Overview: Relation between Kafka and MQTT, pros and cons, architectures
  • Part 2 – Connected Vehicles: MQTT and Kafka in a private cloud on Kubernetes; use case: remote control and command of a car
  • Part 3 – Manufacturing: MQTT and Kafka at the edge in a smart factory; use case: Bidirectional OT-IT integration with Sparkplug between PLCs, IoT Gateways, Data Historian, MES, ERP, Data Lake, etc.
  • Part 4 – Mobility Services (THIS POST): MQTT and Kafka leveraging serverless cloud infrastructure; use case: Traffic jam prediction service using machine learning
  • Part 5 – Smart City: MQTT at the edge connected to fully-managed Kafka in the public cloud; use case: Intelligent traffic routing by combining and correlating 3rd party services

Subscribe to my newsletter to get updates immediately after the publication. In addition, I will update the above list with direct links to this blog series’s posts as soon as they are published.

Use Case: Mobility as a Service (MaaS) and Transportation

Transportation is changing significantly these days. Mobility Services – often called Mobility-as-a-Service (MaaS) – enable users to plan, book, and pay for multiple types of mobility services through a joint digital channel.

Mobility as a Service (MaaS)

The concept describes a shift away from personally-owned modes of transportation and towards mobility provided as a service. This is enabled by combining transportation services from public and private transportation providers through a unified gateway that creates and manages the trip, which users can pay for with a single account. Users can pay per trip or a monthly fee for a limited distance. The key concept behind MaaS is to offer travelers mobility solutions based on their travel needs. Specialist urban mobility applications are also expanding their offerings to enable MaaS, such as Transit, Uber, and Lyft.

Travel planning typically begins in a journey planner. For example, a trip planner can show that the user can get from one destination to another by using a train/bus combination. The user can then choose their preferred trip based on cost, time, and convenience. At that point, any necessary bookings (e.g. calling a taxi, reserving a seat on a long-distance train) would be performed as a unit. It is expected that this service should allow roaming, that is, the same end-user app should work in different cities, without the user needing to become familiar with a new app or to sign up for new services.

As you can hopefully already imagine, plenty of innovative new use cases are possible by combining Apache Kafka and MQTT for MaaS. And most of these scenarios require data integration and data processing at scale in real-time.

Architecture: MQTT and Kafka for Mobility Services

Mobility services are often separated from other core IT infrastructure. MaaS – as the term says – is just consumed as a service. Hence, most mobility services I have seen run in the cloud. The following diagram shows an intelligent navigation service built with MQTT, Kafka, and Machine Learning:

MQTT and Apache Kafka for Mobility Services and Transportation

A few notes on the architecture:

  • As mobility services connect to moving vehicles, smartphones, or other things, the cloud is perfect. No need to operate the infrastructure. Just focus on building applications.
  • Many mobility services integrate other 1st or 3rd party services. For instance, there is no need to build yet another mapping service. If you need one for building your innovative new application, just embed HERE Technologies (that actually provides a public Kafka interface as the preferred integration option instead of HTTP!) or any other available mapping service.
  • Regional services with low latency are often very relevant for mobility services. Hence, multiple MQTT and Kafka clusters are the norm, not an exception.
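To make the MQTT-to-Kafka handoff in such an architecture concrete, here is a minimal bridge using the Eclipse Paho client and a Kafka producer. Real deployments typically use an MQTT connector or an MQTT broker’s native Kafka integration instead of hand-rolled code; broker addresses and topic names are placeholders:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.eclipse.paho.client.mqttv3.IMqttDeliveryToken;
import org.eclipse.paho.client.mqttv3.MqttCallback;
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttMessage;

import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class MqttToKafkaBridge {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        Producer<String, String> producer = new KafkaProducer<>(props);

        MqttClient mqtt = new MqttClient("tcp://localhost:1883", "vehicle-bridge");
        mqtt.setCallback(new MqttCallback() {
            @Override
            public void messageArrived(String topic, MqttMessage message) {
                // MQTT topic like "vehicles/{id}/position"; use the vehicle id as Kafka key.
                String vehicleId = topic.split("/")[1];
                String payload = new String(message.getPayload(), StandardCharsets.UTF_8);
                producer.send(new ProducerRecord<>("vehicle-positions", vehicleId, payload));
            }

            @Override
            public void connectionLost(Throwable cause) {
                // A real bridge would reconnect with backoff here.
            }

            @Override
            public void deliveryComplete(IMqttDeliveryToken token) { }
        });

        mqtt.connect();
        mqtt.subscribe("vehicles/+/position"); // '+' matches a single level, i.e., any vehicle id
    }
}
```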

Let’s take a look at some real-world examples for cutting-edge mobility services in the transportation industry.

Example: Cloud Ecosystem for Next-Generation Mobility @ ZF

ZF Friedrichshafen AG is a global automotive supplier that enables vehicles to see, think, and act. With a broad range of systems for passenger cars, commercial vehicles, and industrial technology, ZF offers comprehensive solutions for established vehicle manufacturers as well as newly emerging transport and mobility service providers.

ZF’s Connectivity Suite enables new business models for mobility as a service (MaaS) and transportation as a service (TaaS). The ProCV gateway device allows each vehicle to communicate using MQTT. The gateway provides a secure and reliable channel for transferring telemetry data from the car to the cloud and remote commands from the cloud to each vehicle.

Applications can exchange data such as real-time positioning information, remote commands to the vehicle, and vehicle-generated alerts. Some possible use cases:

  • Remote diagnostics for technical insight & management of vehicle performance
  • Fleet monitoring
  • Secure & reliable middleware between connected vehicles & cloud services

Read the case study from HiveMQ for more details about ZF’s IoT gateway.

Example: Real-Time Traveler Information @ Deutsche Bahn

Deutsche Bahn (the German railway) has a very complex network of short-distance and long-distance trains. Therefore, delays and cancellations are common, not an exception. Hence, at least the traveler information should work well to send real-time notifications to customers.

For that reason, Deutsche Bahn has built a single source of truth traveler information platform with Confluent:

Deutsche Bahn - Apache Kafka for Transportation and Mobility as a Service MaaS

The mobility service integrates via Kafka with many legacy and modern applications. The mobile app shows real-time status updates about each train. While train delays and cancellations cannot be avoided completely, the app at least allows you to get to a lounge or grab a coffee if the delay is more than just a few minutes. I use the app every week myself and can confirm that the customer experience improved significantly.

Not every interface is or will be real-time. Kafka helps!

Fun fact: The first proof of concept to build this traveler information app used a traditional message queue. In theory, this is sufficient as you “just” need to send status updates in real-time to the mobile app. Unfortunately, a few issues came up quickly:

  • Not every interface is real-time! In addition to messaging sources such as JMS or MQTT, the platform needed to integrate with databases, files, web services, and other legacy systems. Hence, data integration is a key piece of the puzzle.
  • Data storage is important to handle backpressure and decouple applications. Slow consumers fall behind. Analytics workloads take data in batches, not in real-time. Web applications consume specific events via request-response queries.
  • Sending events from A to B is just part of the problem! The real added value comes from correlating the data from streaming and non-streaming applications and databases in real-time. Kafka-native stream processing frameworks such as Kafka Streams or ksqlDB help to process data in motion.

For the above reasons, Deutsche Bahn re-started their proof of concept. Their existing project used different frameworks for messaging, caching, integration, and processing. This setup was replaced with Kafka. Kafka Connect integrates applications and databases. Kafka Streams processes the data in motion. The Kafka storage handles backpressure and slow consumers. All of this is built into Kafka out-of-the-box. And it scales much better. Today, the traveler information system is live and creates a much better customer experience.

Find more details about the traveler information system from Deutsche Bahn in their Confluent blog post.

Kafka + MQTT = Mobility Services and Transportation

In conclusion, Apache Kafka and MQTT are a perfect combination for mobility services and transportation. Follow the blog series to learn about use cases such as connected vehicles, manufacturing, mobility services, and smart cities. Every blog post also includes real-world deployments from companies across industries. It is key to understand the different architectural options to make the right choice for your project.

What are your experiences and plans in IoT projects? What use case and architecture did you implement? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

Infrastructure Checklist for Apache Kafka at the Edge
https://www.kai-waehner.de/blog/2021/02/03/kafka-edge-infrastructure-checklist-deployment-outside-data-center/ (February 3, 2021)

Event streaming with Apache Kafka at the edge is getting more and more traction these days. It is a common approach to providing the same open, flexible, and scalable architecture in the cloud and at the edge outside the data center. Possible locations for Kafka edge deployments include retail stores, cell towers, trains, small factories, restaurants, hospitals, stadiums, etc. This post explores a checklist with infrastructure questions you need to check and evaluate if you want to deploy Kafka at the edge.

Infrastructure Checklist for Apache Kafka at the Edge

Apache Kafka at the Edge == Outside the Data Center

I already discussed the concepts and architectures of Kafka at the edge in detail in past blog posts.

This blog post explores a checklist of common infrastructure questions you need to answer and double-check before planning to deploy Kafka at the edge.

What is the Edge?

The term ‘edge’ needs to be defined so that we share the same understanding. When I talk about the edge in the context of Kafka, it means:

  • Edge is NOT a data center, i.e., limited compute, storage, network bandwidth
  • Kafka clients AND the Kafka broker(s) are deployed here, not just the client applications
  • Offline business continuity, i.e., the workloads continue to work even if there is no connection to the cloud
  • Often 100+ locations, like restaurants, coffee shops, or retail stores, or even embedded into 1000s of devices or machines
  • Low-footprint and low-touch, i.e., Kafka can run as a normal highly available cluster or as a single broker (no cluster, no high availability); often shipped “as a preconfigured box” in OEM hardware (e.g., Hivecell)
  • Hybrid integration, i.e., most use cases require uni- or bidirectional communication with a remote Kafka cluster in a data center or the cloud

Let’s recap one architecture example that deploys Kafka in the cloud and at the edge: A hybrid event streaming architecture for real-time omnichannel retail and customer 360:

Hybrid Edge to Global Retail Architecture with Apache Kafka

This definition of a ‘Kafka edge deployment‘ can also be summarized as an ‘autonomous edge‘ or ‘disconnected edge‘. In contrast, a ‘connected edge’ means that Kafka clients at the edge connect directly to a remote data center or cloud.

Infrastructure Checklist: How to Deploy Apache Kafka at the Edge?

I talked to 100+ customers and prospects across industries with the need to do edge computing for different reasons, including bad internet connection, reduced cost, low latency requirements, and security implications.

The following discussion points and questions come up all the time. Make sure to discuss them with your project team:

  • What are the use cases for Kafka at the edge? For instance, edge processing (e.g., business logic/analytics), replication to the cloud (uni- or bi-directional), data integration (e.g., to devices, IoT gateways, local databases)?

  • What is the data model, and what are the replication scenarios and SLAs (aggregation to “just gather data”, command & control to send data back to the edge, local analytics, etc.)? Check out Kafka-native replication tools, especially MirrorMaker 2 and Confluent’s Cluster Linking.

  • What is the main motivation for doing edge processing (vs. ingestion into a DC/cloud for all processing)? Examples: Low latency requirements, cost-efficiency, business continuity even when offline / disconnected from the cloud, etc.

  • How many “edge sites” do you plan to deploy to (e.g., retail stores, factories, restaurants, trains, …)? This needs to be considered from the beginning. If you want to roll out edge computing to thousands of restaurants, you need a different hardware and automation strategy than deploying to just ten smart factories worldwide.

  • What hardware do you use at the edge (e.g., hardware specifications)? How much memory, disk, CPU, etc., is available? Do you work with a specific hardware vendor? What are the support model and monitoring setup for the edge computers?

  • What network do you use? Is it stable? What is the connection to the cloud? If it is a stable connection (like AWS Direct Connect or Azure ExpressRoute), do you still need Kafka at the edge?

  • What is the infrastructure you plan to run Kafka on at the edge (e.g., operating system, container, Kubernetes, etc.)?

  • Do you need high availability and a ‘real’ Kafka cluster with 3+ brokers? Or is a single broker good enough? In many cases, the latter is good enough to decouple edge and cloud, handle backpressure, and enable business continuity even if the internet connection is gone for some time.

  • What edge protocols do you need to integrate with? Is Kafka Connect sufficient with its connectors, or do you need a 3rd party IoT gateway? Common integration points at the edge are OPC UA, MQTT, proprietary PLCs, traditional relational databases, files, IoT gateways, etc.

  • Do you need to process the data at the edge? Kafka-native stream processing with Kafka Streams or ksqlDB is usually a straightforward and lightweight, but still scalable and reliable, option. Almost all use cases I have seen need at least some streaming ETL at the edge. For instance, preprocess and filter data so that you only send relevant, aggregated data over the network to the cloud (see the sketch after this checklist). However, many customers also deploy business applications at the edge, for instance, for real-time model inference.
  • How will fleet management work? Which part of the infrastructure or toolchain handles the management and operations of the edge machines? In most cases, this is not specific to Kafka but instead handled on the infrastructure level. For instance, if you run a Kubernetes cluster, Rancher might be used to provision and manage the edge clusters, including the Kafka ecosystem. Of course, specific Kafka metrics are also integrated here, for instance via Prometheus if you are using Kubernetes.

Discussing and answering these questions will help you with your planning for Kafka at the edge. Are there any key questions missing? Please let me know and I will update the list.
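As a concrete example for the streaming ETL question above, here is the sketch referenced in the checklist: a small Kafka Streams job on the edge broker that filters and downsamples raw sensor data so that only compact aggregates cross the (possibly flaky) network. Topic names and the reduction logic are hypothetical; replicating the aggregate topic to the cloud would then be the job of MirrorMaker 2 or Cluster Linking:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;

import java.time.Duration;
import java.util.Properties;

public class EdgeEtlJob {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "edge-etl");
        // A single edge broker (no cluster) is often sufficient for decoupling.
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "edge-broker:9092");

        StreamsBuilder builder = new StreamsBuilder();

        builder.stream("raw-sensor-values", Consumed.with(Serdes.String(), Serdes.Double()))
                // Drop obviously invalid readings right at the edge.
                .filter((sensorId, value) -> value != null && value >= 0.0)
                .groupByKey(Grouped.with(Serdes.String(), Serdes.Double()))
                // Downsample: one maximum value per sensor per minute.
                .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
                .reduce(Math::max, Materialized.with(Serdes.String(), Serdes.Double()))
                .toStream((windowedSensorId, max) -> windowedSensorId.key())
                // Only this much smaller topic gets replicated to the cloud.
                .to("sensor-values-per-minute", Produced.with(Serdes.String(), Serdes.Double()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}
```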

Kafka at the Edge is the new Black!

Apache Kafka at the edge is a common approach to providing the same open, flexible, and scalable architecture in the cloud and outside the data center. A huge benefit is that the same technology and architecture can be deployed everywhere across regions, sites, and clouds. This is a real hybrid architecture combining edge sites, data centers, and multiple clouds! Discuss the above infrastructure checklist with your team to be successful.

What are your experiences and plans for event streaming with Apache Kafka at the edge? Did you already deploy Apache Kafka on a small node somewhere, maybe even as a single broker setup? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

Real Time Locating System (RTLS) with Apache Kafka for Transportation and Logistics
https://www.kai-waehner.de/blog/2021/01/07/real-time-locating-system-rtls-apache-kafka-asset-tracking-transportation-logistics/ (January 7, 2021)

Real-Time Locating System (RTLS) enables identifying and tracking the location of objects or people in real-time. It is used everywhere in transportation and logistics across industries. A postmodern RTLS requires an open architecture and high scalability. This blog post explores the use cases for RTLS, the challenges of existing implementations, and why more and more RTLS implementations rely on Apache Kafka as an open, scalable, and reliable event streaming platform.

Real-Time Locating / Tracking System (RTLS) with Apache Kafka and Event Streaming

Real-Time Locating / Tracking System (RTLS) in Supply Chain and Logistics

RTLS is a key part of many use cases across verticals. Many manufacturing processes and supply chains rely on good real-time information about assets and people. Other innovative scenarios could not exist without RTLS either. For instance, think about ride-sharing, car-sharing, or food delivery.

An RTLS enables identifying and tracking the location of objects or people in real-time. Some examples:

  • Tracking automobiles through an assembly line
  • Locating pallets of merchandise in a warehouse
  • Finding medical equipment in a hospital
  • Tracking tools, machines, and people (where legal) on a construction site

An RTLS has three key goals:

  • Improve safety
  • Control security
  • Optimize processes and productivity

Wireless RTLS tags are attached to objects or worn by people, and in most RTLS, fixed reference points receive wireless signals from the tags to determine their location. However, more and more use cases also require outdoor tracking. In many cases, a postmodern RTLS combines indoor and outdoor location tracking.

Challenges of Today’s Location and Tracking Systems

RTLS have existed for a long time, and plenty of products are available on the market. While they differ in their characteristics and features, most traditional RTLS share at least some of the following technical challenges:

  • Monolithic
  • Proprietary
  • Limited Scalability
  • No Hardware Flexibility
  • Single Purpose Solution
  • Limited Integration Capabilities
  • Limited Tracking Technologies

Many vendors invest in their RTLS systems. Similar to CRM, ERP, and MES systems, many next-generation RTLS products are based on Kafka to solve these challenges. So feel free to check the above characteristics with your favorite vendor and ask how they plan to solve (or have already solved) them.

Many enterprises prefer building their own custom postmodern RTLS. This approach allows for an open, flexible solution. Custom RTLS are typically built to include innovative and differentiating features that add business value and optimize business processes.

A Postmodern RTLS for Multi-Purpose Use Cases and Architectures

From my conversations with customers across industries, I learned that use cases and requirements for RTLS have changed in recent years. In addition to solving the above technical challenges, two key differences establish a postmodern view of how to define an RTLS:

  1. RTLS is not just about location anymore. Applications leverage enhanced metadata such as speed, direction, or spatial orientation. Data integration and correlation are key for adding business value and improving processes.
  2. The combination of indoors and outdoors via hybrid architectures enables multi-purpose RTLS.

Some examples of indoor location tracking: asset tracking and monitoring, non-linear production lines, geofencing for safety (e.g., around cobots), and distance enforcement (e.g., during Covid-19). Outdoor track & trace enables regional or global logistics, routing, and end-to-end monitoring (e.g., of construction areas).

A key requirement of a modern RTLS is the ability to integrate with different technologies. This includes location tracking technologies such as Radio Frequency (RF), Infrared (IR), RFID, Beacon, Wi-Fi, Bluetooth, UWB, GPS, GSM, 5G, etc. But that's not all: the RTLS also needs to integrate with the rest of the enterprise reliably, in real-time, and at scale. This includes MES, ERP, APS, CRM, data lakes, and many other applications.
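As a hedged illustration of such integration, a Kafka Connect source connector can be defined directly from ksqlDB. The sketch below assumes Confluent's MQTT source connector is installed; the broker URI and topic names are hypothetical:

```
-- Sketch: ingest position updates from an MQTT broker into Kafka
-- (broker URI and topic names are assumptions for illustration)
CREATE SOURCE CONNECTOR rtls_mqtt_source WITH (
  'connector.class' = 'io.confluent.connect.mqtt.MqttSourceConnector',
  'mqtt.server.uri' = 'tcp://mqtt.example.local:1883',
  'mqtt.topics'     = 'rtls/position/#',
  'kafka.topic'     = 'asset_positions',
  'confluent.topic.bootstrap.servers' = 'localhost:9092'
);
```

The same pattern applies on the outbound side: sink connectors can push the processed location data into an ERP, a data lake, or any other enterprise application.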

Use Cases for a Postmodern RTLS

Many use cases exist to leverage a postmodern RTLS to improve processes or build innovative new applications that were not possible beforehand. Some examples:

  • Locate and manage assets within a facility, such as finding a misplaced tool cart in a warehouse or medical equipment
  • Notification of new locations, such as an alert if a tool cart has improperly left the facility
  • Combine the identities of multiple items placed in a single location, such as on a pallet
  • Locate customers, for example, in a restaurant, for delivery of food or service
  • Maintain proper staffing levels of operational areas, such as ensuring guards are in the proper locations in a correctional facility
  • Quickly and automatically account for all staff after or during an emergency evacuation
  • Automatically track and time stamp the progress of people or assets through a process, such as following a patient’s emergency room wait time, time spent in the operating room, and total time until discharge
  • Clinical-grade locating to support acute care capacity management
  • Replay past events to understand the mass movements of workflows
  • Plan future location requirements
  • Auditing for compliance cases
  • Etc.

Two important notes here:

  1. Many of these use cases have existed for a long time. But once again: check out the challenges discussed above. The requirements have changed regarding scale, flexibility, and other characteristics.
  2. As you can see, most of these use cases do not just require location tracking but also data correlation in real-time. That’s where the optimization or added business value comes from.

Vehicle Tracking System in other Industries

Transportation and logistics are the obvious industries for real-time tracking systems. But industries not traditionally known for vehicle tracking have started to use such systems in creative ways to improve their processes or businesses. Here are a few examples:

  • The hospitality industry has caught on to this technology to improve customer service. For example, a luxury hotel in Singapore has installed vehicle tracking systems in its limousines to ensure staff can welcome VIP guests when they reach the hotel.
  • Vehicle tracking systems used in food delivery vans may alert if the temperature of the refrigerated compartment moves outside of the range of safe food storage temperatures.
  • Car rental companies are also using them to monitor their rental fleets.

The following sections explore an example scenario around transportation and logistics with truck deliveries. Let's look at how Apache Kafka and event streaming can help implement a postmodern RTLS.

Kafka-native Real-Time Locating / Tracking System (RTLS)

The following picture shows a multi-purpose Kafka-native RTLS for transportation and logistics:

Kafka-native Real-Time Locating and Tracking System (RTLS)

The example shows three use cases of how produced events (“P”) are consumed and processed; a ksqlDB sketch of the first two patterns follows the list:

  • (“C1”) Real-time alerting on a single event: Monitor assets and people and send an alert to a controller, mobile app, or any other interface if an issue happens.
  • (“C2”) Continuous real-time aggregation of multiple events: Correlate data while it is in motion. Calculate averages, enforce business rules, apply an analytic model for predictions on new events, or execute any other business logic.
  • (“C3”) Batch analytics on all historical events: Take all historical data to find insights, e.g., for analyzing issues of the past, planning future location requirements, or training analytic models.
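A minimal ksqlDB sketch of the first two consumption patterns, assuming a hypothetical asset_positions stream; all names and thresholds are illustrative:

```
-- Sketch under assumed names: a stream of produced position events ("P")
CREATE STREAM asset_positions (
  asset_id VARCHAR KEY,
  latitude DOUBLE,
  longitude DOUBLE,
  speed_kmh DOUBLE
) WITH (KAFKA_TOPIC = 'asset_positions', VALUE_FORMAT = 'JSON');

-- ("C1") Real-time alerting on a single event: flag speeding assets
CREATE STREAM speed_alerts AS
  SELECT asset_id, speed_kmh
  FROM asset_positions
  WHERE speed_kmh > 90
  EMIT CHANGES;

-- ("C2") Continuous aggregation: average speed per asset in 5-minute windows
CREATE TABLE avg_speed_5min AS
  SELECT asset_id, AVG(speed_kmh) AS avg_speed
  FROM asset_positions
  WINDOW TUMBLING (SIZE 5 MINUTES)
  GROUP BY asset_id
  EMIT CHANGES;
```

The third pattern (“C3”) typically replays the complete history from the Kafka topic (or a data lake) instead of processing only new events.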

The Kafka-native RTLS can run in the data center, cloud, or closer to the edge, e.g., in a factory close to the shop floor and production lines.

Hybrid Kafka Architecture for Transportation and Logistics for RTLS and Track&Trace

One of the benefits of Apache Kafka is the freedom to deploy the infrastructure as needed. On one end of the spectrum, Kafka can be deployed as a single broker in a vehicle (like a truck or train). On the other end, a global Kafka infrastructure can span multiple cloud providers, regions, countries, or even continents and integrate with tens or hundreds of factories or other edge locations. The reality is often somewhere in the middle. Most enterprises start small and roll it out across locations and countries over time.

The following shows a pretty powerful hybrid architecture for a Kafka-native RTLS:

Postmodern Asset and People Track and Trace APS and RTLS with Apache Kafka and Event Streaming

In the above scenario, the hybrid architecture includes:

  • A 5G infrastructure with public telco and private 5G Campus networks
  • Confluent Cloud as a fully managed event streaming platform in the cloud
  • Confluent Platform deployed at the edge in the 5G Campus leveraging AWS Wavelength
  • Real-time integration with assets and people at the edge and in the cloud
  • Real-time integration with enterprise applications such as APS, CRM, or ERP systems
  • Data correlation of edge and cloud data (replicated bi-directionally in real-time with tools such as Confluent’s Cluster Linking or Apache Kafka’s MirrorMaker 2)

This is obviously just one sample architecture. Again, you are totally free to design your own architecture with the components and technologies you need for your use cases.

An RTLS is heavily connected to the whole Supply Chain Management (SCM) process. As Kafka plays a key role in many supply chains, it is also a perfect fit for building real-time asset tracking.

Let’s now move over to two public use cases for location-based transportation and logistics with Kafka-native technologies.

Example: Bosch – Location-based Construction Site Management

The global supplier Bosch has a track&trace application leveraging Apache Kafka and Confluent Cloud: Construction site management analyzing sensors, machines, and workers.

Use cases include collaborative planning, inventory and asset management, and tracking, managing, and locating tools and equipment anytime and anywhere:

Construction Management and Digital Twin at Bosch with Apache Kafka and Confluent Cloud

The example is close to the hybrid architecture I showed in the last section. The solution spans multiple construction areas in various regions and integrates with the event streaming platform running in the cloud.

Let’s now take a look at another advanced use case for a real-time location service.

Location-Analytics and Geofencing with Kafka and ksqlDB

A geofence is a virtual perimeter around a real-world geographic area and is used for location analytics in real-time. A geofence can be dynamically generated, such as a radius around a point location, or it can be a predefined set of boundaries (such as school zones or neighborhood boundaries).

The use of a geofence is called geofencing. One example involves a location-aware device of a location-based service (LBS) user entering or exiting a geofence. This activity could trigger an alert to the device's user and a message to the geofence operator. Or, in the case of a factory, it could enforce distancing during the Covid-19 pandemic.

Guido Schmutz from Trivadis has done great work on this topic: “Location Analytics and Real-time Geofencing using Apache Kafka and KSQL“. It is actually quite simple to implement with KSQL:

Location-Analytics and Geofencing with Kafka and ksqlDB
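As a rough illustration of the idea (not the original queries from the referenced talk), a circular geofence check can be sketched with ksqlDB's built-in GEO_DISTANCE function, building on the asset_positions stream from the earlier sketch. The fence center and radius are assumptions:

```
-- Sketch: flag assets that leave a circular geofence around a site
-- (the fence center coordinates and the 5 km radius are hypothetical values)
CREATE STREAM geofence_violations AS
  SELECT asset_id,
         latitude,
         longitude,
         GEO_DISTANCE(latitude, longitude, 48.1351, 11.5820, 'KM') AS distance_km
  FROM asset_positions
  WHERE GEO_DISTANCE(latitude, longitude, 48.1351, 11.5820, 'KM') > 5.0
  EMIT CHANGES;
```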

These ksqlDB queries create continuous stream processing that analyzes and correlates sensor data in motion in real-time. As ksqlDB is a Kafka-native technology, it is possible to process millions of events per second in a reliable, scalable, and secure way.

Example: Lyft – Real-Time Map-Matching to Provide Accurate Locations

The ride-sharing giant Lyft shared a great example of location analytics in real-time. Lyft implemented map-matching to track customers based on GPS information from the mobile app.

Lyft has “two main use cases for map-matching:

  1. At the end of a ride, to compute the distance traveled by a driver to calculate the fare.
  2. In real-time, to provide accurate locations to the ETA team and make dispatch decisions as well as to display the drivers’ cars on the rider app.”

Lyft Map Matching

As the GPS signal is often weak, Lyft enhanced and correlated the data with other data sets to get more accurate information. For instance, Lyft also uses location data from free public Wi-Fi hotspots close to the customer.

This is a great outdoor example of a modern, scalable RTLS. And once again, this example shows that the real added value of real-time data is the data correlation. It does not help if you only use real-time messaging and then process the data in batch mode in a data lake. A sketch of such a real-time correlation follows below.
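To make the correlation point concrete, here is a hedged ksqlDB sketch (not Lyft's actual implementation) that enriches a weak GPS stream with a reference table of known Wi-Fi hotspots. All stream, table, and column names are assumptions:

```
-- Hypothetical reference table of known public Wi-Fi hotspots
CREATE TABLE wifi_hotspots (
  hotspot_id VARCHAR PRIMARY KEY,
  hotspot_lat DOUBLE,
  hotspot_lon DOUBLE
) WITH (KAFKA_TOPIC = 'wifi_hotspots', VALUE_FORMAT = 'JSON');

-- Hypothetical GPS stream; the device reports the nearest hotspot it sees
CREATE STREAM gps_positions (
  ride_id VARCHAR KEY,
  latitude DOUBLE,
  longitude DOUBLE,
  nearest_hotspot_id VARCHAR
) WITH (KAFKA_TOPIC = 'gps_positions', VALUE_FORMAT = 'JSON');

-- Correlate each position event with the hotspot's known location in real time
CREATE STREAM corrected_positions AS
  SELECT p.ride_id, p.latitude, p.longitude,
         h.hotspot_lat, h.hotspot_lon
  FROM gps_positions p
  LEFT JOIN wifi_hotspots h ON p.nearest_hotspot_id = h.hotspot_id
  EMIT CHANGES;
```

The stream-table join is the essential design choice here: the reference data is kept as a continuously updated table, so every incoming position event is enriched immediately instead of waiting for a batch job.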

Open, Scalable, Multi-Purpose, Real-Time RTLS based on Kafka is the New Black

Real-Time Locating System (RTLS) enables identifying and tracking the location of objects or people in real-time. This is not a new problem. But the requirements have changed…

A postmodern RTLS provides an open architecture and high scalability. For this reason, more and more RTLS implementations rely on Apache Kafka as an open, scalable, and reliable event streaming platform.

Last but not least, if you wonder what the term “real-time” actually means in “RTLS” (no matter if Kafka-based or not), check out the article “Apache Kafka is NOT Hard Real-Time BUT Used Everywhere in Automotive and Industrial IoT” to understand what real-time really means.

What are your experiences with RTLS architectures and applications? Did you already use Apache Kafka? Which approach works best for you? What is your strategy? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Real Time Locating System (RTLS) with Apache Kafka for Transportation and Logistics appeared first on Kai Waehner.

]]>