SAP Archives - Kai Waehner
https://www.kai-waehner.de/blog/category/sap/

Data Streaming Meets the SAP Ecosystem and Databricks – Insights from SAP Sapphire Madrid
https://www.kai-waehner.de/blog/2025/05/28/data-streaming-meets-the-sap-ecosystem-and-databricks-insights-from-sap-sapphire-madrid/ (28 May 2025)

SAP Sapphire 2025 in Madrid brought together global SAP users, partners, and technology leaders to showcase the future of enterprise data strategy. Key themes included SAP’s Business Data Cloud (BDC) vision, Joule for Agentic AI, and the deepening SAP-Databricks partnership. A major topic throughout the event was the increasing need for real-time integration across SAP and non-SAP systems—highlighting the critical role of event-driven architectures and data streaming platforms like Confluent. This blog shares insights on how data streaming enhances SAP ecosystems, supports AI initiatives, and enables industry-specific use cases across transactional and analytical domains.

I had the opportunity to attend SAP Sapphire 2025 in Madrid—an impressive gathering of SAP customers, partners, and technology leaders from around the world. It was a massive event, bringing the global SAP community together to explore the company’s future direction, innovations, and growing ecosystem.

A key highlight was SAP’s deepening integration of Databricks as an OEM partner for AI and analytics within the SAP Business Data Cloud—showing how the ecosystem is evolving toward more open, composable architectures.

At the same time, conversations around Confluent and data streaming highlighted the critical role real-time integration plays in connecting SAP systems (including ERP, MES, DataSphere, Databricks, etc.) with the rest of the enterprise. As always, it was a great place to learn, connect, and discuss where enterprise data architecture is heading—and how technologies like data streaming are enabling that transformation.

Data Streaming with Confluent Meets SAP and Databricks for Agentic AI at Sapphire in Madrid

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter and following me on LinkedIn or X (formerly Twitter) to stay in touch. And make sure to download my free book about data streaming use cases, focusing on industry scenarios, success stories and business value.

SAP’s Vision: Business Data Cloud, Joule, and Strategic Ecosystem Moves

SAP presented a broad and ambitious strategy centered around the SAP Business Data Cloud (BDC), SAP Joule (including its Agentic AI initiative), and strategic collaborations like SAP Databricks, SAP DataSphere, and integrations across multiple cloud platforms. The vision is clear: SAP wants to connect business processes with modern analytics, AI, and automation.

SAP ERP with Business Technology Platform BTP and Joule for Agentic AI in the Cloud
Source: SAP

For those of us working in data streaming and integration, these developments present a major opportunity. Most customers I meet globally use SAP ERP or other products like MES, SuccessFactors, or Ariba. The relevance of real-time data streaming in this space is undeniable—and it’s growing.

Building the Bridge: Event-Driven Architecture + SAP

One of the most exciting things about SAP Sapphire is seeing how event-driven architecture is becoming more relevant—even if the conversations don’t start with “Apache Kafka” or “Data Streaming.” In the SAP ecosystem, discussions often focus on business outcomes first, then architecture second. And that’s exactly how it should be.

Many SAP customers are moving toward hybrid cloud environments, where data lives in SAP systems, Salesforce, Workday, ServiceNow, and more. There’s no longer a belief in a single, unified data model. Master Data Management (MDM) as a one-size-fits-all solution has lost its appeal, simply because the real world is more complex.

This is where data streaming with Apache Kafka, Apache Flink, etc. fits in perfectly. Event streaming enables organizations to connect their SAP solutions with the rest of the enterprise—for real-time integration across operational systems, analytics platforms, AI engines, and more. It supports transactional and analytical use cases equally well and can be tailored to each industry’s needs.

Data Streaming with Confluent as Integration Middleware for SAP ERP DataSphere Joule Databricks with Apache Kafka
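
To illustrate what such real-time integration looks like at the lowest level, here is a minimal Python sketch that publishes an SAP-style order event to a Kafka topic using the confluent-kafka client. The topic name and event fields are hypothetical placeholders; in practice this data usually flows through a connector or an SAP integration layer rather than a hand-written producer.

```python
import json

from confluent_kafka import Producer

# Minimal sketch: publish one SAP-style order event to a Kafka topic.
# The topic name and event fields are illustrative placeholders only.
producer = Producer({"bootstrap.servers": "localhost:9092"})

order_event = {
    "order_id": "4711",
    "event_type": "ORDER_CHANGED",
    "plant": "DE01",
    "quantity": 25,
}

def delivery_report(err, msg):
    # Confirms delivery or surfaces an error for each produced message.
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [{msg.partition()}] at offset {msg.offset()}")

producer.produce(
    "sap.erp.orders",  # hypothetical topic name
    key=order_event["order_id"],
    value=json.dumps(order_event).encode("utf-8"),
    callback=delivery_report,
)
producer.flush()
```

Once an event is on such a topic, the same record can feed SAP DataSphere, Databricks, or any other consumer without additional point-to-point integrations.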

In the SAP ecosystem, customers typically don’t look for open source frameworks to assemble their own solutions—they look for a reliable, enterprise-grade platform that just works. That’s why Confluent’s data streaming platform is an excellent fit: it combines the power of Kafka and Flink with the scalability, security, governance, and cloud-native capabilities enterprises expect.

SAP, Databricks, and Confluent – A Triangular Partnership

At the event, I had some great conversations—often literally sitting between leaders from SAP and Databricks. Watching how these two players are evolving—and where Confluent fits into the picture—was eye-opening.

SAP and Databricks are working closely together, especially with the SAP Databricks OEM offering that integrates Databricks into the SAP Business Data Cloud as an embedded AI and analytics engine. SAP DataSphere also plays a central role here, serving as a gateway into SAP’s structured data.

Meanwhile, Databricks is expanding into the operational domain, not just the analytical lakehouse. After acquiring Neon (a Postgres-compatible cloud-native database), Databricks is expected to announce an additional transactional OLTP solution of its own soon. This shows how rapidly they’re moving beyond batch analytics into the world of operational workloads—areas where Kafka and event streaming have traditionally provided the backbone.

Enterprise Architecture with Confluent and SAP and Databricks for Analytics and AI

This trend opens up a significant opportunity for data streaming platforms like Confluent to play a central role in modern SAP data architectures. As platforms like Databricks expand their capabilities, the demand for real-time, multi-system integration and cross-platform data sharing continues to grow.

Confluent is uniquely positioned to meet this need—offering not just data movement, but also the ability to process, govern, and enrich data in motion using tools like Apache Flink, plus a broad ecosystem of connectors for transactional systems such as SAP ERP, Oracle databases, and IBM mainframes, as well as cloud services like Snowflake, ServiceNow or Salesforce.

Data Products, Not Just Pipelines

The term “data product” was mentioned in nearly every conversation—whether from the SAP angle (business semantics and ownership), Databricks (analytics-first), or Confluent (independent, system-agnostic, streaming-native). The key message? Everyone wants real-time, reusable, discoverable data products.

Data Product - The Domain Driven Microservice for Data

This is where an event-driven architecture powered by a data streaming platform shines: Data Streaming connects everything and distributes data to both operational and analytical systems, with governance, durability, and flexibility at the core.

Confluent’s data streaming platform enables the creation of data products from a wide range of enterprise systems, complementing the SAP data products being developed within the SAP Business Data Cloud. The strength of the partnership lies in the ability to combine these assets—bringing together SAP-native data products with real-time, event-driven data products built from non-SAP systems connected through Confluent. This integration creates a unified, scalable foundation for both operational and analytical use cases across the enterprise.

Industry-Specific Use Cases to Explore the Business Value of SAP and Data Streaming

One major takeaway: in the SAP ecosystem, generic messaging around cutting-edge technologies such as Apache Kafka does not work. Success comes from being well-prepared—knowing which SAP systems are involved (ECC, S/4HANA, on-prem, or cloud) and what role they play in the customer’s architecture. The conversations must be use case-driven, often tailored to industries like manufacturing, retail, logistics, or the public sector.

This level of specificity is new to many people working in the technical world of Kafka, Flink, and data streaming. Developers and architects often approach integration from a tool- or framework-centric perspective. However, SAP customers expect business-aligned solutions that address concrete pain points in their domain—whether it’s real-time order tracking in logistics, production analytics in manufacturing, or spend transparency in the public sector.

Understanding the context of SAP’s role in the business process, along with industry regulations, workflows, and legacy system constraints, is key to having meaningful conversations. For the data streaming community, this is a shift in mindset—from building pipelines to solving business problems—and it represents a major opportunity to bring strategic value to enterprise customers.

You are lucky: I just published a free ebook about data streaming use cases focusing on industry scenarios and business value: “The Ultimate Data Streaming Guide”.

Looking Forward: SAP, Data Streaming, AI, and Open Table Formats

Another theme to watch: data lake and format standardization. All cloud providers and data vendors like Databricks, Confluent or Snowflake are investing heavily in supporting open table formats like Apache Iceberg (alongside Delta Lake at Databricks) to standardize analytical integrations and reduce storage costs significantly.

SAP’s investment in Agentic AI through SAP Joule reflects a broader trend across the enterprise software landscape, with vendors like Salesforce, ServiceNow, and others embedding intelligent agents into their platforms. This creates a significant opportunity for Confluent to serve as the streaming backbone—enabling real-time coordination, integration, and decision-making across these diverse, distributed systems.

An event-driven architecture powered by data streaming is crucial for the success of Agentic AI with SAP Joule, Databricks AI agents, and other operational systems that need to be integrated into the business processes. The strategic partnership between Confluent and Databricks makes it even easier to implement end-to-end AI pipelines across the operational and analytical estates.

SAP Sapphire Madrid was a valuable reminder that data streaming is no longer a niche technology—it’s a foundation for digital transformation. Whether it’s SAP ERP, Databricks AI, or new cloud-native operational systems, a Data Streaming Platform connects them all in real time to enable new business models, better customer experiences, and operational agility.

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter and following me on LinkedIn or X (formerly Twitter) to stay in touch. And make sure to download my free book about data streaming use cases, focusing on industry scenarios, success stories and business value.

Databricks and Confluent in the World of Enterprise Software (with SAP as Example)
https://www.kai-waehner.de/blog/2025/05/12/databricks-and-confluent-in-the-world-of-enterprise-software-with-sap-as-example/ (12 May 2025)

Enterprise data lives in complex ecosystems—SAP, Oracle, Salesforce, ServiceNow, IBM Mainframes, and more. This article explores how Confluent and Databricks integrate with SAP to bridge operational and analytical workloads in real time. It outlines architectural patterns, trade-offs, and use cases like supply chain optimization, predictive maintenance, and financial reporting, showing how modern data streaming unlocks agility, reuse, and AI-readiness across even the most SAP-centric environments.

Modern enterprises rely heavily on operational systems like SAP ERP, Oracle, Salesforce, ServiceNow and mainframes to power critical business processes. But unlocking real-time insights and enabling AI at scale requires bridging these systems with modern analytics platforms like Databricks. This blog explores how Confluent’s data streaming platform enables seamless integration between SAP, Databricks, and other systems to support real-time decision-making, AI-driven automation, and agentic AI use cases. It shows how Confluent delivers the real-time backbone needed to build event-driven, future-proof enterprise architectures—supporting everything from inventory optimization and supply chain intelligence to embedded copilots and autonomous agents.

Enterprise Application Integration with Confluent and Databricks for Oracle, SAP, Salesforce, ServiceNow et al.

About the Confluent and Databricks Blog Series

This article is part of a blog series exploring the growing roles of Confluent and Databricks in modern data and AI architectures:

Future articles will explore how these platforms affect data use in businesses. Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter and following me on LinkedIn or X (formerly Twitter) to stay in touch. And download my free book about data streaming use cases, including technical architectures and the relation to other operational and analytical platforms like SAP and Databricks.

Most Enterprise Data Is Operational

Enterprise software systems generate a constant stream of operational data across a wide range of domains. This includes orders and inventory from SAP ERP systems, often extended with real-time production data from SAP MES. Oracle databases capture transactional data critical to core business operations, while MongoDB contributes operational data—frequently used as a CDC source or, in some cases, as a sink for analytical queries. Customer interactions are tracked in platforms like Salesforce CRM, and financial or account-related events often originate from IBM mainframes. 

Together, these systems form the backbone of enterprise data, requiring seamless integration for real-time intelligence and business agility. This data is often not immediately available for analytics or AI unless it’s integrated into downstream systems.

Confluent is built to ingest and process this kind of operational data in real time. Databricks can then consume it for AI and machine learning, dashboards, or reports. Together, SAP, Confluent and Databricks create a real-time architecture for enterprise decision-making.

SAP Product Landscape for Operational and Analytical Workloads

SAP plays a foundational role in the enterprise data landscape—not just as a source of business data, but as the system of record for core operational processes across finance, supply chain, HR, and manufacturing.

At a high level, the SAP product portfolio today has three categories: SAP Business AI, SAP Business Data Cloud (BDC), and SAP Business Applications powered by SAP Business Technology Platform (BTP).

SAP Product Portfolio Categories
Source: SAP

To support both operational and analytical needs, SAP offers a portfolio of platforms and tools, while also partnering with best-in-class technologies like Databricks and Confluent.

Operational Workloads (Transactional Systems):

  • SAP S/4HANA – Modern ERP for core business operations
  • SAP ECC – Legacy ERP platform still widely deployed
  • SAP CRM / SCM / SRM – Domain-specific business systems
  • SAP Business One / Business ByDesign – ERP solutions for mid-market and subsidiaries

Analytical Workloads (Data & Analytics Platforms):

  • SAP Datasphere – Unified data fabric to integrate, catalog, and govern SAP and non-SAP data
  • SAP Analytics Cloud (SAC) – Visualization, reporting, and predictive analytics
  • SAP BW/4HANA – Data warehousing and modeling for SAP-centric analytics

SAP Business Data Cloud (BDC)

SAP Business Data Cloud (BDC) is a strategic initiative within SAP Business Technology Platform (BTP) that brings together SAP’s data and analytics capabilities into a unified cloud-native experience. It includes:

  • SAP Datasphere as the data fabric layer, enabling seamless integration of SAP and third-party data
  • SAP Analytics Cloud (SAC) for consuming governed data via dashboards and reports
  • SAP’s partnership with Databricks to allow SAP data to be analyzed alongside non-SAP sources in a lakehouse architecture
  • Real-time integration scenarios enabled through Confluent and Apache Kafka, bringing operational data in motion directly into SAP and Databricks environments

Together, this ecosystem supports real-time, AI-powered, and governed analytics across operational and analytical workloads—making SAP data more accessible, trustworthy, and actionable within modern cloud data architectures.

SAP Databricks OEM: Limited Scope, Full Control by SAP

SAP recently announced an OEM partnership with Databricks, embedding parts of Databricks’ serverless infrastructure into the SAP ecosystem. While this move enables tighter integration and simplified access to AI workloads within SAP, it comes with significant trade-offs. The OEM model is narrowly scoped, optimized primarily for ML and GenAI scenarios on SAP data, and lacks the openness and flexibility of native Databricks.

This integration is not intended for full-scale data engineering. Core capabilities such as workflows, streaming, Delta Live Tables, and external data connections (e.g., Snowflake, S3, MS SQL) are missing. The architecture is based on data at rest and does not embrace event-driven patterns. Compute options are limited to serverless only, with no infrastructure control. Pricing is complex and opaque, with customers often needing to license Databricks separately to unlock full capabilities.

Critically, SAP controls the entire data integration layer through its BDC Data Products, reinforcing a vendor lock-in model. While this may benefit SAP-centric organizations focused on embedded AI, it restricts broader interoperability and long-term architectural flexibility. In contrast, native Databricks, i.e., outside of SAP, offers a fully open, scalable platform with rich data engineering features across diverse environments.

Whichever Databricks option you prefer, this is where Confluent adds value—offering a truly event-driven, decoupled architecture that complements both SAP Datasphere and Databricks, whether used within or outside the SAP OEM framework.

Confluent and SAP Integration

Confluent provides native and third-party connectors to integrate with SAP systems to enable continuous, low-latency data flow across business applications.

SAP ERP Confluent Data Streaming Integration Access Patterns
Source: Confluent

This powers modern, event-driven use cases that go beyond traditional batch-based integrations:

  • Low-latency access to SAP transactional data
  • Integration with other operational source systems like Salesforce, Oracle, IBM Mainframe, MongoDB, or IoT platforms
  • Synchronization between SAP DataSphere and other data warehouse and analytics platforms such as Snowflake, Google BigQuery or Databricks 
  • Decoupling of applications for modular architecture
  • Data consistency across real-time, batch and request-response APIs
  • Hybrid integration across any edge, on-premise or multi-cloud environments

SAP Datasphere and Confluent

To expand its role in the modern data stack, SAP introduced SAP Datasphere—a cloud-native data management solution designed to extend SAP’s reach into analytics and data integration. Datasphere aims to simplify access to SAP and non-SAP data across hybrid environments.

SAP Datasphere simplifies data access within the SAP ecosystem, but it has key drawbacks when compared to open platforms like Databricks, Snowflake, or Google BigQuery:

  • Closed Ecosystem: Optimized for SAP, but lacks flexibility for non-SAP integrations.
  • No Event Streaming: Focused on data at rest, with limited support for real-time processing or streaming architectures.
  • No Native Stream Processing: Relies on batch methods, adding latency and complexity for hybrid or real-time use cases.

Confluent alleviates these drawbacks and supports this strategy through bi-directional integration with SAP Datasphere. This enables real-time streaming of SAP data into Datasphere and back out to operational or analytical consumers via Apache Kafka. It allows organizations to enrich SAP data, apply real-time processing, and ensure it reaches the right systems in the right format—without waiting for overnight batch jobs or rigid ETL pipelines.
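
The enrichment step described above would typically run in Apache Flink or ksqlDB; the following plain-Python sketch only illustrates the underlying consume-enrich-produce pattern with the confluent-kafka client. Topic names, event fields, and the static plant lookup are hypothetical placeholders, not an SAP or Confluent schema.

```python
import json

from confluent_kafka import Consumer, Producer

# Illustrative consume-enrich-produce loop. In production this logic would
# usually run in Apache Flink or ksqlDB. Topic names are placeholders.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "sap-enrichment",
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "localhost:9092"})

# Hypothetical reference data, e.g. loaded from a master-data topic or table.
plant_metadata = {"DE01": {"region": "EMEA"}, "US10": {"region": "AMER"}}

consumer.subscribe(["sap.erp.material-movements"])
try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        event = json.loads(msg.value())
        # Enrich the SAP event with context before it reaches Datasphere,
        # Databricks, or other downstream consumers.
        event["region"] = plant_metadata.get(event.get("plant"), {}).get("region", "UNKNOWN")
        producer.produce(
            "sap.material-movements.enriched",
            value=json.dumps(event).encode("utf-8"),
        )
        producer.poll(0)  # serve delivery callbacks
finally:
    producer.flush()
    consumer.close()
```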

Confluent for Agentic AI with SAP Joule and Databricks

SAP is laying the foundation for agentic AI architectures with a vision centered around Joule—its generative AI copilot—and a tightly integrated data stack that includes SAP Databricks (via OEM), SAP Business Data Cloud (BDC), and a unified knowledge graph. On top of this foundation, SAP is building specialized AI agents for use cases such as customer 360, creditworthiness analysis, supply chain intelligence, and more.

SAP ERP with Business Technology Platform BTP and Joule for Agentic AI in the Cloud
Source: SAP

The architecture combines:

  • SAP Joule as the interface layer for generative insights and decision support
  • SAP’s foundational models and domain-specific knowledge graph
  • SAP BDC and SAP Databricks as the data and ML/AI backbone
  • Data from both SAP systems (ERP, CRM, HR, logistics) and non-SAP systems (e.g. clickstream, IoT, partner data, social media), integrated through the partnership with Confluent

But here’s the catch: What happens when agents need to communicate with one another to deliver a workflow? Such agentic systems require continuous, contextual, and event-driven data exchange—not just point-to-point API calls and nightly batch jobs.

This is where Confluent’s data streaming platform comes in as critical infrastructure.

Agentic AI with Apache Kafka as Event Broker

Confluent provides the real-time data streaming platform that connects the operational world of SAP with the analytical and AI-driven world of Databricks, enabling the continuous movement, enrichment, and sharing of data across all layers of the stack.

Agentic AI with Confluent as Event Broker for Databricks SAP and Oracle

The above is a conceptual view of the architecture. The AI agents on the left side could be built with SAP Joule, Databricks, or any “outside” GenAI framework.

The data streaming platform helps connect the AI agents with the rest of the enterprise architecture, both within SAP and Databricks and beyond:

  • Real-time data integration from non-SAP systems (e.g., mobile apps, IoT devices, mainframes, web logs) into SAP and Databricks
  • True decoupling of services and agents via an event-driven architecture (EDA), replacing brittle RPC or point-to-point API calls
  • Event replay and auditability—critical for traceable AI systems operating in regulated environments (see the replay sketch after this list)
  • Streaming pipelines for feature engineering and inference: stream-based model triggering with low-latency SLAs
  • Support for bi-directional flows: e.g., operational triggers in SAP can be enriched by AI agents running in Databricks and pushed back into SAP via Kafka events
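
As a small illustration of the replay point above, the sketch below re-reads an agent-decision topic from a chosen point in time so that earlier AI decisions can be audited. The topic name, consumer group, and single-partition assumption are placeholders for illustration only.

```python
from datetime import datetime, timezone

from confluent_kafka import Consumer, TopicPartition

# Sketch of event replay for audit and traceability: re-read an agent-decision
# topic from a given timestamp. Topic and group names are placeholders.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "agent-audit-replay",
    "enable.auto.commit": False,
})

replay_from = datetime(2025, 5, 1, tzinfo=timezone.utc)
timestamp_ms = int(replay_from.timestamp() * 1000)

# Resolve the offsets that correspond to the replay timestamp (partition 0 only here).
partitions = [TopicPartition("ai.agent.decisions", 0, timestamp_ms)]
offsets = consumer.offsets_for_times(partitions, timeout=10.0)
consumer.assign(offsets)

while True:
    msg = consumer.poll(1.0)
    if msg is None:
        break  # caught up (or timed out) for this simple sketch
    if msg.error():
        continue
    print(msg.timestamp(), msg.key(), msg.value())

consumer.close()
```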

Without Confluent, SAP’s agentic architecture risks becoming a patchwork of stateless services bound by fragile REST endpoints—lacking the real-time responsiveness, observability, and scalability required to truly support next-generation AI orchestration.

Confluent turns the SAP + Databricks vision into a living, breathing ecosystem—where context flows continuously, agents act autonomously, and enterprises can build future-proof AI systems that scale.

Data Streaming Use Cases Across SAP Product Suites

With Confluent, organizations can support a wide range of use cases across SAP product suites, including:

  1. Real-Time Inventory Visibility: Live updates of stock levels across warehouses and stores by streaming material movements from SAP ERP and SAP EWM, enabling faster order fulfillment and reduced stockouts (see the sketch after this list).
  2. Dynamic Pricing and Promotions: Stream sales orders and product availability in real time to trigger pricing adjustments or dynamic discounting via integration with SAP ERP and external commerce platforms.
  3. AI-Powered Supply Chain Optimization: Combine data from SAP ERP, SAP Ariba, and external logistics platforms to power ML models that predict delays, optimize routes, and automate replenishment.
  4. Shop Floor Event Processing: Stream sensor and machine data alongside order data from SAP MES, enabling real-time production monitoring, alerting, and throughput optimization.
  5. Employee Lifecycle Automation: Stream employee events (e.g., onboarding, role changes) from SAP SuccessFactors to downstream IT systems (e.g., Active Directory, badge systems), improving HR operations and compliance.
  6. Order-to-Cash Acceleration: Connect order intake (via web portals or Salesforce) to SAP ERP in real time, enabling faster order validation, invoicing, and cash flow.
  7. Procure-to-Pay Automation: Integrate procurement events from SAP Ariba and supplier portals with ERP and financial systems to streamline approvals and monitor supplier performance continuously.
  8. Customer 360 and CRM Synchronization: Synchronize customer master data and transactions between SAP ERP, SAP CX, and third-party CRMs like Salesforce to enable unified customer views.
  9. Real-Time Financial Reporting: Stream financial transactions from SAP S/4HANA into cloud-based lakehouses or BI tools for near-instant reporting and compliance dashboards.
  10. Cross-System Data Consistency: Ensure consistent master data and business events across SAP and non-SAP environments by treating SAP as a real-time event source—not just a system of record.

Example Use Case and Architecture with SAP, Databricks and Confluent

Consider a manufacturing company using SAP ERP for inventory management and Databricks for predictive maintenance. The combination of SAP Datasphere and Confluent enables seamless data integration from SAP systems, while the addition of Databricks supports advanced AI/ML applications—turning operational data into real-time, predictive insights.

With Confluent as the real-time backbone:

  • Machine telemetry (via MQTT or OPC-UA) and ERP events (e.g., stock levels, work orders) are streamed in real time.
  • Apache Flink enriches and filters the event streams—adding context like equipment metadata or location.
  • Tableflow publishes clean, structured data to Databricks as Delta tables for analytics and ML processing.
  • A predictive model hosted in Databricks detects potential equipment failure before it happens; a Flink application calls the remote model with low latency.
  • The resulting prediction is streamed back to Kafka, triggering an automated work order in SAP via event integration.

Enterprise Architecture with Confluent and SAP and Databricks for Analytics and AI

This bi-directional, event-driven pattern illustrates how Confluent enables seamless, real-time collaboration across SAP, Databricks, and IoT systems—supporting both operational and analytical use cases with a shared architecture.
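
The last step of this flow, turning a prediction into an SAP work-order trigger, can be sketched as a simple consume-and-produce loop. Topic names, payload fields, and the probability threshold below are illustrative assumptions; the real logic would typically run in Flink or inside the SAP integration layer.

```python
import json

from confluent_kafka import Consumer, Producer

# Sketch of the final step of the flow above: consume failure predictions and
# emit a work-order trigger event that an SAP integration can pick up.
# Topic names and payload fields are placeholders.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "maintenance-trigger",
    "auto.offset.reset": "latest",
})
producer = Producer({"bootstrap.servers": "localhost:9092"})
consumer.subscribe(["maintenance.predictions"])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        prediction = json.loads(msg.value())
        if prediction.get("failure_probability", 0.0) > 0.8:  # illustrative threshold
            work_order = {
                "equipment_id": prediction["equipment_id"],
                "action": "CREATE_MAINTENANCE_ORDER",
                "reason": "Predicted failure",
            }
            producer.produce(
                "sap.pm.work-order-requests",
                key=work_order["equipment_id"],
                value=json.dumps(work_order).encode("utf-8"),
            )
            producer.poll(0)
finally:
    producer.flush()
    consumer.close()
```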

Going Beyond SAP with Data Streaming

This pattern applies to other enterprise systems:

  • Salesforce: Stream customer interactions for real-time personalization through Salesforce Data Cloud
  • Oracle: Capture transactions via CDC (Change Data Capture)
  • ServiceNow: Monitor incidents and automate operational responses
  • Mainframe: Offload events from legacy applications without rewriting code
  • MongoDB: Sync operational data in real time to support responsive apps
  • Snowflake: Stream enriched operational data into Snowflake for near real-time analytics, dashboards, and data sharing across teams and partners
  • OpenAI (or other GenAI platforms): Feed real-time context into LLMs for AI-assisted recommendations or automation
  • “You name it”: Confluent’s prebuilt connectors and open APIs enable event-driven integration with virtually any enterprise system

Confluent provides the backbone for streaming data across all of these platforms—securely, reliably, and in real time.

Strategic Value for the Enterprise of Event-based Real-Time Integration with Data Streaming

Enterprise software platforms are essential. But they are often closed, slow to change, and not designed for analytics or AI.

Confluent provides real-time access to operational data from platforms like SAP. SAP Datasphere and Databricks enable analytics and AI on that data. Together, they support modern, event-driven architectures.

  • Use Confluent for real-time data streaming from SAP and other core systems
  • Use SAP Datasphere and Databricks to build analytics, reports, and AI on that data
  • Use Tableflow to connect the two platforms seamlessly

This modern approach to data integration delivers tangible business value, especially in complex enterprise environments. It enables real-time decision-making by allowing business logic to operate on live data instead of outdated reports. Data products become reusable assets, as a single stream can serve multiple teams and tools simultaneously. By reducing the need for batch layers and redundant processing, the total cost of ownership (TCO) is significantly lowered. The architecture is also future-proof, making it easy to integrate new systems, onboard additional consumers, and scale workflows as business needs evolve.

Beyond SAP: Enabling Agentic AI Across the Enterprise

The same architectural discussion applies across the enterprise software landscape. As vendors embed AI more deeply into their platforms, the effectiveness of these systems increasingly depends on real-time data access, continuous context propagation, and seamless interoperability.

Without an event-driven foundation, AI agents remain limited—trapped in siloed workflows and brittle API chains. Confluent provides the scalable, reliable backbone needed to enable true agentic AI in complex enterprise environments.

Examples of AI solutions driving this evolution include:

  • SAP Joule / Business AI – Context-aware agents and embedded AI across ERP, finance, and supply chain
  • Salesforce Einstein / Copilot Studio – Generative AI for CRM, service, and marketing automation built on top of Salesforce Data Cloud
  • ServiceNow Now Assist – Intelligent workflows and predictive automation in ITSM and Ops
  • Oracle Fusion AI / OCI AI Services – Embedded machine learning in ERP, HCM, and SCM
  • Microsoft Copilot (Dynamics / Power Platform) – AI copilots across business and low-code apps
  • Workday AI – Smart recommendations for finance, workforce, and HR planning
  • Adobe Sensei GenAI – GenAI for content creation and digital experience optimization
  • IBM watsonx – Governed AI foundation for enterprise use cases and data products
  • Infor Coleman AI – Industry-specific AI for supply chain and manufacturing systems
  • All the “traditional” cloud providers and data platforms such as Snowflake with Cortex, Microsoft Azure Fabric, AWS SageMaker, AWS Bedrock, and GCP Vertex AI

Each of these platforms benefits from a streaming-first architecture that enables real-time decisions, reusable data, and smarter automation across the business.

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter and following me on LinkedIn or X (formerly Twitter) to stay in touch. And download my free book about data streaming use cases, including technical architectures and the relation to other operational and analytical platforms like SAP and Databricks.

How Siemens Healthineers Leverages Data Streaming with Apache Kafka and Flink in Manufacturing and Healthcare
https://www.kai-waehner.de/blog/2024/12/17/how-siemens-healthineers-leverages-data-streaming-with-apache-kafka-and-flink-in-manufacturing-and-healthcare/ (17 Dec 2024)

Siemens Healthineers, a global leader in medical technology, delivers solutions that improve patient outcomes and empower healthcare professionals. As part of the Siemens AG family, Siemens Healthineers stands out with innovative products, data-driven solutions, and services designed to optimize workflows, improve precision, and enhance efficiency in healthcare systems worldwide. A significant aspect of their technological prowess lies in their use of data streaming to unlock real-time insights and optimize processes. This blog post explores how Siemens Healthineers uses data streaming with Apache Kafka and Flink, their cloud-focused technology stack, and the use cases that drive tangible business value such as real-time logistics, robotics, SAP ERP integration, AI/ML, and more.

Data Streaming with Apache Kafka and Flink in Healthcare and Manufacturing at Siemens Healthineers

Join the data streaming community and stay informed about new blog posts by subscribing to my newsletter and following me on LinkedIn or X (formerly Twitter) to stay in touch.

Siemens Healthineers: Shaping the Future of Healthcare Technology

Who They Are

Siemens AG, a global powerhouse in industrial manufacturing, energy, and technology, has been a leader in innovation for over 170 years. Known for its groundbreaking contributions across sectors, Siemens combines engineering expertise with digitalization to shape industries worldwide. Within this ecosystem, Siemens Healthineers stands out as a pivotal player in healthcare technology.

Siemens Healthineers Company Overview
Source: Siemens Healthineers

With over 71,000 employees operating in 70+ countries, Siemens Healthineers supports critical clinical decisions in healthcare. Over 90% of leading hospitals worldwide collaborate with them, and their technologies influence over 70% of critical clinical decisions.

Their Vision

Siemens Healthineers focuses on innovation through data and AI, aiming to streamline healthcare delivery. With more than 24,000 technical intellectual property rights, including 15,000 granted patents, their technological foundation enables precision medicine, enhanced diagnostics, and patient-centric solutions.

Smart Logistics and Manufacturing at Siemens
Source: Siemens Healthineers

Siemens Healthineers and Data Streaming for Healthcare and Manufacturing

Siemens is a large conglomerate. I already covered a few data streaming use cases at other Siemens divisions. For instance, the integration project from SAP ERP on-premise to Salesforce CRM in the cloud.

At the Data in Motion Tour 2024 in Frankfurt, Arash Attarzadeh (“Apache Kafka Jedi“) from Siemens Healthineers presented several very interesting success stories that leverage data streaming using Apache Kafka, Flink, Confluent, and its entire ecosystem.

Healthcare and manufacturing processes generate massive volumes of real-time data. Whether it’s monitoring devices on production floors, analyzing telemetry data from hospitals, or optimizing logistics, Siemens Healthineers recognizes that data streaming enables:

  • Real-time insights: Immediate and continuous action on events as they happen.
  • Improved decision-making: Faster and more accurate responses.
  • Cost efficiency: Reduced downtime and optimized operations.

Healthineers Data Cloud

The Siemens Healthineers Data Cloud serves as the backbone of their data strategy. Built on a robust technology stack, it facilitates real-time data ingestion, transformation, and analytics using tools like Confluent Cloud (including Apache Kafka and Flink) and Snowflake.

Siemens Healthineers Data Cloud Technology Stack with Apache Kafka and Snowflake for Healthcare
Source: Siemens Healthineers

This combination of leading SaaS solutions enables seamless integration of streaming data with batch processes and diverse analytics platforms.

Technology Stack: Healthineers Data Cloud

Key Components

  • Confluent Cloud (Apache Kafka): For real-time data ingestion, data integration and stream processing.
  • Snowflake: A centralized warehouse for analytics and reporting.
  • Matillion: Batch ETL processes for structured and semi-structured data.
  • IoT Data Integration: Sensors and PLCs collect data from manufacturing floors, often via MQTT.

Machine Monitoring and Streaming Analytics with MQTT Confluent Kafka and TensorFlow AI ML in Healthcare and Manufacturing
Source: Siemens Healthineers

Many other solutions are critical for some use cases. Siemens Healthineers also uses Databricks, dbt, OPC-UA, and many other systems for the end-to-end data pipelines.

Diverse Data Ingestion

  • Real-Time Streaming: IoT data (sensors, PLCs) is ingested within minutes.
  • Batch Processing: Structured and semi-structured data from SAP systems.
  • Change Data Capture (CDC): Data changes in SAP sources are captured and available in under 30 minutes.

Not every data integration process is or can be real-time. Data consistency is still one of the most underrated capabilities of data streaming. Apache Kafka supports real-time, batch and request-response APIs communicating with each other in a consistent way.
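
For the MQTT-based ingestion mentioned above, a minimal bridge can look like the sketch below, which forwards shop-floor telemetry from an MQTT broker into a Kafka topic. It assumes the paho-mqtt 1.x callback style and uses placeholder broker addresses and topic names; in practice a Kafka Connect MQTT source connector or a broker-level bridge is usually the better choice.

```python
import paho.mqtt.client as mqtt
from confluent_kafka import Producer

# Sketch of an MQTT-to-Kafka bridge for shop-floor telemetry.
# Broker addresses and topic names are placeholders; callback style
# follows paho-mqtt 1.x.
producer = Producer({"bootstrap.servers": "localhost:9092"})

def on_message(client, userdata, message):
    # Forward every MQTT telemetry message to a Kafka topic as-is,
    # keeping the MQTT topic as the Kafka record key.
    producer.produce(
        "iot.machine.telemetry",
        key=message.topic,
        value=message.payload,
    )
    producer.poll(0)

mqtt_client = mqtt.Client()
mqtt_client.on_message = on_message
mqtt_client.connect("mqtt-broker.local", 1883)
mqtt_client.subscribe("plant1/+/telemetry")
mqtt_client.loop_forever()
```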

Use Cases for Data Streaming at Siemens Healthineers

Siemens Healthineers described six different use cases that leverage data streaming together with various other IoT, software and cloud services:

  1. Machine monitoring and predictive maintenance
  2. Data integration layer for analytics
  3. Machine and robot integration
  4. Telemetry data processing for improved diagnostics
  5. Real-time logistics with SAP events for better supply chain efficiency
  6. Track and Trace Orders for improved customer satisfaction and ensured compliance

Let’s take a look at them in the following subsections.

1. Machine Monitoring and Predictive Maintenance in Manufacturing

Goal: To ensure the smooth operation of production devices through predictive maintenance.

Using data streaming, real-time IoT data from drill machines is ingested into Kafka topics, where it’s analyzed to predict maintenance needs. By using a TensorFlow machine learning model for inference with Apache Kafka, Siemens Healthineers can:

  • Reduce machine downtime.
  • Optimize maintenance schedules.
  • Increase productivity in manufacturing CT scanners.

Business Value: Predictive maintenance reduces operational costs and prevents production halts, ensuring timely delivery of critical medical equipment.

2. IQ-Data Intelligence from IoT and SAP to Cloud

Goal: Develop an end-to-end data integration layer for analytics.

Data from various lifecycle phases (e.g., SAP systems, IoT interfaces via MQTT using Mosquitto, external sources) is streamed into a consistent model using stream processing with ksqlDB. The resulting data backend supports the development of MLOps architectures and enables advanced analytics.

AI MLOps with Kafka Stream Processing Qlik Tableau BI at Siemens Healthineers
Source: Siemens Healthineers

Business Value: Streamlined data integration accelerates the development of AI applications, helping data scientists and analysts make quicker, more informed decisions.

3. Machine Integration with SAP and KUKA Robots

Goal: Integrate machine data for analytics and real-time insights.

Data from SAP systems (such as SAP ME and SAP PCO) and machines like KUKA robots is streamed into Snowflake for analytics. MQTT brokers and Apache Kafka manage real-time data ingestion and facilitate predictive analytics.

Siemens Machine Integration with SAP KUKA Jungheinrich Kafka Confluent Cloud Snowflake
Source: Siemens Healthineers

Business Value: Enhanced machine integration improves production quality and supports the shift toward smart manufacturing processes.

4. Digital Healthcare Service Operations using Data Streaming

Goal: Stream telemetry data from Siemens Healthineers products for analytics.

Telemetry data from hospital devices is streamed via WebSockets to Kafka and processed continuously with ksqlDB. Insights are fed back to clients for improved diagnostics.

Business Value: By leveraging real-time device data, Siemens Healthineers enhances the reliability of its medical equipment and improves patient outcomes.

5. Real-Time Logistics with SAP Events and Confluent Cloud

Goal: Stream SAP logistics event data for real-time packaging and shipping updates.

Using Confluent Cloud, Siemens Healthineers reduces delays in packaging and shipping by enabling real-time insights into logistics processes.

SAP Logistics Integration with Apache Kafka for Real-Time Shipping Points
Source: Siemens Healthineers

Business Value: Improved packaging planning reduces delivery times and enhances supply chain efficiency, ensuring faster deployment of medical devices.

6. Track and Trace Orders with Apache Kafka and Snowflake

Goal: Real-time order tracking using streaming data.

Data from Siemens Healthineers orders is streamed into Snowflake using Kafka for real-time monitoring. This enables detailed tracking of orders throughout the supply chain.

Business Value: Enhanced order visibility improves customer satisfaction and ensures compliance with regulatory requirements.

Real-Time Data as a Catalyst for Healthcare and Manufacturing Innovation at Siemens Healthineers

Siemens Healthineers’ innovative use of data streaming exemplifies how real-time insights can drive efficiency, reliability, and innovation in healthcare and manufacturing. By leveraging tools like Confluent (including Apache Kafka and Flink), MQTT and Snowflake, and transitioning some workloads to the cloud, they’ve built a robust infrastructure to handle diverse data streams, improve decision-making, and deliver tangible business outcomes.

From predictive maintenance to enhanced supply chain visibility, the adoption of data streaming unlocks value at every stage of the production and service lifecycle. For Siemens Healthineers, these advancements translate into better patient care, streamlined operations, and a competitive edge in the dynamic healthcare industry.

To learn more about the relationship between these key technologies and their applications in different use cases, explore the articles below:

Do you have similar use cases and architectures like Siemens Healthineers to leverage data streaming with Apache Kafka and Flink in the healthcare and manufacturing sector? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

Data Streaming in Healthcare and Pharma: Use Cases and Insights from Cardinal Health
https://www.kai-waehner.de/blog/2024/11/28/data-streaming-in-healthcare-and-pharma-use-cases-cardinal-health/ (28 Nov 2024)

This blog explores Cardinal Health’s journey, showing how its event-driven architecture and data streaming power use cases like supply chain optimization and medical device and equipment management. By integrating Apache Kafka with platforms like Apigee, Dell Boomi and SAP, Cardinal Health sets a benchmark for IT modernization and innovation in the healthcare and pharma sectors.

SAP Datasphere and Apache Kafka as Data Fabric for S/4HANA ERP Integration
https://www.kai-waehner.de/blog/2024/01/03/sap-datasphere-and-apache-kafka-as-data-fabric-for-s4hana-erp-integration/ (03 Jan 2024)

SAP is the leading ERP solution across industries around the world. Data integration with other data platforms, applications, databases, and APIs is one of the hardest challenges in the IT and software landscape. This blog post explores how SAP Datasphere in conjunction with the data streaming platform Apache Kafka enables a reliable, scalable and open data fabric for connecting SAP business objects of ECC and S/4HANA ERP with other real-time, batch, or request-response interfaces.

SAP Datasphere and Apache Kafka as Data Fabric for ERP Integration

What is SAP ERP?

SAP is a German multinational software corporation that develops enterprise software to manage business operations and customer relations. SAP is best known for its ERP (Enterprise Resource Planning) software, which helps organizations integrate and streamline their business processes.

A wide range of industries and companies of all sizes use it. SAP ERP is one of the most widely used ERP solutions globally. Contrary to what many people think, SAP is not a single product. Over the years, SAP has expanded its product portfolio. It includes cloud-based solutions, analytics, database management, and other enterprise software applications.

SAP ECC, S/4HANA, and more ERP Products

SAP offers a range of ERP products that cater to different business needs and industries. Some of the key SAP ERP products include:

  1. SAP S/4HANA: SAP S/4HANA is the flagship ERP suite that represents the next generation of SAP’s ERP solutions. The product is built on the SAP HANA in-memory database and provides a simplified data model, improved user experience, and advanced analytics capabilities. It covers core business functions, such as finance, supply chain, manufacturing, procurement, and more.
  2. SAP ERP Central Component (ECC): ECC is the predecessor to SAP S/4HANA and is still widely used by many organizations. It includes various modules, such as SAP ERP Financials, SAP ERP Human Capital Management (HCM), SAP ERP Operations, and others.
  3. SAP Business ByDesign: This is a cloud-based ERP solution designed for small to medium-sized enterprises (SMEs). It integrates core business functions, such as financials, human resources, procurement, supply chain management, and customer relationship management (CRM).
  4. SAP Business One: Another ERP solution targeted at small and medium-sized businesses, SAP Business One is an integrated suite that covers areas such as accounting, sales, purchasing, inventory, and production.
  5. SAP S/4HANA Cloud: This is a cloud-based version of SAP S/4HANA, offering similar functionalities but with the advantages of cloud deployment, including scalability, accessibility, and reduced infrastructure management.
  6. SAP Business Suite: This is a set of business applications that includes SAP ERP and other related products. It comprises different modules to address various business processes.
  7. SAP All-in-One: This is an industry-specific version of SAP ERP designed for midsize companies. It provides pre-configured industry solutions for sectors such as manufacturing, retail, and healthcare.

This product list might be out of date when you read it. SAP continuously develops its product offerings. Products get new names from time to time, consolidate, or deprecate. In other words, SAP modernization, integration, and migration are usually an ongoing effort that never ends.

What is SAP Datasphere?

SAP Datasphere is the next generation of SAP Data Warehouse Cloud. The platform provides a comprehensive data service that enables data professionals to deliver seamless and scalable access to critical business data.

Datasphere High Level Architecture

SAP Datasphere is a cloud-based product packaged within SAP’s Business Technology Platform (BTP). Datasphere brings together two previously standalone products, SAP Data Intelligence Cloud (DIC) and SAP Data Warehouse Cloud (DWC), into one cloud-native data integration and data management platform. The solution allows SAP customers to ingest, integrate, store, and analyze core SAP ERP data, as well as to share this data with other analytical services and downstream applications.

SAP Datasphere = Cloud Data Warehouse and Analytics Platform

Datasphere is the core part of a new solution, known as Business Data Fabric, to simplify data integration and management involving SAP ERP backend data. A key focus of SAP Datasphere is business intelligence and analytics.

I see Datasphere as similar to Snowflake or Databricks, i.e., a general data warehouse / data lake / lakehouse, but focused on SAP data with deep integration into the SAP ERP ecosystem and surrounding applications.

However, the out-of-the-box availability of SAP ERP data from SAP ECC, S/4HANA, and other SAP apps enables a simple but powerful opportunity for data integration beyond the SAP landscape. No need to use legacy SAP protocols like BAPI or IDoc anymore. Instead, SAP Datasphere provides a unified way to discover, connect, and manage data across different data sources, systems, and landscapes.

Features of SAP Datasphere and Complementary Software Partnerships

The key features of SAP Datasphere include:

  1. Data Connectivity: SAP Datasphere enables organizations to connect to and access data from various sources, whether they are on-premises or in the cloud. It supports integration with different databases, data lakes, and other data repositories.
  2. Data Orchestration: The platform allows organizations to orchestrate data flows and processing across different data environments. This can be essential for managing complex data pipelines and ensuring data consistency and coherence.
  3. Data Governance: SAP Datasphere includes features for data governance, providing tools for managing metadata, ensuring data quality, and enforcing data policies across the distributed landscape.
  4. Unified Data Discovery: The platform offers a unified view of data assets, helping organizations discover and understand the available data resources across their entire landscape.
  5. Multi-Cloud and Edge Support: SAP Datasphere works in multi-cloud and edge computing environments, providing flexibility and scalability for organizations with diverse data storage and processing needs.

This sounds like any other data management platform, doesn’t it?

But the above features are focusing mainly on SAP environments. Therefore, Datasphere has a few strategic software partnerships:

  • Confluent (data streaming)
  • Databricks (data lakehouse)
  • Collibra (data governance)
  • Data Robot (automated machine learning)

This emphasizes the strength of Datasphere around the SAP ecosystem. The other partners connect non-SAP IT infrastructure and applications with SAP environments bidirectionally.

SAP Datasphere = One-Stop-Shop for Multi-Generation SAP ERP Systems

SAP Datasphere is more than just an analytical platform for SAP ERP data.

Datasphere leverages SAP internal tooling to access data directly from SAP systems. It is a complete data integration and analytics solution optimized for collecting and preparing data from all SAP ERP systems of multiple generations. For the first time in their history, SAP is making core ERP data from numerous back-end systems available in a one-stop-shop fashion through Datasphere.

This brings us to the excellent opportunity of combining SAP business objects with Apache Kafka and the rest of the enterprise architecture.

Why Apache Kafka for SAP Integration?

Apache Kafka is a distributed streaming platform that has gained widespread popularity for its ability to handle large-scale, real-time data streaming and event processing. When it comes to SAP integration, there are several reasons organizations choose to use Apache Kafka:

  1. Real-time Data Streaming
    • Apache Kafka is designed for real-time data streaming, making it well-suited for scenarios where timely and continuous data updates are crucial. This is important in SAP environments where real-time integration is essential for various business processes.
  2. Scalability
    • Kafka is highly scalable and can handle large volumes of data and high-throughput requirements. SAP systems often handle massive amounts of data. Kafka’s scalability enables efficient management and processing of this data.
  3. Reliability and Fault Tolerance
    • Kafka is known for its reliability and fault-tolerance features. It ensures data durability and availability, which is essential for critical applications in SAP environments, e.g., in finance or supply chain business processes. Features like rolling upgrades allow continuous operation with zero downtime.
  4. Decoupling Systems
    • Kafka acts as a decoupling layer between producers and consumers. SAP applications and downstream systems can evolve, scale, and fail independently without brittle point-to-point dependencies.
  5. Event-Driven Architecture
    • Kafka supports an event-driven architecture, which aligns well with modern integration patterns. The streaming platform efficiently propagates events, such as changes in SAP data or system events. This enables a more responsive and agile IT landscape. Kafka Connect enables integration with other plain messaging platforms like IBM MQ, TIBCO EMS, or Solace.
  6. Integration with Big Data Ecosystem
    • Kafka integrates natively with the big data and analytics ecosystem, e.g., Databricks, Snowflake, and cloud data lakes, making operational SAP data available for analytics, machine learning, and AI.
  7. Message Retention
    • Kafka stores messages for a configurable period, allowing systems to catch up on missed messages in case of temporary disruptions. This is particularly useful in scenarios where SAP systems may be temporarily offline, unreachable, or cannot handle the throughput. It also helps when transaction cost needs to be reduced by offloading the consumption of downstream applications to a cheaper platform like Kafka. Tiered Storage for Kafka is a significant enabler for a long-term event store of ERP information.
  8. Support for Multiple Protocols
    • The Kafka ecosystem supports various communication protocols (like Kafka, HTTP, File, WebSockets, and more), making it versatile for integration with different systems and technologies. This flexibility is crucial in heterogeneous IT environments, where SAP systems coexist with other technologies.
  9. Open Source Community and Ecosystem
    • Kafka has a vibrant open-source community and a rich ecosystem of connectors and tools. This ecosystem can simplify integration efforts by providing pre-built connectors for SAP systems and other common technologies.
  10. Analytical and Operational Workloads
    • Kafka was initially built for big data analytics use cases. However, most organizations leverage the technology for operational workloads, like orders or payments. Kafka evolved over the years and even introduced a transaction API for exactly-once semantics.
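
The transactions API mentioned in the last item can be sketched as follows: a transactional producer publishes a batch of related ERP events atomically, so consumers configured with read_committed isolation never see a partial result. Topic names and payloads are placeholder assumptions.

```python
import json

from confluent_kafka import Producer

# Sketch of Kafka's transactions API (exactly-once semantics): a batch of
# related ERP events is either published completely or not at all.
# Topic name and payloads are placeholders.
producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "transactional.id": "sap-order-sync-1",
})
producer.init_transactions()

order_events = [
    {"order_id": "4711", "status": "CREATED"},
    {"order_id": "4711", "status": "CREDIT_CHECKED"},
]

producer.begin_transaction()
try:
    for event in order_events:
        producer.produce(
            "sap.erp.order-events",
            key=event["order_id"],
            value=json.dumps(event).encode("utf-8"),
        )
    producer.commit_transaction()
except Exception:
    # Abort so read_committed consumers never see a partial batch.
    producer.abort_transaction()
    raise
```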

An ERP environment should be real-time, scalable, and open. SAP ERP is not just one product or technology, and organizations always combine it with other open source frameworks, proprietary standard software, and SaaS. “Building a Postmodern ERP with Apache Kafka” explores how SAP ERP and other technologies provide the most value together in a flexible, open environment. Many next-generation ERP systems use Kafka under the hood, too, even if you don’t see it because the product is proprietary or SaaS. Event-driven architectures are as helpful for software products as they are for any other software project.

Continuous SAP Migration and Cutover with Kafka

Integration between SAP ERP and other applications is crucial. Another kind of project is the migration and ERP modernization, e.g., from SAP ECC to S/4HANA or the migration between SAP and another software vendor.

A SAP migration project involves moving an SAP system or landscape from one environment to another. This could include moving from an on-premises environment to the cloud, upgrading to a newer version of SAP software, or consolidating multiple SAP instances. The exact steps and considerations for a SAP migration can vary based on the specific migration scenario.

Most SAP ERP migrations these days are from SAP ECC to SAP S/4Hana. These projects usually take years. Apache Kafka can provide valuable help in different SAP integration and migration scenarios.

The combination of real-time capabilities, an event storage for true decoupling and data consistency across real-time and non-real-time systems, and data integration with non-SAP systems and APIs make Kafka the perfect middleware for SAP modernization and ERP migrations.

I covered such a migration via Apache Kafka in a data warehouse modernization story where legacy and modern applications live in parallel for some months or even years until the final cutover is done.

Data Warehouse Offloading, Integration, Cutover, and Replacement with Data Streaming

Until the completion of the S/4Hana migration in the cloud, SAP ECC on-premise continues to exist for years. The hybrid deployment and synchronization capabilities of Kafka make it unique for SAP migration and modernization projects.
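As a rough illustration of such a parallel run, the following Kafka Streams sketch consumes change events from a hypothetical topic fed by the legacy ECC system, maps them to the new data model, and forwards them to a topic consumed by the S/4HANA-side applications. Topic names and the mapping logic are assumptions for illustration only.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class EccToS4HanaBridge {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "ecc-to-s4hana-bridge");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Order events captured from the legacy ECC system (topic name is hypothetical)
        KStream<String, String> legacyOrders = builder.stream("sap-ecc-orders");
        legacyOrders
                .mapValues(EccToS4HanaBridge::toNewDataModel) // translate the legacy payload
                .to("s4hana-orders");                         // consumed by the new S/4HANA-side applications

        new KafkaStreams(builder.build(), props).start();
    }

    private static String toNewDataModel(String legacyPayload) {
        // Placeholder: a real migration maps IDoc/BAPI structures to the new data model here
        return legacyPayload;
    }
}
```

Because the events stay in Kafka, both the legacy and the new world can consume them during the entire cutover period, and the final switch is just a matter of pointing consumers to the right topics.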

Confluent’s Fully Managed SAP Integration and Strategic Partnership

Data streaming defines a new software category. Confluent leads the data streaming industry. It provides a serverless cloud offering on all major public clouds and an offering for self-managed deployments powered by Apache Kafka and Flink. In December 2023, the research company Forrester published “The Forrester Wave™: Streaming Data Platforms, Q4 2023”. Get free access to the report here. The report explores what Confluent and other vendors like AWS, Microsoft, Google, Oracle, and Cloudera provide.

Confluent is now available in the SAP® Store, the online marketplace for SAP and partner offerings. The data streaming platform integrates with SAP Datasphere. The combination delivers a secure, governed solution for accessing SAP data as fully managed data streams for customers.

Datasphere Architecture as part of Business Technology Platform BTP

Confluent provides businesses that use SAP solutions with a cloud-native and complete data streaming platform available everywhere it’s needed – in the cloud, across clouds, on-premises, and hybrid environments. Configured directly within SAP Datasphere, the new Confluent integration allows businesses to:

  • Build real-time applications at a lower cost with fully managed data streams powered by Confluent’s Kora Engine, which reduces the total cost of ownership for Kafka by up to 60%.
  • Move SAP data anywhere it needs to go. Merge with third-party sources in real time via many pre-built connectors, including AWS Redshift, AWS S3, Databricks, Google Cloud BigQuery, MongoDB, and Snowflake paired with a serverless offering for Apache Flink.
  • Maintain strict security, compliance, and governance standards with enterprise-grade data streaming security controls, and the industry’s only fully managed governance suite for Kafka.

Confluent in the SAP PartnerEdge Program

Confluent is a partner in the SAP PartnerEdge program. The SAP PartnerEdge program provides the enablement tools, benefits, and support to facilitate building high-quality, innovative applications focused on specific business needs – quickly and cost-effectively.

Here is an example architecture connecting SAP ERP and non-SAP applications (Flink and Snowflake in this example) with Datasphere and Confluent:

SAP Datasphere and Data Streaming with Apache Kafka and Flink to integrate ERP and Cloud Data Warehouse

Confluent and SAP Datasphere are the perfect combination for building a data fabric for all enterprise data. Many companies already leverage Apache Kafka as the data fabric for AI and machine learning.
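To give an idea of the stream processing part of such an architecture, here is a minimal Apache Flink sketch (Table API embedded in Java) that continuously filters SAP order events from a Kafka topic before the result is handed to a downstream sink such as Snowflake or Databricks. Topic name, schema, and the print sink are hypothetical; a fully managed Flink offering would use its own connector catalog and SQL workspace instead.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class SapOrderFilterJob {
    public static void main(String[] args) {
        TableEnvironment tableEnv =
                TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Source: SAP order events streamed into Kafka (topic and schema are hypothetical)
        tableEnv.executeSql(
            "CREATE TABLE sap_orders (order_id STRING, customer_id STRING, amount DOUBLE) " +
            "WITH ('connector' = 'kafka', 'topic' = 'sap-erp-orders', " +
            "  'properties.bootstrap.servers' = 'localhost:9092', " +
            "  'properties.group.id' = 'sap-order-filter', " +
            "  'format' = 'json', 'scan.startup.mode' = 'earliest-offset')");

        // Sink: print connector as a stand-in for a Snowflake/Databricks/warehouse sink connector
        tableEnv.executeSql(
            "CREATE TABLE large_orders (order_id STRING, customer_id STRING, amount DOUBLE) " +
            "WITH ('connector' = 'print')");

        // Continuous query: only forward high-value orders downstream
        tableEnv.executeSql(
            "INSERT INTO large_orders " +
            "SELECT order_id, customer_id, amount FROM sap_orders WHERE amount > 10000");
    }
}
```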

Alternative Integration Options for SAP and Kafka

Is SAP Datasphere the new silver bullet for SAP ERP integration scenarios? No! As you learned in the above sections, Datasphere enables easy access to old and new SAP ERP data objects. However, Datasphere might have some drawbacks, too:

  • New technology: The product had only been available for a few months at the time of writing this blog post in early 2024. It will mature, and features will strengthen.
  • Heavyweight: A direct integration with a proprietary SAP API call, e.g., BAPI, IDoc or the more modern Operational Data Provisioning (ODP) might be easier to implement and more cost-efficient from a TCO perspective for some projects.
  • Vendor lock-in: Choosing a SAP product as middleware and/or analytics platform might not be the right strategy. Many organizations choose a best-of-breed approach for different domains and use cases instead of relying on a single vendor from a technology and licensing perspective.

One solution does not fit all integration use cases. Know the different options and make your evaluation. 

Plenty of other options exist for SAP-Kafka integration. I explored dozens of APIs, tools, and connectors for data integration between SAP ERP and Apache Kafka.

For instance, look at the Confluent Hub and search for SAP Kafka integration. You will find many mature, lightweight, and innovative solutions from various vendors. For instance, INIT, Asapio, Advantco, KaTe, Onibex, and Qlik provide integrations via different open and proprietary SAP interfaces like ODP, OData, REST, BAPI, or IDoc.

SAP Datasphere and Kafka Connect the Entire Enterprise (and Hybrid Cloud)

It was never easier to integrate the SAP ecosystem with the rest of the IT world in an enterprise architecture. SAP Datasphere supports straightforward access to SAP S/4 HANA, SAP BW/4HANA, SAP BW, SAP ECC, and SAP HANA ERP data without the need for complex integration projects. In addition, SAP supports connectivity to Business Warehouse, SAP’s on-premise data warehouse solution.

Apache Kafka enables data consistency across SAP and non-SAP applications across the data center and public cloud. It does not matter if the data source or sink is real-time, near-real-time, batch, file-based, or a request-response API like HTTP/REST. The heart of the data fabric is event-based, scalable, and reliable.

Confluent is the leading vendor of data streaming technologies like Apache Kafka. The strategic partnership and deep product integration between SAP Datasphere and Confluent provides an excellent opportunity for any organization that needs to integrate SAP and the rest of the IT infrastructure.

Some people might tell you that Kafka is great for analytical use cases but not suited for operational, critical use cases (because some folks want to pitch another product for SAP integrations). That’s not accurate. Apache Kafka supports analytical AND transactional workloads. Actually, almost all customers I work with around the world use Confluent for transactional data from the SAP ERP for orders, payments, fraud detection, and similar operational use cases.

How do you integrate with your SAP systems today? Do you already use modern technologies like Apache Kafka? What connectors or solutions do you use? Will you use SAP Datasphere in the future? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post SAP Datasphere and Apache Kafka as Data Fabric for S/4HANA ERP Integration appeared first on Kai Waehner.

Building a Postmodern ERP with Apache Kafka https://www.kai-waehner.de/blog/2020/11/20/postmodern-erp-mes-scm-with-apache-kafka-event-streaming-edge-hybrid-cloud/ Fri, 20 Nov 2020 09:59:59 +0000 https://www.kai-waehner.de/?p=2847 Postmodern ERP represents the next generation of ERP architectures. It is real-time, scalable, and open by using a combination of open source technologies and proprietary standard software. This blog post explores why and how companies, both software vendors and end-users, leverage event streaming with Apache Kafka to implement a Postmodern ERP.

The post Building a Postmodern ERP with Apache Kafka appeared first on Kai Waehner.

Enterprise resource planning (ERP) has existed for many years. It is often monolithic, complex, proprietary, batch, and not scalable. Postmodern ERP represents the next generation of ERP architectures. It is real-time, scalable, and open. A Postmodern ERP uses a combination of open source technologies and proprietary standard software. This blog post explores why and how companies, both software vendors and end-users, leverage event streaming with Apache Kafka to implement a Postmodern ERP.

Postmodern ERP with Apache Kafka

What is ERP (Enterprise Resource Planning)?

Let’s define the term “ERP” first. This is not an easy task, as ERP is used for concepts and various standard software products.

Enterprise resource planning (ERP) is the integrated management of main business processes, often in real-time and mediated by software and technology.

ERP is usually referred to as a category of business management software – typically a suite of integrated applications – that an organization can use to collect, store, manage, and interpret data from many business activities.

ERP provides an integrated and continuously updated view of core business processes using common databases. These systems track business resources – cash, raw materials, production capacity – and the status of business commitments: orders, purchase orders, and payroll. The applications that make up the system share data across the various departments (manufacturing, purchasing, sales, accounting, etc.) that provide the data. ERP facilitates information flow between all business functions and manages connections to outside stakeholders.

It is important to understand that ERP is not just for manufacturing but relevant across various business domains. Supply Chain Management (SCM) is orthogonal to ERP.

ERP is a Zoo of Concepts, Technologies, and Products

An ERP is a key concept and typically uses various products as part of every supply chain where tangible goods are produced. For that reason, an ERP is very complex in most cases. It usually is not just one product, but a zoo of different components and technologies:

SAP ERP System - Zoo of Products including SCM MES CRM PLM WMS LMS

 

Example: SAP ERP – More than a Single Product…

SAP is the leading ERP vendor. I explored SAP, its product portfolio, and integration options for Kafka in a separate blog post: “Kafka SAP Integration – APIs, Tools, Connector, ERP et al”.

Check that out if you want to get deeper into the complexity of a “single product and vendor”. You will be surprised how many technologies and integration options exist to integrate with SAP. SAP’s stack includes plenty of homegrown products like SAP ERP and acquisitions with their own codebase, including Ariba for supplier network, hybris for e-commerce solutions, Concur for travel & expense management, and Qualtrics for experience management. The article “The ERP is Dead. Long live the Distributed Planning System” from the SAP blog goes in a similar direction.

ERP Requirements are Changing…

This is not different for other big vendors. For instance, if you explore the Oracle website, you will also find a confusing product matrix. 🙂

That’s the status quo of most ERP vendors. However, things change due to shifting requirements: Digital Transformation, Cloud, Internet of Things (IoT), Microservices, Big Data, etc. You know what I mean… Requirements for standard software are changing massively.

Every ERP vendor (that wants to survive) is working on a Postmodern ERP these days by upgrading its existing software products or writing a completely new product – that’s often easier. Let’s explore what a Postmodern ERP is in the next section.

Introducing the Postmodern ERP

The term “Postmodern ERP” was coined by Gartner several years ago.

From the Gartner Glossary:

“Postmodern ERP is a technology strategy that automates and links administrative and operational business capabilities (such as finance, HR, purchasing, manufacturing, and distribution) with appropriate levels of integration that balance the benefits of vendor-delivered integration against business flexibility and agility.”

This definition shows the tight relation to other non-Core-ERP systems, the company’s whole supply chain, and partner systems.

The Architecture of a Postmodern ERP

According to Gartner’s definition of the postmodern ERP strategy, legacy, monolithic and highly customized ERP suites, in which all parts are heavily reliant on each other, should sooner or later be replaced by a mixture of both cloud-based and on-premises applications, which are more loosely coupled and can be easily exchanged if needed. Hint: This sounds a lot like Kafka, doesn’t it?

The basic idea is that there should still be a core ERP solution that covers the most important business functions, while other functions are covered by specialist software solutions that merely extend the core ERP.

There is, however, no golden rule as to what business functions should be part of the core ERP and what should be covered by supplementary solutions. According to Gartner, every company must define its own postmodern ERP strategy, based on its internal and external needs, operations, and processes. For example, a company may define that the core ERP solution should cover those business processes that must stay behind the firewall and choose to leave their core ERP on-premises. At the same time, another company may decide to host the core ERP solution in the cloud and move only a few ERP modules as supplementary solutions to on-premises.

Pros and Cons of a Postmodern ERP

SelectHub explores the pros and cons of a Postmodern ERP compared to legacy ERPs:

Pros and Cons of a Postmodern ERP

The pros are pretty obvious and are the main motivation why companies want or need to move away from their legacy ERP system. Software is eating the world. Companies (need to) become more flexible, elastic, and scalable. Applications (need to) become more personalized and context-specific – all of that (needs to be) in real-time. There is no way around a Postmodern ERP and the related supply chain processes to solve these requirements.

The main benefits that companies will gain from implementing a Postmodern ERP strategy are speed and flexibility when reacting to unexpected changes in business processes or on the organizational level. With most applications having a relatively loose connection, it is fairly easy to replace or upgrade them whenever necessary. Companies can also select and combine cloud-based and on-premises solutions that are most suited for their ERP needs.

The cons are more interesting because they need to be solved to deploy a Postmodern ERP successfully. The key downside of a postmodern ERP is that it will most likely lead to an increased number of software vendors that companies will have to manage and pose additional integration challenges for central IT.

Coincidentally, I had similar discussions with customers in the past quarters regularly. More and more companies adopt Apache Kafka to solve these challenges to build a Postmodern ERP and flexible, scalable supply chain processes.

Kafka as the Foundation of a Postmodern ERP

If you follow my blog and presentations, you know that Kafka is used in all areas where an ERP is relevant, for instance, Industrial IoT (IIoT), Supply Chain Management, Edge Analytics, and many other scenarios. Check out “Kafka in Industry 4.0 and Manufacturing” to learn more details about various use cases.

Example: A Postmodern ERP built on top of Kafka

A Postmodern ERP built on top of Apache Kafka is part of this story:

Postmodern ERP with Apache Kafka SAP S4 Hana Oracle XML Web Services MES

This architecture shows a Postmodern ERP with various components. Note that the Core ERP is built on Apache Kafka. Many other systems and applications are integrated.

Each component of the Postmodern ERP has a different integration paradigm:

  • The TMS (Transportation Management System) is a legacy COTS application providing only a legacy XML-based SOAP Web Service interface. The integration is synchronous and not scalable but works for small transactional data sets.
  • The LMS (Labor Management System) is a legacy homegrown application. The integration is implemented via Kafka Connect and a CDC (Change-Data-Capture) connector to push changes from the relational Oracle database in real-time into Kafka.
  • The SRM (Supplier Relationship Management) is a modern application built on top of Kafka itself. Integration with the Core ERP is implemented with Kafka-native replication technologies like MirrorMaker 2, Confluent Replicator, or Confluent Cluster Linking to provide a scalable real-time integration.
  • The MES (Manufacturing Execution System) is an SAP COTS product and part of the SAP S4/Hana product portfolio. The integration options include REST APIs, the Eventing API, and Java APIs. The right choice depends on the use case. Again, read Kafka SAP Integration – APIs, Tools, Connector, ERP et al. to understand how complex the longer explanation is.
  • The CRM (Customer Relationship Management) is Salesforce, a SaaS cloud service, integrated via Kafka Connect and the Confluent connector.
  • Many more integrations to additional internal and external applications are needed in a real-world architecture.

This is a hypothetical implementation of a Postmodern ERP. However, more and more companies implement this architecture for all the discussed benefits. Unfortunately, such modern architecture also includes some challenges. Let’s explore them and discuss how to solve them with Apache Kafka and its ecosystem.

Solving the Challenges of a Postmodern ERP with Kafka

This section covers three main challenges of implementing a Postmodern ERP and how Kafka and its ecosystem help implement this architecture.

I quote the three main challenges from the blog post “Postmodern ERP: Just Another Buzzword?” and then explain how the Kafka ecosystem solved them more or less out-of-the-box.

Issue 1: More Complexity Between Systems!

“Because ERP modules and tools are built to work together, legacy systems can be a lot easier to configure than a postmodern solution composed entirely of best-of-breed solutions. Because postmodern ERP may involve different programs from different vendors, it may be a lot more challenging to integrate. For example, during the buying process, you would need to ask about compatibility with other systems to ensure that the solution that you have in mind would be sufficient.”

First of all, is your existing ERP system easy to integrate? Any ERP system older than five years uses proprietary interfaces (such as BAPI and iDoc in case of SAP) or ugly/complex SOAP web services to integrate with other systems. Even if all the software components come from one single vendor, it was built by different business units or even acquired. The codebases and interfaces speak very different languages and technologies.

So, while a Postmodern ERP requires complex integration between systems, so does any legacy ERP system! Nevertheless:

How Kafka Helps…

Kafka provides an open, scalable, elastic real-time infrastructure for implementing the middleware between your ERP and other systems. More details in the comparison between Kafka and traditional middleware such as ETL and ESB products.

Kafka Connect is a key piece of this architecture. It provides a Kafka-native integration framework.

Additionally, another key reason why Kafka makes these complex integrations successful is that Kafka really decouples systems (in contrast to traditional messaging queues or synchronous SOAP/REST web services):

Domain-Driven Design and Decoupling for your Postmodern ERP with Kafka

The heart of Kafka is real-time and event-based. Additionally, Kafka decouples producers and consumers with its storage capabilities and handles the backpressure and, optionally, the long-term storage of events and data. This way, batch analytics platforms, request-response REST interfaces (e.g., mobile apps), and databases can access the data, too. Learn more about “Domain-driven Design (DDD) for decoupling applications and microservices with Kafka“.
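A simple way to see this decoupling in practice: any number of independent consumer groups can read the same SAP event stream at their own pace, including a batch-oriented consumer that replays retained history. The sketch below uses a hypothetical topic name; adding this consumer requires no change on the producer side or in any other consumer.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SapOrderAnalyticsConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Each consumer group tracks its own offsets - adding this one does not affect any other consumer
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "analytics-backfill");
        // Replay the retained history, e.g., to (re)load a data warehouse or analytics platform
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("sap-erp-orders"));   // hypothetical topic name
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```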

Understanding the relation between event streaming with Kafka and non-streaming APIs (usually HTTP/REST) is also very important in this discussion. Check out “Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?” for more details.

The integration capabilities and real decoupling provided by Kafka enable handling the integration complexity between systems.

Issue 2: More Difficult Upgrades!

“This con goes hand in hand with the increased complexity between systems. Because of this increased complexity and the fact that the solution isn’t an all-in-one program, making system upgrades can be difficult. When updates occur, your IT team will need to make sure that the relationship between the disparate systems isn’t negatively affected.”

How Kafka Helps…

The issue with upgrades is solved with Kafka out-of-the-box. Remember: Kafka really decouples systems from each other due to its storage capabilities. You can upgrade one system without even informing the other systems and without downtime! Two reasons why this works so well and out-of-the-box:

  1. Kafka is backward compatible. Even if you upgrade the server-side (Kafka brokers, ZooKeeper, Schema Registry, etc.), the other applications and interfaces continue to work without breaking changes. Server-side and client-side can be updated independently. Sometimes an older application is not updated anymore at all because it will be replaced soon. That’s totally fine. An old Kafka client can speak to a newer Kafka broker.
  2. Kafka uses rolling upgrades. The system continues to work without any downtime – 24/7, even for mission-critical workloads like ERP or MES transactions. From the outside, the upgrade will not even be noticed.

Let’s take a look at an example with different components of the Postmodern ERP:

Postmodern ERP - Replication between Kafka and ERP Components

In this case, we see different versions and distributions of Kafka being used:

  • The Tier 1 Supplier uses the fully-managed and serverless Confluent Cloud solution. It automatically upgrades to the latest Kafka release under the hood (this is never a problem due to backward compatibility). The client applications use pretty old versions of Kafka.
  • The Core ERP uses open-source Kafka as it is a homegrown solution, not standard software. The operations and support are handled by the company itself (pretty risky for such a critical system, but totally valid). The Kafka version is relatively new. One client application even uses a Kafka version, which is newer than the server-side, to leverage a new feature in Kafka Streams (Kafka is backward compatible in both directions, so this is not a problem).
  • The MES vendor uses Confluent Platform, which embeds Apache Kafka. The version is up-to-date as the vendor does regular releases and supports rolling upgrades.
  • Integration between the different ERP applications is implemented with Kafka-native replication tools – MirrorMaker 2 and Confluent Cluster Linking, respectively. As discussed in a former section, various other integration options are available, including REST, Kafka Connect, native Kafka clients in any programming language, or any ETL or ESB tool.

Backward compatibility and rolling upgrades make updating systems easy and invisible for integrated systems. Business continuity is guaranteed out-of-the-box.

Issue 3: Lack of Access When Offline

“When you implement a cloud-based software, you need to account for the fact that you won’t be able to access it when you are offline. Many legacy ERP systems offer on-premise solutions, albeit with a high installation cost. However, this software is available offline. For cloud ERP solutions, you are reliant on the internet to access all of your data. Depending on your specific business needs, this may be a dealbreaker.”

How Kafka Helps…

Hybrid architectures are the new black. Local processing on-premise is required in most use cases. It is okay to build the next generation ERP in the cloud. But the integration between cloud and on-premise/edge is key for success. A great example is Mojix, a Kafka-native cloud platform for real-time retail & supply chain IoT processing with Confluent Cloud.

When tangible goods are produced and sold, some processing needs to happen on-premise (e.g., in a factory) or even closer to the edge (e.g., in a restaurant or retail store). No access to your data is a dealbreaker. No capability of local processing is a dealbreaker. Latency and cost for cloud-only can be another deal-breaker.

Kafka works well on-premise and at the edge. Plenty of examples exist. Including Kafka-native bi-directional real-time replication between on-premise / edge and the cloud.

I covered these topics so often already; therefore, I just share a few links to read:

I specifically recommend the latter link. It covers hybrid architectures where processing at the edge (i.e. outside the data center) is key and required even offline, like in the following example running Kafka in a factory (including the server-side):

Edge Computing with Kafka in Manufacturing and Industry 4.0 MES ERP

The hybrid integration capabilities of Kafka and its ecosystem solve the issue of lacking access when offline.

Kafka and Event Streaming as Foundation for a Postmodern ERP Infrastructure

Postmodern ERP represents the next generation of ERP architectures. It is real-time, scalable, and open by using a combination of open source technologies and proprietary standard software. This blog post explored how software vendors and end-users leverage event streaming with Apache Kafka to implement a Postmodern ERP.

What are your experiences with ERP systems? Did you already implement a Postmodern ERP architecture? Which approach works best for you? What is your strategy? Let’s connect on LinkedIn and discuss it!

The post Building a Postmodern ERP with Apache Kafka appeared first on Kai Waehner.

Kafka SAP Integration – APIs, Tools, Connector, ERP et al https://www.kai-waehner.de/blog/2020/08/25/kafka-sap-integration-alternatives-connectors-erp-r3-ecc-s4-hana-soap-rest-http-web-service-api-sdk-java/ Tue, 25 Aug 2020 13:58:42 +0000 https://www.kai-waehner.de/?p=2623 A question I get every week from customers across the globe: How can I integrate my SAP system…

The post Kafka SAP Integration – APIs, Tools, Connector, ERP et al appeared first on Kai Waehner.

A question I get every week from customers across the globe: How can I integrate my SAP system with Apache Kafka? This post explores various alternatives, including connectors, 3rd party tools, custom glue code, and trade-offs between the different options.

After exploring what SAP is, I will discuss several integration options between Apache Kafka and SAP systems:

  • Traditional middleware (ETL/ESB)
  • Web services (SOAP/REST)
  • 3rd party turnkey solutions
  • Kafka-native connectivity with Kafka Connect
  • Custom glue code using SAP SDKs

Disclaimer before you read on:

I am not an SAP expert. It is tough to stay up-to-date with the vast and complex ecosystem of SAP products, (re-)brands, versions, services, SDKs, and APIs. I am sorry if some of the below information is not 100% accurate or is outdated. Always double-check on the SAP website (if the links from Google still work – I had some issues with some pages “no longer available” while researching for this blog post). If you see any inaccurate or missing information, please let me know, and I will update the blog post.

What is SAP?

SAP is a German multinational software corporation that makes enterprise software to manage business operations and customer relations. In 2019, SAP had revenue of €27.553 billion, a net income of €3.387 billion, and ~100,000 employees.

It is quite interesting: Nobody asks how to integrate with IBM or Oracle. Instead, people more specifically ask how to integrate with IBM MQ, IBM DB2, IBM Mainframe (still very ambiguous), or any other of the 100s of IBM products.

For SAP, people ask: How can I integrate with SAP? Let’s clarify what SAP is before exploring integration options.

The company is primarily known for its ERP software. But if you check out the official “What is SAP?” page, you find out that SAP offers solutions across a wide range of areas:

  • ERP and Finance
  • CRM and Customer Experience
  • Network and Spend Management
  • Digital Supply Chain
  • HR and People Engagement
  • Experience Management
  • Business Technology Platform
  • Digital Transformation
  • Small and Midsize Enterprises
  • Industry Solutions

SAP’s Software Portfolio

SAP’s stack includes homegrown products like SAP ERP and acquisitions with their own codebase, including Ariba for supplier network, hybris for e-commerce solutions, Concur for travel & expense management, and Qualtrics for experience management.

Even if you talk about SAP ERP, the situation is still not that easy. Most companies still run SAP ERP Central Component (ECC, formerly called SAP R/3), SAP’s sophisticated (and aged) ERP product. ECC runs on a third-party relational database from Oracle, IBM, or Microsoft, while HANA is SAP’s in-memory database. The new ERP product is SAP S4/Hana (no, this is not just the famous in-memory database). Oh, and there is SAP S4/Hana Cloud. And before you wonder: No, this is not the same feature set as the on-premise version!

Various interfaces exist depending on your product. An interface can be an (awful) proprietary technology like BAPI or iDoc, an (okayish) standards-based web service API using SOAP or REST / HTTP, a (non-scalable) JDBC database connection, or, if you are lucky, even a (scalable and real-time) Event / Messaging API. The article “The ERP is Dead. Long live the Distributed Planning System” from the SAP blog describes the situation very well.

And sorry, we are still not done yet. Even if you talk about ERP systems, this can mean anything from a zoo of products or components, depending on who you are talking to:

SAP ERP System - Zoo of Products including MES CRM PLM WMS LMS

So, before you want to discuss the integration of your SAP product with Kafka, please please please find out the product, version, and deployment infrastructure of your SAP components.

Different Integration Options between Kafka and SAP

After this introduction, you hopefully understand that there is no silver bullet for SAP integration. The following will explore different integration options between Kafka and SAP and their trade-offs. The main focus is on SAP ERP (old ECC and new S4/Hana), but the overview is more generic, including integration capabilities with other components and products.

SAP Integration with Apache Kafka - R3 ERP S4 Hana Ariba Concur BAPI iDoc REST SOAP Web Services Java

Also, keep in mind that you typically need or want to integrate with a function or service. Direct integration with the data object does not make much sense in most cases, as you would have to re-implement the mapping and denormalization between the data objects. Especially for source integration, i.e., building pipelines from SAP to Kafka. In the case of SAP ERP, you typically integrate with RFC/BAPI/iDoc or any other web service interface for this reason.

Traditional Middleware (ETL / ESB) for SAP Integration

Integration tools exist just for the sake of integrating different sources and sinks:

  • Extract-Transform-Load (ETL) for batch integration, like Informatica, Talend or SAP NetWeaver Process Integration
  • Enterprise Service Bus (ESB) for integration via web services and messaging, like TIBCO BusinessWorks or Software AG webMethods
  • Integration Platform as a Service (iPaaS) for cloud-native integration, similar to ETL/ESB tools, but provided as a fully managed service, such as Boomi,  Mulesoft, or SAP Cloud Integration (and some cloud-washed products from legacy middleware vendors).

Most traditional middleware products were built to integrate with complex, proprietary systems from the last 20+ years, such as IBM Mainframe, EDIFACT, and – guess what – ERP systems like SAP ECC. In the meantime, all of them also have a Kafka connector. There are plenty of good reasons why many companies chose Kafka as a modern integration platform instead of traditional legacy middleware.

Most traditional ETL and ESB tools provide SAP connectivity. SAP Cloud Platform Integration (SAP CPI) is SAP’s own “modern” middleware solution. CPI includes a Kafka adapter to send and receive Kafka messages.

Pros:
  • In place: Typically already in place, no new project is required.
  • Maturity: Built over the years (because of the complexity), running in production for a long time already
  • Tooling: Visual coding for the integration (required because of the complexity), directly map iDoc / BAPI / Hana / SOAP schemas to other data structures
  • Integration: Not just connectors to the legacy systems but also Kafka for producing and consuming messages (due to market pressure)
Cons:
  • Legacy: Products are as old as the source and sink systems.
  • Scalability: Monolithic, inflexible architecture
  • Tight coupling: Integration has to be developed and runs on the middleware, no real decoupling and domain-driven design DDD like in Kafka
  • Licensing: High-cost per server, often already planned to be replaced (e.g., you can replace 100+ IBM MQ or TIBCO EMS servers with a single Kafka cluster)
  • Point-to-point: No streaming architecture, most integrations are based on web services (even if the core under the hood is based on a messaging system)
TL;DR:

Traditional integration tools are mature and have great tooling, but limited scalability/flexibility and high licensing cost. Often a quick win as it is already running, and you just need to add the Kafka connector.

Custom Glue Code for Kafka Integration using SAP SDKs

Writing your custom integration between SAP systems and Kafka is a viable option. This glue code typically leverages the same SDKs as 3rd party tools use under the hood:

  • Legacy: SAP NetWeaver RFC SDK – a C/C++ interface for connecting to SAP systems from release R/3 4.6C up to today’s SAP S/4HANA systems.
  • Legacy: SAP Java Connector (SAP JCo) – the famous JCO.jar library – is a Java SDK for integration with SAP ECC / ERP (this is just a wrapper around the C/C++ SDK). See the minimal sketch after this list for how such glue code can look.
  • Legacy: SAP ACO is an integrated ABAP component that is designed to consume RFC Services on remote ABAP systems.
  • Legacy: SAP ABAP TCP Push Channel if you are forced to use custom ABAP code and need or want to use TCP instead of the Confluent REST Proxy for HTTP communication.
  • Legacy: JMS Adapter to integrate via the standard messaging protocol. Great option (if you get it running and working for your use case and functions). For instance, JMS integration can be done via SAP PI.
  • Modern: SAP Cloud SDK allows developing applications with Java or JavaScript that communicate with SAP solutions and services such as SAP S/4 Hana Cloud, SAP SuccessFactors, and others (the term ‘Cloud’ actually means ‘Cloud-native’ in this case, i.e., this SDK also works with SAP’s on-premise products).
  • Modern: SAP Cloud Platform Enterprise Messaging: S4/Hana provides an asynchronous messaging interface (running on Solace on CloudFoundry under the hood). Different messaging standards are supported, including AMQP 1.0 and JMS (depending on the specific product you look at). Some examples demonstrate how to connect via the Java Client using the JMS API.
  • Modern: SAP ODP (Operational Data Provisioning): Technical infrastructure for operational analytics, and data extraction + replication. Some kind of CDC (Change Data Capture) with out-of-the-box support for various SAP products, including SAP BW, SAP BW/4HANA, SAP Data Services, and SAP HANA Smart Data Integration. ODP is not just for SAP interfaces but also integrates with 3rd party technologies (via a custom connector, not out-of-the-box) such as HDFS or Kafka.
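As one example of such glue code built on the legacy SDKs listed above, the sketch below calls a BAPI through SAP JCo and forwards the result rows to Kafka. It assumes a JCo destination configured outside of the code; the BAPI, parameter, and field names are illustrative and must be adapted to the actual function module.

```java
import java.util.Properties;
import com.sap.conn.jco.JCoDestination;
import com.sap.conn.jco.JCoDestinationManager;
import com.sap.conn.jco.JCoException;
import com.sap.conn.jco.JCoFunction;
import com.sap.conn.jco.JCoTable;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BapiToKafka {
    public static void main(String[] args) throws JCoException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // "ERP_DEST" refers to a JCo destination configured outside of this code (host, client, user, ...)
        JCoDestination destination = JCoDestinationManager.getDestination("ERP_DEST");
        JCoFunction function = destination.getRepository().getFunction("BAPI_SALESORDER_GETLIST");
        function.getImportParameterList().setValue("CUSTOMER_NUMBER", "0000001234"); // illustrative input
        function.execute(destination);

        JCoTable orders = function.getTableParameterList().getTable("SALES_ORDERS"); // illustrative table name
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < orders.getNumRows(); i++) {
                orders.setRow(i);
                // Illustrative field names; a real mapping covers the full order structure
                String value = String.format("{\"orderId\":\"%s\",\"material\":\"%s\"}",
                        orders.getString("SD_DOC"), orders.getString("MATERIAL"));
                producer.send(new ProducerRecord<>("sap-sales-orders", orders.getString("SD_DOC"), value));
            }
        }
    }
}
```

This is exactly the kind of code you own, maintain, and support yourself – which is the main trade-off listed in the cons below.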
Pros:
  • Flexibility: Custom coding allows you to implement precisely what you need.
Cons:
  • Maintenance: No vendor support – develop, maintain, operate, support by yourself.
  • Point-to-point: No streaming architecture, most integrations are based on web services (even if the core under the hood is based on a messaging system).
TL;DR:

“Build vs. Buy” always has trade-offs. I have only seen custom glue code for SAP integration in the field if no solution from a vendor was available and affordable. SAP Cloud Platform Enterprise Messaging is a possible integration pattern for Kafka, but it also adds yet another messaging layer to the architecture.
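For the modern messaging path via SAP Cloud Platform Enterprise Messaging, a small bridge between the JMS API and Kafka could look like the following sketch. It assumes a vendor-provided JMS ConnectionFactory from the SAP messaging client; the queue and topic names are placeholders.

```java
import java.util.Properties;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SapMessagingToKafkaBridge {
    // The ConnectionFactory is obtained from the SAP Enterprise Messaging client or JNDI (vendor-specific setup)
    public static void start(ConnectionFactory connectionFactory) throws JMSException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        Connection connection = connectionFactory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer = session.createConsumer(session.createQueue("salesorder-events")); // placeholder queue
        consumer.setMessageListener(message -> {
            try {
                String payload = ((TextMessage) message).getText();
                producer.send(new ProducerRecord<>("sap-salesorder-events", payload)); // placeholder topic
            } catch (JMSException e) {
                throw new RuntimeException("Failed to forward SAP event to Kafka", e);
            }
        });
        connection.start();
    }
}
```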

 

SOAP / REST Web Services for SAP Integration

The last 15 years brought us web services for building a Service-oriented Architecture (SOA) to integrate applications. A web service typically uses SOAP or REST / HTTP as technology. I will not start yet another FUD war here. Both have their use cases and trade-offs.

Pros:
  • Standards-based: Different SDKs, products, and services talk the same language (at least in theory; true for HTTP, not so true for SOAP); most middleware tools have proper support for building HTTP services.
  • Simplicity (HTTP): Well-understood, supported by most programming languages and APIs, established for many use cases – middleware is just yet another one.
Cons:
  • Point-to-point: No streaming architecture, most integrations are based on web services (even if the core under the hood is based on a messaging system).
  • Tight coupling: Integration has to be developed and runs on the middleware, no real decoupling, and domain-driven design DDD like in Kafka.
  • Complexity (SOAP): SOAP/WSDL is just the tip of the iceberg! Check out the list of WS-* standards to understand why this is often called the “WS star hell”. The AXIS framework (Apache extensible Interaction System) is one example of SAP’s SOAP integration using an open framework. While the Apache project was last updated in 2006, SAP still recommends using this interface in 2020.
  • Missing features (REST / HTTP): Representational state transfer (REST) is a concept, but most people mean synchronous HTTP communication. Most middleware tools (and most other applications) implement only a small fraction of the full standard. HTTP is an excellent standard, but all the tooling and features need to be built on top of it.
  • Only indirect support: Several SAP products do not provide open interfaces. While using SOAP or HTTP under the hood, you are forced to use the licensed tooling to create web services. For instance, SAP Business Connector (restricted license version of webMethods Integration Server), SAP NetWeaver Process Integration (PI), SAP Process Orchestration (PO), Cloud Platform Integration (CPI), or SAP Cloud Integration.
TL;DR:

SOAP and REST web services work well for point-to-point communication and have good tool support. Both have their trade-offs, make sure to choose the right one – if your SAP product provides both interfaces. Unfortunately, you will often not have a choice. Even worse: You cannot use any tool but are forced to use the right licensed SAP tool or wrapper interface. Large scale, high volume, and continuous processing of data are not ideal requirements for these (legacy) integration products.

For direct HTTP(S) communication with Kafka, Confluent REST Proxy is an excellent option for producing, consuming, and administrating from any Kafka client (including custom SAP applications). For instance, SAP Cloud Platform Integration (CPI) can use this integration pattern to integrate between SAP and Kafka.
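For illustration, producing an event via the Confluent REST Proxy is a plain HTTP call, which any SAP-side component or integration tool that speaks HTTP can issue. The sketch below uses the REST Proxy v2 produce API from Java; host, topic, and payload are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestProxyProduceExample {
    public static void main(String[] args) throws Exception {
        // One JSON record wrapped in the REST Proxy v2 envelope (payload is a placeholder)
        String body = "{\"records\":[{\"key\":\"4711\",\"value\":{\"orderId\":\"4711\",\"status\":\"CREATED\"}}]}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://rest-proxy:8082/topics/sap-erp-orders"))   // placeholder host and topic
                .header("Content-Type", "application/vnd.kafka.json.v2+json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```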

SAP-specific 3rd Party Tools for Kafka

SAP integration is a huge market globally. SAP provides several tools for data integration (some legacy, some modern – honestly, I don’t have a full overview of their complex product and API portfolio). Additionally, plenty of other software vendors have built specific integration software for SAP systems.

A few examples I have seen in the field recently:

Examples:

These are just a few examples. Many more exist for on-premise, cloud, and hybrid integration with different SAP products and interfaces.

Some of these tools are natively integrated into SAP’s integration tools instead of being completely independent runtimes. This can be good or bad. An advantage of this approach is that you can leverage the SAP-native features for complex iDoc / BAPI mappings and the integrated 3rd party connector for Kafka communication.

Pros:
  • Turnkey solution: Built for SAP integration, often combined with other additional helpful features beyond just doing the connectivity, more lightweight than traditional generic middleware.
  • Focus: Many 3rd party solutions focus on a few specific use cases and/or products and technologies. It is much harder to integrate with “SAP in general” than focusing on a particular niche, e.g., Human Resources processes and related HTTP interfaces.
  • Maturity: Built over the years
  • Tooling: Visual coding for the integration (required because of the complexity), directly map iDoc / BAPI / Hana / SOAP schemas to other data structures
  • Integration: Not just connectors to the legacy systems but also modern technologies such as Kafka
Cons:
  • Scalability: Often monolithic, inflexible architecture (but focusing on SAP integration only, therefore often “okayish”)
  • Tight coupling: Integration has to be developed on and runs within the tool; however, it is separated from other middleware, so decoupling and domain-driven design (DDD) are still possible in conjunction with Kafka
  • Licensing: Moderate cost per server (typically cheaper than the traditional generic middleware)
  • Point-to-point: No streaming architecture, most integrations are based on web services (even if the core under the hood is based on a messaging system)
TL;DR:

A turnkey solution is an excellent choice in many scenarios. I see this pattern of combining Kafka with a dedicated 3rd party solution for SAP integration very often. I like it as the architecture is still decoupled, but no vast efforts are required for doing a (complex) SAP integration. And there is still hope that SAP itself releases a nice Kafka-native integration platform. 🙂

Kafka-native SAP Integration with Kafka Connect

Kafka Connect, an open-source component of Apache Kafka, is a framework for connecting Kafka with external systems such as databases, key-value stores, search indexes, and file systems.

Kafka Connect connectors are available for SAP ERP databases.

Pros:
  • Kafka-native: Kafka under the hood, providing real-time processing for high volumes of data with high scalability and reliability.
  • Simplicity: Just one infrastructure for messaging and data integration, much easier to develop, test, operate, scale, and license than using different frameworks or products (e.g., Kafka for messaging plus an ESB for data integration).
  • Real decoupling: Kafka’s architecture uses smart endpoints and dumb pipes by design, one of the key design principles of microservices. Not just for the applications, but also for the integration components. Leverage all the benefits of a domain-driven architecture for your Kafka-native middleware.
  • Custom connectors: Kafka Connect provides an open template. If no connector is available, you (or your favorite system integrator or Kafka-vendor) can build an SAP-specific connector once, and you can roll it out everywhere.
Cons:
  • Only database connectors: No connectors beyond the native JDBC database integration are available at the time of writing this.
  • Anti-pattern of direct database access: In most cases, you want or need to integrate with a function or service, not with the data objects. In most cases, you don’t even get direct access from the database admin anyway.
  • Efforts: Build your own SAP-native (i.e., non-JDBC) connector or ask (and pay) your favorite SI or Kafka vendor.

UPDATE January 2021: A Kafka-native integration is available with INIT’s ODP connector (as discussed in section “3rd party tools”). It eliminates the above cons and might be a great 3rd party option for some use cases.

TL;DR:

Kafka Connect is a great framework and used in most Kafka architectures for various good reasons. For SAP integration, the situation is different because no connectors are available (beyond direct database access). It took 3rd party vendors many years to implement RFC/BAPI/iDoc integration with their tools. Such an implementation will probably not happen again for Kafka because it is very complex, and these proprietary legacy interfaces are dying anyway. The situation is different for modern SAP interfaces: Some 3rd party providers leverage Kafka Connect for their product. For instance, INIT Software’s Kafka Connect ODP connector.
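To make the database-level option concrete, the following sketch registers a generic JDBC source connector against an SAP HANA database via the Kafka Connect REST API. Connection details, the SQL query, and the topic prefix are placeholders (and the SAP HANA JDBC driver has to be available on the Connect worker); as discussed above, function- or service-level integration is usually preferable to direct table access.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterSapJdbcSource {
    public static void main(String[] args) throws Exception {
        // Hypothetical connector config: a generic Confluent JDBC source reading an SAP HANA table.
        // Connection details, query, and topic prefix are placeholders.
        String config = """
            {
              "name": "sap-hana-orders-source",
              "config": {
                "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
                "connection.url": "jdbc:sap://sap-hana-host:30015/",
                "connection.user": "INTEGRATION_USER",
                "connection.password": "********",
                "mode": "incrementing",
                "incrementing.column.name": "ORDER_ID",
                "query": "SELECT ORDER_ID, CUSTOMER_ID, AMOUNT FROM SAPSR3.ORDERS",
                "topic.prefix": "sap-hana-orders",
                "poll.interval.ms": "5000"
              }
            }
            """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))   // Kafka Connect REST endpoint
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(config))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```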

A Kafka Connect connector for SAP Cloud Platform Enterprise Messaging using its Java client would be a feasible and best option. I assume we will see such a connector on the market sooner or later.

Embedded Kafka in SAP Products

We have seen various integration options between SAP and Kafka. Unfortunately, all of them are based on the principle of “data at rest”, in contrast to Kafka processing “data in motion”. The closest fit so far is the integration via SAP Cloud Platform Enterprise Messaging because you can at least leverage an asynchronous messaging API.

The real added value comes when Kafka is leveraged not just for real-time messaging but for event streaming. Kafka provides a combination of messaging, data integration, data processing, and real decoupling using its distributed storage infrastructure.

Native Event Streaming with Kafka in SAP Products

Interestingly, some of SAP’s acquisitions leverage Kafka under the hood for event streaming. Two public examples:

Obviously, people are also waiting for the Kafka-native SAP S4/Hana interface so that they can leverage events in real-time for processing data in motion and correlate real-time and historical data together. A native Kafka integration with SAP S4/Hana should be the next step for SAP! HERE Technologies provides a great example of how to provide a Kafka-native interface (and an alternative REST option) for their product.

Having said this, current SAP blogs (from mid-2019) still talk about replacing the 20+ years old BAPI and RFC integration style with SOAP and OData (Open Data Protocol, an open protocol that allows the creation and consumption of queryable and interoperable REST APIs) in SAP S/4HANA Public Cloud.

My personal feeling and hope are that a native Kafka interface is just a matter of time as the market demand is everywhere across the globe (I talk to many customers in EMEA, US, and APAC), and even several non-S4/Hana SAP products use Kafka internally.

I have also seen a two-fold approach from some other vendors: Provide a Kafka-native interface to the outside world first (in SAP terms, you could, e.g., provide a Kafka interface on top of BAPIs). At a later point, reengineer the internal architecture away from the non-scalable technology to Kafka under the hood (in SAP terms, you could replace RFC / BAPI functions with a more scalable Kafka-native version – even using the same API interface and message structure).

Native Streaming Replication between Products, Departments, and Companies

Native Kafka integration does not just happen within a product or company. A widespread trend I see on the market in different industries is to integrate with partners via Kafka-native streaming replication instead of REST APIs:

Cross Company Streaming Kafka API Integration with MirrorMaker and Cluster Linking

Think about it: If you use Kafka in different application infrastructures, but the interface is just a web service or database, then all the benefits might go away because scalability and/or real-time data correlation capabilities go away.

More and more vendors of standard software use Kafka as the backbone of their internal architecture. If the interface between products (say, SAP’s ERP system, SAP’s MES system, and the SCM application of an OEM customer) is just a SOAP or REST API, then this does not scale and perform well for the requirements of digital transformation and Industry 4.0 use cases.

Hence, more and more companies leverage Kafka not just internally but also between departments or even different companies. Streaming replication between companies is possible with tools like MirrorMaker 2.0 or Confluent Replicator. Or you use the much simpler Cluster Linking from Confluent, which enables integration between hybrid, multi-cloud, or 3rd party integration using the Kafka protocol under the hood.

SAP + Apache Kafka = The Future for ERP et al

There is huge demand across the globe to integrate SAP applications with Apache Kafka for real-time messaging, data integration, and data processing at scale. The demand is true for SAP ERP (ECC and S4/Hana) but also for most other products from the vast SAP portfolio.

Kafka is deployed in many modern and innovative use cases for supply chain management, manufacturing, customer experience, and so on. Edge, hybrid, and multi-cloud Kafka deployments are the norm, not the exception.

Kafka integrates with SAP systems well. Different integration options are available via SAP SDKs and 3rd party products for proprietary interfaces, open standards, and modern messaging and event streaming concepts. Choose the right option for your need and get started with Kafka SAP integration… 

If you want to modernize your existing ERP infrastructure (no matter if SAP or any other vendor), also check out the article “Building a Postmodern ERP with Apache Kafka“.

What are your experiences with SAP Kafka integration? How did it work? What challenges did you face and how did you or do you plan to solve this? What is your strategy? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

The post Kafka SAP Integration – APIs, Tools, Connector, ERP et al appeared first on Kai Waehner.
