Data Warehouse and Data Lake Modernization: From Legacy On-Premise to Cloud-Native Infrastructure

The concepts and architectures of a data warehouse, a data lake, and data streaming are complementary approaches to solving business problems. Storing data at rest for reporting and analytics requires different capabilities and SLAs than continuously processing data in motion for real-time workloads. Many open-source frameworks, commercial products, and SaaS cloud services exist. Unfortunately, the underlying technologies are often misunderstood, overused for monolithic and inflexible architectures, and pitched for the wrong use cases by vendors. Let’s explore this dilemma in a blog series. Learn how to build a modern data stack with cloud-native technologies. This is part 3: Data Warehouse Modernization: From Legacy On-Premise to Cloud-Native Infrastructure.

Data Warehouse and Data Lake Modernization with Data Streaming

Blog Series: Data Warehouse vs. Data Lake vs. Data Streaming

This blog series explores concepts, features, and trade-offs of a modern data stack using data warehouse, data lake, and data streaming together:

  1. Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
  2. Data Streaming for Data Ingestion into the Data Warehouse and Data Lake
  3. THIS POST: Data Warehouse Modernization: From Legacy On-Premise to Cloud-Native Infrastructure
  4. Case Studies: Cloud-native Data Streaming for Data Warehouse Modernization
  5. Lessons Learned from Building a Cloud-Native Data Warehouse

Stay tuned for a dedicated blog post for each topic as part of this blog series. I will link the blogs here as soon as they are available (in the next few weeks). Subscribe to my newsletter to get an email after each publication (no spam or ads).

Data warehouse modernization: From legacy on-premise to cloud-native infrastructure

Many people talk about data warehouse modernization when they move to a cloud-native data warehouse. But what does data warehouse modernization actually mean? Why do people move away from their on-premise data warehouse? What are the benefits?

Many projects I have seen in the wild went through the following steps:

  1. Select a cloud-native data warehouse
  2. Get data into the new data warehouse
  3. [Optional] Migrate from the old to the new data warehouse

Let’s explore these steps in more detail and understand the technology and architecture options.

1. Selection of a cloud-native data warehouse

Many years ago, cloud computing was a game-changer for operating infrastructure. AWS innovated by providing not just EC2 virtual machines but also storage, like AWS S3 as a service.

Cloud-native data warehouse offerings are built on the same fundamental change. Cloud providers brought their analytics cloud services, such as AWS Redshift, Azure Synapse, or GCP BigQuery. Independent vendors rolled out a cloud-native data warehouse or data lake SaaS such as Snowflake, Databricks, and many more. While each solution has its trade-offs, a few general characteristics are true for most of them:

  • Cloud-native: A modern data warehouse is elastic, scales for small up to extreme workloads, and automates most business processes around development, operations, and monitoring.
  • Fully managed: The vendor takes over the operations burden. This includes scaling, failover handling, upgrades, and performance tuning. Some offerings are truly serverless, while many services require capacity planning and manual or automated scaling up and down.
  • Consumption-based pricing: Pay-as-you-go enables getting started in minutes and scaling costs with broader software usage. Most enterprise deployments allow volume or term commitments in exchange for price discounts.
  • Data sharing: Replicating data sets across regions and environments is a common feature to offer data locality, privacy, lower latency, and regulatory compliance.
  • Multi-cloud and hybrid deployments: While cloud providers usually only offer the 1st party service on their cloud infrastructure, 3rd party vendors provide a multi-cloud strategy. Some vendors even offer hybrid environments, including on-premise and edge deployments.

Plenty of comparisons exist in the community, plus analyst research from Gartner, Forrester, et al. Looking at vendor information and trying out the various cloud products using free credits is crucial, too. Finding the right cloud-native data warehouse is its own challenge and not the focus of this blog post.

2. Data streaming as (near) real-time and hybrid integration layer

Data ingestion into data warehouses and data lakes was already covered in part two of this blog series. The more real-time, the better for most business applications. Near real-time ingestion is possible with specific tools (like AWS Kinesis or Kafka) or as part of the data fabric (the streaming data hub where a tool like Kafka plays a bigger role than just data ingestion).

The often more challenging part is data integration. Most data warehouse and data lake pipelines require ETL to ingest data. As the next-generation analytics platform is crucial for making the right business decisions, the data ingestion and integration platform must also be cloud-native! Tools like Kafka provide the reliable and scalable integration layer to get all required data into the data warehouse.
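To make this concrete, here is a minimal, hypothetical sketch of how such an ingestion pipeline could be wired up with Kafka Connect: a JDBC source connector is registered via the Connect REST API to pull rows from a legacy database into a Kafka topic. The endpoint, credentials, table, and column names are placeholders, and the exact configuration keys depend on the connector version you deploy:

```python
# Hypothetical example: registering a JDBC source connector on a self-managed
# Kafka Connect worker to pull rows from a legacy database into a Kafka topic.
# Host names, credentials, table and column names are placeholders; the exact
# configuration keys depend on the connector version you deploy.
import json
import requests

CONNECT_URL = "http://connect-worker:8083/connectors"  # assumed Connect REST endpoint

connector = {
    "name": "legacy-orders-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:oracle:thin:@legacy-db:1521/ORCL",
        "connection.user": "etl_user",
        "connection.password": "<store-this-in-a-secret-manager>",
        "mode": "timestamp+incrementing",         # capture new and updated rows
        "timestamp.column.name": "UPDATED_AT",
        "incrementing.column.name": "ORDER_ID",
        "table.whitelist": "ORDERS",
        "topic.prefix": "legacy.",                # rows land in the topic 'legacy.ORDERS'
        "poll.interval.ms": "5000",
        "tasks.max": "1",
    },
}

resp = requests.post(CONNECT_URL, json=connector, timeout=10)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))
```

The same declarative approach works for sink connectors into the data warehouse, so the pipeline needs little or no custom ETL code.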

Integration of legacy on-premise data into the cloud-native data warehouse

In a greenfield project, the project team is lucky. Data sources run in the same cloud, use open and modern APIs, and scale as well as the cloud-native data warehouse does.

Unfortunately, the reality is almost always brownfield, even if all applications run in public cloud infrastructure. Therefore, the integration and replication of data from legacy and on-premise applications is a general requirement.

Data is typically consumed from legacy databases, data warehouses, applications, and other proprietary platforms. The replication into the cloud data warehouse usually needs to be near real-time and reliable.

A data streaming platform like Kafka is perfect for replicating data across data centers, regions, and environments because of its elastic scalability and true decoupling capabilities. Kafka enables connectivity to modern AND legacy systems via connectors, proprietary APIs, programming languages, or open REST interfaces:

Accelerate modernization from on-prem to AWS with Kafka and Confluent Cluster Linking

A common scenario in such a brownfield project is the clear separation of concerns and true decoupling between legacy on-premise and modern cloud workloads. Here, Kafka is deployed on-premise to connect to legacy applications.

Tools like MirrorMaker, Replicator, or Confluent Cluster Linking replicate events in real-time into the Kafka cluster in the cloud. The Kafka brokers provide access to the incoming events. Downstream consumers read the data into the data sinks at their own pace; real-time, near real-time, batch, or request-response via API. Streaming ETL is possible at any site – where it makes the most sense from a business or security perspective and is the most cost-efficient.
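To illustrate the dataflow these replication tools automate, here is a deliberately simplified, hand-rolled sketch using the Python Kafka client that consumes from an assumed on-premise cluster and re-produces the events into an assumed cloud cluster. This is not how MirrorMaker or Cluster Linking work internally (the real tools also handle offset translation, topic configuration, and failover); it only shows the basic replication idea:

```python
# Hand-rolled illustration only: MirrorMaker 2, Confluent Replicator, and
# Cluster Linking additionally handle offset translation, topic configuration,
# and failover. Cluster addresses and topic names are placeholders.
from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "onprem-kafka:9092",   # assumed on-premise cluster
    "group.id": "dwh-replication-bridge",
    "auto.offset.reset": "earliest",
})
producer = Producer({
    "bootstrap.servers": "cloud-kafka:9092",    # assumed cloud cluster
    "enable.idempotence": True,                 # avoid duplicates on retries
})

consumer.subscribe(["erp.orders"])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        # Re-produce the event 1:1 into the cloud cluster, keeping key and value.
        producer.produce(msg.topic(), key=msg.key(), value=msg.value())
        producer.poll(0)                        # serve delivery callbacks
finally:
    producer.flush()
    consumer.close()
```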

Example: Confluent Cloud + Snowflake = Cloud-native Data Warehouse Modernization

Here is a concrete example of data warehouse modernization using cloud-native data streaming and data warehousing with Confluent Cloud and Snowflake:

Cloud-native Data Warehouse Modernization with Apache Kafka Confluent Snowflake

For modernizing the data warehouse, data is ingested from various legacy and modern data sources using different communication paradigms, APIs, and integration strategies. The data is transmitted in motion from data sources via Kafka (and optional preprocessing) into the Snowflake data warehouse. The whole end-to-end pipeline is scalable, reliable, and fully managed, including the connectivity and ingestion between the Kafka and Snowflake clusters.
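As an illustration of the Snowflake side of such a pipeline, the following sketch registers a Snowflake sink connector on a self-managed Kafka Connect worker. Account, user, key, database, and topic names are placeholders, the exact property names come from the Snowflake connector documentation, and a fully managed connector in Confluent Cloud is configured differently:

```python
# Illustrative only: the exact property names come from the Snowflake Kafka
# connector documentation and differ between self-managed and fully managed
# deployments. Account, credentials, database, and topic names are placeholders.
import requests

snowflake_sink = {
    "name": "snowflake-dwh-sink",
    "config": {
        "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
        "topics": "erp.orders,erp.customers",
        "snowflake.url.name": "myaccount.snowflakecomputing.com:443",
        "snowflake.user.name": "KAFKA_INGEST",
        "snowflake.private.key": "<private-key-from-a-secret-manager>",
        "snowflake.database.name": "ANALYTICS",
        "snowflake.schema.name": "RAW",
        "buffer.flush.time": "60",        # assumed: seconds between flushes into Snowflake
        "buffer.count.records": "10000",
        "tasks.max": "2",
    },
}

resp = requests.post("http://connect-worker:8083/connectors", json=snowflake_sink, timeout=10)
resp.raise_for_status()
```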

However, there is more to the integration and ingestion layer: The data streaming platform stores the data for true decoupling and slow downstream applications; not every consumer is or can be real-time. Most enterprise architectures do not ingest data into a single data warehouse or data lake or lakehouse. The reality is that different downstream applications need access to the same information; even though vendors of data warehouses and data lakes tell you differently, of course 🙂

By consuming events from the streaming data hub, each application domain decides by itself if it

  • processes events within Kafka with stream processing tools like Kafka Streams or ksqlDB
  • builds its own downstream applications with its own code and technologies (like Java, .NET, Golang, Python), as shown in the consumer sketch after this list
  • integrates with 3rd party business applications like Salesforce or SAP
  • ingests the raw or preprocessed and curated data from Kafka into the sink system (like a data warehouse or data lake)
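
Here is a minimal sketch of what such a downstream domain consumer could look like with the Python Kafka client. Topic name, consumer group, and the processing step are assumptions for illustration; the point is that each domain reads at its own pace and can replay history if needed, without affecting any other consumer:

```python
# Sketch of one downstream domain consuming curated events at its own pace.
# Topic name, consumer group, and the processing step are placeholders.
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "cloud-kafka:9092",
    "group.id": "reporting-domain",       # each domain uses its own consumer group
    "auto.offset.reset": "earliest",      # replay history thanks to Kafka's storage
})
consumer.subscribe(["erp.orders.curated"])

try:
    while True:
        msg = consumer.poll(timeout=5.0)  # a slower, batch-like cadence is fine here
        if msg is None or msg.error():
            continue
        order = json.loads(msg.value())
        # Hand over to the domain's own logic: write to a warehouse staging table,
        # call a business application, or feed a dashboard.
        print(order.get("order_id"), order.get("status"))
finally:
    consumer.close()
```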

3. Data warehouse modernization and migration from legacy to cloud-native

An often misunderstood concept is the buzz around data warehouse modernization: Companies rarely take the data of the existing on-premise data warehouse or data lake, write a few ETL jobs, and put the data into a cloud-native data warehouse for the sake of doing it.

If you think about a one-time lift-and-shift from an on-premise data warehouse to the cloud, then a traditional ETL tool or a replication cloud service might be the easiest. However, usually, data warehouse modernization is more than that! 

What is data warehouse modernization?

A data warehouse modernization can mean many things, including replacing and migrating the existing data warehouse, building a new cloud-native data warehouse from scratch, or optimizing a legacy ETL pipeline of a cloud-native data warehouse.

In all these cases, data warehouse modernization requires business justification, for instance:

  • Application issues in the legacy data warehouse, such as data processing that is too slow with legacy batch workloads, result in wrong or conflicting information for business users.
  • Scalability issues in the on-premise data warehouse as the data volume grows too much.
  • Cost issues because the legacy data warehouse does not offer reasonable pricing with pay-as-you-go models.
  • Connectivity issues as legacy data warehouses were not built with an open API and data sharing in mind. Cloud-native data warehouses run on cost-efficient and scalable object storage, separate storage from computing, and allow data consumption and sharing. (but keep in mind that Reverse ETL is often an anti-pattern!)
  • A strategic move to the cloud with all infrastructure. The analytics platform is no exception if all old and new applications go to the cloud.

Cloud-native applications usually come with innovation, i.e., new business processes, data formats, and data-driven decision-making. From a data warehouse perspective, the best modernization is to start from scratch. Consume data directly from the existing data sources, ETL it, and do business intelligence on top of the new data structures.

I have seen many more projects where customers use change data capture (CDC) from Oracle databases (i.e., the leading core system) instead of trying to replicate data from the legacy data warehouse (i.e., the analytics platform), because this approach scales better, costs less, and makes it easier to shut down the legacy infrastructure later.
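A hedged sketch of such a CDC setup with Debezium's Oracle connector on Kafka Connect might look as follows. Host, credentials, schema, and table names are placeholders, property names differ between Debezium versions, and the Oracle side additionally needs LogMiner or XStream configured:

```python
# Hypothetical CDC setup with Debezium's Oracle connector on Kafka Connect.
# Property names vary between Debezium versions, and Oracle CDC additionally
# requires LogMiner or XStream configuration on the database side.
import requests

oracle_cdc = {
    "name": "oracle-core-system-cdc",
    "config": {
        "connector.class": "io.debezium.connector.oracle.OracleConnector",
        "database.hostname": "oracle-core",
        "database.port": "1521",
        "database.user": "dbz_user",
        "database.password": "<store-this-in-a-secret-manager>",
        "database.dbname": "ORCL",
        "topic.prefix": "core",                 # change events land in 'core.<schema>.<table>' topics
        "table.include.list": "SALES.ORDERS,SALES.ORDER_ITEMS",
        "schema.history.internal.kafka.bootstrap.servers": "cloud-kafka:9092",
        "schema.history.internal.kafka.topic": "schema-history.core",
        "tasks.max": "1",
    },
}

requests.post("http://connect-worker:8083/connectors", json=oracle_cdc, timeout=10).raise_for_status()
```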

Data warehouse migration: Continuous vs. cut-over

A real modernization (i.e., migration) from a legacy data warehouse to a cloud-native one is usually not a hard cut-over but a continuous process: the first project phase integrates the legacy data sources with the new data warehouse. The old and new data warehouse platforms operate in parallel, so that old and new business processes keep running. After some time (months or years later), when the business is ready to move, the old data warehouse is shut down once the legacy applications are either migrated to the new data warehouse or replaced with new applications:

Data Warehouse Offloading Integration and Replacement with Data Streaming 

My article “Mainframe Integration, Offloading and Replacement with Apache Kafka” illustrates this offloading and long-term migration process. Just scroll to the section “Mainframe Offloading and Replacement in the Next 5 Years” in that post and replace the term ‘mainframe’ with ‘legacy data warehouse’ in your mind.

A migration and cut-over is its own project and may or may not include the legacy data warehouse. Data lake modernization (e.g., from a self- or partially managed Cloudera cluster running on-premise in the data center to a fully managed Databricks or Snowflake cloud infrastructure) follows the same principles. And mixing the data warehouse (reporting) and data lake (big data analytics) into a single infrastructure does not change this either.

Data warehouse modernization is NOT a big bang and NOT a single tool approach!

Most data warehouse modernization projects are ongoing efforts over a long period. You must select a cloud-native data warehouse, get data into the new data warehouse from various sources, and optionally migrate away from legacy on-premise infrastructure.

Data streaming for data ingestion, business applications, or data sharing in real-time should always be a separate component in the enterprise architecture. It has very different requirements regarding SLAs, uptime, throughput, latency, etc. Putting all real-time and analytical workloads into the same cluster makes little sense from a cost, risk, or business value perspective. The idea of a modern data flow and building a data mesh is the separation of concerns with domain-driven design and focusing on data products (using different, independent APIs, technologies, and cloud services).

For more details, browse other posts of this blog series:

  1. Data Warehouse vs. Data Lake vs. Data Streaming – Friends, Enemies, Frenemies?
  2. Data Streaming for Data Ingestion into the Data Warehouse and Data Lake
  3. THIS POST: Data Warehouse Modernization: From Legacy On-Premise to Cloud-Native Infrastructure
  4. Case Studies: Cloud-native Data Streaming for Data Warehouse Modernization
  5. Lessons Learned from Building a Cloud-Native Data Warehouse

What cloud-native data warehouse(s) do you use? How does data streaming fit into your journey? Did you integrate or replace your legacy on-premise data warehouse(s); or start from greenfield in the cloud? Let’s connect on LinkedIn and discuss it! Stay informed about new blog posts by subscribing to my newsletter.

Building a Postmodern ERP with Apache Kafka

Enterprise resource planning (ERP) has existed for many years. It is often monolithic, complex, proprietary, batch-oriented, and not scalable. Postmodern ERP represents the next generation of ERP architectures. It is real-time, scalable, and open. A Postmodern ERP uses a combination of open source technologies and proprietary standard software. This blog post explores why and how companies, both software vendors and end-users, leverage event streaming with Apache Kafka to implement a Postmodern ERP.

Postmodern ERP with Apache Kafka

What is ERP (Enterprise Resource Planning)?

Let’s define the term “ERP” first. This is not an easy task, as ERP is used both for concepts and for various standard software products.

Enterprise resource planning (ERP) is the integrated management of main business processes, often in real-time and mediated by software and technology.

ERP is usually referred to as a category of business management software – typically a suite of integrated applications – that an organization can use to collect, store, manage, and interpret data from many business activities.

ERP provides an integrated and continuously updated view of core business processes using common databases. These systems track business resources – cash, raw materials, production capacity – and the status of business commitments: orders, purchase orders, and payroll. The applications that make up the system share data across the various departments (manufacturing, purchasing, sales, accounting, etc.) that provide the data. ERP facilitates information flow between all business functions and manages connections to outside stakeholders.

It is important to understand that ERP is not just for manufacturing but relevant across various business domains. Hence, Supply Chain Management (SCM) is orthogonal to ERP.

ERP is a Zoo of Concepts, Technologies, and Products

An ERP is a key concept and typically uses various products as part of every supply chain where tangible goods are produced. For that reason, an ERP is very complex in most cases. It usually is not just one product, but a zoo of different components and technologies:

SAP ERP System - Zoo of Products including SCM MES CRM PLM WMS LMS

 

Example: SAP ERP – More than a Single Product…

SAP is the leading ERP vendor. I explored SAP, its product portfolio, and integration options for Kafka in a separate blog post: “Kafka SAP Integration – APIs, Tools, Connector, ERP et al.”

Check that out if you want to get deeper into the complexity of a “single product and vendor”. You will be surprised how many technologies and integration options exist to integrate with SAP. SAP’s stack includes plenty of homegrown products like SAP ERP and acquisitions with their own codebase, including Ariba for supplier network, hybris for e-commerce solutions, Concur for travel & expense management, and Qualtrics for experience management. The article “The ERP is Dead. Long live the Distributed Planning System” from the SAP blog goes in a similar direction.

ERP Requirements are Changing…

This is not different for other big vendors. For instance, if you explore the Oracle website, you will also find a confusing product matrix. 🙂

That’s the status quo of most ERP vendors. However, things change due to shifting requirements: Digital Transformation, Cloud, Internet of Things (IoT), Microservices, Big Data, etc. You know what I mean… Requirements for standard software are changing massively.

Every ERP vendor (that wants to survive) is working on a Postmodern ERP these days by upgrading its existing software products or writing a completely new product – that’s often easier. Let’s explore what a Postmodern ERP is in the next section.

Introducing the Postmodern ERP

The term “Postmodern ERP” was coined by Gartner several years ago.

From the Gartner Glossary:

“Postmodern ERP is a technology strategy that automates and links administrative and operational business capabilities (such as finance, HR, purchasing, manufacturing, and distribution) with appropriate levels of integration that balance the benefits of vendor-delivered integration against business flexibility and agility.”

This definition shows the tight relation to other non-Core-ERP systems, the company’s whole supply chain, and partner systems.

The Architecture of a Postmodern ERP

According to Gartner’s definition of the postmodern ERP strategy, legacy, monolithic and highly customized ERP suites, in which all parts are heavily reliant on each other, should sooner or later be replaced by a mixture of both cloud-based and on-premises applications, which are more loosely coupled and can be easily exchanged if needed. Hint: This sounds a lot like Kafka, doesn’t it?

The basic idea is that there should still be a core ERP solution that would cover the most important business functions. In contrast, other functions will be covered by specialist software solutions that merely extend the core ERP. 

There is, however, no golden rule as to what business functions should be part of the core ERP and what should be covered by supplementary solutions. According to Gartner, every company must define its own postmodern ERP strategy, based on its internal and external needs, operations, and processes. For example, a company may define that the core ERP solution should cover those business processes that must stay behind the firewall and choose to leave their core ERP on-premises. At the same time, another company may decide to host the core ERP solution in the cloud and move only a few ERP modules as supplementary solutions to on-premises.

Pros and Cons of a Postmodern ERP

SelectHub explores the pros and cons of a Postmodern ERP compared to legacy ERPs:

Pros and Cons of a Postmodern ERP

The pros are pretty obvious and the main motivation why companies want or need to move away from their legacy ERP system. Software is eating the world. Companies (need to) become more flexible, elastic, and scalable. Applications (need to) become more personalized and context-specific – all of that (needs to be) in real-time. There is no way around a Postmodern ERP and the related supply chain processes to solve these requirements.

The main benefits that companies will gain from implementing a Postmodern ERP strategy are speed and flexibility when reacting to unexpected changes in business processes or on the organizational level. With most applications having a relatively loose connection, it is fairly easy to replace or upgrade them whenever necessary. Companies can also select and combine cloud-based and on-premises solutions that are most suited for their ERP needs.

The cons are more interesting because they need to be solved to deploy a Postmodern ERP successfully. The key downside of a Postmodern ERP is that it will most likely lead to an increased number of software vendors that companies will have to manage, and it will pose additional integration challenges for central IT.

Coincidentally, I have had similar discussions with customers regularly over the past quarters. More and more companies adopt Apache Kafka to solve these challenges and build a Postmodern ERP with flexible, scalable supply chain processes.

Kafka as the Foundation of a Postmodern ERP

If you follow my blog and presentations, you know that Kafka is used in all areas where an ERP is relevant, for instance, Industrial IoT (IIoT), Supply Chain Management, Edge Analytics, and many other scenarios. Check out “Kafka in Industry 4.0 and Manufacturing” to learn more details about various use cases.

Example: A Postmodern ERP built on top of Kafka

A Postmodern ERP built on top of Apache Kafka is part of this story:

Postmodern ERP with Apache Kafka SAP S4 Hana Oracle XML Web Services MES

This architecture shows a Postmodern ERP with various components. Note that the Core ERP is built on Apache Kafka. Many other systems and applications are integrated.

Each component of the Postmodern ERP has a different integration paradigm:

  • The TMS (Transportation Management System) is a legacy COTS application providing only a legacy XML-based SOAP Web Service interface. The integration is synchronous and not scalable but works for small transactional data sets (a bridge sketch follows after this list).
  • The LMS (Labor Management System) is a legacy homegrown application. The integration is implemented via Kafka Connect and a CDC (Change-Data-Capture) connector to push changes from the relational Oracle database in real-time into Kafka.
  • The SRM (Supplier Relationship Management) is a modern application built on top of Kafka itself. Integration with the Core ERP is implemented with Kafka-native replication technologies like MirrorMaker 2, Confluent Replicator, or Confluent Cluster Linking to provide a scalable real-time integration.
  • The MES (Manufacturing Execution System) is an SAP COTS product and part of the SAP S/4HANA product portfolio. The integration options include REST APIs, the Eventing API, and Java APIs. The right choice depends on the use case. Again, read Kafka SAP Integration – APIs, Tools, Connector, ERP et al. to see how complex the long answer is.
  • The CRM (Customer Relationship Management) is Salesforce, a SaaS cloud service, integrated via Kafka Connect and the Confluent connector.
  • Many more integrations to additional internal and external applications are needed in a real-world architecture.
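
For the TMS integration mentioned above, a minimal bridge could look like the following sketch: poll the SOAP web service synchronously and publish the response as an event into Kafka, where downstream consumers curate it. The endpoint, SOAP action, and payload are invented for illustration; in practice, a Kafka Connect HTTP/SOAP connector or an integration framework would be the more robust choice:

```python
# Hypothetical bridge between a SOAP-only TMS and Kafka: poll the web service,
# then publish the raw response as an event. Endpoint, SOAP action, and payload
# are invented placeholders.
import requests
from confluent_kafka import Producer

TMS_ENDPOINT = "https://tms.internal/soap/ShipmentService"   # placeholder URL

SOAP_REQUEST = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetOpenShipments xmlns="urn:tms:shipments"/>
  </soap:Body>
</soap:Envelope>"""

producer = Producer({"bootstrap.servers": "erp-kafka:9092"})

def poll_tms_and_publish():
    """Call the TMS synchronously and forward the raw XML response to Kafka."""
    response = requests.post(
        TMS_ENDPOINT,
        data=SOAP_REQUEST,
        headers={"Content-Type": "text/xml; charset=utf-8",
                 "SOAPAction": "urn:tms:shipments/GetOpenShipments"},
        timeout=30,
    )
    response.raise_for_status()
    # Downstream consumers (or a stream processor) parse and curate the XML.
    producer.produce("tms.shipments.raw", value=response.text.encode("utf-8"))
    producer.flush()

if __name__ == "__main__":
    poll_tms_and_publish()
```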

This is a hypothetical implementation of a Postmodern ERP. However, more and more companies implement this architecture for all the discussed benefits. Unfortunately, such a modern architecture also brings some challenges. Let’s explore them and discuss how to solve them with Apache Kafka and its ecosystem.

Solving the Challenges of a Postmodern ERP with Kafka

This section covers three main challenges of implementing a Postmodern ERP and how Kafka and its ecosystem help implement this architecture.

I quote the three main challenges from the blog post “Postmodern ERP: Just Another Buzzword?” and then explain how the Kafka ecosystem solves them more or less out-of-the-box.

Issue 1: More Complexity Between Systems!

“Because ERP modules and tools are built to work together, legacy systems can be a lot easier to configure than a postmodern solution composed entirely of best-of-breed solutions. Because postmodern ERP may involve different programs from different vendors, it may be a lot more challenging to integrate. For example, during the buying process, you would need to ask about compatibility with other systems to ensure that the solution that you have in mind would be sufficient.”

First of all, is your existing ERP system easy to integrate? Any ERP system older than five years uses proprietary interfaces (such as BAPI and IDoc in the case of SAP) or ugly, complex SOAP web services to integrate with other systems. Even if all the software components come from a single vendor, they were built by different business units or acquired companies. The codebases and interfaces speak very different languages and technologies.

So, while a Postmodern ERP requires complex integration between systems, so does any legacy ERP system! Nevertheless:

How Kafka Helps…

Kafka provides an open, scalable, elastic real-time infrastructure for implementing the middleware between your ERP and other systems. More details in the comparison between Kafka and traditional middleware such as ETL and ESB products.

Kafka Connect is a key piece of this architecture. It provides a Kafka-native integration framework.

Additionally, another key reason why Kafka makes these complex integrations successful is that Kafka truly decouples systems (in contrast to traditional messaging queues or synchronous SOAP/REST web services):

Domain-Driven Design and Decoupling for your Postmodern ERP with Kafka

The heart of Kafka is real-time and event-based. Additionally, Kafka decouples producers and consumers with its storage capabilities and handles the backpressure and, optionally, the long-term storage of events and data. This way, batch analytics platforms, REST interfaces (e.g., mobile apps) with request-response, and databases can access the data, too. Learn more about “Domain-driven Design (DDD) for decoupling applications and microservices with Kafka“.

Understanding the relation between event streaming with Kafka and non-streaming APIs (usually HTTP/REST) is also very important in this discussion. Check out “Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies?” for more details.

The integration capabilities and true decoupling provided by Kafka make it possible to handle the complexity of integration between systems.

Issue 2: More Difficult Upgrades!

“This con goes hand in hand with the increased complexity between systems. Because of this increased complexity and the fact that the solution isn’t an all-in-one program, making system upgrades can be difficult. When updates occur, your IT team will need to make sure that the relationship between the disparate systems isn’t negatively affected.”

How Kafka Helps…

The issue with upgrades is solved with Kafka out-of-the-box. Remember: Kafka really decouples systems from each other due to its storage capabilities. You can upgrade one system without even informing the other systems and without downtime! Two reasons why this works so well and out-of-the-box:

  1. Kafka is backward compatible. Even if you upgrade the server-side (Kafka brokers, ZooKeeper, Schema Registry, etc.), the other applications and interfaces continue to work without breaking changes. Server-side and client-side can be updated independently. Sometimes an older application is not updated anymore at all because it will be replaced soon. That’s totally fine. An old Kafka client can speak to a newer Kafka broker.
  2. Kafka uses rolling upgrades. The system continues to work without any downtime. 24/7. For mission-critical workloads like ERP or MES transactions. From the outside, the upgrade will not even be noticed.

Let’s take a look at an example with different components of the Postmodern ERP:

Postmodern ERP - Replication between Kafka and ERP Components

In this case, we see different versions and distributions of Kafka being used:

  • The Tier 1 Supplier uses the fully-managed and serverless Confluent Cloud solution. It automatically upgrades to the latest Kafka release under the hood (this is never a problem due to backward compatibility). The client applications use pretty old versions of Kafka.
  • The Core ERP uses open-source Kafka as it is a homegrown solution, not standard software. The operations and support are handled by the company itself (pretty risky for such a critical system, but totally valid). The Kafka version is relatively new. One client application even uses a Kafka version, which is newer than the server-side, to leverage a new feature in Kafka Streams (Kafka is backward compatible in both directions, so this is not a problem).
  • The MES vendor uses Confluent Platform, which embeds Apache Kafka. The version is up-to-date as the vendor does regular releases and supports rolling upgrades.
  • Integration between the different ERP applications is implemented with Kafka-native replication tools such as MirrorMaker 2 or Confluent Cluster Linking. As discussed in a former section, various other integration options are available, including REST, Kafka Connect, native Kafka clients in any programming language, or any ETL or ESB tool.

Backward compatibility and rolling upgrades make updating systems easy and invisible for integrated systems. Business continuity is guaranteed out-of-the-box.

Issue 3: Lack of Access When Offline

“When you implement a cloud-based software, you need to account for the fact that you won’t be able to access it when you are offline. Many legacy ERP systems offer on-premise solutions, albeit with a high installation cost. However, this software is available offline. For cloud ERP solutions, you are reliant on the internet to access all of your data. Depending on your specific business needs, this may be a dealbreaker.”

How Kafka Helps…

Hybrid architectures are the new black. Local processing on-premise is required in most use cases. It is okay to build the next generation ERP in the cloud. But the integration between cloud and on-premise/edge is key for success. A great example is Mojix, a Kafka-native cloud platform for real-time retail & supply chain IoT processing with Confluent Cloud.

When tangible goods are produced and sold, some processing needs to happen on-premise (e.g., in a factory) or even closer to the edge (e.g., in a restaurant or retail store). No access to your data is a dealbreaker. No capability for local processing is a dealbreaker. Latency and cost of a cloud-only approach can be dealbreakers, too.

Kafka works well on-premise and at the edge. Plenty of examples exist, including Kafka-native bi-directional real-time replication between on-premise/edge and the cloud.
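
As a small illustration of the edge-resilience aspect, here is a sketch of an edge producer configured to tolerate short network outages by buffering events locally in the client and retrying deliveries. Broker addresses, topic names, and the timeout and buffer sizes are illustrative assumptions, not recommendations:

```python
# Sketch of an edge producer (e.g., in a factory) configured to survive short
# network outages: events are buffered in the client and retried until the link
# to the central or cloud cluster is back. Sizes and timeouts are illustrative.
import json
import time
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "factory-kafka:9092",   # local edge cluster or remote brokers
    "enable.idempotence": True,                  # no duplicates when retries kick in
    "message.timeout.ms": 600000,                # keep retrying each event for up to 10 minutes
    "queue.buffering.max.messages": 500000,      # generous in-memory buffer for offline windows
})

def report_delivery(err, msg):
    if err is not None:
        # After the timeout above, the event is dropped; log it for later reconciliation.
        print(f"Delivery failed for key {msg.key()}: {err}")

while True:
    event = {"machine_id": "press-07", "temperature_c": 81.4, "ts": time.time()}
    producer.produce("factory.sensor.readings",
                     key=event["machine_id"],
                     value=json.dumps(event),
                     callback=report_delivery)
    producer.poll(0)   # serve delivery callbacks without blocking
    time.sleep(1)
```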

I have covered these topics often already; therefore, I will just share a few links to read:

I specifically recommend the latter link. It covers hybrid architectures where processing at the edge (i.e. outside the data center) is key and required even offline, like in the following example running Kafka in a factory (including the server-side):

Edge Computing with Kafka in Manufacturing and Industry 4.0 MES ERP

The hybrid integration capabilities of Kafka and its ecosystem solve the issue of lack of access when offline.

Kafka and Event Streaming as Foundation for a Postmodern ERP Infrastructure

Postmodern ERP represents the next generation of ERP architectures. It is real-time, scalable, and open by using a combination of open source technologies and proprietary standard software. This blog post explored how software vendors and end-users leverage event streaming with Apache Kafka to implement a Postmodern ERP.

What are your experiences with ERP systems? Did you already implement a Postmodern ERP architecture? Which approach works best for you? What is your strategy? Let’s connect on LinkedIn and discuss it!
