In Memory Archives - Kai Waehner https://www.kai-waehner.de/blog/category/in-memory/ Technology Evangelist - Big Data Analytics - Middleware - Apache Kafka

Apache Kafka vs. Middleware (MQ, ETL, ESB) – Slides + Video https://www.kai-waehner.de/blog/2019/03/07/apache-kafka-middleware-mq-etl-esb-comparison/ Thu, 07 Mar 2019 15:45:15 +0000 http://www.kai-waehner.de/blog/?p=1423 This post shares a slide deck and video recording of the differences between an event-driven streaming platform like Apache Kafka and middleware like Message Queues (MQ), Extract-Transform-Load (ETL) and Enterprise Service Bus (ESB).

Learn the differences between an event-driven streaming platform like Apache Kafka and middleware like Message Queues (MQ), Extract-Transform-Load (ETL) and Enterprise Service Bus (ESB). The discussion covers best practices and anti-patterns, and also shows how these concepts and tools complement each other in an enterprise architecture.

This blog post shares my slide deck and video recording. I discuss the differences between Apache Kafka as an Event Streaming Platform and traditional integration middleware. Learn whether they are friends, enemies or frenemies.

Problems of Legacy Middleware

Extract-Transform-Load (ETL) is still a widely used pattern for moving data between systems via batch processing. Because of its limitations in today's world, where real time is the new standard, many enterprises use an Enterprise Service Bus (ESB) as the integration backbone between any kind of microservice, legacy application or cloud service, moving data via SOAP / REST web services or other technologies. Stream processing is often added as yet another component in the enterprise architecture to correlate different events and implement contextual rules and stateful analytics. Using all of these components introduces challenges and complexity in development and operations.

Legacy Middleware (MQ, ESB, ETL, etc)

Apache Kafka – An Open Source Event Streaming Platform

This session discusses how teams in different industries solve these challenges by building a native event streaming platform from the ground up instead of using ETL and ESB tools in their architecture. This allows them to build and deploy independent, mission-critical, real-time streaming applications and microservices. The architecture leverages distributed processing and fault tolerance with fast failover, no-downtime rolling deployments and the ability to reprocess events, so you can recalculate output when your code changes. Integration and stream processing are still key functionality, but they can be realized natively in real time instead of using additional ETL, ESB or stream processing tools.

A concrete example architecture shows how to leverage the widely adopted open source framework Apache Kafka to build a complete, mission-critical, scalable, highly performant streaming platform. Messaging, integration and stream processing are all built on top of the same strong Kafka foundation and deployed on premise, in the cloud or in hybrid environments. In addition, the open source Confluent projects built on top of Apache Kafka add features such as a Schema Registry, clients for additional programming languages like Go or C, and many pre-built connectors for various technologies.
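To make the idea of "integration and stream processing natively on Kafka" more concrete, here is a minimal Kafka Streams sketch in Java. It is not taken from the talk; the topic names "payments" and "validated-payments" and the broker address "localhost:9092" are made-up examples. The application consumes events from one topic, filters them continuously and writes the result back to another topic, without a separate ESB or ETL deployment.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class PaymentFilterApp {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "payment-filter");     // also used as consumer group id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> payments = builder.stream("payments");        // hypothetical input topic

        // Continuous filtering: only events that contain an amount field are forwarded.
        // Integration and stream processing run inside the same lightweight Java application.
        payments.filter((key, value) -> value != null && value.contains("\"amount\""))
                .to("validated-payments");                                    // hypothetical output topic

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Scaling out is just a matter of starting more instances with the same application id; Kafka rebalances the topic partitions across them automatically.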

Apache Kafka Middleware

Slides: Apache Kafka vs. Integration Middleware

Here is the slide deck:

Video Recording: Kafka vs. MQ / ETL / ESB – Friends, Enemies or Frenemies?

Here is the video recording where I walk you through the above slide deck:

Article: Apache Kafka vs. Enterprise Service Bus (ESB)

I also published a detailed blog post on Confluent blog about this topic in 2018:

Apache Kafka vs. Enterprise Service Bus (ESB)

Talk and Slides from Kafka Summit London 2019

The slides and video recording from Kafka Summit London 2019 (which are similar to above) are also available for free.

Why Apache Kafka instead of Traditional Middleware?

If you don't want to spend a lot of time on the slides and recording, here is a short summary of how Apache Kafka differs from traditional middleware:

Why Apache Kafka instead of Integration Middleware

Questions and Discussion…

Please share your thoughts, too!

Do you see similar architectures in your infrastructure? Do you face similar challenges? Do you like the concepts behind an Event Streaming Platform (aka Apache Kafka)? How do you combine legacy middleware with Kafka? What is your strategy for integrating the modern and the old (technology) world? Is Kafka part of that architecture?

Please let me know either via a comment or via LinkedIn, Twitter, email, etc. I am curious about other opinions and experiences (and people who disagree with my presentation).

Streaming Analytics with Analytic Models (R, Spark MLlib, H20, PMML) https://www.kai-waehner.de/blog/2016/03/03/streaming-analytics-with-analytic-models-r-spark-mllib-h20-pmml/ Thu, 03 Mar 2016 15:51:01 +0000 http://www.kai-waehner.de/blog/?p=1019 Closed Big Data Loop: 1) Finding Insights with R, H20, Apache Spark MLlib, PMML and TIBCO Spotfire. 2) Putting Analytic Models into Action via Event Processing and Streaming Analytics.

In March 2016, I gave a talk at Voxxed Zurich about "How to Apply Machine Learning and Big Data Analytics to Real Time Processing".

Kai_Waehner_at_Voxxed_Zurich

Finding Insights with R, H20, Apache Spark MLlib, PMML and TIBCO Spotfire

"Big Data" is one of the biggest buzzwords these days. Large amounts of historical data are stored in Hadoop or other platforms. Business Intelligence tools and statistical computing are used to draw new knowledge from this data and to find patterns, for example for promotions, cross-selling or fraud detection. The key challenge is how these findings from historical data can be applied to new transactions in real time to make customers happy, increase revenue or prevent fraud.

Putting Analytic Models into Action via Event Processing and Streaming Analytics

"Fast Data" via stream processing is the solution for embedding patterns, which were obtained from analyzing historical data, into future transactions in real time. The following slide deck uses several real world success stories to explain the concepts behind stream processing and its relation to Apache Hadoop and other big data platforms. I discuss how patterns and statistical models built with R, Apache Spark MLlib, H20 and other technologies can be integrated into real-time processing using open source stream processing frameworks (such as Apache Storm, Spark Streaming or Flink) or products (such as IBM InfoSphere Streams or TIBCO StreamBase). A live demo shows the complete development lifecycle, combining analytics with TIBCO Spotfire, machine learning via R and stream processing via TIBCO StreamBase and TIBCO Live Datamart.
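As a rough, product-neutral illustration of the concept, the following Java sketch scores every incoming event with a model that was trained offline. The ChurnModel interface, the 0.8 threshold and the event fields are hypothetical placeholders; in a real project the model might be loaded from a PMML file or called via an R runtime, and the loop would run inside one of the stream processing frameworks or products mentioned above.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class StreamingScorer {

    /** Hypothetical abstraction over a model trained offline (e.g. exported as PMML). */
    interface ChurnModel {
        double scoreChurnProbability(Map<String, Object> event);
    }

    private final ChurnModel model;
    private final BlockingQueue<Map<String, Object>> events = new LinkedBlockingQueue<>();

    StreamingScorer(ChurnModel model) {
        this.model = model;
    }

    /** Called by the ingest layer (messaging adapter, sensor connector, ...). */
    void onEvent(Map<String, Object> event) {
        events.add(event);
    }

    /** Scores each event as it arrives and triggers an action if the churn risk is high. */
    void run() throws InterruptedException {
        Map<String, Object> event;
        while ((event = events.poll(1, TimeUnit.SECONDS)) != null) {
            double churnProbability = model.scoreChurnProbability(event);
            if (churnProbability > 0.8) {
                // e.g. push a retention offer in real time or open a case in a BPM system
                System.out.println("High churn risk for customer " + event.get("customerId"));
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Dummy stand-in for a model trained offline with R, Spark MLlib or H20.
        ChurnModel dummyModel = event -> ((Double) event.getOrDefault("supportCalls", 0.0)) > 3 ? 0.9 : 0.1;
        StreamingScorer scorer = new StreamingScorer(dummyModel);
        scorer.onEvent(Map.of("customerId", "4711", "supportCalls", 5.0));
        scorer.run();   // in a real deployment this loop would run inside the stream processing engine
    }
}
```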

Slide Deck from Voxxed Zurich 2016

Here is the slide deck:

Comparison of Stream Processing Frameworks and Products https://www.kai-waehner.de/blog/2015/10/25/comparison-of-stream-processing-frameworks-and-products/ Sun, 25 Oct 2015 16:28:59 +0000 http://www.kai-waehner.de/blog/?p=986 See how stream processing / streaming analytics frameworks (e.g. Apache Spark, Apache Flink, Amazon Kinesis) and products (e.g. TIBCO StreamBase, Software AG's Apama, IBM InfoSphere Streams) are categorized and compared. Besides, understand how stream processing is related to Big Data platforms such as Apache Hadoop and machine learning (e.g. R, SAS, MATLAB).

See how products, libraries, and frameworks that fall under "streaming data analytics" use cases are categorized and compared.

Streaming Analytics processes data in real time while it is in motion. This concept and technology emerged several years ago in financial trading, but it is growing increasingly important these days due to digitalization and the Internet of Things (IoT). The following slide deck from a recent conference talk covers:

  • Real world success stories from different industries (Manufacturing, Retailing, Sports)
  • Alternative Frameworks and Products for Stream Processing
  • Complementary Relationship to Data Warehouse, Apache Hadoop, Statistics, Machine Learning, Open Source R, SAS, Matlab, etc.

Stream Processing Frameworks and Products

The following picture shows the key differences between frameworks (no matter if open source such as Apache Storm, Apache Flink, Apache Spark or closed source such as Amazon Kinesis) and products (such as TIBCO StreamBase / Live Datamart, IBM InfoSphere Streams, Software AG’s Apama).

frameworks

Of course, you can implement everything by writing code and using one or more frameworks. However, besides several other benefits, the key differentiator of using a product is time to market. You can realize projects in weeks instead of months or even years. Delivering quickly is the number one priority of most enterprises these days in a world where the only constant is change!

I recommend that you choose one or two frameworks and one or two products to implement a proof of concept (POC): spend, say, five days with each one to implement a streaming analytics use case, including integration of input feeds or sensors, correlation / sliding windows / patterns, simulation and testing, and a live user interface to monitor and act proactively. At the end, you can compare the results and decide which option fits you best.
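To give a feeling for the kind of logic such a POC contains, here is a small plain-Java sketch of a sliding window counter, independent of any of the frameworks or products named above. The window length, the error threshold and the alert action are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Counts events within a sliding time window of fixed length. */
public class SlidingWindowCounter {

    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    public SlidingWindowCounter(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Records one event and returns the number of events seen within the last window. */
    public synchronized int onEvent(long eventTimeMillis) {
        timestamps.addLast(eventTimeMillis);
        // Evict everything that has fallen out of the window.
        while (!timestamps.isEmpty() && timestamps.peekFirst() < eventTimeMillis - windowMillis) {
            timestamps.removeFirst();
        }
        return timestamps.size();
    }

    public static void main(String[] args) {
        SlidingWindowCounter errorsPerMinute = new SlidingWindowCounter(60_000);
        long now = System.currentTimeMillis();
        for (int i = 0; i < 5; i++) {
            int count = errorsPerMinute.onEvent(now + i * 10_000L);   // one error every 10 seconds
            if (count >= 3) {
                System.out.println("Alert: " + count + " errors within the last minute");
            }
        }
    }
}
```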

Fast Data and Streaming Analytics in the Era of Hadoop, R and Apache Spark

The following slide deck discusses the above topics in much more detail:

Parts of this (extensive) slide deck were used for talks at several international conferences such as JavaOne 2015 in San Francisco. I appreciate any feedback about the content so I can improve it continuously. If you want to learn more about Streaming Analytics and its relation to Big Data and Apache Hadoop, I recommend the following InfoQ article: Real-Time Stream Processing as Game Changer in a Big Data World with Hadoop and Data Warehouse.

Difference between a Data Warehouse and a Live Datamart? https://www.kai-waehner.de/blog/2015/10/09/difference-between-a-data-warehouse-and-a-live-datamart/ Fri, 09 Oct 2015 07:47:46 +0000 http://www.kai-waehner.de/blog/?p=980 Data Warehouses have existed for many years in almost every company. While they are still as good and relevant for the same use cases as they were 20 years ago, they cannot solve the new challenges, and those sure to come, of an ever-changing digital world. The following sections clarify when to still use a Data Warehouse and when to use a modern Live Datamart instead.


What is a Data Warehouse (DWH)?

A Data Warehouse is a central repository of integrated data from one or more disparate sources. It stores historical data and is used to create analytical reports for knowledge workers throughout the enterprise. A DWH includes a server, which stores the historical data, and a client for analysis and reporting.

An ETL (Extract-Transform-Load) process extracts data from homogeneous or heterogeneous data sources such as files or relational databases, transforms it into a proper format or structure for querying and analysis, and loads it into the warehouse. Data is usually transferred in long-running batch processes from operational databases to the DWH. By the time data gets into the DWH, it is already at rest and some minutes, hours, or even days old.
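For readers who have never looked behind an ETL tool, the following minimal JDBC sketch shows the batch pattern in plain Java: extract rows from an operational database, transform them, and load them into a DWH fact table. The connection URLs, table and column names are hypothetical, and a real ETL job would add scheduling, error handling and bulk loading.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class NightlyOrderEtl {

    public static void main(String[] args) throws SQLException {
        // Extract from the operational database, load into the DWH (both URLs are placeholders).
        try (Connection source = DriverManager.getConnection("jdbc:postgresql://oltp-host/orders", "etl", "secret");
             Connection target = DriverManager.getConnection("jdbc:postgresql://dwh-host/warehouse", "etl", "secret");
             PreparedStatement extract = source.prepareStatement(
                     "SELECT order_id, customer_id, amount_cents, created_at FROM orders WHERE created_at >= CURRENT_DATE - 1");
             PreparedStatement load = target.prepareStatement(
                     "INSERT INTO fact_orders (order_id, customer_id, amount_eur, order_date) VALUES (?, ?, ?, ?)")) {

            try (ResultSet rows = extract.executeQuery()) {
                while (rows.next()) {
                    // Transform: convert cents to euros before loading.
                    load.setLong(1, rows.getLong("order_id"));
                    load.setLong(2, rows.getLong("customer_id"));
                    load.setBigDecimal(3, rows.getBigDecimal("amount_cents").movePointLeft(2));
                    load.setDate(4, rows.getDate("created_at"));
                    load.addBatch();
                }
                load.executeBatch();
            }
        }
    }
}
```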

Widespread DWH products are Teradata, EMC Greenplum and IBM Netezza. A client, often called a Business Intelligence (BI) or Data Discovery tool, is either part of the server product (usually just used for reporting, e.g. weekly or monthly sales reports) or an independent solution such as TIBCO Spotfire, which gives business users the ability to explore the data easily and find new patterns or other insights. Examples of reports range from annual and quarterly comparisons and trends to detailed daily sales analyses.

Finally, we can classify a Data Warehouse that is deployed for, and focused on, a single subject or functional area (such as sales, finance or marketing) as a Datamart. Next, we explore how a Live Datamart can enhance your business.

What is a Live Datamart (LDM)?

A Live Datamart is like a Data Warehouse or a Datamart derived from a Data Warehouse, but for real-time streaming data from sensors, social feeds, trading markets, and other messaging systems. It provides a push-based, real-time analytics solution that enables business users to analyze, anticipate, and receive alerts on key events as they occur, and act on opportunities or threats while they matter. You can manage and override escalations while they are happening.

The key technical difference from the "static database" of a DWH is the continuous query engine of an LDM server, which processes high-speed streaming data, creates fully materialized live data tables, manages ad-hoc queries from clients, and continuously pushes live results as conditions change in real time.
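The push-based behavior is easiest to see in code. The sketch below is purely conceptual and does not use the real TIBCO LiveView API; the ContinuousQueryEngine interface and the connect() helper are hypothetical. The point is that the client registers a query once and then receives updates whenever matching data changes, instead of polling a static table.

```java
import java.util.Map;
import java.util.function.Consumer;

/** Conceptual sketch of a continuous query client (hypothetical API, not TIBCO LiveView). */
public class LiveOrdersDashboard {

    /** A continuous query engine keeps the query registered and pushes every change. */
    interface ContinuousQueryEngine {
        void subscribe(String query, Consumer<Map<String, Object>> onChange);
    }

    public static void main(String[] args) {
        ContinuousQueryEngine engine = connect();   // hypothetical connection to the LDM server

        // Registered once; from now on the server pushes every row that starts or stops matching.
        engine.subscribe(
                "SELECT region, SUM(amount) FROM orders WHERE status = 'OPEN' GROUP BY region",
                row -> System.out.println("Live update: " + row));
    }

    private static ContinuousQueryEngine connect() {
        // Placeholder: a real client would open a connection to the Live Datamart server here.
        return (query, onChange) -> { /* no-op in this sketch */ };
    }
}
```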

The streaming data is ingested, normalized, and viewed in one user interface – the single LDM client. The client can be:

  • Rich client with out-of-the-box support for tables, charts, and queries via “drag & drop” user interface
  • Self-developed custom rich client using Java or .NET APIs
  • Web user interface integrated into a website, portal or mobile application using standards such as HTML5 and JavaScript

From an end-user perspective, an LDM client can be used, for example, by a power user on a laptop, by the operations center on a big screen, or by people on-site at customers using tablets. Of course, events can also be handled automatically where appropriate (e.g. by sending an alert to another system).

Combination of Historical and Real-Time Data

Of course, a Live Datamart can also connect to a historical database and define queries to be executed against that database. To an end user, LiveView makes historical tables look just like live tables, which allows users to access both types of data, live and historical, in the same way, with one user interface. In addition, a Live Datamart can easily populate historical databases based on the real-time data it has captured, either with batch end-of-day loads or with parallel capture. See this blog post for some example use cases.

TIBCO Live Datamart is the only option available on the market that lets you combine automated streaming analytics and proactive human interaction in one toolset.

When to use which one?

Essentially, a conventional Data Warehouse or Datamart helps you manage the business based on yesterday's data, while a Live Datamart helps you manage intraday data as it arrives.

Use a Data Warehouse in combination with a Business Intelligence tool for analysis and reporting of historical data. This way, you can analyze and compare different strategies, departments, financial data, order information, etc. with regards to revenue, costs, and other KPIs. You can also find patterns in historical data and implement these patterns in real time with streaming analytics for new events (e.g. fraud detection, predictive fault management, cross-selling).

Use a Live Datamart to manage operations in real time while they are happening, instead of when it is too late. This way, you can change marketing strategies, adjust cross-selling offers, or repair and replace machines and devices that will (probably) break soon. A Live Datamart is not just a dashboard for monitoring: it is actionable!

In summary, the key difference is that a Live Datamart allows you to be proactive, both automatically and with human interaction (whichever is appropriate), while events are happening. A Data Warehouse only allows you to analyze events that have already happened.

Slide Deck and Webinar

Here is a slide deck discussing this topic:

The following 15-minute on-demand webinar contains a video discussing the above slides.

Right Technology, Framework or Tool to Build Microservices https://www.kai-waehner.de/blog/2015/05/27/right-technology-framework-or-tool-to-build-microservices/ Wed, 27 May 2015 04:05:42 +0000 http://www.kai-waehner.de/blog/?p=954 The following slide deck shows plenty of different technologies (e.g. REST, WebSockets), frameworks (e.g. Apache CXF, Apache Camel, Puppet, Docker) or tools (e.g. TIBCO BusinessWorks, API Exchange) to realize Microservices.

Last week, I gave a talk at a German conference (Karlsruher Entwicklertag 2015) about Microservices. The following slide deck shows plenty of different technologies (e.g. REST, WebSockets), frameworks (e.g. Apache CXF, Apache Camel, Puppet, Docker) or tools (e.g. TIBCO BusinessWorks, API Exchange) to realize Microservices.

Abstract: How to Build Microservices

Microservices are the next step after SOA: Services implement a limited set of functions. Services are developed, deployed and scaled independently. This way you get shorter time to results and increased flexibility.

Microservices have to be independent regarding build, deployment, data management and business domains. A solid Microservices design requires single responsibility, loose coupling and a decentralized architecture. A Microservice can be closed or open to partners and the public via APIs.

This session discusses technologies such as REST, WebSockets, OSGi, Puppet, Docker, Cloud Foundry, and many more, which can be used to build and deploy Microservices. The main part shows different open source frameworks and commercial tools for building Microservices on top of these technologies. Live demos illustrate the differences. The audience will learn how to choose the right alternative for building Microservices.

Key Messages: Integration, Real Time Event Correlation, TCO, Time-to-Market

I used three key messages within my talk to explain the complexity and variety of different Microservice concepts:

  • Integration is key for success of Microservices
  • Real time event correlation is the game changer
  • TCO and Time-to-Market are major aspects for tool selection

Slide Deck

Here is the slide deck, which I presented at Karlsruher Entwicklertag in Germany:

TIBCO BusinessWorks and StreamBase for Big Data Integration and Streaming Analytics with Apache Hadoop and Impala https://www.kai-waehner.de/blog/2015/04/14/tibco-businessworks-and-streambase-for-big-data-integration-and-streaming-analytics-with-apache-hadoop-and-impala/ Tue, 14 Apr 2015 14:41:46 +0000 http://www.kai-waehner.de/blog/?p=944 Apache Hadoop is getting more and more relevant. Not just for big data processing (e.g. MapReduce), but also in fast data processing (e.g. stream processing). Recently, I published two blog posts on the TIBCO blog to show how you can leverage TIBCO BusinessWorks 6 and TIBCO StreamBase to realize big data and fast data Hadoop use cases.

Apache Hadoop is getting more and more relevant. Not just for Big Data processing (e.g. MapReduce), but also for Fast Data processing (e.g. Stream Processing). Recently, I published two blog posts on the TIBCO blog to show how you can leverage TIBCO BusinessWorks 6 and TIBCO StreamBase to realize Big Data and Fast Data Hadoop use cases.

Micro Services Architecture = Death of Enterprise Service Bus (ESB)? https://www.kai-waehner.de/blog/2015/01/08/micro-services-architecture-death-enterprise-service-bus-esb/ Thu, 08 Jan 2015 15:08:06 +0000 http://www.kai-waehner.de/blog/?p=929 Challenges, requirements and best practices for creating a good Microservices architecture, and what role an Enterprise Service Bus (ESB) plays in this game.

These days, it seems like everybody is talking about microservices. You can read a lot about it in hundreds of articles and blog posts, but my recommended starting point would be this article by Martin Fowler, which initiated the huge discussion about this new architectural concept. This article is about the challenges, requirements and best practices for creating a good microservices architecture, and what role an Enterprise Service Bus (ESB) plays in this game.

Branding and Marketing: EAI vs. SOA vs. ESB vs. Microservices

Let’s begin with a little bit of history about Service-oriented Architecture (SOA) and Enterprise Service Bus to find out why microservices have become so trendy.

Many years ago, software vendors offered middleware for Enterprise Application Integration (EAI), often called an EAI broker or EAI backbone. This middleware was a central hub. Back then, SOA was just emerging, and the tool of choice was an ESB. Many vendors simply rebranded their EAI tool as an ESB; nothing else changed. Some time later, new ESBs came up without a central hub, using distributed agents instead. So, the term ESB has been used for different kinds of middleware. Many people do not like the term "ESB" because they only know the central variant, not the distributed one. Today, some people even talk about this topic using the term "NoESB" or the Twitter hashtag #noesb (similar to NoSQL). Let's see whether the term "NoESB" comes up more often in the future…

Therefore, vendors often avoid talking about an ESB. They cannot sell a central integration middleware anymore, because everything has to be distributed and flexible. Today, you can buy a service delivery platform. In the future, it might be a microservices platform or something similar. In some cases, the code base might still be the same as that of the EAI broker from 20 years ago. What all these products have in common is that you can solve integration problems by implementing "Enterprise Integration Patterns".

To summarize the history of branding and marketing of integration products: pay no attention to sexy, impressive-sounding names! Instead, make looking at the architecture and features the top priority. Ask yourself what business problems you need to solve, and evaluate which architecture and product might help you best. It is amazing how many people still think of a "central ESB hub" when I say "ESB".

Requirements for a Good Microservices Architecture

Here are six key requirements to overcome the challenges and leverage the full value of microservices:

  • Services Contract
  • Exposing microservices from existing applications
  • Discovery of services
  • Coordination across services
  • Managing complex deployments and their scalability
  • Visibility across services

The full article discusses these six requirements in detail, and also answers the question of how a modern ESB relates to a Microservices architecture. Read the full article here:

Do Good Microservices Architectures Spell the Death of the Enterprise Service Bus?

Comments are appreciated, either on this blog post or on the full article. Thanks.

Intelligent BPM Suite (iBPMS): Implementation of a CRM Use Case https://www.kai-waehner.de/blog/2014/12/03/intelligent-bpm-suite-ibpms-implementation-crm-use-case/ Wed, 03 Dec 2014 07:39:03 +0000 http://www.kai-waehner.de/blog/?p=889 An intelligent business process (iBPM, iBPMS) combines big data, analytics and business process management (BPM) – including case management! This post implements a use case using big data / fast data analytics with TIBCO ActiveMatrix BPM, BusinessWorks, StreamBase, Spotfire and Tibbr.

Today, humans have to interpret large sets of different data to make a decision. Relying on gut feeling is nothing but gambling. Therefore, big data analytics is becoming more important every year for making better decisions. However, just doing big data analytics is not enough. In many use cases, systematic and monitored human interactions are just as important for getting the best outcomes.

Making the data "actionable" is the real challenge! Seeing the information that helps to make a decision on a composite dashboard, using business intelligence (BI) and big data analytics, is just the first step, and it is where too many companies stop. An enterprise must be able to fire off the business process that executes the decision made based on the data. That is where the buzzword "Intelligent Business Process Management Suite (iBPMS)" comes into play.

iBPMS = BPM + Big Data / Fast Data Analytics + Social Integration

iBPMS is a term introduced by Gartner, an information technology research and advisory firm, to indicate the evolution of the classic BPMS into next-generation BPM, which integrates big data analytics, social media and mobile devices into an organization's business process support.

Some other companies and analysts use other names, for example "Operational Business Process" or "Intelligent Business Operation (IBO)". Many people abbreviate this topic as iBPM instead of iBPMS. In the end, however, everybody is talking about intelligent business processes.

An intelligent business process combines big data, analytics and business process management (BPM) – including case management! This enables applications and humans to make data-driven decisions based on big data analytics. Two flavors exist: "process starts big data analytics" (e.g. a recommendation engine) and "big data analytics starts a process" (e.g. prevention of a flu epidemic).

Let’s look at a real world use case to show why realizing intelligent business processes makes a lot of sense, and how to actually build such a solution.

Use Case: Improved Customer Relationship Management (CRM)

A casino wants to increase customer satisfaction. Therefore, the casino leverages big data analytics and gives customers a digital identity (including hotel preferences, gambling behavior, etc.), so that customers can get personalized offers in real time. For instance, the casino can offer a 30% coupon for a show ticket (which is not sold out yet) or a free steak tonight (which would perish by tomorrow anyway; besides, many seats are currently available in the restaurant). Beyond increasing customer satisfaction, the casino creates further benefits such as cost reduction or increased revenue. For instance, a customer visiting a show or eating a steak will also spend money on drinks.

Products to Implement an Intelligent Business Process

It does not matter if you use Salesforce CRM or any other product for your customer management. Also, many different BPM and integration tools are available on the market: TIBCO, IBM, Oracle, Software AG, to name a few leaders…

The following implementation uses different TIBCO products to implement the use case. Even if you have never seen these products before, you will understand easily how these tools work together to realize an intelligent business process.

All these products are loosely coupled, but highly integrated. Each TIBCO product has a specific task to solve. Nevertheless, they connect to each other very well via specific adapters (e.g. the Tibbr plugin of BusinessWorks) or standards for interoperability such as SOAP / REST Web Services or JMS messaging.

Let’s discuss the steps that are necessary to realize the described use case.

Step 1: Integration of Siebel CRM, SAP ERP and CICS Mainframe with TIBCO BusinessWorks ESB

The first step is the integration of different systems and interfaces. Complex transformations have to be realized to format and process all required information correctly. TIBCO BusinessWorks is used as the integration platform (ESB).

Tasks:

  • Integrate customer data from Siebel CRM.
  • Integrate casino data from SAP ERP.
  • Integrate payment information from CICS mainframe.
  • Process incoming gambling information from slot machines via EDI.
  • Push transformed streaming events in real time to output connector.

iBPMS_Step_1_Integration

Depending on the use case, you can integrate any other technology, e.g. Hadoop or a Data Warehouse (DWH) such as Teradata or HP Vertica.

Step 2: Real Time Streaming Analytics with TIBCO StreamBase

The pre-processed data is pushed into a stream processing engine for doing real time streaming analytics. TIBCO StreamBase is a mature product with awesome tooling for these tasks.

Tasks:

  • Filter and analyze all kinds of events.
  • Correlate relevant events.
  • If possible, act in real time automatically.
  • Otherwise, start a business process for human interaction.

For example, you correlate events such as "customer lost a lot of money gambling", "complaint via Twitter", "good weather" and "seats available at pool bar" to send a 50% coupon for a cocktail at the pool bar. If no free seats are available, a human has to decide how to improve customer satisfaction. If a customer complains about his current situation, another, more complex business process has to be initiated using the case management features of the BPM tool.
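StreamBase correlation rules are normally modeled graphically in EventFlow or with StreamSQL, but the decision logic of the pool-bar example can also be sketched in plain Java. Everything below (the event attributes, the 500 loss threshold, the fallback to a BPM process) is hypothetical and only illustrates what correlating these events means.

```java
/** Hypothetical correlation rule for the pool-bar coupon example (thresholds made up). */
public class PoolBarCouponRule {

    /** Returns true if all conditions for sending a 50% cocktail coupon are met. */
    static boolean shouldSendCoupon(double gamblingLossToday, boolean complainedOnTwitter,
                                    boolean goodWeather, int freeSeatsAtPoolBar) {
        return gamblingLossToday > 500.0      // "customer lost a lot of money gambling"
                && complainedOnTwitter        // "complaint via Twitter"
                && goodWeather                // "good weather"
                && freeSeatsAtPoolBar > 0;    // "seats available at pool bar"
    }

    public static void main(String[] args) {
        if (shouldSendCoupon(750.0, true, true, 12)) {
            System.out.println("Send 50% cocktail coupon for the pool bar");
        } else {
            System.out.println("No automatic action; start a BPM process for human interaction");
        }
    }
}
```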

iBPMS_Step_2_Stream_Processing

Step 3: Automatic Reaction or Human Interaction with TIBCO ActiveMatrix BPM

If a task cannot be automated completely, a business process instance is started to react appropriately to an event. This can be a relatively simple process with human interaction and automated steps, or a more complex situation requiring flexible case management. TIBCO ActiveMatrix BPM is the right tool for this job. The current release already includes several case management features!

Tasks:

  • Do something to make the customer happy again, e.g. check whether the steak restaurant has plenty of steaks in stock, then call the customer on his mobile phone and offer a steak coupon.
  • Send the steak coupon via SMS or email.
  • Or escalate to a manager if the customer does not appreciate the offer. Case management features can be used here to enable humans to react more flexibly to unexpected events.

iBPMS_Step_3_BPM

 

Step 4: Work Distribution to Mobile Apps with TIBCO Tibbr (Social Enterprise Platform)

TIBCO Tibbr is a social enterprise platform similar to Facebook, but for enterprises, including several additional and advanced features such as security, customization and integration with other TIBCO and non-TIBCO products and applications. Tibbr's process notifications are used for work distribution to occasional users, who can interact via iPhone or Android smartphone apps or other mobile clients.

In the above example, the manager would receive a push message about the escalation on his iPhone while walking around in the casino. He can react immediately by sending a Tibbr message to a colleague or starting another business process.

iBPMS_Step_4_Social

Step 5: TIBCO BPM Analytics – A Picture is Worth a Thousand Processes

After the implementation is deployed and running, you can investigate and improve your processes using explorative data analytics. TIBCO BPM Analytics provides end-to-end process visibility, including self-service, interactive, drag-and-drop reports for business users. TIBCO Spotfire – a Business Intelligence tool for explorative data analytics – is integrated into TIBCO ActiveMatrix BPM (and many other TIBCO products) for that reason.

iBPMS_Step_5_Analytics

 

The Realization of Intelligent Business Processes (iBPMS) is no Rocket Science

The sections above showed an implementation of an intelligent business process using iBPMS tooling. A BPM solution is not the only thing you need to realize intelligent business processes: you also need to integrate different enterprise applications and big data / fast data analytics. Integration and separation of concerns are key to the success of such a project. Integration of social enterprise platforms is becoming prevalent for supporting occasional users. iBPMS sounds like a very complex topic at first. However, it is easy to implement if you can use loosely coupled, but integrated, tooling that solves your requirements.

 

Kai Wähner works as Technical Lead at TIBCO. All opinions are his own and do not necessarily represent his employer. Kai's main area of expertise lies within the fields of Application Integration, Big Data, SOA, BPM, Cloud Computing, Java EE and Enterprise Architecture Management. He is a speaker at international IT conferences such as JavaOne, ApacheCon or OOP, writes articles for professional journals, and shares his experiences with new technologies on his blog. Contact: LinkedIn, @KaiWaehner or kontakt@kai-waehner.de.

Real World Use Cases and Success Stories for In-Memory Data Grids https://www.kai-waehner.de/blog/2014/11/24/real-world-use-cases-success-stories-memory-data-grids/ Mon, 24 Nov 2014 12:43:21 +0000 http://www.kai-waehner.de/blog/?p=876 Use Cases and Success Stories for In-Memory Data Grids, e.g. TIBCO ActiveSpaces, Oracle Coherence, Infinispan, IBM WebSphere eXtreme Scale, Hazelcast, Gigaspaces, GridGain, Pivotal Gemfire (Presentation by Kai Wähner at NoSQL Matters 2014 in Barcelona) - NOT SAP HANA :-)

NoSQL Matters Conference 2014

NoSQL Matters is a great conference about different NoSQL topics, where a lot of great NoSQL products and use cases are presented. In November 2014, I gave a talk about "Real World Use Cases and Success Stories for In-Memory Data Grids" in Barcelona, Spain. I discussed several different use cases which our TIBCO customers implemented using our in-memory data grid "TIBCO ActiveSpaces". I will present the same content at data2day, a German conference in Karlsruhe about big data topics.

In-Memory Data Grids: TIBCO ActiveSpaces, Oracle Coherence, Infinispan, IBM eXtreme Scale, Hazelcast, Gigaspaces, etc.

A lot of in-memory data grid products are available: TIBCO ActiveSpaces, Oracle Coherence, Infinispan, IBM WebSphere eXtreme Scale, Hazelcast, Gigaspaces, GridGain and Pivotal Gemfire, to name the most important ones. See the great graphic by the 451 Research Group, which shows different databases and how data grids fit into that landscape. You can always get the newest version: 451 DataBase Landscape.

It is important to understand that an in-memory data grid offers much more than just caching and storing data in memory. Further in-memory features include event processing, publish / subscribe, ACID transactions, continuous queries and fault tolerance, to name a few. Therefore, let's discuss one example in the next section to get a better understanding of what an in-memory data grid actually is.
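A small example makes the difference to a pure cache visible. The sketch below uses the open source Hazelcast API (Hazelcast 3.x package names assumed) to store values in the grid and to get notified about changes via entry listeners, which is essentially publish / subscribe on grid data. The map name and keys are made up.

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import com.hazelcast.map.listener.EntryAddedListener;
import com.hazelcast.map.listener.EntryUpdatedListener;

public class GridEventingExample {

    public static void main(String[] args) {
        HazelcastInstance grid = Hazelcast.newHazelcastInstance();
        IMap<String, Double> prices = grid.getMap("prices");   // "prices" is a made-up map name

        // Publish / subscribe on grid data: every member or client can react to changes.
        prices.addEntryListener((EntryAddedListener<String, Double>) event ->
                System.out.println("New price: " + event.getKey() + " = " + event.getValue()), true);
        prices.addEntryListener((EntryUpdatedListener<String, Double>) event ->
                System.out.println("Price changed: " + event.getKey() + " = " + event.getValue()), true);

        prices.put("EURUSD", 1.0823);   // triggers the "added" listener on all subscribers
        prices.put("EURUSD", 1.0831);   // triggers the "updated" listener

        grid.shutdown();
    }
}
```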

TIBCO ActiveSpaces In-Memory Data Grid

TIBCO ActiveSpaces combines the best of NoSQL and in-memory features. The following description is taken from TIBCO's website:

To lift the burden of big data, TIBCO ActiveSpaces provides a distributed in-memory data grid that can increase processing speed so you can reduce reliance on costly transactional systems.

ActiveSpaces EE provides an infrastructure for building highly scalable, fault-tolerant applications. It creates large virtual data caches from the aggregate memory of participating nodes, scaling automatically as nodes join and leave. Combining the features and performance of databases, caching systems, and messaging software, it supports very large, highly volatile data sets and event-driven applications. And it frees developers to focus on business logic rather than on the complexities of distributing, scaling, and making applications autonomously fault-tolerant.

ActiveSpaces EE supplies configurable replication of virtual shared memory. This means that the space autonomously re-replicates and re-distributes lost data, resulting in an active-active fault-tolerant architecture without resource overhead.

Benefits

  • Reduce Management Cost: Off-load slow, expensive, and hard-to-maintain transactional systems.
  • Deliver Ultra-Low, Predictable Latency: Use peer-to-peer communication, avoiding intervention by a central server.
  • Drastically Improve Performance: Create next-generation elastic applications including high performance computing, extreme transaction processing, and complex event processing.
  • Simplify Administration: Eliminate the complexity of implementing and configuring a distributed caching platform using a command-line administration tool with shell-like control keys that provide command history, syntax completion, and context-sensitive help.
  • Become Platform Independent: Store database rows and objects and use the system as middleware to exchange information between heterogeneous platforms.
  • Speed Development: Enable data virtualization and let developers focus on business logic rather than on the details of data implementation.

If you want to learn more about TIBCO ActiveSpaces, take a look at a great recording from QCon 2013, in which TIBCO Fellow Jean-Noel Moyne discusses in-memory data grids in more detail.

SAP HANA is not an In-Memory Data Grid

I should write an additional blog post about this topic. Nevertheless, to make it clear: SAP HANA is not an in-memory data grid. This is important to mention because everybody thinks about SAP HANA when talking about in-memory, right? Take a look at the 451 database landscape mentioned above. SAP HANA is put into the "relational zone" under appliances (SAP HANA is only available as an appliance), whereas all the other products I named are put into the "Grid / Cache Zone".

SAP HANA is primarily used to reduce dependency on other relational databases (e.g. Oracle). It is designed to make SAP run faster, not to speed up other (non-SAP) applications. SAP HANA is more like a traditional database that is meant to "run reports faster" by leveraging the large amount of RAM on the servers. It is great for some analytical use cases, e.g. faster reporting and after-the-fact analysis.

Compared to other in-memory products (i.e. "real data grids") such as TIBCO ActiveSpaces and the other products mentioned above, SAP HANA misses several features such as implicit eventing (publish / subscribe) or deployment with flexible elasticity on commodity hardware. You can, of course, implement custom logic on SAP HANA with JavaScript or a proprietary SQL-like language (SQLScript). However, building several of the use cases in my presentation below is much more difficult with SAP HANA than with other "real data grid" products.

Be aware: I am not saying that SAP HANA is a bad product. However, it serves different use cases than in-memory data grids such as TIBCO ActiveSpaces! For example, SAP HANA is great for replacing Oracle RACs as the database backend for SAP ERP to speed up the system and improve the user experience.

Real World Use Cases and Success Stories for In-Memory Data Grids

My talk was not very technical. Instead, I discussed several different real world use cases and success stories for using in-memory data grids. Here is the abstract of my talk:

NoSQL is not just about different storage alternatives such as document stores, key value stores, graphs or column-based databases. The hardware is also getting much more important. Besides common disks and SSDs, enterprises are beginning to use in-memory storage more and more, because a distributed in-memory data grid provides very fast data access and updates. While its performance will vary depending on multiple factors, it is not uncommon to be 100 times faster than corresponding database implementations. For this reason and others described in this session, in-memory computing is a great solution for lifting the burden of big data, reducing reliance on costly transactional systems, and building highly scalable, fault-tolerant applications. The session begins with a short introduction to in-memory computing. Afterwards, different frameworks and product alternatives for implementing in-memory solutions are discussed. Finally, the main part of the session shows several different real world use cases where in-memory computing delivers business value by supercharging the infrastructure.

Here is the slide deck:

As always, I appreciate any feedback. Please post a comment or contact me via email, Twitter, LinkedIn or Xing…
