Docker Archives - Kai Waehner
https://www.kai-waehner.de/blog/category/docker/
Technology Evangelist - Big Data Analytics - Middleware - Apache Kafka

Service Mesh and Cloud-Native Microservices with Apache Kafka, Kubernetes and Envoy, Istio, Linkerd
https://www.kai-waehner.de/blog/2019/09/24/cloud-native-apache-kafka-kubernetes-envoy-istio-linkerd-service-mesh/ (Tue, 24 Sep 2019)

This blog post takes a look at cutting-edge technologies like Apache Kafka, Kubernetes, Envoy, Linkerd and Istio to implement a cloud-native service mesh for a scalable, robust and observable microservice architecture.

Microservice architectures are not a free lunch! Microservices need to be decoupled, flexible, operationally transparent, data-aware and elastic. Most material from recent years only discusses point-to-point architectures with tightly coupled and non-scalable technologies like REST / HTTP. This blog post takes a look at cutting-edge technologies like Apache Kafka, Kubernetes, Envoy, Linkerd and Istio to implement a cloud-native service mesh that solves these challenges and brings microservices to the next level of scale, speed and efficiency.

Here are the key requirements for building a scalable, reliable, robust and observable microservice architecture:

Key Requirements for Microservices

Before we go into more detail, let’s take a look at the key takeaways first:

  • Apache Kafka decouples services, including event streams and request-response
  • Kubernetes provides a cloud-native infrastructure for the Kafka ecosystem
  • Service Mesh helps with security and observability at ecosystem / organization scale
  • Envoy and Istio sit in the layer above Kafka and are orthogonal to the goals Kafka addresses

The following sections cover these thoughts in more detail. The end of the blog post contains a slide deck and video recording with more detailed explanations.

Microservices, Service Mesh and Apache Kafka

Apache Kafka became the de facto standard for microservice architectures. It goes far beyond reliable and scalable high-volume messaging. The distributed storage allows high availability and real decoupling between the independent microservices. In addition, you can leverage Kafka Connect for integration and the Kafka Streams API for building lightweight stream processing microservices in autonomous teams.
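To make this concrete, here is a minimal sketch of such a lightweight stream processing microservice built with the Kafka Streams API. The broker address, topic names and the filtering logic are assumptions for illustration only:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class OrderValidationService {

    public static void main(String[] args) {
        // Standard Kafka Streams configuration; broker address is an assumption.
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-validation-service");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Consume events from one topic, apply business logic, produce to another.
        // The service stays decoupled: it only knows the topics, not the other services.
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("orders");
        orders.filter((key, value) -> value != null && value.contains("\"valid\":true"))
              .to("validated-orders");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // The Streams API is just a library - no dedicated stream processing cluster required.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Scaling out is as simple as starting more instances with the same application id; Kafka assigns the topic partitions across all running instances automatically.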

A Service Mesh complements the architecture. It describes the network of microservices that make up such applications and the interactions between them. Its requirements can include discovery, load balancing, failure recovery, metrics, and monitoring. A service mesh also often has more complex operational requirements, like A/B testing, canary rollouts, rate limiting, access control, and end-to-end authentication.

I explore the problem of distributed microservice communication and how both Apache Kafka and service mesh solutions address it. This blog post takes a look at some approaches for combining both to build a reliable and scalable microservice architecture with decoupled and secure microservices.

Kafka Service Mesh Domain Driven Design

Discussions and architectures include various open source technologies like Apache Kafka, Kafka Connect, Kubernetes, HAProxy, Envoy, Linkerd and Istio.

Learn more about decoupling microservices with Kafka in the related blog post “Microservices, Apache Kafka, and Domain-Driven Design (DDD)”.

Cloud-Native Kafka with Kubernetes

Cloud-native infrastructures are scalable, flexible, agile, elastic and automated. Kubernetes became the de facto standard. Deployment of stateless services is pretty easy and straightforward. However, deploying stateful and distributed applications like Apache Kafka is much harder, and a lot of manual operations work is required. Kubernetes does NOT automatically solve Kafka-specific challenges like rolling upgrades, security configuration or data balancing between brokers. A Kafka Operator – implemented with Kubernetes Custom Resource Definitions (CRDs) – can help here!

The Operator pattern for Kubernetes aims to capture the key aim of a human operator who is managing a service or set of services. Human operators who look after specific applications and services have deep knowledge of how the system ought to behave, how to deploy it, and how to react if there are problems.

People who run workloads on Kubernetes often like to use automation to take care of repeatable tasks. The Operator pattern captures how you can write code to automate a task beyond what Kubernetes itself provides.

Different implementations of a Kafka Operator for Kubernetes exist: Confluent Operator, IBM’s / Red Hat’s Strimzi, and Banzai Cloud’s operator. I won’t go into more detail about the characteristics and advantages of a K8s Kafka Operator here. I already explained them in detail in another blog post (and the video below also discusses this topic):

Service Mesh with Kubernetes-based Technologies like Envoy, Linkerd or Istio

Service Mesh is a microservice pattern to move visibility, reliability, and security primitives for service-to-service communication into the infrastructure layer, out of the application layer.

A great, detailed explanation of the design pattern “service mesh” can be found here, including the following diagram which shows the relation between a control plane and the microservices with proxy sidecars:

design pattern service mesh

You can find much more great content about service mesh concepts and their implementations from the creators of frameworks like Envoy or Linkerd. Check out these two links, or just use Google for more information about the competing alternatives and their trade-offs.

(Potential) Features for Apache Kafka and Service Mesh

An event streaming platform like Apache Kafka and a service mesh on top of Kubernetes are cloud-native, orthogonal and complementary. Together they solve the key requirements for building a scalable, reliable, robust and observable microservice architecture:

Key Requirements for Microservices Solved with Kafka Kubernetes Envoy Istio

Companies already use Kafka together with service mesh implementations like Envoy, Linkerd or Istio today. You can easily combine them to add security, enforce rate limiting, or implement other related use cases. Banzai Cloud published one of the most interesting architectures: they use Istio to add security to Kafka brokers and ZooKeeper via Envoy proxies.

However, in the meantime, the support has gotten even better: the pull request for Kafka support in Envoy was merged in May 2019. This means you now have native Kafka protocol support in Envoy. The very interesting discussions about the challenges and potential features of implementing a Kafka protocol filter are also worth reading.

With native Kafka protocol support, you can do many more interesting things beyond L4 TCP filtering. Here are just some ideas (partly from the GitHub discussion above) of what you could do with L7 Kafka protocol support in a service mesh:

Protocol conversion from HTTP / gRPC to Kafka

  • Tap feature to dump to a Kafka stream
  • Protocol parsing for observability (stats, logging, and trace linking with HTTP RPCs)
  • Shadow requests to a Kafka stream instead of HTTP / gRPC shadow
  • Integrate with Kafka Connect and its whole ecosystem of connectors

Proxy features

  • Dynamic Routing
  • Rate limiting at both the L4 connection and L7 message level
  • Filter, add compression, …
  • Automatic topic name conversion (e.g. for canary release or blue/green deployment)

Monitoring and Tracing

  • Request logs and stats
  • Data lineage / audit log
  • Audit log by taking request logs and enriching them with user info
  • Client-specific metrics (byte rate per client id / consumer group, versions of the client libraries, consumer lag monitoring for the entire data center)

Security

  • SSL Termination
  • Mutual TLS (mTLS)
  • Authorization

Validation of Events

  • Serialization format (JSON, Avro, Protobuf, etc.)
  • Message schema
  • Headers, attributes, etc.

That’s awesome, isn’t it?

Microservices, Kafka and Service Mesh – Slide Deck and Video Recording

Let’s take a look at my slide deck and video recording to understand the requirements, challenges and opportunities of building a Service Mesh with Apache Kafka, its ecosystem, Kubernetes and Service Mesh technologies in more detail…

Here is the slide deck:

The video recording walks you through the slide deck:

Any thoughts or feedback? Please let me know via a comment or Tweet or let’s connect on LinkedIn.


Kafka Operator for Kubernetes – Confluent Operator to establish a Cloud-Native Apache Kafka Platform
https://www.kai-waehner.de/blog/2019/07/29/confluent-kafka-operator-cloud-native-apache-kafka-platform-kubernetes/ (Mon, 29 Jul 2019)

Kafka Operator for Kubernetes – Confluent Operator to establish a cloud-native Apache Kafka platform on Kubernetes (OpenShift, Cloud Foundry, hybrid cloud).

Confluent Operator is now GA for production deployments (Download Confluent Operator for Kafka here). This is a Kafka Operator for Kubernetes which provides automated provisioning and operations of an Apache Kafka cluster and its whole ecosystem (Kafka Connect, Schema Registry, KSQL, etc.) on any Kubernetes infrastructure.

Confluent Operator Kafka Operator for Kubernetes Download

I want to share a slide deck which explains:

  • Why Kubernetes is getting more and more traction to build a cloud-native infrastructure
  • Why this is relevant for Apache Kafka and Confluent Platform
  • The challenges running Kafka on Kubernetes
  • How Confluent Operator solves these problems providing a powerful Kafka Operator for Kubernetes

Cloud-Native vs. SaaS / Serverless

Software as a Service (SaaS) and serverless platforms provide software and services in the public cloud as a managed service. Cloud-native infrastructures allow you to leverage the features of SaaS / serverless in your own self-managed infrastructure (either on premise or in the public cloud, and without vendor lock-in).

What is Cloud Native? Many different definitions exist on the web. Two definitions which I like are “The Twelve-Factor App” and “10 key characteristics of The New Stack”.

Some of the key benefits of cloud-native infrastructures:

  • Scalable
  • Flexible
  • Agile
  • Elastic
  • Automated

This is very different from traditional bare metal or VM infrastructures. Even if you use containers like Docker, you don’t automatically get the above benefits. Providing a cloud-native infrastructure is a key requirement for building a DevOps infrastructure and culture. Note that technology is just one part of a fully successful DevOps mentality, of course.

Kubernetes Won the Container War

In the beginning, many cloud-native container platforms built their own cloud-native technology and infrastructure. Many of these solutions were open source, but only one took over. Just take a look at these Google Trends for the last five years:

Google Trends for Kubernetes (Mesosphere, Cloud Foundry, OpenShift)

In the meantime, most cloud-native infrastructure providers (such as Red Hat OpenShift, Mesosphere, Pivotal Cloud Foundry) moved their whole strategy towards supporting Kubernetes. These vendors enhance the user experience and add features to differentiate themselves from vanilla Kubernetes. OpenShift made this decision a few years earlier than most others; take a look at how the above trends reflect this decision. Furthermore, Kubernetes is now also available as a managed service on all major cloud providers (AWS, Azure, GCP).

Stateful Kubernetes Deployments using Operator Pattern

Kubernetes was mainly used for stateless deployments in the early phases (for instance to deploy REST microservices). Today, people deploy everything on Kubernetes because it adds a lot of value – as discussed in the section about cloud-native infrastructure above. This includes the Kafka backend and clients.

Stateful deployments of backend services leverage the Kubernetes Operator pattern. This applies to many infrastructure components like databases, messaging systems and search engines. The implementation of the Operator pattern includes standard Kubernetes objects like StatefulSets, ConfigMaps, Secrets and Persistent Volumes. However, the secret sauce is the custom Kubernetes controller and the Custom Resource Definitions (CRDs) which implement the unique application functionality for the specific stateful deployment.
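To illustrate the controller side of this pattern, here is a minimal sketch of a custom controller watching a hypothetical `KafkaCluster` custom resource, written with the fabric8 Kubernetes client (assuming version 6.x; the CRD group, kind and namespace are made up for illustration). A real Kafka Operator implements far more reconciliation logic, of course:

```java
import io.fabric8.kubernetes.api.model.GenericKubernetesResource;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;
import io.fabric8.kubernetes.client.Watcher;
import io.fabric8.kubernetes.client.WatcherException;
import io.fabric8.kubernetes.client.dsl.base.ResourceDefinitionContext;

public class KafkaClusterController {

    public static void main(String[] args) {
        KubernetesClient client = new KubernetesClientBuilder().build();

        // Describes the (hypothetical) CRD this controller is responsible for.
        ResourceDefinitionContext crd = new ResourceDefinitionContext.Builder()
                .withGroup("example.com")
                .withVersion("v1")
                .withKind("KafkaCluster")
                .withPlural("kafkaclusters")
                .withNamespaced(true)
                .build();

        // Watch the custom resources and reconcile the actual state
        // (StatefulSets, ConfigMaps, Secrets, ...) towards the desired state.
        client.genericKubernetesResources(crd).inNamespace("kafka").watch(
                new Watcher<GenericKubernetesResource>() {
                    @Override
                    public void eventReceived(Action action, GenericKubernetesResource resource) {
                        // A real operator would create or update StatefulSets, trigger
                        // rolling upgrades, rebalance data between brokers, etc. here.
                        System.out.printf("%s %s -> reconcile%n", action, resource.getMetadata().getName());
                    }

                    @Override
                    public void onClose(WatcherException cause) {
                        System.err.println("Watch closed: " + cause);
                    }
                });
    }
}
```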

Challenges running Kafka on Kubernetes

Apache Kafka became the de facto standard for event streaming platforms. Apache Kafka and its ecosystem provide a powerful option to build reliable, scalable, mission-critical distributed systems. Therefore, as you can imagine, it is harder to operate than a traditional messaging system or database, which does not scale elastically without downtime and just uses active/passive replication for high availability.

Kubernetes environments are similar: very powerful, but not easy to operate. Hence, the combination of Kafka and Kubernetes does not make things easier. Here are some challenges of running the Apache Kafka ecosystem on Kubernetes:

  • Translating an existing architecture to Kubernetes
  • Failover handling
  • Data rebalancing
  • Communication between ZooKeeper, Kafka Brokers, Clients (Java, REST, Connect, KSQL), Schema Registry, etc.
  • External access from / to outside Kubernetes cluster
  • Persistent storage options on premise and in the cloud
  • Security configuration
  • Rolling upgrades
  • Etc.

This is the secret sauce which a Kubernetes Operator has to implement and automate. Consequently, a Kafka Operator sounds like a very valuable component.

Confluent Operator as Kafka Operator to establish a Cloud-Native Kafka Platform

Confluent has long experience running Kafka on Kubernetes:

Confluent Experience for Kafka on Kubernetes

Confluent Cloud runs on Kubernetes using a Kafka Operator to offer “Serverless Kafka”: Confluent Cloud provides mission-critical SLAs on all three major cloud providers (Google GCP, Microsoft Azure, Amazon AWS), consumption-based pricing and throughput of several GB/sec using a single Kafka cluster. It seems like running Kafka on Kubernetes using a Kafka Operator is not a bad idea.

Slide Deck: Confluent Operator for Kafka Ecosystem on Kubernetes

My slide deck describes the journey and the features of Confluent Operator to deploy and operate Kafka in a cloud-native way, similar to how Kafka and its ecosystem (like Kafka Connect, Schema Registry, KSQL) are deployed in Confluent Cloud.

Confluent Operator - A Kafka Operator for Kubernetes

Confluent Operator enables you to:

  • Provision, manage and operate Confluent Platform (including ZooKeeper, Apache Kafka, Kafka Connect, KSQL, Schema Registry, REST Proxy, Control Center)
  • Deploy on any Kubernetes platform (vanilla K8s, OpenShift, Rancher, Mesosphere, Cloud Foundry, Amazon EKS, Azure AKS, Google GKE, etc.)
  • Automate provisioning of Kafka pods in minutes
  • Monitor SLAs through Confluent Control Center or Prometheus
  • Scale Kafka elastically, handle fail-over and automate rolling updates
  • Automate security configuration

Confluent Operator is built on Confluent’s first-hand knowledge of running Kafka at scale and is fully supported for production usage.

Here is the agenda of the slide deck:

  • Cloud Native vs. SaaS / Serverless Kafka
  • The Emergence of Kubernetes
  • Kafka on K8s Deployment Challenges
  • Confluent Operator as Kafka Operator

Also check out the documentation for Confluent Operator.

Please let me know if you have any comments or feedback.

KSQL Deep Dive – The Open Source Streaming SQL Engine for Apache Kafka
https://www.kai-waehner.de/blog/2018/05/15/ksql-deep-dive-open-source-streaming-sql-engine-for-apache-kafka/ (Tue, 15 May 2018)

KSQL is the open source, Apache 2.0 licensed streaming SQL engine on top of Apache Kafka. This post shows a deep dive (slides + video recording) including its relation to Kafka Connect and Kafka Streams, concepts, architecture and deployment options.

I gave a workshop at the Kafka Meetup Tel Aviv in May 2018: “KSQL Deep Dive – The Open Source Streaming SQL Engine for Apache Kafka”.

Here are the agenda, slides and video recording.

KSQL – The Open Source Streaming SQL Engine for Apache Kafka

KSQL is the open source, Apache 2.0 licensed streaming SQL engine on top of Apache Kafka. It aims to simplify stream processing and make it available to everyone. Even though it is simple to use, KSQL is built for mission-critical and scalable production deployments (using Kafka Streams under the hood).
Benefits of using KSQL include: no coding required; no additional analytics cluster needed; streams and tables as first-class constructs; and access to the rich Kafka ecosystem. This session introduces the concepts and architecture of KSQL. Use cases such as streaming ETL, real-time stream monitoring or anomaly detection are discussed. A live demo shows how to set up and use KSQL quickly and easily on top of your Kafka ecosystem.
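As a small taste of what this looks like in practice, the following sketch submits a KSQL statement to the server’s REST endpoint. It assumes a KSQL server listening on localhost:8088 and a source stream `logs` that has already been registered; the stream and column names are made up for illustration:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class KsqlExample {

    public static void main(String[] args) throws Exception {
        // A typical KSQL statement: derive a filtered stream from an existing stream.
        String statement =
                "CREATE STREAM error_logs AS " +
                "SELECT host, message FROM logs WHERE level = 'ERROR';";

        String body = "{\"ksql\": \"" + statement + "\", \"streamsProperties\": {}}";

        // KSQL servers expose a REST API; 8088 is the default port.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8088/ksql"))
                .header("Content-Type", "application/vnd.ksql.v1+json; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```

The persistent query behind `error_logs` then runs continuously on the KSQL servers, powered by Kafka Streams under the hood.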

If you want to get started, try out the KSQL quick start guide. It gets you started in ten minutes, locally on your laptop or alternatively in a Docker environment.

History of Apache Kafka, Confluent, and KSQL

Agenda

  1. Apache Kafka Ecosystem
  2. Kafka Streams as Foundation for KSQL
  3. Motivation for KSQL
  4. KSQL Concepts
  5. Live Demo #1 – Intro to KSQL
  6. KSQL Architecture
  7. Live Demo #2 – Clickstream Analysis
  8. Building a User Defined Function (Example: Machine Learning)
  9. Getting Started

Slides

Video Recording

There was a YouTube live stream. Unfortunately, we had some technical problems, so the audio of the first half is not really good. Sorry for that. I still want to share it. The second half has good sound quality:

Looking forward to your feedback. Please feel free to ask questions in the Confluent Slack community (where you can also get help from the KSQL engineers) or create GitHub issues if you have problems or contributions for this great open source project.

Rethinking Stream Processing with Apache Kafka, Kafka Streams and KSQL
https://www.kai-waehner.de/blog/2018/03/13/rethinking-stream-processing-with-apache-kafka-streams-and-ksql/ (Tue, 13 Mar 2018)

I presented at JavaLand 2018 in Brühl recently – a great developer conference with over 1800 attendees. The location is also awesome, a theme park: Phantasialand. My talk: “New Era of Stream Processing with Apache Kafka’s Streams API and KSQL”. I just want to share the slide deck…

Kai Speaking at JavaLand 2018 about Kafka Streams and KSQL

Abstract

Stream processing is a concept used to act on real-time streaming data. This session shows and demos how teams in different industries leverage the innovative Streams API from Apache Kafka to build and deploy mission-critical real-time streaming applications and microservices.

The session discusses important streaming concepts like local and distributed state management, exactly-once semantics, embedding streaming into any application, and deployment to any infrastructure. Afterwards, the session explains the key advantages of Kafka’s Streams API like distributed processing and fault tolerance with fast failover, no-downtime rolling deployments and the ability to reprocess events so you can recalculate output when your code changes.

A demo shows how to combine any custom code with your streams application – by an example using an analytic model built with any machine learning framework like Apache Spark ML or TensorFlow.

The end of the session introduces KSQL – the open source Streaming SQL Engine for Apache Kafka. Write “simple” SQL streaming queries with the scalability, throughput and fail-over of Kafka Streams under the hood.

Slide Deck

Here we go:


Video Recording – Apache Kafka as Event-Driven Open Source Streaming Platform (Voxxed Zurich 2018)
https://www.kai-waehner.de/blog/2018/03/13/video-recording-apache-kafka-as-event-driven-open-source-streaming-platform-voxxed-zurich-2018/ (Tue, 13 Mar 2018)

I spoke at Voxxed Zurich 2018 about Apache Kafka as an event-driven open source streaming platform. The talk includes an intro to Apache Kafka and its open source ecosystem (Kafka Streams, Connect, KSQL, Schema Registry, etc.). I just want to share the video recording of my talk.

Abstract

This session introduces Apache Kafka, an event-driven open source streaming platform. Apache Kafka goes far beyond scalable, high-volume messaging. In addition, you can leverage Kafka Connect for integration and the Kafka Streams API for building lightweight stream processing microservices in autonomous teams. The open source Confluent Platform adds further components such as KSQL, Schema Registry, REST Proxy, clients for different programming languages, and connectors for different technologies and databases. Live demos included.

Video Recording

Apache Kafka + Kafka Streams + Mesos = Highly Scalable Microservices
https://www.kai-waehner.de/blog/2018/01/12/apache-kafka-kafka-streams-mesos-highly-scalable-microservices/ (Fri, 12 Jan 2018)

My latest article about Apache Kafka, Kafka Streams and Apache Mesos was published on Confluent’s blog:

Apache Mesos, Apache Kafka and Kafka Streams for Highly Scalable Microservices

This blog post discusses how to build a highly scalable, mission-critical microservice infrastructure with Apache Kafka, Kafka Streams, and Apache Mesos, or their vendor-supported platforms from Confluent and Mesosphere.

https://www.confluent.io/blog/apache-mesos-apache-kafka-kafka-streams-highly-scalable-microservices/

Have fun reading it and let me know if you have any feedback…

Apache Kafka + Kafka Streams + Mesos / DCOS = Scalable Microservices
https://www.kai-waehner.de/blog/2017/10/27/mesos-kafka-streams-scalable-microservices/ (Fri, 27 Oct 2017)

Apache Kafka + Kafka Streams + Apache Mesos = highly scalable microservices. Mission-critical deployments via DC/OS and Confluent, on premise or in the public cloud.

I gave a talk at MesosCon 2017 Europe in Prague about building highly scalable, mission-critical microservices with Apache Kafka, Kafka Streams and Apache Mesos / DC/OS. I would like to share the slides and a video recording of the live demo.

Abstract

Microservices establish many benefits like agile, flexible development and deployment of business logic. However, a microservice architecture also creates many new challenges. These include increased communication between distributed instances, the need for orchestration, new fail-over requirements, and resiliency design patterns.

This session discusses how to build a highly scalable, performant, mission-critical microservice infrastructure with Apache Kafka, Kafka Streams and Apache Mesos respectively DC/OS. Apache Kafka brokers are used as a powerful, scalable, distributed message backbone. Kafka’s Streams API allows you to embed stream processing directly into any external microservice or business application, without the need for a dedicated streaming cluster. Apache Mesos can be used as scalable infrastructure for both the Apache Kafka brokers and the external applications using the Kafka Streams API, to leverage the benefits of a cloud-native platform like service discovery, health checks, or fail-over management.
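The following minimal sketch shows what this embedding looks like: the stream processor is just a class inside the business application, with no job submission to any dedicated cluster. The application id and topic names are assumptions for illustration:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

// A stream processor embedded inside an existing business application.
// The application owns the lifecycle, and you scale out simply by starting
// more instances of the same app (e.g. more Marathon tasks or Docker
// containers on DC/OS) - instances with the same application id share the work.
public class EmbeddedStreamProcessor {

    private final KafkaStreams streams;

    public EmbeddedStreamProcessor(String bootstrapServers) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "embedded-processor");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-events").to("processed-events"); // assumed topic names
        this.streams = new KafkaStreams(builder.build(), props);
    }

    public void start() { streams.start(); } // called by the host application
    public void stop()  { streams.close(); } // e.g. on application shutdown
}
```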

A live demo shows how to develop real-time applications for your core business with Kafka messaging brokers and the Kafka Streams API. You will see how to deploy, manage and scale them on a DC/OS cluster using different deployment options.

Key takeaways

  • Successful microservice architectures require a highly scalable messaging infrastructure combined with a cloud-native platform which manages the distributed microservices
  • Apache Kafka offers a highly scalable, mission-critical infrastructure for distributed messaging and integration
  • Kafka’s Streams API allows you to embed stream processing into any external application or microservice
  • Mesos respectively DC/OS allows management of both Kafka brokers and external applications using the Kafka Streams API, to leverage many built-in benefits like health checks, service discovery or fail-over control of microservices
  • See a live demo which combines the Apache Kafka streaming platform and DC/OS

Architecture: Kafka Brokers + Kafka Streams on Kubernetes and DC/OS

The following picture shows the architecture. You can either run Kafka brokers and Kafka Streams microservices natively on DC/OS via Marathon or leverage Kubernetes as Docker container orchestration tool (which is also supported by Mesosphere in the meantime).


Architecture - Kafka Streams, Kubernetes and Mesos / DCOS

Slides

Here are the slides from my talk:

Live Demo

The following video shows the live demo. It is built on AWS using Mesosphere’s CloudFormation script to set up a DC/OS cluster in ten minutes.

Here, I deployed both – Kafka brokers and Kafka Streams microservices – directly to DC/OS without leveraging Kubernetes. I expect many people to continue deploying Kafka brokers directly on DC/OS. For microservices, many teams might move to the following stack: Microservice –> Docker –> Kubernetes –> DC/OS.

Do you also use Apache Mesos respectively DC/OS to run Kafka? Only the brokers, or also Kafka clients (producers, consumers, Streams, Connect, KSQL, etc.)? Or do you prefer another tool like Kubernetes (maybe on DC/OS)?


Deep Learning in Real Time with TensorFlow, H2O.ai and Kafka Streams (Slides from JavaOne 2017)
https://www.kai-waehner.de/blog/2017/10/04/kafka-streams-deep-learning-tensorflow-h2o-ai/ (Wed, 04 Oct 2017)

Apache Kafka + Kafka Streams to productionize neural networks (deep learning). Models built with TensorFlow, DeepLearning4J, H2O. Slides from JavaOne 2017.

Early October… Like every year in October, it is time for JavaOne and Oracle OpenWorld in San Francisco. I am glad to be back at this huge event again. My talk at JavaOne 2017 was all about the deployment of analytic models to scalable production systems leveraging Apache Kafka and Kafka Streams. Let’s first look at the abstract. After that, I attach the slides and refer to further material around this topic.

Abstract “Deep Learning in Real Time with TensorFlow, H2O.ai and Kafka Streams”

Intelligent real-time applications are a game changer in any industry. Deep learning is one of the hottest buzzwords in this area. New technologies like GPUs combined with elastic cloud infrastructure enable the sophisticated usage of artificial neural networks to add business value in real-world scenarios. Tech giants use it e.g. for image recognition and speech translation. This session discusses some real-world scenarios from different industries to explain when and how traditional companies can leverage deep learning in real-time applications.

This session shows how to deploy deep learning models into real-time applications to do predictions on new events. Apache Kafka will be used to infer the analytic models in a highly scalable and performant way.

The first part introduces the use cases and concepts behind deep learning. It discusses how to build Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs) and Autoencoders leveraging open source frameworks like TensorFlow, DeepLearning4J or H2O.

The second part shows how to deploy the built analytic models to real-time applications, leveraging Apache Kafka as streaming platform and Apache Kafka’s Streams API to embed the intelligent business logic into any external application or microservice.
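The following sketch shows the core idea of this second part: the trained model is loaded once at startup and then applied to every new event inside a Kafka Streams topology. The `Model` interface, the `loadModel()` helper and the topic names are hypothetical placeholders for whatever framework-specific API (TensorFlow, DeepLearning4J, H2O) you actually use:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class ModelInferenceApp {

    // Hypothetical wrapper around a model trained with TensorFlow, DL4J or H2O.
    interface Model {
        String predict(String features);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "model-inference");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Load the trained model once at startup (hypothetical loader).
        Model model = loadModel("/models/fraud-detection");

        // Apply the model to every incoming event in real time -
        // the model itself is not redeveloped, just embedded.
        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("transactions")
               .mapValues(model::predict)
               .to("predictions");

        new KafkaStreams(builder.build(), props).start();
    }

    private static Model loadModel(String path) {
        // Placeholder: a real application would deserialize a TensorFlow
        // SavedModel, an H2O POJO/MOJO, or a DL4J network here.
        return features -> "{\"fraud\": false}";
    }
}
```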

Apache Kafka, Kafka Streams and Deep Learning

Key Takeaways for the Audience: Kafka Streams + Deep Learning

Here are the takeaways of this talk:

  • The focus of this talk is to discuss and show how to productionize analytic models built by data scientists – the key challenge in most companies.
  • Deep learning allows you to build different neural networks to solve complex classification and regression scenarios and can add business value in any industry
  • Deep learning is used to build analytic models using open source frameworks like TensorFlow, DeepLearning4J or H2O.ai.
  • Apache Kafka’s Streams API allows you to embed the intelligent business logic into any application or microservice
  • Apache Kafka’s Streams API leverages these deep learning models (without redeveloping them) to act on new events in real time

Slides and Further Material around Apache Kafka and Machine Learning

Here are the slides of my talk:

Some further material around Apache Kafka, Kafka Streams and Machine Learning:

I will post more examples and use cases around Apache Kafka and Machine Learning in the upcoming months… Stay tuned!

Why I Move (Back) to Open Source for Messaging, Integration and Stream Processing
https://www.kai-waehner.de/blog/2017/05/01/why-apache-kafka-confluent-open-source-messaging-integration-stream-processing/ (Mon, 01 May 2017)

After three great years at TIBCO Software, I move back to open source and join Confluent, the company behind the open source project Apache Kafka, to build mission-critical, scalable infrastructures for messaging, integration and stream processing.

After three great years at TIBCO Software, I move back to open source and join Confluent, the company behind the open source project Apache Kafka, to build mission-critical, scalable infrastructures for messaging, integration and streaming analytics. Confluent is a Silicon Valley startup, still at the beginning of its journey, with 700% business growth in 2016, and it is expected to grow significantly in 2017 again.

In this blog post, I want to share why I see the future for middleware and big data analytics in open source technologies, why I really like Confluent, what I will focus on in the next months, and why I am so excited about this next step in my career.

Let’s talk briefly about three cutting-edge topics which are becoming important in any industry and in small, medium and big enterprises these days:

  • Big Data Analytics: Find insights and patterns in big historical datasets.
  • Real Time Streaming Platforms: Apply insights and patterns to new events in real time (e.g. for fraud detection, cross selling or predictive maintenance).
  • Machine Learning (and its hot subtopic Deep Learning): Leverage algorithms and let machines learn by themselves without programming everything explicitly.

These three topics disrupt every industry these days. Note that machine learning is related to the other two topics. Though today we often see it as an independent topic, many data science projects actually use only very small datasets (often less than a gigabyte of input data). Fortunately, all three topics will be combined more and more to add additional business value.

Some industries are just at the beginning of their journey of disruption and digital transformation (like banks, telcos, insurance companies), while others have already realized some changes and innovation (e.g. retailers, airlines). In addition to the above topics, some other cutting-edge success stories emerge in a few industries, like the Internet of Things (IoT) in manufacturing or blockchain in banking.

With all these business trends on the market, we also see a key technology trend for all these topics: The adoption of open source projects.

Key Technology Trend: Adoption of “Real” Open Source Projects

When I say “open source”, I mean specific projects. I do not talk about very new, immature projects, but about frameworks which have been deployed in production successfully for many years and are used by various developers and organizations. For example, Confluent’s Docker images like the Kafka REST Proxy or the Kafka Schema Registry have each been downloaded over 100,000 times already.

A “real”, successful middleware or analytics open source project has the following characteristics:

  • Openness: Available for free and really open under a permissive license, i.e. you can use it in production and scale it out without any need to purchase a license or subscription (of course, there can be commercial, proprietary add-ons – but they need to be on top of the project, and not change the license for the used open source project under the hood)
  • Maturity: Used in business-relevant or even mission critical environments for at least one year, typically much longer
  • Adoption: Various vendors and enterprises support a project, either by contributing (source code, documentation, add-ons, tools, commercial support) or realizing projects
  • Flexibility: Deployment on any infrastructure, including on premise, public cloud, hybrid. Support for various application environments (Windows, Linux, Virtual Machine, Docker, Serverless, etc.), APIs for several programming languages (Java, .Net, Go, etc.)
  • Integration: Independent and loosely coupled, but also highly integrated (via connectors, plugins, etc.) to other relevant open source and commercial components

After defining the key characteristics of successful open source projects, let’s take a look at some frameworks with huge momentum.

Cutting Edge Open Source Technologies: Apache Hadoop, Apache Spark, Apache Kafka

I defined three key trends above which are relevant for any industry and many (open source and proprietary) software vendors. There is a clear trend towards some open source projects as de facto standards for new projects:

  • Big Data Analytics: Apache Hadoop (and its zoo of sub projects like Hive, Pig, Impala, HBase, etc.) and Apache Spark (which is often separated from Hadoop in the meantime) to store, process and analyze huge historical datasets
  • Real Time Streaming Platforms: Apache Kafka – not just for highly scalable messaging, but also for integration and streaming analytics. Platforms either use Kafka Streams to build stream processing applications / microservices or an “external” framework like Apache Flink, Apex, Storm or Heron.
  • Machine Learning: No clear “winner” here (and that is a good thing in my opinion, as the field is so multifaceted). Many great frameworks are available – for example, R, Python and Scala offer various great implementations of machine learning algorithms, and specific frameworks like Caffe, Torch, TensorFlow or MXNet focus on deep learning and artificial neural networks.

On top of these frameworks, various vendors build open source or proprietary tooling and offer commercial support. Think about the key Hadoop / Spark vendors: Hortonworks, Cloudera, MapR and others, or KNIME, RapidMiner or H2O.ai as specialized open source tools for machine learning in a visual coding environment.

Of course, there are many other great open source frameworks not mentioned here but also relevant on the market, for example RabbitMQ and ActiveMQ for messaging or Apache Camel for integration. In addition, new “best practice stacks” are emerging, like the SMACK Stack which combines Spark, Mesos, Akka, and Kafka.

I am so excited about Apache Kafka and Confluent because Kafka is already used in every industry and in many small and big enterprises. Apache Kafka production deployments accelerated in 2016 and it is now used by one-third of the Fortune 500. And this is just the beginning. Apache Kafka is not an all-rounder to solve all problems, but it is awesome in the things it is built for – as the huge and growing number of users, contributors and production deployments proves. It is highly integrated with many other frameworks and tools. Therefore, I will not just focus on Apache Kafka and Confluent in my new job, but also on many other technologies, as discussed later.

Let’s next think about the relation of Apache Kafka and Confluent to proprietary software.

Open Source vs. Proprietary Software – A Never-ending War?

The trend is moving towards open source technologies these days. This is not a question, but a fact. I have not seen a single customer in the last years who does not have projects and huge investments around Hadoop, Spark and Kafka. In the meantime, these technologies changed from labs and first small projects to enterprise de facto standards and company-wide deployments and strategies. Replacement of closed legacy software happens more and more – to reduce costs, but even more importantly to be more flexible, up-to-date and innovative.

What does this mean for proprietary software?

For some topics, I do not see much traction or demand for proprietary solutions. Two very relevant examples where closed software ruled the last ~20 years: Messaging solutions and analytics platforms. Open frameworks seem to replace them almost everywhere in any industry and enterprise in new projects (for good reasons).

New messaging projects are based on standards like MQTT or frameworks like Apache Kafka. Analytics is done with R and Python in conjunction with frameworks like scikit-learn or TensorFlow. These options leverage flexible, but also very mature implementations. Often, there is no need for a lot of proprietary, inflexible, complex or expensive tooling on top of it. Even IBM, the mega vendor, focuses on offerings around open source in the meantime.

IBM invests millions into Apache Spark for big data analytics and puts over 3500 researchers and developers to work on Spark-related projects, instead of just pushing towards its own various proprietary analytics offerings like IBM SPSS. If you search for “IBM Messaging”, you find offerings based on the AMQP standard and cloud services based on Apache Kafka, instead of new proprietary solutions!

I think IBM is a great example of how the software market is changing these days. Confluent (just at the beginning of its journey) or Cloudera (which just went public with a successful IPO) are great examples of Silicon Valley startups going the same way.

In my opinion, good proprietary software leverages open source technologies like Apache Kafka, Apache Hadoop or Apache Spark. Vendors should integrate natively with these technologies. Some opportunities for vendors:

  • Visual coding (for developers) to generate code (e.g. graphical components, which generate framework-compliant source code for a Hadoop or Spark job)
  • Visual tooling (e.g. for business analysts or data scientists), like visual analytics tools which connect to big data stores to find insights and patterns
  • Infrastructure tooling for management and monitoring of open source infrastructure (e.g. tools to monitor and scale Apache Kafka messaging infrastructures)
  • Add-ons which are natively integrated with open source frameworks (e.g. instead of requiring own proprietary runtime and messaging infrastructures, an integration vendor should deploy its integration microservices on open cloud-native platforms like Kubernetes or Cloud Foundry and leverage open messaging infrastructures like Apache Kafka instead of pushing towards proprietary solutions)

Open Source and Proprietary Software Complement Each Other

Therefore, I do not see this as a discussion of “open source software” versus “proprietary software”. Both complement each other very well. You should always ask the following questions before making a decision for open source software, proprietary software or a combination of both:

  • What is the added value of the proprietary solution? Does it increase the complexity and the footprint of runtime and tooling?
  • What is the expected total cost of ownership of a project (TCO), i.e. license / subscription + project lifecycle costs?
  • How to realize the project? Who will support you, and how do you find experts for delivery (typically consulting companies)? Integration and analytics projects are often huge, with big investments, so how do you make sure you can deliver (implementation, test, deployment, operations, change management, etc.)? Can you get commercial support for your mission-critical deployments (24/7)?
  • How to use this project with the rest of the enterprise architecture? Do you deploy everything on the same cluster? Do we want to set some (open) de facto standards in different departments and business units?
  • Do we want to use the same technology in new environments without limited 30-day trials or annoying sales cycles, and maybe even deploy it to production without any license / subscription costs?
  • Do we want to add changes, enhancements or fixes to the platform by ourselves (e.g. if we need a specific feature immediately, not in six months)?

Let’s think about a specific example with these questions in mind:

Example: Do you still need an Enterprise Service Bus (ESB) in a World of Cloud-Native Microservices?

I faced this question a lot in the last 24 months, especially with the trend moving to flexible, agile microservices (not just for building business applications, but also for integration and analytics middleware). See my article “Do Microservices Spell the End of the ESB?”. The short summary: you still need middleware (call it ESB, integration platform, iPaaS, or something else), though the requirements are different today. This is true for open source and proprietary ESB products. However, something else has changed in the last 24 months…

In the past, open source and proprietary middleware vendors offered an ESB as integration platform. The offering included a runtime (to guarantee scalable, mission-critical operation of integration services) and a development environment (for visual coding and faster time-to-market). The last two years changed how we think about building new applications. We now (want to) build microservices, which run in Docker containers. The scalable, mission-critical runtime is managed by a cloud-native platform like Kubernetes or Cloud Foundry. Ideally, DevOps automates builds, tests and deployments. These days, ESB vendors adopt these technologies. So far, so good.

Now, you can deploy your middleware microservice to these cloud-native platforms like any other Java, .NET or Go microservice. However, this completely changes the added value of the ESB platform. Now, its benefit is just about visual coding, and the key argument is time-to-market (you should always question and double-check whether that is a valid argument). The runtime is not really part of the ESB anymore. In most scenarios, this completely changes the decision about whether you still need an ESB. Ask yourself about time-to-market, license / subscription costs and TCO again! Also think about the (typically) increased resource requirements (memory, CPU, disk) of tooling-built services (typically some kind of big .ear file) compared to plain source code (e.g. Java .jar files).

Is the ESB still enough added value, or should you just use a cloud-native platform and a messaging infrastructure? Is it easier to write some lines of source code instead of setting up the ESB tooling, where you often struggle importing your REST / Swagger or WSDL files and many other configuration environments before you can actually begin to leverage the visual coding environment? In very big, long-running projects, you might finally end up with a win. Though, in an agile, ever-changing world with a fail-fast ideology, various different technologies and frameworks, and automated CI/CD stacks, you might only add new complexity and not get the expected value anymore, unlike in the old world where the ESB was also the mission-critical runtime. The same is true for other middleware components like stream processing, analytics platforms, etc.

ESB Alternative: Apache Kafka and Confluent Open Source Platform

As an alternative, you could use Kafka Connect, a very lightweight integration library based on Apache Kafka for building large-scale, low-latency data pipelines. The beauty of Kafka Connect is that all the challenges around scalability, fail-over and performance are handled by the Kafka infrastructure. You just have to use the Kafka Connect connectors to realize very powerful integrations with a few lines of configuration for sources and sinks (see the sketch below). If you use Apache Kafka as messaging infrastructure anyway, you need to find some very compelling reasons to use an ESB on top instead of the much less complex and much less heavyweight Kafka Connect library.
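As a hedged illustration of how lightweight this is, the following sketch registers a source connector by posting a few lines of JSON configuration to the Kafka Connect REST API (default port 8083). The connector class and connection details are assumptions and depend on the connector you actually deploy:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterConnector {

    public static void main(String[] args) throws Exception {
        // A complete integration pipeline in a few lines of configuration;
        // the Connect framework handles scalability, fail-over and offset tracking.
        String config = """
                {
                  "name": "jdbc-source-customers",
                  "config": {
                    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
                    "connection.url": "jdbc:postgresql://db:5432/crm",
                    "mode": "incrementing",
                    "incrementing.column.name": "id",
                    "topic.prefix": "crm-"
                  }
                }
                """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(config))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```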

I think this section explained why open source and proprietary software are complementary in many use cases. But it does not make sense to add heavyweight, resource-intensive and complex tooling in every scenario. Open source is not free (you still need to spend time and effort on the project and probably pay money for some kind of commercial support), but often open source without too much additional proprietary tooling is the better choice regarding complexity, time-to-market and TCO. You can find endless success stories about open source projects – not just from tech giants like Google, Facebook or LinkedIn, but also from many small and big traditional enterprises. Of course, any project can fail. Though, in projects with frameworks like Hadoop, Spark or Kafka, it is probably not due to issues with the technology…

Confluent + “Proprietary Mega Vendors”

On the other side, I really look forward to working together with “mostly proprietary” mega vendors like TIBCO, SAP, SAS and others where it makes sense to solve customer problems and build innovative, powerful, mission-critical solutions. For example, TIBCO StreamBase is a great tool if you want to develop stream processing applications via a visual editor instead of writing source code. Actually, it does not even compete with Kafka Streams, because the latter is a library which you embed into any other microservice or application (deployed anywhere, e.g. in a Java application, Docker container, Apache Mesos, “you-choose-it”), while StreamBase (like its competitors Software AG Apama and IBM Streams, and all the other open source frameworks like Apache Flink, Storm, Apache Apex or Heron) focuses on building streaming applications on its own cluster (typically deployed either on Hadoop’s YARN or on a proprietary cluster). Therefore, you could use StreamBase and its Kafka connector to build streaming applications leveraging Kafka as messaging infrastructure.

Even Confluent itself offers some proprietary tooling like Confluent Control Center for management and monitoring on top of open source Apache Kafka and the open source Confluent Platform, of course. This is the typical business model you see behind successful open source vendors like Red Hat: embrace open source projects, offer 24/7 support and proprietary add-ons for enterprise-grade deployments. Thus, not everything is or needs to be open source. That’s absolutely fine.

So, after all these discussions about business and technology trends, open source and proprietary software, what will I do in my new job at Confluent?

Confluent Platform in Conjunction with Analytics, Machine Learning, Blockchain, Enterprise Middleware

Of course, I will focus a lot on Apache Kafka and Confluent Platform in my new job where I will work mainly with prospects and customers in EMEA, but also continue as Technology Evangelist with publications and conference talks. Let’s get into a little bit more detail here…

My focus was never on being a deep-level technology expert or fixing issues in production environments (though I do hands-on coding, of course). Many other technology experts are way better in very technical discussions. As in the past, I will focus on designing mission-critical enterprise architectures, discussing innovative real-world use cases with prospects and customers, and evaluating cutting-edge technologies in conjunction with the Confluent Platform. Here are some of my ideas for the next months:

  • Apache Kafka + Cloud Native Platforms = Highly Scalable Streaming Microservices (leveraging platforms like Kubernetes, Cloud Foundry, Apache Mesos)
  • Highly Scalable Machine Learning and Deep Learning Microservices with Apache Kafka Streams (using TensorFlow, MXNet, H2O.ai, and other open source frameworks)
  • Online Machine Learning (i.e. updating analytics models in real time for every new event) leveraging Apache Kafka as infrastructure backbone
  • Open Source IoT Platform for Core and Edge Streaming Analytics (leveraging Apache Kafka, Apache Edgent, and other IoT frameworks)
  • Comparison of Open Source Stream Processing Frameworks (differences between Kafka Streams and other modern frameworks like Heron, Apache Flink, Apex, Spark Streaming, Edgent, Nifi, StreamSets, etc.)
  • Apache Kafka / Confluent and other Enterprise Middleware (discuss when to combine proprietary middleware with Apache Kafka, and when to simply “just” use Confluent’s open source platform)
  • Middleware and Streaming Analytics as Key for Success in Blockchain Projects

You can expect publications, conference and meetup talks, and webinars about these and other topics in 2017, like in the last years. Please let me know what you are most interested in and what other topics you would like to hear about!

I am also really looking forward to working together with partners on scalable, mission-critical enterprise architectures and powerful solutions around Apache Kafka and Confluent Platform. This will include combined solutions and use cases with open source as well as proprietary software vendors.

Last but not least, the most important part: I am excited to work with prospects and customers from traditional enterprises, tech giants and startups to realize innovative use cases and success stories that add business value.

As you can see, I am really excited to start at Confluent in May 2017. I will visit Confluent’s London and Palo Alto offices in the first weeks and also be at Kafka Summit in New York. Thus, an exciting month to get started at this awesome Silicon Valley startup.

Please let me know your feedback. Do you see the same trends? Do you share my opinions or disagree? I look forward to discussing all these topics with customers, partners and anybody else in upcoming meetings, workshops, publications, webinars, meetups and conference talks.

Agile Cloud-to-Cloud Integration with iPaaS, API Management and Blockchain
https://www.kai-waehner.de/blog/2017/04/23/agile-cloud-cloud-integration-ipaas-api-management-blockchain/ (Sun, 23 Apr 2017)

Agile cloud-to-cloud integration with iPaaS, API Management and blockchain: a scenario use case using IBM's open source Hyperledger Fabric on Bluemix, TIBCO Cloud Integration (TCI) and Mashery.

Cloud-to-cloud integration is part of a hybrid integration architecture. It enables you to implement quick and agile integration scenarios without the burden of setting up complex VM- or container-based infrastructures. One key use case for cloud-to-cloud integration is innovation using a fail-fast methodology where you realize new ideas quickly. You typically think in days or weeks, not in months. If an idea fails, you throw it away and start another new idea. If the idea works well, you scale it out and bring it into production on an on-premise, cloud or hybrid infrastructure. Finally, you expose the idea and make it easily available to any interested service consumer in your enterprise, to partners, or to public end users.

A great example where you need agile, fail-fast development is blockchain, because it is a very hot topic, but the frameworks are immature and change very frequently these days. Note that blockchain is not just about Bitcoin and the finance industry. That is just the tip of the iceberg. Blockchain, the underlying infrastructure of Bitcoin, will change various industries, beginning with banking, manufacturing, supply chain management, energy, and others.

Middleware and Integration as Key for Success in Blockchain Projects

A key for successful blockchain projects is middleware, as it allows integration with the rest of an enterprise architecture. Blockchain only adds value if it works together with your other applications and services. See an example of how to combine streaming analytics with TIBCO StreamBase and Ethereum to correlate blockchain events in real time in order to act proactively, e.g. for fraud detection.

The drawback of blockchain is its immaturity today. APIs change with every minor release, development tools are buggy and miss many features required for serious development, and new blockchain cloud services, frameworks and programming languages come and go every quarter.

This blog post shows how to leverage cloud integration with iPaaS and API Management to realize innovative projects quickly, fail fast, and adopt changing technologies and business requirements easily. We will use TIBCO Cloud Integration (TCI), TIBCO Mashery and Hyperledger Fabric running as an IBM Bluemix cloud service.

IBM Hyperledger Fabric as Bluemix Cloud Service

Hyperledger is a vendor-independent open source blockchain project which consists of various components. Many enterprises and software vendors committed to it for building different solutions for various problems and use cases. One example is IBM’s Hyperledger Fabric. Being a very flexible construction kit makes Hyperledger much more complex to get started with than other blockchain platforms, like Ethereum, which are less flexible but therefore easier to set up and use.

This is the reason why I use the Bluemix BaaS (Blockchain as a Service) to get started quickly, without spending days just setting up my own Hyperledger network. You can start a Hyperledger Fabric blockchain infrastructure with four peers and a membership service with just a few clicks. It takes around two minutes. Hyperledger Fabric is evolving fast, and is therefore a great example of quickly changing features, evolving requirements and (potentially) fast-failing projects.

My project uses Hyperledger Fabric 0.6 as a free beta cloud service. I leverage its Swagger-based REST API in my middleware microservice to interconnect other cloud services with the blockchain.

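To give an impression of this interface, here is a minimal sketch in Go of a query against the 0.6 REST API. It follows the JSON-RPC-style envelope of the deprecated /chaincode endpoint; the peer URL, chaincode name and customer key are hypothetical placeholders for this example, and the exact payload shape may differ in your Bluemix instance.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Envelope for the (deprecated) Fabric 0.6 /chaincode endpoint,
// which uses a JSON-RPC-2.0-style request body.
type chaincodeRequest struct {
	Jsonrpc string          `json:"jsonrpc"`
	Method  string          `json:"method"` // "query" or "invoke"
	Params  chaincodeParams `json:"params"`
	ID      int             `json:"id"`
}

type chaincodeParams struct {
	Type        int               `json:"type"` // 1 = Go chaincode
	ChaincodeID map[string]string `json:"chaincodeID"`
	CtorMsg     ctorMsg           `json:"ctorMsg"`
}

type ctorMsg struct {
	Function string   `json:"function"`
	Args     []string `json:"args"`
}

func main() {
	req := chaincodeRequest{
		Jsonrpc: "2.0",
		Method:  "query",
		Params: chaincodeParams{
			Type:        1,
			ChaincodeID: map[string]string{"name": "customer_cc"},                    // hypothetical chaincode name
			CtorMsg:     ctorMsg{Function: "query", Args: []string{"customer-4711"}}, // hypothetical customer key
		},
		ID: 1,
	}

	body, err := json.Marshal(req)
	if err != nil {
		panic(err)
	}

	// Placeholder URL – the real peer endpoint comes from the Bluemix service credentials.
	resp, err := http.Post("https://peer.example.com/chaincode", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var result map[string]interface{}
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		panic(err)
	}
	fmt.Println(result)
}
```

In the 0.6 API, state-changing transactions use the same envelope with method "invoke" instead of "query".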

However, when I began the project, it was already clear that the REST interface is deprecated and will no longer be included in the upcoming 1.0 release. Thus, we are aware that an easy move to another API will be needed in a few weeks or months.

Cloud-to-Cloud Integration, iPaaS and API Management for Agile and Fail-Fast Development

As mentioned in the introduction, middleware is key to success in blockchain projects because it interconnects the blockchain with the existing enterprise architecture. This example leverages the following middleware components:

  • iPaaS: TIBCO Cloud Integration (TCI) is hosted and managed by TIBCO. It can be used without any setup or installation to quickly build a REST microservice which integrates with Hyperledger Fabric. TCI also allows you to configure caching, throttling and security to ensure controlled communication between the blockchain and other applications and services.
  • API Management: TIBCO Mashery is used to expose the TCI REST microservice to other service consumers. These can be internal consumers, partners or public end users, depending on the use case.
  • On Premise / Cloud-Native Integration Infrastructure: TIBCO BusinessWorks 6 (BW6) or TIBCO BusinessWorks Container Edition (BWCE) can be used to deploy and productionize successful TCI microservices in your existing infrastructure, either on premise or on a cloud-native platform like Cloud Foundry, Kubernetes or any other Docker-based platform. Of course, you can also continue to use the services within TCI itself to run and scale the production services in the public cloud, hosted and managed by TIBCO.

Scenario: TIBCO Cloud Integration + Mashery + Hyperledger Fabric Blockchain + Salesforce

I implemented the following technical scenario with the goal of showing agile cloud-to-cloud integration with a fail-fast methodology from a technical perspective (not to build a great business case): the scenario implements a new REST microservice with TCI via visual coding and configuration. This REST service connects to two cloud services, Salesforce and Hyperledger Fabric. The following picture shows the architecture.


Here are the steps to realize this scenario:

  • Create a REST service which receives a customer ID and customer name as parameters.
  • Enhance the input data from the REST call with additional data from Salesforce CRM. In this case, we get a block hash which is stored in Salesforce as a reference to the blockchain data. The block hash allows us to double-check whether Salesforce has the most up-to-date information about a customer, while the blockchain itself ensures that customer updates from various systems like CRM, ERP or mainframe are stored and updated in a correct, secure, distributed chain – which can be accessed by all related systems, not just Salesforce.
  • Use the Hyperledger REST API to validate, via Salesforce's block hash, that the current information about the customer in Salesforce is up to date. If Salesforce has an older block hash, update Salesforce with the up-to-date values (both the most recent block hash and the current customer data stored on the blockchain), as sketched in the code example after this list.
  • Return the up-to-date customer data to the service consumer of the TCI REST service.
  • Configure caching, throttling and security requirements so that service consumers cannot behave unexpectedly.
  • Leverage API Management to expose the TCI REST service as a public API so that external service consumers can subscribe to a payment plan and use it in other applications or microservices.
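To make the validation step above more concrete, here is a sketch in Go of the hash-check-and-update decision. All type and function names are hypothetical stand-ins for the TCI activities; in the real scenario this logic is modeled visually in TCI rather than hand-coded.

```go
package main

import "fmt"

// Hypothetical record shapes standing in for the TCI activity payloads.
type SalesforceRecord struct {
	CustomerID string
	Name       string
	BlockHash  string // reference to the blockchain state stored in Salesforce
}

type BlockchainCustomer struct {
	CustomerID string
	Name       string
	BlockHash  string // most recent block hash on the chain
}

// syncCustomer implements the core decision of the scenario: if Salesforce
// holds an older block hash than the blockchain, refresh Salesforce with the
// current hash and customer data; otherwise Salesforce is already up to date.
func syncCustomer(sf SalesforceRecord, bc BlockchainCustomer,
	updateSalesforce func(BlockchainCustomer) error) (BlockchainCustomer, error) {

	if sf.BlockHash != bc.BlockHash {
		// Salesforce is stale: push the latest hash and customer data back.
		if err := updateSalesforce(bc); err != nil {
			return bc, fmt.Errorf("updating Salesforce: %w", err)
		}
	}
	// Either way, return the authoritative, up-to-date customer data
	// to the consumer of the TCI REST service.
	return bc, nil
}

func main() {
	sf := SalesforceRecord{CustomerID: "4711", Name: "Old Name", BlockHash: "abc"}
	bc := BlockchainCustomer{CustomerID: "4711", Name: "New Name", BlockHash: "def"}
	updated, _ := syncCustomer(sf, bc, func(c BlockchainCustomer) error {
		fmt.Println("updating Salesforce with", c)
		return nil
	})
	fmt.Println("returned to consumer:", updated)
}
```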

Implementation: Visual Coding, Web Configuration and Automation via DevOps

This section discusses the relevant steps of the implementation in more detail, including development and deployment of the chain code (Hyperledger's term for smart contracts on the blockchain), implementation of the REST integration service with visual coding in TCI's IDE, and exposing the service as an API via TIBCO Mashery.

Chain Code for Hyperledger Fabric on IBM Bluemix Cloud

Hyperledger Fabric has some detailed “getting started” tutorials. I used a “Hello World” tutorial and adapted it to our use case so that you can store, update and query customer data on the blockchain network. Hyperledger Fabric uses Go for chain code and will allow other programming languages like Java in future releases. This is both a pro and a con. The benefit is that you can leverage your existing expertise in these programming languages. However, you also have to be careful not to use “wrong” features like threads, indefinite loops or other functions which cause unexpected behavior on a blockchain. Personally, I prefer the concept of Ethereum: that platform uses Solidity, a programming language explicitly designed for developing smart contracts (i.e. chain code).
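To give an impression of what such chain code looks like, here is a heavily simplified sketch following the style of the Fabric 0.6 shim API (the exact signatures changed between releases, so treat this as illustrative rather than copy-paste ready). It stores and queries a customer record by key.

```go
package main

import (
	"errors"
	"fmt"

	"github.com/hyperledger/fabric/core/chaincode/shim"
)

type CustomerChaincode struct{}

// Init is called once when the chain code is deployed to the network.
func (t *CustomerChaincode) Init(stub *shim.ChaincodeStub, function string, args []string) ([]byte, error) {
	return nil, nil
}

// Invoke handles state-changing transactions, e.g. storing or updating a customer.
func (t *CustomerChaincode) Invoke(stub *shim.ChaincodeStub, function string, args []string) ([]byte, error) {
	if function == "store" && len(args) == 2 {
		// args[0] = customer ID, args[1] = customer data (e.g. a JSON document)
		return nil, stub.PutState(args[0], []byte(args[1]))
	}
	return nil, errors.New("unknown invoke function: " + function)
}

// Query handles read-only requests, e.g. looking up a customer by ID.
func (t *CustomerChaincode) Query(stub *shim.ChaincodeStub, function string, args []string) ([]byte, error) {
	if function == "query" && len(args) == 1 {
		return stub.GetState(args[0])
	}
	return nil, errors.New("unknown query function: " + function)
}

func main() {
	if err := shim.Start(new(CustomerChaincode)); err != nil {
		fmt.Printf("Error starting chaincode: %s\n", err)
	}
}
```

Note how the whole state model is a simple key-value store; everything beyond put and get (consensus, replication, history) is handled by the Fabric infrastructure.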

Cloud-to-Cloud Integration via REST Service with TIBCO Cloud Integration

The following steps were necessary to implement the REST microservice:

  • Create the REST service interface with the API Modeler web UI, leveraging Swagger
  • Import the Swagger interface into TCI’s IDE
  • Use the Salesforce activity to read the block hash from Salesforce’s cloud interface
  • Use the REST Client activity to make an HTTP request to Hyperledger Fabric’s REST interface
  • Configure graphical mappings between the activities (REST Request → Salesforce → Hyperledger Fabric → REST Response)
  • Deploy the microservice to TCI’s runtime in the cloud and test it from the Swagger UI
  • Configure caching and throttling in TCI’s web UI

The last point is very important in blockchain projects. A blockchain infrastructure does not have millisecond response latency. Furthermore, every blockchain update costs money – Bitcoin, Ether, or whatever currency you use to “pay for mining” and reach consensus in your blockchain infrastructure. Therefore, you can leverage caching and throttling in blockchain projects much more than in many other projects (only if it fits your business scenario, of course).
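To illustrate why caching pays off here, the following Go sketch puts a simple TTL cache in front of a blockchain lookup. In the actual scenario this is configured in TCI’s web UI rather than coded by hand; the query function is a hypothetical stand-in for the Hyperledger REST call.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type cacheEntry struct {
	value     string
	expiresAt time.Time
}

// BlockchainCache serves repeated reads from memory so that only cache
// misses hit the (slow, potentially costly) blockchain endpoint.
type BlockchainCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]cacheEntry
	query   func(key string) (string, error) // the real blockchain lookup
}

func (c *BlockchainCache) Get(key string) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[key]; ok && time.Now().Before(e.expiresAt) {
		return e.value, nil // served from cache, no blockchain round trip
	}
	v, err := c.query(key)
	if err != nil {
		return "", err
	}
	c.entries[key] = cacheEntry{value: v, expiresAt: time.Now().Add(c.ttl)}
	return v, nil
}

func main() {
	cache := &BlockchainCache{
		ttl:     30 * time.Second,
		entries: map[string]cacheEntry{},
		query: func(key string) (string, error) { // hypothetical blockchain call
			fmt.Println("cache miss – querying blockchain for", key)
			return "customer data for " + key, nil
		},
	}
	fmt.Println(cache.Get("customer-4711")) // miss: hits the blockchain
	fmt.Println(cache.Get("customer-4711")) // hit: served from memory
}
```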

Here is the implemented REST service in TCI’s graphical development environment.


And here is the caching configuration in TCI’s web user interface.

Exposing the REST Service to Service Consumers through API Management via TIBCO Mashery

The deployed TCI microservice can either be accessed directly by service consumers, or you can expose it via API Management, e.g. with TIBCO Mashery. The latter option allows you to define rules, payment plans and security configuration in a centralized manner with easy-to-use tooling.

Note that the deployment to TCI and exposing its API to Mashery can also be done in a single step. Both products are loosely coupled, but highly integrated. Also note that this is typically not done manually (as in this technical demonstration), but integrated into a DevOps infrastructure, leveraging frameworks like Maven or Jenkins to automate the deployment and testing steps.

Agile Cloud-to-Cloud Integration with Fail-Fast Methodology

Our scenario is now implemented successfully. However, it is already clear that the REST interface is deprecated and will not be included in the upcoming 1.0 release. In blockchain frameworks, you can expect changes like this very frequently.

While I was implementing this example, IBM announced at its big user conference, IBM InterConnect in Las Vegas, that Hyperledger Fabric 1.0 is now publicly available. Well, it has indeed been available on Bluemix since that day, but if you try to start the service you get the following message: “Due to high demand of our limited beta, we have reached capacity.” A strange error, as I thought we use public cloud infrastructures like Amazon AWS, Microsoft Azure, Google Cloud Platform or IBM Bluemix exactly for this reason: to scale out easily…

Anyway, the consequence is that we still have to work with our 0.6 version and do not yet know when we will be able to – or have to – migrate to version 1.0. The good news is that iPaaS and cloud-to-cloud integration allow you to realize changes quickly. In the case of our TCI REST service, we just need to replace the REST activity calling the Hyperledger REST API with a Java activity and leverage Hyperledger’s Java SDK – as soon as it is available. Right now, only a client SDK for Node.js is available – not really an option for “enterprise projects” where you want to leverage the JVM and the Java platform instead of JavaScript. Side note: the topic of using Java vs. JavaScript in blockchain projects is also well discussed in “Static Type Safety for DApps without JavaScript”.

This blog post focuses on just a small example, and Hyperledger Fabric 1.0 will of course bring other new features and concept changes with it. The same is true for SaaS cloud offerings such as Salesforce: you cannot control when they change their API or what exactly they change, but you have to adapt within a relatively short timeframe or your service will stop working. An iPaaS is the perfect solution for these scenarios, as you do not need a complex setup in your private data center or on a public cloud platform. You just use it “as a service”, change it, replace logic, or stop it and throw it away to start a new project. The implicit integration with API Management and DevOps support also allows you to expose new versions to your service consumers easily and quickly.

Conclusion: Be Innovative by Failing Fast with Cloud Solutions

Blockchain is at a very early stage. This is true for platforms, tooling, and even the basic concepts and theories (like consensus algorithms or security enforcement). Don’t trust the vendors when they say blockchain is now 1.0 and ready for prime time; it will still feel more like a 0.5 beta these days. This is not just true for Hyperledger and its related projects such as IBM’s Hyperledger Fabric, but also for all others, including Ethereum and all the interesting frameworks and startups emerging around it.

Therefore, you need to be flexible and agile in blockchain development today. You need to be able to fail fast. This means setting up a project quickly, trying out new innovative ideas, and throwing them away if they do not work, in order to start the next one. The same is true for other innovative ideas – not just for blockchain.

Middleware helps integrate new innovative ideas with existing applications and enterprise architectures. It is used to interconnect everything, correlate events in real time, and find insights and patterns in correlated information to create new added value. To support innovative projects, middleware itself also needs to be flexible and agile and to support fail-fast methodologies. This post showed how you can leverage iPaaS with TIBCO Cloud Integration and API Management with TIBCO Mashery to build innovative middleware microservices in cloud-to-cloud integration projects.

The post Agile Cloud-to-Cloud Integration with iPaaS, API Management and Blockchain appeared first on Kai Waehner.
