
Slides online: “Big Data beyond Apache Hadoop – How to Integrate ALL your Data” – JavaOne 2013

The slides from my session “Big Data beyond Apache Hadoop – How to Integrate ALL your Data” at JavaOne 2013 in San Francisco are now online.

Abstract

Big data represents a significant paradigm shift in enterprise technology: it radically changes the data management profession by introducing new concerns about the volume, velocity, and variety of corporate data.

Apache Hadoop is the de facto open source standard for implementing big data solutions on the Java platform. Hadoop consists of its kernel, MapReduce, and the Hadoop Distributed File System (HDFS). A challenging task is to get all data into Hadoop for processing and storage (and back out to your applications later), because in practice data comes from many different applications (SAP, Salesforce, Siebel, etc.) and databases (file, SQL, NoSQL), uses different technologies and concepts for communication (e.g. HTTP, FTP, RMI, JMS), and arrives in different formats such as CSV, XML, or binary data.

This session presents different open source frameworks and products (especially Apache Camel and Talend Open Studio for Big Data) that solve this task. Learn how to integrate virtually any data source with Hadoop – without writing lots of complex or redundant boilerplate code.
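To give an idea of what this looks like in practice, here is a minimal sketch (not taken from the slides) of an Apache Camel route that polls an FTP server and writes incoming files to HDFS. Host names, credentials, and paths are placeholders, and the camel-ftp and camel-hdfs components are assumed to be on the classpath.

```java
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.main.Main;

// Sketch only: moves files from an FTP server into HDFS with a single Camel route.
// All endpoints (host names, credentials, paths) are placeholders for illustration.
public class FtpToHdfsRoute extends RouteBuilder {

    @Override
    public void configure() {
        // Poll an FTP directory for incoming files (e.g. CSV exports from a business application) ...
        from("ftp://user@ftp.example.com/incoming?password=secret&delete=true")
            // ... and store each file in HDFS for later MapReduce processing.
            .to("hdfs://namenode.example.com:8020/data/incoming?fileType=NORMAL_FILE");
    }

    public static void main(String[] args) throws Exception {
        Main main = new Main();
        main.addRouteBuilder(new FtpToHdfsRoute());
        main.run(args);
    }
}
```

Swapping the FTP source for HTTP, JMS, a database, or an application adapter only changes the `from(...)` endpoint URI; the rest of the route stays the same, which is exactly the kind of boilerplate reduction the session covers.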

Slides

(Slides embedded from www.slideshare.net.)

