Big Data

Big data is data that exceeds the processing capacity of conventional database systems. The data is too big (volume), moves too fast (velocity), or doesn’t fit the structures of traditional database architectures (variety). Within this data lie valuable patterns and information, hidden because of the amount of work required to extract them. To gain value from this data, one must choose an alternative way to process it. Today’s commodity hardware, cloud architectures and open source software bring Big Data processing within reach of the less well-resourced.


When the volumes of data are larger than conventional relational database infrastructures can cope with, the processing options are broadly limited to massively parallel architectures that spread the work across many machines. Talend's Big Data solution is Apache Hadoop-based. At its core, Hadoop is a platform for distributing computing problems across a number of servers. Originally created by Doug Cutting and Mike Cafarella and developed extensively at Yahoo as an open source project, it implements the MapReduce approach pioneered by Google in compiling its search indexes. Hadoop’s MapReduce involves distributing a dataset among multiple servers and operating on each portion of the data where it is stored: the “map” stage. The partial results are then recombined: the “reduce” stage. To store data, Hadoop uses its own distributed filesystem, HDFS, which makes data available to multiple computing nodes. A typical Hadoop usage pattern involves three stages:


  • loading data into HDFS,
  • MapReduce operations, and
  • retrieving results from HDFS.
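
The map and reduce stages described above can be illustrated with a small word count written in plain Python. This is only a sketch of the idea, not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, the list of strings stands in for blocks of data held on different servers, and everything runs in a single process rather than being read from and written to HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Map stage: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the emitted values by key between the map and reduce stages."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce stage: combine the partial results for one key."""
    return key, sum(values)


def word_count(chunks: List[str]) -> Dict[str, int]:
    # Each chunk is an assumed stand-in for data stored on a separate server.
    emitted = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(emitted)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data moves fast"]))
```

In a real cluster the map calls would run in parallel on the nodes that hold each block, and the framework would perform the grouping step across the network before the reduce calls run.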


However, Big Data is not all about infrastructure. Big Data practitioners have reported that 80% of the effort involved in dealing with data is cleaning it up in the first place.  Further, Big Data rarely exists in a vacuum, but typically requires integration with other traditional data sources.



Big Data with Talend

The Talend platform's unified approach to the disciplines of Data Quality, Data Manipulation and Data Integration places the solution in a unique position:


  1. Talend simplifies big data technologies to lower the technical barrier and make them more accessible.
  2. The Talend development studio increases developer productivity with a graphical environment that allows them to implement big data projects in shorter timescales.
  3. Talend's Unified Platform enables co-existence and migration between big data platforms and traditional relational databases.


Read more about Talend Open Studio for Big Data.


Clients with enterprise-grade requirements can use the subscription-based product Talend Enterprise Data Integration - Big Data Edition.

Big Data with Talend Solutions

Talend Solutions is an early adopter of the Talend Platform for Big Data.


We have longstanding exposure to a broad suite of Talend products and extensive experience of large-scale migration and integration projects across a wide range of market verticals. This places us well to support complex data projects that involve data quality and data integration challenges across a mix of traditional and big data sources.