Talend Data Integration: How to design a Talend Map/Reduce Job

This post describes how to create a Talend Map/Reduce job. The job reads the file that was written to HDFS (http://sindhujak.blogspot.com/2015/11/how-to-write-file-to-hdfs-using-talend.html) and uses Map/Reduce to count the number of times each word occurs.

- Right-click Job Designs in the Repository and select Create Map/Reduce Job.

- Give the job a name and click Finish.

[Screenshot: Create Map/Reduce job]

The overall design of the job is shown below:

[Screenshot: Map/Reduce job design]

- Now we can design the job. It needs only the following components:

- tHDFSInput: Read a file from HDFS.

- tNormalize: Split each line of text into separate words.

- tAggregateRow: Count words using the SUM function.

- tMap: Change all the words to upper case.

- tHDFSOutput: Write the word and the occurrence count to a file in HDFS.

tHDFSInput Basic Settings:

- Since the Hadoop connection has already been created, set Property Type to Repository and choose that connection.

- Browse HDFS and select the path of the text file.

- Set Type to Text File.

- Click Edit schema and create a single column. I have named it “line”.

[Screenshot: tHDFSInput basic settings]
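To make the row structure concrete, here is a minimal Java sketch of what this configuration produces: one row per line of the text file, each with a single String column named `line`. It is only an illustration under stated assumptions: it reads a local file with `java.nio.file.Files` as a stand-in for HDFS, the class and method names are hypothetical, and it is neither Talend's generated code nor the Hadoop FileSystem API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ReadLinesSketch {
    // One row per line, with a single String column "line",
    // mirroring the one-column schema created in Edit schema.
    record Row(String line) {}

    // Turns raw text lines into rows (pure helper, easy to test).
    static List<Row> toRows(List<String> lines) {
        return lines.stream().map(Row::new).toList();
    }

    // Local-file stand-in for tHDFSInput (assumption): reads the file as
    // plain text, one row per line. tHDFSInput itself reads from HDFS.
    static List<Row> readRows(Path file) throws IOException {
        return toRows(Files.readAllLines(file));
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("input", ".txt");
        Files.writeString(file, "hello world\nhello talend\n");
        for (Row row : readRows(file)) {
            System.out.println(row.line());
        }
        Files.delete(file);
    }
}
```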

tNormalize Basic Settings:

- Set Column to normalize to “line”.

- Set Item separator to “ ” (a single space).

[Screenshot: tNormalize basic settings]
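The effect of tNormalize can be pictured with the minimal Java sketch below (hypothetical names, not Talend's generated code): each line is split on the item separator, and every item becomes its own output row. Skipping the empty items that consecutive spaces produce is a choice made in this sketch, not a statement about tNormalize's own behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class NormalizeSketch {
    // Splits each input line on the item separator and emits one output
    // value per item, the way tNormalize turns one row into many rows.
    static List<String> normalize(List<String> lines, String separator) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            // Pattern.quote makes split() treat the separator literally, not as a regex.
            for (String item : line.split(Pattern.quote(separator))) {
                // Assumption of this sketch: drop empty items caused by repeated separators.
                if (!item.isEmpty()) {
                    words.add(item);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("hello world", "hello  talend");
        System.out.println(normalize(lines, " "));
    }
}
```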

tAggregateRow Basic Settings:

[Screenshot: tAggregateRow basic settings]
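Counting with SUM works when each word row carries a count of 1 that is summed per word. The plain-Java sketch below assumes exactly that setup; it groups by word and adds 1 per occurrence. The names are hypothetical, and this is an illustration of the aggregation, not the component's implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AggregateSketch {
    // Groups rows by word and sums a per-row count of 1, which is what
    // "group by word" plus a SUM operation computes under this assumption.
    static Map<String, Integer> sumByWord(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys for stable output
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(sumByWord(List.of("hello", "world", "hello", "talend")));
    }
}
```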

tAggregateRow Schema:

[Screenshot: tAggregateRow schema]

tMap Schema:
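The upper-case step is a row-by-row transformation. The minimal Java sketch below shows the idea on (word, count) rows: the word is upper-cased and the count is passed through unchanged. The record and method names are hypothetical, and `Locale.ROOT` is used so that the result does not depend on the machine's default locale.

```java
import java.util.List;
import java.util.Locale;

public class UpperCaseSketch {
    // Output row: the word and its occurrence count.
    record WordCount(String word, int count) {}

    // Row-by-row transformation: the word column is upper-cased,
    // the count column is copied as-is.
    static List<WordCount> upperCase(List<WordCount> rows) {
        return rows.stream()
                .map(r -> new WordCount(r.word().toUpperCase(Locale.ROOT), r.count()))
                .toList();
    }

    public static void main(String[] args) {
        List<WordCount> rows = List.of(new WordCount("hello", 2), new WordCount("talend", 1));
        upperCase(rows).forEach(r -> System.out.println(r.word() + " " + r.count()));
    }
}
```

Because this transformation works on one row at a time, applying it after aggregation keeps “the” and “The” as two separate rows (both becoming “THE”); upper-casing before aggregation would merge them into one count.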

tHDFSOutput Basic Settings:

[Screenshot: tHDFSOutput basic settings]
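The output file holds one delimited line per word. The sketch below writes such a file to a local temporary directory standing in for the HDFS folder. The “;” field separator and the file name `result.txt` are assumptions for illustration; use whatever separator and path are configured in the output component.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WriteOutputSketch {
    // Formats each (word, count) pair as one delimited text line.
    static List<String> formatLines(Map<String, Integer> counts, String fieldSeparator) {
        List<String> lines = new ArrayList<>();
        counts.forEach((word, count) -> lines.add(word + fieldSeparator + count));
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> counts = new TreeMap<>(Map.of("HELLO", 2, "TALEND", 1));
        // Local temporary directory standing in for the HDFS folder "out".
        Path outDir = Files.createTempDirectory("out");
        Path outFile = outDir.resolve("result.txt"); // hypothetical file name
        Files.write(outFile, formatLines(counts, ";")); // ";" is an assumed separator
        System.out.print(Files.readString(outFile));
        Files.delete(outFile);
        Files.delete(outDir);
    }
}
```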

- In the Run view, under Hadoop Configurations, set the NameNode, JobTracker and User name.

- Now run the job and check the console to confirm that it executed successfully.

- Check the output folder you specified in HDFS (in this case the folder is called “out”). The output has been written to a file: the words have been changed to upper case, and the number of occurrences of each word has been calculated.

- The difference between a Talend Map/Reduce job and a standard Talend job is that the Java code generated for a Map/Reduce job consists of a Map function and a Reduce function. You can check this in the Code tab of the design workspace.
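To show what the Map and Reduce functions do for this word count, here is an in-memory Java simulation. It is not Talend's generated code and does not use Hadoop's Mapper/Reducer classes; the names are hypothetical. The map phase emits (WORD, 1) for every word, a “shuffle” groups the values by key, and the reduce phase sums each group. Upper-casing inside the map phase is a choice of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class WordCountMapReduceSketch {
    // Map: for one input line, emit (WORD, 1) for every non-empty word.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split(" ")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word.toUpperCase(Locale.ROOT), 1));
            }
        }
        return out;
    }

    // Reduce: sum all counts emitted for one key (the key is unused here
    // but kept to mirror the usual reduce(key, values) shape).
    static int reduce(String word, List<Integer> counts) {
        return counts.stream().mapToInt(Integer::intValue).sum();
    }

    // Runs map over every line, groups values by key (the shuffle), then reduces.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((word, counts) -> result.put(word, reduce(word, counts)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("hello world", "Hello talend")));
    }
}
```

On a real cluster the map calls run in parallel on splits of the input and the framework performs the grouping between the two phases; the sequential loop here only illustrates the data flow.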


Tuesday, November 24, 2015
