This post describes how to create a Talend Map/Reduce job. The job reads the file that was written to HDFS in an earlier post (http://sindhujak.blogspot.com/2015/11/how-to-write-file-to-hdfs-using-talend.html) and counts the number of times each word occurs using Map/Reduce.
- Right-click on Job Designs and select Create Map/Reduce Job.
- Give it a name and click Finish.
The overall design of the job is shown below:
![Map/Reduce job design]()
- Now we can design the job. You just need the following components (a plain-Hadoop sketch of the whole pipeline follows this list):
  - tHDFSInput: reads the file from HDFS.
  - tNormalize: splits the text into words.
  - tAggregateRow: counts the words using the SUM function.
  - tMap: changes all the words to upper case.
  - tHDFSOutput: writes each word and its occurrence count to a file in HDFS.
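For reference, here is a minimal sketch of the same word-count pipeline written directly against the Hadoop MapReduce API. This is illustrative only, not the code Talend generates; the class name and the input/output paths are placeholders. The mapper does the work of tNormalize and tMap (split on spaces, upper-case each word) and the reducer does the work of tAggregateRow's SUM. Note that this sketch upper-cases before aggregating, so mixed-case spellings of a word are counted together.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Plays the role of tNormalize + tMap: split each line on spaces
  // and emit each word in upper case with a count of 1.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString(), " ");
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken().toUpperCase());
        context.write(word, ONE);
      }
    }
  }

  // Plays the role of tAggregateRow's SUM: add up the counts per word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input path
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output path
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```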
- Since we have already created the Hadoop connection, set Property Type to Repository and choose that connection.
- You can browse HDFS and select the path of the text file.
- Set the Type to Text File.
- Click on Edit schema and create one column. I have created one called “line”.
![tHDFSInput basic settings]()
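For readers curious what tHDFSInput is doing behind the scenes, the following standalone Java sketch reads the same kind of file through the Hadoop client API. The namenode URI and file path are placeholders; take the real values from your repository connection.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadFromHdfs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder namenode URI; use the one from your Hadoop connection.
    conf.set("fs.defaultFS", "hdfs://localhost:8020");

    try (FileSystem fs = FileSystem.get(conf);
         BufferedReader reader = new BufferedReader(new InputStreamReader(
             fs.open(new Path("/user/talend/input.txt")), // placeholder path
             StandardCharsets.UTF_8))) {
      String line;
      // Each line read here corresponds to one row in the "line" column.
      while ((line = reader.readLine()) != null) {
        System.out.println(line);
      }
    }
  }
}
```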

tNormalize basic settings:
- Column to normalize should be “line”.
- Item separator should be “ ” (a single space).
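As a concrete illustration of these two settings: a single input row whose “line” column holds several space-separated words becomes one output row per word. A tiny, self-contained sketch of that split (the sample sentence is made up):

```java
public class NormalizeDemo {
  public static void main(String[] args) {
    String line = "to be or not to be"; // one input row's "line" column
    // tNormalize with item separator " " turns this row into one row per word:
    for (String word : line.split(" ")) {
      System.out.println(word);
    }
  }
}
```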


Schema:

Basic Settings:
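The last step, tHDFSOutput, writes one row per word together with its occurrence count back to HDFS. A minimal sketch of an equivalent write with the Hadoop client API; the namenode URI, output path, field separator, and the sample record are all placeholders:

```java
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteToHdfs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://localhost:8020"); // placeholder namenode URI

    try (FileSystem fs = FileSystem.get(conf);
         PrintWriter out = new PrintWriter(new OutputStreamWriter(
             fs.create(new Path("/user/talend/wordcount_out.txt"), true),
             StandardCharsets.UTF_8))) {
      // One line per word with its occurrence count, e.g. "HADOOP;3".
      out.println("HADOOP;3"); // placeholder record
    }
  }
}
```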