Prerequisites

You should have the following installed, configured, and running:
- Ubuntu 12.04.4
- Talend Enterprise Data Integration 5.6, or any other Talend version that includes Big Data.
- Apache Hadoop 1.0: ensure that the Hadoop daemons are running. You can check this with the jps command, as shown below.
[Image: Using JPS]
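For reference, on a single-node Apache Hadoop 1.0 installation the jps output should list the five Hadoop daemons, roughly as in this sketch (the process IDs are placeholders and will differ on your machine):

```sh
# Run jps as the user that started Hadoop to confirm all daemons are up
$ jps
2112 NameNode
2245 DataNode
2378 SecondaryNameNode
2511 JobTracker
2644 TaskTracker
2777 Jps
```

If any of NameNode, DataNode, SecondaryNameNode, JobTracker, or TaskTracker is missing, start the cluster with <HADOOP_HOME>/bin/start-all.sh before continuing.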
Write a file to HDFS
The following steps describe how to design a Talend job that writes text files from your local machine to HDFS.
- Create a new project and open it in Talend Studio.
- Create a Hadoop cluster connection under the Metadata area in the repository.
[Image: Create Hadoop Cluster]
- Give it a name and click Next.
- Select Apache as the Distribution and choose the version of Hadoop you have installed.
- The NameNode URI and JobTracker URI depend on the configuration you provided when you installed Hadoop. You can check your core-site.xml and mapred-site.xml files, found in <HADOOP_HOME>/conf (see the snippets below the screenshots).
[Image: Hadoop Connection Configurations]

[Image: core-site.xml]

[Image: mapred-site.xml]
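If the screenshots are hard to read, the two properties to look for are fs.default.name in core-site.xml (the NameNode URI) and mapred.job.tracker in mapred-site.xml (the JobTracker address). The sketch below shows what a typical single-node Hadoop 1.0 configuration looks like; the host and ports (localhost, 9000, 9001) are only common defaults, so copy the values from your own files rather than these:

```sh
# NameNode URI used for the Hadoop connection in Talend
$ cat $HADOOP_HOME/conf/core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

# JobTracker address used for the Hadoop connection in Talend
$ cat $HADOOP_HOME/conf/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```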
- Then click on Check Connection to verify the connection, and click Finish.
- Now you can set up the HDFS connection. Right-click the newly created Hadoop cluster connection and select “Create HDFS”.
- Give it a name and click Next.
[Image: Create HDFS]
- Give the username of the superuser that you created when installing Hadoop.
- Click on "Check" to ensure that the connection is successful. If the check fails, the commands below can help confirm that HDFS itself is reachable.
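As a quick sanity check outside Talend, you can query HDFS directly from a terminal on the Hadoop machine. This is only an illustrative sketch and assumes the hadoop binary is on your PATH:

```sh
# List the HDFS root to confirm the filesystem is reachable
$ hadoop fs -ls /

# Show a summary of the cluster (live datanodes, capacity, etc.)
$ hadoop dfsadmin -report
```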
- Now we can design the job. You need just three components:
  - tHDFSConnection: establishes the connection to HDFS.
  - tFileList: iterates over a set of files in the defined directory.
  - tHDFSPut: writes files to HDFS.
- The overall design of the job is shown below:
- Since we have already created the Hadoop cluster connection in the repository, you can simply drag the HDFS connection you created onto the design workspace. A list of components that can use it will appear; select tHDFSConnection from the list.
tHDFSConnection Basic Settings:
tFileList Basic Settings:
- Give the path of the folder that contains the text files you want written to HDFS.
- Make sure that you give the correct File mask. Note the asterisk in front of “.txt”; this matches any file name that ends with “.txt”.
tHDFSPut Basic Settings:
- Tick the “Use an existing connection” check box.
- Local directory should be the path where the source files are located. Press Ctrl + Space to get a list of available variables and select “tFileList_1_CURRENT_FILEDIRECTORY”; this uses the directory variable passed by the tFileList component.
- Give the HDFS path where you want your files to be written.
[Image: tHDFSPut Basic Settings]
- Add a file mask and name by pressing Ctrl + Space and selecting “tFileList_1_CURRENT_FILE” from the list.
- Now you are ready to run your job. Once the job has run successfully, browse HDFS using the web UI (http://<hostname or IP>:50070/) or the command line (see the sketch after the screenshot) to check that the folder called “TextFiles” has been created and that the files have been written to it.
- Click on the file to view its contents.
[Image: Text File Contents]
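For the command-line route, something like the following works. Note that the HDFS directory (/user/hduser/TextFiles) and the file name (sample.txt) are only examples; use the HDFS path you set in tHDFSPut and one of your own file names:

```sh
# List the target folder in HDFS to confirm the files arrived
$ hadoop fs -ls /user/hduser/TextFiles

# Print the contents of one of the uploaded files
$ hadoop fs -cat /user/hduser/TextFiles/sample.txt
```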