I have been working with Hadoop and MapReduce today. This is the command I used to run my first MapReduce job:
hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.1.1.jar -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py -input myinput -output joboutput
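My MapReduce job was written in Python, so I had to use Hadoop Streaming, which lets any executable that reads from standard input and writes to standard output act as the mapper or reducer. The first part of the command points Hadoop at the streaming jar: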
hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.1.1.jar
The next part specifies the mapper and reducer. The -mapper <filename> and -reducer <filename> options name the Python scripts to run, and a -file option for each script ships it to the cluster nodes along with the job:
-mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py
The last part of the command names the input and output folders for the MapReduce job, using the -input <folder> -output <folder> format:
-input myinput -output joboutput
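To show what the two Python scripts in the command might look like, here is a minimal word-count sketch. It is an assumption, not the actual contents of my mapper.py and reducer.py: the function names map_lines and reduce_lines are made up for illustration, and in a real job each function would sit in its own script and loop over sys.stdin. It relies only on the Hadoop Streaming contract that the mapper and reducer exchange tab-separated key/value lines and that the reducer receives its input sorted by key.

```python
#!/usr/bin/env python
# Hypothetical word-count logic for a Hadoop Streaming job.
# In the real job, map_lines would live in mapper.py and reduce_lines in
# reducer.py, each fed from sys.stdin and printing its records to stdout.
from itertools import groupby


def map_lines(lines):
    """Mapper logic: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reduce_lines(lines):
    """Reducer logic: sum the counts for each word.

    Hadoop Streaming sorts the mapper output by key before the reducer
    sees it, so all records for one word arrive as a consecutive run.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield "%s\t%d" % (word, total)


if __name__ == "__main__":
    # Local dry run: sorted() stands in for Hadoop's shuffle-and-sort step.
    sample = ["the quick fox", "the lazy dog"]
    for record in reduce_lines(sorted(map_lines(sample))):
        print(record)
```

Running the sketch locally like this is a cheap way to check the logic before submitting the job. When the scripts are passed as -mapper mapper.py and -reducer reducer.py, they typically also need the shebang line and execute permission so the cluster nodes can launch them directly.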
