Sunday, 27 April 2014

My First Hadoop MapReduce Job

I have been working with Hadoop and MapReduce today. This is the command I used to run my first MapReduce job.

hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.1.1.jar -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py -input myinput -output joboutput

My MapReduce job was written in Python, so I had to use Hadoop Streaming, which lets any program that reads standard input and writes standard output act as the mapper or reducer. This is the first part of the command:
 hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.1.1.jar 
The next part specifies the mapper and reducer. We use the -mapper <filename> -reducer <filename> format to name the Python code for the mapper and reducer, followed by a -file option for each of the two Python files so that they are shipped with the job.
 -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py
The last part of the command names the input and output folders for the MapReduce job. We use the -input <folder> -output <folder> format:
 -input myinput -output joboutput
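
For illustration, here is a minimal sketch of what the logic inside mapper.py and reducer.py could look like. The actual scripts are not shown in this post, so the word-count task and the function names map_lines and reduce_lines are assumptions. Hadoop Streaming passes lines to each script on standard input and reads tab-separated key/value lines back from standard output, and the reducer's input arrives sorted by key.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper logic: emit one 'word<TAB>1' line for every word seen."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(lines):
    """Reducer logic: add up the counts for each key.

    Assumes the lines arrive sorted by key, so equal keys are adjacent.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Local dry run; sorted() stands in for the sort Hadoop does between steps.
sample = ["the quick brown fox", "the lazy dog"]
mapped = sorted(map_lines(sample))
result = list(reduce_lines(mapped))
```

In the real scripts, each function would be fed sys.stdin and every yielded line would be printed.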

Monday, 14 April 2014

User Stories made simple


User Stories are part of the agile approach; they shift the focus from writing requirements to talking about them.
User Stories mainly consist of one or two written sentences, followed by a series of conversations about the desired functionality.

User Stories are similar to use cases, but they are not the same. They replace the large requirement documents used in software development.

User Stories are not even narratives; they are much simpler than that. In XP teams, User Stories are often mistaken for requirements and use cases.


User Stories are one- or two-line descriptions that convey what the team has to work on.
The main objective of user stories is to trigger conversations and discussions about the requirements within the team.


Although the User Stories are very short, they still have two important characteristics:


  1. Customer Value - The stories should be written in the customer’s terminology. They should focus on the end result that the customer values, not the implementation details.
  2. Completion Criteria - Each story should clearly define its completion criteria. The customer can describe an objective test that would allow the programmers to tell when the story is successfully implemented.

A Simple Explanation of Map-reduce

In MapReduce, the data is processed by breaking it into small chunks and then processing the chunks in parallel.

MapReduce builds on an idea familiar from hash tables: the data is handled as key-value pairs.

The other main feature of MapReduce is the Mappers and Reducers.

Mappers traverse through the file and emit key-value pairs, so that records with the same key can be grouped together.

Reducers then collect the organized key-value pairs from the Mappers and perform the computation.

Most of the time, each Reducer is assigned specific keys, which it works through in alphabetical order.
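
As a rough illustration of the key-value idea, the sketch below uses a Python dictionary (a hash table) to group values by key and then compute one result per key. The city names and amounts are made-up sample data, and group_by_key is a hypothetical helper, not part of Hadoop.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Collect every value under its key, like a hash table of lists."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


# Made-up (key, value) pairs of the kind a mapper might emit.
pairs = [("miami", 12.5), ("nyc", 3.0), ("miami", 7.5), ("nyc", 4.0)]

groups = group_by_key(pairs)

# Keys are visited in alphabetical order; the computation here is a sum.
totals = {key: sum(values) for key, values in sorted(groups.items())}
```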




The Mappers work on small blocks of data to produce outputs called intermediate records. In Hadoop, these intermediate records are key-value pairs.

After the Mapper has finished, the next phase is Shuffle and Sort.

The key-value pairs are shuffled, meaning each pair is sent to the Reducer allocated to its key, and then sorted by key, usually in alphabetical order.

The Reducer works on one key at a time, together with the list of values for that key. It then does the final computation on those values.

To get the final result in globally sorted order, we can either add an extra step that merges the Reducers' outputs or use only one Reducer (which is not very scalable).
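
The shuffle, sort and merge steps can be sketched in plain Python as follows. This is a simplified in-memory model, not how Hadoop is implemented: the function names, the sample pairs and the use of zlib.crc32 to pick a reducer are all assumptions made for the example.

```python
import heapq
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Pick a reducer for a key; a stable hash keeps runs reproducible."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(pairs, num_reducers):
    """Route each pair to its reducer and group the values under each key."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    # Each reducer sees its own keys in sorted order.
    return [sorted(bucket.items()) for bucket in buckets]


def reduce_bucket(bucket):
    """Final computation: here, the sum of the values for each key."""
    return [(key, sum(values)) for key, values in bucket]


# Made-up intermediate records.
pairs = [("pear", 1), ("apple", 2), ("fig", 5), ("apple", 3), ("pear", 4)]
outputs = [reduce_bucket(b) for b in shuffle_and_sort(pairs, num_reducers=3)]

# Each reducer's output is sorted on its own, but the outputs together are
# not, so an extra merge step produces one globally sorted result.
merged = list(heapq.merge(*outputs))
```

With num_reducers=1 the merge becomes unnecessary, which is the single-reducer option mentioned above.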

Sunday, 6 April 2014

Learning GTD



I recently watched this video about 'Getting Things Done' by David Allen. I found it really inspiring and thought I should share my notes with the world.

These are my notes:

 It's very easy to fall off the GTD wagon, but it's just as simple to get back on again.

The smarter way to think about it is that each time you start again, you are simply repeating something you have already done before.

Most of the time, when we are stressed out and decide to take some time out and write things down, we start to feel more in control of the situation. This is the result of what is called "distributed cognition".

Why does GTD work ?

  • Potential Meaning overload
  • A lot of your competitive edge is how you deal with surprises, personally or professionally.
  • Mind like water: a perfectly appropriate response to, and engagement with, whatever is present.
  • Your ability to perform any work efficiently and effectively is directly connected to your ability to concentrate. Your ability to concentrate is directly related to your ability to eliminate distractions.
  • Most distractions come from 'mismanaged commitments'.
  • The mind is limited in its ability to manage commitments, because it is handicapped in its ability to remember and remind.
  • Until your brain trusts that there is a better system, it won't let go of that job and won't stop worrying.
  • There is an inverse relationship between how much something is on your mind and how much it is getting done.
  • The brain is constantly distracted by the different things that have to be done because it is trying to be the reminder system, a job it is not very good at.
  • If you don't give your attention to what has your attention, it will take more attention from you than it deserves.

How Does GTD help in Self-management ?
  • In order to get things off your mind, you must know that:
  1. You have captured, clarified and organized all your commitments at all horizons.
  2. You will engage with them as often as you need to.
  • Your ability to refocus on the right things at the right time in the right horizon is the master key of knowledge work athletics.


  • The two aspects of self-management:


  1. Control: Conscious focused engagement, aware of all options at any one time and place.
  2. Perspective: aligned and clear about decisions, direction and priorities.


  • The Matrix of Self-management:     


GTD Models

  • Mastering Workflow:  The five keys to gaining control.
  1. Collect - Any task that lives longer than the thought itself; the stuff whose meaning you still need to work out.
  2. Process - Clarify the task: what kind of action has to be taken for it.
  3. Organize - Tasks should be organized so that they come up at the right time.
  4. Review - Keep coming back to the system to make sure that you are doing the right task at the right time.
  5. Do
  • Horizons of focus (Perspective)

50,000 ft - Purpose, Principles
40,000 ft - Vision
30,000 ft - Goals
20,000 ft - Areas of interest/ responsibility
10,000 ft - Projects
Runway - Next actions

Use a bottom-up approach to work through the different perspectives, starting from the runway.