Monday, 14 April 2014

A Simple Explanation of Map-reduce

In MapReduce, data is processed by breaking it into small chunks and then processing those chunks in parallel.

MapReduce is built around key-value pairs, much like the entries in a hash table: every piece of data flowing through the system is represented as a key paired with a value.
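To make this concrete, here is a minimal sketch (plain Python, not Hadoop itself) of the classic word-count representation: each word becomes a key, and each occurrence contributes the value 1.

```python
# Word-count data as key-value pairs: one (word, 1) pair per occurrence.
line = "the cat sat on the mat"
pairs = [(word, 1) for word in line.split()]
print(pairs)
# [('the', 1), ('cat', 1), ('sat', 1), ('on', 1), ('the', 1), ('mat', 1)]
```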

The other main ingredients of MapReduce are the Mappers and the Reducers.

Mappers traverse through the input, emitting a key-value pair for each record so that matching keys can later be grouped together.

Reducers then collect the organized key-value pairs from the Mappers and perform the computation.

Most of the time, each Reducer is assigned a specific set of keys, which it receives in alphabetical (sorted) order.

The Mappers work on the small blocks of data to produce outputs called intermediate records. In Hadoop, these intermediate records are key-value pairs.
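A hypothetical Mapper for word count might look like this (the function name and input are illustrative, not a Hadoop API): it takes a block of lines and yields one intermediate record per word.

```python
# Illustrative Mapper: turns a block of input lines into intermediate
# (key, value) records, one per word occurrence.
def mapper(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)  # intermediate record: a key-value pair

block = ["Deer Bear River", "Car Car River"]
intermediate = list(mapper(block))
print(intermediate)
# [('deer', 1), ('bear', 1), ('river', 1), ('car', 1), ('car', 1), ('river', 1)]
```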

After the Mappers have finished, the next phase is Shuffle and Sort.

The key-value pairs are shuffled to their allocated Reducers, and the keys are sorted, usually in alphabetical order.

Each Reducer sees only one key at a time, together with the sorted list of values for that key. The Reducer then does the final computation on those values.
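For word count, that final computation is just a sum. A hypothetical Reducer (again illustrative, not a Hadoop signature) receives one key and its list of values:

```python
# Illustrative Reducer: one key plus its list of values in, one result out.
def reducer(key, values):
    return (key, sum(values))  # final computation: total count for this word

print(reducer('car', [1, 1]))
# ('car', 2)
```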

We can get the final result in globally sorted order either by adding an extra step that merges the Reducer outputs, or by using only one Reducer (which is not very scalable).
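Because each Reducer's output is already sorted by key, that extra merge step is just a k-way merge of sorted lists. A sketch, with two hypothetical Reducer outputs standing in for real partitions:

```python
import heapq

# Each Reducer emits its partition sorted by key (illustrative outputs).
reducer_0 = [('bear', 1), ('car', 2)]
reducer_1 = [('deer', 1), ('river', 2)]

# A k-way merge yields one globally sorted result without a single Reducer.
merged = list(heapq.merge(reducer_0, reducer_1))
print(merged)
# [('bear', 1), ('car', 2), ('deer', 1), ('river', 2)]
```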
