At the time of writing, I am reading Hadoop: The Definitive Guide, 3rd Edition, from O'Reilly. I thought I would write down some of what I learn.
Here is a summary of the differences between MPI and MapReduce:
| |MPI|MapReduce|
|Location of Data|Shared Storage|Data Locality (within the Node)|
|Complexity|MPI requires the programmer to handle the mechanics of the data flow, exposed via low-level C routines and constructs|The programmer thinks in terms of functions of key and value pairs, and the data flow is implicit|
|Large-Scale Distributed Computation|A failed process will slow or halt the progress of the computation. MPI programs have to explicitly manage their own checkpointing and recovery|Shared-nothing architecture. The implementation detects failed map or reduce tasks and reschedules replacements on machines that