At the time of writing, I am reading Hadoop: The Definitive Guide, 3rd Edition, from O'Reilly. I thought I would write down some of what I have learned.
Here is a summary of the differences between MPI and MapReduce:
| Aspect | MPI | MapReduce |
|---|---|---|
| Location of Data | Shared Storage | Data Locality (within the Node) |
| Complexity | MPI requires the programmer to handle the mechanics of the data flow explicitly, through low-level C routines and constructs. | The programmer thinks in terms of functions of key and value pairs, and the data flow is implicit. |
| Compute Characteristics | Large-Scale Distributed Computation. A failed process will slow or halt the progress of the computation. The MPI program has to explicitly manage checkpointing and recovery. | Shared Nothing Architecture. The implementation detects failed map or reduce tasks and reschedules replacements on machines that are healthy. |
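
To make the "functions of key and value pairs" idea concrete, here is a minimal single-process sketch of a word count in plain Python. It is not Hadoop's API: the names `map_words`, `reduce_counts`, and `run_job` are made up for illustration, and `run_job` only stands in for the framework. The point is that the programmer writes just the map and reduce functions, while grouping values by key (the data flow) happens outside them.

```python
from collections import defaultdict


def map_words(_key, line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce: sum all the counts collected for one word."""
    return word, sum(counts)


def run_job(records, mapper, reducer):
    """Hypothetical toy driver standing in for the framework.

    It runs the mapper on every (key, value) record, groups the mapper
    output by key (the "shuffle"), then runs the reducer once per key.
    """
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    return dict(reducer(k, v) for k, v in sorted(groups.items()))


# Input records are (line number, line text) pairs.
lines = ["the quick brown fox", "the lazy dog", "the fox"]
result = run_job(enumerate(lines), map_words, reduce_counts)
print(result)
```

In a real cluster the framework would also split the input across nodes, run map tasks close to the data, and rerun failed tasks. None of that appears in the two user-written functions, which is the contrast with MPI drawn in the table.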