March 18, 2013 by kittycool only

Hadoop and MPI

At the time of writing, I am reading Hadoop: The Definitive Guide, 3rd Edition, from O'Reilly. I thought I would jot down some of what I have learned.

Here is a summary of the differences between MPI and MapReduce:

	MPI	MapReduce
Location of data	Shared storage, accessed over the network	Data locality: computation runs on the node that stores the data
Programming complexity	MPI requires the programmer to handle the mechanics of the data flow explicitly, through low-level C routines and constructs	The programmer writes functions over key-value pairs; the data flow is implicit
Computation and failure handling	Large-scale distributed computation: a failed process can slow or halt the whole computation, so the MPI program must manage checkpointing and recovery explicitly	Shared-nothing architecture: the framework detects failed map or reduce tasks and reschedules replacements on healthy machines
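To make the "data locality" row concrete for myself, here is a small scheduling sketch. It is not Hadoop's actual scheduler; the function name `assign_map_tasks`, the block IDs and the node names are all hypothetical. It assumes we know which nodes hold a replica of each input block and how many free task slots each node has, and it prefers a node that already stores the block before falling back to a remote read.

```python
def assign_map_tasks(
    block_replicas: dict[str, list[str]],
    free_slots: dict[str, int],
) -> dict[str, str]:
    """Assign each input block's map task to a node, preferring data-local nodes.

    block_replicas: hypothetical map of block ID -> nodes holding a copy.
    free_slots: hypothetical map of node -> how many more tasks it can accept.
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    assignment: dict[str, str] = {}
    for block, replicas in block_replicas.items():
        # First choice: a node that already stores this block (no network copy).
        local = [n for n in replicas if slots.get(n, 0) > 0]
        if local:
            node = local[0]
        else:
            # Fallback: any node with a free slot; the block is read remotely.
            remote = [n for n, s in slots.items() if s > 0]
            if not remote:
                raise RuntimeError(f"no free slot for block {block!r}")
            node = remote[0]
        slots[node] -= 1
        assignment[block] = node
    return assignment


if __name__ == "__main__":
    replicas = {"b1": ["n1", "n2"], "b2": ["n1"], "b3": ["n3"]}
    print(assign_map_tasks(replicas, {"n1": 1, "n2": 1, "n3": 1}))
```

Here `b2` only lives on `n1`, but `n1` is already busy with `b1`, so `b2` goes to `n2` and has to be read over the network. In the shared-storage model from the MPI column, every read is remote in this sense.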
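The "functions over key-value pairs" row is easiest to see with the classic word count. The sketch below is a single-process stand-in, not Hadoop's Java `Mapper`/`Reducer` API. `word_count_map`, `word_count_reduce` and `run_mapreduce` are hypothetical names, and lowercasing words is my own choice. The point is that the user writes only the two small functions, while grouping values by key (the shuffle) happens inside the framework part.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator

# A mapper turns one input record into (key, value) pairs; a reducer turns
# one key plus all of its values into output pairs.
Mapper = Callable[[str], Iterable[tuple[str, int]]]
Reducer = Callable[[str, list[int]], Iterable[tuple[str, int]]]


def word_count_map(line: str) -> Iterator[tuple[str, int]]:
    """User code: emit (word, 1) for every whitespace-separated word."""
    for word in line.split():
        yield word.lower(), 1  # lowercasing is an assumption of this sketch


def word_count_reduce(word: str, counts: list[int]) -> Iterator[tuple[str, int]]:
    """User code: sum all the 1s emitted for the same word."""
    yield word, sum(counts)


def run_mapreduce(records: Iterable[str], mapper: Mapper, reducer: Reducer) -> dict[str, int]:
    """Framework stand-in: the data flow the programmer never writes."""
    intermediate: dict[str, list[int]] = defaultdict(list)
    for record in records:
        # Map phase: each record is processed independently.
        for key, value in mapper(record):
            # Shuffle: group intermediate values by key.
            intermediate[key].append(value)
    output: dict[str, int] = {}
    # Reduce phase: keys are visited in sorted order.
    for key in sorted(intermediate):
        for out_key, out_value in reducer(key, intermediate[key]):
            output[out_key] = out_value
    return output


if __name__ == "__main__":
    lines = ["the quick brown fox", "The lazy dog", "the end"]
    print(run_mapreduce(lines, word_count_map, word_count_reduce))
    # prints {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

In an MPI program, the equivalent of the grouping loop in `run_mapreduce` would be explicit message passing that the programmer writes by hand.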
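The last row, about rescheduling failed tasks, can also be sketched. This is a toy model, not how Hadoop's scheduler is implemented: `TaskFailed`, `run_with_rescheduling` and the `max_attempts` parameter are hypothetical, and marking a node unhealthy after its first failure is a deliberate simplification. It works only because tasks in a shared-nothing design do not depend on each other, so rerunning one on a different machine is safe.

```python
from typing import Callable


class TaskFailed(Exception):
    """Raised by a (simulated) worker when a task attempt fails."""


def run_with_rescheduling(
    tasks: list[str],
    nodes: list[str],
    run_task: Callable[[str, str], int],
    max_attempts: int = 4,
) -> dict[str, int]:
    """Run every task, retrying a failed attempt on a different healthy node.

    Simplification: a node is marked unhealthy after its first failure.
    """
    healthy = list(nodes)
    results: dict[str, int] = {}
    for i, task in enumerate(tasks):
        for attempt in range(max_attempts):
            if not healthy:
                raise RuntimeError("no healthy nodes left")
            # Spread tasks round-robin; the failed node has been removed,
            # so a retry always lands on a different node.
            node = healthy[(i + attempt) % len(healthy)]
            try:
                results[task] = run_task(task, node)
                break  # success: move on to the next task
            except TaskFailed:
                healthy.remove(node)  # stop scheduling work on this node
        else:
            raise RuntimeError(f"task {task!r} failed {max_attempts} times")
    return results


if __name__ == "__main__":
    def flaky(task: str, node: str) -> int:
        # Pretend node "n2" has crashed; any task sent there fails.
        if node == "n2":
            raise TaskFailed(node)
        return len(task)

    print(run_with_rescheduling(["a", "bb", "ccc"], ["n1", "n2", "n3"], flaky))
    # prints {'a': 1, 'bb': 2, 'ccc': 3}
```

Task `bb` is first sent to the "crashed" node `n2`, fails, and is rerun on `n1` without the other tasks noticing. In MPI, the same crash would typically stall every process waiting on messages from the failed one, which is why MPI programs need their own checkpointing.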
