Hadoop and MPI

At the time of writing, I am reading Hadoop: The Definitive Guide, 3rd Edition, from O'Reilly. I thought I would capture some of what I have learned.

Here is a summary of the differences between MPI and MapReduce:

Location of Data
MPI: Shared storage.
MapReduce: Data locality (data is processed within the node that stores it).

Complexity
MPI: Requires the programmer to handle the mechanics of the data flow, exposed via low-level C routines and constructs.
MapReduce: The programmer thinks in terms of functions of key and value pairs, and the data flow is implicit.

Large-Scale Distributed Computation
MPI: A failed process will slow or halt the progress of the computation. MPI programs have to explicitly manage their own checkpointing and recovery.
MapReduce: Shared-nothing architecture. The implementation detects failed map or reduce tasks and reschedules replacements on machines that are healthy.
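
To make the "functions of key and value pairs" point concrete, here is a minimal sketch of a word count written as a map function and a reduce function. This is plain Python, not the Hadoop API: run_local is a hypothetical single-process stand-in for the framework, and map_fn, reduce_fn and the sample input are names I made up for illustration. The point is that the programmer writes only the two functions, while the grouping of values by key (the data flow) happens outside them.

```python
from collections import defaultdict


def map_fn(_key, line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    for word in line.split():
        yield word.lower(), 1


def reduce_fn(word, counts):
    """Reduce: sum all the counts collected for a single word."""
    yield word, sum(counts)


def run_local(records, mapper, reducer):
    """Hypothetical stand-in for the framework (an assumption, not Hadoop).

    It runs the mapper over every record, groups the emitted values by key,
    and then runs the reducer once per key. In a real cluster this grouping
    step is done for you across machines.
    """
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)

    results = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            results[out_key] = out_value
    return results


# Sample input: (line number, line text) pairs.
lines = [(0, "the quick brown fox"), (1, "the lazy dog")]
word_counts = run_local(lines, map_fn, reduce_fn)
print(word_counts)
```

Note that neither map_fn nor reduce_fn knows where the data lives or how it moves between the two stages. With MPI, that movement is something the programmer would have to write explicitly.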
