What is Hadoop?

Here is a nice video on YouTube that explains what Hadoop is.

Hadoop and MPI

At the time of writing, I am reading Hadoop: The Definitive Guide, 3rd Edition, from O'Reilly. I thought I would capture some notes on what I have learned.

Here is a summary of the differences between MPI and MapReduce:

| | MPI | MapReduce |
|---|---|---|
| Location of data | Shared storage | Data locality (within the node) |
| Complexity | MPI requires the programmer to handle the mechanics of the data flow, exposed via low-level C routines and constructs. | The programmer thinks in terms of functions of key and value pairs, and the data flow is implicit. |
| Compute characteristics | Large-scale distributed computation. A failed process will slow or halt the progress of the computation. MPI programs have to explicitly manage their own checkpointing and recovery. | Shared-nothing architecture. The implementation detects failed map or reduce tasks and reschedules replacements on machines that are healthy. |
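
To make the "functions of key and value pairs" idea concrete, here is a minimal in-memory sketch of a word count. It is not Hadoop code: `run_job`, `map_words` and `reduce_counts` are hypothetical names made up for this illustration, and `run_job` only imitates on a single machine what the framework does across a cluster. The point is that the programmer writes only the map and reduce functions, while the grouping of values by key (the data flow) happens outside them.

```python
from collections import defaultdict


def map_words(_offset, line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce: sum all the counts that were collected for one word."""
    return word, sum(counts)


def run_job(lines, mapper, reducer):
    """Toy stand-in for the framework: map, group by key, then reduce."""
    groups = defaultdict(list)
    for offset, line in enumerate(lines):
        for key, value in mapper(offset, line):
            # The "shuffle" step: values are grouped by key, not by the
            # programmer's own code.
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    sample = ["hadoop stores data", "hadoop processes data"]
    print(run_job(sample, map_words, reduce_counts))
```

Neither `map_words` nor `reduce_counts` knows where the data lives or how pairs reach the reducer. That is the sense in which the data flow is implicit, in contrast to the explicit message passing of an MPI program.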

Hadoop and Traditional RDBMS

Here is a summary of the differences between a traditional RDBMS and MapReduce:

| | Traditional RDBMS | MapReduce |
|---|---|---|
| Data size | Gigabytes | Petabytes |
| Access | Interactive and batch | Batch |
| Updates | Read and write many times | Write once, read many times |
| Structure | Static schema | Dynamic schema |
| Integrity | High | Low |
| Scaling | Nonlinear | Linear |
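
One way to read the "Structure" and "Integrity" rows is that MapReduce interprets raw records when they are processed, instead of enforcing a schema when they are written. The sketch below illustrates that reading in plain Python. The record layout (date, station, temperature) and the names `parse_reading` and `max_temperature_by_station` are assumptions invented for this example; they do not come from the book or from any Hadoop API.

```python
# Raw records are stored as written; nothing validated them at load time.
RAW_LINES = [
    "2014-01-03,station-7,21.5",
    "2014-01-04,station-7,not-a-number",  # malformed value, still stored
    "2014-01-05,station-9,19.0",
]


def parse_reading(line):
    """Apply a schema at read time; return None if the record does not fit."""
    parts = line.split(",")
    if len(parts) != 3:
        return None
    date, station, value = parts
    try:
        return {"date": date, "station": station, "temperature": float(value)}
    except ValueError:
        return None


def max_temperature_by_station(lines):
    """Find the highest valid temperature per station, skipping bad records."""
    best = {}
    for line in lines:
        record = parse_reading(line)
        if record is None:
            continue  # integrity is the job's responsibility, not the store's
        station = record["station"]
        best[station] = max(best.get(station, float("-inf")), record["temperature"])
    return best


if __name__ == "__main__":
    print(max_temperature_by_station(RAW_LINES))
```

An RDBMS with a static schema would normally reject the malformed row at insert time. Here it is stored anyway and simply skipped by the job that reads it.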