This is a nice video on YouTube that explains what Hadoop is.
Hadoop
Hadoop and MPI
At the time of writing, I am reading Hadoop: The Definitive Guide, 3rd Edition, from O'Reilly. I thought I would write down some of what I have learned.
Here is a summary of the differences between MPI and MapReduce:
| | MPI | MapReduce |
---|---|---|
Location of Data | Shared storage | Data locality (data is stored on the compute node itself) |
Complexity | MPI requires the programmer to explicitly handle the mechanics of the data flow, exposed via low-level C routines and constructs | The programmer thinks in terms of functions of key and value pairs, and the data flow is implicit |
Compute Characteristics | Large-scale distributed computation. A failed process will slow or halt the progress of the computation, and MPI programs have to explicitly manage their own checkpointing and recovery | Shared-nothing architecture. The implementation detects failed map or reduce tasks and reschedules replacements on healthy machines |
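To make the "functions of key and value pairs" row more concrete, here is a minimal word-count sketch in plain Python. It is not Hadoop code: `map_words`, `reduce_counts` and `run_mapreduce` are names I made up for illustration, and the grouping step that `run_mapreduce` performs in memory stands in for the shuffle that the framework would normally do across nodes.

```python
from collections import defaultdict
from itertools import chain


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(word, counts):
    """Reduce step: sum all the counts collected for one word."""
    return (word, sum(counts))


def run_mapreduce(records, mapper, reducer):
    """Hypothetical single-process driver (an assumption, not a Hadoop API).

    It groups mapper output by key and hands each group to the reducer,
    which is the data flow the programmer never writes by hand.
    """
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(r) for r in records):
        groups[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


lines = ["Hadoop stores data", "Hadoop moves compute to data"]
word_counts = run_mapreduce(lines, map_words, reduce_counts)
print(word_counts)
```

The point of the sketch is that the programmer only writes the two small functions. Moving the pairs between them, and rerunning a failed task, is left to the framework.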
Hadoop and Traditional RDBMS
Here is a summary of the differences between a traditional RDBMS and MapReduce:
| | Traditional RDBMS | MapReduce |
---|---|---|
Data size | Gigabytes | Petabytes |
Access | Interactive and batch | Batch |
Updates | Read and write many times | Write once, read many times |
Structure | Static schema | Dynamic schema |
Integrity | High | Low |
Scaling | Nonlinear | Linear |
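One way to read the "Dynamic schema" row is that the data is stored as-is and only interpreted when a job reads it. The sketch below illustrates that idea in plain Python. The record layout, the sample values and the function names `parse` and `max_temp_by_station` are all invented for this example and do not come from the book.

```python
# Raw lines are stored without a declared schema (made-up sample data).
# The second record carries an extra field the others do not have.
raw_records = [
    "2013-01-05 station=A temp=21",
    "2013-01-06 station=B temp=19 humidity=40",
    "2013-01-07 station=A temp=23",
]


def parse(line):
    """Interpret one raw line at read time into a dict of its fields."""
    date, *fields = line.split()
    record = {"date": date}
    for field in fields:
        key, value = field.split("=", 1)
        record[key] = value
    return record


def max_temp_by_station(lines):
    """Pick out only the fields this job cares about; ignore the rest."""
    result = {}
    for rec in map(parse, lines):
        station = rec["station"]
        temp = int(rec["temp"])
        result[station] = max(temp, result.get(station, temp))
    return result


max_temps = max_temp_by_station(raw_records)
print(max_temps)
```

A static schema would have required the `humidity` column to be declared before the second record could be loaded. Here the extra field is simply carried along and ignored by a job that does not need it.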