Posted on: 06/07/2015
I am working on a Hadoop application as a freelancer, and the client requirement includes using a JobTracker. I am a bit confused about how it determines the location of the data and how it schedules a task. I also want to know how it maintains instances and how it handles multiple instances. Since I have never used it before, I want to understand it right from the basics.
How to handle multiple instances on Hadoop
Hadoop is a free, Java-based programming framework that supports the processing of huge data sets in a distributed computing environment. It is part of the Apache project, supported by the Apache Software Foundation. The framework makes it possible to run applications on systems involving thousands of nodes and thousands of terabytes of data.
Its distributed file system enables fast data transfer rates between nodes and permits the system to continue operating without interruption when a node fails. Even if a considerable number of nodes become inoperative, this approach lessens the danger of catastrophic system failure.
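To see why replication protects against node failure, here is a minimal Python sketch (not Hadoop code; the node names, block names, and the replication factor of 3 — HDFS's default — are assumptions for illustration):

```python
import itertools

REPLICATION_FACTOR = 3  # HDFS default; assumed here for illustration

def place_blocks(blocks, nodes, factor=REPLICATION_FACTOR):
    """Assign each block to `factor` distinct nodes, round-robin style."""
    placement = {}
    cycle = itertools.cycle(nodes)
    for block in blocks:
        placement[block] = [next(cycle) for _ in range(factor)]
    return placement

def readable_blocks(placement, live_nodes):
    """A block is still readable if any replica lives on a surviving node."""
    return {b for b, replicas in placement.items()
            if any(n in live_nodes for n in replicas)}

nodes = ["node1", "node2", "node3", "node4"]
blocks = ["blk_001", "blk_002", "blk_003", "blk_004"]
placement = place_blocks(blocks, nodes)

# Simulate the failure of one node: every block still has a replica elsewhere.
survivors = set(nodes) - {"node2"}
print(readable_blocks(placement, survivors) == set(blocks))  # True
```

With three replicas per block, losing any single node leaves at least two copies of every block, so reads continue uninterrupted.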
The Hadoop programming framework was inspired by MapReduce, a software framework developed by Google in which an application is broken down into many small parts. Any of these parts, often called blocks or fragments, can run on any node in the cluster. Doug Cutting, the creator of Hadoop, named it after his child's stuffed toy elephant.
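The break-the-work-into-fragments idea can be sketched in plain Python (this simulates the map/shuffle/reduce flow, it is not the Hadoop API; the word-count job and the sample fragments are assumptions for illustration):

```python
from collections import defaultdict

def map_phase(fragment):
    """Map task: emit (word, 1) pairs for one input fragment."""
    return [(word, 1) for word in fragment.split()]

def shuffle(pairs):
    """Group intermediate pairs by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce task: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

# The input is broken into fragments; each map task could run on a different node.
fragments = ["big data big cluster", "data node data"]
intermediate = [pair for frag in fragments for pair in map_phase(frag)]
result = reduce_phase(shuffle(intermediate))
print(result["data"])  # 3
```

Each fragment is processed independently, which is what lets Hadoop scatter map tasks across the cluster and move computation to wherever the data lives.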
The existing Apache Hadoop ecosystem consists of the Hadoop kernel, MapReduce, HDFS (the Hadoop Distributed File System), and a number of related projects such as HBase, ZooKeeper, and Apache Hive.