Different Installation Modes of Hadoop

Introduction

This article explains the different installation modes of Hadoop.

Hadoop commonly runs on Unix/Linux-based operating systems. It can also run on Windows-based machines, but this is not recommended, so this article assumes a Linux/Unix-based machine.

Hadoop can be installed in three different modes.

  1. Standalone Mode.
  2. Pseudo-Distributed Mode.
  3. Fully Distributed Mode.

Standalone Mode

This is the simplest mode, and it runs on a single node or system. Hadoop runs as a single JVM process, with no separate daemons. It uses the local file system for storage: HDFS does not run in Standalone mode, so all file manipulation happens on your local machine. Neither HDFS nor YARN is used in this mode. Standalone mode is commonly used to test and debug MapReduce programs before running them on a cluster.

Pseudo-Distributed Mode

If you want to simulate an actual cluster, choose Pseudo-Distributed Mode for your Hadoop installation. It sits between Standalone mode and a production-level Fully Distributed cluster. It also runs on a single node, but each Hadoop daemon runs in its own JVM process, so the master daemons (such as the NameNode) and the worker daemons (such as the DataNode) all run on the same machine and simulate a cluster. HDFS is used for storage, and YARN is used to manage resources. This mode is commonly used, and recommended, for full-fledged test environments.
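As a rough illustration of how the storage layer differs between the two single-node modes, the sketch below renders minimal Hadoop configuration files with Python's standard library. The property names fs.defaultFS and dfs.replication are real Hadoop settings, but the helper hadoop_site_xml is hypothetical, and the address hdfs://localhost:9000 is an assumption based on common single-node setups rather than something this article prescribes.

```python
import xml.etree.ElementTree as ET


def hadoop_site_xml(properties):
    """Render a dict of Hadoop properties as a *-site.xml document string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Standalone mode: Hadoop defaults to the local file system,
# so the configuration can stay empty.
standalone_core_site = hadoop_site_xml({})

# Pseudo-distributed mode: point the default file system at an HDFS
# NameNode on this same machine (host and port are assumptions).
pseudo_core_site = hadoop_site_xml({"fs.defaultFS": "hdfs://localhost:9000"})

# With only one DataNode, each block can be stored only once.
pseudo_hdfs_site = hadoop_site_xml({"dfs.replication": 1})

print(pseudo_core_site)
```

The generated strings would typically be saved as core-site.xml and hdfs-site.xml in the Hadoop configuration directory. The exact location depends on the installation.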

Fully Distributed Mode

This is the production environment. It runs on a cluster of machines, a real distributed setup that serves user traffic. The machines can be, for example, Linux servers in a data center or instances in the cloud. Manually configuring Hadoop on every machine in the cluster is highly complicated, so it is better to use a pre-configured enterprise distribution, such as Hortonworks.

Installing Hadoop in Standalone Mode

Requirements

  1. Java must be installed on the machine, in a version later than Java 7 (Java 8, for example). If you don't have Java installed, you can download it from the link below.

    URL - http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

  2. Download Hadoop

    You can download the Hadoop release archive from www.apache.org. Choose a mirror site from the list given there.


The mirror site lists a number of files. Select the one you need for your machine. The latest stable version of Hadoop at the time of writing is in the 2.x line, and it should work on all machines. So, download the hadoop-2.7.3.tar.gz file.
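The Java requirement above can be checked before installing anything. The following is a minimal sketch, assuming the java command is on the PATH and reports its version in the usual `version "..."` form. The function names and the MIN_JAVA_MAJOR threshold are illustrative choices, not part of Hadoop.

```python
import re
import subprocess

# Assumed threshold: "higher than Java 7" is read here as Java 8 or later.
MIN_JAVA_MAJOR = 8


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Releases up to Java 8 report themselves as 1.x (for example "1.8.0_151").
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major():
    """Run `java -version`; return the major version, or None if unavailable."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True
        )
    except FileNotFoundError:
        return None
    # `java -version` writes its report to stderr rather than stdout.
    return java_major_version(result.stderr)


if __name__ == "__main__":
    major = installed_java_major()
    if major is None:
        print("Java was not found or its version could not be read")
    elif major >= MIN_JAVA_MAJOR:
        print("Java", major, "meets the requirement")
    else:
        print("Java", major, "is too old for this setup")
```

Keeping the parsing in its own function makes it possible to check the version logic against sample strings without having Java installed.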



See my next article for the steps to install Hadoop in Standalone mode on Linux-based machines.

