Install Hadoop on Windows: Part Three

Before reading this article, I highly recommend reading my previous article.

Step 1: After Ubuntu has been installed successfully, log in with your credentials.

Step 2: After logging in, install any available Ubuntu updates. Write the following code.

Code:
sudo apt-get update
sudo apt-get upgrade

If you see “Unable to locate package update”, the command was typed as “sudo apt-get install update”; there is no package named “update”, so use the two commands above instead.

Step 3: Install the JDK using the following code.
sudo apt-get install default-jdk

Step 4: Let's create a dedicated Hadoop group and a Hadoop user called hduser.
sudo addgroup hadoop

It returns an error saying that the hadoop group already exists; we created that group when we installed Ubuntu on the VM. Now let's add the user.

sudo adduser --ingroup hadoop hduser

After entering the password, leave the remaining fields at their defaults and answer Y.

Now let’s add hduser to the sudo group so that it has administrator rights.
sudo adduser hduser sudo

Now let’s install openssh server. Wikipedia says “OpenSSH, also known as OpenBSD Secure Shell,[a] is a suite of security-related network-level utilities based on the SSH protocol, which help to secure network communications via the encryption of network traffic over multiple authentication methods and by providing secure tunneling capabilities.”

Now let’s login with hduser and generate a key for hduser and add the key to the authorized keys.
su hduser

ssh-keygen -t rsa -P ""

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

Now let's try to log in to localhost.
ssh localhost

Don’t worry about all these messages. Now type logout to close the connection to localhost.
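The key-generation and authorization steps above can be sketched end to end in a throwaway directory, so you can see what they do without touching hduser's real ~/.ssh (the temporary path is illustrative):

```shell
# Sketch of the key setup: generate an empty-passphrase RSA key and append
# its public half to authorized_keys, exactly as the commands above do,
# but inside a temporary directory instead of ~/.ssh.
tmp=$(mktemp -d)
ssh-keygen -t rsa -P "" -f "$tmp/id_rsa" -q
cat "$tmp/id_rsa.pub" >> "$tmp/authorized_keys"
grep -qf "$tmp/id_rsa.pub" "$tmp/authorized_keys" && echo "key authorized"
rm -rf "$tmp"
```

Passwordless login works because sshd compares the key the client offers against the lines of authorized_keys; the append above is what makes that match succeed.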

Now let’s install Hadoop. Download the Hadoop 2.7.1 tarball, for example from the Apache release archive:

wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz

After the download completes you will see a message like “hadoop-2.7.1.tar.gz saved”. For me it was “hadoop-2.7.1.tar.gz.2 saved” because of my unreliable connection; the download completed on the third attempt.

tar xvzf hadoop-2.7.1.tar.gz

Don’t be confused if you see tar xvzf hadoop-2.7.1.tar.gz.2 in my screenshots: my downloaded file was named “hadoop-2.7.1.tar.gz.2”, so that is the name I extracted.

Now let’s move Hadoop 2.7.1 to a directory /usr/local/hadoop.
sudo mv hadoop-2.7.1 /usr/local/hadoop

Let's make hduser the owner of the directory. After that, edit the .bashrc file and append the Hadoop paths to the end of the file.

sudo chown -R hduser /usr/local/hadoop
sudo nano ~/.bashrc

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

Then press Ctrl+X, answer Y to save, and press Enter to confirm the file name.

source ~/.bashrc
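As a quick sanity check of what those exports do, the PATH composition can be exercised on its own; this sketch only tests the shell logic and does not require Hadoop to be installed:

```shell
# The two PATH lines from .bashrc append Hadoop's bin and sbin directories;
# the directories need not exist for the composition itself to work.
HADOOP_HOME=/usr/local/hadoop
PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
case ":$PATH:" in
  *":/usr/local/hadoop/bin:"*) echo "hadoop bin is on PATH" ;;
esac
```

This is why commands like hdfs, start-dfs.sh, and start-yarn.sh become runnable from any directory after the bashrc is sourced.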

Now let's give the Java path to run Hadoop.
sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh

After some scrolling we find export JAVA_HOME=${JAVA_HOME}

Replace ${JAVA_HOME} with /usr/lib/jvm/java-7-openjdk-amd64 (your Java location) and save.
Now let's configure the following XML files. Write the following code inside the configuration tag of each file and save.

core-site.xml

sudo nano /usr/local/hadoop/etc/hadoop/core-site.xml

<property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
</property>
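Note that this property goes inside the file's existing configuration element; after editing, the whole file should look roughly like this (a sketch, assuming a fresh Hadoop 2.7.1 core-site.xml):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
```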

hdfs-site.xml

sudo nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop_tmp/hdfs/namenode</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop_tmp/hdfs/datanode</value>
</property>

yarn-site.xml

sudo nano /usr/local/hadoop/etc/hadoop/yarn-site.xml
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

Let's copy the mapred-site.xml template and then edit the file.

cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml

After copying the file let's make the following changes.

 mapred-site.xml

sudo nano /usr/local/hadoop/etc/hadoop/mapred-site.xml
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

Now create the folders where Hadoop will store its HDFS data (the NameNode and DataNode directories configured in hdfs-site.xml above).

sudo mkdir -p /usr/local/hadoop_tmp
sudo mkdir -p /usr/local/hadoop_tmp/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_tmp/hdfs/datanode
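These paths must match the dfs.namenode.name.dir and dfs.datanode.data.dir values set in hdfs-site.xml earlier. What mkdir -p produces can be sketched under a temporary root (the real commands use /usr/local/hadoop_tmp):

```shell
# mkdir -p creates the whole chain of missing parent directories at once,
# so a single command per leaf yields the same tree as the commands above.
root=$(mktemp -d)
mkdir -p "$root/hadoop_tmp/hdfs/namenode" "$root/hadoop_tmp/hdfs/datanode"
[ -d "$root/hadoop_tmp/hdfs/namenode" ] && echo "namenode dir ok"
[ -d "$root/hadoop_tmp/hdfs/datanode" ] && echo "datanode dir ok"
rm -rf "$root"
```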

Now assign hduser the ownership of the folder. Run all the following commands.

sudo chown -R hduser /usr/local/hadoop_tmp
hdfs namenode -format
start-dfs.sh
start-yarn.sh
jps
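If everything started cleanly, jps should report five Hadoop daemons plus itself. As a sketch, the expected names can be checked against a captured jps listing; the PIDs below are made up for illustration:

```shell
# Illustrative jps output after start-dfs.sh and start-yarn.sh (PIDs invented).
sample_jps="2321 NameNode
2456 DataNode
2634 SecondaryNameNode
2801 ResourceManager
2933 NodeManager
3050 Jps"
# -w matches whole words, so "NameNode" does not also match "SecondaryNameNode".
for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
  echo "$sample_jps" | grep -qw "$daemon" && echo "$daemon running"
done
```

If one of these daemons is missing from your real jps output, check its log file under $HADOOP_HOME/logs before going further.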

Our single-node Hadoop cluster is now installed and running, and you can start writing programs against it.

Hope this article is helpful.