Multi-Node Hadoop Cluster Installation
- Install single-node Hadoop on Ubuntu 14.04 first. Tutorial link : http://hadooptutorial.weebly.com/hadoop-installation.html.
- Overview of Hadoop Cluster :
- One (1) NameNode : 192.168.1.1 (hadoopmaster)
- Three (3) DataNodes : 192.168.1.2 (hadoopslave1), 192.168.1.3 (hadoopslave2), 192.168.1.4 (hadoopslave3)
- After installation of the SINGLE NODE HADOOP CLUSTER, you are going to CLONE that Ubuntu image and name it hadoopmaster.
- Open a terminal and run the "ifconfig" command to check the IPv4 address.
- If the machine is using IPv6, disable it. Here is a tutorial on how to disable IPv6 : http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/#disabling-ipv6.
- Now edit the hosts file -->> "$ sudo gedit /etc/hosts".
- Add the following lines (note that /etc/hosts expects the IP address first, then the hostname) :
- 192.168.1.1 hadoopmaster
- 192.168.1.2 hadoopslave1
- 192.168.1.3 hadoopslave2
- 192.168.1.4 hadoopslave3
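- For reference, the full /etc/hosts might end up looking roughly like this (a sketch; Ubuntu usually also has a 127.0.1.1 line with the machine's hostname, which is often removed in Hadoop setups so the daemons do not bind to loopback) :
127.0.0.1    localhost
192.168.1.1  hadoopmaster
192.168.1.2  hadoopslave1
192.168.1.3  hadoopslave2
192.168.1.4  hadoopslave3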
- Change the hostname --->> "$ sudo gedit /etc/hostname".
- Replace its contents with : hadoopmaster
- Go to the hadoop directory and make the changes in its files : "$ cd /usr/local/hadoop/etc/hadoop".
- Edit the core-site.xml file ------>> "$ sudo gedit core-site.xml".
- replace localhost with hadoopmaster (see the sketch below).
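- For reference, the changed property in core-site.xml might look roughly like this (a sketch; the property name fs.default.name and port 9000 are assumed from the single-node tutorial, so keep whatever name and port your file already has) :
<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoopmaster:9000</value>
</property>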
- Edit the hdfs-site.xml file ------>> $ sudo gedit hdfs-site.xml.
- replace the value 1 with 3 (the replication factor, i.e. the number of DataNodes; see the sketch below).
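- For reference, the replication property in hdfs-site.xml might then look roughly like this (a sketch) :
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>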
- Edit the yarn-site.xml file ------>> $ sudo gedit yarn-site.xml.
- Add these properties inside the <configuration> tag :
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoopmaster:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoopmaster:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoopmaster:8050</value>
</property>
- Edit the mapred-site.xml file (the file where mapreduce.framework.name was set in the single-node setup) ----------------->> $ sudo gedit mapred-site.xml.
- replace the property name mapreduce.framework.name with mapred.job.tracker
- replace its value yarn with hadoopmaster:54311 (see the sketch below)
- SHUT DOWN the hadoopmaster Ubuntu image.
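- For reference, the changed mapred-site.xml property might look roughly like this (a sketch, assuming the single-node setup left mapreduce.framework.name set to yarn; adjust if your file differs) :
<property>
  <name>mapred.job.tracker</name>
  <value>hadoopmaster:54311</value>
</property>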
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
master node setup complete
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
- Clone hadoopmaster Node as hadoopslave1, hadoopslave2, hadoopslave3.
- Change the hadoop masters file : $ sudo gedit /usr/local/hadoop/etc/hadoop/masters.
- replace localhost with hadoopmaster.
- Change the hadoop slaves file : $ sudo gedit /usr/local/hadoop/etc/hadoop/slaves.
- replace localhost with hadoopslave1, hadoopslave2, hadoopslave3 (one hostname per line; see the sketch below).
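- For reference, the two files might end up containing just these hostnames (a sketch) :
masters :
hadoopmaster
slaves :
hadoopslave1
hadoopslave2
hadoopslave3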
- Change the hdfs-site.xml file -------->> $ sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml.
- remove the dfs.datanode.data.dir property section (the master will run only the NameNode; see the sketch below).
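- For reference, the master's hdfs-site.xml might then keep roughly these properties (a sketch; the namenode path matches the directory created later in this guide, so adjust it if your single-node setup used a different one) :
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop/hadoop_data/hdfs/namenode</value>
</property>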
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Initial Network setup -
- In your virtual machine manager (e.g. VirtualBox) :
- select the hadoopmaster Ubuntu image and go to its settings.
- Go to Network.
- set the "Attached to" option to "Internal Network".
- Give the network a name : "hadoop multinode network".
- Go to its Advanced settings.
- set Promiscuous Mode to "Allow All".
- Do this for all the slave machines as well.
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
- Open all 3 slave nodes and run ------->> $ sudo gedit /etc/hostname.
- replace hadoopmaster with hadoopslave1, hadoopslave2, hadoopslave3 respectively on the three slave virtual machines.
- Reboot all the slave nodes/machines.
- On the hadoopmaster node, run the command below to remove all the old hadoop data :
- Remove hadoop data ------>> $ sudo rm -rf /usr/local/hadoop/hadoop_data
- On hadoopmaster node === >> $ sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode
- Run this command (replace username with your own Ubuntu user) ------->> $ sudo chown -R username:username /usr/local/hadoop
- On all hadoopslave nodes ===>> Run the following commands ===>>
- $ sudo rm -rf /usr/local/hadoop/hadoop_data
- $ sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode
- $ sudo chown -R username:username /usr/local/hadoop
- Change the hdfs-site.xml file for all slave nodes ----->> $ sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml.
- remove the dfs.namenode.name.dir property section (the slaves run only DataNodes).
- On hadoopmaster node,
- Run the command ---->> $ sudo ssh-copy-id -i ~/.ssh/id_dsa.pub username@hadoopmaster
- If you get an error, the likely causes are :
- OpenSSH is not installed. To install it : sudo apt-get install openssh-client (the target machine also needs openssh-server running so it can accept connections).
- OR you get the error "permission denied for root@localhost for ssh connection".
- Solution of the 2nd problem : http://askubuntu.com/questions/497895/permission-denied-for-rootlocalhost-for-ssh-connection.
- Another possibility is that the internal network is not set up yet (see the next section); a key-generation sketch follows this list.
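- If ~/.ssh/id_dsa.pub does not exist yet (the single-node tutorial normally creates it), a sketch of generating it and installing the SSH server first :
$ ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa    # create a passwordless DSA key pair
$ sudo apt-get install openssh-server         # needed on every node so ssh-copy-id can reach it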
--------------------------Internal Network Setup between all the 4 virtual machine--------------------------------------------------
- Click the Wi-Fi / network icon at the top right.
- Go to Edit Connections.
- Click on Add (or Edit) a connection.
- Give the connection a name : "master connection".
- Go to IPv4 Settings.
- Change the method from Automatic to Manual.
- Enter the IP address, e.g. for the master node : 192.168.1.1
- Enter the netmask : 255.255.255.0
- Save it and repeat the steps above on all the nodes, using each node's IP from /etc/hosts (see the sketch below for a terminal-based alternative).
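- The same static address can also be set from the terminal by editing /etc/network/interfaces (a sketch for the master node, assuming the internal-network adapter shows up as eth0; use 192.168.1.2/3/4 on the slaves) :
auto eth0
iface eth0 inet static
    address 192.168.1.1
    netmask 255.255.255.0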
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
On hadoopmaster machine -
Run the following commands (replace chaalpritam with your own username) -
- $ sudo ssh-copy-id -i ~/.ssh/id_dsa.pub chaalpritam@hadoopmaster
- $ sudo ssh-copy-id -i ~/.ssh/id_dsa.pub chaalpritam@hadoopslave1
- $ sudo ssh-copy-id -i ~/.ssh/id_dsa.pub chaalpritam@hadoopslave2
- $ sudo ssh-copy-id -i ~/.ssh/id_dsa.pub chaalpritam@hadoopslave3
Now we can access the machines using SSH (as your own user, without sudo, since the keys were copied for that user) -
- $ ssh hadoopmaster
- $ exit
- $ ssh hadoopslave1
- $ exit
- $ ssh hadoopslave2
- $ exit
- $ ssh hadoopslave3
- $ exit
If all of these succeed, you are able to access all the nodes using SSH.
Next, on hadoopmaster, format the NameNode and start hadoop -
$ hadoop namenode -format
$ start-all.sh
$ jps (run on the master and on all 3 DataNodes to confirm the daemons are up; see the sketch below)
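If everything started correctly, jps should list roughly the following daemons (a sketch of typical output; exact process IDs will differ) :
- on hadoopmaster : NameNode, SecondaryNameNode, ResourceManager, Jps
- on each hadoopslave : DataNode, NodeManager, Jps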
http://hadoopmaster:8088/ (YARN ResourceManager web UI)
http://hadoopmaster:50070/ (NameNode web UI)
http://hadoopmaster:50090/ (Secondary NameNode web UI)
http://hadoopmaster:50075/ (DataNode web UI)
Reference tutorial : http://chaalpritam.blogspot.com/2015/01/hadoop-260-multi-node-cluster-setup-on.html.
Reference Video : https://www.youtube.com/watch?v=MzdyM3N5SlE#t=204
Thank you :)
Anshul Shrivastava