Multi-Node Hadoop Cluster Installation
- Install single-node Hadoop on Ubuntu 14.04 first. Tutorial link : http://hadooptutorial.weebly.com/hadoop-installation.html.
- Overview of Hadoop Cluster :
- One (1) NameNode : 192.168.1.1 (hadoopmaster)
- Three (3) DataNodes : 192.168.1.2 (hadoopslave1), 192.168.1.3 (hadoopslave2), 192.168.1.4 (hadoopslave3)
- After installation of the SINGLE NODE HADOOP CLUSTER, you are going to CLONE that Ubuntu image and name it hadoopmaster.
- Open a terminal and run the "ifconfig" command to check the IPv4 address.
- If the machine is using IPv6, disable it. Here is a tutorial on how to disable IPv6 : http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/#disabling-ipv6.
- Now edit the hosts file -->> "$ sudo gedit /etc/hosts".
- Add the following lines (note that /etc/hosts expects the IP address first, then the hostname) :
- 192.168.1.1 hadoopmaster
- 192.168.1.2 hadoopslave1
- 192.168.1.3 hadoopslave2
- 192.168.1.4 hadoopslave3
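- For reference, the full /etc/hosts might end up looking roughly like this (a sketch; Ubuntu usually also has a 127.0.1.1 line with the machine's hostname, which is often removed in Hadoop setups so the daemons do not bind to loopback) :
127.0.0.1    localhost
192.168.1.1  hadoopmaster
192.168.1.2  hadoopslave1
192.168.1.3  hadoopslave2
192.168.1.4  hadoopslave3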
- Change the hostname --->> "$ sudo gedit /etc/hostname".
- Replace its contents with : hadoopmaster
- Go to the hadoop directory and make the changes in its files : "$ cd /usr/local/hadoop/etc/hadoop".
- Edit the core-site.xml file ------>> "$ sudo gedit core-site.xml".
- replace localhost with hadoopmaster (see the sketch below).
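- For reference, the changed property in core-site.xml might look roughly like this (a sketch; the property name fs.default.name and port 9000 are assumed from the single-node tutorial, so keep whatever name and port your file already has) :
<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoopmaster:9000</value>
</property>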
- Edit the hdfs-site.xml file ------>> $ sudo gedit hdfs-site.xml.
- replace the value 1 with 3 (the replication factor, i.e. the number of DataNodes; see the sketch below).
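- For reference, the replication property in hdfs-site.xml might then look roughly like this (a sketch) :
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>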
- Edit the yarn-site.xml file ------>> $ sudo gedit yarn-site.xml.
- Add these properties inside the <configuration> tag :
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoopmaster:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoopmaster:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoopmaster:8050</value>
</property>
- Edit the mapred-site.xml file (the file where mapreduce.framework.name was set in the single-node setup) ----------------->> $ sudo gedit mapred-site.xml.
- replace the property name mapreduce.framework.name with mapred.job.tracker
- replace its value yarn with hadoopmaster:54311 (see the sketch below)
- SHUT DOWN the hadoopmaster Ubuntu image.
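- For reference, the changed mapred-site.xml property might look roughly like this (a sketch, assuming the single-node setup left mapreduce.framework.name set to yarn; adjust if your file differs) :
<property>
  <name>mapred.job.tracker</name>
  <value>hadoopmaster:54311</value>
</property>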
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
master node setup complete
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
- Clone hadoopmaster Node as hadoopslave1, hadoopslave2, hadoopslave3.
- Change the hadoop masters file : $ sudo gedit /usr/local/hadoop/etc/hadoop/masters.
- replace localhost with hadoopmaster.
- Change the hadoop slaves file : $ sudo gedit /usr/local/hadoop/etc/hadoop/slaves.
- replace localhost with hadoopslave1, hadoopslave2, hadoopslave3 (one hostname per line; see the sketch below).
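- For reference, the two files might end up containing just these hostnames (a sketch) :
masters :
hadoopmaster
slaves :
hadoopslave1
hadoopslave2
hadoopslave3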
- Change the hdfs-site.xml file -------->> $ sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml.
- remove the dfs.datanode.data.dir property section (the master will run only the NameNode; see the sketch below).
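- For reference, the master's hdfs-site.xml might then keep roughly these properties (a sketch; the namenode path matches the directory created later in this guide, so adjust it if your single-node setup used a different one) :
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop/hadoop_data/hdfs/namenode</value>
</property>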
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Initial Network setup -
- In your virtual machine manager (e.g. VirtualBox) :
- select the hadoopmaster Ubuntu image and go to its settings.
- Go to Network.
- set the "Attached to" option to "Internal Network".
- Give the network a name : "hadoop multinode network".
- Go to its Advanced settings.
- set Promiscuous Mode to "Allow All".
- Do this for all the slave machines as well.
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
- Open all 3 slave nodes and run ------->> $ sudo gedit /etc/hostname.
- replace hadoopmaster with hadoopslave1, hadoopslave2, hadoopslave3 respectively on the three slave virtual machines.
- Reboot all the slave nodes/machines.
- On the hadoopmaster node, run the command below to remove all the old hadoop data :
- Remove hadoop data ------>> $ sudo rm -rf /usr/local/hadoop/hadoop_data
- On hadoopmaster node === >> $ sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode
- Run this command (replace username with your own Ubuntu user) ------->> $ sudo chown -R username:username /usr/local/hadoop
- On all hadoopslave nodes ===>> Run the following commands ===>>
- $ sudo rm -rf /usr/local/hadoop/hadoop_data
- $ sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode
- $ sudo chown -R username:username /usr/local/hadoop
- Change the hdfs-site.xml file for all slave nodes ----->> $ sudo gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml.
- remove the dfs.namenode.name.dir property section (the slaves run only DataNodes).
- On hadoopmaster node,
- Run the command ---->> $ sudo ssh-copy-id -i ~/.ssh/id_dsa.pub username@hadoopmaster
- If you get an error, the likely causes are :
- OpenSSH is not installed. To install it : sudo apt-get install openssh-client (the target machine also needs openssh-server running so it can accept connections).
- OR you get the error "permission denied for root@localhost for ssh connection".
- Solution of the 2nd problem : http://askubuntu.com/questions/497895/permission-denied-for-rootlocalhost-for-ssh-connection.
- Another possibility is that the internal network is not set up yet (see the next section); a key-generation sketch follows this list.
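- If ~/.ssh/id_dsa.pub does not exist yet (the single-node tutorial normally creates it), a sketch of generating it and installing the SSH server first :
$ ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa    # create a passwordless DSA key pair
$ sudo apt-get install openssh-server         # needed on every node so ssh-copy-id can reach it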
--------------------------Internal Network Setup between all the 4 virtual machine--------------------------------------------------
- Click the Wi-Fi / network icon at the top right.
- Go to Edit Connections.
- Click on Add (or Edit) a connection.
- Give the connection a name : "master connection".
- Go to IPv4 Settings.
- Change the method from Automatic to Manual.
- Enter the IP address, e.g. for the master node : 192.168.1.1
- Enter the netmask : 255.255.255.0
- Save it and repeat the steps above on all the nodes, using each node's IP from /etc/hosts (see the sketch below for a terminal-based alternative).
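- The same static address can also be set from the terminal by editing /etc/network/interfaces (a sketch for the master node, assuming the internal-network adapter shows up as eth0; use 192.168.1.2/3/4 on the slaves) :
auto eth0
iface eth0 inet static
    address 192.168.1.1
    netmask 255.255.255.0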
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
On hadoopmaster machine -
Run the following commands (replace chaalpritam with your own username) -
- $ sudo ssh-copy-id -i ~/.ssh/id_dsa.pub chaalpritam@hadoopmaster
- $ sudo ssh-copy-id -i ~/.ssh/id_dsa.pub chaalpritam@hadoopslave1
- $ sudo ssh-copy-id -i ~/.ssh/id_dsa.pub chaalpritam@hadoopslave2
- $ sudo ssh-copy-id -i ~/.ssh/id_dsa.pub chaalpritam@hadoopslave3
Now we can access the machines using SSH (as your own user, without sudo, since the keys were copied for that user) -
- $ ssh hadoopmaster
- $ exit
- $ ssh hadoopslave1
- $ exit
- $ ssh hadoopslave2
- $ exit
- $ ssh hadoopslave3
- $ exit
If all of these succeed, you are able to access all the nodes using SSH.
Next, on hadoopmaster, format the NameNode and start hadoop -
$ hadoop namenode -format
$ start-all.sh
$ jps (run on the master and on all 3 DataNodes to confirm the daemons are up; see the sketch below)
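If everything started correctly, jps should list roughly the following daemons (a sketch of typical output; exact process IDs will differ) :
- on hadoopmaster : NameNode, SecondaryNameNode, ResourceManager, Jps
- on each hadoopslave : DataNode, NodeManager, Jps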
http://hadoopmaster:8088/ (YARN ResourceManager web UI)
http://hadoopmaster:50070/ (NameNode web UI)
http://hadoopmaster:50090/ (Secondary NameNode web UI)
http://hadoopmaster:50075/ (DataNode web UI)
Reference tutorial : http://chaalpritam.blogspot.com/2015/01/hadoop-260-multi-node-cluster-setup-on.html.
Reference Video : https://www.youtube.com/watch?v=MzdyM3N5SlE#t=204
Thank you :)
Anshul Shrivastava