Hadoop Installation, Configuration and Administration


Preparation for Hadoop Installation:

1. Resource Details:

SN Hostname FQDN IP
1 bdrenfdludcf01 bdrenfdludcf01.dle.asiaconnect.bdren.net.bd 103.28.121.5
2 bdrenfdludcf02 bdrenfdludcf02.dle.asiaconnect.bdren.net.bd 103.28.121.7
3 bdrenfdludcf03 bdrenfdludcf03.dle.asiaconnect.bdren.net.bd 103.28.121.30
4 bdrenfdludcf04 bdrenfdludcf04.dle.asiaconnect.bdren.net.bd 103.28.121.67
5 bdrenfdludcf05 bdrenfdludcf05.dle.asiaconnect.bdren.net.bd 103.28.121.34
6 bdrenfdludcf06 bdrenfdludcf06.dle.asiaconnect.bdren.net.bd 103.28.121.66

 
Log in to all the machines using PuTTY or any other SSH client.

/etc/hosts configuration for the Hadoop cluster:
103.28.121.5 bdrenfdludcf01 bdrenfdludcf01.dle.asiaconnect.bdren.net.bd
103.28.121.7 bdrenfdludcf02 bdrenfdludcf02.dle.asiaconnect.bdren.net.bd
103.28.121.30 bdrenfdludcf03 bdrenfdludcf03.dle.asiaconnect.bdren.net.bd
103.28.121.67 bdrenfdludcf04 bdrenfdludcf04.dle.asiaconnect.bdren.net.bd
103.28.121.34 bdrenfdludcf05 bdrenfdludcf05.dle.asiaconnect.bdren.net.bd
103.28.121.66 bdrenfdludcf06 bdrenfdludcf06.dle.asiaconnect.bdren.net.bd

 IP Check:
# ip addr show
CPU Check:
# more /proc/cpuinfo | grep 'core id' | wc -l
Memory Check:
# more /proc/meminfo | grep -i 'Mem'
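Equivalent quick checks, if you prefer one-liners (these rely only on the standard coreutils/procps/iproute tools shipped with RHEL 7):
# nproc          # number of available CPU cores
# free -h        # memory summary in human-readable units
# hostname -I    # all IP addresses assigned to this host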
Set All the Host Names (if required)
To set all the host names on a system, enter the following command as root:
# hostnamectl set-hostname bdrenfdludcf01
# hostnamectl set-hostname bdrenfdludcf02
# hostnamectl set-hostname bdrenfdludcf03

This will alter the pretty, static, and transient host names alike. The static and transient host names will be simplified forms of the pretty host name. Spaces will be replaced with "-" and special characters will be removed.

Create a Repository for installing any extended Linux Package using yum:
Create a new file with the contents below, with a ".repo" extension.
[root@datasciencelab-1 /root]# cd /etc/yum.repos.d
[root@datasciencelab-1 yum.repos.d]# vim bdrenfdludcfrhelepl7.repo
[rhel7repo]
gpgcheck = 0
enabled = 1
baseurl = https://dl.fedoraproject.org/pub/epel/7/x86_64/
name = datasciencelab-repo-epl
[root@datasciencelab-1 yum.repos.d]#

Note: If you hit a certificate problem (e.g. "peer certificate expired"), a temporary workaround is to use http rather than https in the baseurl.
Run “yum clean all”
[root@datasciencelab-1 yum.repos.d]# yum clean all
Loaded plugins: langpacks, product-id, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Cleaning repos: rhel7_dvd
Cleaning up everything
[root@datasciencelab-1 yum.repos.d]#
Test the new yum repository by installing a new package.
[root@datasciencelab-1 yum.repos.d]# yum install telnet

Reference: http://yum.baseurl.org/wiki/YumCommands

Copy the required /etc/hosts and /etc/yum.repos.d/datasciencelabrhelepl7.repo to the other machines:
# scp /etc/hosts datasciencelab-2:/etc/
# scp /etc/yum.repos.d/datasciencelabrhelepl7.repo datasciencelab-2:/etc/yum.repos.d/
# scp /etc/hosts datasciencelab-3:/etc/
# scp /etc/yum.repos.d/datasciencelabrhelepl7.repo datasciencelab-3:/etc/yum.repos.d/
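With several machines it is easier to loop over the host list. A minimal sketch, run as root from the first node, assuming the same file names as above and that root can SSH to each host:

for host in datasciencelab-2 datasciencelab-3; do
  scp /etc/hosts ${host}:/etc/
  scp /etc/yum.repos.d/datasciencelabrhelepl7.repo ${host}:/etc/yum.repos.d/
done

Extend the host list to cover all remaining nodes in the cluster.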

Time Zone Setting (if required)
timedatectl
timedatectl list-timezones | grep Dhaka
timedatectl set-timezone Asia/Dhaka
timedatectl set-time 13:38:00

Configure NTP (Network Time Protocol):
# yum install ntp
# vi /etc/ntp.conf
Add the following lines (note the lowercase "server" keyword):

server ntp-server-1 prefer
server ntp-server-2
server 127.127.1.0
fudge 127.127.1.0 stratum 10

Disable all authentication directives (if any are enabled).
# service ntpd status
# service ntpd start
# chkconfig ntpd on
# ntpq -p

Stop the firewall (firewalld) in Red Hat 7 for lab purposes:
[root@datasciencelab-1 ~]# systemctl status firewalld

firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2016-12-12 12:11:40 IRST; 1h 55min ago

Main PID: 699 (firewalld)
CGroup: /system.slice/firewalld.service

└─699 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid
Dec 12 12:11:38 nakparsdev-2 systemd[1]: Starting firewalld - dynamic firewall daemon...
Dec 12 12:11:40 nakparsdev-2 systemd[1]: Started firewalld - dynamic firewall daemon.
[root@datasciencelab-1~]# service firewalld stop
Redirecting to /bin/systemctl stop firewalld.service
[root@datasciencelab-1~]# chkconfig firewalld off

Note: Forwarding request to 'systemctl disable firewalld.service'.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.

Disable SELinux:
# vim /etc/selinux/config
Set:
SELINUX=disabled
For the change to take effect immediately, either restart the system or run the following command.
# setenforce 0
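The same change can also be made non-interactively and then verified; a small sketch, assuming the stock /etc/selinux/config layout:
# sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config    # persist across reboots
# setenforce 0                                                    # switch to permissive immediately
# getenforce                                                      # reports Permissive now, Disabled after a reboot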

Create application users in all machines:
# adduser -g wheel -s /bin/bash -d /home/pars pars
# passwd pars
# adduser -g wheel -s /bin/bash -d /home/hadoop hadoop
# passwd hadoop
Note: Set password as Robi1234$
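Since the same users are needed on every node, the creation can be scripted. A sketch only, run as root from the first node, assuming root can still SSH to each host with a password at this stage (add -p 2200 if SSH listens on the non-default port used later in this guide):

for host in bdrenfdludcf02 bdrenfdludcf03 bdrenfdludcf04 bdrenfdludcf05 bdrenfdludcf06; do
  ssh root@${host} "adduser -g wheel -s /bin/bash -d /home/pars pars"
  ssh root@${host} "adduser -g wheel -s /bin/bash -d /home/hadoop hadoop"
  ssh -t root@${host} "passwd pars; passwd hadoop"   # set the passwords interactively
done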

Assign Administrative Role to hadoop User:
This applies to the dedicated hadoop user on both the NameNode and the DataNodes.
Uncomment the line below in /etc/sudoers (edit it with visudo):

## Same thing without a password
# %wheel ALL=(ALL) NOPASSWD: ALL

so that it reads: %wheel ALL=(ALL) NOPASSWD: ALL

Configure passwordless SSH for both the root and hadoop users (in production, only the hadoop user needs it):

On the master/NameNode as root, where the slave/DataNode(s) are the remote hosts:
Create a key on the master as both root and the hadoop user:
# ssh-keygen
# ssh-copy-id -p 2200 -i $HOME/.ssh/id_rsa.pub hadoop@bdrenfdludcf05
# ssh-copy-id -p 2200 -i $HOME/.ssh/id_rsa.pub hadoop@bdrenfdludcf06
# ssh -p 2200 hadoop@bdrenfdludcf06   # test the passwordless login
 

In all servers as root user:
# cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

$ chmod 0600 ~/.ssh/authorized_keys
Login to remote host from master:
# ssh root@datanode

Check master to master as well using root:
# ssh master
Check master to the datanode as well using root:
# ssh slave
Do the same for the hadoop user:
# su – hadoop
[hadoop@datasciencelab-1 ~]$
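To push the hadoop user's key to every node in one pass, a loop can be used. A sketch, assuming the key already exists, SSH listening on port 2200, and the hostnames from the table above:

for host in bdrenfdludcf02 bdrenfdludcf03 bdrenfdludcf04 bdrenfdludcf05 bdrenfdludcf06; do
  ssh-copy-id -p 2200 -i $HOME/.ssh/id_rsa.pub hadoop@${host}
  ssh -p 2200 hadoop@${host} hostname    # quick check: should print the remote hostname without a password prompt
done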

Java Installation or checking:
We can check whether Java is installed correctly with the following command:
allnodes$ java -version
java version "1.7.0_79"
OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)
OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)

Otherwise, we need to download and install Java on all hosts.

You can use the wget command to download the package and install it using rpm -ivh.

Step 1
Download Java (jdk-7u79-linux-x64.tar.gz) from the Oracle Java download page.
Then jdk-7u79-linux-x64.tar.gz will be downloaded onto your system.

Step 2
Generally you will find the downloaded Java archive in the Downloads folder. Verify it and extract jdk-7u79-linux-x64.tar.gz using the following commands.

$ cd /root/downloads/
$ ls
jdk-7u79-linux-x64.tar.gz
$ tar zxf jdk-7u79-linux-x64.tar.gz
$ ls
jdk1.7.0_79
jdk-7u79-linux-x64.tar.gz

Step 3
To make Java available to all users, move it to /usr/local/. Switch to root and type the following commands.

$ su
password:
# mv jdk1.7.0_79 /usr/local/   [use sudo if you are using the hadoop user]
# exit

Step 4
To set the PATH and JAVA_HOME variables, add the following lines to the ~/.bashrc file.

$ vim ~/.bashrc
# Add the following lines
export JAVA_HOME=/usr/local/jdk1.7.0_79
export PATH=$PATH:$JAVA_HOME/bin

Now apply the changes to the current running shell:

$ source ~/.bashrc

Step 5
Use the following commands to configure the java alternatives:

# alternatives --install "/usr/bin/java" "java" "/usr/local/jdk1.7.0_79/bin/java" 1
# alternatives --set java /usr/local/jdk1.7.0_79/bin/java

Note: Use sudo if you are using the hadoop user.
Now verify the java -version command from the terminal as explained above.
Fedora, Oracle Linux, Red Hat Enterprise Linux, etc. (another way to install Java; you may skip it)

On the command line, type:
$ su -c "yum install java-1.7.0-openjdk"

The java-1.7.0-openjdk package contains just the Java Runtime Environment. If you want to develop Java programs, install the java-1.7.0-openjdk-devel package.

  Main Hadoop Installation, Configuration & Administration:

The assumption is that we have already downloaded the Hadoop binaries to /root/downloads.
allnodes# tar zxvf /root/downloads/hadoop-*
allnodes# mv hadoop-2.7.1 hadoop
allnodes# sudo mv hadoop /usr/local/

# Set Hadoop-related environment variables
Now we'll need to add some Hadoop and Java environment variables to ~/.bashrc and source them in the current shell session.

# su – hadoop
~/.bashrc:
export JAVA_HOME=/usr/local/jdk1.7.0_79
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop

Then load these environment variables by sourcing ~/.bashrc:
allnodes$ . ~/.bashrc
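As a quick sanity check, the hadoop client should now be on the PATH (the exact banner depends on your download, 2.7.1 in this guide):
allnodes$ hadoop version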

Hadoop Configurations
For a basic setup of Hadoop, we'll be changing a few of the configurations in the Hadoop configuration directory, now defined by the HADOOP_CONF_DIR environment variable. All of these common configuration changes will be applied to the NameNode and all the DataNodes. After these changes, we will apply configurations specific to the NameNode and the DataNodes.
Here are the following files to focus on:
•$HADOOP_CONF_DIR/hadoop-env.sh
•$HADOOP_CONF_DIR/core-site.xml
•$HADOOP_CONF_DIR/yarn-site.xml
•$HADOOP_CONF_DIR/mapred-site.xml
(This file currently does not exist in the default Hadoop installation, but a template is available. We’ll make a copy of the template and rename it to mapred-site.xml)
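These four files must end up identical on every node. One option is to edit them once on the NameNode and, after finishing the common edits described in this section, push them to the DataNodes. A sketch only, assuming passwordless SSH on port 2200 for the hadoop user, the datasciencelab-2/3 DataNodes used later in this guide, and that the hadoop user can already write to $HADOOP_CONF_DIR on the DataNodes (i.e. after the chown step below):

for host in datasciencelab-2 datasciencelab-3; do
  scp -P 2200 $HADOOP_CONF_DIR/hadoop-env.sh \
              $HADOOP_CONF_DIR/core-site.xml \
              $HADOOP_CONF_DIR/yarn-site.xml \
              $HADOOP_CONF_DIR/mapred-site.xml \
              hadoop@${host}:$HADOOP_CONF_DIR/
done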

Common Hadoop Configurations on all Nodes
Let’s start with $HADOOP_CONF_DIR/hadoop-env.sh. Currently only root users can edit files in the Hadoop directory, but we’ll change this after all configurations have been applied. To edit the configurations files, you can simply add a sudo before the text editor of your choice, for example
allnodes$ sudo vim $HADOOP_CONF_DIR/hadoop-env.sh

The only thing that needs changing is the location of JAVA_HOME in the file. Simply replace ${JAVA_HOME} with /usr/local/jdk1.7.0_79, which is where Java was installed previously.
$HADOOP_CONF_DIR/hadoop-env.sh:
# The java implementation to use.
export JAVA_HOME=/usr/local/jdk1.7.0_79
export HADOOP_SSH_OPTS="-p 2200"   # add as the last line

The next file to modify is $HADOOP_CONF_DIR/core-site.xml. Here we will declare the default Hadoop file system. The default configuration is set to localhost, but here we want to specify the NameNode's public DNS on port 9000. Scroll down in the XML file to find the configuration tag and change the file to look like the following.
$HADOOP_CONF_DIR/core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://namenode_public_dns:9000</value>
</property>
</configuration>

The next file to modify is $HADOOP_CONF_DIR/yarn-site.xml. Scroll down in the XML file to find the configuration tag and change the file to look like the following.
$HADOOP_CONF_DIR/yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>namenode_public_dns</value>
</property>
</configuration>

The last configuration file to change is $HADOOP_CONF_DIR/mapred-site.xml. We will first need to make a copy of the template and rename it.
allnodes$ sudo cp $HADOOP_CONF_DIR/mapred-site.xml.template $HADOOP_CONF_DIR/mapred-site.xml
Scroll down in the XML file to find the configuration tag and change the file to look like the following.
$HADOOP_CONF_DIR/mapred-site.xml:
<configuration>
<property>
<name>mapreduce.jobtracker.address</name>
<value>namenode_public_dns:54311</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

NameNode Specific Configurations
Now that all the common configurations are complete, we’ll finish up the NameNode specific configurations. On the NameNode, all that remains are the following:
•adding hosts to /etc/hosts
•modifying the configurations in $HADOOP_CONF_DIR/hdfs-site.xml
•defining the Hadoop master in $HADOOP_CONF_DIR/masters
•defining the Hadoop slaves in $HADOOP_CONF_DIR/slaves

Let's start by adding entries to the hosts file located at /etc/hosts. We will need to add each node's public DNS and hostname to the list. The hostname can be found with the following:
allnodes$ echo $(hostname)
or by taking the first part of the private DNS (e.g. ip-172-31-35-242.us-west-2.compute.internal)

By default, 127.0.0.1 localhost is present, so we can add under it to look like the following (ignoring the IPv6 settings):

 /etc/hosts:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.0.159 datasciencelab-1 datasciencelab-1.datasciencelab.com.bd
192.168.0.119 datasciencelab-2 datasciencelab-2.datasciencelab.com.bd
192.168.0.104 datasciencelab-3 datasciencelab-3.datasciencelab.com.bd
# IP hostname FQDN

We can now modify the $HADOOP_CONF_DIR/hdfs-site.xml file to specify the replication factor along with where the NameNode data will reside. For this setup, we will specify a replication factor of 3 for each data block in HDFS. Scroll down in the XML file to find the configuration tag and change the file to look like the following.
$HADOOP_CONF_DIR/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///usr/local/hadoop/hadoop_data/hdfs/namenode</value>
</property>
</configuration>

The current path where data on the NameNode will reside does not exist, so we'll need to create it before starting HDFS.
namenode$ sudo mkdir -p $HADOOP_HOME/hadoop_data/hdfs/namenode
Next we’ll need to add a masters file to the $HADOOP_CONF_DIR directory
namenode$ sudo touch $HADOOP_CONF_DIR/masters
then insert the NameNode’s hostname in that file
$HADOOP_CONF_DIR/masters:
datasciencelab-1.datasciencelab.com.bd
We will also need to modify the slaves file in the $HADOOP_CONF_DIR directory to the following. By default localhost is present, but we can remove this.
$HADOOP_CONF_DIR/slaves
datasciencelab-2.datasciencelab.com.bd
datasciencelab-3.datasciencelab.com.bd

Now that all configurations are set on the NameNode, we will change the ownership of the $HADOOP_HOME directory to the user hadoop.
namenode$ sudo chown -R hadoop:wheel $HADOOP_HOME

DataNode Specific Configurations
Let's now move on to the final configurations for the DataNodes. We will need to SSH into each DataNode and configure only the $HADOOP_CONF_DIR/hdfs-site.xml file. Scroll down in the XML file to find the configuration tag and change the file to look like the following.
$HADOOP_CONF_DIR/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///usr/local/hadoop/hadoop_data/hdfs/datanode</value>
</property>
</configuration>

Just like on the NameNode, we will need to create the directory specified in the $HADOOP_CONF_DIR/hdfs-site.xml file.
datanodes$ sudo mkdir -p $HADOOP_HOME/hadoop_data/hdfs/datanode
Now that all configurations are set on the DataNodes, we will change the ownership of the $HADOOP_HOME directory to the hadoop user.
datanodes$ sudo chown -R hadoop $HADOOP_HOME

Start Hadoop Cluster
We can now start up HDFS from the NameNode by first formatting it and then starting HDFS. An important thing to note is that every time the NameNode is formatted, all of the data previously on it is lost.
namenode$ hdfs namenode -format
namenode$ $HADOOP_HOME/sbin/start-dfs.sh

When asked "The authenticity of host 'Some Node' can't be established. Are you sure you want to continue connecting (yes/no)?", type yes and press Enter. You may need to do this several times: keep typing yes and pressing Enter, even if there is no new prompt, since it is the first time Hadoop logs into each of the DataNodes.

You can go to namenode_public_dns:50070 in your browser to check if all datanodes are online. If the webUI does not display, you need to troubleshoot. 

Now let’s start up YARN as well as the MapReduce JobHistory Server.
namenode$ $HADOOP_HOME/sbin/start-yarn.sh
namenode$ $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver

You can check to make sure all Java processes are running with the jps command on the NameNode and DataNodes (your process ids will be different though).
namenode$ jps
21817 JobHistoryServer
21853 Jps
21376 SecondaryNameNode
21540 ResourceManager
21157 NameNode
datanodes$ jps
20936 NodeManager
20792 DataNode
21036 Jps

Common Problem in HDFS:
DataNode is not running or not joining the HDFS cluster

http://stackoverflow.com/questions/29166837/datanode-is-not-starting-in-singlenode-hadoop-2-6-0

Tips:
Note: All the hostnames must have DNS entries (or /etc/hosts entries); otherwise the web pages will not open in the browser.

How many NameNodes and DataNodes:
$ hdfs dfsadmin -report
Now you can access Hadoop Services in Browser
NameNode:
http://name_node_IP:50070/
DataNode:
http://data_node_IP:50075/
Yarn Service:
http://name_node_IP:8088/
Handling missing blocks or an unhealthy HDFS:
http://stackoverflow.com/questions/19205057/how-to-fix-corrupt-hdfs-files
Commands for starting and stopping Hadoop Cluster
Start/Stop HDFS using the commands below:

sh $HADOOP_HOME/sbin/start-dfs.sh
sh $HADOOP_HOME/sbin/stop-dfs.sh
Start/Stop YARN services using below commands
sh $HADOOP_HOME/sbin/start-yarn.sh
sh $HADOOP_HOME/sbin/stop-yarn.sh

Start History Server:
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
Stop History Server:
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh stop historyserver
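If you start and stop the cluster often, the commands above can be wrapped in a small helper. A sketch only, using the same scripts listed above; save it, for example, as cluster.sh on the NameNode (the file name is just an illustration) and run it as the hadoop user:

#!/bin/bash
# cluster.sh - start or stop HDFS, YARN and the MapReduce JobHistory Server
case "$1" in
  start)
    $HADOOP_HOME/sbin/start-dfs.sh
    $HADOOP_HOME/sbin/start-yarn.sh
    $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
    ;;
  stop)
    $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh stop historyserver
    $HADOOP_HOME/sbin/stop-yarn.sh
    $HADOOP_HOME/sbin/stop-dfs.sh
    ;;
  *)
    echo "Usage: $0 start|stop"
    ;;
esac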

Working with HDFS
You’re now ready to start working with HDFS by SSH’ing to the NameNode. The most common commands are very similar to normal Linux File System commands, except that they are preceded by hdfs dfs. Below are some common commands and a few examples to get used to HDFS.

Common HDFS Commands

List all files and folders in a directory: hdfs dfs -ls <folder name>
Make a directory on HDFS: hdfs dfs -mkdir <folder name>
Copy a file from the local machine (namenode) into HDFS: hdfs dfs -copyFromLocal <local folder or file name>
Delete a file on HDFS: hdfs dfs -rm <file name>
Delete a directory on HDFS: hdfs dfs -rmdir <folder name>

HDFS Examples
# create local dummy file to place on HDFS
namenode$ echo "Hello this will be my first distributed and fault-tolerant data set!" | cat >> my_file.txt

# list directories from top level of HDFS
namenode$ hdfs dfs -ls /
namenode$ hadoop fs -ls /
# This should display nothing but a temp directory
# create /user directory on HDFS
namenode$ hdfs dfs -mkdir /user
namenode$ hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - ubuntu supergroup          0 2015-05-06 22:41 /user

# copy local file a few times onto HDFS
namenode$ hdfs dfs -copyFromLocal ~/my_file.txt /user
namenode$ hadoop fs -put *_HO_201610181700* /parsdev/   [put files onto HDFS using a glob pattern]
namenode$ hdfs dfs -copyFromLocal ~/my_file.txt /user/my_file2.txt
namenode$ hdfs dfs -copyFromLocal ~/my_file.txt /user/my_file3.txt
# list files in /user directory
namenode$ hdfs dfs -ls /user

Found 3 items

-rw-r--r--   3 ubuntu supergroup         50 2015-05-06 22:43 /user/my_file.txt
-rw-r--r--   3 ubuntu supergroup         50 2015-05-06 22:43 /user/my_file2.txt
-rw-r--r--   3 ubuntu supergroup         50 2015-05-06 22:43 /user/my_file3.txt
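Before cleaning up, you can optionally confirm how the blocks and replicas were placed using the standard fsck tool (the output will differ on your cluster):
namenode$ hdfs fsck /user/my_file.txt -files -blocks -locations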

# clear all data and folders on HDFS
namenode$ hdfs dfs -rm /user/my_file*

15/05/06 22:49:06 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/my_file.txt
15/05/06 22:49:06 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/my_file2.txt
15/05/06 22:49:06 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /user/my_file3.txt

namenode$ hdfs dfs -rmdir /user

How to leave safe mode for HDFS Hadoop:
[hadoop@nakparsdev-1 ~]$ hadoop dfsadmin -safemode leave

DEPRECATED: Use of this script to execute hdfs command is deprecated.

Instead use the hdfs command for it.
Safe mode is OFF

DataNode cannot reach the NameNode port, giving a NoRouteToHost error:
Check whether all HDFS services and ports are running:

# netstat -pant
You will see that the port defined in core-site.xml is listening.

But you are not able to see it from a web browser, and the DataNode is not able to connect either.

Solution:
# sudo iptables --flush

This will resolve the problem.

How to check SELinux status:
# sestatus
If you want to disable it, do the following:
Disable SELinux by setting SELINUX=disabled in /etc/selinux/config.
Of course, you then need to restart the system.
How to restart an individual datanode of a Hadoop Cluster:
http://stackoverflow.com/questions/20208696/hadoop-restart-datanode-and-tasktracker
$ cd $HADOOP_HOME/sbin
$ ./yarn-daemon.sh stop nodemanager
$ ./hadoop-daemon.sh stop datanode
$ ./hadoop-daemon.sh start datanode
$ ./yarn-daemon.sh start nodemanager

How to commission and decommission a DataNode from an existing Hadoop Cluster:
https://acadgild.com/blog/commissioning-and-decommissioning-of-datanode-in-hadoop/

Commissioning of nodes means adding new nodes to the cluster that runs your Hadoop framework. In contrast, decommissioning of nodes means removing nodes from your cluster. This is a very useful way to handle node failures during the operation of a Hadoop cluster without stopping all of the Hadoop nodes in your cluster.
Why do we need decommissioning and commissioning?
You cannot directly remove a DataNode in a large or real-time cluster, as it will cause a lot of disturbance. If you want to take a machine away for a hardware upgrade, or bring down one or more nodes, decommissioning is required because you cannot suddenly shut down the DataNodes/slave nodes. Similarly, if you want to scale your cluster by adding new DataNodes without shutting down the cluster, you need commissioning.
Factors affecting Commissioning and Decommissioning process:
The first step is to contact the YARN ResourceManager, because it holds the records of all running processes. So first tell YARN that you are going to remove a DataNode, and then tell your NameNode that you are going to remove that particular node. Next, we add the decommissioning/commissioning properties to the configuration files on the master node (yarn-site.xml on the ResourceManager and hdfs-site.xml on the NameNode, as shown below). A prerequisite is a working Hadoop multinode cluster (obviously you need a cluster, because you are going to remove one or more DataNodes, whether temporarily or permanently). We will start by adding the decommissioning property to the Hadoop cluster. You need to add it only the first time; afterwards you only update the exclude file. If the decommissioning property is already added, then just update the exclude file for decommissioning.
Hadoop Cluster Configuration Note: In my case the ResourceManager and the NameNode are on different machines, so run all commands accordingly.

Steps for Decommissioning:
1) Before adding any property, stop your cluster; otherwise it will affect the cluster. You can do this using the following command:

stop-dfs.sh

Next, go to your ResourceManager node to edit yarn-site.xml.
2) You need to add this property to your yarn-site.xml:

<property>
   <name>yarn.resourcemanager.nodes.exclude-path</name>
   <value>/home/hadoop/excludes</value>
</property>

Note: In the value section, give the path of your excludes file.

Now, go to your master node (NameNode) and edit the hdfs-site.xml file.
3) Add this property to hdfs-site.xml:

<property>
   <name>dfs.hosts.exclude</name>
   <value>/home/hadoop/excludes</value>
</property>

Note: If the ResourceManager and the NameNode (master node) are on the same machine, then simply edit the yarn-site.xml and hdfs-site.xml of the NameNode (master node).

4) Next, start your cluster using the following commands:

start-dfs.sh     #(run this command on the master node/NameNode only)
start-yarn.sh    #(run this command on the ResourceManager)

Note: If the ResourceManager (NodeManager) and the NameNode are running on the same machine, then run the above commands on the NameNode (master node) only.

5) We need to update the exclude file on both machines, the ResourceManager and the NameNode (master node); if it is not there, create an exclude file on both machines.

$ vi excludes

Add the DataNode/slave-node address to be decommissioned:

192.168.10.103

6) Run the following command on the ResourceManager:

$ yarn rmadmin -refreshNodes                     (on the ResourceManager)

This command checks yarn-site.xml, processes that property, and decommissions the mentioned node from YARN. It means the YARN ResourceManager will no longer give any job to this node.

7) Run the following command on the NameNode to check hdfs-site.xml, process the property, and decommission the specified node/DataNode:

$ hdfs dfsadmin -refreshNodes                 (on the NameNode)

This command checks hdfs-site.xml, processes that property, and decommissions the mentioned node from HDFS, so the NameNode stops using it for storing data.

8) Check the report with the dfsadmin command:

$ hadoop dfsadmin -report

Commissioning of Datanodes:
The commissioning process is just the opposite of decommissioning, but the configuration part is almost the same for both.

Follow these steps for the commissioning configuration.
Before starting, simply remove the exclude file on both machines, or delete all entries from the exclude file (make it blank).

Stop all daemons before adding any property to the Hadoop cluster.
Open the ResourceManager machine to edit yarn-site.xml.

1) Next, go to the ResourceManager and add this property to yarn-site.xml:

vi yarn-site.xml

<property>
   <name>yarn.resourcemanager.nodes.include-path</name>
   <value>/home/hadoop/includes</value>
</property>

Next, go to your NameNode (master node).
2) Add this property to hdfs-site.xml:

vi hdfs-site.xml     (on the NameNode)

<property>
   <name>dfs.hosts</name>
   <value>/home/hadoop/includes</value>
</property>

3) Now, start your cluster using the following commands:

start-dfs.sh     (run this command on the NameNode only)
start-yarn.sh    (run this command on the ResourceManager)

Note: If the ResourceManager (NodeManager) and the NameNode are running on the same machine, then run these commands on the NameNode (master node) only.

4) We need to update the include file on both the ResourceManager and the NameNode (master node). If it is not present, then create an include file on both nodes.

vi includes

Add the DataNodes'/slave nodes' IP addresses or hostnames:

192.168.10.101
192.168.10.102
192.168.10.103

Note: If you are going to add a new DataNode, or if you are scaling up your cluster by adding a new node, you need to add its IP address and hostname to the /etc/hosts file of all nodes (NameNode, DataNodes, ResourceManager).

Whenever you do commissioning, list the addresses of all the required DataNodes in the include file.

5) Run the following command on the ResourceManager:

$ yarn rmadmin -refreshNodes                 (on the ResourceManager)

6) Next, go to the master node (NameNode) and run the following command to refresh all nodes:

$ hdfs dfsadmin -refreshNodes                 (on the NameNode)

7) Check the Hadoop admin report using the command hadoop dfsadmin -report.

$ hadoop dfsadmin -report

Here, you can see that the dn3.mycluster.com (192.168.10.103) DataNode, which was in the decommissioned state, is now in the normal state (commissioned).

Note:
·       The most important thing when you do commissioning is to make sure that the DataNode you are going to add has everything it needs (it should be fully configured as a Hadoop DataNode).

·       The second thing to keep in mind is that you must list all the necessary DataNode addresses in the include file.

·       Run the cluster balancer, as the balancer attempts to balance data to within a certain threshold among DataNodes by copying block data from older nodes to the newly commissioned nodes.

How to run Hadoop Balancer?

$ hadoop balancer

The Hadoop balancer is a built-in utility which makes sure that no DataNode is over-utilized. When you run the balancer, it checks whether some DataNodes are under-utilized or over-utilized and rebalances the block distribution. Make sure the balancer runs only during off-peak hours on a real cluster; if you run it during peak hours, it will put a heavy load on the network, as it transfers a large amount of data.
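The balancer also accepts a threshold (in percent) that controls how close each DataNode's utilization must be to the cluster average before it stops; for example:
$ hdfs balancer -threshold 10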

So, this is how commissioning is done!
Hope this was helpful in understanding the commissioning and decommissioning of DataNodes in Hadoop.

Important Note:
Finally, it is better to update the slaves file as well, so that on the next start Hadoop does not try to start DataNodes on hosts that have been removed.

[hadoop@nakparsdev-1-vm-01 ~]$ vi $HADOOP_CONF_DIR/slaves

Hadoop node taking a long time to decommission
http://stackoverflow.com/questions/17789196/hadoop-node-taking-a-long-time-to-decommission

Starting HDFS automatically after a reboot:
https://serverfault.com/questions/417997/cant-start-hadoop-from-an-init-d-script