- Nikhil Bhaskar
- July 20, 2021
How to Set Up Hadoop on Ubuntu 20.04
Hadoop is a free and open-source software framework written in Java. It is used to store and process large data sets on clusters of machines. With Hadoop, you can manage many dedicated servers as a single cluster.
Install and Configure Hadoop on Ubuntu
Update the System.
apt-get update
Install Java.
apt-get install openjdk-11-jdk
Check Java Version.
java -version
Here is the command output.
openjdk version "11.0.11"
OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.20.04)
OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.20.04, mixed mode, sharing)
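If you are unsure of the JDK installation path (it is needed for JAVA_HOME later in this guide), the following command should reveal it; on Ubuntu 20.04 the openjdk-11-jdk package typically installs under /usr/lib/jvm/java-11-openjdk-amd64.
readlink -f $(which java)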
Create a User.
adduser hadoop
Here is the command output.
- Provide a password for the user.
Adding user `hadoop' ...
Adding new group `hadoop' (1002) ...
Adding new user `hadoop' (1002) with group `hadoop' ...
Creating home directory `/home/hadoop' ...
Copying files from `/etc/skel' ...
New password:
Retype new password:
passwd: password updated successfully
Changing the user information for hadoop
Enter the new value, or press ENTER for the default
Full Name []:
Room Number []:
Work Phone []:
Home Phone []:
Other []:
Is the information correct? [Y/n]
- Type Y.
Log in as the hadoop user.
su - hadoop
Provide the hadoop user password.
Configure the SSH Key.
ssh-keygen -t rsa
Here is the command output.
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:QSa2syeISwP0hD+UXxxi0j9MSOrjKDGIbkfbM3ejyIk hadoop@ubuntu20
The key's randomart image is:
+---[RSA 3072]----+
| ..o++=.+ |
|..oo++.O |
|. oo. B . |
|o..+ o * . |
|= ++o o S |
|.++o+ o |
|.+.+ + . o |
|o . o * o . |
| E + . |
+----[SHA256]-----+
Append the public key from id_rsa.pub to authorized_keys and set the required permissions.
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 640 ~/.ssh/authorized_keys
Verify the SSH authentication.
ssh localhost
Here is the command output.
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:JFqDVbM3zTPhUPgD5oMJ4ClviH6tzIRZ2GD3BdNqGMQ.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
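Once the test login succeeds, you can close the session to return to your previous shell.
exit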
Install Hadoop
Log in as the hadoop user.
su - hadoop
Download Hadoop.
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
Extract the downloaded file.
tar -xvzf hadoop-3.3.0.tar.gz
Rename the extracted directory to hadoop.
mv hadoop-3.3.0 hadoop
Open the ~/.bashrc file.
vim ~/.bashrc
Add the following lines.
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Activate the environment.
source ~/.bashrc
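To confirm the new environment variables are active, you can run the hadoop binary from the PATH set above; it should report version 3.3.0.
hadoop version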
Open the Hadoop environment file.
vim $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Add the following line.
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
Create the NameNode and DataNode data directories.
mkdir -p ~/hadoopdata/hdfs/namenode
mkdir -p ~/hadoopdata/hdfs/datanode
Open the core-site.xml file.
vim $HADOOP_HOME/etc/hadoop/core-site.xml
Add the following lines. Use hdfs://0.0.0.0:9000 as the value instead if the NameNode should listen on all interfaces.
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://127.0.0.1:9000</value>
</property>
</configuration>
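Once the file is saved, you can optionally verify that Hadoop picks up the setting (assuming the configuration shown above):
hdfs getconf -confKey fs.defaultFS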
Open the hdfs-site.xml file.
vim $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Add the following lines.
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
</property>
</configuration>
Open the mapred-site.xml file.
vim $HADOOP_HOME/etc/hadoop/mapred-site.xml
Add the following lines.
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Open the yarn-site.xml file.
vim $HADOOP_HOME/etc/hadoop/yarn-site.xml
Add the following lines.
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Format the NameNode as the hadoop user.
hdfs namenode -format
Here is the command output.
INFO namenode.FSImageFormatProtobuf: Image file /home/hadoop/hadoopdata/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 of size 401 bytes saved in 0 seconds .
INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ip-address
************************************************************/
Start the Hadoop cluster.
start-dfs.sh
Here is the command output.
Starting namenodes on [15.228.82.126]
15.228.82.126: Warning: Permanently added '15.228.82.126' (ECDSA) to the list of
known hosts.
Starting datanodes
Starting secondary namenodes ip-address
ip-address: Warning: Permanently added 'ip-address' (ECDSA) to the list of known hosts.
Start the YARN service.
start-yarn.sh
Here is the command output.
Starting resourcemanager
Starting nodemanagers
Check the status of all Hadoop services.
jps
Here is the command output.
6032 ResourceManager
5625 DataNode
6523 Jps
5836 SecondaryNameNode
6206 NodeManager
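For a deeper check than jps, you can ask HDFS and YARN for their own status reports (optional; on this single-node setup they should show one live DataNode and one running NodeManager).
hdfs dfsadmin -report
yarn node -list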
Open ports 9870 and 8088 in the UFW firewall.
ufw allow 9870/tcp
ufw allow 8088/tcp
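You can confirm the rules were added with:
ufw status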
Access the Hadoop web interface.
http://server-ip:9870
The Hadoop NameNode status page should load.
Access the Resource Manager web interface.
http://server-ip:8088
The YARN Resource Manager web UI should load.
Test the Hadoop Cluster.
Create directories in the HDFS filesystem.
hdfs dfs -mkdir /logs
hdfs dfs -mkdir /example
List the directories.
hdfs dfs -ls /
Here is the command output.
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2021-07-19 15:27 /logs
drwxr-xr-x - hadoop supergroup 0 2021-07-19 15:26 /example
Push log files from the local machine to the Hadoop filesystem.
hdfs dfs -put /var/log/* /logs/
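As an optional end-to-end test, you can run the word-count example that ships with the Hadoop distribution against one of the uploaded files. The jar path below follows the layout used in this guide, /logs/syslog assumes syslog was among the uploaded log files, and /wordcount-out is just an example output directory name.
hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar wordcount /logs/syslog /wordcount-out
hdfs dfs -cat /wordcount-out/part-r-00000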
Browse the uploaded files in the Hadoop NameNode web interface.
http://server-ip:9870/explorer.html
The uploaded files are visible in the HDFS file browser.