How To Install Apache Spark On Ubuntu 20.04 LTS

Apache Spark is a free, open-source framework for distributed cluster computing and big-data workloads. It is an engine for large-scale data processing and provides high-level APIs in Java, Scala, and Python.

Install Apache Spark On Ubuntu

Update the system.

apt-get update

Install Java.

apt-get install openjdk-11-jdk

Check Java version.

java --version

Here is the command output.

openjdk 11.0.11
OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.20.04)
OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.20.04, mixed mode, sharing)

Install Scala.

apt-get install scala

Check Scala version.

scala -version

Here is the command output.

Scala code runner version 2.11.12 -- Copyright 2002-2017, LAMP/EPFL

Launch the Scala shell.

scala

Here is the command output.

Welcome to Scala 2.11.12 (OpenJDK 64-Bit Server VM, Java 11.0.11).
Type in expressions for evaluation. Or try :help.

scala> 

Run a test expression.

scala> println("Hello World")
Hello World

Install Apache Spark

Download the file.

curl -O https://archive.apache.org/dist/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz

Extract the downloaded file.

tar xvf spark-3.1.1-bin-hadoop3.2.tgz

Move the extracted directory to /opt/spark.

mv spark-3.1.1-bin-hadoop3.2/ /opt/spark 

Open bashrc configuration file.

vim ~/.bashrc

Add the following lines:

export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

Apply the changes by reloading the bashrc file.

source ~/.bashrc
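Once the file is reloaded, the variables should resolve in the current shell. A quick sanity check (re-applying the same two exports directly):

```shell
# Set the Spark variables for the current shell session and confirm they resolve
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
echo "$SPARK_HOME"   # prints /opt/spark
```

If the echo prints an empty line, the exports were not picked up; re-check the lines added to ~/.bashrc.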

Start a master server.

start-master.sh 

Here is the command output.

starting org.apache.spark.deploy.master.Master, logging to 
/opt/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-ubuntu.out

Open port number 8080 on ufw firewall.

ufw allow 8080/tcp

Access the Apache Spark web interface in a browser.

http://server-ip:8080/

The Spark master web interface is displayed (Fig 2).

Start a worker process and attach it to the master (replace ubuntu with your master's hostname). In Spark 3.1 this script was renamed to start-worker.sh; start-slave.sh still works but is deprecated.

start-slave.sh spark://ubuntu:7077

Use Spark shell.

/opt/spark/bin/spark-shell

Use PySpark for Python.

/opt/spark/bin/pyspark
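To verify the cluster end to end, one option is to submit the SparkPi example that ships with the Spark distribution. This is a sketch that assumes the master started above is reachable at spark://ubuntu:7077; adjust the hostname to match yours.

```shell
# Submit the bundled SparkPi example to the standalone master
# (spark://ubuntu:7077 is the master URL used earlier in this guide)
/opt/spark/bin/spark-submit \
  --master spark://ubuntu:7077 \
  --class org.apache.spark.examples.SparkPi \
  /opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar 100
```

If the cluster is healthy, the job output includes a line with an approximation of Pi, and the completed application appears in the master web interface.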
