Apache Zeppelin is a web-based open source notebook and collaborative tool
for interactive data ingestion, discovery, analytics, and visualization.
Zeppelin supports more than 20 interpreters, including Apache Spark, SQL, R,
Elasticsearch, and many more. It lets you create data-driven documents and
see the results of your analytics.
Prerequisites
For this tutorial, we will use zeppelin.example.com as the
domain name pointed towards the Cobra instance. Please make sure to replace all
occurrences of the example domain name with the actual one.
Update your base system using the guide How to Update CentOS 7.
Once your system has been updated, proceed to install Java.
Install Java
Apache Zeppelin is written in Java, so it requires a JDK to work.
Download the Oracle Java SE JDK RPM package.
wget --no-cookies --no-check-certificate \
  --header "Cookie:oraclelicense=accept-securebackup-cookie" \
  "http://download.oracle.com/otn-pub/java/jdk/8u151-b12/e758a0de34e24606bca991d704f6dcbf/jdk-8u151-linux-x64.rpm"
Install the downloaded package.
sudo yum -y localinstall jdk-8u151-linux-x64.rpm
If Java was installed successfully, you should be able to verify
its version.
java -version
You will see the following output.
$ java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)
Before we can proceed further, we need to set up
the JAVA_HOME and JRE_HOME environment variables. Find the
absolute path of the Java executable on your system.
readlink -f $(which java)
You will see output similar to the following.
$ readlink -f $(which java)
/usr/java/jdk1.8.0_151/jre/bin/java
Now, set the JAVA_HOME and JRE_HOME environment
variables according to the path of the Java directory.
echo "export JAVA_HOME=/usr/java/jdk1.8.0_151" >> ~/.bash_profile
echo "export JRE_HOME=/usr/java/jdk1.8.0_151/jre" >> ~/.bash_profile
Load the updated .bash_profile file into the current shell.
source ~/.bash_profile
Now you can run the echo $JAVA_HOME command to check if the
environment variable is set.
$ echo $JAVA_HOME
/usr/java/jdk1.8.0_151
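The hard-coded paths above are tied to one specific JDK build. As a minimal sketch, the directory could instead be derived from the resolved path of the java binary. The function name java_home_from_path is hypothetical, and the sketch assumes the layout shown above, where the binary sits under bin/ or jre/bin/ inside the JDK directory.

```shell
# Hypothetical helper: derive the JDK home directory from the resolved path
# of the java binary, e.g. the output of: readlink -f "$(which java)"
java_home_from_path() {
  jh_path=$1
  jh_path=${jh_path%/bin/java}   # drop the trailing /bin/java
  jh_path=${jh_path%/jre}        # drop a trailing /jre, if present
  printf '%s\n' "$jh_path"
}

# Example with the path shown in this tutorial:
java_home_from_path /usr/java/jdk1.8.0_151/jre/bin/java
# prints /usr/java/jdk1.8.0_151
```

On a live system, the argument would be the output of readlink -f "$(which java)", and the result could be written to ~/.bash_profile in place of the literal path.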
Install Zeppelin
Apache Zeppelin ships all the dependencies along with the binary files,
so we do not need to install anything else except Java. Download the Zeppelin
binary on your system. You can always find the latest version of the
application on the Zeppelin download page.
wget http://www-us.apache.org/dist/zeppelin/zeppelin-0.7.3/zeppelin-0.7.3-bin-all.tgz
Extract the archive.
sudo tar xf zeppelin-*-bin-all.tgz -C /opt
The above command will extract the archive
to /opt/zeppelin-0.7.3-bin-all. Rename the directory for the sake of
convenience.
sudo mv /opt/zeppelin-*-bin-all /opt/zeppelin
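The tar and mv commands above rely on shell globs, which match more than one file if several Zeppelin archives are present. A small sketch of making the version explicit instead follows. The helper names and the ZEPPELIN_VERSION variable are hypothetical; only the naming pattern zeppelin-VERSION-bin-all comes from this tutorial.

```shell
# Hypothetical helpers: build the archive and directory names from a version.
zeppelin_archive() {
  printf 'zeppelin-%s-bin-all.tgz\n' "$1"
}

zeppelin_extract_dir() {
  printf '/opt/zeppelin-%s-bin-all\n' "$1"
}

ZEPPELIN_VERSION=0.7.3   # assumption: the release used in this tutorial
echo "archive:   $(zeppelin_archive "$ZEPPELIN_VERSION")"
echo "directory: $(zeppelin_extract_dir "$ZEPPELIN_VERSION")"
```

These names could then be passed to tar and mv in place of the wildcard patterns.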
Apache Zeppelin is now installed. You can start the application
immediately, but it will not be accessible from outside the server, as it
listens on localhost only. We will configure Apache Zeppelin as a service. We
will also configure the Nginx web server as a reverse proxy.
Configure Systemd service
In this step, we will set up a Systemd unit file for the Zeppelin
application. This ensures that the application process is started
automatically at boot and restarted after failures.
For security reasons, create an unprivileged user for running the
Zeppelin process.
sudo adduser -d /opt/zeppelin -s /sbin/nologin zeppelin
Provide ownership of the files to the newly created Zeppelin user.
sudo chown -R zeppelin:zeppelin /opt/zeppelin
Create a new Systemd service unit file.
sudo nano /etc/systemd/system/zeppelin.service
Populate the file with the following.
[Unit]
Description=Zeppelin service
After=syslog.target network.target
[Service]
Type=forking
ExecStart=/opt/zeppelin/bin/zeppelin-daemon.sh start
ExecStop=/opt/zeppelin/bin/zeppelin-daemon.sh stop
ExecReload=/opt/zeppelin/bin/zeppelin-daemon.sh reload
User=zeppelin
Group=zeppelin
Restart=always
[Install]
WantedBy=multi-user.target
Start the application.
sudo systemctl start zeppelin
Enable the Zeppelin service to start automatically at boot time.
sudo systemctl enable zeppelin
To check if the service is running, you can run the following.
sudo systemctl status zeppelin
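The status command reports that the process is running, but Zeppelin may need a few seconds before it answers HTTP requests. As a rough sketch, a polling loop can confirm that the port responds. The function name wait_for_http and its retry defaults are assumptions for illustration, and the sketch relies on curl being installed.

```shell
# Hypothetical helper: poll a URL until it answers successfully or the
# attempts run out. Arguments: URL, attempts (default 30), delay in seconds
# between attempts (default 2).
wait_for_http() {
  wf_url=$1
  wf_attempts=${2:-30}
  wf_delay=${3:-2}
  wf_i=0
  while [ "$wf_i" -lt "$wf_attempts" ]; do
    # -f makes curl fail on HTTP errors; -sS hides progress but keeps errors
    if curl -fsS -o /dev/null --max-time 5 "$wf_url"; then
      return 0
    fi
    wf_i=$((wf_i + 1))
    sleep "$wf_delay"
  done
  return 1
}

# Usage on the server (not run here):
#   wait_for_http http://127.0.0.1:8080/ && echo "Zeppelin is answering"
```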
Configure Reverse Proxy
By default, the Zeppelin server listens on localhost,
port 8080. In this tutorial, we will use Nginx as a reverse proxy so that
the application can be accessed via the
standard HTTP and HTTPS ports. We will also configure Nginx
to use an SSL certificate issued by Let's Encrypt, a free certificate authority (CA).
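Install Nginx. On CentOS 7 it is provided by the EPEL repository, so enable
EPEL first if it is not already enabled.
sudo yum -y install epel-release
sudo yum -y install nginx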
Start Nginx and enable it to automatically start at boot time.
sudo systemctl start nginx
sudo systemctl enable nginx
Install Certbot, the client application for the Let's Encrypt CA.
sudo yum -y install certbot
Before you can request the certificates, you will need to allow
ports 80 and 443, that is, the
standard HTTP and HTTPS services, through the firewall.
sudo firewall-cmd --zone=public --add-service=http --permanent
sudo firewall-cmd --zone=public --add-service=https --permanent
sudo firewall-cmd --reload
Note: To obtain certificates from the Let's Encrypt CA, the domain
for which the certificates are to be generated must point to the
server. If it does not, make the necessary changes to the DNS records of the
domain and wait for the DNS to propagate before making the certificate request
again. Let's Encrypt verifies control of the domain before issuing certificates.
Generate the SSL certificates.
sudo certbot certonly --webroot -w /usr/share/nginx/html -d zeppelin.example.com
The generated certificates are stored
in /etc/letsencrypt/live/zeppelin.example.com/. The SSL certificate is
stored as fullchain.pem and the private key is stored
as privkey.pem.
Let's Encrypt certificates expire in 90 days, hence it is recommended to
set up auto-renewal of the certificates using Cron jobs.
Open the cron job file.
sudo crontab -e
Add the following line at the end of the file.
30 5 * * * /usr/bin/certbot renew --quiet
The above cron job runs every day at 5:30 AM. If a certificate is
due for renewal, Certbot renews it automatically.
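The five leading fields of a crontab line are minute, hour, day of month, month, and day of week, so "30 5 * * *" means minute 30 of hour 5 on every day. The sketch below reads the first two fields back as a clock time. The function name cron_time is hypothetical, and it only handles plain numeric minute and hour fields, not ranges or step values.

```shell
# Hypothetical helper: print the HH:MM at which a simple crontab line fires.
# Only plain numbers in the minute and hour fields are supported.
cron_time() {
  printf '%s\n' "$1" | {
    # Field order in a crontab line: minute, hour, then the rest
    read -r ct_minute ct_hour ct_rest
    printf '%02d:%02d\n' "$ct_hour" "$ct_minute"
  }
}

cron_time '30 5 * * * /usr/bin/certbot renew --quiet'
# prints 05:30
```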
Create a new server block file for the Zeppelin site.
sudo nano /etc/nginx/conf.d/zeppelin.example.com.conf
Populate the file.
upstream zeppelin {
    server 127.0.0.1:8080;
}

server {
    listen 80;
    server_name zeppelin.example.com;
    return 301 https://$host$request_uri;
}

server {
    # "ssl on;" is deprecated; TLS is enabled on the listen directive instead
    listen 443 ssl;
    server_name zeppelin.example.com;

    ssl_certificate     /etc/letsencrypt/live/zeppelin.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/zeppelin.example.com/privkey.pem;

    ssl_session_cache builtin:1000 shared:SSL:10m;
    # TLSv1 and TLSv1.1 are deprecated and have been dropped here
    ssl_protocols TLSv1.2;
    ssl_ciphers HIGH:!aNULL:!eNULL:!EXPORT:!CAMELLIA:!DES:!MD5:!PSK:!RC4;
    ssl_prefer_server_ciphers on;

    access_log /var/log/nginx/zeppelin.access.log;

    location / {
        proxy_pass http://zeppelin;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_set_header X-NginX-Proxy true;
        proxy_redirect off;
    }

    # WebSocket endpoint used by the notebook UI
    location /ws {
        proxy_pass http://zeppelin/ws;
        proxy_http_version 1.1;
        proxy_set_header Upgrade websocket;
        proxy_set_header Connection upgrade;
        proxy_read_timeout 86400;
    }
}
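The example domain appears several times in the server block, so replacing it by hand is easy to get wrong. One possible approach is a small filter that rewrites every occurrence at once. The function name use_domain is hypothetical, and the sketch assumes the new domain contains only ordinary hostname characters (letters, digits, dots, and hyphens).

```shell
# Hypothetical helper: read a config on stdin and print it with the example
# domain replaced by the domain given as the first argument.
use_domain() {
  sed "s/zeppelin\.example\.com/$1/g"
}

# Usage (notebooks.example.org is a placeholder domain):
#   use_domain notebooks.example.org < template.conf > notebooks.example.org.conf
```

After generating the file, running sudo nginx -t checks the syntax of the resulting configuration before Nginx is restarted.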
Restart Nginx and Zeppelin so that the changes take effect.
sudo systemctl restart nginx zeppelin
Zeppelin is now accessible on the following address.
https://zeppelin.example.com
By default, no authentication is enabled, so you can use the
application directly.
Because the application is accessible to everyone, the notebooks you
create are also accessible to everyone. It is very important to disable
anonymous access and enable authentication so that only authenticated users
can access the application.
Disable Anonymous Access
To disable the default anonymous access, copy the configuration file
template to its live location.
cd /opt/zeppelin
sudo cp conf/zeppelin-site.xml.template conf/zeppelin-site.xml
Edit the configuration file.
sudo nano conf/zeppelin-site.xml
Find the following lines in the file.
<property>
<name>zeppelin.anonymous.allowed</name>
<value>true</value>
Change the value to false to disable anonymous access.
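The same change can be made without an interactive editor. The sketch below prints the file with the value flipped, leaving the original untouched. The function name disable_anonymous is hypothetical, and the sketch assumes the value element sits on the line directly after the name element, as in the lines shown above.

```shell
# Hypothetical helper: print a zeppelin-site.xml with anonymous access
# disabled. Assumes <value> is on the line right after the matching <name>.
disable_anonymous() {
  sed '/<name>zeppelin\.anonymous\.allowed<\/name>/{
n
s|<value>true</value>|<value>false</value>|
}' "$1"
}

# Usage (writes to a new file so the result can be reviewed first):
#   disable_anonymous conf/zeppelin-site.xml > /tmp/zeppelin-site.xml
```

Only the property named zeppelin.anonymous.allowed is changed; other properties whose value is true are left alone.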
Enable Shiro Authentication
Now that we have disabled anonymous access, we need to enable an
authentication mechanism so that authorized users can log in. Apache Zeppelin
uses Apache Shiro authentication. Copy the Shiro configuration file.
sudo cp conf/shiro.ini.template conf/shiro.ini
Edit the configuration file.
sudo nano conf/shiro.ini
Find the following lines in the file.
[users]
admin = password1, admin
user1 = password2, role1, role2
user2 = password3, role3
user3 = password4, role2
The list contains the username, password, and roles of each user. For
now, we will only use admin and user1. Change the passwords
of admin and user1 and disable the other users by commenting
them out. You can also change the usernames and roles of the users. To learn
more about Apache Shiro users and roles, read the Shiro authorization guide.
Once you have changed the passwords, the code block should look like
this.
[users]
admin = StrongPassword, admin
user1 = UserPassword, role1, role2
# user2 = password3, role3
# user3 = password4, role2
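Before restarting, it can help to confirm which accounts are still active after the edit. As a minimal sketch, the [users] section can be scanned for lines that are not commented out. The function name shiro_active_users is hypothetical, and the sketch assumes the simple "name = password, roles" layout shown above.

```shell
# Hypothetical helper: list the usernames in the [users] section of a
# shiro.ini file, skipping commented-out and blank lines.
shiro_active_users() {
  awk '
    /^\[/ { in_users = ($0 == "[users]"); next }  # track the current section
    in_users && /^[ \t]*[^# \t]/ {                # active, non-blank line
      split($0, kv, "=")                          # kv[1] is the username part
      gsub(/[ \t]/, "", kv[1])                    # trim surrounding whitespace
      print kv[1]
    }
  ' "$1"
}

# Usage:
#   shiro_active_users /opt/zeppelin/conf/shiro.ini
```

With the file edited as above, the output would be admin and user1, one per line.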
Now restart Zeppelin to apply the changes.
sudo systemctl restart zeppelin
Authentication is now enabled, and you can log in using the
usernames and passwords set in the Shiro configuration
file.