Apache Zeppelin is a web-based open source notebook and collaborative tool for
interactive data ingestion, discovery, analytics and visualization. Zeppelin
supports more than 20 languages including Apache Spark, SQL, R, Elasticsearch
and many more. Apache Zeppelin allows you to create beautiful data-driven
documents and see the results of your analytics.
Prerequisites
For this tutorial, we will use zeppelin.example.com as the domain name
pointed towards the Cobra instance. Please make sure to replace all occurrences
of the example domain name with the actual one.
Update your base system using the guide How to Update Ubuntu 16.04. Once your
system has been updated, proceed to install Java.
Install Java
Apache Zeppelin is written in Java, so it requires a JDK to run. Add the Ubuntu
repository for Oracle Java 8.
sudo add-apt-repository --yes ppa:webupd8team/java
sudo apt update
Install Oracle Java.
sudo apt -y install oracle-java8-installer
Verify its version.
java -version
You will see the following output.
$ java -version
java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
Set the default path for Java by installing the following package.
sudo apt -y install oracle-java8-set-default
You can verify that JAVA_HOME is set by running the following command.
echo $JAVA_HOME
You will see the following output.
$ echo $JAVA_HOME
/usr/lib/jvm/java-8-oracle
If you see no output at all, you will need to log out from the current shell and
log back in.
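The check above can also be scripted. The following is a minimal sketch, not part of the Oracle packages: check_java_home is a hypothetical helper name, and it only assumes that a usable JDK has an executable at $JAVA_HOME/bin/java.

```shell
# Hypothetical helper: confirm that JAVA_HOME is set and points at a JDK.
check_java_home() {
    # Fail early when the variable is unset or empty.
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    # A JDK or JRE keeps its launcher at bin/java under the home directory.
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
        echo "no java executable under $JAVA_HOME" >&2
        return 1
    fi
    echo "JAVA_HOME=$JAVA_HOME"
}
```

Calling check_java_home in a fresh login shell prints the path on success and returns a non-zero status otherwise, which makes it usable in a provisioning script.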
Install Zeppelin
Apache Zeppelin ships all of its dependencies along with the binary files, so we
do not need to install anything else except Java. Download the Zeppelin binary
on your system. You can always find the latest version of the application on
the Zeppelin download page.
wget http://www-us.apache.org/dist/zeppelin/zeppelin-0.7.3/zeppelin-0.7.3-bin-all.tgz
Extract the archive.
sudo tar xf zeppelin-*-bin-all.tgz -C /opt
The above command will extract the archive to /opt/zeppelin-0.7.3-bin-all.
Rename the directory for the sake of convenience.
sudo mv /opt/zeppelin-*-bin-all /opt/zeppelin
Apache Zeppelin is now installed. You can start the application immediately, but
it will not be accessible to you, as it listens on localhost only. We will
configure Apache Zeppelin as a service. We will also configure Nginx as a
reverse proxy.
Configure Systemd
In this step, we will set up a Systemd unit file for the Zeppelin application.
This will ensure that the application process is started automatically at boot
and restarted after failures.
For security reasons, create an unprivileged user for running the Zeppelin process.
sudo useradd -d /opt/zeppelin -s /bin/false zeppelin
Give ownership of the files to the newly created zeppelin user.
sudo chown -R zeppelin:zeppelin /opt/zeppelin
Create a new Systemd service unit file.
sudo nano /etc/systemd/system/zeppelin.service
Populate the file with the following.
[Unit]
Description=Zeppelin service
After=syslog.target network.target
[Service]
Type=forking
ExecStart=/opt/zeppelin/bin/zeppelin-daemon.sh start
ExecStop=/opt/zeppelin/bin/zeppelin-daemon.sh stop
ExecReload=/opt/zeppelin/bin/zeppelin-daemon.sh reload
User=zeppelin
Group=zeppelin
Restart=always
[Install]
WantedBy=multi-user.target
Start the application.
sudo systemctl start zeppelin
Enable the Zeppelin service to start automatically at boot time.
sudo systemctl enable zeppelin
To ensure that the service is running, you can run the following.
sudo systemctl status zeppelin
Configure Reverse Proxy
By default, the Zeppelin server listens on localhost, port 8080.
We will use Nginx as a reverse proxy so that the application can be accessed
via the standard HTTP and HTTPS ports. We will also configure Nginx to use an
SSL certificate issued by Let's Encrypt, a free certificate authority (CA).
Install Nginx.
sudo apt -y install nginx
Start Nginx and enable it to start automatically at boot time.
sudo systemctl start nginx
sudo systemctl enable nginx
Add the Certbot repository.
sudo add-apt-repository --yes ppa:certbot/certbot
sudo apt-get update
Install Certbot, which is the client application for the Let's Encrypt CA.
sudo apt -y install certbot
Note: To obtain certificates from the Let's Encrypt CA, the domain for which the
certificates are to be generated must be pointed towards the server. If not,
make the necessary changes to the DNS records of the domain and wait for the
DNS to propagate before making the certificate request again. Certbot checks
the domain authority before providing the certificates.
Generate the SSL certificates.
sudo certbot certonly --webroot -w /var/www/html -d zeppelin.example.com
The generated certificates are stored
in /etc/letsencrypt/live/zeppelin.example.com/. The SSL certificate is
stored as fullchain.pem and the private key is stored as privkey.pem.
Let's Encrypt certificates expire in 90 days, so it is recommended to set up
auto-renewal of the certificates using a cron job.
Open the cron job file.
sudo crontab -e
Add the following line at the end of the file.
30 5 * * * /usr/bin/certbot renew --quiet
The above cron job will run every day at 5:30 AM. If a certificate is due for
expiration, it will be renewed automatically.
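To make the schedule in that crontab line easier to read, the sketch below splits an entry into its five time fields and the command. describe_cron is a hypothetical helper written for this illustration; it is not part of cron or Certbot.

```shell
# Hypothetical helper: print the fields of a single crontab entry.
describe_cron() {
    # read splits on whitespace; the last variable receives the rest of the line,
    # so the command keeps its own arguments.
    read -r minute hour dom month dow command <<EOF
$1
EOF
    printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
        "$minute" "$hour" "$dom" "$month" "$dow"
    printf 'command=%s\n' "$command"
}

describe_cron '30 5 * * * /usr/bin/certbot renew --quiet'
```

Minute 30 and hour 5 with wildcards in the remaining three fields give a daily run at 5:30 AM. The renewal itself can be rehearsed without changing any certificates by running sudo certbot renew --dry-run.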
Create a new server block file for the Zeppelin site.
sudo nano /etc/nginx/sites-available/zeppelin
Populate the file.
upstream zeppelin {
    server 127.0.0.1:8080;
}
# Redirect all plain HTTP requests to HTTPS.
server {
    listen 80;
    server_name zeppelin.example.com;
    return 301 https://$host$request_uri;
}
server {
    # The "ssl" parameter enables TLS here and replaces the older "ssl on;" directive.
    listen 443 ssl;
    server_name zeppelin.example.com;
    ssl_certificate /etc/letsencrypt/live/zeppelin.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/zeppelin.example.com/privkey.pem;
    ssl_session_cache builtin:1000 shared:SSL:10m;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_ciphers HIGH:!aNULL:!eNULL:!EXPORT:!CAMELLIA:!DES:!MD5:!PSK:!RC4;
    ssl_prefer_server_ciphers on;
    access_log /var/log/nginx/zeppelin.access.log;
    # Forward regular requests to the Zeppelin server.
    location / {
        proxy_pass http://zeppelin;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_set_header X-NginX-Proxy true;
        proxy_redirect off;
    }
    # Forward WebSocket connections, which need the HTTP/1.1 upgrade headers.
    location /ws {
        proxy_pass http://zeppelin/ws;
        proxy_http_version 1.1;
        proxy_set_header Upgrade websocket;
        proxy_set_header Connection upgrade;
        proxy_read_timeout 86400;
    }
}
Activate the configuration file.
sudo ln -s /etc/nginx/sites-available/zeppelin /etc/nginx/sites-enabled/zeppelin
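Before restarting, the new server block can be checked for syntax errors with nginx -t, which parses the configuration without applying it. The wrapper below is only a sketch: check_and_reload_nginx is a hypothetical helper name, and it assumes sudo and systemctl are available, as they are elsewhere in this tutorial.

```shell
# Hypothetical helper: reload Nginx only when the configuration parses cleanly.
check_and_reload_nginx() {
    # Return 127 (command not found) when Nginx is not installed.
    if ! command -v nginx >/dev/null 2>&1; then
        echo "nginx is not installed" >&2
        return 127
    fi
    # nginx -t tests the configuration; the reload runs only if the test passes.
    sudo nginx -t && sudo systemctl reload nginx
}
```

If the test fails, nginx -t reports the offending file and line and the running server keeps its previous configuration.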
Restart Nginx and Zeppelin so that the changes can take effect.
sudo systemctl restart nginx zeppelin
Zeppelin is now accessible on the following address.
https://zeppelin.example.com
By default, there is no authentication enabled, so you can use the application
directly.
Since the application is accessible to everyone, the notebooks you create are also
accessible to everyone. It is very important to disable anonymous access and
enable authentication so that only authenticated users can access the
application.
Disable Anonymous Access
To disable the default anonymous access, copy the configuration file template to
its live location.
cd /opt/zeppelin
sudo cp conf/zeppelin-site.xml.template conf/zeppelin-site.xml
Edit the configuration file.
sudo nano conf/zeppelin-site.xml
Find the following lines in the file.
<property>
<name>zeppelin.anonymous.allowed</name>
<value>true</value>
Change the value to false to disable anonymous access.
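The same change can be made without an editor. The following is a minimal sketch under two assumptions: the value element sits on the line directly after the name element, as in the lines shown above, and GNU sed (the default on Ubuntu) is used. disable_anonymous_access is a hypothetical helper name.

```shell
# Hypothetical helper: set zeppelin.anonymous.allowed to false in the given file.
disable_anonymous_access() {
    # Match the <name> line, move to the next line (n), and flip only that <value>.
    sed -i '/<name>zeppelin\.anonymous\.allowed<\/name>/{n;s#<value>true</value>#<value>false</value>#;}' "$1"
}

# Example (run with sufficient privileges, for instance from a root shell):
#   disable_anonymous_access /opt/zeppelin/conf/zeppelin-site.xml
```

Other properties whose value is also true are left untouched, because the substitution runs only on the line that follows the matching name.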
Enable Shiro Authentication
Now that we have disabled anonymous access, we need to enable some kind of
authentication mechanism so that privileged users can log in. Apache Zeppelin
uses Apache Shiro authentication. Copy the Shiro configuration file.
sudo cp conf/shiro.ini.template conf/shiro.ini
Edit the configuration file.
sudo nano conf/shiro.ini
Find the following lines in the file.
[users]
admin = password1, admin
user1 = password2, role1, role2
user2 = password3, role3
user3 = password4, role2
The list contains the username, password, and roles of each user. For now, we
will only use admin and user1. Change the passwords of admin and user1, and
disable the other users by commenting them out. You can also change the
usernames and roles of the users. To learn more about Apache Shiro users and
roles, read the Shiro authorization guide.
Once you have changed the passwords, the code block should look like this.
[users]
admin = StrongPassword, admin
user1 = UserPassword, role1, role2
# user2 = password3, role3
# user3 = password4, role2
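Commenting out the unused accounts can also be scripted. This is a sketch only: disable_shiro_users is a hypothetical helper name, it assumes GNU sed, and it assumes plain usernames that contain no characters special to regular expressions.

```shell
# Hypothetical helper: comment out the named users in a shiro.ini-style file.
disable_shiro_users() {
    file=$1
    shift
    for user in "$@"; do
        # Prefix lines that start with "<user> =" with "# "; & is the matched text.
        sed -i "s/^${user}[[:space:]]*=/# &/" "$file"
    done
}

# Example (run with sufficient privileges):
#   disable_shiro_users /opt/zeppelin/conf/shiro.ini user2 user3
```

Lines that are already commented out do not match the pattern, so running the helper twice leaves the file unchanged.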
Now restart Zeppelin to apply the changes.
sudo systemctl restart zeppelin
You will see that authentication has been enabled, and you will be able to log
in using the username and password set in the Shiro configuration file.