Installing Jupyter, IDEs and Sparkmagic on Centos/RHEL

💡 The component versions mentioned in this topic are for representational purposes only. For the component versions that Gathr supports, see Component Versions Supported →

You can install Jupyter using Docker or on the Host Machine.

If you do not have root permission, treat the installation steps as representative: wherever the steps refer to the root, sax, or Gathr user, use your non-root user instead (for example, <IDE_USER>).

To install using Docker, follow the link below:

https://hub.docker.com/r/streamanalytiximpetus/jupyter

To install it on the host machine, follow the steps below:

Prerequisite

  • Gathr 4.9.0 onwards supports JupyterLab, R Studio and VS Code. Therefore, any existing installation of Jupyter must be uninstalled before proceeding further. Refer to the topic Uninstall Existing Jupyter Version for more details.

  • Default Python must be 2.7.

  • Python and Python2 must point to Python 2.7.

  • Python 3.8.8 must be installed. Python3 must point to Python 3.8.8.

  • pip and pip2 must point to pip2.7.

  • pip3 must point to the pip of Python 3.8.8.

In addition, the following libraries are required:

gcc (sudo yum install gcc)
python-devel (sudo yum install python-devel)
krb5-devel (sudo yum install krb5-devel)
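
The interpreter prerequisites above can be checked before starting the installation. The following is a minimal sketch, not part of the Gathr tooling: the helper names `parse_version` and `check_command` are hypothetical, and the script only compares major.minor versions of `python`, `python2`, and `python3` as found on the PATH.

```python
import re
import shutil
import subprocess

# Expected major.minor versions, taken from the prerequisite list above.
EXPECTED = {"python": "2.7", "python2": "2.7", "python3": "3.8"}


def parse_version(output):
    """Extract 'X.Y' or 'X.Y.Z' from output such as 'Python 3.8.8'."""
    match = re.search(r"(\d+\.\d+(?:\.\d+)?)", output)
    return match.group(1) if match else None


def check_command(name, expected_prefix):
    """Return (ok, detail) for one interpreter command on the PATH."""
    if shutil.which(name) is None:
        return False, "not found on PATH"
    result = subprocess.run([name, "--version"], capture_output=True, text=True)
    # Python 2 prints its version to stderr, Python 3 to stdout.
    version = parse_version(result.stdout + result.stderr)
    if version is None:
        return False, "version not recognised"
    ok = version == expected_prefix or version.startswith(expected_prefix + ".")
    return ok, version


if __name__ == "__main__":
    for name, expected in EXPECTED.items():
        ok, detail = check_command(name, expected)
        status = "OK" if ok else "expected " + expected + ".x"
        print(f"{name}: {detail} ({status})")
```

The script needs Python 3.7 or later to run, so start it with `python3`.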

Uninstall Existing Jupyter Version

Log in as the root user and run the below commands to uninstall the existing Jupyter version:

pip2 uninstall jupyter
pip2 uninstall jupyter_core
pip2 uninstall jupyter-client
pip2 uninstall jupyter-console
pip2 uninstall jupyterlab_pygments
pip2 uninstall notebook
pip2 uninstall jupyter_highlight_selected_word
pip2 uninstall jupyter_contrib_core
pip2 uninstall jupyter_contrib_nbextensions
pip2 uninstall jupyter_nbextensions_configurator
pip2 uninstall nbformat
pip2 uninstall qtconsole
pip2 uninstall nbconvert
pip2 uninstall sparkmagic

Jupyter Installation

To install Jupyter, log in as the root user and use the following command:

root> pip3 install jupyter

JupyterHub Installation

To install JupyterHub, log in as the root user and do as follows:

  • Check the installed Node.js version. If Node.js is not installed, install it using the below commands:

    curl --silent --location https://rpm.nodesource.com/setup_12.x | sudo bash -
    wget https://nodejs.org/dist/v12.1.0/node-v12.1.0-linux-x64.tar.gz
    sudo yum -y install nodejs
    node --version    # verify the installed version, for example v12.1.0
    
  • Install JupyterHub using Python3:

    python3 -m pip install jupyterhub
    
  • Untar the custom proxy package configurable-http-proxy-gathr-hub.tar.gz that you have received from Gathr support using the below command:

    tar -zxvf configurable-http-proxy-gathr-hub.tar.gz
    
  • Go to the directory extracted from the custom proxy package and run the below commands to install the custom proxy:

    npm install
    npm install -g
    
  • Install Jupyter Notebook and JupyterLab using the below commands:

    python3 -m pip install notebook
    python3 -m pip install jupyterlab
    

IDEs Installation

Gathr supports JupyterLab, VS Code and RStudio Integrated Development Environments (IDEs) that can be used with JupyterHub.

The subsequent topics cover the installation steps for the Gathr supported IDEs.

JupyterLab Installation

To install JupyterLab, run the following command as the root user:

root> pip3 install jupyterlab

If the following error occurs while installing Jupyter, then first run the commands given below:

ERROR: ipykernel requires Python version 3.4 or above.

root> pip3 install ipython
root> pip3 install ipykernel

Now, install Jupyter again.

pip3 install sudospawner==0.5.2

As a root user, run the following command:

root> pip3 install jupyter_contrib_nbextensions

Install Gathr Python Library

Install the Gathr Python library on the node where Jupyter is running.

The library provides the functionality to read sources, fetch data from them, and create a data frame in notebooks.

Dependent Libraries

Run the commands below as the root user to install all the prerequisite Python libraries.

root> pip install numpy==1.14
root> pip install pandas==0.22
root> pip install scipy==1.1.0
root> pip install sklearn
root> pip install scikit-learn==0.19.1
root> pip install matplotlib
root> pip3 install numpy==1.19.2
root> pip3 install pandas==1.2.4
root> pip3 install scipy==1.6.2
root> pip3 install sklearn
root> pip3 install scikit-learn==0.24.1
root> pip3 install matplotlib==3.4.2

Follow the steps below to Install Gathr Python Library on the node where Jupyter is running:

  1. Go to Gathr user’s home folder.

    ~/
    

    Create a directory named .gathr and create a sax.config file inside it (as shown below)

    .gathr/sax.config
    

    Add the following content in the sax.config file.

    [DEFAULT]
    SAX_URL = <sax_url>
    SAX_DATA_SOURCE_GATEWAY = Gathr/notebook/sourceDetail
    SAX_SSL_ENABLE = <ssl_enable>
    SSL_CERTIFICATE_PATH = <certificate_path>
    

    Replace <sax_url> with the Gathr hostname or IP address and port (for example, http://localhost:8090).

    By default, <ssl_enable> can be kept as FALSE.

    If SSL is enabled (that is, the Gathr application is running with https), set <ssl_enable> to TRUE and replace <certificate_path> with the location of the certificate that will be used to access the application.

  2. Open a terminal, login as root user and change the directory to <gathr_installation_dir>/conf/jupyter/python/gathr_script.

    Install_Gathr_Python_Library_01

    If Jupyter is installed on some other node, then copy the folder <gathr_installation_dir>/conf/jupyter/python to that machine and continue the following steps.

  3. Run the below command:

    root> python3 setup.py build
    

    This will build the library as shown in screenshot below:

    Install_Gathr_Python_Library_02

  4. Now run the install command as a root user.

    root> python3 setup.py install
    
    root> python2 setup.py build
    root> python2 setup.py install
    

    This installs any required packages that are missing, and then installs the Gathr Python library.

    Install_Gathr_Python_Library_03

  5. Check the installation folder of the Gathr library using the command:

    pip show gathr
    

    Install_Gathr_Python_Library_04

  6. To check whether the Gathr library is available in the Python environment, open a Python console and run the command:

    >>> import gathr
    

    Install_Gathr_Python_Library_05

    If Gathr is not properly installed, you will get an error.
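
The sax.config file from step 1 uses INI syntax, so its contents can be sanity-checked with Python's standard `configparser` module before the library is used. The following is a minimal sketch under that assumption; the helper name `load_sax_config` is hypothetical, and how the Gathr library itself reads the file is not described in this topic. The demonstration writes a temporary file instead of touching ~/.gathr/sax.config.

```python
import configparser
import os
import tempfile


def load_sax_config(path):
    """Read sax.config and return the [DEFAULT] settings as a plain dict."""
    parser = configparser.ConfigParser()
    # Keep key case as written (configparser lower-cases keys by default).
    parser.optionxform = str
    if not parser.read(path):
        raise FileNotFoundError(path)
    settings = dict(parser.defaults())
    ssl_enabled = settings.get("SAX_SSL_ENABLE", "FALSE").strip().upper() == "TRUE"
    if ssl_enabled and not settings.get("SSL_CERTIFICATE_PATH"):
        raise ValueError("SAX_SSL_ENABLE is TRUE but SSL_CERTIFICATE_PATH is empty")
    settings["ssl_enabled"] = ssl_enabled
    return settings


# Demonstration with a temporary file; the URL is the sample value from step 1.
sample = (
    "[DEFAULT]\n"
    "SAX_URL = http://localhost:8090\n"
    "SAX_DATA_SOURCE_GATEWAY = Gathr/notebook/sourceDetail\n"
    "SAX_SSL_ENABLE = FALSE\n"
)
with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "sax.config")
    with open(config_path, "w") as handle:
        handle.write(sample)
    settings = load_sax_config(config_path)
    print(settings["SAX_URL"])
```

To check a real installation, call `load_sax_config` with the path of the file created in step 1.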

Auto-Create Notebook Using REST API

  1. On the node where Jupyter is running, log in as the Gathr user and navigate to the folder <<gathr_Installation_Dir>>/conf/jupyter/python/autonotebook.

    gathr> cd <<gathr_Installation_Dir>>/conf/jupyter/python/autonotebook
    
  2. Run the auto_create_notebook.py script using the following command:

    gathr> python3 auto_create_notebook.py &
    

    This starts the service on port 5000. To use a different port, pass the port number as an argument.

    Example: python3 auto_create_notebook.py port=5004

    By default, the log file is created in the folder from which the script is started, with the name auto_create_notebook.log. To change the log file location, pass the log file path and name as an argument when starting the script.

    Example:

    gathr> python3 auto_create_notebook.py logfile=/my/log/folder/auto_create_notebook.log port=5004 &
    

    To configure Jupyter in Gathr, see Jupyter in Configuring Properties.

Install Python Kernel on Jupyter

Install the Python 2 and Python 3 kernels on Jupyter (for Python 2.7 and Python 3.8) as follows:

Run the following commands as the root user:

- python2 -m pip install ipykernel
- python3 -m pip install ipykernel
- python2 -m ipykernel install
- python3 -m ipykernel install

R and RStudio Installation

To install the necessary runtime dependencies for R, you will need to enable additional repositories for third-party or source packages by using the following commands:

# Enable the Extra Packages for Enterprise Linux (EPEL) repository
$ sudo yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

# On RHEL 7, enable the Optional repository
$ sudo subscription-manager repos --enable "rhel-*-optional-rpms"

# If running RHEL 7 in a public cloud, such as Amazon EC2, enable the
# Optional repository from Red Hat Update Infrastructure (RHUI) instead
$ sudo yum install yum-utils
$ sudo yum-config-manager --enable "rhel-*-optional-rpms"

Specify R Version

Define the version of R that you want to install.

This environment variable will be used in all subsequent steps.

You can install a different version of R by specifying the desired version in this step.

$ export R_VERSION=4.0.5

Download and Install R

Download and install the desired version of R using the below command:

$ curl -O https://cdn.rstudio.com/r/centos-7/pkgs/R-${R_VERSION}-1-1.x86_64.rpm
$ sudo yum install R-${R_VERSION}-1-1.x86_64.rpm

Verify R Installation

To make sure that R is installed successfully, run the below command:

$ /opt/R/${R_VERSION}/bin/R --version

To ensure that R is available on the default system PATH variable, create symbolic links to the version of R that you installed using the below command:

$ sudo ln -s /opt/R/${R_VERSION}/bin/R /usr/local/bin/R
$ sudo ln -s /opt/R/${R_VERSION}/bin/Rscript /usr/local/bin/Rscript

RStudio IDE Setup

Installation location: /opt/R/<R_VERSION>/bin/R/

Download the RStudio Server package using the below command:

wget https://download2.rstudio.org/server/centos7/x86_64/rstudio-server-rhel-1.4.1717-x86_64.rpm

Install RStudio Server using the below command:

sudo yum install rstudio-server-rhel-1.4.1717-x86_64.rpm

Verify whether the RStudio service is running, start it if it is not already started, and enable it using the below commands:

sudo systemctl status rstudio-server.service
sudo systemctl start rstudio-server.service
sudo systemctl enable rstudio-server.service

Verify that the IDE opens in the browser at the below URL:

http://<host_IP>:8787

Code Server and VS Code Installation

Follow the steps given below to install VS Code:

  1. Log in as the root user and navigate to the root directory ( / ).

  2. Create directory for VS Code using the below command:

    mkdir code-server
    
  3. Give full access to the directory created using the below command:

    chmod -R 777 code-server
    
  4. Download code-server from the below URL:

    wget https://github.com/cdr/code-server/releases/download/v3.11.0/code-server-3.11.0-linux-amd64.tar.gz
    
  5. Extract the downloaded archive into /code-server using the below command, which creates /code-server/code-server-3.11.0-linux-amd64:

    tar -xvzf code-server-3.11.0-linux-amd64.tar.gz -C /code-server
    

JupyterHub With IDEs

There are certain prerequisites for Gathr supported IDEs before starting them behind JupyterHub.

Prerequisites

The required IDEs must be installed. For more details, see IDEs Installation.

If you have root permission, perform the below steps.

To create users, run the below commands:

root> adduser sax
root> passwd sax (Password = sax)

Edit the /etc/sudoers file and add the below properties:

root> vi /etc/sudoers
sax ALL=(ALL) NOPASSWD: JHPROCESSES
Cmnd_Alias JHPROCESSES = /usr/local/bin/jupyterhub, /usr/local/bin/python3.8

If you do not have root permission, then follow the below steps:

  1. Create an OS user group (referred to here as <IDE_GROUP>). The users that are going to utilize the IDE services must be a part of this group.

  2. Edit /etc/sudoers as the root user and add the below properties for the respective IDEs:

    For VSCode:

    Runas_Alias JUPYTER_USERS = <IDE_USER>
    Cmnd_Alias JUPYTER_VSCODE_CMD = <IDE_INSTALLATION_PATH>/code-server-3.11.0-linux-amd64/bin/code-server
    <IDE_USER> ALL=(%<IDE_GROUP>) NOPASSWD:JUPYTER_VSCODE_CMD
    <IDE_USER> ALL=(%<IDE_GROUP>) NOPASSWD:/usr/local/bin/sax-vscode-spawner
    <IDE_USER> ALL=(%<IDE_GROUP>) NOPASSWD:/usr/local/bin/jupyterhub
    <IDE_USER> ALL=(%<IDE_GROUP>) NOPASSWD:/usr/local/bin/python3.8
    

    For Jupyter:

    Runas_Alias JUPYTER_USERS = <IDE_USER>
    Cmnd_Alias SUDO_SPAWNER_CMD = /usr/local/bin/sudospawner
    <IDE_USER> ALL=(%<IDE_GROUP>) NOPASSWD:SUDO_SPAWNER_CMD
    <IDE_USER> ALL=(%<IDE_GROUP>) NOPASSWD:/bin/kill
    <IDE_USER> ALL=(%<IDE_GROUP>) NOPASSWD:/usr/local/bin/jupyterhub
    <IDE_USER> ALL=(%<IDE_GROUP>) NOPASSWD:/usr/local/bin/python3.8
    

    Next, create a backup of remote_user_auth.py at the below location (if the file is not available, skip this step):

    <HOST>: <python installation path>/lib/python3.x/site-packages/jhub_remote_user_authenticator
    

    Use the below commands:

    mv remote_user_auth.py remote_user_auth_old.py
    
    python3 -m pip install jhub_remote_user_authenticator
    

    Copy the file remote_user_auth.py and paste it at the location:

    <HOST>: /usr/local/lib/python3.x/site-packages/jhub_remote_user_authenticator
    

    Create a new directory using the below command:

    mkdir /<IDE_INSTALLATION_PATH>/virtual_env
    

    Update permissions using the below command:

    chmod 777 -R /<IDE_INSTALLATION_PATH>/virtual_env
    

    Copy vscodespawner from /<Gathr_installation_dir>/upgrade/Gathr/conf/non-container/jupyterhub/ to /<IDE_INSTALLATION_PATH>/.

    Run the following command from the above folder:

    python3 setup.py install
    

    Install Python virtual environment using below command:

    pip3 install virtualenv
    

    Create the folder /<IDE_INSTALLATION_PATH>/jupyterhub_config_file_vscode [sax@<HOST>, <HOST1>, <HOST2>, and so on] using the below command:

    mkdir /<IDE_INSTALLATION_PATH>/jupyterhub_config_file_vscode
    

    Update permissions using the below command:

    chmod 777 -R /<IDE_INSTALLATION_PATH>/jupyterhub_config_file_vscode
    

    Change the directory using the below commands:

    cd /<IDE_INSTALLATION_PATH>/jupyterhub_config_file_vscode
    cd /<IDE_INSTALLATION_PATH>/
    

    Update permissions using the below command:

    chmod 777 * .
    

    As the root user, run the below command in /<IDE_INSTALLATION_PATH>/jupyterhub_config_file_vscode to generate the config file jupyterhub_config.py:

    jupyterhub --generate-config
    

    Update the below properties in /<IDE_INSTALLATION_PATH>/jupyterhub_config_file_vscode/jupyterhub_config.py:

    c.JupyterHub.authenticator_class = 'jhub_remote_user_authenticator.remote_user_auth.RemoteUserAuthenticator'
    
    c.JupyterHub.spawner_class = 'vscodespawner.SaxVSCodeProcessSpawner'
    c.Spawner.cmd = ['<CODE_SERVER_INSTALLATION_LOCATION>/bin/code-server']
    c.JupyterHub.hub_bind_url = 'http://HOST:7071'
    c.JupyterHub.bind_url = 'http://HOST:7000'
    c.ConfigurableHTTPProxy.command = ["configurable-http-proxy","--base-prefix", "/"]
    c.ConfigurableHTTPProxy.api_url = "http://HOST:7001"
    c.JupyterHub.statsd_port = 7125
    
    Note: If JupyterHub is running behind HA Proxy, then append Hub base URL as below:
    
    c.JupyterHub.bind_url = 'http://HOST:7000/<unique placeholder for each machine; example- VSC1>'
    c.ConfigurableHTTPProxy.command = ["configurable-http-proxy","--base-prefix", "VSC1"]
    

    For a non-root user, change the spawner_class as mentioned below and keep the remaining properties as they are:

    c.JupyterHub.spawner_class = 'vscodespawner.vscode_sudo_spawner.SaxVSCodeSudoProcessSpawner'
    

    Create a folder /<IDE_INSTALLATION_PATH>/jupyterhub_config_file_lab [sax@<HOST>] using the below command:

    mkdir /<IDE_INSTALLATION_PATH>/jupyterhub_config_file_lab
    

    Update permissions using the below command:

    chmod 755 -R /<IDE_INSTALLATION_PATH>/jupyterhub_config_file_lab
    

    Change the directory using the below command:

    cd /<IDE_INSTALLATION_PATH>/
    

    Update permissions using the below command:

    chmod 755 * .
    

    As the root user, run the below command in /<IDE_INSTALLATION_PATH>/jupyterhub_config_file_lab to generate the config file jupyterhub_config.py:

    jupyterhub --generate-config
    

    Update below properties in /<IDE_INSTALLATION_PATH>/jupyterhub_config_file_lab/jupyterhub_config.py

    c.JupyterHub.authenticator_class = 'jhub_remote_user_authenticator.remote_user_auth.RemoteUserAuthenticator'
    c.Spawner.default_url = '/lab'
    c.Spawner.cmd = ['/usr/local/bin/jupyterhub-singleuser']
    c.LocalProcessSpawner.shell_cmd = ['bash', '-l', '-c']
    c.Spawner.args = ['--NotebookApp.allow_origin=*', '--NotebookApp.tornado_settings={"headers":{"Content-Security-Policy":"frame-ancestors * self "}}']
    c.JupyterHub.tornado_settings = {"headers":{"Access-Control-Allow-Origin":"*", "Content-Security-Policy":"frame-ancestors * none "}}
    c.JupyterHub.hub_bind_url = 'http://HOST:7571'
    c.JupyterHub.bind_url = 'http://HOST:7500'
    c.ConfigurableHTTPProxy.api_url = "http://HOST:7501"
    c.JupyterHub.statsd_port = 7525
    
    Note: If JupyterHub is running behind HA Proxy, then append Hub base URL as below:
    
    c.JupyterHub.bind_url = 'http://HOST:7000/<unique placeholder for each machine; example- jupyter>'
    

    For a non-root user, change the spawner_class and add sudospawner_path as mentioned below, and keep the remaining properties as they are:

    c.JupyterHub.spawner_class = 'sudospawner.SudoSpawner'
    c.SudoSpawner.sudospawner_path = '/usr/local/bin/sudospawner'
    

    Copy the loadBalancer.sh file from the below location:

    <HOST>: <gathr installation path>/conf/non-container/customLoadBalancer/loadBalancer.sh
    

    Paste it to:

    /home/ of IDE machine

    Update permissions using the below command:

    chmod 755 /home/< user with SSH permission to all the hosts >/loadBalancer.sh
    

    Create the folder /home/<IDE_USER>/.gathr, create the sax.config file inside it, and add the below content to the sax.config file:

    IF GATHR IS NON SSL :
    [DEFAULT]
    SAX_URL = <sax_url>
    SAX_DATA_SOURCE_GATEWAY = Gathr/notebook/sourceDetail
    
    IF GATHR IS SSL Enabled:
    [DEFAULT]
    SAX_URL = https://<HAPROXY>
    SAX_DATA_SOURCE_GATEWAY = Gathr/notebook/sourceDetail
    SAX_SSL_ENABLE = TRUE
    SSL_CERTIFICATE_PATH = <Certificates_path>/<certificate_name>.pem
    

    Now go to the /<IDE_INSTALLATION_PATH>/gathr_script folder and run the below commands:

    python3 setup.py build
    python3 setup.py install
    

JupyterHub with Autonotebook

nohup <PYTHON_INSTALL_PATH>/bin/<PYTHON3.8> $<IDE_INSTALLATION_LOCATION_VALUE>/autonotebook/auto_create_notebook.py &

JupyterHub with VSCode

Start JupyterHub for VS Code as the <IDE_USER> user from the location <IDE_INSTALLATION_PATH>/jupyterhub_config_file_vscode using the below command:

nohup sudo /usr/local/bin/python3.8 /usr/local/bin/jupyterhub -f jupyterhub_config.py &

JupyterHub with Jupyter Lab

Start JupyterHub for JupyterLab as the <IDE_USER> user from the location <IDE_INSTALLATION_PATH>/jupyterhub_config_file_lab using the below command:

nohup sudo /usr/local/bin/python3.8 /usr/local/bin/jupyterhub -f jupyterhub_config.py &

SSL in JupyterLab

If SSL is required in Jupyter Lab then follow the below steps:

  1. Create .keystore file, if not already done using the below command:

    keytool -genkeypair -keystore $(hostname -f).keystore -keyalg RSA -alias $(hostname -f) -dname "CN=$(hostname -f),OU=Impetus,O=Impetus,L=Indore,ST=MP,C=IN" -storepass Impetus1! -keypass Impetus1! -ext san=dns:$(hostname -f),dns:localhost,ip:$(hostname -i)
    
  2. Create keystore.jks file, if not already done.

  3. Create .pem file, if not already done.

    keytool -export -alias $(hostname -f) -keystore $(hostname -f).keystore -rfc -file $(hostname -f).pem
    
  4. Create keystore.p12 file using the below command:

    keytool -importkeystore -srckeystore keystore.jks -destkeystore keystore.p12 -deststoretype PKCS12
    
  5. Create .key file using the below command:

    openssl pkcs12 -in keystore.p12 -nodes -nocerts -out $(hostname -f).key
    
  6. Add the below entries in the jupyterhub_config.py files [for vscode and lab] and restart JupyterHub.

    c.JupyterHub.ssl_cert = '<path to pem file created in step 3 above>'
    c.JupyterHub.ssl_key = '<path to key file created in step 5 above>'
    
  7. Import pem files to Gathr machine’s jvm, if not already done.

  8. In Gathr, update the below properties:

    Configurations > Default > Sandbox
        Sandbox Jupyterhub IDE SSL Enabled : true
        Sandbox RStudio IDE SSL Enabled : false

    Configurations > Others > Jupyter
        jupyter.url : <make it https>

    Configurations > Others > VSCode
        vscode.url : <make it https>
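
Before restarting JupyterHub with the new ssl_cert and ssl_key entries, it can help to confirm that the .pem and .key files load together. The following is a minimal sketch using Python's standard `ssl` module; the helper name `can_load_cert_chain` and the file paths are hypothetical, and the check says nothing about whether browsers or Gathr trust the certificate.

```python
import ssl


def can_load_cert_chain(cert_path, key_path):
    """Return True if the PEM certificate and private key load together."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        # Raises if a file is missing, unreadable, or malformed,
        # or if the key does not match the certificate.
        context.load_cert_chain(certfile=cert_path, keyfile=key_path)
    except (ssl.SSLError, OSError) as error:
        print(f"Cannot load certificate chain: {error}")
        return False
    return True


if __name__ == "__main__":
    # Hypothetical paths; substitute the files created in steps 3 and 5.
    print(can_load_cert_chain("/path/to/host.pem", "/path/to/host.key"))
```

The key written by the `openssl pkcs12 ... -nodes` command in step 5 is unencrypted, so the check does not prompt for a passphrase.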

Sparkmagic Installation

Remove the existing Spark kernels using the below command (as root):

root> jupyter kernelspec uninstall pysparkkernel sparkkernel sparkkernel pyspark3kernel

Repeat the same steps on each node where Jupyter is installed.

To install Sparkmagic, log in as the root user and run the following commands:

Reference: https://github.com/jupyter-incubator/sparkmagic

root> pip3 install sparkmagic==0.17.1
root> jupyter nbextension enable --py --sys-prefix widgetsnbextension
root> jupyter labextension install "@jupyter-widgets/jupyterlab-manager"
root> jupyter labextension enable "@jupyter-widgets/jupyterlab-manager"

For validating the location of Sparkmagic, run the following command:

root> pip3 show sparkmagic

Now, execute the below command:

root> cd <location of spark magic>

Then run the following commands to install the kernels. This activates the Scala (sparkkernel), PySpark (pysparkkernel), and SparkR (sparkrkernel) kernels in Sparkmagic for further use:

root> jupyter-kernelspec install sparkmagic/kernels/sparkkernel
root> jupyter-kernelspec install sparkmagic/kernels/pysparkkernel
root> jupyter-kernelspec install sparkmagic/kernels/sparkrkernel
root> jupyter serverextension enable --py sparkmagic

Configuration for Gathr User

Login with Gathr user and follow the below steps:

  1. Create directory ~/.sparkmagic if it does not exist.

  2. Create config.json file at path ~/.sparkmagic and add details as given in example_config.json.

  3. Provide the Livy URL under all kernels (for example, kernel_python_credentials) in config.json (the default is localhost).

    Configuration_for_Gathr_User_01

  4. Make sure that in the config.json file the auth variable is given as None.
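
Steps 3 and 4 can be applied to a parsed config.json in one pass. The following is a minimal sketch: the helper name `set_livy_url` and the Livy host are hypothetical, and the sample dictionary is a trimmed-down stand-in for example_config.json. It assumes, as in the example_config.json shipped with Sparkmagic, that the auth value is the string "None" rather than JSON null; confirm this against your copy of the file.

```python
import json


def set_livy_url(config, livy_url):
    """Return a copy with every kernel_*_credentials entry pointed at livy_url."""
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round trip
    for key, value in updated.items():
        if key.startswith("kernel_") and key.endswith("_credentials"):
            value["url"] = livy_url
            # Assumption: auth is the literal string "None", not JSON null.
            value["auth"] = "None"
    return updated


# Trimmed-down stand-in for example_config.json; the Livy host is hypothetical.
example = {
    "kernel_python_credentials": {
        "username": "", "password": "",
        "url": "http://localhost:8998", "auth": "None",
    },
    "kernel_scala_credentials": {
        "username": "", "password": "",
        "url": "http://localhost:8998", "auth": "None",
    },
    "livy_session_startup_timeout_seconds": 60,
}
config = set_livy_url(example, "http://livy-host.example.com:8998")
print(json.dumps(config, indent=2))
```

In practice, the dictionary would be loaded from ~/.sparkmagic/config.json with `json.load` and written back with `json.dump`.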

Adding Custom jar in Notebook Class Path

The configuration steps to add a custom jar in Notebook class path (Gathr User) are given as follows:

  1. Upload spark-notebook.jar, ojdbc8.jar, mysql-connector-java-8.0.13.jar and hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar to Hadoop file system of the cluster.

  2. Provide uploaded jar location in file ~/.sparkmagic/config.json under the below properties:

    • jars

    • conf > spark.driver.extraClassPath

    • conf > spark.executor.extraClassPath

    For CDP: hdfs://cdpservice/

    For Apache: hdfs://nameservice/

    For CDH: hdfs://cdh630service/

  3. Provide ZooKeeper entries in file ~/.sparkmagic/config.json under the following properties:

    • spark.executor.extraJavaOptions

    • spark.driver.extraJavaOptions

    These Zookeeper entries are present at the following location:

    <<gathr_Installation_Dir>>/conf/config.properties

    Adding_Custom_jar_in_Notebook_Class_Path

Certificate Configuration

Follow the steps given below for certificate configuration and to connect to the Python and Anaconda repositories:

  1. Get Root CA Bundle.

    You may find it at the location: /etc/pki/tls/cert.pem, or /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem

  2. If /etc/pki/tls/cert.pem file is available, create a copy of the file and append the below certificate content at the end of the copied .pem file, then proceed to Step 3.

    echo -n | openssl s_client -connect <namenode_host>:<namenode_port> | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > impetus-<namenode_host>.pem
    
    keytool -genkeypair -keystore wn0-saxkaf.keystore -keyalg RSA -alias cmhost -dname "CN=$(hostname -f),OU=Impetus,O=Impetus,L=Indore,ST=MP,C=IN" -storepass Impetus1! -keypass Impetus1!
    
    keytool -keystore /opt/cloudera/security/jks/truststore.jks -importcert -alias $(hostname -f) -file $(hostname -f).pem -storepass impetus
    

    For generating certificate for Livy, use the below command:

    echo -n | openssl s_client -connect <livy_host>:<livy_port> | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > impetus-<livy_host>.pem
    
  3. Merge the generated certificates (cert.pem, namenode_host.crt and livy.crt) into a single file and place this file in /home/<Gathr User>.

  4. In the .bashrc file, export the REQUESTS_CA_BUNDLE variable and point it to the combined certificate .pem file.

    Make sure that the generated certificates have 666 permissions using the below command:

    chmod 666 <cert_name>
    
  5. Restart the Jupyter Notebook service.
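
The merge in step 3 is a plain concatenation of PEM files. The following is a minimal sketch of that step; the helper name `merge_pem_files` and the file names are hypothetical, and the demonstration uses placeholder files rather than real certificates.

```python
import os
import tempfile

BEGIN_MARKER = "-----BEGIN CERTIFICATE-----"


def merge_pem_files(source_paths, bundle_path):
    """Concatenate PEM files into bundle_path and return the certificate count."""
    count = 0
    with open(bundle_path, "w") as bundle:
        for path in source_paths:
            with open(path) as source:
                text = source.read().strip()
            if BEGIN_MARKER not in text:
                raise ValueError(f"{path} does not contain a PEM certificate")
            count += text.count(BEGIN_MARKER)
            bundle.write(text + "\n")
    return count


# Demonstration with placeholder files; real inputs would be the copy of the
# root CA bundle and the NameNode and Livy certificates from step 2.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name in ("cert.pem", "namenode.pem", "livy.pem"):
        path = os.path.join(tmp, name)
        with open(path, "w") as handle:
            handle.write(f"{BEGIN_MARKER}\n{name}\n-----END CERTIFICATE-----\n")
        paths.append(path)
    bundle_path = os.path.join(tmp, "combined.pem")
    total = merge_pem_files(paths, bundle_path)
    print(total)
```

The resulting file is the one that REQUESTS_CA_BUNDLE in step 4 should point to; the Python requests library reads that variable to locate its CA bundle.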

WebHDFS SSL Enabled

If WebHDFS is SSL enabled, then follow the below steps:

Create a server-side certificate for the active name node for the box where Jupyter is running.

If Hadoop is in HA mode, then create two certificates (one for each name node) and combine them into a single certificate.

Keep it at a location (on the same box) that is accessible by the Jupyter service.

In the .bashrc file, add the certificate’s entry, for example: export REQUESTS_CA_BUNDLE=/home/sax/hdfsssl7815.crt

Restart the Jupyter Notebook service.

Post Installation

After installation is complete, make sure that the following services are running:

  • Jupyter notebook on port 8888

  • Auto create notebook service on port 5000
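
Whether the two services accept connections can be checked with a short TCP probe. The following is a minimal sketch using Python's standard `socket` module; the helper name `is_port_open` is hypothetical, and `localhost` assumes the check runs on the node where Jupyter is installed. A successful connection shows only that something is listening on the port, not that it is the expected service.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Ports from the list above.
    services = (("Jupyter notebook", 8888), ("Auto create notebook service", 5000))
    for name, port in services:
        state = "listening" if is_port_open("localhost", port) else "not reachable"
        print(f"{name} (port {port}): {state}")
```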

Troubleshooting

Pyspark/Scala Error

  1. If the following error occurs while opening pyspark or scala notebook:

    The code failed because of a fatal error:

    Failed to register auto viz for notebook.

    Then, first check pandas version using the below command:

    pip show pandas
    

    If it is 0.23, then downgrade it to version 0.22 using the below commands:

    root> pip uninstall pandas
    root> pip install pandas==0.22
    

    Now, open the config.json file at path ~/.sparkmagic. Search for the entry use_auto_viz and change its value to false.

  2. If the notebook takes time to create a Spark session in pyspark and scala notebooks and the session is not up in 60 seconds, then open the config.json file at path ~/.sparkmagic. Search for the entry livy_session_startup_timeout_seconds and increase the number of seconds (for example, to 120).

  3. Also, make sure that the configurations given in the config.json file at path ~/.sparkmagic are syntactically correct.

    Otherwise, the Sparkmagic library will fail to parse this JSON, and the pyspark and scala notebooks will not be usable.

  4. If there is any version mismatch for Pyspark/Spark, then place the Pyspark libs in /pysparklibs.

    Pyspark_Libs
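
The config.json checks from items 1 to 3 above can be automated. The following is a minimal sketch: the helper name `check_sparkmagic_config` and the 60-second threshold are assumptions for illustration, and the use_auto_viz check is relevant only when the auto viz error from item 1 has occurred.

```python
import json


def check_sparkmagic_config(text, min_timeout=60):
    """Return a list of problems found in the text of a config.json file."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as error:
        return [f"invalid JSON at line {error.lineno}, column {error.colno}: {error.msg}"]
    if not isinstance(config, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    if config.get("use_auto_viz", True) is not False:
        problems.append("use_auto_viz is not set to false")
    timeout = config.get("livy_session_startup_timeout_seconds")
    if not isinstance(timeout, int) or timeout < min_timeout:
        problems.append("livy_session_startup_timeout_seconds is missing or too low")
    return problems


good = '{"use_auto_viz": false, "livy_session_startup_timeout_seconds": 120}'
bad = '{"use_auto_viz": false, "livy_session_startup_timeout_seconds": 120,}'
print(check_sparkmagic_config(good))
print(check_sparkmagic_config(bad))
```

The second sample has a trailing comma, a common hand-editing mistake that makes the whole file unparseable. To check a real file, pass the text read from ~/.sparkmagic/config.json.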
