Python Environment

The following setup is required for the Python Environment.

  • The setup should be done on a machine where Jupyter Notebook is configured.

  • The auto_create_notebook and Jupyter services should be up and running.

  • The Python environment is created on the machine where Jupyter is running.

  • Python Executable (PEX) should be installed on the machine where Gathr services are running.

Prerequisites

  1. Python 3.8 is required as the default version.

  2. Install Python 3.8.8 (if providing support for python3 in Python Environment). See the topic Python Configuration.

  3. Install Anaconda (if providing support for creating Python Environment with Anaconda).

  4. PEX should be installed.

    To install PEX, run the following command as a sudo user:

    • For pip3 (Python 3)
    sudo pip3 install pex
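After installing, it can help to confirm that the pex command is actually reachable from the shell that Gathr services use. The following is a minimal sketch; the function name check_pex is illustrative and not part of Gathr or PEX.

```shell
# Report whether the pex command is available on PATH.
# check_pex is a hypothetical helper name used only for this sketch.
check_pex() {
    if command -v pex >/dev/null 2>&1; then
        echo "pex found: $(command -v pex)"
    else
        echo "pex not found on PATH" >&2
        return 1
    fi
}

check_pex || echo "Install it with: pip3 install pex"
```

If the check fails even though the install succeeded, the pip3 scripts directory is probably not on PATH for that user.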
    

Packages

The following Python packages (and their dependencies) are required to create a Python environment:

  • virtualenv

  • ipykernel

  • Gathr (location /<sax_installation_folder>/conf/jupyter/python/gathr_script)

Download these packages (and their dependencies), compile them, and place them at a location that is accessible to Jupyter.
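One possible way to fetch virtualenv and ipykernel together with their dependencies is pip3 download. The sketch below is an assumption about workflow, not a documented Gathr procedure: the target directory /home/user/python_packages and the one-folder-per-package layout are modeled on the sample paths used in the templates, and the PIP variable exists only so the function can be dry-run. Only virtualenv and ipykernel are fetched here; the Gathr package comes from the location listed above.

```shell
# Assumed target directory (mirrors the sample template paths); override as needed.
PKG_DIR="${PKG_DIR:-/home/user/python_packages}"
# pip executable to use; overridable so the function can be dry-run with PIP=echo.
PIP="${PIP:-pip3}"

# download_packages is a hypothetical helper name used only for this sketch.
download_packages() {
    for pkg in virtualenv ipykernel; do
        mkdir -p "$PKG_DIR/$pkg" || return 1
        # "pip download" fetches the package and its dependencies without installing them
        "$PIP" download --dest "$PKG_DIR/$pkg" "$pkg" || return 1
    done
}
```

Run download_packages on a machine with access to the package index, then make sure the resulting folders are readable by the user that runs Jupyter.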

Templates

  1. Create a JSON file (as shown below) for each Python version.

  2. Create a JSON file (as shown below) to provide support for creating a Python Environment with Anaconda.

  3. Place the JSON files at the location: /<sax_installation_folder>/conf/common/templates/virtualEnvironments (if sample JSON files are already available at this location, the user can remove them).

    Python Template

    {
      "name": "Python 2.7",
      "version": "2.7",
      "path": "/usr/bin/python2.7",
      "type": "Python",
      "packages": [
        "/home/user/python_packages/ipykernel",
        "/home/user/python_packages/virtualenv",
        "/home/user/python_packages/gathr"
      ]
    }

    Conda Template

    {
      "name": "Conda 3.6",
      "version": "3.6",
      "path": "conda",
      "type": "Conda",
      "packages": [
        "/home/user/python_packages/ipykernel",
        "/home/user/python_packages/virtualenv",
        "/home/user/python_packages/gathr"
      ]
    }

    The table below describes the parameters with sample values:

    Property  Description                                                   Example
    name      Logical name to identify the environment.                     Anaconda2.7/Python2.7
    version   Python version (2.7/3.8).                                     2.7 or 3.8
    path      Path or command for the Python/Conda executable(s).           conda or /usr/bin/python2.7
    type      Type of environment.                                          Python or Conda
    packages  Location of the packages required to create the environment.  /home/user/python_packages/ipykernel
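Because Python 3.8 is the default version, a template for it may also be needed. The sketch below writes one and checks that it parses as JSON. The interpreter path /usr/bin/python3.8, the package paths, and both function names are assumptions for illustration; adjust them to match the machine where Jupyter runs.

```shell
# Write a sample Python 3.8 template to the file given as the first argument.
# The interpreter path and package locations are assumed values.
write_python38_template() {
    cat > "$1" <<'EOF'
{
  "name": "Python 3.8",
  "version": "3.8",
  "path": "/usr/bin/python3.8",
  "type": "Python",
  "packages": [
    "/home/user/python_packages/ipykernel",
    "/home/user/python_packages/virtualenv",
    "/home/user/python_packages/gathr"
  ]
}
EOF
}

# Succeeds only if the file parses as JSON; json.tool rejects
# curly quotes and trailing commas, which are easy to paste in by accident.
validate_template() {
    python3 -m json.tool "$1" >/dev/null
}
```

For example, write_python38_template python38.json && validate_template python38.json, then copy the file to the virtualEnvironments folder named in step 3.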

Jupyter Location

Log in to Gathr using Superuser credentials and go to Configuration from the left navigation pane.

Select the Others tile and click the Jupyter tab. Provide the file system location where the environment will be created in the configuration property: jupyter.virtual.environment.default.dir

[Figure: Python environment setup]

Python Environments Example

Examples of the supported Python environment types:

  1. Python 3.8 environment

  2. Conda environment with Python 3.8

    [Figure: Python environment setup example]

Anaconda Installation

To install Anaconda on Linux, refer to Installing Anaconda on Linux.

Notes:

  • Recommended Anaconda version: 4.8.3

  • Perform the installation as the root user (if you do not want to install Anaconda separately for each user).

  • Provide an installation folder that is accessible to all users (e.g., /anaconda3) and grant full permissions on this folder.

  • Select “No” for conda init (see Step 7 of the installation sequence in the Installing Anaconda on Linux reference).

  • Once the installation is done, go to <anaconda installed folder>/bin and run the command conda init. It adds entries to the .bashrc file. Repeat this step for each user for whom you need to initialize conda.

  • If conda activates the base environment automatically after login, run the command conda config --set auto_activate_base false.
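The last two notes can be combined into a small per-user routine. This is a sketch under the assumption that Anaconda was installed to /anaconda3 as in the example above; ANACONDA_HOME and the function name are illustrative, not settings that conda or Gathr define.

```shell
# Assumed install location from the notes above; adjust to your installation folder.
ANACONDA_HOME="${ANACONDA_HOME:-/anaconda3}"

# Run once as each user that needs conda initialized.
# init_conda_for_current_user is a hypothetical helper name.
init_conda_for_current_user() {
    # Adds the conda shell setup to the current user's .bashrc
    "$ANACONDA_HOME/bin/conda" init || return 1
    # Stops conda from activating the base environment at every login
    "$ANACONDA_HOME/bin/conda" config --set auto_activate_base false
}
```

The change takes effect in new login shells, so log out and back in (or source .bashrc) before testing.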

Certificate Configuration

The user needs the certificate configuration to connect to the Python and Anaconda repositories.

Follow the steps given below to connect to the Python and Anaconda repositories:

  1. Get Root CA Bundle. You can find it at /etc/pki/tls/cert.pem.

  2. If the /etc/pki/tls/cert.pem file is available, create a copy of the file, append the certificate content produced by the command below to the end of the copied .pem file, and then proceed to Step 3.

    echo -n | openssl s_client -connect <namenode_host>:<namenode_port> | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > impetus-<namenode_host>.pem

    • If the file is not available, follow the steps below to generate self-signed certificates using the openssl CLI on the respective NameNode host.

    To generate .keystore file:

    keytool -genkeypair -keystore wn0-saxkaf.keystore -keyalg RSA -alias cmhost -dname "CN=$(hostname -f),OU=Impetus,O=Impetus,L=Indore,ST=MP,C=IN" -storepass Impetus1! -keypass Impetus1!
    

    To generate .pem file:

    keytool -keystore /opt/cloudera/security/jks/truststore.jks -importcert -alias $(hostname -f) -file $(hostname -f).pem -storepass impetus
    
  3. Add this .pem certificate to the pip and conda configurations using the following commands (run the commands as a Gathr user):

    pip config set global.cert /path/to/cert/file
    conda config --set ssl_verify /path/to/cert/file

  4. Export the generated certificate by setting the REQUESTS_CA_BUNDLE variable in the .bashrc file to point to the combined certificate .pem file.

  5. Restart the Jupyter Notebook service.
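The copy-and-append in Step 2 and the configuration in Steps 3 and 4 can be sketched as two small functions. The function names and argument order are illustrative assumptions; the pip and conda commands are the ones given in Step 3, and /path/to/cert/file remains a placeholder for the combined bundle.

```shell
# Build a combined CA bundle: a copy of the system bundle plus one extra certificate.
# Arguments: <system bundle> <extra .pem> <output bundle>
# The original system bundle is left unmodified.
build_ca_bundle() {
    cp "$1" "$3" && cat "$2" >> "$3"
}

# Point pip, conda, and REQUESTS_CA_BUNDLE at the combined bundle.
# Argument: <combined bundle>. Run as the Gathr user.
configure_cert() {
    pip config set global.cert "$1" || return 1
    conda config --set ssl_verify "$1" || return 1
    # Persist the variable for future shells
    echo "export REQUESTS_CA_BUNDLE=$1" >> "$HOME/.bashrc"
}
```

A typical call would be build_ca_bundle /etc/pki/tls/cert.pem impetus-<namenode_host>.pem /path/to/cert/file followed by configure_cert /path/to/cert/file, and then the Jupyter Notebook restart from Step 5.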

Install Network File System (NFS) Server

Install the following package for the NFS server and client using the yum command.

yum install -y nfs-utils

After installing it, you can verify the nfs-utils version and installation status.

[Figure: nfs-utils version and installation status]
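On yum-based systems, one way to check the installed version is to query the RPM database. This is a minimal sketch; check_nfs_utils is a hypothetical helper name, and the exact version string printed depends on your distribution.

```shell
# "rpm -q" prints the installed package name and version,
# or exits non-zero if the package is not installed.
check_nfs_utils() {
    if rpm -q nfs-utils; then
        echo "nfs-utils is installed"
    else
        echo "nfs-utils is not installed" >&2
        return 1
    fi
}
```

Call check_nfs_utils on both the NFS server node and any client node where the package was installed.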

Next, create the directory structure "/home/sax/repo460/" on the NFS server node.

Once the packages are installed, enable and start NFS services.

systemctl start nfs-server
systemctl enable nfs-server

Verify that /etc/exports was created.

Add the following path to /etc/exports:

[Figure: /etc/exports paths]
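The exact entry depends on your deployment. As an illustration of the general format only, an exports line consists of a directory followed by a client specification with options in parentheses. The sketch below appends such a line without creating duplicates; the helper name add_export, the client wildcard *, and the options rw,sync are assumptions, not values taken from this guide.

```shell
# Append an export entry to an exports file unless an identical line already exists.
# Arguments: <exports file> <entry>
add_export() {
    grep -qxF -- "$2" "$1" 2>/dev/null || echo "$2" >> "$1"
}
```

For example, run as root: add_export /etc/exports "/home/sax/repo460 *(rw,sync)". Restrict the client specification to known hosts or subnets where possible instead of using the wildcard.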

Create the /python_libraries directory and place the NFS packages for python2 there.

[Figure: NFS packages]

Now restart the NFS server.

systemctl restart nfs-server.service