Python Environment
The following setup is required to use Python environments.
Perform the setup on the machine where Jupyter Notebook is configured.
The auto_create_notebook and Jupyter services must be up and running.
The Python environment is created on the machine where Jupyter is running.
Python Executable (PEX) must be installed where the Gathr services are running.
Prerequisites
Python 3.8 is required by default.
Install Python 3.8.8 (if providing support for Python 3 in Python Environment; see the topic Python Configuration).
Install Anaconda (if providing support for creating Python Environment with Anaconda).
PEX should be installed.
To install PEX, run the following commands as a sudo user:
- For pip3 (Python 3)
sudo> pip3 install pex
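As a quick sanity check after the installation, a small Python sketch such as the one below can report whether a pex executable is visible on the PATH. The helper name pex_available is illustrative and is not part of Gathr or PEX; it relies only on the Python standard library.

```python
import shutil


def pex_available(command: str = "pex") -> bool:
    """Return True if the given command is found on the current PATH."""
    return shutil.which(command) is not None


if __name__ == "__main__":
    # Run this as the user that runs the Gathr services, since PATH can differ per user.
    print("pex found" if pex_available() else "pex not found on PATH")
```

Run the check as the same user that runs the Gathr services, because the PATH of a sudo user and of a service user can differ.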
Packages
The following Python packages (and their dependencies) are required to create a Python environment.
virtualenv
ipykernel
Gathr (location /<sax_installation_folder>/conf/jupyter/python/gathr_script)
Download these packages (and their dependencies), compile them, and place them at a location that is accessible to Jupyter.
Templates
Create a JSON file (as shown below) for each Python version.
To support creating Python environments with Anaconda, create a JSON file (as shown below) for Conda as well.
Place the JSON files at /<sax_installation_folder>/conf/common/templates/virtualEnvironments. If sample JSON files are already present at this location, you can remove them.
Python Template
{
  "name": "Python 2.7",
  "version": "2.7",
  "path": "/usr/bin/python2.7",
  "type": "Python",
  "packages": [
    "/home/user/python_packages/ipykernel",
    "/home/user/python_packages/virtualenv",
    "/home/user/python_packages/gathr"
  ]
}
Conda Template
{
  "name": "Conda 3.6",
  "version": "3.6",
  "path": "conda",
  "type": "Conda",
  "packages": [
    "/home/user/python_packages/ipykernel",
    "/home/user/python_packages/virtualenv",
    "/home/user/python_packages/gathr"
  ]
}
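Writing the template with a JSON library rather than by hand avoids quoting mistakes (for example, typographic quotes pasted from a document, which are not valid JSON). The following is a minimal sketch under stated assumptions: the function names build_template and write_template, the file name python38.json, and the interpreter path /usr/bin/python3.8 are illustrative and do not come from Gathr.

```python
import json
from pathlib import Path


def build_template(name, version, path, env_type, packages):
    """Return a template dictionary with the five properties used in the samples."""
    return {
        "name": name,
        "version": version,
        "path": path,
        "type": env_type,
        "packages": list(packages),
    }


def write_template(template, directory, filename):
    """Write the template as JSON into the given directory and return the file path."""
    target = Path(directory) / filename
    target.write_text(json.dumps(template, indent=2), encoding="utf-8")
    return target


if __name__ == "__main__":
    packages = [
        "/home/user/python_packages/ipykernel",
        "/home/user/python_packages/virtualenv",
        "/home/user/python_packages/gathr",
    ]
    # Assumed values: adjust the name, version, and interpreter path to your machine.
    template = build_template("Python 3.8", "3.8", "/usr/bin/python3.8", "Python", packages)
    print(json.dumps(template, indent=2))
    # To save the file, call write_template(template, <templates folder>, "python38.json").
```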
The table below describes the parameters with sample values:
Property | Description | Example
name | Logical name to identify the environment. | Anaconda2.7/Python2.7
version | Python version (2.7/3.8). | 2.7 or 3.8
path | Path or command for the Python/Conda executable. | conda or /usr/bin/python2.7
type | Type of environment. | Python or Conda
packages | Locations of the packages required to create the environment. | /home/user/python_packages/ipykernel
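Before placing a template file in the templates folder, it can help to check it against the properties listed above. The sketch below is an assumption-laden illustration: the checks are derived only from the property descriptions in this section, validate_template is a hypothetical helper, and Gathr's own validation rules may differ.

```python
REQUIRED_KEYS = ("name", "version", "path", "type", "packages")
ALLOWED_TYPES = ("Python", "Conda")  # the two types named in this section


def validate_template(template: dict) -> list:
    """Return a list of human-readable problems; an empty list means no problem was found."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in template:
            problems.append(f"missing property: {key}")
    env_type = template.get("type")
    if env_type is not None and env_type not in ALLOWED_TYPES:
        problems.append(f"unexpected type: {env_type}")
    packages = template.get("packages")
    if packages is not None:
        is_list_of_strings = isinstance(packages, list) and all(
            isinstance(item, str) for item in packages
        )
        if not is_list_of_strings:
            problems.append("packages must be a list of path strings")
    return problems


if __name__ == "__main__":
    sample = {
        "name": "Conda 3.6",
        "version": "3.6",
        "path": "conda",
        "type": "Conda",
        "packages": ["/home/user/python_packages/ipykernel"],
    }
    print(validate_template(sample) or "no problems found")
```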
Jupyter Location
Log in to Gathr using Superuser credentials and go to Configuration from the left navigation pane.
Select the Others tile and click the Jupyter tab. In the configuration property jupyter.virtual.environment.default.dir, provide the file system location where environments will be created.
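Since environments are created under the configured location, the directory presumably needs to exist and be writable by the user running Jupyter. The sketch below checks both conditions; check_environment_dir is a hypothetical helper, and the temporary directory in the demo is only a stand-in for the value of jupyter.virtual.environment.default.dir.

```python
import os
import tempfile


def check_environment_dir(path: str) -> list:
    """Return a list of problems found with the directory; an empty list means it looks usable."""
    problems = []
    if not os.path.isdir(path):
        problems.append("directory does not exist")
    elif not os.access(path, os.W_OK | os.X_OK):
        problems.append("directory is not writable by the current user")
    return problems


if __name__ == "__main__":
    # Stand-in path for illustration; replace it with the configured location.
    candidate = tempfile.gettempdir()
    print(candidate, check_environment_dir(candidate) or "looks usable")
```

Run the check as the user that runs the Jupyter service, since permissions are evaluated for the current user.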
Python Environments Example
Examples of supported Python environment types:
Python 3.8 environment
Conda environment with Python 3.8
Anaconda Installation
To install Anaconda on Linux, refer to the link, Installing Anaconda on Linux.
Notes:
Recommended Anaconda version: 4.8.3
Perform the installation as the root user (if you do not want to install for each user separately).
Provide an installation folder that is accessible to all users (e.g., /anaconda3) and grant full permissions on this folder.
Select "No" for conda init. (See Step 7 of the installation sequence in the reference link provided for Installing Anaconda on Linux.)
Once the installation is done, go to <anaconda installed folder>/bin and run the command conda init. It adds entries to the .bashrc file. Repeat this step for each user for whom you need to initialize conda.
If conda activates the base environment after login, run the command conda config --set auto_activate_base false.
Certificate Configuration
The certificate configuration is needed to connect to the Python and Anaconda repositories.
Follow the steps given below to connect to the Python and Anaconda repositories:
1. Get the root CA bundle. You can find it at /etc/pki/tls/cert.pem.
2. If the /etc/pki/tls/cert.pem file is available, create a copy of the file, append the certificate content produced by the command below to the end of the copied .pem file, then proceed to Step 3.
echo -n | openssl s_client -connect <namenode_host>:<namenode_port> | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > impetus-<namenode_host>.pem
If Hadoop is in HA mode, create two certificates (one for each name node) and append both to the copied .pem file.
If the file is not available, follow the steps below to generate self-signed certificates using the openssl CLI on the respective namenode hosts.
To generate the .keystore file:
keytool -genkeypair -keystore wn0-saxkaf.keystore -keyalg RSA -alias cmhost -dname "CN=$(hostname -f),OU=Impetus,O=Impetus,L=Indore,ST=MP,C=IN" -storepass Impetus1! -keypass Impetus1!
To generate the .pem file:
keytool -keystore /opt/cloudera/security/jks/truststore.jks -importcert -alias $(hostname -f) -file $(hostname -f).pem -storepass impetus
If Hadoop is in HA mode, create two certificates (one for each name node) and append both to any one of the .pem files.
3. Add this .pem certificate to the pip and conda configurations using the following commands (run them as a Gathr user):
pip config set global.cert /path/to/cert/file
conda config --set ssl_verify /path/to/cert/file
4. Export the generated certificate using the REQUESTS_CA_BUNDLE variable in the .bashrc file, pointing it to the combined certificate .pem file.
5. Restart the Jupyter Notebook service.
Perform these steps where Jupyter is running.
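The copy-and-append part of the procedure above can be sketched in Python as follows. This is a minimal illustration, not Gathr tooling: build_combined_bundle and count_certificates are hypothetical helpers, and the file names in the usage note are placeholders. The sketch only concatenates PEM files and counts certificate blocks; it does not validate the certificates themselves.

```python
import shutil
from pathlib import Path

BEGIN_MARKER = "-----BEGIN CERTIFICATE-----"


def build_combined_bundle(root_bundle, extra_pems, combined_path):
    """Copy the root CA bundle, then append each extra PEM file to the end of the copy."""
    combined = Path(combined_path)
    shutil.copyfile(root_bundle, combined)  # the original bundle is left untouched
    with combined.open("a", encoding="utf-8") as out:
        for pem_file in extra_pems:
            text = Path(pem_file).read_text(encoding="utf-8").strip()
            out.write("\n" + text + "\n")
    return combined


def count_certificates(bundle_path):
    """Count the certificate blocks in a PEM file by their BEGIN markers."""
    text = Path(bundle_path).read_text(encoding="utf-8")
    return text.count(BEGIN_MARKER)
```

For example, build_combined_bundle("/etc/pki/tls/cert.pem", ["impetus-<namenode_host>.pem"], "/path/to/cert/file") would produce the combined file, and comparing count_certificates on the original and combined files shows how many certificates were appended. In HA mode, pass both name node .pem files in the list.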
Install Network File System (NFS) Server
Install the package below for the NFS server and client using the yum command.
yum install -y nfs-utils
After installing it, you can verify the nfs-utils version and installation status.
Next, create the directory structure /home/sax/repo460/ on the NFS server node.
Once the packages are installed, enable and start the NFS services.
systemctl start nfs-server
systemctl enable nfs-server
Verify that /etc/exports has been created.
Add the path below to /etc/exports:
Create the /python_libraries directory and place the NFS packages there for python2.
Now restart the NFS server.
systemctl restart nfs-server.service
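After editing /etc/exports and restarting the server, you may want to confirm that the intended directory is actually listed. The sketch below parses exports-style text in a simplified way (one export per line, comments stripped; quoted paths and line continuations are not handled). The helper names are hypothetical, and the sample entry, including its client specification and options, is an assumed illustration rather than the entry required by Gathr.

```python
def exported_paths(exports_text: str) -> list:
    """Return the directory paths listed in /etc/exports-style text."""
    paths = []
    for line in exports_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        paths.append(line.split()[0])  # the first field is the exported directory
    return paths


def is_exported(exports_text: str, directory: str) -> bool:
    """Return True if the directory is listed, ignoring a trailing slash."""
    wanted = directory.rstrip("/") or "/"
    return any((path.rstrip("/") or "/") == wanted for path in exported_paths(exports_text))


if __name__ == "__main__":
    # Assumed sample content; on a real server, read the text from /etc/exports instead.
    sample = "# shared repository\n/home/sax/repo460 *(rw,sync)\n"
    print(is_exported(sample, "/home/sax/repo460/"))
```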