Apache Airflow Installation
Gathr supports Airflow version 2.1.2 (Airflow2).
This topic captures installation steps for a fresh installation of Airflow2 and also the steps to upgrade from Airflow1 to Airflow2.
Airflow2 Installation/Upgrade
Given below are the steps for a fresh installation of Airflow2 (version 2.1.2) and for upgrading from Airflow1 to Airflow2.
Prerequisites
The default Python must be 2.7; the python and python2 commands must point to Python 2.7.
Python 3.8.8 must be installed; the python3 command must point to Python 3.8.x.
pip and pip2 must point to pip 2.7.
pip3 must point to pip 3.8.8.
Make sure that the version of SQLite database is greater than 3.15.0 (For Airflow 2 only).
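The version prerequisites above can be checked programmatically. A minimal sketch; note that version strings must be compared as integer tuples, not as plain strings:

```python
# Sketch: checking the version prerequisites above. Versions must be
# compared as integer tuples, since as strings "3.15.1" < "3.9.0".
import sys

def version_tuple(text):
    """Turn a dotted version string like '3.15.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))

assert version_tuple("3.15.1") > version_tuple("3.15.0")
assert not ("3.15.1" > "3.9.0")          # string comparison gets this wrong
print("python3 is", ".".join(map(str, sys.version_info[:3])),
      "- ok" if sys.version_info[:2] >= (3, 8) else "- install 3.8.x")
```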
Remove Airflow 1.x
If you have an installation of Airflow 1.10.5 with Python 2.7, then first follow these steps for uninstalling Airflow 1.10.5.
If not, then skip these steps.
Unschedule all workflows on Gathr.
Run the below command and note the Airflow installation location (e.g., /usr/lib/python2.7/site-packages):
sax> pip2 show apache-airflow
Uninstall Airflow 1.10.5 using the below command:
root> pip2 uninstall apache-airflow
Go to the Airflow installation location (e.g., /usr/lib/python2.7/site-packages) and remove all folders related to Airflow.
Run the below commands to locate and delete the airflow executable:
root> whereis airflow
root> rm -rf /usr/bin/airflow
The first command shows the airflow executable path (e.g., /usr/bin/airflow); delete that file with the second command.
Go to AIRFLOW_HOME and take a backup of the airflow.cfg file using the below commands:
sax> cd $AIRFLOW_HOME
sax> mv airflow.cfg airflow.cfg.bck
Go to AIRFLOW_HOME and remove the contents of the dags and plugins folders using the below commands:
sax> cd $AIRFLOW_HOME/dags
sax> rm -rf *
sax> cd $AIRFLOW_HOME/plugins
sax> rm -rf *
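The backup-and-clean steps above can be sketched with the Python standard library. The sketch runs against a temporary stand-in for $AIRFLOW_HOME, so it is safe to execute:

```python
# Sketch of the backup-and-clean steps, run against a temporary
# stand-in for $AIRFLOW_HOME rather than the real folder.
import shutil
import tempfile
from pathlib import Path

airflow_home = Path(tempfile.mkdtemp())          # stand-in for $AIRFLOW_HOME
(airflow_home / "dags").mkdir()
(airflow_home / "plugins").mkdir()
(airflow_home / "airflow.cfg").write_text("[core]\n")
(airflow_home / "dags" / "old_dag.py").write_text("# stale dag\n")

# mv airflow.cfg airflow.cfg.bck
(airflow_home / "airflow.cfg").rename(airflow_home / "airflow.cfg.bck")

# rm -rf dags/* and plugins/* (recreate the emptied folders afterwards)
for sub in ("dags", "plugins"):
    shutil.rmtree(airflow_home / sub)
    (airflow_home / sub).mkdir()

print(sorted(p.name for p in airflow_home.iterdir()))
# -> ['airflow.cfg.bck', 'dags', 'plugins']
```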
Airflow2 Installation/Upgrade Steps
Create a folder that will be used as the Airflow home using the below command:
sax> mkdir /home/sax/airflow_home
Create a dags folder inside the Airflow home using the below command:
sax> mkdir /home/sax/airflow_home/dags
Log in with the root user credentials, open the .bashrc file, and append the below statement:
export SLUGIFY_USES_TEXT_UNIDECODE=yes
Log in with the sax user credentials, open the .bashrc file, and add the Airflow home as an environment variable:
export AIRFLOW_HOME=/home/sax/airflow_home
Install Airflow using the below command:
root> pip3 install apache-airflow==2.1.2
Initialize the Airflow database using the below command:
sax> airflow db init
To configure a different database, please see Database Configuration.
To know more about how to get started with Apache Airflow, refer to the link below:
https://airflow.apache.org/docs/apache-airflow/stable/start/index.html
Airflow Providers Installation
The next step is to install the Airflow providers, using the below commands:
root> yum install mariadb-devel
(On Ubuntu, run the following instead:
sudo apt-get install libmysqlclient-dev
sudo apt-get install libmariadbclient-dev)
root> pip3 install apache-airflow-providers-apache-hdfs==1.0.1
root> pip3 install apache-airflow-providers-postgres==1.0.2
root> pip3 install apache-airflow-providers-mysql==1.1.0
root> pip3 install apache-airflow-providers-microsoft-mssql==1.1.0
root> pip3 install apache-airflow-providers-sftp==1.2.0
root> pip3 install apache-airflow-providers-ssh==1.3.0
root> pip3 install apache-airflow-providers-vertica==1.0.1
root> pip3 install kafka-python==2.0.2
root> pip3 install holidays==0.9.10
root> pip3 install apache-airflow-providers-http==1.1.1
root> pip3 install gssapi==1.7.0
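The pinned provider versions above can be kept in a single mapping so the install commands are generated consistently. A sketch, with the package names and versions taken from the list above:

```python
# Sketch: the pinned provider versions from the list above, kept in one
# mapping so the pip3 commands can be generated consistently.
providers = {
    "apache-airflow-providers-apache-hdfs": "1.0.1",
    "apache-airflow-providers-postgres": "1.0.2",
    "apache-airflow-providers-mysql": "1.1.0",
    "apache-airflow-providers-microsoft-mssql": "1.1.0",
    "apache-airflow-providers-sftp": "1.2.0",
    "apache-airflow-providers-ssh": "1.3.0",
    "apache-airflow-providers-vertica": "1.0.1",
    "kafka-python": "2.0.2",
    "holidays": "0.9.10",
    "apache-airflow-providers-http": "1.1.1",
    "gssapi": "1.7.0",
}

commands = [f"pip3 install {name}=={version}" for name, version in providers.items()]
print("\n".join(commands))
```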
To know more about Apache Airflow installation, refer to the link below:
https://airflow.apache.org/installation.html
Kerberos Support
Use the below commands to install the Kerberos-related system packages.
root> yum install cyrus-sasl-devel.x86_64
root> pip3 install apache-airflow[kerberos]==2.1.2
Config File Updates
Go to $AIRFLOW_HOME and open the airflow.cfg file.
Change the following properties in the file:
Properties | Values |
---|---|
base_url | http://ipaddress:port (e.g., http://172.29.59.97:9292) |
web_server_host | ipaddress |
web_server_port | port (e.g., 9292) |
catchup_by_default | False |
dag_dir_list_interval | 5 |
executor | LocalExecutor |
Add SMTP details for email under the [smtp] section of the config file. Uncomment and provide values for the following properties:
- smtp_host
- smtp_user
- smtp_password
- smtp_port
- smtp_mail_from
If the environment is Kerberos security enabled, then set security = kerberos and add the following configurations under the [kerberos] section:
Properties | Values |
---|---|
ccache | cache file path |
principal | user principal |
reinit_frequency | 3600 |
kinit_path | path to the kinit command (e.g., kinit) |
keytab | keytab file path (e.g., /etc/security/keytabs/service.keytab) |
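The airflow.cfg edits above can also be applied programmatically, since the file uses INI syntax. A minimal sketch with configparser, run against an in-memory stand-in for airflow.cfg; the section names ([webserver], [scheduler], [core]) follow Airflow 2.1 defaults, and the IP and port are the example values from the table:

```python
# Sketch: applying the property changes above with configparser, against
# a minimal in-memory stand-in for airflow.cfg.
import configparser
import io

cfg_text = """\
[webserver]
base_url = http://localhost:8080
web_server_host = 0.0.0.0
web_server_port = 8080

[scheduler]
dag_dir_list_interval = 300

[core]
executor = SequentialExecutor
"""

config = configparser.ConfigParser()
config.read_string(cfg_text)

# Values from the table above; the IP address and port are examples.
config["webserver"]["base_url"] = "http://172.29.59.97:9292"
config["webserver"]["web_server_host"] = "172.29.59.97"
config["webserver"]["web_server_port"] = "9292"
config["scheduler"]["dag_dir_list_interval"] = "5"
config["core"]["executor"] = "LocalExecutor"

buffer = io.StringIO()
config.write(buffer)              # in a real run, write back to airflow.cfg
print(config["core"]["executor"])  # -> LocalExecutor
```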
Database Configuration
Steps for Airflow Upgrade
Copy the value of the sql_alchemy_conn property from the airflow.cfg.bck file.
Provide the copied value for the sql_alchemy_conn property in the airflow.cfg file.
Run the below command:
sax> airflow db upgrade
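Carrying sql_alchemy_conn over from the backup can be sketched with configparser; the sketch uses in-memory stand-ins for airflow.cfg.bck and airflow.cfg, and the connection strings are placeholders:

```python
# Sketch: copying sql_alchemy_conn from the backup into the new config,
# shown on in-memory stand-ins for the two files.
import configparser

backup = configparser.ConfigParser()
backup.read_string(
    "[core]\nsql_alchemy_conn = postgresql://airflow:airflow@127.0.0.1/airflow\n"
)

fresh = configparser.ConfigParser()
fresh.read_string(
    "[core]\nsql_alchemy_conn = sqlite:////home/sax/airflow_home/airflow.db\n"
)

# Copy the old connection string so `airflow db upgrade` targets the same DB.
fresh["core"]["sql_alchemy_conn"] = backup["core"]["sql_alchemy_conn"]
print(fresh["core"]["sql_alchemy_conn"])
```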
Steps for Fresh Installation
Airflow uses SQLite as the default database, but users can switch to a preferred database.
Steps to configure Postgres as the preferred database are given below:
Create the airflow user using the below command:
sudo -u postgres createuser --interactive
Enter the name of role to add: airflow
Shall the new role be a superuser? (y/n) n
Shall the new role be allowed to create databases? (y/n) n
Shall the new role be allowed to create more new roles? (y/n) n
Set the password for the airflow user using the below command:
postgres=# ALTER USER airflow WITH PASSWORD 'airflow';
Create the Airflow database using the below command:
postgres=# CREATE DATABASE airflow;
Grant privileges on the Airflow database using the below command:
postgres=# GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;
Open the airflow.cfg file and provide the Postgres details (i.e., username, password, ipaddress:port, and databasename):
sql_alchemy_conn = postgresql://username:password@ipaddress:port/databasename
Generate a new fernet key for the fresh installation and update this value in Airflow.
- Open the python3 terminal and import the fernet module by executing the below command:
from cryptography.fernet import Fernet
- Generate the fernet key using the below command:
fernet_key = Fernet.generate_key()
- Print the newly generated fernet key on the console using the below command:
print(fernet_key.decode()) # <your fernet_key>
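A Fernet key is 32 random bytes encoded as URL-safe base64. The stdlib-only sketch below shows the expected shape of the generated key without requiring the cryptography package:

```python
# Sketch: a Fernet key is 32 random bytes, url-safe base64 encoded.
# This shows the key's shape without needing the cryptography package.
import base64
import os

fernet_key = base64.urlsafe_b64encode(os.urandom(32))
print(fernet_key.decode())                            # 44-character key
assert len(base64.urlsafe_b64decode(fernet_key)) == 32
```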
Store the generated fernet key securely. Update this fernet key in the airflow.cfg file, which is present in the following path: /home/sax/airflow_home/
The commands to update the fernet key in the config file are:
vi airflow_home/airflow.cfg
/fernet
fernet_key = # paste the fernet key generated in the above steps
:wq!
To know more about usage of fernet in Airflow, refer to the link below:
https://airflow.apache.org/docs/apache-airflow/stable/security/secrets/fernet.html
Now, run the below command to set up the database:
sax> airflow db init
If the SQLite version is lower than 3.15.0, the below commands can be used to upgrade it (for Airflow 2 only):
root> wget https://www.sqlite.org/src/tarball/sqlite.tar.gz
root> tar xzf sqlite.tar.gz
root> cd sqlite/
root> export CFLAGS="-DSQLITE_ENABLE_FTS3 \
  -DSQLITE_ENABLE_FTS3_PARENTHESIS \
  -DSQLITE_ENABLE_FTS4 \
  -DSQLITE_ENABLE_FTS5 \
  -DSQLITE_ENABLE_JSON1 \
  -DSQLITE_ENABLE_LOAD_EXTENSION \
  -DSQLITE_ENABLE_RTREE \
  -DSQLITE_ENABLE_STAT4 \
  -DSQLITE_ENABLE_UPDATE_DELETE_LIMIT \
  -DSQLITE_SOUNDEX \
  -DSQLITE_TEMP_STORE=3 \
  -DSQLITE_USE_URI \
  -O2 \
  -fPIC"
root> export PREFIX="/usr/local"
root> LIBS="-lm" ./configure --disable-tcl --enable-shared --enable-tempstore=always --prefix="$PREFIX"
root> make
root> make install
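After the rebuild, the SQLite library that Python actually links against can be confirmed from Python itself:

```python
# Sketch: confirming the SQLite library version Python links against;
# Airflow 2 needs it to be newer than 3.15.0.
import sqlite3

parts = tuple(int(p) for p in sqlite3.sqlite_version.split("."))
print(sqlite3.sqlite_version, "ok" if parts > (3, 15, 0) else "needs upgrade")
```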
Create Admin User
Run the below command to create an admin user in Airflow:
sax> airflow users create --firstname <firstname> --lastname <lastname> --password <password> --role Admin --username <username> --email <user's email ID>
You can use the same command to create multiple Airflow users with different roles.
Gathr supports the default authentication method, which is Airflow DB authentication.
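Creating several users can be scripted by generating one airflow users create command per user. A sketch; the names, roles, and email domain below are placeholders:

```python
# Sketch: generating one `airflow users create` command per user.
# The names, roles, and email domain are placeholders.
users = [
    ("jane", "doe", "Admin"),
    ("john", "roe", "Viewer"),
]
commands = [
    f"airflow users create --firstname {first} --lastname {last} "
    f"--role {role} --username {first} --email {first}@example.com "
    f"--password <password>"
    for first, last, role in users
]
print("\n".join(commands))
```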
Plugin Installation
Steps to add Gathr Airflow Plugin in Airflow:
Create a plugins folder in the Airflow home (if it does not exist), i.e., $AIRFLOW_HOME/plugins.
Go to the folder <sax_home>/conf/common/airflow-plugin/airflow2/ and copy its contents to the Airflow plugins folder.
Start the Airflow webserver using the below command:
sax> airflow webserver -p <port_number>
Start the Airflow scheduler using the below command:
sax> airflow scheduler
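The plugin-copy step above can be sketched with the Python standard library; the sketch uses temporary stand-in folders rather than the real Gathr and Airflow paths:

```python
# Sketch: copying plugin content into $AIRFLOW_HOME/plugins, shown on
# temporary stand-in folders (the file names here are stubs).
import shutil
import tempfile
from pathlib import Path

src = Path(tempfile.mkdtemp()) / "airflow-plugin" / "airflow2"
src.mkdir(parents=True)
(src / "gathr_plugin.py").write_text("# plugin stub\n")

plugins = Path(tempfile.mkdtemp()) / "plugins"
plugins.mkdir()                      # create the folder if it does not exist

shutil.copytree(src, plugins, dirs_exist_ok=True)   # Python 3.8+
print(sorted(p.name for p in plugins.iterdir()))    # -> ['gathr_plugin.py']
```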