Apache Airflow Installation

Gathr supports Airflow version 2.1.2 (Airflow2).

This topic captures installation steps for a fresh installation of Airflow2 and also the steps to upgrade from Airflow1 to Airflow2.

Airflow2 Installation/Upgrade

Follow the steps below for a fresh installation of Airflow2 (version 2.1.2) or for an upgrade from Airflow1 to Airflow2.

Prerequisites

  • Default Python must be 2.7.

  • Python and Python2 must point to Python 2.7.

  • Python 3.8.8 must be installed. Python3 must point to Python 3.8.x.

  • pip and pip2 must point to pip2.7.

  • pip3 must point to the pip for Python 3.8.x.

  • Make sure that the version of the SQLite database is 3.15.0 or later (for Airflow 2 only).
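
The SQLite prerequisite above can be checked from Python before installing anything. The following is a minimal sketch, not part of the Gathr tooling: the helper names are hypothetical, and it reports the SQLite library linked into the Python interpreter that runs it, which is assumed to be the one Airflow will use.

```python
import sqlite3

# Minimum SQLite version taken from the prerequisites above.
MINIMUM_SQLITE = (3, 15, 0)


def parse_version(text):
    """Turn a dotted version string such as '3.31.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def sqlite_is_supported(version_text, minimum=MINIMUM_SQLITE):
    """Return True when version_text is at least the minimum version."""
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    # sqlite3.sqlite_version is the version of the linked SQLite library.
    found = sqlite3.sqlite_version
    status = "OK" if sqlite_is_supported(found) else "too old, upgrade SQLite"
    print(f"SQLite {found}: {status}")
```

Run it with python3 so that the result reflects the interpreter Airflow2 is installed under.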

Remove Airflow 1.x

If you have an installation of Airflow 1.10.5 with Python 2.7, then first follow these steps for uninstalling Airflow 1.10.5.

If not, then skip these steps.

  1. Unschedule all workflows on Gathr.

  2. Run the below command and copy the Airflow installation location (for example, /usr/lib/python2.7/site-packages):

    sax> pip2 show apache-airflow
    
  3. Uninstall Airflow 1.10.5 using the below command:

    root> pip2 uninstall apache-airflow
    
  4. Go to the Airflow installation location (for example, /usr/lib/python2.7/site-packages) and remove all the folders related to Airflow.

  5. Run the below commands to locate the airflow executable (for example, /usr/bin/airflow) and delete it. Adjust the path in the rm command to match the output of whereis:

    root> whereis airflow
    root> rm -rf /usr/bin/airflow

  6. Go to AIRFLOW_HOME and back up the airflow.cfg file using the below commands:

    sax> cd $AIRFLOW_HOME
    sax> mv airflow.cfg airflow.cfg.bck
    
  7. Go to AIRFLOW_HOME and remove the contents of the dags and plugins folders using the below commands:

    sax> cd $AIRFLOW_HOME/dags
    sax> rm -rf *
    sax> cd $AIRFLOW_HOME/plugins
    sax> rm -rf *
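
Steps 6 and 7 above can also be scripted. The sketch below is illustrative only: backup_and_clean is a hypothetical helper, it assumes the Airflow home path is passed in explicitly, and unlike rm -rf * it also removes hidden entries inside the dags and plugins folders.

```python
import shutil
from pathlib import Path


def backup_and_clean(airflow_home):
    """Rename airflow.cfg to airflow.cfg.bck and empty dags/ and plugins/."""
    home = Path(airflow_home)
    cfg = home / "airflow.cfg"
    if cfg.exists():
        # Same effect as: mv airflow.cfg airflow.cfg.bck
        cfg.rename(home / "airflow.cfg.bck")
    for name in ("dags", "plugins"):
        folder = home / name
        if not folder.is_dir():
            continue  # nothing to clean for this folder
        for entry in folder.iterdir():
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)  # remove a sub-folder and its contents
            else:
                entry.unlink()  # remove a file or a symlink
```

A call such as backup_and_clean("/home/sax/airflow_home") would be run as the sax user, after all workflows have been unscheduled.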
    

Airflow2 Installation/Upgrade Steps

  1. Create a folder to be used as the Airflow home using the below command:

    sax> mkdir /home/sax/airflow_home
    
  2. Create a dags folder inside it using the below command:

    sax> mkdir /home/sax/airflow_home/dags
    
  3. Log in as the root user, open the .bashrc file, and append the below statement:

    export SLUGIFY_USES_TEXT_UNIDECODE=yes
    
  4. Log in as the sax user, open the .bashrc file, and add the Airflow home as an environment variable:

    export AIRFLOW_HOME=/home/sax/airflow_home
    
  5. Install Airflow using the below command:

    root> pip3 install apache-airflow==2.1.2
    
  6. Initialize the Airflow database using the below command:

    sax> airflow db init
    

To configure a different database, please see Database Configuration.

To know more about how to get started with Apache Airflow, refer to the link below:

https://airflow.apache.org/docs/apache-airflow/stable/start/index.html
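
After the installation steps above, a quick sanity check of the Airflow home can catch a missing environment variable or folder early. This is a minimal sketch with a hypothetical helper name. It assumes that airflow.cfg exists in the Airflow home once an airflow command such as airflow db init has been run.

```python
import os
from pathlib import Path


def airflow_home_problems(env):
    """Return a list of problems found; an empty list means the checks passed."""
    home = env.get("AIRFLOW_HOME")
    if not home:
        return ["AIRFLOW_HOME is not set"]
    home_path = Path(home)
    if not home_path.is_dir():
        return [f"{home} is not a directory"]
    problems = []
    if not (home_path / "dags").is_dir():
        problems.append(f"{home_path / 'dags'} does not exist")
    if not (home_path / "airflow.cfg").is_file():
        problems.append(f"{home_path / 'airflow.cfg'} does not exist")
    return problems


if __name__ == "__main__":
    # Check the environment of the current user (expected to be sax).
    for problem in airflow_home_problems(os.environ):
        print("Problem:", problem)
```

Run it in a new login shell of the sax user so that the .bashrc change is picked up.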

Airflow Providers Installation

The next step is to install the Airflow providers.

Use the below commands to install the Airflow providers:

    root> yum install mariadb-devel

On Ubuntu, run the below commands instead of the yum command:

    sudo apt-get install libmysqlclient-dev
    sudo apt-get install libmariadbclient-dev

Then install the providers and the supporting Python packages:

    root> pip3 install apache-airflow-providers-apache-hdfs==1.0.1
    root> pip3 install apache-airflow-providers-postgres==1.0.2
    root> pip3 install apache-airflow-providers-mysql==1.1.0
    root> pip3 install apache-airflow-providers-microsoft-mssql==1.1.0
    root> pip3 install apache-airflow-providers-sftp==1.2.0
    root> pip3 install apache-airflow-providers-ssh==1.3.0
    root> pip3 install apache-airflow-providers-vertica==1.0.1
    root> pip3 install kafka-python==2.0.2
    root> pip3 install holidays==0.9.10
    root> pip3 install apache-airflow-providers-http==1.1.1
    root> pip3 install gssapi==1.7.0
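
Once the packages above are installed, the pinned versions can be compared against what is actually present. The sketch below uses the standard importlib.metadata module (Python 3.8 and later). The helper names are hypothetical and the pins are copied from the commands above.

```python
from importlib import metadata

# Pins copied from the pip3 install commands above.
PINS = {
    "apache-airflow-providers-apache-hdfs": "1.0.1",
    "apache-airflow-providers-postgres": "1.0.2",
    "apache-airflow-providers-mysql": "1.1.0",
    "apache-airflow-providers-microsoft-mssql": "1.1.0",
    "apache-airflow-providers-sftp": "1.2.0",
    "apache-airflow-providers-ssh": "1.3.0",
    "apache-airflow-providers-vertica": "1.0.1",
    "kafka-python": "2.0.2",
    "holidays": "0.9.10",
    "apache-airflow-providers-http": "1.1.1",
    "gssapi": "1.7.0",
}


def installed_version(package):
    """Return the installed version of package, or None when it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Map each package that differs from its pin to (expected, found)."""
    mismatches = {}
    for package, expected in pins.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in find_mismatches(PINS).items():
        print(f"{package}: expected {expected}, found {found}")
```

The lookup parameter exists only so the comparison can be exercised without the packages being installed. Run the script with python3 so it inspects the same environment that pip3 installed into.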

To know more about Apache Airflow installation, refer to the link below:

https://airflow.apache.org/installation.html

Kerberos Support

Use the below commands to install the Kerberos-related packages:

    root> yum install cyrus-sasl-devel.x86_64
    root> pip3 install apache-airflow[kerberos]==2.1.2

Config File Updates

Go to $AIRFLOW_HOME and open airflow.cfg file.

Change the following properties in the file:

Property            Value
base_url            http://ipaddress:port (for example, http://172.29.59.97:9292)
web_server_host     ipaddress
web_server_port     port (for example, 9292)

Add SMTP details for email under section [smtp] in config file.

Uncomment and provide values for the following properties:

  • smtp_host
  • smtp_user
  • smtp_password
  • smtp_port
  • smtp_mail_from

Also set the following properties:

Property                 Value
catchup_by_default       False
dag_dir_list_interval    5
executor                 LocalExecutor

If the environment is Kerberos Security enabled, then add the following configurations:

Property            Value
security            kerberos

Under the [kerberos] section:

Property            Value
ccache              cache file path
principal           user principal
reinit_frequency    3600
kinit_path          path to the kinit command (for example, kinit)
keytab              keytab file path (for example, /etc/security/keytabs/service.keytab)
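
The fixed-value properties from this section can be verified after editing airflow.cfg. The following is a minimal sketch using the standard configparser module. The helper names are hypothetical, and the lookup searches every section so that it does not depend on which section a given Airflow version keeps each property in.

```python
import configparser

# Fixed values listed in this section.
EXPECTED = {
    "catchup_by_default": "False",
    "dag_dir_list_interval": "5",
    "executor": "LocalExecutor",
}


def load_config(text):
    """Parse config text; interpolation=None keeps '%' characters literal."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def find_option(parser, key):
    """Return the first value found for key in any section, else None."""
    for section in parser.sections():
        if parser.has_option(section, key):
            return parser.get(section, key)
    return None


def config_mismatches(text, expected=EXPECTED):
    """Map each property that differs from its expected value to (want, have)."""
    parser = load_config(text)
    result = {}
    for key, want in expected.items():
        have = find_option(parser, key)
        if have != want:
            result[key] = (want, have)
    return result
```

For example, config_mismatches(open("/home/sax/airflow_home/airflow.cfg").read()) returns an empty dictionary when all three properties are set as listed.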

Database Configuration

Steps for Airflow Upgrade

  1. Copy the value of property sql_alchemy_conn from airflow.cfg.bck file.

  2. Provide the copied value in airflow.cfg file for property sql_alchemy_conn.

  3. Run the below command:

    sax> airflow db upgrade
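
Steps 1 and 2 above can be done by hand or scripted. The sketch below is one possible approach, with hypothetical helper names: it edits the file text line by line rather than rewriting it through a config parser, so comments in airflow.cfg are preserved. It assumes the property appears on a single uncommented line.

```python
import re


def read_option(cfg_text, key):
    """Return the value of the first uncommented 'key = value' line, or None."""
    pattern = rf"^{re.escape(key)}[ \t]*=[ \t]*(.*)$"
    match = re.search(pattern, cfg_text, flags=re.MULTILINE)
    return match.group(1).strip() if match else None


def write_option(cfg_text, key, value):
    """Replace the value on the first uncommented 'key = ...' line."""
    pattern = re.compile(rf"^({re.escape(key)}[ \t]*=[ \t]*).*$", flags=re.MULTILINE)
    # A function is used as the replacement so that backslashes in value stay literal.
    new_text, count = pattern.subn(lambda m: m.group(1) + value, cfg_text, count=1)
    if count == 0:
        raise KeyError(key)
    return new_text


def copy_option(backup_text, current_text, key="sql_alchemy_conn"):
    """Copy the value of key from the backup config text into the current one."""
    value = read_option(backup_text, key)
    if value is None:
        raise KeyError(key)
    return write_option(current_text, key, value)
```

The returned text would then be written back to airflow.cfg before running airflow db upgrade.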
    

Steps for Fresh Installation

Airflow uses SQLite as the default database. It also allows users to switch to a preferred database.

Steps to configure Postgres as the preferred database are given below:

  1. Create airflow user using below command:

    sudo -u postgres createuser --interactive
    

    Enter the name of role to add: airflow

    Shall the new role be a superuser? (y/n) n

    Shall the new role be allowed to create databases? (y/n) n

    Shall the new role be allowed to create more new roles? (y/n) n

  2. Set the password for airflow user using the below command:

    postgres=# ALTER USER airflow WITH PASSWORD 'airflow';
    
  3. Create Airflow database using below command:

    postgres=# CREATE DATABASE airflow;
    
  4. Grant permissions on the Airflow database using the below command:

    postgres=# GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow;
    
  5. Open the airflow.cfg file and provide the Postgres details (that is, username, password, ipaddress:port, and database name):

    sql_alchemy_conn = postgresql://username:password@ipaddress:port/databasename
    
  6. Generate a new Fernet key for the fresh installation and update its value in Airflow.

    • Open the python3 terminal and import the Fernet module by executing the below command:

      from cryptography.fernet import Fernet

    • Generate the Fernet key using the below command:

      fernet_key = Fernet.generate_key()

    • Print the newly generated Fernet key on the console using the below command:

      print(fernet_key.decode())  # <your fernet_key>

    • Update this Fernet key in the airflow.cfg file, which is present in the following path:

      /home/sax/airflow_home/

    The commands to update the Fernet key in the config file are:

    vi /home/sax/airflow_home/airflow.cfg
    /fernet
    fernet_key = <paste the fernet_key generated in the above steps>
    :wq!
    

    To know more about usage of fernet in Airflow, refer to the link below:

    https://airflow.apache.org/docs/apache-airflow/stable/security/secrets/fernet.html

  7. Now, run the below command to set up the database:

    sax> airflow db init
    

    If the SQLite version is lower than 3.15.0, use the below commands to upgrade SQLite (for Airflow 2 only):

    root> wget https://www.sqlite.org/src/tarball/sqlite.tar.gz
    root> tar xzf sqlite.tar.gz
    root> cd sqlite/
    root> export CFLAGS="-DSQLITE_ENABLE_FTS3 \
    -DSQLITE_ENABLE_FTS3_PARENTHESIS \
    -DSQLITE_ENABLE_FTS4 \
    -DSQLITE_ENABLE_FTS5 \
    -DSQLITE_ENABLE_JSON1 \
    -DSQLITE_ENABLE_LOAD_EXTENSION \
    -DSQLITE_ENABLE_RTREE \
    -DSQLITE_ENABLE_STAT4 \
    -DSQLITE_ENABLE_UPDATE_DELETE_LIMIT \
    -DSQLITE_SOUNDEX \
    -DSQLITE_TEMP_STORE=3 \
    -DSQLITE_USE_URI \
    -O2 \
    -fPIC"
    root> export PREFIX="/usr/local"
    root> LIBS="-lm" ./configure --disable-tcl --enable-shared --enable-tempstore=always --prefix="$PREFIX"
    root> make
    root> make install
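
Two values from steps 5 and 6 above are easy to get wrong by hand: the connection string and the Fernet key. The sketch below uses hypothetical helper names and only the standard library. It percent-encodes the credentials so that characters such as @ or / in a password do not break the URL. It also checks the shape of a Fernet key, which is URL-safe base64 that decodes to 32 bytes. The shape check does not prove that a key is the one Airflow is using.

```python
import base64
from urllib.parse import quote_plus


def build_sql_alchemy_conn(username, password, host, port, database):
    """Assemble a postgresql:// URL, percent-encoding the credentials."""
    return (
        f"postgresql://{quote_plus(username)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


def is_valid_fernet_key(key):
    """Return True when key is URL-safe base64 that decodes to 32 bytes."""
    try:
        raw = base64.urlsafe_b64decode(key)
    except (ValueError, TypeError):
        # Bad padding or unsupported input: not a usable key.
        return False
    return len(raw) == 32


if __name__ == "__main__":
    # Example values only; 5432 is the default PostgreSQL port.
    print(build_sql_alchemy_conn("airflow", "airflow", "ipaddress", 5432, "airflow"))
```

The string printed by the example is what would go after sql_alchemy_conn = in airflow.cfg, with ipaddress replaced by the database host.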
    

Create Admin User

Run the below command to create an admin user in airflow:

sax> airflow users create --firstname <firstname> --lastname <lastname> --password <password> --role Admin --username <username> --email <email>

You can use the same command to create multiple Airflow users with different roles.
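
When several users are needed, the command can be generated per user instead of typed repeatedly. The following is a minimal sketch: create_user_command is a hypothetical helper, the user details are made-up examples, and the role names Admin and Viewer are assumed to be among the roles available in your Airflow installation.

```python
import shlex


def create_user_command(username, firstname, lastname, email, password, role="Admin"):
    """Build the argument list for 'airflow users create'."""
    return [
        "airflow", "users", "create",
        "--username", username,
        "--firstname", firstname,
        "--lastname", lastname,
        "--email", email,
        "--password", password,
        "--role", role,
    ]


if __name__ == "__main__":
    # Hypothetical users, for illustration only.
    users = [
        ("jdoe", "John", "Doe", "jdoe@example.com", "change-me", "Admin"),
        ("asmith", "Anna", "Smith", "asmith@example.com", "change-me", "Viewer"),
    ]
    for user in users:
        # shlex.join quotes each argument so the line is safe to paste into a shell.
        print(shlex.join(create_user_command(*user)))
```

The argument list can also be passed directly to subprocess.run, which avoids shell quoting altogether.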

Gathr supports the default authentication method, which is Airflow DB authentication.

Plugin Installation

Steps to add Gathr Airflow Plugin in Airflow:

  1. Create plugins folder in Airflow home (if it does not exist) i.e. $AIRFLOW_HOME/plugins.

  2. Go to the folder <sax_home>/conf/common/airflow-plugin/airflow2/ and copy the content from this folder to the Airflow plugins folder.

  3. Start the Airflow webserver using the below command:

    sax> airflow webserver -p <port_number>

  4. Start the Airflow scheduler using the below command:

    sax> airflow scheduler
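
Steps 1 and 2 above (creating the plugins folder and copying the Gathr plugin content into it) can be combined in a short script. This is a sketch with a hypothetical helper name. It relies on the dirs_exist_ok option of shutil.copytree, which is available from Python 3.8, and the source path must be replaced with the real location under <sax_home>.

```python
import shutil
from pathlib import Path


def install_plugins(source_dir, airflow_home):
    """Copy everything under source_dir into <airflow_home>/plugins."""
    target = Path(airflow_home) / "plugins"
    # dirs_exist_ok=True creates the folder if needed and merges into it
    # when it already exists, overwriting files with the same name.
    shutil.copytree(source_dir, target, dirs_exist_ok=True)
    return target
```

After the copy, restart the webserver and the scheduler so that Airflow loads the plugin.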
    