Processing Engine

Note: Some of the properties listed here are not applicable to the Multi-Cloud version of Gathr. These properties are marked with **.

Configuration properties related to the application processing engines come under this category. This category is further divided into two sub-categories.

Spark

| Field | Description |
| --- | --- |
| Spark Livy URL ** | Livy web server URL through which Gathr submits pipelines to Spark. |
| Spark Home ** | The Spark installation directory. |
| Spark Master URL ** | The Spark Master URL, for example spark://host1:7077. |
| Spark cluster manager ** | Defines the Spark cluster manager, that is, standalone or yarn. |
| Spark Job Server Log Directory | Directory path where pipeline logs are generated when using Spark Job Server. |
| Spark UI Port | The port on which the Spark Master UI is running. |
| spark.history.server | The history server URL. |
| Spark Hadoop is HDP | Set to true if your environment is HDP; otherwise set to false. This property is used for setting the proxy user. |
| Resource Manager Host | The Resource Manager hostname used for Spark YARN deployment. |
| Resource Manager Webapp Port | The YARN Resource Manager UI port. |
| Resource Manager Port | The Resource Manager port used for storm-yarn deployment. |
| ResourceManager High Availability | Enables Resource Manager High Availability. |
| ResourceManager HA Logical Names | ResourceManager High Availability logical IDs defined in the HA configuration. |
| ResourceManager HA Hosts | ResourceManager High Availability host names defined in the HA configuration. |
| ResourceManager HA ZK Address | Address of the ZooKeeper quorum defined for the ResourceManager High Availability configuration. |
| Spark Job Submit Mode | Submit mode of Spark pipelines when using Job Server. |
| Spark UI Host | Host name of the Spark Master. |
| Job Server Spark Home | The Spark installation directory with which Job Server is configured. |
| Job Server URL | The host URL of Job Server. |
| Spark REST Host and Port | Spark REST host name and port, for example Host1:6066. |
| Spark Python Path | This environment variable augments the default search path for Python module files. Directories and individual zip files containing pure Python modules can be added to this path. Gathr uses this variable to find the PySpark modules, which are usually located at $SPARK_HOME/python/lib. |
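To illustrate how a value for Spark Python Path might be assembled, here is a minimal Python sketch. It is not part of Gathr: the function name build_spark_python_path and the /opt/spark fallback are hypothetical. Including $SPARK_HOME/python alongside the zip files under $SPARK_HOME/python/lib is an assumption based on a common PySpark layout, so adjust it to your installation.

```python
import os
from pathlib import Path


def build_spark_python_path(spark_home: str) -> str:
    """Join $SPARK_HOME/python and the zip files in python/lib into one search path.

    Hypothetical helper for illustration only; not a Gathr API.
    """
    python_dir = Path(spark_home) / "python"
    lib_dir = python_dir / "lib"

    # Assumption: the python directory itself belongs on the path,
    # as in a typical PySpark setup.
    entries = [str(python_dir)]

    # Individual zip files containing pure Python modules can be
    # listed on the search path one by one.
    if lib_dir.is_dir():
        entries.extend(str(p) for p in sorted(lib_dir.glob("*.zip")))

    # os.pathsep is ':' on Linux/macOS and ';' on Windows.
    return os.pathsep.join(entries)


if __name__ == "__main__":
    # The fallback path is a placeholder, not a Gathr default.
    spark_home = os.environ.get("SPARK_HOME", "/opt/spark")
    print(build_spark_python_path(spark_home))
```

The printed string can be reviewed and then entered as the Spark Python Path value. If the lib directory does not exist, the sketch returns only the python directory, so an empty or unexpected result is a hint that SPARK_HOME points to the wrong location.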