Processing Engine
Note: Some of the properties listed here are not available in the Multi-Cloud version of Gathr. These properties are marked with **.
Configuration properties related to the application processing engines come under this category. This category is further divided into two sub-categories.
Spark
Field | Description |
---|---|
Spark Livy URL ** | Livy web server URL through which Gathr submits pipelines on Spark. |
Spark Home ** | The Spark installation directory. |
Spark Master URL ** | The Spark Master URL, e.g. spark://host1:7077 |
Spark cluster manager ** | Defines the Spark cluster manager, i.e. standalone or yarn. |
Spark Job Server Log Directory | Directory path where pipeline logs will be generated when using Spark Job server. |
Spark UI Port | The port on which the Spark Master UI is running. |
spark.history.server | The history server URL. |
Spark Hadoop is HDP | Set to true if your environment is HDP; otherwise set to false. This value is used when setting the proxy user. |
Resource Manager Host | The resource manager hostname used for spark yarn deployment. |
Resource Manager Webapp Port | Yarn Resource Manager UI Port. |
Resource Manager Port | The resource manager port used for storm-yarn deployment. |
ResourceManager High Availability | Enables Resource Manager’s High Availability. |
ResourceManager HA Logical Names | ResourceManager High Availability Logical IDs defined at HA configuration. |
ResourceManager HA Hosts | ResourceManager High Availability host names defined at HA configuration. |
ResourceManager HA ZK Address | ResourceManager High Availability ZooKeeper-Quorum’s address which is defined for HA configuration. |
Spark Job Submit Mode | Submit mode of Spark pipeline using Job-Server. |
Spark UI Host | Host name of the Spark Master. |
Job Server Spark Home | The Spark installation directory with which Job Server is configured. |
Job Server URL | The host URL of Job Server. |
Spark REST Host and Port | Spark REST host name and port, e.g. Host1:6066 |
Spark Python Path | This environment variable is used to augment the default search path for Python module files. Directories and individual zip files containing pure Python modules can be added to this path. Gathr uses this variable to find PySpark modules, which are usually located at $SPARK_HOME/python/lib. |
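The value formats mentioned in the table can be checked before they are entered. The following is a minimal Python sketch, not part of Gathr: the function names are hypothetical, and the way the Spark Python Path value is assembled (joining the zip files found under $SPARK_HOME/python/lib with the platform path separator) is an assumption based on the description above.

```python
import os
from urllib.parse import urlsplit


def parse_spark_master_url(url):
    """Split a master URL such as spark://host1:7077 into (host, port)."""
    parts = urlsplit(url)
    if parts.scheme != "spark" or not parts.hostname or parts.port is None:
        raise ValueError(f"expected spark://host:port, got {url!r}")
    return parts.hostname, parts.port


def parse_host_port(value):
    """Split a 'host:port' pair such as Host1:6066 into (host, port)."""
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {value!r}")
    return host, int(port)


def spark_python_path(spark_home):
    """Join the zip files under <spark_home>/python/lib into a search-path string.

    Assumption: the Spark Python Path value is a path-separator-joined list.
    Returns an empty string if the lib directory does not exist.
    """
    lib_dir = os.path.join(spark_home, "python", "lib")
    if not os.path.isdir(lib_dir):
        return ""
    zips = sorted(
        os.path.join(lib_dir, name)
        for name in os.listdir(lib_dir)
        if name.endswith(".zip")
    )
    return os.pathsep.join(zips)


if __name__ == "__main__":
    print(parse_spark_master_url("spark://host1:7077"))  # ('host1', 7077)
    print(parse_host_port("Host1:6066"))                 # ('Host1', 6066)
    # SPARK_HOME is read from the environment; empty output if it is unset.
    print(spark_python_path(os.environ.get("SPARK_HOME", "")))
```

Running the sketch on the machine that hosts Spark prints the parsed host and port pairs, followed by a candidate Spark Python Path value that can be reviewed before it is entered in the configuration.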