Default

Note: Some of the properties listed here are not applicable to the Multi-Cloud version of Gathr. These properties are marked with **.

All default or shared configuration properties come under this category, which is further divided into the sub-categories below.

Platform

Field	Description
Application Logging Level	The logging level to be used for gathr logs.
Gathr HTTPs Enabled	Whether the Gathr application supports the HTTPS protocol.
Spark HTTPs Enabled	Whether the Spark server supports the HTTPS protocol.
Test Connection Time Out	Timeout for test connection (in ms).
Java Temp Directory	The temp directory location.
Gathr Reporting Period	Whether to enable View Data link in application or not.
View Data Enabled	Whether to enable View Data link in application or not.
TraceMessage Compression	The type of compression used on emitted TraceMessage from any component.
Message Compression	The type of compression used on emitted object from any component.
Enable Gathr Monitoring Flag	Flag to tell if monitoring is enabled or not.
CEP Type	Defines the name of the CEP engine used. The only possible value at present is esper.
Enable Esper HA Global	To enable or disable HA.
CepHA Wait Interval	The wait interval of primary CEP task node.
Gathr Scheduler Interval	The topology stopped alert scheduler’s time interval in seconds.
Enable Gathr Scheduler	Flag to enable or disable the topology stopped alert.
Gathr Session Timeout	The timeout for a login session in gathr.
Enable dashboard	Defines whether the dashboard is enabled or disabled.
Enable Log Agent	Defines whether the Agent Configuration option should be visible on the Gathr GUI.
Enable Storm Error Search	Enables the pipeline Application Errors tab, which uses the LogMonitoring search page.
Gathr Pipeline Error Search Tenant Token	Tenant token for Pipeline Error Search.
Gathr Storm Error Search Index Expression	Pipeline application error index expression (a time-based JS expression used to create indexes in ES or Solr, and also used during retrieval).
Kafka Spout Connection Retry Sleep Time	Time between consecutive Kafka spout connection retries.
Cluster Manager Home URL	The URL of the Gathr Cluster Manager.
Gathr Pipeline Log Location	The location of Gathr pipeline logs.
HDFS Location for Pipeline Jars	HDFS Location for Pipeline Jars.
Scheduler Table Prefix	Prefix of the names of the tables that store the scheduler's state.
Scheduler Thread Pool Class	Class used to implement thread pool for the scheduler.
Scheduler Thread Pool Thread Count	Number of threads available for concurrent execution of jobs. It can be any positive integer, although only values between 1 and 100 are practical. If only a few jobs run a few times a day, 1 thread is plenty. If many jobs run, most of them every minute, a thread count of 50 or 100 is more appropriate (depending on the nature of the jobs and the available resources).
Scheduler Datasource Max Connections	The maximum number of connections that the scheduler datasource can create in its pool of connections.
Scheduler Misfire Threshold Time	Milliseconds the scheduler will tolerate a trigger to pass its next-fire-time by, before being considered misfired.
HDP Version	Version of HDP ecosystem.
CDH Version	Version of CDH ecosystem.
Audit Targets	Defines the audit logging implementation to be used in the application. Default is file.
Enable Audit	Defines the value (true/false) for enabling audit in application.
Persistence Encryption Key	Specifies the encryption key used to encrypt data in persistence.
Ambari HTTPs Enabled	Whether the Ambari server supports the HTTPS protocol.
Graphite HTTPs Enabled	Whether the Graphite server supports the HTTPS protocol.
Elastic Search HTTPs Enabled	Whether the Elasticsearch engine supports the HTTPS protocol.
SQL Query Execution Log File Path	File location for logging gathr SQL query execution statistics.
SQL Query Execution Threshold Time (in ms)	Defines the maximum execution time for SQL queries, after which an event is logged (in ms).
Lineage Persistence Store	The data store that will be used by data lineage feature.
Aspectjweaver jar location	The absolute path of the aspectjweaver jar required for pipeline inspection or data lineage.
Is Apache Environment	Default value is false. Set it to true for all Apache environments.
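
The Scheduler Misfire Threshold Time setting can be read as a comparison between how late a trigger is and the configured tolerance. The following is a minimal sketch of that comparison, not Gathr's scheduler code; the function name `is_misfired` and the 60000 ms threshold are assumptions chosen only for illustration.

```python
def is_misfired(next_fire_time_ms: int, now_ms: int, misfire_threshold_ms: int) -> bool:
    """Return True when a trigger has passed its next-fire-time by more
    than the tolerated threshold (all values in milliseconds)."""
    lateness_ms = now_ms - next_fire_time_ms
    return lateness_ms > misfire_threshold_ms


# Hypothetical values: a 60-second tolerance and an arbitrary fire time.
THRESHOLD_MS = 60_000
scheduled_ms = 1_000_000

# 5 seconds late: still within the tolerance.
print(is_misfired(scheduled_ms, scheduled_ms + 5_000, THRESHOLD_MS))

# 90 seconds late: beyond the tolerance, so treated as misfired.
print(is_misfired(scheduled_ms, scheduled_ms + 90_000, THRESHOLD_MS))
```

A larger threshold tolerates longer delays (for example when all scheduler threads are busy) before a trigger is treated as misfired.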

Zookeeper

Field	Description
Zookeeper Retry Count	Zookeeper connection retry count.
Zookeeper Retry Delay Interval	Defines the retry interval for the zookeeper connection.
Zookeeper Session Timeout	Zookeeper’s session timeout time.

Spark

Field	Description
Model Registration Validation Timeout(in seconds)	The time, in seconds, after which the MLlib, ML or H2O model registration and validation process fails if it has not completed.
Spark Fetch Schema Timeout(in seconds)	The time, in seconds, after which the fetch schema process of register table fails if it has not completed.
Spark Failover Scheduler Period(in ms)	Regular intervals to run scheduler tasks. Only applicable for testing connection of Data Sources in running pipeline.
Spark Failover Scheduler Delay(in ms)	Delay after which a scheduler task can run once it is ready. Only applicable for testing connection of Data Sources in running pipeline.
Refresh Superuser Pipelines and Connections	Whether to refresh Superuser Pipelines and Default Connections in the database when web studio restarts.
Gathr SparkErrorSearchPipeline Index Expression **	Pipeline application error index expression (a time-based JS expression used to create indexes in ES or Solr, and also used during retrieval).
Enable Spark Error Search **	Enables indexing and searching of Spark pipeline errors in LogMonitoring.
Register Model Minimum Memory	Minimum memory required for web studio to register tables, MLlib, ML or H2O models. Example -Xms512m.
Register Model Maximum Memory	Maximum memory required for web studio to register tables, MLlib, ML or H2O models. Example -Xmx2048m.
H2O Jar Location	Local file system’s directory location at which H2O model jar will be placed after model registration.
H2O Model HDFS Jar Location	HDFS path location at which H2O model jar will be placed after model registration.
Spark Monitoring Scheduler Delay(in ms) **	Specifies the Spark monitoring scheduler delay in milliseconds.
Spark Monitoring Scheduler Period(in ms) **	Specifies the Spark monitoring scheduler period in milliseconds.
Spark Monitoring Enable **	Specifies the flag to enable the spark monitoring.
Spark Executor Java Agent Config	Spark Executor Java Agent configuration used to monitor the executor process. The command includes the jar path, the configuration file path and the name of the process.
Spark JVM Monitoring Enable **	Specifies the flag to enable Spark JVM monitoring.
ES query monitoring index name	Provide the ES query monitoring index name which is required for indexing the data of query streaming.
Scheduler period for es monitoring purging	Scheduler period for es monitoring purging in seconds.
Rotation policy of ES monitoring graph	Specify the rotation policy for index creation for ES monitoring graph (daily for a period of one day and weekly for 7 days).
Purging duration of ES monitoring index	Purge duration, in seconds, for the ES monitoring graph index. Indexes created before this duration will be deleted.
Enable purging scheduler for ES Graph monitoring	Check the checkbox to enable purging scheduler for ES Graph monitoring.
Spark Version **	By default the version is set to 2.3. Note: Set the Spark version to 2.2 for HDP 2.6.3.
Livy Supported JARs Location **	HDFS location where livy related jar file and application streaming jar file have been kept.
Livy Session Driver Memory **	Minimum memory that will be allocated to driver while creating livy session.
Livy Session Driver Vcores **	Minimum virtual cores that will be allocated to driver while creating Livy session.
Livy Session Executor Memory **	Minimum memory that will be allocated to executor while creating Livy session.
Livy Session Executor Vcores **	Minimum virtual cores that will be allocated to executor while creating Livy session.
Livy Session Executor Instances **	Minimum executor instances that will be allocated while creating Livy session.
Livy Custom Jar HDFS Path **	The fully qualified HDFS path where the uploaded custom jar is kept while creating a pipeline.
Livy Data Fetch Timeout **	The query time interval, in seconds, for fetching data during data inspection.
isMonitoringGraphsEnabled	Whether monitoring graph is enabled or not.
ES query monitoring index name	Name of the index, in the default ES connection, in which monitoring data is stored.
Scheduler period for ES monitoring purging	Interval (in seconds) at which the purging scheduler runs and checks whether the above index is eligible for purging. Requires a Tomcat restart.
Rotation policy of ES monitoring graph	Can be daily or weekly. With daily, the index is rotated every day, so each index stores a single day of data; with weekly, each index stores one week of data.
Purging duration of ES monitoring index	Duration after which an index is deleted. Default is 604800 seconds, meaning an index is deleted after 1 week. Requires a Tomcat restart.
Enable purging scheduler for ES Graph monitoring	Controls whether indexes are purged. Purging does not take place if this flag is disabled. Requires a Tomcat restart.
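
The purging duration is given in seconds, so it helps to check what the default of 604800 means in days and how an eligibility check could look. The sketch below is illustrative only and is not Gathr's purging scheduler; the function name `is_eligible_for_purge` and the sample dates are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Default purging duration from the table above, in seconds.
DEFAULT_PURGE_DURATION_SEC = 604800


def is_eligible_for_purge(created_at: datetime, now: datetime,
                          purge_duration_sec: int = DEFAULT_PURGE_DURATION_SEC) -> bool:
    """Return True when the index is older than the purging duration."""
    return (now - created_at) > timedelta(seconds=purge_duration_sec)


# 604800 seconds expressed in days.
print(timedelta(seconds=DEFAULT_PURGE_DURATION_SEC).days)

# Hypothetical point in time at which the purging scheduler runs.
now = datetime(2024, 1, 15, tzinfo=timezone.utc)

# An index created 5 days earlier is kept; one created 14 days earlier is purged.
print(is_eligible_for_purge(datetime(2024, 1, 10, tzinfo=timezone.utc), now))
print(is_eligible_for_purge(datetime(2024, 1, 1, tzinfo=timezone.utc), now))
```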

RabbitMQ

Field	Description
RabbitMQ Max Retries	Defines maximum number of retries for the RabbitMQ connection.
RabbitMQ Retry Delay Interval	Defines the retry delay intervals for RabbitMQ connection.
RabbitMQ Session Timeout	Defines session timeout for the RabbitMQ connection.
Real-time Alerts Exchange Name	Defines the RabbitMQ exchange name for real time alert data.

Kafka

Field	Description
Kafka Message Fetch Size Bytes	The number of bytes of messages to attempt to fetch for each topic-partition in each fetch request.
Kafka Producer Type	Defines whether Kafka produces data in async or sync mode.
Kafka Zookeeper Session Timeout(in ms)	The Kafka Zookeeper Connection timeout.
Kafka Producer Serializer Class	The class name of the Kafka producer serializer used.
Kafka Producer Partitioner Class	The class name of the Kafka producer partitioner used.
Kafka Key Serializer Class	The class name of the Kafka producer key serializer used.
Kafka 0.9 Producer Serializer Class	The class name of the Kafka 0.9 producer serializer used.
Kafka 0.9 Producer Partitioner Class	The class name of the Kafka 0.9 producer partitioner used.
Kafka 0.9 Key Serializer Class	The class name of the Kafka 0.9 producer key serializer used.
Kafka Producer Batch Size	The batch size of data produced at Kafka from log agent.
Kafka Producer Topic Metadata Refresh Interval(in ms)	The metadata refresh time taken by Kafka when there is a failure.
Kafka Producer Retry Backoff(in ms)	The amount of time that the Kafka producer waits before refreshing the metadata.
Kafka Producer Message Send Max Retry Count	The number of times the producer will automatically retry a failed send request.
Kafka Producer Request Required Acks	Defines the acknowledgment required before a produce request is considered complete.

Security

Field	Description
Kerberos Sections	Section names in keytab_login.conf for which keytabs must be extracted from pipeline if krb.config.override is set to true.
Hadoop Security Enabled	Set to true if Hadoop in use is secured with Kerberos Authentication.
Kafka Security Enabled	Set to true if Kafka in use is secured with Kerberos Authentication.
Solr Security Enabled	Set to true if Solr in use is secured with Kerberos Authentication.
Keytab login conf file Path	Specify path for keytab_login.conf file.

CloudTrial

Field	Description
Cloud Trial	The flag for Cloud Trial. Possible values are True/False.
Cloud Trial Max Datausage Monitoring Size (in bytes)	The maximum data usage limit for cloud trial.
Cloud Trial Day Data Usage Monitoring Size (in bytes)	The maximum data usage for FTP User.
Cloud Trial Data Usage Monitoring From Time	The time from where to enable the data usage monitoring.
Cloud Trial Workers Limit	The maximum number of workers for FTP user.
FTP Service URL	The URL of FTP service to create the FTP directory for logged in user (required only for cloud trial).
FTP Disk Usage Limit	The disk usage limit for FTP users.
FTP Base Path	The base path for the FTP location.

Monitoring

Field	Description
Enable Monitoring Graphs	Set to True to enable Monitoring and to view monitoring graphs.
QueryServer Monitoring Flag	Defines the flag value (true/false) for enabling the query monitoring.
QueryServer Monitoring Reporters Supported	Defines the comma-separated list of appenders where metrics will be published. Valid values are graphite, console, logger.
QueryServer Metrics Conversion Rate Unit	Specifies the unit of rates for calculating the queryserver metrics.
QueryServer Metrics Duration Rate Unit	Specifies the unit of duration for the queryserver metrics.
QueryServer Metrics Report Duration	Time period after which query server metrics should be published.
Query Retries	Specifies the number of retries to make a query in indexing.
Query Retry Interval (in ms)	Defines query retry interval in milliseconds.
Error Search Scroll Size	Number of records to fetch in each page scroll. Default value is 10.
Error Search Scroll Expiry Time (in secs)	Time after which search results will expire. Default value is 300 seconds.
Index Name Prefix	Prefix to use for error search system index creation. The prefix will be used to evaluate exact index name with partitioning. Default value is sax_error_.
Index number of shards	Number of shards to create in the error search index. Default value is 5.
Index Replication Factor	Number of replica copies to maintain for each index shard. Default value is 0.
Index Scheduler Frequency (in secs)	Interval (in secs) after which scheduler will collect error data and index in index store.
Index Partitioning Duration (in hours)	Time duration after which a new index will be created using partitioning. Default value is 24 hours.
Data Retention Time (in days)	Time duration for retaining old data. Data above this threshold will be deleted by scheduler. Default value is 60 days.
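
The Index Name Prefix and Index Partitioning Duration together determine which index a record lands in. The source does not state how the partition number is derived, so the sketch below assumes one plausible scheme: the number of whole partition windows elapsed since the Unix epoch. The function name `index_name` is hypothetical; only the prefix `sax_error_` and the 24-hour duration come from the table above.

```python
from datetime import datetime, timezone


def index_name(prefix: str, event_time: datetime, partition_hours: int) -> str:
    """Build an index name as prefix + partition number.

    ASSUMPTION: the partition number is the count of whole partition
    windows since the Unix epoch. Gathr's actual formula may differ.
    """
    hours_since_epoch = int(event_time.timestamp()) // 3600
    partition = hours_since_epoch // partition_hours
    return f"{prefix}{partition}"


morning = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)
evening = datetime(2024, 1, 1, 20, 0, tzinfo=timezone.utc)
next_day = datetime(2024, 1, 2, 8, 0, tzinfo=timezone.utc)

# With a 24-hour partition, both events on 1 January share one index,
# while the event on 2 January goes to the next one.
print(index_name("sax_error_", morning, 24))
print(index_name("sax_error_", evening, 24))
print(index_name("sax_error_", next_day, 24))
```

Under this assumption, a shorter partitioning duration produces more, smaller indexes, which makes retention-based deletion finer grained at the cost of more indexes to manage.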

Audit

Field	Description	Default Value
Enable Event Auditing	Defines the value for enabling events auditing in the application.	true
Events Collection Frequency (in secs)	Time interval (in seconds) in which batch of captured events will be processed for indexing.	10
Events Search Scroll size	Number of records to fetch in each page scroll on result table.	100
Events Search Scroll Expiry (in secs)	Time duration (in seconds) for search scroll window to expire.	300
Events Index Name Prefix	Prefix string for the events index name. The prefix is used to evaluate the exact target index name during data partitioning.	sax_audit_
Events Index Number of Shards	Number of shards to create for events index.	5
Events Index Replication Factor	Number of replica copies to maintain for each index shard.	0
Index Partitioning Duration (in hours)	Time duration (in hours) after which a new index will be created for events data. A partition number is calculated based on this property, and this partition number, prefixed with the Events Index Name Prefix value, forms the target index name.	24
Events Retention Time (in days)	Retention time (in days) of data after which it will be auto deleted.	60
Events Indexing Retries	Number of retries to index events data before sending it to a WAL file.	5
Events Indexing Retries Interval (in milliseconds)	Interval (in milliseconds) between subsequent retries.	3000
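
Events Indexing Retries and Events Indexing Retries Interval describe a retry loop with a fallback to a WAL file. A minimal sketch of that pattern follows; it is not Gathr's implementation. The function `index_with_retries`, the in-memory list standing in for the WAL file, and the reading of "retries" as attempts made after the first failed attempt are all assumptions.

```python
import time
from typing import Callable, List


def index_with_retries(index_batch: Callable[[list], None], batch: list, wal: List[list],
                       retries: int = 5, retry_interval_ms: int = 3000,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try to index a batch; after the retries are exhausted, append it to the WAL.

    ASSUMPTION: one initial attempt is followed by up to `retries` retries.
    Returns True if the batch was indexed, False if it went to the WAL.
    """
    attempts_allowed = 1 + retries
    for attempt in range(attempts_allowed):
        try:
            index_batch(batch)
            return True
        except Exception:
            # Wait before the next retry, but not after the final attempt.
            if attempt < attempts_allowed - 1:
                sleep(retry_interval_ms / 1000.0)
    wal.append(batch)
    return False


def always_failing_indexer(batch: list) -> None:
    """Hypothetical indexer that simulates an unavailable index store."""
    raise ConnectionError("index store unavailable")


wal_entries: List[list] = []
# A no-op sleep keeps the demonstration fast; real code would use time.sleep.
indexed = index_with_retries(always_failing_indexer, ["event-1"], wal_entries,
                             sleep=lambda seconds: None)
print(indexed, wal_entries)
```

With the defaults from the table (5 retries, 3000 ms apart), a batch that cannot be indexed would reach the WAL roughly 15 seconds after the first failed attempt under this reading.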

Query Server

Field	Description
QueryServer Monitoring Flag	The flag value (true/false) for enabling the query monitoring.
QueryServer Monitoring Reporters Supported	The comma-separated list of appenders where metrics will be published. Valid values are graphite, console, logger.
QueryServer Metrics Conversion Rate Unit	Specifies the unit of rates for calculating the queryserver metrics.
QueryServer Metrics Duration Rate Unit	Specifies the unit of duration for the queryserver metrics.
QueryServer Metrics Report Duration	Time after which query server metrics should be published.
QueryServer Metrics Report Duration Unit	The units for reporting query server metrics.
Query Retries	The number of retries to make a query in indexing.
Query Retry Interval (in ms)	Defines query retry interval in milliseconds.

Others

Field	Description
Audit Targets	Defines the audit logging implementation to be used in the application. Default is file.
ActiveMQ Connection Timeout(in ms)	Defines the ActiveMQ connection timeout interval in ms.
MQTT Max Retries	Max retries of MQTT server.
MQTT Retry Delay Interval	Retry interval, in milliseconds, for MQTT retry mechanism.
JMS Max Retries	Max retries of JMS server.
JMS Retry Delay Interval	Retry interval, in milliseconds, for JMS retry mechanism.
Metrics Conversion Rate Unit	Specifies the unit of rates for calculating the queryserver metrics.
Metrics Duration Rate Unit	Specifies the unit of duration for the metrics.
Metrics Report Duration	Specifies the duration at interval of which reporting of metrics will be done.
Metrics Report Duration Unit	Specifies the unit of the duration at which queryserver metrics will be reported.
Gathr Default Tenant Token	Token of user for HTTP calls to LogMonitoring for adding/modifying system info.
LogMonitoring Dashboard Interval(in min)	Log monitoring application refresh interval.
Logmonitoring Supervisors Servers	Servers dedicated to run LogMonitoring pipeline.
Export Search Raw Field	Comma separated fields to export LogMonitoring search result.
Elasticsearch Keystore download path prefix	Elasticsearch keystore download path prefix in case of uploading keystore.
Tail Logs Server Port	Port on which the tail command listens for incoming log streams. Default is 9001.
Tail Logs Max Buffer Size	Maximum number of lines that can be stored in the browser. Default is 1000.
sax.datasets.profile.frequency.distribution.count.limit	Defines the number of distinct values to be shown in the frequency distribution graph of a column in a Dataset.
sax.datasets.profile.generator.json.template	Template of the Spark job used to generate the profile of a Dataset, for example common/templates/DatasetProfileGenerator.json.
Pipeline Test Connection Enabled	Select the check-box to enable the email notification when a pipeline component is down.
Pipeline Error Notification Email IDs	Provide comma separated email IDs for pipeline error notification.
Pipeline Test Connection Retry Counts	Provide value for Pipeline Test Connection Retry Counts.
Pipeline Test Connection Retry Interval Limits	Provide value for Pipeline Test Connection Retry Interval Limits.
sax.python.command	Provide the value for sax.python.command.
Load IDW functions on Inspect And Pipeline Run	Select the check-box for loading the IDW functions on Inspect And Pipeline Run.
Impersonation User Editable	Select the check-box for impersonating the User Editable option.
Superuser connections allowed	Select the check-box for allowing the superuser connections functionality.
Metering Retention Period(days)	Provide the value for Metering Retention Period(days).
H2O Auth Enabled	Select the check-box to enable the H2O Auth.
Pipeline History Max Fetch Size	Number of records to be fetched on pipeline history page. The default value is 100.
Pipeline Start Scheduler Timeout(seconds)	Provide value for Pipeline Start Scheduler Timeout(seconds).
Enabled Pipeline Health Checker	Select the check-box to enable the Pipeline Health Checker.
Pipeline Health Checker Frequency(in secs)	Provide value for Pipeline Health Checker Frequency(in secs).
Pipeline Health Checker Notification Frequency(in mins)	Provide value for Pipeline Health Checker Notification Frequency(in mins).
Ignore schedule if pipeline is active	Option to ignore the schedule if the pipeline is active. If this option is set to false and the pipeline is found active at the next schedule, the pipeline is killed and the next schedule is started.
Gathr Log Directory	Provide Gathr Log Directory. Example: Default.
Enabled Config Provider	Select the check-box to enable the Config Provider.
Config Provider Rest Service HTTP connection Timeout	Provide details for Config Provider Rest Service HTTP connection Timeout.
Config Provider Implementation Class	Provide details for Config Provider Implementation Class.
Config Provider Property File Path	Provide details for Config Provider Property File Path. Example: /tmp/configprovider.property
Config Provider Auth Enabled	Select check-box for Config Provider Auth Enabled option.
Config Provider Auth User Name	Provide the Config Provider Auth User Name. Example: admin
Config Provider Auth Password	Provide the Config Provider Auth Password.
User Login Expire Period	Provide user login expire period in hours. -1 indicates no expiration.
Pipeline Test Connection Enabled	Select the check-box to enable the notification when a pipeline component is down.
Scheduler Heartbeat Check Interval	Provide the Scheduler Heartbeat Check Interval.
Maintenance Mode Enabled	Provide true or false value for enabling the email notification in case pipeline component stops working.
Restart Pipeline Kill Wait Time	Provide value for Restart Pipeline Kill Wait Time.
Pipeline Heartbeat Publish Interval	Provide value for Pipeline Heartbeat Publish Interval.
Enable Batch History Count	Provide true or false value for Enable Batch History Count option.
Enable Univocity Parser for delimited data	Select the check-box to enable Univocity Parser for delimited data.
Contextual Logs	A detailed contextual information (e.g. userName, roles, projectName) will be appended in the logs once this option is enabled.
Enable Event Notifier	Check this option to enable event notification based on the provided event notifier type. For example: SNS. Provide the AWS Key Id and AWS Secret Key.
Publish Accumulator Data On Elasticsearch	Check this option to enable Publish Accumulator Data On Elasticsearch.
isConnectionJarPathTenantDependent	Option to allow upload jars according to tenant ID.
Cluster-Mediator ES Logs Index	Details for Cluster-Mediator ES Logs Index.
EMR Services ES Logs Index	Details for EMR Services ES Logs Index.
ES Index Mapping	Details for ES Index Mapping.
External Service Logs Index	Details for External Service Logs Index.
GCP Dataproc Services ES Logs Index	Details for GCP Dataproc Services ES Logs Index.
Inspection ES Logs Index	Details for Inspection ES Logs Index.
Local Pipeline ES Logs Index	Details for Local Pipeline ES Logs Index.
MWAA Services ES Logs Index	Details for MWAA Services ES Logs Index.
Pipeline ES Logs Index	Details for Pipeline ES Logs Index.
Webstudio ES Logs Index	Details for Webstudio ES Logs Index.
Gathr Additional Jars Classpath	Details for Gathr Additional Jars Classpath.
Spark configurations file for FingerPrinting	Path of the config file that contains the spark-submit configurations for the fingerprinting use case.
Check resources before submitting Pipeline	Check available resources on resource manager before submitting Pipeline.
Bulk pipeline submission queue capacity	Provide value for Bulk pipeline submission queue capacity option.
Number of threads for bulk pipeline submit	Provide value for number of threads for bulk pipeline submit.
Bulk pipeline submit monitor thread’s frequency (secs)	Provide details for Bulk pipeline submit monitor thread’s frequency (secs).
Bulk pipeline submission connection timeout (secs)	Provide value for Bulk pipeline submission connection timeout (secs).
Bulk pipeline submit sleep when idle (secs)	Provide value for Bulk pipeline submit sleep when idle (secs).
Store Alerts	Upon enabling this option the alerts will be stored in the database.
DB Notifier purging scheduler duration (mins)	Provide value for DB Notifier purging scheduler duration in minutes. Default value is 720.
DB Notifier alerts type	The types of alerts to store through the DB Notifier. Example: Pipeline Status Alert, Pipeline Stats Alert, Pipeline Error Mode Alert, or * for saving all types of alerts.
DB Notifier max alert day count	The alerts will be deleted from the database based on the value provided (in days).
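
The behaviour of the Ignore schedule if pipeline is active option can be summarised as a small decision function. This is a sketch under assumptions, not Gathr code: the function name `actions_on_schedule` and the action labels are hypothetical, and the behaviour when the option is true (skipping the run) is inferred from the option name rather than stated in the table.

```python
from typing import List


def actions_on_schedule(pipeline_active: bool, ignore_if_active: bool) -> List[str]:
    """Return the actions taken when a schedule fires.

    ASSUMPTION: when the option is true and the pipeline is still active,
    the scheduled run is skipped.
    """
    if not pipeline_active:
        return ["start"]
    if ignore_if_active:
        return []  # the running pipeline is left untouched
    # Option is false: kill the active pipeline, then start the next schedule.
    return ["kill", "start"]


print(actions_on_schedule(pipeline_active=False, ignore_if_active=True))
print(actions_on_schedule(pipeline_active=True, ignore_if_active=True))
print(actions_on_schedule(pipeline_active=True, ignore_if_active=False))
```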

Click the SAVE button to save the configuration details.
