Default

Note: Some of the properties listed here are not applicable to the Multi-Cloud version of Gathr. These properties are marked with **.

All default or shared configuration properties fall under this category. The category is further divided into the sub-categories described below.

Platform

| Field | Description |
| --- | --- |
| Application Logging Level | The logging level to be used for Gathr logs. |
| Gathr HTTPs Enabled | Whether the Gathr application supports the HTTPS protocol. |
| Spark HTTPs Enabled | Whether the Spark server supports the HTTPS protocol. |
| Test Connection Time Out | Timeout for test connection (in ms). |
| Java Temp Directory | The temp directory location. |
| Gathr Reporting Period | Whether to enable the View Data link in the application. |
| View Data Enabled | Whether to enable the View Data link in the application. |
| TraceMessage Compression | The type of compression used on TraceMessage objects emitted from any component. |
| Message Compression | The type of compression used on objects emitted from any component. |
| Enable Gathr Monitoring Flag | Flag that indicates whether monitoring is enabled. |
| CEP Type | The name of the CEP engine used. Currently the only possible value is esper. |
| Enable Esper HA Global | Enables or disables HA. |
| CepHA Wait Interval | The wait interval of the primary CEP task node. |
| Gathr Scheduler Interval | The time interval, in seconds, of the topology stopped alert scheduler. |
| Enable Gathr Scheduler | Flag to enable or disable the topology stopped alert. |
| Gathr Session Timeout | The timeout for a login session in Gathr. |
| Enable dashboard | Defines whether the dashboard is enabled or disabled. |
| Enable Log Agent | Defines whether the Agent Configuration option is visible in the Gathr GUI. |
| Enable Storm Error Search | Enables showing the pipeline Application Errors tab using the LogMonitoring search page. |
| Gathr Pipeline Error Search Tenant Token | Tenant token for Pipeline Error Search. |
| Gathr Storm Error Search Index Expression | Pipeline application error index expression (a time-based JS expression used to create indexes in ES or Solr, and also used during retrieval). |
| Kafka Spout Connection Retry Sleep Time | Time between consecutive Kafka spout connection retries. |
| Cluster Manager Home URL | The URL of the Gathr Cluster Manager. |
| Gathr Pipeline Log Location | The Gathr pipeline log location. |
| HDFS Location for Pipeline Jars | The HDFS location for pipeline JARs. |
| Scheduler Table Prefix | Prefix of the names of the tables that store the scheduler’s state. |
| Scheduler Thread Pool Class | Class used to implement the thread pool for the scheduler. |
| Scheduler Thread Pool Thread Count | The number of threads available for concurrent execution of jobs. This count can be any positive integer, although only numbers between 1 and 100 are practical. If only a few jobs run a few times a day, 1 thread is plenty. If there are many jobs, most of them running every minute, a thread count like 50 or 100 is probably appropriate (this depends on the nature of the jobs and the available resources). |
| Scheduler Datasource Max Connections | The maximum number of connections that the scheduler datasource can create in its pool of connections. |
| Scheduler Misfire Threshold Time | The number of milliseconds by which a trigger may pass its next-fire-time before it is considered misfired. |
| HDP Version | Version of the HDP ecosystem. |
| CDH Version | Version of the CDH ecosystem. |
| Audit Targets | Defines the audit logging implementation to be used in the application. Default is file. |
| Enable Audit | Defines the value (true/false) for enabling audit in the application. |
| Persistence Encryption Key | Specifies the encryption key used to encrypt data in persistence. |
| Ambari HTTPs Enabled | Whether the Ambari server supports the HTTPS protocol. |
| Graphite HTTPs Enabled | Whether the Graphite server supports the HTTPS protocol. |
| Elastic Search HTTPs Enabled | Whether the Elasticsearch engine supports the HTTPS protocol. |
| SQL Query Execution Log File Path | File location for logging Gathr SQL query execution statistics. |
| SQL Query Execution Threshold Time (in ms) | The maximum execution time, in ms, for SQL queries. Queries that exceed it are logged as events. |
| Lineage Persistence Store | The data store used by the data lineage feature. |
| Aspectjweaver jar location | The absolute path of the aspectjweaver JAR required for inspect pipeline or data lineage. |
| Is Apache Environment | Default value is false. Set it to true for all Apache environments. |
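
To make the scheduler settings above more concrete, here is a minimal Python sketch of how the Scheduler Thread Pool Thread Count guidance and the Scheduler Misfire Threshold Time could be interpreted. It is not Gathr's implementation: the function names and the sample values (such as the 60000 ms threshold) are hypothetical and exist only for illustration.

```python
def is_practical_thread_count(thread_count: int) -> bool:
    """Check the guidance above: any positive integer is valid,
    but only values between 1 and 100 are considered practical."""
    if thread_count <= 0:
        raise ValueError("thread count must be a positive integer")
    return 1 <= thread_count <= 100


def is_misfired(next_fire_time_ms: int, now_ms: int, misfire_threshold_ms: int) -> bool:
    """Return True when a trigger has passed its next-fire-time
    by more than the tolerated threshold (all values in milliseconds)."""
    lateness_ms = now_ms - next_fire_time_ms
    return lateness_ms > misfire_threshold_ms


# Hypothetical values: a trigger due at t=1000 ms, a 60000 ms threshold.
late_within_tolerance = is_misfired(1_000, 31_000, 60_000)   # 30 s late
late_beyond_tolerance = is_misfired(1_000, 61_500, 60_000)   # 60.5 s late
```

In this sketch a trigger that is 30 seconds late is still tolerated, while one that is 60.5 seconds late counts as misfired.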

Zookeeper

| Field | Description |
| --- | --- |
| Zookeeper Retry Count | Zookeeper connection retry count. |
| Zookeeper Retry Delay Interval | Defines the retry interval for the Zookeeper connection. |
| Zookeeper Session Timeout | Zookeeper’s session timeout time. |

Spark

| Field | Description |
| --- | --- |
| Model Registration Validation Timeout(in seconds) | The time, in seconds, after which the MLlib, ML or H2O model registration and validation process is marked as failed if it has not completed. |
| Spark Fetch Schema Timeout(in seconds) | The time, in seconds, after which the fetch schema process of register table is marked as failed if it has not completed. |
| Spark Failover Scheduler Period(in ms) | Regular interval at which scheduler tasks run. Only applicable for testing connections of Data Sources in a running pipeline. |
| Spark Failover Scheduler Delay(in ms) | Delay after which a scheduler task can run once it is ready. Only applicable for testing connections of Data Sources in a running pipeline. |
| Refresh Superuser Pipelines and Connections | Whether to refresh Superuser Pipelines and Default Connections in the database when web studio restarts. |
| Gathr SparkErrorSearchPipeline Index Expression ** | Pipeline application error index expression (a time-based JS expression used to create indexes in ES or Solr, and also used during retrieval). |
| Enable Spark Error Search ** | Enables indexing and searching of Spark pipeline errors in LogMonitoring. |
| Register Model Minimum Memory | Minimum memory required for web studio to register tables, MLlib, ML or H2O models. Example: -Xms512m. |
| Register Model Maximum Memory | Maximum memory required for web studio to register tables, MLlib, ML or H2O models. Example: -Xmx2048m. |
| H2O Jar Location | Directory on the local file system where the H2O model JAR is placed after model registration. |
| H2O Model HDFS Jar Location | HDFS path where the H2O model JAR is placed after model registration. |
| Spark Monitoring Scheduler Delay(in ms) ** | Specifies the Spark monitoring scheduler delay in milliseconds. |
| Spark Monitoring Scheduler Period(in ms) ** | Specifies the Spark monitoring scheduler period in milliseconds. |
| Spark Monitoring Enable ** | Flag to enable Spark monitoring. |
| Spark Executor Java Agent Config | Spark Executor Java Agent configuration used to monitor the executor process. The command includes the JAR path, the configuration file path and the name of the process. |
| Spark JVM Monitoring Enable ** | Flag to enable Spark JVM monitoring. |
| ES query monitoring index name | The ES query monitoring index name, required for indexing the data of query streaming. |
| Scheduler period for es monitoring purging | Scheduler period, in seconds, for ES monitoring purging. |
| Rotation policy of ES monitoring graph | The rotation policy for index creation for the ES monitoring graph (daily for a period of one day, weekly for 7 days). |
| Purging duration of ES monitoring index | Purge duration, in seconds, for the ES monitoring graph index. Indexes created before this duration are deleted. |
| Enable purging scheduler for ES Graph monitoring | Select the check-box to enable the purging scheduler for ES Graph monitoring. |
| Spark Version ** | By default the version is set to 2.3. Note: Set the Spark version to 2.2 for HDP 2.6.3. |
| Livy Supported JARs Location ** | HDFS location where the Livy-related JAR file and the application streaming JAR file are kept. |
| Livy Session Driver Memory ** | Minimum memory allocated to the driver when creating a Livy session. |
| Livy Session Driver Vcores ** | Minimum virtual cores allocated to the driver when creating a Livy session. |
| Livy Session Executor Memory ** | Minimum memory allocated to the executor when creating a Livy session. |
| Livy Session Executor Vcores ** | Minimum virtual cores allocated to the executor when creating a Livy session. |
| Livy Session Executor Instances ** | Minimum number of executor instances allocated when creating a Livy session. |
| Livy Custom Jar HDFS Path ** | The fully qualified HDFS path where the uploaded custom JAR is kept when creating a pipeline. |
| Livy Data Fetch Timeout ** | The query time interval, in seconds, for fetching data during data inspection. |
| isMonitoringGraphsEnabled | Whether monitoring graphs are enabled. |
| ES query monitoring index name | The monitoring data is stored in this index of the default ES connection. |
| Scheduler period for ES monitoring purging | The interval, in seconds, at which the purging scheduler runs and checks whether the above index is eligible for purging. Requires a Tomcat restart. |
| Rotation policy of ES monitoring graph | Accepts two values, daily or weekly. With daily, the index is rotated every day, so each index holds a single day of data. With weekly, each index holds a week of data. |
| Purging duration of ES monitoring index | The duration after which the index is deleted. The default is 604800 seconds, which means the index is deleted after 1 week. Requires a Tomcat restart. |
| Enable purging scheduler for ES Graph monitoring | Determines whether index purging takes place. Purging does not take place if the flag is disabled. Requires a restart of the Tomcat server. |
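
The purging behaviour described above (indexes created before the purge duration are deleted, with a default of 604800 seconds) can be sketched as a simple eligibility check. This is an illustrative Python sketch only: the function name and the sample dates are hypothetical, and Gathr's actual purging scheduler may apply additional rules.

```python
from datetime import datetime, timedelta, timezone

# Default purge duration from the description above: 604800 seconds = 7 days.
DEFAULT_PURGE_DURATION_SECONDS = 604800


def is_eligible_for_purge(index_created_at: datetime,
                          now: datetime,
                          purge_duration_seconds: int = DEFAULT_PURGE_DURATION_SECONDS,
                          purging_enabled: bool = True) -> bool:
    """Return True when purging is enabled and the index is older
    than the configured purge duration."""
    if not purging_enabled:
        # Purging does not take place when the flag is disabled.
        return False
    return now - index_created_at > timedelta(seconds=purge_duration_seconds)


# Hypothetical timestamps for illustration.
now = datetime(2024, 1, 15, tzinfo=timezone.utc)
eight_days_old = datetime(2024, 1, 7, tzinfo=timezone.utc)
three_days_old = datetime(2024, 1, 12, tzinfo=timezone.utc)

purge_old = is_eligible_for_purge(eight_days_old, now)      # older than 1 week
purge_recent = is_eligible_for_purge(three_days_old, now)   # within 1 week
```

With the default duration, the eight-day-old index is eligible for purging and the three-day-old index is not.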

RabbitMQ

| Field | Description |
| --- | --- |
| RabbitMQ Max Retries | Defines the maximum number of retries for the RabbitMQ connection. |
| RabbitMQ Retry Delay Interval | Defines the retry delay interval for the RabbitMQ connection. |
| RabbitMQ Session Timeout | Defines the session timeout for the RabbitMQ connection. |
| Real-time Alerts Exchange Name | Defines the RabbitMQ exchange name for real-time alert data. |

Kafka

| Field | Description |
| --- | --- |
| Kafka Message Fetch Size Bytes | The number of bytes of messages to attempt to fetch for each topic-partition in each fetch request. |
| Kafka Producer Type | Defines whether Kafka produces data in async or sync mode. |
| Kafka Zookeeper Session Timeout(in ms) | The Kafka Zookeeper connection timeout. |
| Kafka Producer Serializer Class | The class name of the Kafka producer serializer used. |
| Kafka Producer Partitioner Class | The class name of the Kafka producer partitioner used. |
| Kafka Key Serializer Class | The class name of the Kafka producer key serializer used. |
| Kafka 0.9 Producer Serializer Class | The class name of the Kafka 0.9 producer serializer used. |
| Kafka 0.9 Producer Partitioner Class | The class name of the Kafka 0.9 producer partitioner used. |
| Kafka 0.9 Key Serializer Class | The class name of the Kafka 0.9 producer key serializer used. |
| Kafka Producer Batch Size | The batch size of data produced to Kafka from the log agent. |
| Kafka Producer Topic Metadata Refresh Interval(in ms) | The interval at which Kafka refreshes topic metadata when there is a failure. |
| Kafka Producer Retry Backoff(in ms) | The amount of time that the Kafka producer waits before refreshing the metadata. |
| Kafka Producer Message Send Max Retry Count | The number of times the producer automatically retries a failed send request. |
| Kafka Producer Request Required Acks | The acknowledgment level at which a produce request is considered completed. |

Security

| Field | Description |
| --- | --- |
| Kerberos Sections | Section names in keytab_login.conf for which keytabs must be extracted from pipeline if krb.config.override is set to true. |
| Hadoop Security Enabled | Set to true if Hadoop in use is secured with Kerberos Authentication. |
| Kafka Security Enabled | Set to true if Kafka in use is secured with Kerberos Authentication. |
| Solr Security Enabled | Set to true if Solr in use is secured with Kerberos Authentication. |
| Keytab login conf file Path | Specify path for keytab_login.conf file. |

CloudTrial

| Field | Description |
| --- | --- |
| Cloud Trial | The flag for Cloud Trial. Possible values are True/False. |
| Cloud Trial Max Datausage Monitoring Size (in bytes) | The maximum data usage limit for cloud trial. |
| Cloud Trial Day Data Usage Monitoring Size (in bytes) | The maximum data usage for FTP User. |
| Cloud Trial Data Usage Monitoring From Time | The time from where to enable the data usage monitoring. |
| Cloud Trial Workers Limit | The maximum number of workers for FTP user. |
| FTP Service URL | The URL of FTP service to create the FTP directory for logged in user (required only for cloud trial). |
| FTP Disk Usage Limit | The disk usage limit for FTP users. |
| FTP Base Path | The base path for the FTP location. |

Monitoring

| Field | Description |
| --- | --- |
| Enable Monitoring Graphs | Set to True to enable monitoring and to view monitoring graphs. |
| QueryServer Monitoring Flag | Defines the flag value (true/false) for enabling query monitoring. |
| QueryServer Monitoring Reporters Supported | Defines the comma-separated list of appenders where metrics will be published. Valid values are graphite, console, logger. |
| QueryServer Metrics Conversion Rate Unit | Specifies the unit of rates for calculating the queryserver metrics. |
| QueryServer Metrics Duration Rate Unit | Specifies the unit of duration for the queryserver metrics. |
| QueryServer Metrics Report Duration | Time period after which query server metrics should be published. |
| Query Retries | Specifies the number of retries to make a query in indexing. |
| Query Retry Interval (in ms) | Defines the query retry interval in milliseconds. |
| Error Search Scroll Size | Number of records to fetch in each page scroll. Default value is 10. |
| Error Search Scroll Expiry Time (in secs) | Time after which search results expire. Default value is 300 seconds. |
| Index Name Prefix | Prefix to use for error search system index creation. The prefix is used to evaluate the exact index name with partitioning. Default value is sax_error_. |
| Index number of shards | Number of shards to create in the error search index. Default value is 5. |
| Index Replication Factor | Number of replica copies to maintain for each index shard. Default value is 0. |
| Index Scheduler Frequency (in secs) | Interval (in secs) after which the scheduler collects error data and indexes it in the index store. |
| Index Partitioning Duration (in hours) | Time duration after which a new index is created using partitioning. Default value is 24 hours. |
| Data Retention Time (in days) | Time duration for retaining old data. Data older than this threshold is deleted by the scheduler. Default value is 60 days. |

Audit

| Field | Description | Default Value |
| --- | --- | --- |
| Enable Event Auditing | Defines the value for enabling events auditing in the application. | true |
| Events Collection Frequency (in secs) | Time interval (in seconds) in which batch of captured events will be processed for indexing. | 10 |
| Events Search Scroll size | Number of records to fetch in each page scroll on result table. | 100 |
| Events Search Scroll Expiry (in secs) | Time duration (in seconds) for search scroll window to expire. | 300 |
| Events Index Name Prefix | Prefix string for events index name. The prefix will be used to evaluate exact target index name while data partitioning process. | sax_audit_ |
| Events Index Number of Shards | Number of shards to create for events index. | 5 |
| Events Index Replication Factor | Number of replica copies to maintain for each index shard. | 0 |
| Index Partitioning Duration (in hours) | Time duration (in hours) after which a new index will be created for events data. A partition number will be calculated based on this property. This calculated partition number prefixed with Events Index Name Prefix value will make target index name. | 24 |
| Events Retention Time (in days) | Retention time (in days) of data after which it will be auto deleted. | 60 |
| Events Indexing Retries | Number of retries to index events data before sending it to a WAL file. | 5 |
| Events Indexing Retries Interval (in milliseconds) | It defines the retries interval (in milliseconds) to perform subsequent retries. | 3000 |
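
The description of Index Partitioning Duration states that a partition number is calculated from the duration and then prefixed with the Events Index Name Prefix to form the target index name. The exact formula is not documented here, so the Python sketch below makes a loud assumption: the partition number is taken to be the event's Unix epoch time divided by the partitioning duration. The function name is hypothetical, and the index names Gathr actually generates may differ.

```python
SECONDS_PER_HOUR = 3600


def target_index_name(event_epoch_seconds: int,
                      index_name_prefix: str = "sax_audit_",
                      partitioning_duration_hours: int = 24) -> str:
    """Build a target index name as prefix + partition number.

    ASSUMPTION: the partition number is the count of whole partitioning
    windows since the Unix epoch. The source only says the number is
    'calculated based on this property'."""
    if partitioning_duration_hours <= 0:
        raise ValueError("partitioning duration must be positive")
    window_seconds = partitioning_duration_hours * SECONDS_PER_HOUR
    partition_number = event_epoch_seconds // window_seconds
    return f"{index_name_prefix}{partition_number}"


# With the default 24-hour duration, events from the same day
# (under this assumed formula) land in the same index.
first = target_index_name(0)        # start of the first window
same_day = target_index_name(86_399)  # last second of the first window
next_day = target_index_name(86_400)  # first second of the next window
```

Under this assumed formula, all events within one 24-hour window share an index, and the next window starts a new one.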

Query Server

| Field | Description |
| --- | --- |
| QueryServer Monitoring Flag | The flag value (true/false) for enabling the query monitoring. |
| QueryServer Monitoring Reporters Supported | The comma-separated list of appenders where metrics will be published. Valid values are graphite, console, logger. |
| QueryServer Metrics Conversion Rate Unit | Specifies the unit of rates for calculating the queryserver metrics. |
| QueryServer Metrics Duration Rate Unit | Specifies the unit of duration for the queryserver metrics. |
| QueryServer Metrics Report Duration | Time after which query server metrics should be published. |
| QueryServer Metrics Report Duration Unit | The units for reporting query server metrics. |
| Query Retries | The number of retries to make a query in indexing. |
| Query Retry Interval (in ms) | Defines query retry interval in milliseconds. |
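
Query Retries and Query Retry Interval (in ms) together describe a fixed-interval retry policy. The following Python sketch shows one plausible reading of such a policy. It is not Gathr's code: the function name is hypothetical, and it assumes that "retries" means additional attempts after the first failure.

```python
import time


def run_with_retries(operation, retries: int, retry_interval_ms: int, sleep=time.sleep):
    """Call operation(), retrying up to `retries` more times on failure
    and waiting `retry_interval_ms` milliseconds between attempts.

    ASSUMPTION: `retries` counts attempts made after the initial one.
    The `sleep` parameter is injectable so the wait can be stubbed out."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return operation()
        except Exception as error:  # broad on purpose for this sketch
            last_error = error
            if attempt < retries:
                sleep(retry_interval_ms / 1000.0)
    raise last_error
```

A caller would pass the query as a zero-argument callable, for example `run_with_retries(lambda: run_query(), retries=3, retry_interval_ms=500)`, where `run_query` stands in for whatever performs the actual query.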

Others

| Field | Description |
| --- | --- |
| Audit Targets | Defines the audit logging implementation to be used in the application. Default is file. |
| ActiveMQ Connection Timeout(in ms) | Defines the ActiveMQ connection timeout interval in ms. |
| MQTT Max Retries | Maximum number of retries for the MQTT server. |
| MQTT Retry Delay Interval | Retry interval, in milliseconds, for the MQTT retry mechanism. |
| JMS Max Retries | Maximum number of retries for the JMS server. |
| JMS Retry Delay Interval | Retry interval, in milliseconds, for the JMS retry mechanism. |
| Metrics Conversion Rate Unit | Specifies the unit of rates for calculating the queryserver metrics. |
| Metrics Duration Rate Unit | Specifies the unit of duration for the metrics. |
| Metrics Report Duration | Specifies the interval at which metrics are reported. |
| Metrics Report Duration Unit | Specifies the unit of the duration at which queryserver metrics are reported. |
| Gathr Default Tenant Token | Token of the user for HTTP calls to LogMonitoring for adding or modifying system info. |
| LogMonitoring Dashboard Interval(in min) | Log monitoring application refresh interval. |
| Logmonitoring Supervisors Servers | Servers dedicated to running the LogMonitoring pipeline. |
| Export Search Raw Field | Comma-separated fields to export in the LogMonitoring search result. |
| Elasticsearch Keystore download path prefix | Elasticsearch keystore download path prefix, used when uploading a keystore. |
| Tail Logs Server Port | Port on which the tail command listens for incoming log streams. Default is 9001. |
| Tail Logs Max Buffer Size | Maximum number of lines that can be stored in the browser. Default is 1000. |
| sax.datasets.profile.frequency.distribution.count.limit | Defines the number of distinct values to be shown in the frequency distribution graph of a column in a Dataset. |
| sax.datasets.profile.generator.json.template | Template of the Spark job used to generate the profile of a Dataset: common/templates/DatasetProfileGenerator.json. |
| Pipeline Test Connection Enabled | Select the check-box to enable email notification when a pipeline component is down. |
| Pipeline Error Notification Email IDs | Provide comma-separated email IDs for pipeline error notification. |
| Pipeline Test Connection Retry Counts | Provide value for Pipeline Test Connection Retry Counts. |
| Pipeline Test Connection Retry Interval Limits | Provide value for Pipeline Test Connection Retry Interval Limits. |
| sax.python.command | Provide the value for sax.python.command. |
| Load IDW functions on Inspect And Pipeline Run | Select the check-box to load the IDW functions on Inspect And Pipeline Run. |
| Impersonation User Editable | Select the check-box for impersonating the User Editable option. |
| Superuser connections allowed | Select the check-box to allow the superuser connections functionality. |
| Metering Retention Period(days) | Provide the value for Metering Retention Period(days). |
| H2O Auth Enabled | Select the check-box to enable H2O Auth. |
| Pipeline History Max Fetch Size | Number of records to be fetched on the pipeline history page. The default value is 100. |
| Pipeline Start Scheduler Timeout(seconds) | Provide value for Pipeline Start Scheduler Timeout(seconds). |
| Enabled Pipeline Health Checker | Select the check-box to enable the Pipeline Health Checker. |
| Pipeline Health Checker Frequency(in secs) | Provide value for Pipeline Health Checker Frequency(in secs). |
| Pipeline Health Checker Notification Frequency(in mins) | Provide value for Pipeline Health Checker Notification Frequency(in mins). |
| Ignore schedule if pipeline is active | Option to ignore the schedule if the pipeline is active. If this option is set to false and the pipeline is found active at the next schedule, the pipeline is killed and the next schedule is started. |
| Gathr Log Directory | Provide the Gathr Log Directory. Example: Default. |
| Enabled Config Provider | Select the check-box to enable the Config Provider. |
| Config Provider Rest Service HTTP connection Timeout | Provide details for Config Provider Rest Service HTTP connection Timeout. |
| Config provider Implementation Class | Provide details for Config provider Implementation Class. |
| Config Provider Property File Path | Provide details for Config Provider Property File Path. Example: /tmp/configprovider.property |
| Config Provider Auth Enabled | Select the check-box for the Config Provider Auth Enabled option. |
| Config Provider Auth User Name | Provide the Config Provider Auth User Name. Example: admin |
| Config Provider Auth Password | Provide the Config Provider Auth Password. |
| User Login Expire Period | Provide the user login expire period in hours. -1 indicates no expiration. |
| Pipeline Test Connection Enabled | Select the check-box to enable the notification when a pipeline component is down. |
| Scheduler Heartbeat Check Interval | Provide the Scheduler Heartbeat Check Interval. |
| Maintenance Mode Enabled | Provide true or false value for enabling the email notification in case a pipeline component stops working. |
| Restart Pipeline Kill Wait Time | Provide value for Restart Pipeline Kill Wait Time. |
| Pipeline Heartbeat Publish Interval | Provide value for Pipeline Heartbeat Publish Interval. |
| Enable Batch History Count | Provide true or false value for the Enable Batch History Count option. |
| Enable Univocity Parser for delimited data | Select the check-box to enable the Univocity Parser for delimited data. |
| Contextual Logs | Detailed contextual information (e.g. userName, roles, projectName) is appended to the logs once this option is enabled. |
| Enable Event Notifier | Check this option to enable event notification based on the provided event notifier type. For example: SNS. Provide the AWS Key Id and AWS Secret Key. |
| Publish Accumulator Data On Elasticsearch | Check this option to enable Publish Accumulator Data On Elasticsearch. |
| isConnectionJarPathTenantDependent | Option to allow uploading JARs according to tenant ID. |
| Cluster-Mediator ES Logs Index | Details for Cluster-Mediator ES Logs Index. |
| EMR Services ES Logs Index | Details for EMR Services ES Logs Index. |
| ES Index Mapping | Details for ES Index Mapping. |
| External Service Logs Index | Details for External Service Logs Index. |
| GCP Dataproc Services ES Logs Index | Details for GCP Dataproc Services ES Logs Index. |
| Inspection ES Logs Index | Details for Inspection ES Logs Index. |
| Local Pipeline ES Logs Index | Details for Local Pipeline ES Logs Index. |
| MWAA Services ES Logs Index | Details for MWAA Services ES Logs Index. |
| Pipeline ES Logs Index | Details for Pipeline ES Logs Index. |
| Webstudio ES Logs Index | Details for Webstudio ES Logs Index. |
| Gathr Additional Jars Classpath | Details for Gathr Additional Jars Classpath. |
| Spark configurations file for FingerPrinting | Path of the config file that contains the spark-submit configurations for the fingerprinting use case. |
| Check resources before submitting Pipeline | Check available resources on the resource manager before submitting a pipeline. |
| Bulk pipeline submission queue capacity | Provide value for the Bulk pipeline submission queue capacity option. |
| Number of threads for bulk pipeline submit | Provide value for number of threads for bulk pipeline submit. |
| Bulk pipeline submit monitor thread’s frequency (secs) | Provide details for Bulk pipeline submit monitor thread’s frequency (secs). |
| Bulk pipeline submission connection timeout (secs) | Provide value for Bulk pipeline submission connection timeout (secs). |
| Bulk pipeline submit sleep when idle (secs) | Provide value for Bulk pipeline submit sleep when idle (secs). |
| Store Alerts | When this option is enabled, alerts are stored in the database. |
| DB Notifier purging scheduler duration (mins) | Provide value for the DB Notifier purging scheduler duration in minutes. Default value is 720. |
| DB Notifier alerts type | Option to provide the DB Notifier alert type. Example: Pipeline Status Alert, Pipeline Stats Alert, Pipeline Error Mode alert, or * for saving all types of alerts. |
| DB Notifier max alert day count | Alerts are deleted from the database based on the value provided (in days). |

Click the SAVE button to save the configuration details.
