Default
Note: Some of the properties listed here are not applicable to the Multi-Cloud version of Gathr. These properties are marked with **.
All default or shared configuration properties fall under this category, which is further divided into several sub-categories.
Platform
Field | Description |
---|---|
Application Logging Level | The logging level to be used for gathr logs. |
Gathr HTTPs Enabled | Whether the gathr application supports the HTTPs protocol. |
Spark HTTPs Enabled | Whether the Spark server supports the HTTPs protocol. |
Test Connection Time Out | Timeout for test connection (in ms). |
Java Temp Directory | The temp directory location. |
Gathr Reporting Period | Whether to enable View Data link in application or not. |
View Data Enabled | Whether to enable View Data link in application or not. |
TraceMessage Compression | The type of compression used on emitted TraceMessage from any component. |
Message Compression | The type of compression used on emitted object from any component. |
Enable Gathr Monitoring Flag | Flag to tell if monitoring is enabled or not. |
CEP Type | Defines the name of the CEP engine used. The only possible value currently is esper. |
Enable Esper HA Global | To enable or disable HA. |
CepHA Wait Interval | The wait interval of primary CEP task node. |
Gathr Scheduler Interval | The topology stopped alert scheduler’s time interval in seconds. |
Enable Gathr Scheduler | Flag to enable or disable the topology stopped alert. |
Gathr Session Timeout | The timeout for a login session in gathr. |
Enable dashboard | Defines whether the dashboard is enabled or disabled. |
Enable Log Agent | Defines if Agent Configuration option should be visible on gathr GUI or not. |
Enable Storm Error Search | Enable showing pipeline Application Errors tab using LogMonitoring search page. |
Gathr Pipeline Error Search Tenant Token | Tenant token for Pipeline Error Search. |
Gathr Storm Error Search Index Expression | Pipeline application error index expression (a time-based JS expression used to create indexes in ES or Solr, and also used during retrieval). |
Kafka Spout Connection Retry Sleep Time | Time between consecutive Kafka spout connection retries. |
Cluster Manager Home URL | The URL of gathr Cluster Manager. |
Gathr Pipeline Log Location | gathr Pipeline Log Location. |
HDFS Location for Pipeline Jars | HDFS Location for Pipeline Jars. |
Scheduler Table Prefix | Prefix for the names of the tables that store the scheduler’s state. |
Scheduler Thread Pool Class | Class used to implement thread pool for the scheduler. |
Scheduler Thread Pool Thread Count | Number of threads available for concurrent execution of jobs. This can be any positive integer, although only values between 1 and 100 are practical. If only a few jobs run a few times a day, 1 thread is plenty. If there are many jobs, with most of them running every minute, a thread count such as 50 or 100 is more appropriate (depending on the nature of the jobs and the available resources). |
Scheduler Datasource Max Connections | The maximum number of connections that the scheduler datasource can create in its pool of connections. |
Scheduler Misfire Threshold Time | The number of milliseconds by which a trigger may pass its next-fire-time before it is considered misfired. |
HDP Version | Version of HDP ecosystem. |
CDH Version | Version of CDH ecosystem. |
Audit Targets | Defines the audit logging implementation to be used in the application. Default is file. |
Enable Audit | Defines the value (true/false) for enabling audit in application. |
Persistence Encryption Key | Specifies the encryption key used to encrypt data in persistence. |
Ambari HTTPs Enabled | Whether the Ambari server supports the HTTPs protocol. |
Graphite HTTPs Enabled | Whether the Graphite server supports the HTTPs protocol. |
Elastic Search HTTPs Enabled | Whether the Elasticsearch engine supports the HTTPs protocol. |
SQL Query Execution Log File Path | File location for logging gathr SQL query execution statistics. |
SQL Query Execution Threshold Time (in ms) | Defines the execution time limit (in ms) for SQL queries; a query that exceeds this limit is logged. |
Lineage Persistence Store | The data store that will be used by data lineage feature. |
Aspectjweaver jar location | The absolute path of the aspectjweaver jar required for inspecting pipelines or for data lineage. |
Is Apache Environment | Default value is false. Set it to “true” for all Apache environments. |
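The Scheduler Misfire Threshold Time row can be read as a simple comparison between how late a trigger is and the configured tolerance. The following is a minimal sketch of that reading, not Gathr's implementation; the function name `is_misfired` and its parameters are hypothetical and exist only for illustration.

```python
def is_misfired(next_fire_time_ms: int, now_ms: int, misfire_threshold_ms: int) -> bool:
    """Return True when a trigger is later than the tolerated threshold.

    ASSUMPTION: times are epoch milliseconds and the threshold is the
    configured Scheduler Misfire Threshold Time in milliseconds.
    """
    lateness_ms = now_ms - next_fire_time_ms
    return lateness_ms > misfire_threshold_ms


# A trigger due at 1,000 ms and checked at 61,500 ms is 60,500 ms late,
# which exceeds a 60,000 ms threshold.
print(is_misfired(1_000, 61_500, 60_000))
# The same trigger checked at 30,000 ms is only 29,000 ms late.
print(is_misfired(1_000, 30_000, 60_000))
```

A larger threshold makes the scheduler more tolerant of late triggers; a smaller one flags them sooner.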
Zookeeper
Field | Description |
---|---|
Zookeeper Retry Count | Zookeeper connection retry count. |
Zookeeper Retry Delay Interval | Defines the retry interval for the zookeeper connection. |
Zookeeper Session Timeout | Zookeeper’s session timeout time. |
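A retry count and a retry delay interval usually work together as shown in the sketch below. It assumes a fixed delay between attempts and that the retry count does not include the first attempt; these semantics, and the names `connect_with_retries`, `connect` and `sleep`, are illustrative assumptions rather than Gathr's or ZooKeeper's actual client code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retries(
    connect: Callable[[], T],
    retry_count: int,
    retry_delay_ms: int,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect(); on ConnectionError retry up to retry_count times."""
    if retry_count < 0:
        raise ValueError("retry_count must be >= 0")
    attempts = retry_count + 1  # first attempt plus the configured retries
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # retries exhausted, surface the last error
            sleep(retry_delay_ms / 1000.0)  # fixed delay between attempts
    raise AssertionError("unreachable")


# Hypothetical connection that succeeds on the third attempt.
calls = {"n": 0}


def flaky_connect() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("not ready")
    return "connected"


print(connect_with_retries(flaky_connect, retry_count=5, retry_delay_ms=10))
```

The `sleep` parameter is injectable so the delay can be replaced in tests without actually waiting.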
Spark
Field | Description |
---|---|
Model Registration Validation Timeout(in seconds) | The time, in seconds, after which the MLlib, ML or H2O model registration and validation process fails if it has not completed. |
Spark Fetch Schema Timeout(in seconds) | The time, in seconds, after which the fetch schema process of register table fails if it has not completed. |
Spark Failover Scheduler Period(in ms) | Regular intervals to run scheduler tasks. Only applicable for testing connection of Data Sources in running pipeline. |
Spark Failover Scheduler Delay(in ms) | Delay after which a scheduler task can run once it is ready. Only applicable for testing connection of Data Sources in running pipeline. |
Refresh Superuser Pipelines and Connections | Whether to refresh Superuser Pipelines and Default Connections in the database when web studio restarts. |
Gathr SparkErrorSearchPipeline Index Expression ** | Pipeline application error index expression (time based js expression to create indexes in ES or Solr, that is used during retrieval). |
Enable Spark Error Search ** | Enable to index and search Spark pipeline errors in LogMonitoring. |
Register Model Minimum Memory | Minimum memory required for web studio to register tables, MLlib, ML or H2O models. Example -Xms512m. |
Register Model Maximum Memory | Maximum memory required for web studio to register tables, MLlib, ML or H2O models. Example -Xmx2048m. |
H2O Jar Location | Local file system’s directory location at which H2O model jar will be placed after model registration. |
H2O Model HDFS Jar Location | HDFS path location at which H2O model jar will be placed after model registration. |
Spark Monitoring Scheduler Delay(in ms) ** | Specifies the Spark monitoring scheduler delay in milliseconds. |
Spark Monitoring Scheduler Period(in ms) ** | Specifies the Spark monitoring scheduler period in milliseconds. |
Spark Monitoring Enable ** | Specifies the flag to enable the spark monitoring. |
Spark Executor Java Agent Config | Spark Executor Java Agent configuration to monitor executor process, the command includes jar path, configuration file path and Name of the process. |
Spark JVM Monitoring Enable ** | Specifies the flag to enable the Spark JVM monitoring. |
ES query monitoring index name | Provide the ES query monitoring index name which is required for indexing the data of query streaming. |
Scheduler period for es monitoring purging | Scheduler period for es monitoring purging in seconds. |
Rotation policy of ES monitoring graph | Specify the rotation policy for index creation for ES monitoring graph (daily for a period of one day and weekly for 7 days). |
Purging duration of ES monitoring index | Purge duration for ES in seconds for es monitoring graph index. Index created before this duration will be deleted. |
Enable purging scheduler for ES Graph monitoring | Check the checkbox to enable purging scheduler for ES Graph monitoring. |
Spark Version ** | By default the version is set to 2.3. Note: Set Spark version to 2.2 for HDP 2.6.3. |
Livy Supported JARs Location ** | HDFS location where livy related jar file and application streaming jar file have been kept. |
Livy Session Driver Memory ** | Minimum memory that will be allocated to driver while creating livy session. |
Livy Session Driver Vcores ** | Minimum virtual cores that will be allocated to driver while creating Livy session. |
Livy Session Executor Memory ** | Minimum memory that will be allocated to executor while creating Livy session. |
Livy Session Executor Vcores ** | Minimum virtual cores that will be allocated to executor while creating Livy session. |
Livy Session Executor Instances ** | Minimum executor instances that will be allocated while creating Livy session. |
Livy Custom Jar HDFS Path ** | The fully qualified HDFS path where the uploaded custom jar is kept while creating a pipeline. |
Livy Data Fetch Timeout ** | The query time interval in seconds for fetching data during data inspection. |
isMonitoringGraphsEnabled | Whether monitoring graph is enabled or not. |
ES query monitoring index name | Monitoring data is stored in this index of the default ES connection. |
Scheduler period for ES monitoring purging | Interval (in seconds) at which the purging scheduler is invoked to check whether the above index is eligible for purging. Requires a Tomcat restart. |
Rotation policy of ES monitoring graph | Accepts two values: daily or weekly. With daily, the index is rotated every day, so each index stores a single day of data; with weekly, each index stores a week of data. |
Purging duration of ES monitoring index | Duration after which the index is deleted. Default is 604800 seconds, meaning the index is deleted after 1 week. Requires a Tomcat restart. |
Enable purging scheduler for ES Graph monitoring | Controls whether indexes are purged. Purging does not take place if this flag is disabled. Requires a Tomcat restart. |
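The purging rows above describe a periodic check that deletes monitoring indexes older than a configured duration (default 604800 seconds, that is, 1 week). A minimal sketch of that eligibility check follows. The function name, the use of index creation timestamps and the sample index names are assumptions made for illustration; this is not Gathr's implementation.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_PURGE_DURATION_SEC = 604800  # default from the table: 7 days


def indexes_to_purge(index_created_at, now, purge_duration_sec=DEFAULT_PURGE_DURATION_SEC):
    """Return the sorted names of indexes created before (now - purge duration)."""
    cutoff = now - timedelta(seconds=purge_duration_sec)
    return sorted(name for name, created in index_created_at.items() if created < cutoff)


# Hypothetical index names and creation times.
now = datetime(2024, 1, 15, tzinfo=timezone.utc)
sample = {
    "monitoring_2024_01_01": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "monitoring_2024_01_10": datetime(2024, 1, 10, tzinfo=timezone.utc),
}
# With the default duration the cutoff is 2024-01-08, so only the first index qualifies.
print(indexes_to_purge(sample, now))
```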
RabbitMQ
Field | Description |
---|---|
RabbitMQ Max Retries | Defines maximum number of retries for the RabbitMQ connection. |
RabbitMQ Retry Delay Interval | Defines the retry delay intervals for RabbitMQ connection. |
RabbitMQ Session Timeout | Defines session timeout for the RabbitMQ connection. |
Real-time Alerts Exchange Name | Defines the RabbitMQ exchange name for real time alert data. |
Kafka
Field | Description |
---|---|
Kafka Message Fetch Size Bytes | The number of bytes of messages to attempt to fetch for each topic-partition in each fetch request. |
Kafka Producer Type | Defines whether Kafka produces data in async or sync mode. |
Kafka Zookeeper Session Timeout(in ms) | The Kafka Zookeeper Connection timeout. |
Kafka Producer Serializer Class | The class name of the Kafka producer serializer used. |
Kafka Producer Partitioner Class | The class name of the Kafka producer partitioner used. |
Kafka Key Serializer Class | The class name of the Kafka producer key serializer used. |
Kafka 0.9 Producer Serializer Class | The class name of the Kafka 0.9 producer serializer used. |
Kafka 0.9 Producer Partitioner Class | The class name of the Kafka 0.9 producer partitioner used. |
Kafka 0.9 Key Serializer Class | The class name of the Kafka 0.9 producer key serializer used. |
Kafka Producer Batch Size | The batch size of data produced at Kafka from log agent. |
Kafka Producer Topic Metadata Refresh Interval(in ms) | The metadata refresh time taken by Kafka when there is a failure. |
Kafka Producer Retry Backoff(in ms) | The amount of time that the Kafka producer waits before refreshing the metadata. |
Kafka Producer Message Send Max Retry Count | The number of times the producer will automatically retry a failed send request. |
Kafka Producer Request Required Acks | Controls when a produce request is considered completed. |
Security
Field | Description |
---|---|
Kerberos Sections | Section names in keytab_login.conf for which keytabs must be extracted from pipeline if krb.config.override is set to true. |
Hadoop Security Enabled | Set to true if Hadoop in use is secured with Kerberos Authentication. |
Kafka Security Enabled | Set to true if Kafka in use is secured with Kerberos Authentication. |
Solr Security Enabled | Set to true if Solr in use is secured with Kerberos Authentication. |
Keytab login conf file Path | Specify path for keytab_login.conf file. |
CloudTrial
Field | Description |
---|---|
Cloud Trial | The flag for Cloud Trial. Possible values are True/False. |
Cloud Trial Max Datausage Monitoring Size (in bytes) | The maximum data usage limit for cloud trial. |
Cloud Trial Day Data Usage Monitoring Size (in bytes) | The maximum data usage for FTP User. |
Cloud Trial Data Usage Monitoring From Time | The time from where to enable the data usage monitoring. |
Cloud Trial Workers Limit | The maximum number of workers for FTP user. |
FTP Service URL | The URL of FTP service to create the FTP directory for logged in user (required only for cloud trial). |
FTP Disk Usage Limit | The disk usage limit for FTP users. |
FTP Base Path | The base path for the FTP location. |
Monitoring
Field | Description |
---|---|
Enable Monitoring Graphs | Set to True to enable Monitoring and to view monitoring graphs. |
QueryServer Monitoring Flag | Defines the flag value (true/false) for enabling the query monitoring. |
QueryServer Monitoring Reporters Supported | Defines the comma-separated list of appenders where metrics will be published. Valid values are graphite, console, logger. |
QueryServer Metrics Conversion Rate Unit | Specifies the unit of rates for calculating the queryserver metrics. |
QueryServer Metrics Duration Rate Unit | Specifies the unit of duration for the queryserver metrics. |
QueryServer Metrics Report Duration | Time period after which query server metrics should be published. |
Query Retries | Specifies the number of retries to make a query in indexing. |
Query Retry Interval (in ms) | Defines query retry interval in milliseconds. |
Error Search Scroll Size | Number of records to fetch in each page scroll. Default value is 10. |
Error Search Scroll Expiry Time (in secs) | Time after which search results will expire. Default value is 300 seconds. |
Index Name Prefix | Prefix to use for error search system index creation. The prefix will be used to evaluate exact index name with partitioning. Default value is sax_error_. |
Index number of shards | Number of shards to create in the error search index. Default value is 5. |
Index Replication Factor | Number of replica copies to maintain for each index shard. Default value is 0. |
Index Scheduler Frequency (in secs) | Interval (in secs) after which scheduler will collect error data and index in index store. |
Index Partitioning Duration (in hours) | Time duration after which a new index will be created using partitioning. Default value is 24 hours. |
Data Retention Time (in days) | Time duration for retaining old data. Data above this threshold will be deleted by scheduler. Default value is 60 days. |
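With the defaults listed above (a new index every 24 hours, 60 days of retention, 5 shards, replication factor 0), the steady-state footprint of the error search indexes can be estimated with simple arithmetic. The sketch below is an estimate under those stated defaults and assumes that indexes are removed as soon as they pass the retention window; the function name is hypothetical.

```python
import math


def error_index_footprint(partition_hours=24, retention_days=60, shards=5, replicas=0):
    """Estimate (index count, total shard copies) held within the retention window."""
    index_count = math.ceil(retention_days * 24 / partition_hours)
    shard_copies = index_count * shards * (1 + replicas)
    return index_count, shard_copies


print(error_index_footprint())            # defaults from the table
print(error_index_footprint(replicas=1))  # same window with one replica per shard
```

Under the defaults this gives 60 indexes and 300 shard copies; halving the partitioning duration or adding a replica doubles the shard count, which is worth checking against the capacity of the index store.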
Audit
Field | Description | Default Value |
---|---|---|
Enable Event Auditing | Defines the value for enabling events auditing in the application. | true |
Events Collection Frequency (in secs) | Time interval (in seconds) in which batch of captured events will be processed for indexing. | 10 |
Events Search Scroll size | Number of records to fetch in each page scroll on result table. | 100 |
Events Search Scroll Expiry (in secs) | Time duration (in seconds) for search scroll window to expire. | 300 |
Events Index Name Prefix | Prefix string for the events index name. The prefix is used to evaluate the exact target index name during the data partitioning process. | sax_audit_ |
Events Index Number of Shards | Number of shards to create for events index. | 5 |
Events Index Replication Factor | Number of replica copies to maintain for each index shard. | 0 |
Index Partitioning Duration (in hours) | Time duration (in hours) after which a new index will be created for events data. A partition number is calculated from this property; the Events Index Name Prefix value followed by this partition number forms the target index name. | 24 |
Events Retention Time (in days) | Retention time (in days) of data after which it will be auto deleted. | 60 |
Events Indexing Retries | Number of retries to index events data before sending it to a WAL file. | 5 |
Events Indexing Retries Interval (in milliseconds) | Interval (in milliseconds) between subsequent retries. | 3000 |
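The table above states that the target index name is the Events Index Name Prefix followed by a partition number derived from the Index Partitioning Duration. The exact formula is not documented here, so the sketch below makes an explicit assumption: the partition number is the count of whole partition windows since the Unix epoch. The function name and that formula are hypothetical.

```python
from datetime import datetime, timezone


def target_index_name(event_time, prefix="sax_audit_", partition_hours=24):
    """Build an index name as prefix + partition number.

    ASSUMPTION: the partition number is the number of whole partition
    windows since the Unix epoch; Gathr's real formula may differ.
    """
    epoch_hours = int(event_time.timestamp()) // 3600
    partition_number = epoch_hours // partition_hours
    return f"{prefix}{partition_number}"


morning = datetime(2024, 1, 15, 8, 0, tzinfo=timezone.utc)
evening = datetime(2024, 1, 15, 20, 0, tzinfo=timezone.utc)
next_day = datetime(2024, 1, 16, 8, 0, tzinfo=timezone.utc)

# Events in the same 24-hour window map to the same index.
print(target_index_name(morning) == target_index_name(evening))
# Events in the next window map to a new index.
print(target_index_name(morning) == target_index_name(next_day))
```

Whatever the real formula is, the practical effect is the same: events that fall within one partitioning window land in one index, and a shorter duration produces more, smaller indexes.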
Query Server
Field | Description |
---|---|
QueryServer Monitoring Flag | The flag value (true/false) for enabling the query monitoring. |
QueryServer Monitoring Reporters Supported | The comma-separated list of appenders where metrics will be published. Valid values are graphite, console, logger. |
QueryServer Metrics Conversion Rate Unit | Specifies the unit of rates for calculating the queryserver metrics. |
QueryServer Metrics Duration Rate Unit | Specifies the unit of duration for the queryserver metrics. |
QueryServer Metrics Report Duration | Time after which query server metrics should be published. |
QueryServer Metrics Report Duration Unit | The units for reporting query server metrics. |
Query Retries | The number of retries to make a query in indexing. |
Query Retry Interval (in ms) | Defines query retry interval in milliseconds. |
Others
Field | Description |
---|---|
Audit Targets | Defines the audit logging implementation to be used in the application. Default is file. |
ActiveMQ Connection Timeout(in ms) | Defines the ActiveMQ connection timeout interval in ms. |
MQTT Max Retries | Max retries of MQTT server. |
MQTT Retry Delay Interval | Retry interval, in milliseconds, for MQTT retry mechanism. |
JMS Max Retries | Max retries of JMS server. |
JMS Retry Delay Interval | Retry interval, in milliseconds, for JMS retry mechanism. |
Metrics Conversion Rate Unit | Specifies the unit of rates for calculating the queryserver metrics. |
Metrics Duration Rate Unit | Specifies the unit of duration for the metrics. |
Metrics Report Duration | Specifies the duration at interval of which reporting of metrics will be done. |
Metrics Report Duration Unit | Specifies the unit of the duration at which queryserver metrics will be reported. |
Gathr Default Tenant Token | Token of user for HTTP calls to LogMonitoring for adding/modifying system info. |
LogMonitoring Dashboard Interval(in min) | Log monitoring application refresh interval. |
Logmonitoring Supervisors Servers | Servers dedicated to run LogMonitoring pipeline. |
Export Search Raw Field | Comma separated fields to export LogMonitoring search result. |
Elasticsearch Keystore download path prefix | Elasticsearch keystore download path prefix in case of uploading keystore. |
Tail Logs Server Port | Port on which the tail command listens for incoming log streams. Default is 9001. |
Tail Logs Max Buffer Size | Maximum number of lines that can be stored in the browser. Default is 1000. |
sax.datasets.profile.frequency.distribution.count.limit | Defines the number of distinct values to be shown in the frequency distribution graph of a column in a Dataset. |
sax.datasets.profile.generator.json.template | Template of the Spark job used to generate the profile of a Dataset (common/templates/DatasetProfileGenerator.json). |
Pipeline Test Connection Enabled | Select the check-box to enable the email notification when a pipeline component is down. |
Pipeline Error Notification Email IDs | Provide comma separated email IDs for pipeline error notification. |
Pipeline Test Connection Retry Counts | Provide value for Pipeline Test Connection Retry Counts. |
Pipeline Test Connection Retry Interval Limits | Provide value for Pipeline Test Connection Retry Interval Limits. |
sax.python.command | Provide the sax.python.command |
Load IDW functions on Inspect And Pipeline Run | Select the check-box for loading the IDW functions on Inspect And Pipeline Run. |
Impersonation User Editable | Select the check-box for impersonating the User Editable option. |
Superuser connections allowed | Select the check-box for allowing the superuser connections functionality. |
Metering Retention Period(days) | Provide the value for Metering Retention Period(days). |
H2O Auth Enabled | Select the check-box to enable the H2O Auth. |
Pipeline History Max Fetch Size | Number of records to be fetched on pipeline history page. The default value is 100. |
Pipeline Start Scheduler Timeout(seconds) | Provide value for Pipeline Start Scheduler Timeout(seconds). |
Enabled Pipeline Health Checker | Select the check-box to enable Pipeline Health Checker. |
Pipeline Health Checker Frequency(in secs) | Provide value for Pipeline Health Checker Frequency(in secs). |
Pipeline Health Checker Notification Frequency(in mins) | Provide value for Pipeline Health Checker Notification Frequency(in mins). |
Ignore schedule if pipeline is active | Option to ignore the schedule if the pipeline is active. If this option is set to false and the pipeline is found active at the next schedule, the pipeline is killed and the next schedule is started. |
Gathr Log Directory | Provide Gathr Log Directory. Example: Default. |
Enabled Config Provider | Select the check-box to enable Config Provider. |
Config Provider Rest Service HTTP connection Timeout | Provide details for Config Provider Rest Service HTTP connection Timeout. |
Config provider Implementation Class | Provide details for Config provider Implementation Class. |
Config Provider Property File Path | Provide details for Config Provider Property File Path. Example: /tmp/configprovider.property |
Config Provider Auth Enabled | Select check-box for Config Provider Auth Enabled option. |
Config Provider Auth User Name | Provide the Config Provider Auth User Name. Example: admin |
Config Provider Auth Password | Provide the Config Provider Auth Password. |
User Login Expire Period | Provide user login expire period in hours. -1 indicates no expiration. |
Pipeline Test Connection Enabled | Select the check-box to enable the notification when a pipeline component is down. |
Scheduler Heartbeat Check Interval | Provide the Scheduler Heartbeat Check Interval. |
Maintenance Mode Enabled | Provide true or false value for enabling the email notification in case pipeline component stops working. |
Restart Pipeline Kill Wait Time | Provide value for Restart Pipeline Kill Wait Time. |
Pipeline Heartbeat Publish Interval | Provide value for Pipeline Heartbeat Publish Interval. |
Enable Batch History Count | Provide true or false value for Enable Batch History Count option. |
Enable Univocity Parser for delimited data | Select the check-box to enable Univocity Parser for delimited data. |
Contextual Logs | Detailed contextual information (e.g. userName, roles, projectName) will be appended to the logs once this option is enabled. |
Enable Event Notifier | Check this option to enable event notification based on the provided event notifier type. For example: SNS. Provide the AWS Key Id and AWS Secret Key. |
Publish Accumulator Data On Elasticsearch | Check this option to enable Publish Accumulator Data On Elasticsearch. |
isConnectionJarPathTenantDependent | Option to allow upload jars according to tenant ID. |
Cluster-Mediator ES Logs Index | Details for Cluster-Mediator ES Logs Index. |
EMR Services ES Logs Index | Details for EMR Services ES Logs Index. |
ES Index Mapping | Details for ES Index Mapping. |
External Service Logs Index | Details for External Service Logs Index. |
GCP Dataproc Services ES Logs Index | Details for GCP Dataproc Services ES Logs Index. |
Inspection ES Logs Index | Details for Inspection ES Logs Index. |
Local Pipeline ES Logs Index | Details for Local Pipeline ES Logs Index. |
MWAA Services ES Logs Index | Details for MWAA Services ES Logs Index. |
Pipeline ES Logs Index | Details for Pipeline ES Logs Index. |
Webstudio ES Logs Index | Details for Webstudio ES Logs Index. |
Gathr Additional Jars Classpath | Details for Gathr Additional Jars Classpath. |
Spark configurations file for FingerPrinting | Path of the config file that contains spark-submit configurations for the fingerprinting use case. |
Check resources before submitting Pipeline | Check available resources on resource manager before submitting Pipeline. |
Bulk pipeline submission queue capacity | Provide value for Bulk pipeline submission queue capacity option. |
Number of threads for bulk pipeline submit | Provide value for number of threads for bulk pipeline submit. |
Bulk pipeline submit monitor thread’s frequency (secs) | Provide details for Bulk pipeline submit monitor thread’s frequency (secs). |
Bulk pipeline submission connection timeout (secs) | Provide value for Bulk pipeline submission connection timeout (secs). |
Bulk pipeline submit sleep when idle (secs) | Provide value for Bulk pipeline submit sleep when idle (secs). |
Store Alerts | Upon enabling this option the alerts will be stored in the database. |
DB Notifier purging scheduler duration (mins) | Provide value for DB Notifier purging scheduler duration in minutes. Default value is 720. |
DB Notifier alerts type | The types of alerts to store through the DB Notifier. Example: Pipeline Status Alert, Pipeline Stats Alert, Pipeline Error Mode alert, or * for saving all types of alerts. |
DB Notifier max alert day count | Alerts are deleted from the database based on the value provided (in days). |
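The behaviour described for the Ignore schedule if pipeline is active option in the table above can be summarised as a small decision function. This is a sketch of one reading of that description, not Gathr's scheduler code; the function name, the action labels and the assumption that an inactive pipeline is simply started are all hypothetical.

```python
def on_schedule_fire(pipeline_active: bool, ignore_if_active: bool) -> list:
    """Return the ordered actions taken when a schedule fires."""
    if not pipeline_active:
        return ["start"]  # ASSUMPTION: nothing is running, so start normally
    if ignore_if_active:
        return ["skip"]  # option enabled: this schedule is ignored
    return ["kill", "start"]  # option disabled: kill the active run, then start


for active in (False, True):
    for ignore in (False, True):
        print(active, ignore, on_schedule_fire(active, ignore))
```

Leaving the option disabled favours keeping to the schedule over letting a long run finish, so it suits pipelines where a fresh run matters more than completing the previous one.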
Click the SAVE button to save the configuration details.