Amazon Athena ETL Source
See the Connector Marketplace topic, and ask your administrator to start a trial of, or subscribe to, the Premium Amazon Athena connector.
In Gathr, Amazon Athena can be added as a channel to fetch customer and prospect data and transform it as needed before storing it in a desired data warehouse for further analytics.
Schema Type
See the topic Provide Schema for ETL Source → to know how schema details can be provided for data sources.
After providing schema type details, the next step is to configure the data source.
Data Source Configuration
Configure the data source parameters as explained below.
Connection Name
Connections are the service identifiers. Select a connection name from the list if you have already created and saved connection details for Amazon Athena, or create one as explained in the topic Amazon Athena Connection →
Use the Test Connection option to ensure that the connection with the Amazon Athena channel is established successfully.
A success message states that the connection is available. If the test fails, edit the connection to resolve the issue before proceeding further.
Data Source
Data sources are listed based on the configured connection. Select the data source from which to read data.
Database Name
Databases are listed based on the configured connection. Select the database from which to read data.
Table Name
Tables are listed based on the configured connection. Select the table from which to read data.
If you selected the Fetch From Source method to design the application, the fields are listed based on the entity chosen in the previous configuration parameter. Select the fields, or provide a custom query to read the desired records from Amazon Athena.
Fields
The conditions for fetching source data from an Amazon Athena table can be specified using this option.
Select Fields: Select the column(s) of the entity that should be read.
Custom Query: Provide an SQL query specifying the read conditions for the source data.
Example: SELECT "Id" FROM Companies
If you selected the Upload Data File method to design the application, provide a custom query to fetch records from the Amazon Athena entity specified in the previous configuration.
Query
The conditions for fetching source data from an Amazon Athena table can be specified using this option.
Provide an SQL query specifying the read conditions for the source data.
Example: SELECT "Id" FROM Companies
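A custom query can also restrict which rows are read, not just which columns. The sketch below is illustrative only: the Companies table and its Status column are hypothetical names, not part of your schema.

```sql
-- Illustrative read condition: fetch two columns for active companies only,
-- capped at 1,000 rows. Table and column names are hypothetical.
SELECT "Id", "Name"
FROM Companies
WHERE "Status" = 'Active'
LIMIT 1000;
```

Filtering at the source like this reduces the data scanned by Athena and the volume transferred into the pipeline.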
More Configurations
This section contains additional configuration parameters.
Read Options
SkipHeaderLineCount
Specifies the number of header rows to skip for SELECT queries.
This is most commonly used for Athena tables that point to a CSV data source.
If the CSV data source has headers, set Skip Header Line Count to 1.
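For context, the header skip is usually also declared as a table property when the CSV-backed Athena table is created. A minimal sketch, in which the table name, columns, and S3 path are hypothetical:

```sql
-- Hypothetical CSV-backed table; the 'skip.header.line.count' table property
-- tells Athena to ignore the first (header) row of each file.
CREATE EXTERNAL TABLE companies (
  id   STRING,
  name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION 's3://your-bucket/companies/'
TBLPROPERTIES ('skip.header.line.count' = '1');
```

When the table already declares this property, Athena skips the header itself; otherwise, setting Skip Header Line Count to 1 achieves the same effect for SELECT queries.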
Clean Query Results
Amazon Athena produces cache files for every query in the folder specified as the S3 Staging Directory.
Clean Query Results specifies whether these files should be deleted once the connection is closed.
Query Passthrough
This option passes the query to the Amazon Athena server without changing it.
Partitioning
Enable Partitioning
This enables parallel reading of the data from the entity.
Partitioning is disabled by default.
If enabled, an additional option will appear to configure the partitioning conditions.
Column
The selected column will be used to partition the data.
Max Rows per Partition: Enter the maximum number of rows to be read in a single request.
Example: 10,000
This means that at most 10,000 rows are read in a single partition; a table of 45,000 rows, for instance, would be split across five parallel partitions.
Advanced
Fetch Size
The fetch size determines the number of rows to be fetched per round trip. The default value is 1000.
Query Timeout
The timeout, in seconds, for requests issued by the provider to download large result sets. If Query Timeout is set to 0, operations do not time out; they run until they complete successfully or encounter an error condition.
The timeout applies to the execution time of the operation as a whole, rather than to individual HTTP requests.
If the timeout expires before the request has finished processing, an error condition is raised.
Simple Upload Limit
This setting specifies the threshold, in bytes, above which the provider will choose to perform a multipart upload rather than uploading everything in one request.
Add Configuration: Additional properties can be added using this option as key-value pairs.
Detect Schema
Check the populated schema details. For more details, see Schema Preview →
Pre Action
To understand how to provide SQL queries or Stored Procedures that will be executed during pipeline run, see Pre-Actions →.
Notes
Optionally, enter notes in the Notes → tab and save the configuration.
If you have any feedback on Gathr documentation, please email us!