Microsoft Excel ETL Source
Microsoft Excel data source allows you to read data from excel files.
Schema Type
See the topic Provide Schema for ETL Source → to know how schema details can be provided for data sources.
After providing schema type details, the next step is to configure the data source.
Data Source Configuration
Configure the data source parameters as explained below.
Type
Option to select source for fetching excel data by either doing an excel file upload or by selecting S3 as type to provide directory location or file where excel source files are stored.
If Type is selected as File Upload
Update the Header Included and Contains Formulas fields, then proceed by uploading the Excel file and selecting the preferred sheet(s).
Upload
Option to upload the Excel source file.
Sheet Name
Preferred sheet(s) should be selected from which the schema will get inferred and the selected sheets will be considered to fetch data during runtime.
If Type is selected as S3
Connection Name
Connections are the service identifiers. A connection name can be selected from the list if you have created and saved connection details for Amazon S3 earlier. Or create one as explained in the topic - Amazon S3 Connection →
Bucket Name
Buckets are storage units used to store objects, which consists of data and meta-data that describes the data.
Path
A directory path or preferred excel file should be provided from S3.
Sheet Name
Preferred sheet(s) should be selected from which the schema will get inferred and the selected sheets will be considered to fetch data during runtime.
While using multiple MS Excel sources, either choose all sheets or select a limited number of sheets in each MS Excel source.
The application can read formulas from Excel only if all sheets are selected.
Header Included
Option to enable or disable the scanning of first row as a header for excel files. This option is disabled by default.
When enabled, an additional option Include Header will get displayed.
Include Header
If All is selected, then the first row will be considered as header of all the sheets that are selected in the source excel file.
If First Sheet is selected, then the first row of only the first sheet will be considered as header out of all the sheets that are selected in the source excel file.
Contains Formulas
Check this option if your sheets contain formulas.
The application can read formulas from Excel only if all sheets are selected.
While using multiple MS Excel sources, choose all sheets in each MS Excel source.
The below condition is applicable if the Contains Formula field is set to True.
To parse date time columns in an MS Excel source, use an Expression Evaluator processor and provide an expression in the following format.
CASE WHEN CAST(date_column AS DOUBLE) IS NOT NULL
THEN FROM_UNIXTIME(UNIX_TIMESTAMP(‘1900-01-01 00:00:000’) + (date_column- 2) * 86400)
ELSE date_column
END
Add Configuration: Additional properties can be added using this option as key-value pairs.
Detect Schema
Check the populated schema details. For more details, see Schema Preview →
Incremental Read
Optionally, to provide source data if the type is selected as S3, you can enable incremental read. For more details, see Amazon S3 Incremental Configuration →
Pre Action
To understand how to provide SQL queries or Stored Procedures that will be executed during pipeline run, see Pre-Actions →
Notes
Optionally, enter notes in the Notes → tab and save the configuration.
If you have any feedback on Gathr documentation, please email us!