Microsoft Excel ETL Source

Microsoft Excel data source allows you to read data from excel files.

Schema Type

See the topic Provide Schema for ETL Source → to know how schema details can be provided for data sources.

After providing schema type details, the next step is to configure the data source.

Data Source Configuration

Configure the data source parameters as explained below.

Type

Option to select source for fetching excel data by either doing an excel file upload or by selecting S3 as type to provide directory location or file where excel source files are stored.


If Type is selected as File Upload

Update the Header Included and Contains Formulas fields, then proceed by uploading the Excel file and selecting the preferred sheet(s).

Upload

Option to upload the Excel source file.

Sheet Name

Preferred sheet(s) should be selected from which the schema will get inferred and the selected sheets will be considered to fetch data during runtime.


If Type is selected as S3

Connection Name

Connections are the service identifiers. A connection name can be selected from the list if you have created and saved connection details for Amazon S3 earlier. Or create one as explained in the topic - Amazon S3 Connection →

Bucket Name

Buckets are storage units used to store objects, which consists of data and meta-data that describes the data.

Path

A directory path or preferred excel file should be provided from S3.

Sheet Name

Preferred sheet(s) should be selected from which the schema will get inferred and the selected sheets will be considered to fetch data during runtime.


Header Included

Option to enable or disable the scanning of first row as a header for excel files. This option is disabled by default.

When enabled, an additional option Include Header will get displayed.

Include Header

If All is selected, then the first row will be considered as header of all the sheets that are selected in the source excel file.

If First Sheet is selected, then the first row of only the first sheet will be considered as header out of all the sheets that are selected in the source excel file.


Contains Formulas

Check this option if your sheets contain formulas.

The application can read formulas from Excel only if all sheets are selected.

While using multiple MS Excel sources, choose all sheets in each MS Excel source.

The below condition is applicable if the Contains Formula field is set to True.

To parse date time columns in an MS Excel source, use an Expression Evaluator processor and provide an expression in the following format.

CASE WHEN CAST(date_column AS DOUBLE) IS NOT NULL
	THEN FROM_UNIXTIME(UNIX_TIMESTAMP(‘1900-01-01 00:00:000’) + (date_column- 2) * 86400)
	ELSE date_column
END

ms-excel_date_time_expression


Add Configuration: Additional properties can be added using this option as key-value pairs.


Detect Schema

Check the populated schema details. For more details, see Schema Preview →

Incremental Read

Optionally, to provide source data if the type is selected as S3, you can enable incremental read. For more details, see Amazon S3 Incremental Configuration →

Pre Action

To understand how to provide SQL queries or Stored Procedures that will be executed during pipeline run, see Pre-Actions →

Notes

Optionally, enter notes in the Notes → tab and save the configuration.

Top