Amazon S3 Data Asset Source

Create a Data Asset Through Amazon S3

To create a data asset from an Amazon S3 source, configure the following parameters:

Connection Name

A connection identifies the service from which data is fetched.

Select a connection name from the list if you have previously created and saved connection details for Amazon S3.

Otherwise, create one as described in the Amazon S3 Connection topic.


Bucket Name

Enter the name of the S3 bucket that contains your data.

This specifies the storage location from which your data is sourced for further processing and analysis.


Path

Specify the directory or path within the S3 bucket where your data is located.
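The Bucket Name and Path fields together identify a single location in S3. The sketch below shows one way to combine them into an `s3://` URI. The helper `build_s3_uri` and the bucket `sales-data` are hypothetical and used only for illustration; how the product joins these fields internally is not documented here. Note that S3 has no true directories: a "path" is a key prefix.

```python
def build_s3_uri(bucket: str, path: str) -> str:
    """Join a bucket name and an optional path (key prefix) into an s3:// URI.

    Hypothetical helper for illustration only.
    """
    bucket = bucket.strip()
    if not bucket or "/" in bucket:
        raise ValueError("bucket must be a bare bucket name, e.g. 'sales-data'")
    # S3 "directories" are key prefixes, so surrounding slashes are dropped.
    prefix = path.strip().strip("/")
    return f"s3://{bucket}/{prefix}" if prefix else f"s3://{bucket}"


print(build_s3_uri("sales-data", "/raw/2024/"))  # s3://sales-data/raw/2024
print(build_s3_uri("sales-data", ""))            # s3://sales-data
```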


File Format

Choose the format of the sample data to be fetched from S3. Supported file types are CSV, JSON, TEXT, Parquet, ORC, and AVRO.

For CSV files, you can additionally specify a delimiter.
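The delimiter determines how each CSV line is split into fields. The minimal sketch below uses Python's standard `csv` module to show the effect: with the correct delimiter (`;` here) each line splits into columns, while the default comma leaves each line as a single field. The `parse_csv` helper and the sample data are hypothetical; Python's `csv` module requires a single-character delimiter, which may differ from what the product accepts.

```python
import csv
import io


def parse_csv(text: str, delimiter: str = ",") -> list[list[str]]:
    """Split CSV text into rows of fields using a single-character delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


sample = "id;city\n1;Berlin\n2;Paris\n"
print(parse_csv(sample, delimiter=";"))  # [['id', 'city'], ['1', 'Berlin'], ['2', 'Paris']]
print(parse_csv(sample))                 # [['id;city'], ['1;Berlin'], ['2;Paris']]
```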


Is Header Included

For CSV files, indicate whether the source file includes a header row.

When this option is enabled, the first row is treated as column labels rather than as a data record, so each value is aligned with the correct column.
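The following sketch illustrates the difference the header setting makes, using Python's `csv` module. With the header included, the first row becomes the column labels; without it, the first row is read as ordinary data. The `read_rows` helper and the placeholder labels `column_1`, `column_2`, … are hypothetical; the labels the product assigns when no header is present are not documented here.

```python
import csv
import io


def read_rows(text: str, header_included: bool, delimiter: str = ",") -> list[dict[str, str]]:
    """Parse CSV text into dicts keyed by column label."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []
    if header_included:
        # The first row supplies the column labels and is not treated as data.
        columns, data = rows[0], rows[1:]
    else:
        # Hypothetical placeholder labels; every row, including the first, is data.
        columns = [f"column_{i + 1}" for i in range(len(rows[0]))]
        data = rows
    return [dict(zip(columns, row)) for row in data]


sample = "id,city\n1,Berlin\n2,Paris\n"
print(read_rows(sample, header_included=True))
# [{'id': '1', 'city': 'Berlin'}, {'id': '2', 'city': 'Paris'}]
print(read_rows(sample, header_included=False))
# [{'column_1': 'id', 'column_2': 'city'}, {'column_1': '1', 'column_2': 'Berlin'}, {'column_1': '2', 'column_2': 'Paris'}]
```

Without the header setting, the labels `id` and `city` end up as an extra data record, which is the misalignment the option prevents.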


Max No of Rows

Specify the maximum number of sample records you wish to keep in the data asset.

Use this option to obtain a manageable subset of data for testing and design, which speeds up application development and limits resource usage.


Sampling Method

This option determines how sample records are retrieved.

The available methods are:

  • Top N: Retrieves the first N records from the data source, where N is the value of Max No of Rows. This is useful when you want to analyze or design with a specific set of initial records.

  • Random Sample: Fetches a random subset of records from the source data, which is more likely to represent the dataset as a whole than the first rows alone. Use this approach when you need a broader view of your data's characteristics.
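The two sampling methods can be contrasted with a short sketch. Top N is simply the first `max_rows` records in source order. For Random Sample, the sketch uses reservoir sampling (Algorithm R), a standard technique chosen here for illustration because it draws a uniform sample in one pass with memory bounded by `max_rows`; the product's actual sampling algorithm is not documented here. The function names `top_n` and `random_sample` are hypothetical.

```python
import random
from itertools import islice
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def top_n(records: Iterable[T], max_rows: int) -> List[T]:
    """Top N: keep the first max_rows records in source order."""
    return list(islice(records, max_rows))


def random_sample(records: Iterable[T], max_rows: int, seed: Optional[int] = None) -> List[T]:
    """Random Sample via reservoir sampling: each record is kept with equal probability."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < max_rows:
            # Fill the reservoir with the first max_rows records.
            reservoir.append(record)
        else:
            # Replace a slot with probability max_rows / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < max_rows:
                reservoir[j] = record
    return reservoir


print(top_n(range(1, 11), 3))                  # [1, 2, 3]
print(random_sample(range(1, 11), 3, seed=7))  # three distinct values from 1..10
```

If the source has no more than `max_rows` records, both functions return every record.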
