ADLS Emitter
ADLS Emitter Configuration
Field | Description |
---|---|
Save As Dataset | Select the checkbox to save the schema as a dataset. |
Scope | Select the scope of the dataset: Project or Workspace. |
Dataset Name | Provide the dataset name. |
Access Option | Choose how to access the data lake storage: through a DBFS mount point, or directly using the container and folder path. |
Connection Name | Connections are service identifiers. Select the connection name from the list of available connections to which the data should be emitted. |
Container | Provide the ADLS container name in which the transformed data should be emitted. |
Path | Provide the directory path in the ADLS file system. |
Output Type | Select the output format in which the results will be processed. The available options are: Avro, Delimited, JSON, Parquet, ORC, and XML. The supported compression algorithms depend on the selected output type. |
Delimiter | Available when the output type is Delimited. Select the message field separator. |
Output Fields | Select fields that need to be a part of the output data. |
Partitioning Required | If checked, data will be partitioned. |
Partition Columns | Select fields on which data will be partitioned. |
Save Mode | Specifies the expected behavior when saving data to a data sink. ErrorIfExists: throw an exception if the data already exists. Append: append the new contents to the existing data/table. Overwrite: replace the existing data/table with the new contents. Ignore: leave the existing data unchanged and skip the save, similar to CREATE TABLE IF NOT EXISTS in SQL. |
Compression Type | Algorithm used to compress the data. The available options depend on the selected output type. |
The compression algorithms available in the Compression Type drop-down list depend on the Output Type selected above:
Output Type | Compression Type |
---|---|
Avro | None, Deflate, BZIP2, SNAPPY, XZ |
Delimited | None, Deflate, GZIP, BZIP2, SNAPPY, LZ4 |
JSON | None, Deflate, GZIP, BZIP2, SNAPPY, LZ4 |
Parquet | None, GZIP, LZO, SNAPPY, LZ4 |
ORC | None, LZO, SNAPPY, ZLIB |
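The pairing above can be expressed as a simple lookup table. A sketch for validating a chosen combination before submitting a pipeline (illustrative only, not Gathr code; the codec names, including `xz` for Avro, are taken from the table above):

```python
# Supported compression codecs per output type, per the table above.
SUPPORTED_COMPRESSION = {
    "avro": {"none", "deflate", "bzip2", "snappy", "xz"},
    "delimited": {"none", "deflate", "gzip", "bzip2", "snappy", "lz4"},
    "json": {"none", "deflate", "gzip", "bzip2", "snappy", "lz4"},
    "parquet": {"none", "gzip", "lzo", "snappy", "lz4"},
    "orc": {"none", "lzo", "snappy", "zlib"},
}

def is_supported(output_type: str, compression: str) -> bool:
    """Return True if the compression codec is valid for the given output type."""
    return compression.lower() in SUPPORTED_COMPRESSION.get(output_type.lower(), set())
```

For example, `is_supported("Parquet", "SNAPPY")` holds, while `is_supported("ORC", "GZIP")` does not, matching the drop-down behavior described above.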
ADD CONFIGURATIONS | Optionally add further Spark configurations as required. Example: perform imputation by clicking the ADD CONFIGURATION button; the entered value replaces nullValue/emptyValue across the data. For instance, with nullValue = 123, all null values in the output are replaced with 123. |
Environment Params | Optionally add further environment parameters. |
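The nullValue/emptyValue imputation described under ADD CONFIGURATIONS can be sketched as follows; this is a plain-Python illustration of the behavior, not Gathr's internal implementation:

```python
def impute(rows, null_value="123", empty_value="123"):
    """Replace None and empty-string fields with the configured replacement values."""
    return [
        {
            k: (null_value if v is None else empty_value if v == "" else v)
            for k, v in row.items()
        }
        for row in rows
    ]
```

With `nullValue = 123`, a record such as `{"a": None, "b": "", "c": "x"}` would be emitted as `{"a": "123", "b": "123", "c": "x"}`.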