Data Preparation

Data preparation is the process of collecting, cleaning, and organizing data for analysis. You can apply operations and expressions to the data to produce a customized dataset, and graphs and statistics are provided to present the prepared data.

To perform data preparation, first configure a Data Source (for example, S3). Once the Data Source is configured and saved, you can run an inspection on the columns of its schema, which are displayed in the Data Preparation window. This lets you build the application while you interact with the data.

To open Data Preparation, click the eye icon of any component. The component’s schema is displayed in the Data Preparation window.

By default, the data is displayed in Summary view with the Profile Pane shown.


The actions available on the transformations preview page are described below:

Sort: Option to sort the column entries either by count or by domain.

Operations: Option to quickly apply the most commonly used transformations to the values of each column.

Create New Column: Option to add a new column by providing a column name and an expression that generates the column’s values.

Keep/Remove Column: Option to select the desired columns and choose to either keep or remove them for further processing.

Profile Pane: The Profile Pane appears by default and shows the distribution of data in each column. You can edit column values in this view. Bar graphs of the values in each column are shown in a tabular layout.

Data Pane: The schema is shown as columns, and the data is divided into records. You can apply operations to a single entry of a column or to any combination of entries. Each operation you apply adds a corresponding expression, so your application is built up incrementally as you apply more actions.

Display Columns: Option to display only the column(s) that you want to see while working on the transformations stage of your ingestion application. Hidden columns are not removed from the dataset schema; they are only hidden from the Data Preparation window.

Reload Inspect: Option to reload the auto-inspect data for the selected component type.

Close: Option to close the transformations preview page and return to the canvas.

Download Result: Option to download a component’s schema result.
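The Sort, Profile Pane, Create New Column, and Keep/Remove Column actions are performed through the UI. As a rough conceptual sketch only (plain Python, not the product’s API), the following shows what these actions do to a dataset, assuming the records are held as a list of dictionaries. The sample data and the function names `profile`, `create_column`, `keep_columns`, and `remove_columns` are hypothetical, and sorting "by domain" is assumed here to mean ordering by the values themselves.

```python
from collections import Counter

# Hypothetical sample dataset: each record maps column name -> value.
records = [
    {"city": "Pune", "amount": 120},
    {"city": "Delhi", "amount": 80},
    {"city": "Pune", "amount": 45},
    {"city": "Agra", "amount": 80},
]


def profile(rows, column, sort_by="count"):
    """Hypothetical helper: value distribution of one column as (value, count) pairs."""
    counts = Counter(row[column] for row in rows)
    if sort_by == "count":
        # Most frequent values first; ties are broken by value for a stable order.
        return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    if sort_by == "domain":
        # Assumption: "domain" order means ordering by the values themselves.
        return sorted(counts.items())
    raise ValueError(f"unknown sort_by: {sort_by!r}")


def create_column(rows, name, expression):
    """Hypothetical helper: add column `name`, computed by `expression` for each record."""
    return [{**row, name: expression(row)} for row in rows]


def keep_columns(rows, columns):
    """Hypothetical helper: keep only the listed columns."""
    return [{c: row[c] for c in columns} for row in rows]


def remove_columns(rows, columns):
    """Hypothetical helper: drop the listed columns."""
    drop = set(columns)
    return [{c: v for c, v in row.items() if c not in drop} for row in rows]


if __name__ == "__main__":
    print(profile(records, "city"))
    print(profile(records, "city", sort_by="domain"))
    flagged = create_column(records, "is_large", lambda r: r["amount"] > 100)
    print(remove_columns(flagged, ["amount"]))
```

For example, `profile(records, "city")` returns `[("Pune", 2), ("Agra", 1), ("Delhi", 1)]`, while `sort_by="domain"` returns the same pairs in alphabetical order of the city names. Each helper returns new records and leaves its input unchanged.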
