Scala Processor

Scala is a general-purpose programming language built on the Java virtual machine.

It can interact with data stored in a distributed manner and can be used to process and analyze big data.

The Scala processor can be used for writing custom code in Scala language.

Please email Gathr Support to enable the Scala Processor.

Several code snippets are available in the application; they are explained in this topic to get you started with the Scala processor.

Processor Configuration

Configure the processor parameters as explained below.

Package Name

Name of the package for the Scala code class.

Class Name

Name of the Class for the Scala code.

Imports

Import statements for the Scala code.

Input Source

Input Source for the Scala Code.

Scala Code

Scala code to perform the operations on the incoming dataset.
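Taken together, these fields are assembled into the custom code that the processor runs. The sketch below is a conceptual illustration only, assuming the processor wraps your Scala Code inside the package and class you name and passes the incoming data in as a Dataset called dataset; the package, class, and method names shown are placeholders, and the exact wrapper generated by Gathr may differ.

// Conceptual sketch only; package, class, and method names are placeholders.
package com.example.custom                      // Package Name
import org.apache.spark.sql.{Dataset, Row}      // Imports
import org.apache.spark.sql.functions._

class MyScalaProcessor {                        // Class Name
  // Scala Code: operates on the incoming dataset and returns the result
  def process(dataset: Dataset[Row]): Dataset[Row] = {
    dataset.withColumn("processed_flag", lit(true))
  }
}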


Ask AI Assistant

Use the AI assistant feature to simplify the creation of Scala queries.

It allows you to generate complex Scala queries effortlessly, using natural language inputs as your guide.

Describe your desired expression in plain, conversational language. The AI assistant will interpret your instructions and transform them into a working Scala query.

Tailor queries to your specific requirements, whether it’s for data transformation, filtering, calculations, or any other processing task.

Note: Press Ctrl + Space to list input columns and Ctrl + Enter to submit your request.

Input Example:

Select those records whose last_login_date is less than 60 days from current_date.
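
For an input like the one above, the assistant could return a Scala expression along the following lines. This is only an illustrative sketch; the column last_login_date is taken from the prompt, and the code actually generated may differ.

val sparkSession = dataset.sparkSession
import org.apache.spark.sql.functions._
// Keep records whose last_login_date falls within the last 60 days
dataset.filter(datediff(current_date(), col("last_login_date")) < 60)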


Jar Upload

A jar file is generated when you build the Scala code.

Here, you can upload third-party jars so that their APIs can be used in the Scala code by adding the corresponding import statements.
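
For example, if an uploaded jar provided a text-normalization class, it could be imported and called from the Scala code as sketched below. The library, class, and method names here are hypothetical placeholders, not part of any specific jar.

// Hypothetical class from an uploaded third-party jar
import com.thirdparty.text.Normalizer
val sparkSession = dataset.sparkSession
import org.apache.spark.sql.functions._
// Wrap the third-party API in a UDF and apply it to a column
val normalize = udf((s: String) => Normalizer.normalize(s))
dataset.withColumn("Surname_Normalized", normalize(col("Surname")))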

Notes

Optionally, enter notes in the Notes tab and save the configuration.

Code Snippets

Described below are some sample Scala code use cases:

Add a new column with constant value to existing dataframe

Description

This script demonstrates how to add a column with a constant value.

In the sample code given below, a column Constant_Column with value Constant_Value is added to an existing dataframe.

Sample Code

val sparkSession = dataset.sparkSession
import sparkSession.implicits._
import org.apache.spark.sql.functions._
// Add a new column with a constant literal value
dataset.withColumn("Constant_Column", lit("Constant_Value"))

Example

Input Data Set

CustomerId | Surname | CreditScore | Geography | Age | Balance | EstSalary
15633059 | Fanucci | 413 | France | 34 | 0 | 6534.18
15604348 | Allard | 710 | Spain | 22 | 0 | 99645.04
15693683 | Yuille | 814 | Germany | 29 | 97086.4 | 197276.13
15738721 | Graham | 773 | Spain | 41 | 102827.44 | 64595.25

Output Data Set

CustomerId | Surname | CreditScore | Geography | Age | Balance | EstSalary | Constant_Column
15633059 | Fanucci | 413 | France | 34 | 0 | 6534.18 | Constant_Value
15604348 | Allard | 710 | Spain | 22 | 0 | 99645.04 | Constant_Value
15693683 | Yuille | 814 | Germany | 29 | 97086.4 | 197276.13 | Constant_Value
15738721 | Graham | 773 | Spain | 41 | 102827.44 | 64595.25 | Constant_Value

Add a new column with random value to existing dataframe

Description

This script demonstrates how to add a column with random values.

Here, a column Random_Column is added and populated with random values between 0 and 1.

Sample Code

val sparkSession = dataset.sparkSession
import sparkSession.implicits._
import org.apache.spark.sql.functions.rand
// Add a new column populated with random values between 0 and 1
dataset.withColumn("Random_Column", rand())

Example

Input Data Set

CustomerId | Surname | CreditScore | Geography | Age | Balance | EstSalary
15633059 | Fanucci | 413 | France | 34 | 0 | 6534.18
15604348 | Allard | 710 | Spain | 22 | 0 | 99645.04
15693683 | Yuille | 814 | Germany | 29 | 97086.4 | 197276.13
15738721 | Graham | 773 | Spain | 41 | 102827.44 | 64595.25

Output Data Set

CustomerId | Surname | CreditScore | Geography | Age | Balance | EstSalary | Random_Column
15633059 | Fanucci | 413 | France | 34 | 0 | 6534.18 | 0.0241309661
15604348 | Allard | 710 | Spain | 22 | 0 | 99645.04 | 0.5138384557
15693683 | Yuille | 814 | Germany | 29 | 97086.4 | 197276.13 | 0.2652246569
15738721 | Graham | 773 | Spain | 41 | 102827.44 | 64595.25 | 0.8454138247

Add a new column using expression with existing columns

Description

This script demonstrates how to add a new column using existing columns.

Here, column Transformed_Column is added by multiplying column EstimatedSalary with column Tenure.

Sample Code

val sparkSession = dataset.sparkSession
import sparkSession.implicits._
import org.apache.spark.sql.functions._
// Derive a new column as the product of two existing columns
dataset.withColumn("Transformed_Column", col("EstimatedSalary") * col("Tenure"))

Example

Input Data Set

CustomerId | Surname | CrScore | Age | Tenure | Balance | EstimatedSalary
15633059 | Fanucci | 413 | 34 | 9 | 0 | 6534.18
15604348 | Allard | 710 | 22 | 8 | 0 | 99645.04
15693683 | Yuille | 814 | 29 | 8 | 97086.4 | 197276.13
15738721 | Graham | 773 | 41 | 9 | 102827.44 | 64595.25

Output Data Set

CustomerId | Surname | CrScore | Age | Tenure | Balance | EstimatedSalary | Transformed_Column
15633059 | Fanucci | 413 | 34 | 9 | 0 | 6534.18 | 58807.62
15604348 | Allard | 710 | 22 | 8 | 0 | 99645.04 | 797160.32
15693683 | Yuille | 814 | 29 | 8 | 97086.4 | 197276.13 | 1578209.04
15738721 | Graham | 773 | 41 | 9 | 102827.44 | 64595.25 | 581357.25

Transform an existing column

Description

This script demonstrates how to transform a column.

Here, the values of column Balance are rounded off and converted to integer.

Sample Code

val sparkSession = dataset.sparkSession
import sparkSession.implicits._
import org.apache.spark.sql.functions._
// Round off the Balance values and cast them to integer
dataset.withColumn("Balance", round(col("Balance")).cast("int"))

Example

Input Data Set

CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance
15633059 | Fanucci | 413 | France | Male | 34 | 9 | 0
15604348 | Allard | 710 | Spain | Male | 22 | 8 | 0
15693683 | Yuille | 814 | Germany | Male | 29 | 8 | 97086.4
15738721 | Graham | 773 | Spain | Male | 41 | 9 | 102827.44

Output Data Set

CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance
15633059 | Fanucci | 413 | France | Male | 34 | 9 | 0
15604348 | Allard | 710 | Spain | Male | 22 | 8 | 0
15693683 | Yuille | 814 | Germany | Male | 29 | 8 | 97086
15738721 | Graham | 773 | Spain | Male | 41 | 9 | 102827

Filter data on basis of some condition

Description

This script demonstrates how to filter data on the basis of some condition.

Here, customers having Age > 30 are selected.

Sample Code

val sparkSession = dataset.sparkSession
import sparkSession.implicits._
// Keep only the records where Age is greater than 30
dataset.select($"*").filter($"Age" > 30)

Example

Input Data Set

CustomerId | Surname | CreditScore | Geography | Gender | Age
15633059 | Fanucci | 413 | France | Male | 34
15604348 | Allard | 710 | Spain | Male | 22
15693683 | Yuille | 814 | Germany | Male | 29
15738721 | Graham | 773 | Spain | Male | 41

Output Data Set

CustomerId | Surname | CreditScore | Geography | Gender | Age
15633059 | Fanucci | 413 | France | Male | 34
15738721 | Graham | 773 | Spain | Male | 41