Expression Evaluator Processor

This processor is responsible for performing transformation operations on an incoming dataset, e.g., replace, format, trim, uppercase, lowercase, etc. It uses spark expression language for preparing transformation query.

Expression Evaluator Configuration

To add an Expression Evaluator processor into your pipeline, drag the processor to the canvas and right-click on it to configure.

FieldDescription
Expression

Provide SQL expression using SparkSQL functions, that evaluates to a column. Evaluated value can be updated to an existing column or can be assigned to a new column.

Note: Enclose string literal/constant values inside single quotes e.g. ‘John’, ‘Suite 130 Los Gatos, CA 95032, US’.

ValidateValidate the expressions applied on the column.

All the columns of the schema are populated and you can apply any transformation to it. The functions that can be applied are all listed in a table to the right.

Refer to the Expression Evaluator section in the topic Data Preparation Processors.


Enlisted Functions

ABS

FieldDescription
DescriptionComputes the absolute value.
Parametersarg0: The column for which absolute value to be calculated.
ReturnsReturns the computed absolute value.
ThrowsApplicationException
Exampleabs(77.76) will return 77.76

ACOS

FieldDescription
DescriptionComputes the cosine inverse of the given value; the returned angle is in the range 0.0 through pi.
Parametersarg0: The column for which cosine inverse to be calculated.
ReturnsReturns the computed cosine inverse in the range 0.0 through pi.
ThrowsApplicationException
Exampleacos(0.45) will return 1.104031001096478

ADD_MONTHS

FieldDescription
DescriptionComputes the date that is ‘arg1’ after ‘arg0’
Parameters

arg0: The date to which months to be added.

arg1: No of months to be added.

ReturnsReturns the date that is numMonths after startDate
ThrowsApplicationException
Exampleadd_months(“2009-03-01”,2) will return “2009-05-01”

ARRAY

FieldDescription
DescriptionReturns an array with the given elements
Parametersarg0: The given columns to create array column
ReturnsReturns an array with the given elements
ThrowsApplicationException
Examplearray(1, 2, 3) will return [1,2,3]

ARRAY_CONTAINS

FieldDescription
DescriptionReturns TRUE if the array contains value.
Parameters

arg0:An array column

arg1:A value to be checked

ReturnsA boolean true/false
ThrowsApplicationException
Example

We have taken column1.colors as [“black”,“red”]

array_contains(@{column.schema.column1.colrs},“red”) will return true


ARRAY_DISTINCT

FieldDescription
DescriptionRemoves duplicate values from the array
Parametersarg0: The given array column
ReturnsReturns the array with duplicate values removed
ThrowsApplicationException
Examplearray_distinct(array(1, 2, 3, null, 3)) will return [1,2,3,null]

ARRAY_EXCEPT

FieldDescription
DescriptionReturns an array of the elements in array1 but not in array2, without duplicates
Parameters

arg0: First array column

arg1: Second array column

ReturnsReturns an array of the elements in array1 but not in array2, without duplicates
ThrowsApplicationException
Examplearray_except(array(1, 2, 3), array(1, 3, 5)) will return [2]

ARRAY_INTERSECT

FieldDescription
DescriptionPerforms intersection of array1 and array2, without duplicates.
Parameters

arg0: First array column

arg1: Second array column

ReturnsReturns an array of the elements in the intersection of array1 and array2, without duplicates
ThrowsApplicationException
Examplearray_intersect(array(1, 2, 3), array(1, 3, 5)) will return [1,3]

ARRAY_JOIN

FieldDescription
DescriptionConcatenates the elements of the given array using the delimiter and an optional string to replace nulls. If no value is set for nullReplacement, any null value is filtered
Parameters

arg0: array column

arg1: delimiter

arg2: nullReplacement

ReturnsReturns the concatenated array
ThrowsApplicationException
Examplearray_join(array(‘hello’, null ,‘world’), ’ ‘, ‘,’) will return hello , world

ARRAY_MAX

FieldDescription
DescriptionReturns the maximum value in the array. NULL elements are skipped
Parametersarg0: The array column
ReturnsReturns the maximum value in the array. NULL elements are skipped
ThrowsApplicationException
Examplearray_max(array(1, 20, null, 3)) will return 20

ARRAY_MIN

FieldDescription
DescriptionReturns the minimum value in the array. NULL elements are skipped
Parametersarg0: The array column
ReturnsReturns the minimum value in the array. NULL elements are skipped
ThrowsApplicationException
Examplearray_min(array(1, 20, null, 3)) will return 1

ARRAY_POSITION

FieldDescription
DescriptionReturns the (1-based) index of the first element of the array as long
Parameters

arg0: The array column

arg1: The position

ReturnsReturns the (1-based) index of the first element of the array as long
ThrowsApplicationException
Examplearray_position(array(3, 2, 1), 1) will return 3

array_remove

FieldDescription
DescriptionRemove all elements that equal to element from array.
Parameters

arg0: The array column.

arg1: The position.

ReturnsReturns the array with elements removed.
ThrowsApplicationException
Examplearray_remove(array(1, 2, 3, null, 3), 3) will return [1,2,null]

array_repeat

FieldDescription
DescriptionReturns the array containing element count times.
Parameters

arg0: The array column.

arg1: The count.

ReturnsReturns the array containing element count times.
ThrowsApplicationException
Examplearray_repeat(‘123’, 2) will return [“123”,“123”]

array_sort

FieldDescription
DescriptionSorts the input array in ascending order. The elements of the input array must be orderable. Null elements will be placed at the end of the returned array.
Parametersarg0: The array column.
ReturnsReturns the sorted array.
ThrowsApplicationException
Examplearray_sort(array(‘b’, ’d’, null, ‘c’, ‘a’)) will return [“a”,“b”,“c”,“d”,null]

array_union

FieldDescription
DescriptionReturns an array of the elements in the union of array1 and array2, without duplicates.
Parameters

arg0: The first array column.

arg1: The second array column.

ReturnsReturns an array of the elements in the union of array1 and array2, without duplicates.
ThrowsApplicationException
Examplearray_union(array(1, 2, 3), array(1, 3, 5)) will return [1,2,3,5]

arrays_overlap

FieldDescription
DescriptionReturns true if a1 contains at least a non-null element present also in a2. If the arrays have no common element and they are both non-empty and either of them contains a null element null is returned, false otherwise.
Parameters

arg0: The first array column.

arg1: The second array column.

ReturnsReturns true or false.
ThrowsApplicationException
Examplearrays_overlap(array(1, 2, 3), array(3, 4, 5)) will return true

arrays_zip

FieldDescription
DescriptionReturns a merged array of structs in which the N-th struct contains all N-th values of input arrays.
Parametersarg0: The Columns to be zipped.
ReturnsReturns a merged array of structs in which the N-th struct contains all N-th values of input arrays.
ThrowsApplicationException
Examplearrays_zip(array(1, 2, 3), array(2, 3, 4)) Will return [{“0”:1,“1”:2},{“0”:2,“1”:3},{“0”:3,“1”:4}]

ascii

FieldDescription
DescriptionComputes the numeric value of the first character of the string column, and returns the result as an int column.
Parametersarg0: The string for which the first character’s numeric value to be calculated.
ReturnsReturns the ascii value as an int column.
ThrowsApplicationException
Exampleascii(“An apple”) will return 65

asin

FieldDescription
DescriptionComputes the sine inverse of the given value; the returned angle is in the range -pi/2 through pi/2
Parametersarg0: The column for which sine inverse to be calculated.
ReturnsReturns the computed sine inverse in the range -pi/2 through pi/2.
ThrowsApplicationException
Exampleasin(0.45) will return 0.4667653256984187

atan

FieldDescription
DescriptionComputes the tangent inverse of the given value.
Parametersarg0: The column for which tangent inverse to be calculated
ReturnsReturns the computed the tangent inverse.
ThrowsApplicationException
Exampleatan(0.45) will return 0.42285391621948626

atan2

FieldDescription
DescriptionComputes the angle theta from the conversion of rectangular coordinates (arg0, arg1) to polar coordinates (arg1, theta).
Parameters

arg0: The x rectangular coordinate.

arg1: The y rectangular coordinate.

ReturnsReturns the computed angle theta.
ThrowsApplicationException
Exampleatan2(12, 71.21) will return 1.403849169952035

base64

FieldDescription
DescriptionConverts the argument from a binary bin to a base 64 string.
Parametersarg0: The Column to be converted to base64 string.
ReturnsReturns the base64 string.
ThrowsApplicationException
Examplebase64(‘Spark SQL’) will return U3BhcmsgU1FM

bigint

FieldDescription
DescriptionCasts the value expr to the target data type bigint.
Parametersarg0:The column or expression.
ReturnsReturns the bigint column or value.
ThrowsApplicationException
ExampleSELECT bigint(column); will cast column to bigint

bin

FieldDescription
DescriptionReturns the string representation of the long value expr represented in binary
Parametersarg0: The numerical Column to be converted to represented in binary.
ReturnsReturns the binary representation.
ThrowsApplicationException
Examplebin(13) will return 1101

boolean

FieldDescription
DescriptionCasts the value expr to the target data type boolean.
Parametersarg0:The column or expression.
ReturnsReturns the boolean column or value.
ThrowsApplicationException
ExampleSELECT boolean(expr); will cast expression to boolean

bround

FieldDescription
DescriptionComputes the value of the column arg0 rounded to 0 decimal places with HALF_EVEN round mode.
Parametersarg0: The column for which value rounded to 0 decimal places with HALF_EVEN round mode to be calculated.
ReturnsReturns the computed value.
ThrowsApplicationException
Examplebround(71.21) will return 71.0

cast

FieldDescription
DescriptionCasts the value expr to the target data type
Parametersarg0:The column or expression.
ReturnsReturns the targeted type
ThrowsApplicationException
ExampleSELECT cast(‘10’ as int); will cast 10 to int

cbrt

FieldDescription
DescriptionComputes the cube-root of the given value.
Parametersarg0: The column for which cube-root to be calculated
ReturnsReturns the computed cube-root.
ThrowsApplicationException
Examplecbrt(80.89) will return 4.324789202233814

ceil

FieldDescription
DescriptionComputes the ceiling of the given value.
Parametersarg0: The column for which ceiling to be calculated.
ReturnsReturns the computed ceiling.
ThrowsApplicationException
Exampleceil(77.76) will return 4.2682720044742055

ceiling

FieldDescription
DescriptionComputes the ceiling of the given value.
Parametersarg0: The column for which ceiling to be calculated.
ReturnsReturns the computed ceiling.
ThrowsApplicationException
ExampleSELECT ceiling(-0.1); will return 0

char

FieldDescription
DescriptionConverts the ASCII value to equivalent character. If n is larger than 256 the result is equivalent to chr(n % 256).
Parametersarg0: The ASCII value.
ReturnsReturns the ASCII character having the binary equivalent to expr.
ThrowsApplicationException
ExampleSELECT char(65); will return A

char_length

FieldDescription
DescriptionCalculates the character length of string data or number of bytes of binary data. The length of string data includes the trailing spaces. The length of binary data includes binary zeros.
Parametersarg0: The string column or expression.
ReturnsReturns the character length of string data or number of bytes of binary data.
ThrowsApplicationException
ExampleSELECT char_length(‘Spark SQL ‘); will return 10

chr

FieldDescription
DescriptionConverts the ASCII character having the binary equivalent to expr. If n is larger than 256 the result is equivalent to chr(n % 256)
Parametersarg0: The ASCII value.
ReturnsReturns the ASCII character having the binary equivalent to expr.
ThrowsApplicationException
ExampleSELECT chr(65); will return A

coalesce

FieldDescription
DescriptionReturns the first non-null argument if exists. Otherwise, null.
Parametersarg0:columns representing expressions.
ReturnsReturns the first non-null argument if exists. Otherwise, null.
ThrowsApplicationException
Examplecoalesce(NULL, 1, NULL) will return 1

concat

FieldDescription
DescriptionConcatenates multiple input string columns together into a single string column.
Parametersarg0: The String columns to be concatenated.
ReturnsReturns the concatenated string as a single string column.
ThrowsApplicationException
Exampleconcat(“format”,“string”) will return “formatstring”

concat_ws

FieldDescription
DescriptionConcatenates multiple input string columns together into a single string column, using the given separator.
Parameters

arg0: The separator to be used.

arg1: The String columns to be concatenated.

ReturnsReturns the concatenated strings using the given separator as a single string column.
ThrowsApplicationException
Exampleconcat_ws("-",“format”,“string”) will return “format-string”

conv

FieldDescription
DescriptionConverts a number from a given base to another.
Parameters

arg0:A number/String number column

arg1:Integer value of base from which a number is to be converted

arg2:Integer value of base to which a number is to be converted

ReturnsA string value.
ThrowsApplicationException
Example

We have taken column1 as ‘258’

conv(@{column.schema.column1},10,2) will return 100000010


cos

FieldDescription
DescriptionComputes the cosine of the given value.
Parametersarg0: The column for which cosine to be calculated.
ReturnsReturns the computed cosine.
ThrowsApplicationException
Examplecos(76.56) will return 0.3977126102073901

cosh

FieldDescription
DescriptionComputes the hyperbolic cosine of the given value.
Parametersarg0: The column for which hyperbolic cosine to be calculated.
ReturnsReturns the computed hyperbolic cosine.
ThrowsApplicationException
Examplecosh(71.21) will return -0.5004897466536994

crc32

FieldDescription
DescriptionComputes a cyclic redundancy check value for string.
Parametersarg0:A string argument
Returnsbigint value
ThrowsApplication Exception
Example

We have taken column1 as ‘ABC’

crc32(@{column.schema_id.column1}) will return 2743272264


current_date

FieldDescription
DescriptionComputes the current date as a date column.
Parameters-
ReturnsReturns the current date as a date column
ThrowsApplicationException
Examplecurrent_date() will return the current date.

current_timestamp

FieldDescription
DescriptionComputes the current timestamp as a timestamp column.
Parameters-
ReturnsReturns the current timestamp as a timestamp column.
ThrowsApplicationException
Examplecurrent_timestamp() will return the current timestamp.

date_add

FieldDescription
DescriptionComputes the date that is ‘arg1’ days after start date
Parameters

arg0: The date to which days to be added.

arg1: No of days to be added.

ReturnsReturns the computed date.
ThrowsApplicationException
Exampledate_add(“2009-03-01”,2) will return “2009-03-03”

date_format

FieldDescription
DescriptionConverts a date/timestamp/string to a value of string in the format specified by the date format given by the second argument.
Parameters

arg0: The date/timestamp/string to be converted.

arg1: The format to which the date/timestamp/string to be converted.

ReturnsReturns the converted string.
ThrowsApplicationException
Exampledate_format(“2009-03-01”,“MM-dd-yyyy”) will return “03-01-2009”

date_sub

FieldDescription
DescriptionComputes the date that is ‘arg1’ days before start date.
Parameters

arg0: The date to which days to be substracted.

arg1: No of days to be substracted.

ReturnsReturns the computed date.
ThrowsApplicationException
Exampledate_sub(“2009-03-02”,1) will return “2009-03-01”

date_trunc

FieldDescription
DescriptionReturns timestamp ts truncated to the unit specified by the format model fmt. fmt should be one of [“YEAR” “YYYY” “YY” “MON” “MONTH” “MM” “DAY” “DD” “HOUR” “MINUTE” “SECOND” “WEEK” “QUARTER”]
Parameters

arg0: The fmt format.

arg1: The ts timestamp.

ReturnsReturns timestamp ts truncated to the unit specified by the format model fmt.
ThrowsApplicationException
Exampledate_trunc(‘YEAR’, ‘2015-03-05T09:32:05.359’) will return 2015-01-01 00:00:00

datediff

FieldDescription
DescriptionComputes the number of days from ‘arg0’ date to ‘arg1’ date.
Parameters

arg0: The end date.

arg1: The start date.

ReturnsReturns the computed number of days.
ThrowsApplicationException.
Exampledatediff(“2009-03-01”,“2009-02-27”) will return 2

day

FieldDescription
DescriptionExtracts and returns the day of month of the given date/timestamp.
Parametersarg0: The column or expression.
ReturnsReturns the day of month of the date/timestamp.
ThrowsApplicationException
ExampleSELECT day(‘2009-07-30’); will return 30.

dayofmonth

FieldDescription
DescriptionExtracts the day of the month as an integer from a given date/timestamp/string.
Parametersarg0: The date/timestamp/string from which the day of month to be extracted.
ReturnsReturns the extracted day as an integer.
ThrowsApplicationException
Exampledayofmonth(“2009-03-01”) will return 1

dayofweek

FieldDescription
DescriptionReturns the day of the week for date/timestamp (1 = Sunday).
Parametersarg0: The date column.
ReturnsReturns the day of the week for date/timestamp
ThrowsApplicationException
ExampleDayofweek(‘2009-07-30’) will return 5

dayofyear

FieldDescription
DescriptionExtracts the day of the year as an integer from a given date/timestamp/string.
Parametersarg0: The date/timestamp/string from which the day of year to be extracted.
ReturnsReturns the extracted day as an integer.
ThrowsApplicationException
Exampledayofyear(“2017-12-15”) will return 349

decimal

FieldDescription
DescriptionCasts the value expr to the target data type decimal.
Parametersarg0:The column or expression.
ReturnsReturns the decimal column or value.
ThrowsApplicationException
ExampleSELECT decimal(column); will cast column to decimal

decode

FieldDescription
DescriptionDecodes the first argument using the second argument character set.
Parameters

arg0:Column to be decoded.

arg1: The charset.

ReturnsReturns the decoded column.
ThrowsApplicationException
Exampledecode(encode(‘abc’, ‘utf-8’), ‘utf-8’) will return abc

degrees

FieldDescription
DescriptionConverts an angle measured in radians to an approximately equivalent angle measured in degrees.
Parametersarg0: The column for which the equivalent angle measured in degrees to be calculated.
ReturnsReturns the converted angle measured in degrees.
ThrowsApplicationException.
Exampledegrees(71.21) will return 4080.0324066707394

dense_rank

FieldDescription
DescriptionComputes the rank of a value in a group of values. Unlike the function rank, dense_rank will not produce gaps in the ranking sequence.
Parametersarg0:Not applicable
ReturnsThe calculated dense rank.
ThrowsApplicationException.
Exampleselect dense_rank() OVER (order by col) will return 1,2,3,4…

element_at

FieldDescription
DescriptionFor Array Column Returns element of array at given (1-based) index. If index < 0, accesses elements from the last to the first. For map column,Returns NULL if the index exceeds the length of the array.
Parameters

arg0:Array or map column.

arg1: index or keyt.

ReturnsReturns the element.
ThrowsApplicationException.
Exampleelement_at(array(1, 2, 3), 2) will return 2 and element_at(map(1, ‘a’, 2, ‘b’), 2) will return b

encode

FieldDescription
DescriptionEncodes the first argument using the second argument character set.
Parameters

arg0:Column to be encoded.

arg1: The charset.

ReturnsReturns the encoded column.
ThrowsApplicationException.
Exampleencode(‘abc’, ‘utf-8’) will return abc

exp

FieldDescription
DescriptionComputes the exponential of the given value.
Parametersarg0: The column for which exponential to be calculated.
ReturnsReturns the computed exponential.
ThrowsApplicationException.
Exampleexp(0.78) will return 2.18147220308578

explode

FieldDescription
DescriptionSeparates the elements of array expr into multiple rows, or the elements of map expr into multiple rows and columns.
Parametersarg0: The expr Column.
ReturnsReturns the exploded column.
ThrowsApplicationException.
Exampleexplode(array(10, 20)) will return 10, 20 in a new column.

explode_outer

FieldDescription
DescriptionSeparates the elements of array expr into multiple rows, or the elements of map expr into multiple rows and columns.
Parametersarg0: The expr Column.
ReturnsReturns the exploded column.
ThrowsApplicationException.
Exampleexplode_outer(array(10, 20)) will return 10, 20

expm1

FieldDescription
DescriptionComputes the exponential of the given value minus one.
Parametersarg0: The column for which exponential minus one to be calculated.
ReturnsReturns the computed value.
ThrowsApplicationException.
Exampleexpm1(0.23) will return 0.2586000151807663

expr

FieldDescription
DescriptionParses the expression string into the column that it represents.
Parametersarg0: The expression string to be parsed.
ReturnsReturns the parsed expression string.
ThrowsApplicationException.
Exampleexpr(“colA”, “colB as newName”) will return two columns colA and newName

factorial

FieldDescription
DescriptionComputes the factorial of the given value.
Parametersarg0: The column for which factorial to be calculated.
ReturnsReturns the computed factorial.
ThrowsApplicationException.
ExampleFactorial(11) will return 39916800

flatten

FieldDescription
DescriptionTransforms an array of arrays into a single array.
Parametersarg0: The array of array Column.
ReturnsReturns the flatten array.
ThrowsApplicationException.
Exampleflatten(array(array(1, 2), array(3, 4))) will return [1,2,3,4]

float

FieldDescription
DescriptionCasts the value expr to the target data type float.
Parametersarg0:The column or expression.
ReturnsReturns the float column or value.
ThrowsApplicationException
ExampleSELECT float(column); will cast column to float

floor

FieldDescription
DescriptionComputes the floor of the given value.
Parametersarg0: The column for which floor to be calculated.
ReturnsReturns the computed floor.
ThrowsApplicationException.
Examplefloor(71.21) will return 71

format_number

FieldDescription
DescriptionFormats numeric column arg0 to a format like ‘#,###,###.##’, rounded to arg1 decimal places, and returns the result as a string column.
Parameters

arg0: The column to be formated.

arg1: The integer specifying the decimal places to be used for rounding.

ReturnsReturns the formated result as a string column.
ThrowsApplicationException.
Exampleformat_number(7120.12, 1) will return 7,120.1

format_string

FieldDescription
DescriptionFormats the arguments in printf-style and returns the result as a string column.
Parameters

arg0: The pintf style format.

arg1: The columns to be formatted.

ReturnsReturns the formated arguments as a string column.
ThrowsApplicationException.
Example

We have taken column1 as “cow” , column2 as “moon” and column3 as 2

format_string(“the %s jumped over the %s, %d times”,@{column.schema.column1},@{column.schema.column2},@{column.schema.column3}) will return “the cow jumped over the moon 2 times”


from_json

FieldDescription
DescriptionReturns a struct value with the given jsonStr and schema.
Parameters

arg0:The Json string column.

arg1: The schema column.

arg2: The properties map.

ReturnsReturns the struct value.
ThrowsApplicationException.
Examplefrom_json(’{“a”:1, “b”:0.8}’, ‘a INT, b DOUBLE’) will return {“a”:1, “b”:0.8} and from_json(’{“time”:“26/08/2015”}’, ’time Timestamp’, map(’timestampFormat’, ‘dd/MM/yyyy’))will return {“time”:“2015-08-26 00:00:00.0”}

from_unixtime

FieldDescription
DescriptionConverts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string representing the timestamp of that moment in the current system time zone in the given format.
Parameters

arg0: The number of seconds from unix epoch.

arg1: The format for timestamp in current system timezone to which conversion has to be done.

ReturnsReturns the converted string.
ThrowsApplicationException.
Examplefrom_unixtime(1255033470,“yyyy-dd-MM”) will return 2009-09-10

from_utc_timestamp

FieldDescription
DescriptionGiven a timestamp like ‘2017-07-14 02:40:00.0’, interprets it as a time in UTC, and renders that time as a timestamp in the given time zone.
Parameters

arg0:timestamp column.

arg1: Timezone column.

ReturnsReturns the timestamp.
ThrowsApplicationException.
Examplefrom_utc_timestamp(‘2016-08-31’, ‘Asia/Seoul’) will return 2016-08-31 09:00:00

get_json_object

FieldDescription
DescriptionExtracts a json object from path.
Parameters

arg0:The json txt column.

arg1: the path.

ReturnsReturns the extracted json object.
ThrowsApplicationException.
Exampleget_json_object(’{“a”:“b”}’, ‘$.a’) will return b

greatest

FieldDescription
DescriptionIt gives the greatest value of the list of values.This function takes at least 2 parameters.
Parameters

arg0:A column from the schema

arg1:A column from the schema

ReturnsA Column.
ThrowsApplicationException.
Example

We have taken column1 as ‘258’, column2 as ‘259’

greatest(@{column.schema.column1},@{column.schema.column2}) will return column2’s values


hash

FieldDescription
DescriptionReturns a hash value of the arguments.
Parametersarg0:The columns for which hash to be calculated.
ReturnsReturns a hash value of the arguments.
ThrowsApplicationException.
Examplehash(‘Spark’, array(123), 2) will return -1321691492

hex

FieldDescription
DescriptionIf the argument is an INT, hex returns the number as a STRING in hexadecimal format. Otherwise if the number is a STRING, it converts each character into its hexadecimal representation and returns the resulting STRING.
Parametersarg0:A int/string column
ReturnsA string value.
ThrowsApplicationException.
Example

We have taken column1 as 258,

hex(@{column.schema.column1}) will return 102


hour

FieldDescription
DescriptionExtracts the hours as an integer from a given date/timestamp/string.
Parametersarg0: The date/timestamp/string from which the hours to be extracted.
ReturnsReturns the extracted hours as an integer.
ThrowsApplicationException.
Examplehour(“2017-12-15 11:02:03”) will return 11

hypot

FieldDescription
DescriptionComputes sqrt(arg0^2^ + arg1^2^) without intermediate overflow or underflow.
Parameters

arg0: Will be used while computing sqrt.

arg1: Will be used while computing sqrt.

ReturnsReturns the computed sqrt(arg0^2^ + arg1^2^).
ThrowsApplicationException.
Examplehypot(71.21, 10.5) will return 71.97995533209642

ifnull

FieldDescription
DescriptionReturns expr2 if expr1 is null, or expr1 otherwise.
Parameters

arg0:The first column expression.

arg1:The second column expression.

ReturnsReturns the binary.
ThrowsApplicationException.
ExampleSELECT nvl(NULL, array(‘2’)); will return [“2”]

initcap

FieldDescription
DescriptionComputes a new string column by converting the first letter of each word to uppercase.
Parametersarg0: The input string.
ReturnsReturns the converted string column.
ThrowsApplicationException.
Exampleinitcap(“apple”) will return “Apple”

input_file_name

FieldDescription
DescriptionReturns the name of the file being read, or empty string if not available.
Parameters-
ReturnsReturns the name of the file being read, or empty string if not available.
ThrowsApplicationException.
Exampleinput_file_name() - will return the name of the file being read

instr

FieldDescription
DescriptionLocate the position of the first occurrence of given substring in the given string column.
Parameters

arg0: The string column in which the location to be determined.

arg1: The substring for which the position to be determined.

ReturnsReturns the position of the first occurrence of substring.
ThrowsApplicationException.
Exampleinstr(“apple”,“le”) will return 4

int

FieldDescription
DescriptionCasts the value expr to the target data type int.
Parametersarg0:The column or expression.
ReturnsReturns the int column or value.
ThrowsApplicationException.
ExampleSELECT int(column); will cast column to int

isnan

FieldDescription
DescriptionReturns true if the input is Not a Number.
Parametersarg0:A column whose values needs to be checked
ReturnsA boolean true if the value is not a number
ThrowsApplication Exception.
ExampleWe have taken column1 as ‘abc’ isnan(@{column.schema.column1}) will return true

isnotnull

FieldDescription
DescriptionChecks if the given expression is not null.
Parametersarg0:The column or expression.
ReturnsReturns true if expr is not null, or false otherwise.
ThrowsApplicationException.
ExampleSELECT isnotnull(1); will return true.

isnull

FieldDescription
DescriptionReturns true if a is NULL and false otherwise.
Parametersarg0:A column whose values needs to be checked
ReturnsA boolean true if the value is null.
ThrowsApplication Exception.
ExampleWe have taken column1 as ‘abc’ isnull(@{column.schema.column1}) will return false

lag

FieldDescription
DescriptionReturns the value of input at the offsetth row before the current row in the window. The default value of offset is 1 and the default value of default is null. If the value of input at the offsetth row is null, null is returned. If there is no such offset row (e.g., when the offset is 1, the first row of the window does not have any previous row), default is returned.
Parameters

arg0:The input - a string expression to evaluate offset rows before the current row.

arg1:The offset- an int expression which is rows to jump back in the partition.

arg1:The default- a string expression which is to use when the offset row does not exist.

ReturnsReturns the value of input at the offsetth row before the current row in the window.
ThrowsApplicationException.
Exampleselect lag(col, 1) OVER (order by col)

last_day

FieldDescription
DescriptionGiven a date column, returns the last day of the month which the given date belongs to.
Parametersarg0: The date from which last day of month to be extracted.
ReturnsReturns the computed last day.
ThrowsApplicationException.
Examplelast_day(“2017-12-15”) will return “2017-12-31”

last_value

FieldDescription
DescriptionReturns the last value of expr for a group of rows. If isIgnoreNull is true, returns only non-null values.
Parameters

arg0:The column or expression.

arg1:The isIgnoreNull.

ReturnsReturns the last value of expr.
ThrowsApplicationException.
ExampleSELECT last_value(col) FROM VALUES (10), (5), (20) AS tab(col); will return 20

lcase

FieldDescription
DescriptionConverts str with all characters changed to lowercase.
Parametersarg0:The column or expression.
ReturnsReturns str with all characters changed to lowercase.
ThrowsApplicationException
ExampleSELECT lcase(‘SparkSql’); will return sparksql

lead

FieldDescription
DescriptionReturns the value of input at the offsetth row after the current row in the window. The default value of offset is 1 and the default value of default is null. If the value of input at the offsetth row is null, null is returned. If there is no such an offset row (e.g., when the offset is 1, the last row of the window does not have any subsequent row), default is returned.
Parameters

arg0:The input- a string expression to evaluate offset rows after the current row.

arg1:The offset- an int expression which is rows to jump ahead in the partition.

arg1:The default- a string expression which is to use when the offset is larger than the window. The default value is null.

ReturnsReturns the value of input at the offsetth row before the current row in the window.
ThrowsApplicationException
Exampleselect lead(col, 1) OVER (order by col)

least

FieldDescription
DescriptionIt gives the least value of the list of values.This function takes at least 2 parameters.
Parameters

arg0:A column from the schema

arg1:A column from the schema

ReturnsA column.
ThrowsApplicationException
Example

We have taken column1 as ‘258’, column2 as ‘259’

least(@{column.schema.column1},@{column.schema.column2}) will return column1’s values


length

FieldDescription
DescriptionComputes the length of a given string or binary column.
Parametersarg0: The string for which the length to be determined.
ReturnsReturns the computed length.
ThrowsApplicationException
Examplelength(“apple”) will return 5

levenshtein

FieldDescription
DescriptionComputes the Levenshtein distance of the two given string columns.
Parameters

arg0: The first string column from which the Levenshtein distance from the second string column to be determined.

arg1: The second string column.

ReturnsReturns the computed Levenshtein distance.
ThrowsApplicationException
Examplelevenshtein(“kitten”, “sitting”) will return 3

like

FieldDescription
Descriptionstr like pattern - Returns true if str matches pattern, null if any arguments are null, false otherwise.
Parameters

arg0:A string expression.

arg1:The pattern string which is matched.

ReturnsReturns true, false or null.
ThrowsApplicationException
ExampleSELECT ‘%SystemDrive%UsersJohn’ like ‘%SystemDrive%Users%’; will return true

ln

FieldDescription
DescriptionComputes the natural logarithm of the given value.
Parametersarg0: The column for which natural logarithm to be calculated.
ReturnsReturns the computed natural logarithm.
ThrowsApplicationException
Exampleln(20) will return 2.995732273553991

locate

FieldDescription
DescriptionLocate the position of the first occurrence of given substring in a string column, after the given position.
Parameters

arg0: The String for which the location to be determined.

arg1: The string in which the location to be determined.

arg2: The position after which the location to be determined.

ReturnsReturns the position of the first occurrence of substring.
ThrowsApplicationException
Examplelocate(“apple”,“An apple”,1) will return 3

log

FieldDescription
DescriptionComputes the natural logarithm of the given value.
Parametersarg0: The column for which natural logarithm to be calculated.
ReturnsReturns the computed natural logarithm.
ThrowsApplicationException
Examplelog(20) will return 2.995732273553991

log10

FieldDescription
DescriptionComputes the logarithm with base 10 of the given value.
Parametersarg0: The column for which logarithm to be calculated.
ReturnsReturns the computed logarithm with base 10.
ThrowsApplicationException
Examplelog10(10) will return 1

log1p

FieldDescription
DescriptionComputes the natural logarithm of the given value plus one.
Parametersarg0: The column for which natural logarithm plus one to be calculated.
ReturnsReturns the computed natural logarithm plus one.
ThrowsApplicationException
Examplelog1p(20) will return 3.044522437723423

log2

FieldDescription
DescriptionComputes the logarithm with base 2 of the given value.
Parametersarg0: The column for which logarithm to be calculated.
ReturnsReturns the computed logarithm with base 2.
ThrowsApplicationException
Examplelog2(2) will return 1

lower

FieldDescription
DescriptionConverts a string column to lower case.
Parametersarg0: The string column to be converted to lower case.
ReturnsReturns the converted string.
ThrowsApplicationException
Examplelower(“APple”) will return “apple”

lpad

FieldDescription
DescriptionLeft-pad the string column with the given string, to a given length.
Parameters

arg0: The string column to be left-padded.

arg1: The length for the padding.

arg2: The string to used for left-pad.

ReturnsReturns the Left-padded string.
ThrowsApplicationException
Examplelpad(“SQL Tutorial”, 20, “ABC”) will return “ABCABCABSQL Tutorial”

ltrim

FieldDescription
DescriptionRemoves the leading string contains the characters from the trim string.
Parameters

arg0:the trim string characters to trim, the default value is a single space.

arg1: a string expression.

ReturnsReturns the trimed string.
ThrowsApplicationException
Exampleltrim(‘Sp’,‘SsparkSQLS’)will return ArkSQLS

map

FieldDescription
DescriptionCreates a map with the given key/value pairs.
Parametersarg0:The columns for key and value.
Returnsreturns the map.
ThrowsApplicationException
Examplemap(1.0, ‘2’, 3.0, ‘4’) will return {1.0:“2”,3.0:“4”}

map_concat

FieldDescription
DescriptionReturns the union of all the given maps
Parametersarg0:The map columns.
ReturnsReturns the union of all the given maps
ThrowsApplicationException
Examplemap_concat(map(1, ‘a’, 2, ‘b’), map(2, ‘c’, 3, ’d’)) will return {1:“a”,2:“c”,3:“d”}

map_from_arrays

FieldDescription
DescriptionCreates a map with a pair of the given key/value arrays. All elements in keys should not be null.
Parameters

arg0:Array of keys.

arg1:Array of values.

ReturnsReturns the map.
ThrowsApplicationException
Examplemap_from_arrays(array(1.0, 3.0), array(‘2’, ‘4’)) will return {1.0:“2”,3.0:“4”}

map_from_entries

FieldDescription
DescriptionReturns a map created from the given array of entries.
Parametersarg0:Array of entries.
ReturnsReturns the map.
ThrowsApplicationException
Examplemap_from_entries(array(struct(1, ‘a’), struct(2, ‘b’))) will return {1:“a”,2:“b”}

map_keys

FieldDescription
DescriptionReturns an unordered array containing the keys of the map.
Parametersarg0:Map column.
ReturnsReturns the array.
ThrowsApplicationException
Examplemap_keys(map(1, ‘a’, 2, ‘b’)) will return [1,2]

map_values

FieldDescription
DescriptionReturns an unordered array containing the values of the map.
Parametersarg0:Map column.
ReturnsReturns the array.
ThrowsApplicationException
Examplemap_values(map(1, ‘a’, 2, ‘b’)) will return [“a”,“b”]

md5

FieldDescription
DescriptionCalculates an MD5 128-bit checksum for the string.
Parametersarg0:A string column
ReturnsThe value is returned as a string of 32 hex digits, or NULL if the argument was NULL.
ThrowsApplicationException
Examplemd5(@{column.schema.column1}) will give you an output ‘902fbdd2b1df0c4f70b4a5d23525e932’ if value of column1 is ‘ABC’.

minute

FieldDescription
DescriptionExtracts the minutes as an integer from a given date/timestamp/string.
Parametersarg0: The date/timestamp/string from which the minutes to be extracted.
ReturnsReturns the extracted minutes as an integer.
ThrowsApplicationException
Exampleminute(“2017-12-15 11:02:03”) will return 2

mod

FieldDescription
DescriptionCalculated the remainder of the given expressions.
Parameters

arg0:The first column expression.

arg1:The second column expression.

ReturnsReturns the remainder after expr1/expr2.
ThrowsApplicationException
ExampleSELECT MOD(2, 1.8); will return 0.2

monotonically_increasing_id

FieldDescription
DescriptionA column expression that generates monotonically increasing 64-bit integers.
Parameters-
ReturnsMonotonically increasing integers.
ThrowsApplicationException
Examplemonotonically_increasing_id() will return rows as 0,1,2…

month

FieldDescription
DescriptionExtracts the month as an integer from a given date/timestamp/string.
Parametersarg0: The date/timestamp/string from which the month to be extracted.
ReturnsReturns the extracted month as an integer.
ThrowsApplicationException
Examplemonth(“2017-12-15 11:02:03”) will return 12

months_between

FieldDescription
DescriptionIf timestamp1 is later than timestamp2, then the result is positive. If timestamp1 and timestamp2 are on the same day of month, or both are the last day of month, time of day will be ignored. Otherwise, the difference is calculated based on 31 days per month, and rounded to 8 digits unless roundOff=false.
Parameters

arg0:timestamp1 column.

arg1:timestamp2 column.

arg2:roundoff boolean.

ReturnsReturns the months difference.
ThrowsApplicationException
Examplemonths_between(‘1997-02-28 10:30:00’, ‘1996-10-30’, false) will return 3.9495967741935485

nanvl

FieldDescription
DescriptionReturns expr1 if it’s not NaN, or expr2 otherwise.
Parameters

arg0:expr1 column.

arg1:expr2 column.

ReturnsReturns expr1 if it’s not NaN, or expr2 otherwise.
ThrowsApplicationException
Examplenanvl(cast(‘NaN’ as double), 123) will return 123.0

negative

FieldDescription
DescriptionReturns the negated value of expr.
Parametersarg0:The column or expression.
ReturnsThe negated value.
ThrowsApplicationException
ExampleSELECT negative(1); will return -1

next_day

FieldDescription
DescriptionGiven a date column, returns the first date which is later than the value of the date column that is on the specified day of the week.
Parameters

arg0: The date later which the first date for a particular day of week has to determined.

arg1: The day of week.

ReturnsReturns the computed first date.
ThrowsApplicationException
Examplenext_day(“2017-12-15”,“friday”) will return “2017-12-22”

not

FieldDescription
DescriptionPerform logical not of given column.
Parametersarg0:Given boolean column.
ReturnsReturns logical not of given column.
ThrowsApplicationException
Examplenot(false) will return true.

now

FieldDescription
DescriptionReturns the current timestamp at the start of query evaluation.
Parametersarg0:Not applicable.
ReturnsThe current timestamp
ThrowsApplicationException
ExampleSELECT now(); will return 2020-06-26 15:09:37

nullif

FieldDescription
DescriptionReturns null if expr1 equals to expr2, or expr1 otherwise.
Parameters

arg0:The first column expression.

arg1:The second column expression.

ReturnsReturns the binary.
ThrowsApplicationException
ExampleSELECT nullif(2, 2); will return NULL

nvl

FieldDescription
DescriptionReturns expr2 if expr1 is null, or expr1 otherwise.
Parameters

arg0:The first column expression.

arg1:The second column expression.

ReturnsReturns the binary.
ThrowsApplicationException
ExampleSELECT nvl(NULL, array(‘2’)); will return [“2”]

nvl2

FieldDescription
DescriptionReturns expr2 if expr1 is not null, or expr3 otherwise.
Parameters

arg0:The first column expression.

arg1:The second column expression.

arg1:The third column expression.

ReturnsReturns the binary.
ThrowsApplicationException
ExampleSELECT nvl2(NULL, 2, 1); will return 1

parse_url

FieldDescription
DescriptionExtracts a part from a URL
Parameters

arg0:The URL.

arg1:The part to extract.

arg1:The key.

ReturnsReturns the binary.
ThrowsApplicationException
ExampleSELECT parse_url(‘http://spark.apache.org/path?query=1’, ‘HOST’) will return spark.apache.org

percent_rank

FieldDescription
DescriptionComputes the percentage ranking of a value in a group of values.
Parametersarg0:Not applicable.
ReturnsReturns percentage ranking.
ThrowsApplicationException
Exampleselect percent_rank() OVER (order by col) will return 1,2,3,4….

pi

FieldDescription
DescriptionReturns pi.
Parametersarg0:Not applicable.
ReturnsReturns pi.
ThrowsApplicationException
ExampleSELECT pi(); will return 3.141592653589793

pmod

FieldDescription
DescriptionComputes the positive value of arg0 mod arg1.
Parameters

arg0: The dividend.

arg1: The divisor.

ReturnsReturns the computed positive value of arg0 mod arg1.
ThrowsApplicationException
Examplepmod(19, 0.78) will return 0.2800007

posexplode

FieldDescription
DescriptionSeparates the elements of array expr into multiple rows with positions, or the elements of map expr into multiple rows and columns with positions.
Parametersarg0: The array expression.
ReturnsReturns multiple rows and columns.
ThrowsApplicationException
Example

SELECT posexplode(array(10,20)); will return 0 10

1 20


posexplode_outer

FieldDescription
DescriptionSeparates the elements of array expr into multiple rows with positions, or the elements of map expr into multiple rows and columns with positions.
Parametersarg0: The array expression.
ReturnsReturns multiple rows and columns.
ThrowsApplicationException
Example

SELECT posexplode(array(10,20)); will return 0 10

1 20


position

FieldDescription
DescriptionReturns the position of the first occurrence of substr in str after position pos. The given pos and return value are 1-based.
Parameters

arg0:The substr.

arg1:The str.

arg1:The pos.

ReturnsReturns the position.
ThrowsApplicationException
ExampleSELECT position(‘bar’, ‘foobarbar’); will return 4

pow

FieldDescription
DescriptionComputes the value of the first argument raised to the power of the second argument.
Parameters

arg0: The base.

arg1: The exponent.

ReturnsReturns the computed value.
ThrowsApplicationException
Examplepow(20, 2) will return 400

pow_left_arg_double

FieldDescription
DescriptionComputes the value of the first argument raised to the power of the second argument.
Parameters

arg0: The base.

arg1: The exponent.

ReturnsReturns the computed value.
ThrowsApplicationException
Examplepow_left_arg_double(20, 2) will return 400

pow_right_arg_double

FieldDescription
DescriptionComputes the value of the first argument raised to the power of the second argument.
Parameters

arg0: The base.

arg1: The exponent.

ReturnsReturns the computed value.
ThrowsApplicationException
Examplepow_right_arg_double(20, 2) will return 400

quarter

FieldDescription
DescriptionExtracts the quarter as an integer from a given date/timestamp/string.
Parametersarg0: The column for which quarter to be calculated.
ReturnsReturns the extracted quarter as an integer.
ThrowsApplicationException
Examplequarter(“2017-12-22 01:12:00”) will return 4

radians

FieldDescription
DescriptionConverts an angle measured in degrees to an approximately equivalent angle measured in radians.
Parametersarg0: The column for which equivalent angle measured in radians to be calculated.
ReturnsReturns the converted angle measured in radians.
ThrowsApplicationException
Exampleradians(20) will return 0.3490658503988659

rand

FieldDescription
DescriptionGenerate a random column with independent and identically distributed (i.i.d.) samples from 0.0 to 1.0.
Parameters-
ReturnsReturns the generated column.
ThrowsApplicationException
Examplerand() will return a new column with independent and identically distributed (i.i.d.) samples from 0.0 to 1.0.

randn

FieldDescription
DescriptionGenerate a column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.
Parameters-
ReturnsReturns the generated column.
ThrowsApplicationException
Examplerandn() will return a new column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.

rank

FieldDescription
DescriptionComputes the rank of a value in a group of values. The result is one plus the number of rows preceding or equal to the current row in the ordering of the partition. The values will produce gaps in the sequence.
Parametersarg0:Not applicable.
ReturnsThe calculated rank.
ThrowsApplicationException
Exampleselect rank() OVER (order by col) will return 1,2,3,4…

regexp_extract

FieldDescription
DescriptionExtract a specific group matched by a Java regex, from the specified string column.
Parameters

arg0: The string column from which the group to be extracted.

arg1: The string specifying the regex.

arg2: The regex group id.

ReturnsReturns the extracted group.
ThrowsApplicationException
Exampleregexp_extract(“foothebar”,“foo(.*?)(bar)”, 2) will return “bar”

regexp_replace

FieldDescription
DescriptionReplace all substrings of the specified string value that match regexp with the given replacement.
Parameters

arg0: The string column from which substrings to be replaced.

arg1: The pattern to be used.

arg2: The replacement.

ReturnsReturns the string after replacement of substrings that match regexp with rep.
ThrowsApplicationException
Exampleregexp_replace(“foobar”, “oo

repeat

FieldDescription
DescriptionRepeats each value in select column n times.
Parameters

arg0:A column needs to be repeated

arg1:Integer value representing no of times arg0 is to be repeated

ReturnsA repeated value.
ThrowsApplicationException
Example

We have taken column1 as ‘str’

repeat(@{column.schema.column1},2) will return ‘strstr’


replace

FieldDescription
DescriptionReplaces all occurrences of search with replace.
Parameters

arg0:str - A string expression.

arg1:search - a string expression. If search is not found in str, str is returned unchanged.

arg1:replace - a string expression. If replace is not specified or is an empty string, nothing replaces the string that is removed from str.

ReturnsReturns the replaced string.
ThrowsApplicationException
ExampleSELECT replace(‘ABCabc’, ‘abc’, ‘DEF’); will return ABCDEF

reverse

FieldDescription
DescriptionReverses the string column and returns it as a new string column.
Parametersarg0: The string column to be reversed.
ReturnsReturns the reversed string column.
ThrowsApplicationException
Examplereverse(“apple”) will return “elppa”

rint

FieldDescription
DescriptionComputes the double value that is closest in value to the argument and is equal to a mathematical integer.
Parametersarg0: The column for which double value to be calculated.
ReturnsReturns the computed value.
ThrowsApplicationException
Examplerint(80.89) will return 81.0

round

FieldDescription
DescriptionComputes the value of the column arg0 rounded to 0 decimal places.
Parametersarg0: The column for which value rounded to 0 decimal places to be calculated.
ReturnsReturns the computed value.
ThrowsApplicationException
Exampleround(80.89) will return 81.0

row_number

FieldDescription
DescriptionAssigns a unique, sequential number to each row, starting with one, according to the ordering of rows within the window partition.
Parametersarg0:Not applicable.
ReturnsThe row number.
ThrowsApplicationException
Exampleselect row_number() OVER (order by col) will return 1,2,3,4….

rpad

FieldDescription
DescriptionRight-pad the string column with the given string, to a given length.
Parameters

arg0: The string column to be right-padded.

arg1: The length for the padding.

arg2: The string to used for right-pad.

ReturnsReturns the Right-padded string.
ThrowsApplicationException
Examplerpad(“SQL Tutorial”, 20, “ABC”) will return “SQL TutorialABCABCAB”

rtrim

FieldDescription
DescriptionTrim the spaces from right end for the specified string value.
Parametersarg0: The string column from which right spaces to be trimmed.
ReturnsReturns the trimmed string value.
ThrowsApplicationException
Examplertrim(“apple “) will return “apple”

schema_of_json

FieldDescription
DescriptionReturns schema in the DDL format of JSON string.
Parametersarg0:Given json string column
ReturnsReturns schema of the json.
ThrowsApplicationException
Exampleschema_of_json(’[{“col”:0}]’) will return array>

second

FieldDescription
DescriptionExtracts the seconds as an integer from a given date/timestamp/string.
Parametersarg0: The date/timestamp/string from which the seconds to be extracted.
ReturnsReturns the seconds as an integer.
ThrowsApplicationException
Examplesecond(“2017-12-15 11:02:03”) will return 3

sequence

FieldDescription
DescriptionGenerates an array of elements from start to stop (inclusive), incrementing by step. The type of the returned elements is the same as the type of argument expressions.Supported types are: byte, short, integer, long, date, timestamp.The start and stop expressions must resolve to the same type. If start and stop expressions resolve to the ‘date’ or ’timestamp’ type then the step expression must resolve to the ‘interval’ type, otherwise to the same type as the start and stop expressions.
Parameters

arg0:start - an expression. The start of the range.

arg1:stop - an expression. The end the range (inclusive).

arg2:step - an optional expression. The step of the range. By default step is 1 if start is less than or equal to stop, otherwise -1. For the temporal sequences it’s 1 day and -1 day respectively. If start is greater than stop then the step must be negative, and vice versa.

ReturnsReturns the sequence
ThrowsApplicationException
Examplesequence(to_date(‘2018-01-01’), to_date(‘2018-03-01’), interval 1 month) will return [2018-01-01,2018-02-01,2018-03-01]

sha1

FieldDescription
DescriptionCalculates the SHA-1 digest for string and returns the value as a hex string
Parametersarg0:A string column
ReturnsA hex string.
ThrowsApplicationException
Example

We have taken column1 as ‘ABC’

sha1(@{column.schema.column1}) will return ‘3c01bdbb26f358bab27f267924aa2c9a03fcfdb8’


sha2

FieldDescription
DescriptionCalculates the SHA-2 family of hash functions of a string value and returns the value as a hex string. NumBits controls the number of bits in the message digest.
Parameters

arg0:A string column

arg1:one of 224, 256, 384, or 512.

ReturnsA hex string
ThrowsApplicationException
Example

We have taken column1 as ‘Sam’

sha2(@{column.schema.column1},256) will return ‘4ecde249d747d51d8..’


shiftLeft

FieldDescription
DescriptionBitwise left shift, Shifts a b positions to the left.
Parameters

arg0:A number column

arg1:An integer column

ReturnsIf the given value is a long value, this function will return a long value else it will return an integer value
ThrowsApplicationException
Example

We have taken column1 as ‘258’

shiftLeft(@{column.schema.column1},2) will return 1032


shiftRight

FieldDescription
DescriptionBitwise right shift, Shifts a b positions to the right.
Parameters

arg0:A number column

arg1:An integer column

ReturnsIf the given value is a long value, this function will return a long value else it will return an integer value
ThrowsApplicationException
Example

We have taken column1 as ‘258’

shiftRight(@{column.schema.column1},2) will return 64


shiftRightUnsigned

FieldDescription
DescriptionBitwise unsigned right shift, Shifts a b positions to the right.
Parameters

arg0:A number column

arg1:An integer column

ReturnsIf the given value is a long value, this function will return a long value else it will return an integer value
ThrowsApplicationException
Example

We have taken column1 as ‘258’

shiftRightUnsigned(@{column.schema.column1},2) will return 64


shuffle

FieldDescription
DescriptionReturns a random permutation of the given array.
Parametersarg0: array column.
ReturnsReturns a random permutation of the given array.
ThrowsApplicationException
Exampleshuffle(array(1, 20, null, 3))will return [20,null,3,1]

sign

FieldDescription
DescriptionReturns -1.0, 0.0 or 1.0 as expr is negative, 0 or positive.
Parametersarg0:Not applicable.
ReturnsReturns 1.0, 0.0 or 1.0.
ThrowsApplicationException
ExampleSELECT signum(40); will return 1.0

signum

FieldDescription
DescriptionComputes the signum of the given value.
Parametersarg0: The column for which signum to be calculated.
ReturnsReturns the computed signum.
ThrowsApplicationException
Examplesignum(20) will return 1.0

sin

FieldDescription
DescriptionComputes the sine of the given value.
Parametersarg0: The column for which sine to be calculated.
ReturnsReturns the computed sine.
ThrowsApplicationException
Examplesin(20) will return 0.9129452507276277

sinh

FieldDescription
DescriptionComputes the hyperbolic sine of the given value.
Parametersarg0: The column for which hyperbolic sine to be calculated.
ReturnsReturns the computed hyperbolic sine.
ThrowsApplicationException
Examplesinh(20) will return 2.4258259770489514E8

size

FieldDescription
DescriptionReturns the size of an array or a map. The function returns -1 if its input is null and spark.sql.legacy.sizeOfNull is set to true. If spark.sql.legacy.sizeOfNull is set to false, the function returns null for null input. By default, the spark.sql.legacy.sizeOfNull parameter is set to true.
Parametersarg0: array or map column.
ReturnsReturns the size
ThrowsApplicationException
Examplesize(array(‘b’, ’d’, ‘c’, ‘a’)) will return 4

slice

FieldDescription
DescriptionSubsets array x starting from index start (or starting from the end if start is negative) with the specified length.
Parameters

arg0:array column.

arg1:start index.

arg2:end index

ReturnsReturns an array.
ThrowsApplicationException
Exampleslice(array(1, 2, 3, 4), 2, 2) will return [2,3]

smallint

FieldDescription
DescriptionCasts the value expr to the target data type smallint.
Parametersarg0:The column or expression.
ReturnsReturns the smallint column or value.
ThrowsApplicationException
ExampleSELECT smallint(column); will cast column to smallint

sort_array

FieldDescription
DescriptionSorts the input array in ascending or descending order according to the natural ordering of the array elements. Null elements will be placed at the beginning of the returned array in ascending order or at the end of the returned array in descending order.
Parameters

arg0:array column.

arg1:boolean flag for order.

ReturnsReturns the sorted array.
ThrowsApplicationException
Examplesort_array(array(‘b’, ’d’, null, ‘c’, ‘a’), true) will return [null,“a”,“b”,“c”,“d”]

soundex

FieldDescription
DescriptionReturns Soundex code of the string.
Parametersarg0:String column.
ReturnsReturns Soundex code of the string.
ThrowsApplicationException
Examplesoundex(‘Miller’) will return M460

spark_partition_id

FieldDescription
DescriptionReturns the current partition id.
Parametersarg0:Not applicable.
ReturnsThe partition id.
ThrowsApplicationException
Exampleselect spark_partition_id(); will return 1

split

FieldDescription
DescriptionSplits str around occurrences that match regex
Parameters

arg0: str string column.

arg1:the regex string.

ReturnsReturns the splits.
ThrowsApplicationException
Examplesplit(‘oneAtwoBthreeC’, ‘[ABC]’)will return [“one”,“two”,“three”,””]

sqrt

FieldDescription
DescriptionComputes the square root of the specified float value.
Parametersarg0: The column for which square root to be calculated.
ReturnsReturns the computed square root.
ThrowsApplicationException
ExamplesqlSqrt(20) will return 4.47213595499958

string

FieldDescription
DescriptionCasts the value expr to the target data type string.
Parametersarg0:The column or expression.
ReturnsReturns the string column or value.
ThrowsApplicationException
ExampleSELECT string(column); will cast column to string

struct

FieldDescription
DescriptionCreates a struct with the given field values.
Parametersarg0:columns using which the struct will be created.
Returnsreturns the struct column
ThrowsApplicationException
Example-

substr

FieldDescription
DescriptionSubstring starts at given position and is of given length when given argument is String type or returns the slice of byte array that starts at given position in byte and is of given length when given argument is Binary type
Parameters

arg0: The String column from which substring to be extracted.

arg1: The start position for the substring.

arg2: The end position for the substring.

ReturnsReturns the result substring.
ThrowsApplicationException
Examplesubstring(“foo bar”,4,6) will return “bar”

substring

FieldDescription
DescriptionSubstring starts at given position and is of given length when given argument is String type or returns the slice of byte array that starts at given position in byte and is of given length when given argument is Binary type
Parameters

arg0: The String column from which substring to be extracted.

arg1: The start position for the substring.

arg2: The end position for the substring.

ReturnsReturns the result substring.
ThrowsApplicationException
Examplesubstring(“foo bar”,4,6) will return “bar”

substring_index

FieldDescription
DescriptionComputes the substring from given string before given count occurrences of the given delimiter.
Parameters

arg0: The String column from which substring to be extracted.

arg1: The delimiter.

arg2: The count occurrences for the delimiter.

ReturnsReturns the result substring.
ThrowsApplicationException
Examplesubstring_index(“www.xyz.com”,”.",2) will return “www.xyz”

tan

FieldDescription
DescriptionComputes the tangent of the given value.
Parametersarg0: The column for which tangent to be calculated.
ReturnsReturns the computed tangent.
ThrowsApplicationException
Exampletan(20) will return 2.237160944224742

tanh

FieldDescription
DescriptionComputes the hyperbolic tangent of the given value.
Parametersarg0: The column for which hyperbolic tangent to be calculated.
ReturnsReturns the computed hyperbolic tangent.
ThrowsApplicationException
Exampletanh(20) will return 1.0

timestamp

FieldDescription
DescriptionCasts the value expr to the timestamp type.
Parametersarg0: The string column or expression.
ReturnsReturns the date column or value.
ThrowsApplicationException
ExampleSELECT date(‘2020-06-10 02:12:45’); will return 2020-06-10 02:12:45 as timestamp type.

tinyint

FieldDescription
DescriptionCasts the value expr to the target data type tinyint.
Parametersarg0:The column or expression.
ReturnsReturns the tinyint column or value.
ThrowsApplicationException
ExampleSELECT tinyint(column); will cast column to tinyint

toDegrees

FieldDescription
DescriptionReturns the angle measured in radians to an approximately equivalent angle measured in degrees.
Parametersarg0:A column for which degree to be calculated
ReturnsA double value.
ThrowsApplicationException
Example

We have taken column1 as 3.14159,

toDegrees(@{column.schema.column1}) will return 180


toRadians

FieldDescription
DescriptionReturns the angle measured in degrees to an approximately equivalent angle measured in radians.
Parametersarg0:A column for which radians to be calculated
ReturnsA double value.
ThrowsApplicationException
Example

We have taken column1 as 180,

toRadians(@{column.schema.column1}) will return 3.14159


to_date

FieldDescription
DescriptionParses the date_str expression with the fmt expression to a date. Returns null with invalid input. By default, it follows casting rules to a date if the fmt is omitted.
Parameters

arg0: str date_str column column.

arg1:the format string.

Returnsreturns the formatted date.
ThrowsApplicationException
Exampleto_date(‘2016-12-31’, ‘yyyy-MM-dd’) will return 2016-12-31

to_json

FieldDescription
DescriptionReturns a JSON string with a given struct value
Parameters

arg0:struct column.

arg1:additional options map.

ReturnsReturns a JSON string with a given struct value
ThrowsApplicationException
Exampleto_json(named_struct(’time’, to_timestamp(‘2015-08-26’, ‘yyyy-MM-dd’)), map(’timestampFormat’, ‘dd/MM/yyyy’)) will return {“time”:“26/08/2015”}

to_timestamp

FieldDescription
DescriptionParses the timestamp expression with the fmt expression to a timestamp. Returns null with invalid input. By default, it follows casting rules to a timestamp if the fmt is omitted.
Parameters

arg0:The timestamp column.

arg1:The format string.

ReturnsReturns the formated timestamp.
ThrowsApplicationException
Exampleto_timestamp(‘2016-12-31’, ‘yyyy-MM-dd’) will return 2016-12-31 00:00:00

to_utc_timestamp

FieldDescription
DescriptionGiven a timestamp like ‘2017-07-14 02:40:00.0’, interprets it as a time in the given time zone, and renders that time as a timestamp in UTC
Parameters

arg0:The timestamp column.

arg1:The timezone column.

ReturnsReturns the timestamp.
ThrowsApplicationException
Exampleto_utc_timestamp(‘2016-08-31’, ‘Asia/Seoul’) will return 2016-08-30 15:00:00

transform

FieldDescription
DescriptionTransforms elements in an array using the function.
Parameters

arg0:The array expression.

arg1:The function to apply on each element of the array expression.

ReturnsReturns the transformed array.
ThrowsApplicationException
ExampleSELECT transform(array(1, 2, 3), x -> x + 1); will return [2,3,4]

translate

FieldDescription
DescriptionTranslate any character in the given string by a given character in given replaceString.
Parameters

arg0: The string column in which the translation to be done.

arg1: The matching character.

arg2: The replacement charater.

ReturnsReturns the translated string.
ThrowsApplicationException
Exampletranslate(“The foo bar”,“f”,“t”) will return “The too bar”

trim

FieldDescription
DescriptionRemove the leading and trailing trimStr characters from str
Parameters

arg0:The trimStr String column.

arg1:The str string.

ReturnsReturns the trimed string
ThrowsApplicationException
Exampletrim(‘SL’, ‘SsparkSQLS’) will return parkSQ

trunc

FieldDescription
DescriptionComputes the date truncated to the unit specified by the format.
Parameters

arg0: The date to be truncated.

arg1: The format for truncation.

ReturnsReturns truncated date.
ThrowsApplicationException
Exampletrunc(“2017-12-15”,“YEAR”) will return “2017-01-01”

ucase

FieldDescription
DescriptionConverts str with all characters changed to uppercase.
Parametersarg0:The column or expression.
ReturnsReturns str with all characters changed to uppercase.
ThrowsApplicationException
ExampleSELECT lcase(‘SparkSql’); will return SPARKSQL

unbase64

FieldDescription
DescriptionConverts the argument from a base 64 string str to a binary.
Parametersarg0:The base 64 String column.
ReturnsReturns the unbase64 of string.
ThrowsApplicationException
Exampleunbase64(‘U3BhcmsgU1FM’) will return Spark SQL

unhex

FieldDescription
DescriptionConverts hexadecimal expr to binary.
Parametersarg0:The hexadecimal column.
ReturnsReturns the binary.
ThrowsApplicationException
Exampledecode(unhex(‘537061726B2053514C’), ‘UTF-8’) will return Spark SQL

unix_timestamp

FieldDescription
DescriptionConvert time string with given pattern (refer to the topic, Customizing Formats) to Unix time stamp (in seconds), return null if fail.
Parameters

arg0: The time string to be converted.

arg1: The format of the time string.

ReturnsReturns the converted Unix time stamp (in seconds), return null if fail.
ThrowsApplicationException
Exampleunix_timestamp(“2017-12-15 11:56”,“yyyy-MM-dd hh:mm”) will return 1513339008

upper

FieldDescription
DescriptionConverts a string column to upper case.
Parametersarg0: The string column to be converted o upper case.
ReturnsReturns the converted string column.
ThrowsApplicationException
Exampleupper(“aPPle”) will return “APPLE”

uuid

FieldDescription
DescriptionReturns an universally unique identifier (UUID) string. The value is returned as a canonical UUID 36-character string
Parametersarg0:Not applicable.
ReturnsReturns the uuid.
ThrowsApplicationException
ExampleSELECT uuid(); will return 12ee-419a-ac70-88c948edd439

weekday

FieldDescription
DescriptionExtracts and returns the day of week of the given date/timestamp.
Parametersarg0: The column or expression.
ReturnsReturns the day of week of the date/timestamp.
ThrowsApplicationException
ExampleSELECT day(‘2009-07-30’); will return 3.

weekofyear

FieldDescription
DescriptionExtracts the week number as an integer from a given date/timestamp/string.
Parametersarg0: The date/timestamp/string from which the week to be extracted.
ReturnsReturns the converted Unix time stamp (in seconds), return null if fail.
ThrowsApplicationException
Exampleweekofyear(“2017-12-15 11:02:03”) will return 50

xpath

FieldDescription
DescriptionExtracts and returns a string array of values within the nodes of xml that match the XPath expression.
Parameters

arg0:The string xml.

arg1:The xpath expression.

ReturnsReturns the array of strings.
ThrowsApplicationException
ExampleSELECT xpath(’b1b2b3c1c2’,‘a/b/text()’); will return [‘b1’,‘b2’,‘b3’]

xpath_boolean

FieldDescription
DescriptionReturns true if the XPath expression evaluates to true, or if a matching node is found.
Parameters

arg0:The string xml.

arg1:The xpath expression.

ReturnsReturns true or false
ThrowsApplicationException
ExampleSELECT xpath_boolean(’1’,‘a/b’); will return true

xpath_double

FieldDescription
DescriptionEvaluates given xpath expression and returns double value.
Parameters

arg0:The string xml.

arg1:The xpath expression.

ReturnsReturns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
ThrowsApplicationException
ExampleSELECT xpath_double(’12’, ‘sum(a/b)’); will return 3.0

xpath_float

FieldDescription
DescriptionEvaluates given xpath expression and returns float value.
Parameters

arg0:The string xml.

arg1:The xpath expression.

ReturnsReturns a float value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
ThrowsApplicationException
ExampleSELECT xpath_float(’12’, ‘sum(a/b)’); will return 3.0

xpath_int

FieldDescription
DescriptionEvaluates given xpath expression and returns integer value.
Parameters

arg0:The string xml.

arg1:The xpath expression.

ReturnsReturns a integer value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
ThrowsApplicationException
ExampleSELECT xpath_int(’12’, ‘sum(a/b)’); will return 3

xpath_long

FieldDescription
DescriptionEvaluates given xpath expression and returns long value.
Parameters

arg0:The string xml.

arg1:The xpath expression.

ReturnsReturns a long value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
ThrowsApplicationException
ExampleSELECT xpath_long(’12’, ‘sum(a/b)’); will return 3

xpath_number

FieldDescription
DescriptionEvaluates given xpath expression and returns double value.
Parameters

arg0:The string xml.

arg1:The xpath expression.

ReturnsReturns a double value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
ThrowsApplicationException
ExampleSELECT xpath_number(’12’, ‘sum(a/b)’); will return 3.0

xpath_short

FieldDescription
DescriptionEvaluates given xpath expression and returns short integer value.
Parameters

arg0:The string xml.

arg1:The xpath expression.

ReturnsReturns a short integer value, the value zero if no match is found, or NaN if a match is found but the value is non-numeric.
ThrowsApplicationException
ExampleSELECT xpath_short(’12’, ‘sum(a/b)’); will return 3

xpath_string

FieldDescription
DescriptionExtracts and returns the text contents of the first xml node that matches the XPath expression.
Parameters

arg0:The string xml.

arg1:The xpath expression.

ReturnsReturns the string content.
ThrowsApplicationException
ExampleSELECT xpath_string(’bcc’,‘a/c’); will return cc

year

FieldDescription
DescriptionExtracts the year as an integer from a given date/timestamp/string.
Parametersarg0: The date/timestamp/string from which the year to be extracted.
ReturnsReturns the extracted year as an integer.
ThrowsApplicationException
Exampleyear(“2017-12-15 11:02:03”) will return 2017

zip_with

FieldDescription
DescriptionMerges the two given arrays, element-wise, into a single array using function. If one array is shorter, nulls are appended at the end to match the length of the longer array, before applying function.
Parameters

arg0:The first array.

arg1:The second array.

arg1:The function to apply while merging.

ReturnsReturns the merged array.
ThrowsApplicationException
ExampleSELECT zip_with(array(1, 2, 3), array(‘a’, ‘b’, ‘c’), (x, y) -> (y, x)); will return [{“y”:“a”,“x”:1},{“y”:“b”,“x”:2},{“y”:“c”,“x”:3}]

zulu Time Format

FieldDescription
DescriptionReturns the UTC date of input date column or value in given output date format.
Parameters

Date: String date value or date column.

inputDateFormat: Date format of the input date.

outputDateFormat: Date format of the outdate date.

ReturnsUTC date in output date format in string.
ThrowsApplicationException.
Example

zuluTimeFormat(columnName1,“yyyy-MM-dd HH:mm:ss”,“yyyy/MM/dd HHmmss”) will return UTC date in format “yyyy/MM/dd HHmmss”.

Note: If user is using column name then the column should be of string type.

Top