Miscellaneous Functions

Coalesce

Returns the first non-null argument if exists. Otherwise, null.

Input Parameters:

arg0:columns representing expressions.

Output Type Integer Column Configuration Parameters No Returns Returns the first non-null argument if exists. Otherwise, null. Throws Application Exception

Use Case:

coalesce(Column… arg0)

Example:

coalesce(NULL, 1, NULL) will return 1

from_json

Returns a struct value with the given jsonStr and schema.

Input Parameters:

arg0: The Json string column.

arg1: The schema column.

arg2: The properties map.

Configuration Parameters No Returns Returns the struct value. Throws Application Exception

Use Case:

from_json(Column arg0, Column arg1, Object arg2)

Example:

from_json(‘{“a”:1, “b”:0.8}’, ‘a INT, b DOUBLE’) will return {“a”:1, “b”:0.8} and from_json(‘{“time”:“26/08/2015”}’, ‘time Timestamp’, map(‘timestampFormat’, ’dd/MM/yyyy’))will return {“time”:“2015-08-26 00:00:00.0”}

hash

Returns a hash value of the arguments.

Input Parameters:

arg0:The columns for which hash to be calculated.

Configuration Parameters No Returns Returns a hash value of the arguments. Throws Application Exception

Use Case:

hash(Column… arg0)

Example:

hash(‘Spark’, array(123), 2) will return -1321691492

input_file_name

Returns the name of the file being read, or empty string if not available.

Configuration Parameters No Returns Returns the name of the file being read, or empty string if not available. Throws Application Exception

Use Case:

input_file_name()

Example:

input_file_name() - will return the name of the file being read

map

Creates a map with the given key/value pairs.

Input Parameters:

arg0:The columns for key and value.

Configuration Parameters No Returns Returns the map. Throws Application Exception

Use Case:

map(Column… arg0)

Example:

map(1.0, ‘2’, 3.0, ‘4’) will return {1.0:“2”,3.0:“4”}

map_concat

Returns the union of all the given maps

Input Parameters:

arg0:The map columns.

Configuration Parameters No Returns Returns the union of all the given maps. Throws Application Exception

Use Case:

map_concat(Column… arg0)

Example:

map_concat(map(1, ‘a’, 2, ‘b’), map(2, ‘c’, 3, ‘d’)) will return {1:“a”,2:“c”,3:“d”}

map_from_arrays

Creates a map with a pair of the given key/value arrays. All elements in keys should not be null.

Input Parameters:

arg0:Array of keys.

arg1:Array of values.

Configuration Parameters No Returns Returns the map. Throws Application Exception

Use Case:

map_from_arrays(Column arg0, Column arg1)

Example:

map_from_arrays(array(1.0, 3.0), array(‘2’, ‘4’)) will return {1.0:“2”,3.0:“4”}

map_from_entries

Returns a map created from the given array of entries.

Input Parameters:

arg0:Array of entries.

Configuration Parameters No Returns Returns the map. Throws Application Exception

Use Case:

map_from_entries(Column arg0)

Example:

map_from_entries(array(struct(1, ‘a’), struct(2, ‘b’))) will return {1:“a”,2:“b”}

map_keys

Returns an unordered array containing the keys of the map.

Input Parameters:

arg0:Map column.

Configuration Parameters No Returns Returns the array. Throws Application Exception

Use Case:

map_keys(Column arg0)

Example:

map_keys(map(1, ‘a’, 2, ‘b’)) will return [1,2]

map_values

Returns an unordered array containing the values of the map.

Input Parameters:

arg0:Map column.

Configuration Parameters No Returns Returns the array. Throws Application Exception

Use Case:

map_values(Column arg0)

Example:

map_values(map(1, ‘a’, 2, ‘b’)) will return [“a”,“b”]

nanvl

R Returns expr1 if it’s not NaN, or expr2 otherwise.

Input Parameters:

arg0:expr1 column.

arg1:expr2 column.

Configuration Parameters No Returns Returns expr1 if it’s not NaN, or expr2 otherwise. Throws Application Exception

Use Case:

nanvl(Column arg0, Column arg1)

Example:

nanvl(cast(‘NaN’ as double), 123) will return 123.0

not

Perform logical not of given column.

Input Parameters:

arg0:Given boolean column

Configuration Parameters No Returns Returns logical not of given column. Throws Application Exception

Use Case:

not(Column arg0)

Example:

not(false) will return true

schema_of_json

Returns schema in the DDL format of JSON string.

Input Parameters:

arg0:Given json string column

Configuration Parameters No Returns Returns schema of the json. Throws Application Exception

Use Case:

schema_of_json(Column arg0)

Example:

schema_of_json(‘[{“col”:0}]’) will return array>

sequence

Generates an array of elements from start to stop (inclusive), incrementing by step. The type of the returned elements is the same as the type of argument expressions. Supported types are: byte, short, integer, long, date, timestamp. The start and stop expressions should resolve to the same type. If start and stop expressions resolve to the ‘date’ or ‘timestamp’ type then the step expression should resolve to the ‘interval’ type, otherwise to the same type as the start and stop expressions.

Input Parameters:

arg0:start - an expression. The start of the range.

arg1:stop - an expression. The end the range (inclusive).

arg2:step - an optional expression. The step of the range. By default step is 1 if start is less than or equal to stop, otherwise -1. For the temporal sequences it’s 1 day and -1 day respectively. If start is greater than stop then the step should be negative, and vice versa.

Configuration Parameters No Returns Returns the sequence Throws Application Exception

Use Case:

sequence(Column arg0, Column arg1, Column arg2)

Example:

sequence(to_date(‘2018-01-01’), to_date(‘2018-03-01’), interval 1 month) will return [2018-01-01,2018-02-01,2018-03-01]

size

Returns the size of an array or a map. The function returns -1 if its input is null and spark.sql.legacy.sizeOfNull is set to true. If spark.sql.legacy.sizeOfNull is set to false, the function returns null for null input. By default, the spark.sql.legacy.sizeOfNull parameter is set to true.

Input Parameters:

arg0: array or map column.

Configuration Parameters No Returns Returns the size Throws Application Exception

Use Case:

size(Column arg0)

Example:

size(array(‘b’, ‘d’, ‘c’, ‘a’)) will return 4

struct

Creates a struct with the given field values.

Input Parameters:

arg0:columns using which the struct will be created.

Configuration Parameters No Returns Returns the struct column Throws Application Exception

Use Case:

struct(Column… arg0)

to_json

Returns a JSON string with a given struct value.

Input Parameters:

arg0:struct column.

arg1:additional options map.

Configuration Parameters No Returns Returns a JSON string with a given struct value. Throws Application Exception

Use Case:

to_json(Column arg0, Object arg1)

Example:

to_json(named_struct(‘time’, to_timestamp(‘2015-08-26’, ‘yyyy-MM-dd’)), map(‘timestampFormat’, ‘dd/MM/yyyy’)) will return {“time”:“26/08/2015”}

unhex

Converts hexadecimal expr to binary.

Input Parameters:

arg0:The hexadecimal column.

Configuration Parameters No Returns Returns the binary. Throws Application Exception

Use Case:

unhex(Column arg0)

Example:

decode(unhex(‘537061726B2053514C’), ‘UTF-8’) will return Spark SQL

Crc32

Computes a cyclic redundancy check value for string.

Input Parameters: Column arg0

arg0 (required) - A string column.

Output Type Integer Column Configuration Parameters No Returns Bigint value. Throws Application Exception

Use Case:

crc32(Column arg0)

Example:

crc32(“ABC”) will return 2743272264

Isnan

Returns true if the input is Not a Number.

Input Parameters: Column arg0

arg0 (required) - A column whose values needs to be checked.

Output Type Boolean Column Configuration Parameters No Returns A boolean true if the value is not a number. Throws Application Exception

Use Case:

isnan(Column arg0)

Example:

isnan(“abc”) will return true

Isnull

Returns true if “a” is NULL and false otherwise.

Input Parameters: Column arg0

arg0 (required) - A column whose values needs to be checked.

Output Type Boolean Column Configuration Parameters No Returns A boolean true if the value is null Throws Application Exception

Use Case:

isnull(Column arg0)

Example:

isnull(“abc”) will return false

Md5

Calculates an MD5 128-bit checksum for the string.

Input Parameters: Column arg0

arg0 (required) - A string column

Output Type String Column Configuration Parameters No Returns The value is returned as a string of 32 hex digits, or NULL if the argument was NULL. Throws Application Exception

Use Case:

md5(Column arg0)

Example:

md5(‘ABC’) will give you an output ‘902fbdd2b1df0c4f70b4a5d23525e932’.

Monotonically_increasing_id

A column expression that generates monotonically increasing 64-bit integers.

Input Parameters: No input arguments.

Output Type Integer Column Configuration Parameters No Returns Monotonically increasing integers. Throws Application Exception

Use Case:

monotonically_increasing_id()

Example:

monotonically_increasing_id() will return rows as 0,1,2…

Sha1

Calculates the SHA-1 digest for string and returns the value as a hex string

Input Parameters: Column arg0

arg0 (requird) - A string column

Output Type String Column Configuration Parameters No Returns A hex string. Throws Application Exception

Use Case:

sha1(Column arg0)

Example:

sha1(“ABC”) will return ‘3c01bdbb26f358bab27f267924aa2c9a03fcfdb8’

Sha2

Calculates the SHA-2 family of hash functions of a string value and returns the value as a hex string. NumBits controls the number of bits in the message digest.

Input Parameters: Column arg0,int numBits

arg0 (required) - A string column

arg1 (required) - one of 224, 256, 384, or 512.

Output Type String Column Configuration Parameters No Returns A hex String. Throws Application Exception

Use Case:

sha2(Column arg0,int numBits)

Example:

sha2(“Sam”,256) will return ‘4ecde249d747d51d8..’

ShiftLeft

Bitwise left shift, Shifts a b positions to the left.

Input Parameters: Column arg0,int numBits

arg0 (required) - A number column

arg1 (required) - An integer column

Output Type Number Column Configuration Parameters No Returns If the given value is a long value, this function will return a long value else it will return an integer value Throws Application Exception

Use Case:

shiftLeft(Column arg0,int arg1)

Example:

shiftLeft(258,2) will return 1032

ShiftRight

Bitwise right shift, Shifts a b positions to the right.

Input Parameters: Column arg0,int numBits

arg0 (required) - A number column

arg1 (required) - An integer column

Output Type Number Column Configuration Parameters No Returns If the given value is a long value, this function will return a long value else it will return an integer value Throws Application Exception

Use Case:

shiftRight(Column arg0,int arg1)

Example:

shiftRight(258,2) will return 64

ShiftRightUnsigned

Bitwise unsigned right shift, Shifts a b positions to the right.

Input Parameters: Column arg0,int numBits

arg0 (required) - A number column

arg1 (required) - An integer column.

Output Type Number Column Configuration Parameters No Returns If the given value is a long value, this function will return a long value else it will return an integer value Throws Application Exception

Use Case:

shiftRightUnsigned(Column arg0,int arg1)

Example:

shiftRightUnsigned(258,2) will return 64

Top