Miscellaneous Functions
Coalesce
Returns the first non-null argument if exists. Otherwise, null.
Input Parameters:
arg0:columns representing expressions.
Output Type Integer Column Configuration Parameters No Returns Returns the first non-null argument if exists. Otherwise, null. Throws Application Exception
Use Case:
coalesce(Column… arg0)
Example:
coalesce(NULL, 1, NULL) will return 1
from_json
Returns a struct value with the given jsonStr and schema.
Input Parameters:
arg0: The Json string column.
arg1: The schema column.
arg2: The properties map.
Configuration Parameters No Returns Returns the struct value. Throws Application Exception
Use Case:
from_json(Column arg0, Column arg1, Object arg2)
Example:
from_json(‘{“a”:1, “b”:0.8}’, ‘a INT, b DOUBLE’) will return {“a”:1, “b”:0.8} and from_json(‘{“time”:“26/08/2015”}’, ‘time Timestamp’, map(‘timestampFormat’, ’dd/MM/yyyy’))will return {“time”:“2015-08-26 00:00:00.0”}
hash
Returns a hash value of the arguments.
Input Parameters:
arg0:The columns for which hash to be calculated.
Configuration Parameters No Returns Returns a hash value of the arguments. Throws Application Exception
Use Case:
hash(Column… arg0)
Example:
hash(‘Spark’, array(123), 2) will return -1321691492
input_file_name
Returns the name of the file being read, or empty string if not available.
Configuration Parameters No Returns Returns the name of the file being read, or empty string if not available. Throws Application Exception
Use Case:
input_file_name()
Example:
input_file_name() - will return the name of the file being read
map
Creates a map with the given key/value pairs.
Input Parameters:
arg0:The columns for key and value.
Configuration Parameters No Returns Returns the map. Throws Application Exception
Use Case:
map(Column… arg0)
Example:
map(1.0, ‘2’, 3.0, ‘4’) will return {1.0:“2”,3.0:“4”}
map_concat
Returns the union of all the given maps
Input Parameters:
arg0:The map columns.
Configuration Parameters No Returns Returns the union of all the given maps. Throws Application Exception
Use Case:
map_concat(Column… arg0)
Example:
map_concat(map(1, ‘a’, 2, ‘b’), map(2, ‘c’, 3, ‘d’)) will return {1:“a”,2:“c”,3:“d”}
map_from_arrays
Creates a map with a pair of the given key/value arrays. All elements in keys should not be null.
Input Parameters:
arg0:Array of keys.
arg1:Array of values.
Configuration Parameters No Returns Returns the map. Throws Application Exception
Use Case:
map_from_arrays(Column arg0, Column arg1)
Example:
map_from_arrays(array(1.0, 3.0), array(‘2’, ‘4’)) will return {1.0:“2”,3.0:“4”}
map_from_entries
Returns a map created from the given array of entries.
Input Parameters:
arg0:Array of entries.
Configuration Parameters No Returns Returns the map. Throws Application Exception
Use Case:
map_from_entries(Column arg0)
Example:
map_from_entries(array(struct(1, ‘a’), struct(2, ‘b’))) will return {1:“a”,2:“b”}
map_keys
Returns an unordered array containing the keys of the map.
Input Parameters:
arg0:Map column.
Configuration Parameters No Returns Returns the array. Throws Application Exception
Use Case:
map_keys(Column arg0)
Example:
map_keys(map(1, ‘a’, 2, ‘b’)) will return [1,2]
map_values
Returns an unordered array containing the values of the map.
Input Parameters:
arg0:Map column.
Configuration Parameters No Returns Returns the array. Throws Application Exception
Use Case:
map_values(Column arg0)
Example:
map_values(map(1, ‘a’, 2, ‘b’)) will return [“a”,“b”]
nanvl
R Returns expr1 if it’s not NaN, or expr2 otherwise.
Input Parameters:
arg0:expr1 column.
arg1:expr2 column.
Configuration Parameters No Returns Returns expr1 if it’s not NaN, or expr2 otherwise. Throws Application Exception
Use Case:
nanvl(Column arg0, Column arg1)
Example:
nanvl(cast(‘NaN’ as double), 123) will return 123.0
not
Perform logical not of given column.
Input Parameters:
arg0:Given boolean column
Configuration Parameters No Returns Returns logical not of given column. Throws Application Exception
Use Case:
not(Column arg0)
Example:
not(false) will return true
schema_of_json
Returns schema in the DDL format of JSON string.
Input Parameters:
arg0:Given json string column
Configuration Parameters No Returns Returns schema of the json. Throws Application Exception
Use Case:
schema_of_json(Column arg0)
Example:
schema_of_json(‘[{“col”:0}]’) will return array>
sequence
Generates an array of elements from start to stop (inclusive), incrementing by step. The type of the returned elements is the same as the type of argument expressions. Supported types are: byte, short, integer, long, date, timestamp. The start and stop expressions should resolve to the same type. If start and stop expressions resolve to the ‘date’ or ‘timestamp’ type then the step expression should resolve to the ‘interval’ type, otherwise to the same type as the start and stop expressions.
Input Parameters:
arg0:start - an expression. The start of the range.
arg1:stop - an expression. The end the range (inclusive).
arg2:step - an optional expression. The step of the range. By default step is 1 if start is less than or equal to stop, otherwise -1. For the temporal sequences it’s 1 day and -1 day respectively. If start is greater than stop then the step should be negative, and vice versa.
Configuration Parameters No Returns Returns the sequence Throws Application Exception
Use Case:
sequence(Column arg0, Column arg1, Column arg2)
Example:
sequence(to_date(‘2018-01-01’), to_date(‘2018-03-01’), interval 1 month) will return [2018-01-01,2018-02-01,2018-03-01]
size
Returns the size of an array or a map. The function returns -1 if its input is null and spark.sql.legacy.sizeOfNull is set to true. If spark.sql.legacy.sizeOfNull is set to false, the function returns null for null input. By default, the spark.sql.legacy.sizeOfNull parameter is set to true.
Input Parameters:
arg0: array or map column.
Configuration Parameters No Returns Returns the size Throws Application Exception
Use Case:
size(Column arg0)
Example:
size(array(‘b’, ‘d’, ‘c’, ‘a’)) will return 4
struct
Creates a struct with the given field values.
Input Parameters:
arg0:columns using which the struct will be created.
Configuration Parameters No Returns Returns the struct column Throws Application Exception
Use Case:
struct(Column… arg0)
to_json
Returns a JSON string with a given struct value.
Input Parameters:
arg0:struct column.
arg1:additional options map.
Configuration Parameters No Returns Returns a JSON string with a given struct value. Throws Application Exception
Use Case:
to_json(Column arg0, Object arg1)
Example:
to_json(named_struct(‘time’, to_timestamp(‘2015-08-26’, ‘yyyy-MM-dd’)), map(‘timestampFormat’, ‘dd/MM/yyyy’)) will return {“time”:“26/08/2015”}
unhex
Converts hexadecimal expr to binary.
Input Parameters:
arg0:The hexadecimal column.
Configuration Parameters No Returns Returns the binary. Throws Application Exception
Use Case:
unhex(Column arg0)
Example:
decode(unhex(‘537061726B2053514C’), ‘UTF-8’) will return Spark SQL
Crc32
Computes a cyclic redundancy check value for string.
Input Parameters: Column arg0
arg0 (required) - A string column.
Output Type Integer Column Configuration Parameters No Returns Bigint value. Throws Application Exception
Use Case:
crc32(Column arg0)
Example:
crc32(“ABC”) will return 2743272264
Isnan
Returns true if the input is Not a Number.
Input Parameters: Column arg0
arg0 (required) - A column whose values needs to be checked.
Output Type Boolean Column Configuration Parameters No Returns A boolean true if the value is not a number. Throws Application Exception
Use Case:
isnan(Column arg0)
Example:
isnan(“abc”) will return true
Isnull
Returns true if “a” is NULL and false otherwise.
Input Parameters: Column arg0
arg0 (required) - A column whose values needs to be checked.
Output Type Boolean Column Configuration Parameters No Returns A boolean true if the value is null Throws Application Exception
Use Case:
isnull(Column arg0)
Example:
isnull(“abc”) will return false
Md5
Calculates an MD5 128-bit checksum for the string.
Input Parameters: Column arg0
arg0 (required) - A string column
Output Type String Column Configuration Parameters No Returns The value is returned as a string of 32 hex digits, or NULL if the argument was NULL. Throws Application Exception
Use Case:
md5(Column arg0)
Example:
md5(‘ABC’) will give you an output ‘902fbdd2b1df0c4f70b4a5d23525e932’.
Monotonically_increasing_id
A column expression that generates monotonically increasing 64-bit integers.
Input Parameters: No input arguments.
Output Type Integer Column Configuration Parameters No Returns Monotonically increasing integers. Throws Application Exception
Use Case:
monotonically_increasing_id()
Example:
monotonically_increasing_id() will return rows as 0,1,2…
Sha1
Calculates the SHA-1 digest for string and returns the value as a hex string
Input Parameters: Column arg0
arg0 (requird) - A string column
Output Type String Column Configuration Parameters No Returns A hex string. Throws Application Exception
Use Case:
sha1(Column arg0)
Example:
sha1(“ABC”) will return ‘3c01bdbb26f358bab27f267924aa2c9a03fcfdb8’
Sha2
Calculates the SHA-2 family of hash functions of a string value and returns the value as a hex string. NumBits controls the number of bits in the message digest.
Input Parameters: Column arg0,int numBits
arg0 (required) - A string column
arg1 (required) - one of 224, 256, 384, or 512.
Output Type String Column Configuration Parameters No Returns A hex String. Throws Application Exception
Use Case:
sha2(Column arg0,int numBits)
Example:
sha2(“Sam”,256) will return ‘4ecde249d747d51d8..’
ShiftLeft
Bitwise left shift, Shifts a b positions to the left.
Input Parameters: Column arg0,int numBits
arg0 (required) - A number column
arg1 (required) - An integer column
Output Type Number Column Configuration Parameters No Returns If the given value is a long value, this function will return a long value else it will return an integer value Throws Application Exception
Use Case:
shiftLeft(Column arg0,int arg1)
Example:
shiftLeft(258,2) will return 1032
ShiftRight
Bitwise right shift, Shifts a b positions to the right.
Input Parameters: Column arg0,int numBits
arg0 (required) - A number column
arg1 (required) - An integer column
Output Type Number Column Configuration Parameters No Returns If the given value is a long value, this function will return a long value else it will return an integer value Throws Application Exception
Use Case:
shiftRight(Column arg0,int arg1)
Example:
shiftRight(258,2) will return 64
ShiftRightUnsigned
Bitwise unsigned right shift, Shifts a b positions to the right.
Input Parameters: Column arg0,int numBits
arg0 (required) - A number column
arg1 (required) - An integer column.
Output Type Number Column Configuration Parameters No Returns If the given value is a long value, this function will return a long value else it will return an integer value Throws Application Exception
Use Case:
shiftRightUnsigned(Column arg0,int arg1)
Example:
shiftRightUnsigned(258,2) will return 64
If you have any feedback on Gathr documentation, please email us!