Gathr provides different kind of functions like Date, Lookup, String, Math, Array and Miscellaneous functions. Using these functions, you can directly apply computations on the dataset columns and get desired result.
The functions takes columns and/or literals as arguments.
In order to pass column “A” of a dataset “D” as a function argument, pass for example
There are some exceptions where you need to pass the column name directly.
Following are the functions that take the column name directly.
In order to pass a “string literal” as argument to function pass it in double quotes for example
In order to pass a “number literal” to a function, pass it as it is.
If we have a dataset named schemaName having a string/date/timestamp column date holding the value 2009-03-01 for a particular row, then
add_months@{column.schemaName.date},2) will return 2009-05-01
All the examples presented in this guide have column values instead of the entire expression i.e. @{column.schemaName.columnName}
add_months@{column.schemaName.date},2) will return 2009-05-01
add_months is a function, @{column.schemaName.date} is an expression and 2009-03-01 is the value of the expression.
add_months(@{column.schemaName.date},2) will be shown as:
A detailed description of these functions is given below:
Enables to perform calculations involving dates.
Calculates the new date by adding the given number of months.
Input Parameters: Column arg0, int arg1.
• arg0 (required) – The string/date/timestamp column to which months are to be added.
• arg1 (required)– Number of months to be added.
add_months("2009-03-01",2) will return "2009-05-01"
Computes the current date as a date column.
Input Parameters: There are no function arguments.
current_date() will return the current date.
If used on 2018-06-15 will return 2018-06-15
Computes the current timestamp as a timestamp column.
Input Parameters: There are no function arguments.
current timestamp() will return the current timestamp.
Adds the specified number of months in ‘arg1’ to a date time value in ‘arg0’.
Input Parameters: Column arg0, int arg1
• arg0 (required) – The string/date/timestamp column to which days to be added.
• arg1 (required)– Number of days to be added.
date_add("2009-03-01",2) will return "2009-03-03".
Extracts a json object from path.
Input Parameters: Column arg0, int arg1
• arg1: The path
get_json_object('{"a":"b"}', '$.a') will return b
If timestamp1 is later than timestamp2, then the result is positive. If timestamp1 and timestamp2 are on the same day of month, or both are the last day of month, time of day will be ignored. Otherwise, the difference is calculated based on 31 days per month, and rounded to 8 digits unless roundOff=false.
Input Parameters: Column arg0, int arg1
• arg2:roundoff boolean
months_between('1997-02-28 10:30:00', '1996-10-30', false) will return 3.9495967741935485
Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTC.
arg1:The timezone column.
to_utc_timestamp('2016-08-31', 'Asia/Seoul') will return 2016-08-30 15:00:00
Returns timestamp ts truncated to the unit specified by the format model fmt. fmt should be one of ["YEAR" "YYYY" "YY" "MON" "MONTH" "MM" "DAY" "DD" "HOUR" "MINUTE" "SECOND" "WEEK" "QUARTER"]
arg1: The ts timestamp
date_trunc('YEAR', '2015-03-05T09:32:05.359') will return 2015-01-01 00:00:00
Returns the day of the week for date/timestamp (1 = Sunday).
arg0: The date column.
Dayofweek('2009-07-30') will return 5
Parses the date_str expression with the fmt expression to a date. Returns null with invalid input. By default, it follows casting rules to a date if the fmt is omitted.
arg0: str date_str column column.
arg1:the format string.
to_date('2016-12-31', 'yyyy-MM-dd') will return 2016-12-31
Parses the timestamp expression with the fmt expression to a timestamp. Returns null with invalid input. By default, it follows casting rules to a timestamp if the fmt is omitted.
arg1:The format string.
to_timestamp('2016-12-31', 'yyyy-MM-dd') will return 2016-12-31 00:00:00
Converts a date/timestamp/string column 'arg0' to a value of string in the format specified by the date format given by the 'arg1'.
Input Parameters: Column arg0, String arg1
• arg0 (required) – The date/timestamp/string column to be converted.
• arg1 (required)– The format to which the date/timestamp/string to be converted.
date_format("2009-03-01","MM-dd-yyyy") will return "03-01-2009".
Computes the date that is 'arg1' days before 'arg0'.
Input Parameters: Column arg0, int arg1
• arg0 (required) – The string/date/timestamp column from which days to be subtracted.
• arg1 (required)– Number of days to be subtracted.
date_format("2009-03-01","MM-dd-yyyy") will return "03-01-2009".
Calculates the difference between the dates 'arg1' and 'arg0' date and returns the number of days.
Input Parameters: Column arg0, Column arg1
• arg0 (required) – The end string/date/timestamp column.
• arg1 (required)– The start string/date/timestamp column.
datediff("2009-03-01","2009-02-27") will return 2.
Extracts the day of the year as an integer from a given date/timestamp/string.
• arg0 (required) – The date/timestamp/string from which the day of year to be extracted.
dayofyear("2017-12-15") will return 349
Convert time string with given pattern to Unix time stamp (in seconds), returns null if fails.
https://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html
Input Parameters: Column arg0,Object arg1
• arg0 (required) – The string/date/timestamp column to be converted.
• arg1 (required)– The string/date/timestamp format.
unix_timestamp("2017-12-15 11:56","yyyy-MM-dd hh:mm") will return 1513339008.
Gets current Unix timestamp in seconds.
Input Parameters: There are no function arguments.
default_unix_timestamp() will return current time in long format.
Given a timestamp, which corresponds to a certain time of day in UTC, returns another timestamp that corresponds to the same time of day in the given timezone.
Input Parameters: Column arg0, Object arg1
• arg0 (required) – The string/date/timestamp column in UTC.
• arg1 (required) – The timezone to which the timestamp has to converted.
from_utc_timestamp("2017-12-15 11:40",IST) will return "2017-12-15 17:10"
Extracts the hours as an integer from a given date/timestamp/string.
• arg0 (required) – The date/timestamp/string column from which the hours to be extracted..
date_format("2009-03-01","MM-dd-yyyy") will return "03-01-2009".
The date/timestamp/string column, returns the last day of the month which the given date belongs to.
• arg0 (required) – The date/timestamp/string column from which last day of month to be extracted...
last_day("2017-12-15") will return "2017-12-31".
These functions help to look up data beyond the application flow.
It is used to fetch the records from Hbase table based on the following input parameters.
Input parameters: String tableName,String rowID, String columnFamily, int resultRow, String resultColumnName.
• tableName (required): Hbase Table name with name space as string type.
• rowID (required): String representation of unique key for record.
• columnFamily (required): Column family name is in string format.
• resultRow (required): Result row number.
• resultColumnName (required): Result column name.
Configuration Parameters: Below are the configuration parameters of lookupHBase function.
• ConnectionName (required): Hbase cluster connection name.
• RowidCaching (required): Caching will be applied on the basis of row id..
lookupHBase("tableName","rowID","columnFamily",1,"age") will return value of age from row 1 of result for given rowId and columnFamily.
It is used to fetch the record from RDBMS table. The application supports database lookup for Postgres, MySQL and Oracle.
Input Parameters: String sqlQuery, int resultRow, String resultColumnName
• sqlQuery(required) – string type, select sql query specific to database.
• resultRow (required): Result row number.
• resultColumnName (required): Result column name.
• ConnectionName(required)– Complete jdbc URL specific to database.
• FunctionReturnType (required)– Return type of the function based on the result column being queried.
lookupRDBMS(“select * from tableName",1,"age") will return value of age for row 1 of query result.
Returns the result with respect to query from ElasticSearch index, outputs the given column for given row.
Input Parameters: String stringQuery, int resultRow, String resultColumnName
• stringQuery(required) – string Elastic search query.
• resultRow (required): Result row number
• resultColumnName (required): Result column name.
• ConnectionName(required)– Elastic search connection name.
• indexName(required)- Elastic search index name.
• indexType(required)- Elastic search index type.
• functionReturnType (required)– Return type of the function based on the result column being queried.
.
lookupES("{\'query\':{\'match_all\':{}}}",1,"age") will return row 1 and column named age from search result of ES index.
This method is used to get the response of web service. Gathr supports REST based web service.
Input Parameters: There is no input parameter.
• endpoint(required): The end point of URL.
• methodType(required): Web service method like GET, POST, etc.
• Content-type(required): The content type like application/JSON, etc.
• wsparams(required): The web service parameters.
.
Returns the result, with respect to query, from Solr Index.
Input Parameters: String queryString, int resultRow, String resultColumnName
• queryString (required): Solr search query.
• resultRow (required): Result row number.
• resultColumnName (required): Result column name.
• connectionName(required)- Solr connection name.
• indexName(required)- Solr index name.
lookupSOLR(''*:*'',1,”age”) will return you search result from Solr index for row 1 and column age of query result.
Returns the boolean result, with respect to path exits or not, from HDFS.
• hdfsPath (required) – Path on HDFS.
• ConnectionName(required)– HDFS Connection name.
• FunctionReturnType (required)– Return type of the function based on the result column being queried.
lookupHDFSPATH(''hdfsPath'') will return whether the path exists or not on HDFS.
Returns the Base64 encoded result, with respect to row and column-family passed as argument to the function, from HBase table, outputs the given column for given row.
Input parameters: String tableName,String rowID, String columnFamily, int resultRow, String resultColumnName
• tableName (required): Hbase Table name with name space as string type.
• rowID (required): String representation of unique key for record.
• columnFamily (required): Column family name is in string format.
• resultRow (required): Result row number.
• resultColumnName (required): Result column name.
Configuration Parameters: Below are the configuration parameters of lookupHBase function.
• ConnectionName(required)– Hbase Connection name.
• RowIdCaching (required)– Caching will be applied on the basis of row id.
.
lookupHBase64("tableName","rowID","columnFamily",1,"age") will return value of age from row 1 of result for given rowId and columnFamily.
Computes the numeric value of the first character of the string column, and returns the result as an int column.
• arg0(required) - The string column for which the first character's numeric value to be calculated.
ascii(“An apple”) will return 65
Converts the argument from a binary bin to a base 64 string.
arg0: The Column to be converted to base64 string.
base64('Spark SQL') will return U3BhcmsgU1FM
Returns the string representation of the long value expr represented in binary.
arg0: The numerical Column to be converted to represented in binary.
Decodes the first argument using the second argument character set.
arg1: The charset
decode(encode('abc', 'utf-8'), 'utf-8') will return abc
Encodes the first argument using the second argument character set.
encode('abc', 'utf-8') will return abc
Removes the leading string contains the characters from the trim string.
arg0:the trim string characters to trim, the default value is a single space.
ltrim('Sp','SsparkSQLS')will return ArkSQLS
Returns Soundex code of the string.
soundex('Miller') will return M460
Splits str around occurrences that match regex
split('oneAtwoBthreeC', '[ABC]')will return ["one","two","three",""]
Remove the leading and trailing trimStr characters from str
arg0:The trimStr String column.
trim('SL', 'SsparkSQLS') will return parkSQ
Converts the argument from a base 64 string str to a binary.
arg0:The base 64 String column
unbase64('U3BhcmsgU1FM') will return Spark SQL
Concatenates multiple input string columns together into a single string column.
Input Parameters: Column... arg0
arg0 (required) - The String columns to be concatenated.
concat("format","string") will return "formatstring"
Concatenates multiple input string columns together into a single string column, using the given separator.
Input Parameters: Object arg0, Column... arg1
• arg0(required) - The separator to be used.
• arg1 (required) - The String columns to be concatenated.
concat_ws("-","format","string") will return "format-string".
Parses the expression string into the column that it represents.
• arg0 (required) - The expression string to be parsed
expr("colA * 3") will return 6 if the value of column 'colA' is 2 for particular row.
Formats the arguments in printf-style and returns the result as a string column.
Input Parameters: Object arg0, Column arg1
• arg0 (required) - The printf style format.
• arg1 (required) - The columns to be formatted.
format_string("the %s jumped over the %s, %d times","cow","moon",2) will return "the cow jumped over the moon 2 times".
Computes a new string column by converting the first letter of each word to uppercase.
• arg0 (required) - The input string column
initcap("apple") will return "Apple")
Locate the position of the first occurrence of given substring in the given string column.
Input Parameters: Column arg0, Object arg1
• arg0 (required) - The string column in which the location to be determined.
• arg1 (required) - The substring for which the position to be determined.
instr("apple","le") will return 4
Computes the length of a given string or binary column.
arg0 (required) - The string column for which the length to be determined.
Computes the Levenshtein distance of the two given string columns.
Input Parameters: Column arg0, Column arg1
• arg0 (required) - The first string column from which the Levenshtein distance from the second string column to be determined.
• arg1 (required) - The second string column
levenshtein("kitten", "sitting") will return 3
Locate the position of the first occurrence of given substring in a string column, after the given position.
Input Parameters: Object arg0, Column arg1, int arg2
• arg0 (required) - The String for which the location to be determined.
• arg1 (required) - The String column in which the location to be determined.
• arg2 (required) - The position after which the location to be determined
locate("apple","An apple",1) will return 3
Converts a string column to lower case.
• arg0 (required) - The string column to be converted to lower case
lower("Apple") will return "apple"
Left-pad the string column with the given string, to a given length.
Input Parameters: Column arg0, int arg1, Object arg2
• arg0 (required) - The string column to be left-padded.
• arg1 (required) - The length for the padding.
• arg2 (required) - The string to use for left-pad.
lpad("SQL Tutorial", 20, "ABC") will return "ABCABCABSQL Tutorial"
Trim the spaces from left end for the specified string column.
• arg0 (required) - The string column from which left spaces to be trimmed.
ltrim(“apple”) will return “apple”
Extracts a specific group matched by a Java regex, from the specified string column.
Arguments: Column arg0, Object arg1, int arg2
• arg0 (required) - The string column from which the group to be extracted.
• arg1 (required) - The string specifying the regex.
• arg2 (required) - The regex group id.
regexp_extract("foothebar","foo(.*?)(bar)", 2) will return "bar"
Replace all substrings of the specified string column that match regexp with the given replacement.
Input Parameters: Column arg0, Object arg1, Object arg2
arg0 (required) - The string column from which substrings to be replaced.
arg1 (required) - The pattern to be used.
arg2 (required) - The replacement.
regexp_replace("foobar", "oo|ar", "") will return "fb"
arg0 (required) - The column name to be renamed.
rename_column("age") will rename "age" to given output column name.
Repeats each value in 'arg0' column 'arg1' times.
Input Parameters: Column arg0,int arg1
arg0 (required) - A column needs to be repeated
arg1 (required) - Integer value representing no of times arg0 is to be repeated.
repeat("str",2) will return 'strstr'
Reverses the string column and returns it as a new string column.
arg0 (required) - The string column to be reversed.
reverse("apple") will return "elppa"
Right-pad the string column with the given string, to a given length.
Input Parameters: Column arg0, int arg1, Object arg2
• arg0 (required) - The string column to be right-padded.
• arg1 (required) - The length for the padding.
• arg2 (required) - The string to use for right-pad.
rpad("SQL Tutorial", 20, "ABC") will return "SQL TutorialABCABCAB"
Trim the spaces from right end for the specified string column.
arg0 (required) - The string column from which right spaces to be trimmed.
rtrim("apple ") will return "apple"
Substring starts at given position and is of given length when given argument is String type or returns the slice of byte array that starts at given position in byte and is of given length when given argument is Binary type.
Input Parameters: Column arg0, int arg1, int arg2
• arg0 (required) - The String column from which substring to be extracted.
• arg1 (required) - The start position for the substring.
• arg2 (required) - The end position for the substring.
substring("foo bar",4,6) will return "bar"
Computes the substring from given string column before given count occurrences of the given delimiter.
Input Parameters: Column arg0, Object arg1, int arg2
• arg0 (required) - The String column from which substring to be extracted.
• arg1 (required) - The delimiter.
• arg2 (required) - The count occurrences for the delimiter.
substring_index("www.xyz.com",".",2) will return www.xyz
Translate any character in the given string by a given character in given replaceString.
Input Parameters: Column arg0, Object arg1, Object arg2
• arg0 (required) - The string column in which the translation to be done.
• arg1 (required) - The matching character.
• arg2 (required) - The replacement character.
translate("The foo bar","f","t") will return "The too bar"
Trim the spaces from both ends for the specified string column.
arg0 (required) - The string column from which spaces to be trimmed.
trim(" An apple ") will return "An apple"
Converts a string column to upper case.
arg0 (required) - The string column to be converted to upper case.
upper("aPPle") will return "APPLE"
arg0 (required) - The column for which absolute value to be calculated.
Computes the cosine inverse of the given value; the returned angle is in the range 0.0 through pi.
arg0 (required) - The column for which cosine inverse to be calculated.
acos(0.45) will return 1.104031001096478
Computes the sine inverse of the given value; the returned angle is in the range -pi/2 through pi/2.
arg0 (required) - The column for which sine inverse to be calculated.
asin(0.45) will return 0.4667653256984187
Computes the tangent inverse of the given value.
arg0 (required) - The column for which tangent inverse to be calculated.
atan(0.45) will return 0.42285391621948626
Computes the angle theta from the conversion of rectangular coordinates (arg0, arg1) to polar coordinates (arg1, theta).
Input Parameters: Column arg0, Column arg1
• arg0 (required) - The column for the x rectangular coordinate.
• arg1 (required) - The column for the y rectangular coordinate.
atan2(12, 71.21) will return 1.403849169952035
Computes the angle theta from the conversion of rectangular coordinates (arg0, arg1) to polar coordinates (arg1, theta).
Input Parameters: double arg0, Column arg1
• arg0 (required) - The x rectangular coordinate.
• arg1 (required) - The column for the y rectangular coordinate.
atan2_left_arg_double(12, 71.21) will return 1.403849169952035
Computes the angle theta from the conversion of rectangular coordinates (arg0, arg1) to polar coordinates (arg1, theta).
Input Parameters: Column arg0, double arg1
• arg0 (required) - The column for the x rectangular coordinate.
• arg1 (required) - The y rectangular coordinate.
atan2_right_arg_double(12, 71.21) will return 1.403849169952035
Computes the value of the column arg0 rounded to 0 decimal places with HALF_EVEN round mode.
arg0 (required) - The column for which value rounded to 0 decimal places with HALF_EVEN round mode to be calculated
bround(71.21) will return 71.0
Round the value of arg0 to arg1 decimal places with HALF_EVEN round mode if arg1 is greater than or equal to 0 or at integral part when arg1 is less than 0.
Input Parameters: Column arg0, int arg1
• arg0 (required) - The column for which value rounded to 0 decimal places with HALF_EVEN round mode to be calculated.
• arg1 (required) - The scale to use
bround_with_scale(71.21, 1) will return 71.2
Computes the cube-root of the given column.
arg0 (required) - The column for which cube-root to be calculated.
cbrt(80.89) will return 4.324789202233814
Computes the ceiling of the given column.
• arg0 (required) - The column for which ceiling to be calculated.
ceil(77.76) will return 4.2682720044742055
Converts a number column from a given base to another.
Input Parameters: Column arg0,Column args1,Column args2
• arg0 (required) - A number/String number column
• arg1 (required) - Integer column of base from which a number is to be converted
• arg2 (required) - Integer column of base to which a number is to be converted.
conv(258,10,2) will return 100000010
Computes the cosine of the given value.
• arg0 (required) - The column for which cosine to be calculated.
cos(76.56) will return 0.3977126102073901
Computes the hyperbolic cosine of the given value.
• arg0 (required) -The column for which hyperbolic cosine to be calculated.
cos(76.56) will return 0.3977126102073901
Converts an angle measured in radians to an approximately equivalent angle measured in degrees.
• arg0 (required) -The column for which the equivalent angle measured in degrees to be calculated.
degrees(71.21) will return 4080.0324066707394
Computes the exponential of the given value.
• arg0 (required) -The column for which exponential to be calculated.
exp(0.78) will return 2.18147220308578
Computes the exponential of the given value minus one.
• arg0 (required) -The column for which exponential minus one to be calculated.
expm1(0.23) will return 0.2586000151807663
Computes the factorial of the given value.
• arg0 (required) -The column for which factorial to be calculated.
factorial(11) will return 39916800
Computes the floor of the given column.
• arg0 (required) -The column for which floor to be calculated.
Formats numeric column arg0 to a format like '#,###,###.##', rounded to arg1 decimal places, and returns the result as a string column.
Input Parameters: Column arg0, int arg1
• arg0 (required) -The column to be formated.
• arg1 (required) -The integer specifying the decimal places to be used for rounding.
format_number(7120.12, 1) will return 7,120.1
It gives the greatest value of the list of values. This function takes at least 2 parameters.
Input Parameters: Column arg0,Column args1,...
• arg0 (required) -A column from the schema
• arg1 (required) -A column from the schema
greatest(258,259) will return 259
If the argument is an INT, hex returns the number as a STRING in hexadecimal format. Otherwise if the number is a STRING, it converts each character into its hexadecimal representation and returns the resulting STRING.
• arg0 (required) -A int/string column
Computes sqrt(arg0^2^ + arg1^2^) without intermediate overflow or underflow.
Input Parameters: Column arg0, Column arg1
• arg0 (required) -Will be used while computing sqrt.
• arg1 (required) -Will be used while computing sqrt.
hypot(71.21, 10.5) will return 71.97995533209642
Computes sqrt(arg0^2^ + arg1^2^) without intermediate overflow or underflow.
Input Parameters: double arg0, Column arg1
• arg0 (required) -Will be used while computing sqrt.
• arg1 (required) -Will be used while computing sqrt.
hypot_left_arg_double(71.21, 10.5) will return 71.97995533209642
Computes sqrt(arg0^2^ + arg1^2^) without intermediate overflow or underflow.
Input Parameters: Column arg0, double arg1
• arg0 (required) -Will be used while computing sqrt.
• arg1 (required) -Will be used while computing sqrt.
hypot_right_arg_double(71.21, 10.5) will return 71.97995533209642
It gives the least value of the list of values. This function takes at least 2 parameters.
Input Parameters: Column arg0,Column args1,...
• arg0 (required) - A column from the schema.
• arg1 (required) - A column from the schema.
least(258,259) will return 259
Computes the natural logarithm of the given value.
• arg0 (required) - The column for which natural logarithm to be calculated.
log(20) will return 2.995732273553991
Computes the natural logarithm of the given value plus one.
• arg0 (required) - The column for which natural logarithm plus one to be calculated.
log1p(20) will return 3.044522437723423
Computes the first argument-base logarithm of the second argument.
Input Parameters: double arg0, Column arg1
• arg0 (required) - The base to used.
• arg1 (required) - The column for which logarithm to be calculated.
log_with_base(10, 20) will return 1.301029995663981
Computes the Unary minus, i.e. -(arg0)
• arg0 (required) - The column for which unary minus to be calculated.
Computes the positive value of arg0 mod arg1.
Input Parameters: Column arg0, Column arg1
• arg0 (required) - The dividend.
• arg1 (required) - The divisor.
pmod(19, 0.78) will return 0.2800007
Computes the value of the first argument raised to the power of the second argument.
Input Parameters: Column arg0, Column arg1
• arg1 (required) - The exponent.
Computes the value of the first argument raised to the power of the second argument.
Input Parameters: double arg0, Column arg1
• arg1 (required) - The exponent
pow_left_arg_double(20, 2) will return 400
Computes the value of the first argument raised to the power of the second argument.
Input Parameters: Column arg0, double arg1
• arg1 (required) - The exponent
pow_right_arg_double(20, 2) will return 400
Converts an angle measured in degrees to an approximately equivalent angle measured in radians.
• arg0 (required) - The column for which equivalent angle measured in radians to be calculated.
radians(20) will return 0.3490658503988659
Generate a random column with independent and identically distributed (i.i.d.) samples from 0.0 to 1.0.
Input Parameters: No input arguments.
rand() will return a new column with independent and identically distributed (i.i.d.) samples from 0.0 to 1.0.
Generate a random column with independent and identically distributed (i.i.d.) samples from 0.0 to 1.0.
rand_with_seed(2) will return a new column with independent and identically distributed (i.i.d.) samples from 0.0 to 1.0.
Generate a column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.
Input Parameters: No input arguments.
randn() will return a new column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.
Generate a column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.
• arg0 (required) - The seed
randn_with_seed(2) will return a new column with independent and identically distributed (i.i.d.) samples from the standard normal distribution.
Computes the double value that is closest in value to the argument and is equal to a mathematical integer.
• arg0 (required) - The column for which double value to be calculated.
Computes the value of the column arg0 rounded to 0 decimal places.
arg0 (required) - The column for which value rounded to 0 decimal places to be calculated.
Computes the value of arg0 to scale decimal places if arg1 is greater than or equal to 0 or at integral part when arg1 is less than 0.
Input Parameters: Column arg0, int arg1
• arg0 (required) - The column for which value rounded to 0 decimal places to be calculated.
• arg0 (required) - The seed
round_with_scale(80.89,1) will return 80.9
Computes the signum of the given value.
• arg0 (required) - The column for which signum to be calculated.
Computes the sine of the given value.
• arg0 (required) - The column for which sine to be calculated.
sin(20) will return 0.9129452507276277
Computes the hyperbolic sine of the given value.
• arg0 (required) - The column for which sine to be calculated.
sinh(20) will return 2.4258259770489514E8
Computes the square root of the specified float value.
• arg0 (required) - The column for which square root to be calculated.
sqlSqrt(20) will return 4.47213595499958
Computes the tangent of the given value.
arg0 (required) - The column for which tangent to be calculated.
tan(20) will return 2.237160944224742
Computes the hyperbolic tangent of the given value.
• arg0 (required) - The column for which hyperbolic tangent to be calculated.
Computes the angle measured in radians to an approximately equivalent angle measured in degrees.
• arg0 (required) - The column for which degree to be calculated
toDegrees(3.14159) will return 180
Computes the angle measured in degrees to an approximately equivalent angle measured in radians.
• arg0 (required) - A column for which radians to be calculated.
toRadians(180) will return 3.14159
Returns an array with the given elements.
arg0: The given columns to create array column.
array(1, 2, 3) will return [1,2,3]
Returns an array of the elements in the union of array1 and array2, without duplicates.
arg1: The second array column.
array_union(array(1, 2, 3), array(1, 3, 5)) will return [1,2,3,5]
Removes duplicate values from the array.
arg0: The given array column.
array_distinct(array(1, 2, 3, null, 3)) will return [1,2,3,null]
Returns an array of the elements in array1 but not in array2, without duplicates.
arg1: Second array column.
array_except(array(1, 2, 3), array(1, 3, 5)) will return [2]
Performs intersection of array1 and array2, without duplicates.
arg1: Second array column.
array_intersect(array(1, 2, 3), array(1, 3, 5)) will return [1,3]
Concatenates the elements of the given array using the delimiter and an optional string to replace nulls. If no value is set for nullReplacement, any null value is filtered.
arg2: nullReplacement.
array_join(array('hello', null ,'world'), ' ', ',') will return hello , world
Returns the maximum value in the array. NULL elements are skipped.
arg0: The array column.
array_max(array(1, 20, null, 3)) will return 20
Returns the minimum value in the array. NULL elements are skipped.
arg0: The array column.
array_min(array(1, 20, null, 3)) will return 1
Returns the (1-based) index of the first element of the array as long.
array_position(array(3, 2, 1), 1) will return 3
Remove all elements that equal to element from array.
array_remove(array(1, 2, 3, null, 3), 3) will return [1,2,null]
Returns the array containing element count times.
array_repeat('123', 2) will return ["123","123"]
Sorts the input array in ascending order. The elements of the input array must be order-able. Null elements will be placed at the end of the returned array.
arg0: The array column.
array_sort(array('b', 'd', null, 'c', 'a')) will return ["a","b","c","d",null]
Returns an array of the elements in the union of array1 and array2, without duplicates.
arg1: The second array column.
array_union(array(1, 2, 3), array(1, 3, 5)) will return [1,2,3,5]
Returns true if a1 contains at least a non-null element present also in a2. If the arrays have no common element and they are both non-empty and either of them contains a null element null is returned, false otherwise.
arg1: The second array column.
arrays_overlap(array(1, 2, 3), array(3, 4, 5)) will return true.
Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.
arg0: The Columns to be zipped. Returns a merged array of structs in which the N-th struct contains all N-th values of input arrays.
arrays_zip(array(1, 2, 3), array(2, 3, 4)) Will return [{"0":1,"1":2},{"0":2,"1":3},{"0":3,"1":4}]
Returns TRUE if the array contains value.
Input Parameters: Column arg0, Object arg1
• arg0 (required) - An array column.
• arg1 (required) - A value to be checked.
array_contains(["black","red"] ,"red") will return true
Separates the elements of array expr into multiple rows, or the elements of map expr into multiple rows and columns.
arg0: The expr column.
explode(array(10, 20)) will return 10, 20 in a new column.
Separates the elements of array expr into multiple rows, or the elements of map expr into multiple rows and columns.
explode_outer(array(10, 20)) will return 10, 20.
Computes bitwise NOT of the given integer.
• arg0 (required) - An integer or a double column.
Returns the first non-null argument if exists. Otherwise, null.
arg0:columns representing expressions.
coalesce(NULL, 1, NULL) will return 1
Returns a struct value with the given jsonStr and schema.
from_json('{"a":1, "b":0.8}', 'a INT, b DOUBLE') will return {"a":1, "b":0.8} and from_json('{"time":"26/08/2015"}', 'time Timestamp', map('timestampFormat', 'dd/MM/yyyy'))will return {"time":"2015-08-26 00:00:00.0"}
Returns a hash value of the arguments.
arg0:The columns for which hash to be calculated.
hash('Spark', array(123), 2) will return -1321691492
Returns the name of the file being read, or empty string if not available.
input_file_name() - will return the name of the file being read
Creates a map with the given key/value pairs.
arg0:The columns for key and value.
map(1.0, '2', 3.0, '4') will return {1.0:"2",3.0:"4"}
Returns the union of all the given maps
map_concat(map(1, 'a', 2, 'b'), map(2, 'c', 3, 'd')) will return {1:"a",2:"c",3:"d"}
Creates a map with a pair of the given key/value arrays. All elements in keys should not be null.
map_from_arrays(array(1.0, 3.0), array('2', '4')) will return {1.0:"2",3.0:"4"}
Returns a map created from the given array of entries.
map_from_entries(array(struct(1, 'a'), struct(2, 'b'))) will return {1:"a",2:"b"}
Returns an unordered array containing the keys of the map.
arg0:Map column.
map_keys(map(1, 'a', 2, 'b')) will return [1,2]
Returns an unordered array containing the values of the map.
map_values(map(1, 'a', 2, 'b')) will return ["a","b"]
R Returns expr1 if it's not NaN, or expr2 otherwise.
nanvl(cast('NaN' as double), 123) will return 123.0
Perform logical not of given column.
Returns schema in the DDL format of JSON string.
schema_of_json('[{"col":0}]') will return array>
Generates an array of elements from start to stop (inclusive), incrementing by step. The type of the returned elements is the same as the type of argument expressions.Supported types are: byte, short, integer, long, date, timestamp.The start and stop expressions must resolve to the same type. If start and stop expressions resolve to the 'date' or 'timestamp' type then the step expression must resolve to the 'interval' type, otherwise to the same type as the start and stop expressions.
arg0:start - an expression. The start of the range.
arg1:stop - an expression. The end the range (inclusive).
arg2:step - an optional expression. The step of the range. By default step is 1 if start is less than or equal to stop, otherwise -1. For the temporal sequences it's 1 day and -1 day respectively. If start is greater than stop then the step must be negative, and vice versa.
sequence(to_date('2018-01-01'), to_date('2018-03-01'), interval 1 month) will return [2018-01-01,2018-02-01,2018-03-01]
Returns the size of an array or a map. The function returns -1 if its input is null and spark.sql.legacy.sizeOfNull is set to true. If spark.sql.legacy.sizeOfNull is set to false, the function returns null for null input. By default, the spark.sql.legacy.sizeOfNull parameter is set to true.
arg0: array or map column.
size(array('b', 'd', 'c', 'a')) will return 4
Creates a struct with the given field values.
arg0:columns using which the struct will be created.
Returns a JSON string with a given struct value.
arg1:additional options map.
to_json(named_struct('time', to_timestamp('2015-08-26', 'yyyy-MM-dd')), map('timestampFormat', 'dd/MM/yyyy')) will return {"time":"26/08/2015"}
Converts hexadecimal expr to binary.
arg0:The hexadecimal column.
decode(unhex('537061726B2053514C'), 'UTF-8') will return Spark SQL
Computes a cyclic redundancy check value for string.
• arg0 (required) - A string column.
crc32("ABC") will return 2743272264
Returns true if the input is Not a Number.
• arg0 (required) - A column whose values needs to be checked.
Returns true if “a” is NULL and false otherwise.
• arg0 (required) - A column whose values needs to be checked.
isnull("abc") will return false
Creates a Column of literal value.
• arg0 (required) - A constant value.
lit(1) will return a column having 1 in all the rows of the column.
Calculates an MD5 128-bit checksum for the string.
• arg0 (required) - A string column
md5('ABC') will give you an output '902fbdd2b1df0c4f70b4a5d23525e932'.
A column expression that generates monotonically increasing 64-bit integers.
Input Parameters: No input arguments.
monotonically_increasing_id() will return rows as 0,1,2...
Calculates the SHA-1 digest for string and returns the value as a hex string
• arg0 (requird) - A string column
sha1("ABC") will return '3c01bdbb26f358bab27f267924aa2c9a03fcfdb8'
Calculates the SHA-2 family of hash functions of a string value and returns the value as a hex string. NumBits controls the number of bits in the message digest.
Input Parameters: Column arg0,int numBits
• arg0 (required) - A string column
• arg1 (required) - one of 224, 256, 384, or 512.
sha2("Sam",256) will return '4ecde249d747d51d8..'
Bitwise left shift, Shifts a b positions to the left.
Input Parameters: Column arg0,int numBits
• arg0 (required) - A number column
• arg1 (required) - An integer column If the given value is a long value, this function will return a long value else it will return an integer value
shiftLeft(258,2) will return 1032
Bitwise right shift, Shifts a b positions to the right.
Input Parameters: Column arg0,int numBits
• arg0 (required) - A number column
• arg1 (required) - An integer column If the given value is a long value, this function will return a long value else it will return an integer value
shiftRight(258,2) will return 64
Bitwise unsigned right shift, Shifts a b positions to the right.
Input Parameters: Column arg0,int numBits
• arg0 (required) - A number column
• arg1 (required) - An integer column. If the given value is a long value, this function will return a long value else it will return an integer value