String Functions
Ascii
Computes the numeric value of the first character of the string column, and returns the result as an int column.
Input Parameters: Column arg0
arg0(required) - The string column for which the first character’s numeric value to be calculated.
Output Type String Column Configuration Parameters No Returns The ASCII value as an int column. Throws Application Exception
Use Case:
ascii(Column arg0)
Example:
ascii(“An apple”) will return 65
base64
Converts the argument from a binary bin to a base 64 string.
Input Parameters:
arg0: The Column to be converted to base64 string.
Output Type String Column Configuration Parameters No Returns Returns the base64 string. Throws Application Exception
Use Case:
base64(Column arg0)
Example:
base64(‘Spark SQL’) will return U3BhcmsgU1FM
bin
Returns the string representation of the long value expr represented in binary.
Input Parameters:
arg0: The numerical Column to be converted to represented in binary.
Output Type String Column Configuration Parameters No Returns Returns the binary representation. Throws Application Exception
Use Case:
bin(Column arg0)
Example:
bin(13) will return 1101
decode
Decodes the first argument using the second argument character set.
Input Parameters:
arg0:Column to be decoded.
arg1: The charset
Output Type String Column Configuration Parameters No Returns Returns the decoded column. Throws Application Exception
Use Case:
decode(Column arg0, Object arg1)
Example:
decode(encode(‘abc’, ‘utf-8’), ‘utf-8’) will return abc
encode
Encodes the first argument using the second argument character set.
Input Parameters:
arg0:Column to be encoded.
arg1: The charset
Output Type String Column Configuration Parameters No Returns Returns the encoded column. Throws Application Exception
Use Case:
encode(Column arg0, Object arg1)
Example:
encode(‘abc’, ‘utf-8’) will return abc
itrim
Removes the leading string contains the characters from the trim string.
Input Parameters:
arg0:the trim string characters to trim, the default value is a single space.
arg1: a string expression.
Output Type String Column Configuration Parameters No Returns Returns the trimed string. Throws Application Exception
Use Case:
ltrim(Column arg0, Object arg1)
Example:
ltrim(‘Sp’,‘SsparkSQLS’)will return ArkSQLS
soundex
Returns Soundex code of the string.
Input Parameters:
arg0:String column
Output Type String Column Configuration Parameters No Returns Returns Soundex code of the string. Throws Application Exception
Use Case:
soundex(Column arg0)
Example:
soundex(‘Miller’) will return M460
split
Splits str around occurrences that match regex
Input Parameters:
arg0: str string column.
arg1:the regex string
Output Type String Column Configuration Parameters No Returns Returns the spilts. Throws Application Exception
.
Use Case:
split(Column arg0, Object arg1)
Example:
split(‘oneAtwoBthreeC’, ‘[ABC]’)will return [“one”,“two”,“three”,""]
trim
Remove the leading and trailing trimStr characters from str
Input Parameters:
arg0:The trimStr String column.
arg1:The str string
Output Type String Column Configuration Parameters No Returns Returns the trimed string. Throws Application Exception
Use Case:
trim(Column arg0, Object arg1)
Example:
trim(‘SL’, ‘SsparkSQLS’) will return parkSQ
unbase64
Converts the argument from a base 64 string str to a binary.
Input Parameters:
arg0:The base 64 String column
Output Type String Column Configuration Parameters No Returns Returns the unbase64 of string. Throws Application Exception
Use Case:
unbase64(Column arg0)
Example:
unbase64(‘U3BhcmsgU1FM’) will return Spark SQL
Concat
Concatenates multiple input string columns together into a single string column.
Input Parameters: Column… arg0
arg0 (required) - The String columns to be concatenated.
Output Type String Column Configuration Parameters No Returns The concatenated string as a single string column. Throws Application Exception
Use Case:
concat(Column… arg0)
Example:
concat(“format”,“string”) will return “formatstring”
Concat_ws
Concatenates multiple input string columns together into a single string column, using the given separator.
Input Parameters: Object arg0, Column… arg1
arg0(required) - The separator to be used.
arg1 (required) - The String columns to be concatenated.
Output Type Column Configuration Parameters No Returns The concatenated strings using the given separator as a single string column Throws Application Exception
Use Case:
concat_ws(Object arg0, Column… arg1)
Example:
concat_ws(“-”,“format”,“string”) will return “format-string”.
Expr
Parses the expression string into the column that it represents.
Input Parameters: Object arg0
arg0 (required) - The expression string to be parsed
Output Type Column Configuration Parameters No Returns The column return by the expression evaluation. Throws Application Exception
Use Case:
expr(Object arg0)
Example:
expr(“colA * 3”) will return 6 if the value of column ‘colA’ is 2 for particular row.
Format_string
Formats the arguments in printf-style and returns the result as a string column.
Input Parameters: Object arg0, Column arg1
arg0 (required) - The printf style format.
arg1 (required) - The columns to be formatted.
Output Type String Column Configuration Parameters No Returns The formatted arguments as a string column. Throws Application Exception
Use Case:
format_string(Object arg0, Column… arg1)
Example:
format_string(“the %s jumped over the %s, %d times”,“cow”,“moon”,2) will return “the cow jumped over the moon 2 times”.
Initcap
Computes a new string column by converting the first letter of each word to uppercase.
Input Parameters: Column arg0
arg0 (required) - The input string column
Output Type String Column Configuration Parameters No Returns The converted string column. Throws ApplicationException
Use Case:
initcap(Column arg0)
Example:
initcap(“apple”) will return “Apple”)
Instr
Locate the position of the first occurrence of given substring in the given string column.
Input Parameters: Column arg0, Object arg1
arg0 (required) - The string column in which the location to be determined.
arg1 (required) - The substring for which the position to be determined.
Output Type Integer Column Configuration Parameters No Returns Returns the computed last day. Throws Application Exception
Use Case:
instr(Column arg0, Object arg1)
Example:
instr(“apple”,“le”) will return 4
Length
Computes the length of a given string or binary column.
Input Parameters: Column arg0
arg0 (required) - The string column for which the length to be determined.
Output Type Integer Column Configuration Parameters No Returns The computed length.
Use Case:
length(Column arg0)
Example:
length(“apple”) will return 5
Levenshtein
Computes the Levenshtein distance of the two given string columns.
Input Parameters: Column arg0, Column arg1
arg0 (required) - The first string column from which the Levenshtein distance from the second string column to be determined.
arg1 (required) - The second string column
Output Type Integer Column Configuration Parameters No Returns The computed Levenshtein distance. Throws Application Exception
Use Case:
levenshtein(Column arg0, Column arg1)
Example:
levenshtein(“kitten”, “sitting”) will return 3
Locate
Locate the position of the first occurrence of given substring in a string column, after the given position.
Input Parameters: Object arg0, Column arg1, int arg2
arg0 (required) - The String for which the location to be determined.
arg1 (required) - The String column in which the location to be determined.
arg2 (required) - The position after which the location to be determined
Output Type Integer Column Configuration Parameters No Returns The position of the first occurrence of substring. Throws Application Exception
Use Case:
locate(Object arg0, Column arg1, int arg2)
Example:
locate(“apple”,“An apple”,1) will return 3
Lower
Converts a string column to lower case.
Input Parameters: Column arg0
arg0 (required) - The string column to be converted to lower case
Output Type String Column Configuration Parameters No Returns String in lower case. Throws Application Exception
Use Case:
lower(Column arg0)
Example:
lower(“Apple”) will return “apple”
Lpad
Left-pad the string column with the given string, to a given length.
Input Parameters: Column arg0, int arg1, Object arg2
arg0 (required) - The string column to be left-padded.
arg1 (required) - The length for the padding.
arg2 (required) - The string to use for left-pad.
Output Type String Column Configuration Parameters No Returns The Left-padded string. Throws Application Exception
Use Case:
lpad(Column arg0, int arg1, Object arg2)
Example:
lpad(“SQL Tutorial”, 20, “ABC”) will return “ABCABCABSQL Tutorial”
Ltrim
Trim the spaces from left end for the specified string column.
Input Parameters: Column arg0
arg0 (required) - The string column from which left spaces to be trimmed.
Output Type String Column Configuration Parameters No Returns The trimmed string column.
Use Case:
ltrim(Column arg0, Object arg1)
Example:
ltrim(“apple”) will return “apple”
Regexp_extract
Extracts a specific group matched by a Java regex, from the specified string column.
Arguments: Column arg0, Object arg1, int arg2
arg0 (required) - The string column from which the group to be extracted.
arg1 (required) - The string specifying the regex.
arg2 (required) - The regex group id.
Output Type String Column Configuration Parameters No Returns The extracted group. Throws Application Exception
Use Case:
regexp_extract(Column arg0, Object arg1, int arg2)
Example:
regexp_extract(“foothebar”,“foo(.*?)(bar)”, 2) will return “bar”
Regexp_replace
Replace all substrings of the specified string column that match regexp with the given replacement.
Input Parameters: Column arg0, Object arg1, Object arg2
arg0 (required) - The string column from which substrings to be replaced.
arg1 (required) - The pattern to be used.
arg2 (required) - The replacement.
Output Type String Column Configuration Parameters No Returns The string column after replacement of substrings that match regexp. Throws Application Exception
Use Case:
regexp_replace(Column arg0, Object arg1, Object arg2)
Example:
regexp_replace(“foobar”, “oo|ar”, ““) will return”fb”
Repeat
Repeats each value in ‘arg0’ column ‘arg1’ times.
Input Parameters: Column arg0,int arg1
arg0 (required) - A column needs to be repeated
arg1 (required) - Integer value representing no of times arg0 is to be repeated.
Output Type Column Configuration Parameters No Returns A repeated value. Throws Application Exception
Use Case:
repeat(Column arg0,int arg1)
Example:
repeat(“str”,2) will return ‘strstr’
Reverse
Reverses the string column and returns it as a new string column.
Input Parameters: Column arg0
arg0 (required) - The string column to be reversed.
Output Type String Column Configuration Parameters No Returns The reversed string column. Throws Application Exception
Use Case:
reverse(Column arg0)
Example:
reverse(“apple”) will return “elppa”
Rpad
Right-pad the string column with the given string, to a given length.
Input Parameters: Column arg0, int arg1, Object arg2
arg0 (required) - The string column to be right-padded.
arg1 (required) - The length for the padding.
arg2 (required) - The string to use for right-pad.
Output Type StringColumn Configuration Parameters No Returns The Right-padded string. Throws Application Exception
Use Case:
rpad(Column arg0, int arg1, Object arg2)
Example:
rpad(“SQL Tutorial”, 20, “ABC”) will return “SQL TutorialABCABCAB”
Rtrim
Trim the spaces from right end for the specified string column.
Input Parameters: Column arg0
arg0 (required) - The string column from which right spaces to be trimmed.
Output Type String Column Configuration Parameters No Returns The trimmed string column.
Use Case:
rtrim(Column arg0)
Example:
rtrim(“apple”) will return “apple”
Substring
Substring starts at given position and is of given length when given argument is String type or returns the slice of byte array that starts at given position in byte and is of given length when given argument is Binary type.
Input Parameters: Column arg0, int arg1, int arg2
arg0 (required) - The String column from which substring to be extracted.
arg1 (required) - The start position for the substring.
arg2 (required) - The end position for the substring.
Output Type String Column Configuration Parameters No Returns Returns the computed last day.
Use Case:
substring(Column arg0, int arg1, int arg2)
Example:
substring(“foo bar”,4,6) will return “bar”
Substring_index
Computes the substring from given string column before given count occurrences of the given delimiter.
Input Parameters: Column arg0, Object arg1, int arg2
arg0 (required) - The String column from which substring to be extracted.
arg1 (required) - The delimiter.
arg2 (required) - The count occurrences for the delimiter.
Output Type Column Configuration Parameters No Returns The result substring.
Use Case:
substring_index(Column arg0, Object arg1, int arg2)
Example:
substring_index(“www.xyz.com”,“.”,2) will return www.xyz
Translate
Translate any character in the given string by a given character in given replaceString.
Input Parameters: Column arg0, Object arg1, Object arg2
arg0 (required) - The string column in which the translation to be done.
arg1 (required) - The matching character.
arg2 (required) - The replacement character.
Output Type String Column Configuration Parameters No Returns The translated string.
Use Case:
translate(Column arg0, Object arg1, Object arg2)
Example:
translate(“The foo bar”,“f”,“t”) will return “The too bar”
Trim
Trim the spaces from both ends for the specified string column.
Input Parameters: Column arg0
arg0 (required) - The string column from which spaces to be trimmed.
Output Type String Column Configuration Parameters No Returns The trimmed string value. Throws Application Exception
Use Case:
trim(Column arg0, Object arg1)
Example:
trim(" An apple “) will return”An apple"
Upper
Converts a string column to upper case.
Input Parameters: Column arg0
arg0 (required) - The string column to be converted to upper case.
Output Type String Column Configuration Parameters No Returns The converted string column.
Use Case:
upper(Column arg0)
Example:
upper(“aPPle”) will return “APPLE”
If you have any feedback on Gathr documentation, please email us!