String Functions

Ascii

Computes the numeric value of the first character of the string column, and returns the result as an int column.

Input Parameters: Column arg0

arg0(required) - The string column for which the first character’s numeric value to be calculated.

Output Type String Column Configuration Parameters No Returns The ASCII value as an int column. Throws Application Exception

Use Case:

ascii(Column arg0)

Example:

ascii(“An apple”) will return 65

base64

Converts the argument from a binary bin to a base 64 string.

Input Parameters:

arg0: The Column to be converted to base64 string.

Output Type String Column Configuration Parameters No Returns Returns the base64 string. Throws Application Exception

Use Case:

base64(Column arg0)

Example:

base64(‘Spark SQL’) will return U3BhcmsgU1FM

bin

Returns the string representation of the long value expr represented in binary.

Input Parameters:

arg0: The numerical Column to be converted to represented in binary.

Output Type String Column Configuration Parameters No Returns Returns the binary representation. Throws Application Exception

Use Case:

bin(Column arg0)

Example:

bin(13) will return 1101

decode

Decodes the first argument using the second argument character set.

Input Parameters:

arg0:Column to be decoded.

arg1: The charset

Output Type String Column Configuration Parameters No Returns Returns the decoded column. Throws Application Exception

Use Case:

decode(Column arg0, Object arg1)

Example:

decode(encode(‘abc’, ‘utf-8’), ‘utf-8’) will return abc

encode

Encodes the first argument using the second argument character set.

Input Parameters:

arg0:Column to be encoded.

arg1: The charset

Output Type String Column Configuration Parameters No Returns Returns the encoded column. Throws Application Exception

Use Case:

encode(Column arg0, Object arg1)

Example:

encode(‘abc’, ‘utf-8’) will return abc

itrim

Removes the leading string contains the characters from the trim string.

Input Parameters:

arg0:the trim string characters to trim, the default value is a single space.

arg1: a string expression.

Output Type String Column Configuration Parameters No Returns Returns the trimed string. Throws Application Exception

Use Case:

ltrim(Column arg0, Object arg1)

Example:

ltrim(‘Sp’,‘SsparkSQLS’)will return ArkSQLS

soundex

Returns Soundex code of the string.

Input Parameters:

arg0:String column

Output Type String Column Configuration Parameters No Returns Returns Soundex code of the string. Throws Application Exception

Use Case:

soundex(Column arg0)

Example:

soundex(‘Miller’) will return M460

split

Splits str around occurrences that match regex

Input Parameters:

arg0: str string column.

arg1:the regex string

Output Type String Column Configuration Parameters No Returns Returns the spilts. Throws Application Exception

.

Use Case:

split(Column arg0, Object arg1)

Example:

split(‘oneAtwoBthreeC’, ‘[ABC]’)will return [“one”,“two”,“three”,""]

trim

Remove the leading and trailing trimStr characters from str

Input Parameters:

arg0:The trimStr String column.

arg1:The str string

Output Type String Column Configuration Parameters No Returns Returns the trimed string. Throws Application Exception

Use Case:

trim(Column arg0, Object arg1)

Example:

trim(‘SL’, ‘SsparkSQLS’) will return parkSQ

unbase64

Converts the argument from a base 64 string str to a binary.

Input Parameters:

arg0:The base 64 String column

Output Type String Column Configuration Parameters No Returns Returns the unbase64 of string. Throws Application Exception

Use Case:

unbase64(Column arg0)

Example:

unbase64(‘U3BhcmsgU1FM’) will return Spark SQL

Concat

Concatenates multiple input string columns together into a single string column.

Input Parameters: Column… arg0

arg0 (required) - The String columns to be concatenated.

Output Type String Column Configuration Parameters No Returns The concatenated string as a single string column. Throws Application Exception

Use Case:

concat(Column… arg0)

Example:

concat(“format”,“string”) will return “formatstring”

Concat_ws

Concatenates multiple input string columns together into a single string column, using the given separator.

Input Parameters: Object arg0, Column… arg1

arg0(required) - The separator to be used.

arg1 (required) - The String columns to be concatenated.

Output Type Column Configuration Parameters No Returns The concatenated strings using the given separator as a single string column Throws Application Exception

Use Case:

concat_ws(Object arg0, Column… arg1)

Example:

concat_ws(“-”,“format”,“string”) will return “format-string”.

Expr

Parses the expression string into the column that it represents.

Input Parameters: Object arg0

arg0 (required) - The expression string to be parsed

Output Type Column Configuration Parameters No Returns The column return by the expression evaluation. Throws Application Exception

Use Case:

expr(Object arg0)

Example:

expr(“colA * 3”) will return 6 if the value of column ‘colA’ is 2 for particular row.

Format_string

Formats the arguments in printf-style and returns the result as a string column.

Input Parameters: Object arg0, Column arg1

arg0 (required) - The printf style format.

arg1 (required) - The columns to be formatted.

Output Type String Column Configuration Parameters No Returns The formatted arguments as a string column. Throws Application Exception

Use Case:

format_string(Object arg0, Column… arg1)

Example:

format_string(“the %s jumped over the %s, %d times”,“cow”,“moon”,2) will return “the cow jumped over the moon 2 times”.

Initcap

Computes a new string column by converting the first letter of each word to uppercase.

Input Parameters: Column arg0

arg0 (required) - The input string column

Output Type String Column Configuration Parameters No Returns The converted string column. Throws ApplicationException

Use Case:

initcap(Column arg0)

Example:

initcap(“apple”) will return “Apple”)

Instr

Locate the position of the first occurrence of given substring in the given string column.

Input Parameters: Column arg0, Object arg1

arg0 (required) - The string column in which the location to be determined.

arg1 (required) - The substring for which the position to be determined.

Output Type Integer Column Configuration Parameters No Returns Returns the computed last day. Throws Application Exception

Use Case:

instr(Column arg0, Object arg1)

Example:

instr(“apple”,“le”) will return 4

Length

Computes the length of a given string or binary column.

Input Parameters: Column arg0

arg0 (required) - The string column for which the length to be determined.

Output Type Integer Column Configuration Parameters No Returns The computed length.

Use Case:

length(Column arg0)

Example:

length(“apple”) will return 5

Levenshtein

Computes the Levenshtein distance of the two given string columns.

Input Parameters: Column arg0, Column arg1

arg0 (required) - The first string column from which the Levenshtein distance from the second string column to be determined.

arg1 (required) - The second string column

Output Type Integer Column Configuration Parameters No Returns The computed Levenshtein distance. Throws Application Exception

Use Case:

levenshtein(Column arg0, Column arg1)

Example:

levenshtein(“kitten”, “sitting”) will return 3

Locate

Locate the position of the first occurrence of given substring in a string column, after the given position.

Input Parameters: Object arg0, Column arg1, int arg2

arg0 (required) - The String for which the location to be determined.

arg1 (required) - The String column in which the location to be determined.

arg2 (required) - The position after which the location to be determined

Output Type Integer Column Configuration Parameters No Returns The position of the first occurrence of substring. Throws Application Exception

Use Case:

locate(Object arg0, Column arg1, int arg2)

Example:

locate(“apple”,“An apple”,1) will return 3

Lower

Converts a string column to lower case.

Input Parameters: Column arg0

arg0 (required) - The string column to be converted to lower case

Output Type String Column Configuration Parameters No Returns String in lower case. Throws Application Exception

Use Case:

lower(Column arg0)

Example:

lower(“Apple”) will return “apple”

Lpad

Left-pad the string column with the given string, to a given length.

Input Parameters: Column arg0, int arg1, Object arg2

arg0 (required) - The string column to be left-padded.

arg1 (required) - The length for the padding.

arg2 (required) - The string to use for left-pad.

Output Type String Column Configuration Parameters No Returns The Left-padded string. Throws Application Exception

Use Case:

lpad(Column arg0, int arg1, Object arg2)

Example:

lpad(“SQL Tutorial”, 20, “ABC”) will return “ABCABCABSQL Tutorial”

Ltrim

Trim the spaces from left end for the specified string column.

Input Parameters: Column arg0

arg0 (required) - The string column from which left spaces to be trimmed.

Output Type String Column Configuration Parameters No Returns The trimmed string column.

Use Case:

ltrim(Column arg0, Object arg1)

Example:

ltrim(“apple”) will return “apple”

Regexp_extract

Extracts a specific group matched by a Java regex, from the specified string column.

Arguments: Column arg0, Object arg1, int arg2

arg0 (required) - The string column from which the group to be extracted.

arg1 (required) - The string specifying the regex.

arg2 (required) - The regex group id.

Output Type String Column Configuration Parameters No Returns The extracted group. Throws Application Exception

Use Case:

regexp_extract(Column arg0, Object arg1, int arg2)

Example:

regexp_extract(“foothebar”,“foo(.*?)(bar)”, 2) will return “bar”

Regexp_replace

Replace all substrings of the specified string column that match regexp with the given replacement.

Input Parameters: Column arg0, Object arg1, Object arg2

arg0 (required) - The string column from which substrings to be replaced.

arg1 (required) - The pattern to be used.

arg2 (required) - The replacement.

Output Type String Column Configuration Parameters No Returns The string column after replacement of substrings that match regexp. Throws Application Exception

Use Case:

regexp_replace(Column arg0, Object arg1, Object arg2)

Example:

regexp_replace(“foobar”, “oo|ar”, ““) will return”fb”

Repeat

Repeats each value in ‘arg0’ column ‘arg1’ times.

Input Parameters: Column arg0,int arg1

arg0 (required) - A column needs to be repeated

arg1 (required) - Integer value representing no of times arg0 is to be repeated.

Output Type Column Configuration Parameters No Returns A repeated value. Throws Application Exception

Use Case:

repeat(Column arg0,int arg1)

Example:

repeat(“str”,2) will return ‘strstr’

Reverse

Reverses the string column and returns it as a new string column.

Input Parameters: Column arg0

arg0 (required) - The string column to be reversed.

Output Type String Column Configuration Parameters No Returns The reversed string column. Throws Application Exception

Use Case:

reverse(Column arg0)

Example:

reverse(“apple”) will return “elppa”

Rpad

Right-pad the string column with the given string, to a given length.

Input Parameters: Column arg0, int arg1, Object arg2

arg0 (required) - The string column to be right-padded.

arg1 (required) - The length for the padding.

arg2 (required) - The string to use for right-pad.

Output Type StringColumn Configuration Parameters No Returns The Right-padded string. Throws Application Exception

Use Case:

rpad(Column arg0, int arg1, Object arg2)

Example:

rpad(“SQL Tutorial”, 20, “ABC”) will return “SQL TutorialABCABCAB”

Rtrim

Trim the spaces from right end for the specified string column.

Input Parameters: Column arg0

arg0 (required) - The string column from which right spaces to be trimmed.

Output Type String Column Configuration Parameters No Returns The trimmed string column.

Use Case:

rtrim(Column arg0)

Example:

rtrim(“apple”) will return “apple”

Substring

Substring starts at given position and is of given length when given argument is String type or returns the slice of byte array that starts at given position in byte and is of given length when given argument is Binary type.

Input Parameters: Column arg0, int arg1, int arg2

arg0 (required) - The String column from which substring to be extracted.

arg1 (required) - The start position for the substring.

arg2 (required) - The end position for the substring.

Output Type String Column Configuration Parameters No Returns Returns the computed last day.

Use Case:

substring(Column arg0, int arg1, int arg2)

Example:

substring(“foo bar”,4,6) will return “bar”

Substring_index

Computes the substring from given string column before given count occurrences of the given delimiter.

Input Parameters: Column arg0, Object arg1, int arg2

arg0 (required) - The String column from which substring to be extracted.

arg1 (required) - The delimiter.

arg2 (required) - The count occurrences for the delimiter.

Output Type Column Configuration Parameters No Returns The result substring.

Use Case:

substring_index(Column arg0, Object arg1, int arg2)

Example:

substring_index(“www.xyz.com”,“.”,2) will return www.xyz

Translate

Translate any character in the given string by a given character in given replaceString.

Input Parameters: Column arg0, Object arg1, Object arg2

arg0 (required) - The string column in which the translation to be done.

arg1 (required) - The matching character.

arg2 (required) - The replacement character.

Output Type String Column Configuration Parameters No Returns The translated string.

Use Case:

translate(Column arg0, Object arg1, Object arg2)

Example:

translate(“The foo bar”,“f”,“t”) will return “The too bar”

Trim

Trim the spaces from both ends for the specified string column.

Input Parameters: Column arg0

arg0 (required) - The string column from which spaces to be trimmed.

Output Type String Column Configuration Parameters No Returns The trimmed string value. Throws Application Exception

Use Case:

trim(Column arg0, Object arg1)

Example:

trim(" An apple “) will return”An apple"

Upper

Converts a string column to upper case.

Input Parameters: Column arg0

arg0 (required) - The string column to be converted to upper case.

Output Type String Column Configuration Parameters No Returns The converted string column.

Use Case:

upper(Column arg0)

Example:

upper(“aPPle”) will return “APPLE”

Top