String Functions

Ascii

Computes the numeric value of the first character of the string column, and returns the result as an int column.

Input Parameters: Column arg0

  • arg0(required) - The string column for which the first character’s numeric value to be calculated.
Output TypeString Column
Configuration ParametersNo
ReturnsThe ASCII value as an int column.
ThrowsApplication Exception

Use Case:

usecae_asci

Example:

ascii(“An apple”) will return 65


base64

Converts the argument from a binary bin to a base 64 string.

Input Parameters:

arg0: The Column to be converted to base64 string.

Output TypeString Column
Configuration ParametersNo
ReturnsReturns the base64 string.
ThrowsApplication Exception

Example:

base64('Spark SQL') will return U3BhcmsgU1FM


bin

Returns the string representation of the long value expr represented in binary.

Input Parameters:

arg0: The numerical Column to be converted to represented in binary.

Output TypeString Column
Configuration ParametersNo
ReturnsReturns the binary representation.
ThrowsApplication Exception

Example:

bin(13) will return 1101


decode

Decodes the first argument using the second argument character set.

Input Parameters:

arg0:Column to be decoded.

arg1: The charset

Output TypeString Column
Configuration ParametersNo
ReturnsReturns the decoded column.
ThrowsApplication Exception

Example:

decode(encode('abc', 'utf-8'), 'utf-8') will return abc


encode

Encodes the first argument using the second argument character set.

Input Parameters:

arg0:Column to be encoded.

arg1: The charset

Output TypeString Column
Configuration ParametersNo
ReturnsReturns the encoded column.
ThrowsApplication Exception

Example:

encode('abc', 'utf-8') will return abc


itrim

Removes the leading string contains the characters from the trim string.

Input Parameters:

arg0:the trim string characters to trim, the default value is a single space.

arg1: a string expression.

Output TypeString Column
Configuration ParametersNo
ReturnsReturns the trimed string.
ThrowsApplication Exception

Example:

ltrim('Sp','SsparkSQLS') will return ArkSQLS


soundex

Returns Soundex code of the string.

Input Parameters:

arg0:String column

Output TypeString Column
Configuration ParametersNo
ReturnsReturns Soundex code of the string.
ThrowsApplication Exception

Example:

soundex('Miller') will return M460


split

Splits str around occurrences that match regex

Input Parameters:

arg0: str string column.

arg1:the regex string

Output TypeString Column
Configuration ParametersNo
ReturnsReturns the spilts.
ThrowsApplication Exception

Example:

split('oneAtwoBthreeC', '[ABC]') will return [“one”,“two”,“three”,""]


trim

Remove the leading and trailing trimStr characters from str

Input Parameters:

arg0:The trimStr String column.

arg1:The str string

Output TypeString Column
Configuration ParametersNo
ReturnsReturns the trimed string.
ThrowsApplication Exception

Example:

trim('SL', 'SsparkSQLS') will return parkSQ


unbase64

Converts the argument from a base 64 string str to a binary.

Input Parameters:

arg0:The base 64 String column

Output TypeString Column
Configuration ParametersNo
ReturnsReturns the unbase64 of string.
ThrowsApplication Exception

Example:

unbase64('U3BhcmsgU1FM') will return Spark SQL


Concat

Concatenates multiple input string columns together into a single string column.

Input Parameters: Column… arg0

arg0 (required) - The String columns to be concatenated.

Output TypeString Column
Configuration ParametersNo
ReturnsThe concatenated string as a single string column.
ThrowsApplication Exception

Use Case:

concat_output

Example:

concat("format","string") will return “formatstring”


Concat_ws

Concatenates multiple input string columns together into a single string column, using the given separator.

Input Parameters: Object arg0, Column… arg1

  • arg0*(required)* - The separator to be used.   

  • arg1 (required) - The String columns to be concatenated.

Output TypeColumn
Configuration ParametersNo
ReturnsThe concatenated strings using the given separator as a single string column
ThrowsApplication Exception

Example:

concat_ws("-","format","string") will return “format-string”.


Expr

Parses the expression string into the column that it represents.

Input Parameters: Object arg0

  • arg0 (required) - The expression string to be parsed
Output TypeColumn
Configuration ParametersNo
ReturnsThe column return by the expression evaluation.
ThrowsApplication Exception

Use Case:

exp

Example:

expr("colA \* 3") will return 6 if the value of column ‘colA’ is 2 for a particular row.


Format_string

Formats the arguments in printf-style and returns the result as a string column.

Input Parameters: Object arg0, Column arg1

  • arg0 (required) - The printf style format.

  • arg1 (required) - The columns to be formatted.

Output TypeString Column
Configuration ParametersNo
ReturnsThe formatted arguments as a string column.
ThrowsApplication Exception

Example:

format_string("the %s jumped over the %s, %d times","cow","moon",2) will return “the cow jumped over the moon 2 times”.


Initcap

Computes a new string column by converting the first letter of each word to uppercase.

Input Parameters: Column arg0

  • arg0 (required) - The input string column
Output TypeString Column
Configuration ParametersNo
ReturnsThe converted string column.
ThrowsApplicationException

Use Case:

initicap

Example:

initcap("apple") will return (“Apple”)


Instr

Locate the position of the first occurrence of given substring in the given string column.

Input Parameters: Column arg0, Object arg1

  • arg0 (required) - The string column in which the location to be determined.

  • arg1 (required) - The substring for which the position to be determined.

Output TypeInteger Column
Configuration ParametersNo
ReturnsReturns the computed last day.
ThrowsApplication Exception

Example:

instr("apple","le") will return 4


Length

Computes the length of a given string or binary column.

Input Parameters: Column arg0

arg0 (required) - The string column for which the length to be determined.

Output TypeInteger Column
Configuration ParametersNo
ReturnsThe computed length.

Use Case:

lenegth

Example:

length("apple") will return 5


Levenshtein

Computes the Levenshtein distance of the two given string columns.

Input Parameters: Column arg0, Column arg1

  • arg0 (required) - The first string column from which the Levenshtein distance from the second string column to be determined.

  • arg1 (required) - The second string column

Output TypeInteger Column
Configuration ParametersNo
ReturnsThe computed Levenshtein distance.
ThrowsApplication Exception

Use Case:

leven

Example:

levenshtein("kitten", "sitting") will return 3


Locate

Locate the position of the first occurrence of given substring in a string column, after the given position.

Input Parameters: Object arg0, Column arg1, int arg2

  • arg0 (required) - The String for which the location to be determined.

  • arg1 (required) - The String column in which the location to be determined.

  • arg2 (required) - The position after which the location to be determined

Output TypeInteger Column
Configuration ParametersNo
ReturnsThe position of the first occurrence of substring.
ThrowsApplication Exception

Use Case:

locate

Example:

locate("apple","An apple",1) will return 3


Lower

Converts a string column to lower case.

Input Parameters: Column arg0

  • arg0 (required) - The string column to be converted to lower case
Output TypeString Column
Configuration ParametersNo
ReturnsString in lower case.
ThrowsApplication Exception

Use Case:

lower

Example:

lower("Apple") will return “apple”


Lpad

Left-pad the string column with the given string, to a given length.

Input Parameters: Column arg0, int arg1, Object arg2

  • arg0 (required) - The string column to be left-padded.

  • arg1 (required) - The length for the padding.

  • arg2 (required) - The string to use for left-pad.

Output TypeString Column
Configuration ParametersNo
ReturnsThe Left-padded string.
ThrowsApplication Exception

Use Case:

lpad

Example:

lpad("SQL Tutorial", 20, "ABC") will return “ABCABCABSQL Tutorial”


Ltrim

Trim the spaces from left end for the specified string column.

Input Parameters: Column arg0

  • arg0 (required) - The string column from which left spaces to be trimmed.
Output TypeString Column
Configuration ParametersNo
ReturnsThe trimmed string column.

Use Case:

ltrim

Example:

ltrim(“apple”) will return “apple”


Regexp_extract

Extracts a specific group matched by a Java regex, from the specified string column.

Arguments: Column arg0, Object arg1, int arg2

  • arg0 (required) - The string column from which the group to be extracted.

  • arg1 (required) - The string specifying the regex.

  • arg2 (required) - The regex group id.

Output TypeString Column
Configuration ParametersNo
ReturnsThe extracted group.
ThrowsApplication Exception

Use Case:

regexextract

Example:

regexp_extract("foothebar","foo(.\*?)(bar)", 2) will return “bar”


Regexp_replace

Replace all substrings of the specified string column that match regexp with the given replacement.

Input Parameters: Column arg0, Object arg1, Object arg2

  • arg0 (required) - The string column from which substrings to be replaced.

  • arg1 (required) - The pattern to be used.

  • arg2 (required) - The replacement.

Output TypeString Column
Configuration ParametersNo
ReturnsThe string column after replacement of substrings that match regexp.
ThrowsApplication Exception

Use Case:

replace

Example:

regexp_replace("foobar", "oo|ar", "") will return “fb”


Rename_column

Rename the given column.

Input Parameters: Object arg0

arg0 (required) - The column name to be renamed.

Output TypeColumn
Configuration ParametersNo
ReturnsThe renamed column.
ThrowsApplication Exception

Use Case:

rename

Example:

rename_column("age") will rename “age” to given output column name.


Repeat

Repeats each value in ‘arg0’ column ‘arg1’ times.

Input Parameters: Column arg0,int arg1

arg0 (required) - A column needs to be repeated

arg1 (required) - Integer value representing no of times arg0 is to be repeated.

Output TypeColumn
Configuration ParametersNo
ReturnsA repeated value.
ThrowsApplication Exception

Use Case:

repeat

Example:

repeat("str",2) will return ‘strstr’


Reverse

Reverses the string column and returns it as a new string column.

Input Parameters: Column arg0

arg0 (required) - The string column to be reversed.

Output TypeString Column
Configuration ParametersNo
Returns The reversed string column.
ThrowsApplication Exception

Use Case:

reverset

Example:

reverse("apple") will return “elppa”


Rpad

Right-pad the string column with the given string, to a given length.

Input Parameters: Column arg0, int arg1, Object arg2

  • arg0 (required) - The string column to be right-padded.

  • arg1 (required) - The length for the padding.

  • arg2 (required) - The string to use for right-pad.

Output TypeStringColumn
Configuration ParametersNo
ReturnsThe Right-padded string.
ThrowsApplication Exception

Use Case:

rpad

Example:

rpad("SQL Tutorial", 20, "ABC") will return “SQL TutorialABCABCAB”


Rtrim

Trim the spaces from right end for the specified string column.

Input Parameters: Column arg0

arg0 (required) - The string column from which right spaces to be trimmed.

Output TypeString Column
Configuration ParametersNo
ReturnsThe trimmed string column.

Use Case:

rtrim

Example:

rtrim("apple ") will return “apple”


Substring

Substring starts at given position and is of given length when given argument is String type or returns the slice of byte array that starts at given position in byte and is of given length when given argument is Binary type.

Input Parameters: Column arg0, int arg1, int arg2

  • arg0 (required) - The String column from which substring to be extracted.

  • arg1 (required) - The start position for the substring.

  • arg2 (required) - The end position for the substring.

Output TypeString Column
Configuration ParametersNo
ReturnsReturns the computed last day.

Use Case:

substring

Example:

substring("foo bar",4,6) will return “bar”


Substring_index

Computes the substring from given string column before given count occurrences of the given delimiter.

Input Parameters: Column arg0, Object arg1, int arg2

  • arg0 (required) - The String column from which substring to be extracted.

  • arg1 (required) - The delimiter.

  • arg2 (required) - The count occurrences for the delimiter.

Output TypeColumn
Configuration ParametersNo
ReturnsThe result substring.

Use Case:

![substrign _index](./images/18_Functions/substrign _index.JPG)

Example:

substring_index("www.xyz.com",".",2) will return www.xyz


Translate

Translate any character in the given string by a given character in given replaceString.

Input Parameters: Column arg0, Object arg1, Object arg2

  • arg0 (required) - The string column in which the translation to be done.

  • arg1 (required) - The matching character.

  • arg2 (required) - The replacement character.

Output TypeString Column
Configuration ParametersNo
ReturnsThe translated string.

Example:

translate("The foo bar","f","t") will return “The too bar”


Trim

Trim the spaces from both ends for the specified string column.

Input Parameters: Column arg0

arg0 (required) - The string column from which spaces to be trimmed.

Output TypeString Column
Configuration ParametersNo
ReturnsThe trimmed string value.
ThrowsApplication Exception

Use Case:

trim

Example:

trim(" An apple ") will return “An apple”


Upper

Converts a string column to upper case.

Input Parameters: Column arg0

arg0 (required) - The string column to be converted to upper case.

Output TypeString Column
Configuration ParametersNo
ReturnsThe converted string column.

Use Case:

uppercolumn

Example:

upper("aPPle") will return “APPLE”

Top