Formulas adding computed columns
When looking for answers to basic business questions, your data may not contain the fields/columns that you need for your analysis. This is when computed columns are required.
For example, your dataset contains release start date and end date. But you need to know how many days it takes for the release. In such scenarios, you can create computed fields or columns in your data.
A computed column is an additional column added to your dataset can be used for even more advanced calculations and visualizations.
The value of rows of this column is calculated by the system based on a formula. You can create these formulas by using columns (or fields) from one or more datasets.
Gathr Analytics supports following arithmetic operators:
Arithmetic Operation | Gathr Analytics’s Operator |
Addition | + |
Subtraction | - |
Multiplication | * |
Division | / |
Exponent(Power) | ^ |
Gathr Analytics also has a rich set of functions that can be used in a formula. Gathr Analytics has logical, text, date, aggregation and various advanced analytical functions for unstructured text analytics, Machine learning, and custom scripting.
Add a computed column
Suppose that you have a sample dataset that contains all the sales data for a chain of stores such as transaction date, product, price, payment type, country etc. This dataset has a column called Price, which is the purchase price to the retailer. To calculate the Sales prices with 15% profit for each product, you can create a computed column called Selling_price that contains the following formula:
Selling Price=Price +Price*0.15
The value of price will be taken from the corresponding row of the ‘Price’ column.
To add this new computed column based on the above formula, follow the steps mentioned below.
Step 1: Open the Data panel.
Under ‘Current Page’, click on the Overflow Menu for your dataset. Select ‘Add Formula’ from the overflow menu.
Step 2: Write the expression in the formula editor.
In the formula editor, on the left of ‘=’ sign, is the default name of the new computed column, which you can edit. Type Selling Price.
To the right of ‘=’ sign, is where you can write your formula. Type the name of the dataset or column in the expression (price- in this example). As you start typing, Gathr Analytics will suggest you the available datasets and columns in this format “<DataSet_Name>.<Column_Name>”. Select the appropriate column from the list.
Alternatively, you can also select the column from the Data Panel by clicking on the column name.
Complete the expression using mathematical operations.
Step 3: When finished, click on icon to create a computed column to your dataset.
You might get errors for any incomplete arithmetic expressions. Correct the errors and click icon.
Step 4: Add Business Type to newly created column. if business types are defined on the columns present in the formula/function, Gathr Analytics will provide suggestions. Select Business Type(s) from the suggested options and click Ok.
Note: You can define multiple business types for your column.
To learn more, refer Section Basic Concepts and Terminology.
New computed column “Selling Price” is saved to your dataset and can be used to create more calculated columns and visualizations.
Step 5: Add this column to your visualization (Optional Step).
Note: Not every custom column you create needs to be added to the Visualizations. Sometimes, you want to use it in another calculation without visualizing.
To add, open the Visualization Panel. Drag & drop the newly added column from the Data Panel under ‘Columns’ in the Visualization Panel.
Edit a Computed Column
To edit a computed column, follow the steps given below:
Open The Data panel. For your dataset where the formula is applied, click on icon to display all the columns.
Under Formula, hover over a computed column and click on the icon. In the formula editor, edit the formula and select icon.
Remove a Computed Column
To remove a computed column, follow the steps given below:
- Open The Data panel.
For your dataset where the formula is applied, click on icon to display all the columns.
Under Formula, hover over a computed column to be deleted. Click on the Overflow Menu icon and select ‘Delete Column’.
Click ‘Yes’ to confirm the removal of the column.
If the deleted column was added to your visualization then you need to reconfigure the visualization. Open the Visualizations Panel and remove the column from the view as shown in the following figure.
The column will be successfully removed from the dataset and visualizations.
Gathr Analytics In-built Functions
A Gathr Analytics in-built functions are off-the-shelf formulas to perform complex calculations with ease. e.g. SUM, MAX, AVERAGE, VLOOKUP, COUNTIF etc.
All in-built functions have a standard syntax as shown below:
<Function Name> ( [Param1] , [Param2] , …)
Parameters are inputs provided to the function. Multiple arguments are separated by a comma.
e.g.: AVG(12, 17, 19)
You can use fields of a dataset as arguments as well.
e.g.: AVG(‘SalesJan2019.Selling Price’)
Categories
Gathr Analytics in-built functions are categorized as under
Number Functions (ABSOLUTE, ROUND etc.)
Text Functions (TOKENIZE, TOUPPER, TRIM etc.)
Aggregate Functions (SUM, AVERAGE, MAX etc.)
Lookup Functions (VLOOKUP)
Date & Time Functions (DATEDIFF, DATEADD etc.)
Logical Functions (AND, OR)
Type Conversion (TOINT, TODATE etc.)
Analytical functions (SENTIMENT, APPLYCLSUTERING etc.)
Gathr Analytics’s complete function list is available here - [Gathr Analytics In-built Function reference].
How to Use Gathr Analytics’s In-built Functions
Suppose “Releases” dataset has Start date and Release date but you need time taken for the release. Use the DATEDIFF function to calculate the difference between two dates.
Open the formula editor, and name the new computed column.
Type-in few letters of the function name (say date.) Select the function you need, from the drop downs list.
- A function help box will provide you description, syntax, and few examples to help you complete the formula.
- Type-in first few letters of the name of the column (to be provided as arguments to the function) and Gathr Analytics’s autocomplete feature will provide you the list of matching columns and Data Sets. You can choose from the drop-down list.
Note: You can select columns from the Data Panel as well.
- Provide all the arguments to complete your formula. In the example, you need to provide interval as ‘Day’.
When finished, click icon. Fix the errors (if any) for missing parameters,an incorrect number of required parameters or misplaced closing parenthesis.
Add a business type for the newly created column from the suggested options and click Ok.
To learn more, refer Section Basic Concepts and Terminology.
Gathr Analytics’s Functions List
List of all the in-built functions available in Gathr Analytics.
# | Function | Description | Syntax |
1. | ABSOLUTE | Returns the absolute value of a number. | ABSOLUTE (numeric column) Example: ABSOLUTE(EMPLOYEE.SALARY) |
2. | AND | Returns TRUE if ALL logical expressions are ‘TRUE’ else return ‘FALSE’. | AND (logical expression, …) Example: AND(EMPLOYEE.SALARY>5000, EMPLOYEE.SALARY<10000) |
3. | AVERAGEIFS | Returns the average of a range depending on multiple criteria. | AVERAGEIFS (average_range, critera_range1, critera1, [critera_range2], [criteria2], ….) Example: AVERAGEIFS(EMPLOYEE.SALARY,EMPLOYEE.DEPARTMENT,”ENGINNERING”, EMPLOYEE.GRADE,”10) |
4. | AVERSAGEIF | Returns the average of a column, filtered by a criterion. | AVERAGEIF (range, criteria, [avg_range]) Example: AVERAGEIF(EMPLOYEE.AGE,”>30,EMPLOYEE.SALARY) |
5. | AVG | Returns the average values of value from the column(s) | AVG (Column1, [Column2], …) Here, Column should be numeric type only Example: AVG(DS1.StoryPoints) |
6. | CHAR | Convert a number into a character according to the universal Unicode table. | CHAR (Column or Number) Example: CHAR(126) |
7. | CONCAT | Returns the concatenation of multiple values | CONCAT(Column1,…) Example: CONCAT(EMPLOYEE.FIRST_NAME,EMPLOYEE.LAST_NAME) Here, column should be from same DS |
8. | COUNT | Counts the values from the column(s) | COUNT (Column, [Column2], …) Here, Column can be of any data type Example: COUNT(DS1.StoryPoints), COUNT(DS1.id) |
9. | COUNTIF | Conditional count across a range | COUNTIF (criteria_range, criterion) Example: COUNTIF (DS1.StoryPoints,”>30”) COUNTIF (DS1.StoryPoints,” >DS1.minPoints”) |
10. | COUNTIFS | Returns a conditional count across a range | COUNTIFcriteria_range1, criterion1, [criteria_range2], [criterion2] …) Same as COUNTIF, with multiple conditions Example: COUNTIF(DS1.StoryPoints,”>30”,DS1.status,”=Success”) |
11. | DATE | Converts a year, month and day into a date | DATE(Year, Month, Day, [Time Zone]) Example : DATE(2018,2,2) DATE(1978,6,23, ”GMT+05:30”) DATE(1978,6,23, ”UTC+02:00”) |
12. | DATEADD | Returns a date to which a specific interval is added | DATEADD(DATE, Number_To_Add, Interval) Example: DATEADD(TODAY(),10,”DAY”) Supported value of Param3: DAY,MONTH,YEAR Supported value of Param1: Date Column or Date Value |
13. | DAY | Returns the day of the month that a specific date falls on, in numeric format. | DAY (date column, [TIME_ZONE]) Example: DAY(EMPLOYEE.JOINING_DATE) DAY(TODAY(),”GMT+12”) |
14. | IF | Returns one value if a logical expression is ‘TRUE’ or another value if it is ‘FALSE’ | IF(Condition,Column1,Column2) Example: IF(CreatedDate >TODAY(),DS1.Column1,DS1.Column2) Here, Column1 and Column2 should be of same type IF(EMPLOYEE.STATUS=1,”ACTIVE”,”DIASABLE”) |
15. | ISNULL | Returns true if column value is null otherwise false. | ISNULL(column) Example: ISNULL(EMPLOYEE.RELEASING_DATE) |
16. | ISNUMBER | returns TRUE if this is a number or Column containing a numeric value and FALSE otherwise | ISNUMBER(Column) Example: ISNUMBER(EMPLOYEE.SALARY) |
17. | LARGE | Returns the nth largest element from a data set, where n is user-defined. | LARGE (numeric column, n) Example: LARGE(EMPLOYEE.SALARY,5) |
18. | LEFT | Returns a substring from the beginning of a specified string. | LEFT (column, [number_of_characters]) Example: LEFT(EMPLOYEE.ADDRESS,4) |
19. | LENGTH | returns the length of a string | LENGTH(column) Example: LENGTH(EMPLOYEE.ADDRESS) |
20. | MAX | Returns the max value from column(s) | MAX(Column) Here, Column can be numeric or date time type Example: MAX(DS1.CreatedDate) |
21. | MAXIF | Returns the max value from the column, filtered by a criterion | MAXIF (column, criteria_range, criteria1)) Here, criterion1 can be constant as well e.g. “KLERA-123” Example: MAXIF(EMPLOYEE.SALARY,EMPLOYEE.AGE,”30”) MAXIF(EMPLOYEE.SALARY,EMPLOYEE.AGE,”>=50”) |
22. | MAXIFS | Returns the max value from the column, filtered by multiple criteria | MAXIFS(column, criteria_range1, criteria1, critera_range2, criteria2], …) Same as MAXIF, with multiple conditions Example: MAXIFS(EMPLOYEE.SALARY,EMPLOYEE.AGE,”30”,EMPLOYEE.GRADE,”C”) |
23. | MEDIAN | Returns the median value in a numeric dataset. | MEDIAN (numeric column) Example: MEDIAN(EMPLOYEE.SALARY) |
24. | MID | Extracts A segment | MID (Column OR String, starting_at, extract_length) Example: MID(“Hello Word”,4,4) |
25. | MIN | Returns the min value from column(s) | MIN(Column) Here, Column can be numeric or date time type Example: MIN(DS1.CreatedDate) |
26. | MINIF | Returns the min value from the column, filtered by a criterion | MINIF (column, criteria_range, criteria) Here, criterion can be constant as well e.g. “KLERA-123” Example: MINIF(EMPLOYEE.SALARY,EMPLOYEE.AGE,”30”) |
27. | MINIFS | Returns the min value from the column, filtered by multiple criteria | MINIFS(column, criteria_range1, criteria1, critera_range2, criteria2], …) Same as MINIF, with multiple conditions Example: MINIFS(EMPLOYEE.SALARY,EMPLOYEE.AGE,”30”,EMPLOYEE.GRADE,”C”) |
28. | MONTH | Returns the month of the year a specific date falls in, in numeric format. | MONTH (date column, [TIME_ZONE]) Example: MONTH(EMPLOYEE.JOINING_DATE) MONTH(NOW(),”GMT+12”) |
29. | NOTNULL | Returns true if column value is not null otherwise false. | NOTNULL(column) Example: NOTNULL(EMPLOYEE.RELEASING_DATE) |
30. | OR | Returns TRUE if any logical expressions is ‘TRUE’ else return ‘FALSE’ | OR(logical expression) Example: OR (EMPLOYEE.SALARY<5000, EMPLOYEE.SALARY>10000) |
31. | RANK | Returns the rank of a specified value in a dataset | RANK (value, numeric column, [is_ascending) IS_ASCENDING-0 BY DEFAULT, means the greatest value in data will have Rank 1 RANK(5,EMPLOYEE.SALARY,1) |
32. | REPT | Returns a specified text repeated several times | Returns specified text repeated several times. RANK (column OR String, number_of_repetitions) Example: REPT(“John”,3) |
33. | RIGHT | Returns a substring from the end of a specified string. | RIGHT (column, [number_of_characters]) Example: RIGHT(EMPLOYEE.ADDRESS,4) |
34. | ROUND | Rounds a number to a certain number of decimal places according to standard rules. | ROUND (column1, column2) Example: ROUND(EMPLOYEE.SALARY,2) |
35. | SEARCH | Returns the position at which a string is first found within cell and ignores capitalization of letters. Returns null if the string is not found. | SEARCH (search_for, text_to_search, [starting_at]) Example: SEARCH(street,EMPLOYEE.ADDRESS,1) |
36. | SMALL | Returns the nth smallest element from a data set, where n is user-defined. | SMALL (numeric column,n) Example: SMALL(EMPLOYEE.SALARY,4) |
37. | STDDEV | Returns standard deviation of a numeric dataset. | STDDEV (numeric column) Example: STDDEV(EMPLOYEE.SALARY) |
38. | SUM | Returns the sum of values from column(s) | SUM(Column1,[Column2],….) Here, Column should be numeric type only Example: SUM (DS1.StoryPoints) SUM (EMP.SALARY,EMP.INCENTIVE) |
39. | SUMIF | Returns the sum of a column, filtered by a column | SUMIF(criteria_range, criterion,[sum_range]) Example: SUMIF(EMP.AGE,”>30”,EMP.SALARY) Here, criterion can be constant as well e.g. “KLERA-123” Sum_range should be of numeric only |
40. | SUMIFS | Returns the sum of columns depending on multiple criterion | SUMIFS(criteria_range1, criterion1, [criteria_range2, …], [criterion2, …]) Same as SUMIF, with multiple conditions |
41. | TODATE | Converts a date/datetime column to a date column. | TODATE(date_time_column) Example: TODATE(EMPLOYEE.JOINING_DATE) |
42. | TODAY | Returns the current date as a Date value | TODAY() |
43. | TOINT | Converts value to integer value | TOINT(Column) Example: TOINT(EMPLOYEE.AGE), TOINT(“25”) Input can be column or Value |
44. | TOKENIZE | Divides text around a specified character or string and puts each fragment as comma separated values | TOKENIZE (Column, delimiter,[SpiltByEach,StopWordList],[include_system_stopwords]) Here, SplitByEach is an option and is Boolean. If user provides more than one separator and make this flag true then the string will be split wherever any of these splitter is present in the text and stowords is word list which will be ignored Example: TOKENIZE (EMPLOYEE.ADDRESS,”,- “, TRUE, ”#$%&”) |
45. | TOLOWER | Converts a specified string to lower case. | TOLOWER (Column OR String) Example: TOLOWER(EMPLOYEE.ADDRESS) |
46. | TOSTRING | Returns STRING VALUES OF A COLUMN | TOSTRING(Column) Example: TOSTRING(EMPLOYEE.SALARY), TOSTRING(“25”) Input can be column or Value |
47. | TOUPPER | Converts a specified string to uppercase. | TOUPPER (Column OR String) Example: TOUPPER(EMPLOYEE.ADDRESS) |
48. | TRIM | Removes leading and trailing spaces in a specified string. | TRIM (Column OR String) Example: TRIM(EMPLOYEE.ADDRESS) |
49. | VAR | Returns the variance of a numeric dataset. | VAR (numeric column) Example: VAR(EMPLOYEE.SALARY) |
50. | VLOOKUP | Searches down the search column for a search key and returns the value of a specified output column | VLOOKUP(search_key, search column, ouput_column) Example: VLOOKUP(EMPLOYEE.DEPARTMENT_CODE,DEPARTMENT.DEPARTMENT_CODE,DEPARTMENT.DEPARTMENT_NAME) Lookup will be done on first match |
51. | YEAR | Returns the year specified by a given date. | YEAR (date column, [TIME_ZONE]) Example: YEAR(DATEVALUE(“06/12/2017”)) |
If you have any feedback on Gathr documentation, please email us!