Salesforce ETL Source
Salesforce is a leading CRM application built on the Force.com platform. It can manage all of an organization's customer interactions across different media, such as phone calls, website email inquiries, communities, and social media.
Once you configure the Salesforce source channel, it allows you to read Salesforce data from a valid Salesforce account. This is done by reading the Salesforce object specified by a Salesforce Object Query Language (SOQL) query.
Schema Type
See the topic Provide Schema for ETL Source → to learn how schema details can be provided for data sources.
After providing schema type details, the next step is to configure the data source.
Data Source Configuration
Connection Name: Connections are the service identifiers. You can select a connection name from the list if you have previously created and saved connection details for Salesforce, or create one as explained in the topic Salesforce Connection →
Table Name: Select the source table whose metadata you want to view.
Query: A Salesforce Object Query Language (SOQL) query used to search your organization's Salesforce data for specific information. SOQL is similar to the SELECT statement in the widely used Structured Query Language (SQL) but is designed specifically for Salesforce data. A query is mandatory for reading objects such as Opportunity.
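For illustration, the snippet below builds a SOQL query for the standard Opportunity object (the field names and filter are hypothetical examples, not values required by Gathr):

```python
# Illustrative SOQL query against the standard Opportunity object.
# Field names (Id, Name, StageName, Amount, CloseDate) are standard
# Salesforce fields; the WHERE clause is just an example filter.
soql = (
    "SELECT Id, Name, StageName, Amount, CloseDate "
    "FROM Opportunity "
    "WHERE CloseDate = THIS_QUARTER"
)
print(soql)
```

The resulting string is what you would paste into the Query field of the source configuration.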
Infer Schema: (Optional) Infer the schema from the query results. This determines the data type of each field specified in the SOQL query by sampling rows. This works only when the query returns at least five records.
Date Format: A format string, following java.text.SimpleDateFormat, used when reading timestamps. This applies to Timestamp-type fields. By default it is null, which means timestamps are parsed with java.sql.Timestamp.valueOf().
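As a sketch of what such a pattern means: the SimpleDateFormat pattern `yyyy-MM-dd'T'HH:mm:ss` matches ISO-style timestamps like the one below. The parsing here is done in Python purely for illustration; the pipeline itself uses the Java pattern you supply in the Date Format field.

```python
from datetime import datetime

# The Java pattern "yyyy-MM-dd'T'HH:mm:ss" corresponds to the
# Python strptime pattern "%Y-%m-%dT%H:%M:%S" used below.
value = "2024-03-15T09:30:00"   # hypothetical timestamp value
parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
print(parsed)  # 2024-03-15 09:30:00
```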
Bulk: (Optional) Flag to enable bulk query. This is the preferred method when loading large data sets. The Bulk API is based on REST principles and is optimized for loading large volumes of data. You can use it to query many records asynchronously by submitting batches, which Salesforce processes in the background. Default value is false.
Salesforce Object: (Conditional) Salesforce Objects are database tables that let you store data specific to your organization. This parameter is mandatory when Bulk is true, and its value must match the object specified in the SOQL query.
Pk Chunking: (Optional) Flag to enable automatic primary key chunking for the bulk query job.
This splits a bulk query into separate batches of the size defined by the Chunk Size option. By default, it is false. The Pk Chunking feature can automatically make large queries manageable when using the Bulk API.
Pk stands for Primary Key - the object’s record ID - which is always indexed. This feature is supported for all custom objects, many standard objects, and their sharing tables.
Chunk size: The number of records to include in each batch. The default value is 100,000 and the maximum is 250,000. This option can only be used when Pk Chunking is true.
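As a rough sketch of how Pk Chunking splits a job, the number of batches is the total record count divided by the chunk size, rounded up (the record count below is hypothetical):

```python
import math

total_records = 1_200_000   # hypothetical number of records in the queried object
chunk_size = 250_000        # the maximum allowed Chunk Size value

# Each batch covers a contiguous range of record IDs of at most chunk_size rows.
batches = math.ceil(total_records / chunk_size)
print(batches)  # 5
```

A smaller chunk size produces more, smaller batches; the default of 100,000 is a reasonable starting point for most objects.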
Version: Salesforce API version to be selected from the list.
Detect Schema
Check the populated schema details. For more details, see Schema Preview →
Incremental Read
Optionally, you can enable incremental read. For more details, see Salesforce Incremental Configuration →
Pre Action
To understand how to provide SQL queries or Stored Procedures that will be executed during the pipeline run, see Pre-Actions →
Notes
Optionally, enter notes in the Notes → tab and save the configuration.
If you have any feedback on Gathr documentation, please email us!