Databricks Connection

The Databricks Connection links your Gathr application to the Databricks platform. It lets Gathr communicate with a Databricks Query Endpoint (a SQL Warehouse or a Compute Cluster) to run SQL queries.

Prerequisites


If you use a SQL Warehouse as the Query Endpoint, get the connection details as follows:

  1. Log in to your Databricks workspace.

  2. In the sidebar, click SQL > SQL Warehouses.

  3. In the list of available warehouses, click the target warehouse’s name.

  4. On the Connection Details tab, copy the connection details that you need, such as Server hostname, Port, and HTTP path.

  5. To authenticate with a service principal:

    • Grant the service principal permission on the SQL warehouse.

    • Grant the service principal permission on the Databricks workspace.
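The Server hostname, Port, and HTTP path copied in step 4 are the same three values a JDBC client needs. As an illustration only, the following Python sketch assembles them into a URL of the form jdbc:databricks://<server-hostname>:<port>;httpPath=<http-path>, which is the format assumed here for the Databricks JDBC driver. Gathr builds its own connection from the fields described below; the class name and sample values are hypothetical. The values copied from a Compute Cluster fit the same shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionDetails:
    """Values copied from the Connection Details (or JDBC/ODBC) tab. Hypothetical helper."""

    server_hostname: str
    port: int
    http_path: str

    def jdbc_url(self) -> str:
        # Assumed Databricks JDBC driver format:
        #   jdbc:databricks://<server-hostname>:<port>;httpPath=<http-path>
        # Authentication settings are deliberately left out of the URL.
        return f"jdbc:databricks://{self.server_hostname}:{self.port};httpPath={self.http_path}"


if __name__ == "__main__":
    # Made-up sample values, not a real workspace.
    details = ConnectionDetails(
        server_hostname="dbc-a1b2c3d4-e5f6.cloud.databricks.com",
        port=443,
        http_path="/sql/1.0/warehouses/abc123def456",
    )
    print(details.jdbc_url())
```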


If you use a Compute Cluster as the Query Endpoint, get the connection details as follows:

  1. Log in to your Databricks workspace.

  2. In the sidebar, click Compute.

  3. In the list of all-purpose compute clusters, click the target cluster’s name.

  4. On the Configuration tab, scroll down to Advanced options > JDBC/ODBC tab and copy the connection details that you need, such as Server hostname, Port, and HTTP path.

  5. To authenticate with a service principal:

    • The target cluster should be a Multi node cluster.

    • The Access mode parameter of the target cluster should be set to Shared.

    • For the target cluster, click Edit Permissions (vertical ellipsis menu), add the service principal to be used for authentication, and save the updated permissions.

    • Set the permission level of the service principal on the cluster.

    • Set all the required permissions for the service principal on the Catalog, Database, Schema, Tables, and Volumes.

    • Grant the service principal permission on the workspace.
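On a Unity Catalog workspace, the catalog, schema, table, and volume permissions from the last steps are commonly granted with SQL GRANT statements, with the service principal referenced by its application ID in backticks. The Python sketch below only generates such statements for you to run in a SQL editor or notebook. Which privileges are "required" depends on what your application does, so the read-only set used here (USE CATALOG, USE SCHEMA, SELECT, READ VOLUME) is an assumption; the function name and sample identifiers are hypothetical.

```python
from typing import Iterable, List


def grant_statements(
    application_id: str,
    catalog: str,
    schema: str,
    tables: Iterable[str] = (),
    volumes: Iterable[str] = (),
) -> List[str]:
    """Build Unity Catalog GRANT statements for a service principal (hypothetical helper).

    Only read privileges are generated; add MODIFY or WRITE VOLUME grants if the
    application also writes data.
    """
    # Service principals are addressed by application ID, quoted with backticks.
    principal = f"`{application_id}`"
    statements = [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {principal}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {principal}",
    ]
    for table in tables:
        statements.append(f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO {principal}")
    for volume in volumes:
        statements.append(f"GRANT READ VOLUME ON VOLUME {catalog}.{schema}.{volume} TO {principal}")
    return statements


if __name__ == "__main__":
    # Placeholder application ID and object names.
    for stmt in grant_statements(
        "00000000-0000-0000-0000-000000000000", "main", "sales", tables=["orders"], volumes=["raw_files"]
    ):
        print(stmt + ";")
```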


Connection Configuration

Configure the fields required to create the connection as explained below.

Connection Name

Name of the connection to be created.


Host Name

Server hostname of the Databricks Query Endpoint (SQL Warehouse or Compute Cluster).

For example, .cloud.databricks.com

For more details, see Prerequisites.
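The Host Name field expects the bare server hostname, not the full workspace URL that a browser address bar shows. A minimal Python sketch of a normalizer for a pasted value follows; the function name is hypothetical, it is not part of Gathr, and the sample hostname is made up.

```python
from urllib.parse import urlsplit


def normalize_host_name(value: str) -> str:
    """Reduce a pasted workspace URL or hostname to a bare hostname (hypothetical helper)."""
    value = value.strip()
    # urlsplit only recognizes the host part when a scheme is present.
    if "://" not in value:
        value = "https://" + value
    host = urlsplit(value).hostname  # drops scheme, port, path, and query; lowercases
    if not host:
        raise ValueError("no hostname found in input")
    return host


if __name__ == "__main__":
    print(normalize_host_name("https://dbc-a1b2c3d4-e5f6.cloud.databricks.com/?o=123"))
```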


Port

The port number of the Databricks Query Endpoint. The default port is 443.

For more details, see Prerequisites.
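A small Python sketch of how a client might interpret the Port value: an empty value falls back to the default of 443, and anything else must be a valid TCP port number. The empty-value fallback and the function name are assumptions for illustration, not documented Gathr behaviour.

```python
from typing import Optional

DEFAULT_PORT = 443  # default port stated for the Port field


def parse_port(value: Optional[str]) -> int:
    """Return the port to use, falling back to 443 when the value is empty (hypothetical helper)."""
    if value is None or not value.strip():
        return DEFAULT_PORT
    port = int(value)  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port
```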


Query Endpoint

Gathr supports SQL Warehouse and Compute Cluster as Query Endpoints. Select the compute resource to use.


HTTP Path

HTTP path of the selected Databricks Query Endpoint.

For more details, see Prerequisites.
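The two Query Endpoint types expose HTTP paths of different shapes, so a mismatch between the Query Endpoint selection and the HTTP Path value is easy to detect. The Python sketch below assumes the path formats Databricks typically displays: /sql/1.0/warehouses/<warehouse-id> for a SQL Warehouse and sql/protocolv1/o/<workspace-id>/<cluster-id> for a Compute Cluster. The function names are hypothetical.

```python
import re

# Assumed HTTP path shapes shown by Databricks:
#   SQL Warehouse:   /sql/1.0/warehouses/<warehouse-id>   (older: /sql/1.0/endpoints/<id>)
#   Compute Cluster: sql/protocolv1/o/<workspace-id>/<cluster-id>
_PATH_PATTERNS = {
    "SQL Warehouse": re.compile(r"^/?sql/1\.0/(?:warehouses|endpoints)/[0-9A-Za-z]+$"),
    "Compute Cluster": re.compile(r"^/?sql/protocolv1/o/\d+/[0-9A-Za-z-]+$"),
}


def infer_query_endpoint(http_path: str) -> str:
    """Return the Query Endpoint type that an HTTP path belongs to (hypothetical helper)."""
    path = http_path.strip()
    for endpoint, pattern in _PATH_PATTERNS.items():
        if pattern.match(path):
            return endpoint
    raise ValueError(f"unrecognized HTTP path: {http_path!r}")


def check_query_endpoint(selected: str, http_path: str) -> None:
    """Raise ValueError if the selected Query Endpoint does not match the HTTP path."""
    inferred = infer_query_endpoint(http_path)
    if inferred != selected:
        raise ValueError(f"Query Endpoint is {selected!r}, but the HTTP path points to a {inferred}")
```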


Authentication Type

Choose the method that Gathr uses to authenticate when connecting to the Databricks Query Endpoint.

Token: Use this option to authenticate using a personal access token.

Personal Access Token

The personal access token of your Databricks workspace user.

For more information about Databricks personal access tokens for workspace users, see the Databricks documentation.
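Databricks accepts a personal access token as a bearer token in the Authorization header. To try a token outside Gathr, you can build a REST request like the one in this Python sketch, which uses only the standard library and does not send anything. The list-warehouses endpoint /api/2.0/sql/warehouses is just an example target, and the function name, hostname, and token are placeholders.

```python
import urllib.request


def build_pat_request(host: str, token: str, api_path: str = "/api/2.0/sql/warehouses") -> urllib.request.Request:
    """Build (but do not send) a REST request authenticated with a personal access token."""
    return urllib.request.Request(
        f"https://{host}{api_path}",
        headers={"Authorization": f"Bearer {token}"},  # PATs are sent as bearer tokens
        method="GET",
    )
```

Passing the returned request to urllib.request.urlopen would send it; an HTTP 401 or 403 response usually points to an invalid token or missing permissions.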


Service Principal: Use this option to authenticate using a Service Principal.

Client ID

Enter the unique identifier assigned to the service principal. You can find the Client ID in the identity and access settings of the Databricks portal.

Client Secret

Enter the client secret generated for the service principal. You can find the Client Secret in the identity and access settings of the Databricks portal.
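With a Service Principal, a client typically exchanges the Client ID and Client Secret for a short-lived OAuth access token (Databricks OAuth machine-to-machine authentication) and then uses that token as a bearer token. This Python sketch only builds the token request against the workspace's /oidc/v1/token endpoint, using the client_credentials grant and the all-apis scope. How Gathr performs this exchange internally is not described here; the function name, hostname, and credentials are placeholders.

```python
import base64
import urllib.parse
import urllib.request


def build_m2m_token_request(host: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth client-credentials token request for a service principal."""
    # Client ID and Client Secret are sent with HTTP Basic authentication.
    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode("ascii")
    # client_credentials grant with the all-apis scope.
    body = urllib.parse.urlencode({"grant_type": "client_credentials", "scope": "all-apis"}).encode()
    return urllib.request.Request(
        f"https://{host}/oidc/v1/token",
        data=body,
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

Sending this request returns JSON that includes an access_token field; later API calls then carry it as "Authorization: Bearer <access_token>".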


Advanced Configurations

This section contains advanced configuration parameters.

Auto Start Query Endpoint

Enable this option to start the Databricks Query Endpoint at application runtime if it is not already running. This helps ensure that the Query Endpoint is available when needed, without manual intervention.
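Starting a stopped Query Endpoint corresponds to a call to the Databricks REST API. A minimal Python sketch follows, assuming the public endpoints POST /api/2.0/sql/warehouses/{id}/start for a SQL Warehouse and POST /api/2.0/clusters/start for a Compute Cluster. Whether Gathr uses exactly these calls is an assumption, and the helper name is hypothetical.

```python
import json
from typing import Optional, Tuple


def start_endpoint_call(query_endpoint: str, endpoint_id: str) -> Tuple[str, str, Optional[bytes]]:
    """Return (method, API path, JSON body) that starts the given Query Endpoint (hypothetical helper)."""
    if query_endpoint == "SQL Warehouse":
        # The warehouse ID is part of the path; no request body is needed.
        return "POST", f"/api/2.0/sql/warehouses/{endpoint_id}/start", None
    if query_endpoint == "Compute Cluster":
        # The cluster ID is passed in the JSON body.
        return "POST", "/api/2.0/clusters/start", json.dumps({"cluster_id": endpoint_id}).encode()
    raise ValueError(f"unknown Query Endpoint: {query_endpoint!r}")
```

The matching stop calls are POST /api/2.0/sql/warehouses/{id}/stop and POST /api/2.0/clusters/delete, which terminates (but does not permanently delete) a cluster.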

SQL Query Endpoint Action

Specify the action to take once the Query Endpoint is started.

Possible values are:

  • Start: The Query Endpoint is started and begins execution. It remains running after the job completes.

  • Start and Stop: The Query Endpoint is started and begins execution, and it stops automatically once the job completes, which optimizes resource utilization.

    If multiple applications that use the same Databricks connection with this option selected run simultaneously, the Query Endpoint stops as soon as the first application completes. Jobs from the other applications that are still running then fail.
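The shared-endpoint failure described above can be reproduced with a toy simulation. In this Python sketch, the QueryEndpoint class is a hypothetical in-memory stand-in for a real SQL Warehouse or cluster, and the two hooks mirror the Auto Start Query Endpoint option and the SQL Query Endpoint Action values. Two applications share one endpoint, and application A finishes first.

```python
from dataclasses import dataclass


@dataclass
class QueryEndpoint:
    """In-memory stand-in for a Databricks Query Endpoint (hypothetical)."""

    running: bool = False

    def run_query(self, app: str) -> str:
        if not self.running:
            raise RuntimeError(f"{app}: Query Endpoint is not running")
        return f"{app}: ok"


def on_job_start(endpoint: QueryEndpoint) -> None:
    # Auto Start Query Endpoint: start the endpoint only if it is not already running.
    if not endpoint.running:
        endpoint.running = True


def on_job_end(endpoint: QueryEndpoint, action: str) -> None:
    # "Start and Stop" stops the endpoint as soon as this job completes,
    # regardless of other jobs still using it; "Start" leaves it running.
    if action == "Start and Stop":
        endpoint.running = False


def simulate_shared_endpoint(action: str) -> str:
    """Apps A and B share one endpoint and A finishes first; return B's outcome."""
    endpoint = QueryEndpoint()
    on_job_start(endpoint)  # application A starts and brings the endpoint up
    on_job_start(endpoint)  # application B starts; the endpoint is already running
    endpoint.run_query("app A")
    on_job_end(endpoint, action)  # application A completes first
    try:
        return endpoint.run_query("app B")
    except RuntimeError as err:
        return f"failed ({err})"


if __name__ == "__main__":
    for action in ("Start", "Start and Stop"):
        print(action, "->", simulate_shared_endpoint(action))
```

With Start, application B keeps working. With Start and Stop, application B fails because application A's completion stopped the shared endpoint.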


After entering all the details, click the TEST button.

If the connection and authentication details are correct, the success message “connection available” is displayed. Click the CREATE button to save the connection.

If the details are incorrect or the server is down, you will get a message “Connection unavailable”.
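The TEST check can be approximated as "open a connection and run a trivial query". The following Python sketch shows that pattern for any DB-API 2.0 connection factory and returns the two messages quoted above; it is not Gathr's implementation, and the function name is hypothetical. The in-memory sqlite3 database only stands in for a real Databricks connection, such as one opened with the databricks-sql-connector package.

```python
import sqlite3
from typing import Any, Callable


def check_connection(connect: Callable[[], Any]) -> str:
    """Open a DB-API connection, run SELECT 1, and report the outcome (hypothetical helper)."""
    try:
        conn = connect()
    except Exception:
        # For example: wrong host, bad credentials, or the server is down.
        return "Connection unavailable"
    try:
        cursor = conn.cursor()
        cursor.execute("SELECT 1")
        row = cursor.fetchone()
        ok = row is not None and row[0] == 1
        return "connection available" if ok else "Connection unavailable"
    except Exception:
        return "Connection unavailable"
    finally:
        conn.close()


if __name__ == "__main__":
    # sqlite3 stands in for a real Databricks connection here.
    print(check_connection(lambda: sqlite3.connect(":memory:")))
```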
