Manage Databricks Clusters

Users can create interactive clusters, perform actions such as start, restart, edit, and delete, view cluster logs, and navigate to the Spark UI.

Given below is an illustration of the Databricks Interactive Clusters page followed by the steps to create a cluster.

Databricks_Interactive

On the listing page, the Action tab provides an option to start a cluster. Clicking the ellipsis under the Action tab reveals the remaining options: Restart, Edit, User Permissions, and Delete.

Users can manage permissions for existing Databricks clusters. Clicking User Permissions opens the screen shown below:

Cluster_list_view_UserPermission

There are four permission levels for a cluster: No Permissions, Can Attach To, Can Restart, and Can Manage. These levels let a defined set of users access the cluster with specific permissions (see the sketch after the note below).

Note: The following prerequisites must be met in the Databricks account before User Permissions can be used:

  • A workspace admin must enable the toggles below in the workspace settings:

  • Under Access Control, the ‘Cluster, Pool and Jobs Access Control’ toggle must be enabled.

  • Under Access Control, the ‘Cluster Visibility Control’ toggle must be enabled so that users see only the clusters on which they have at least one permission.

  • Access control is available only in the Premium plan. For plan details, refer to AWS Pricing.
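For reference, the four permission levels in the UI correspond to the permission levels exposed by the Databricks Permissions REST API. The sketch below is illustrative only, not the platform's own implementation; it assumes direct access to the REST API with a personal access token, and the workspace URL, token, cluster ID, and user names are placeholders.

    import requests

    # Illustrative values only; replace with your workspace URL, token, cluster ID,
    # and user names. This sketch assumes direct access to the Databricks REST API.
    WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"
    TOKEN = "<personal-access-token>"
    CLUSTER_ID = "<cluster-id>"

    # The UI permission levels map to CAN_ATTACH_TO, CAN_RESTART, and CAN_MANAGE;
    # "No Permissions" is simply the absence of an entry for that user.
    response = requests.patch(
        f"{WORKSPACE_URL}/api/2.0/permissions/clusters/{CLUSTER_ID}",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "access_control_list": [
                {"user_name": "analyst@example.com", "permission_level": "CAN_ATTACH_TO"},
                {"user_name": "operator@example.com", "permission_level": "CAN_RESTART"},
            ]
        },
    )
    response.raise_for_status()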

Databricks Interactive Clusters

On the listing page, click CREATE CLUSTER to create a Databricks cluster. Provide the following fields:

Cluster Policy: A cluster policy defines limits on the attributes available during cluster creation. Select a cluster policy created in Databricks to configure the cluster. The default value is unrestricted. You also have the option to download the cluster policy to your local system.
Cluster Name: Provide a unique name for the cluster.
Cluster Mode: Select the cluster mode from the options below:

- Standard

- Single Node

The below options will be available upon selecting Single Node as the Cluster Mode:

Databricks Runtime Version: Provide the Databricks runtime version for the core components that run on the clusters managed by Databricks.
Terminate After: Provide a value in minutes after which the cluster terminates due to inactivity.
Minutes of inactivity: Provide the minutes of inactivity upon selecting the Terminate After option.

The below options will be visible upon selecting Standard as the Cluster Mode:

Databricks Runtime Version: Provide the Databricks runtime version for the core components that run on the clusters managed by Databricks.
Driver Type: Select the driver type from the drop-down list. The available options are:

- Same as worker

- Pool

- Memory Optimized

Worker Type: Select the worker type from the drop-down list.
Enable Auto-scaling: This option is unchecked by default. Check it to enable auto-scaling of the cluster between a minimum and maximum number of nodes.

Provide values for the fields below if auto-scaling is enabled:

Min Workers: Provide the minimum number of workers.
Max Workers: Provide the maximum number of workers.
Workers: If auto-scaling is unchecked, provide the number of workers.
On-demand/Spot Composition: Provide the number of on-demand nodes.
Spot fall back to On-demand: Check this option to fall back to on-demand instances; if the EC2 spot price exceeds the bid, on-demand instances are used instead of spot instances.
Terminate After: Provide a value in minutes after which the cluster terminates due to inactivity.
Minutes of inactivity: Provide the minutes of inactivity upon selecting the Terminate After option.
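The UI fields above broadly correspond to the payload of the Databricks Clusters REST API. The sketch below is for orientation only and is not the platform's own implementation; the workspace URL, token, runtime version, and instance types are placeholders.

    import requests

    # Illustrative values only; the workspace URL, token, runtime version, and
    # instance types are placeholders, not recommendations.
    WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"
    TOKEN = "<personal-access-token>"

    # A Standard-mode cluster with auto-scaling enabled, expressed as a
    # Databricks Clusters API payload. Comments name the corresponding UI fields.
    cluster_spec = {
        "cluster_name": "my-interactive-cluster",            # Cluster Name
        "spark_version": "<databricks-runtime-version>",     # Databricks Runtime Version
        "node_type_id": "<worker-instance-type>",            # Worker Type
        "driver_node_type_id": "<driver-instance-type>",     # Driver Type
        "autoscale": {"min_workers": 2, "max_workers": 8},   # Min Workers / Max Workers
        "autotermination_minutes": 60,                       # Terminate After / Minutes of inactivity
    }

    response = requests.post(
        f"{WORKSPACE_URL}/api/2.0/clusters/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=cluster_spec,
    )
    response.raise_for_status()
    print(response.json()["cluster_id"])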


Instances

The following fields are available under the Instances tab:

Availability Zone: The availability zone for the cluster. The desired machine type may be available only in certain availability zones.

Spot Pricing: The cost of spot instances may vary between availability zones.

Reserved Instance: Select an availability zone to use any reserved instances you may have.

Spot Bid Price: AWS spot instances are charged at spot prices, which are determined by the AWS marketplace. If the spot price goes above your bid price, AWS will terminate your instances. The default spot bid price is set to the on-demand price.
EBS Volume Type: Databricks provisions EBS volumes by default for efficient functioning of the cluster.
# Volumes: The number of volumes to provision for each instance. You can choose up to 10 volumes per instance.
Size in GB: The size of each EBS volume (in GB) launched for each instance. For general purpose SSD, this value must be within the range 100-4096. For throughput optimized HDD, this value must be within the range 500-4096.
IAM Role: IAM roles allow you to access data from Databricks clusters without the need to manage, deploy, or rotate AWS keys.
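In API terms, the Instances-tab fields roughly correspond to the aws_attributes block of a Databricks cluster specification. The sketch below is illustrative only; all values are placeholders, not recommendations.

    # Illustrative aws_attributes block showing how the Instances tab fields are
    # commonly expressed in a cluster specification; all values are placeholders.
    aws_attributes = {
        "zone_id": "us-west-2a",                   # Availability Zone
        "spot_bid_price_percent": 100,             # Spot Bid Price (as a percent of the on-demand price)
        "ebs_volume_type": "GENERAL_PURPOSE_SSD",  # EBS Volume Type
        "ebs_volume_count": 1,                     # # Volumes
        "ebs_volume_size": 100,                    # Size in GB (100-4096 for general purpose SSD)
        "instance_profile_arn": "arn:aws:iam::<account-id>:instance-profile/<role-name>",  # IAM Role
    }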

Spark

The following fields are available under the Spark tab:

Spark Config: If you have used the following components in the pipeline, apply the corresponding settings:

S3: Select an IAM Role with the necessary S3 permissions.

HDFS: Set HADOOP_USER_NAME=hadoop in the Spark configuration.

Hive: Set the following properties:

spark.sql.hive.metastore.version 2.3
spark.sql.hive.metastore.jars maven
hive.metastore.uris thrift://HiveMetastoreIP:9083
spark.hadoop.hive.metastore.uris thrift://HiveMetastoreIP:9083

If you want to use Hive components, make sure the Hive properties are configured at the time of cluster creation or modification under Spark > Spark Configuration.

Environment Variables: Environment variables or Spark config details.
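If these settings were supplied programmatically rather than through the UI, they would typically appear in the spark_conf and spark_env_vars maps of the cluster specification. The sketch below simply restates the values above; HiveMetastoreIP remains a placeholder.

    # Illustrative spark_conf and environment-variable maps mirroring the Hive and
    # HDFS settings described above; HiveMetastoreIP is a placeholder.
    spark_conf = {
        "spark.sql.hive.metastore.version": "2.3",
        "spark.sql.hive.metastore.jars": "maven",
        "hive.metastore.uris": "thrift://HiveMetastoreIP:9083",
        "spark.hadoop.hive.metastore.uris": "thrift://HiveMetastoreIP:9083",
    }
    spark_env_vars = {
        "HADOOP_USER_NAME": "hadoop",  # HDFS setting from the table above
    }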

Tags

Tags are automatically added as key-value pairs to clusters for Amazon resource tracking purposes.

SSH

The following fields are available under the SSH tab:

SSH Public Key: The public key of the SSH key pair used to log in to the driver or worker instances. Example: ssh-rsa <public_key> email@example.com

Logging

The following fields are available under the Logging tab:

Destination: All cluster logs will be delivered to cloud storage every 5 minutes on a best-effort basis.
Cluster Log Path: The cloud storage path where cluster logs should be delivered. Databricks creates a subfolder named with the cluster_id under the destination and delivers logs there.
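For reference, the Logging fields correspond to the cluster_log_conf block of a Databricks cluster specification. The sketch below is illustrative only; the bucket path and region are placeholders.

    # Illustrative cluster_log_conf block; the bucket path and region are placeholders.
    # Databricks appends a cluster_id subfolder under this destination when delivering logs.
    cluster_log_conf = {
        "s3": {
            "destination": "s3://<your-bucket>/cluster-logs",  # Cluster Log Path
            "region": "us-west-2",
        }
    }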

Init Scripts

Under the Init Scripts tab, you can add scripts by clicking the ADD SCRIPTS button.

Provide the Type (for example, S3 or Databricks Workspace), File Path, and Region.
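For reference, init scripts added here correspond to the init_scripts list of a Databricks cluster specification. The sketch below is illustrative only; the S3 path and region are placeholders.

    # Illustrative init_scripts list; each entry names a script location by type.
    init_scripts = [
        {
            "s3": {
                "destination": "s3://<your-bucket>/init-scripts/install-libs.sh",  # File Path
                "region": "us-west-2",                                             # Region
            }
        }
    ]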

Databricks Job Clusters

Given below is an illustration of the Databricks Job Clusters page.

Databricks_Jobs
