Create Cluster

Click CREATE CLUSTER to create a new cluster, and provide the following fields:

Cluster Name: Provide a unique name for the cluster.
VPC: Select the VPC in which the cluster will be launched. gathr must be accessible from this VPC.
Subnet ID: Select the subnet in which the cluster will be launched. gathr must be accessible from this subnet.
Security Group: Select a security group for the cluster that allows the required communication with gathr.
Security Configuration: Select a security configuration from the drop-down list of available options, or select None if no security configuration is required. A security configuration lets you configure data encryption, Kerberos authentication, and S3 authorization.
Service Role: Select the IAM role that the EMR service assumes to provision the cluster's resources on your behalf.
Job Flow Role: Select the IAM role to attach to the EC2 instances in the EMR cluster.
Auto Scaling Role: Select the IAM role that allows EMR to automatically scale the cluster.
Custom AMI Id: Select or provide the ID of a custom Amazon Linux AMI for the cluster.
Root EBS Volume (GB): Provide the root EBS volume size for the master node. Core and task nodes use the same root EBS volume size as the master node.
EMR Managed Scaling: When this option is checked, EMR automatically adjusts the number of EC2 instances in the core and task nodes based on the workload. This option is unchecked by default.

When EMR Managed Scaling is checked, provide values for the following fields:

Minimum Units: Provide the minimum number of core or task units allowed in the cluster. The minimum value is 1.
Maximum Units: Provide the maximum number of core or task units allowed in the cluster. The minimum value is 1.
Maximum On-Demand Limit: Provide the maximum number of core or task units allowed for the On-Demand market type in the cluster. If this parameter is not specified, it defaults to the Maximum Units value. The minimum value is 0.
Maximum Core Units: Provide the maximum number of core nodes allowed in the cluster. If this parameter is not specified, it defaults to the Maximum Units value. The minimum value is 1.
Auto Termination: This option is unchecked by default. Check it to terminate the cluster automatically after it has been idle for the specified duration. The duration must be between one minute and 24 hours.
Steps Concurrency: This option is unchecked by default. Check it to run multiple steps concurrently. After the last step completes, the cluster enters a waiting state.
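For readers who script cluster creation, the sketch below shows how the fields above relate to the public EMR RunJobFlow API (boto3's `run_job_flow`). It assumes gathr issues an equivalent request; gathr's internal request format is not documented here. It applies the documented Managed Scaling rules (unspecified limits default to Maximum Units) and converts the Auto Termination duration to the seconds-based `IdleTimeout`. The requirement that Maximum Units be at least Minimum Units and the `UnitType` value are assumptions, the Security Group field is omitted because the exact API parameter it maps to is not stated, and all values in the usage lines are hypothetical.

```python
from typing import Optional


def managed_scaling_policy(minimum_units: int, maximum_units: int,
                           maximum_on_demand: Optional[int] = None,
                           maximum_core: Optional[int] = None) -> dict:
    """Validate the EMR Managed Scaling fields and apply the documented defaults."""
    if minimum_units < 1 or maximum_units < 1:
        raise ValueError("Minimum Units and Maximum Units must be at least 1")
    # Assumption: the maximum may not be lower than the minimum.
    if maximum_units < minimum_units:
        raise ValueError("Maximum Units must be >= Minimum Units")
    # Unspecified limits default to the Maximum Units value.
    on_demand = maximum_units if maximum_on_demand is None else maximum_on_demand
    core = maximum_units if maximum_core is None else maximum_core
    if on_demand < 0:
        raise ValueError("Maximum On-Demand Limit must be at least 0")
    if core < 1:
        raise ValueError("Maximum Core Units must be at least 1")
    # Key names follow the EMR ComputeLimits structure; UnitType is assumed.
    return {"ComputeLimits": {
        "UnitType": "Instances",
        "MinimumCapacityUnits": minimum_units,
        "MaximumCapacityUnits": maximum_units,
        "MaximumOnDemandCapacityUnits": on_demand,
        "MaximumCoreCapacityUnits": core,
    }}


def build_cluster_request(name: str, subnet_id: str, service_role: str,
                          job_flow_role: str, auto_scaling_role: str,
                          root_ebs_gb: int,
                          security_configuration: Optional[str] = None,
                          custom_ami_id: Optional[str] = None,
                          scaling: Optional[dict] = None,
                          idle_minutes: Optional[int] = None) -> dict:
    """Map the Create Cluster fields onto a RunJobFlow-style request dict."""
    request = {
        "Name": name,
        "ServiceRole": service_role,        # role assumed by the EMR service
        "JobFlowRole": job_flow_role,       # instance profile for the EC2 nodes
        "AutoScalingRole": auto_scaling_role,
        "EbsRootVolumeSize": root_ebs_gb,   # same root volume size on every node
        # RunJobFlow has no VPC parameter: the subnet determines the VPC.
        "Instances": {"Ec2SubnetId": subnet_id},
    }
    # Selecting "None" in the drop-down means the parameter is omitted.
    if security_configuration:
        request["SecurityConfiguration"] = security_configuration
    if custom_ami_id:
        request["CustomAmiId"] = custom_ami_id
    if scaling:
        request["ManagedScalingPolicy"] = scaling
    if idle_minutes is not None:
        # Auto Termination: 1 minute to 24 hours, sent to EMR in seconds.
        if not 1 <= idle_minutes <= 24 * 60:
            raise ValueError("idle duration must be between 1 minute and 24 hours")
        request["AutoTerminationPolicy"] = {"IdleTimeout": idle_minutes * 60}
    return request


request = build_cluster_request(
    "analytics-cluster", "subnet-0abc1234",  # hypothetical values
    "EMR_DefaultRole", "EMR_EC2_DefaultRole", "EMR_AutoScaling_DefaultRole",
    root_ebs_gb=32,
    scaling=managed_scaling_policy(minimum_units=2, maximum_units=10),
    idle_minutes=60,
)
```

Keeping validation separate from request assembly lets the same limits be checked before anything is sent to AWS.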

Click FETCH CLUSTER FROM AWS to fetch an existing cluster by selecting its cluster ID from the drop-down list.

The Create Cluster window also contains the following sections, each described below: Software Configuration, Tags, Master Nodes, Core Nodes, Task Nodes, SSH, and Bootstrap Actions.


Software Configuration

Release: Select the EMR release version, for example emr-6.10.0. You can then select the applications to install by checking the boxes next to them. For example, Hadoop 3.3.3, JupyterHub 1.5.0, Ganglia 3.7.2, Hive 3.1.3, JupyterEnterpriseGateway 2.6.0, and Spark 3.3.1 can be selected, among various other options available under this tab, as per business requirement.
Enter Configuration: Provide configuration for any additional YARN properties for the cluster.
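The sketch below shows the shape of the release, application, and configuration settings in the EMR API. It assumes the Enter Configuration field accepts the standard EMR configuration JSON, a list of objects with `Classification` and `Properties`; that is an assumption about gathr, not something this page states. The YARN property used is only an example.

```python
import json
from typing import Dict, List


def software_configuration(applications: List[str],
                           yarn_properties: Dict[str, str],
                           release_label: str = "emr-6.10.0") -> dict:
    """Build the release, application and configuration parts of an EMR request."""
    return {
        "ReleaseLabel": release_label,
        # Application names carry no version: the release label fixes them
        # (for example, emr-6.10.0 ships Spark 3.3.1 and Hive 3.1.3).
        "Applications": [{"Name": app} for app in applications],
        # Extra YARN properties belong to the "yarn-site" classification;
        # EMR expects every property value as a string.
        "Configurations": [
            {"Classification": "yarn-site", "Properties": dict(yarn_properties)}
        ],
    }


config = software_configuration(
    ["Hadoop", "Hive", "Spark"],
    {"yarn.nodemanager.vmem-check-enabled": "false"},  # example property
)
# JSON of the kind that could be entered under Enter Configuration.
print(json.dumps(config["Configurations"], indent=2))
```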

Tags

Add Tag: Use this option to add tags to the cluster. Provide the value and Action(s) for each tag.
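In the EMR API, cluster tags are passed as a list of key/value objects. The minimal sketch below converts a dictionary into that form; it assumes gathr's tags map to EMR tags with a key and a value, and applies the general AWS tag length limits (keys up to 128 characters, values up to 256).

```python
from typing import Dict, List


def to_emr_tags(tags: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert key/value pairs into the EMR tag list format."""
    result = []
    for key, value in tags.items():
        # General AWS tag limits: keys 1-128 characters, values up to 256.
        if not 1 <= len(key) <= 128 or len(value) > 256:
            raise ValueError(f"tag {key!r} exceeds AWS tag length limits")
        result.append({"Key": key, "Value": value})
    return result


print(to_emr_tags({"team": "data-eng", "env": "dev"}))  # hypothetical tags
```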

Master Nodes

Instance Type: Select the instance type for the master node. The description shows the specifications of the selected type, for example 30.5 GB memory, 4 vCores, EBS-only storage.
Instance Count: Provide the number of instances for the master node.
Volume Type: Select the EBS volume type for the master node.
EBS Volume: Provide the EBS volume size. It must be either 0 GiB or between 15 and 100 GiB.
IOPS: Provide the IOPS for the master node's EBS volumes.
Volumes per Instance: Provide the number of EBS volumes attached to each master node instance.
Node Type: Select the pricing model for the EC2 instances: On Demand or Spot. For Spot, also provide the Spot Bid Price, that is, the bid price for the Spot instance.
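The node settings above correspond to an instance group in the EMR API. The sketch below builds one such group and enforces the documented EBS size rule (0 GiB, or 15 to 100 GiB). It assumes the Spot Bid Price maps to EMR's `BidPrice` and that choosing a bid price implies the Spot market; the default volume type and size are hypothetical. The same function can describe core and task groups by passing `"CORE"` or `"TASK"` as the role.

```python
from typing import Optional


def instance_group(role: str, instance_type: str, count: int,
                   volume_type: str = "gp2", volume_size_gb: int = 32,
                   volumes_per_instance: int = 1, iops: Optional[int] = None,
                   spot_bid_price: Optional[str] = None) -> dict:
    """Build one EMR instance group for the MASTER, CORE or TASK role."""
    # Documented rule: the EBS volume is either 0 GiB or 15-100 GiB.
    if volume_size_gb != 0 and not 15 <= volume_size_gb <= 100:
        raise ValueError("EBS volume must be 0 GiB or between 15 and 100 GiB")
    group = {
        "InstanceRole": role,
        "InstanceType": instance_type,
        "InstanceCount": count,
        # Node Type: a bid price selects the Spot market, otherwise On-Demand.
        "Market": "SPOT" if spot_bid_price else "ON_DEMAND",
    }
    if spot_bid_price:
        group["BidPrice"] = spot_bid_price  # USD, passed as a string
    if volume_size_gb > 0:
        spec = {"VolumeType": volume_type, "SizeInGB": volume_size_gb}
        if iops is not None:
            spec["Iops"] = iops  # used by volume types such as io1
        group["EbsConfiguration"] = {
            "EbsBlockDeviceConfigs": [{
                "VolumeSpecification": spec,
                "VolumesPerInstance": volumes_per_instance,
            }]
        }
    return group


# Hypothetical master node: one On-Demand instance with a single 32 GiB gp2 volume.
print(instance_group("MASTER", "r4.xlarge", 1))
```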

Core Nodes

Instance Type: Select the instance type for the core node. The description shows the specifications of the selected type, for example 30.5 GB memory, 4 vCores, EBS-only storage.
Instance Count: Provide the number of instances for the core node.
Volume Type: Select the EBS volume type for the core node.
EBS Volume: Provide the EBS volume size. It must be either 0 GiB or between 15 and 100 GiB.
IOPS: Provide the IOPS for the core node's EBS volumes.
Volumes per Instance: Provide the number of EBS volumes attached to each core node instance.
Node Type: Select the pricing model for the EC2 instances: On Demand or Spot. For Spot, also provide the Spot Bid Price, that is, the bid price for the Spot instance.
Enable Autoscaling: Select the checkbox to enable autoscaling for the core nodes.
Minimum Nodes: Provide the minimum number of nodes for autoscaling, and provide values for the Scale Out Rules and Scale In Rules. You can add further rules as needed.
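Scale-out and scale-in rules in EMR are evaluated against CloudWatch metrics such as `YARNMemoryAvailablePercentage`. The sketch below is a deliberately simplified model of that behavior, not EMR's actual evaluation logic (which relies on CloudWatch alarms, evaluation periods, and cooldowns): the first rule whose condition holds adjusts the node count, and the result is clamped between Minimum Nodes and a maximum. The `ScalingRule` structure, the thresholds, and the maximum of 10 nodes are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScalingRule:
    """Simplified scale-out/scale-in rule (hypothetical structure)."""
    name: str
    threshold: float  # threshold on YARNMemoryAvailablePercentage
    fire_below: bool  # True: fire when the metric drops below the threshold
    adjustment: int   # nodes to add (positive) or remove (negative)


def next_node_count(current: int, metric: float, rules: List[ScalingRule],
                    minimum: int, maximum: int) -> int:
    """Apply the first rule whose condition holds, clamped to [minimum, maximum]."""
    for rule in rules:
        fired = metric < rule.threshold if rule.fire_below else metric > rule.threshold
        if fired:
            return max(minimum, min(maximum, current + rule.adjustment))
    return current  # no rule fired: keep the current size


rules = [
    ScalingRule("scale-out", threshold=15.0, fire_below=True, adjustment=2),
    ScalingRule("scale-in", threshold=75.0, fire_below=False, adjustment=-1),
]
# Minimum Nodes = 2; the maximum of 10 is an assumed value.
print(next_node_count(3, 10.0, rules, minimum=2, maximum=10))  # prints 5
```

Clamping to the minimum is what keeps scale-in rules from shrinking the group below the Minimum Nodes value.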

Task Nodes

Instance Type: Select the instance type for the task node. The description shows the specifications of the selected type, for example 30.5 GB memory, 4 vCores, EBS-only storage.
Instance Count: Provide the number of instances for the task node.
Volume Type: Select the EBS volume type for the task node.
EBS Volume: Provide the EBS volume size. It must be either 0 GiB or between 15 and 100 GiB.
IOPS: Provide the IOPS for the task node's EBS volumes.
Volumes per Instance: Provide the number of EBS volumes attached to each task node instance.
Node Type: Select the pricing model for the EC2 instances: On Demand or Spot. For Spot, also provide the Spot Bid Price, that is, the bid price for the Spot instance.
Enable Autoscaling: Select the checkbox to enable autoscaling for the task nodes.
Minimum Nodes: Provide the minimum number of nodes for autoscaling, and provide values for the Scale Out Rules and Scale In Rules. You can add further rules as needed.

SSH

EC2 Key Pair name: Select the EC2 key pair whose private key (.pem file) you will use to SSH into the cluster.
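Once the cluster is running, the .pem file of the selected key pair is used to connect to the master node. The sketch below builds the ssh command line and first checks that the key file is not accessible to group or other users, because OpenSSH refuses such private keys. It assumes EMR's default `hadoop` login user; the key path and master node DNS name are hypothetical placeholders.

```python
import os
import stat
from typing import List


def ssh_command(pem_path: str, master_dns: str, user: str = "hadoop") -> List[str]:
    """Build an ssh command line for connecting to the EMR master node."""
    mode = os.stat(pem_path).st_mode
    # OpenSSH rejects private keys with any group or other permission bits set.
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f"{pem_path} is too open; run: chmod 400 {pem_path}")
    return ["ssh", "-i", pem_path, f"{user}@{master_dns}"]
```

The returned list can be passed to `subprocess.run` to open the session, for example `subprocess.run(ssh_command("my-key.pem", "<master-public-dns>"))`.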

Bootstrap Actions

S3 Path: Provide the S3 path of the bootstrap script(s) to run on the cluster nodes.
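In the EMR API, each bootstrap action points to a script stored in S3. The sketch below validates that a path is a full `s3://bucket/key` URI and wraps it in EMR's `ScriptBootstrapAction` structure. It assumes gathr passes the S3 Path through in this form; the action name, bucket, and script are hypothetical.

```python
from typing import List, Optional
from urllib.parse import urlparse


def bootstrap_action(name: str, s3_path: str,
                     args: Optional[List[str]] = None) -> dict:
    """Build an EMR bootstrap action that runs a script stored in S3."""
    parsed = urlparse(s3_path)
    # The script location must be a full s3://bucket/key URI.
    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"not an S3 object path: {s3_path}")
    return {
        "Name": name,
        "ScriptBootstrapAction": {"Path": s3_path, "Args": list(args or [])},
    }


# Hypothetical bucket and script name.
print(bootstrap_action("install-deps", "s3://my-bucket/bootstrap/install.sh"))
```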