Manage EMR Clusters

Users can create long running clusters, fetch existing clusters from AWS, start, edit, and delete clusters, view logs, and open the Spark UI.

EMR Long Running Clusters

Given below is an illustration of the EMR Long Running Clusters page.

[Image: EMR Long Running Clusters page]

EMR Job Clusters

Given below is an illustration of the EMR Job Clusters page.

[Image: EMR Job Clusters page]


Create Cluster

Click CREATE CLUSTER to create a new cluster, and provide the fields described below.

Cluster Name: Provide a unique name to identify the EMR cluster configuration.
VPC: Select the VPC in which the cluster will be launched. The VPC must be accessible from gathr.
Subnet ID: Select the subnet in which the cluster will be launched. The subnet must be accessible from gathr.
Security Group: Select a security group for the cluster that allows the required communication with gathr.
Security Configuration: Select a security configuration from the drop-down list, or select None if no security configuration is required. A security configuration lets you set up data encryption, Kerberos authentication, and S3 authorization.
Service Role: Select the IAM role that the Amazon EMR service assumes to provision cluster resources on your behalf.
Job Flow Role: Select the IAM role (instance profile) attached to the EC2 instances in the EMR cluster.
Auto Scaling Role: Select the IAM role used to automatically scale the EMR cluster.
Custom AMI Id: Select or provide the ID of a custom Amazon Linux AMI for the cluster.
Root EBS Volume (GB): Provide the root EBS volume size, in GB, for the master node. Core and task nodes use the same root EBS volume size.
EMR Managed Scaling: When this option is checked, EMR automatically adjusts the number of EC2 instances in the core and task nodes based on workload. This option is unchecked by default.
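If you automate the same setup, the following minimal sketch shows how the fields above could map onto keyword arguments of the `run_job_flow` call in the AWS SDK for Python (boto3). It is not gathr's implementation. The function name `base_cluster_request` is hypothetical, and passing the selected security group as an additional master and worker security group is an assumption. The function only builds a dictionary, so it runs without AWS access.

```python
from typing import Optional


def base_cluster_request(
    name: str,
    subnet_id: str,
    security_group_id: str,
    service_role: str,
    job_flow_role: str,
    autoscaling_role: str,
    security_configuration: Optional[str] = None,
    custom_ami_id: Optional[str] = None,
    root_ebs_gb: Optional[int] = None,
) -> dict:
    """Build hypothetical run_job_flow keyword arguments from the form fields."""
    request = {
        "Name": name,                          # Cluster Name
        "ServiceRole": service_role,           # role assumed by the EMR service
        "JobFlowRole": job_flow_role,          # instance profile for the EC2 nodes
        "AutoScalingRole": autoscaling_role,   # role used for automatic scaling
        "Instances": {
            # run_job_flow takes no VPC ID: the subnet determines the VPC.
            "Ec2SubnetId": subnet_id,
            # Assumption: the selected group is added to master and worker nodes.
            "AdditionalMasterSecurityGroups": [security_group_id],
            "AdditionalSlaveSecurityGroups": [security_group_id],
        },
    }
    # Optional fields are left out of the request when not provided.
    if security_configuration is not None:
        request["SecurityConfiguration"] = security_configuration
    if custom_ami_id is not None:
        request["CustomAmiId"] = custom_ami_id
    if root_ebs_gb is not None:
        # EbsRootVolumeSize applies to every instance, matching the field above.
        request["EbsRootVolumeSize"] = root_ebs_gb
    return request
```

This dictionary covers only the fields in this table. A complete request typically also needs a release label and instance groups before it could be passed as `boto3.client("emr").run_job_flow(**request)`.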

When EMR Managed Scaling is checked, provide values for the following fields:

Minimum Units: Provide the minimum number of core or task units allowed in the cluster. The minimum value is 1.
Maximum Units: Provide the maximum number of core or task units allowed in the cluster. The minimum value is 1.
Maximum On-Demand Limit: Provide the maximum number of core or task units allowed for the On-Demand market type in the cluster. If not specified, it defaults to the Maximum Units value. The minimum value is 0.
Maximum Core Units: Provide the maximum number of core nodes allowed in the cluster. If not specified, it defaults to the Maximum Units value. The minimum value is 1.
Auto Termination: Check this option to terminate the cluster automatically after it has been idle for the specified duration. Specify a value between 1 minute and 24 hours. This option is unchecked by default.
Steps Concurrency: Check this option to run multiple steps concurrently. After the last step completes, the cluster enters a waiting state. This option is unchecked by default.
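The managed scaling limits, idle timeout, and step concurrency above correspond to the `ManagedScalingPolicy`, `AutoTerminationPolicy`, and `StepConcurrencyLevel` parameters of boto3's `run_job_flow`. The minimal sketch below shows that mapping, but it is not gathr's code. The function name `scaling_and_lifecycle` is hypothetical. Using the "Instances" unit type assumes instance groups rather than instance fleets. The 1-minute to 24-hour bound comes from this page, and the EMR API itself takes the idle timeout in seconds.

```python
from typing import Optional


def scaling_and_lifecycle(
    min_units: Optional[int] = None,
    max_units: Optional[int] = None,
    max_on_demand: Optional[int] = None,
    max_core: Optional[int] = None,
    idle_timeout_minutes: Optional[int] = None,
    step_concurrency: Optional[int] = None,
) -> dict:
    """Build hypothetical run_job_flow arguments for scaling and lifecycle."""
    request = {}
    if min_units is not None and max_units is not None:
        if min_units < 1 or max_units < 1:
            raise ValueError("Minimum and Maximum Units must be at least 1")
        if max_units < min_units:
            raise ValueError("Maximum Units must not be less than Minimum Units")
        # Unset limits default to Maximum Units, as described above.
        on_demand = max_units if max_on_demand is None else max_on_demand
        core = max_units if max_core is None else max_core
        if on_demand < 0 or core < 1:
            raise ValueError("On-Demand limit must be >= 0 and core units >= 1")
        request["ManagedScalingPolicy"] = {
            "ComputeLimits": {
                "UnitType": "Instances",  # assumption: instance groups, not fleets
                "MinimumCapacityUnits": min_units,
                "MaximumCapacityUnits": max_units,
                "MaximumOnDemandCapacityUnits": on_demand,
                "MaximumCoreCapacityUnits": core,
            }
        }
    if idle_timeout_minutes is not None:
        if not 1 <= idle_timeout_minutes <= 24 * 60:
            raise ValueError("Auto Termination must be between 1 minute and 24 hours")
        # The EMR API expects the idle timeout in seconds.
        request["AutoTerminationPolicy"] = {"IdleTimeout": idle_timeout_minutes * 60}
    if step_concurrency is not None:
        # EMR accepts a step concurrency level from 1 to 256.
        if not 1 <= step_concurrency <= 256:
            raise ValueError("StepConcurrencyLevel must be between 1 and 256")
        request["StepConcurrencyLevel"] = step_concurrency
    return request
```

For example, `scaling_and_lifecycle(min_units=2, max_units=10, idle_timeout_minutes=30)` fills both unset limits with 10 and sets an idle timeout of 1800 seconds.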

Click FETCH CLUSTER FROM AWS to fetch an existing cluster by selecting its cluster ID from the drop-down list.
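A drop-down of existing clusters like this one could be populated from the EMR `ListClusters` API. The sketch below is hypothetical: it assumes that only clusters in the STARTING, BOOTSTRAPPING, RUNNING, or WAITING states are worth offering, and it does not describe how gathr actually builds its list. The helper `cluster_choices` accepts response pages in the shape boto3 returns, so it can be tested without AWS access.

```python
from typing import Iterable, List, Tuple

# Assumption: only clusters in these states are offered for reuse.
ACTIVE_STATES = ("STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING")


def cluster_choices(pages: Iterable[dict]) -> List[Tuple[str, str]]:
    """Turn list_clusters response pages into (cluster ID, label) pairs."""
    choices = []
    for page in pages:
        for cluster in page.get("Clusters", []):
            state = cluster["Status"]["State"]
            if state in ACTIVE_STATES:
                choices.append((cluster["Id"], f"{cluster['Name']} ({state})"))
    # Sort by cluster ID so the drop-down order is stable.
    return sorted(choices)
```

With AWS credentials configured, the pages could come from `boto3.client("emr").get_paginator("list_clusters").paginate(ClusterStates=list(ACTIVE_STATES))`.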

The Create Cluster window also includes the following tabs: Software Configuration, Tags, Master Nodes, Core Nodes, Task Nodes, SSH, and Bootstrap Actions. Each is explained below.

[Image: Create Cluster window]

Software Configuration

Release: Select the EMR release version, for example, emr-6.10.0. Select the checkboxes next to the applications you want to install. Options include Hadoop 3.3.3, JupyterHub 1.5.0, Ganglia 3.7.2, Hive 3.1.3, JupyterEnterpriseGateway 2.6.0, and Spark 3.3.1, along with various others available under this tab. Select them as per business requirement.
Enter Configuration: Provide configuration for any additional YARN properties on the cluster.
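In the EMR API, the release version, the selected applications, and any extra YARN properties map to `ReleaseLabel`, `Applications`, and a `Configurations` entry with the `yarn-site` classification. The minimal sketch below illustrates that mapping. The function name `software_configuration` is hypothetical, and the format in which gathr's Enter Configuration box expects its input is not confirmed here. Application versions are not passed because the release label determines them.

```python
from typing import Dict, Iterable, Optional


def software_configuration(
    release: str,
    applications: Iterable[str],
    yarn_properties: Optional[Dict[str, str]] = None,
) -> dict:
    """Build hypothetical run_job_flow arguments for the Software Configuration tab."""
    if not release.startswith("emr-"):
        raise ValueError("release labels look like 'emr-6.10.0'")
    request = {
        "ReleaseLabel": release,
        # Only names are passed; the release label fixes each version.
        "Applications": [{"Name": app} for app in applications],
    }
    if yarn_properties:
        # The yarn-site classification sets properties in yarn-site.xml.
        request["Configurations"] = [
            {"Classification": "yarn-site", "Properties": dict(yarn_properties)}
        ]
    return request
```

For example, `software_configuration("emr-6.10.0", ["Hadoop", "Spark"], {"yarn.nodemanager.vmem-check-enabled": "false"})` produces one `yarn-site` configuration entry.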

Tags

Add Tag: Add custom tags to the EMR cluster. Provide the value and Action(s) for each tag.
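EMR represents cluster tags as a list of key/value objects passed in the `Tags` parameter of `run_job_flow`. The small sketch below converts tag pairs into that shape. The helper name `emr_tags` is hypothetical. Rejecting empty and duplicate keys is a sanity check added for illustration and is not gathr's documented validation.

```python
from typing import Iterable, List, Tuple


def emr_tags(pairs: Iterable[Tuple[str, str]]) -> List[dict]:
    """Convert (key, value) pairs from the Tags tab into EMR Tag objects."""
    tags, seen = [], set()
    for key, value in pairs:
        if not key:
            raise ValueError("tag key must not be empty")
        if key in seen:
            # A resource cannot carry two tags with the same key.
            raise ValueError(f"duplicate tag key: {key}")
        seen.add(key)
        tags.append({"Key": key, "Value": value})
    return tags
```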

Master Nodes

Instance Type: Select the instance type for the master node (for example, a type with 30.5 GB memory, 4 vCores, and EBS-only storage).
Instance Count: Provide the number of instances for the master node.
Volume Type: Provide the EBS volume type for the master node.
EBS Volume: Provide the EBS volume size. The size must be either 0 GiB or between 15 and 100 GiB.
IOPS: Provide the IOPS (input/output operations per second) for the master node EBS volume.
Volumes per Instance: Provide the number of EBS volumes for each master node instance.
Node Type: Select the EC2 pricing model, On Demand or Spot. For Spot, provide the Spot Bid Price, that is, the bid price for each Spot Instance.
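Each node tab maps to one instance group in boto3's `run_job_flow` request (`Instances["InstanceGroups"]`). The minimal sketch below builds such a group and enforces the EBS size rule stated above. The function name `instance_group`, the default `gp2` volume type, and the 32 GiB default are hypothetical. `r4.xlarge` is used in the example only because it is one type with 30.5 GiB memory, 4 vCPUs, and EBS-only storage. The same helper would apply to the core and task tabs.

```python
from typing import Optional

EBS_ERROR = "EBS volume must be 0 GiB or between 15 and 100 GiB"


def instance_group(
    role: str,
    instance_type: str,
    count: int,
    volume_type: str = "gp2",        # hypothetical default
    ebs_gb: int = 32,                # hypothetical default
    volumes_per_instance: int = 1,
    iops: Optional[int] = None,
    market: str = "ON_DEMAND",
    bid_price: Optional[str] = None,
) -> dict:
    """Build a hypothetical InstanceGroups entry for one node tab."""
    if role not in ("MASTER", "CORE", "TASK"):
        raise ValueError(f"unknown instance role: {role}")
    if market not in ("ON_DEMAND", "SPOT"):
        raise ValueError(f"unknown market: {market}")
    if not (ebs_gb == 0 or 15 <= ebs_gb <= 100):
        raise ValueError(EBS_ERROR)
    if market == "SPOT" and bid_price is None:
        raise ValueError("Spot nodes need a Spot Bid Price")
    group = {
        "Name": f"{role.title()} nodes",
        "InstanceRole": role,
        "InstanceType": instance_type,
        "InstanceCount": count,
        "Market": market,  # Node Type: ON_DEMAND or SPOT
    }
    if market == "SPOT":
        group["BidPrice"] = bid_price  # the API expects a string, e.g. "0.10"
    if ebs_gb > 0:
        spec = {"VolumeType": volume_type, "SizeInGB": ebs_gb}
        if iops is not None:
            spec["Iops"] = iops
        group["EbsConfiguration"] = {
            "EbsBlockDeviceConfigs": [
                {"VolumeSpecification": spec, "VolumesPerInstance": volumes_per_instance}
            ]
        }
    return group
```

A 0 GiB setting omits `EbsConfiguration` entirely in this sketch, which is one reasonable reading of "EBS volume should be 0 GiB".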

Core Nodes

Instance Type: Select the instance type for the core nodes (for example, a type with 30.5 GB memory, 4 vCores, and EBS-only storage).
Instance Count: Provide the number of core node instances.
Volume Type: Provide the EBS volume type for the core nodes.
EBS Volume: Provide the EBS volume size. The size must be either 0 GiB or between 15 and 100 GiB.
IOPS: Provide the IOPS (input/output operations per second) for the core node EBS volume.
Volumes per Instance: Provide the number of EBS volumes for each core node instance.
Node Type: Select the EC2 pricing model, On Demand or Spot. For Spot, provide the Spot Bid Price, that is, the bid price for each Spot Instance.
Enable Autoscaling: Select the checkbox to enable autoscaling for the core nodes.
Minimum Nodes: Provide the minimum number of nodes for autoscaling, then provide values for the Scale Out Rules and Scale In Rules. You can add more rules as needed.

Scale Out Rules

Add: Provide the number of EC2 instances to add each time the autoscaling rule is triggered.
if: Choose the Amazon CloudWatch metric that triggers autoscaling.
is: Enter the condition and threshold value for the CloudWatch metric selected above.
for: Enter the number of consecutive five-minute periods over which the metric is compared to the threshold. Autoscaling is triggered only if the condition is met in every one of these periods.
Cooldown period: Provide the time to wait after a scaling activity completes before the next scaling activity can start.
ADD RULE: Click to add more scale out rules.
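The Add, if, is, and for fields together read as one sentence: add N instances if METRIC is CONDITION THRESHOLD for K five-minute periods. The minimal sketch below maps that sentence onto an EMR automatic scaling rule. The helper name `scale_out_rule`, the `AVERAGE` statistic, and the default `PERCENT` unit are assumptions. The EMR API requires a rule name even though this tab does not show one. `YARNMemoryAvailablePercentage` is a real EMR CloudWatch metric, used here only as an example.

```python
# Maps the "is" condition onto CloudWatch comparison operators.
COMPARISONS = {
    ">=": "GREATER_THAN_OR_EQUAL",
    ">": "GREATER_THAN",
    "<": "LESS_THAN",
    "<=": "LESS_THAN_OR_EQUAL",
}


def scale_out_rule(
    name: str,
    add: int,
    metric: str,
    condition: str,
    threshold: float,
    periods: int,
    cooldown_seconds: int = 300,
    unit: str = "PERCENT",  # assumption: suits percentage metrics
) -> dict:
    """Build a hypothetical EMR ScalingRule from the scale out fields."""
    if add < 1:
        raise ValueError("Add must be at least 1 instance")
    if periods < 1:
        raise ValueError("'for' needs at least one five-minute period")
    return {
        "Name": name,
        "Action": {
            "SimpleScalingPolicyConfiguration": {
                "AdjustmentType": "CHANGE_IN_CAPACITY",
                "ScalingAdjustment": add,        # "Add": instances to launch
                "CoolDown": cooldown_seconds,    # "Cooldown period", in seconds
            }
        },
        "Trigger": {
            "CloudWatchAlarmDefinition": {
                "MetricName": metric,            # "if"
                "Namespace": "AWS/ElasticMapReduce",
                "ComparisonOperator": COMPARISONS[condition],  # "is" condition
                "Threshold": float(threshold),   # "is" value
                "EvaluationPeriods": periods,    # "for": consecutive periods
                "Period": 300,                   # each period is five minutes
                "Statistic": "AVERAGE",          # assumption
                "Unit": unit,
            }
        },
    }
```

For example, `scale_out_rule("scale-out-memory", 1, "YARNMemoryAvailablePercentage", "<", 15, 1)` adds one instance when available YARN memory stays below 15 percent for one five-minute period.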

Scale In Rules

Rule Name: Provide a name for the scale in rule.
Terminate: Provide the number of EC2 instances to terminate each time the autoscaling rule is triggered.
if: Choose the Amazon CloudWatch metric that triggers autoscaling.
is: Enter the condition and threshold value for the CloudWatch metric selected above.
for: Enter the number of consecutive five-minute periods over which the metric is compared to the threshold. Autoscaling is triggered only if the condition is met in every one of these periods.
Cooldown period: Provide the time to wait after a scaling activity completes before the next scaling activity can start.
ADD RULE: Click to add more scale in rules.
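In the EMR API, a scale in rule differs from a scale out rule mainly in its action, where Terminate N becomes a capacity change of minus N. The rules are then combined with capacity constraints into the instance group's `AutoScalingPolicy`. The sketch below is hypothetical. Because the API also requires a maximum capacity and this page does not list a maximum nodes field, `max_nodes` is an assumed input. The unique-name check is an illustrative sanity check.

```python
from typing import List


def scale_in_action(terminate: int, cooldown_seconds: int = 300) -> dict:
    """'Terminate' becomes a negative capacity change in the EMR API."""
    if terminate < 1:
        raise ValueError("Terminate must be at least 1 instance")
    return {
        "SimpleScalingPolicyConfiguration": {
            "AdjustmentType": "CHANGE_IN_CAPACITY",
            "ScalingAdjustment": -terminate,
            "CoolDown": cooldown_seconds,
        }
    }


def autoscaling_policy(min_nodes: int, max_nodes: int, rules: List[dict]) -> dict:
    """Combine rules with Minimum Nodes and an assumed maximum node count."""
    if min_nodes < 0 or max_nodes < min_nodes:
        raise ValueError("need 0 <= min_nodes <= max_nodes")
    names = [rule["Name"] for rule in rules]
    if len(names) != len(set(names)):
        raise ValueError("rule names should be unique within a policy")
    return {
        "Constraints": {"MinCapacity": min_nodes, "MaxCapacity": max_nodes},
        "Rules": list(rules),
    }
```

The returned dictionary would be placed under the `AutoScalingPolicy` key of the core (or task) instance group.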

Task Nodes

Instance Type: Select the instance type for the task nodes (for example, a type with 30.5 GB memory, 4 vCores, and EBS-only storage).
Instance Count: Provide the number of task node instances.
Volume Type: Provide the EBS volume type for the task nodes.
EBS Volume: Provide the EBS volume size. The size must be either 0 GiB or between 15 and 100 GiB.
IOPS: Provide the IOPS (input/output operations per second) for the task node EBS volume.
Volumes per Instance: Provide the number of EBS volumes for each task node instance.
Node Type: Select the EC2 pricing model, On Demand or Spot. For Spot, provide the Spot Bid Price, that is, the bid price for each Spot Instance.
Enable Autoscaling: Select the checkbox to enable autoscaling for the task nodes.
Minimum Nodes: Provide the minimum number of nodes for autoscaling, then provide values for the Scale Out Rules and Scale In Rules. You can add more rules as needed.

Scale Out Rules

Add: Provide the number of EC2 instances to add each time the autoscaling rule is triggered.
if: Choose the Amazon CloudWatch metric that triggers autoscaling.
is: Enter the condition and threshold value for the CloudWatch metric selected above.
for: Enter the number of consecutive five-minute periods over which the metric is compared to the threshold. Autoscaling is triggered only if the condition is met in every one of these periods.
Cooldown period: Provide the time to wait after a scaling activity completes before the next scaling activity can start.
ADD RULE: Click to add more scale out rules.

Scale In Rules

Rule Name: Provide a name for the scale in rule.
Terminate: Provide the number of EC2 instances to terminate each time the autoscaling rule is triggered.
if: Choose the Amazon CloudWatch metric that triggers autoscaling.
is: Enter the condition and threshold value for the CloudWatch metric selected above.
for: Enter the number of consecutive five-minute periods over which the metric is compared to the threshold. Autoscaling is triggered only if the condition is met in every one of these periods.
Cooldown period: Provide the time to wait after a scaling activity completes before the next scaling activity can start.
ADD RULE: Click to add more scale in rules.

SSH

EC2 Key Pair name: Select the EC2 key pair (.pem file) used to SSH into the cluster.
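Once the cluster is running, the selected key pair is used to log in to the master node over SSH. On EMR the login user is `hadoop`. In the API, the key pair is passed by name as `Instances["Ec2KeyName"]`, not as a file path. The sketch below only assembles the `ssh` command line. The helper name, the key file name, and the host name are hypothetical placeholders.

```python
from typing import List


def ssh_command(pem_path: str, master_dns: str, user: str = "hadoop") -> List[str]:
    """Build the ssh argv for logging in to the EMR master node."""
    # The private key must match the key pair chosen when the cluster was created.
    return ["ssh", "-i", pem_path, f"{user}@{master_dns}"]
```

The list can be passed directly to `subprocess.run`. Note that `ssh` refuses private key files that other users can read.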

Bootstrap

S3 Path: Provide the S3 path of the bootstrap action script.
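Bootstrap scripts run on each node when it starts. In boto3's `run_job_flow` they are listed under `BootstrapActions`, and each entry needs a name and an S3 script path. The minimal sketch below builds that list. The helper name `bootstrap_actions`, the `bootstrap-N` naming scheme, and the bucket path in the example are hypothetical.

```python
from typing import List, Optional, Sequence


def bootstrap_actions(s3_paths: Sequence[str], args: Optional[Sequence[str]] = None) -> List[dict]:
    """Build hypothetical BootstrapActions entries from S3 script paths."""
    actions = []
    for index, path in enumerate(s3_paths, start=1):
        if not path.startswith("s3://"):
            raise ValueError(f"bootstrap script must be an s3:// path: {path}")
        actions.append({
            "Name": f"bootstrap-{index}",  # hypothetical naming scheme
            "ScriptBootstrapAction": {"Path": path, "Args": list(args or [])},
        })
    return actions
```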