Manage EMR Clusters

You can create long-running clusters, fetch existing clusters from AWS, start, edit, and delete clusters, view logs, and open the Spark UI.

EMR Long Running Clusters

Given below is an illustration of the EMR Long Running Clusters page.

EMR Job Clusters

Given below is an illustration of the EMR Job Clusters page.

Create Cluster

Click CREATE CLUSTER to create a new cluster, and provide the fields described below.

Field	Description
Cluster Name	Provide a unique name to identify the EMR cluster configuration.
VPC	Select the VPC in which the cluster will be launched. Gathr must be accessible from this VPC.
Subnet ID	Select the subnet in which the cluster will be launched. Gathr must be accessible from this subnet.
Security Group	Select a security group for the cluster that allows the required communication with Gathr.
Security Configuration	Select a security configuration from the drop-down list of available options, or select None if a security configuration is not required. Using this option you can configure data encryption, Kerberos, and S3 authorization.
Service Role	Select the IAM role that the EMR service assumes to access AWS resources for the EMR pipeline cluster.
Job Flow Role	Select the IAM role to attach to the EC2 instances in the EMR pipeline cluster.
Auto Scaling Role	Select the IAM role used to auto scale the EMR cluster.
Custom AMI Id	Select or provide the ID of a custom Amazon Linux AMI for the cluster.
Root EBS Volume (GB)	Provide the root EBS volume size, in GB, for the master node. The core and task nodes use the same root EBS volume size.
EMR Managed Scaling	When this option is checked, EMR automatically adjusts the number of EC2 instances in the core and task nodes based on workload. This option is unchecked by default.

Upon checking the EMR Managed Scaling option, provide values for the below fields:


Minimum Units	Provide the minimum number of core or task units allowed in a cluster. Minimum value is 1.
Maximum Units	Provide the maximum number of core or task units allowed in a cluster. Minimum value is 1.
Maximum On-Demand Limit	Provide the maximum allowed core or task units for On-Demand market type in a cluster. If this parameter is not specified, it defaults to maximum units value. Minimum value is 0.
Maximum Core Units	Provide the maximum allowed core nodes in a cluster. If this parameter is not specified, it defaults to maximum units value. Minimum value is 1.
Auto Termination	Check this option to terminate the cluster automatically once it has been idle for the specified duration. The duration can range from a minimum of one minute to a maximum of 24 hours. This option is unchecked by default.
Steps Concurrency	Check this option to run multiple steps concurrently. Once the last step completes, the cluster enters a waiting state. This option is unchecked by default.
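
The limits in the table above can be checked before a cluster is submitted. The following is a minimal sketch in Python, not part of Gathr: the class `ManagedScalingLimits` and the function `idle_timeout_ok` are hypothetical names, and the rule that the minimum cannot exceed the maximum is an assumption that the table does not state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ManagedScalingLimits:
    """Hypothetical holder for the managed scaling fields (not a Gathr API)."""
    minimum_units: int
    maximum_units: int
    maximum_on_demand_limit: Optional[int] = None
    maximum_core_units: Optional[int] = None

    def resolved(self) -> "ManagedScalingLimits":
        """Apply the documented defaults: unset limits fall back to maximum_units."""
        on_demand = self.maximum_on_demand_limit
        core = self.maximum_core_units
        return ManagedScalingLimits(
            self.minimum_units,
            self.maximum_units,
            self.maximum_units if on_demand is None else on_demand,
            self.maximum_units if core is None else core,
        )

    def problems(self) -> List[str]:
        """List violations of the documented minimum values."""
        r = self.resolved()
        found = []
        if r.minimum_units < 1:
            found.append("Minimum Units must be at least 1")
        if r.maximum_units < 1:
            found.append("Maximum Units must be at least 1")
        if r.maximum_on_demand_limit < 0:
            found.append("Maximum On-Demand Limit must be at least 0")
        if r.maximum_core_units < 1:
            found.append("Maximum Core Units must be at least 1")
        # Assumption, not stated in the table: minimum cannot exceed maximum.
        if r.minimum_units > r.maximum_units:
            found.append("Minimum Units must not exceed Maximum Units")
        return found


def idle_timeout_ok(minutes: int) -> bool:
    """Auto Termination accepts one minute up to 24 hours."""
    return 1 <= minutes <= 24 * 60


limits = ManagedScalingLimits(minimum_units=2, maximum_units=10)
print(limits.resolved())
print(limits.problems())
print(idle_timeout_ok(90))
```

Returning a list of problems rather than raising on the first one lets a caller report every invalid field at once.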

Click the FETCH CLUSTER FROM AWS option to fetch an existing cluster by selecting its cluster ID from the drop-down list.

The Create Cluster window also provides the following options: Software Configuration, Tags, Master Nodes, Core Nodes, Task Nodes, SSH, and Bootstrap Actions. These are explained below.

Software Configuration


Release	Select the EMR release version, e.g., emr-6.10.0. Select the checkboxes against the applications to be configured. Hadoop 3.3.3, JupyterHub 1.5.0, Ganglia 3.7.2, Hive 3.1.3, JupyterEnterpriseGateway 2.6.0, and Spark 3.3.1 are among the options available under this tab and can be selected as per business requirement.
Enter Configuration	Provide any additional YARN properties to be configured on the cluster.
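
As an illustration of additional YARN properties, the sketch below builds a configuration in the classification format that Amazon EMR uses for application settings. It is an assumption that the Enter Configuration field accepts this JSON format; confirm this against your Gathr version before relying on it. The two property values shown are examples only.

```python
import json

# Assumption: the Enter Configuration field accepts the Amazon EMR
# configuration-classification JSON format. Verify in your Gathr version.
yarn_overrides = [
    {
        "Classification": "yarn-site",
        "Properties": {
            # Example values only; EMR expects property values as strings.
            "yarn.nodemanager.vmem-check-enabled": "false",
            "yarn.log-aggregation-enable": "true",
        },
    }
]

configuration_json = json.dumps(yarn_overrides, indent=2)
print(configuration_json)
```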

Tags


Add Tag	Add custom tags to the EMR cluster. Provide the value and Action(s) for each tag.

Master Nodes


Instance Type	Option to select the instance type for the master node, for example, 30.5 GB Memory, 4 vCores, EBS only.
Instance Count	Option to provide instance count for the master node.
Volume Type	Option to provide volume type for the master node.
EBS Volume	Option to provide the EBS volume size. The size should either be 0 GiB or between 15 and 100 GiB.
IOPS	Option to provide IOPS for the master node.
Volumes per Instance	Option to provide the number of EBS volumes for the master node.
Node Type	Select the EC2 instance market according to the pricing model: On Demand or Spot. For Spot, provide the Spot Bid Price, i.e., the bid price for the spot instance.
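
The EBS volume and Node Type constraints apply in the same way to master, core, and task nodes. A minimal sketch of checking them in Python follows. The function `node_group_problems` and the market strings are hypothetical names, and the sketch assumes that the 15 to 100 GiB range is inclusive and that a positive bid price is required for Spot.

```python
from typing import List, Optional


def node_group_problems(
    ebs_size_gib: int,
    market: str,
    spot_bid_price: Optional[float] = None,
) -> List[str]:
    """Check one node group against the documented field constraints."""
    found = []
    # EBS volume must be 0 GiB, or between 15 and 100 GiB (assumed inclusive).
    if not (ebs_size_gib == 0 or 15 <= ebs_size_gib <= 100):
        found.append("EBS Volume must be 0 GiB or between 15 and 100 GiB")
    if market not in ("ON_DEMAND", "SPOT"):
        found.append("Node Type must be On Demand or Spot")
    # Assumption: a positive bid price is mandatory when Spot is selected.
    if market == "SPOT" and (spot_bid_price is None or spot_bid_price <= 0):
        found.append("Spot Bid Price is required for Spot nodes")
    return found


print(node_group_problems(32, "ON_DEMAND"))
print(node_group_problems(10, "SPOT"))
```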

Core Nodes


Instance Type	Option to select the instance type for the core node, for example, 30.5 GB Memory, 4 vCores, EBS only.
Instance Count	Option to provide instance count for the core node.
Volume Type	Option to provide volume type for the core node.
EBS Volume	Option to provide the EBS volume size. The size should either be 0 GiB or between 15 and 100 GiB.
IOPS	Option to provide IOPS for the core node.
Volumes per Instance	Option to provide the number of EBS volumes for the core node.
Node Type	Select the EC2 instance market according to the pricing model: On Demand or Spot. For Spot, provide the Spot Bid Price, i.e., the bid price for the spot instance.
Enable Autoscaling	Select the checkbox to enable the autoscaling option.
Minimum Nodes	Provide the minimum number of nodes for autoscaling. Provide values for Scale Out Rules and Scale In Rules. You can also add further Rule(s).

Scale Out Rules


Add	Provide the number of EC2 instances to be added each time the autoscaling rule is triggered.
if	Choose the AWS CloudWatch metric that should be used to trigger autoscaling.
is	Enter the threshold value and condition for the CloudWatch metric selected above.
for	Enter the number of consecutive five-minute periods over which the metric data will be compared to the threshold. Autoscaling will be triggered if the condition is met for each consecutive period.
Cooldown period	Provide the time that must elapse after a scaling activity completes before the next scaling activity can start.
ADD RULE	Click to add additional scale out rules.
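
The "if", "is", and "for" fields together form one condition: the chosen metric must satisfy the threshold in every one of the last N five-minute periods. The sketch below illustrates that evaluation in Python. It is not how Gathr or AWS implements autoscaling; the function `rule_triggered` is a hypothetical name and the sample datapoints are made up.

```python
import operator
from typing import Callable, Dict, Sequence

# Conditions that the "is" field is assumed to offer.
COMPARATORS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def rule_triggered(
    datapoints: Sequence[float],
    condition: str,
    threshold: float,
    periods: int,
) -> bool:
    """Return True if the condition held in each of the last `periods` periods.

    `datapoints` holds one metric value per five-minute period, oldest first.
    """
    if periods < 1 or len(datapoints) < periods:
        return False
    compare = COMPARATORS[condition]
    return all(compare(value, threshold) for value in datapoints[-periods:])


# Made-up samples of a metric such as YARNMemoryAvailablePercentage.
samples = [40.0, 14.0, 12.0, 9.0]
print(rule_triggered(samples, "<", 15.0, periods=3))
print(rule_triggered(samples, "<", 15.0, periods=4))
```

Requiring every period in the window to satisfy the condition keeps a single short spike from triggering a scaling activity.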

Scale In Rules


Rule Name	A name for the scale in rule should be provided.
Terminate	Provide the number of EC2 instances to be terminated each time the autoscaling rule is triggered.
if	Choose the AWS CloudWatch metric that should be used to trigger autoscaling.
is	Enter the threshold value and condition for the CloudWatch metric selected above.
for	Enter the number of consecutive five-minute periods over which the metric data will be compared to the threshold. Autoscaling will be triggered if the condition is met for each consecutive period.
Cooldown period	Provide the time that must elapse after a scaling activity completes before the next scaling activity can start.
ADD RULE	Click to add additional scale in rules.
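
The cooldown period can be pictured as a gate in front of every rule: while the gate is closed, a triggered rule does not start a new scaling activity. The Python sketch below shows the idea under the assumption that the cooldown is expressed in seconds. The function `in_cooldown` is a hypothetical name, not a Gathr or AWS call.

```python
from datetime import datetime, timedelta
from typing import Optional


def in_cooldown(
    last_activity_completed: Optional[datetime],
    cooldown_seconds: int,
    now: datetime,
) -> bool:
    """Return True while the cooldown after the last scaling activity is running."""
    if last_activity_completed is None:
        # No scaling activity has completed yet, so nothing to wait for.
        return False
    return now < last_activity_completed + timedelta(seconds=cooldown_seconds)


completed = datetime(2024, 1, 1, 12, 0, 0)
print(in_cooldown(completed, 300, datetime(2024, 1, 1, 12, 3, 0)))
print(in_cooldown(completed, 300, datetime(2024, 1, 1, 12, 5, 0)))
```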

Task Nodes


Instance Type	Option to select the instance type for the task node, for example, 30.5 GB Memory, 4 vCores, EBS only.
Instance Count	Option to provide instance count for the task node.
Volume Type	Option to provide volume type for the task node.
EBS Volume	Option to provide the EBS volume size. The size should either be 0 GiB or between 15 and 100 GiB.
IOPS	Option to provide IOPS for the task node.
Volumes per Instance	Option to provide the number of EBS volumes for the task node.
Node Type	Select the EC2 instance market according to the pricing model: On Demand or Spot. For Spot, provide the Spot Bid Price, i.e., the bid price for the spot instance.
Enable Autoscaling	Select the checkbox to enable the autoscaling option.
Minimum Nodes	Provide the minimum number of nodes for autoscaling. Provide values for Scale Out Rules and Scale In Rules. You can also add further Rule(s).

Scale Out Rules


Add	Provide the number of EC2 instances to be added each time the autoscaling rule is triggered.
if	Choose the AWS CloudWatch metric that should be used to trigger autoscaling.
is	Enter the threshold value and condition for the CloudWatch metric selected above.
for	Enter the number of consecutive five-minute periods over which the metric data will be compared to the threshold. Autoscaling will be triggered if the condition is met for each consecutive period.
Cooldown period	Provide the time that must elapse after a scaling activity completes before the next scaling activity can start.
ADD RULE	Click to add additional scale out rules.

Scale In Rules


Rule Name	A name for the scale in rule should be provided.
Terminate	Provide the number of EC2 instances to be terminated each time the autoscaling rule is triggered.
if	Choose the AWS CloudWatch metric that should be used to trigger autoscaling.
is	Enter the threshold value and condition for the CloudWatch metric selected above.
for	Enter the number of consecutive five-minute periods over which the metric data will be compared to the threshold. Autoscaling will be triggered if the condition is met for each consecutive period.
Cooldown period	Provide the time that must elapse after a scaling activity completes before the next scaling activity can start.
ADD RULE	Click to add additional scale in rules.

SSH


EC2 Key Pair name	Select the EC2 key pair (pem file) used to SSH into the cluster.

Bootstrap


S3 Path	Option to provide the S3 path where the bootstrap script is located.
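
A mistyped S3 path is easier to catch before the cluster is launched. The Python sketch below splits a path into bucket and key and rejects anything that is not an `s3://` URI pointing at an object. The function `split_s3_path` and the bucket and script names are hypothetical, and it is an assumption that the field expects the `s3://bucket/key` form.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_path(path: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key); raise ValueError otherwise."""
    parsed = urlparse(path)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"Not a valid S3 object path: {path!r}")
    return parsed.netloc, key


# Hypothetical bucket and script names, for illustration only.
bucket, key = split_s3_path("s3://example-bucket/bootstrap/install-libs.sh")
print(bucket)
print(key)
```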
