Create Cluster
Click CREATE CLUSTER to create a new cluster, then provide values for the fields below.
Field | Description |
---|---|
Cluster Name | Provide a unique name to identify the EMR cluster configuration. |
VPC | Select the VPC in which the cluster will be launched. Gathr must be accessible from this VPC. |
Subnet ID | Select the subnet in which the cluster will be launched. Gathr must be accessible from this subnet. |
Security Group | Select a security group for the cluster that allows the required communication with Gathr. |
Security Configuration | Select a security configuration from the drop-down list of available options, or select None if a security configuration is not required. A security configuration lets you set up data encryption, Kerberos, and S3 authorization. |
Service Role | Select the IAM role that Amazon EMR uses to manage the EC2 instances of the EMR pipeline cluster. |
Job Flow Role | Select the IAM role to attach to the EC2 instances in the EMR pipeline cluster. |
Auto Scaling Role | Select the IAM role used to auto scale the EMR cluster. |
Custom AMI Id | Select or provide the ID of a custom Amazon Linux AMI for the cluster. |
Root EBS Volume (GB) | Provide the root EBS volume size, in GB, for the master node. Core and task nodes use the same root EBS volume size as the master node. |
EMR Managed Scaling | When this option is checked, EMR automatically adjusts the number of EC2 instances in the core and task nodes based on workload. This option is unchecked by default. |
When the EMR Managed Scaling option is checked, provide values for the fields below:
Minimum Units | Provide the minimum number of core or task units allowed in a cluster. Minimum value is 1. |
Maximum Units | Provide the maximum number of core or task units allowed in a cluster. Minimum value is 1. |
Maximum On-Demand Limit | Provide the maximum number of core or task units allowed for the On-Demand market type in a cluster. If this parameter is not specified, it defaults to the Maximum Units value. Minimum value is 0. |
Maximum Core Units | Provide the maximum number of core nodes allowed in a cluster. If this parameter is not specified, it defaults to the Maximum Units value. Minimum value is 1. |
Auto Termination | This option is unchecked by default. Check this option to terminate the cluster automatically. Once the cluster becomes idle, it terminates after the specified duration. The duration must be between one minute and 24 hours. |
Steps Concurrency | This option is unchecked by default. Check this option to enable running multiple steps concurrently. Once the last step completes, the cluster will enter a waiting state. |
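The defaults and lower bounds of the managed scaling fields above can be sketched as a small check. This is an illustrative sketch only: the function name `build_compute_limits` is hypothetical, the dictionary keys follow the naming of the EMR API's managed scaling compute limits, and how Gathr maps its form fields to those keys is an assumption. The rule that Minimum Units cannot exceed Maximum Units is also an assumption that the table does not state.

```python
from typing import Optional


def build_compute_limits(
    minimum_units: int,
    maximum_units: int,
    maximum_on_demand: Optional[int] = None,
    maximum_core: Optional[int] = None,
) -> dict:
    """Apply the defaults and lower bounds described in the table."""
    # Unspecified limits default to the Maximum Units value.
    if maximum_on_demand is None:
        maximum_on_demand = maximum_units
    if maximum_core is None:
        maximum_core = maximum_units

    # Lower bounds taken from the field descriptions.
    if minimum_units < 1:
        raise ValueError("Minimum Units must be at least 1")
    if maximum_units < 1:
        raise ValueError("Maximum Units must be at least 1")
    if maximum_on_demand < 0:
        raise ValueError("Maximum On-Demand Limit must be at least 0")
    if maximum_core < 1:
        raise ValueError("Maximum Core Units must be at least 1")

    # Assumption (not stated in the table): minimum cannot exceed maximum.
    if minimum_units > maximum_units:
        raise ValueError("Minimum Units cannot exceed Maximum Units")

    # Key names mirror the EMR API; the mapping from the form is assumed.
    return {
        "MinimumCapacityUnits": minimum_units,
        "MaximumCapacityUnits": maximum_units,
        "MaximumOnDemandCapacityUnits": maximum_on_demand,
        "MaximumCoreCapacityUnits": maximum_core,
    }


# Only the minimum and maximum are given; the other limits fall back to 10.
print(build_compute_limits(2, 10))
```

Setting Maximum On-Demand Limit to 0 is accepted by this check, since the table gives 0 as its minimum value.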
Click FETCH CLUSTER FROM AWS to fetch an existing cluster by selecting its cluster ID from the drop-down list.
The Create Cluster window also provides the following options: Software Configuration, Tags, Master Nodes, Core Nodes, Task Nodes, SSH, and Bootstrap Actions. These are explained below.
Software Configuration
Release | Select the EMR release version, for example, emr-6.10.0. Select the checkboxes against the applications to be configured. Hadoop 3.3.3, JupyterHub 1.5.0, Ganglia 3.7.2, Hive 3.1.3, JupyterEnterpriseGateway 2.6.0, and Spark 3.3.1 are among the options available under this tab and can be selected as per business requirements. |
Enter Configuration | Provide any additional YARN properties to configure for the cluster. |
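As an illustration of additional YARN properties, the sketch below builds a configuration in the classification format that Amazon EMR uses for application settings. Whether the Enter Configuration field expects exactly this JSON is an assumption, and the property values shown are hypothetical examples rather than recommendations.

```python
import json

# Assumption: the field accepts EMR-style configuration JSON, i.e. a list of
# objects that each have a "Classification" and a "Properties" map.
yarn_configuration = [
    {
        "Classification": "yarn-site",
        "Properties": {
            # Hypothetical example values; choose what suits your workload.
            "yarn.nodemanager.vmem-check-enabled": "false",
            "yarn.log-aggregation-enable": "true",
        },
    }
]

# Serialize to JSON text that could be pasted into a configuration field.
configuration_json = json.dumps(yarn_configuration, indent=2)
print(configuration_json)
```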
Tags
Add Tag | Add custom tags to the EMR cluster. Provide the value and Action(s) for each tag. |
Master Nodes
Instance Type | Select the instance type for the master node (for example, 30.5 GB memory, 4 vCores, EBS only). |
Instance Count | Provide the instance count for the master node. |
Volume Type | Provide the EBS volume type for the master node. |
EBS Volume | Provide the EBS volume size. The size must be either 0 GiB or between 15 and 100 GiB. |
IOPS | Provide the IOPS for the master node. |
Volumes per Instance | Provide the number of EBS volumes per instance for the master node. |
Node Type | Select the EC2 purchasing option according to the pricing model: On Demand or Spot. For Spot, provide the Spot Bid Price, i.e., the bid price for the Spot instance. |
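The EBS Volume constraint above (0 GiB, or 15 to 100 GiB) can be expressed as a one-line check. This is a minimal sketch; the function name `is_valid_ebs_size` is hypothetical, and treating both 15 and 100 as allowed values is an assumption about how the range is enforced.

```python
def is_valid_ebs_size(size_gib: int) -> bool:
    """Return True if the size is 0 GiB or within 15-100 GiB (inclusive)."""
    # Assumption: the range bounds 15 and 100 are themselves valid sizes.
    return size_gib == 0 or 15 <= size_gib <= 100


for size in (0, 10, 15, 100, 120):
    print(size, is_valid_ebs_size(size))
```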
Core Nodes
Instance Type | Select the instance type for the core node (for example, 30.5 GB memory, 4 vCores, EBS only). |
Instance Count | Provide the instance count for the core node. |
Volume Type | Provide the EBS volume type for the core node. |
EBS Volume | Provide the EBS volume size. The size must be either 0 GiB or between 15 and 100 GiB. |
IOPS | Provide the IOPS for the core node. |
Volumes per Instance | Provide the number of EBS volumes per instance for the core node. |
Node Type | Select the EC2 purchasing option according to the pricing model: On Demand or Spot. For Spot, provide the Spot Bid Price, i.e., the bid price for the Spot instance. |
Enable Autoscaling | Select the checkbox to enable autoscaling. |
Minimum Nodes | Provide the minimum number of nodes for autoscaling, then provide values for the Scale Out Rules and Scale In Rules. You can also add further rules. |
Scale Out Rules
Add | Provide the number of EC2 instances to be added each time the autoscaling rule is triggered. |
if | Choose the AWS CloudWatch metric that should be used to trigger autoscaling. |
is | Enter the threshold value and condition for the CloudWatch metric selected above. |
for | Enter the number of consecutive five-minute periods over which the metric data will be compared to the threshold. Autoscaling will be triggered if the condition is met for each consecutive period. |
Cooldown period | Specify the cooldown time, i.e., the time that must elapse after a scaling activity completes before the next scaling activity can start. |
ADD RULE | Click to add additional scale out rules. |
Scale In Rules
Rule Name | Provide a name for the scale in rule. |
Terminate | Provide the number of EC2 instances to be terminated each time the autoscaling rule is triggered. |
if | Choose the AWS CloudWatch metric that should be used to trigger autoscaling. |
is | Enter the threshold value and condition for the CloudWatch metric selected above. |
for | Enter the number of consecutive five-minute periods over which the metric data will be compared to the threshold. Autoscaling will be triggered if the condition is met for each consecutive period. |
Cooldown period | Specify the cooldown time, i.e., the time that must elapse after a scaling activity completes before the next scaling activity can start. |
ADD RULE | Click to add additional scale in rules. |
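The way the if, is, for, and Cooldown period fields of a scaling rule combine can be sketched as follows. This is a simplified model for illustration, not the actual evaluation logic used by EMR or Gathr; the function names `rule_triggers` and `in_cooldown`, the list of datapoints, and the sample values are all hypothetical.

```python
import operator
from typing import Sequence

# Conditions that can be paired with a threshold in the "is" field (assumed set).
COMPARISONS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def rule_triggers(
    datapoints: Sequence[float], condition: str, threshold: float, periods: int
) -> bool:
    """True if the condition held for each of the last `periods` datapoints.

    Each datapoint stands for one five-minute period of the chosen metric.
    """
    if periods < 1:
        raise ValueError("periods must be at least 1")
    if len(datapoints) < periods:
        return False  # Not enough history to cover every consecutive period.
    compare = COMPARISONS[condition]
    return all(compare(value, threshold) for value in datapoints[-periods:])


def in_cooldown(seconds_since_last_activity: float, cooldown_seconds: float) -> bool:
    """True while the cooldown after the previous scaling activity is running."""
    return seconds_since_last_activity < cooldown_seconds


# Hypothetical metric values: scale out if the metric is below 15 for 3 periods.
recent_values = [22.0, 14.0, 12.5, 9.0]
if rule_triggers(recent_values, "<", 15, 3) and not in_cooldown(400, 300):
    print("scale out rule would fire")
```

In this model, a single period that fails the condition resets the rule, which matches the requirement that the condition be met for each consecutive period.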
Task Nodes
Instance Type | Select the instance type for the task node (for example, 30.5 GB memory, 4 vCores, EBS only). |
Instance Count | Provide the instance count for the task node. |
Volume Type | Provide the EBS volume type for the task node. |
EBS Volume | Provide the EBS volume size. The size must be either 0 GiB or between 15 and 100 GiB. |
IOPS | Provide the IOPS for the task node. |
Volumes per Instance | Provide the number of EBS volumes per instance for the task node. |
Node Type | Select the EC2 purchasing option according to the pricing model: On Demand or Spot. For Spot, provide the Spot Bid Price, i.e., the bid price for the Spot instance. |
Enable Autoscaling | Select the checkbox to enable autoscaling. |
Minimum Nodes | Provide the minimum number of nodes for autoscaling, then provide values for the Scale Out Rules and Scale In Rules. You can also add further rules. |
Scale Out Rules
Add | Provide the number of EC2 instances to be added each time the autoscaling rule is triggered. |
if | Choose the AWS CloudWatch metric that should be used to trigger autoscaling. |
is | Enter the threshold value and condition for the CloudWatch metric selected above. |
for | Enter the number of consecutive five-minute periods over which the metric data will be compared to the threshold. Autoscaling will be triggered if the condition is met for each consecutive period. |
Cooldown period | Specify the cooldown time, i.e., the time that must elapse after a scaling activity completes before the next scaling activity can start. |
ADD RULE | Click to add additional scale out rules. |
Scale In Rules
Rule Name | Provide a name for the scale in rule. |
Terminate | Provide the number of EC2 instances to be terminated each time the autoscaling rule is triggered. |
if | Choose the AWS CloudWatch metric that should be used to trigger autoscaling. |
is | Enter the threshold value and condition for the CloudWatch metric selected above. |
for | Enter the number of consecutive five-minute periods over which the metric data will be compared to the threshold. Autoscaling will be triggered if the condition is met for each consecutive period. |
Cooldown period | Specify the cooldown time, i.e., the time that must elapse after a scaling activity completes before the next scaling activity can start. |
ADD RULE | Click to add additional scale in rules. |
SSH
EC2 Key Pair name | Select the EC2 key pair (PEM file) used to SSH into the cluster. |
Bootstrap
S3 Path | Provide the S3 path where the bootstrap scripts are located. |
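Before entering a bootstrap script location, it can help to confirm that the path is a well-formed S3 URI. The sketch below assumes the field expects the `s3://bucket/key` form, which the table does not confirm; the function name `split_s3_path`, the bucket, and the script name are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_path(path: str) -> Tuple[str, str]:
    """Split an s3://bucket/key path into (bucket, key), or raise ValueError."""
    parsed = urlparse(path)
    key = parsed.path.lstrip("/")
    # Require the s3 scheme, a bucket name, and a non-empty object key.
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"Not a valid S3 path: {path!r}")
    return parsed.netloc, key


# Hypothetical bucket and script name, for illustration only.
bucket, key = split_s3_path("s3://my-bucket/bootstrap/install.sh")
print(bucket, key)
```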