AWS Compute Setup in Gathr

This topic explains how to use AWS PrivateLink to run Gathr applications in the AWS region of your choice.

The compute setup feature allows you to associate your cloud accounts with Gathr.

By doing so, you can run the Gathr applications on clusters that belong to the cloud accounts that you register.

You can link your AWS accounts with Gathr and use a VPC endpoint service to securely manage data traffic while running Gathr applications.

Compute_Diagram

You can add AWS accounts to Gathr and set up compute environments in a few simple steps. With AWS multi-region support, once an account is registered you can bring your own compute environments into Gathr to run Data Ingestion, CDC, and Advanced ETL applications.


Prerequisites

The following prerequisites must be met before using EMR clusters as registered compute environments in Gathr:

  • User must have a Gathr account and at least one AWS account.
  • User must have permissions to create VPC instances, new policies, and roles in the AWS console.
  • The following roles must be created in the user’s AWS account:
    • EMR_DefaultRole
    • EMR_EC2_DefaultRole
    • EMR_AutoScaling_DefaultRole
    • AWSServiceRoleForEMRCleanup
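
The role prerequisite can also be checked from a script before registering the account. The following is a minimal sketch, assuming a boto3 IAM client (its `get_role` call and `NoSuchEntityException`); the helper name `find_missing_roles` is hypothetical and is not part of Gathr.

```python
# Role names listed in the prerequisites above.
REQUIRED_ROLES = [
    "EMR_DefaultRole",
    "EMR_EC2_DefaultRole",
    "EMR_AutoScaling_DefaultRole",
    "AWSServiceRoleForEMRCleanup",
]


def find_missing_roles(iam_client, required=REQUIRED_ROLES):
    """Return the required role names that do not exist in the account.

    `iam_client` is assumed to behave like boto3.client("iam").
    """
    missing = []
    for name in required:
        try:
            iam_client.get_role(RoleName=name)
        except iam_client.exceptions.NoSuchEntityException:
            # IAM reports a missing role with this exception.
            missing.append(name)
    return missing


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   print(find_missing_roles(boto3.client("iam")))
```

An empty list means all four roles are present; any names returned still need to be created using the steps in the sections that follow.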

Create EMR Roles

Before launching an EMR cluster in the consumer’s AWS account, specific EMR service roles must be created in that account.

When launching EMR clusters from Gathr, these roles must be provided as configuration options; they are visible only if they exist in the consumer’s AWS account.

These roles are used only by the EMR service in your AWS account to create and terminate the clusters and access other AWS services as per the requirements of the pipeline executing in the EMR clusters.

Steps to create the required EMR roles

Sign in to the AWS Management Console and open the IAM console.

Follow the steps below to create each role.


EMR_DefaultRole

Steps to create the EMR_DefaultRole

  1. In the navigation pane of the console, choose Roles, and then Create role.

    Create_EMR_Roles_1

  2. Select the AWS services radio button.

    Select_EMR_Option

  3. Go to Use cases for other AWS services.

  4. Search for EMR and select the EMR option.

  5. Click Next.

  6. On the Add permissions page, click Next.

    Add_Permissions

  7. Enter the Role name as EMR_DefaultRole and click Create role.

    Create_EMR_Role1


EMR_EC2_DefaultRole

Steps to create the EMR_EC2_DefaultRole

  1. In the navigation pane of the console, choose Roles, and then Create role.

    Create_EMR_Roles_1

  2. Select the AWS services radio button.

    EMR_Role_For_EC2

  3. Go to Use cases for other AWS services.

  4. Search for EMR and select the EMR Role for EC2 option.

  5. Click Next.

  6. On the Add permissions page, click Next.

    Add_Permissions_EC2

  7. Enter the Role name as EMR_EC2_DefaultRole and click Create role.

    Create_EMR_EC2_Default_Role


EMR_AutoScaling_DefaultRole

Steps to create the EMR_AutoScaling_DefaultRole

  1. In the navigation pane of the console, choose Roles, and then Create role.

    Create_EMR_Roles_1

  2. Select the AWS services radio button.

  3. Select the EC2 option and click Next.

    Create_EMR_Roles_3

  4. On the Add permissions page, search for and add the policy AmazonElasticMapReduceforAutoScalingRole, then click Next.

    Add_Permissions_role_3

  5. Enter the Role name as EMR_AutoScaling_DefaultRole and click Create role.

    Click_Create_EMR_Role_3

Modify the trust relationship for the role as follows:

  1. In the Roles dashboard, search for the role by its name, EMR_AutoScaling_DefaultRole.

    Update_Trust_Policy_Role3

  2. Open this role and then click on Edit trust policy.

    Edit_Trust_Policy_Role3

  3. Replace the existing JSON with the following trust policy.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": [
                        "application-autoscaling.amazonaws.com",
                        "elasticmapreduce.amazonaws.com"
                    ]
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }
    
  4. Click on Update policy.

    Click_Update_Policy_Role3
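
The trust policy update above can also be applied from a script. This is a minimal sketch rather than a Gathr-provided tool: it assumes a boto3 IAM client and its `update_assume_role_policy` call, and the helper name `update_trust_policy` is hypothetical.

```python
import json

ROLE_NAME = "EMR_AutoScaling_DefaultRole"

# Same trust policy as in step 3 above.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "application-autoscaling.amazonaws.com",
                    "elasticmapreduce.amazonaws.com",
                ]
            },
            "Action": "sts:AssumeRole",
        }
    ],
}


def update_trust_policy(iam_client, role_name=ROLE_NAME, policy=TRUST_POLICY):
    """Replace the role's trust policy and return the JSON string that was sent.

    `iam_client` is assumed to behave like boto3.client("iam").
    """
    document = json.dumps(policy)
    iam_client.update_assume_role_policy(
        RoleName=role_name, PolicyDocument=document
    )
    return document


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   update_trust_policy(boto3.client("iam"))
```

Building the policy as a dictionary and serializing it with `json.dumps` avoids the quoting mistakes that can creep in when the JSON is pasted by hand.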


AWSServiceRoleForEMRCleanup

Steps to create the AWSServiceRoleForEMRCleanup

  1. In the navigation pane of the console, choose Roles, and then Create role.

    Create_EMR_Roles_1

  2. Select the AWS services radio button.

  3. Go to Use cases for other AWS services.

  4. Search for EMR and select the EMR-Cleanup option.

    EMR_Cleanup_Role

  5. Click Next.

  6. On the Add permissions page, click Next.

    Add_Permissions_EMR_Cleanup_Role

  7. A default role name is already filled in; click Create role.

    Click_Create_EMR_Cleanup_Role
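
The cleanup role is a service-linked role, so it can also be created through the IAM API instead of the console. The sketch below assumes a boto3 IAM client and assumes that the service name `elasticmapreduce.amazonaws.com` is the one that yields AWSServiceRoleForEMRCleanup; verify this against the AWS documentation for your account. The helper name `ensure_emr_cleanup_role` is hypothetical.

```python
CLEANUP_ROLE = "AWSServiceRoleForEMRCleanup"
# Assumption: this service name maps to the EMR cleanup service-linked role.
EMR_SERVICE_NAME = "elasticmapreduce.amazonaws.com"


def ensure_emr_cleanup_role(iam_client):
    """Create the EMR cleanup service-linked role if it is missing.

    Returns True when the role was created, False when it already existed.
    `iam_client` is assumed to behave like boto3.client("iam").
    """
    try:
        iam_client.get_role(RoleName=CLEANUP_ROLE)
        return False
    except iam_client.exceptions.NoSuchEntityException:
        iam_client.create_service_linked_role(AWSServiceName=EMR_SERVICE_NAME)
        return True


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   ensure_emr_cleanup_role(boto3.client("iam"))
```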


If you do not have a security group for Gathr, create one by following the steps given here.

Steps to Update Security Group

Update the AWS PrivateLink endpoint security group in the AWS console as follows:

  1. Navigate to VPC > Security groups.

    Security_Groups

  2. Search for and open the preferred security group for Gathr.

  3. Click the option to Edit Inbound rules for the security group.

  4. Select the rule type as Custom TCP and add the following rules:

    • Custom TCP: 12348
    • Custom TCP: 31218
    • Custom TCP: 32900
    • Custom TCP: 36627

    All four rules are self-referencing: for each rule, enter the ID of this same security group as the source.

    Edit_Inbound_rules

  5. Click on Save rules.
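
The four inbound rules above can also be expressed as an EC2 API request. The following is a minimal sketch, assuming a boto3 EC2 client and its `authorize_security_group_ingress` call; the helper names and the example group ID are hypothetical.

```python
# Ports from the inbound rules listed above.
GATHR_PORTS = [12348, 31218, 32900, 36627]


def build_self_referencing_rules(security_group_id, ports=GATHR_PORTS):
    """Build one TCP rule per port whose source is the security group itself."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            # Self-reference: the source is the same security group.
            "UserIdGroupPairs": [{"GroupId": security_group_id}],
        }
        for port in ports
    ]


def add_inbound_rules(ec2_client, security_group_id):
    """Add the rules to the group; `ec2_client` is assumed to be boto3.client("ec2")."""
    rules = build_self_referencing_rules(security_group_id)
    ec2_client.authorize_security_group_ingress(
        GroupId=security_group_id, IpPermissions=rules
    )
    return rules


# Usage (assumes boto3 is installed and AWS credentials are configured;
# "sg-0123456789abcdef0" is a placeholder ID):
#   import boto3
#   add_inbound_rules(boto3.client("ec2"), "sg-0123456789abcdef0")
```

Keeping the rule construction separate from the API call makes it possible to inspect the generated rules before anything is changed in the account.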