Sunday, December 3, 2023

Introducing shared VPC support on Amazon MWAA


In this post, we demonstrate automating deployment of Amazon Managed Workflows for Apache Airflow (Amazon MWAA) using customer-managed endpoints in a VPC, providing compatibility with shared, or otherwise restricted, VPCs.

Data scientists and engineers have made Apache Airflow a leading open source tool to create data pipelines due to its active open source community, familiar Python development as Directed Acyclic Graph (DAG) workflows, and extensive library of pre-built integrations. Amazon MWAA is a managed service for Airflow that makes it easy to run Airflow on AWS without the operational burden of having to manage the underlying infrastructure. For each Airflow environment, Amazon MWAA creates a single-tenant service VPC, which hosts the metadatabase that stores states and the web server that provides the user interface. Amazon MWAA further manages Airflow scheduler and worker instances in a customer-owned and managed VPC, in order to schedule and run tasks that interact with customer resources. Those Airflow containers in the customer VPC access resources in the service VPC via a VPC endpoint.

Many organizations choose to centrally manage their VPC using AWS Organizations, allowing a VPC in an owner account to be shared with resources in a different participant account. However, because creating a new route outside of a VPC is considered a privileged operation, participant accounts can’t create endpoints in owner VPCs. Furthermore, many customers don’t want to extend the security privileges required to create VPC endpoints to all users provisioning Amazon MWAA environments. In addition to VPC endpoints, customers also wish to restrict data egress via Amazon Simple Queue Service (Amazon SQS) queues, and Amazon SQS access is a requirement in the Amazon MWAA architecture.

Shared VPC support for Amazon MWAA adds the ability for you to manage your own endpoints within your VPCs, adding compatibility with shared and otherwise restricted VPCs. Specifying customer-managed endpoints also provides the ability to meet strict security policies by explicitly restricting VPC resource access to just those needed by your Amazon MWAA environments. This post demonstrates how customer-managed endpoints work with Amazon MWAA and provides examples of how to automate the provisioning of those endpoints.

Solution overview

Shared VPC support for Amazon MWAA allows multiple AWS accounts to create their Airflow environments in shared, centrally managed VPCs. The account that owns the VPC (owner) shares the two private subnets required by Amazon MWAA with other accounts (participants) that belong to the same organization from AWS Organizations. After the subnets are shared, the participants can view, create, modify, and delete Amazon MWAA environments in the subnets shared with them.

When users specify the need for a shared, or otherwise policy-restricted, VPC during environment creation, Amazon MWAA will first create the service VPC resources, then enter a pending state for up to 72 hours, with an Amazon EventBridge notification of the change in state. This allows owners to create the required endpoints on behalf of participants based on endpoint service information from the Amazon MWAA console or API, or programmatically via an AWS Lambda function and EventBridge rule, as in the example in this post.

After those endpoints are created on the owner account, the endpoint service in the single-tenant Amazon MWAA VPC will detect the endpoint connection event and resume environment creation. Should there be an issue, you can cancel environment creation by deleting the environment during this pending state.

This feature also allows you to remove the create, modify, and delete VPC endpoint (VPCE) privileges from the AWS Identity and Access Management (IAM) principal creating Amazon MWAA environments, even when not using a shared VPC, because that permission will instead be imposed on the IAM principal creating the endpoint (the Lambda function in our example). Furthermore, the Amazon MWAA environment will provide the SQS queue Amazon Resource Name (ARN) used by the Airflow Celery Executor to queue tasks (the Celery Executor Queue), allowing you to explicitly enter those resources into your network policy rather than having to provide a more open and generalized permission.

In this example, we create the VPC and Amazon MWAA environment in the same account. For shared VPCs across accounts, the EventBridge rule and Lambda function would exist in the owner account, and the Amazon MWAA environment would be created in the participant account. See Sending and receiving Amazon EventBridge events between AWS accounts for more information.

Prerequisites

You should have the following prerequisites:

  • An AWS account
  • An AWS user in that account, with permissions to create VPCs, VPC endpoints, and Amazon MWAA environments
  • An Amazon Simple Storage Service (Amazon S3) bucket in that account, with a folder called dags

Create the VPC

We begin by creating a restrictive VPC using an AWS CloudFormation template, in order to simulate creating the necessary VPC endpoint and modifying the SQS endpoint policy. If you want to use an existing VPC, you can proceed to the next section.

  1. On the AWS CloudFormation console, choose Create stack and choose With new resources (standard).
  2. Under Specify template, choose Upload a template file.
  3. Now we edit our CloudFormation template to restrict access to Amazon SQS. In cfn-vpc-private-bjs.yml, edit the SqsVpcEndoint section to appear as follows:
   SqsVpcEndoint:
     Type: AWS::EC2::VPCEndpoint
     Properties:
       ServiceName: !Sub "com.amazonaws.${AWS::Region}.sqs"
       VpcEndpointType: Interface
       VpcId: !Ref VPC
       PrivateDnsEnabled: true
       SubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
       SecurityGroupIds:
        - !Ref SecurityGroup
       PolicyDocument:
        Statement:
         - Effect: Allow
           Principal: '*'
           Action: '*'
           Resource: []

This additional policy document entry prevents Amazon SQS egress to any resource not explicitly listed.

Now we can create our CloudFormation stack.
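The effect of the empty resource list can be sketched in plain Python. The helper below is a hypothetical illustration, not part of the CloudFormation template: it takes a policy document shaped like the one above and returns a copy in which one queue ARN has been added to the first allow statement. The queue ARN shown is made up for the example.

```python
import copy
import json


def allow_queue(policy: dict, queue_arn: str) -> dict:
    """Return a copy of `policy` whose first Allow statement lists `queue_arn`."""
    updated = copy.deepcopy(policy)
    for statement in updated.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        resources = statement.get("Resource", [])
        # Policy documents may hold a single string instead of a list.
        if isinstance(resources, str):
            resources = [resources]
        # Leave the statement alone if it is already open or already lists the queue.
        if "*" not in resources and queue_arn not in resources:
            resources = resources + [queue_arn]
        statement["Resource"] = resources
        break
    return updated


# Hypothetical queue ARN, for illustration only.
QUEUE_ARN = "arn:aws:sqs:us-east-1:112233445566:airflow-celery-example"

# Same shape as the PolicyDocument in the template: nothing is reachable yet.
restrictive = {
    "Statement": [
        {"Effect": "Allow", "Principal": "*", "Action": "*", "Resource": []}
    ]
}

print(json.dumps(allow_queue(restrictive, QUEUE_ARN), indent=2))
```

Starting from an empty list means every queue that the endpoint may reach has to be added deliberately, which is the behavior the Lambda function later in this post relies on.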

  1. On the AWS CloudFormation console, choose Create stack.
  2. Select Upload a template file.
  3. Choose Choose file.
  4. Browse to the file you modified.
  5. Choose Next.
  6. For Stack name, enter MWAA-Environment-VPC.
  7. Choose Next until you reach the review page.
  8. Choose Submit.

Create the Lambda function

We have two options for self-managing our endpoints: manual and automated. In this example, we create a Lambda function that responds to the Amazon MWAA EventBridge notification. You could also use the EventBridge notification to send an Amazon Simple Notification Service (Amazon SNS) message, such as an email, to someone with permission to create the VPC endpoint manually.

First, we create a Lambda function to respond to the EventBridge event that Amazon MWAA will emit.

  1. On the Lambda console, choose Create function.
  2. For Name, enter mwaa-create-lambda.
  3. For Runtime, choose Python 3.11.
  4. Choose Create function.
  5. For Code, in the Code source section, for lambda_function, enter the following code:
    import boto3
    import json
    import logging

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    def lambda_handler(event, context):
        # Only act while the environment is waiting for customer-managed endpoints
        if event['detail']['status']=="PENDING":
            detail=event['detail']
            name=detail['name']
            celeryExecutorQueue=detail['celeryExecutorQueue']
            subnetIds=detail['networkConfiguration']['subnetIds']
            securityGroupIds=detail['networkConfiguration']['securityGroupIds']
            databaseVpcEndpointService=detail['databaseVpcEndpointService']

            # MWAA does not need to store the VPC ID, but we can get it from the subnets
            client = boto3.client('ec2')
            response = client.describe_subnets(SubnetIds=subnetIds)
            vpcId=response['Subnets'][0]['VpcId']
            logger.info("vpcId: " + vpcId)

            # The web server endpoint service is only present for private web server access
            webserverVpcEndpointService=None
            if detail['webserverAccessMode']=="PRIVATE_ONLY":
                webserverVpcEndpointService=detail['webserverVpcEndpointService']

            # Look for an existing SQS endpoint in this VPC
            response = client.describe_vpc_endpoints(
                VpcEndpointIds=[],
                Filters=[
                    {"Name": "vpc-id", "Values": [vpcId]},
                    {"Name": "service-name", "Values": ["*.sqs"]},
                    ],
                MaxResults=1000
            )
            sqsVpcEndpoint=None
            for r in response['VpcEndpoints']:
                # Match an endpoint attached to any of the environment's subnets
                if any(subnetId in r['SubnetIds'] for subnetId in subnetIds):
                    # We are filtering describe by service name, so this must be SQS
                    sqsVpcEndpoint=r
                    break

            if sqsVpcEndpoint:
                logger.info("Found SQS endpoint: " + sqsVpcEndpoint['VpcEndpointId'])

                logger.info(sqsVpcEndpoint)
                pd = json.loads(sqsVpcEndpoint['PolicyDocument'])
                for s in pd['Statement']:
                    if s['Effect']=='Allow':
                        resource = s['Resource']
                        logger.info(resource)
                        if '*' in resource:
                            logger.info("'*' already allowed")
                        elif celeryExecutorQueue in resource:
                            logger.info("'"+celeryExecutorQueue+"' already allowed")
                        else:
                            # Add the Celery Executor Queue ARN to the endpoint policy
                            s['Resource'].append(celeryExecutorQueue)
                            logger.info("Updating SQS policy to " + str(pd))

                            client.modify_vpc_endpoint(
                                VpcEndpointId=sqsVpcEndpoint['VpcEndpointId'],
                                PolicyDocument=json.dumps(pd)
                                )
                        break

            # create MWAA database endpoint
            logger.info("creating endpoint to " + databaseVpcEndpointService)
            endpointName=name+"-database"
            response = client.create_vpc_endpoint(
                VpcEndpointType="Interface",
                VpcId=vpcId,
                ServiceName=databaseVpcEndpointService,
                SubnetIds=subnetIds,
                SecurityGroupIds=securityGroupIds,
                TagSpecifications=[
                    {
                        "ResourceType": "vpc-endpoint",
                        "Tags": [
                            {
                                "Key": "Name",
                                "Value": endpointName
                            },
                        ]
                    },
                ],
            )
            logger.info("created VPCE: " + response['VpcEndpoint']['VpcEndpointId'])

            # create MWAA web server endpoint (if private)
            if webserverVpcEndpointService:
                endpointName=name+"-webserver"
                logger.info("creating endpoint to " + webserverVpcEndpointService)
                response = client.create_vpc_endpoint(
                    VpcEndpointType="Interface",
                    VpcId=vpcId,
                    ServiceName=webserverVpcEndpointService,
                    SubnetIds=subnetIds,
                    SecurityGroupIds=securityGroupIds,
                    TagSpecifications=[
                        {
                            "ResourceType": "vpc-endpoint",
                            "Tags": [
                                {
                                    "Key": "Name",
                                    "Value": endpointName
                                },
                            ]
                        },
                    ],
                )
                logger.info("created VPCE: " + response['VpcEndpoint']['VpcEndpointId'])

        return {
            'statusCode': 200,
            'body': json.dumps(event['detail']['status'])
        }

  6. Choose Deploy.
  7. On the Configuration tab of the Lambda function, in the General configuration section, choose Edit.
  8. For Timeout, increase to 5 minutes, 0 seconds.
  9. Choose Save.
  10. In the Permissions section, under Execution role, choose the role name to edit the permissions of this function.
  11. For Permission policies, choose the link under Policy name.
  12. Choose Edit and add a comma and the following statement:
    {
        "Sid": "Statement1",
        "Effect": "Allow",
        "Action": [
            "ec2:DescribeVpcEndpoints",
            "ec2:CreateVpcEndpoint",
            "ec2:ModifyVpcEndpoint",
            "ec2:DescribeSubnets",
            "ec2:CreateTags"
        ],
        "Resource": [
            "*"
        ]
    }

The complete policy should look similar to the following:

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Effect": "Allow",
			"Action": "logs:CreateLogGroup",
			"Resource": "arn:aws:logs:us-east-1:112233445566:*"
		},
		{
			"Effect": "Allow",
			"Action": [
				"logs:CreateLogStream",
				"logs:PutLogEvents"
			],
			"Resource": [
				"arn:aws:logs:us-east-1:112233445566:log-group:/aws/lambda/mwaa-create-lambda:*"
			]
		},
		{
			"Sid": "Statement1",
			"Effect": "Allow",
			"Action": [
				"ec2:DescribeVpcEndpoints",
				"ec2:CreateVpcEndpoint",
				"ec2:ModifyVpcEndpoint",
				"ec2:DescribeSubnets",
				"ec2:CreateTags"
			],
			"Resource": [
				"*"
			]
		}
	]
}

  13. Choose Next until you reach the review page.
  14. Choose Save changes.
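Before wiring the function to EventBridge, it can help to check locally which endpoints it would create for a given event. The sketch below mirrors the naming and branching of the handler without calling AWS. The function name plan_endpoints and all values in the sample detail are hypothetical; the field names are the ones the handler reads.

```python
def plan_endpoints(detail: dict) -> list:
    """List (endpoint name, service name) pairs for a PENDING environment."""
    if detail.get("status") != "PENDING":
        return []
    plan = [(detail["name"] + "-database", detail["databaseVpcEndpointService"])]
    # The web server endpoint is only needed for private web server access.
    if detail.get("webserverAccessMode") == "PRIVATE_ONLY":
        plan.append(
            (detail["name"] + "-webserver", detail["webserverVpcEndpointService"])
        )
    return plan


# Hypothetical event detail; real service names are supplied by Amazon MWAA.
sample_detail = {
    "status": "PENDING",
    "name": "example-env",
    "webserverAccessMode": "PRIVATE_ONLY",
    "databaseVpcEndpointService": "com.amazonaws.vpce.us-east-1.vpce-svc-db-example",
    "webserverVpcEndpointService": "com.amazonaws.vpce.us-east-1.vpce-svc-web-example",
}

for endpoint_name, service_name in plan_endpoints(sample_detail):
    print(endpoint_name, "->", service_name)
```

With public web server access, only the database endpoint appears in the plan, which matches the single create_vpc_endpoint call the handler makes in that case.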

Create an EventBridge rule

Next, we configure EventBridge to send the Amazon MWAA notifications to our Lambda function.

  1. On the EventBridge console, choose Create rule.
  2. For Name, enter mwaa-create.
  3. Select Rule with an event pattern.
  4. Choose Next.
  5. For Creation method, choose Use pattern form.
  6. Choose Edit pattern.
  7. For Event pattern, enter the following:
    {
      "source": ["aws.airflow"],
      "detail-type": ["MWAA Environment Status Change"]
    }

  8. Choose Next.
  9. For Select a target, choose Lambda function.

You can also specify an SNS notification in order to receive a message when the environment state changes.

  10. For Function, choose mwaa-create-lambda.
  11. Choose Next until you reach the final section, then choose Create rule.
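The rule above forwards every Amazon MWAA status change, and the Lambda function ignores everything except the pending state. As a rough illustration of how such a pattern selects events, the sketch below implements a simplified matcher in Python (exact values and nested objects only, which is far less than EventBridge itself supports) and applies a narrower, hypothetical variant of the pattern that also filters on the status field read by the handler.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Simplified EventBridge-style match: exact values and nested objects only."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the corresponding event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # A list in the pattern matches if the event value equals any item.
            return False
    return True


# The rule from this post, narrowed to the PENDING status. The extra "detail"
# filter is an assumption for illustration, not part of the original rule.
pending_only = {
    "source": ["aws.airflow"],
    "detail-type": ["MWAA Environment Status Change"],
    "detail": {"status": ["PENDING"]},
}

# Hypothetical event, trimmed to the fields the pattern inspects.
event = {
    "source": "aws.airflow",
    "detail-type": "MWAA Environment Status Change",
    "detail": {"name": "example-env", "status": "PENDING"},
}

print(matches(pending_only, event))
```

Filtering on status in the rule would avoid invoking the function for other state changes, at the cost of no longer being able to reuse the same rule for general status notifications.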

Create an Amazon MWAA environment

Finally, we create an Amazon MWAA environment with customer-managed endpoints.

  1. On the Amazon MWAA console, choose Create environment.
  2. For Name, enter a unique name for your environment.
  3. For Airflow version, choose the latest Airflow version.
  4. For S3 bucket, choose Browse S3 and choose your S3 bucket, or enter the Amazon S3 URI.
  5. For DAGs folder, choose Browse S3 and choose the dags/ folder in your S3 bucket, or enter the Amazon S3 URI.
  6. Choose Next.
  7. For Virtual private cloud (VPC), choose the VPC you created earlier.
  8. For Web server access, choose Public network (Internet accessible).
  9. For Security groups, deselect Create new security group.
  10. Choose the shared VPC security group created by the CloudFormation template.

Because the security groups of the AWS PrivateLink endpoints from the earlier step are self-referencing, you must choose the same security group for your Amazon MWAA environment.

  11. For Endpoint management, choose Customer managed endpoints.
  12. Keep the remaining settings as default and choose Next.
  13. Choose Create environment.

When your environment is available, you can access it via the Open Airflow UI link on the Amazon MWAA console.
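If you prefer to wait from a script rather than watch the console, a polling loop along the following lines is one option. This is a minimal sketch: wait_for_environment is a hypothetical helper, the client is expected to behave like a boto3 MWAA client (get_environment returning the status under Environment and Status), and the delay and attempt limits are arbitrary defaults.

```python
import time


def wait_for_environment(client, name, delay_seconds=60, max_attempts=90,
                         sleep=time.sleep):
    """Poll get_environment until the environment leaves CREATING/PENDING.

    Returns the last status seen, which may still be CREATING or PENDING
    if max_attempts is exhausted.
    """
    status = None
    for _ in range(max_attempts):
        status = client.get_environment(Name=name)["Environment"]["Status"]
        if status not in ("CREATING", "PENDING"):
            return status
        sleep(delay_seconds)
    return status


# Usage sketch (requires boto3 and AWS credentials; "example-env" is hypothetical):
#   import boto3
#   print(wait_for_environment(boto3.client("mwaa"), "example-env"))
```

Passing the client in as a parameter keeps the helper easy to test with a stub, and a result other than AVAILABLE signals that creation did not complete normally.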

Clean up

Cleaning up resources that aren’t actively being used reduces costs and is a best practice. If you don’t delete your resources, you can incur additional charges. To clean up your resources, complete the following steps:

  1. Delete your Amazon MWAA environment, EventBridge rule, and Lambda function.
  2. Delete the VPC endpoints created by the Lambda function.
  3. Delete any security groups created, if applicable.
  4. After the above resources have completed deletion, delete the CloudFormation stack to ensure that you have removed all of the remaining resources.
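The Lambda function in this post tags each endpoint it creates with a Name of the environment name plus "-database" or "-webserver", which makes them straightforward to locate during cleanup. The sketch below filters a list shaped like the VpcEndpoints field of an EC2 describe_vpc_endpoints response. The helper name and the sample data are hypothetical.

```python
def endpoints_for_environment(vpc_endpoints: list, env_name: str) -> list:
    """Return IDs of endpoints whose Name tag matches the Lambda's naming scheme."""
    wanted = {env_name + "-database", env_name + "-webserver"}
    ids = []
    for endpoint in vpc_endpoints:
        # Endpoints without tags simply produce an empty mapping here.
        tags = {t["Key"]: t["Value"] for t in endpoint.get("Tags", [])}
        if tags.get("Name") in wanted:
            ids.append(endpoint["VpcEndpointId"])
    return ids


# Hypothetical describe_vpc_endpoints output, trimmed to the fields used here.
sample = [
    {"VpcEndpointId": "vpce-0aaa",
     "Tags": [{"Key": "Name", "Value": "example-env-database"}]},
    {"VpcEndpointId": "vpce-0bbb",
     "Tags": [{"Key": "Name", "Value": "other-endpoint"}]},
    {"VpcEndpointId": "vpce-0ccc"},
]

print(endpoints_for_environment(sample, "example-env"))
```

The resulting IDs could then be reviewed and passed to the EC2 delete_vpc_endpoints call, or deleted by hand on the VPC console.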

Summary

This post described how to automate environment creation with shared VPC support in Amazon MWAA. This gives you the ability to manage your own endpoints within your VPC, adding compatibility with shared, or otherwise restricted, VPCs. Specifying customer-managed endpoints also provides the ability to meet strict security policies by explicitly restricting VPC resource access to just those needed by your Amazon MWAA environments. To learn more about Amazon MWAA, refer to the Amazon MWAA User Guide. For more posts about Amazon MWAA, visit the Amazon MWAA resources page.


About the author

John Jackson has over 25 years of software experience as a developer, systems architect, and product manager in both startups and large corporations and is the AWS Principal Product Manager responsible for Amazon MWAA.
