Reduce Costs with Scheduled EC2 Instance Shutdowns on AWS

It is often still necessary to run certain applications on EC2 instances, particularly if they are not easily containerised and you do not want to pay extra for a managed service. In a non-production environment, you can reduce costs by shutting down these instances when your developers or users do not need them.

Off-the-shelf scheduling with an AWS template

The team at AWS has already put together a CloudFormation template you can deploy. The documentation and deployment guide are available here: https://docs.aws.amazon.com/solutions/latest/instance-scheduler/overview.html

It comes with its own command line interface for initialising and configuring schedules. The schedule metadata is stored in DynamoDB, and a Lambda function runs every few minutes to make sure the rules are applied to your instances at the correct times. It scales well, but can feel a bit heavy if you don't need all of its features.

A custom but lightweight alternative

The previous template may be too heavy for some. As a simple alternative, you can build something similar with your own Lambda function, albeit with reduced functionality. This approach suits non-production environments where developers are allowed to start instances manually when they need them, and where those instances should stop automatically at the end of the day.

This approach is just a serverless application deployed with the AWS Serverless Application Model (SAM). The two key resources are a Lambda function and a CloudWatch Events rule. With SAM you can describe both in a single template.yaml using one resource of Type: AWS::Serverless::Function, which neatly wraps up your function's code, event triggers, and IAM policies ready for deployment. The template also declares the managed IAM policy that the function references. Here is the entire template used for this shutdown app:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
  ec2-autoshutdown

  SAM Template for ec2-autoshutdown

Resources:
  LambdaFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: ec2autoshutdown/
      Handler: app.lambda_handler
      Runtime: python3.8
      Events:
        Schedule1:
          Type: Schedule
          Properties:
            Description: Schedule for shutdown (7:30/UTC = 5:30pm/+10)
            Enabled: True
            Schedule: cron(30 7 * * ? *)
      Policies:
        - !Ref EC2AccessPolicy
      Environment:
        Variables:
          SHUTDOWN_TAG_NAME: 'AutoShutdown'
          WAIT_TIMEOUT_SECS: 10
      MemorySize: 256
      Timeout: 900

  EC2AccessPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      Description: Allows access and state management of EC2 instances
      PolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Action:
              - 'ec2:Get*'
              - 'ec2:Describe*'
              - 'ec2:StopInstances'
            Resource: '*'

Now onto the application itself. It is triggered by a CloudWatch Events rule at the daily time configured in the template above. It requests a list of all running EC2 instances, then checks each one for the tag key AutoShutdown. If the tag's value is True, the instance ID is saved along with any others that match the criteria. Once all instances have been checked, it sends a single request to stop every instance in that list.

The application code is written in Python and uses boto3, the AWS SDK for Python, to interact with AWS services. The environment variables defined in the template are read when the function loads, so it knows which tag to look for and how long to wait.

import json
import os
from time import sleep

import boto3

# AWS_REGION is set by the Lambda runtime; the other two come from template.yaml.
REGION = os.environ['AWS_REGION']
WAIT_TIMEOUT = int(os.environ['WAIT_TIMEOUT_SECS'])
SHUTDOWN_TAG_NAME = os.environ['SHUTDOWN_TAG_NAME']


def lambda_handler(event, context):

    ec2 = boto3.client('ec2', region_name=REGION)

    # Only running instances are candidates for shutdown.
    response_describe = ec2.describe_instances(Filters=[{
        'Name': 'instance-state-name',
        'Values': ['running']
    }])['Reservations']

    shutdown_list = []

    for res in response_describe:
        for instance in res.get('Instances', []):
            ec2_id = instance.get('InstanceId')
            taglist = instance.get('Tags', [])

            ec2_has_true_shutdown_tag = False
            ec2_name = None
            for tag in taglist:
                # The tag value must be the exact string 'True'.
                if tag.get('Key') == SHUTDOWN_TAG_NAME:
                    if tag.get('Value') == 'True':
                        ec2_has_true_shutdown_tag = True
                # The Name tag is only used to make the log output readable.
                if tag.get('Key') == 'Name':
                    ec2_name = tag.get('Value')

            if not ec2_name:
                ec2_name = 'Unnamed'

            if ec2_has_true_shutdown_tag:
                shutdown_list.append(ec2_id)
                print('Affected instance: {} ({})'.format(ec2_name, ec2_id))

    if shutdown_list:
        print("\nShutting down the following instances in {} seconds: {}".format(WAIT_TIMEOUT, ', '.join(shutdown_list)))
        # Short grace period before the stop request is sent.
        sleep(WAIT_TIMEOUT)

        # A single request stops every matching instance.
        response_shutdown = ec2.stop_instances(InstanceIds=shutdown_list)
        print(response_shutdown)

        return {
            "statusCode": 200,
            "body": json.dumps(response_shutdown, default=str)
        }
    else:
        print("No instances to shut down.")
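As a variation on the tag check above, the filtering could be pushed to the EC2 API itself with a tag:<key> filter, so that only matching instances come back. The sketch below is one possible way to do that, not part of the original app: find_tagged_running_instances is a hypothetical helper name, and it takes the boto3 EC2 client as an argument so it can be tried out with a stub. It also uses the describe_instances paginator, on the assumption that an account may hold more instances than fit in one response.

```python
def find_tagged_running_instances(ec2, tag_name, tag_value='True'):
    """Return the IDs of running instances whose tag_name equals tag_value.

    ec2 is expected to be a boto3 EC2 client (or a stub with the same
    get_paginator interface).
    """
    filters = [
        {'Name': 'instance-state-name', 'Values': ['running']},
        # 'tag:<key>' matches instances with that tag key and one of the values.
        {'Name': 'tag:' + tag_name, 'Values': [tag_value]},
    ]

    instance_ids = []
    paginator = ec2.get_paginator('describe_instances')
    for page in paginator.paginate(Filters=filters):
        for reservation in page.get('Reservations', []):
            for instance in reservation.get('Instances', []):
                instance_ids.append(instance['InstanceId'])
    return instance_ids
```

Inside the handler this could stand in for the nested loops, for example shutdown_list = find_tagged_running_instances(ec2, SHUTDOWN_TAG_NAME). The trade-off is that the instance names would need to be read separately if you still want them in the log output.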

To deploy using SAM, save the template and the handler code above to template.yaml and ec2autoshutdown/app.py respectively (the directory must match the CodeUri in the template), then run the following two commands:

sam build --use-container
sam deploy --guided

This will create the resources in AWS for you, and you can tinker with the YAML parameters without having to change the Python app code. When it runs, we get a simple event in the logs like this, followed by the actual API response from the stop_instances request:

Affected instance: ENV-TEST-APP-EC2 (i-0bc99...)
Affected instance: ENV-TEST-DATABASE-EC2 (i-0a239...)
Shutting down the following instances in 10 seconds: i-0bc99..., i-0a239...

This simple output will help developers check that the right instances are getting shut down.

But the next day when you start up the instance again, what if your application doesn't start?

Ensuring your application runs on instance boot

We also need to reduce the time spent starting instances that have been shut down, especially if the application doesn't automatically start for us.

Use systemd

If you are running a Linux application, you may be able to configure the built-in service manager, systemd, to run your application on boot as a system service. All you really need to do is change the sample values (Description, User/Group, ExecStart) in the example below in a text editor, then save the file as /etc/systemd/system/sampleapp.service:

[Unit]
Description=sampleapp
Requires=network-online.target
After=network-online.target

[Service]
User=ubuntu
Group=ubuntu
Restart=on-failure
ExecStart=/path/to/sampleapp -listen ":8080"
ExecReload=/bin/kill -HUP $MAINPID
KillSignal=SIGINT
TimeoutStopSec=30s

[Install]
WantedBy=multi-user.target

Original source code thanks to Tristan.

To start and enable the service to run on boot, run:

sudo systemctl start sampleapp
sudo systemctl enable sampleapp

Other related cost reductions

AWS Glue Development Endpoints are another good candidate for this kind of scheduled clean-up because of their higher hourly cost. The same simple Lambda approach works here: list the Dev Endpoints (with client.list_dev_endpoints) and delete them.
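A minimal sketch of that Glue clean-up might look like the following. The helper name delete_all_dev_endpoints is hypothetical, and it assumes you really do want every Dev Endpoint in the region removed; it takes the boto3 Glue client as an argument, and relies on the client's list_dev_endpoints and delete_dev_endpoint calls.

```python
def delete_all_dev_endpoints(glue):
    """Delete every Glue Dev Endpoint the client can list; return their names.

    glue is expected to be a boto3 Glue client (or a stub exposing
    list_dev_endpoints and delete_dev_endpoint).
    """
    names = []
    kwargs = {}
    # Collect all endpoint names first, following NextToken across pages.
    while True:
        page = glue.list_dev_endpoints(**kwargs)
        names.extend(page.get('DevEndpointNames', []))
        token = page.get('NextToken')
        if not token:
            break
        kwargs['NextToken'] = token

    # Then delete them one by one, logging each name as it goes.
    for name in names:
        print('Deleting dev endpoint: {}'.format(name))
        glue.delete_dev_endpoint(EndpointName=name)
    return names
```

In a Lambda handler this could be called as delete_all_dev_endpoints(boto3.client('glue')). The function's role would also need Glue permissions for listing and deleting Dev Endpoints, which the EC2 policy in the template above does not grant.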

Consider using Cloud9 for development instances as it has an automatic shutdown based on a configurable timeout (default 30 minutes).

For a routine clean-up of your environment that removes unused resources, Marat has developed a clean-up tool; you can read more about it here.
