Tagging and Snapshotting in AWS with Lambda
Update [February 10, 2021]: The Github community has actively contributed to the below code snippets over the past five years. I’d recommend reading the comments on the Gist page before proceeding with this post. Thank you, Github community!
One of the more challenging aspects to managing a large AWS infrastructure can be tag management for cost allocation and tracking. When you create an EC2 instance, several other assets are created with it, some of which generate charges that should be tracked over time. While keeping an instance’s tags updated is fairly straightforward, ensuring that its EBS volumes, elastic IPs, elastic network interfaces and snapshots stay tagged appropriately can be a real headache.
In my quest to streamline the operations of my clients’ AWS infrastructure using Lambda, I’ve created a Lambda Function that will write and update specific tags from an EC2 instance to that instance’s attached volumes and network interfaces. I have this function’s event set to trigger every hour to ensure the tags stay up to date.
Active Gist
from __future__ import print_function
import json
import boto3
import logging
#setup simple logging for INFO
logger = logging.getLogger()
logger.setLevel(logging.ERROR)
#define the connection region
ec2 = boto3.resource('ec2', region_name="us-west-2")
#Set this to True if you don't want the function to perform any actions
debugMode = False
def lambda_handler(event, context):
#List all EC2 instances
base = ec2.instances.all()
#loop through by running instances
for instance in base:
#Tag the Volumes
for vol in instance.volumes.all():
#print(vol.attachments[0]['Device'])
if debugMode == True:
print("[DEBUG] " + str(vol))
tag_cleanup(instance, vol.attachments[0]['Device'])
else:
tag = vol.create_tags(Tags=tag_cleanup(instance, vol.attachments[0]['Device']))
print("[INFO]: " + str(tag))
#Tag the Network Interfaces
for eni in instance.network_interfaces:
#print(eni.attachment['DeviceIndex'])
if debugMode == True:
print("[DEBUG] " + str(eni))
tag_cleanup(instance, "eth"+str(eni.attachment['DeviceIndex']))
else:
tag = eni.create_tags(Tags=tag_cleanup(instance, "eth"+str(eni.attachment['DeviceIndex'])))
print("[INFO]: " + str(tag))
#------------- Functions ------------------
#returns the type of configuration that was performed
def tag_cleanup(instance, detail):
tempTags=[]
v={}
for t in instance.tags:
#pull the name tag
if t['Key'] == 'Name':
v['Value'] = t['Value'] + " - " + str(detail)
v['Key'] = 'Name'
tempTags.append(v)
#Set the important tags that should be written here
elif t['Key'] == 'Application Owner':
print("[INFO]: Application Owner Tag " + str(t))
tempTags.append(t)
elif t['Key'] == 'Cost Center':
print("[INFO]: Cost Center Tag " + str(t))
tempTags.append(t)
elif t['Key'] == 'Date Created':
print("[INFO]: Date Created Tag " + str(t))
tempTags.append(t)
elif t['Key'] == 'Requestor':
print("[INFO]: Requestor Tag " + str(t))
tempTags.append(t)
elif t['Key'] == 'System Owner':
print("[INFO]: System Owner Tag " + str(t))
tempTags.append(t)
else:
print("[INFO]: Skip Tag - " + str(t))
print("[INFO] " + str(tempTags))
return(tempTags)
Using the tagging Lambda function with a snapshotting function that copies a volume’s tags to all newly created snapshots will ensure your billing reports and charge backs capture all charges associated with running that instance in AWS, nearly automatically.
The following snapshot script also cleans up old snapshot (you can set the offset on line 15). I normally set this function’s event to trigger once a day, during a low transactional point. I also recommend setting the timeout on this function to five minutes, as the cleanup process can to take a very long time, depending on the number of snapshots you keep and the number of volumes you’re snapshotting.
Active Gist
import boto3
import logging
import datetime
import re
import time
#setup simple logging for INFO
logger = logging.getLogger()
logger.setLevel(logging.ERROR)
#define the connection
ec2 = boto3.resource('ec2', region_name="us-west-2")
#set the snapshot removal offset
cleanDate = datetime.datetime.now()-datetime.timedelta(days=5)
#Set this to True if you don't want the function to perform any actions
debugMode = False
def lambda_handler(event, context):
if debugMode == True:
print("-------DEBUG MODE----------")
#snapshop the instances
for vol in ec2.volumes.all():
tempTags=[]
#Prepare Volume tags to be importated into the snapshot
if vol.tags != None:
for t in vol.tags:
#pull the name tag
if t['Key'] == 'Name':
instanceName = t['Value']
tempTags.append(t)
else:
tempTags.append(t)
else:
print("Issue retriving tag")
instanceName = "NoName"
t['Key'] = 'Name'
t['Value'] = 'Missing'
tempTags.append(t)
description = str(datetime.datetime.now()) + "-" + instanceName + "-" + vol.id + "-automated"
if debugMode != True:
#snapshot that server
snapshot = ec2.create_snapshot(VolumeId=vol.id, Description=description)
#write the tags to the snapshot
tags = snapshot.create_tags(
Tags=tempTags
)
print("[LOG] " + str(snapshot))
else:
print("[DEBUG] " + str(tempTags))
print "[LOG] Cleaning out old entries starting on " + str(cleanDate)
#clean up old snapshots
for snap in ec2.snapshots.all():
#veryify results have a value
if snap.description.endswith("-automated"):
#Pull the snapshot date
snapDate = snap.start_time.replace(tzinfo=None)
if debugMode == True:
print("[DEBUG] " + str(snapDate) +" vs " + str(cleanDate))
#Compare the clean dates
if cleanDate > snapDate:
print("[INFO] Deleteing: " + snap.id + " - From: " + str(snapDate))
if debugMode != True:
try:
snapshot = snap.delete()
except:
#if we timeout because of a rate limit being exceeded, give it a rest of a few seconds
print("[INFO]: Waiting 5 Seconds for the API to Chill")
time.sleep(5)
snapshot = snap.delete()
print("[INFO] " + str(snapshot))
Using these two function with one another will not only streamline your chargeback and tagging models, but also ensure you have consistent snapshots of all of your instances over time. While I don’t recommend snapshots as your sole backup method, I do recommend keeping at least one per day to accelerate the recovery process if a disaster does occur.
If you’re looking for a solid snapshotting and tagging solution, give these Lambda Functions a try. If you have any questions or suggestions, please leave a comment below or contact me on Twitter. I’m constantly looking for better ways to write and run these functions.
Comments ()