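Background
In day-to-day operations, a workload sometimes needs an EC2 instance to register a custom domain name automatically when it launches. By default, AWS assigns a private DNS name to every launched EC2 instance, but that name cannot be customized. This article uses Lambda and Route 53 to add that capability, and it also registers the reverse (IP to domain) record automatically. As a result, whether the logs show an IP address or a private DNS name during troubleshooting, the EC2 instance can be located quickly.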
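Solution
Solution overview
Create two private hosted zones in Route 53: one resolves custom domain names to the private IP addresses of EC2 instances, and the other resolves IP addresses back to the custom domain names. In CloudWatch Events, add a rule for Tag Change on Resource; based on the instance's Name tag, it writes one A record and one PTR record into the private hosted zones. To keep the records current over the instance lifecycle as well, add a second rule for Instance State-change Notification, which updates the records when an instance enters the running state or is being terminated.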
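The naming scheme described above can be sketched as two small functions. This is a minimal illustration, not the article's Lambda code: the function names and the zone name `example.internal` are hypothetical, and `reserved_parts` stands for the number of leading IP octets that are already part of the reverse zone name (the same idea as `PTR_RESERVED_PARTS` in the Lambda configuration).

```python
def forward_record_name(instance_name, zone_name):
    """Build the A record name from the Name tag and the forward zone name."""
    return f"{instance_name.strip()}.{zone_name.rstrip('.')}"


def reverse_record_name(private_ip, ptr_zone_name, reserved_parts):
    """Build the PTR record name for an IP inside the given reverse zone.

    The octets that are not already contained in the zone name are reversed
    and prepended to the zone name.
    """
    octets = private_ip.split(".")
    host_part = ".".join(reversed(octets[reserved_parts:]))
    return f"{host_part}.{ptr_zone_name.rstrip('.')}"


# Hypothetical values for illustration only.
print(forward_record_name("demo", "example.internal."))
print(reverse_record_name("18.18.1.5", "18.18.in-addr.arpa.", 2))
```

With a reverse zone of `10.in-addr.arpa`, `reserved_parts` would be 1, so three octets of the IP end up in the record name.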
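Applicability:
- The CIDR blocks of the VPCs in a Region must share at least one leading octet.
- Examples of VPCs that qualify:
- Two VPCs with the CIDR blocks 10.1.0.0/16 and 10.2.0.0/16
- Two VPCs with the CIDR blocks 18.18.0.0/20 and 18.18.128.0/20
- Example of VPCs that do not qualify:
- Two VPCs whose CIDR blocks share no leading octet, such as 10.2.0.0/16 paired with a block outside 10.0.0.0/8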
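The applicability rule can be checked mechanically. The sketch below uses only the standard `ipaddress` module; the function name is hypothetical and the CIDR blocks are illustrative values, not a statement about any particular account.

```python
import ipaddress


def shared_leading_octets(cidrs):
    """Return the leading octets that are fixed and identical across all CIDR blocks."""
    networks = [ipaddress.ip_network(cidr) for cidr in cidrs]
    # Only whole octets covered by every prefix can appear in a reverse zone name;
    # the reverse zones used in this article contain at most three octets.
    fixed = min(min(net.prefixlen for net in networks) // 8, 3)
    rows = [str(net.network_address).split(".")[:fixed] for net in networks]
    shared = []
    for column in zip(*rows):
        if len(set(column)) != 1:
            break
        shared.append(column[0])
    return shared


print(shared_leading_octets(["10.1.0.0/16", "10.2.0.0/16"]))
print(shared_leading_octets(["18.18.0.0/20", "18.18.128.0/20"]))
```

An empty result means the VPCs have no common prefix, so a single reverse zone cannot cover them.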
Architecture diagram

Implementation steps
- Create private hosted zone A for the custom domain names.


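- Create private hosted zone B for resolving IP addresses back to custom domain names. If the IP range bound to your VPC is A.B.C.D, the zone name must be one of the following:
a. A.in-addr.arpa, for example 18.in-addr.arpa
b. B.A.in-addr.arpa, for example 18.18.in-addr.arpa
c. C.B.A.in-addr.arpa, for example 0.18.18.in-addr.arpa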

Look up the CIDR block of the VPC

Set the zone name; it must follow the naming rule for the associated IP range
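To make the naming rule for the reverse zone concrete, the following sketch lists the three candidate zone names for a CIDR block and notes which of them span the whole block. The function name is hypothetical; a zone with n octets covers the entire block only when the prefix length is at least 8 × n.

```python
import ipaddress


def reverse_zone_candidates(cidr):
    """Return (zone_name, covers_whole_cidr) pairs for one, two and three octets."""
    network = ipaddress.ip_network(cidr)
    octets = str(network.network_address).split(".")
    candidates = []
    for count in (1, 2, 3):
        zone = ".".join(reversed(octets[:count])) + ".in-addr.arpa"
        # The zone spans the whole block only if all of its octets are fixed by the prefix.
        candidates.append((zone, network.prefixlen >= 8 * count))
    return candidates


for zone, covers in reverse_zone_candidates("18.18.0.0/20"):
    print(zone, "covers the whole CIDR" if covers else "covers only part of the CIDR")
```

For a /20 block, the three-octet zone covers only the addresses whose third octet matches, so one of the shorter names is the safer choice when every instance in the VPC should get a PTR record.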
- Create the IAM role lambda-ec2-name-register-role for the Lambda functions to run under, and attach the following policies:
a. AmazonEC2ReadOnlyAccess
b. AmazonRoute53ReadOnlyAccess
c. AmazonRoute53AutoNamingFullAccess
d. AWSLambdaExecute
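As an optional alternative to the broad managed policies, a narrower inline policy could be derived from the API calls the Lambda code makes (`describe_instances`, `get_hosted_zone`, `list_resource_record_sets`, `change_resource_record_sets`). The sketch below is an assumption, not the setup used in this article: the function name is hypothetical, and logging permissions would still have to come from a policy such as AWSLambdaExecute.

```python
import json


def build_dns_register_policy(zone_ids):
    """Build an IAM policy document limited to the calls used by the Lambda functions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # DescribeInstances does not support resource-level restrictions.
                "Effect": "Allow",
                "Action": ["ec2:DescribeInstances"],
                "Resource": "*",
            },
            {
                "Effect": "Allow",
                "Action": [
                    "route53:GetHostedZone",
                    "route53:ListResourceRecordSets",
                    "route53:ChangeResourceRecordSets",
                ],
                "Resource": [
                    f"arn:aws:route53:::hostedzone/{zone_id}" for zone_id in zone_ids
                ],
            },
        ],
    }


# Zone IDs taken from the example configuration in this article.
policy = build_dns_register_policy(["Z0348615WGFD7IWPZOCV", "Z05233005YXC6V4H0HJK"])
print(json.dumps(policy, indent=2))
```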

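- Create the Lambda function that responds to tag change events.
a. Set the Lambda function name to ec2_change_name
b. Select the IAM role created earlier, lambda-ec2-name-register-role
c. In the configuration, set the timeout to 10 seconds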

import asyncio
import time

import boto3

config = {
    'HOSTED_ZONE_ID': '<hosted zone ID of the custom domain>',  # Z0348615WGFD7IWPZOCV in this example
    'PTR_ZONE_ID': '<hosted zone ID of the PTR zone>',  # Z05233005YXC6V4H0HJK in this example
    # Number of leading IP octets already contained in the PTR zone name:
    # 2 in this example, 1 if the PTR zone is 10.in-addr.arpa.
    'PTR_RESERVED_PARTS': 2,
}

route53 = boto3.client('route53')
ec2 = boto3.client('ec2')


def get_instance_private_ip(instance_id):
    """Return the primary private IPv4 address of the instance."""
    instance = ec2.describe_instances(InstanceIds=[instance_id])
    return instance['Reservations'][0]['Instances'][0]['PrivateIpAddress']


async def add_a_record(new_name, private_ip):
    begin_time = time.time()
    # Nothing to register when no custom name is set.
    if len(new_name) == 0:
        return
    try:
        host_zone_info = route53.get_hosted_zone(Id=config['HOSTED_ZONE_ID'])
        host_zone_name = host_zone_info['HostedZone']['Name'][:-1]
        new_full_custom_dns_name = '%s.%s' % (new_name, host_zone_name)
        # Remove any A record that still points at this IP.
        await delete_dns_record(private_ip)
        # Register the private A record.
        route53.change_resource_record_sets(
            HostedZoneId=config['HOSTED_ZONE_ID'],
            ChangeBatch={
                'Comment': 'add A %s -> %s' % (new_full_custom_dns_name, private_ip),
                'Changes': [
                    {
                        'Action': 'UPSERT',
                        'ResourceRecordSet': {
                            'Name': new_full_custom_dns_name,
                            'Type': 'A',
                            'TTL': 300,
                            'ResourceRecords': [{'Value': private_ip}]
                        }
                    }
                ]
            }
        )
        print('ADD A: %s is recorded for %s, cost %.3fs' % (
            new_full_custom_dns_name, private_ip, time.time() - begin_time))
    except Exception as e:
        print(e)


async def add_ptr_record(new_name, private_ip):
    begin_time = time.time()
    # Nothing to register when no custom name is set.
    if len(new_name) == 0:
        return
    try:
        host_zone_info = route53.get_hosted_zone(Id=config['HOSTED_ZONE_ID'])
        host_zone_name = host_zone_info['HostedZone']['Name'][:-1]
        new_full_custom_dns_name = '%s.%s' % (new_name, host_zone_name)
        # Remove the existing PTR record for this IP, if any.
        await delete_ptr_record(private_ip)
        # Add the reverse PTR record.
        ptr_zone_info = route53.get_hosted_zone(Id=config['PTR_ZONE_ID'])
        ip_parts = private_ip.split('.')
        # Keep only the octets that are not already part of the PTR zone name.
        ptr_reserved_ip_parts = ip_parts[config['PTR_RESERVED_PARTS']:]
        ptr_reserved_ip_parts.reverse()
        ptr_name = '.'.join(ptr_reserved_ip_parts)
        ptr_full_name = ptr_name + '.' + ptr_zone_info['HostedZone']['Name']
        route53.change_resource_record_sets(
            HostedZoneId=config['PTR_ZONE_ID'],
            ChangeBatch={
                'Comment': 'add PTR %s -> %s' % (ptr_full_name, new_full_custom_dns_name),
                'Changes': [
                    {
                        'Action': 'UPSERT',
                        'ResourceRecordSet': {
                            'Name': ptr_full_name,
                            'Type': 'PTR',
                            'TTL': 300,
                            'ResourceRecords': [{'Value': new_full_custom_dns_name}]
                        }
                    }
                ]
            }
        )
        print('ADD PTR: %s is recorded for %s, cost %.3fs' % (
            ptr_full_name, new_full_custom_dns_name, time.time() - begin_time))
    except Exception as e:
        print(e)


async def delete_dns_record(private_ip):
    begin_time = time.time()
    try:
        # A records are keyed by host name, not by IP, so scan the whole zone
        # and collect the records whose value is this IP.
        paginator = route53.get_paginator('list_resource_record_sets')
        matches = []
        for page in paginator.paginate(HostedZoneId=config['HOSTED_ZONE_ID']):
            for record in page['ResourceRecordSets']:
                # Alias records carry no ResourceRecords.
                values = record.get('ResourceRecords') or [{}]
                if record['Type'] == 'A' and values[0].get('Value') == private_ip:
                    matches.append(record)
        # Delete the matching records.
        for record in matches:
            route53.change_resource_record_sets(
                HostedZoneId=config['HOSTED_ZONE_ID'],
                ChangeBatch={
                    'Comment': 'delete %s' % record['Name'][:-1],
                    'Changes': [
                        {
                            'Action': 'DELETE',
                            'ResourceRecordSet': {
                                'Name': record['Name'][:-1],
                                'Type': 'A',
                                'TTL': record['TTL'],
                                'ResourceRecords': [{'Value': private_ip}]
                            }
                        }
                    ]
                }
            )
            print('DEL A: %s is deleted, cost %.3fs' % (record['Name'][:-1], time.time() - begin_time))
    except Exception as e:
        print(e)


async def delete_ptr_record(private_ip):
    begin_time = time.time()
    ip_parts = private_ip.split('.')
    ip_parts.reverse()
    reversed_ip = '.'.join(ip_parts)
    ptr_full_name = reversed_ip + '.in-addr.arpa.'
    try:
        # Look up the record, starting the listing at the PTR name itself.
        response = route53.list_resource_record_sets(
            HostedZoneId=config['PTR_ZONE_ID'],
            StartRecordName=ptr_full_name,
            StartRecordType='PTR'
        )
        # Delete the matching records.
        for record in response['ResourceRecordSets']:
            if record['Type'] == 'PTR' and record['Name'] == ptr_full_name:
                route53.change_resource_record_sets(
                    HostedZoneId=config['PTR_ZONE_ID'],
                    ChangeBatch={
                        'Comment': 'delete PTR %s' % record['Name'][:-1],
                        'Changes': [
                            {
                                'Action': 'DELETE',
                                'ResourceRecordSet': {
                                    'Name': record['Name'][:-1],
                                    'Type': 'PTR',
                                    'TTL': record['TTL'],
                                    'ResourceRecords': [{'Value': record['ResourceRecords'][0]['Value']}]
                                }
                            }
                        ]
                    }
                )
                print('DEL PTR: %s is deleted, cost %0.3fs' % (record['Name'][:-1], time.time() - begin_time))
    except Exception as e:
        print(e)


async def run_all(*coroutines):
    # asyncio.wait() no longer accepts bare coroutines (Python 3.11+), so gather them.
    await asyncio.gather(*coroutines)


def add_node(instance_id, new_name):
    private_ip = get_instance_private_ip(instance_id)
    asyncio.run(run_all(
        add_a_record(new_name, private_ip),
        add_ptr_record(new_name, private_ip)
    ))


def del_node(instance_id):
    private_ip = get_instance_private_ip(instance_id)
    asyncio.run(run_all(
        delete_dns_record(private_ip),
        delete_ptr_record(private_ip)
    ))


def change_instance_dns_name(instance_id, new_name):
    new_name = new_name.strip()
    if len(new_name) == 0:
        del_node(instance_id)
    else:
        add_node(instance_id, new_name)


def lambda_handler(event, context):
    resources = event['resources']
    detail = event['detail']
    # Only react when the Name tag was added, changed, or removed.
    if 'Name' not in detail.get('changed-tag-keys', []):
        return
    for resource in resources:
        # ARN format: arn:aws:ec2:<region>:<account>:instance/<instance-id>
        resource_type, _, resource_id = resource.split(':')[-1].partition('/')
        if resource_type == 'instance':
            # The Name key is absent from detail['tags'] when the tag was removed.
            change_instance_dns_name(resource_id, detail['tags'].get('Name', ''))
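To see what the handler receives, here is a trimmed sample of a Tag Change on Resource event together with a standalone version of the ARN parsing. The account ID, Region and instance ID are made-up placeholders, the function name is hypothetical, and the event shows only the fields the handler reads.

```python
def instance_ids_from_tag_event(event):
    """Return the EC2 instance IDs in the event if the Name tag was touched."""
    detail = event.get("detail", {})
    if "Name" not in detail.get("changed-tag-keys", []):
        return []
    instance_ids = []
    for arn in event.get("resources", []):
        # The last ARN segment looks like "instance/i-0123456789abcdef0".
        resource_type, _, resource_id = arn.split(":")[-1].partition("/")
        if resource_type == "instance":
            instance_ids.append(resource_id)
    return instance_ids


# Placeholder values for illustration only.
sample_event = {
    "detail-type": "Tag Change on Resource",
    "source": "aws.tag",
    "resources": ["arn:aws:ec2:us-east-1:123456789012:instance/i-0123456789abcdef0"],
    "detail": {
        "changed-tag-keys": ["Name"],
        "service": "ec2",
        "resource-type": "instance",
        "tags": {"Name": "Demo"},
    },
}

print(instance_ids_from_tag_event(sample_event))
```

When the Name tag is deleted, `Name` still appears in `changed-tag-keys` but no longer in `tags`, which is the case where the records should be removed.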
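- Create the Lambda function that responds to EC2 start and shutdown events.
a. Set the Lambda function name to ec2_start_and_shutdown
b. Select the IAM role created earlier, lambda-ec2-name-register-role
c. In the configuration, set the timeout to 10 seconds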
import asyncio
import time

import boto3

config = {
    'HOSTED_ZONE_ID': '<hosted zone ID of the custom domain>',  # Z0348615WGFD7IWPZOCV in this example
    'PTR_ZONE_ID': '<hosted zone ID of the PTR zone>',  # Z05233005YXC6V4H0HJK in this example
    # Number of leading IP octets already contained in the PTR zone name:
    # 2 in this example, 1 if the PTR zone is 10.in-addr.arpa.
    'PTR_RESERVED_PARTS': 2,
}

route53 = boto3.client('route53')
ec2 = boto3.client('ec2')


def get_instance_info(instance_id):
    """Return the Name tag (empty if unset) and the private IP of the instance."""
    response = ec2.describe_instances(InstanceIds=[instance_id])
    instance = response['Reservations'][0]['Instances'][0]
    private_ip = instance['PrivateIpAddress']
    name = ''
    # Instances without any tags have no 'Tags' key.
    for tag in instance.get('Tags', []):
        if tag['Key'] == 'Name':
            name = tag['Value'].strip()
    return name, private_ip


def get_instance_private_ip(instance_id):
    """Return the primary private IPv4 address of the instance."""
    instance = ec2.describe_instances(InstanceIds=[instance_id])
    return instance['Reservations'][0]['Instances'][0]['PrivateIpAddress']


async def add_a_record(new_name, private_ip):
    begin_time = time.time()
    # Nothing to register when no custom name is set.
    if len(new_name) == 0:
        return
    try:
        host_zone_info = route53.get_hosted_zone(Id=config['HOSTED_ZONE_ID'])
        host_zone_name = host_zone_info['HostedZone']['Name'][:-1]
        new_full_custom_dns_name = '%s.%s' % (new_name, host_zone_name)
        # Remove any A record that still points at this IP.
        await delete_dns_record(private_ip)
        # Register the private A record.
        route53.change_resource_record_sets(
            HostedZoneId=config['HOSTED_ZONE_ID'],
            ChangeBatch={
                'Comment': 'add A %s -> %s' % (new_full_custom_dns_name, private_ip),
                'Changes': [
                    {
                        'Action': 'UPSERT',
                        'ResourceRecordSet': {
                            'Name': new_full_custom_dns_name,
                            'Type': 'A',
                            'TTL': 300,
                            'ResourceRecords': [{'Value': private_ip}]
                        }
                    }
                ]
            }
        )
        print('ADD A: %s is recorded for %s, cost %.3fs' % (
            new_full_custom_dns_name, private_ip, time.time() - begin_time))
    except Exception as e:
        print(e)


async def add_ptr_record(new_name, private_ip):
    begin_time = time.time()
    # Nothing to register when no custom name is set.
    if len(new_name) == 0:
        return
    try:
        host_zone_info = route53.get_hosted_zone(Id=config['HOSTED_ZONE_ID'])
        host_zone_name = host_zone_info['HostedZone']['Name'][:-1]
        new_full_custom_dns_name = '%s.%s' % (new_name, host_zone_name)
        # Remove the existing PTR record for this IP, if any.
        await delete_ptr_record(private_ip)
        # Add the reverse PTR record.
        ptr_zone_info = route53.get_hosted_zone(Id=config['PTR_ZONE_ID'])
        ip_parts = private_ip.split('.')
        # Keep only the octets that are not already part of the PTR zone name.
        ptr_reserved_ip_parts = ip_parts[config['PTR_RESERVED_PARTS']:]
        ptr_reserved_ip_parts.reverse()
        ptr_name = '.'.join(ptr_reserved_ip_parts)
        ptr_full_name = ptr_name + '.' + ptr_zone_info['HostedZone']['Name']
        route53.change_resource_record_sets(
            HostedZoneId=config['PTR_ZONE_ID'],
            ChangeBatch={
                'Comment': 'add PTR %s -> %s' % (ptr_full_name, new_full_custom_dns_name),
                'Changes': [
                    {
                        'Action': 'UPSERT',
                        'ResourceRecordSet': {
                            'Name': ptr_full_name,
                            'Type': 'PTR',
                            'TTL': 300,
                            'ResourceRecords': [{'Value': new_full_custom_dns_name}]
                        }
                    }
                ]
            }
        )
        print('ADD PTR: %s is recorded for %s, cost %.3fs' % (
            ptr_full_name, new_full_custom_dns_name, time.time() - begin_time))
    except Exception as e:
        print(e)


async def delete_dns_record(private_ip):
    begin_time = time.time()
    try:
        # A records are keyed by host name, not by IP, so scan the whole zone
        # and collect the records whose value is this IP.
        paginator = route53.get_paginator('list_resource_record_sets')
        matches = []
        for page in paginator.paginate(HostedZoneId=config['HOSTED_ZONE_ID']):
            for record in page['ResourceRecordSets']:
                # Alias records carry no ResourceRecords.
                values = record.get('ResourceRecords') or [{}]
                if record['Type'] == 'A' and values[0].get('Value') == private_ip:
                    matches.append(record)
        # Delete the matching records.
        for record in matches:
            route53.change_resource_record_sets(
                HostedZoneId=config['HOSTED_ZONE_ID'],
                ChangeBatch={
                    'Comment': 'delete %s' % record['Name'][:-1],
                    'Changes': [
                        {
                            'Action': 'DELETE',
                            'ResourceRecordSet': {
                                'Name': record['Name'][:-1],
                                'Type': 'A',
                                'TTL': record['TTL'],
                                'ResourceRecords': [{'Value': private_ip}]
                            }
                        }
                    ]
                }
            )
            print('DEL A: %s is deleted, cost %.3fs' % (record['Name'][:-1], time.time() - begin_time))
    except Exception as e:
        print(e)


async def delete_ptr_record(private_ip):
    begin_time = time.time()
    ip_parts = private_ip.split('.')
    ip_parts.reverse()
    reversed_ip = '.'.join(ip_parts)
    ptr_full_name = reversed_ip + '.in-addr.arpa.'
    try:
        # Look up the record, starting the listing at the PTR name itself.
        response = route53.list_resource_record_sets(
            HostedZoneId=config['PTR_ZONE_ID'],
            StartRecordName=ptr_full_name,
            StartRecordType='PTR'
        )
        # Delete the matching records.
        for record in response['ResourceRecordSets']:
            if record['Type'] == 'PTR' and record['Name'] == ptr_full_name:
                route53.change_resource_record_sets(
                    HostedZoneId=config['PTR_ZONE_ID'],
                    ChangeBatch={
                        'Comment': 'delete PTR %s' % record['Name'][:-1],
                        'Changes': [
                            {
                                'Action': 'DELETE',
                                'ResourceRecordSet': {
                                    'Name': record['Name'][:-1],
                                    'Type': 'PTR',
                                    'TTL': record['TTL'],
                                    'ResourceRecords': [{'Value': record['ResourceRecords'][0]['Value']}]
                                }
                            }
                        ]
                    }
                )
                print('DEL PTR: %s is deleted, cost %0.3fs' % (record['Name'][:-1], time.time() - begin_time))
    except Exception as e:
        print(e)


async def run_all(*coroutines):
    # asyncio.wait() no longer accepts bare coroutines (Python 3.11+), so gather them.
    await asyncio.gather(*coroutines)


def add_node(instance_id):
    instance_name, private_ip = get_instance_info(instance_id)
    asyncio.run(run_all(
        add_a_record(instance_name, private_ip),
        add_ptr_record(instance_name, private_ip)
    ))


def del_node(instance_id):
    private_ip = get_instance_private_ip(instance_id)
    asyncio.run(run_all(
        delete_dns_record(private_ip),
        delete_ptr_record(private_ip)
    ))


def lambda_handler(event, context):
    state = event['detail']['state']
    instance_id = event['detail']['instance-id']
    # Register the records once the instance is running.
    if state == 'running':
        add_node(instance_id)
    # Remove them as soon as termination begins.
    if state == 'shutting-down':
        del_node(instance_id)
    return 0
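The dispatch logic of this handler can be exercised without AWS access. The sketch below maps each EC2 lifecycle state to the action the handler takes; the function name and the instance ID are hypothetical, and the sample event shows only the fields the handler reads.

```python
# EC2 lifecycle states reported in state-change notifications.
EC2_STATES = ("pending", "running", "stopping", "stopped", "shutting-down", "terminated")


def action_for_state_event(event):
    """Return 'register', 'deregister' or 'ignore' for a state-change event."""
    state = event["detail"]["state"]
    if state == "running":
        return "register"
    if state == "shutting-down":
        return "deregister"
    return "ignore"


# Placeholder instance ID for illustration only.
sample_event = {
    "detail-type": "EC2 Instance State-change Notification",
    "source": "aws.ec2",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "running"},
}

for state in EC2_STATES:
    event = {"detail": {"instance-id": "i-0123456789abcdef0", "state": state}}
    print(state, "->", action_for_state_event(event))
```

Reacting to `shutting-down` and not to `terminated` means the private IP can still be read from the instance description when the records are removed.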
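- In CloudWatch, under "Events", create the rules:
a. Create a Tag Change on Resource rule, set the target to Lambda, and select the function ec2_change_name

b. Create an Instance State-change Notification rule, set the target to Lambda, and select the function ec2_start_and_shutdown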
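For reference, the two rules correspond roughly to the event patterns below, written as Python dictionaries. The state filter in the second pattern is an assumption added to cut down invocations (the handler ignores other states anyway), and `matches` is a deliberately simplified matcher for illustration that handles only exact-value lists and nested objects, not the full EventBridge pattern syntax.

```python
import json

TAG_CHANGE_PATTERN = {
    "source": ["aws.tag"],
    "detail-type": ["Tag Change on Resource"],
    "detail": {"service": ["ec2"], "resource-type": ["instance"]},
}

STATE_CHANGE_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    # Assumed filter: only the two states the handler acts on.
    "detail": {"state": ["running", "shutting-down"]},
}


def matches(pattern, event):
    """Simplified pattern check: every pattern key must match the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            actual_values = actual if isinstance(actual, list) else [actual]
            if not any(value in expected for value in actual_values):
                return False
    return True


print(json.dumps(TAG_CHANGE_PATTERN, indent=2))
print(json.dumps(STATE_CHANGE_PATTERN, indent=2))
```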

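Testing
1. Verifying forward (domain to IP) and reverse (IP to domain) resolution:
a. SSH into an instance in a VPC associated with the hosted zones
b. For forward resolution, run ping <custom domain>; the name should resolve to the expected IP
c. For reverse resolution, run dig -x <ip address>; the custom domain name should be returned
2. Verifying the events:
a. Launch an EC2 instance, set the tag Key=Name, Value=Demo, then check the hosted zone records
b. Terminate an instance, then check the hosted zone records; the related records should have been deleted
c. Rename an EC2 instance, then check the hosted zone records; the related records should have been updated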
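The forward and reverse checks can also be scripted. The sketch below uses the standard `socket` module and would need to run on an instance inside a VPC associated with the hosted zones; the function name and the host name `demo.example.internal` are hypothetical. The resolver functions are parameters so that the logic can be tried without DNS access.

```python
import socket


def check_round_trip(hostname, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, resolve the IP back, and compare the names."""
    ip = forward(hostname)
    # gethostbyaddr returns (hostname, aliases, addresses).
    reverse_name = reverse(ip)[0]
    same = reverse_name.rstrip(".").lower() == hostname.rstrip(".").lower()
    return ip, same


# Stand-in resolvers with made-up data, so the example runs anywhere.
fake_forward = lambda name: "18.18.1.5"
fake_reverse = lambda ip: ("demo.example.internal", [], [ip])

print(check_round_trip("demo.example.internal", forward=fake_forward, reverse=fake_reverse))
```

On a real instance, calling `check_round_trip("<custom domain>")` with the default resolvers performs the same two lookups as the ping and dig commands above.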
Cost estimate
Lambda: 20,000 invocations per month at 5 seconds each, an estimated $0.21 per month
Route 53: 2 hosted zones, $2 per month
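The Lambda figure can be reproduced with a short calculation. The prices and the memory size below are assumptions, not values stated in this article: $0.20 per million requests, $0.0000166667 per GB-second, 128 MB of memory, and no free tier. Actual prices vary by Region and over time.

```python
def lambda_monthly_cost(invocations, seconds_per_call, memory_mb=128,
                        price_per_million_requests=0.20,
                        price_per_gb_second=0.0000166667):
    """Estimate the monthly Lambda cost from request and compute charges."""
    request_cost = invocations / 1_000_000 * price_per_million_requests
    gb_seconds = invocations * seconds_per_call * (memory_mb / 1024)
    compute_cost = gb_seconds * price_per_gb_second
    return request_cost + compute_cost


# 20,000 invocations of 5 seconds each, as in the estimate above.
print(f"${lambda_monthly_cost(20_000, 5):.2f}")  # prints $0.21
```

Under these assumptions almost all of the cost comes from compute time (12,500 GB-seconds), so shortening the run time per invocation matters more than reducing the number of invocations.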
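Source code
SAM source code: yourlin/ec2-name-register on GitHub
Deploy the Lambda functions with sam build and sam deploy --guided. The deployment creates and attaches the CloudWatch Events rules automatically, so they do not need to be created by hand. The IAM permissions for the Lambda functions are also attached automatically and do not need to be added manually.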
References
Enabling reverse DNS for PTR records with Route 53
Using and overriding reverse DNS rules with Route 53 Resolver
Creating records by using the Amazon Route 53 console
About the author