Constructing a DNS-01 Challenge with ACME and Python
Domains DNS TLS Python | 2023-07-26 |
The company I work for publishes content at hundreds of domains. Records for these domains weren't always managed by just one team, so as the company has grown, things like DNS and contacts were scattered across multiple registrars.
As one part of our effort to consolidate, I'm currently working on a tool that will help manage TLS certificates (a type of file that you must have in order to enable HTTPS on your website).
Until recently, a lot of our certs were SAN certificates (a Subject Alternative Name, or SAN, is a certificate extension that allows multiple hostnames to be protected by a single certificate). They saved money in many cases, but became risky and expensive to update when any of the included domains changed ownership. More specifically, rekeying a SAN certificate to add or remove domains can cause TLS to stop working for all the domains on that cert.
Our solution is to use this new tool to generate certificates with Let's Encrypt. Because Let's Encrypt certs are free, it becomes reasonable to get out from under SANs and issue a separate cert for each domain. The downside is that Let's Encrypt certs, by default, expire after 90 days, so we need a system that will:
- check TLS expiration dates on a regular basis
- renew them when appropriate
- upload the new certs and keys to our CDN
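As a rough illustration of the first item, here's a minimal sketch of an expiry check that uses only the standard library. The function names and the 30-day threshold are assumptions made for this example, not part of the tool described in this post.

```python
import datetime
import socket
import ssl


def days_until_expiry(not_after, now=None):
    """Whole days until a cert's notAfter date, e.g. 'Jun  1 12:00:00 2024 GMT'."""
    expires = datetime.datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=datetime.timezone.utc)
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return (expires - now).days


def needs_renewal(not_after, threshold_days=30, now=None):
    """True when the cert expires in fewer than threshold_days (assumed threshold)."""
    return days_until_expiry(not_after, now) < threshold_days


def fetch_not_after(hostname, port=443, timeout=10):
    """Connect to a live host and return its certificate's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()['notAfter']
```

Something like needs_renewal(fetch_not_after('example.com')) could then run on a schedule for each domain.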
In order to issue a certificate, a CA (or Certificate Authority) such as Let's Encrypt must first verify that you control the domain that you're trying to get a certificate for.
For individual certificates, it probably makes sense in most cases to use something called HTTP-01 validation - that is, to serve a small file containing a token from a well-known path on your web site. The CA can then reach out with a simple HTTP request and verify that the file is there, thereby confirming that you in fact control the site you're requesting a certificate for. HTTP-01 methods are very well-documented.
But we're working with certificates and domains in bulk, and for our purposes it's far faster and more efficient to use the DNS-01 type of validation. That is, we place a TXT record in the domain's DNS, with a specific name and a value derived from a token provided by Let's Encrypt. Let's Encrypt then checks the domain's DNS for the presence of the TXT record in order to confirm ownership.
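To make the TXT value a little more concrete: under the ACME spec (RFC 8555), it's the base64url-encoded SHA-256 digest of a "key authorization", which is the challenge token joined with a thumbprint of your account key. The acme library handles this for you, so the sketch below is only meant to show the arithmetic using the standard library. The function names are made up for this example, and the token and key in the usage lines are dummy data.

```python
import base64
import hashlib
import json


def b64url(data):
    """Base64url-encode bytes without '=' padding, as ACME and JOSE require."""
    return base64.urlsafe_b64encode(data).rstrip(b'=').decode('ascii')


def jwk_thumbprint(jwk):
    """RFC 7638 thumbprint of an RSA public JWK (a dict with 'e', 'kty' and 'n')."""
    required = {name: jwk[name] for name in ('e', 'kty', 'n')}
    # Canonical form: required members only, sorted keys, no whitespace
    canonical = json.dumps(required, sort_keys=True, separators=(',', ':'))
    return b64url(hashlib.sha256(canonical.encode('utf-8')).digest())


def dns01_txt_value(token, thumbprint):
    """Value to publish in the _acme-challenge TXT record for this token."""
    key_authorization = f'{token}.{thumbprint}'
    return b64url(hashlib.sha256(key_authorization.encode('ascii')).digest())


# Dummy inputs, only to show the shape of the output
example_jwk = {'kty': 'RSA', 'e': 'AQAB', 'n': 'not-a-real-modulus'}
txt_value = dns01_txt_value('dummy-token', jwk_thumbprint(example_jwk))
print(len(txt_value))  # a SHA-256 digest always encodes to 43 characters
```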
As I was building this tool, I found that there are lots of examples of HTTP-01 challenge code out there, but DNS-01 is not as well-documented. Hence this post.
I want to point out that we're using a couple of specific tools and providers that make all this possible, so ymmv:
- All of our DNS records live in hosted zones in AWS Route 53. The AWS boto3 library makes it simple to add and change DNS records programmatically, but AWS isn't the only DNS host that has an API.
- We also have a custom-built database that contains domain names with their Route 53 zone ids, plus some other important metadata.
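Now let's take a look at the code we're using to connect to Let's Encrypt and AWS to go through this DNS-01 validation process. The general-purpose packages you'll need (cryptography and OpenSSL, which comes from the pyOpenSSL package, have to be installed; the rest are in the standard library):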
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.asymmetric import rsa
import datetime
import json
from OpenSSL import crypto
from OpenSSL.SSL import FILETYPE_PEM
import os
import sys
import time
cryptography will be used for registering a new account with Let's Encrypt, and OpenSSL will be used for making certificate signing requests, or CSRs.
Third-party libraries:
from acme import challenges, client, crypto_util
from acme import errors, messages
from acme.client import ClientNetwork, ClientV2
import boto3
import josepy
If you've generated Let's Encrypt certificates before, you may be familiar with the EFF's command line tool, certbot. This acme package contains methods that let you work with the ACME protocol programmatically. The code is here, inside the certbot source code.
That repo has an "examples" path that contains only one piece of example code, for the HTTP-01 challenge. A lot of the code in that example is useful for constructing a DNS-01 challenge, but there are some details missing.
Some default values we're setting:
DIRECTORY_URL = 'https://acme-staging-v02.api.letsencrypt.org/directory'
USER_AGENT = 'python-acme-example'
DIRECTORY_URL is the ACME URL for Let's Encrypt's staging environment. (Read more about their testing environment here.)
USER_AGENT is an arbitrary value - it can be any string. It's passed to ClientNetwork and identifies our client in requests to Let's Encrypt.
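Staging certificates aren't trusted by browsers, so at some point you'll want to switch to the production directory. Here's one way that switch might look; the ACME_ENV variable name and the function name are made up for this sketch, and it defaults to staging so that a missing setting can't burn through production rate limits by accident.

```python
import os

STAGING_DIRECTORY_URL = 'https://acme-staging-v02.api.letsencrypt.org/directory'
PRODUCTION_DIRECTORY_URL = 'https://acme-v02.api.letsencrypt.org/directory'


def pick_directory_url(environ=None):
    """Return the staging URL unless ACME_ENV (hypothetical name) is 'production'."""
    environ = os.environ if environ is None else environ
    if environ.get('ACME_ENV', 'staging').lower() == 'production':
        return PRODUCTION_DIRECTORY_URL
    return STAGING_DIRECTORY_URL


DIRECTORY_URL = pick_directory_url()
```

Keep in mind that staging and production accounts are separate, so an account registered against one directory won't exist in the other.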
The first step is creating that account:
def register_account():
    """ standalone method to register a new LE account """
    # Create a new account key
    print("Generating a user key")
    user_key = josepy.JWKRSA(key=rsa.generate_private_key(public_exponent=65537, key_size=2048, backend=default_backend()))
    # Register the account and accept TOS
    net = ClientNetwork(user_key, user_agent=USER_AGENT)
    directory = ClientV2.get_directory(DIRECTORY_URL, net)
    acme_client = ClientV2(directory, net=net)
    # Terms of Service URL is in acme_client.directory.meta.terms_of_service
    # Creates account with contact information.
    email = ('your-email-address@example.com')
    account_resource = acme_client.new_account(messages.NewRegistration.from_data(email=email, terms_of_service_agreed=True))
    return account_resource
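You only need to create the account once, so once you have it, be sure to store the account_resource object in some kind of secrets vault so that you can access it again. Store the private account key (user_key above) alongside it: request_cert() below rebuilds the key with josepy.JWKRSA.fields_from_json(), and the private key isn't part of the registration that Let's Encrypt returns.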
Next comes the actual certificate request. In my version, I'm passing in keyword arguments containing the name of the domain, the Route 53 hosted zone id, and the account resource.
def request_cert(**kwargs):
    """ returns either [] of AWS secrets arns or an error string """
    domain = kwargs['domain'].lower()
    hostedzone = kwargs['zone']
    account_key = kwargs['account_resource']
Generate a user key based on the account resource and instantiate the acme client:
    user_key = josepy.JWKRSA.fields_from_json(account_key)
    network = ClientNetwork(user_key)
    directory = messages.Directory.from_json(network.get(DIRECTORY_URL).json())
    acme_client = ClientV2(directory, network)
    reg = messages.NewRegistration(key=user_key.public_key(), only_return_existing=True)
    response = acme_client._post(directory['newAccount'], reg)
    regr = acme_client._regr_from_response(response)
    account = acme_client.query_registration(regr)
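Create a private key and a certificate signing request for the domain. (new_csr() is a helper that isn't shown in this post; the HTTP-01 example in the certbot repo has a similar function, new_csr_comp().)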
    pkey_pem, csr_pem = new_csr(domain)
Use the CSR to generate a certificate. The first step in that process is requesting the order - this returns an object that contains several types of challenges, including HTTP-01 and DNS-01:
    order_object = acme_client.new_order(csr_pem)
You can see my code for get_dns_challenge() at the bottom of this post - it's a simple method that extracts the DNS challenge from a collection of challenges in the order object:
    dns_challenge_object = get_dns_challenge(order_object)
Then the validation process kicks off - from response_and_validation(), the validation object is the converted token that must be written to your domain's DNS records:
    response, validation = dns_challenge_object.response_and_validation(acme_client.net.key)
My code for updating the DNS record is also below - this is specific to AWS and the boto3 library:
    ready_to_validate = update_dns(validation, domain, hostedzone)
I'll show it in detail at the end of this post, but in a nutshell, it creates a TXT record named _acme-challenge.{domain}. That record uses the validation token as the value, then the method sleeps for a couple of minutes to give the new record plenty of time to propagate.
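As a small sketch of how that record name and value are put together (the helper names here are hypothetical, and wildcard handling is something the code in this post doesn't attempt): for a wildcard such as *.example.com, ACME validates at the base domain, so the record still lives at _acme-challenge.example.com.

```python
def challenge_record_name(domain):
    """Fully qualified name of the DNS-01 TXT record for a domain."""
    name = domain.lower().rstrip('.')
    # A wildcard such as *.example.com is validated at the base domain
    if name.startswith('*.'):
        name = name[2:]
    return f'_acme-challenge.{name}.'


def txt_record_value(validation):
    """Route 53 expects TXT record values to be wrapped in double quotes."""
    return f'"{validation}"'


print(challenge_record_name('*.Example.com'))  # _acme-challenge.example.com.
```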
Once we're sure the DNS record is in place, we can ping Let's Encrypt again and let them know it's time to attempt authorization:
    fullchain_pem = ''
    # update_dns() returns an error string on failure, which is truthy,
    # so compare against True explicitly
    if ready_to_validate is True:
        challenge_resource = acme_client.answer_challenge(dns_challenge_object, response)
        deadline = datetime.datetime.now() + datetime.timedelta(seconds=180)
        try:
            finalized_order = acme_client.poll_and_finalize(order_object, deadline)
            fullchain_pem = finalized_order.fullchain_pem
        except errors.ValidationError as e:
            print(f'Validation error on {domain}: {e.failed_authzrs}')
A couple of things to note:
The deadline value passed to poll_and_finalize() is optional - that basically just sets a timeout so that we're not waiting too long for Let's Encrypt to respond.
Also, the finalized_order that's returned by poll_and_finalize() contains both a fullchain.pem and a cert.pem. For our purposes, we're storing the fullchain.pem - a combination of cert.pem (the "end-entity certificate") and chain.pem (the intermediate certificate chain) in a single file. Your TLS configuration may require something different, just know that those options are available. As you're testing, I recommend exploring the contents of the finalized_order object to see what it contains.
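If your setup does need the end-entity certificate and the intermediate chain as separate pieces, they can be split back out of the fullchain PEM text. This is a minimal sketch with a made-up function name, assuming the fullchain is a string with the end-entity certificate first:

```python
import re

PEM_CERT_PATTERN = re.compile(
    r'-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----', re.DOTALL)


def split_fullchain(fullchain_pem):
    """Split a fullchain PEM string into (end-entity cert, intermediate chain)."""
    certs = [block + '\n' for block in PEM_CERT_PATTERN.findall(fullchain_pem)]
    if not certs:
        raise ValueError('no PEM certificates found')
    # The first block is the end-entity cert; everything after it is the chain
    return certs[0], ''.join(certs[1:])
```

Calling split_fullchain(fullchain_pem) would give you the equivalents of cert.pem and chain.pem.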
Finally, we're doing another AWS-specific task - storing some objects (the csr_pem and private key pem from the original request, plus the fullchain_pem) in Secrets Manager:
    arns = []
    try:
        pem_list = [
            {'key': 'csr', 'value': csr_pem},
            {'key': 'private_key', 'value': pkey_pem},
            {'key': 'fullchain', 'value': fullchain_pem}
        ]
        arns = store_pems(domain=domain, pems=pem_list)
    except Exception as e:
        print(f"Error storing certificate: {e}")
I'm not posting the code for our store_pems() method here. We're just using create_secret and update_secret from the secretsmanager class in boto3, all well-documented here.
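For illustration only (this is not our production code), a bare-bones version built on those two calls might look something like the sketch below. The extra client argument, which is assumed to be boto3.client('secretsmanager'), and the '{domain}/{key}' naming scheme are both invented for this example.

```python
def store_pems(domain, pems, client):
    """Create or update one secret per PEM and return the list of secret ARNs.

    client is assumed to be boto3.client('secretsmanager'); the secret
    naming scheme is made up for this sketch.
    """
    arns = []
    for pem in pems:
        name = f"{domain}/{pem['key']}"
        value = pem['value']
        # SecretString must be text, and some of the PEMs may be bytes
        if isinstance(value, bytes):
            value = value.decode('utf-8')
        try:
            response = client.create_secret(Name=name, SecretString=value)
        except client.exceptions.ResourceExistsException:
            # The secret already exists (e.g. on renewal), so overwrite it
            response = client.update_secret(SecretId=name, SecretString=value)
        arns.append(response['ARN'])
    return arns
```

Passing the client in as an argument also makes it easy to test with a stand-in object instead of a real AWS connection.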
And here are the other utility methods mentioned above - this one parses the order object to extract metadata for a DNS-01 challenge:
def get_dns_challenge(order_object):
    """Extract the DNS challenge from a collection of challenges"""
    # This object holds the offered challenges by the server and their status.
    authz_list = order_object.authorizations
    for authz in authz_list:
        for i in authz.body.challenges:
            if isinstance(i.chall, challenges.DNS01):
                return i
    print('DNS-01 challenge was not offered by the Certificate Authority server.')
    return False
This next one is specific to AWS - using the route53 class in boto3 to add a TXT record to the domain's DNS. The route53 methods are pretty well-documented, but I still found the call to change_resource_record_sets() a little tricky to construct, so I'm including what I did here:
def update_dns(token, domain, hostedzone):
    """Add the challenge TXT record to DNS"""
    # The credential strings are placeholders - substitute your own
    awsclient = boto3.client('route53',
        aws_access_key_id='{your access key}',
        aws_secret_access_key='{your secret key}')
    recordset = {
        'Name': f'_acme-challenge.{domain}.',
        'Type': 'TXT',
        # TXT values must be wrapped in double quotes
        'ResourceRecords': [{"Value": f'"{token}"'}],
        'TTL': 60
    }
    changeset = {'Changes': [{'Action': 'UPSERT', 'ResourceRecordSet': recordset}]}
    try:
        response = awsclient.change_resource_record_sets(
            HostedZoneId=hostedzone,
            ChangeBatch=changeset
        )
        change_id = response['ChangeInfo']['Id']
    except Exception as e:
        error = f"Error updating DNS: {e}"
        return error
    # Poll until Route 53 reports the change as INSYNC
    while True:
        time.sleep(5)
        response = awsclient.get_change(Id=change_id)
        status = response['ChangeInfo']['Status']
        print("DNS change status:", status)
        if status == 'INSYNC':
            # Extra wait to give the record time to propagate
            time.sleep(120)
            break
    return True
Something we could probably add here is a check using the dnspython library to verify that the TXT record is returning before continuing.
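A rough sketch of what that check might look like is below. It isn't something we're running; the function names, timeout and interval are all assumptions, and it assumes dnspython 2.x, where the lookup call is dns.resolver.resolve(). The lookup is passed in as an argument so the polling logic can be tried out without touching real DNS.

```python
import time


def lookup_txt(name):
    """Return the TXT strings currently published at name (requires dnspython 2.x)."""
    import dns.resolver  # imported lazily so wait_for_txt() has no hard dependency
    try:
        answer = dns.resolver.resolve(name, 'TXT')
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []
    # A TXT record can be split into several byte strings, so join them back up
    return [b''.join(rdata.strings).decode('ascii') for rdata in answer]


def wait_for_txt(name, expected, lookup=lookup_txt, timeout=300, interval=10,
                 sleep=time.sleep):
    """Poll until expected appears in the TXT records, or give up after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if expected in lookup(name):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A call like wait_for_txt(f'_acme-challenge.{domain}.', token) could stand in for the fixed time.sleep(120) in update_dns(). One caveat: this asks your local resolver, which may cache answers, while Let's Encrypt does its own lookups, so a passing check here is a good sign rather than a guarantee.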
And that's it! If there's any interest, I may also go through what we're doing with the Fastly API as a part of this TLS tool, but in the meantime I hope this helps anyone struggling to work with DNS challenges for certificate requests.