Using GitHub Actions to run a Python script

2024-11-25

Hey, it's time for my annual blog post! I hate being one of those people who apologizes for being away for so long, but I'm not going to promise to write more frequently because I know that's not going to happen. Life is too busy, work is too busy. I don't even have time to keep up with social media as much as I'd like, much less to write in long form.

But technical blogs continue to be valuable; software engineering is an ever-changing landscape, and the best way to keep up with it is to learn from each other. I work on challenging projects all the time, projects that I think other engineers would benefit from hearing about. This one just happens to be small and digestible enough to write about in an afternoon.

This past week, I wrote a bit of automation that I'm pretty happy with. We use public certificates, which we convert to public keys, as part of the authentication system on some of our apps. Those keys rotate periodically, so we needed a scheduled process to pull the current certificates, convert them, and update the dictionary where we store them in Terraform.

This process is composed of two parts:

  • A script that requests the certificates at their public endpoint, compares them to the values in the existing file, and updates the file if there are differences.
  • A GitHub Action workflow that calls the script, generates a pull request if there is a change to the dictionary, and sends a notice to Slack to let my team know the PR exists.

It's the second part that I want to focus on, although I'll show you some code for both. The script itself isn't doing anything out of the ordinary - it's making HTTP requests, doing some string parsing, reading from and writing to files. There are some interesting encoding/decoding steps, but otherwise it's nothing to write home about.

The GHA workflow was what I found fun to work on. We use GHA primarily for managing deployment workflows - extensively, in fact - but not for much else. So this was a chance to do something new, and it's already giving me ideas for other time-saving processes and automation we can implement.

(I'm not here to teach you about GitHub Actions though - that's outside the scope of this post. If you have not worked with GitHub Actions before, I'd suggest going to the GHA docs and walking through the quickstart to get familiar with some of the concepts and terminology. It won't take long, I promise, and this post will be here waiting when you get back.)

To clarify, this workflow lives in the repository where we keep all of our organization's Terraform plans. We need it to run once daily, so the first thing I've done, after giving it a name and adding a workflow_dispatch trigger, is to set a cron schedule (note that GHA cron schedules run in UTC, so this one fires daily at noon UTC):

name: Key Comparison

on:
  workflow_dispatch:
  schedule:
    - cron: "0 12 * * *"

Then I've set a few env vars. The first two are just constants: the first uses values from the workflow run to generate a unique branch name, and the second constructs a URL that will be posted to Slack, so that engineers can follow it back to look at the details of the action run.

The other two vars are tokens whose values have been set at the repository level, to allow interaction with the code and for Slack notifications. (One practical note: pushes and pull requests made with the workflow's automatically provided GITHUB_TOKEN won't trigger other workflows, which is a common reason to use a stored access token like this instead.)

env:
  BRANCH_NAME: keys_update--${{ github.run_id }}-${{ github.run_attempt }}
  DEPLOY_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
  GH_TOKEN: ${{ secrets.GITHUB_ACCESS_TOKEN }}
  SLACK_BOT_TOKEN: ${{ secrets.SLACK_NOTIFICATIONS_TOKEN }}

Next we have two jobs - generate-pull-request and notify-slack. You can follow the steps in the first job to see the setup required to allow GHA to interact with the codebase using git, and to run the script using Python:

  • check out the repository so that a copy of it exists locally on the GHA runner (the virtual machine that runs the workflow)
  • configure git and Python
  • run the Python script
  • if the script results in file changes, cut a branch and generate a pull request

jobs:
  generate-pull-request:
    name: Generate Pull Request
    runs-on: ubuntu-latest
    outputs:
      changes: ${{ steps.git-check.outputs.changes }}
    steps:
      - name: Check out repository to the runner
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: configure git
        run: |
          git config user.name github-actions
          git config user.email github-actions@github.com
          git checkout main
          git fetch origin
      - name: setup python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Run script
        run: python3 .github/scripts/keys.py
      - name: check for changes
        id: git-check
        run: |
          if git diff --quiet; then
            echo "No changes detected, exiting workflow successfully"
            exit 0
          fi
          echo "changes=true" >> $GITHUB_OUTPUT
      - name: continue workflow
        if: steps.git-check.outputs.changes == 'true'
        run: echo "Changes detected, continuing workflow"
      - name: cut a branch
        if: steps.git-check.outputs.changes == 'true'
        run: |
          git checkout -b ${{ env.BRANCH_NAME }}
          git push -u origin ${{ env.BRANCH_NAME }}
      - name: stage changed files
        if: steps.git-check.outputs.changes == 'true'
        run: git add .
      - name: commit changed files
        if: steps.git-check.outputs.changes == 'true'
        run: git commit -m "replacing keys"
      - name: fetch from branch
        if: steps.git-check.outputs.changes == 'true'
        run: git fetch origin ${{ env.BRANCH_NAME }}
      - name: push code to branch
        if: steps.git-check.outputs.changes == 'true'
        run: git push origin ${{ env.BRANCH_NAME }}
      - name: generate pull request
        if: steps.git-check.outputs.changes == 'true'
        run: gh pr create -B main -H ${{ env.BRANCH_NAME }} --title "${{ env.BRANCH_NAME }}" --body "Created by GHA Workflow"
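
One caveat about the check for changes step: git diff --quiet only inspects tracked files, so if your script ever creates a brand-new file, the diff will come back clean and the workflow will stop early. If that's ever a concern, a variant built on git status --porcelain (which lists untracked files too) might look like this:

      - name: check for changes
        id: git-check
        run: |
          if [ -z "$(git status --porcelain)" ]; then
            echo "No changes detected, exiting workflow successfully"
            exit 0
          fi
          echo "changes=true" >> $GITHUB_OUTPUT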

Note that in my case the script and workflow files both live at similar paths relative to the project root. Example:

project/
    .github/
        scripts/
        workflows/

Workflow files must live in /project/.github/workflows/, but you have a lot more options when it comes to where to place your script. You can set a working-directory value on the step and then call the script relative to that path (there's a sketch of that after the next snippet). You can also, if you want to keep things simple, just run the Python script inline from the workflow itself, for example:

      - name: Run a python script inline
        shell: python
        run: |
          import os
          import sys

          print(os.getcwd())
          print("Hello world!")
          sys.exit()
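
And here's roughly what the working-directory approach would look like with the layout above (the step name is arbitrary):

      - name: Run script from its own directory
        working-directory: .github/scripts
        run: python3 keys.py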

Also, while most of these steps use git commands, the pull request is generated using the GitHub CLI. I could have used git request-pull, but I find the CLI call more elegant and easier to configure - a slight personal preference, not a requirement. You get the GitHub CLI for free on GitHub-hosted runners: there's nothing to install or configure, and it just works as long as you've set the GH_TOKEN environment variable (see above).

The second job, notify-slack, sends a notification to our Slack channel once the first job is complete (see needs:). Because I have set the SLACK_BOT_TOKEN env var, I only need to use the slackapi/slack-github-action and pass the channel name at channel-id. The payload is the message sent to the Slack channel - you'll recognize the format if you've worked with programmatic Slack notifications before. You can pass variables and outputs, and do some basic formatting using Slack syntax. (In this truncated version the get-pr step's pr_id output isn't actually consumed; since both steps run in the same job, you could reference it as ${{ steps.get-pr.outputs.pr_id }} in the payload to link the PR directly.)

  notify-slack:
    name: Notify Slack Channel on Job Run
    needs: generate-pull-request
    if: needs.generate-pull-request.outputs.changes == 'true'
    runs-on: ubuntu-latest
    steps:
      - name: Get PR
        id: get-pr
        shell: python
        run: |
          import os
          import subprocess
          pull_request_id = ''
          # -R pins the repository, since this job never checks out the code
          prlist = subprocess.run(
              ['gh', 'pr', 'list', '-R', os.environ['GITHUB_REPOSITORY']],
              capture_output=True, text=True)
          for line in prlist.stdout.split('\n'):
              # Match the branch prefix used in BRANCH_NAME above
              if 'keys_update--' in line:
                  pull_request_id = line.split('\t')[0]
          # Outputs must be written as name=value pairs
          with open(os.environ['GITHUB_OUTPUT'], "a") as f:
              f.write(f"pr_id={pull_request_id}\n")
      - name: Notify Channel
        uses: slackapi/slack-github-action@v1.26.0
        with:
          channel-id: "#my-channel-name"
          payload: |
            {
              "attachments": [
                {
                  "color": "#36A64F",
                  "blocks": [
                    {
                      "type": "header",
                      "text": {
                        "type": "plain_text",
                        "text": ":checkmark: Key Comparison Complete"
                      }
                    },
                    {
                      "type": "section",
                      "text": {
                        "type": "mrkdwn",
                        "text": "*View Deploy:* <${{ env.DEPLOY_URL }}>"
                      }
                    }
                  ]
                }
              ]
            }

With all the pieces in place, this workflow will run on its schedule, but it can also be run manually. You can use the GitHub Actions console: go to your repository > Actions, select the workflow from the left-hand column, and you should see a "Run workflow" option on the right. It's also possible to trigger a run from the command line, using the GitHub CLI:

gh workflow run key_comparison.yaml --ref branch_name

As I said, the Python script itself is far less important - it's not especially complex, but I'll give you a truncated version so that you can see how I'm interacting with certificates and keys, and with files in the repository where this runs.

For simplicity's sake, I'm only using standard library packages. I just didn't want to deal with installs and package management.
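
If you do need third-party packages, it's only one extra workflow step after setup-python - something like this, with a hypothetical requirements path:

      - name: install dependencies
        run: pip install -r .github/scripts/requirements.txt

Anyway, here are the imports: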

import base64
import json
import os
import subprocess
import sys
from urllib.request import urlopen

The URL I'm hitting returns something like this, so it's extremely easy to convert and parse:

{
  "cert-one": "-----BEGIN CERTIFICATE-----\nblahblahblah\n-----END CERTIFICATE-----\n",
  "cert-two": "-----BEGIN CERTIFICATE-----\nblahblahblah\n-----END CERTIFICATE-----\n",
  "cert-three": "-----BEGIN CERTIFICATE-----\nblahblahblah\n-----END CERTIFICATE-----\n",
  "cert-four": "-----BEGIN CERTIFICATE-----\nblahblahblah\n-----END CERTIFICATE-----\n",
  "cert-five": "-----BEGIN CERTIFICATE-----\nblahblahblah\n-----END CERTIFICATE-----\n"
}

api_url = 'https://www.example.com/path/to/publickeys/'

This is the reference to the file where the existing dictionary of cert keys lives:

rootdir = os.getcwd()
key_source_file = rootdir + '/path/to/keyfile/main.tf'

First, generate a list of current keys based on what's in the main branch, to use for comparison. (In my source file, the keys happen to be on the only lines that begin with a double quote, so they're easy to identify.)

try:
    current_keys = []
    with open(key_source_file, "r") as f:
        lines = f.readlines()
    for line in lines:
        if line.lstrip().startswith('"'):
            # The key name sits between the first pair of double quotes
            key = line.lstrip().split('"')[1]
            current_keys.append(key)
except Exception as e:
    sys.exit(e)
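
For context, the dictionary lines being matched look something like this - the values are base64-encoded public keys (these are placeholders):

      "cert-one" = "LS0tLS1CRUdJTiBQVUJMSUMgS0VZLS0tLS0..."
      "cert-two" = "LS0tLS1CRUdJTiBQVUJMSUMgS0VZLS0tLS0..."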

Next, get the keys from the API endpoint and add them to a second list.

with urlopen(api_url) as response:
    body = response.read()
try:
    remote_certs = json.loads(body)
except Exception as e:
    sys.exit(e)
# Python dicts preserve insertion order, so this list mirrors the API response
new_keys = list(remote_certs.keys())

This list will hold items to be replaced - if there are any - in the source file:

tfitems = []

Compare the lists and proceed if there are differences - we want the keys in our file to match what is available at the public endpoint. (Since both lists preserve their source ordering, this comparison also catches reordering, not just additions and removals.)

if current_keys != new_keys:
    for k in new_keys:
        # Generate file pointers for certificate files to be used below
        certfile = f'/tmp/{k}_cert.pem'
        pubkeyfile = f'/tmp/{k}_pubkey.pem'

        # Replace any extraneous line break indicators so that the certificate converts correctly
        cert = remote_certs[k]
        cert_content = cert.replace('\n', '')
        cert_content = cert_content.replace('-----BEGIN CERTIFICATE-----', '-----BEGIN CERTIFICATE-----\n')
        cert_content = cert_content.replace('-----END CERTIFICATE-----', '\n-----END CERTIFICATE-----')

        # Copy the certificate value into a temporary .pem file
        with open(certfile, "w+") as f:
            f.write(cert_content)

        # Create the public key from the certificate.
        # This is the equivalent of running:
        # openssl x509 -pubkey -noout -in cert.pem > pubkey.pem
        with open(pubkeyfile, "w") as f:
            subprocess.run(['openssl', 'x509', '-pubkey', '-noout', '-in', certfile],
                           stdout=f, check=True)

        # Base64 encode the key in preparation for storing it in the Terraform dictionary
        with open(pubkeyfile, "rb") as f:
            keybyte = f.read()  # returns a bytes object
        base = base64.b64encode(keybyte)

        # Decode back to a string so that it can be written into the Terraform file
        base_decode = base.decode()
        itemline = f'      "{k}" = "{base_decode}"\n'

        # Append items - this will be the list used to replace the dictionary in the source file.
        tfitems.append(itemline)
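
As an aside, the temporary files aren't strictly required: openssl reads the certificate from stdin and writes the public key to stdout if you don't pass it file arguments. A sketch of that variant, reusing the cert_content string from the loop above:

result = subprocess.run(
    ['openssl', 'x509', '-pubkey', '-noout'],
    input=cert_content,   # the PEM string is fed to openssl's stdin
    capture_output=True,
    text=True,
    check=True,           # raises CalledProcessError if openssl fails
)
base_decode = base64.b64encode(result.stdout.encode()).decode()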

With the list of new keys assembled, we write the new Terraform file:

# Open key_source_file
with open(key_source_file, "r") as f:
    data = f.readlines()

# Identify and replace dictionary items. enumerate() gives us the correct
# index even if two lines are identical, which data.index() would not.
for num, line in enumerate(data):
    if line.lstrip().startswith('"'):
        try:
            data[num] = tfitems.pop(0)
        except IndexError:
            print(f"No replacement available for line: {line}")

# Overwrite and close the file
with open(key_source_file, "w") as f:
    f.writelines(data)

Have questions or want to chat about this post? Hit me up on Mastodon or Bluesky.