Python Developer and Educator
2012-08-29
If you're looking to get your feet wet when it comes to working with open U.S. government data, I can think of no better place to start than with the Sunlight Labs APIs. They're not kidding when they say that using their APIs is absurdly easy.
Sunlight Labs is a project of the Sunlight Foundation, an organization that has been working for several years to access public government data** - the kind of data that is freely available on state and federal web sites, but that is buried behind a Byzantine series of links or is just poorly formatted for analytical use. Sunlight has done the hard work of finding that data and collecting it, and Sunlight Labs has created the tools that make it accessible for all of us to use.
**(Their other projects include Sunlight Reporting Group, Sunlight Live and the Open House Project.)
Currently they provide five APIs accessible with Python, including:
sunlight-openstates: http://python-sunlight.readthedocs.org/en/latest/services/openstates.html#sunlight-openstates
sunlight-capitolwords: http://python-sunlight.readthedocs.org/en/latest/services/capitolwords.html#sunlight-capitolwords
python-transparencydata: https://github.com/sunlightlabs/python-transparencydata
This script example uses the openstates API to get a count of party affiliations per state.
To use any of the libraries, you'll first need to get an API key:
http://services.sunlightlabs.com/accounts/register/
It only took a few minutes for my key to arrive in the mail. Once you've got it, you have a few options for setting it (I used ~/.sunlight.key):
http://python-sunlight.readthedocs.org/en/latest/index.html#usage
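For the key-file option, here is a minimal sketch of saving the key from Python. The helper name save_api_key and the owner-only file mode are assumptions made for illustration and are not part of the sunlight library; only the ~/.sunlight.key location comes from the setup described above.

```python
import os

def save_api_key(key, path=os.path.expanduser("~/.sunlight.key")):
    """Write the API key to a key file (hypothetical helper)."""
    # os.open lets us request owner-only permissions; the mode is only
    # applied when the file is newly created.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        # Strip stray whitespace picked up when pasting the key.
        f.write(key.strip())
    return path
```

Calling save_api_key("your-key-here") once should be enough; after that the file can simply be left in place.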
Then install the sunlight module (this won't apply to the Transparency Data and Real Time Congress APIs), either with pip install or by checking out the project from GitHub:
http://python-sunlight.readthedocs.org/en/latest/index.html#installation
With all that done, you're ready to go. Let's pop open an interpreter and play around with the given example:
>>> import sunlight
>>> nc_legs = sunlight.openstates.legislators(state='nc')
As you'll see, this returns a list of dicts, each dict containing a lot of publicly available information - such as name, district, office address, party affiliation, and in some cases even a picture - about each state legislator in North Carolina:
>>> nc_legs
[{u'leg_id': u'NCL000242', u'first_name': u'Barbara',
u'last_name': u'Lee', u'middle_name': u'',
u'district': u'12', u'chamber': u'lower', u'url':
u'http://www.ncga.state.nc.us/gascripts/members/viewMember.pl?sChamber=House&nUserID=634',
u'created_at': u'2012-08-10 02:06:05', u'updated_at': u'2012-08-29 02:09:04',
u'email': u'Barbara.Lee@ncleg.net', u'+notice': u'[\xa0Appointed\xa008/06/2012\xa0]',
u'state': u'nc', u'offices': [{u'fax': None, u'name': u'Capitol Office',
u'phone': u'919-733-5995',
u'address': u'NC House of Representatives\n300 N. Salisbury Street, Room 613\n\nRaleigh, NC 27603-5925',
u'type': u'capitol', u'email': None}],
u'full_name': u'Barbara Lee', u'active': True, u'party': u'Democratic',
u'suffixes': u'', u'id': u'NCL000242',
u'photo_url': u'http://www.ncga.state.nc.us/House/pictures/hiRes/634.jpg'},
...
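To make the shape of one record easier to see, here is a small sketch that pulls a handful of fields out of a legislator dict. The function name summarize_legislator is hypothetical, and the sample record is trimmed from the output above; .get() is used because, as comes up later, some keys are not present on every record.

```python
def summarize_legislator(leg):
    """Return a small dict of commonly used fields from one record."""
    offices = leg.get("offices") or []
    return {
        "full_name": leg.get("full_name"),
        "district": leg.get("district"),
        "chamber": leg.get("chamber"),
        "party": leg.get("party"),  # None when the key is absent
        # Phone of the first listed office, if there is one
        "phone": offices[0].get("phone") if offices else None,
    }

# Trimmed copy of the record shown above
sample = {
    "full_name": "Barbara Lee",
    "district": "12",
    "chamber": "lower",
    "party": "Democratic",
    "offices": [{"name": "Capitol Office", "phone": "919-733-5995"}],
}
summary = summarize_legislator(sample)
print(summary["full_name"], summary["party"])
```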
One simple but powerful API call and we've already got so much information at our fingertips. So what can we do with all that data? Well, since the ultimate goal is to get a count of party affiliations per state, let's start by creating a list of state abbreviations. Then for each state in that list, we can make the same API call to get all the legislative data, and write a subset of that data - the state, the representative's full name, and their party affiliation - to a new dict.
states = ["AL", "AK", "AZ", "AR", ...]

def find_state_reps():
    # Start by instantiating the new dict:
    statereps = {}
    for s in states:
        legs = sunlight.openstates.legislators(state=s)
        # If you print 'legs', you'll see a list of dicts with loads of
        # contact information for each state representative.
        # For my purposes, I'm only collecting name and
        # party affiliation.
        # This dict will hold {name:party} pairs for each state
        l = {}
        for leg in legs:
            name = leg['full_name']
            try:
                party = leg['party']
            except KeyError:  # In some cases, 'party' is missing
                party = None
            l[name] = party
        statereps[s] = l
    # At this point, the 'statereps' dict contains:
    # {'state':{'rep_name':'party_affiliation'}}
    # for each state.
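The same loop can be sketched with the API call passed in as a function, which makes it possible to try the logic without a key or a network connection. The names build_statereps and fake_fetch are hypothetical, and the records returned by fake_fetch are made up for illustration.

```python
def build_statereps(states, fetch):
    """Return {state: {full_name: party}}, calling fetch(state) per state."""
    statereps = {}
    for s in states:
        # .get() yields None when 'party' is absent, which has the same
        # effect as catching KeyError.
        statereps[s] = {leg["full_name"]: leg.get("party") for leg in fetch(s)}
    return statereps

def fake_fetch(state):
    """Stand-in for the API call, returning made-up records."""
    return [
        {"full_name": "Example Person A", "party": "Democratic"},
        {"full_name": "Example Person B"},  # no 'party' key
    ]

reps = build_statereps(["NC"], fake_fetch)
print(reps)
```

With the real library, the second argument would be something like lambda s: sunlight.openstates.legislators(state=s). One thing to keep in mind with either version: because names are used as dict keys, two legislators in a state with the same full name would collapse into one entry.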
But you know what? Sunlight Labs is providing this API as a free resource, and I don't want to take advantage of their hard work by pounding their servers with a new set of 50 requests every time I run this script. So I'm going to write the dict to a file so that data doesn't have to be pulled from the API again.
outfile = 'state_reps_list.txt'
with open(outfile, 'w') as f:
    f.write(str(statereps))
Now, as I'm developing, I can just check to see if I have that file in place and use the dict from there. And when it's time to refresh the data, I can just delete the file and hit the API again to rebuild the statereps dict from scratch:
import os.path

# If we've already got the list stored in a file,
# just refer to that file
# instead of hitting the API again:
if os.path.exists(outfile):
    # Get the file content and return it as the statereps dict
    with open(outfile, 'r') as f:
        statereps = eval(f.read())
else:
    # Hit the API for the data
    ...
return statereps
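One caution about eval: it will run whatever Python happens to be in the file. A possible alternative, sketched below, is ast.literal_eval from the standard library, which only accepts literals such as dicts, strings, numbers and None. The function name load_or_build and the idea of passing in the API-fetching step as a callable are assumptions made for this sketch.

```python
import ast
import os

def load_or_build(path, build):
    """Return the dict cached at path, or call build() and cache the result."""
    if os.path.exists(path):
        with open(path, "r") as f:
            # literal_eval parses Python literals only, so code placed in
            # the file is rejected rather than executed.
            return ast.literal_eval(f.read())
    data = build()
    with open(path, "w") as f:
        f.write(repr(data))
    return data
```

Deleting the file still forces a rebuild on the next run, just as before.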
My statereps dict looks something like this, but obviously contains a lot more data:
{
'WA': {u'Bruce Chandler': u'Republican', u'Derek Kilmer': u'Democratic', ...},
'WV': {u'Mike Green': u'Democratic', u'Mark Wills': u'Democratic', ...},
...
}
Now I can pass that data into another function that returns the summary count of party affiliations among state legislators per state (e.g., state: dems=x, repubs=y, other=z):
import re

def partycount(reps_dict):
    partycount = {}
    for s in reps_dict:
        # Create lists to hold the party members on a per-state basis:
        demlist = []
        replist = []
        otherlist = []
        for k in reps_dict[s]:
            # s -> state abbreviation
            # k -> full name
            # reps_dict[s][k] -> party affiliation
            if reps_dict[s][k]:
                # Use the re module to determine if either of these strings
                # appears in the party affiliation value
                dem = re.search('Dem', reps_dict[s][k])
                rep = re.search('Repub', reps_dict[s][k])
                # And funnel those values into the appropriate lists
                if dem:
                    # If the legislator's party affiliation contains the substring 'Dem',
                    # add their name to the 'dem' list:
                    demlist.append(k)
                elif rep:
                    # If the legislator's party affiliation contains the substring 'Repub',
                    # add their name to the 'rep' list:
                    replist.append(k)
                else:
                    # If neither substring appears in the legislator's party affiliation,
                    # add their name to the 'other' list
                    otherlist.append(k)
        c = {}
        # Get the length of each list and you have a count of
        # dems vs. repubs vs. other for this state:
        c['Democrats'] = len(demlist)
        c['Republicans'] = len(replist)
        c['Other'] = len(otherlist)
        partycount[s] = c
    return partycount
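Since only the counts are kept in the end, the same tally could also be sketched with collections.Counter and plain substring tests instead of lists and re. The names party_bucket and tally are hypothetical, and the sample input is made up; the matching rules ('Dem' first, then 'Repub', otherwise other, with empty parties skipped) are meant to mirror the function above.

```python
from collections import Counter

BUCKETS = ("Democrats", "Republicans", "Other")

def party_bucket(party):
    """Map a party string to one of the three bucket names."""
    if "Dem" in party:
        return "Democrats"
    if "Repub" in party:
        return "Republicans"
    return "Other"

def tally(reps_dict):
    """Return {state: {bucket: count}} for a {state: {name: party}} dict."""
    result = {}
    for state, reps in reps_dict.items():
        # Legislators with no party recorded are skipped entirely.
        counts = Counter(party_bucket(p) for p in reps.values() if p)
        # A Counter reports 0 for keys it has not seen, so every
        # bucket appears in the output.
        result[state] = {b: counts[b] for b in BUCKETS}
    return result

# Made-up input in the same shape as statereps
example = {"XX": {"A": "Democratic", "B": "Republican", "C": "Independent", "D": None}}
print(tally(example))
```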
And now we've got (yet another) dict that looks like this:
{
'WA': {'Republicans': 64, 'Other': 0, 'Democrats': 83},
'DE': {'Republicans': 22, 'Other': 0, 'Democrats': 40},
'DC': {'Republicans': 0, 'Other': 2, 'Democrats': 10},
'WI': {'Republicans': 74, 'Other': 1, 'Democrats': 55},
...
}
Before I return that partycount dict, I can insert this somewhat ugly bit of code into the function to generate an HTML page with all that data embedded in a table:
# This count data could just as easily be output as
# a template context object, or printed to stdout
output = "<html><body><table>"
output += "<tr><td><b>STATE</b></td><td><b>Republicans</b></td>"
output += "<td><b>Other</b></td><td><b>Democrats</b></td></tr>"
# Dict key order isn't guaranteed, so name the columns explicitly
# to keep each count under its matching header:
columns = ['Republicans', 'Other', 'Democrats']
# Not defined anywhere in this excerpt; initialised here (an assumption)
# so that the append below has a list to work with:
percentlist = []
# Let's sort the keys while we're at it,
# so the states appear in alphabetical order:
for key in sorted(partycount):
    output += "<tr><td align='center'>%s</td>" % (key)
    for k in columns:
        output += "<td align='center'>%s</td>" % (partycount[key][k])
        percentlist.append(partycount[key][k])
    output += "</tr>\n"
output += "</table></body></html>"
with open('redvblue.html', 'w') as f:
    f.write(output)
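As the comment at the top of that snippet says, the counts could just as easily go to stdout. A rough sketch of that option follows; the function name format_rows and the pipe-separated layout are choices made for this example, and the sample counts are two of the states from the dict shown earlier.

```python
def format_rows(counts, columns=("Republicans", "Other", "Democrats")):
    """Return one 'STATE | n | n | n |' line per state, sorted by state."""
    lines = ["STATE | " + " | ".join(columns) + " |"]
    for state in sorted(counts):
        # Look each column up by name so values line up with the header.
        cells = [str(counts[state][c]) for c in columns]
        lines.append(state + " | " + " | ".join(cells) + " |")
    return lines

counts = {
    "WA": {"Republicans": 64, "Other": 0, "Democrats": 83},
    "DE": {"Republicans": 22, "Other": 0, "Democrats": 40},
}
print("\n".join(format_rows(counts)))
```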
One other thing - I can also take that first statereps dict and convert it to JSON - that might be handy for doing visualizations down the road:
import simplejson as json

def converttojson(reps_dict):
    """
    Take a dict object and convert it to JSON
    """
    result = json.dumps(reps_dict, sort_keys=False, indent=4)
    return result
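The standard library's json module provides the same dumps call, so a version without the simplejson dependency might look like the sketch below. Passing sort_keys=True is a choice made here (the function above passes False) so that repeated runs produce identical text; the function name to_json is hypothetical and the sample data is trimmed from the statereps excerpt shown earlier.

```python
import json

def to_json(reps_dict):
    """Serialize a {state: {name: party}} dict to indented JSON text."""
    return json.dumps(reps_dict, sort_keys=True, indent=4)

sample = {
    "WV": {"Mike Green": "Democratic"},
    "WA": {"Bruce Chandler": "Republican"},
}
text = to_json(sample)
# Loading the text back gives an equal dict, so nothing is lost on the way.
restored = json.loads(text)
print(text)
```

A party value of None comes out as null in the JSON and is read back as None.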
Some resources for doing visualizations with the resulting JSON object:
Here are a few more things that I could see adding to this script:
datetime.datetime.fromtimestamp(os.path.getmtime(outfile))
Get unemployment data (source: US Department of Labor, Bureau of Labor Statistics) and compare on a per-state basis to see if there is any correlation between unemployment rates and dominance of any particular party at the state level:
http://www.bls.gov/web/laus/laumstrk.htm
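The modification-time expression in the list above could be used to say how fresh the cached data is. A minimal sketch, assuming the cache is a single file on disk; the function name data_freshness and the wording of the label are illustrative choices.

```python
import datetime
import os

def data_freshness(path):
    """Describe when the file at path was last written."""
    modified = datetime.datetime.fromtimestamp(os.path.getmtime(path))
    # Format as e.g. 2012-08-29 11:48:26 (local time)
    stamp = modified.strftime("%Y-%m-%d %H:%M:%S")
    return "State legislative data current as of " + stamp
```

Something like data_freshness(outfile) could then be appended to the page that the script generates.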
My complete script, minus the changes mentioned above (which I have already implemented locally) can be found here:
https://gist.github.com/3501470
And incidentally, here's that table output:
STATE | Republicans | Other | Democrats |
AK | 34 | 0 | 26 |
AL | 87 | 1 | 51 |
AR | 61 | 0 | 74 |
AZ | 61 | 1 | 28 |
CA | 42 | 1 | 77 |
CO | 48 | 0 | 52 |
CT | 66 | 0 | 121 |
DC | 0 | 2 | 10 |
DE | 22 | 0 | 40 |
FL | 109 | 0 | 50 |
GA | 147 | 1 | 88 |
HI | 9 | 0 | 67 |
IA | 84 | 0 | 66 |
ID | 85 | 0 | 20 |
IL | 78 | 0 | 99 |
IN | 97 | 0 | 53 |
KS | 123 | 0 | 41 |
KY | 63 | 1 | 74 |
LA | 82 | 2 | 60 |
MA | 37 | 0 | 161 |
MD | 55 | 0 | 133 |
ME | 97 | 3 | 85 |
MI | 90 | 0 | 58 |
MN | 109 | 0 | 91 |
MO | 132 | 1 | 64 |
MS | 95 | 0 | 79 |
MT | 95 | 0 | 54 |
NC | 98 | 0 | 71 |
ND | 106 | 0 | 38 |
NE | 0 | 49 | 0 |
NH | 310 | 0 | 109 |
NJ | 48 | 0 | 72 |
NM | 47 | 1 | 64 |
NV | 26 | 0 | 37 |
NY | 27 | 161 | 24 |
OH | 82 | 0 | 50 |
OK | 100 | 0 | 48 |
OR | 44 | 0 | 46 |
PA | 139 | 0 | 111 |
RI | 18 | 1 | 94 |
SC | 102 | 0 | 67 |
SD | 80 | 1 | 24 |
TN | 85 | 0 | 47 |
TX | 120 | 0 | 61 |
UT | 80 | 0 | 24 |
VA | 87 | 1 | 52 |
VT | 43 | 4 | 133 |
WA | 64 | 0 | 83 |
WI | 74 | 1 | 55 |
WV | 41 | 0 | 93 |
WY | 76 | 0 | 14 |
State legislative data current as of 2012-08-29 11:48:26
Contact: barbara@mechanicalgirl.com