Quick and Dirty Lat/Lng Values from the Google Geocoding API
Python | 2011-10-11 |
This past week, I had to assign lat/long values to a large batch of addresses we had sitting around in our database.
To get those values, I posted to the Google Geocoding API and parsed the resulting XML.
This script isn't particularly polished - I didn't spend much time thinking about the perfect way to grab this kind of information, I just dove in. Anyway, I thought it might be useful as a jumping-off point for anyone else looking to do the same thing with a batch of street addresses. Tweak and adjust to whatever suits your needs; as always, your mileage may vary.
I started with a source file that looks a little like this (a dump from the address table - I actually had about 10K records):
id|location_code|name|address1|city|state|postalcode|country
3394|70A|Arrowhead Stadium|1 Arrowhead Dr.|Kansas city|MO|64129|US
77|ARC|Arco Arena|1 Sports Parkway|Sacramento|CA|95834|US
2074|BW1|Burswood Dome|Great Eastern Highway|Perth|WA|6100|AU
2200|LIH|Limburghal|Jaarbeurslaan 6|Genk|Limburg|3600|BE
3199|12J|London Arena|Limeharbour|London|Eng.|E14 9TH|GB
1471|GDE|Rose Garden|One Center Court Suite 200|Portland|OR|97227|US
The script uses a few standard libraries, plus minidom to parse the XML:
import os, urllib, string, fileinput, sys
from xml.dom import minidom
source = 'venues.txt'
target = 'updates.txt'
def processfile(name):
"""
Parse address data from source file
Generate API requests
Write updates to include lat/lng values
"""
records = []
for line in fileinput.input(name):
if fileinput.isfirstline():
columns = line.split('|')
if not fileinput.isfirstline():
try:
line[1]
# Strip out characters that could cause errors on API request
line = line.replace('\n', '').replace(".", "")
line = line.replace(" ", "+").replace("&", "and")
address = line.split('|')
# In my case, I needed the venue name in its original form for
# inclusion in the UPDATE, so I'm returning spaces to field 3
address[2] = address[2].replace("+", " ")
# Rebuild each address
newaddress = []
newaddress.append(address[3])
newaddress.append(address[4])
newaddress.append(address[5])
newaddress.append(address[6])
newaddress.append(address[7])
newaddress.append(address[2])
# Then append each address back to a new record set
records.append(newaddress)
except IndexError:
print "emptyline"
requesturl = 'http://maps.googleapis.com/maps/api/geocode/xml?address='
for row in records:
# Generate the request URL
# Example:
# http://maps.googleapis.com/maps/api/geocode/xml?address=1+ \
# Arrowhead+Dr,Kansas+city,MO,64129,US&sensor=false
request = '%s%s,%s,%s,%s,%s&sensor=false' %(requesturl,row[0],row[1],
row[2],row[3],row[4])
# Make the request and parse the XML it returns
dom = minidom.parse(urllib.urlopen(request))
try:
# Get values for the 'lat' and 'lng' nodes
lat = dom.getElementsByTagName('lat')[0].firstChild.nodeValue
lng = dom.getElementsByTagName('lng')[0].firstChild.nodeValue
# I also needed to strip the operators out of 2-word city names
row[1] = row[1].replace("+", " ")
# And generate UPDATE statements
update = ('UPDATE venues SET lat="%s", lng="%s" WHERE name="%s"' +
'AND city="%s" AND country="%s";\n') %(lat,lng,row[5],
row[1],row[4])
except IndexError:
# If lat/lng values are not returned in the XML
update = ('/* ZERO_RESULTS WHERE name="%s" AND city="%s" AND ' +
'country="%s" */\n') %(row[5],row[1],row[4])
with open(target, "a") as f:
f.write(update)
return records
def main():
addresses = processfile(source)
if __name__ == "__main__":
main()
You'll notice that I have my UPDATE statements writing to a text file, instead of connecting to a database and updating directly - that's just because I wanted to review the updates by hand before running them against my table. But you could just as easily do something like this in place of the 'update' line:
import MySQLdb as Database
db = Database.connect("myhost", "myuser", "mypasswd", "mydbname")
cursor = db.cursor()
sql = """UPDATE venues SET lat='%s', lng='%s' WHERE name='%s'
AND city='%s' AND country='%s'""" %(lat,lng,row[5],
row[1],row[4])
cursor.execute(sql)
cursor.close()
db.close()