List Comprehension for filtering
In the previous blog post, I wrote about the list of domains I use to check whether a domain looks valid for indexedbygoogle.com. The list comes from http://data.iana.org/TLD/tlds-alpha-by-domain.txt, with many thanks to IANA. They seem to update it regularly, so it will do as a resource for now. Here is the bit of code that downloads the file and turns it into a list. I want to exclude the first line, which is a comment (its first character is a #), and any empty lines.
import urllib2
url = 'http://data.iana.org/TLD/tlds-alpha-by-domain.txt'
domain_file = urllib2.urlopen(url).read()
domain_list = domain_file.split('\n')
DOMAINS = [tld for tld in domain_list if not (tld.startswith('#') or tld == '')]
The interesting bit is in the last line. The if part of the list comprehension will filter out any blanks and any list items that start with a #.
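To see the filter in isolation, here is a minimal sketch that runs the same comprehension on a small made-up string instead of the downloaded file. The sample text and the DOMAINS_ALT name are assumptions for illustration only; the real file is much longer. Note that splitting on '\n' leaves an empty string at the end when the text finishes with a newline, which is one reason the blank check is needed.

```python
# Made-up sample in the same shape as the IANA file: a comment line,
# one TLD per line, a stray blank line, and a trailing newline.
sample = "# sample header comment\nAERO\nCOM\n\nORG\n"

domain_list = sample.split('\n')
# domain_list now also holds the comment, the blank line,
# and a trailing '' produced by the final newline.

# Same filter as in the post: drop comments and empty strings.
DOMAINS = [tld for tld in domain_list if not (tld.startswith('#') or tld == '')]

# An equivalent condition that relies on '' being falsy.
DOMAINS_ALT = [tld for tld in domain_list if tld and not tld.startswith('#')]

print(DOMAINS)
```

Both forms keep the same items; which one reads better is mostly a matter of taste.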
It does not make sense to make this part of the CGI app itself since the code above will run whenever a user uses indexedbygoogle.com. A better alternative might be to pickle DOMAINS to a file and load it on demand, updating the contents of DOMAINS daily or weekly via a cron job.
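The pickling idea could look roughly like the sketch below. The helper names save_domains and load_domains, the file name domains.pickle, and the temporary directory are all hypothetical; a real setup would pick a fixed path that both the cron job and the CGI app can reach.

```python
import os
import pickle
import tempfile


def save_domains(domains, path):
    """Pickle the list of TLDs to path (the cron job would call this)."""
    with open(path, 'wb') as f:
        pickle.dump(domains, f)


def load_domains(path):
    """Load the pickled list back (the CGI app would call this)."""
    with open(path, 'rb') as f:
        return pickle.load(f)


# Hypothetical cache location; a temp directory keeps the example self-contained.
cache_path = os.path.join(tempfile.mkdtemp(), 'domains.pickle')

# Stand-in data; in practice this would be the filtered list from the download.
save_domains(['AERO', 'COM', 'ORG'], cache_path)

DOMAINS = load_domains(cache_path)
print(DOMAINS)
```

With this split, the download and filtering happen only when the cron job runs, and each request pays just for reading a small local file.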
Finally, here it is condensed into a one-liner. Keep in mind that this might not be the best way to write it, since it sacrifices readability and clarity for less code. Clarity should always trump brevity! :)
DOMAINS = [tld for tld in urllib2.urlopen('http://data.iana.org/TLD/tlds-alpha-by-domain.txt').read().split('\n') if not (tld.startswith('#') or tld == '')]
Labels: cgi, indexedbygoogle.com, list comprehension, python

