Sunday, April 12, 2009

List Comprehension for filtering

In the previous blog post, I wrote about a list of domains I use to check whether a domain looks valid for indexedbygoogle.com. The list is retrieved from here: http://data.iana.org/TLD/tlds-alpha-by-domain.txt, with much thanks to them. They seem to update the list on a regular basis, so they'll do as a resource for now. Here is the bit of code used to download the file and turn it into a list. I want to exclude the first line, which is a comment (the first char is a #), and any empty lines.

import urllib2

url = 'http://data.iana.org/TLD/tlds-alpha-by-domain.txt'

domain_file = urllib2.urlopen(url).read()
domain_list = domain_file.split('\n')
DOMAINS = [tld for tld in domain_list if not (tld.startswith('#') or tld == '')]

The interesting bit is in the last line. The if part of the list comprehension will filter out any blanks and any list items that start with a #.

It does not make sense to make this part of the CGI app itself since the code above will run whenever a user uses indexedbygoogle.com. A better alternative might be to pickle DOMAINS to a file and load it on demand, updating the contents of DOMAINS daily or weekly via a cron job.
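Here is a minimal sketch of that pickle idea. The function names (parse_tlds, save_domains, load_domains) and the cache file path are made up for illustration, and the sketch assumes the cron job has already downloaded the text of the list. The cron job would call save_domains and the CGI app would call load_domains.

```python
import pickle


def parse_tlds(text):
    """Return the TLD lines, skipping blank lines and '#' comment lines."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith('#')]


def save_domains(domains, path):
    """Pickle the list to a file (this is the part a cron job would run)."""
    with open(path, 'wb') as handle:
        pickle.dump(domains, handle)


def load_domains(path):
    """Load the pickled list (this is the part the CGI app would run)."""
    with open(path, 'rb') as handle:
        return pickle.load(handle)
```

With this split, the web request never touches the network; it only reads a small local file.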

Finally, here it is condensed into a one-liner. Keep in mind that this might not be the best way to write it, since it sacrifices readability and clarity for less code. Clarity should always trump brevity! :)

DOMAINS = [tld for tld in urllib2.urlopen('http://data.iana.org/TLD/tlds-alpha-by-domain.txt').read().split('\n') if not (tld.startswith('#') or tld == '')]


Wednesday, April 08, 2009

Urlparse can't do everything

One of the things I wanted indexedbygoogle.com to do was discard obvious errors when a URL is entered. For example, site instead of site.com (a weird typo, but bear with me). This would be a very shallow check. Nothing major like downloading the page, checking the header, etc. Just a simple domain check so things like abdd or dodah/post were not processed, i.e., check if the .com or .net (etc.) was missing.

I thought urlparse would do the job. Feed it the URL and it would spit out a tuple breaking the URL into its component parts. But that did not work. It surprised me that urlparse did not handle a domain without the scheme the way I expected.

For example:

>>> from urlparse import urlparse
>>> urlparse('google.com')
('', '', 'google.com', '', '', '')
>>>
The domain should be the second item in the tuple, the network location, but it's the third item, the path. Including the scheme, however, parses the URL correctly.

>>> urlparse('http://google.com')
('http', 'google.com', '', '', '', '')
>>>

Here, the scheme 'http' and the network location 'google.com' are the first and second items in the tuple and things are as they should be.

So I had to include the scheme if I wanted to parse the URL. No problem. A simple line like so solved the issue:
>>> if not url.lower().startswith('http://'): url = 'http://' + url
But at this point, I was nowhere near a solution because a url like 'http://garbage' was also parsed with unsatisfactory results:
>>> urlparse('http://garbage')
('http', 'garbage', '', '', '', '')
I wanted 'garbage' to be recognized as the path and not the network location, since it was missing the top level domain (.com, .net, etc.). But you know what, that is not what urlparse is designed to do. In the end, I had to come up with my own algorithm. I still needed to use urlparse, but with a few extra lines of code thrown in to validate the network location. I did not want to go nuts here. It was enough that the URL had a valid top level domain. For example, 'google.ca' would pass, but 'google.commo' or 'google' would fail.

In the end, I got a list containing all 250+ top level domains from http://data.iana.org/TLD/tlds-alpha-by-domain.txt. I used urlparse to parse the url so that I could isolate the network location into a variable called site . The following bit of code did the rest:
from urlparse import urlparse

DOMAINS = ['com','net','edu','gov','info'] # etc...

def is_domain(url):
    # urlparse returns a 6-item tuple; only the network location is needed here
    scheme, site, a, b, c, d = urlparse(url)
    try:
        domain, tld = site.rsplit('.', 1)
    except ValueError:
        return False # an error here signifies a url without a top level domain.
                     # e.g. google was entered instead of google.com
    if tld.lower() in DOMAINS:
        return True
    return False


And the lesson I learned from all this: Python is explicit. Don't expect the standard library to do something it was not supposed to do. My assumption was that because urlparse parsed the url into its component parts, it would automatically recognize erroneous network locations. But it did not and in the end, I wrote a function that did what I wanted and it was easy and fun.
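For anyone who wants to try the whole thing end to end, here is a rough sketch that combines the scheme fix and the TLD check in one function. It is not the exact code running on the site: the name has_valid_tld is made up, DOMAINS is only a small sample of the real list, the port stripping is an extra, and the import is written so it also works on Python 3, where urlparse lives in urllib.parse.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2, as used in this post

# Sample subset only; the real list comes from the IANA file.
DOMAINS = set(['com', 'net', 'edu', 'gov', 'info', 'ca'])


def has_valid_tld(url, domains=DOMAINS):
    """Shallow check: does the host part of url end in a known TLD?"""
    # Without a scheme, urlparse treats the whole string as the path,
    # so add one before parsing.
    if not url.lower().startswith(('http://', 'https://')):
        url = 'http://' + url
    site = urlparse(url).netloc
    host = site.split(':', 1)[0]       # drop a port such as ':8080' if present
    if '.' not in host:
        return False                   # e.g. 'google' instead of 'google.com'
    tld = host.rsplit('.', 1)[1]
    return tld.lower() in domains


print(has_valid_tld('google.ca'))
print(has_valid_tld('google.commo'))
print(has_valid_tld('http://garbage'))
```

The three calls at the bottom cover the cases from this post: a good domain, a bad top level domain, and no top level domain at all.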


Wednesday, March 25, 2009

indexedbygoogle.com

A friend asked me to create a simple one-page app that would take a webpage URL as input and check whether the almighty search engine has indexed it. It seemed like overkill for a Django app, so I thought I would implement it as a straight-up Python CGI app.

The results are at http://indexedbygoogle.com. I am responsible for the code and Steve did the styling. The results are OK, but I am busy on a version 2.0 that checks for variations of the URL (e.g., if you submit 'example.com/this-is-a-post', I also check the www.example.com/this-is-a-post version, the http://example.com/this-is-a-post version and the http://www.example.com/this-is-a-post version). Believe it or not, a page might be indexed even though the particular URL you entered is not, so I have to check all the variations to tell whether the page has been indexed at all. Why does that matter? Remember, a user is not going to search for a page by its URL but via keywords. The point is to see if your page will be one of the results, and that is why you want to know if it's indexed. This version is still in alpha and not live, though it's mostly working.
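To give an idea of what "variations" means, here is a small sketch that builds the four versions listed above from whatever the user typed. This is not the version 2.0 code itself; the function name url_variants is made up for illustration, and it only knows about http, https and the www. prefix.

```python
def url_variants(url):
    """Return the bare, www, http and http+www versions of url."""
    rest = url.strip()
    # Strip the scheme if the user typed one.
    for prefix in ('http://', 'https://'):
        if rest.lower().startswith(prefix):
            rest = rest[len(prefix):]
            break
    # Strip a leading 'www.' so every input reduces to the same bare form.
    if rest.lower().startswith('www.'):
        rest = rest[len('www.'):]
    return [rest, 'www.' + rest, 'http://' + rest, 'http://www.' + rest]


for variant in url_variants('example.com/this-is-a-post'):
    print(variant)
```

Each variant would then be checked in turn, and the page counts as indexed if any one of them is.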

I also implemented it as a bookmarklet. Just drag this: IndexedByGoogle? to your bookmarks. Click the link on any page to see if it's been indexed.

Feel free to comment on how you like it.

Update 9/22/09: Well, I've since re-coded the webapp into a Django site after all. It was super simple and along the way, I refactored the site a little. Plus, I added an api. No matter how easy, it's easier with Django. :)


Thursday, October 02, 2008

List Comprehension in Python, an example

I'm coding up a little website where a user selects from a list of items (using check boxes) and submits. I'm using Django, by the way (which is great). Because the list is generated and displayed dynamically, the name of each checkbox would be something like vid0, vid1, etc. The POST data, however, will not have keys for the boxes that were left unchecked, and I have no way to tell in advance which keys are in the POST dictionary.

So at first, I wrote the following bit of code. Simple, straightforward, gets the job done.
feed = []
for i in range(max_results):
    feed.append(request.POST.get('vid' + str(i), ''))  # a missing key appends an empty string
    if feed[-1] == '':
        del feed[-1]  # if the most recently appended item is an empty string, delete it
But then, I had a light bulb moment and coded this:
feed = [request.POST.get('vid' + str(i)) for i in range(max_results) if request.POST.get('vid' + str(i), '') != '']
One line of code now replaces 5! How awesome is that! Python rocks.

There might be even simpler and more elegant solutions out there. If so, please leave them in the comments. (Maybe something using a lambda?)
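For what it's worth, here are two possible alternatives as a rough sketch. A plain dict stands in for request.POST so the snippet runs outside Django, and the function names are made up. The first looks up each key only once; the second is the lambda flavor, using filter and map.

```python
def selected_values(post, max_results):
    """Collect the non-empty vidN values, looking up each key only once."""
    values = (post.get('vid' + str(i), '') for i in range(max_results))
    return [value for value in values if value != '']


def selected_values_with_filter(post, max_results):
    """Same result, written with filter, map and lambdas."""
    keys = ['vid' + str(i) for i in range(max_results)]
    looked_up = map(lambda key: post.get(key, ''), keys)
    return list(filter(lambda value: value != '', looked_up))


# A plain dict standing in for request.POST: boxes 0 and 2 were checked.
fake_post = {'vid0': 'first', 'vid2': 'third'}
print(selected_values(fake_post, 4))
print(selected_values_with_filter(fake_post, 4))
```

The lambda version works, but to my eye the list comprehension still reads better.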

BTW, I am a newcomer to Python, having started to code with it just a few months ago.


Getting Started in Corporate Training

Recently, a friend asked my input on getting started in the corporate software training world. I thought my response to him was well thought out and might be beneficial to others (oops, just sprained my shoulder patting myself on the back :) ), so here it is:

An important question to ask yourself is: Are you passionate about Speaking? That's speaking with a capital S, i.e., getting in front of a group of people and talking to them, explaining, presenting, communicating with them, making a connection with them. You've got to be passionate about it to be good at it. It's gotta give you a thrill. You gotta feel like you can do it forever. If it's just one more thing to do to make cash and it does not thrill you, then your audience will be able to tell, the sessions will not be good and you'll get no satisfaction. And then, what's the point?

The most important thing about training is presentation and communication. You need to be able to communicate concepts and ideas. Training is not a pedantic click here, click there, you're a moron for not being able to figure this out kind of thing. Adult learners need to be able to fit the why and the benefits of the software tool into their 'mental paradigm'. Benefits, concepts and ideas need to be explained in a way that is relevant to your audience, hence communication and presentation skills become paramount. A great GREAT presenter/speaker is Steve Jobs. Check out his keynote presentations (on YouTube) for an example of a great communicator.

Keep in mind that if you're training a software app, it helps to be familiar with said app, but (and I can't repeat this enough), presentation and communications skills rule.

As far as job hunting goes, most corporate training gigs are freelance, which means you sign on for a contract that spans a few weeks, or months. This is not a great time for trainer freelancers. The economic environment means that companies are cutting back on training rollouts and as a result, whatever training gigs there are go to veterans with networked resources. I would keep your current job and just take days off for any gigs you get. This has the added benefit of building experience that you can add onto your resume.

To get experience to add to your resume (this is what I did when I started about 18 years ago): I put up fliers and offered individual one-on-one training for about $50 an hour. I was able to add this to my resume as software training. I also volunteered at the public library teaching computer classes (also good on the resume).

As far as software experience is concerned, it's unfortunate that a lot of hiring managers want to see software titles on the resume. Add the usual Office suites and whatever else you can think of that you may have remotely been involved with. The more obscure or 'corporate', the better (like PeopleSoft, Siebel, or even FileMaker).

Here are a few blogs that I like to read that help make me a better trainer. Not all of them are about training per se; some are about presenting or creating training materials, but all of them deal in one way or another with reaching your audience and communicating.
http://headrush.typepad.com/
http://www.presentationzen.com
http://blog.cathy-moore.com/
http://www.articulate.com/rapid-elearning

Hope this helps. Let me know how it turns out. I am not one to tell you not to follow your dreams but in this financial environment, maybe keeping your job and looking for part-time contracting work (training wise anyway) would not be a terrible idea.

Tips:
Join LinkedIn and update your profile so it includes skills and accomplishments that line up with training jobs. For a good idea of what to include, search for training jobs, look at the job responsibilities and candidate requirements, and try to use the same keywords. You'll have job recruiters knocking on your door. Try to become an expert on LinkedIn by responding to training-related questions from the community.

Finally, for job searches, I use simplyhired.com and indeed.com. They both search other job posting websites and aggregate the search results. SimplyHired also integrates with LinkedIn.


FYI, I have not trained in a couple of years as I have become more of a training material author creating user guides and rapid e-learning for my clients.


Monday, September 15, 2008

And her Tina Fey glasses....

Friday, November 16, 2007

Death by PowerPoint



From: thecroaker, 3 months ago





Fighting death by PowerPoint... How to make a presentation and not to bore your audience to death.



Thursday, July 12, 2007

What's Opera, Doc?

A fave. Enjoy.