
Changelog

In which I describe the salient differences between Version Six and the newly released Version Seven of http://demongin.org.


Sunday, 2009-07-26 | demongin.org, Django, Programming

A Foolish Consistency is the Hobgoblin of Little Minds.

Guido van Rossum, PEP8

First, foremost and most obviously, there is the change in appearance.

Secondarily, certain "areas", "aspects" or "features" of the site, depending on which kind of metaphors you like to use when describing websites, will seem to have "been closed", "vanished" or "receded". Beyond that, little will appear to have changed to the casual user.

But, in reality, a great deal has changed. And the best way I can think of to describe what has changed, how it has changed and why it's better this way is to describe typical user/site interaction in Version Six and then describe how things work in the current version of http://demongin.org.

In V6, you would navigate to the root of the domain and apache (the web server) would redirect your browser to the index page, index.py. Index.py was a very simple script. For the heads, this is how it looked:

#!/usr/bin/python

import demongin
import time

# START
start_time = time.time()
demongin.print_header()

# BODY
demongin.print_one_post("index")


# FINISH
gen_time = start_time - time.time()
demongin.print_footer(abs(gen_time))
Basically, for non-pythonistas, the old index.py was a script that called a bunch of functions from a bunch of other scripts. The "import demongin" statement loads the main helper script, "demongin.py", which makes available all of the little functions that handle the various tasks required to display an HTML document. "print_header()", for example, looked like this:
def print_header():
    print """\
Content-Type: text/html\n\n<html><head><title>%s</title>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<meta http-equiv="content-Type" content="text/html" />
<meta name="description" content="Personal weblog of Timothy O'Connell"/>
<meta name="keywords" content="demongin, demon gin, blog, cyberpunk, Timothy O'Connell, TOC, Tyranny Belle, Children's Masterpiece Theatre, CMT, New Athens"/>
<link rel="alternate" type="application/rss+xml" title="RSS" href="http://demongin.org/rss.xml" />
</head><body>\n""" % (project_name,time.strftime("%Y"))
    print '<link rel="stylesheet" href="%s" type="text/css" media="all" />' % main_css
    print """<body><table class=main><tr><td class=buffer rowspan=2> </td>
        <td class=red_vertical rowspan=2> </td><td class=green_vertical rowspan=2> </td>
        <td class=blue_vertical rowspan=2> </td>
        <td class=buffer rowspan=2> </td><td class=body rowspan=2>"""
    print """\
<table width=100%><TR><td class=red width=33%><font class=top_nav>Features</font></td>
<td class=green width=33% align=center><font class=top_nav>Blog</font></td>
<td class=blue width=33% align=right><font class=top_nav>Discography</font></td><tr>"""
    print "<tr><td class=top_nav_left>"
    for item in red_list:
        print item
    print "</td><td class=top_nav_center>"
    for item in green_list:
        print item
    print "</td><td class=top_nav_right>"
    for item in blue_list:
        print item
    print "</td></tr></table>"
Yeah, it was hella ugly. But you can see, in the "print_header" function, the basic MO of the site: some basic header stuff required to display a document that a web browser can read, followed by some meta tags (for SEO stuff) and then the menus that allowed users to navigate the old site by pointing and clicking.

This part:
Content-Type: text/html\n\n
which went before the first "<html>" tag is required for python scripts, when executed by apache, to "return" something that apache can use when they print. And, basically, the whole site was a series of print statements.

Ugly.

And difficult to maintain. Just from looking at "print_header", you can see that my logic and my presentation were pretty hopelessly entangled. Hence the codename "rat's nest" for Version Six of the site: it was an unholy mess back there.
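For contrast, here is a minimal sketch (in modern Python 3, and not code the site ever ran) of the same basic job with the markup kept apart from the logic. The names PAGE_TEMPLATE and build_cgi_response are hypothetical; the only thing taken from the original is the "Content-Type" header followed by a blank line.

```python
from string import Template

# Hypothetical template: the markup lives in one place, with named placeholders.
PAGE_TEMPLATE = Template(
    "<html><head><title>$title</title></head>\n"
    "<body><h1>$title</h1><p>$body</p></body></html>\n"
)


def build_cgi_response(title, body):
    """Return a full CGI response: header, a blank line, then the document."""
    headers = "Content-Type: text/html\n\n"
    return headers + PAGE_TEMPLATE.substitute(title=title, body=body)


if __name__ == "__main__":
    # A CGI script hands its response to the web server by printing it.
    print(build_cgi_response("demongin.org", "Hello, world."), end="")
```

The function only decides what data goes in; how the page looks is entirely up to the template string.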

In addition to the series of print statements that pseudo-dynamically created the site's UI and showed off its content, there was the fact that I wasn't doing anything fancier with the database component than using psycopg2 (a module for python) to connect to a postgresql (PG) database and run queries of varying complexity.

Take, for example, the first part of the function in "demongin.py" that rendered the index or "home" page, depending on whether a user submitted a query string with his http request (e.g. something like "index.py?post=25" where everything in quotes is the http request and everything after the question mark is the query string):
def print_one_post(pkid_or_index):
    if pkid_or_index == "index":
        db = MySQLdb.connect(dbhost,dbuser,dbpass,dbname)
        c = db.cursor()
        c.execute("SELECT pkid,title,date,sub_title,body,quote,category FROM demongin ORDER BY date desc, pkid desc LIMIT 1")
        pkid,title,date,sub_title,body,quote,category = c.fetchone()
    else:
        pkid = int(pkid_or_index)
        db = MySQLdb.connect(dbhost,dbuser,dbpass,dbname)
        c = db.cursor()
        c.execute("SELECT pkid,title,date,sub_title,body,quote,category FROM demongin WHERE pkid like %s" % pkid)
        pkid,title,date,sub_title,body,quote,category = c.fetchone()
Basically, I grabbed the query in another script, decided (based on some very rudimentary logic) whether it was a request for real content or a request for the index, and then threw the string "index" or the post id ("pkid" stands for "primary key id") at this function.

The function then connects to the db and grabs a post, returning data according to the structure of the query (i.e. the "SELECT etc." stuff).
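For the curious, the same two-branch lookup can be sketched a little more compactly. This is an illustration rather than the original code: it swaps in python's built-in sqlite3 module so it can run anywhere, and the helper make_demo_db and its two sample posts are made up. The table and column names are the ones from the query above.

```python
import sqlite3

COLUMNS = "pkid, title, date, sub_title, body, quote, category"


def fetch_post(conn, pkid_or_index):
    """Return the newest post for "index", otherwise the post with that pkid."""
    cur = conn.cursor()
    if pkid_or_index == "index":
        cur.execute(
            "SELECT " + COLUMNS + " FROM demongin "
            "ORDER BY date DESC, pkid DESC LIMIT 1"
        )
    else:
        # The id is passed as a bound parameter instead of being
        # formatted into the SQL string.
        cur.execute(
            "SELECT " + COLUMNS + " FROM demongin WHERE pkid = ?",
            (int(pkid_or_index),),
        )
    return cur.fetchone()


def make_demo_db():
    """Build a throwaway in-memory table holding two made-up posts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE demongin (pkid INTEGER PRIMARY KEY, title TEXT, "
        "date TEXT, sub_title TEXT, body TEXT, quote TEXT, category TEXT)"
    )
    conn.executemany(
        "INSERT INTO demongin VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, "First", "2009-07-01", "", "body one", "", "Programming"),
            (2, "Second", "2009-07-26", "", "body two", "", "Django"),
        ],
    )
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    print(fetch_post(db, "index")[1])
    print(fetch_post(db, "1")[1])
```

Both branches share one connection and one cursor, and a request for a post that doesn't exist simply comes back as None.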

Again, very inefficiently implemented and poorly designed: code like this is inflexible and difficult to maintain on account of a.) the highly idiosyncratic nature of it and b.) how vast the scripts became. "demongin.py", for example, was 644 lines and 2280 words.

Additionally, astute readers will already be wondering about how the database was created and how the posts from the old, pure-HTML versions of the site got into that database. Long story short, I created the databases manually (i.e. by connecting to my PG database and issuing the basic SQL commands that created my tables) and then populated them using a hand-written post editor that I had designed and coded specifically for the purpose.

I had tried, during the creation of Version Six, to script something that would automatically read through the voluminous HTML documents that comprised the old versions of the site and create posts accordingly, but the data was too random and too idiosyncratic to efficiently script a transfer. So I went through something like 600 posts and manually created them in my manually configured PG database.

There are also security issues involved in serving public websites in this very crude way (i.e. by passing a raw query string to python scripts executed by apache). There is a reason that you don't generally see websites doing that. The reason is that when you give any user out there on the Internet permission to pass anything he wants as an argument to scripts that your webserver will faithfully and uncritically execute, you open yourself up to a whole bevy of potential attacks.
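One common defense, sketched here under my own assumptions rather than lifted from either version of the site, is to refuse anything in the query string that isn't exactly what you expect. The function name parse_post_id is hypothetical; the "post" parameter comes from the "index.py?post=25" example above.

```python
import re
from urllib.parse import parse_qs

# Accept only a short run of ASCII digits; everything else is rejected.
_POST_ID = re.compile(r"[0-9]{1,9}")


def parse_post_id(query_string):
    """Return the requested post id as an int, or None to mean "show the index"."""
    values = parse_qs(query_string).get("post", [])
    if len(values) != 1 or not _POST_ID.fullmatch(values[0]):
        return None
    return int(values[0])


if __name__ == "__main__":
    print(parse_post_id("post=25"))
    print(parse_post_id("post=25%20OR%201=1"))
```

Anything that is not a single, purely numeric "post" value falls back to None, so the rest of the script never sees raw user input.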

Again: rat's nest.

But, all things considered, a valuable learning experience: doing things the hard way, while it seldom results in quality work, is still the best way to learn how and, more importantly, why to "work smart" and avail yourself of a.) labor and time saving applications and b.) public standards.

To wit: during one of the various, abortive re-writes of Version Six, I did a fair amount of experimentation with properly structured query strings and read (most of) the RFC for http requests and http queries. During another attempt to re-write the site, I learned how to use an old python project called cgi_app to create HTML "templates" that would allow me to better separate logic from presentation.

Eventually, during the course of the various re-writes, I decided to take a whack at doing the site in django. Django is, for those who don't live and work on the Internet, what we who do live and work on the Internet refer to as a "framework". Frameworks are software designed to help make the development and deployment of complicated websites with lots of "moving parts" a little easier to manage. Frameworks like django are, to gloss the whole category in one clumsy phrase, "labor saving devices".

The best way to describe how the site works within the django framework is, I think, to compare the old site with the new. Take, for instance, the old model of having an "index.py" script that calls other functions. The django approach to this is similar in concept, but much more robust, flexible and scalable.

When you use your browser to make an http request to http://demongin.org, it gets handled intelligently by a file in my django codebase called "urls.py". Basically, what "urls.py" does is evaluate all requests, compare them to a list of recognized URLs and then make a decision about what happens next.
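The "compare and decide" step can be sketched in plain python. This is a toy illustration of the idea, not django's actual resolver, and the names show_post, show_tag, URL_PATTERNS and dispatch are all hypothetical.

```python
import re


# Hypothetical view functions standing in for real views.
def show_post(post_id):
    return "post %s" % post_id


def show_tag(request_tag):
    return "tag %s" % request_tag


# Each entry pairs a regular expression with a view and optional default
# arguments for that view.
URL_PATTERNS = [
    (re.compile(r"^$"), show_post, {"post_id": 0}),
    (re.compile(r"^blog/(?P<post_id>\d+)/$"), show_post, {}),
    (re.compile(r"^tag/(?P<request_tag>\d+)/$"), show_tag, {}),
]


def dispatch(path):
    """Call the first view whose pattern matches, or return None if none do."""
    for pattern, view, defaults in URL_PATTERNS:
        match = pattern.match(path)
        if match:
            kwargs = dict(defaults)
            # Named groups captured from the URL become keyword arguments.
            kwargs.update(match.groupdict())
            return view(**kwargs)
    return None


if __name__ == "__main__":
    print(dispatch(""))
    print(dispatch("blog/25/"))
```

A path that matches nothing in the list returns None here; a real framework would answer with a 404 page instead.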

The current "urls.py" looks like this:
from django.conf.urls.defaults import *

from django.contrib import admin
admin.autodiscover()

from blog.views import LatestEntries

# RSS
feeds = {
    'latest': LatestEntries,
}

urlpatterns = patterns('',
    # Uncomment the admin/doc line below and add 'django.contrib.admindocs' 
    # to INSTALLED_APPS to enable admin documentation:
    # (r'^admin/doc/', include('django.contrib.admindocs.urls')),

    # Uncomment the next line to enable the admin:
    (r'^$', 'demongin.blog.views.index', {'post_id': 0}),
    (r'^admin/(.*)', admin.site.root),
    (r'^feeds/(?P<url>.*)/$', 'django.contrib.syndication.views.feed', {'feed_dict': feeds}),
    (r'^blog/$', 'demongin.blog.views.index', {'post_id': 0}),
    (r'^blog/(?P<post_id>\d+)/$', 'demongin.blog.views.index'),
    (r'^static/$', 'demongin.blog.views.static', {'page_id': 2}),
    (r'^static/(?P<page_id>\d+)/$', 'demongin.blog.views.static'),
    (r'^tag/$', 'demongin.blog.views.tag_page', {'request_tag': 0}),
    (r'^tag/(?P<request_tag>\d+)/$', 'demongin.blog.views.tag_page'),

    (r'^favicon\.ico$', 'django.views.generic.simple.redirect_to', {'url': 'http://demongin.com/favicon.ico'}),

)
Take the first line, for example:
(r'^$', 'demongin.blog.views.index', {'post_id': 0}),
Basically, what happens there is that any request for the "root" or "/" of the site runs a function called "index", which lives in the file "views.py" under the blog application of the demongin site. Additionally, it automatically passes the argument "0" (i.e. the integer zero) to this function as "post_id".

The function in question looks like this:

def index(request, post_id):
    if post_id == 0:
        post = Post.objects.all().order_by('-pub_date')[:1].get()
    else:
        post = get_object_or_404(Post, pk=post_id)

    latest = Post.objects.all().order_by('-pub_date')[:10]    
    tagcloud = Tag.objects.all().order_by('-added_date')[:1].get()
    statics = Static.objects.all().order_by('title')
    return render_to_response(
        "blog.html", 
            {'p': post, 
             't': tagcloud,
             'pages': statics,
             'latest': latest}, 
        context_instance = RequestContext(request)
        )
You don't have to know much (maybe even any) python to see what's happening there. A big part of what makes django a quality project/product is that it conforms to one of the main philosophical tenets of the python language: "readability counts".

The eminently readable code above shows a function called "index" which gets called with two arguments: "request" and "post_id". "request" is the http request itself and "post_id" is the query that the user (or, in the example above, the integer zero) passes to the script. From there, depending on what sort of argument comes in, there's a little bit of very simple logic about what to pull from the database and then the "render_to_response" bit hands all of the information that needs to be retrieved over to the templates.

Before we get to templates, however, let's take a second to compare how the database stuff works. Getting database functionality in django, especially as opposed to the Version Six method of manually creating tables and writing queries, is drop-dead simple.

Another file in the django codebase called "models.py" allows you to write a python class (a "class" is a concept in object-oriented programming; the Wikipedia page on OOP is excellent, if a little dense) which defines database relationships, describes methods of operating on the data in the database and which automatically creates and manages database tables.

The "Post" class in Version Seven's "models.py" looks like this:
class Post(models.Model):
    # Helper functions for the list_display in the admin tool
    def __unicode__(self):
        return self.title
    def get_absolute_url(self):
        """ Used to generate RSS links. Also just a good idea to
        have a function like this laying around. """
        return "http://demongin.org/blog/%s" % (self.id)

    def get_tags(self):
        """ Used to generate comma separated list (in the form of
        a string) of all tags associated with a given post object.
        Used in the admin site. """
        tag_list = [str(item) for item in self.tag.all()]
        return ", ".join(tag_list)
    def get_tags_with_anchors(self):
        """ Used to generate comma separated list (in the form of
        a string) of all tags associated with a given post object.
        used in the http site. """
        tag_list = ['<a href="/tag/%s">%s</a>' % (str(item.id), str(item)) for item in self.tag.all()]
        return ", ".join(tag_list)

    
    def word_count(self):
        """ Used in the administrative interface. """
        return len(self.body.split())
    def day_of_week(self):
        """ Returns the day of the week on which a post was published
        as a string: the templating language doesn't appear to
        allow you to call functions. Boo this man. """
        return self.pub_date.strftime("%A")
        
    
    
    # Helper functions for automatically generating data
    def get_pom():
        """ Returns the current phase of the moon. Gives every new post the current POM. """
        command = ["/usr/games/pom"]   # Debian
        #command = ["/usr/bin/pom"]      # RHEL
        p = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        return p.stdout.read()

    
    # Function descriptions
    get_tags.short_description = 'tags'
    word_count.short_description = 'word count'

    # Fields
    title = models.CharField(max_length=666)
    subtitle = models.CharField(max_length=666)
    quote = models.TextField(blank=True)
    quote_attribution = models.CharField(max_length=200, blank=True)
    body = models.TextField()
    tag = models.ManyToManyField(Tag)

    pom = models.CharField(max_length=255, default=get_pom)
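    # Pass the function itself (no parentheses) so that the default is computed
    # when each post is created, not once when the module is loaded.
    pub_date = models.DateTimeField('date published', default=datetime.datetime.today)
    mod_date = models.DateTimeField('date modified', default=datetime.datetime.today)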
There's a lot going on there, but if you consider that's pretty much all the code that you need to write to get the creating, editing and displaying of blog posts up and running, it's really a very small amount of code to have to write.
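One detail in there is worth spelling out: "default=get_pom" hands django the function itself, without parentheses, and django documents that a callable default is called every time a new object is created. Here is a toy sketch of that distinction in plain python. The Field class below is a stand-in I made up for illustration, not django's implementation, and next_value is a hypothetical default generator.

```python
import itertools


class Field:
    """Toy stand-in for a model field; not django's implementation."""

    def __init__(self, default):
        self.default = default

    def get_default(self):
        # A callable default is invoked on each request for a default;
        # a plain value is simply reused.
        if callable(self.default):
            return self.default()
        return self.default


counter = itertools.count(1)


def next_value():
    """Hypothetical default generator that returns 1, 2, 3, ..."""
    return next(counter)


called_each_time = Field(default=next_value)   # function passed uncalled
computed_once = Field(default=next_value())    # called once, right here
```

Asking "computed_once" for its default always gives back the same value, while "called_each_time" produces a fresh one on every call, which is what you want for something like the phase of the moon.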

Astute readers will have noticed this line:
    tag = models.ManyToManyField(Tag)
This is a reference to another class called Tag. The "ManyToManyField" relationship between "Post" and "Tag" is how the current site handles the relationship between the 30-something categories or "tags" for posts (e.g. "demongin.org", "Programming", "Philosophy", etc.) and individual posts themselves.
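Under the hood, a many-to-many relationship is stored in a third "join" table that pairs post ids with tag ids. The sketch below shows the general idea using python's built-in sqlite3 module. The table names, column names and sample rows are illustrative assumptions, not the tables django generates for this site.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, tag TEXT);
    -- The join table: one row per (post, tag) pairing.
    CREATE TABLE post_tag (
        post_id INTEGER REFERENCES post(id),
        tag_id INTEGER REFERENCES tag(id),
        PRIMARY KEY (post_id, tag_id)
    );
    INSERT INTO post VALUES (1, 'Changelog'), (2, 'Other post');
    INSERT INTO tag VALUES (1, 'demongin.org'), (2, 'Django'), (3, 'Programming');
    INSERT INTO post_tag VALUES (1, 1), (1, 2), (1, 3), (2, 3);
    """
)


def tags_for_post(post_id):
    """Return the tag names attached to one post, ordered by tag id."""
    rows = conn.execute(
        "SELECT tag.tag FROM tag "
        "JOIN post_tag ON post_tag.tag_id = tag.id "
        "WHERE post_tag.post_id = ? ORDER BY tag.id",
        (post_id,),
    )
    return [name for (name,) in rows]


def posts_for_tag(tag_id):
    """Return the titles of every post carrying one tag, ordered by post id."""
    rows = conn.execute(
        "SELECT post.title FROM post "
        "JOIN post_tag ON post_tag.post_id = post.id "
        "WHERE post_tag.tag_id = ? ORDER BY post.id",
        (tag_id,),
    )
    return [title for (title,) in rows]
```

One post can carry many tags and one tag can belong to many posts, and both lookups go through the same join table. With django, "self.tag.all()" does this kind of query for you.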

And that's all the database code I really had to write. Pretty much everything after the "models.py" and "views.py" files is pure dressing/presentation. This, especially as compared with Version Six, creates a stark divide between logic and presentation and makes for a much more manageable and easily modified site.

Django, like the cgi_app project discussed briefly above, uses a templating language. When, for example, a user requests the "index" or "home" page of demongin.org, you'll recall that the "index" function of "views.py" called an html file called "blog.html". This file, "blog.html" is the template used to construct the index page. It looks like this:
{% extends "index.html" %}
{% block sitelinks %}
	{% for link in pages %}
	<a href="/static/{{ link.id }}"> {{ link.title }} </a> |	
	{% endfor %}
{% endblock %}
{% block display_post %}
	demongin.org - {{ p.title }}
	{% autoescape off %}
	<h1>{{ p.title }}</h1>	
	<h2>{{ p.subtitle }}</h2>
	<br>
	<h3>{{ p.day_of_week }}, {{ p.pub_date.date }} | {{ p.get_tags_with_anchors }} </h3>
	<table class=nav>
	<tr>
	  <td><b><a href="/blog/{{ p.get_previous_by_pub_date.id }}">« Previous</a></b></td>
	  <td><b><a href="/blog/{{ p.get_next_by_pub_date.id }}">Next »</a></b></td>
	</tr>
	<tr>
	  <td colspan=2>
	  <hr>
		{% for post in latest %}
		<a href="/blog/{{ post.id }}"><img src="{{ MEDIA_URL }}images/xbullet.gif"> {{ post.title }}</a> <br>
		{% endfor %}
		    <b><a href="/archive">More...</a></b>
	  </td>
	</tr>
	</table>
	{% if p.quote %}
		<center>
		<table class=quote>
		<tr>
	  	  <td> <p>"{{ p.quote }}"</p> </td>
		</tr>
		{% if p.quote_attribution %}
			<tr>
	  	  	  <td class=quoteattribution> <b>{{ p.quote_attribution }}</b></td>
			</tr>
		{% endif %}
		</table>
		</center>
	{% endif %}
	<p>{{ p.body }}</p>
	{% endautoescape %}
{% endblock %}
{% block tagcloud %}
	{% for tag in t.tagcloud %}
		<font style="font-size: {{ tag.size }}em;"><a href="/tag/{{ tag.id }}">{{ tag.tag }}</a></font>
	{% endfor %}
{% endblock %}
Very simple stuff. Mostly it's a bunch of very readable if and for statements that decide a.) whether and b.) how to display various content. That "index.html" file that this file "extends" is the big template file that contains all the aesthetic/presentation stuff that appears on every page (e.g. the cool new marker/tag style logo and that hilarious picture of me and my pained expression).

And, while that file is pretty long, it's nowhere near as long as the integral files of Version Six. It's also much cleaner, as it doesn't have a bunch of database queries and who-knows-what-else going on within it: it's a simple HTML style layout.

"Simple" is a good word to describe working within the framework. "Manageable" is another one I've used a few times in this document. "Extensible" is one that also comes to mind. They're also good words to describe what motivated the change: the lack of simplicity, manageability and extensibility in Version Six is what drove the re-write.

Simplicity, manageability and extensibility are also good words to describe the functional difference between the old site and the new site.