Syntax Highlighting in django

In which I describe how I added codeblocks and syntax highlighting to the new version of the site.


Tuesday, 2009-07-28 | demongin.org, Django, Programming

Vision is the art of seeing the invisible.

Swift

Django has a cool little feature called "template tags" that allows you to use "plug-ins" or create custom "tags" that you can use in your templates to do...whatever it is that you feel like you need to do.

So, for example, a built-in template tag looks like this:

{% for item in list %}
That's the "for" tag. And it does "for" stuff.

Custom template tags are what I used to add the syntax highlighting that you are about to enjoy to demongin.org. What follows will be a rough "how to" that documents the process.

The first thing I did was track down a nicely written, easy to understand template tag that someone else had written. I settled on this one from djangosnippets.org. It looks like this:
from django import template
register = template.Library()

# Pygments: http://pygments.org -- a generic syntax highlighter.
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import get_lexer_by_name, guess_lexer

# Python Markdown (dropped in my project directory)
from codebase.markdown import markdown

# BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/
from codebase.BeautifulSoup import BeautifulSoup

@register.filter
def render(content, safe="unsafe"):
    """Render this content for display."""
    # First, pull out all the code blocks, to keep them away
    # from Markdown (and preserve whitespace).
    soup = BeautifulSoup(str(content))
    code_blocks = soup.findAll('code')
    for block in code_blocks:
        block.replaceWith('<code class="removed"></code>')

    # Run the post through markdown.
    if safe == "unsafe":
        safe_mode = False
    else:
        safe_mode = True
    markeddown = markdown(str(soup), safe_mode=safe_mode)

    # Replace the pulled code blocks with syntax-highlighted versions.
    soup = BeautifulSoup(markeddown)
    empty_code_blocks, index = soup.findAll('code', 'removed'), 0
    formatter = HtmlFormatter(cssclass='source')
    for block in code_blocks:
        if block.has_key('class'):
            # <code class='python'>python code</code>
            language = block['class']
        else:
            # <code>plain text, whitespace-preserved</code>
            language = 'text'
        try:
            lexer = get_lexer_by_name(language, stripnl=True, encoding='UTF-8')
        except ValueError, e:
            try:
                # Guess a lexer by the contents of the block.
                lexer = guess_lexer(block.renderContents())
            except ValueError, e:
                # Just make it plain text.
                lexer = get_lexer_by_name('text', stripnl=True, encoding='UTF-8')
        empty_code_blocks[index].replaceWith(
                highlight(block.renderContents(), lexer, formatter))
        index = index + 1

    return str(soup)
It's about 50-something lines (not counting line breaks) and written in a very straightforward manner.
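The core trick in that filter is the placeholder swap: pull the code blocks out, run everything else through the text processor, then put the blocks back. Here is a minimal sketch of just that idea, using only the standard library and Python 3. It is not the snippet's implementation: the regex stands in for BeautifulSoup, `collapse_spaces` is a made-up stand-in for Markdown, and `html.escape` inside a `<pre>` stands in for Pygments. All the names are hypothetical.

```python
import html
import re

# Matches <code>...</code> (with optional attributes), across newlines.
CODE_RE = re.compile(r"<code\b[^>]*>(.*?)</code>", re.DOTALL)
PLACEHOLDER = '<code class="removed"></code>'


def collapse_spaces(text):
    """Stand-in for Markdown: squash every whitespace run to one space."""
    return re.sub(r"\s+", " ", text)


def render_with_placeholders(content, transform):
    """Hide <code> blocks from `transform`, then restore them in order."""
    blocks = []

    def stash(match):
        # Remember the raw block body and leave a marker in its place.
        blocks.append(match.group(1))
        return PLACEHOLDER

    transformed = transform(CODE_RE.sub(stash, content))

    # Put each saved block back where its marker ended up.
    # Escaping + <pre> is a stand-in for real syntax highlighting.
    saved = iter(blocks)
    return re.sub(
        re.escape(PLACEHOLDER),
        lambda m: "<pre>%s</pre>" % html.escape(next(saved)),
        transformed,
    )


post = "Some   prose.\n<code>if a < b:\n    pass</code>"
print(render_with_placeholders(post, collapse_spaces))
```

The prose gets its whitespace squashed, while the indentation inside the code block survives untouched, which is the whole point of hiding the blocks first.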

Once I had that element in hand, I created a file called "pygmentize.py" (named for Pygments, the module that does the highlighting in the script) in my directory structure. The screencap from Eclipse indicates where in the codebase the file belongs: the templatetags directory of the blog app.

The next thing to do was modify my "settings.py" file to include the templatetag:
INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
    'django.contrib.admin',
    'demongin.blog',
    'demongin.blog.templatetags.pygmentize',
)
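That last line is the new one; I added it so that the script gets included when the server loads everything up to answer requests. (As far as I can tell, Django also finds tag libraries in the templatetags package of any app already listed in INSTALLED_APPS, so the 'demongin.blog' entry together with the {% load %} tag may well be enough on its own.)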

Once that line was added, everything was pretty much ready to go, so I modified the HTML template file that is used to display the index page of the site. At the top of the template, I added this line:
{% load pygmentize %}
and, later in the template, I modified the part where the body of the post is displayed:
<p>{{ p.body|pygmentize }}</p>
Once those changes had been made, I fired up the test server.

...and watched it crash and burn. Apparently, I was experiencing an encoding problem. The debugger spat this out:
'ascii' codec can't encode character u'\u2019' in position 1343: ordinal not in range(128)
and pointed me to the line that was malfunctioning:
soup = BeautifulSoup(str(content))
Basically, the thing was gut-shot from the word "go": BeautifulSoup, the module that takes the post, sorts through it and allows the syntax highlighting to happen, wasn't even getting the post. The call to str() was trying to encode the post's unicode text as plain ASCII bytes, and the curly apostrophe (u'\u2019') has no ASCII equivalent.
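For the curious, the failure is easy to reproduce in isolation. The sketch below is Python 3, which is an assumption on my part (the site's code is Python 2, where str() on a unicode object quietly tries an ASCII encode). In Python 3 you have to ask for the ASCII encoding explicitly, but the error is the same one. The sample text is made up.

```python
# Text containing U+2019 RIGHT SINGLE QUOTATION MARK, the curly apostrophe.
text = "it\u2019s broken"

try:
    # Python 2's str(some_unicode) did this implicitly.
    text.encode("ascii")
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

print(error_message)

# An encoding that can represent the character has no such trouble.
encoded = text.encode("utf-8")
print(encoded.decode("utf-8") == text)
```

The character never had a chance: ASCII only covers code points 0 through 127, and U+2019 is well outside that range.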

After a lengthy period of soul-searching, head-scratching and audible sighing, the solution I improvised was to swap that single line for this try/except block:
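    try:
        soup = BeautifulSoup(str(content))
    except UnicodeEncodeError:
        # Swap known trouble-maker characters for ASCII equivalents.
        punctuation = {
            u'\u2018': "'",
            u'\u2019': "'",
        }
        for k, v in punctuation.iteritems():
            content = content.replace(k, v)
        # Build the soup once, after every replacement has been made.
        soup = BeautifulSoup(content)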
It ain't pretty, but it gets the job done: those rogue quotation marks get swapped out before they can ruin everyone's night, and future crises are averted. It's not a great solution, but it allows me to add more trouble-maker characters in the future, if need be. And I prefer to have that ability.
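The same "table of trouble-makers" idea can be written a little more compactly with a translation table. This is a sketch in Python 3 (again an assumption; it is not what the site runs), and the names `PUNCTUATION` and `asciify_punctuation` are invented for illustration. New characters get added to the dictionary the same way as before.

```python
# Map each trouble-maker character to a plain ASCII replacement.
PUNCTUATION = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}

# str.maketrans turns the dict into a table that str.translate understands.
TABLE = str.maketrans(PUNCTUATION)


def asciify_punctuation(text):
    """Replace every character listed in PUNCTUATION in a single pass."""
    return text.translate(TABLE)


print(asciify_punctuation("\u2018it\u2019s\u2019"))
```

The one practical difference from the loop is that the whole string is walked once, no matter how many entries the table grows to hold.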

Once I figured out that encoding problem, syntax highlighting was good to go. Excepting the not-inconsiderable pain in the ass involved in solving that little encoding problem, a surprisingly small amount of work went into enabling syntax highlighting.

I'm going to continue to work on presentation, I think, in coming days and weeks. This is for two reasons:
  1. I'm still getting used to working with Django and trying to write the kind of Python that you're supposed to write when you write Django apps
  2. I don't want to start adding functionality (e.g. discography, photoblog) until I've got a firmer sense of how and when the existing site will misbehave and fail

More to come.