Apache mod_rewrite: Avoid Redirects by Automatically Swapping TLDs
Using a real-life example, I explain how to avoid using redirects (which mess with Google) by rewriting URLs to intelligently swap .net for .com.
Thursday, 2010-02-18 | On the Internet, Programming, Tyranny Belle
"Why, every one as they like; as the good woman said when she kissed her cow."
Rabelais
The thing about writing/re-writing apache config files is that a.) you're almost always doing it in a "production" environment and b.) you're therefore doing it in a hurry or on the fly, which means that c.) as soon as you get it to do what you want, you're going to leave it how it is and not change it again unless absolutely necessary.
What this means is that if you're going to be the guy who gets stuck tinkering with the apache conf, you get pretty good at it pretty quickly.
I have been working with these things for a few years now and I think I'm pretty good at it. I'm not great (when I'm trying to solve a particular problem, I see a lot of confs on the Internet that look like Sanskrit to me), but I've cultivated some serviceable, all-purpose skills that let me handle most problems quickly and effectively.
Take this morning, for example. I was restoring a site, childrensmasterpiecetheatre.net, to my Production server after an OS reinstall. Once I got all my data in place but before I added the apache conf to make it work, I decided, on a lark, to see if childrensmasterpiecetheatre.com was still in the possession of the nefarious squatters who have been trying to extort $90 out of me since 2004 (when PowWeb, the worst hosting company ever, fucking sold it without my knowledge or consent).
Turns out that it wasn't, so I logged into the famously Protean GoDaddy dashboard, snapped it up, reconfigured the nameservers and decided to write my apache conf around the .com TLD and not the old .net TLD.
So I wrote a (naively, stupidly) simple RedirectPermanent into my conf and restarted the server. The homepage looked fine, but actual requests weren't being handled properly. Take the following URL, for example, which demonstrates how the site dynamically generates URLs:
childrensmasterpiecetheatre.com/?sectionid=12
ServerName childrensmasterpiecetheatre.com
# Typo aliases included here
ServerAlias www.childrensmasterpiecetheatre.com www.childrensmasterpiecetheatre.net childrensmasterpiecetheatre.net www.childrensmasterpeicetheatre.net childrensmasterpeicetheatre.net
# REWRITE INSTRUCTIONS
# no www
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.childrensmasterpiecetheatre\.com$ [NC]
RewriteRule ^(.*)$ http://childrensmasterpiecetheatre.com [R=301]
# typo
RewriteCond %{HTTP_HOST} ^(www\.)?childrensmasterpeicetheatre\.net$ [NC]
RewriteRule ^(.*)$ http://childrensmasterpiecetheatre.com [R=301]
# .net
RewriteCond %{HTTP_HOST} ^(www\.)?childrensmasterpiecetheatre\.net [NC]
RewriteRule ^(.*)$ http://childrensmasterpiecetheatre.com$1 [L]
- ServerName got changed from .net to .com.
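- childrensmasterpiecetheatre.net (i.e. without the www.) got added to the ServerAlias list. This is key: apache only hands a request to this virtual host if the requested host name matches its ServerName or one of its ServerAlias entries. If, for example, I had written a condition against %{HTTP_HOST} ^childrensmasterpiecetheatre.net (i.e. without the www.) and that host wasn't in the ServerAlias list, the request would most likely have been picked up by some other (default) virtual host and my rewrite would never have run.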
- I write more economical conditions by using some basic regex-fu: the ^(www\.)? part of the match regex in the last two of the RewriteCond lines matches requests with or without the hated www..
- The [L] in the last RewriteRule marks it as the last rule: when that rule matches, mod_rewrite stops processing any rules that come after it. Supposedly this is "good" conf writing and optimizes apache.
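To tie those notes together, here is a rough sketch of how the ServerAlias entries, the optional-www regex and the flags fit in one place. It is only an illustration, not the conf running on the site: the `<VirtualHost *:80>` wrapper is my assumption (the listing above doesn't show one), and combining R=301 with L on the typo rule is a variation rather than what the original rule does.

```apache
# Sketch only: assumes a name-based virtual host on port 80.
<VirtualHost *:80>
    ServerName childrensmasterpiecetheatre.com

    # Every host name tested in a RewriteCond below is also listed here;
    # otherwise the request may never reach this virtual host at all.
    ServerAlias www.childrensmasterpeicetheatre.net childrensmasterpeicetheatre.net

    RewriteEngine On

    # (www\.)? makes the "www." prefix optional, so a single condition covers
    # both spellings of the typo host. [NC] ignores case in the host name.
    RewriteCond %{HTTP_HOST} ^(www\.)?childrensmasterpeicetheatre\.net$ [NC]

    # $1 carries the requested path along (the query string is passed through
    # by default). R=301 issues a permanent redirect; L stops mod_rewrite from
    # evaluating any later rules for this request.
    RewriteRule ^(.*)$ http://childrensmasterpiecetheatre.com$1 [R=301,L]
</VirtualHost>
```

With something like this in place, a request for www.childrensmasterpeicetheatre.net/?sectionid=12 should end up at childrensmasterpiecetheatre.com/?sectionid=12 rather than at the bare homepage.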
