Sudo Comic

Jenny sent me a link to this great comic:

See it over at xkcd: sandwich!

Leave It To Apple To Bring Revision Control To The Masses

On August 7, 2006, Apple previewed Mac OS X 10.5 Leopard at WWDC. As usual, some new features generated a lot of excitement while others drew criticism. Time Machine, Apple’s automated backup solution, won acclaim from many people because, as usual, Apple makes the process seem easy and even manages to make it fun with eye candy and an intuitive interface. Others yawned at the app, claiming that it doesn’t deserve any attention because it is just a glorified backup app, something that has been available in Windows for years.

Summing it up as a glorified backup app is an obscene oversimplification. As a developer, I immediately suspected that Time Machine is running on top of a revision control system. That is what really fascinates me about it, and it is also what I feel many people misunderstand about it, leading them to discount it. I’m not alone in making this connection; any developer who uses and understands source code repositories has probably had the same thought. One of the major advantages of revision control is that you can always retrieve a file as it existed at any point in the past when it was committed to the repository.

And it’s not a stretch to see Apple borrowing revision control software to implement Time Machine. Subversion is a very popular revision control project in the open-source community. Mac OS X itself sits on top of a FreeBSD-derived foundation, and Apple has incorporated many other open-source projects and open standards into it. OpenGL, Apache, SSH, FTP, CUPS, and Samba come to mind, just to name a few. Why not add Subversion to the list of excellent technologies to incorporate into an excellent OS?

If Time Machine is running on top of Subversion or a similar revision control system, that would address the concerns raised by some people who might not understand how revision control systems work. Some have worried that a backup drive would not be able to support Time Machine for very long before filling up. For instance, if you have a 1 GB file (perhaps a video clip) and make a very small change to it, will Time Machine record that change by making another copy of the entire file? If Time Machine made its backups as full copies of the changed files, you can imagine how quickly a backup volume would fill up. But that simply isn’t how most revision control software works. Most revision control software, including Subversion, uses delta compression to calculate only the changes made to a file, and then saves only those changes. Thus, a file can be committed to a repository a dozen times, and yet the drive space taken up would only amount to the original file plus the changes made, plus perhaps a very slight amount of overhead in the repository for tracking each change.

So that covers feasibility. With some software like Subversion, Time Machine could provide a backup of your entire volume and actually support it for a decent period of time thanks to efficient delta compression. What about the logistics of finding the changed files and saving those changes? This would be simple even with Subversion in its present state. Out of the box, Subversion provides the ability to search for files that have changed since the last “commit”, or the last time changes were saved. Time Machine would just have to ask Subversion to report all the files that have changed, then run the commit. This could be scheduled to execute at a given time, say, midnight. In the event that the computer was off at the scheduled time, the steps could be executed immediately upon startup.
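
To make the "report what changed, then commit" idea concrete, here is a minimal sketch using stock Subversion commands against a working copy. To be clear, nothing here is known to be how Time Machine works; the function names and the SVN override variable are my own inventions for illustration, and files deleted outside of Subversion are not handled.

```shell
#!/bin/sh
# Hypothetical sketch: snapshot a Subversion working copy, committing only
# when something has changed. SVN can be overridden (e.g. for testing).
SVN=${SVN:-svn}

# Print one status line per new, modified, missing, or deleted item.
changed_items() {
    # $1: path to the working copy
    "$SVN" status "$1"
}

snapshot() {
    # $1: path to the working copy
    wc=$1
    if [ -z "$(changed_items "$wc")" ]; then
        echo "no changes in $wc"
        return 0
    fi
    # Schedule unversioned files for addition; --force lets "svn add"
    # recurse into directories that are already under version control.
    "$SVN" add --force --quiet "$wc" || return 1
    # Commit everything that is scheduled or modified.
    "$SVN" commit --quiet -m "Automatic snapshot $(date +%Y-%m-%dT%H:%M:%S)" "$wc"
}

# Usage (not run here): snapshot /path/to/working-copy
```

Scheduling `snapshot` nightly, or at startup if the machine was off, would cover the scenario described above.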

In the past, I’ve thought how nice it would be if all the files on my computer could go under revision control just like my source code when I’m working on an application. It would be just Apple’s style to make that possible not just for tech geeks who use Subversion, but to make it possible for anyone, even the guy who doesn’t care a lick about understanding revision control.

The Mysterious Vanishing WordPress Posts

Three of my recent posts from August 2006 have been mysteriously cut off at the knees (two articles about SELinux and one article about Apple releasing the Mac Pro). The opening sentences remained, but each article body was truncated mid-sentence, at a different point in each post. One of the posts was particularly lengthy, and naturally I didn’t have a backup of it in any form. That is extremely disappointing.

To prevent the agony of this kind of loss, I could (a) set up a WordPress scheduled task (with plugins that provide such functionality) to back up the database on a regular basis, or (b) back up the database manually after I post an article. As a different approach, and the most fun because it involves programming, I could (c) set up a scheduled task on my server at home to pull the RSS feed from my site on a daily basis and save that.

My server at home is a Linux box (currently Fedora Core 4), so a quick little Linux script is the best way to go. This is exceptionally easy, so let’s take a look:

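# Dated backup file, e.g. nazin-2006-09-07.xml (--iso-8601 is a GNU date option)
fn=/backuppath/rss/nazin-`date --iso-8601=date`.xml
url=http://blog.nazin.com/index.php/feed/
# Fetch the feed and write it to the backup file
curl -o "$fn" "$url"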

This obviously could be a one-liner, but to keep it readable, I put the backup file path and the URL of the RSS feed in script variables. The first line says, “Build a path in the rss backup directory for a file called ‘nazin-yyyy-mm-dd.xml’, using today’s date.” If you are new to Linux scripting, anything wrapped in ` symbols is executed as a command and replaced with its output. So “nazin-`date --iso-8601=date`.xml” will actually become “nazin-2006-09-07.xml” if that is today’s date. The second line simply assigns the value of the url variable. The third line is then a basic curl call. It says, “Fetch $url and put the output in the file at $fn.”

I then just set up that script to run as a scheduled cron job, and we’re in business! A quick note about the path: you can use a relative path, and the script will work fine when you execute it at a shell prompt, but it may fail as a cron job, because cron typically runs jobs from your home directory rather than the directory you tested in. To be safe, provide an absolute path so that it works in both places.

Don’t leave backups to humans. We’re too unreliable. Leave it to your server to handle. :-)
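
If you want the cron job to be a little more defensive, here is one possible variant. It is only a sketch: the function name, the CURL override variable, and the script path in the crontab comment are assumptions of mine, and it uses the portable `date +%Y-%m-%d` instead of the GNU-only option above. It downloads to a temporary file first so that a failed or empty download never replaces a good backup.

```shell
#!/bin/sh
# Hypothetical hardened variant of the backup script, suitable for cron.
# CURL can be overridden (e.g. for testing).
CURL=${CURL:-curl}

backup_feed() {
    # $1: feed URL, $2: absolute path of the backup directory
    url=$1
    dir=$2
    fn="$dir/nazin-$(date +%Y-%m-%d).xml"
    tmp="$fn.part"
    # --fail turns HTTP errors into a non-zero exit status;
    # --silent --show-error keeps cron mail quiet unless something breaks.
    if "$CURL" --fail --silent --show-error -o "$tmp" "$url" && [ -s "$tmp" ]; then
        mv "$tmp" "$fn"    # keep only complete, non-empty downloads
        echo "$fn"
    else
        rm -f "$tmp"
        echo "backup of $url failed" >&2
        return 1
    fi
}

# Example crontab entry (daily at midnight; the script path is an assumption):
# 0 0 * * * /home/user/bin/backup-feed.sh
```

A wrapper script would then just call `backup_feed http://blog.nazin.com/index.php/feed/ /backuppath/rss`.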

Organizing a Source Code Repository for Web Development

When you develop applications, you may use a revision control system like Subversion. If you are new to it, using revision control for web development may feel foreign at first, and understandably so; source code revision control was established before the web paradigm existed. Nevertheless, the repository organization of traditional application development can serve our web development well; conversely, we can also use the repository in new ways that really benefit us on the web platform.

Put Those Development Files in Version Control!

To give yourself the best organization within your web application project’s repository, be sure to take all aspects of your development into consideration, not just the organization of the deployable code itself. For instance, just like in compiled application development, you may have some work files that ought to be in the repository: Photoshop images, Flash project files, uncompressed audio or video, and similar development files. The same goes for unit testing code, which need not be deployed on your production server.

There’s nothing worse than realizing that you no longer have these development files when the time comes to make some requested changes. It is definitely wise to include them in your repository. So why not have a subdirectory structure under trunk separating your dev files from your web files? Perhaps something like this:

  • trunk
    • dev
    • www
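
As a rough sketch, the layout above could be created directly in the repository with `svn mkdir`. The repository URL and the `run` and `create_layout` helper names are placeholders of mine, and the script defaults to a dry run that only prints the command, so nothing is touched until you set DRY_RUN=0.

```shell
#!/bin/sh
# Hypothetical sketch: create trunk/dev and trunk/www in a single commit.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"    # display only; quoting of arguments is not preserved
    else
        "$@"
    fi
}

create_layout() {
    # $1: repository root URL (an assumption; substitute your own)
    repo=$1
    # --parents also creates trunk itself if it does not exist yet.
    run svn mkdir --parents -m "Create initial project layout" \
        "$repo/trunk/dev" "$repo/trunk/www"
}

# Usage (not run here): create_layout http://svn.example.com/repo
```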

Keep Them Separated 

Furthermore, in some cases, you may have one web application, but portions of it reside on different websites. For instance, perhaps you have a public-facing web application on your company’s public website, but the administrative portion resides on your intranet site. Or perhaps the web application is just one module of a larger project whose technology reach is more broad than just the web. It would be wise to have that clearly defined in your repository structure as well.

Multisite Example.

  • trunk
    • dev
    • www
    • intranet 
 - or -
  •  trunk
    • dev
    • www
      • public
      • intranet

Multitech Example (VB and web app combination).

  • trunk
    • dev
    • www
    • vb 
 - or -
  •  trunk
    • dev
    • vb
    • www
      • public
      • intranet

Clearly Mark the Live Release 

One aspect of web application development that is really handy with revision control systems like Subversion is that you can also use the revision control system to help with deployment of the application. In fact, you can even use it to help keep track of which version of the app is a “live” release.

In order to accomplish this, you can check out your web application directly onto your webserver. This can be difficult if you only have FTP access to your server, although it can be done with utilities that map FTP connections to local paths on your computer. However, if you have command-line access or network access to your server, checking out files from the repository directly onto your server should be very similar to checking out files on your workstation. Please note that checking out a working copy of your repository onto your webserver is a security risk unless you supply your webserver a directive to disallow read access to the .svn directories that Subversion creates in the working copy. The Website Releases via Subversion article at RedBalloon Labs explains how to do this nicely.
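
Whatever directive you end up using, it is worth checking from the outside that it actually works. Here is a small, hedged sketch that asks the webserver for `.svn/entries` and reports whether it was served; the function name and the CURL override variable are made up for this example, and a 200 response is only one possible sign of exposure.

```shell
#!/bin/sh
# Hypothetical check: is the .svn directory of a deployed working copy
# readable over HTTP? CURL can be overridden (e.g. for testing).
CURL=${CURL:-curl}

svn_dir_exposed() {
    # $1: base URL of the site, without a trailing slash
    code=$("$CURL" --silent --output /dev/null \
        --write-out '%{http_code}' "$1/.svn/entries")
    # Succeeds (exit status 0) only if the server served the file.
    [ "$code" = 200 ]
}

# Usage (not run here):
# if svn_dir_exposed http://www.example.com; then
#     echo "WARNING: .svn is publicly readable" >&2
# fi
```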

So, what is the benefit of checking out a working copy onto your server? If you have a QA server you will be testing your code on, then perhaps you will check out the web application in the trunk directory. Once you’ve checked out the code, all you need to do is run an update against the working copy on the server to update it to the latest trunk code!
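
In command form, the QA workflow just described might look like the sketch below. The repository URL, the document root, and the helper names are assumptions for illustration, and the script defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Hypothetical sketch: keep a QA server's document root on the latest trunk.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

qa_checkout() {
    # $1: repository root URL, $2: document root on the QA server
    run svn checkout "$1/trunk" "$2"
}

qa_update() {
    # $1: document root on the QA server (an existing working copy)
    run svn update "$1"
}

# Usage (not run here):
# qa_checkout http://svn.example.com/repo /var/www/qa   # once
# qa_update /var/www/qa                                 # after each commit
```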

That’s great for testing purposes, but how can we garner the same benefit for our production server? When testing is completed, you will likely tag a version of trunk in the tags directory. At this point, you could check out the version in the tags directory onto your production server. However, this course has a couple of caveats: (a) it assumes you will always have the latest tagged release on your production server, and (b) it will require a switch command to migrate the server code to each new version. If you’re fine with these considerations, then by all means, check out the latest version from the tags directory and consider yourself done.
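
For reference, the tag-and-switch route might be sketched like this. The URLs, paths, version numbers, and helper names are placeholders of mine, and the script defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# Hypothetical sketch: tag a release and point a production working copy at it.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

tag_release() {
    # $1: repository root URL, $2: version name, e.g. v1.0.1
    run svn copy -m "Tag release $2" "$1/trunk" "$1/tags/$2"
}

deploy_tag() {
    # $1: repository root URL, $2: version name, $3: production document root
    # The document root must already be a working copy of an earlier tag.
    run svn switch "$1/tags/$2" "$3"
}

# Usage (not run here):
# tag_release http://svn.example.com/repo v1.0.1
# deploy_tag  http://svn.example.com/repo v1.0.1 /var/www/html
```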

I, on the other hand, am not comfortable with those conditions. If your company has any sort of delay between the moment technical work has been completed and when it can actually be deployed, then a tagged version may not be the version on your production server, even though it is the latest and greatest tagged code. Certain versions may never even see any time on the production server if they are bug fix releases that are held back until perhaps more bug fixes are made and a later version is deployed as a cumulative release. Furthermore, if someone else needed to redeploy the application, they should be able to know what version to redeploy.

The solution to these problems is to copy the tagged version to a location called “live” or something similar. You could keep the “live” copy in the tags directory, but I found it more straightforward to keep it at the repository’s root level. This will mean your repository’s root might look like this:

  • trunk
  • branches
  • tags
  • live

Here, live is not a directory you maintain by hand; it is simply a copy of a version in the tags directory, which is itself a copy of trunk at a particular revision! By following this approach, it will always be clear to anyone how to deploy your website or web application onto the production server. All they have to do is check out the live directory at the root of your repository.

Furthermore, updating is a breeze on the server end. Once you are set to change live to a new version, simply delete live in the repository and make a fresh copy of the new version under the same name. Then, on the server, all that needs to be done is to run an update on the working copy, and all of your changes will fall into place. I love this part of the process. It feels so clean and unobtrusive, which is naturally important for a production deployment upgrade.
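
Sketched as commands, the promotion step might look like the following. The URLs, paths, version number, and helper names are assumptions for illustration; the script defaults to a dry run that only prints the commands, and the delete step assumes a live copy already exists.

```shell
#!/bin/sh
# Hypothetical sketch: repoint "live" at a tagged release, then update the server.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

promote_live() {
    # $1: repository root URL, $2: tagged version to promote, e.g. v1.0.1
    # Remove the old live copy, then recreate it from the chosen tag.
    run svn delete -m "Retire previous live copy" "$1/live" &&
    run svn copy -m "Promote $2 to live" "$1/tags/$2" "$1/live"
}

update_server() {
    # $1: production document root (a working copy checked out from live)
    run svn update "$1"
}

# Usage (not run here):
# promote_live http://svn.example.com/repo v1.0.1   # from any workstation
# update_server /var/www/html                       # on the production server
```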

Repository Structure Overview

If we were to employ all of these tips, our repository might look something like this:

  • trunk
    • dev
    • www
      • public
      • intranet
  • branches 
  • tags
    • v1.0 (Copy of a revision of trunk)
    • v1.0.1 (Copy of a revision of trunk)
  • live (Copy of a version in tags)

For many of us, simply taking the step of using a revision control system is a great accomplishment in itself. If you’re ready to begin leveraging revision control for web development even further, then the tips we just covered might be helpful to you. I invite you to share with me any standards or practices that you are employing in revision control that I haven’t covered here. I’d love to learn about them.
