Configuring Subversion for HTTP Access Behind Proxy

For the life of me, I couldn’t figure out why I was unable to check out from my Subversion repository while I was at work. It had worked at home (where my Subversion server is) just a few days earlier, and I could view the repository using Subversion’s built-in web page functionality, but whenever I tried to access the repository from the command-line client or TortoiseSVN, I would get an error message.

C:\>svn co http://MyServer/Path/To/Proj/ MyProj
svn: REPORT request failed on '/Proj/!svn/vcc/default'
svn: REPORT of '/Proj/!svn/vcc/default': 400 Bad Request (http://MyServer)

Yeah, that’s not cryptic. Fortunately, the solution is simple. Sander Striker explained in the thread “REPORT request failed … 400 Bad Request” that if you’re behind a proxy at work, and the proxy isn’t configured to pass the HTTP methods Subversion needs, such as REPORT, your request will fail.

Like he says in his message, you could request that the proxy be configured to allow the necessary requests, but you could just as well configure your server to work on a different port. Now, I like the fact that my Subversion calls can work on port 80, and I don’t want to change that. So, I configured Apache to have my Subversion sites work on an additional port.

In the following example, let’s use port 81 as the additional port. So my example URL http://MyServer/Path/To/Proj/ would become http://MyServer:81/Path/To/Proj/.

At the following spots in your httpd.conf file, make these one-time additions (the lines that mention port 81 are the new ones):

Listen 80
Listen 81

And…

NameVirtualHost *:80
NameVirtualHost *:81

And for every virtual host entry you have for a Subversion site (I host a couple of different Subversion sites), add *:81 in the VirtualHost header:

<VirtualHost *:80 *:81>

After restarting Apache, you can continue to use the URLs you normally use, but anytime you need to check out from the repository while at work behind a proxy, you can use port 81 to do so successfully.
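
To double-check the edits, here is a minimal Python sketch (not part of my original setup) that scans the text of an httpd.conf for a Listen directive on the extra port and flags VirtualHost headers that do not mention it. The name check_extra_port and the sample text are made up for illustration. It does not follow Include directives, and it flags every virtual host, including ones that have nothing to do with Subversion, so treat its output as a hint rather than a verdict.

```python
import re
from typing import List

# Matches an opening header such as "<VirtualHost *:80 *:81>".
VHOST_RE = re.compile(r"<\s*VirtualHost\s+([^>]*)>", re.IGNORECASE)


def check_extra_port(conf_text: str, port: int) -> List[str]:
    """Return a list of problems; an empty list means the port looks configured."""
    wanted = str(port)
    has_listen = False
    problems = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        tokens = line.split()
        if tokens[0].lower() == "listen" and len(tokens) > 1:
            # "Listen 81" and "Listen 1.2.3.4:81" both end in the port number.
            if tokens[1].rsplit(":", 1)[-1] == wanted:
                has_listen = True
        match = VHOST_RE.match(line)
        if match:
            addresses = match.group(1).split()
            if not any(addr.endswith(":" + wanted) for addr in addresses):
                problems.append("VirtualHost header lacks *:%s: %s" % (wanted, line))
    if not has_listen:
        problems.insert(0, "no Listen directive for port %s" % wanted)
    return problems


if __name__ == "__main__":
    sample = "Listen 80\nListen 81\n<VirtualHost *:80 *:81>\n</VirtualHost>\n"
    print(check_extra_port(sample, 81))
```

In practice you would pass it the contents of your own httpd.conf, for example open("httpd.conf").read().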

Proper Placement of mod_dav_svn

When installing and configuring Subversion to work through Apache, you might get an error like this when attempting to start up httpd:

Cannot load /etc/httpd/modules/mod_dav_svn.so into server: /etc/httpd/modules/mod_dav_svn.so: undefined symbol: dav_xml_get_cdata

Please note! Some people have indicated that this is because Apache wasn’t configured with DAV support when it was compiled on your distro. The answer might be a lot simpler than that.

Garrett Rooney noted that it might be as simple as just making sure that mod_dav is loaded before loading mod_dav_svn! I was experiencing this error, and a simple rearrangement of my LoadModule commands in httpd.conf fixed it.
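
If you want to check the ordering without eyeballing the whole file, a small Python sketch like the following can do it. It assumes the usual module identifiers dav_module and dav_svn_module on the LoadModule lines, and the function name is invented for this example. It only looks at the text you hand it, so it will not see modules loaded from included files or a mod_dav that was compiled statically into Apache.

```python
def dav_load_order_ok(conf_text: str) -> bool:
    """Return True if mod_dav is loaded before mod_dav_svn.

    Also returns True when mod_dav_svn is not loaded at all, since
    there is then no ordering to get wrong.
    """
    order = []
    for raw in conf_text.splitlines():
        tokens = raw.split()
        # Active lines look like: LoadModule dav_module modules/mod_dav.so
        # Commented-out lines start with "#" and are skipped by this test.
        if len(tokens) >= 3 and tokens[0].lower() == "loadmodule":
            order.append(tokens[1])
    if "dav_svn_module" not in order:
        return True
    if "dav_module" not in order:
        return False  # mod_dav_svn is loaded but mod_dav never is
    return order.index("dav_module") < order.index("dav_svn_module")


if __name__ == "__main__":
    sample = (
        "LoadModule dav_svn_module modules/mod_dav_svn.so\n"
        "LoadModule dav_module modules/mod_dav.so\n"
    )
    print("order ok?", dav_load_order_ok(sample))
```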

Leave It To Apple To Bring Revision Control To The Masses

On August 7, 2006, Apple previewed Mac OS X 10.5 Leopard at the WWDC. As usual, some new features received a lot of excitement whereas some received criticism. Time Machine, Apple’s automated backup solution, received acclaim from many people because, as usual, Apple makes the process seem so easy and of course even manages to make it fun with the eye candy and intuitive interface. Some people yawned at the app, claiming that it doesn’t deserve any attention because it is just a glorified backup app, which has been available in Windows for years.

Summing it up as a glorified backup app is an obscene oversimplification. As a developer, I immediately recognized in Time Machine the likelihood that it is running on top of a revision control system, and that is what really fascinates me about it, and incidentally that is also what I feel many people misunderstand about it, leading them to discount it. I’m not alone in making this connection; really, any developer who uses and understands source code repositories probably had the thought cross his or her mind. One of the major advantages mentioned in discussions about revision control is the fact that you can always retrieve your file as it existed at any point in the past when it was committed to the repository.

And it’s not a stretch to see Apple borrowing revision control software to implement Time Machine. Subversion is a very popular revision control project in the open source community. Mac OS X sits on top of FreeBSD, and Apple has already incorporated many other open source projects into it: OpenGL, Apache, SSH, FTP, CUPS, and Samba come to mind, just to name a few. Why not add Subversion to the list of excellent technologies to incorporate into an excellent OS?

If Time Machine is running on top of Subversion or a similar revision control system, that would address the concerns raised by some people who might not understand how revision control systems work. Some have worried that a backup drive would not be able to support Time Machine for very long before filling up. For instance, if you have a 1GB file (perhaps a video clip) and make a very small change to it, will Time Machine record that change by making another copy of the entire file? If Time Machine made its backups as full copies of the changed files, you can imagine how quickly a backup volume would fill up. But that simply isn’t how most revision control software works. Most revision control software, including Subversion, uses delta compression to calculate only the changes made to a file, and then saves only those changes. Thus, a file can be committed to a repository a dozen times, and yet the amount of drive space taken up would only amount to the changes made, plus perhaps a very slight amount of overhead in the repository for tracking each change.
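
To make the idea concrete, here is a toy delta in Python. It is only an illustration: it uses the standard library’s difflib as a stand-in, which is not the algorithm Subversion actually uses, and the function names are made up. A delta here is a list of “copy this range from the old version” and “insert these new bytes” instructions, so only the inserted bytes cost real storage.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list:
    """Build instructions that rebuild `new` from `old`."""
    delta = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))      # reuse bytes already stored
        elif tag in ("replace", "insert"):
            delta.append(("data", new[j1:j2]))  # only new bytes are stored
        # "delete" needs no instruction: the old bytes are simply not copied
    return delta


def apply_delta(old: bytes, delta: list) -> bytes:
    """Rebuild the new version from the old version plus the delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def literal_bytes(delta: list) -> int:
    """Count the bytes the delta has to store verbatim."""
    return sum(len(op[1]) for op in delta if op[0] == "data")


if __name__ == "__main__":
    old = b"".join(b"line %d\n" % n for n in range(100))
    new = old.replace(b"line 50\n", b"line fifty\n")
    delta = make_delta(old, new)
    assert apply_delta(old, delta) == new
    print("new version:", len(new), "bytes; stored verbatim:", literal_bytes(delta))
```

The second version of the file is almost as large as the first, yet the delta stores only the handful of bytes that actually changed.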

So that covers feasibility. With some software like Subversion, Time Machine could provide a backup of your entire volume and actually support it for a decent period of time thanks to efficient delta compression. What about the logistics of finding the changed files and saving those changes? This would be simple even with Subversion in its present state. Out of the box, Subversion provides the ability to search for files that have changed since the last “commit”, or the last time changes were saved. Time Machine would just have to ask Subversion to report all the files that have changed, then run the commit. This could be scheduled to execute at a given time, say, midnight. In the event that the computer was off at the scheduled time, the steps could be executed immediately upon startup.
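
The status-then-commit routine can be sketched in a few lines of Python. This is only an illustration of the idea, not a claim about how Time Machine works. The function names and the commit message are invented, and while status, add --force, and commit are ordinary svn subcommands, add --force only picks up new files, so files deleted from disk would need separate handling.

```python
import subprocess


def snapshot_commands(message: str) -> list:
    """Return the svn commands for one backup pass, in order."""
    return [
        ["svn", "status"],               # report what changed since the last commit
        ["svn", "add", "--force", "."],  # put any new, unversioned files under control
        ["svn", "commit", "-m", message],
    ]


def run_snapshot(working_copy: str, message: str) -> None:
    """Run the backup pass inside an existing working copy."""
    for argv in snapshot_commands(message):
        subprocess.run(argv, cwd=working_copy, check=True)


if __name__ == "__main__":
    # Dry run: show the commands instead of executing them.
    for argv in snapshot_commands("Nightly snapshot"):
        print(" ".join(argv))
```

A scheduler such as cron could call run_snapshot at midnight, and a startup script could call it again if the machine was off at the scheduled time.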

In the past, I’ve thought how nice it would be if all the files on my computer could go under revision control just like my source code when I’m working on an application. It would be just Apple’s style to make that possible not just for tech geeks who use Subversion, but to make it possible for anyone, even the guy who doesn’t care a lick about understanding revision control.

Organizing a Source Code Repository for Web Development

When you develop applications, you may use a revision control system like Subversion. If you are new at it, using revision control for web development may feel foreign at first, and understandably so; source code revision control was established before the web paradigm existed. Nevertheless, there are ways that the repository organization of traditional application development can serve our web development well; conversely, we can also use the repository in new ways that really benefit us on the web platform.

Put Those Development Files in Version Control!

To give yourself the best organization within your web application project’s repository, be sure to take all aspects of your development into consideration, not just the organization of the deployable code itself. For instance, just like in compiled application development, you may have some workfiles that ought to be in the repository: Photoshop images, Flash project files, uncompressed audio or video, and similar development files. The same goes for unit testing code, which need not be deployed on your production server.

There’s nothing worse than realizing that you no longer have these development files when the time comes to make some requested changes. It is definitely wise to include them in your repository. So why not have a subdirectory structure under trunk separating your dev files from your web files? Perhaps something like this:

  • trunk
    • dev
    • www

Keep Them Separated 

Furthermore, in some cases, you may have one web application, but portions of it reside on different websites. For instance, perhaps you have a public-facing web application on your company’s public website, but the administrative portion resides on your intranet site. Or perhaps the web application is just one module of a larger project whose technology reach is more broad than just the web. It would be wise to have that clearly defined in your repository structure as well.

Multisite Example.

  • trunk
    • dev
    • www
    • intranet
 - or -
  • trunk
    • dev
    • www
      • public
      • intranet

Multitech Example (VB and web app combination).

  • trunk
    • dev
    • www
    • vb
 - or -
  • trunk
    • dev
    • vb
    • www
      • public
      • intranet

Clearly Mark the Live Release 

One aspect of web application development that is really handy with revision control systems like Subversion is that you can also use the revision control system to help with deployment of the application. In fact, you can even use it to help keep track of which version of the app is a “live” release.

In order to accomplish this, you can check out your web application directly onto your webserver. This can be difficult if you only have FTP access to your server, although it can be done with utilities that map FTP connections to local paths on your computer. However, if you have command-line access or network access to your server, checking out files from the repository directly onto your server should be very similar to checking out files on your workstation. Please note that checking out a working copy of your repository onto your webserver is a security risk unless you supply your webserver a directive to disallow read access to the .svn directories that Subversion creates in the working copy. The Website Releases via Subversion article at RedBalloon Labs explains how to do this nicely.

So, what is the benefit of checking out a working copy onto your server? If you have a QA server you will be testing your code on, then perhaps you will check out the web application in the trunk directory. Once you’ve checked out the code, all you need to do is run an update against the working copy on the server to update it to the latest trunk code!

That’s great for testing purposes, but how can we garner the same benefit for our production server? When testing is completed, you will likely tag a version of trunk in the tags directory. At this point, you could check out the version in the tags directory onto your production server. However, this course has a couple of caveats: (a) It assumes you will always have the latest tagged release on your production server. (b) It will require a switch command to migrate the server code to each new version. If you’re fine with these considerations, then by all means, check out the latest version from the tags directory and consider yourself done.
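
For reference, the tag-then-switch routine might look like the following Python sketch, which only builds and prints the svn commands. The repository URL is the example URL from earlier on this site, and the version number and the server path /var/www/myproj are placeholders I made up; copy and switch are standard svn subcommands.

```python
def tag_and_switch_commands(repo_url: str, version: str, working_copy: str) -> list:
    """Return the commands to tag trunk and move a server working copy to the tag."""
    repo = repo_url.rstrip("/")
    tag_url = repo + "/tags/" + version
    return [
        # Run from any machine: a cheap server-side copy of trunk into tags.
        ["svn", "copy", repo + "/trunk", tag_url, "-m", "Tag release " + version],
        # Run on the production server: repoint the working copy at the new tag.
        ["svn", "switch", tag_url, working_copy],
    ]


if __name__ == "__main__":
    commands = tag_and_switch_commands(
        "http://MyServer/Path/To/Proj/", "v1.0.1", "/var/www/myproj"
    )
    for argv in commands:
        print(" ".join(argv))
```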

I, on the other hand, am not comfortable with those conditions. If your company has any sort of delay between the moment technical work has been completed and when it can actually be deployed, then a tagged version may not be the version on your production server, even though it is the latest and greatest tagged code. Certain versions may never even see any time on the production server if they are bug fix releases that are held back until perhaps more bug fixes are made and a later version is deployed as a cumulative release. Furthermore, if someone else needed to redeploy the application, they should be able to know what version to redeploy.

The solution to these problems is to copy the tagged version to a branch called “live” or something similar. You could keep the “live” copy in the tags directory, but I found it more straightforward to keep it at the repository’s root level. This will mean your repository’s root might look like this:

  • trunk
  • branches
  • tags
  • live

Live is not an actual directory of its own, but simply a copy of a version in the tags directory, which is itself a copy of trunk at a particular revision! By following this approach, it will always be clear to anyone how to deploy your website or web application onto the production server. All they have to do is check out the live directory at the root of your repository.

Furthermore, updating is a breeze on the server end. Once you are set to change live to a new version, simply delete live and make a copy of the new version under the same name. Then, on the server, all that needs to be done is to run an update on the working copy, and all of your changes will fall into place. I love this part of the process. It feels so clean and unobtrusive, which is naturally important for a production deployment upgrade.
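
Spelled out as commands, promoting a new version to live might look like this Python sketch. It mirrors the steps just described (delete live, copy the tag, update on the server) and only prints the commands. The repository URL is the example URL from earlier on this site, and the version and server path are placeholders of my own.

```python
def promote_live_commands(repo_url: str, version: str) -> list:
    """Return the repository-side commands that repoint live at a tagged version."""
    repo = repo_url.rstrip("/")
    live_url = repo + "/live"
    return [
        ["svn", "delete", live_url, "-m", "Retire previous live release"],
        ["svn", "copy", repo + "/tags/" + version, live_url,
         "-m", "Promote " + version + " to live"],
    ]


def server_update_command(working_copy: str) -> list:
    """Return the command to run on the production server afterwards."""
    return ["svn", "update", working_copy]


if __name__ == "__main__":
    for argv in promote_live_commands("http://MyServer/Path/To/Proj/", "v1.0.1"):
        print(" ".join(argv))
    print(" ".join(server_update_command("/var/www/myproj")))
```

Because the delete and the copy are two separate commits, there is a brief moment when live does not exist in the repository, so it makes sense to run the update on the server only after both have gone through.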

Repository Structure Overview

If we were to employ all of these tips, our repository might look something like this:

  • trunk
    • dev
    • www
      • public
      • intranet
  • branches 
  • tags
    • v1.0 (Copy of a revision of trunk)
    • v1.0.1 (Copy of a revision of trunk)
  • live (Copy of a version in tags)
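
One way to create this skeleton in a fresh repository is a single svn mkdir with the --parents option, which creates intermediate directories as needed. The Python sketch below just builds and prints that command. The repository URL is the example URL from earlier on this site, and the function name is made up. The tags and live entries that are copies are left out on purpose, since those come into being later through svn copy.

```python
def layout_command(repo_url: str) -> list:
    """Return one svn mkdir command that creates the directory skeleton."""
    repo = repo_url.rstrip("/")
    directories = [
        "trunk/dev",
        "trunk/www/public",
        "trunk/www/intranet",
        "branches",
        "tags",
    ]
    urls = [repo + "/" + d for d in directories]
    # Several URLs in one mkdir are committed together as a single revision.
    return ["svn", "mkdir", "--parents", "-m", "Create initial repository layout"] + urls


if __name__ == "__main__":
    print(" ".join(layout_command("http://MyServer/Path/To/Proj/")))
```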

For many of us, simply taking the step of using a revision control system is a great accomplishment on its own. If you’re ready to begin leveraging revision control for web development even further, then the tips we just covered might be helpful to you. I invite you to share with me any standards or practices that you are employing in revision control that I haven’t covered here. I’d love to learn about them.
