Fri 21 Oct 2005
Every month, a group of web developers at Scripps, collectively referred to as WebHeads, meets to discuss web-related topics. At this morning’s meeting, Edgar Milik talked about his group’s experience using Subversion for web projects.
Subversion is a version control system like CVS. It is newer, faster, and has useful features that are harder to implement in CVS, such as restructuring a file system in the repository. Both Subversion and CVS are powerful tools for versioning software source code, especially with languages like C, Java, etc. Versioning web projects, however, is a little trickier.
There are 3 main problems to consider:
1. Web Server Required
To develop and test any web project, you obviously need a web server. This is analogous to requiring gcc or a Java runtime environment for compiling and running C and Java files, respectively.
2. Separation of Development, Staging, and Production Areas
It is important to define 3 distinct areas for the web project. The development area is where users make local changes. The staging area is where the development team tests the web project to ensure nothing is broken. The production area is the actual web site that is served to the public.
This definition of 3 areas differs from the traditional approach of software engineering, where each user has his/her own working area (collectively the development area) and updates/commits code back-and-forth to the repository. When developing software, there’s usually no pressure to make every change immediately available. However, because a web project must be available 24/7, a well-defined process must be followed to move code from the development stage to the production site.
3. Different Databases and Config Info
Beyond the separation into the 3 working areas mentioned above, each area may use a different database, possibly with different user accounts and passwords, and potentially on different servers. This configuration information must be kept locally within the project area (development, staging, or production), and it should not be versioned. It may be a good idea to version a template config file that a user can copy and change whenever he/she checks out a new project.
This may be analogous to changing path variables in a Makefile, for instance, when checking out a project written in C or Java.
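A minimal sketch of the template idea, assuming an INI-style config. The file names config.ini.template and config.ini, and the function name load_local_config, are made up for illustration and are not from Edgar’s talk. The template would be versioned, while the local copy would be left out of the repository (for example with Subversion’s svn:ignore property).

```python
import configparser
import shutil
from pathlib import Path


def load_local_config(project_dir, template_name="config.ini.template",
                      local_name="config.ini"):
    """Return settings from the unversioned, per-area config file.

    If the local file does not exist yet (for example, right after a fresh
    checkout), it is created as a copy of the versioned template so the
    developer has something to edit. File names are hypothetical.
    """
    project = Path(project_dir)
    local_path = project / local_name
    if not local_path.exists():
        # First run in this area: start from the versioned template.
        shutil.copyfile(project / template_name, local_path)
    parser = configparser.ConfigParser()
    parser.read(local_path)
    return parser
```

Each area (development, staging, production) then edits its own config.ini with its own database host and credentials, and none of those values ever reach the repository.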
Solution: Develop in a Webspace!
Edgar’s team uses a separate server for each area, creating a more secure system for web development. Each area has its own URL, so all internal hyperlinks must be relative paths, never absolute! Edgar suggests using a non-routable domain for the development and staging areas (meaning that they can only be accessed from UCSD and with an authentication scheme in place). This of course means that working remotely requires work-arounds such as webproxy.ucsd.edu or UCSD’s VPN.
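To illustrate the relative-path rule, here is a small sketch that computes the relative link from one page to another resource, given their locations under the web root. The function name relative_href and the example paths are my own, not something shown at the meeting.

```python
import posixpath


def relative_href(from_page, to_resource):
    """Return the relative link from one site-rooted path to another.

    Both arguments are paths under the web root, written with forward
    slashes. The result contains no host name, so it keeps working when
    the project moves between areas.
    """
    start_dir = posixpath.dirname(from_page)
    return posixpath.relpath(to_resource, start=start_dir)


# A page two directories deep linking to a shared image.
print(relative_href("/projects/docs/index.html", "/images/logo.png"))
# prints ../../images/logo.png
```

Because the link never names the server, the same file works unchanged on the development, staging, and production URLs.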
Here’s a rough diagram I re-created from memory of the development workflow from Edgar’s team:
Use Virtual Hosts
Example: http://domain.ucsd.edu/users/srhaber/svn/project becomes http://domaindev-srh-01.ucsd.edu
This may help prevent problems where the source code assumes the web root is the domain. (Of course, there are other workarounds to this issue, such as defining the web root relative path in each directory of your web project).
Hide or don’t copy .svn directories
Subversion creates and uses hidden .svn directories in checked-out projects. These directories exist purely for Subversion and should not be browsable on websites. Thus, they should not be copied to or shown on the staging and production servers. Assuming the workflow in the diagram above, using rsync (or another tool/script?) instead of scp may help prevent those directories and files from being copied over to the staging area.
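As one possible "other tool/script", here is a minimal sketch that copies a working copy while skipping every .svn directory. The function name and directory arguments are hypothetical. With rsync the equivalent would be its --exclude option, and Subversion’s own svn export command also produces a tree without the .svn directories.

```python
import shutil


def copy_without_svn(working_copy, staging_dir):
    """Copy a checked-out project, skipping every .svn directory.

    staging_dir must not exist yet; shutil.copytree creates it.
    """
    shutil.copytree(
        working_copy,
        staging_dir,
        # ignore_patterns is applied at every directory level, so nested
        # .svn directories are skipped as well.
        ignore=shutil.ignore_patterns(".svn"),
    )
```

This only covers a copy on one machine; moving files to a separate staging server would still need rsync or a similar transfer tool.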
Where to store docs and pdf’s?
Some sites contain links to documents and pdf’s that are not related to the web development process. These files should not be versioned. However, since some web pages may contain hyperlinks to these files, the files should be stored in a shared location where they can be accessed from any area. Another option is to duplicate the files and store them locally within each area. Regardless of the solution, the important part is to make sure the hyperlinks are not broken on the production site.
Trunk, Branches, and Tags
The O’Reilly Subversion book suggests a Trunk, Branches, Tags structure for organizing your repository. The trunk contains the main core of the code. The branches contain personal forks of the project. The tags store version snapshots of the project.
Edgar’s team does not follow this scheme, and perhaps with good reason. The trunk, branches, tags scheme works well for large-scale projects with lots of developers and a constant flux of deadlines and release dates. However, with only a handful of developers, this scheme is overkill. Though it is conceptually a great idea, we can dismiss it for our local projects since we only really use the “trunk” for our versioning needs.
Mounting directories on OS X creates crud files?
I’m not aware of the specifics, but OS X can create extraneous files that clutter the web project. These should be detected and deleted.
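A rough sketch of a cleanup script, assuming the crud in question is the usual .DS_Store files and the ._* companion files that OS X writes on some mounted volumes. That is my guess at what was meant; the pattern list and function names are not from the meeting.

```python
import fnmatch
import os

# Assumed patterns for OS X leftovers; adjust to whatever actually shows up.
CRUD_PATTERNS = (".DS_Store", "._*")


def find_crud_files(root):
    """Return a sorted list of files under root matching CRUD_PATTERNS."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if any(fnmatch.fnmatch(name, pat) for pat in CRUD_PATTERNS):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def delete_crud_files(root):
    """Delete the files reported by find_crud_files and return their paths."""
    removed = find_crud_files(root)
    for path in removed:
        os.remove(path)
    return removed
```

Running find_crud_files first and reviewing the list before calling delete_crud_files seems the safer habit.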
Can Subversion append log messages at the top of source code files?
A good question brought up during the meeting. We are unsure of the answer. I am not convinced it is a good idea anyway, since logs can quickly grow in size and would bloat the source code files. Perhaps exporting the log to a changelog file is better.
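Exporting the log to a changelog file could look roughly like this. The sketch relies on the XML output of svn log --xml; the changelog layout, the default file name CHANGELOG.txt, and the function names are my own choices, not an established convention.

```python
import subprocess
import xml.etree.ElementTree as ET


def format_changelog(log_xml):
    """Turn the XML produced by `svn log --xml` into plain-text entries."""
    root = ET.fromstring(log_xml)
    lines = []
    for entry in root.iter("logentry"):
        revision = entry.get("revision")
        author = entry.findtext("author", default="(unknown)")
        date = entry.findtext("date", default="")[:10]  # keep YYYY-MM-DD
        message = (entry.findtext("msg") or "").strip()
        lines.append(f"r{revision} | {author} | {date}")
        lines.append(f"    {message}")
        lines.append("")
    return "\n".join(lines)


def write_changelog(working_copy, out_path="CHANGELOG.txt"):
    """Run `svn log --xml` on a working copy and write a changelog file.

    Requires the svn command-line client to be installed.
    """
    result = subprocess.run(["svn", "log", "--xml", working_copy],
                            capture_output=True, check=True)
    with open(out_path, "w", encoding="utf-8") as handle:
        handle.write(format_changelog(result.stdout))
```

This keeps the history in one separate file instead of growing a comment block at the top of every source file.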
Copying to the staging area is rare, to the production site even rarer
Edgar mentions that copying the source code to the staging area is a rare occurrence, maybe happening once a week. The message here seems to be: Never update the staging/production areas arbitrarily! Any time you move code over into those areas, it should be well thought-out beforehand. Note that this is different from committing your source code to the repository, which should happen more regularly…
Commit and Update Regularly
Edgar stresses this as an important practice. Always commit your code and comment it when you make a change. Never commit broken code. If you have errors lingering, be sure to fix them first so that the repository can always contain a working copy. Always update as much as you can to prevent your code from falling out of sync with other developers. Failing to do so may result in a plethora of conflicts and merges later on, so it’s best to keep up-to-date as much as you can.
Comment in Detail
Don’t write half-assed comments each time you commit. Take a minute or two to write a well thought-out comment. Try to make it specific about the change you’ve implemented. By committing your code regularly, your comments become more precise and plentiful, resulting in a more informative log.
Communicate
It’s always important to communicate with each other, whether via email, AIM, or in person. Using collaborative tools like Subversion is not a substitute, or even a medium, for solid communication.
6 Responses to “Using Subversion for Web Projects”
October 22nd, 2005 at 11:13 pm
Shaun,
Thanks for the detailed summary. Much appreciated.
November 18th, 2005 at 2:01 pm
Nice job Shaun. I missed this one and so appreciate the good review. Good talk today about CMS’s as well.
November 18th, 2005 at 2:06 pm
Thanks Rob,
It’s a shame not many people made it to last month’s meeting because it was a very informative talk. Hopefully, others can find this review helpful as well.
Jerry, Mason, and I have been planning to implement a versioning workflow for a while now, and Edgar’s talk was very timely… I made sure to capture as much info as I could!
January 24th, 2006 at 12:03 pm
[…] I first noticed the missing logos as a concern yesterday. I was searching for info on Subversion use with web-related projects, and I stumbled upon my recent blog entry from late last year. Treating myself as an ‘outside visitor’, I noticed that I learned very little about Ocean Informatics, particularly the Who, Where, and Why. The only hint came from the ucsd.edu domain, which would lead me to guess this was a university-related site. […]
February 10th, 2006 at 8:48 am
Why would you not have to version Shared Libraries? I’m assuming you mean, for example in a PHP environment, packages such as PEAR, PECL, phpmailer, magpierss etc. Of course they should have backwards compatibility, but who’s to say that will always be the case? Or are you referring to shared libraries as Apache, PHP, MySQL executables and modules etc. that should be mirrored and only rolled out when all 3 stages are ready to migrate to new releases?
February 10th, 2006 at 10:24 am
Hi isosceles,
Thanks for the comment. To be honest, I’m not sure what the Shared Libraries are. I recreated this diagram from memory based on the talk given by another programmer.
The Shared Libraries may refer to in-house developed code, not packages like PEAR, etc. I agree that such code should in fact be versioned, but perhaps it should rest separately from the project. Thus, any shared library code should not be versioned alongside a web project?
I’m not sure if that’s a fool-proof technique, especially since it raises backwards-compatibility concerns. We have yet to actually implement this technique for our own web projects.
Rather than referring to a shared library, we duplicate our Shared Library code for each project. That duplicated code becomes part of the project (it gets versioned). This means more maintenance for us (we have to remember to update the code in all places), but it also gives each web project more flexibility and removes any backwards-compatibility issues.