Hints


Giving WordPress Its Own Directory - This WordPress Codex page contains the instructions for storing the core WordPress files in a subdirectory. This removes the clutter of wp-* files from our document root.

There may be a few buggy links at this time while we complete the transition to WordPress 2.0.4 and install the k2 theme.

Last month I disabled all email notifications for new posts and comments. I did this for a few reasons:

1. We had been getting lots of comment spam, and these were generating lots of unnecessary email notifications. (I’ve since set stricter blacklist filters to immediately remove any spam).

2. More than one user had requested to have their notifications disabled.

3. Email notifications are, well, annoying.

That’s right… email notifications are annoying. They clutter up your inbox and make email more distracting than it needs to be.

Fortunately, there’s a much better solution to staying up-to-date with the latest blogs and news sites: RSS!

What is RSS?
RSS = Really Simple Syndication
It’s basically a lightweight XML file format that distributes articles w/ appropriate metadata (author, date, time, headline, news snippet, etc.) to any RSS Reader.
Another term for RSS is ‘feed’.

Here’s how it works (brief):
A server (like this blog) generates an XML file in RSS format.
A client (any RSS Reader) periodically checks the server (our blog) and retrieves the latest XML file. The client then displays the contents of that file in a human-friendly interface.
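
To make the client side concrete, here is a minimal Python sketch of what a reader does once it has fetched the XML. The feed text below is a hand-written stand-in (the titles, links, and example.org URLs are made up for illustration, not taken from this blog), and a real reader would also handle network errors, caching, and other feed formats.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document standing in for what a blog server would return.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>http://example.org/</link>
    <description>Made-up feed for illustration</description>
    <item>
      <title>First post</title>
      <link>http://example.org/first-post</link>
      <pubDate>Mon, 07 Aug 2006 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.org/second-post</link>
      <pubDate>Tue, 08 Aug 2006 09:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return a list of dicts (title, link, pubDate) for each <item> in the feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "pubDate": item.findtext("pubDate"),
        })
    return items


if __name__ == "__main__":
    # The "human-friendly interface" here is just one line per entry.
    for entry in read_items(SAMPLE_FEED):
        print(f'{entry["pubDate"]}: {entry["title"]} <{entry["link"]}>')
```

A reader repeats this on a schedule and remembers which entries it has already shown, which is where the unread count comes from.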

Where can I find an RSS Reader?
I use NetNewsWire for the Mac. However, it’s a commercial app, and it will cost you the license fee.

There are some cheaper (free!) alternatives, which may be better… particularly for newbies. Safari and Firefox both have built-in RSS reading capabilities. I would recommend using one of those (whichever is your preferred browser), especially if you wish to keep on track with this blog.

Setting up RSS on Safari
1. With Safari open, open the Bookmarks page: Bookmarks > Show All Bookmarks OR bookmark icon on the bookmarks bar (left side)
2. Make sure Bookmarks Bar is selected under Collections.
3. Add a new bookmark Folder: Bookmarks > Add Bookmark Folder OR click the + sign underneath the Bookmarks frame.
4. A new folder appears called ‘untitled folder’. Rename this to ‘OI Blog’.
5. Close the Bookmarks page.
6. For each of the two RSS links below:
- open up the page.. this opens Safari’s built-in RSS Reader
- bookmark this page under the OI Blog folder: you can quickly do this by dragging the link on top of the folder. To drag the link, select the favicon (the icon just left of the link in the url bar) and drag.

Anytime there’s a new post or comment, a number in parentheses should appear next to OI Blog. This is the number of unread entries in your feed and also your notification that the blog has been updated in some way. It’s much less obtrusive than an email notification, and much more convenient for checking up on the blog.

RSS Links
Here are the ever-important links to this blog’s RSS feeds. We have two feeds, one for posts and one for comments. These links are also found in the sidebar under Extra Stuff.

- Posts (RSS)
- Comments (RSS)

Setting up RSS on Firefox should be fairly similar to Safari, so I’ll forgo the procedure here.

If you made it this far and need some help with RSS, please contact me. Enjoy!
–Shaun

Yesterday, I encountered first-hand a nerve-racking issue that has plagued a fellow worker’s PC. Every time he loaded a page from the CCE LTER website, the browser would slowly redraw the entire background before displaying the rest of the page.

This occurred on a rather fast machine, and only with Internet Explorer 6. Firefox on this machine worked fine. Other PCs worked fine (with both browsers). Macs worked fine (with Safari and Firefox). This issue seemed to be isolated to this one particular machine.

To work around this issue, I resized the repeating background image from 1×4 px to 50×200 px. Each tile now covers 2500 times the area, so the browser draws roughly 2500 times fewer tiles to fill the same background, at the cost of a very slightly larger image file. This seemed to “fix” the slow redrawing issue.
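
The factor of 2500 is easy to check with a little arithmetic. This is only a back-of-the-envelope sketch in Python: the 1000×800 px background area is an assumed example size, and it counts tile draws, not whatever IE 6 actually does internally.

```python
import math


def tiles_needed(area_w, area_h, tile_w, tile_h):
    """Number of tile copies needed to cover an area, rounding up partial tiles."""
    return math.ceil(area_w / tile_w) * math.ceil(area_h / tile_h)


# Assumed background area of 1000 x 800 px (hypothetical, for illustration only).
small_tiles = tiles_needed(1000, 800, 1, 4)     # 1000 columns * 200 rows
large_tiles = tiles_needed(1000, 800, 50, 200)  # 20 columns * 4 rows

print(small_tiles, large_tiles, small_tiles // large_tiles)  # 200000 80 2500
```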

Inspiration for this idea came from here: Tiny gifs: not a good idea.

Some important notes:
- I never technically fixed anything. I still have no idea why IE 6 on this machine rendered the page poorly, while other machines (some slower and older) showed no problems at all.
- I additionally noticed that IE6 on this particular machine rendered images very poorly. Jpegs appeared pixelated, etc. Firefox rendered images fine, as well as IE6 on other PCs (in our lab).
- It was important to see and experience the problem first-hand. Otherwise it would have been virtually impossible to diagnose the issue and find a solution or work-around. In other words, for this particular issue, it was extremely helpful to see the problem on this user’s PC.
- I made the same changes for the Palmer LTER website and also this site, both of which employ the same design pattern of a repeating tiled background.
- The Yahoo! User Interface Blog uses a 1px repeating background. Check it out and see how fast (or slow) your browser draws the background, especially if you’re running IE6/Win.

I just wanted to share a couple new techniques I’ve learned:

The first is pretty simple - Apple’s mod_auth_apple supports authentication against local accounts. This means there’s no need to maintain a separate .htpasswd list. You can just create a .htgroups file with groups defined as:

groupname: user1 user2…

where the users are usernames for local accounts (note that the account password type can be either shadow or OpenDirectory). The .htaccess configuration is the same as if you were using mod_auth:

AuthName "My Protected Area"
AuthType Basic
AuthGroupFile /path/to/.htgroups
Require group groupname

The second trick I’ve learned deals with the interaction between mod_auth (or mod_auth_apple), mod_rewrite, and SSL. You can use mod_rewrite to force a directory to use SSL by adding something like this to the .htaccess file:

RewriteEngine On
RewriteCond %{SERVER_PORT} ^80$
RewriteRule .* https://%{HTTP_HOST}%{REQUEST_URI} [R=permanent,L]

This rewrites all non-SSL connections to SSL connections; it’s more user-friendly than SSLRequireSSL, which just displays an error for non-SSL connections. However, problems arise when the directory is also protected with mod_auth. The authentication directives are read before the rewrite directives, so the user is prompted to authenticate over a non-SSL connection. Then the rewrite kicks in and redirects to an SSL connection. Mod_auth sees that the URL has changed, and prompts the user for authentication again. So, what the user sees is one insecure prompt, followed by a second secure prompt. This is both a security risk and confusing to the user.

The solution I’ve found is to put the rewrite rules in a .htaccess file in the target directory, but to put the authentication rules in the virtual host configuration for port 443 only. This way, when the user attempts a non-SSL connection, there are no authentication rules in place, and the rewrite happens immediately. Once the URL has been rewritten to SSL (and thus to port 443), Apache sees that there are authentication rules in place and prompts the user to supply a username and password over a secure connection.

This solution could be cleaner if there were an Apache directive that would allow you to discriminate by port number - if this were available, the entire configuration could go in a .htaccess file. As it is now, you have to configure two virtual hosts for each site, one on port 80 and one on port 443. OS X configures sites this way by default, but for other servers this fix might require more work.

Last year, members from various LTER sites collaborated in creating the LTER EML Unit Registry. This made it possible to have an authoritative source of units for reference in generating EML documents.

LTER Unit metersPerSquareSecond
Though the Unit Registry effort has been successful, there have been a few technical drawbacks. One issue was dealing with “junk” characters in the unit abbreviation field. This is the result of different character encoding types conflicting with each other.

For example, the unit metersPerSquareSecond should have the abbreviation m/s². However, the LTER EML Unit Registry page is using a charset encoding of iso-8859-1. This encoding type causes the “junk” characters to appear. The picture below shows the source code from the LTER EML Unit Registry home page.

charset=iso

To solve this issue locally, I set the charset encoding type to UTF-8. This Unicode encoding ensures that the correct characters appear; among these are the superscript 2 and 3 (for squared and cubed respectively) and the Greek letters mu (for micro) and omega (for ohm). The picture below shows the source code from the Ocean Informatics Datazoo home page. The Palmer LTER and CCE LTER Unit Registries are kept in sync with each other.

charset=utf
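
The “junk” characters are easy to reproduce outside the browser. Here is a small Python sketch (not the registry’s actual code, just an illustration) that encodes a few of the unit characters as UTF-8 and then misreads the bytes as iso-8859-1, which is effectively what happens when the page declares the wrong charset. The function names are my own.

```python
def misread_as_latin1(text):
    """Encode text as UTF-8, then decode the bytes as if they were iso-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair(garbled):
    """Undo the misreading: turn the latin-1 characters back into bytes, decode as UTF-8."""
    return garbled.encode("iso-8859-1").decode("utf-8")


samples = {
    "squared": "m/s\u00b2",  # superscript two
    "micro": "\u00b5",       # micro sign
    "ohm": "\u03a9",         # Greek capital omega
}

if __name__ == "__main__":
    for name, text in samples.items():
        garbled = misread_as_latin1(text)
        # Each non-ASCII character becomes two junk characters, e.g. an extra
        # A-circumflex (U+00C2) appears in front of the superscript two.
        print(name, repr(text), "->", repr(garbled), "->", repr(repair(garbled)))
```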

Notes:
- To remove the junk characters, I re-entered the correct characters by copying and pasting them from “Special Characters…” in Safari’s Edit menu.
- No changes were required in the MySQL Collation, contrary to initial thought. MySQL is able to store Unicode-encoded strings as text datatypes, using our default Collation of latin1_swedish_ci.
- Unicode-encoded strings should not be wrapped by the htmlentities() function in PHP. This will cause the “junk” characters to appear.
- This page was a good reference for working with Unicode in MySQL and PHP. Additionally, the O’Reilly book Building Scalable Web Sites has an entire chapter devoted to character encoding. This book was authored by Cal Henderson of Flickr fame. I was able to read parts of the book at Safari Tech Books Online.

I’ve been using phpMyAdmin to generate database schemas. Since I use phpMyAdmin pretty much every day, it’s very convenient as opposed to DBDesigner, which resides on a separate server and has the overhead of connecting to and opening another app (just for one function).

The schema diagrams generated by phpMyAdmin are pretty, with colorful straight lines to represent table/field relations. However, the major pitfall is that these schema diagrams don’t show the data type. It’s usually helpful to know whether a field is an int, float, char, or text. This is one thing that DBDesigner does quite well.

Given that we have a couple meetings next week involving schemas (one of them being the start of a series of schema meetings), I decided to hack the phpMyAdmin code a bit so that it shows the data types side-by-side with the field names. This way I can continue to use phpMyAdmin to generate diagrams without compromising on the information they show.

Beware, the rest of the post gets a bit technical (aka geeky).

The only file I hacked was pdf_schema.php. This is located in the root dir of the phpmyadmin install.

This file contains the PMA_RT_Table class, which is where all the hacking was done.


class PMA_RT_Table {
// lots of code...
}

First, I added a new private var to store an array of types. This corresponds to the existing array of fieldnames for the class.


class PMA_RT_Table {
   var $fields = array(); // existing code
   var $types = array(); // my code
}

Next, in the constructor function, I added a line of code to populate the types array:


function PMA_RT_Table(...) {
// ...snip

       // load fields
       while ($row = PMA_DBI_fetch_row($result)) {
            $this->types[] = $row[1];  // my code
            $this->fields[] = $row[0];
        }

// ...snip
}

I then added code for the table width calculation to ensure that the drawn table diagrams would be wide enough to display both the field name and data type:


function PMA_RT_Table_setWidth($ff) {
// ...snip

      foreach ($this->fields AS $key => $field) {
           // srh hack to set width for field and type
          $type_arr = preg_split('/ +/', $this->types[$key]); // split() is gone in newer PHP versions
          $type = $type_arr[0];
          if ('enum' == substr($type,0,4))
                $type = 'enum';
          if ('set' == substr($type,0,3))
                $type = 'set';
          $field .= " [{$type}]";

// ...snip
}

Finally, I added the same code to actually splice the field name with the data type:


function PMA_RT_Table_draw(...) {
// ...snip

      foreach ($this->fields AS $key => $field) {
           // srh hack to show data types next to field names
          $type_arr = preg_split('/ +/', $this->types[$key]); // split() is gone in newer PHP versions
          $type = $type_arr[0];
          if ('enum' == substr($type,0,4))
                $type = 'enum';
          if ('set' == substr($type,0,3))
                $type = 'set';
          $field .= " [{$type}]";

// ...snip
}
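
For anyone who wants to see the labeling logic on its own, here is the same idea as a standalone Python sketch. It is not phpMyAdmin code; field_label is a name I made up, and the column type strings are examples of the kind of type description MySQL returns for a field.

```python
import re


def field_label(field_name, column_type):
    """Build the 'name [type]' label: keep the first word of the type, collapse enum/set."""
    # Keep only the first space-separated token, e.g. "int(10) unsigned" -> "int(10)".
    base_type = re.split(r" +", column_type)[0]
    # enum(...) and set(...) value lists can be long, so show just the keyword.
    if base_type.startswith("enum"):
        base_type = "enum"
    if base_type.startswith("set"):
        base_type = "set"
    return f"{field_name} [{base_type}]"


if __name__ == "__main__":
    print(field_label("id", "int(10) unsigned"))            # id [int(10)]
    print(field_label("status", "enum('open','closed')"))   # status [enum]
    print(field_label("flags", "set('a','b')"))             # flags [set]
```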

I won’t bother going into depth about how and why I hacked the code. Doing so would be hard without presenting more context from the pdf_schema.php file.

The main purpose of this post is to provide me with a memory refresher so I can come back and reference it later at a critical time (i.e. upgrading phpmyadmin). Plus it’s always good to document work like this, no matter how you choose to document it. I chose the blog, and by doing so I am making visible a minor part of my work that would otherwise remain situated in oblivion.

View the difference!

» Before the hack (pdf)

» After the hack (pdf)

This is an interesting read (pulled from my delicious links) on database optimization, and the downsides of normalization:

Normalized data is for sissies

The article links to a pdf presentation given by Cal Henderson, who helped create Flickr. A quick snippet:

In Flickr’s case, they have 13 SELECTs for every INSERT, DELETE, and UPDATE statement hitting their database. Normalization can slow SELECT speed down while denormalization makes your I/D/Us more complicated and slower. Since the application part of Flickr depends so heavily on SELECTs from the database, it makes sense for them to denormalize their data somewhat to speed things up.

A quick note about setting runtime user config settings for Subversion.

First the background:
Lately I’ve been using TextMate (a really cool text editor) for my development work. TextMate likes to create backup files for each file I edit. It saves these files with a prefixed ._

For example, the backup of file.php is named ._file.php

How does this affect subversion?
Every time I issue an svn status command, it shows me a list of all files that have been modified, added, or deleted in my working copy. It also lists all files that are not versioned, including these hidden TextMate backup files.

Fortunately, there is a way to tell Subversion to ignore these files.

Every user has a .subversion directory in their home directory. In this folder is a config file where you can define a global-ignores parameter (a whitespace-separated list of file glob patterns, not true regular expressions) which tells Subversion which unversioned files to ignore. The config file already provides a default set of patterns. I simply added ._* to the list. Now those hidden TextMate files don’t show up in the svn status list.
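
As far as I know, these ignore patterns are matched as shell-style globs against the file name. The Python sketch below only imitates that behavior with the standard fnmatch module; it is not how Subversion is implemented, and the pattern list is a shortened, made-up example, not the actual default list from the config file.

```python
from fnmatch import fnmatchcase


def is_ignored(filename, patterns):
    """Return True if the file name matches any of the glob patterns."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


# Illustrative ignore list: a few common-looking patterns plus the ._* addition.
patterns = "*.o *.lo *~ ._*".split()

if __name__ == "__main__":
    for name in ["._file.php", "file.php", "notes.txt~"]:
        print(name, "ignored" if is_ignored(name, patterns) else "shown")
```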

You can also define patterns to ignore at a directory level. The command
svn propedit svn:ignore [dir]
opens a file where you can specify which files and patterns you want to ignore for that directory. This is useful in the datacat, for example, because I have a couple unversioned folders: docs, lter

By telling subversion to ignore these folders, I am further eliminating the noise from svn status.

About that propedit command: You need to have defined your editor-cmd in the config file, or pass it in as an argument: svn propedit svn:ignore [dir] --editor-cmd vi

I set my editor-cmd to vi in my config file.

I fixed a JavaScript bug on the Attribute form. The ‘Add a Row’ link wasn’t working for dynamic tables in IE/Win. It turns out the tr element needs to be appended to the tbody tag, not the table tag. This issue is documented further here: http://ncyoung.com/entry/494

in working with transect figures in Matlab7.1 (R14) on my mac over the past few days, i have discovered a few secrets:

- Matlab has a canned feature for plotting with two Y-axis variables (plotyy.m), but leaves double X-axis plotting up to the user. the way to do this is to create one figure, and then overlay a second set of axes on top of the original by ‘get’ting the axes ‘Position’ from the original set and using that to set the ‘Position’ for the second. when creating dual x-axis figures, there are a few things to consider:
1. overlapping tickmarks - if you create two sets of axes, it is likely because you are trying to plot two separate sets of variables with different scales. This means dual tick marks, which may make your figure overly confusing. Possible solutions:
- use set(gca,'TickDir','out') on one axes set to differentiate from the inward tick marks.
- do not use tick marks (see the next ‘secret’ for more info)
- turn the bounding box off to remove ticks from secondary x and y axes, while leaving the primary tickmarks. set(gca,'Box','off')
2. overlapping axes - only certain visualization functions seem to be compatible with xx figures. for example, adding a bar or plot is not possible as far as i can tell, nor are multiple levels to one of the axes (ie. no ‘hold on’ allowed!). much trial and error seems necessary! note: put more complicated figure (ie. contour, pcolor) on the bottom layer.
3. remember to turn off the background color in the overlapped axes, the default is white which will obscure the original figure. set(gca,'Color','none')
4. set the second x-axis to the top by using the ‘XAxisLocation’ setting ‘top’.

- tick marks are fun! there are a few things the books don’t tell you:
1. while you CAN have tickmarks without labels, you CANNOT label intervals without tickmarks! you can get around this by hint #2.
2. you can change the tick mark length! this hidden value is not in any book or help menu that i have found, but it is invaluable for removing tick marks but leaving the labels. i found it using the ‘inspector’ feature under the plot tools. the command is set(gca,'TickLength',[a b]) where a and b are mystery values. my experimentation shows that a must be a positive value greater than zero (ie. 0.001), or the tick marks will not appear at all, and that b seems linked to the actual length, with 0.4 being a common starting value, 5 being very very long! using [0 0.01] will remove the ticks from the figure. using this command in combination with 'Box','off' will leave the tick marks on your screen-display figure looking separated from the axis, but when you export, this problem disappears in the .tif version.
3. another way to include labels without the tick marks is to use the text function which still works well in double axes.

- the matlab defaults for image quality are very poor. suggestions are as follows:
1. when exporting, create a .tif with the openGL renderer and 150dpi. this will give a full-page figure of around 2 MB that looks nice. 300dpi is nicer, but the file size starts to get out of hand.
2. printing from the screen figure works well, but for some reason the pageSetup options are greyed out on the printMenu. remember to set your pageSetup Lines and Axes to ‘Color’ before printing!

- i have not yet found a way to make the colorbar logarithmic. i believe this is possible by changing the axes defaults for the colorbar (set bar=colorbar to get the handle for the axes!) but i do not know if it will translate the axes change to the color values. another possibility is to create a new colormap with log intervals for color. i will post more when i have it figured out!
