Blog

Due to the persistence of comment spam, and at the request of some individuals, all email notifications about new posts and comments have been disabled. To keep up with the blog, I highly recommend subscribing via an RSS reader. I like using NetNewsWire on the Mac; Mac users may also use Safari’s built-in RSS reader. For Firefox users, I’m sure there are tons of RSS extensions available in addition to the Live Bookmarks feature.

We will continue to combat the spam on this blog (a great deal of it is filtered immediately by the blacklisted words). Apologies for the email noise over the last few days.
–Shaun

UPDATE:
Additionally, I’ve disabled open registration and anonymous comments. At this point, we can manually add new users as they come along, and users need to log in to leave a comment.

Sorry for the fascist-like policy. Ideally, I’m in favor of open registration and anonymous comments, because they let the casual passer-by leave his or her two cents. Unfortunately, the amount of spam greatly outweighed the amount of meaningful passer-by comments, so it’s best to shut all doors completely.

I just wanted to share a couple of new techniques I’ve learned:

The first is pretty simple: Apple’s mod_auth_apple supports authentication against local accounts. This means there’s no need to maintain a separate .htpasswd list. You can just create a .htgroups file with groups defined as:

groupname: user1 user2…

where the users are usernames of local accounts (note that the account password type can be either shadow or Open Directory). The .htaccess configuration is the same as if you were using mod_auth:

AuthName "My Protected Area"
AuthType Basic
AuthGroupFile /path/to/.htgroups
Require group groupname
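
To make the group-file format concrete, here is a minimal Python sketch (not Apache's own parser) that reads lines of the form "groupname: user1 user2" and answers the question a "Require group" line asks: is this already-authenticated user listed in the group? The function names and sample groups are hypothetical, and skipping "#" comment lines is this sketch's own choice.

```python
# Hypothetical helpers illustrating the .htgroups layout; not Apache's code.


def parse_group_file(text: str) -> dict[str, set[str]]:
    """Map each group name to the set of usernames listed after its colon."""
    groups: dict[str, set[str]] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        name, sep, members = line.partition(":")
        if not sep:
            continue  # ignore lines without a "group:" prefix
        # Usernames are whitespace-separated; a repeated group is merged.
        groups.setdefault(name.strip(), set()).update(members.split())
    return groups


def user_in_group(groups: dict[str, set[str]], user: str, group: str) -> bool:
    """True if the authenticated user appears in the named group."""
    return user in groups.get(group, set())


if __name__ == "__main__":
    sample = "staff: alice bob\nadmins: carol\n"
    groups = parse_group_file(sample)
    print(user_in_group(groups, "bob", "staff"))   # True
    print(user_in_group(groups, "bob", "admins"))  # False
```

With mod_auth_apple the password check itself happens against the local account; the group file only decides which of those accounts are allowed in.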

The second trick I’ve learned deals with the interaction between mod_auth (or mod_auth_apple), mod_rewrite, and SSL. You can use mod_rewrite to force a directory to use SSL by adding something like this to the .htaccess file:

# Send any request that arrives on port 80 to the same URL over HTTPS
RewriteEngine On
RewriteCond %{SERVER_PORT} ^80$
RewriteRule .* https://%{HTTP_HOST}%{REQUEST_URI} [R=permanent,L]
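
RewriteCond patterns are regular expressions matched anywhere in the tested string, so an unanchored 80 would also fire on a port such as 8080; anchoring it as ^80$ avoids that. The hypothetical Python function below models what these rewrite lines do for a single request. It is a sketch of the logic, not how mod_rewrite is implemented: test the port, and if it matches, answer with a permanent (301) redirect to the same host and path over https.

```python
import re
from typing import Optional, Tuple


def redirect_target(server_port: str, http_host: str, request_uri: str,
                    port_pattern: str = r"^80$") -> Optional[Tuple[int, str]]:
    """Return (status, Location) if the rule fires, else None.

    Hypothetical model: port_pattern stands in for the RewriteCond regex
    tested against %{SERVER_PORT}; re.search matches anywhere in the string.
    """
    if re.search(port_pattern, server_port):
        # [R=permanent] means an HTTP 301 redirect; [L] stops further rules.
        return 301, f"https://{http_host}{request_uri}"
    return None  # request proceeds unchanged (e.g. already on port 443)


if __name__ == "__main__":
    print(redirect_target("80", "example.org", "/private/"))
    print(redirect_target("443", "example.org", "/private/"))
    # An unanchored "80" also matches 8080; the anchored default does not.
    print(redirect_target("8080", "example.org", "/", port_pattern="80"))
    print(redirect_target("8080", "example.org", "/"))
```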

This rewrites all non-SSL connections to SSL connections; it’s more user-friendly than SSLRequireSSL, which just displays an error for non-SSL connections. However, problems arise when the directory is also protected with mod_auth. Apache processes the authentication directives before the rewrite directives, so the user is prompted to authenticate over a non-SSL connection. Only then does the rewrite kick in and redirect the browser to an SSL connection. Because the https:// address counts as a different site to the browser, the credentials just entered are not reused, and the user is prompted for authentication again. So what the user sees is one insecure prompt, followed by a second, secure prompt. This is both a security risk and confusing to the user.

The solution I’ve found is to put the rewrite rules in a .htaccess file in the target directory, but to put the authentication rules in the virtual host configuration for port 443 only. This way, when the user attempts a non-SSL connection, there are no authentication rules in place, and the rewrite happens immediately. Once the URL has been rewritten to SSL (and thus to port 443), Apache sees that there are authentication rules in place and prompts the user for a username and password over a secure connection.
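
The difference between the two setups can be shown with a toy Python model of one visit that starts at an http:// URL. Everything here (the PortConfig name, the two dictionaries) is hypothetical, and the model bakes in two assumptions taken from the explanation above: authentication is checked before the rewrite runs, and the browser does not reuse credentials after switching from http to https.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PortConfig:
    requires_auth: bool           # authentication directives apply on this port
    redirect_port: Optional[int]  # where the rewrite sends the browser, if anywhere


def prompts_seen(config: dict[int, PortConfig], start_port: int = 80) -> list[str]:
    """Follow one visit from start_port and record each login prompt."""
    prompts: list[str] = []
    port = start_port
    for _ in range(len(config) + 1):  # guard against redirect loops
        step = config[port]
        if step.requires_auth:
            # Auth is checked first, so the prompt happens on the current port.
            prompts.append("secure" if port == 443 else "insecure")
        if step.redirect_port is None:
            return prompts
        port = step.redirect_port
    raise RuntimeError("redirect loop")


# Original setup: auth rules in .htaccess apply on both ports.
both_ports = {80: PortConfig(True, 443), 443: PortConfig(True, None)}
# Fixed setup: auth rules only in the port-443 virtual host.
ssl_only = {80: PortConfig(False, 443), 443: PortConfig(True, None)}

if __name__ == "__main__":
    print(prompts_seen(both_ports))  # ['insecure', 'secure']
    print(prompts_seen(ssl_only))    # ['secure']
```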

This solution could be cleaner if Apache had a directive that let you discriminate by port number; if one were available, the entire configuration could go in a .htaccess file. As it is now, you have to configure two virtual hosts for each site, one on port 80 and one on port 443. OS X configures sites this way by default, but for other servers this fix might require more work.

Third time’s a charm! (let’s hope)

I upgraded WordPress from version 1.5 to 2.0.3. This was my third attempt at upgrading; my previous two attempts were unsuccessful in that the blog seemed to run much slower afterward.

This time around, I took all the precautions: backing up data, deactivating plugins, etc.

One by one, I will upgrade and reactivate plugins to get everything back up to speed (e.g., the Archives plugin). I also need to recode some hacks (e.g., letting users edit their own comments).

I’ve written previously about WordPress 2.0 after my first upgrade attempt. Read about it here

[NOTE]
The oceaninformatics.ucsd.edu/blog URL is an alias for oceaninformatics.ucsd.edu/index.php?cat=12

[UPDATE]
All plug-ins and hacks are back up. Enjoy!

Last year, members from various LTER sites collaborated in creating the LTER EML Unit Registry. This made it possible to have an authoritative source of units to reference when generating EML documents.

[Image: LTER Unit metersPerSquareSecond]
Though the Unit Registry effort has been successful, there have been a few technical drawbacks. One issue was dealing with “junk” characters in the unit abbreviation field. This is the result of different character encoding types conflicting with each other.

For example, the unit metersPerSquareSecond should have the abbreviation m/s². However, the LTER EML Unit Registry page declares a charset encoding of iso-8859-1, and this encoding causes the “junk” characters to appear. The picture below shows the source code from the LTER EML Unit Registry home page.

[Image: source code of the LTER EML Unit Registry home page, showing charset=iso-8859-1]

To solve this issue locally, I set the charset encoding to UTF-8. Because UTF-8 can encode any Unicode character, the correct characters appear, among them the superscript 2 and 3 (for squared and cubed, respectively) and the Greek letters mu (for micro) and omega (for ohm). The picture below shows the source code from the Ocean Informatics Datazoo home page. The Palmer LTER and CCE LTER Unit Registries are kept in sync with each other.

[Image: source code of the Ocean Informatics Datazoo home page, showing charset=UTF-8]
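
The "junk" characters are what you get when the UTF-8 bytes of a character are read back as ISO-8859-1. The short Python sketch below reproduces that for the characters mentioned above, assuming (as the fix suggests) that the stored abbreviations are UTF-8 while the page declares iso-8859-1. Browsers actually treat iso-8859-1 as windows-1252, but for these particular bytes the result is the same.

```python
def as_seen_with_latin1(text: str) -> str:
    """Encode text as UTF-8, then (mis)decode the bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


# Sample abbreviations using the characters discussed above.
samples = {
    "squared": "m/s\u00b2",  # superscript 2
    "cubed": "m\u00b3",      # superscript 3
    "micro": "\u03bcm",      # Greek small letter mu
    "ohm": "\u03a9",         # Greek capital letter omega
}

if __name__ == "__main__":
    for label, text in samples.items():
        garbled = as_seen_with_latin1(text)
        # Re-reading the same bytes as UTF-8 recovers the original text.
        restored = garbled.encode("iso-8859-1").decode("utf-8")
        print(f"{label}: {text!r} -> {garbled!r} (restored: {restored == text})")
```

Each non-ASCII character becomes two characters (for example, the superscript 2 turns into "Â²"), which is exactly the kind of debris that shows up in the abbreviation field.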

Notes:
- To remove the junk characters, I copied and pasted the correct symbols from “Special Characters…” in Safari’s Edit menu.
- Contrary to my initial assumption, no changes were required in the MySQL collation. MySQL was able to store the UTF-8-encoded strings in text columns using our default collation, latin1_swedish_ci.
- UTF-8-encoded strings should not be passed through PHP’s htmlentities() function unless it is told that the input is UTF-8 (via its charset argument); otherwise it produces the “junk” characters.
- This page was a good reference for working with Unicode in MySQL and PHP. Additionally, the O’Reilly book Building Scalable Web Sites, by Cal Henderson of Flickr fame, has an entire chapter devoted to character encoding. I was able to read parts of the book at Safari Tech Books Online.
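
Why htmlentities() causes trouble: older PHP versions default its character set to ISO-8859-1, so each byte of a UTF-8 character is treated as a separate Latin-1 character and turned into its own named entity. The Python function below is a rough, hypothetical analogue built on the standard html.entities table, not PHP itself; it shows the escaped output with the wrong and the right charset.

```python
from html.entities import codepoint2name


def htmlentities(data: bytes, charset: str = "iso-8859-1") -> str:
    """Rough Python analogue of PHP's htmlentities(), for illustration only.

    Bytes are decoded with the given charset, and every character that has a
    named HTML 4 entity (including &, <, > and ") is replaced by that entity.
    """
    out = []
    for ch in data.decode(charset):
        name = codepoint2name.get(ord(ch))
        out.append(f"&{name};" if name else ch)
    return "".join(out)


stored = "m/s\u00b2".encode("utf-8")  # UTF-8 bytes for m/s plus superscript 2

if __name__ == "__main__":
    print(htmlentities(stored))           # m/s&Acirc;&sup2;  (renders as junk)
    print(htmlentities(stored, "utf-8"))  # m/s&sup2;  (renders correctly)
```

Once the entities are emitted, the browser shows "Â²" no matter what charset the page declares, which is why the escaping step has to know the input encoding.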

Here’s a link to a very spirited exchange on different versioning tools and methodologies:

http://ask.slashdot.org/article.pl?sid=06/06/05/1922209&threshold=-1

Here’s an intriguing excerpt that talks about SVN and WebDAV:

SVN + WebDAV + Autoversioning
(Score:5, Informative)
by HFShadow (530449) on Monday June 05, @06:20PM (#15475989)
http://svnbook.red-bean.com/nightly/en/svn.webdav.autoversioning.html

From the SVN Handbook:
“Because so many operating systems already have integrated WebDAV clients, the use case for this feature borders on fantastical: imagine an office of ordinary users running Microsoft Windows or Mac OS. Each user “mounts” the Subversion repository, which appears to be an ordinary network folder. They use the shared folder as they always do: open files, edit them, save them. Meanwhile, the server is automatically versioning everything. Any administrator (or knowledgeable user) can still use a Subversion client to search history and retrieve older versions of data.”

And here’s the link to the SVN handbook this came from:

http://svnbook.red-bean.com/nightly/en/svn.webdav.autoversioning.html

Worth a look and perhaps a follow-up discussion.

Video conferencing has recently come to the forefront within the LTER community. The choice of digital interface standard or protocol is a determining factor, and since every standard comes with some history, a little background on video conferencing (VC) and video teleconferencing (VTC) is helpful:
-H.323 history: http://myhome.hanafos.com/~soonjp/vchx.html
-Session Initiation Protocol (SIP, used for instant messaging) history: http://www.showkit.com/download/samples/with/internet/html/slide6.html
-iChat: http://en.wikipedia.org/wiki/Ichat
-OV: http://www.terena.nl/publications/files/videoconf-reccomendations-dec2005.pdf
-glossary: http://www.terena.nl/activities/iptel/chapters/Glossary.pdf

I am using two conferencing environments with different standards/protocols (see I) SIP and II) H.323 below). I have been using the first for some time; we tested the second yesterday for the first time. With two platform configurations (Karen/Shaun: $130 iSight camera with free XMeeting on a Mac; Don: $50 Logitech camera with $105 Polycom PVX 8.0 on a PC), we made successful XMeeting tests Mac-to-Mac (Shaun/Karen; Shaun/Mick) and Mac-to-PC (LNO setup; Don interactive).

The lessons learned for H.323 include:
-must turn off the firewall, or the caller receives the message “far site could not be reached”
-set the microphone to the external (i.e., iSight) microphone instead of the built-in one
-use headphones to eliminate feedback
-desktop sharing has poor resolution; requires further investigation

I) AIM: stand-alone proprietary AIM (AOL Instant Messenger) client software, available free for MS Windows, Mac OS, Linux, Windows CE and Palm OS.
protocol: Standard SIP protocol
Mac side: iChat AV ships with the system; four-way multi-party conferencing is available in recent system software, though it must be initiated from a G5
http://docs.info.apple.com/article.html?artnum=301050

II) Polycom VC: standalone hardware units or client software for purchase
protocol: H.323 standard set by the International Telecommunication Union (ITU)
ITU H.323 compatible VC clients
Polycom hardware for small/medium multiple conference groups:
Polycom VSX5000 (~$3500); VSX7000x (~$5500); V500 (~$150)
PC side standalone config: PVX software with a Logitech camera ($160),
as an alternative to the V500 or VSX300
Mac side: ohphone, or XMeeting as its next generation (open source)
XMeeting 0.2 is at xmeeting.sourceforge.net; recommended for Tiger;
has all ohphone features; built on a new architecture
in XMeeting preferences, enable H.323 in the H.323 section and SIP in the SIP section

We’ve recently been getting lots of comment spam on this blog. Fortunately, pretty much all of it has been caught in the moderation queue because of matching keywords. However, the influx of spam is getting annoying, so I am blacklisting the following words. If any of these words appear in a comment, the comment will be automatically deleted with no notification. Some more common words (shoes, casino, slot, etc.) are missing from this list; those words still appear in the moderation list.

-online
4u
adipex
advicer
ambien
baccarrat
blackjack
bllogspot
booker
byob
car-rental-e-site
car-rentals-e-site
carisoprodol
cialis
credit-report-4u
cwas
cyclen
cyclobenzaprine
dating-e-site
day-trading
debt-consolidation-consultant
discreetordering
duty-free
dutyfree
equityloans
fioricet
flowers-leading-site
freenet-shopping
freenet
health-insurancedeals-4u
homeequityloans
homefinance
holdem
holdempoker
holdemsoftware
holdemtexasturbowilson
hotel-dealse-site
hotele-site
hotelse-site
incest
insurance-quotesdeals-4u
insurancedeals-4u
jrcreations
levitra
macinstruct
mortgage-4-u
mortgagequotes
online-gambling
onlinegambling-4u
ottawavalleyag
ownsthis
palm-texas-holdem-game
paxil
penis
phentermine
poker-chip
rental-car-e-site
shemale
slot-machine
texas-holdem
thorcarlson
top-site
top-e-site
tramadol
trim-spa
ultram
valeofglamorganconservatives
viagra
vioxx
xanax
zolus
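
The two-tier policy described above (blacklisted words delete a comment outright, more common words only send it to moderation) can be sketched in a few lines of Python. This is an illustration, not WordPress's code: the word tuples are small excerpts, the function name is hypothetical, and it assumes case-insensitive substring matching with the blacklist checked first.

```python
from typing import Sequence

# Excerpts of the two lists described above; the real lists are longer.
BLACKLIST: tuple[str, ...] = ("viagra", "online-gambling", "holdem")
MODERATION: tuple[str, ...] = ("shoes", "casino", "slot")


def classify(comment: str,
             blacklist: Sequence[str] = BLACKLIST,
             moderation: Sequence[str] = MODERATION) -> str:
    """Return "delete", "moderate" or "approve" for a comment body.

    Assumes case-insensitive substring matching, so "holdem" also catches
    "holdempoker" and "slot" also catches "slots".
    """
    text = comment.lower()
    if any(word.lower() in text for word in blacklist):
        return "delete"      # removed silently, no notification
    if any(word.lower() in text for word in moderation):
        return "moderate"    # held in the moderation queue
    return "approve"


if __name__ == "__main__":
    for body in ["Cheap VIAGRA here", "Nice shoes!", "Great post, thanks."]:
        print(classify(body), "|", body)
```

Under substring matching, a short entry such as "holdem" already covers longer variants like "holdempoker", so those longer entries are harmless but redundant; the flip side is that very short entries can also catch innocent words.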

In a search for some inspiration on my current project, I revisited one of my old favorite websites, the MIT OpenCourseWare website. MIT has generously posted all of the materials associated with hundreds of their classes (in theory, it should be all of them, but it sometimes doesn’t happen that way). If you want to learn about, say, Algorithms for Biological Computing, just navigate over to the computer science section of OpenCourseWare and click on the class title. Inside are complete lecture notes, readings, tests, homework assignments, projects, resources, etc. It’s a fantastic example of interoperability between departments and an open framework for the dissemination of knowledge. In essence, you have the complete lesson plans for every class at MIT at your disposal.

In the midst of the ever-evolving database design process, I am starting to collect ADCP tidbits to better inform the future addition of this dataset to any site databases. The following notes are random and unorganized, but most will hopefully help advance the general understanding of these more complex data:

NOTE: The ADCP data types that will most concern us are bottom-mounted, mooring-mounted (upward- or downward-looking), and shipboard (other types include ROV-mounted instruments, instruments side-mounted on oil rigs, etc.). The following notes apply ONLY to shipboard ADCP data.

- Each ship has its own quirks, an understanding of which is needed to successfully process the data from each instrument.
New Horizon - instrument: Teledyne/RDInstruments Ocean Surveyor Broadband/Narrowband 150kHz ADCP
acquisition: PC with RDInstruments VmDas Version 1.42 software

- General: Teri Chereskin has been involved in all aspects of the CalCOFI ADCP data, from designing the instrument profiling scheme, setting up the instrument pre-cruise and pulling data off the collection computer post-cruise, to transport, processing, any post-processing needed, and archiving. She maintains an independent ADCP database on her systems. I am trying to learn the ropes so I can take over some of those responsibilities for the CCE-LTER cruises. Teri’s webpage with documentation, ADCP data websites and other info is here:
http://tryfan.ucsd.edu/adcp/adcp.htm

- Data Size: The CalCOFI NH0604 cruise ADCP data set is 2.9 GB and has 593 files in 5 different formats.

- Formats: .ENR - single-ping raw data in beam coordinates (binary)
.ENS - serial info added at the computer level (i.e., with GPS brought in), beam coordinates with extra info (binary)
.ENX - in earth coordinates, calculated from internal header information (binary)
.N1R - GPS data (ASCII)
.N2R - Ashtech data (ASCII)
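
A cruise's worth of files (593 of them in this case) is easier to check with a quick inventory by extension. The Python sketch below is a hypothetical helper, not part of Teri's or the CalCOFI processing chain: it counts files and bytes per extension and labels the five formats listed above, assuming the extension alone identifies the format.

```python
from collections import Counter, defaultdict
from pathlib import Path, PurePath
from typing import Iterable

# Extensions and descriptions taken from the format list above.
FORMATS = {
    ".ENR": "single-ping raw data, beam coordinates (binary)",
    ".ENS": "beam coordinates plus serial/GPS info (binary)",
    ".ENX": "earth coordinates (binary)",
    ".N1R": "GPS data (ASCII)",
    ".N2R": "Ashtech data (ASCII)",
}


def summarize(files: Iterable[tuple[str, int]]) -> dict[str, tuple[int, int]]:
    """Group (filename, size in bytes) pairs by upper-cased extension.

    Returns {extension: (file count, total bytes)}.
    """
    counts: Counter = Counter()
    sizes: defaultdict = defaultdict(int)
    for name, size in files:
        ext = PurePath(name).suffix.upper() or "(none)"
        counts[ext] += 1
        sizes[ext] += size
    return {ext: (counts[ext], sizes[ext]) for ext in counts}


def summarize_dir(root: Path) -> dict[str, tuple[int, int]]:
    """Walk a cruise data directory (layout is an assumption) and summarize it."""
    return summarize((p.name, p.stat().st_size) for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    import sys

    if len(sys.argv) != 2:
        print("usage: adcp_inventory.py CRUISE_DIRECTORY")
    else:
        for ext, (n, total) in sorted(summarize_dir(Path(sys.argv[1])).items()):
            label = FORMATS.get(ext, "unrecognized")
            print(f"{ext:7} {n:5d} files {total / 1e9:6.2f} GB  {label}")
```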

I believe that the ENS and ENX files can be re-created from the ENR, N1R and N2R files, but I do not yet know the details of this, nor of the CalCOFI final/reporting formats.

- Transport: Mark Ohman purchased a LaCie d2 Hard Drive Extreme for moving data from the ship to SIO. I originally formatted the drive as NTFS for the PC, but should it possibly be reformatted as FAT32? The NTFS format was readable on my Mac OS X machine and was compatible with Teri’s system (type?). Teri has a disk on coast which can be directly connected to the external hard drive in the CCS server room.
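
One general consideration for the NTFS-versus-FAT32 question (background, not something tested on this drive): Mac OS X reads NTFS volumes but does not write to them out of the box, while FAT32 is writable from both Windows and Mac OS X but cannot store any single file of 4 GiB or larger. Before reformatting, a check like this hypothetical Python sketch would show whether any file in a cruise's data set would hit that limit.

```python
from pathlib import Path
from typing import Iterable

# FAT32 cannot store a single file of 4 GiB or more.
FAT32_MAX_FILE_BYTES = 2**32 - 1  # 4,294,967,295 bytes


def too_big_for_fat32(files: Iterable[tuple[str, int]]) -> list[str]:
    """Return the names of files whose size exceeds the FAT32 limit."""
    return [name for name, size in files if size > FAT32_MAX_FILE_BYTES]


def check_dir(root: Path) -> list[str]:
    """Apply the check to every file under root (hypothetical helper)."""
    return too_big_for_fat32(
        (str(p.relative_to(root)), p.stat().st_size)
        for p in root.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    import sys

    if len(sys.argv) != 2:
        print("usage: fat32_check.py DATA_DIRECTORY")
    else:
        oversized = check_dir(Path(sys.argv[1]))
        print("\n".join(oversized) if oversized else "All files fit within the FAT32 limit.")
```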

more later…

What is a Dictionary?

Collecting words and their definitions into dictionaries is the work of lexicography. Funk and Wagnalls Standard Dictionary of language defines the word ‘dictionary’ as ‘1. A reference work containing alphabetically arranged words together with their definitions, pronunciations, etymologies, etc. 2. A lexicon whose words are given in one language together with their equivalents in another. 3. A reference work containing information relating to a special branch of knowledge and arranged alphabetically.’

A research science dictionary is 1. A reference work containing a collection of terms that are used in a scientific community along with information that is required to understand each term. 2. A reference work that prescribes a standard for the community language. 3. A reference work written to help translate terms between texts and languages (e.g., from a journal to a computer processing program). While a standard language dictionary helps a reader understand an unfamiliar word by relating it to information that categorizes that term specifically (internal), a scientific dictionary helps researchers understand and use data collected elsewhere by defining terms both internally and externally, in the context of the community. The definition of a term may depend on any combination of the following features:

Internal
Human Usage: Abbreviations, formal names and publication preferences must all be taken into account in the definition of a term so that it can be widely recognized.
Standards: Standards to which a term relates should be described fully in a dictionary. For example, measurement terms will often refer to the International System of Units (SI), so a unit dictionary will include definitions that relate a measurement back to the parent SI unit of the same unit type.
History: A dictionary can also bridge technology leaps and changes in community practice. For instance, older data processing programs might have needed one set of information while current programs use another, or a change from human-gathered to instrument-collected data can create language barriers to data comparison. A dictionary, by providing the language used in all cases, can provide backward compatibility for datasets that might not otherwise be usable.

External
Community Culture: (see Databits article: “Designing a Dictionary Process: Site and Community Dictionaries”) “Although names and their definitions are seemingly mundane and even trivial concepts, this does not mean that the articulation, exchange, and blending of unit and attribute names are simple matters. Names go to the heart of local work practices and of data interoperability.” Local nicknames propagate through work practices and become standard within that community; recognizing and including both local and intra-community culture as part of a dictionary creates a human-accessible document for translation between groups of people.
Computer Usage: In this age of rapidly increasing technological power, computers are taking over parts of data analysis previously performed by humans. To do this, the computer, and specifically any programs, need to know many things about the data, for example whether they are binary or ASCII, string or integer, etc. A dictionary, as Funk and Wagnalls noted, is a tool to translate from one language into another, in this case from human-accessible data into programming terms for automated computations.
Technology Infrastructure: Database software, analysis software and programs themselves all need different types of descriptors in order to run efficiently, and to allow the greatest access, search and display features. A dictionary can provide many types of technological information to facilitate cross-platform and cross-system access, for example the format for the dates and times present, etc.

Dictionary Purpose
A dictionary is created for a number of reasons listed above, including describing terms and prescribing a standard. However, the purpose of a dictionary is also directly tied to the needs of the end user and the audience for whom it is created. In fulfilling these needs, a dictionary’s purpose also includes providing access to shared data, aiding database searches, providing information needed for interoperability, guiding entry-level projects and informing controlled vocabulary work.

Uses for Different Sized Groups
A small team of people such as a laboratory group may use a dictionary in order to move away from ‘tribal knowledge’ and articulate their local standards for field acquisition and data processing. On this level, a dictionary can also bring together the language of people with different job descriptions; a field technician and lead scientist can use a dictionary to log and document all appropriate methods and acquisition metadata, a programmer can use a dictionary in order to optimize processing code, and an information manager can use a dictionary in order to efficiently archive files into a database or reference the proper standard, etc.
When multiple small groups are collaborating on a project, the dictionary becomes a tool of interoperability that allows datasets collected and processed by the individual groups to be merged at both the human and computer levels. Inter-group differences in methodology and in abbreviations for like measurements are clearly articulated and possibly resolved in a single dictionary or a combination of dictionary types (see the following Dictionary Types section for a brief list).
A community-wide dictionary allows automated data comparisons that span many differences, such as in acquisition methodology. Carbon production, for example, can refer to land- or water-based measurements collected with vastly different methodologies, processed using different calculations, etc. Dictionaries enable the collation of carbon production data from many sources, making comparisons possible and facilitating any needed unit conversions.
Dictionary Types

There are many types of dictionaries; a few examples are listed here:
A code dictionary is a mechanism by which coded entries in a dataset can be explained by outside documentation. Codes are a straightforward and efficient way for a group to communicate locally, and storing the code information in a dictionary format provides a centralized clearinghouse for this important knowledge so users not familiar with the colloquialisms can reference material without speaking to an individual within the group. A common use of codes is in naming field stations; a code dictionary might contain a list of field station names translated into latitude and longitude, or pointers to a paper describing the field grid layout and station positions. An acronym dictionary would also fall under this type.
A unit dictionary links local measurements to a standard or an accepted scientific convention (e.g., the SI standard of units) and bridges local abbreviations and unit names to language preferred by journals and technical publications. From a unit dictionary, a user can generate a list of all entries of (SI) unit type ‘length’, convert between them and provide proper abbreviations as used in a domain journal. Unit dimensions and types are also an important part of the unit dictionary as this information facilitates automated conversions and informs the creation of new units that may not directly relate to the standard, such as units of abundance.
An attribute dictionary details information about attributes stored in a database, including links to unit and code dictionaries. For example, a temperature measurement might be defined by an attribute dictionary with information including what type of temperature is recorded (sea surface temperature), what units the measurement is in (a pointer to the unit dictionary entry for ‘Celsius’), and a description of the value (a real number, stored as a float with a precision of 0.01).
A method dictionary is one way of standardizing methodologies and aiding metadata entry to a database. Rather than writing a complete methods section for each dataset, references to predetermined and accepted practices pull the proper information out of a method dictionary for insertion into a database or file.
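
The unit and attribute dictionary types above can be pictured as two small lookup tables. The Python sketch below is a hypothetical illustration, not a published LTER schema: each unit records its abbreviation, its type and its relationship to the SI parent unit (enough to list all ‘length’ units and convert between them), and an attribute entry points to a unit entry, as in the sea surface temperature example. The factor/offset representation and all names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    abbreviation: str  # journal-style abbreviation
    unit_type: str     # e.g. "length", "temperature"
    factor: float      # SI value = value * factor + offset
    offset: float = 0.0


# Hypothetical unit dictionary; the SI parents here are the meter and the kelvin.
UNITS = {
    "meter": Unit("m", "length", 1.0),
    "centimeter": Unit("cm", "length", 0.01),
    "kilometer": Unit("km", "length", 1000.0),
    "kelvin": Unit("K", "temperature", 1.0),
    "celsius": Unit("\u00b0C", "temperature", 1.0, 273.15),
}


def units_of_type(unit_type: str) -> list[str]:
    """List every unit name of one type, e.g. all 'length' units."""
    return [name for name, u in UNITS.items() if u.unit_type == unit_type]


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert via the SI parent unit; refuses to mix unit types."""
    src, dst = UNITS[from_unit], UNITS[to_unit]
    if src.unit_type != dst.unit_type:
        raise ValueError(f"cannot convert {src.unit_type} to {dst.unit_type}")
    si_value = value * src.factor + src.offset
    return (si_value - dst.offset) / dst.factor


@dataclass(frozen=True)
class Attribute:
    description: str
    unit: str      # pointer (key) into the unit dictionary
    storage: str   # how the value is stored, e.g. "float"
    decimals: int  # 2 decimals = a precision of 0.01


# Hypothetical attribute dictionary entry for the temperature example.
ATTRIBUTES = {
    "sst": Attribute("sea surface temperature", "celsius", "float", 2),
}


def display(attr_name: str, value: float) -> str:
    """Format a value using the attribute's precision and unit abbreviation."""
    attr = ATTRIBUTES[attr_name]
    return f"{value:.{attr.decimals}f} {UNITS[attr.unit].abbreviation}"


if __name__ == "__main__":
    print(units_of_type("length"))
    print(convert(1500, "meter", "kilometer"))
    print(convert(25, "celsius", "kelvin"))
    print(display("sst", 12.3456))
```

Keeping the unit type on every entry is what lets the conversion refuse nonsensical requests (length to temperature), and the offset field is needed because temperature scales such as Celsius do not share a zero point with their SI parent.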

Dictionary Vision
A dictionary results from a collaborative process where people with different research goals from different scientific projects, and even from different branches of science, come together with the goal of comparing and/or sharing field measurements and models as well as providing a framework for interoperability to answer larger scientific questions. A dictionary bridges differences in datasets to enable direct comparisons and it fosters understanding between scientists who may use different terminology, computer processing techniques or operating systems. Further, it provides a mechanism to use collected data for a purpose beyond its original scope.
Deciding what information is needed to define a particular term involves the interpretation and discretion of the dictionary creator(s), but in this openness and lack of restriction is a flexibility that makes the notion of a dictionary so useful and important. Science is not a rigid field; it is fluid and ever-changing as hypotheses are proved and disproved, and as new perspectives, concepts and technology expand our ability to measure, analyze and perceive the world. A dictionary is dynamic in order to accommodate changes in understanding while at the same time serving as a static standard to inform data use.
