Community


Here’s a link to a very spirited exchange on different versioning tools and methodologies:

http://ask.slashdot.org/article.pl?sid=06/06/05/1922209&threshold=-1

Here’s an intriguing excerpt that talks about SVN and WebDAV:

SVN + WebDAV + Autoversioning
(Score:5, Informative)
by HFShadow (530449) on Monday June 05, @06:20PM (#15475989)
http://svnbook.red-bean.com/nightly/en/svn.webdav.autoversioning.html [red-bean.com]

From the SVN Handbook:
“Because so many operating systems already have integrated WebDAV clients, the use case for this feature borders on fantastical: imagine an office of ordinary users running Microsoft Windows or Mac OS. Each user “mounts” the Subversion repository, which appears to be an ordinary network folder. They use the shared folder as they always do: open files, edit them, save them. Meanwhile, the server is automatically versioning everything. Any administrator (or knowledgeable user) can still use a Subversion client to search history and retrieve older versions of data.”

And here’s the link to the SVN handbook this came from:

http://svnbook.red-bean.com/nightly/en/svn.webdav.autoversioning.html

Worth a look and perhaps a followup discussion.

We’ve recently been getting lots of comment spam on this blog. Fortunately, pretty much all of it has been caught in the Moderation Queue because of matching keywords. However, the influx of spam is getting annoying. Thus, I am blacklisting the following words. If any of these words appears in a comment, the comment will be automatically deleted with no notification. Missing from this list are some more common words (shoes, casino, slot, etc.). Those words remain on the Moderation list.

-online
4u
adipex
advicer
ambien
baccarrat
blackjack
bllogspot
booker
byob
car-rental-e-site
car-rentals-e-site
carisoprodol
cialis
credit-report-4u
cwas
cyclen
cyclobenzaprine
dating-e-site
day-trading
debt-consolidation-consultant
discreetordering
duty-free
dutyfree
equityloans
fioricet
flowers-leading-site
freenet-shopping
freenet
health-insurancedeals-4u
homeequityloans
homefinance
holdem
holdempoker
holdemsoftware
holdemtexasturbowilson
hotel-dealse-site
hotele-site
hotelse-site
incest
insurance-quotesdeals-4u
insurancedeals-4u
jrcreations
levitra
macinstruct
mortgage-4-u
mortgagequotes
online-gambling
onlinegambling-4u
ottawavalleyag
ownsthis
palm-texas-holdem-game
paxil
penis
phentermine
poker-chip
rental-car-e-site
shemale
slot-machine
texas-holdem
thorcarlson
top-site
top-e-site
tramadol
trim-spa
ultram
valeofglamorganconservatives
viagra
vioxx
xanax
zolus
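
For anyone curious how a two-tier filter like this behaves, here is a minimal sketch in Python. It is not the blog software's own code: the function name `classify_comment`, the two word sets and the three outcomes are assumptions made for illustration, and only a handful of words from the list above are included.

```python
# Hypothetical sketch of a two-tier comment filter (not the blog's real code).
# BLACKLIST holds a few entries from the list above; MODERATION holds the
# "more common words" that only hold a comment for review.
BLACKLIST = {"-online", "4u", "blackjack", "texas-holdem", "viagra"}
MODERATION = {"shoes", "casino", "slot"}


def classify_comment(text):
    """Return 'delete', 'moderate', or 'approve' for a comment body."""
    lowered = text.lower()
    # Substring matching, so a URL such as "credit-report-4u" still trips "4u".
    if any(word in lowered for word in BLACKLIST):
        return "delete"
    if any(word in lowered for word in MODERATION):
        return "moderate"
    return "approve"
```

Substring matching is the simplest choice, but it can over-match: "slot" would also catch "timeslot", which is one reason the common words stay in moderation rather than on the blacklist.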

In a search for some inspiration on my current project, I revisited one of my old favorite websites, the MIT OpenCourseWare website. MIT has generously posted all of the materials associated with hundreds of their classes (in theory, it should be all of them, but it sometimes doesn’t happen that way). If you want to learn about, say, Algorithms for Biological Computing, just navigate over to the computer science section of opencourseware and click on the class title. Inside are complete lecture notes, readings, tests, homeworks, projects, resources, etc. It’s a fantastic example of interoperability between departments and an open framework for the dissemination of knowledge. In essence, you have the complete lesson plans for every class at MIT at your disposal.

What is a Dictionary?

Collecting words and their definitions into dictionaries is the work of lexicography. The Funk and Wagnalls Standard Dictionary defines the word ‘dictionary’ as ‘1. A reference work containing alphabetically arranged words together with their definitions, pronunciations, etymologies, etc. 2. A lexicon whose words are given in one language together with their equivalents in another. 3. A reference work containing information relating to a special branch of knowledge and arranged alphabetically.’

A research science dictionary is 1. A reference work containing a collection of terms that are used in a scientific community along with information that is required to understand each term. 2. A reference work that prescribes a standard for the community language. 3. A reference work written to help translate terms between texts and languages (e.g., from a journal to a computer processing program). While a standard language dictionary helps a reader understand an unfamiliar word by relating it to information categorizing that term specifically (internal), a scientific dictionary helps researchers understand and utilize data collected elsewhere by defining terms both internally and externally in the context of the community. The definition of a term may be dependent on any combination of the following features:

Internal
Human Usage: Abbreviations, formal names and publication preferences must all be taken into account in the definition of a term so that it can be widely recognized.
Standards: Standards to which a term relates should be described fully in a dictionary. For example, measurement terms will often refer to the International System of Units (SI) standard, so a unit dictionary will include definitions that relate a measurement back to the parent SI unit of the same unit type.
History: A dictionary can also bridge technology leaps and changes in community practice. For instance, previously used data processing programs might have needed one set of information while current programs use another, or changes in data collection from human-gathered to instrument-collected can cause language barriers to data comparisons. A dictionary, in providing language used for all cases, can provide back-compatibility to datasets that might not otherwise be useable.

External
Community Culture: (see Databits article: “Designing a Dictionary Process: Site and Community Dictionaries”) “Although names and their definitions are seemingly mundane and even trivial concepts, this does not mean that the articulation, exchange, and blending of unit and attribute names are simple matters. Names go to the heart of local work practices and of data interoperability.” Local nicknames propagate through work practices and become standard within that community; recognizing and including both local and intra-community culture as part of a dictionary creates a human-accessible document for translation between groups of people.
Computer Usage: In this age of rapidly increasing technological power, computers are taking over parts of data analysis previously performed by humans. To do this, the computer and specifically any programs need to know many things about the data, for example whether they are binary or ASCII, string or integer, etc. A dictionary, as Funk and Wagnalls noted, is a tool to translate from one language into another, in this case from human-accessible data into programming terms for automated computations.
Technology Infrastructure: Database software, analysis software and programs themselves all need different types of descriptors in order to run efficiently, and to allow the greatest access, search and display features. A dictionary can provide many types of technological information to facilitate cross-platform and cross-system access, for example the format for the dates and times present, etc.

Dictionary Purpose
A dictionary is created for a number of the reasons listed above, including describing terms and prescribing a standard; however, the purpose of a dictionary is also directly tied to the needs of the end-user and the audience for whom it is created. In fulfillment of these needs, a dictionary’s purpose also includes providing access to shared data, aiding in database searches, providing information needed for interoperability, guiding entry-level projects and informing controlled vocabulary work.

Uses for Different Sized Groups
A small team of people such as a laboratory group may use a dictionary in order to move away from ‘tribal knowledge’ and articulate their local standards for field acquisition and data processing. On this level, a dictionary can also bring together the language of people with different job descriptions; a field technician and lead scientist can use a dictionary to log and document all appropriate methods and acquisition metadata, a programmer can use a dictionary in order to optimize processing code, and an information manager can use a dictionary in order to efficiently archive files into a database or reference the proper standard, etc.
When multiple small groups are collaborating on a project, the dictionary becomes a tool of interoperability that allows the merging of datasets collected and processed by the individual groups on the human and computer levels. Intra-group differences in methodology and abbreviations for like measurements are clearly articulated and possibly resolved in a single dictionary or a combination of dictionary types (see following Dictionary Types section for a brief list).
A community-wide dictionary allows for automated data comparisons spanning many differences such as in acquisition methodologies. Carbon production for example can refer to land or water-based measurements collected with vastly different methodologies, processed using different calculations, etc. Dictionaries enable the collation of carbon production data from many sources, enabling comparisons and facilitating any potential unit conversions.

Dictionary Types

There are many types of dictionaries; a few examples are listed here:
A code dictionary is a mechanism by which coded entries in a dataset can be explained by outside documentation. Codes are a straightforward and efficient way for a group to communicate locally, and storing the code information in a dictionary format provides a centralized clearinghouse for this important knowledge so users not familiar with the colloquialisms can reference material without speaking to an individual within the group. A common use of codes is in naming field stations; a code dictionary might contain a list of field station names translated into latitude and longitude, or pointers to a paper describing the field grid layout and station positions. An acronym dictionary would also fall under this type.
A unit dictionary links local measurements to a standard or an accepted scientific convention (e.g., the SI standard of units) and bridges local abbreviations and unit names to language preferred by journals and technical publications. From a unit dictionary, a user can generate a list of all entries of (SI) unit type ‘length’, convert between them and provide proper abbreviations as used in a domain journal. Unit dimensions and types are also an important part of the unit dictionary as this information facilitates automated conversions and informs the creation of new units that may not directly relate to the standard, such as units of abundance.
An attribute dictionary details information about attributes stored in a database, including links to unit and code dictionaries. For example, a temperature measurement might be defined by an attribute dictionary with information including what type of temperature is recorded (sea surface temperature), what units the measurement is in (pointer to the unit dictionary entry for ‘Celsius’), and a description of the value (a real number, stored as a float with a precision of 0.01). [Note: use micromolar example here?]
A method dictionary is one way of standardizing methodologies and aiding in metadata entry to a database. Rather than writing a complete method section for each dataset, references to predetermined and accepted practices will pull the proper information out of a method dictionary for insertion into a database or file. [To do: find USGS example.]
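
To make the unit and attribute dictionary ideas concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the field names (`abbrev`, `type`, `parent_si`, `to_si`), the sample entries and the helper functions are assumptions chosen for illustration, not the structure of any existing community dictionary.

```python
# Hypothetical unit dictionary: each entry records its unit type, its parent
# SI unit, and the factor that converts a value into that parent unit.
UNIT_DICTIONARY = {
    "meter":      {"abbrev": "m",  "type": "length", "parent_si": "meter",  "to_si": 1.0},
    "centimeter": {"abbrev": "cm", "type": "length", "parent_si": "meter",  "to_si": 0.01},
    "kilometer":  {"abbrev": "km", "type": "length", "parent_si": "meter",  "to_si": 1000.0},
    "second":     {"abbrev": "s",  "type": "time",   "parent_si": "second", "to_si": 1.0},
}

# Hypothetical attribute dictionary entry that points into the unit dictionary.
ATTRIBUTE_DICTIONARY = {
    "depth": {"description": "sampling depth", "unit": "meter", "storage": "float"},
}


def units_of_type(unit_type):
    """List every unit name of the given type, e.g. 'length'."""
    return sorted(name for name, unit in UNIT_DICTIONARY.items()
                  if unit["type"] == unit_type)


def convert(value, from_unit, to_unit):
    """Convert between two units of the same type via the parent SI unit."""
    src, dst = UNIT_DICTIONARY[from_unit], UNIT_DICTIONARY[to_unit]
    if src["type"] != dst["type"]:
        raise ValueError(f"cannot convert {src['type']} to {dst['type']}")
    return value * src["to_si"] / dst["to_si"]
```

A single multiplicative factor is enough for lengths, but not for units with an offset such as Celsius, so a real unit dictionary would need to carry more than `to_si`. Recording the unit type is what lets the conversion refuse to mix, say, length and time.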

Dictionary Vision
A dictionary results from a collaborative process where people with different research goals from different scientific projects, and even from different branches of science, come together with the goal of comparing and/or sharing field measurements and models as well as providing a framework for interoperability to answer larger scientific questions. A dictionary bridges differences in datasets to enable direct comparisons and it fosters understanding between scientists who may use different terminology, computer processing techniques or operating systems. Further, it provides a mechanism to use collected data for a purpose beyond its original scope.
Deciding what information is needed to define a particular term involves the interpretation and discretion of the dictionary creator(s), but in this openness and lack of restriction is a flexibility that makes the notion of a dictionary so useful and important. Science is not a rigid field; it is fluid and ever-changing as hypotheses are proved and disproved, and as new perspectives, concepts and technology expand our ability to measure, analyze and perceive the world. A dictionary is dynamic in order to accommodate changes in understanding while at the same time serving as a static standard to inform data use.

by Lynn and Jerry

In the past, physical copies of data, articles and technical reports published at SIO were collected through the SIO Publications Committee (a sub-committee of the SIO Staff Council), given a SIO Reference Series publication number, and listed in an annually published hardcopy bibliography. This centralized system broadened community awareness of research and provided a common location for all publications independent of the scientist or lab involved in the research. Institutional support for this program ended in 2002, and with the retirement of Kitty Kuhns, the annual bibliographies ended.

Today, as demand for access to articles and data is growing, the need for a centralized, stable and accessible repository has re-emerged with some new obstacles. The web has provided the needed centralized space of which many researchers and groups have taken advantage. However, with constantly shifting URLs, servers, and funding cycles, maintaining these local access points is a laborious and frustrating task.

To fill the need, the SIO Library has assumed the local responsibility to maintain an online archive and repository called the e-scholarship Repository. This is a free service provided by a partnership between the University of California Office of the President (UCOP) and the non-profit California Digital Library (CDL). An author can submit a published paper in PDF format to the e-scholarship system through an email to siolib@sio.ucsd.edu, an email alias provided by Peter Brueggeman, Director of the SIO Library. The paper will be posted under the appropriate headers (headers include: Tech Reports, SIO Reference, etc.) and given a permanent URL. Data, photos, graphs, and other file types can be submitted as well, either independently or as an ‘associated file’ to a paper. Web URLs can be included, though as websites grow, shift and age, links commonly break, so they are discouraged unless realistically stable for the long term.

SIO has a separate space in the e-scholarship Repository and allows further subdivision into ‘centers’ and ‘series’. The OceanInformatics group for instance can request designation as a ‘center’ with all submissions falling under this hierarchical step. A center can have multiple series which group submissions by designated subjects.

All files in the repository are stored on the servers at the CDL. The CDL is responsible for moving files and maintaining the permanent URLs through server upgrades, etc.

A note about copyrights:

To post a published article as part of the SIO Repository, the lead author must be granted permission through the publishing house or agency. While many journals have denied copyright releases for online access, a few have begun to open up to the idea. Anecdotally, the Elsevier journals (including Continental Shelf Research, Progress in Oceanography and others) will grant authors permission to post their articles online. The research society journals (e.g., those published through the AMS, AGU, etc.) have tended to be less cooperative in this regard, though AGU will consider allowing pre-prints to be posted. Authors should investigate any copyright restrictions before posting a published work; the SHERPA/ROMEO list is an excellent place to start.

This week I finished porting the Palmer site into the new theme/framework. On Mason’s advice, I ported most of the files over from the Production area, since Mason had previously used a link checker to move only the necessary files into the Production area, trimming it down significantly from the Dev area.

I ported practically every page manually. This allowed me to address some usability issues on certain pages, clean up some html markup, and most importantly, detect broken links and missing files. We have a few missing files in the Production area because the link checker we used failed to follow symbolic links. In addition, some documents may not have been detected, particularly if they were referenced by a different protocol (file:/// vs. http://). These issues have been fixed for the New site.
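
As a rough illustration of the two failure modes, here is a minimal sketch in Python of a local link check that follows symbolic links and treats file:/// references as filesystem paths. It is not the link checker Mason used; the names `LinkCollector` and `missing_local_links` and the `page_dir`/`site_root` parameters are assumptions made for this example.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class LinkCollector(HTMLParser):
    """Collect href and src attribute values from a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def missing_local_links(html, page_dir, site_root):
    """Return the links in `html` whose target file is missing on disk."""
    collector = LinkCollector()
    collector.feed(html)
    missing = []
    for link in collector.links:
        parts = urlparse(link)
        if parts.scheme not in ("", "file"):
            continue  # http://, mailto:, etc. would need a network check
        path = unquote(parts.path)
        if not path:
            continue  # pure fragment or query, e.g. "#top"
        if parts.scheme == "file":
            target = path  # file:/// URLs are absolute filesystem paths
        elif path.startswith("/"):
            target = os.path.join(site_root, path.lstrip("/"))
        else:
            target = os.path.join(page_dir, path)
        # os.path.exists follows symbolic links, so a link to a real file
        # counts as present and a dangling link counts as missing.
        if not os.path.exists(target):
            missing.append(link)
    return missing
```

This only covers files on the local disk; anything referenced over http:// is skipped and would still need a separate check against the web server.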

The New Palmer site has reached a somewhat stable point. We have succeeded in restructuring the file system so that it parallels the navigation setup of the site. It is now time (particularly for Karen) to fine-tune the structuring. Some folders and files may be moved around as Karen continues to add more content. However, the framework of the new file structure should help guide us to place and locate documents in a semantically meaningful way.

I would eventually like to run a link checker on the New site as part of the fine-tuning process. Mason used a fairly good one, nice interface and all that… except it’s Windows only. Plus it’s limited to 500 links w/o a registered license. We can use that link checker as a Plan B. I may try to find a Mac link checker next week. Best bet is probably to check versiontracker.com.

We will ultimately throw this site into Subversion. However, we need to address some issues beforehand, particularly concerning images and documents. The Palmer site is loaded with distributed docs, pdfs, txts, and other documents. Is it worth loading these into svn and having them eat up space? I still believe they should not go in the repository. This leaves a couple of options:

  1. Create a central location to store images and documents, like pal.lternet.edu/images and pal.lternet.edu/docs
    This method requires an extra global variable in php to store the permanent URI of these locations. In this case, that would be pal.lternet.edu
    This method also implies that we could somewhat mirror the file structure inside these folders. For instance, a pdf referenced in the sci-research page could be located under /docs/sci-research
  2. Store images and documents in both the Development and Production areas. I am against this idea, mainly because it requires double-maintenance of any changes made to these files. It also makes the site less extensible. Should someone else want to check out a 3rd copy from the repository, they would also need to copy over these folders from one of the two areas.

A lot of these documents are embedded deep in the multiple tiers of the file system. To sweep thru the system and move them all to a central docs folder would be quite tedious. Not only does it involve physically moving each file, it also requires changing every hyperlink in the html that refers to that document. However, a workaround for that may be to create symbolic links to the documents’ new resting place so we could leave the hrefs at peace. If we choose to leave the files where they sit now, this corresponds to Option #2. It makes it easier on us by removing the burden of extra work, but it results in duplicated files spread thinly throughout the site.
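
The symbolic-link workaround could be scripted roughly as follows. This is a minimal Python sketch under stated assumptions: the function name `centralize` and the `docs_dir` default are made up for illustration, and the choice to mirror each file's relative path under the central folder follows Option #1 above.

```python
import os
import shutil


def centralize(doc_path, site_root, docs_dir="docs"):
    """Move doc_path under site_root/docs_dir, mirroring its relative path,
    and leave a symbolic link at the old location so hrefs keep working."""
    rel = os.path.relpath(doc_path, site_root)
    new_path = os.path.join(site_root, docs_dir, rel)
    os.makedirs(os.path.dirname(new_path), exist_ok=True)
    shutil.move(doc_path, new_path)
    # Use a relative link target so the tree survives being relocated.
    target = os.path.relpath(new_path, os.path.dirname(doc_path))
    os.symlink(target, doc_path)
    return new_path
```

For example, a pdf at sci-research/report.pdf would end up at docs/sci-research/report.pdf with a link left behind in sci-research. The web server also has to be willing to follow symbolic links for the old hrefs to keep resolving.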

***

I’ve also worked on some datacat stuff this week. Yesterday, I recoded the html for the datasets details page, so it fits well with the new template, and today I recoded the html for the attributes form. In addition, I’ve also been working on the IOD Personnel page.

Jotting down some notes and to-dos for the IOD personnel app:

- Jerry and I enabled http-based authentication yesterday over an SSL connection to provide extra security. These measures are in place to ensure that only privileged users can access and edit a user’s profile, especially since we need a secure place to add and view users keyed by their UCSD Employee #.

- We still need to establish a direct connection to the BLINK database. Currently I am using an indirect route: Pulling data from Wayne, who in turn is pulling from BLINK. Wayne is only pulling a subset of fields, excluding ones we’d like to have (alt. phone, lab phone, mobile phone, title, etc.).

- Once we can pull all the data we need from BLINK, we need to create lookup tables to match abbreviated values to their actual names. Michelle has already sent an email containing the mappings that belong in the lookup tables.

- For the IOD people who have R.A.B (Research Activities) profiles, we need to harvest those profile links and add them to their IOD profile.

- We should create a “print view” of the personnel, one for the users to download, and perhaps one with extended info for Michelle to use.

- The emails should be removed from the list view on the People page to prevent robots from harvesting them. I’d prefer to do this once we have a valid field to take its place, like Title.
[UPDATE 1/27] - Replaced the Email column with Building. Ultimately this should be Title, but until we have the database completely populated, this will work fine.
[UPDATE 1/27] - Changed Building to Location and appended the room # if it exists. This works under the assumption that we will replace this column with Titles before Michelle may start adding multiple room numbers for people.

- Last names are in CAPS in BLINK, but should preferably have only the first letter capitalized here.
[UPDATE 1/26] - I wrote a script called name.php which converts all LAST NAMES to have only the first letter of each word in upper case. This is done using two php functions in conjunction with each other:
$name = ucwords(strtolower($str));
I also modified the syncing script to call these functions when pulling the last name from BLINK.

- Should the phone number format change? Currently it’s (xxx)xxx-xxxx.

- How should we implement the People Search feature?

To anyone else involved with this project, feel free to update or comment on this post. We can experiment with using it as a running log to track the progress.

i recently received an email from a friend who is using the following quote as her email signature, i thought it would be a good way to ring in the new year as we take our ‘lessons learned’ into 2006:

“The colossal misunderstanding of our time is the assumption that insight will work with people who are unmotivated to change. Communication does not depend on syntax, or eloquence, or rhetoric, or articulation but on the emotional context in which the message is being heard. People can only hear you when they are moving toward you, and they are not likely to when your words are pursuing them. Even the choicest words lose their power when they are used to overpower. Attitudes are the real figures of speech.”

Edwin H. Friedman
http://www.wisdomquotes.com/002877.html

though i do think articulation and eloquence help!

How to summarize - grab hold of - a productive year so it not only adds to individual recollections but also appears as a common foundation for next year’s endeavors? How to merge our interdependent discoveries/perspectives/understandings? Creating a set of our own ‘informatics principles’ seems a bit lofty - off-putting even or premature - so maybe we can grow a collection of lessons learned along with some resolutions instead. Here’s a list begun as we dispersed from the oceaninformatics year end story hour - trailing AmoebaLikeBacteriaShapedMarineSnowflake balloons.

Ocean Informatics 2005

Retrospective: A Lesson Learned
Some ‘one’ may provide ‘A Data Solution’ but no ‘one’ can provide ‘The Data Solution’. That is, there is much to be learned for those focusing on a one-stop shop/one-stop system data solution, and much else to be learned for those focusing on new approaches to design/integration/interoperability.

Prospective: Some New Year Resolutions
-to identify and articulate our own assumptions while learning to learn
-to work on recognizing the loss of fundamental categories and perceptions evident with technology
-to learn from diversity and difference, tending to our language, its ambiguities, and our local needs
-to aim for sustainability: think global, design local
-to foreground the learning and the design/learning environment, making use of collaborative, interdisciplinary prototyping
-to continue creating and articulating the multiple dimensions of infrastructure
-to focus on opening up new roles, i.e. of information manager, information scientist, data curator/mediator/liaison, social informaticist, and environmental scientist
-to contribute to the dialogue about responsibilities and ethics of data producers (i.e. a change from being data users) while cultivating the environmental science-information science communications
-to consider data (flow, sustainability, synthesis), data support (IT, sustainability, alignment) and data use (local, public, long-term)
-to develop and inform ourselves about methods, approaches and perspectives for informatics, technology interfaces, communities of practice, and learning environments

Tomorrow is another WebHeads meeting, and I’ve been asked to talk some about Content Management Systems and our experiences with them. At last month’s meeting, Edgar talked about using Subversion for versioning web-applications, which I recapped with the blog post Using Subversion for Web Projects. For this post, I plan to outline some generalized notes about the different CMS’s we’ve used to help have a more focused talk for tomorrow.

This post is not intended to be read as a conventional post. I am updating this post live:

Content Management System

Google define: content management system

System for the creation, modification, archiving and removal of information resources from an organised repository. Includes tools for publishing, format management, revision control, indexing, search and retrieval.
members.optusnet.com.au/~webindexing/Webbook2Ed/glossary.htm

In the context of a Web site a CMS is a collection of tools designed to allow the creation, modification, organisation and removal of information from a Web site. It is common for a CMS to require users to have no knowledge of HTML in order to create new Web pages.
www.bized.ac.uk/educators/16-19/business/marketing/lesson/sup_glossary.htm

Resources

Reviews

PostNuke

OI PostNuke

Mambo

OI Mambo

Xoops

Interoperability Xoops

Drupal

WordPress is an open source weblogging platform. It’s the platform I use to manage this blog and the platform - with some modifications - that Global Voices runs on. It has a reputation for being very user friendly, but for having some underlying architectural problems that make it hard to scale. Drupal is an open source multi-purpose content management system designed for the support of complex websites with multiple authors. It has a reputation for being ludicrously flexible, ungodly powerful and far too complex for mere mortals to use.
http://drupal.org/node/29364

OI Drupal

MediaWiki

OI MediaWiki

WordPress

OI WordPress — this site!
