April 2006
Monthly Archive
Mon 24 Apr 2006
In the midst of the ever-evolving database design process, I am starting to collect ADCP tidbits to better inform the future addition of this dataset to any site databases. The following notes are random and unorganized, but most will hopefully help advance the general understanding of these more complex data:
NOTE: The ADCP datatypes that will most concern us are bottom-mounted, mooring-mounted (upward- or downward-looking), and shipboard. (Other types include ROV-mounted instruments, instruments side-mounted on oil rigs, etc.) The following notes apply ONLY to shipboard ADCP data.
- Each ship has its own quirks, an understanding of which is needed to successfully process the data from each instrument.
New Horizon - instrument: Teledyne/RDInstruments Ocean Surveyor Broadband/Narrowband 150kHz ADCP
acquisition: PC with RDInstruments VmDas Version 1.42 software
- General: Teri Chereskin has been involved in all aspects of the CalCOFI ADCP data, from designing the instrument profiling scheme, to setting up the instrument pre-cruise, pulling data off of the collection computer post-cruise, transport, processing, any post-processing needed, and archiving. She maintains an independent ADCP database on her systems. I am trying to learn the ropes to take over some of those responsibilities for the CCE-LTER cruises. Teri’s webpage with documentation, ADCP data websites and other info is here:
http://tryfan.ucsd.edu/adcp/adcp.htm
- Data Size: The CalCOFI NH0604 cruise ADCP data totals 2.9 GB, in 593 files across 5 different formats.
- Formats: .ENR - single ping raw data in beam coordinates (binary)
.ENS - serial info added at computer level (ie. with GPS brought in), beam coordinates with extra info (binary)
.ENX - in earth coordinates, calculated from internal header information (binary)
.N1R - GPS data (ascii)
.N2R - Ashtech data (ascii)
I believe that the ENS and ENX formats can be re-created from the ENR, N1R and N2R files? I do not yet know the details of this, nor of the CalCOFI final/reporting formats.
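To keep track of a cruise's files by format, something like the following could tally file counts and sizes per extension. This is only a minimal sketch: the function name `summarize_adcp_dir` and the example path are hypothetical, and the descriptions are copied from the notes above rather than from any instrument documentation.

```python
from collections import Counter
from pathlib import Path

# Descriptions come from the format notes above; keys are lowercase extensions.
ADCP_FORMATS = {
    ".enr": "single-ping raw data, beam coordinates (binary)",
    ".ens": "beam coordinates plus serial info such as GPS (binary)",
    ".enx": "earth coordinates (binary)",
    ".n1r": "GPS data (ASCII)",
    ".n2r": "Ashtech data (ASCII)",
}


def summarize_adcp_dir(data_dir):
    """Return {extension: (file_count, total_bytes)} for the known formats."""
    counts = Counter()
    sizes = Counter()
    # Walk the directory tree and tally only files with a known extension.
    for path in Path(data_dir).rglob("*"):
        ext = path.suffix.lower()
        if path.is_file() and ext in ADCP_FORMATS:
            counts[ext] += 1
            sizes[ext] += path.stat().st_size
    return {ext: (counts[ext], sizes[ext]) for ext in counts}


if __name__ == "__main__":
    # "NH0604_adcp" is a placeholder directory name, not a real path.
    for ext, (n, nbytes) in sorted(summarize_adcp_dir("NH0604_adcp").items()):
        print(f"{ext}: {n} files, {nbytes} bytes - {ADCP_FORMATS[ext]}")
```

Run against a copy of the cruise directory, this would give a quick check that the file count and total size match what came off the ship.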
- Transport: Mark Ohman purchased a LaCie d2 Hard Drive Extreme for moving data from the ship to SIO. I originally formatted the drive for the PC in NTFS, but should possibly re-format to FAT32? The NTFS format was readable on my Mac OS X machine, and was compatible with Teri’s system (type?). Teri has a disk on coast which can be directly connected with the external hard drive in the CCS server room.
more later…
Thu 20 Apr 2006
What is a Dictionary?
Collecting words and their definitions into dictionaries is the work of lexicography. Funk and Wagnalls Standard Dictionary specifies the meaning of the word ‘dictionary’ as ‘1. A reference work containing alphabetically arranged words together with their definitions, pronunciations, etymologies, etc. 2. A lexicon whose words are given in one language together with their equivalents in another. 3. A reference work containing information relating to a special branch of knowledge and arranged alphabetically.’
A research science dictionary is 1. A reference work containing a collection of terms that are used in a scientific community along with information that is required to understand each term. 2. A reference work that prescribes a standard for the community language. 3. A reference work written to help translate terms between texts and languages (e.g., from a journal to a computer processing program). While a standard language dictionary helps a reader understand an unfamiliar word by relating it to information categorizing that term specifically (internal), a scientific dictionary helps researchers understand and utilize data collected elsewhere by defining terms both internally and externally in the context of the community. The definition of a term may be dependent on any combination of the following features:
Internal
Human Usage: Abbreviations, formal names and publication preferences must all be taken into account in the definition of a term so that it can be widely recognized.
Standards: Standards to which a term relates should be described fully in a dictionary. For example, measurement terms will often refer to International System of Units (SI) standard, so a unit dictionary will include definitions that relate a measurement back to the parent SI unit of the same unit type.
History: A dictionary can also bridge technology leaps and changes in community practice. For instance, previously used data processing programs might have needed one set of information while current programs use another, or changes in data collection from human-gathered to instrument-collected can cause language barriers to data comparisons. A dictionary, in providing language used for all cases, can provide back-compatibility to datasets that might not otherwise be useable.
External
Community Culture: (see Databits article: “Designing a Dictionary Process: Site and Community Dictionaries”) “Although names and their definitions are seemingly mundane and even trivial concepts, this does not mean that the articulation, exchange, and blending of unit and attribute names are simple matters. Names go to the heart of local work practices and of data interoperability.” Local nicknames propagate through work practices and become standard within that community; recognizing and including both local and intra-community culture as part of a dictionary creates a human-accessible document for translation between groups of people.
Computer Usage: In this age of rapidly increasing technological power, computers are taking over parts of data analysis previously performed by humans. To do this, the computer, and specifically any programs, need to know many things about the data, for example whether they are binary or ASCII, string or integer, etc. A dictionary, as Funk and Wagnalls noted, is a tool to translate from one language into another, in this case from human-accessible data into programming terms for automated computations.
Technology Infrastructure: Database software, analysis software and programs themselves all need different types of descriptors in order to run efficiently, and to allow the greatest access, search and display features. A dictionary can provide many types of technological information to facilitate cross-platform and cross-system access, for example the format for the dates and times present, etc.
Dictionary Purpose
A dictionary is created for a number of the reasons listed above, including describing terms and prescribing a standard. However, the purpose of a dictionary is also directly tied to the needs of the end-user and the audience for whom it is created. In fulfillment of these needs, a dictionary’s purpose also includes providing access to shared data, aiding in database searches, providing information needed for interoperability, guiding entry-level projects and informing controlled vocabulary work.
Uses for Different Sized Groups
A small team of people such as a laboratory group may use a dictionary in order to move away from ‘tribal knowledge’ and articulate their local standards for field acquisition and data processing. On this level, a dictionary can also bring together the language of people with different job descriptions; a field technician and lead scientist can use a dictionary to log and document all appropriate methods and acquisition metadata, a programmer can use a dictionary in order to optimize processing code, and an information manager can use a dictionary in order to efficiently archive files into a database or reference the proper standard, etc.
When multiple small groups are collaborating on a project, the dictionary becomes a tool of interoperability that allows the merging of datasets collected and processed by the individual groups on the human and computer levels. Inter-group differences in methodology and abbreviations for like measurements are clearly articulated and possibly resolved in a single dictionary or a combination of dictionary types (see following Dictionary Types section for a brief list).
A community-wide dictionary allows for automated data comparisons spanning many differences such as in acquisition methodologies. Carbon production, for example, can refer to land or water-based measurements collected with vastly different methodologies, processed using different calculations, etc. Dictionaries enable the collation of carbon production data from many sources, allowing comparisons and facilitating any potential unit conversions.
Dictionary Types
There are many types of dictionaries; a few examples are listed here:
A code dictionary is a mechanism by which coded entries in a dataset can be explained by outside documentation. Codes are a straightforward and efficient way for a group to communicate locally, and storing the code information in a dictionary format provides a centralized clearinghouse for this important knowledge so users not familiar with the colloquialisms can reference material without speaking to an individual within the group. A common use of codes is in naming field stations; a code dictionary might contain a list of field station names translated into latitude and longitude, or pointers to a paper describing the field grid layout and station positions. An acronym dictionary would also fall under this type.
A unit dictionary links local measurements to a standard or an accepted scientific convention (e.g., the SI standard of units) and bridges local abbreviations and unit names to language preferred by journals and technical publications. From a unit dictionary, a user can generate a list of all entries of (SI) unit type ‘length’, convert between them and provide proper abbreviations as used in a domain journal. Unit dimensions and types are also an important part of the unit dictionary as this information facilitates automated conversions and informs the creation of new units that may not directly relate to the standard, such as units of abundance.
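As a rough illustration of the unit dictionary idea, here is a minimal sketch in Python. The entry layout (`type`, `abbrev`, `to_si`) and the function names are my own assumptions for illustration, not an established schema; each entry simply records the factor that relates it back to the parent SI unit of the same unit type.

```python
# Hypothetical unit dictionary: the field names are assumptions, not a standard.
# "to_si" is the factor that converts a value to the parent SI unit.
UNIT_DICTIONARY = {
    "meter":      {"type": "length", "abbrev": "m",  "to_si": 1.0},
    "centimeter": {"type": "length", "abbrev": "cm", "to_si": 0.01},
    "kilometer":  {"type": "length", "abbrev": "km", "to_si": 1000.0},
    "second":     {"type": "time",   "abbrev": "s",  "to_si": 1.0},
    "hour":       {"type": "time",   "abbrev": "h",  "to_si": 3600.0},
}


def units_of_type(unit_type):
    """List every unit name in the dictionary with the given unit type."""
    return sorted(
        name for name, entry in UNIT_DICTIONARY.items()
        if entry["type"] == unit_type
    )


def convert(value, from_unit, to_unit):
    """Convert by going through the parent SI unit; unit types must match."""
    src = UNIT_DICTIONARY[from_unit]
    dst = UNIT_DICTIONARY[to_unit]
    if src["type"] != dst["type"]:
        raise ValueError(f"cannot convert {src['type']} to {dst['type']}")
    return value * src["to_si"] / dst["to_si"]


if __name__ == "__main__":
    print(units_of_type("length"))
    km = UNIT_DICTIONARY["kilometer"]["abbrev"]
    print(f"2 {km} = {convert(2, 'kilometer', 'meter')} m")
```

Storing the unit type alongside the conversion factor is what lets the lookup refuse nonsensical conversions (length to time) automatically.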
An attribute dictionary details information about attributes stored in a database, including links to unit and code dictionaries. For example, a temperature measurement might be defined by an attribute dictionary with information including what type of temperature is recorded (sea surface temperature), what units the measurement is in (pointer to the unit dictionary entry for ‘Celsius’), and a description of the value (a real number, stored as a float with a precision of 0.01). (Note to self: use micromolar example here?)
A method dictionary is one way of standardizing methodologies and aiding in metadata entry to a database. Rather than writing a complete method section for each dataset, references to predetermined and accepted practices will pull the proper information out of a method dictionary for insertion into a database or file. (Note to self: find USGS example.)
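To make the links between dictionary types a bit more concrete, the sketch below shows an attribute dictionary whose entries point into a unit dictionary and a code dictionary. Everything here is hypothetical: the field names, the station code `STN_A`, and its position are invented placeholders for illustration, not real site data.

```python
from decimal import Decimal

# Hypothetical code dictionary: the station code and position are invented.
CODE_DICTIONARY = {
    "STN_A": {"latitude": 33.0, "longitude": -118.0},
}

# Hypothetical unit dictionary entry for the temperature example above.
UNIT_DICTIONARY = {
    "celsius": {"type": "temperature", "abbrev": "°C"},
}

# Attribute entries hold pointers (keys) into the other dictionaries.
ATTRIBUTE_DICTIONARY = {
    "sst": {
        "description": "sea surface temperature",
        "unit": "celsius",          # pointer to the unit dictionary
        "storage": "float",
        "precision": 0.01,
    },
    "station": {
        "description": "field station code",
        "uses_code_dictionary": True,   # values are looked up as codes
    },
}


def resolve(attribute_name, raw_value):
    """Interpret a raw database value using the attribute dictionary."""
    entry = ATTRIBUTE_DICTIONARY[attribute_name]
    if entry.get("uses_code_dictionary"):
        return CODE_DICTIONARY[raw_value]
    unit = UNIT_DICTIONARY[entry["unit"]]
    # Turn a precision such as 0.01 into a number of decimal places (2).
    decimals = max(0, -Decimal(str(entry["precision"])).as_tuple().exponent)
    return f"{entry['description']}: {raw_value:.{decimals}f} {unit['abbrev']}"


if __name__ == "__main__":
    print(resolve("sst", 15.234))
    print(resolve("station", "STN_A"))
```

The point of the indirection is that a change to a unit abbreviation or a station position is made once, in its own dictionary, and every attribute that points to it picks up the change.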
Dictionary Vision
A dictionary results from a collaborative process where people with different research goals from different scientific projects, and even from different branches of science, come together with the goal of comparing and/or sharing field measurements and models as well as providing a framework for interoperability to answer larger scientific questions. A dictionary bridges differences in datasets to enable direct comparisons and it fosters understanding between scientists who may use different terminology, computer processing techniques or operating systems. Further, it provides a mechanism to use collected data for a purpose beyond its original scope.
Deciding what information is needed to define a particular term involves the interpretation and discretion of the dictionary creator(s), but in this openness and lack of restriction is a flexibility that makes the notion of a dictionary so useful and important. Science is not a rigid field; it is fluid and ever-changing as hypotheses are proved and disproved, and as new perspectives, concepts and technology expand our ability to measure, analyze and perceive the world. A dictionary is dynamic in order to accommodate changes in understanding while at the same time serving as a static standard to inform data use.
Thu 20 Apr 2006
Posted by lynn under Tools, Blog. 2 comments.
A few Matlab resources that I have found helpful over the years for anyone who is interested (and as a reference to myself!):
Mastering Matlab: A Comprehensive Tutorial and Reference - by Duane Hanselman and Bruce Littlefield
This book is the single best reference I have found/used. It is written practically, with fairly easy-to-understand-and-apply examples, and the appendices alone, with their concise lists of object properties, are well worth whatever purchase price you pay. I have Mastering Matlab 5; they have since put out re-writes for the Matlab 6 and 7 releases, but my copy has been helpful through the upgrades.
Using Matlab Graphics Manual - Mathworks
What? A manual that is actually helpful? Shocking but true! I have not gotten all that much from the other manuals in the series, but the graphics manual has great information about all aspects of figures. Very helpful for complicated visualizations (multi-axis, multi-layer, etc).
Generally I have avoided the Mathworks website at all costs: I find it terribly difficult to navigate and search, and in most cases, even if I can find what I am looking for, the documentation is severely lacking. However, the function list can be helpful:
http://www.mathworks.com/support/functions/alpha_list.html?sec=1
The SEA-MAT mailing list has come in handy in the past when the above resources failed me (I was looking for movie-making tips). It is a VERY slow group (I will go months with no emails at all) but when there is a question, many experienced and helpful people pop to the surface:
http://woodshole.er.usgs.gov/operations/sea-mat/mail-list.html
Please feel free to add others! Also, if you have found the secret *helpful* part of the Mathworks website, please let me know what the handshake looks like!
Fri 14 Apr 2006
Posted by jrw under Blog. No comments.
The iFolder project just came to my attention.
http://www.ifolder.com/index.php/Home
iFolder runs as an application on Linux, MacOSX, and Windows. It hooks into the OS of the platform it’s running on and then provides synchronization with a server running the iFolder server. Similar to the .Mac iDisk, you can drag files into the iFolder directory/folder on your desktop and then iFolder will sync it with the server automatically when the computer is connected to the net. Files are also available via a web browser as well as the local filesystem, so you can access files stored in iFolder on any computer you happen to use. This has implications for filesharing and automatic backups. I haven’t tried this yet, but plan to in the next week or so. I’ll update this post when I do.
Thu 6 Apr 2006
Following discussion of database management system types this week, Geof sent a follow-up link: http://www.service-architecture.com/database/articles/index.html. There’s a summary table at the bottom comparing DBMS standards.
Tue 4 Apr 2006
I suppose it’s not exactly groundbreaking or revolutionary, but Seed is a new science magazine in the same vein as Discover, Scientific American or Science. It’s not as commercialized or plastic as Discover, not as stodgy and old school as Scientific American, and not as technical as Science. What it is, however, is a periodic look into the world of science as culture. Science not solely as a method, hobby or body of knowledge, but rather science as a social binding agent and source of personal subjectivity. Much like Wired has examined the interface between society and technology, teasing out the effect one has on the other, Seed attempts to examine the overlap between society and hard science, and does so while preserving the authenticity of its subject matter.
Perhaps the most refreshing aspect of the magazine is its art design. It places photography and art at a premium, and what results is both a visually and textually interesting read. You often feel like you are reading an art magazine, with intelligent staff writing and mature layouts, yet without the looming sense of paying 6 dollars to look at pretty pictures for 20 minutes.
So, there you go. A glowing endorsement for something that doesn’t quite fit into the bounds of oceaninformatics, but is exciting enough that I think you’ll forgive me. Oh, and this month’s issue has an article about a new NOAA ship that is being built as a remote sensing operation, where ROVs roam the ocean floor and scientists around the world can monitor the data output from their home computers. Now that’s pretty OI if you ask me…