November 2005
Monthly Archive
Tue 15 Nov 2005
This is a new blog theme, a modified version of Connections. I changed the header image, and pretty much every other image and stylesheet (changed the hue from greenish to blue-ish).
In addition, the blog root now rests at http://oceaninformatics.ucsd.edu. The main blog link remains the same.
Check out the Live Archives plugin on the Archives page (the top nav bar). It’s pretty sweet and uses AJAX and stuff….
More to come about the new blog/appearance/migration/etc. This post is acting as a quick FYI.
Fri 4 Nov 2005
Posted by srhaber under
Data/Metadata | No Comments
Open-form discussion notes:
- Lack of focus on Quality Assurance at QARTOD? QA is essential aspect.
- Need QC and QA… How to approach QA?
To data providers: Share algorithms, workflows, etc…
On metadata:
- Julie Bosch - Coming along nicely on info that needs to be captured. Focus on defining critical information, not how to fit it in a metadata format.
- On FGDC metadata - specifications manual is cumbersome.
More Metadata Discussion
Spear-headed by Steve Diggs…
Not here to “bash” metadata, but…
Why do we need metadata? We’ve always had it, but why do we need it?
With very few datasets, no need to worry about QC… didn’t need that info in the dataset. Smaller hard drives mean less storage capacity, etc.
At present, more data being taken. Data volume is continually increasing. Advanced technology and instruments. But still stuck in a rut in metadata paradigms…. do we really need to integrate our data?
What is the data/metadata dividing line? A lot of “metadata” is more important… shouldn’t be defined as “metadata”.
Analogy of dog chasing car… we are chasing metadata objectives.
What would we do if we had PERFECT metadata for all datasets? What could or would we do?
Ontologies (top-down) vs. Folksonomies (bottom-up)
…oy.. too much to type here right now… we are all taking stabs at this one.
iPod: bottom-up demand, top-down design
Folksonomy for data discovery, ontology for data delivery
Ideal: Google search “waves data”. Returns lots of data in any format…
Thu 3 Nov 2005
I’ve upgraded ImageMagick on iOcean to the latest release, 6.2.5. ImageMagick is a set of libraries for creating, manipulating, and converting images. There are some built-in command line tools that interface with these libraries, as well as interfaces for many common programming languages. In the process of installing ImageMagick I’ve also updated the jpeg and png libraries on iOcean. Additionally, I’ve installed the Image-Magick-Thumbnail module for Perl (the basic interface for Perl, PerlMagick, came bundled with the ImageMagick install). I’ll be installing a PHP interface in the near future - there are a few out there and I’m not sure yet which one I’ll be installing.
The reason I went through the trouble of installing these software packages is so that I could write an image gallery generator script. The last few galleries I’ve put up have been more time consuming than they really need to be, especially creating thumbnails for each image, so I’m going to write a helper Perl script. I’m curious to hear if anyone else has any other uses for scripted image manipulation.
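As a rough illustration of the thumbnail step, here is a minimal sketch. The post plans a Perl script; this one is in Python purely for illustration. It only builds command lines for ImageMagick's `convert` tool with its `-thumbnail` option. The function name, the `thumbs` subdirectory, and the 150x150 size are all assumptions, not anything from the actual gallery script.

```python
from pathlib import PurePosixPath

# File extensions treated as images (an assumption for this sketch).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif"}

def thumbnail_commands(filenames, size="150x150", thumb_dir="thumbs"):
    """Build one ImageMagick `convert` command per image file.

    Nothing is executed here; the caller decides when to run the commands.
    """
    commands = []
    for name in filenames:
        path = PurePosixPath(name)
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip notes, captions, and other non-image files
        target = path.parent / thumb_dir / path.name
        commands.append(["convert", str(path), "-thumbnail", size, str(target)])
    return commands
```

Each returned list could be handed to `subprocess.run(cmd, check=True)` once the thumbnail directory exists.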
Thu 3 Nov 2005
Posted by srhaber under
Data/Metadata | No Comments
Last post was a duplicate effort from Melissa Carter’s notes… she’s the Recorder. I’m sure her notes will eventually be online as well (but not as quickly!)
Building a matrix now…. I scratched the table.. too clunky to handle in a blog. Using h3, strong, and newlines instead to model matrix.
Parameter: Conductivity
Range Test (Gross)
Criteria: Absolute number
Range Test (Climatology profile)
Criteria: +/- (n) standard deviations
Gradient Test
Criteria:
Spike Tests
Criteria: Determined by data provider
Definitions
Bounds - defined to US EEZ only or to entire world
Climatology - historical data as a function of …TBD
Range tests (gross) - bounds on the parameter: removes gross errors
Range tests (climatology) - bounds determined for specific zones: possible references - Ocean laboratory table
Spiking routines - look at parameter spectra and determine outliers
Gradient test - Difference in the value between adjacent measured values (at specific locations?): spatial and temporal
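The range and gradient definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and every threshold are placeholders, since the group left the actual criteria to the data provider or still to be decided.

```python
def gross_range_test(values, lower, upper):
    """Range test (gross): True where a value lies inside absolute bounds."""
    return [lower <= v <= upper for v in values]

def climatology_test(values, clim_mean, clim_std, n=3):
    """Range test (climatology): True where a value is within +/- n standard
    deviations of a climatological mean for the zone."""
    return [abs(v - clim_mean) <= n * clim_std for v in values]

def gradient_test(values, max_step):
    """Gradient test: True where the change from the previous value is small.

    A single bad point fails twice: once on the jump up to it and once on
    the jump back down, so the sample after it is marked as well.
    """
    passed = [True] * len(values)
    for i in range(1, len(values)):
        if abs(values[i] - values[i - 1]) > max_step:
            passed[i] = False
    return passed
```

For example, with conductivity values `[3.5, 3.6, 9.9, 3.6]`, gross bounds of 0 to 7 reject only the third value, while a gradient limit of 1.0 rejects the third and the fourth.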
Thu 3 Nov 2005
Posted by srhaber under
Data/Metadata | No Comments
Showed up late this morning… CTD is now meeting in Vaughn 100, Waves is in Old Library…
People are discussing QA/QC methods, tests, etc. for CTD. Copying what’s been written so far…. this may be a duplicate effort, but can also be viewed as a backup
Focus on Quality Control and Real-time data collected in-situ
What is real-time data?
- Is definition up to the user?
- OCSD - within hours
What is Quality Control?
- Portion after the data is collected.
- Requires an activity which checks the parameters.
Methods of collection
- Profiling from ship
- Moored
- Profiling floats
- Fixed platforms
- Gliders/AUVs… potential for RT with telemetry
- Expendables
Parameters - define required for QC
Primary Sensors
- Conductivity
- Temperature
- Pressure
- Oxygen
- Other optical
- Position
- Date/Time/Time reference
Derived
- Depth
- Salinity
- Depth
Additional
Consider derived parameters
• Methods
• Varied Instruments
Metadata
- Time
- Position
- Time reference
- Bottom depth/Station depth… is this a parameter? Part of QA or QC?
Ways to verify data collected
No brainer tests
- Range tests: storm options.
- Climatology test: consider specific areas and seasonality
- Gradient test
- Spiking routines - 3 point test, running std of data
- Comparison with other parameters: correlation of std and compared to other sensors
- Comparison with prior or archived data: running std of data
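The two spiking routines mentioned in the list (3 point test, running std) might look roughly like this. It is a sketch under assumptions: the notes don't define either test precisely, so the comparison against the mean of the two neighbours, the window length, and the multiplier `k` are all illustrative choices.

```python
from statistics import mean, stdev

def three_point_spike(values, threshold):
    """Flag a point that differs from the mean of its two neighbours
    by more than `threshold`. The first and last points are never flagged."""
    flags = [False] * len(values)
    for i in range(1, len(values) - 1):
        neighbour_mean = (values[i - 1] + values[i + 1]) / 2
        if abs(values[i] - neighbour_mean) > threshold:
            flags[i] = True
    return flags

def running_std_outliers(values, window=5, k=3.0):
    """Flag a point more than k standard deviations away from the mean
    of the `window` points that precede it."""
    flags = [False] * len(values)
    for i in range(window, len(values)):
        recent = values[i - window:i]
        spread = stdev(recent)
        if spread > 0 and abs(values[i] - mean(recent)) > k * spread:
            flags[i] = True
    return flags
```

Threshold choice matters for the 3 point test: too tight a threshold also flags the neighbours of a spike, because the spike pulls their neighbour mean away from them. The running std version has the opposite weakness, since a spike inside the window inflates the standard deviation for the next few points.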
Further tests: additional methods
- Dual sensors
- Descent rate - specific to profiles
- Ensure derived parameters are within boundaries
- Freezing point test
- Comparison between adjacent sensors - vertical and horizontal
- Discrete samples or additional data from sensors - not real-time, QA?
- TSP relationships: water mass characteristics that differ depending on location
- Comparison with models
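For the freezing point test, one possible sketch is below. The notes don't say which freezing-point formula the group had in mind; this assumes the UNESCO 1983 (Fofonoff and Millard) polynomial, with practical salinity and pressure in decibars. The function names and the `tolerance` parameter are made up for illustration.

```python
def freezing_point_c(salinity, pressure_dbar=0.0):
    """Freezing point of seawater in deg C (UNESCO 1983 polynomial, assumed)."""
    s = salinity
    return (
        -0.0575 * s
        + 1.710523e-3 * s ** 1.5
        - 2.154996e-4 * s ** 2
        - 7.53e-4 * pressure_dbar
    )

def freezing_point_test(temperature_c, salinity, pressure_dbar=0.0, tolerance=0.0):
    """True when the measured temperature is not below the freezing point."""
    return temperature_c >= freezing_point_c(salinity, pressure_dbar) - tolerance
```

A reported temperature of -3.0 at salinity 35 near the surface fails the test, which points at the sensor (or the salinity) rather than the water.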
Consider what is required, preferred, etc.
Words of Caution
- Flag instead of throwing out data
To remove data or to flag?
- Recognize instrument problem: recover or remove?
- Allow for user to determine whether they want flagged data
Whose responsibility is it?
Problems and how to QC
- Stuck sensor: see constant
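A stuck sensor shows up as a run of identical readings. A minimal sketch of a check for that follows; the minimum run length and tolerance are assumptions that a data provider would need to tune per instrument.

```python
def stuck_sensor_flags(values, min_run=5, tolerance=0.0):
    """Flag every point in a run of at least `min_run` readings that stay
    within `tolerance` of the first reading in the run."""
    flags = [False] * len(values)
    run_start = 0
    for i in range(1, len(values) + 1):
        run_ended = i == len(values) or abs(values[i] - values[run_start]) > tolerance
        if run_ended:
            if i - run_start >= min_run:
                for j in range(run_start, i):
                    flags[j] = True
            run_start = i
    return flags
```

In keeping with the words of caution above, this only flags the suspect readings and leaves the data in place.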
Approaches to QC
- Automated QC versus human checking
Ordering for NOAA
- Tests of location and identification of a station/date/time.
- Stage two: spikes
- Stage three: climate
- Stage four: visual inspection
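The staged ordering can be sketched as a small pipeline that flags rather than removes. This is not NOAA's implementation, just an illustration of the order listed above; the record layout (`lat`, `lon`, `time`, `value`), the flag names, and the thresholds are all hypothetical.

```python
from datetime import datetime

def location_time_ok(rec):
    """Stage one: position is on the globe and the timestamp is a datetime."""
    return (
        -90.0 <= rec["lat"] <= 90.0
        and -180.0 <= rec["lon"] <= 180.0
        and isinstance(rec["time"], datetime)
    )

def staged_qc(records, spike_threshold, clim_low, clim_high):
    """Return one flag per record naming the first stage it failed, or 'pass'."""
    flags = []
    for i, rec in enumerate(records):
        if not location_time_ok(rec):
            flags.append("stage1_location_time")
            continue
        # Stage two: three-point spike check (skipped at the two ends).
        if 0 < i < len(records) - 1:
            neighbour_mean = (records[i - 1]["value"] + records[i + 1]["value"]) / 2
            if abs(rec["value"] - neighbour_mean) > spike_threshold:
                flags.append("stage2_spike")
                continue
        # Stage three: climatological bounds.
        if not clim_low <= rec["value"] <= clim_high:
            flags.append("stage3_climatology")
            continue
        flags.append("pass")
    return flags

def visual_inspection_queue(flags):
    """Stage four: indices a person should look at. Nothing is deleted."""
    return [i for i, flag in enumerate(flags) if flag != "pass"]
```

Because every record keeps a flag, a user can decide later whether to include or exclude the flagged data.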
Wed 2 Nov 2005
During lunch break, Steve Diggs asked me “Why are ontologies important?” to which I aptly responded: “They aren’t”.
I explained the theory of a folksonomy, an emerging vocabulary set resulting from a bottom-up process in which members of a community freely choose keywords to their liking. A folksonomy is self-evolving, and provides an accurate model of the dynamic world we are trying to describe. This makes more sense to me than an ontology, which attempts to break everything into distinct categories from a top-down perspective.
Some sites that are based on folksonomies are (surprise!) delicious and Flickr. In fact, even Google’s search engine page-rank algorithm is based on a folksonomy. Instead of Yahoo!’s old approach of categorizing the web, Google ranks pages by popularity. But how do they know which sites are popular?…. they get that data straight from us! All Google does is aggregate existing data and perform algorithms to determine a site’s popularity, and thus, its rank order for search results.
The same logic applies to tagging for Delicious and Flickr. The more times one tag is used for the same object, the more meaningful that tag becomes. Statistical analysis can then be performed to determine which tags are frequently used and can relate like tags together.
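The tag-counting idea can be sketched as follows. This is not how Delicious or Flickr actually do it; the `(user, object, tag)` triples and the function names are assumptions chosen to show the counting and the "relate like tags together" step.

```python
from collections import Counter, defaultdict
from itertools import combinations

def tag_weights(assignments):
    """Count how many distinct users applied each tag to each object."""
    weights = defaultdict(Counter)
    for user, obj, tag in set(assignments):  # ignore exact repeats by one user
        weights[obj][tag] += 1
    return weights

def related_tags(weights):
    """Count how often two tags appear on the same object."""
    pairs = Counter()
    for tags in weights.values():
        pairs.update(combinations(sorted(tags), 2))
    return pairs
```

The highest count per object is the community's preferred label for it, and tag pairs that keep turning up together are candidates for being treated as related.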
An ontology serves a purpose only when it’s needed in a controlled environment. Building an ontology makes sense when all factors are considered and recognized. Software agents built on ontologies will run faster and more efficiently.
However, the world is not controlled. Scientific data is not controlled. Building an ontology here just doesn’t seem to make sense.
Wed 2 Nov 2005
Posted by srhaber under
Data/Metadata | 1 Comment
How Metadata affects QC - Julie Bosch
MMI Project - Notes about conference from Aug 2005. OWL-based ontologies.
COTS/ONR Project - …
QARTOD 1 - Dec 2003. QA/QC flags to be defined. Test cases for flags. Pros and cons of existing standards.
QARTOD 2 - Feb-Mar 2005. Discipline specific metadata: waves, in-situ currents, remote currents.
POST QARTOD 2 - How to fit QC data into FGDC format?
Salinity Workshop - Aug 2005. QC for real-time salinity measurements. Drafted metadata record example of salinity data attributes.
Waves - Nov 2005. Significant advancements on QC requirements and recommendations. Identify best practices for continuing discipline scientific approach.
Katrina Analogy - Damaged bridge = gaps in scientific community. Bridge gaps to create fluid workflow…
11:00am
Why are ontologies important? - Luis Bermudez
Wikipedia definition of ontology. Wikipedia is not that reliable of a source, but it’s good enough. Next is a longer more philosophical definition. Too long to repeat here.
Keywords from definition: Specific Purpose of Practical Difference
Example of Google Directory - Practical use of organization
Specification of conceptualizations. Ex. lake vs river. Each has properties: body of water… similarities identified.
Concepts are created and expressed as a class: Body of water, Lake, River
Classes are related.
Properties of class relations: isPartOf, isTransitive
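To make the class and relation idea concrete, here is a toy sketch in plain Python rather than OWL. The Lake/River/Body of water classes come from the talk; the part-of facts (`Tributary`, `Watershed`) and the helper names are invented for illustration.

```python
# Class hierarchy from the talk: Lake and River are kinds of BodyOfWater.
SUBCLASS_OF = {"Lake": "BodyOfWater", "River": "BodyOfWater"}

# Hypothetical isPartOf facts, only to show what "transitive" buys you.
PART_OF = {"Tributary": "River", "River": "Watershed"}

def reachable(relation, start, target):
    """Follow a relation step by step; True if `target` is ever reached."""
    seen = set()
    current = start
    while current in relation and current not in seen:
        seen.add(current)
        current = relation[current]
        if current == target:
            return True
    return False

def is_a(cls, ancestor):
    return cls == ancestor or reachable(SUBCLASS_OF, cls, ancestor)

def is_part_of(part, whole):
    # Transitive: Tributary -> River -> Watershed implies Tributary -> Watershed.
    return reachable(PART_OF, part, whole)
```

Only two part-of facts are stated, yet `is_part_of("Tributary", "Watershed")` holds. That is an inferred relation, as opposed to a direct one.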
Why use ontologies? Share common understandings, for software agents. Enable reuse of domain knowledge. Make domain assumptions explicit.
Why part of QARTOD? - quality levels, flags, sensors, instrument methodology, calibration procedures, QC software, validation and verification methods, etc.
Can we map two different QC codesets?
Semantic issues - direct relations and inferred relations.
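One way to picture mapping two QC codesets is to relate each code directly to a shared concept and then infer the code-to-code mapping. Both codesets below are hypothetical, as are the concept names; this is a sketch of the idea, not of the MMI tools.

```python
# Two made-up QC codesets, each related directly to shared concepts.
CODESET_A = {1: "good", 3: "suspect", 4: "bad", 9: "missing"}
CODESET_B = {"G": "good", "Q": "suspect", "B": "bad"}

def infer_mapping(source, target):
    """Infer source-code -> target-code relations through shared concepts.

    A source code whose concept has no counterpart maps to None.
    """
    concept_to_target = {concept: code for code, concept in target.items()}
    return {code: concept_to_target.get(concept) for code, concept in source.items()}
```

The `None` entry for code 9 is where the semantic issues start: codeset B has no way to say "missing", so no mapping can be inferred without someone making a decision.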
Use OWL format to build ontologies. Web Ontology Language. Based on RDF.. blah blah blah
How to convert to OWL? created tool called VOC2OWL. Java-based input form takes values and performs conversion. Other tools: Protege.
VINE - Vocabulary Integration Environment (oh, that’s what it stands for!). Mapping relations… this is sooooo MMI.
Notes about MMI Conference. Mapping results…
MMI Website
11:30am
Best Practices Workshop on Salinity - Jim Boyd
- stick with small group - too many people causes confusion, requires “educating” across topical areas
- specifically defined outcome
- breakout rooms for each topical area
- reconvene in plenary to share
Tue 1 Nov 2005
Posted by srhaber under
Tools | No Comments
This one threw me off a bit. For a program that’s highly advanced in a plethora of categories, WordPress has no native support for detecting daylight saving time. Fortunately, there’s a plugin available (called Time Zones) that adds this basic functionality, which most software already includes.
WordPress’s lack of this feature became apparent after Mason’s previous post. Both his post and my first comment to his post were dated an hour into the future. We have since corrected the post times and installed the Time Zones plugin, so now all future posts and comments will contain the proper timestamp.
In my opinion, it is not a good sign to rely on peripheral plugins for adding basic functionality to a program. Plugins should add extraneous and “luxury” functionality. Basic functionality should already exist in the core of the software. Hopefully the WordPress team can fix this in future releases. I guess this is a good reminder that nothing is perfect.
This post time should be around 4:30pm.
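The hour-in-the-future symptom is what you would expect from a fixed UTC offset that is still set for daylight time after the clocks change. The sketch below illustrates that under two assumptions: that the blog was using a fixed UTC-7 offset, and that the US rules in effect in 2005 apply (first Sunday in April to last Sunday in October). It is not how the Time Zones plugin works.

```python
from datetime import date, datetime, time, timedelta

def first_sunday(year, month):
    first = date(year, month, 1)
    return first + timedelta(days=(6 - first.weekday()) % 7)

def last_sunday(year, month):
    # Only called for October here, so month + 1 never overflows.
    last = date(year, month + 1, 1) - timedelta(days=1)
    return last - timedelta(days=(last.weekday() + 1) % 7)

def pacific_offset_hours(utc_naive):
    """UTC offset for US Pacific time under the rules in effect in 2005."""
    year = utc_naive.year
    dst_start = datetime.combine(first_sunday(year, 4), time(10))  # 2:00 PST
    dst_end = datetime.combine(last_sunday(year, 10), time(9))     # 2:00 PDT
    return -7 if dst_start <= utc_naive < dst_end else -8

posted_utc = datetime(2005, 11, 2, 0, 30)   # about 4:30pm Pacific on 1 Nov
correct = posted_utc + timedelta(hours=pacific_offset_hours(posted_utc))
fixed = posted_utc + timedelta(hours=-7)    # offset left on daylight time
```

Here `correct` lands on 4:30pm on 1 Nov 2005, while `fixed` reads 5:30pm, one hour into the future.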
Tue 1 Nov 2005
Posted by mkortz under
Tools | 4 Comments
Google offers a free search solution for educational institutions and non-profit organizations called Google Public Service Search. This allows you to put a search box on your site, which can be used to search your domain or the web as a whole. Google doesn’t include ads on the Public Service searches, and you can turn off the WebSearch feature and restrict searches to your domain only. The search results page is hosted by Google but allows visual customization, including a custom header and footer.
The downside is that each instance can only search over one domain. UCSD already has a Google-powered search that covers the ucsd.edu domain. As I understand it, we can implement a more specific search - for example, the pallter.sio.ucsd.edu domain - but we can’t create a single search that spans multiple specific domains. Thus, it looks like we can’t have an OI search page that searches pallter.sio.ucsd.edu, ccelter.ucsd.edu, etc. all at once.