Blog

A few Matlab resources that I have found helpful over the years for anyone who is interested (and as a reference to myself!):

Mastering Matlab: A Comprehensive Tutorial and Reference - by Duane Hanselman and Bruce Littlefield
This book is the single best reference I have found/used. It is practically written with fairly easy-to-understand-and-apply examples, and the apendices alone with concise lists of object properties are well worth whatever purchase price you pay. I have the Mastering Matlab 5, they have since put out re-writes for Matlab 6 and 7 releases, but my copy has been helpful through the upgrades.

Using Matlab Graphics Manual - Mathworks
What? A manual that is actually helpful? Shocking but true! I have not gotten all that much from the other manuals in the series, but the graphics manual has great information about all aspects of figures. Very helpful for complicated visualizations (multi-axis, multi-layer, etc).

Generally I have avoided the Mathworks website at all costs, I find it to be terribly difficult to navigate and search, and in most cases if I can find what I am looking for, the documentation is severely lacking. However, the function list can be helpful:
http://www.mathworks.com/support/functions/alpha_list.html?sec=1

The SEA-MAT mailing list has come in handy in the past when the above resources failed me (I was looking for movie-making tips). It is a VERY slow group (I will go months with no emails at all) but when there is a question, many experienced and helpful people pop to the surface:
http://woodshole.er.usgs.gov/operations/sea-mat/mail-list.html

Please feel free to add others! Also, if you have found the secret *helpful* part of the Mathworks website, please let me know what the handshake looks like! :)

The iFolder project just came to my attention.

http://www.ifolder.com/index.php/Home

iFolder runs as an application on Linux, MacOSX, and Windows. It hooks into the OS of the platform it’s running on and then provides synchronization with a server running the iFolder server. Similar to the .Mac iDisk, you can drag files into the iFolder directory/folder on your desktop and then iFolder will sync it with the server automatically when the computer is connected to the net. Files are also available via a web browser as well as the local filesystem, so you can access files stored in iFolder on any computer you happen to use. This has implications for filesharing and automatic backups. I haven’t tried this yet, but plan to in the next week or so. I’ll update this post when I do.

Following discussion of data base management system types this week, Geof sent a follow-up link http://www.service-architecture.com/database/articles/index.html. There’s a summary table at the bottom comparing dbms standards.

I suppose it’s not exactly groundbreaking or revolutionary, but Seed is a new science magazine in the same vein as Discover, Scientific American or Science. It’s not as commercialized or plastic as Discover, not as stodgy and old school as Scientific American, and not as technical as Science. What it is, however, is a periodic look into the world of science as culture. Science not solely as a method, hobby or body of knowledge, but rather science as a social binding agent and source of personal subjectivity. Much like Wired has examined the interface between society and technology, teasing out the effect one has on the other, Seed attempts to examine the overlap between society and hard science, and does so while preserving the authenticity of its subject matter.

Perhaps the most refreshing aspect of the magazine is its art design. It places photography and art at a premium, and what results is both a visual and textually interesting read. You often feel like you are reading an art magazine, with intelligent staff writing and mature layouts, yet without the looming sense of paying 6 dollars to look at pretty pictures for 20 minutes.

So, there you go. A glowing endorsement for something that doesn’t quite fit into the bounds of oceaninformatics, but is exciting enough that I think you’ll forgive me. Oh, and this month’s issue has an article about a new NOAA ship that is being built as a remote sensing operation, where ROV’s roam the ocean floor and scientists around the world can monitor the data output from their home computers. Now that’s pretty OI if you ask me…

by Lynn and Jerry

In the past, physical copies of data, articles and technical reports published at SIO were collected through the SIO Publications Committee (a sub-committee of the SIO Staff Council), given a SIO Reference Series publication number, and listed in an annually published hardcopy bibliography. This centralized system broadened community awareness of research and provided a common location for all publications independent of the scientist or lab involved in the research. Institutional support for this program ended in 2002, and with the retirement of Kitty Kuhns, the annual bibliographies ended.

Today, as demand for access to articles and data is growing, the need for a centralized, stable and accessible repository has re-emerged with some new obstacles. The web has provided the needed centralized space of which many researchers and groups have taken advantage. However, with constantly shifting URLs, servers, and funding cycles, maintaining these local access points is a laborious and frustrating task.

To fill the need, the SIO Library has assumed the local responsibility to maintain an online archive and repository called the e-scholarship Repository . This is a free service provided by a partnership between the University of California Office of the President (UCOP) and the non-profit California Digital Library(CDL). An author can submit a published paper in PDF format to the e-scholarship system through an email to siolib@sio.ucsd.edu, an email alias provided by Peter Brueggeman, Director of the SIO Library. The paper will be posted under the appropriate headers (headers include: Tech Reports, SIO Reference, etc) and given a permanent URL. Data, photos, graphs, and other file types can be submitted as well, either independently or as an ‘associated file’ to a paper. Web URLs can be included, though as websites grow, shift and age, links commonly break and are discouraged unless realistically stable for the long-term.

SIO has a separate space in the e-scholarship Repository and allows further subdivision into ‘centers’ and ’series’. The OceanInformatics group for instance can request designation as a ‘center’ with all submissions falling under this hierarchical step. A center can have multiple series which group submissions by designated subjects.

All files in the repository are stored on the servers at the CDL. The CDL is responsible for moving files and maintaining the permanent URLs through server upgrades, etc.

A note about copyrights:

To post a published article as part of the SIO Repository, the lead author must be granted permission through the publishing house or agency. While many journals have denied copyright releases for online access, a few have begun to open up to the idea. Anecdotally, the Elsevier journals (including Continental Shelf Research, Progress in Oceanography and others) will grant authors permission to post their articles online. The research society journals (i.e., those published through the AMS, AGU, etc.) have tended to be less cooperative in this regard though AGU will consider allowing pre-prints to be posted. Authors should investigate any copyright restrictions before posting a published work, the SHERPA/ROMEO list is an excellent place to start.

On 6 March, 2006 there was an alert from iocean regarding probable failure with the RAID disk system:


==============================================================
A problem has been detected on this server.

Status Summary

Reason(s) for notification:
Drives

Server:
Host : iocean
Model : RackMac3,1
Uptime : 67016 minutes
OS version : Mac OS X Server 10.4.4 (8G32)
Processor : 2 x 2000 MHz
Memory : 1024 MB
BootROM : $0005.17f1
Serial : QP41703XPNK

Memory:
Memory Slot "DIMM0/J11" : 512MB, ECC DDR SDRAM, PC3200U-30330
Memory Slot "DIMM1/J12" : 512MB, ECC DDR SDRAM, PC3200U-30330

Drives:
Drive 1 (disk2) : Normal
Drive 2 (disk1) : Normal
Drive 3 (disk0) : Warning
==============================================================

Over the course of the few days leading up to the failure warning, a number of users reported slow database performance in some web applications and services provided by iocean (Hlab Forum, etc.). The data disk[s] on iocean (/Volumes/iodata) is a RAID 1 (mirrored) diskset consisting of two (disk0 and disk2) of the three disks on the server (the third disk, disk1, is a standalone volume, ioceanHD, which contains the operating system and applications). The RAID volume, iodata, is a software-based RAID.

iocean is under AppleCare warranty which provides hardware support for up to three years after purchase. We ordered a new disk from Apple through the Upper-Campus Tech Services and it arrived late afternoon on 7 March. We replaced the failed disk with the new disk the next morning, 8 March, and tried to start rebuilding the RAID array using the Disk Utility (GUI) tool on iocean. This is accomplished by dragging the new disk icon into the RAID window. However, this didn’t work. The GUI tool gave no indication as to why this failed. We next went to the commandline tool, ‘diskutil’ to diagnose the problem. Using the command:


===========================================
iocean:~ jrw$ diskutil list
/dev/disk0
#: type name size identifier
0: Apple_partition_scheme *233.8 GB disk0
1: Apple_partition_map 31.5 KB disk0s1
2: Apple_Boot 128.0 MB disk0s2
3: Apple_RAID 233.6 GB disk0s3
/dev/disk1
#: type name size identifier
0: Apple_partition_scheme *76.7 GB disk1
1: Apple_partition_map 31.5 KB disk1s1
2: Apple_HFS ioceanHD 76.6 GB disk1s3
/dev/disk2
#: type name size identifier
0: Apple_partition_scheme *233.8 GB disk2
1: Apple_partition_map 31.5 KB disk2s1
2: Apple_Boot 512.0 KB disk2s2
3: Apple_RAID 233.8 GB disk2s3
/dev/disk3
#: type name size identifier
0: Apple_HFSX iodata *233.6 GB disk3
===========================================

This showed that the old disk, ‘disk2′ (formated in Panther) had a slightly different partition mapping than the new disk, ‘disk0′. The disk2 had a smaller “Apple_Boot” (disk2s2) partition than the new disk formatted with Apple Disk Utility, thus, the main data partition (disk2s3) was larger than the old disk. The software RAID application won’t allow RAIDs with disks of dissimilar partition maps.

The solution:

Create a single filesystem disk out of the new disk using the Disk Utility partitioning option (HFS+ w/Case Insensitivity and Journaling) .

Copy the current filesystem running on the remaining disk of the degraded RAID set over to the newly formatted disk using Carbon Copy Cloner (or the Restore tab of Disk Utility).

Create a new unpaired mirror RAID set on the new disk, using the “enableRAID” command under the commandline ‘diskutil’ application.

Delete the old RAID array (do this with extreme caution because all of the data on this disk will be erased, i.e., make sure all of the data on this disk has been copied to the new unpaired RAID array before taking this step) using either Disk Utility (GUI) or diskutil (commandline).

Repartition the old disk (the remaining good disk in the old RAID array) so it matches the partition map of the new disk

Using the “repairMirror” option of the commandline ‘diskutil’ application, or by dragging the newly partitioned old disk into the new RAID set in Disk Utility, this disk is incorporated into the newly established RAID array.

The RAID repair, or rebuild, is run as a background process, which means that the computer continues to function online, though with somewhat degraded performance, throughout the rebuild. All user activity should be unaffected. If the problem with the different partition maps hadn’t cropped up, the entire process outlined above could have happened without ever taking the system offline. The physical disks are “hot-swappable” (they can be removed and inserted without taking the system offline) so they can be replaced and the RAID rebuilt without a break in service.

I’ve been using phpMyAdmin to generate database schemas. Since I use phpMyAdmin pretty much every day, it’s very convenient as opposed to DBDesigner, which resides on a separate server and has the overhead of connecting to and opening another app (just for one function).

The schema diagrams generated by phpMyAdmin are pretty, with colorful straight lines to represent table/field relations. However, the major pitfall is that these schema diagrams don’t show the data type. It’s usually helpful to know whether a field is an int, float, char, or text. This is one thing that DBDesigner does quite well.

Given that we have a couple meetings next week involving schemas (one of them being the start of a series of schema meetings), I decided to hack the phpMyAdmin code a bit so that it shows the data types side-by-side with the field names. This way I can continue to use phpMyAdmin to generate diagrams without the comprising of less information.

Beware, the rest of the post gets a bit techinical (aka geeky).

The only file I hacked was pdf_schema.php. This is located in the root dir of the phpmyadmin install.

This file contains the PMA_RT_Table class, which is where all the hacking was done.


class PMA_RT_Table {
// lots of code...
}

First, I added a new private var to store an array of types. This corresponds to the existing array of fieldnames for the class.


class PMA_RT_Table {
   var $fields = array(); // existing code
   var $types = array(); // my code
}

Next, in the constructor function, I added a line of code to populate the types array:


function PMA_RT_Table(...) {
// ...snip

       // load fields
       while ($row = PMA_DBI_fetch_row($result)) {
            $this->types[] = $row[1];  // my code
            $this->fields[] = $row[0];
        }

// ...snip
}

I then added code for the table width calculation to ensure that the drawn table diagrams would be wide enough to display both the field name and data type:


function PMA_RT_Table_setWidth($ff) {
// ...snip

      foreach ($this->fields AS $key => $field) {
           // srh hack to set width for field and type
          $type_arr = split(" +",$this->types[$key]);
          $type = $type_arr[0];
          if ('enum' == substr($type,0,4))
                $type = 'enum';
          if ('set' == substr($type,0,3))
                $type = 'set';
          $field .= " [{$type}]";

// ...snip
}

Finally, I added the same code to actually splice the field name with the data type:


function PMA_RT_Table_draw(...) {
// ...snip

      foreach ($this->fields AS $key => $field) {
           // srh hack to show data types next to field names
          $type_arr = split(" +",$this->types[$key]);
          $type = $type_arr[0];
          if ('enum' == substr($type,0,4))
                $type = 'enum';
          if ('set' == substr($type,0,3))
                $type = 'set';
          $field .= " [{$type}]";

// ...snip
}

I won’t bother going into depth about how and why I hacked the code. Doing so would be hard without presenting more context from the pdf_schema.php file.

The main purpose of this post is to provide me with a memory refresher so I can come back and reference it later at a critical time (i.e. upgrading phpmyadmin). Plus it’s always good to document work like this, no matter how you choose to document it. I chose the blog, and by doing so I am making visible a minor part of my work that would otherwise remain situated in oblivion.

View the difference!

» Before the hack (pdf)

» After the hack (pdf)

O’Reilly’s Ambient Findability (Peter Morville) provided a broad contemporary overview of data and information issues. Here are three recent publications relevant to today’s data environments:

-A Nature journal issue with Microsoft on 2020 Vision
http://research.microsoft.com/towards2020science/nature.htm;
http://www.nature.com/nature/journal/v440/n7083/index.html

-The term “dataspace” creates a wider-than-one-system conceptual umbrella for collections
MFranklin,. AHalevy, and DMaier, 2005.
From Databases to Dataspaces: A New Abstraction for Information Management,
ACM SIGMOD Record;
http://portal.acm.org/citation.cfm?id=1107499.1107502

-A look at the handling large amounts of streamed data - historic and real-time
Schandrasekaran and MFranklin. 2004.
Remembrance of Streams Past: Overload-Sensitive Management of Archived Streams.
Proceedings of the 30th VLDB Conference, Toronto, Canada
http://www.cs.berkeley.edu/~franklin/Papers/ChandrasekaranVLDB2004.pdf

This is an interesting read (pulled from my delicious links) on database optimization, and the downsides of normalization:

Normalized data is for sissies

The article links to a pdf presentation given by Cal Henderson, who helped create Flickr. A quick snippet:

In Flickr’s case, they have 13 SELECTs for every INSERT, DELETE, and UPDATE statement hitting their database. Normalization can slow SELECT speed down while denormalization makes your I/D/Us more complicated and slower. Since the application part of Flickr depends so heavily on SELECTs from the database, it makes sense for them to denormalize their data somewhat to speed things up.

Mason and I have moved the dev CCE LTER into the Production area. Both the dev and production areas are versioned under Subversion. This makes it much easier to keep both sites in sync (with content, etc.).

The CCE LTER Site location is: http://cce.lternet.edu/

« Previous PageNext Page »