Tue 23 Aug 2005
The oceaninformatics mounted directory (…/projects/oceaninformatics) is cluttered with unorganized, temporary, and obsolete data. This makes it difficult to find a needed file, or to determine where to save a given file. I feel it’s time we add some conventions on how we structure our shared working environment. We may also consider using conventions for versioning our data, or perhaps even using Subversion to do some of the dirty work.
Here’s my proposition: All data belonging to specific projects reside in a folder with the project’s name, first letter capitalized. Examples: PersonnelDirectory, Dictionary, LTER, etc.
I suggest using a CamelCase naming convention (no spaces between words) to make things easier. These folders can be referred to as Project folders.
We may also have a set of “data” folders, where each data folder contains a specified category of data. Examples: photos, schemas, files, etc.
All data folder names are lowercase.
Any Project folders can contain other Project folders and data folders. Data specific to a project is stored in a data folder under that Project folder. Example: The source photos used for making thumbnails for the personnel directory are stored in: oceaninformatics/PersonnelDirectory/photos/src. Likewise, the thumbs may be stored in: oceaninformatics/PersonnelDirectory/photos/thumb.
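The folder rules above can be sketched as a small check. This is only an illustration: the two regular expressions and the names classify_folder and check_path are assumptions of mine, since the convention itself only says "first letter capitalized, no spaces" for Project folders and "lowercase" for data folders.

```python
import re

# Assumed patterns: the convention only states "first letter capitalized,
# no spaces" (Project folders) and "lowercase" (data folders).
PROJECT_RE = re.compile(r"^[A-Z][A-Za-z0-9]*$")
DATA_RE = re.compile(r"^[a-z][a-z0-9]*$")


def classify_folder(name):
    """Return 'project', 'data', or 'unknown' for a single folder name."""
    if PROJECT_RE.match(name):
        return "project"
    if DATA_RE.match(name):
        return "data"
    return "unknown"


def check_path(relative_path):
    """Return the folder names in a workspace-relative path that fit neither pattern."""
    parts = [part for part in relative_path.split("/") if part]
    return [part for part in parts if classify_folder(part) == "unknown"]


# Paths are relative to the oceaninformatics workspace root.
print(classify_folder("PersonnelDirectory"))        # project
print(classify_folder("photos"))                    # data
print(check_path("PersonnelDirectory/photos/src"))  # []
```

A name with an underscore, such as temp_ksb, comes back as 'unknown' under these assumed patterns, which is one way to spot folders that predate the convention.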
Project folders may contain “sub-projects”. Example: LTER may contain the Project folder “IM Meeting in Montreal Aug4-7 2005”.
This rearrangement of folders and files is fairly trivial. I am not interested in creating a strict convention for organizing project-specific data. Rather, I would like to invoke a simple working-space skeleton that organizes our projects and data in a logical sense. This approach works well for static data files (files and documents that never change); however, we need to determine a better way to store our dynamic documents (files we edit… a lot!).
Our existing method of storing dynamic documents is to put them into personalized tmp folders (e.g. temp_ksb/, temp_srh/). We also have personalized working folders (e.g. working_lry/) used for storing “up-to-date” files. I dislike the concept of using personalized folders in a collaborative working environment. That’s what our home directories are for. I feel we should dismiss personalized tmp/working folders in favor of keeping our documents organized by topic, not by author.
(On a side note, what’s the difference between a temp folder and a working folder? To me, they are equivalent. Both types of folders tend to get bloated with “up-to-date” documents and archived revisions of those documents.)
That being said, I still see the value of a single tmp folder. It is a useful place to store documents, but only with the expectation that they will soon be moved elsewhere or trashed.
It may be a good idea to embed author names and revision numbers into the file names (of dynamic documents), particularly when taking turns editing files. For example, I create file A and save it as: A_srh01.doc. Karen edits it and saves it as: A_ksb02.doc.
Because we are sharing documents without the aid of a file management tool, we must adhere to some kind of naming convention to keep our revisions ordered.
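As a rough sketch of how this naming convention could be applied consistently, the helper below takes the current file name and the initials of the next editor and produces the next name in the sequence. The function name next_revision, the regular expression, and the two-digit zero padding are my assumptions; the convention only gives the A_srh01.doc and A_ksb02.doc examples.

```python
import re

# Assumed shape: <stem>_<author initials><revision number>.<extension>,
# inferred from the examples A_srh01.doc and A_ksb02.doc.
REVISION_RE = re.compile(
    r"^(?P<stem>.+)_(?P<author>[a-z]+)(?P<rev>\d+)(?P<ext>\.[^.]+)$"
)


def next_revision(filename, author):
    """Return the file name the next editor should save under."""
    match = REVISION_RE.match(filename)
    if match is None:
        raise ValueError(f"{filename!r} does not follow the naming convention")
    revision = int(match.group("rev")) + 1
    # Zero-pad to two digits so revision 02 and revision 10 have equal width.
    return f"{match.group('stem')}_{author}{revision:02d}{match.group('ext')}"


print(next_revision("A_srh01.doc", "ksb"))  # A_ksb02.doc
```

A file name that does not match the assumed pattern raises a ValueError rather than guessing at a revision number.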
We may also consider using Subversion to version some of our files. Though it comes at the expense of extra overhead in the workflow process (the advantage we have now is that we are using no tools!), it efficiently saves each revision we commit and prompts us to log a comment for each revision. Even if we choose to bypass Subversion and continue with our tool-less approach, we should at least practice the conceptual ideas of file versioning that Subversion offers.
8 Responses to “File Sharing Conventions”
August 23rd, 2005 at 4:56 pm
To summarize then, in the shared Ocean Informatics working environment there are project directories and data directories. I would suggest the following variation on the suggested naming convention: A_V02_ksb.doc so that sorting occurs by name and then version. I’m for staying with this versioning with a tool-free environment for a bit longer.
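The sorting point can be checked with a short sketch. The helper names and the third revision in the history are hypothetical, added only to show the effect; the two-digit padding follows the A_V02_ksb.doc example.

```python
def author_first_name(stem, version, author, ext=".doc"):
    """Original proposal: author initials, then revision (e.g. A_srh01.doc)."""
    return f"{stem}_{author}{version:02d}{ext}"


def version_first_name(stem, version, author, ext=".doc"):
    """Suggested variation: version, then author initials (e.g. A_V02_ksb.doc)."""
    return f"{stem}_V{version:02d}_{author}{ext}"


# Hypothetical editing history: (author initials, revision number).
history = [("srh", 1), ("ksb", 2), ("srh", 3)]

author_first = [author_first_name("A", rev, who) for who, rev in history]
version_first = [version_first_name("A", rev, who) for who, rev in history]

# A plain alphabetical listing, as a file browser would show it.
print(sorted(author_first))   # ['A_ksb02.doc', 'A_srh01.doc', 'A_srh03.doc']
print(sorted(version_first))  # ['A_V01_srh.doc', 'A_V02_ksb.doc', 'A_V03_srh.doc']
```

With the author initials first, revision 02 is listed ahead of revision 01; with the version first, the listing follows the order of the revisions.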
Though now obsolete, the temp/working distinction denoted the following in practice: temp is owned by an individual and for exchange of files; working is a directory where others may come in to work.
August 23rd, 2005 at 5:42 pm
Shaun,
It’s been my experience that with rule/process-imposing systems (such as revision control (RCS), and ‘sudo’ from the systems admin domain), whatever administrative overhead they impose, though often painful to adopt at first, becomes second nature after continued use. Also, the benefits gained from employing these structures are immense. In the above examples, having our system config files under RCS control makes recovering from mistakes trivial (e.g., deleting a /etc/hosts file), whereas without revision control, life becomes much more complex, often at the most stressful times. As long as we can build a reasonable interface to these systems, one that doesn’t impose undue burden or angst on the user (us), I vote to make it so.
August 23rd, 2005 at 10:33 pm
Jerry,
So does that mean you favor using Subversion as a backend tool to do file versioning? This of course may be viewed as a disruptive burden to the workflow, but I agree that a reasonable interface, and continued use, would turn it into a second nature effort.
Talking with Karen earlier today, we discovered there are three distinct uses for the shared workspace:
1. STORAGE
2. COLLABORATION
3. EXCHANGE
I cited two types of files in my post: static and dynamic. Static files go together with STORAGE. That is, a simple naming convention for folders is useful in organizing specific files related by topic/project. Likewise, dynamic files relate to COLLABORATION. An area where people can work together on single documents, and a versioning mechanism to properly archive the files’ history, are both necessary for maintaining a clean shared working environment.
The third use, EXCHANGE, was something I completely overlooked. Karen describes using the shared workspace to exchange certain files with key members of the group. An example is the exchange of audio recording files from working/reading groups. These audio files are primarily stored in the interoperability project space. Suppose a member of the oceaninformatics group wants to listen to one of these recordings, but they don’t have permission to access the interoperability files. As a workaround, Karen may temporarily place the recording file in the oceaninformatics workspace, giving that person the proper access.
Exchanging files in the shared workspace requires heavy use of a tmp folder, and falls under my take that the tmp folder is useful for storing files only with the anticipation that they will be removed shortly.
When organizing our workspaces, perhaps it is better to think of the file’s use in order to guide us where/how to store it. Are we placing it for long-term storage, collaborative editing, or transient access?
August 24th, 2005 at 1:58 pm
fwiw, my feeling is that a file will have many uses during its lifespan so i tend to be uncomfortable with the idea of organization based on purpose. especially since the versioning process is also up for discussion, if a collaborative file ends up with 6 or 7 drafts of different names, these files would also need to be moved around with the ‘final’ file as the purpose of the final file changes. thus the file will have a bunch of baggage when it reaches the ‘storage’ resting place, but the purpose of the baggage was more in the editing purpose? i guess my point is that a file’s purpose will change over time and as the purpose of other related files change, i can see things (files, myself) getting lost in all of the shuffling?
i like the project-based organization, but i think the key will be to keeping it on such a level where the overlap is minimum. i.e. to me, the Dictionary effort is part of LTER (and LTER to me is inclusive of CCE and PAL) so any and all nesting of these would be logical to me while perhaps not to someone else. my suggestion is to start on the ‘project’ level but divide by the end product, so a ‘Metadata’ folder would include ‘Dictionaries’, and ‘PersonnelDirectory’ and ‘Meetings’ would be folders, but rather than one big ‘LTER’ folder have a LTERData or LTERWeb or _______?? with CCE and PAL designations in each?
the benefit of a working directory to me is that i know where everything that i am involved in lives. being part time and doing most of my work outside of the OI infrastructure means that i am not familiar with all of the projects or larger umbrellas and i have been known to get lost in other people’s file structures. i am not very good at remembering to finalize/document/close out ‘working’ files which i am sure is not helping though! i do see the need though to get more organized and i am sure that just like versioning, whatever is decided upon will become second nature!
August 24th, 2005 at 2:49 pm
ps. in terms of uses, are ‘storage’ and ‘access’ distinct or is access folded into storage? i.e. would files for web display/access be covered under the current three uses?
August 24th, 2005 at 3:26 pm
Lynn,
My last comment tried to make three mutually exclusive use cases for files stored in our workspace.
STORAGE is like archiving. Suppose I’ve worked on some new graphics for a web page. I’d like an area where I can store the original images and photoshop files. Moreover, I should store them somewhere that relates to their topic.
COLLABORATION is basically “working” files, like your working space with the dictionary files. I use the term collaboration loosely here, because sometimes only a single person is doing all or most of the work. I think the more important issue here is to adhere to a versioning convention. However, the files should also be located somewhere where other people can easily find them and make sense of them.
EXCHANGE is where I mentioned “transient access”. Obviously any file is accessible from the workspace, but only those files that are temporarily placed and should be removed promptly are seen as transient.
In answering your question, ‘storage’ and ‘transient access’ should be distinct, and moreover, the conceptual distinction should be made between the three use cases for a file. I’ve overlooked the possibility of a file taking on many uses over a period of time. Can you provide some examples?
It is useful to remember that the real ‘project’ level actually starts a level higher than our workspace. The path to the oceaninformatics workspace is:
/Volumes/iodata/projects/oceaninformatics
We have already created several projects, oceaninformatics being one of them, the others including: pallter, ccelter, interoperability, etc.
In the case of LTER, any cce or pal specific files should be placed under those shared project workspaces. Oceaninformatics is used for anything else that doesn’t easily fall in one place.
I agree with your view of organizing by “end products”, which is a term I should have used in place of Project in my original post. Perhaps Topic is a better word at this level? …and we should have no fear of creating highly specific topics. Thus LTERData, LTERWeb, PersonnelDirectory, Dictionary, etc. could all exist as separate root folders in the shared workspace.
August 24th, 2005 at 4:55 pm
thanks for the clarification shaun! i think i am a bit behind in my thinking, since most ‘stored’ data is now accessible (long term accessible through online data catalogs rather than transient accessible) those two are blended really. i keep thinking of the old CCS structure where we had an internal project/ space for incoming raw data and processing (files would sit here for a LONG time, sometimes active and sometimes not, much to jerry and nate’s dismay!) and then a separate space (the zoo) for archiving finished, now-public data and products.
i had also been ignoring the project level as you pointed out, as the bottom-up person i don’t have access (or any need for access!) to the higher levels so i honestly forget they are there! i am not sure if using a ‘topic’ oriented structure will accommodate all projects, but it seems that since many overlap in direction (i.e. pallter and ccelter and oceaninfo can all benefit from a dictionary effort?) this might still work??
examples of purposes changing over time: i create a quick documentation file on my desktop. i ask for karen’s input and put the file into the transient space. karen feels it is worthwhile and we start working on it together so the file is moved into collaboration. we create 6 drafts before we are happy, so now there are 7 related files one of which is to be put online. so if we structure by purpose, we would leave the 6 draft versions in collaboration since that is the main purpose of those files (or move all to storage since they are ‘done’? or eventually delete?). if we change our minds and decide to re-work the doc we need to move all 7 files from storage back into collaboration (or go back and find the previous versions if left in collaboration) and do it all again. just seems like there is a lot of moving going on, whereas if things were structured by topic or project all of the above would happen in the same place? also, sometimes the line between ‘working’ and ‘done’ is a grey one
again, since i am not involved in the bigger picture i am going to defer to all of you on this!
August 25th, 2005 at 6:45 pm
While both the pallter and ccelter projects will benefit from the dictionary directly, oceaninformatics does not. Oceaninformatics acts as an umbrella, taking both pallter and ccelter under its wing. The oceaninformatics project itself is concerned with creating/maintaining a collaborative work environment that spans all sorts of projects and personnel. In the example of the dictionary project, people from all over (pal, cce, calcofi, etc.) can contribute their efforts to the project, all under the name of oceaninformatics.
Our workspace should be structured with this in mind. We can organize our files by topic, regardless of whether they are final versions or rough drafts. This is where a versioning convention comes in handy. It shouldn’t be necessary to constantly move a backlog of files from one place to another just to make a simple change to (what was previously) the final draft. If the versioning system is in place, then we should know where to look for the current file and all of its revisions.
Basically, there is no physical distinction between a ‘storage’ place and a ‘collaborative’ place. However, the conceptual distinction should remain.