San José State University MLIS E-Portfolio

Erica Krimmel, May 2014

Core Competency G: Information Organization

“Demonstrate understanding of basic principles and standards involved in organizing information, including classification, cataloging, metadata, or other systems.”

Organizing information is not a new challenge; it is an age-old human activity and a central theme of the Library and Information Science (LIS) field. As the types and formats of information evolve, so do our theories and systems for organizing them; however, the concepts of classification, metadata, and taxonomies remain applicable.

Classification groups documents into categorical sets, created and assigned by humans, so that similar documents can be linked, either virtually or by physical location. Notation, which may be visual or textual, assists in information retrieval. Most people are familiar with the Dewey Decimal and Library of Congress classification systems from seeing them in use in public and academic libraries. Although these are highly developed systems, classification is innately human and need not be complicated: even a child organizing his rock collection by color is classifying.

That said, the more complex a classification system is, the more difficult it becomes to guarantee adequate precision and recall. When a user searches a collection, precision is the percentage of retrieved information objects that are relevant, and recall is the percentage of all relevant objects in the collection that are retrieved. Because being able to find relevant information is a primary goal of information organization, precision and recall are useful measures. They are also inherently subjective; as Morville says in his book Ambient Findability (2005), “like beauty, relevance exists in the eye of the beholder” (p. 131). Herein lies what Morville terms “the people problem.” Much of LIS involves mitigating the people problem by creating multiple avenues to access information and by standardizing information descriptors. Metadata and taxonomies are two related concepts that support these efforts through effective information organization and findability.
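Both measures reduce to simple ratios: precision is (relevant objects retrieved) / (objects retrieved), and recall is (relevant objects retrieved) / (all relevant objects in the collection). The minimal Python sketch below computes them for one search. The document IDs and relevance judgments are invented for illustration; in practice a person supplies those judgments, which is exactly where Morville's subjectivity enters. Returning 0.0 for an empty set is a convention chosen here, not part of the definitions.

```python
def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Share of the retrieved objects that are relevant."""
    if not retrieved:
        return 0.0  # convention chosen for this sketch
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Share of all relevant objects in the collection that were retrieved."""
    if not relevant:
        return 0.0  # convention chosen for this sketch
    return len(retrieved & relevant) / len(relevant)


# Hypothetical search: 4 objects retrieved; 5 relevant objects exist in total.
retrieved = {"doc1", "doc2", "doc3", "doc7"}
relevant = {"doc1", "doc2", "doc4", "doc5", "doc6"}

print(f"precision = {precision(retrieved, relevant):.2f}")  # 2 of 4 retrieved are relevant
print(f"recall    = {recall(retrieved, relevant):.2f}")     # 2 of 5 relevant were found
```

Adding more loosely matching documents to the result set would tend to raise recall while lowering precision, which is why the two are usually reported together.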

Metadata, or data about data, aims to describe an information object using standard fields. According to Cornell’s digital imaging tutorial (2000–2003), metadata can be descriptive, structural, or administrative. Descriptive metadata, such as title, creator, and subject, is the type most people are familiar with. Structural metadata records how the parts of a digital object fit together and relate to the original, e.g., which page of a book an image depicts. Administrative metadata records internal management information, such as preservation status and rights management. Most metadata, particularly descriptive metadata, must be assigned by a human. Morville (2005) explains why: “though relevance ranking algorithms can factor in location and frequency of word occurrence, there is no way for software to accurately determine aboutness” (p. 127). In the Getty’s Introduction to Metadata (2008), Gilliland further captures the complexity of metadata today: “metadata creation and management have become a complicated mix of manual and automatic processes and layers created by many different functions and individuals at different points during the lifecycle of an information object” (“Setting the Stage”).
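To make Cornell's three categories concrete, here is a minimal sketch of a single record for one digitized page image, with its fields grouped by category. The field names and values are hypothetical and do not follow any particular metadata standard.

```python
# Hypothetical metadata record for one digitized page image, grouped into
# Cornell's three categories. Field names and values are illustrative only.
record = {
    "descriptive": {  # what the object is and what it is about
        "title": "Field notebook, 1892",
        "creator": "Unknown collector",
        "subject": ["botany", "field notes"],
    },
    "structural": {  # how this image relates to the original bound volume
        "part_of": "Field notebook, 1892",
        "page": 14,
    },
    "administrative": {  # internal management information
        "preservation_status": "master TIFF on archival storage",
        "rights": "public domain",
    },
}


def fields_in(category: str) -> list[str]:
    """Return the sorted field names recorded under one category."""
    return sorted(record[category])


for category in record:
    print(category, fields_in(category))
```

Only the descriptive group is likely to be shown to end users; the structural and administrative groups mostly serve the system and its staff.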

Despite the complexity of metadata, standardization can unite diverse collections by giving them shared metadata fields that make their datasets interoperable. Duval et al. (2002) explain how to apply standards modularly through internationalization (creating neutral standards) and localization (adapting neutral standards to specific organizational situations). Chagoya (2010) supports a compatible system of “tiered” metadata, in which preliminary tiers capture essential metadata and later tiers add increasing levels of detail. Standards can help in planning a tiered system: essential metadata fields are likely to be present across multiple schemas, so system designers do not need to start from scratch. In my opinion, the most intriguing benefit of both tiered and modular metadata is that these strategies maximize interoperability, which supports information organization at a comprehensive scale.
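A tiered approach can be checked mechanically: a record "reaches" a tier only when every field in that tier and in all earlier tiers is filled in. The sketch below assumes three hypothetical tiers; the field names are borrowed from the Dublin Core element set because essential fields such as title, creator, and date tend to recur across schemas, but the grouping into tiers is an illustration, not Chagoya's actual scheme.

```python
# Hypothetical tiers, from essential to most detailed. Field names are
# Dublin Core elements; the tier assignments are assumptions for this sketch.
TIERS = [
    ["title", "creator", "date"],     # tier 1: essential
    ["subject", "description"],       # tier 2: discovery detail
    ["rights", "format", "source"],   # tier 3: fuller record
]


def completed_tier(record: dict[str, str]) -> int:
    """Return the highest tier whose fields (and all earlier tiers' fields)
    are non-empty. Returns 0 if tier 1 is incomplete."""
    level = 0
    for fields in TIERS:
        if all(record.get(name) for name in fields):
            level += 1
        else:
            break  # later tiers only count once earlier ones are complete
    return level


clip = {
    "title": "Harbor at dawn",
    "creator": "Staff photographer",
    "date": "2013-06-01",
    "subject": "harbors",
}
print(completed_tier(clip))  # tier 2 still lacks a description
```

Because tier 1 uses fields shared by many schemas, a record that reaches only that tier can still be exchanged with other systems, which is the interoperability benefit described above.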

Furthermore, metadata can be built up into a taxonomy, a type of knowledge organization system that enables information collections to represent logical relationships and improves findability by supporting indexing and retrieval (Hedden, 2010; NISO, 2005). To do so, taxonomies use controlled vocabularies and thesauri, and they are represented at a more semantic level by ontologies, which are essentially taxonomies accompanied by a specific domain’s rules of use (Hedden, 2010). A taxonomy disambiguates natural vocabulary by defining relationships between terms; these relationships may be equivalent, associative, or hierarchical (Hedden, 2010). NISO (2005) adds that vocabulary control is achieved by distinguishing homographs, linking synonyms, and defining scope. The benefits of a well-designed vocabulary reach end users: Morrison (2004) describes how websites structured around taxonomies translate to better information organization and access for users.
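The three relationship types and NISO's three vocabulary-control techniques can be modeled in a few lines. This is a toy sketch: the terms, the parenthetical qualifiers used to separate homographs, and the data layout are all invented for illustration and are far simpler than a real thesaurus built to ANSI/NISO Z39.19.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Term:
    """A preferred (controlled) term and its relationships."""
    label: str
    scope_note: str = ""                              # defines scope
    broader: Optional[str] = None                     # hierarchical relationship
    related: list[str] = field(default_factory=list)  # associative relationship


# Homographs are distinguished by a parenthetical qualifier (hypothetical terms).
VOCAB = {
    "Mercury (planet)": Term("Mercury (planet)", "The planet nearest the sun.",
                             broader="Planets"),
    "Mercury (element)": Term("Mercury (element)", "The chemical element Hg.",
                              broader="Metals", related=["Thermometers"]),
    "Automobiles": Term("Automobiles", "Passenger motor vehicles.",
                        broader="Vehicles", related=["Roads"]),
}

# Equivalence: synonyms (entry terms) are linked to one preferred term.
USE_FOR = {"cars": "Automobiles", "autos": "Automobiles",
           "quicksilver": "Mercury (element)"}


def lookup(word: str) -> list[Term]:
    """Map a user's natural-language word onto preferred terms."""
    w = word.strip().lower()
    if w in USE_FOR:  # synonym -> preferred term
        return [VOCAB[USE_FOR[w]]]
    # Homographs: ignore the qualifier and return every matching sense,
    # so the user (or indexer) can choose the intended one.
    return [t for label, t in VOCAB.items() if label.split(" (")[0].lower() == w]


for term in lookup("mercury"):
    print(term.label, "| broader:", term.broader, "|", term.scope_note)
print(lookup("cars")[0].label)  # Automobiles
```

A search for "cars" and a search for "autos" land on the same record, while "mercury" surfaces both senses with their scope notes instead of silently mixing planets and metals.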

Just like metadata, taxonomies should be standardized to maximize interoperability and to enable metasearching, indexing, merging databases, merging controlled vocabularies, and multilingual searching (NISO, 2005). Many domains have developed their own metadata standards, including Dublin Core, VRA Core (Visual Resources Association), MARC (Machine-Readable Cataloging), and Darwin Core. Additionally, some of these standards, such as Dublin Core, operate modularly, using qualifiers that adapt them to specific domains.

Applications

Using classification, metadata, and taxonomies to organize information can help LIS professionals build accessibility and interoperability into the organization of both physical and virtual collections. Through my coursework at the San José State University School of Library and Information Science (SLIS), I applied these concepts to several types of information, as discussed below.

EVIDENCE 1. My first introduction to the theory of information organization was in LIBR 202 – Information Retrieval. One of our assignments was to elicit attributes from a collection of postcards. Through this process I learned about logical opposites, attribute fields, and controlled vocabulary. I saw why mutual exclusivity among attribute values is important and how controlled vocabulary can reduce ambiguity. Although in retrospect this is an elementary assignment, I am including it here because it represents my early experience organizing information.

EVIDENCE 2. I continued to explore different techniques for organizing information in LIBR 246 – XML. XML, or eXtensible Markup Language, is a flexible, interoperable markup language that allows users to structure data and metadata in a form that both computers and humans can read. Schemas define the elements and structure permitted in a given XML vocabulary, and stylesheets allow users to write multiple display rules for one XML file.

The evidence I present here is an assignment to create two different stylesheets (XSLT documents) for one XML source file describing a recipe. The first stylesheet outputs an HTML file with a grocery list for the recipe; the second outputs an HTML file that lists the ingredients along with the cooking methods. This assignment highlights the flexibility of XML in displaying information and my ability to make XML data accessible through different displays.
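The core idea of the assignment, one XML source rendered through two independent sets of display rules, can be sketched without an XSLT processor. The example below swaps in Python's standard-library ElementTree for XSLT (the standard library has no XSLT engine), and the recipe's element names are hypothetical rather than the assignment's actual schema.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical recipe markup; element and attribute names are assumptions.
RECIPE_XML = """
<recipe name="Pancakes">
  <ingredient qty="2 cups">flour</ingredient>
  <ingredient qty="2">eggs</ingredient>
  <ingredient qty="1.5 cups">milk</ingredient>
  <step>Whisk the eggs and milk.</step>
  <step>Fold in the flour.</step>
  <step>Cook on a hot griddle.</step>
</recipe>
"""


def grocery_list(root: ET.Element) -> str:
    """Display rules 1: ingredient names only, as an HTML list."""
    items = "".join(f"<li>{escape(i.text)}</li>" for i in root.iter("ingredient"))
    return f"<h1>Shopping for {escape(root.get('name'))}</h1><ul>{items}</ul>"


def full_recipe(root: ET.Element) -> str:
    """Display rules 2: quantities plus the cooking method, as HTML."""
    items = "".join(
        f"<li>{escape(i.get('qty'))} {escape(i.text)}</li>"
        for i in root.iter("ingredient")
    )
    steps = "".join(f"<li>{escape(s.text)}</li>" for s in root.iter("step"))
    return f"<h1>{escape(root.get('name'))}</h1><ul>{items}</ul><ol>{steps}</ol>"


root = ET.fromstring(RECIPE_XML.strip())
print(grocery_list(root))
print(full_recipe(root))
```

The source data is written once; each function, like each stylesheet, decides which elements to show and how, so a third display could be added without touching the XML.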

EVIDENCE 3. During my final semester at SLIS, I took LIBR 282 – Digital Asset Management, which covered classification, metadata, and taxonomies in detail. Digital Asset Management, or DAM, takes advantage of metadata and software systems to help organize and repurpose assets; for example, a single photoshoot may generate images used in multiple ad campaigns. A range of DAM system processes support these benefits, including coauthoring, archiving, version control, storage management, multiple formats, publishing tools, search tools, workflow, and wide-area distribution, and together they can lead to more creative, better-informed workers (Austerberry, 2006).

The second project in the course was to create a basic DAM system for a collection of video clips. To do this, I evaluated each clip, looking for similarities, differences, and core information. I then designed my metadata fields and filled them in for each video. In my analysis paper, I justified my choice of metadata fields and explained how they could increase the collection’s accessibility, discussed bulk and automatic metadata creation, and proposed an appropriate taxonomy for the collection.

Conclusion

At the end of the day, we organize information so that we can access it more easily later. Classification, metadata, and taxonomies can help us do this, but they are only one part of the process. When applying these concepts, we need to be conscious of standards and interoperability and to consider the type and format of the information we are organizing. Most importantly, we need to be driven by purpose, so as to create an information collection that is truly useful.

References

Austerberry, D. (2006). Digital Asset Management [Kindle Cloud version]. Retrieved from http://amazon.com

Chagoya, F. (2010). Metadata: Principles, practical application, best practices, optimization and workflow. Journal of Digital Asset Management, 6(5), 257–261. Retrieved from http://www.palgrave-journals.com/dam/archive/index.html

Cornell University Library/Research Department. (2000–2003). Moving theory into practice: Digital imaging tutorial. Retrieved from http://www.library.cornell.edu/preservation/tutorial/contents.html

Duval, E., Hodgins, W., Sutton, S., & Weibel, S. (2002). Metadata principles and practicalities. D-Lib Magazine, 8(4). Retrieved from http://www.dlib.org/dlib/april02/weibel/04weibel.html

Gill, T., Gilliland, A., & Woodley, M. (2008). Introduction to metadata: Pathways to digital information (M. Baca, Ed.; Online ed., Version 2.1). Los Angeles, CA: J. Paul Getty Trust/Getty Standards Program. Retrieved from http://www.getty.edu/research/publications/electronic_publications/intrometadata

Hedden, H. (2010). Chapter 1: What are taxonomies? In The Accidental Taxonomist (pp. 1–37). Medford, NJ: Information Today.

Morrison, J. (2004). How to create effective taxonomy. ZDNet Asia. Retrieved from http://www.zdnetasia.com/builder/program/dev/0,39045513,39190441,00.htm

Morville, P. (2005). Ambient Findability: What We Find Changes Who We Become. Sebastopol, CA: O’Reilly.

National Information Standards Organization (NISO). (2005). Guidelines for the construction, format, and management of monolingual controlled vocabularies: An American national standard developed by the National Information Standards Organization (ANSI/NISO Z39.19-2005). Bethesda, MD: NISO Press.