San José State University MLIS E-Portfolio

Erica Krimmel, May 2014

Core Competency E: Information Retrieval

“Design, query and evaluate information retrieval systems.”

Information retrieval systems lie at the heart of Library & Information Science. They vary widely, from analog card catalogs and filing systems to bibliographic and relational databases, but all information retrieval systems are designed to let users access the data they need by querying, or searching. As an LIS professional, I have not only mastered the art of querying, but I am also capable of designing and evaluating information retrieval systems from the ground up.

The first interaction most people have with information retrieval systems is querying. Querying isn’t limited to library catalogs anymore; it has become a feature of most content-filled websites, from Wikipedia to Facebook. Search engine sites like Google and Bing exist purely as internet query tools. People who search through any of these sites may be unaware of it, but they are actually querying databases.

Querying has a gentle learning curve that makes it accessible to all skill levels, while also being extremely powerful. For example, nearly all search engines offer an “advanced” search that supports multiple terms, limiters, and other methods of crafting the right query for the question. When a user submits a search to a system built on a relational database, the search is typically translated into SQL (Structured Query Language), the language used to retrieve results from such a database. Understanding basic SQL opens doors for LIS professionals in unexpected areas such as website design, where I can use SQL queries in combination with another website’s API (Application Programming Interface) to integrate its live data into my own site.
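As a rough illustration of how an “advanced” search form might become a SQL query, the sketch below uses Python’s built-in sqlite3 module. The articles table, its columns, the build_query function, and the sample rows are all invented for this example; a real catalog or search engine would have its own schema and may not use SQL at all.

```python
import sqlite3

def build_query(title_term=None, author=None, year_from=None, year_to=None):
    """Translate optional advanced-search fields into a parameterized SQL query."""
    clauses, params = [], []
    if title_term:
        clauses.append("title LIKE ?")       # keyword anywhere in the title
        params.append(f"%{title_term}%")
    if author:
        clauses.append("author = ?")         # limiter: exact author
        params.append(author)
    if year_from is not None:
        clauses.append("year >= ?")          # limiter: earliest year
        params.append(year_from)
    if year_to is not None:
        clauses.append("year <= ?")          # limiter: latest year
        params.append(year_to)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    sql = f"SELECT title, author, year FROM articles WHERE {where} ORDER BY year"
    return sql, params

# Hypothetical table and made-up sample rows, held in memory only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (title TEXT, author TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [
        ("Search strategies for novices", "Author A", 1999),
        ("Cataloging basics", "Author B", 2001),
        ("Web search behavior", "Author C", 2010),
    ],
)

# A user fills in a title keyword and a latest-year limiter.
sql, params = build_query(title_term="search", year_to=2005)
results = conn.execute(sql, params).fetchall()
print(sql)
print(results)
```

Passing the user’s terms as parameters (the ? placeholders) rather than pasting them into the SQL string keeps the query safe from malformed or malicious input.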

Evaluating information retrieval systems is as important as evaluating any other product, and many of the techniques used are similar. We can evaluate based on the design of the interface, the effectiveness of use, and the satisfaction of users. Interface design and user satisfaction can be assessed with fairly generic evaluation methods, while effectiveness requires more specialized criteria. Measures like “precision” (i.e. the percentage of retrieved results that are relevant) and “recall” (i.e. the percentage of all relevant items in the collection that were retrieved), for example, help pinpoint where retrieval effectiveness can be improved.
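The two measures can be worked through with a small calculation. The sketch below is a minimal Python example using the standard set-based definitions; the function name and the document identifiers are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: items the system returned
    relevant:  items in the collection that actually answer the need
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant items that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query: four results returned, three relevant items exist.
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc2", "doc4", "doc5"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Here two of the four retrieved items are relevant (precision 0.50), and two of the three relevant items were found (recall 0.67). A query can usually be tuned to raise one measure at the expense of the other.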

Through evaluation, we learn what creates good design. Even though the design of any one information retrieval system must happen first, learning how to design these systems is a cumulative skill that I learned last. Information retrieval systems require design from multiple angles: user access, security, performance, and data integrity, to name several. Maximizing any one of these is a design process in and of itself. In my limited experience working with database design, I have learned to approach a problem by determining the information need, the target audience, the sources of information, and the format of the data (Liu, 2008). These four considerations can inform the design concept, with more details filled in later as the concept moves towards implementation. Database management systems (e.g. Oracle, MySQL, Access, FileMaker) facilitate the design process by providing visual interfaces and a test environment.

Applications

EVIDENCE 1. I designed the “Core IR Literature” bibliographic database for LIBR 202 – Information Retrieval, in my first semester at SLIS. The database itself was created in DB/Textworks and is not presented here; rather, I have uploaded a presentation I gave evaluating the effectiveness of my database design.

Core IR Literature directs novice information professionals to relevant and important information retrieval literature, using Boolean search terms and a post-coordinated controlled vocabulary. The articles included in this database deal with accepted methodologies and seminal concepts, e.g. Kuhlthau’s Information Search Process; therefore, some of the thesaurus terms are very narrow to facilitate aggregation and discrimination. Additionally, articles are ranked by importance, which is measured by citation tracking.
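To make the idea of post-coordinated Boolean searching concrete, here is a minimal Python sketch. It is not how DB/Textworks works internally; the records, thesaurus terms, citation counts, and the search function are all invented for illustration. Each record carries single-concept terms, and the searcher combines them at query time with AND, OR, and NOT, with results ordered by citation count.

```python
# Made-up records: each has controlled-vocabulary terms and a citation count.
records = [
    {"title": "Record A", "terms": {"information seeking", "user studies"}, "citations": 120},
    {"title": "Record B", "terms": {"information seeking", "relevance"}, "citations": 45},
    {"title": "Record C", "terms": {"indexing", "relevance"}, "citations": 300},
]

def search(records, all_of=(), any_of=(), none_of=()):
    """Boolean AND (all_of), OR (any_of), NOT (none_of) over assigned terms."""
    matches = []
    for rec in records:
        terms = rec["terms"]
        if not set(all_of) <= terms:              # AND: every term must be present
            continue
        if any_of and not (set(any_of) & terms):  # OR: at least one term present
            continue
        if set(none_of) & terms:                  # NOT: no excluded term present
            continue
        matches.append(rec)
    # Rank by importance, approximated here by citation count (highest first).
    return sorted(matches, key=lambda rec: rec["citations"], reverse=True)

# Query: relevance NOT "user studies"
hits = search(records, all_of=["relevance"], none_of=["user studies"])
titles = [hit["title"] for hit in hits]
print(titles)  # ['Record C', 'Record B']
```

Because the terms are combined only when the query is run, a small vocabulary of narrow terms can still express many different information needs.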

In my presentation, I go over three example queries based on different information needs to demonstrate the effectiveness of my database, as determined by precision and recall. I also summarize an article about the roles of domain versus search expertise in information retrieval, which led me to realize the importance of considering search intermediaries, e.g. reference librarians, when designing information retrieval systems.

EVIDENCE 2. For LIBR 210 – Reference Services, I compiled a presentation guide on using Google Scholar as a reference tool. Although Google Scholar is not a traditional reference library database, it is an important tool because in 2011, over 60% of undergraduates surveyed used it as their primary off-campus search engine (Herrera, 2011). Its familiarity thus offers an excellent jumping-off point for further instruction on information retrieval.

While evaluating this information retrieval system, I considered Google Scholar’s access, content coverage, audience, and query features. I found that articles included in the database are ranked by keywords, authors, publisher, and citations. To illustrate my evaluation, I based my presentation on two example reference needs and went through the process of creating effective queries in Google Scholar for each. I also compared Google Scholar to Google Search, finding that while they are offered by the same company, the results are very different.

EVIDENCE 3. One of the most difficult courses I took at SLIS was LIBR 242 – Database Design. In this class, our semester-long group project was to design and implement a relational database to support a citizen science project observing animals. The evidence presented here is our team’s normalized database schema and associated data dictionary, which we developed over the first six weeks of LIBR 242, and later used to construct our database in Oracle.

Our team of five worked together on this project, mostly in real time, and the feedback we exchanged helped all of us see past our own preconceptions of what the database should look like. We drafted and redrafted entity-relationship diagrams, and struggled over functional dependencies. Throughout this process, I had a natural knack for the material, and did my best to help teammates who were struggling with the concepts. In the evidence presented below, everyone contributed equally to the data dictionary, but it was my job to normalize all of our tables and draw up the resulting schema.

Conclusion

My knowledge of queries, evaluation, and design comes together to support my understanding of information retrieval systems and their role in the Library & Information Science field. Thanks to hands-on experience with each of these components, I feel confident in my ability to query effectively, evaluate critically, and keep both of these in mind for design projects.

References

Herrera, G. (2011). Google Scholar users and user behaviors: An exploratory study. College & Research Libraries, 72(6), 316-330.

Liu, G. (12 Dec 2008). LIBR 242: Relational database design and implementation [PowerPoint]. Accessed from SJSU SLIS D2L on 15 Mar 2014.