
“Discovery” Systems and Algorithmic Culture

Understanding “discovery”—the processes through which people locate previously unknown information—is a critical issue for academic libraries and librarians as they endeavor to provide and make accessible materials for students, faculty members, and other library users.  Until relatively recently, people seeking information at an academic library were typically faced with a myriad of confusing catalogs, indexes, and databases, each with a different topical coverage, organizational structure and search interface.  For people increasingly accustomed to Google’s simple search interface and natural language functionality, the “cognitive load” of siloing information in this way can be extremely high.  Library discovery systems were developed to address this problem.  By creating a centralized index of a library’s resources, these tools allow a user to simultaneously query almost all of a library’s holdings via a single Google-style search box.
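The central idea behind these tools is to normalize records from many separate silos into one index that a single query can search. A small Python sketch can illustrate it. Everything in it is hypothetical: the source names, field names, and records are invented. Commercial discovery systems such as EDS and Summon work at far larger scale, with their own proprietary metadata pipelines and relevance ranking, and this sketch does not attempt to model either.

```python
from collections import defaultdict

# Hypothetical silos, each with its own record layout, as separate catalogs,
# indexes, and databases once had. All field names and records are invented.
SOURCES = {
    "catalog": [{"id": "cat-1", "title": "Search and the academic library"}],
    "journals": [{"doi": "10.0000/example", "article_title": "Search engines and student trust"}],
    "news": [{"key": "news-7", "headline": "Libraries adopt new search tools"}],
}

# Hypothetical per-silo mapping of (id field, title field) onto one shared schema.
FIELD_MAP = {
    "catalog": ("id", "title"),
    "journals": ("doi", "article_title"),
    "news": ("key", "headline"),
}


def build_central_index(sources):
    """Normalize every silo's records and build one inverted index (term -> ids)."""
    records, index = {}, defaultdict(set)
    for source_name, items in sources.items():
        id_field, title_field = FIELD_MAP[source_name]
        for raw in items:
            rec = {"id": raw[id_field], "title": raw[title_field], "source": source_name}
            records[rec["id"]] = rec
            for term in rec["title"].lower().split():
                index[term].add(rec["id"])
    return records, index


def search(records, index, query):
    """Single search box: return records matching every keyword, from any silo."""
    terms = query.lower().split()
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return [records[i] for i in sorted(hits)]


records, index = build_central_index(SOURCES)
for rec in search(records, index, "search"):
    print(rec["source"], "-", rec["title"])
```

A single keyword query here reaches the catalog, journal, and newspaper records at once. This is what spares the user from learning each silo's separate interface.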

Along with my colleagues Lynda Duke and Suzanne Wilson, I recently completed a research study examining how undergraduate students located information using two discovery tools, EBSCO Discovery Service (EDS) and Serials Solutions’ Summon, as well as Google Scholar and the “traditional” suite of library catalogs and databases (see Asher, Duke & Wilson 2013).  In this study, we asked students to find resources for a set of research questions similar to the research-paper assignments they might receive for a course.  After they finished finding these materials, we played back a recording of their searches and conducted a debriefing interview in which we asked them to discuss how they approached finding particular resources and how they evaluated the sources and information they chose to use.

As we observed how students interacted with the discovery tools, we also learned how these systems perform an epistemological function by structuring how students use information and construct knowledge.  No matter which search system they used, students typically approached their searches in the same way: they treated every search interface they encountered like a Google search box, using simple keyword searches and ignoring more advanced functionality.  Simple keyword searches accounted for 82% of the searches we observed in our study.

Used in this way, every discovery system in this study returned a very large number of items for any given query.  Faced with a results set that was almost always too large to evaluate item by item, students instead relied primarily on the search algorithms’ ranking to judge resources’ quality.  They made rapid appraisals of an item’s usefulness based on its title or a superficial scan of its abstract, and almost never considered materials displayed after the first page of results.  In total, 92% of the resources students used in our study were found on the first page.

This de facto outsourcing of evaluation to the search algorithm makes the default ranking criteria of the discovery system perhaps the single most important factor in determining which resources students chose to use.  Moreover, differences in the way the discovery systems produce search results could be directly observed in the resources students chose.  For example, students using Summon selected more newspaper and trade journal resources than those using EDS, since EDS weights results based on article length (i.e., when other factors are held constant, longer materials rank higher).  Similarly, students using Google Scholar selected more books because of its integration with Google Books.
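The vendors do not publish their ranking formulas, so the effect described above can only be illustrated with a toy model. The Python sketch below assumes a hypothetical score that multiplies a keyword-match value by a logarithmic length boost. The function names, the `length_weight` parameter, and the two sample items are invented; none of them represents EBSCO's actual algorithm. The sketch shows how a single default weighting can change which item lands on the first page, the only page most students in our study looked at.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    kind: str          # e.g. "newspaper", "journal article"
    text_match: float  # hypothetical keyword-match score in [0, 1]
    pages: int         # item length; values below 1 are treated as 1


def score(item: Item, length_weight: float) -> float:
    """Hypothetical relevance score: keyword match times a logarithmic length boost.

    With length_weight == 0, length is ignored. With length_weight > 0 and the
    match held constant, longer items score higher.
    """
    return item.text_match * (1 + length_weight * math.log(max(item.pages, 1)))


def first_page(items, length_weight, page_size=10):
    """Rank items by score and keep only what a user would see on page one."""
    ranked = sorted(items, key=lambda it: score(it, length_weight), reverse=True)
    return ranked[:page_size]


# Two invented results for the same query.
results = [
    Item("Campus library budget cuts", "newspaper", 0.9, 1),
    Item("Funding academic libraries: a review", "journal article", 0.8, 20),
]

for weight in (0.0, 0.5):
    top = first_page(results, weight, page_size=1)[0]
    print(f"length_weight={weight}: top result is a {top.kind}")
```

With the length boost switched off, the better keyword match, the one-page newspaper item, ranks first. With the boost switched on, the twenty-page journal article displaces it, even though neither item nor the query has changed.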

This intersection between students’ search practices (which likely reflect their day-to-day use of Google and other general search engines) and the design of discovery systems’ interfaces and algorithms illustrates an important example of “algorithmic culture.”  Ted Striphas (2011) uses the term “algorithmic culture” to describe how some aspects of the work of culture (“the sorting, classifying, hierarchizing, and curating of people, places, objects, and ideas”) are becoming the purview of “machine-based information processing systems.”  He continues, “some of our most basic habits of thought, conduct, and expression. . .are coming to be affected by algorithms, too.  It’s not only that cultural work is becoming algorithmic; cultural life is as well” (Striphas 2011).

Through the act of ordering and ranking, search systems’ relevancy algorithms impart (and reinforce) a sense of authority and credibility to the results.  Students in our study regularly assumed that the information that is objectively “best” would be ranked first, substituting the judgment of the algorithm for their own.  This “trust bias” is well documented in the literature on search engines (see Vaidhyanathan 2011:59; Hargittai et al. 2010; Hargittai 2007; Pan et al. 2007).  It is also reflexive: because the search system alone holds the power to create a ranked list of resources from the huge number of possible choices, it self-validates the quality of these results.

Relevancy-ranking algorithms are also cultural artifacts, and can be understood as embodying a set of socially and culturally embedded negotiations, decisions, judgments, biases, politics, and ideologies.  For example, PageRank, the ranking algorithm at the core of Google search, is premised on a concept of aggregated social judgment.  It assumes that a mathematical calculation based on the number of links to a website, combined with an evaluation of the relative importance of the websites from which those links originate, can serve as a proxy for the quality or value of a site (see Brin & Page 1998; Page et al. 1999; Battelle 2005:75-76).  Likewise, the discovery systems in our study embed a set of decisions about information organization and quality, each of which represents a judgment about the relative value of information.  For example, each system must define what characteristics qualify a journal as “peer-reviewed” and scholarly, as well as how to treat these materials once that determination has been made.
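PageRank rests on a core idea: a page's importance depends both on how many pages link to it and on how important those linking pages are. This can be sketched with the power-iteration formulation described by Brin and Page (1998) and Page et al. (1999), using the damping factor of 0.85 suggested there. This is a minimal textbook sketch, not Google's production ranking, which combines many other signals. The link graph is invented. The sketch also assumes one common convention for dangling pages (pages with no outgoing links): their rank is spread evenly across all pages.

```python
def pagerank(links, damping=0.85, iterations=200, tol=1e-12):
    """Power-iteration PageRank.

    links maps each page to the list of pages it links to. Returns a dict of
    scores that sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a baseline share from the "random jump" term.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its rank evenly (an assumed convention).
                for t in pages:
                    new[t] += damping * rank[page] / n
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical link graph: E has two inbound links (from B and D), A has one (from C).
links = {
    "A": ["C"],
    "B": ["C", "E"],
    "C": ["A"],
    "D": ["C", "E"],
    "E": ["C"],
}

ranks = pagerank(links)
for page, value in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page}: {value:.3f}")
```

In this hypothetical graph, page E receives two links and page A only one. Yet A ranks far higher because its single link comes from C, the most-linked page. The "aggregated social judgment" is weighted by who is doing the judging.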

Unfortunately, since discovery systems are for the most part proprietary technologies, many of these judgments and decisions are kept secret from the user.  For this reason, students cannot properly interrogate how a discovery system works even if they want to, and must simply put their faith and trust in the algorithm and the people who designed it.  From a pedagogical standpoint this is quite concerning, since most students appear to mistakenly view discovery and other search systems as neutral tools and do not consider their potential biases.

By shaping the processes through which information is found, discovery systems thus exert a form of disciplinary power that provides the scaffolding for how students complete their academic work and structures the way they acquire knowledge.  For this reason, libraries using or considering these systems should carefully and critically assess their design and functionality, as well as the potentially determinative effect these systems might have on students’ research outcomes.  Students’ practices of relying primarily on the basic search functionality of any search system, looking only at the first page of search results, and trusting the relevancy rankings of a given discovery system make the default settings of these tools critically important.  These patterns also underscore students’ instructional needs in both the technical and conceptual aspects of search, as well as in algorithmic literacy and the understanding of algorithmic cultures.

References

Asher, Andrew, Lynda Duke, and Suzanne Wilson.
2013. “Paths of Discovery: Comparing the Search Effectiveness of EBSCO Discovery Service, Summon, Google Scholar, and Conventional Library Resources.” College and Research Libraries.  Forthcoming July 2013.  Preprint available at http://crl.acrl.org/content/early/2012/05/07/crl-374.full.pdf+html

Battelle, J.
2005. The search: How Google and its rivals rewrote the rules of business and transformed our culture. New York: Portfolio.

Brin, S., and L. Page.
1998. “The anatomy of a large-scale hypertextual Web search engine.” Computer Networks and ISDN Systems 30 (1-7): 107–117.

Hargittai, E.
2007. “The social, political, economic, and cultural dimensions of search engines: An introduction.” Journal of Computer-Mediated Communication 12 (3): 769–777.

Hargittai, E., L. Fullerton, E. Menchen-Trevino, and K.Y. Thomas.
2010. “Trust online: young adults’ evaluation of Web content.” International Journal of Communication 4: 468–494.

Page, L., S. Brin, R. Motwani, and T. Winograd.
1999. “The PageRank citation ranking: Bringing order to the web.” Technical report, Stanford InfoLab.

Pan, B., H. Hembrooke, T. Joachims, L. Lorigo, G. Gay, and L. Granka.
2007. “In Google we trust: Users’ decisions on rank, position, and relevance.” Journal of Computer-Mediated Communication 12 (3): 801–823.

Striphas, Ted.
2011.  “Who Speaks for Culture?” The Late Age of Print (blog), posted Sept. 26, 2011, http://www.thelateageofprint.org/2011/09/26/who-speaks-for-culture/

Vaidhyanathan, Siva.
2011. The Googlization of Everything (and Why We Should Worry).  Berkeley: University of California Press.
