2014 was the year that the major players in qualitative data analysis (QDA) software released native versions for the Mac. For me, the timing was perfect: my dissertation fieldwork in North Dakota had drawn to a close by summer’s end, and my advisor was encouraging me to roll up my sleeves and start working through my material. I wasn’t sure which software package would serve me best, though, and most of the guidance I could find around the Web declined to make head-to-head comparisons. Then, too, I was mindful of the critiques charging that QDA software of any stripe contributes to the mystification of method and amounts to an overpriced means of avoiding index cards, glue, and scissors. I have nothing against index cards, but with operating system issues off the table and student licenses available for under $100, I decided to see if one of these tools could help me to organize my data and get writing.
After sizing up the available options, I downloaded trial versions of two well-known QDA products: NVivo and Atlas.ti. I knew I was looking for an attractive and intuitive user interface that would allow me to code data in multiple formats: handwritten field notes, interview transcripts, documents I collected in the field. I had little faith that calculating the frequency and co-occurrence of the codes I assigned would unlock some deep, hidden structure of my material. But, taking a cue from one of the founding texts of software studies, I resolved to approach QDA software as an object that “deserves a reciprocation of the richness of thought that went into it, with the care to pay attention to what it says and what it makes palpable or possible.” How, I wondered, would my choice of software package make some kinds of analytical thinking possible and forestall others? What would my choice commit me to?
The Asthma Files is a collaborative ethnographic project focused on the diverse ways people in settings around the world have experienced and responded to the global asthma epidemic and air pollution crisis. It is experimental in a number of ways: It is designed to support collaboration among ethnographers working at different sites, with different foci, such that many particular projects can nest within the larger project structure. This is enabled through a digital platform that we have named PECE: Platform for Experimental, Collaborative Ethnography. PECE is open source and will become shareable with other research groups once we work out its kinks.
PECE has been built to support collaborative, multi-sited, scale-crossing ethnographic research addressing the complex conditions that characterize late industrialism – conditions such as the global asthma epidemic and air pollution crisis; conditions that implicate many different types of actors, locales and systems – social, cultural, political-economic, ecological and technical – calling for new kinds of ethnographic analysis and collaboration. The platform links researchers in new ways, and activates their engagement with public problems and diverse audiences. The goal is to allow platform users to toggle between a jeweller’s-eye and a systems-level perspective, connecting the dots to see “the big picture” and alternative future pathways.
The Asthma Files has taken us “beyond academia” in a number of ways. Ethnographically, we are engaging an array of professionals, organizations and communities, trying to understand how they have made sense of environmental public health problems. We want to document their sense-making processes, and what has shaped them; we also want to facilitate their sense-making processes – through ethnography that helps them understand their own habits of thought and language, and those of others with whom they likely need to work cooperatively. For example, we’ve recently been contacted by a New Orleans housing contractor who wished to know what kind of research was being done on asthma and housing in Louisiana. PECE is designed to support this, making space for different kinds of participants at different points in the ethnographic process.
We’ve also gone “beyond academia” to learn how to think about and build a digital platform to support ethnographic work. One step involved selection of the best – for our purposes, for now – online content management system. Quickly, it became apparent that most technical professionals had strong preferences, sometimes based on assessments of functionality, sometimes – it seemed – as a matter of habit. Through a long, comparative process, we ultimately decided on Plone, an open source content management system known for its security capabilities (important in creating space where groups of ethnographers can work together with material, perhaps IRB restricted, out of sight even though online), for its capacity to archive original content (such as interview recordings), and for the ways it supports our effort to nest multiple projects within a larger project structure.
Another important step, which we are still figuring out, is to hire the ongoing technical help we need for PECE. We need ongoing technical help because the platform isn’t finished, as we now envision it. But also because we want the platform to continually evolve as we continue to figure out what kinds of functionality we need to support collaborative ethnographic work. And this may be specific to each project housed on PECE. So we need ongoing, ever-learning relationships with people who can provide the technical support PECE requires, such as computer scientists, IT specialists, or programmers. As ethnographers, we know that technical professionals will think very differently about the work that we do. And we need to learn to work with this. We need to engage with skills and knowledges that are traditionally outside of the discipline of anthropology by taking on, in a practical way, the continual anthropological challenge of figuring out how difference works.
The Asthma Files and PECE are experiments that have taken us in many new directions – beyond academia, as well as back to basic questions about what should be considered ethnographic material, where theory is in ethnography, how ethnographic findings are best presented, etc. We keep open a call for new collaborators. Let us know if you would like to be in our mix.
Ethnographic Analytics for Anthropological Studies: Adding Value to Ethnography Through IT-based Methods
Ethnographic analytics? What’s that? In short, ethnographic analytics takes advantage of today’s technology to benefit anthropological studies, and is a great example of how science and technology can come together to help us understand and explain much about society and our human condition overall. I suggest that, using the computing power of software tools and techniques, it is possible to construct a set of useful indicators or analytics to complement the five human senses for ethnographic investigation.
Where did the idea of ethnographic analytics originate? How have ethnographic analytics been used and with what results? How can you incorporate them in your work? These are all questions I will address in the following short example of a recent study application in which ethnography and IT-based analytics complemented one another to produce insights about organizational innovation. In this blog, I will focus on one indicator that I have found very useful: an emotion indicator called the Positivity Index.
Over the past three decades, computing has entered our daily lives, and especially the business world, in the physical forms of desktops, laptops, tablets and smartphones. These devices are tied together with an invisible infrastructure powered by the internet, and now the “cloud,” using software applications to help us do our work, connect with others around the world, and manage many of our daily activities. Two of my colleagues, Julia Gluesing, an anthropologist and also my wife, along with Jim Danowski, a communication professor, and I thought that this extensive new information technology infrastructure could be tapped as a resource to help study the diffusion of innovations in globally networked corporations. The result of our collaboration was a five-year National Science Foundation (NSF) grant titled “Accelerating the Diffusion of Innovations: A Digital Diffusion Dashboard Methodology for Global Networked Organizations” (NSF 2010). This mixed-methods study provided a very real demonstration of how IT-based methods can complement and extend conventional ethnographic methods. For more detail about the study see the chapter “Being There: The Power of Technology-based Methods” in the book Advancing Ethnography in Corporate Environments: Challenges and Emerging Opportunities, edited by Brigitte Jordan and released in 2012.
Overall, we used three software tools, Linguistic Inquiry and Word Count (LIWC), WORDij and Condor, to create a set of seven diffusion indicators or analytics that provided us guidance in selecting a sample of workers and managers for ethnographic interviews and shadowing to explore the context of engineering sub-teams who were working to deliver an innovation for a new vehicle. Working with our sponsors, the company’s legal team, and two university IRBs, we were able to collect 45,000 emails exchanged by a global innovation team working in the early stages of an automotive product innovation. With that data one of the indicators we computed was a weekly “emotion” analytic, which we called the Positivity Index, for the engineering sub-teams using the Linguistic Inquiry and Word Count software (LIWC).
Specifically, we divided the LIWC category percent “posemo” by the category percent “negemo” to compute the Positivity Index analytic. The “posemo” category contains 407 words or word stems such as “benefit,” “cool,” “excit*,” “great,” and “opportun*.” The “negemo” category contains 499 words or word stems such as “awful,” “damag*,” “miss,” “lose,” and “risk*.” For example, at the beginning of one project an electrical sub-team had a high Positivity Index about an idea they had, using words like “excited potential” and “significant benefit.” However, after a few weeks of email exchanges with the transmission group, the Positivity Index plummeted when the combined team realized they would “miss” their deadline and “risk” not meeting their cost targets. A listing of the 64 standard LIWC categories is available here. Research by Marcial Losada (1999) indicates that a 2.9:1 (positive to negative) ratio is needed for a healthy social system. This is referred to as the “Losada Line.”
If the positivity ratio is above 2.9:1, individuals and business teams flourish; if it is below 2.9:1, they languish (Fredrickson and Losada 2005). High-performance teams have a positivity ratio of 5.6:1, and low-performance teams a ratio of 0.4:1. Moreover, there appears to be an upper limit of 11.6:1, above which the positivity ratio is too high and the team is likely to flounder because it discounts or ignores negative input.
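The arithmetic behind the Positivity Index is simple enough to sketch. The word lists below are tiny illustrative stand-ins for LIWC’s proprietary “posemo” and “negemo” dictionaries (which contain 407 and 499 entries, respectively), using only the sample entries quoted above; a trailing asterisk marks a stem match, as in LIWC. Because percent posemo and percent negemo share the same denominator (total word count), raw counts suffice for the ratio.

```python
import re

# Tiny stand-ins for LIWC's proprietary dictionaries, using only the
# sample entries quoted in the text; "*" marks a stem match, as in LIWC.
POSEMO = ["benefit", "cool", "excit*", "great", "opportun*"]
NEGEMO = ["awful", "damag*", "miss", "lose", "risk*"]

def matches(word, patterns):
    """True if the word matches any dictionary entry (stem-aware)."""
    return any(word.startswith(p[:-1]) if p.endswith("*") else word == p
               for p in patterns)

def positivity_index(text):
    """Percent posemo divided by percent negemo; the shared denominator
    cancels, so raw counts suffice. Returns None when no negemo words
    are present, since no ratio can then be formed."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(matches(w, POSEMO) for w in words)
    neg = sum(matches(w, NEGEMO) for w in words)
    return pos / neg if neg else None

def interpret(ratio):
    """Place a ratio relative to the Losada Line (2.9:1) and the
    reported upper limit (11.6:1)."""
    if ratio is None:
        return "insufficient negemo to form a ratio"
    if ratio > 11.6:
        return "above upper limit: negative input likely ignored"
    if ratio >= 2.9:
        return "flourishing"
    return "languishing"
```

For instance, “excited about a great opportunity, but some risk” yields three posemo matches against one negemo match, a ratio of 3.0, just above the Losada Line.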
We used the Positivity Index to create a graph of scores over time to provide us with an initial sense of each sub-team’s progress in forming and working as a team. Some sub-teams had quite an emotional roller coaster, while the emotion in others did not oscillate nearly as much.
The graphs provided us with a handy, easily understood analytic to explore with the teams to gain a deeper understanding of the context surrounding their work. In another case, rather than email, we used meeting minutes to assess sub-team performance using emotion and found that a Positivity Index derived from minutes also provided a reliable indicator of the health of a team. Over the years, I have calculated the Positivity Index on interviews, newspaper reviews of products, letters, websites and a host of other texts. I have consistently found that it gave me an initial assessment to guide subsequent ethnographic interviews and observations. I have had some false negative readings on occasion, however. In one instance, the texts of plant safety reports stated that there were “no deaths or fatalities.” In this case, the Positivity Index gave an inaccurate negative reading of the emotion; “no deaths or fatalities” in a monthly report is actually quite positive. Also, sometimes there is not enough text to generate a percent “negemo,” or negative emotion, to compute a ratio. These outliers have been few, and I now routinely calculate the Positivity Index on my textual research data.
You can try out the Positivity Index using the LIWC software for free. Note: the LIWC website uses an older engine, so results from the online trial will differ slightly from those of the full LIWC version.
The web page will ask you to identify your gender and age, then paste in your text. The words in the text will be counted and a percentage calculated for 7 of the 64 LIWC categories, including self-references (“I,” “me,” “my”), social words, positive emotions, negative emotions, overall cognitive words, articles (“a,” “an,” “the”), and big words (more than six letters). LIWC does provide you with the ability to customize the dictionary with your own vocabulary as well.
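The counting behind those percentages is straightforward to sketch. The function below illustrates the arithmetic for three of the categories the try-out page reports; the real LIWC dictionaries and tokenizer differ, so treat this only as a sketch of the idea.

```python
import re

def category_percentages(text):
    """Sketch of the word-percentage counting behind three of the
    categories the LIWC try-out reports: self-references, articles,
    and big words (more than six letters). The real LIWC dictionaries
    and tokenization differ; this only illustrates the arithmetic."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    counts = {
        "self_references": sum(w in ("i", "me", "my") for w in words),
        "articles": sum(w in ("a", "an", "the") for w in words),
        "big_words": sum(len(w) > 6 for w in words),
    }
    # Each category score is a percentage of total words, as in LIWC.
    return {k: round(100.0 * v / total, 2) if total else 0.0
            for k, v in counts.items()}
```

On the eight-word sample “I think my results are a significant contribution,” self-references score 25.0%, articles 12.5%, and big words 37.5%.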
My research experience has made me a fan of text analytics that can augment and enhance ethnographic methods with speed and accuracy using the natural language of participants in a very systematic manner. And best of all, the analytics, like the Positivity Index to measure the emotional content of text, are reusable and repeatable.
Fredrickson, Barbara L., and Marcial F. Losada
2005 Positive Affect and the Complex Dynamics of Human Flourishing. American Psychologist 60(7):678–686.
Losada, Marcial F.
1999 The Complex Dynamics of High Performance Teams. Mathematical and Computer Modelling 30(9–10):179–192.
National Science Foundation (NSF)
2010 Award Abstract No. 0527487. DHB: Accelerating the Diffusion of Innovations: A Digital Diffusion Dashboard Methodology for Global Networked Organizations. Available at http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=0527487. Accessed January 15, 2013.
2012 Being There: The Power of Technology-based Methods. In Advancing Ethnography in Corporate Environments: Challenges and Emerging Opportunities. Brigitte Jordan, ed. pp. 38-55. Walnut Creek: Left Coast Press, Inc.
Although anthropologists have been working with large-scale data sets for quite some time, the term “big data” is currently being used to refer to large, complex sets of data combined from different sources and media that are difficult to wrangle using standard coding schemes or desktop database software. Last year saw a rise in STS approaches that try to grapple with questions of scale in research, and the trend toward data accumulation seems to be continuing unabated. According to IBM, we generate 2.5 quintillion bytes of data each day. This means that 90% of the data in the world was created during the last 2 years.
Big data are often drawn and aggregated from a very large variety of sources, both personal and public, and include everything from social media participation to surveillance footage to consumer buying patterns. Big data sets exhibit complex relationships and yield information to entities that may mine highly personal information in a variety of unpredictable and even potentially invasive ways.
The rise of such data sets yields many questions for anthropologists and other researchers interested both in using such data and investigating the techno-cultural implications and ethics of how such data is collected, disseminated, and used by unknown others for public and private purposes. Researchers have called this phenomenon the “politics of the algorithm,” and have called for ways to collect and share big data sets as well as to discuss the implications of their existence.
I asked David Hakken to respond to this issue by answering questions about the direction that big data and associated research frameworks are headed. David is currently directing a Social Informatics (SI) Group in the School of Informatics and Computing (SoIC) at Indiana University Bloomington. Explicitly oriented to the field of Science, Technology, and Society studies, David and his group are developing a notion of social robustness, which calls for developers and designers to take responsibility for the creation and implications of techno-cultural objects, devices, software, and systems. The CASTAC Blog is interested in providing a forum to exchange ideas on the subject of Big Data, in an era in which it seems impossible to return to data innocence.
Patricia: How do you define “big data”?
David: I would add three, essentially epistemological, points to your discussion above. The first is to make explicit how “Big Data” are intimately associated with computing; indeed, the notion that they are a separate species of data is connected to the idea that they are generated more or less “automatically,” as traces normally a part of mediation by computing. Such data are “big” in the sense that they are generated at a much higher rate than are those large-scale, purpose-collected sets that you refer to initially.
The second point is the existence of a parallel phenomenon, “Data Science,” a term used in computing circles to refer to a preferred response to “Big Data.” Just as we had large data sets before Big Data, so we have had formal procedures for dealing with any data. The new claim is that Big Data has such unique properties that it demands its own new Data Science. Also part of the claim is that new procedures, interestingly often referred to as “data mining,” will be the ones characteristic of Data Science. (What is interesting to me are the rank empiricist implications of “data mining.”) Every computing school of which I know is in the process of figuring out how to deal with/“capitalize” on the Data Science opportunity.
The third point is the frequently made claim that the two together, Big Data and Data Science, provide unique opportunities to study human behavior. Such claims become more than annoying to me when it is asserted that Big Data and Data Science are so unique that those pursuing them need not pay any attention to any previous attempt to understand human behavior, and that they alone are capable of placing the study of human behavior on truly “scientific” footing, again because of their unique scale.
Patricia: Do you think that anthropologists and other researchers should use big data, for instance, using large-scale, global information mined from Twitter or Facebook? Do you view this as “covert research”?
David: We should have the same basic concern about these as we would about any other source of data: Were they gathered with the informed consent of those whose activities created the traces in the first place? Many of the social media sites, game hosts, etc., include permission to gather data as one of their terms of service, to which users agree when they access the site. This situation makes it hard to argue that collection of such data is “covert.” Of course, when such agreement has not been given, any gathered data in my view should not be used.
In the experience of my colleagues, the research problem is not so much the ethical one to which you refer so much as its opposite—that the commercial holders of the Big Data will not allow independent researchers access to it. This situation has led some colleagues to “creative” approaches to gathering big data that have caused some serious problems for my University’s Institutional Review Board.
In sum, I would say that there are ethical issues here that I don’t feel I understand well enough to take a firm position. I would in any particular case begin with whether it makes any sense to use these data to answer the research questions being asked.
Patricia: Who “owns” big data, and how can its owners be held accountable for its integrity and ethical use?
David: I would say that the working assumption of the researchers with whom I am familiar is either the business whose software gathers the traces or the researcher who is able to get users to use their data gathering tool, rather than the users themselves. I take it as a fair point that such data are different from, say, the personal demographic or credit card data that are arguably owned by the individual with whom they are associated. The dangers of selling or similar commercial use of these latter data are legion and clear; of the former, less clear to me, mostly because I don’t know enough about them.
Patricia: What new insights are yielded by the ability to collect and manipulate multi-terabyte data sets?
David: This is where I am most skeptical. I can see how data on the moves typically made by players in a massive multiplayer online game (MMOG) like World of Warcraft™ would be of interest to an organization that wants to make money building games, and I can see how an argument could be made that analysis of such data could lead to better games and thus be arguably in the interest of the gamers. When it comes to broader implications, say about typical human behavior in general, however, what can be inferred is much more difficult to say. There remain serious sampling issues however big the data set, since the behaviors whose traces are gathered are in no sense that I can see likely to be randomly representative of the population at large. Equally important is a point made repeatedly by my colleague John Paolillo, that the traces gathered are very difficult to use directly in any meaningful sense; that they have to be substantially “cleaned,” and that the principles of such cleaning are difficult to articulate. Paolillo works on Open Source games, where issues of ownership are less salient than they would be in the proprietary games and other software of more general interest.
Equally important: These behavioral traces are generated by activities executed in response to particular stimulations designed into the software. Such stimuli are most likely not typical of those to which humans respond; this is the essence of a technology. How they can be used to make inferences about human behavior in general is beyond my ken.
Let me illustrate in terms of some of my current research on MMOGs. Via game play ethnography, my co-authors (Shad Gross, Nic True) and I arrived at a tripartite basic typology of game moves: those essentially compelled by the physics engine which rendered the game space/time, those responsive to the specific features designed into the game by its developers, and those likely to be based on some analogy with “real life” imported by the player into the game. As the first two are clearly not “normal,” while the third is, we argue that games could be ranked in terms of the ratio between the third and the first two, such ratio constituting an initial indicator of the extent of familiarity with “real life” that could conceivably be inferred from game behavior. Perhaps more important, the kinds of traces to be gathered from play could be changed to help make measures like this easier to develop.
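The ranking proposed above reduces to a simple ratio. The sketch below uses hypothetical category labels, not the authors’ actual coding scheme, to show the indicator’s arithmetic: real-life-analogy moves over the two kinds of game-specific moves.

```python
def real_life_ratio(engine, designed, real_life):
    """Ratio of 'real life'-analogy moves to game-specific moves
    (physics-engine-compelled plus designed-feature moves), per the
    tripartite typology described in the text. Argument names are
    illustrative labels, not the authors' coding scheme."""
    game_specific = engine + designed
    if game_specific == 0:
        return float("inf")  # every observed move imports real life
    return real_life / game_specific
```

A play session with 30 engine-compelled, 50 designed-feature, and 20 real-life-analogy moves would score 0.25; on this indicator, a higher-scoring game would permit more inference from game behavior to out-of-game life.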
Patricia: What are the epistemological ramifications of big data? Does its existence change what we mean by “knowledge” about behavior and experience in the social sciences?
David: I have already had a stab at the first question. To be explicit about the second: I don’t think so. There are no fundamental knowledge alterations regarding those computer mediations of common human activity, and we don’t know what kind of knowledge is contained in manipulations of data traces generated in response to abnormal, technology-mediated stimuli.
Patricia: boyd and Crawford (2011) argue that asymmetrical access to data creates a new digital divide. What happens when researchers employed for Facebook or Google obtain access to data that is not available to researchers worldwide?
David: I find their argument technically correct, but, as above, I’m not sure how important its implications are. I am reminded of a to-remain-unnamed NSF program officer who once pointed out to a panel on which I served that NSF was unlikely to be asked to fund the really cutting-edge research, as this was likely to be done as a closely guarded corporate secret.
Patricia: What new skills will researchers need to collect, parse, and analyze big data?
David: This is interesting. When TAing the PhD data analysis course way back in the 1970s, I argued that to take random strolls through data sets in hopes of stumbling on a statistically significant correlation was bad practice, yet this is, in my understanding, the approach in “data mining.” We argue in our game research that ethnography can be used to identify the kinds of questions worth asking and thus give a focus, even foster hypothesis testing, as an alternative to such rampant empiricism. Only when such questions are taken seriously will it be possible to articulate what new skills of data analysis are likely to be needed.
Patricia: How can researchers ensure data integrity across such mind-bogglingly large and diverse sets of information?
David: Difficult question if dealing with proprietary software; as with election software, “trust me” is not enough. This is why I have where possible encouraged study of Open Source Projects, like that of Giacomo Poderi in Trento, Italy. Here, at least, the goals of designers and researchers should be aligned.
Patricia: To some extent, anthropologists and other qualitative researchers have always struggled to have their findings respected among colleagues who work with quantitative samples of large-scale data sets. Qualitative approaches seem especially under fire in an era of Big Data. As we move forward, what is/will be the role and importance of qualitative studies in these areas?
David: As I suggested above, in my experience, much of the Data Science research is epistemologically blind. Ethnography can be used to give it some sight. By and large, however, my Data Science colleagues have not found it necessary to respond positively to my offers of collaboration, nor do I think it likely that either their research communities or funders like the NSF, a big pusher for Data Science, will push them toward collaboration with us any time soon.
Patricia: What does the future hold for dealing with “big data,” and where do we go from here?
David: I think we keep asking our questions and turn to Big Data when we can find reason to think that they can help us answer them. I see no reason to jump on the BD/DS bandwagon any time soon.
On behalf of The CASTAC Blog, please join me in thanking David Hakken for contributing his insights into a challenging new area of social science research!
Patricia G. Lange
The CASTAC Blog
I would like to respond to Patricia’s questions about tools and techniques by reflecting on my journey regarding qualitative data organisation and analysis, from a hierarchical tree-based approach to a wiki-based network approach. Like probably many other qualitative researchers using Windows, I started out with standard software packages from household names that had come pre-installed on my home and university PCs. But as the amount of my collected data grew, and as I started to get a better sense of what I wanted to do with my data, the limitations of my initial “system” (or rather, lack of it) started to become apparent.
One problem had to do with storing information in hierarchical folder structures, whether in the My Documents folders or in other two-pane outliners and personal information managers (PIM). The inherent limitation of most two- or three-pane viewers of data (such as Windows Explorer) is that in the hierarchical tree you can usually only see the contents of one note or folder at a time. Once you have hundreds and thousands of files and notes, you reach a point after which it becomes increasingly difficult to browse the archive, find things, remember where they are, and retain a general sense of the shape and contents of the data. Having to decide about the place of a note or a piece of data in the tree too soon in the analysis can also reduce the chances for alternative interpretations and subsequent conceptual discoveries.
I thought my problems would be solved once I had imported and reorganised everything in NVivo 8 (and later 9), the computer assisted qualitative data analysis software (CAQDAS) I had selected out of the two that were recommended and subsidised by my university (the other one being Atlas.ti). I threw myself into the coding process with great gusto, only to realise after having coded half of my data that I still ended up being locked into yet another hierarchical system, where despite the ample availability of tools to cross-reference and connect my documents I started to lose a sense of where everything was and what I was to do with the hierarchical tree of codes that emerged out of the analysis.
I needed help and I found it in the Outliner Software forum, a wonderful community of information management software users, many of whom confess to suffering from CRIMP, “a make-believe malady called compulsive-reactive information management purchasing.” Following their recommendations I have assembled an arsenal of helpful software tools over the years. It was also through them that I have discovered that there was a cure to my condition of being “hierarchically challenged,” and it was called a desktop wiki.
A desktop wiki (also called a personal wiki) solves the problem of “not seeing the data forest from your hierarchical trees” by enabling you to create a flat, network-based representation of data. Essentially you end up creating your own mini-Internet – or more precisely, intranet – on your desktop computer, by connecting up all your data in the manner of interlinked webpages.
There are a number of immediate benefits to a flat, network-based wiki organisation over the hierarchical structure that I described above. First, a wiki requires you to set up a homepage, which is very helpful, as it can be used as the research project dashboard, the alpha and the omega from which and to which everything else can be connected in one way or another. Should you feel overwhelmed by your data or lost in the bowels of your project, all you need to do is hit the Home button, and you can get your bearings again.
The second benefit of using a wiki is that the hierarchical folder structure is absent (at least from view). When you create a new document you don’t need to decide up front where it fits within a hierarchy. A few links or categories inserted here and there will keep it anchored in the network, and the implicit hierarchy of the linked documents gets rearranged dynamically every time links between documents are altered.
This is not to say that hierarchies are inherently bad and that seeing them cannot be helpful at times. The desktop wiki solution that I have adopted – ConnectedText (CT) – has visualisation tools like the Navigator, which enable you to construct very purposeful hierarchies. What sets CT apart from the standard hierarchical folder system is that you can switch this visualisation off at will and immerse yourself in the flat network structure of your interlinked pages whenever you feel like it. Besides the Navigator, CT has a number of other ways to find what you are looking for within the database, including a powerful search tool. You have both your own Internet and your own search engine.
As Manfred Kuehn suggests, a wiki can be described as an electronic version of the traditional “index cards in a slip-box” system that qualitative researchers have long used for taking, keeping and coding notes. Each wiki page represents an index card with notes on it. The advantage of an electronic (wiki) version is that the “index cards” can be freely associated with each other via hyperlinks, without the need for a separate catalogue to keep track of how your index cards are related conceptually.
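The slip-box analogy can be made concrete. If each page simply records its outgoing links, the reverse relations need no separate catalogue; they can be recovered from the links themselves. The page names below are hypothetical, meant only to show the idea.

```python
from collections import defaultdict

# A hypothetical mini-wiki: each page lists the pages it links to,
# like index cards cross-referencing one another.
pages = {
    "Home": ["Fieldnotes", "Interviews"],
    "Fieldnotes": ["Theme-Trust"],
    "Interviews": ["Theme-Trust", "Theme-Risk"],
    "Theme-Trust": ["Interviews"],
    "Theme-Risk": [],
}

def backlinks(pages):
    """Invert the link structure: which pages point at each page?
    No upfront hierarchy is imposed; relations emerge from the links,
    so no separate catalogue of card relationships is needed."""
    back = defaultdict(list)
    for page, links in pages.items():
        for target in links:
            back[target].append(page)
    return dict(back)
```

Here “Theme-Trust” is reachable from both “Fieldnotes” and “Interviews” without either card having been filed under the other, which is exactly the flat, network-based organisation described above.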
As I gradually delved into ConnectedText, I realised that not only can it serve as the main project dashboard and database for most of my data (reading notes, participant observation notes, collected files, including PDFs and images etc.), but that it also has tools that allow me to code my data, making NVivo redundant. I ended up using CT to develop my own system for coding that suited my specific needs and which was more consistent with the methodological approach I wanted to pursue.
My key requirements were 1) the ability to store, retrieve and rearrange data without a hierarchical tree emerging and obstructing my view too early on in the process; 2) the ability to carry out analysis, abstraction and synthesis; and 3) the ability to maintain a complete and easily navigable audit trail from the raw data to my final conclusions and back.
It is true (and may seem paradoxical) that by using a wiki to avoid a hierarchical structure I still ended up with a hierarchical organisation of my data and findings. However, this eventual hierarchy emerged through a deliberate process of induction, in line with my chosen research philosophy, rather than being the result of unintentional deduction imposed upfront by the technical constraint of hierarchical folder organisation.
We all know that robust tools can help facilitate research, but we do not always have the time to test the latest products and processes. Here’s a place to offer advice, suggestions, and ask for help on how to tackle specific problems. What software have you found helpful for capturing data, transcribing interaction, conducting research, or analyzing findings? What problems tend to come up? Are there techniques in conceptualization, mapping, coding or other stages of the research process that you have identified as particularly helpful? Feel free to share information about what worked and what didn’t when using technology to gain insight into your projects.