Anonymous is a banner used by individuals as well as by multiple, unconnected groups unfurling operations across the globe, from Brazil to the Philippines, from the Dominican Republic to India. Since 2008, activists have used the name to organize diverse forms of collective action, ranging from street protests to website defacement. Their iconography of Guy Fawkes masks and headless suited men symbolically asserts the idea of anonymity, which they embody in deeds and words. To study and grasp a phenomenon that proudly announces itself as “Anonymous” might strike one as a futile and absurd exercise, or an exercise in futility and absurdity: a task condemned to failure.
Over the last five years, I felt the sting of disorienting madness as I descended deep down the multiple rabbit holes they dug. Unable to distinguish truth from lies, and unable to keep up with the explosive number of political operations underway at any one time, I many times felt a grinding doubt settle deep into my mind. There was no way I could get the story right, get at all the nuances, much less all the cabals that populate Anonymous, I often told myself. Gaining access to, and the trust of, scores of individuals who tunnel and mine and undermine, who desire to be incomprehensible, concealed, and enigmatic (to slightly rephrase Nietzsche’s opening to Daybreak), often felt like an impossible task.
They have been devilishly hard to study, but not impossible. Time has been a kind friend. Sticking around over a five-year period has certainly helped, especially as I met more participants in person. I protested with Anonymous on the streets of New York City and Dublin and attended court hearings as hackers received sometimes light, sometimes stiff sentences. A handful would come by to say hello or thank me heartily after a public talk. I spent time with them in pubs in Europe and bars in North America, and even had the rare opportunity to picnic with a group of them in a sun-drenched park in Ireland while the country was undergoing an unusual two-week heatwave. Although I preferred evenings in the pubs and daytime picnics, I spent most of my time with them online using various chat protocols, usually Internet Relay Chat (IRC).
Unsurprisingly, the ethical conundrums flowing out of my research were many, so many that they form a theme I explore time and again in the book now in the works. But I can’t help but think of what anthropologist Danilyn Rutherford calls “kinky empiricism,” a term she uses to define the (often tortured) nature of anthropological research. By kinky she means to convey a shape that captures the notion that knowledge is not smooth or straightforward but comes with knots and kinks. By kinky she also means to convey a spirit of “S and M and other queer elaborations of established scenarios, relationships, and things.” Foremost, she introduces kinky empiricism to portray the deeply ethical character of anthropological research: “[anthropological] methods create obligations, obligations that compel those who seek knowledge to put themselves on the line by making truth claims that they know will intervene within setting and among the people they describe.”
My obligations to Anonymous have been many, ranging from writing letters to judges pleading for leniency to translating their world for multiple publics. But the obligation most on my mind these days is self-imposed: my desire to strike a balance between two opposing forces, the rational and the mystical, the Apollonian force of empiricism and logic and the Dionysian force of pleasure and ecstasy.
In my writings, I want to stamp out misinformation, to be critical of some of their actions, and to clear up the confusion surrounding the so-called chaos of Anonymous. They are sensible, and must be rendered as such, given that nation-states, prosecutors, and judges would like to cast them as mere criminals, unwilling to entertain the idea that their actions are politically motivated. But I also want to keep the magic of Anonymous alive. To disenchant them would be, in my estimation, tantamount to breaking my own moral pact and to missing what makes them interesting.
Only with time and the judgments of others (and, hopefully, through the process of writing my book) will I know whether I have the cunning to simultaneously make chaos seem like order and order seem like chaos, the cunning necessary to do justice to Anonymous. For now, I leave you with a rather Apollonian nugget, a report I wrote for the Centre for International Governance Innovation that seeks to stamp out some misinformation about Anonymous through a detailed, though basic, introduction to their politics. I hope to bring you some nugget of pleasure and ecstasy in the not-so-distant future.
Although anthropologists have been working with large-scale data sets for quite some time, the term “big data” is currently being used to refer to large, complex sets of data combined from different sources and media that are difficult to wrangle using standard coding schemes or desktop database software. Last year saw a rise in STS approaches that try to grapple with questions of scale in research, and the trend toward data accumulation seems to be continuing unabated. According to IBM, we generate 2.5 quintillion bytes of data each day, and 90% of the data in the world was created during the last two years.
Big data are often drawn and aggregated from a very large variety of sources, both personal and public, and include everything from social media participation to surveillance footage to consumer buying patterns. Big data sets exhibit complex relationships and yield information to entities who may mine highly personal information in a variety of unpredictable and even potentially violative ways.
The rise of such data sets raises many questions for anthropologists and other researchers interested both in using such data and in investigating the techno-cultural implications and ethics of how such data are collected, disseminated, and used by unknown others for public and private purposes. Researchers have termed this phenomenon the “politics of the algorithm” and have called for ways to collect and share big data sets, as well as to discuss the implications of their existence.
I asked David Hakken to respond to this issue by answering questions about the direction that big data and associated research frameworks are headed. David is currently directing a Social Informatics (SI) Group in the School of Informatics and Computing (SoIC) at Indiana University Bloomington. Explicitly oriented to the field of Science, Technology, and Society studies, David and his group are developing a notion of social robustness, which calls for developers and designers to take responsibility for the creation and implications of techno-cultural objects, devices, software, and systems. The CASTAC Blog is interested in providing a forum to exchange ideas on the subject of Big Data, in an era in which it seems impossible to return to data innocence.
Patricia: How do you define “big data”?
David: I would add three, essentially epistemological, points to your discussion above. The first is to make explicit how “Big Data” are intimately associated with computing; indeed, the notion that they are a separate species of data is connected to the idea that they are generated more or less “automatically,” as traces normally a part of mediation by computing. Such data are “big” in the sense that they are generated at a much higher rate than are those large-scale, purpose-collected sets that you refer to initially.
The second point is the existence of a parallel phenomenon, “Data Science,” which is a term used in computing circles to refer to a preferred response to “Big Data.” Just as we had large data sets before Big Data, so we have had formal procedures for dealing with any data. The new claim is that Big Data has such unique properties that it demands its own new Data Science. Also part of the claim is that new procedures, interestingly often referred to as “data mining,” will be the ones characteristic of Data Science. (What interests me are the rank empiricist implications of “data mining.”) Every computing school I know of is in the process of figuring out how to deal with/“capitalize” on the Data Science opportunity.
The third point is the frequently made claim that the two together, Big Data and Data Science, provide unique opportunities to study human behavior. Such claims become more than annoying for me when it is asserted that the unique features of Big Data/Data Science are such that those pursuing them need not pay any attention to any previous attempt to understand human behavior, and that they alone are capable of placing the study of human behavior on a truly “scientific” footing, again because of their unique scale.
Patricia: Do you think that anthropologists and other researchers should use big data, for instance, using large-scale, global information mined from Twitter or Facebook? Do you view this as “covert research”?
David: We should have the same basic concern about these as we would about any other sources of data: Were they gathered with the informed consent of those whose activities created the traces in the first place? Many of the social media sites, game hosts, etc., include permission to gather data as one of their terms of service, to which users agree when they access the site. This situation makes it hard to argue that the collection of such data is “covert.” Of course, when such agreement has not been given, any gathered data should, in my view, not be used.
In the experience of my colleagues, the research problem is not so much the ethical one to which you refer as its opposite: the commercial holders of Big Data will not allow independent researchers access to it. This situation has led some colleagues to “creative” approaches to gathering big data that have caused some serious problems for my University’s Institutional Review Board.
In sum, I would say that there are ethical issues here that I don’t feel I understand well enough to take a firm position on. In any particular case, I would begin by asking whether it makes any sense to use these data to answer the research questions being asked.
Patricia: Who “owns” big data, and how can its owners be held accountable for its integrity and ethical use?
David: I would say that the working assumption of the researchers with whom I am familiar is that the owner is either the business whose software gathers the traces or the researcher who is able to get users to use their data-gathering tool, rather than the users themselves. I take it as a fair point that such data are different from, say, the personal demographic or credit card data that are arguably owned by the individual with whom they are associated. The dangers of selling or similar commercial use of these latter data are legion and clear; those of the former are less clear to me, mostly because I don’t know enough about them.
Patricia: What new insights are yielded by the ability to collect and manipulate multi-terabyte data sets?
David: This is where I am most skeptical. I can see how data on the moves typically made by players in a massively multiplayer online game (MMOG) like World of Warcraft ™ would be of interest to an organization that wants to make money building games, and I can see how an argument could be made that analysis of such data could lead to better games and thus be arguably in the interest of the gamers. When it comes to broader implications, say about typical human behavior in general, however, what can be inferred is much more difficult to say. There remain serious sampling issues however big the data set, since the behaviors whose traces are gathered are in no sense that I can see likely to be randomly representative of the population at large. Equally important is a point made repeatedly by my colleague John Paolillo: the traces gathered are very difficult to use directly in any meaningful sense; they have to be substantially “cleaned,” and the principles of such cleaning are difficult to articulate. Paolillo works on Open Source games, where issues of ownership are less salient than they would be in the proprietary games and other software of more general interest.
Equally important: These behavioral traces are generated by activities executed in response to particular stimulations designed into the software. Such stimuli are most likely not typical of those to which humans respond; this is the essence of a technology. How they can be used to make inferences about human behavior in general is beyond my ken.
Let me illustrate in terms of some of my current research on MMOGs. Via game play ethnography, my co-authors (Shad Gross, Nic True) and I arrived at a basic tripartite typology of game moves: those essentially compelled by the physics engine that rendered the game space/time, those responsive to the specific features designed into the game by its developers, and those likely to be based on some analogy with “real life” imported by the player into the game. Since the first two are clearly not “normal” while the third is, we argue that games could be ranked in terms of the ratio between the third and the first two, such a ratio constituting an initial indicator of the extent of familiarity with “real life” that could conceivably be inferred from game behavior. Perhaps more important, the kinds of traces to be gathered from play could be changed to help make measures like this easier to develop.
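The ranking described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ actual method: the class and function names (`MoveCounts`, `rank_games`) are hypothetical, the counts are invented, and it assumes that “the ratio between the third and the first two” means real-life-analogy moves divided by the sum of the physics-engine and designed-feature moves.

```python
from dataclasses import dataclass


@dataclass
class MoveCounts:
    """Hypothetical tallies of ethnographically coded moves for one game."""
    game: str
    physics: int    # moves compelled by the physics engine
    designed: int   # moves responding to features the developers designed in
    real_life: int  # moves based on a "real life" analogy the player imports

    def real_life_ratio(self) -> float:
        """Assumed ratio: real-life moves over the two technology-driven categories combined."""
        technical = self.physics + self.designed
        if technical == 0:
            # No technology-driven moves coded: any real-life moves give an
            # unbounded ratio; an empty tally gives zero.
            return float("inf") if self.real_life else 0.0
        return self.real_life / technical


def rank_games(games: list[MoveCounts]) -> list[MoveCounts]:
    """Order games from highest to lowest real-life ratio."""
    return sorted(games, key=lambda g: g.real_life_ratio(), reverse=True)


if __name__ == "__main__":
    # Invented counts, for illustration only.
    sample = [
        MoveCounts("Game A", physics=40, designed=40, real_life=20),
        MoveCounts("Game B", physics=30, designed=20, real_life=50),
    ]
    for g in rank_games(sample):
        print(f"{g.game}: {g.real_life_ratio():.2f}")
```

With these made-up numbers, Game B (ratio 1.00) ranks above Game A (ratio 0.25). Pooling the first two categories in the denominator is a design choice of this sketch; one could equally keep them separate and report two ratios.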
Patricia: What are the epistemological ramifications of big data? Does its existence change what we mean by “knowledge” about behavior and experience in the social sciences?
David: I have already had a stab at the first question. To be explicit about the second: I don’t think so. There are no fundamental knowledge alterations regarding those computer mediations of common human activity, and we don’t know what kind of knowledge is contained in manipulations of data traces generated in response to abnormal, technology-mediated stimuli.
Patricia: boyd and Crawford (2011) argue that asymmetrical access to data creates a new digital divide. What happens when researchers employed by Facebook or Google obtain access to data that is not available to researchers worldwide?
David: I find their argument technically correct, but, as above, I’m not sure how important its implications are. I am reminded of a to-remain-unnamed NSF program officer who once pointed out to a panel on which I served that NSF was unlikely to be asked to fund the really cutting-edge research, as this was likely to be done as a closely guarded corporate secret.
Patricia: What new skills will researchers need to collect, parse, and analyze big data?
David: This is interesting. When TAing the PhD data analysis course way back in the 1970s, I argued that taking random strolls through data sets in hopes of stumbling on a statistically significant correlation was bad practice, yet this is, in my understanding, the approach of “data mining.” We argue in our game research that ethnography can be used to identify the kinds of questions worth asking and thus give a focus, even foster hypothesis testing, as an alternative to such rampant empiricism. Only when such questions are taken seriously will it be possible to articulate what new skills of data analysis are likely to be needed.
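One standard way to see why undirected searches for significant correlations are risky is the multiple-comparisons problem. As a worked illustration, assuming $m$ independent tests, each run at significance level $\alpha$, on data containing no real effects:

```latex
P(\text{at least one false positive}) = 1 - (1 - \alpha)^{m},
\qquad
\mathbb{E}[\text{number of false positives}] = m\alpha .

% Example: \alpha = 0.05, m = 20
1 - 0.95^{20} \approx 0.64, \qquad 20 \times 0.05 = 1 .
```

So a stroll through just twenty unrelated comparisons is more likely than not to turn up at least one “significant” result by chance alone, which is the kind of outcome a question-driven, hypothesis-testing approach is meant to guard against.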
Patricia: How can researchers ensure data integrity across such mind-bogglingly large and diverse sets of information?
David: It is a difficult question when dealing with proprietary software; as with election software, “trust me” is not enough. This is why I have, where possible, encouraged the study of Open Source projects, like the work of Giacomo Poderi in Trento, Italy. Here, at least, the goals of designers and researchers should be aligned.
Patricia: To some extent, anthropologists and other qualitative researchers have always struggled to have their findings respected among colleagues who work with quantitative samples of large-scale data sets. Qualitative approaches seem especially under fire in an era of Big Data. As we move forward, what is/will be the role and importance of qualitative studies in these areas?
David: As I suggested above, in my experience, much of the Data Science research is epistemologically blind. Ethnography can be used to give it some sight. By and large, however, my Data Science colleagues have not found it necessary to respond positively to my offers of collaboration, nor do I think it likely that either their research communities or funders like the NSF, a big pusher for Data Science, will push them toward collaboration with us any time soon.
Patricia: What does the future hold for dealing with “big data,” and where do we go from here?
David: I think we keep asking our questions and turn to Big Data when we can find reason to think that they can help us answer them. I see no reason to jump on the BD/DS bandwagon any time soon.
On behalf of The CASTAC Blog, please join me in thanking David Hakken for contributing his insights into a challenging new area of social science research!
Patricia G. Lange
The CASTAC Blog