Data Ethics

There are many ways to think about data ethics, but perhaps the most comprehensive from a sociological perspective is data feminism. We often refer to a diverse collection of social theories as “feminism” rather than “feminisms,” but we should be mindful that there are many feminist perspectives in sociology. We’ll borrow some ideas from these perspectives to understand power and oppression so that we can ask how data practices are implicated in processes of oppression and, potentially, liberation.

I’m summarizing the argument of Catherine D’Ignazio and Lauren F. Klein in their book Data Feminism. I highly recommend it. We won’t need to go into the full detail of their argument in this class. If you are interested in feminist theory, it is a good resource. I teach with the book in both data analysis and sociology of science classes.

The key idea is that to understand data practices, we have to begin with the concept of power. One of the most significant contributions of feminist theory in sociology is the concept of intersectionality, a recognition that power operates in complex ways that are manifest in multiple dimensions simultaneously. So we talk about race and gender and class rather than just race or gender or class. This is a good place to begin because we want to view data practices in context: how they are connected to social processes and institutions. Recognizing that power is intersectional, we can avoid an overly simplistic account of the context of data practices.

Table 1 from Data Feminism by D'Ignazio and Klein

The second key idea is that knowledge is situated in social contexts. As we discussed with the data biography, everyone who is involved in a data practice is positioned with regard to the operations of power. Our positionality shapes how we experience the world, what we believe, and how we act. To understand the effects of a data practice, which is the essential question in any data ethics, we have to know the positionality of the researchers. The kinds of data collected, or not collected, are partly determined by that positionality.

The third key idea follows from an understanding of power as a social arrangement. In sociology, we often talk about inequality, but this concept obscures the operation of power. It appears (or presents itself) as a fact, that some groups have more and others less. But a feminist approach recognizes that the arrangements that produce inequality are socially constructed, so their operation produces inequality; there is agency in the process. Those who benefit from the arrangement may or may not recognize their privilege, but the operation of power that produces their privilege simultaneously produces oppression. Rather than think of data practices in terms of inequality, a more sociological perspective would ask about oppression. How do these data practices produce advantages for some and disadvantages, or harms, for others?

As D’Ignazio and Klein relate, “what counts gets counted.” That is, the things that are deemed important (by those with the power to determine what is important) are more likely to be incorporated into data practices. In many contexts today, the determination of what is important is a private decision by corporations, often related to their profit motives. So these data are collected in order to be commodified, and other kinds of data are ignored. As data scientists advocate for data practices like “machine learning” to create algorithms that make important decisions within organizations, the availability of some kinds of data and the absence of others distort the “learning” that guides these algorithms. The result is that benefits go to some and harms are inflicted on others.

A data ethics would require us to use data practices for good. Rather than think of a data practice in terms of its effect on inequality, which is an abstraction, we should frame our analysis in terms of resisting oppression. How do we use data practices for liberation?

This shift in the frame helps us to see the connection between data skills and activism. Instead of abstract notions like “data for good,” we can use data practices for co-liberation. We speak of “co-liberation” because the positionalities of the researchers and of those harmed by data practices are often not the same. As activists, we have to work together, not on behalf of those harmed, but as partners with those who are harmed.

By expanding the community involved in data practices we can extend the horizons of our viewpoint. More data democracy means greater insight. The limits of our understanding as a result of our positionality can be mediated by dialogue with others differently positioned. We are all experts in our own experience, and as researchers, we can share our expertise in data practices with those whose lives are affected by them.

Author: Timothy Shortell, Ph.D.

Timothy Shortell, Ph.D., Professor & Chair, Department of Sociology, Brooklyn College CUNY