Data Biography

WORLD HAPPINESS REPORT

URL: https://worldhappiness.report/ed/2022/#appendices-and-data

The first step in a thorough and reflexive data ethics is data biography. If we are considering using a data source, we need to know the basic facts about the project. We should always keep in mind that data are never “raw”; that is, they are never a direct, unfiltered view of the social world. Every data source is the result of labor in a particular context.

So we begin by asking “who?”: who collected these data? A data source that doesn’t provide information about the personnel of the project is suspect. In academic settings, status and reputation are often taken to signify quality work. We should be skeptical of that perspective, but it is important to know if the people responsible for the project are trained in a relevant data field and stand behind the work by attaching their names to the project.

In a related way, we want to know the context for the data collection labor. Who, or what organizations, support the project? Were the data collected for a specific purpose? That purpose might be different from our research question, so we should think about how those differences might shape what is in, and what is absent from, these data.

This aspect of the data biography can be complex, because there might be lots of labor involved. You want to be able to develop a brief narrative that you can use in your own work analyzing these data. What is the story you want to tell about who collected the data and why?

The other element of the biography is the “how?”: how were these data collected? Again, we should be suspicious of any data source that doesn’t explain its methods. Are these data from a survey? From official sources? From mass media? Or are these data from multiple sources? If the data source is effectively an aggregate of different kinds and sources of data, we might need to track down the individual sources.

With the World Happiness Report, we can consult the most recent report and investigate the who, how, and why questions.

https://worldhappiness.report/ed/2022/foreword/

As we create a narrative to explain the data source, keep in mind questions of power. The topics that inform the data collection are shaped by social processes and institutions, and these, as we know as sociologists, are the result of the operation of power. That data are shaped by power is not, in itself, a reason to dismiss these data. Rather, we need to be mindful that every social actor is positioned along various dimensions of power and that positionality limits what we experience and what we believe, so perspectives can be inclusive or exclusive depending on that positionality. We always want to ask what in the social world is visible to those who collect data (in a broad sense, from defining research questions and designing a survey to the labor of collecting responses) and what is invisible to them.

In sociology, it can be a challenge to ask what is absent. We are trained to see what is present and visible, and to take careful measurements. It is much harder to think about what we are missing.

In this process, we want to be mindful of our own positionality. Our view of the data source and of data practices more generally is also shaped by who we are and how our own horizons are limited.

As part of the data biography, you should get a copy of the codebook or data dictionary for the source you are examining. This is necessary for analysis, but we can also use it in the biography as we ask what is included and what is absent.
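
One practical way to begin asking what is included and what is absent is to compare the variables documented in the codebook with the columns, and the blank values, actually found in the data file. The following is only a minimal sketch in Python. The codebook entries, column names, and sample rows are all hypothetical, invented for illustration; they are not taken from the World Happiness Report or any real source.

```python
import csv
import io

# Hypothetical codebook: variable name -> description (invented for illustration).
codebook = {
    "country": "Country name",
    "year": "Survey year",
    "life_ladder": "Self-reported life evaluation, 0-10",
    "social_support": "Share of respondents with someone to count on",
}

# Hypothetical extract standing in for a downloaded CSV file.
sample_csv = """country,year,life_ladder,region
A,2020,6.1,North
B,2020,,South
C,2021,5.4,
"""


def audit(csv_text, codebook):
    """Compare a CSV extract against a codebook and count blank cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    return {
        # Variables the codebook describes but the file does not contain.
        "documented_but_absent": sorted(set(codebook) - set(columns)),
        # Columns in the file that the codebook never explains.
        "present_but_undocumented": sorted(set(columns) - set(codebook)),
        # Number of blank cells per column.
        "missing_counts": {
            c: sum(1 for r in rows if not (r[c] or "").strip()) for c in columns
        },
    }


report = audit(sample_csv, codebook)
for key, value in report.items():
    print(key, value)
```

A check like this only surfaces absences the file and codebook can reveal about each other. It cannot show which people, places, or questions were never part of the data collection at all; that still requires the biographical work described above.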

Activity.
Each group will sketch a data biography for one of the sample questions below by (a) identifying a data source, (b) getting a copy of the codebook or data dictionary, and (c) investigating the who, how, and why questions using information included with the data source.

1. Are workers more dissatisfied with their work or workplace today than before the pandemic?
2. How much do states spend on public education?
3. Are voters more politically polarized than they were twenty years ago?
4. Are there gender differences in wages and are these differences affected by race?
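
Groups that want to keep their notes in a structured form could record the biography as a simple template. The following is one possible sketch in Python, not a required format. The class name `DataBiography` and its field names are assumptions made for illustration; only the source name and URL come from this handout, and the remaining fields are deliberately left empty for the group to fill in.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DataBiography:
    """Hypothetical note-taking template for a data biography."""
    source: str                 # name of the data source
    url: str = ""               # where the data and documentation live
    who: str = ""               # people and organizations behind the project
    why: str = ""               # purpose the data were collected for
    how: str = ""               # collection methods (survey, official records, ...)
    codebook: str = ""          # location of the codebook or data dictionary
    absences: list = field(default_factory=list)  # what seems to be missing

    def open_questions(self):
        """Return the names of fields that have not been filled in yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


bio = DataBiography(
    source="World Happiness Report",
    url="https://worldhappiness.report/ed/2022/#appendices-and-data",
)
print(bio.open_questions())
```

Listing the unanswered fields gives the group a quick view of which parts of the who, how, and why narrative still need investigation.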

Author: Timothy Shortell, Ph.D.

Timothy Shortell, Ph.D., Professor & Chair, Department of Sociology, Brooklyn College CUNY