JCMC: The Discussion Section Ep. 3 - Jessica Vitak and Katie Shilton on Data Ethics for Researchers, IRBs, and Journal Reviewers

Nicole Ellison 0:18
Welcome to The Discussion Section, JCMC’s own podcast. I'm Nicole Ellison, editor-in-chief of JCMC. And with me today, I am thrilled to welcome Jessica Vitak, who is one of our associate editors, and Katie Shilton. Jessica is an associate professor in the College of Information Studies and an affiliate professor in the Communication Department at the University of Maryland. She's also the director of the University's Human-Computer Interaction Lab. Her research evaluates the privacy and ethical implications of big data, the internet of things, and other “smart” technologies.

Katie Shilton is an associate professor in the College of Information Studies at the University of Maryland, College Park and leads the Ethics & Values in Design Lab. Her research explores ethics and policy for the design of information technologies. She is the PI of the PERVADE project, a multicampus collaboration focused on big data research ethics. Today, Jessica and Katie will speak with me about their work with the PERVADE project, which establishes norms and increases awareness of data ethics questions among researchers, journal editors and reviewers, IRBs, and other stakeholders. Welcome, Jessica and Katie.

Jessica Vitak 1:31
Happy to be here.

Katie Shilton 1:32
Thank you. Yeah, really pleased to be here.

Nicole Ellison 1:34
So, before we start, can you tell us a little bit about PERVADE and how it got started? And what are the key goals?

Katie Shilton 1:41
So PERVADE got started out of conversations with a whole bunch of colleagues who were just starting to work on ethics and big data. So it was emerging as a question in many places 6 to 10 years ago, at conferences like CSCW. And actually, the Consortium for Sociotechnical Studies has an annual summer camp, and a group of us got together there, had some time, and were given some resources to brainstorm about what we could bring to this conversation about ethics. People were asking for guidelines, and people were asking for norms. And we realized there were empirical questions that were unanswered in this space around norms. And those included things like, what are data scientists actually doing? What are they doing to think through the ethics of their own work and their own projects? Are they going to their IRBs? Are they not? And why or why not?

So there are questions about research practices. There are also questions about what users expect. What do the people who generate pervasive data think about having their data used in research? Do they expect to be asked for consent? Do they not? Does it matter what kind of data it is? Does it matter what platform they're on? And then we also didn't know much about how IRBs were regulating this data, if at all. So actually, some of the early work Jessica and I did was about IRB practices around this data.

So there were lots of people asking these questions and we thought, well, the best way to answer a bunch of these questions at once would be to have a collaboratory, to have a big group of people who could access similar data, who had different research skills, so survey researchers, and ethnographic researchers, and computer scientists doing measurements of risk. But we would all be in conversation with each other so that we could ask questions that move across all of these stakeholder groups, move across IRBs, users, and data scientists.

So that's how PERVADE came to be. The National Science Foundation provided us with quite a bit of funding to coordinate a six-campus project. And here we are six-ish years later, and we have answers to a lot of those questions.

Nicole Ellison 3:30
I would love to hear you talk through each of these stakeholder groups that you mentioned, the researchers, social media users and participants, and then also the IRB and the folks who are trying to regulate this new world. What are some of the key findings or insights for each of these important groups?

Jessica Vitak 3:52
For end users, participants, the people whose data is being collected, analyzed, reported, shared, etc., we've done a few studies. And a lot of this started with Casey Fiesler, who's a professor at the University of Colorado Boulder. She had done some work at the very beginning of this project with Nick Proferes, who's a faculty member at Arizona State. And they were looking at Twitter users, because Twitter is probably the most common social media platform for scraping. The terms of service allow it, there are tremendous amounts of data, and it's very easy; you don't need any special skills to pull it at this point. And so what they did is they interviewed people and did a survey with Twitter users about whether they had any knowledge of this practice, that researchers were regularly scraping the platform, and how they felt about it.

One of the things that we hear again and again is this argument that if the data is public, it's a free-for-all: you post publicly on Twitter, so you basically lose all rights to it and should assume it will be used in various ways. And Casey's work found that is not how users feel about it at all. They were surprised. They didn't actually have expectations of their data being used in that way. Their frame for thinking about tweets was rather narrow. And they wanted some kind of notification, even if that wasn't consent. They wanted to be alerted.

So something that Katie, our postdoc at the time, Sarah Gilbert, and I did is we ran a series of four studies looking at four different platforms: Facebook, Instagram, Reddit, and dating apps. Now, we used a method called factorial vignettes, which is really useful for ethical questions because it allows us to present people with multiple vignettes, varying very small things in them, and that lets us start to see which individual contextual factors shape people's attitudes, and how small shifts can make something go from being viewed as appropriate to inappropriate. Probably the biggest finding is reinforcing what Casey found around consent and notification. People very clearly want to learn about this, ideally beforehand, and have an option for not participating, like an opt-out of research at the platform level. But even if not that, they want to be notified at some point, even if it's after the fact. The takeaway here is that from an ethical perspective, relying just on the legal aspect of it does not make it ethical, and that users are not thinking in these very simplistic ways, that "I'm signing up for a platform, so I acknowledge that everything that I share on this platform can be used in any way."
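For readers unfamiliar with the method, here is a minimal, hedged sketch of how a factorial vignette design can be assembled: contextual factors are fully crossed, each respondent rates a random subset, and the analysis then estimates which factors shift perceived appropriateness. The factor names and levels below are hypothetical illustrations, not the instrument used in the studies discussed here.

```python
# Illustrative factorial vignette design (hypothetical factors and levels,
# not the authors' actual survey instrument).
import itertools
import random

FACTORS = {
    "platform": ["Facebook", "Instagram", "Reddit", "a dating app"],
    "data_type": ["public posts", "private messages", "profile photos"],
    "notification": ["are notified beforehand", "are notified afterward",
                     "are never notified"],
    "purpose": ["academic research", "commercial product development"],
}

def all_vignettes():
    """Cross every factor level to get the full factorial design."""
    keys = list(FACTORS)
    for combo in itertools.product(*(FACTORS[k] for k in keys)):
        yield dict(zip(keys, combo))

def render(v):
    """Turn one combination of factor levels into vignette text."""
    return (f"Researchers collect {v['data_type']} from {v['platform']} "
            f"for {v['purpose']}. Users {v['notification']}.")

# Each respondent rates a small random subset; across respondents the
# design covers the full factor space, so analysis of the appropriateness
# ratings can isolate which contextual factors matter.
vignettes = list(all_vignettes())
for v in random.sample(vignettes, 4):
    print(render(v), "- How appropriate is this? (1-7)")
```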

Katie Shilton 6:43
We realized that this is actually not so far off from another human sciences tradition, which is that of anthropology and ethnography specifically, where people go into public settings, or sometimes semi-private or quasi-public settings (and that line gets really blurry, just like it does on social media), and observe. They collect data, but with their own sensors, with their eyes, and their pens, and their ears, so not digital streams of data, but data nonetheless, and make interpretations about people and their behaviors and groups and their behaviors. And ethnographers, interestingly and similarly to data scientists, also have a really fraught history of research ethics. So ethnography comes out of a history of colonialism, and it was a tool of colonialism for a long time. Data science comes out of a place of surveillance and a history of surveillance. It is, in many cases, a tool of surveillance.

And so we think that data scientists may need to grapple with their tools and techniques, just like ethnographers had to grapple with the history of their tradition. Ethnographers have developed all kinds of ways of letting people know that they're in the space; it's called gaining entree. You get permission to be there, maybe not from every single person in the space, but from leaders or from people who might actually be at more risk or more sensitive about you being there. And we think that there are analogies there for data scientists to think about what it means to be in a space observing with people's data, so that's awareness; but also, ethnographers think a lot about power. And we wanted to bring that discussion forward to data science as well, that the use of data is not neutral. Researchers need to be thinking about who they are making more vulnerable by using online data and what they might do to protect that data should, say, government bodies come after it. We know that this is happening. We know that datafication has put particular groups at risk. As a researcher, what are you going to do if you are confronted with extra risk for your participants in various ways? And we want to see data scientists thinking about that, just like ethnographers have had to think about it and reflect on it in their papers, and pull it forward and make it part of the discussion.

Nicole Ellison 8:50
What do you think then we as researchers should take away from this in terms of our data use practices? How do we balance being true and ethical to participants while also doing good science?

Katie Shilton 9:10
So one of the very unsatisfying answers to that question is that it's going to depend in every single case. It's going to depend on what kind of data you're using, what kind of expectations folks on a particular platform might have, and what kinds of vulnerable populations or vulnerable situations might be caught up in that data. So one of the things we are trying to do next at PERVADE is give researchers a systematic way of thinking through all of those "it depends" and "what ifs." And so we're working on a decision support tool. We're likening it to a Buzzfeed quiz for your data practices, where you could answer a series of questions about the kind of data, the kinds of questions you're trying to answer with that data, whether you're linking different data sets together, whether you have talked to an IRB, whether there are norms in your field for, say, deception or other sorts of interventions, and then it would give you a read on whether there are concerns that might be present in your project, as well as resources to go to: "here's where to look if you're worried about deception, or here's where to look if you're using hacked data, or here's the best set of research we have now on each of these topics." So we're working on that. We think it will be a good tool for students, but also for researchers and for teaching research ethics, because there are just no bright lines here. It's so dependent.
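To make the "Buzzfeed quiz" analogy concrete, here is a minimal sketch of how a questionnaire-style decision support flow of this kind might work: yes/no answers are mapped to flagged concerns and pointers to resources, without ever forbidding the study. All questions, rules, and resource descriptions below are hypothetical placeholders, not the PERVADE tool itself.

```python
# Illustrative questionnaire-style decision support flow (hypothetical
# rules and resources; not the PERVADE tool).
from dataclasses import dataclass

@dataclass
class Concern:
    label: str
    resource: str  # placeholder description of where to read more

RULES = [
    ("links_datasets", Concern("Re-identification risk from linked data",
                               "guidance on de-identification and data linkage")),
    ("uses_deception", Concern("Deception without debriefing",
                               "disciplinary norms on deception and debriefing")),
    ("no_irb_contact", Concern("No IRB or ethics board consulted",
                               "when and how to consult your IRB")),
    ("sensitive_population", Concern("Data may involve vulnerable groups",
                                     "literature on risk to vulnerable populations")),
]

def assess(answers: dict) -> list[Concern]:
    """Return the concerns flagged by a researcher's yes/no answers."""
    return [concern for key, concern in RULES if answers.get(key)]

# Example run: the tool doesn't say "don't do this research"; it surfaces
# what to pay attention to and write about.
answers = {"links_datasets": True, "uses_deception": False,
           "no_irb_contact": True, "sensitive_population": False}
for c in assess(answers):
    print(f"Pay attention to: {c.label} -> see {c.resource}")
```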

We've also been interviewing data scientists, and I will say, data scientists who wanted to be interviewed by a group interested in data ethics. One of the things we're seeing is real attention to qualitative practices alongside data science practices: people who go and hang out in online gaming spaces before they do any studying, playing for 100 hours first. Or people who were looking at addiction recovery resources but spent time in those spaces before doing any scraping, to understand why people share in these spaces, and why not. And we're seeing these sensitive, thoughtful practices around getting to know what these data spaces mean, and what kinds of risks there might be there. We just want to push those practices forward.

Jessica Vitak 11:17
I think this speaks to a larger issue that we have certainly encountered a lot, which is that people don't receive training on identifying potential ethical issues in data science. So what we've heard on multiple occasions is people saying, "Well, the IRB said this isn't human subjects research," and conflating that with it being ethical research, when what IRBs do is a risk assessment for human subjects. They're not making a determination that this is ethical or unethical research. Obviously, the two are correlated. And so thinking about how to help people identify those points at which they should be stopping and reflecting, and the questions that they might want to consider, that's another one of the goals of the decision support tool we're working on, simply because people are like, "I don't know where to start. When I was trained, I was told scraping was alright, and I didn't need to worry about anything because it didn't violate the terms of service." People want that checklist. But ethics, as we know, is almost all gray. And it requires training in how to think about data, how to know what questions to ask, and how to find resources to answer those questions.

Nicole Ellison 12:36
Do you see this tool as having an IRB-facing component? Or do you think this could also be a place where IRBs, when they legitimately feel they are without guidance, could come, as well as researchers?

Katie Shilton 12:51
We think a tool like ours can be helpful because we've baked in a good set of considerations; maybe not all of them, but a good set. The tool is not going to tell anybody not to do their research, but is instead going to say, please pay attention to these five things if you're going to do this research, and please write about how you addressed each one of them. I think it's a conversation worth having with IRBs about whether they would find something like this useful as a resource: not as a decision-making tool, but as a decision support tool, supporting their decisions and researchers' decisions with a set of resources rather than a judgment. IRBs are set up to make the judgment. We're not set up to do that.

Jessica Vitak 13:29
IRBs, as Katie has already said, don't have the bandwidth to suddenly double the number of applications they're reviewing. And much of this research doesn't fall under their purview. So we may need additional regulatory bodies that act in a similar way to IRBs but focus more on digital data.

Katie Shilton 13:49
This is an area, too, where there are interesting parallels from 20 years ago in the practices of ethnography. So ethnographers were butting heads with IRBs in the '80s and '90s because IRBs wanted, for instance, consent forms from every single person in the field. And that didn't make sense, or was maybe even dangerous, in some settings for some participants, and things like that. As a field, anthropology took on the task of educating IRBs about how ethnographic research methods work, what the norms are in the field, how they take risk into account, and how they take power into account. I think data science can do something like that here. It's harder because data scientists are not trained in just one field, whereas anthropologists are trained in a set of institutions, a set of programs that are pre-identifiable. So we have a harder challenge in data science in that it's not a discipline, it's a set of tools. But that said, I do think as communities within data science develop norms around what are and aren't acceptable uses of data, they can talk to IRBs about, "This is how things work in our field. Here's how we are careful. Here's what we think," and that conversation could be really helpful.

Nicole Ellison 14:50
What do you think is the role of journals and reviewers, and that whole part of the research ecosystem, the publishing piece, where the work gets translated from a Word document on my hard drive to, knock on wood, a peer-reviewed publication? I know your main focus groups are researchers, participants, and IRBs, but as journal editors and as reviewers, what things can we be doing, or what should we be thinking about?

Katie Shilton 15:22
I love this question, partially because, as a research cohort, we need to move together toward expecting good ethics to be part of good methods, that those two things are intrinsically linked. And so we hope for a future in which, in people's methods sections, they are talking about appropriate data uses, what they did to mitigate risk, and why they think this is a good use of this particular data set, as part of how they did the research and as part of the explanation, and a world where reviewers are comfortable evaluating that, in the same way that we learn to evaluate people's methods. That's the goal, the dream. And I think journal editors can do a lot to encourage that, like asking reviewers, for instance, are you comfortable evaluating whether the data use here is appropriate? Or should we bring in a third reviewer who might have more expertise there?

I do think review processes can be a really good place to do that. That said, this all leaves out a hard question: by the point of review, the research has already been done, and if it truly created risk, that risk has already happened. And so that's where IRBs can be really useful and are different from the gatekeeping function of publishing, because IRBs can stop things before the risk happens. So journals and conferences have an important role in norming and in growing a field of people who think through ethics as part of what they do, but IRBs also have a role to play.

Nicole Ellison 16:43
Dr. Vitak, I'm wondering, as someone who is very involved in and supportive of the communication field, are there any of these insights, arguments, or practices that you think would be especially relevant for those of us in the communication field?

Jessica Vitak 17:02
I think communication in general is embracing a lot of data science-esque techniques. And I've seen a big shift in the 10 years since I got my PhD toward more computational social science within communication fields. So I think a lot of these questions are the same. I will say that one thing that stood out the most when we did a study back in 2015, with people from many disciplines working with human subjects data, was this very interesting pattern: a lot of people who took the survey thought deception was really problematic. Communication researchers and some of the other social science disciplines use deception regularly in their research. So I think there is a knee-jerk response in some communities around using deception, because a lot of communities don't ever use it, so they don't understand how deception can be done ethically as part of a research study. But certainly, if you're doing human subjects research, most of these questions are important to consider: how you collect data, how you analyze data, and how, if at all, you share data. So yes, I think what we're finding definitely is widely applicable to communication researchers.

Nicole Ellison 18:21
Great. I know that you have gone to various conferences and set up a booth where researchers are encouraged to come and chat with an ethicist. I would love to hear you talk a little bit about that experience, and what gave you that idea, what things come up. Do you think this is something that you would encourage other conferences to consider?

Jessica Vitak 18:44
Yeah, so we have obviously been doing work for the grant for five years, and all of us had been doing work before then. We have a lot of data, a lot of findings, and we're entering the last year of this project. So we were like, why don't we hit the road? We're gonna do a road show, where we go to various conferences that include the people who are of most interest to us: data scientists, broadly writ.

We're setting up with the other exhibitors, so we're there next to the big tech companies who are recruiting, which is very confusing to people attending these conferences. And we literally set up an "Ask an Ethicist" booth, where we encourage people to ask questions about various ethical quandaries they've experienced. We've tried to garner ideas from them on what types of resources their community might need. And this is largely motivated by the idea that if we can go to diverse conferences that are all connected by working with pervasive data, we're going to start to see different needs and different approaches, and how these communities are talking about it versus other communities. That can be very helpful in figuring out resource allocation: oh, we want to make sure that we promote this decision support tool in this community where people are saying, "we don't have any resources to help us think through these types of questions." I think most people we spoke to were really appreciative of the idea that we're trying to bring these conversations to the forefront around data science research more broadly. And many feel that it needs more attention.

Nicole Ellison 20:28
Let's say you're starting to design your study. You have the sense that there could be some ethical questions around the data that you want to use from social media platforms. What do you think would be some good first steps for someone just starting out and wanting to educate themselves about these issues?

Jessica Vitak 20:47
The resource I always turn to comes from the Association of Internet Researchers, which emerged in the 1990s, just as the internet was really becoming a thing. They were among the first people studying the internet. They also have some really amazing ethicists and philosophers in that group who have created ethical guidelines for doing internet-related research. And they've updated them multiple times, including just a couple of years ago. And they provide very detailed guidance. And again, it's very similar to the things we've been talking about today: they're not telling you do this, don't do that. They're providing you with a set of questions and considerations for when you're doing internet-based research. It's very accessible. Just search for "Association of Internet Researchers" or "AoIR ethics guidelines" and you're going to find them. And I always point people to that first because it's just so well done, and it's very current because it is regularly updated.

Katie Shilton 21:48
The other resource I really like to point people to, if you are working with data and you are looking for a set of principles to guide your data use, is Data Feminism by D'Ignazio and Klein. It's a set of really clear principles for putting forward the people behind the data, the work behind the data, and the bodies behind the data. I think it's really usable. It's really readable. There are great examples in the book, and it also has beautiful color illustrations. So that's another one I point students to.

Nicole Ellison 22:14
Great. Okay, thank you so much, Katie Shilton and Jessica Vitak, for this really interesting, illustrative, and educational chat. I will look forward to seeing your project continue to develop. And I’m very excited about some of the tools that you're developing. So thank you again for sharing your thoughts with the Discussion Section.

Katie Shilton 22:32
Thanks so much for having us.

Jessica Vitak 22:33
Thanks.

Nicole Ellison 22:40
JCMC: The Discussion Section is a production of the International Communication Association Podcast Network. This episode of The Discussion Section is sponsored by Oxford University Press. OUP is the proud publisher of the ICA journals, including the Journal of Computer-Mediated Communication. Their mission is to create the highest quality academic and educational resources and services and to make them available across the world. Our producer is Kate In. Our Executive Producer is DeVante Brown. The theme music is by Nicholas Rowe. Please check the show notes in the episode description to learn more about me, the articles we discussed, and JCMC: The Discussion Section overall. Thanks for listening!
