CCC and CRA-I Respond to NTIA Request for Comment on Ethical Guidelines for Research Using Pervasive Data

The following is by Haley Griffin and reposted from the Computing Community Consortium (CCC) blog. 

Last week, CRA-Industry, in collaboration with the Computing Community Consortium (CCC), submitted a response to the Request for Comments (RFC) on Ethical Guidelines for Research Using Pervasive Data issued by the National Telecommunications and Information Administration (NTIA) of the Department of Commerce. The response was written by Nazanin Andalibi (University of Michigan), David Danks (University of California, San Diego), Haley Griffin (Computing Research Association), Mary Lou Maher (Computing Research Association), Jessica McClearn (Google), Chinasa T. Okolo (The Brookings Institution), Manish Parashar (University of Utah), Jessica Pater (Parkview Health), Katie Siek (Indiana University), Tammy Toscos (Parkview Health), Helen V. Wright (Computing Research Association), and Pamela Wisniewski (Vanderbilt University).

The NTIA was seeking "public input on the potential writing of ethical guidelines for the use of 'pervasive data' in research. Such guidelines, if warranted, would detail how researchers can work with pervasive data while meeting ethical expectations of research and protecting individuals' privacy and other rights." Below are some of the main points from CCC & CRA-I's response.

(1) Benefits of the proposed guidelines:

  • Accountability and a standard of research to support researchers and enhance public trust.
  • A consistent approach to research ethics across different institutions, including, but not limited to, universities, technology companies, and industries (e.g., healthcare, education, transportation).
  • Justification and improvement of methods for researchers who work with sensitive, pervasive data.
  • Protection for researchers who study controversial topics by providing ethical foundations for their work.
  • Protection for participants whose data is being analyzed, strengthening the protections under the Common Rule (many IRBs deem some pervasive data studies "non-human subjects research," which does not receive the same level of ethical review as research deemed human subjects research).
  • Clarity for companies who could use the guidelines to standardize their research processes, codify best practices, make their work more equitable (by ensuring all stakeholders are considered), and likely improve the efficacy of their research as well.

(2) Drawbacks of the proposed guidelines:

  • The guidelines would need to stay robust to shifts in data practices, governance, and other adjacent recognized guidelines (e.g., IRB). The authors recommend regular review of the guidelines to ensure alignment with current policies and needs.
  • There is no enforcement mechanism to ensure good-faith adoption among researchers, so the communities and people who should be protected and treated ethically may still face significant risks.
  • Researchers using pervasive data may not be aware of the guidelines. Awareness could be increased by integrating the guidelines into IRB-required CITI training or NSF Responsible Conduct of Research (RCR) training, and by obtaining buy-in from organizations that publish research (e.g., ACM, IEEE, National Academies) or fund research (e.g., NSF, NIH, The Knight Foundation).
  • Researchers may be aware of the guidelines but misuse them to justify unethical data practices (they might also design research studies to fall outside the boundaries of such guidelines, much as some researchers try to avoid IRB review).
  • Guidelines may restrict the flexibility and innovation of ethical research that falls outside their scope.
  • Both data and researchers can be located outside the U.S., so national guidelines need to consider international contexts.

(3) The NTIA definition of pervasive data could be improved by:

  • using "digital services" or "networked services" rather than "online," since "online" can be interpreted as "on the internet," which is too narrow in scope, and
  • including the following data in the definition: health data, biometric data, sensor data (e.g., tracking body movements, sensed behavior of humans), non-publicly available data, personally identifiable information (PII), data from marginalized or at-risk communities (i.e., people at risk for poor health and social well-being), and inferred data (i.e., algorithmic inferences of one's identity, activities, emotion or affect, likeness, etc.).

(4) Existing barriers to accessing pervasive data:

  • Pervasive data collected and stored in technology companies is generally inaccessible to researchers outside of those companies. The predominant challenge is the misalignment between companies’ priorities (profit, legal liability) and researchers’ priorities (creating new knowledge). 
  • Data access can be prohibitively expensive, which disproportionately impacts resource-constrained researchers.

Even once data is obtained, the authors noted that actually using it to conduct research poses difficulties (e.g., assessing data quality, determining who can provide consent or permission to use the data). They concluded that giving researchers access to pervasive data would require a large-scale shift in thinking about how that data is made available, what protections are available to companies, what standards researchers should be held to, and how to evaluate the quality of the data.

(5) The data held by online services that would be most valuable to the public interest, if researchers could access it, is data that:

  • Aligns with societal priorities (e.g., democracy, healthcare, housing, children, poverty). This could allow researchers and companies to develop technologies and policies that serve the greater good, such as helping those most in need.
  • Helps us understand how technology is shaping our social lives (e.g., social media). This is important for understanding and tackling societal challenges, like disinformation, harassment, and mental health. 
  • Is collected about people, and analyzed and used by various actors, without their informed consent or even their awareness. Access to such data is important for protecting people's privacy and enabling them to make informed decisions about their digital behaviors, identities, and likeness.

(6) Guidance for researchers working with pervasive data, with respect to consent and autonomy: researchers should clarify whether a user's granting access to their data is:

  • Legally required (e.g., verifying age before alcohol can be delivered) or motivated by the company's desire for data,
  • Required in order to use the service or obtain the information the user is attempting to access, and
  • Likely to result in the data being sold to or accessed by data brokers, researchers, and/or other actors (e.g., government, law enforcement).

The authors also presented the principle of "do no harm" as an alternative to traditional consent that can protect data subjects in cases where autonomy is limited, or where consent is given only because it is a condition of accessing required or desired resources.

(7) In order to take future technological advances into account, the guidelines should do the following:

  • Be reviewed every 6 to 12 months (possibly by a dedicated working group responsible for this process), because as technology evolves, so should these guidelines.
  • Rely on the principle of precedent: learn from past decisions to inform future ones (which may involve different technology but similar ethical considerations).
  • Require researchers to certify a "do no harm" statement acknowledging that the capabilities of technology will evolve, but that researchers should never use the data in a way that could negatively impact, or be used against, the person who supplied it (even if that person consented to its use for future research). The statement could include a section on what a researcher can and cannot do with pervasive data, and would shift responsibility for unforeseen negative consequences onto the researcher rather than the data subject.
  • Account for the perspectives and expectations of data subjects, with attention to the unique contexts and identities implicated in data collection, analysis, etc. (e.g., if the perceived sensitivity of data type A is high for group B, then perhaps that data should not be collected from group B to begin with).
  • Update the understanding of which situations require which kinds of privacy protection, accounting for which data types data subjects consider sensitive (e.g., data about affect and emotions, which is increasingly relevant in pervasive data collection, inference, and use, and which data subjects consider "sensitive" (Andalibi & Buss, 2020)).
  • Consider where federal, state, and international data privacy regulations align and conflict, and communicate these limitations to researchers so they can determine how to apply them to their respective projects.

Read the full CCC & CRA-I RFC Response here.