Why It's Important:  Harms from Generative Artificial Intelligence (GenAI) based systems, such as AI psychosis, the exacerbation of echo chambers, and the spread of misinformation, are increasingly coming to light. Researchers are investigating ways to assess these harms through robust evaluation techniques, which usually involve assessing GenAI models on benchmark evaluation datasets to gauge the potential harms a system could cause or perpetuate. Recent work has shown the limitations of a narrow focus on model evaluations, which fail to incorporate end users' experiences as part of system evaluations. To address this gap, researchers have recommended including feedback from the individuals who use these systems as part of holistic system evaluations. Our work builds on such recommendations to study and offer an approach for including system users' voices in evaluating the impact of GenAI-based systems in high-stakes domains.


Our Approach:  Our research goal is to develop a framework that includes system users' experiences as part of impact evaluations of AI-based systems. We draw on the principles of social audits (community-led evaluations of government projects, laws, and policies that aim to ensure the transparency and accountability of government programs) to unpack a method for participatory evaluations of AI-based systems.

To conduct our study, we will collaborate with Noora Health, a public health non-profit organization that works with family caregivers from underserved communities to create awareness around caregiving practices. Together, we will co-design a GenAI-based chatbot to respond to community members' queries on maternal and child health in India. Through log-data analysis and semi-structured interviews, we will study how community members appropriate the chatbot's recommendations in their daily lives. We will also conduct semi-structured interviews with village-level administrative officials and elected representatives in India to understand how they currently conduct social audits of government programs to ensure public accountability, and how these practices might be extended to the evaluation of AI-based programs.

Through our proposed framework, we hope to build on and contribute to the science of evaluating GenAI-based systems within the field of Human-Computer Interaction.