## Introduction: the origin story

Upol Ehsan shares the "why" behind the workshop followed by an introduction of the organizing committee. Watch this if you want to learn how the workshop came about, what our goals were, and how we plan to build on the conversation around Human-centered XAI.

## Tim Miller

We are excited to announce that our keynote speaker will be Tim Miller! Tim is a Professor in the School of Computing and Information Systems at The University of Melbourne, and Co-Director of the Centre for AI and Digital Ethics.

Tim's work has been formative for many threads of XAI, from bridging lessons from the social sciences to designing counterfactual explanations.
Like us, Tim is excited to engage with the participants at the workshop. Here is what he had to say:

"I'm looking forward to engaging with participants because the HCXAI workshop is encouraging an interdisciplinary approach to XAI -- just look at the breadth of knowledge on the organising committee! My talk will focus on evaluation in XAI, including asking whether we should care about trust in XAI evaluation, and how it should be done. This sits right at the heart of interdisciplinary research!"

## Keynote

Tim Miller from the University of Melbourne gives an excellent keynote on Trust and Evaluation in XAI from a human-centered perspective, bridging work from computer science and the social sciences.

## Session #1: Involving End Users

How can we involve end-users? What should we consider while involving them? What pitfalls do we need to avoid?

Session Chair: Andreas Riener

Paper #1: System Explanations: A Cautionary Tale

Ellen Voorhees

Paper #2: Interactive End-User Machine Learning to Boost Explainability and Transparency of Digital Footprint Data

Florian Bemmann, Daniel Buschek and Heinrich Hussmann

Paper #3: Evaluating human understanding in XAI systems

Davide Gentile, Greg A. Jamieson and Birsen Donmez

## Session #2: Explanation Design

What are the factors we need to keep in mind when we design explanations? What are the challenges and opportunities?

Session Chair: Marc Streit

Paper #1: Explanation Interfaces: Two Approaches for Grounding Design Decisions

Henrik Mucha and Franziska Schulz

Paper #2: Counterfactual Explanations for Machine Learning: Challenges Revisited

Sahil Verma, John Dickerson and Keegan Hines

Paper #3: Explaining the Road Not Taken

Hua Shen and Ting-Hao K. Huang

## Session #3: Frameworks

What are the frameworks for human-centered perspectives in XAI? How can we operationalize them?

Session Chair: Q. Vera Liao

Paper #1: The XAI Primer: A Digital Ideation Space for Explainable Artificial Intelligence Strategies

Paper #2: LEx: A Framework for Operationalising Layers of AI Explanations

Ronal Singh, Upol Ehsan, Marc Cheong, Mark O. Riedl and Tim Miller

Paper #3: Learning to Explain Machine Learning

Vedant Nanda, Duncan McElfresh and John P. Dickerson

## Expert Panel Discussion

We are excited to have a stellar lineup of renowned scholars -- Michael Muller, Simone Stumpf, Brian Lim, and Enrico Bertini -- engage in an interactive panel discussion at the workshop!

Bridging the diverse threads of their work, the panelists will discuss:

• What is the biggest barrier to operationalizing human-centered perspectives in XAI?
• How might we address this barrier?
• What can go wrong if we don't address this?

## Expert Panel Discussion

We had an amazing panel discussion amongst renowned scholars -- Michael Muller (IBM Research), Simone Stumpf (City University of London), Brian Lim (National University of Singapore), and Enrico Bertini (New York University).

Bridging the diverse threads of their work, the panelists discussed:

• What is the biggest barrier to operationalizing human-centered perspectives in XAI?
• How might we address this barrier?
• What can go wrong if we don't address this?

## Poster Spotlight Videos

Beyond the live presentations throughout the three sessions, we had a series of two-minute madness (pre-recorded) poster presentations. Afterwards, each "poster" had its own Discord channel where participants "dropped in" to discuss the projects. While the presentations themselves were stellar, the best part was being able to engage with multiple authors.

## Wrap Up: where do we go from here?

Upol Ehsan concludes the workshop by sharing how the evolution of bicycles can help us navigate the Explainable AI landscape -- where we are right now, where we can go in the future, and next steps to shape the discourse around HCXAI.

### Operationalizing Human-Centered Perspectives in Explainable AI

Upol Ehsan, Philipp Wintersberger, Q. Vera Liao, Martina Mara, Marc Streit, Sandra Wachter, Andreas Riener, Mark O. Riedl

The realm of Artificial Intelligence (AI)'s impact on our lives is far-reaching: with AI systems proliferating high-stakes domains such as healthcare, finance, mobility, and law, these systems must be able to explain their decisions to diverse end-users comprehensibly. Yet the discourse of Explainable AI (XAI) has been predominantly focused on algorithm-centered approaches, suffering from gaps in meeting user needs and exacerbating issues of algorithmic opacity. To address these issues, researchers have called for human-centered approaches to XAI. There is a need to chart the domain and shape the discourse of XAI with reflective discussions from diverse stakeholders. The goal of this workshop is to examine how human-centered perspectives in XAI can be operationalized at the conceptual, methodological, and technical levels. Encouraging holistic (historical, sociological, and technical) approaches, we put an emphasis on "operationalizing", aiming to produce actionable frameworks, transferable evaluation methods, and concrete design guidelines, and to articulate a coordinated research agenda for XAI.

### System Explanations: A Cautionary Tale

Ellen Voorhees

There are increasing calls for systems that are able to explain themselves to their end users to increase transparency and help engender trust. But, what should such explanations contain, and how should that information be presented? A pilot study of justifications produced by textual entailment systems serves as a cautionary tale that there are no general answers to these questions. Six different judges each acting in the role of surrogate end-user independently rated how comprehensible justifications of entailment decisions were on a five-point scale. Interrater agreement was low, with kappa scores less than 0.2. More than half of the explanations received both one rating of 'Very Poor' or 'Poor' and one rating of 'Good' or 'Very Good'; and in 32 cases, the same explanation received all five possible ratings from 'Very Poor' through 'Very Good'.
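For readers unfamiliar with the statistic, chance-corrected agreement among multiple raters is commonly measured with Fleiss' kappa. The sketch below is a minimal illustration with hypothetical ratings -- the study's actual data and the specific kappa variant it used are not given here, so both are assumptions:

```python
# Minimal sketch of Fleiss' kappa for the setup described above:
# six judges rating each explanation on a five-point scale.
# The ratings below are hypothetical, not the study's data.
from collections import Counter

def fleiss_kappa(ratings, categories):
    """ratings: list of items, each a list of category labels (one per rater)."""
    n = len(ratings[0])   # raters per item
    N = len(ratings)      # number of items
    counts_total = Counter()
    p_i = []              # per-item observed agreement
    for item in ratings:
        c = Counter(item)
        counts_total.update(c)
        p_i.append((sum(v * v for v in c.values()) - n) / (n * (n - 1)))
    p_bar = sum(p_i) / N
    # chance agreement from the marginal category proportions
    total = N * n
    p_e = sum((counts_total[cat] / total) ** 2 for cat in categories)
    return (p_bar - p_e) / (1 - p_e)

scale = ["Very Poor", "Poor", "Fair", "Good", "Very Good"]
# two hypothetical justifications, each rated by six judges
items = [
    ["Very Poor", "Poor", "Fair", "Good", "Very Good", "Good"],
    ["Poor", "Very Poor", "Good", "Very Good", "Fair", "Poor"],
]
print(round(fleiss_kappa(items, scale), 3))  # → -0.179
```

With raters scattered across the scale like this, kappa lands near (or below) zero, which is the kind of "less than 0.2" agreement the paper reports.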

### Explanation Interfaces: Two Approaches for Grounding Design Decisions

Henrik Mucha and Franziska Schulz

Human decision-making is increasingly supported by artificially intelligent (AI) systems. Yet, it is not clear how machine advice can be made interpretable through human-machine interfaces. In this paper, we make a case for systemizing the design process of explanation interfaces by making transparent design decisions, also referred to as design rationale. To this end, we present a top-down and a bottom-up approach for defining design spaces. These spaces then represent an empirically grounded knowledge base from which to systematically derive alternative design solutions for explanation interfaces.

### The XAI Primer: A Digital Ideation Space for Explainable Artificial Intelligence Strategies

Explainable Artificial Intelligence (XAI) processes typically combine various explanation and verification strategies to support the analysis in different domains. Due to the increasing number of techniques and the variety of XAI methods deployed, deriving a comprehensive overview framework of different strategy combinations remains challenging. The paper presents a proposal for a digital ideation space in which designers of XAI processes can derive inspiration and investigate existing works. We propose an exploratory interface depicting both XAI strategies and applications to support designers conceptualizing and developing new projects. The XAI Primer is designed based on the metaphor of a museum. Designers can explore the presented ideation space as if they were artists visiting an art gallery. We enable serendipitous and guided explorations, allowing them to investigate and probe the state of the art as a source of inspiration.

### Counterfactual Explanations for Machine Learning: Challenges Revisited

Sahil Verma, John Dickerson and Keegan Hines

Counterfactual explanations (CFEs) are an emerging technique under the umbrella of interpretability of machine learning (ML) models. They provide "what if" feedback of the form "if an input datapoint were x' instead of x, then an ML model's output would be y' instead of y." Counterfactual explainability for ML models has yet to see widespread adoption in industry. In this short paper, we posit reasons for this slow uptake. Leveraging recent work outlining desirable properties of CFEs and our experience running the ML wing of a model monitoring startup, we identify outstanding obstacles hindering CFE deployment in industry.

### LEx: A Framework for Operationalising Layers of AI Explanations

Ronal Singh, Upol Ehsan, Marc Cheong, Mark O. Riedl and Tim Miller

Several social factors impact how people respond to AI explanations used to justify AI decisions. In this position paper, we define a framework called the Layers of Explanation (LEx), a lens through which we can assess the appropriateness of different types of explanations. The framework uses the notions of sensitivity (emotional responsiveness) of features and the level of stakes (decision's consequence) in a domain to determine whether types of explanations are appropriate in a given context. We demonstrate how to use the framework to assess the appropriateness of different types of explanations in different domains.

### Evaluating human understanding in XAI systems

Davide Gentile, Greg A. Jamieson and Birsen Donmez

Explanation in human-AI systems can provide the foundations for supporting joint decision-making and effective reliance. However, this desideratum depends on the relevant stakeholders understanding the AI’s capabilities and the reason behind its outputs. In this position paper, we compare two approaches used to measure human understanding in XAI: proxy tasks, i.e., artificial tasks evaluating user ability to simulate the AI decision, and mental models, i.e., user internal representation of the structure and function of a given system. We argue that, although widely used, proxy tasks (i) can fail at evaluating the effectiveness of explanations in human-AI systems due to the absence of a realistic end-goal, and (ii) may not translate to system performance in actual decision-making tasks. We further propose that existing research in human factors and the social sciences can guide mental model-based evaluations of human understanding with realistic decision-making tasks. Given the objective of providing explanations that facilitate decision-making tasks, we conclude by arguing that a rigorous evaluation of explainable systems needs to integrate a quantitative assessment of users’ prior knowledge of AI systems.

### Interactive End-User Machine Learning to Boost Explainability and Transparency of Digital Footprint Data

Florian Bemmann, Daniel Buschek and Heinrich Hussmann

Data collecting applications today only inform users about what data is collected directly, but not about what can be inferred from it. However, awareness of potential inferences is important from a data privacy perspective, especially as inferred information has been shown to be applicable for unethical applications as well. We propose interactive user involvement in model building: Participatory Model Design lets users interactively investigate what happens to their data, to convey which further information could be inferred. To operationalize such interactive explainability in practice, we created a prototype that integrates interactive personalized model training into a behaviour logging app for mobile sensing research. With our prototype we hope to spark discussions and further work towards strong direct user involvement in data collection and inference, to increase data privacy in the age of big data, and to facilitate explainability and transparency of downstream prediction systems.

### Learning to Explain Machine Learning

Vedant Nanda, Duncan McElfresh and John P. Dickerson

Explainable AI (XAI) methods yield human-understandable, post-hoc descriptions of a machine learning (ML) model's behavior. Evaluation metrics for XAI methods fall within readily-measurable dimensions such as fidelity of the explanation to the underlying ML model, various forms of human comprehensibility, computational overhead, and others. We argue that -- given ML models' role as only one piece of larger, deployed sociotechnical systems -- these metrics alone do not enable selection of an appropriate XAI method, or methods, for a specific use case. Indeed, it is necessary to include additional context related to the user of the system as well as the downstream impact of the ML model. Inspired by prior work in human-computer interaction and computational social choice, we propose a learning-based framework for the selection of XAI methods that is tailored to each user and context.

### Explaining the Road Not Taken

Hua Shen and Ting-Hao K. Huang

It is unclear if existing interpretations of deep neural network models respond effectively to the needs of users. This paper summarizes the common forms of explanations (such as feature attribution, decision rules, or probes) used in over 200 recent papers about natural language processing (NLP), and compares them against user questions collected in the XAI Question Bank. We found that although users are interested in explanations for the road not taken -- namely, why the model chose one result and not a well-defined, seemingly similar, legitimate counterpart -- most model interpretations cannot answer these questions.

### Explaining How Your AI System is Fair

Boris Ruf and Marcin Detyniecki

To implement fair machine learning in a sustainable way, choosing the right fairness objective is key. Since fairness is a concept of justice which comes in various, sometimes conflicting definitions, this is not a trivial task though. The most appropriate fairness definition for an artificial intelligence (AI) system is a matter of ethical standards and legal requirements, and the right choice depends on the particular use case and its context. In this position paper, we propose to use a decision tree as means to explain and justify the implemented kind of fairness to the end users. Such a structure would first of all support AI practitioners in mapping ethical principles to fairness definitions for a concrete application and therefore make the selection a straightforward and transparent process. However, this approach would also help document the reasoning behind the decision making. Due to the general complexity of the topic of fairness in AI, we argue that specifying "fairness" for a given use case is the best way forward to maintain confidence in AI systems. In this case, this could be achieved by sharing the reasons and principles expressed during the decision making process with the broader audience.

### Empirical hints of cognitive biases despite human-centered AI explanations

Nicolas Scharowski, Klaus Opwis and Florian Brühlmann

In explainable artificial intelligence (XAI) research, explainability is widely regarded as crucial for user trust in artificial intelligence (AI). However, empirical investigations of this assumption are still lacking. There are several proposals as to how explainability might be achieved, and it is an ongoing debate what ramifications explanations actually have on humans. In our work-in-progress we explored two post-hoc explanation approaches presented in natural language as a means for explainable AI. We examined the effects of human-centered explanations on trust behavior in a financial decision-making experiment (N = 387), captured by weight of advice (WOA). Results showed that AI explanations lead to higher trust behavior if participants were advised to *decrease* an initial price estimate. However, explanations had no effect if the AI recommended to *increase* the initial price estimate. We argue that these differences in trust behavior may be caused by cognitive biases and heuristics that people retain in their decision-making processes involving AI. So far, XAI has primarily focused on biased data and prejudice due to incorrect assumptions in the machine learning process. The implications of potential biases and heuristics that humans exhibit when presented with AI explanations have received little attention in the current XAI debate.

Prerna Juneja and Tanushree Mitra

In this position paper, we propose the use of existing XAI frameworks to design interventions in scenarios where algorithms expose users to problematic content (e.g., anti-vaccine videos). Our intervention design includes facts (to indicate the algorithmic justification of what happened) accompanied by either forewarnings or counterfactual explanations. While forewarnings indicate the potential risks of an action to users, the counterfactual explanations indicate what actions the user should perform to change the algorithmic outcome. We envision the use of such interventions as 'decision aids' that will help users make informed choices.

### Understanding Mental Models of AI through Player-AI Interaction

Jennifer Villareale and Jichen Zhu

Designing human-centered AI-driven applications requires a deep understanding of how people develop mental models of AI. Currently, we have little knowledge of this process and limited tools to study it. This paper presents the position that AI-based games, particularly the player-AI interaction component, offer an ideal domain to study the process by which mental models evolve. We present a case study to illustrate the benefits of our approach for explainable AI.

### Challenges for operationalizing XAI in Critical Interactive Systems

Célia Martinie

Large-scale critical interactive systems (e.g., aircraft cockpits, satellite command and control applications) aim to carry out complex missions. Users of such systems usually perform predefined tasks for which they are trained and qualified. Nowadays, critical systems tend to embed an increasing number of on-board sensors, which collect large amounts of data to be interpreted and extrapolated, and which tend to make user tasks more complex. Artificial Intelligence (AI) could be a powerful option to support users in managing their tasks and handling this complexity. However, operationalizing AI in critical interactive systems requires proving that the AI behavior is consistent with user tasks, as well as transparent to the users and to the certification stakeholders. Explainable AI (XAI) is key as it could be a significant means of satisfying this requirement. Nevertheless, at the same time, XAI will also have to comply with the needs and common practices of the design and development of critical interactive systems. This position paper discusses the main challenges for operationalizing XAI in critical interactive systems.

### Designer-User Communication for XAI: An epistemological approach to discuss XAI design

Juliana Jansen Ferreira and Mateus de Souza Monteiro

Artificial Intelligence is becoming part of any technology we use nowadays. If AI informs people's decisions, the explanation of the AI's outcomes, results, and behavior becomes a necessary capability. However, discussing XAI features with various stakeholders is not a trivial task. Most of the available frameworks and methods for XAI focus on data scientists and ML developers as users. Our research is about XAI for end-users of AI systems. We argue that we need to discuss XAI early in the AI system design process and with all stakeholders. In this work, we aimed to investigate how to operationalize the discussion about XAI scenarios and opportunities among designers and developers of AI and its end-users. We took the Signifying Message as our conceptual tool to structure and discuss XAI scenarios. We experiment with its use for the discussion of a healthcare AI system.

### AI Explainability: Why One Explanation Cannot Fit All

Milda Norkute

This paper describes the challenges faced by practitioners in selecting explainability methods for Artificial Intelligence (AI) models. Research into the explainability of Natural Language Processing (NLP) models with attention mechanisms is discussed to illustrate how the value of the same explanation method depends not only on the audience but also on the context and the use case for the explanation. It is proposed that when designing explanations, we should research what users of the model will use them for, so that explanations can be designed to fit the task.

### Are Users in the Loop? Development of the Subjective Information Processing Awareness Scale to Assess XAI

Tim Schrills, Mourad Zoubir, Mona Bickel, Susanne Kargl and Thomas Franke

While interacting with intelligent systems, users need to rely on automated information processing. However, systems often do not offer sufficient transparency to keep users in the loop. Based on Situation Awareness Theory, we constructed the 12-item SIPA scale. SIPA assesses whether, during system interaction, a user feels enabled to 1) perceive the information used by the system, 2) understand the system's information processing, and 3) predict its outcome. In an online study with N = 55, two systems - a web search and an online symptom checker - were evaluated. Results indicate that the scale is highly reliable and shows moderate to strong correlations with the Trust in Automation scale and the Affinity for Technology Interaction (ATI) scale. We also found a moderate correlation with the Explanation Satisfaction Scale. Based on the results, the questionnaire appears to be a promising tool to guide future human-centered XAI research.

### A Multistakeholder Approach Towards Evaluating AI Transparency Mechanisms

Ana Lucic, Madhulika Srikumar, Umang Bhatt, Alice Xiang, Ankur Taly, Q. Vera Liao and Maarten de Rijke

Given that there are a variety of stakeholders involved in, and affected by, decisions from machine learning (ML) models, it is important to consider that different stakeholders have different transparency needs. Previous work found that the majority of deployed transparency mechanisms primarily serve technical stakeholders. In our work, we want to investigate how well transparency mechanisms might work in practice for a more diverse set of stakeholders by conducting a large-scale, mixed-methods user study across a range of organizations, within a particular industry such as health care, criminal justice, or content moderation. In this paper, we outline the setup for our study.

### Conveying Agent Behavior to People

Ofra Amir

To date, work on explainable AI in the HCI community has focused mostly on explaining supervised learning models, such as decision-support systems and recommender systems. In this position paper, we discuss the problem of conveying the behavior of agents operating in sequential decision-making settings (e.g., reinforcement learning agents) to people. This setting presents several challenges beyond the supervised learning setting, as the agents act in the world over an extended duration rather than making one-shot predictions. This introduces various problems, such as conveying the policies the agents follow, their state representations, and their objectives. We discuss three specific challenges and lay out initial ideas for addressing them.

### Using Situated Case Studies for the Human-Centered Design of Explanation User Interfaces

Claudia Müller-Birn, Katrin Glinka, Peter Sörries, Michael Tebbe and Susanne Michl

Researchers and practitioners increasingly consider a human-centered perspective in the design of machine learning-based applications, especially in the context of Explainable Artificial Intelligence (XAI). However, clear methodological guidance in this context is still missing because each new situation seems to require a new setup, which also creates different methodological challenges. Existing case study collections in XAI inspired us; therefore, we propose a similar collection of case studies for human-centered XAI that can provide methodological guidance or inspiration for others. We want to showcase our idea in this workshop by describing three case studies from our research. These case studies are selected to highlight how apparently small differences require a different set of methods and considerations. With this workshop contribution, we would like to engage in a discussion on how such a collection of case studies can provide methodological guidance and critical reflection.

### A Human-Centered Interpretability Framework Based on Weight of Evidence

David Alvarez Melis, Harmanpreet Kaur, Hal Daume, Hanna Wallach and Jennifer Wortman Vaughan

We take a human-centered approach to interpretable machine learning. First, drawing inspiration from the study of explanation in philosophy, cognitive science, and the social sciences, we propose a list of design principles for machine-generated explanations that are meaningful to humans. We show that these principles can be operationalized through the concept of weight of evidence from information theory, which can be adapted to handle high-dimensional, multi-class settings, yielding a flexible meta-algorithm for generating explanations. We evaluate our method through a qualitative user study with machine learning practitioners, where we observe that the resulting explanations are usable despite some participants struggling with background concepts like prior class probabilities. Finally, we surface design implications for interpretability tools.
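As background (our gloss, not text from the paper): "weight of evidence" in this information-theoretic sense goes back to I. J. Good, and for a hypothesis h and evidence x it is classically the log-likelihood ratio

```latex
\operatorname{woe}(h : x) = \log \frac{P(x \mid h)}{P(x \mid \lnot h)}
```

Positive values indicate that x speaks in favor of h, negative values against it; the paper adapts this quantity to high-dimensional, multi-class settings.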

### Think like a Human, Act like a Bot: Explaining Instagram’s Automatic Ban Decisions

In a world where we are exposed to more and more automated decisions every day, it's scary to be blocked by a social media algorithm without knowing the reason and possible consequences. Instagram users are often blocked without knowing the exact reasons or having effective ways to change the situation. In this study, we surveyed 14 Instagram users with blocking experiences about their understanding of algorithmic decision making and their need for explanations to understand the situation and interact with the system. Our results show that instead of post-decision explanations, users need meaningful pre-decision information to understand the logic of the decision and take appropriate actions.

### Explainable AI: Intrinsic, Dialogic, and Impact Measures of Success

Rebekah Wegener and Jörg Cassens

This paper presents a brief overview of requirements for the development and evaluation of human-centred explainable systems. We propose evaluation models that include intrinsic measures, dialogic measures, and impact measures. The paper outlines these different perspectives and looks at how the separation might be used for explanation evaluation benchmarking and integration into design and development. We propose several avenues for future work.