Conceptual Challenges for Interpretable Machine Learning

David S. Watson¹

¹Department of Statistical Science, University College London, London, UK

Email for correspondence: david.watson@ucl.ac.uk

§0 Abstract

As machine learning has gradually entered into ever more sectors of public and private life, there has been a growing demand for algorithmic explainability. How can we make the predictions of complex statistical models more intelligible to end users? A subdiscipline of computer science known as interpretable machine learning (IML) has emerged to address this urgent question. Numerous influential methods have been proposed, from local linear approximations to rule lists and counterfactuals. In this article, I highlight three conceptual challenges that are largely overlooked by authors in this area. I argue that the vast majority of IML algorithms are plagued by (1) ambiguity with respect to their true target; (2) a disregard for error rates and severe testing; and (3) an emphasis on product over process. Each point is developed at length, drawing on relevant debates in epistemology and philosophy of science. Examples and counterexamples from IML are considered, demonstrating how failure to acknowledge these problems can result in counterintuitive and potentially misleading explanations. Without greater care for the conceptual foundations of IML, future work in this area is doomed to repeat the same mistakes.

Keywords: Artificial intelligence, explainability, causality, pragmatics, severe testing

§1 Introduction

Machine learning (ML) is ubiquitous in modern society. Complex learning algorithms are widely deployed in private industries like finance (Heaton, Polson, & Witte, 2017) and insurance (Lin et al., 2017), as well as public services such as healthcare (Topol, 2019) and education (Peters, 2018). Their prevalence is largely driven by results. ML models outperform humans not just at strategy games like chess (Silver et al., 2018) and StarCraft (Vinyals et al., 2019), but at important scientific tasks like antibiotic discovery (Stokes et al., 2020) and predicting protein structure (Jumper et al., 2021).

High-performance algorithms are often opaque, in the sense that it is difficult or impossible for humans to understand the internal logic behind individual predictions. This raises fundamental issues of trust. How can we be sure a model is right when we have no idea why it predicts the values it does? Accuracy on previous cases may suggest reliability, but epistemologists are well aware that a good track record is no guarantee of future success. Just as inductive inferences can lead us astray when presumptions of uniformity fail, so models can err when deployed in new contexts. This can lead to discriminatory predictions with potentially disastrous consequences in high-stakes settings like healthcare (Obermeyer et al., 2019) and criminal justice (Angwin et al., 2016). European regulators, sensitive to these concerns, have begun introducing explainability guidelines into data protection law, although the proper interpretation of the relevant texts remains a matter of some dispute (Selbst & Powles, 2017; Wachter, Mittelstadt, & Floridi, 2017).

While interpreting models is by no means a new concern in computer science and statistics, it is only in the last few years that a formal subfield has emerged to address the issues surrounding algorithmic opacity. I shall refer to this subdiscipline as interpretable machine learning (IML), also sometimes called explainable artificial intelligence (XAI). I employ the former term because it emphasizes the subjective goal of interpretation over the (purportedly) objective goal of explanation, while simultaneously specifying the focus on ML as opposed to more generic artificial intelligence tasks. IML comprises a diverse collection of technical approaches intended to render statistical predictions more intelligible to humans.1 My focus in this article is primarily on model-agnostic post-hoc methods, which attempt to explain the outputs of some underlying target function without making any assumptions about its form. Such explanations may be global (spanning the entire feature space) or local (applying only to some subregion of the feature space). Both types are considered here.

The last few years have seen considerable advances in IML, several of which will be examined in detail below. Despite this progress, I contend that the field has yet to overcome or even properly acknowledge certain fundamental conceptual obstacles. In this article, I highlight three in particular:

  • (1) Ambiguous fidelity. Everyone agrees that algorithmic explanations must be faithful - but to what exactly? The target model or the data generating process? Failure to appreciate the difference has led to confusing and unproductive debates.

  • (2) Error rate control. The vast majority of IML methods do not even bother to quantify expected error rates. This makes it impossible to subject algorithmic explanations to severe tests, as is required of any scientific hypothesis.

  • (3) Process vs. Product. Current approaches overwhelmingly treat explanations as static deliverables, computed once and for all. In fact, successful explanations are more of a process than a product. They require dynamic, iterative refinements between multiple agents.

A number of other conceptual challenges surrounding IML have already garnered much attention in the literature, especially those pertaining to subtle distinctions between explanations, interpretations, and understanding (Krishnan, 2020; Paez, 2019; Zednik, 2019); the purported trade-off between model accuracy and intelligibility (Rudin, 2019; Zerilli et al., 2019); as well as typologies and genealogies of algorithmic opacity (Burrell, 2016; Creel, 2020). I have little to add to those debates here, which I believe have been well argued by numerous authors. The challenges I highlight in this article, by contrast, are woefully underexamined despite their obvious methodological import. To make my case, I shall draw upon copious literature from epistemology and philosophy of science to unpack points (1)-(3) and demonstrate their relevance for IML through a number of real and hypothetical examples. While each point raises unique issues, together they point toward a singular conclusion: that despite undeniable technical advances, the conceptual foundations of IML remain underdeveloped. Fortunately, there are glimmers of hope to be found in this burgeoning discourse. I consider exceptions to each trend that collectively suggest a promising horizon of possibility for IML research.

The remainder of this article is structured as follows. I review relevant background material in §2, framing IML as a demand for causal explanations. In §3, I distinguish between two oft-conflated notions of explanatory fidelity, revealing the apparent contradiction to be a simple confusion between complementary levels of abstraction. In §4, I draw on error-statistical considerations to argue that popular IML methods fail to meet minimal severity criteria, making it difficult to judge between competing explanations. I defend a dialogic account of explanation in §5, arguing that satisfactory solutions must include some degree of user interaction and feedback. I conclude in §6 with a review of my findings and some reflections on the role and limits of philosophy as a theoretical guide in critiquing and designing algorithmic explanations.

§2 Background

In this section, I provide necessary background on IML methods, as well as formal details on empirical risk minimization and structural causal models. Building on Woodward's (2003) minimal theory of explanation, I frame the IML project as a certain sort of causal inquiry. This perspective elucidates the conceptual challenges that follow, as causal reasoning helps to disambiguate targets (§3), identify proper estimands for inference (§4), and ensure fruitful explanatory dialogue (§5).

§2.1 All IML is causal

Say some high-performance supervised learner f has been trained on copious volumes of biomedical data, and diagnoses Jack with rare disease y. Jack's general practitioner, Dr. Jill, is as perplexed as he is by this unexpected diagnosis. Jack shows no outward symptoms of y and does not match the typical disease profile. Treatment for y is aggressive and potentially dangerous, so Jack wants to be certain before he proceeds. When Jack and Dr. Jill try to find out why f made this prediction, they receive a curt reply from the software company that licenses the technology, informing them that they should accept the diagnosis because f is very accurate. Most commentators would agree that this answer is unsatisfactory. But how exactly should we improve upon it? What is the proper form of explanation in this case?

I shall argue that what Jack and Dr. Jill seek is a causal account of why f made the particular prediction it did. Following the interventionist tradition, I regard an explanation as causal insomuch as it identifies a set of variables which, when set to some values, are sufficient to bring about the outcome in question, and, when set to alternative values, are sufficient to alter the outcome in some prespecified way. Woodward (2003, p. 203) formalizes these criteria, stating that model M provides a causal explanation for outcome Y if and only if:

  • (i) The generalizations described by M are accurate, or at least approximately so, as are the observations Y = y and X = x.

  • (ii) According to M, Y = y under an intervention that sets X = x.

  • (iii) There exists some possible intervention that sets X = x' (where x ≠ x'), with M correctly describing the value Y = y' (where y ≠ y') that Y would assume under the intervention.

The full details of Woodward's program are beyond the scope of this article.2 However, his minimal account of explanation is a valuable starting point for analysis. In Jack's case, we may satisfy these criteria empirically by finding some other patient who is medically similar to Jack but receives a different diagnosis. Alternatively, we could query the model f directly using synthetic data in which we perturb Jack's input features until we achieve the desired outcome. If, for instance, we devise an input vector x' identical to Jack's input x except along one dimension - say, decreased heartrate - and the model does not diagnose this hypothetical datapoint with rare disease y, then we may justifiably conclude that heartrate is causally responsible for the original prediction. This kind of explanation constitutes at least one viable explanans for the target explanandum.
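
The perturbation strategy just described is straightforward to operationalize. The sketch below is a minimal illustration, not any published algorithm: it assumes a fitted classifier exposing a scikit-learn-style predict method, and the feature index standing in for heartrate and the candidate values in the commented usage are hypothetical.

```python
import numpy as np

def counterfactual_probe(f, x, feature_idx, values):
    """Perturb a single feature of input x and record how the model's
    prediction changes, holding all other features fixed."""
    x = np.asarray(x, dtype=float)
    outcomes = []
    for v in values:
        x_prime = x.copy()
        x_prime[feature_idx] = v
        outcomes.append((v, f.predict(x_prime.reshape(1, -1))[0]))
    return outcomes

# Hypothetical usage: suppose feature 3 encodes Jack's heartrate. If lowering
# it flips the prediction away from rare disease y, then heartrate satisfies
# Woodward's criteria (ii) and (iii) with respect to the model's output.
# probes = counterfactual_probe(model, jack_x, feature_idx=3,
#                               values=np.linspace(50, 90, 9))
```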

Current IML approaches can be roughly grouped into three classes: feature attribution methods, case-based explanations, and rule lists. The latter category poses considerable computational challenges for large datasets, which may explain why the first two are generally more popular. Local linear approximators, a kind of feature attribution technique, are the most widely used approach in IML (Bhatt et al., 2020). Notable instances include local interpretable model-agnostic explanations, aka LIME (Ribeiro, Singh, & Guestrin, 2016); and Shapley additive explanations, aka SHAP (Lundberg & Lee, 2017). Specifics vary, but the goal with these methods is essentially the same - to compute the linear combination of inputs that best explains the decision boundary or regression surface near a point of interest (see Fig. 1). Counterfactual explanations (Wachter, Mittelstadt, & Russell, 2018), which account for predictions via synthetic matching techniques like those described above, are another common approach. Variants of LIME, SHAP, and counterfactual explanations have recently been implemented in open-source algorithmic explainability toolkits distributed by major tech firms such as Google,3 Microsoft,4 and IBM.5 When I speak of “popular IML methods”, I have these algorithms in mind.

Figure 1. A nonlinear function f(x) (blue curve) is approximated by a linear function L(x) (green curve) at the point x = a. Since L is simpler than f, it may help users better understand the model's predictive behavior near the input. Computing such tangents is the basic idea behind local linear approximators like LIME and SHAP.
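
To make Figure 1 concrete, here is a minimal from-scratch sketch of the local-surrogate recipe that LIME-style methods follow: sample perturbations around the point of interest, weight them by proximity, and fit a weighted linear model whose coefficients serve as the local explanation. This illustrates the general idea under assumed kernel and sampling choices; it is not the actual LIME or SHAP implementation.

```python
import numpy as np
from sklearn.linear_model import Ridge

def local_linear_explanation(f, x, n_samples=1000, scale=0.1,
                             kernel_width=0.75, seed=0):
    """Fit a weighted linear surrogate to the black-box f around the point x.
    Returns per-feature coefficients: a local feature attribution."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Sample perturbations in a neighbourhood of x.
    Z = x + rng.normal(scale=scale, size=(n_samples, x.shape[0]))
    y = f(Z)                                   # black-box predictions
    # Weight samples by proximity to x (exponential kernel).
    dists = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dists ** 2) / kernel_width ** 2)
    surrogate = Ridge(alpha=1.0).fit(Z, y, sample_weight=weights)
    return surrogate.coef_

# Toy nonlinear model: the surrogate coefficients approximate the local
# slope of f near x, as in Figure 1.
f = lambda X: np.sin(X[:, 0]) + X[:, 1] ** 2
print(local_linear_explanation(f, np.array([0.5, 1.0])))
```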

No matter one’s methodological approach, the central aim of IML is always, more or less explicitly, to answer questions of the form:

Q. Why did model f predict outcome y_i as opposed to alternative y_i' ≠ y_i for input vector x_i?

A global explanation answers Q for each i ∈ [n], while local explanations limit themselves to individual samples. At either resolution, successful answers must satisfy Woodward's three criteria. Those that fail to do so are unfaithful to their target (i), or else do not provide necessary (iii) or sufficient (ii) conditions for the explanandum.6 This is perhaps most obviously true in the case of rule lists (see, e.g., Ribeiro et al., 2018), which specify sufficient conditions (i.e., causal rules) for certain sorts of model predictions. An explanatory rule list for Jack's diagnosis may say something like, "If heartrate is decreased, then predict y'." The causal connection is similarly straightforward for feature attribution methods, which attempt to quantify the predictive impact of particular variables. In Jack's case, it may be that heartrate receives the largest variable importance score because it has the greatest causal effect on model outcomes. Interestingly, the creators of the counterfactual explanation algorithm explicitly motivate their work with reference to Lewis's theory of causation (1973). According to this view, we causally explain Jack's prediction by appealing to the nearest possible world in which he receives a different diagnosis. Though there are important differences between this account and the interventionist theory I endorse here, the citation only serves to underscore the reliance of IML on causal frameworks - as well as the ambiguity this reliance can engender.
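
The sufficiency reading of rule lists can be checked empirically. The sketch below estimates how often a model's prediction is preserved when the features cited in a candidate rule are held fixed and all other features are resampled from the data, broadly in the spirit of the precision criterion used by rule-based methods such as Ribeiro et al. (2018). The sampling scheme is a simplification rather than the published algorithm, and the commented usage for Jack is hypothetical.

```python
import numpy as np

def rule_precision(f, x, rule_features, data, n_samples=1000, seed=0):
    """Estimate how often the model's prediction for x survives when the
    features named in the rule are held fixed and all other features are
    resampled from the data. High precision suggests the rule is (nearly)
    sufficient for the prediction."""
    rng = np.random.default_rng(seed)
    target = f.predict(np.asarray(x).reshape(1, -1))[0]
    idx = rng.integers(0, len(data), size=n_samples)
    Z = np.asarray(data)[idx].copy()
    Z[:, rule_features] = np.asarray(x)[rule_features]   # enforce the rule
    preds = f.predict(Z)
    return np.mean(preds == target)

# Hypothetical usage for Jack: does fixing heartrate (feature 3) at its
# observed value suffice to reproduce the model's diagnosis?
# print(rule_precision(model, jack_x, rule_features=[3], data=X_train))
```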

If the causal foundations of IML are not always clear, perhaps this is because most authors in this area are steeped in a tradition of statistics and computer science that has historically prioritized prediction over explanation (Breiman, 2001; Shmueli, 2010). I will briefly formalize the distinction between supervised learning and causal modelling to pre-empt any potential confusion and ground the following discussion in established theory.

§2.2 Empirical risk minimization and structural causal models

A supervised learning algorithm is a method for predicting outcomes Y ∈ ℝ^k based on inputs X ∈ ℝ^d with minimal error.7 This requires a training dataset of input/output pairs Z^n = {(x_i, y_i)}_{i=1}^n, where each sample z_i represents a draw from some unknown distribution P(Z). An algorithm is associated with a function space F, and the goal is to find the model f ∈ F that minimizes some predetermined loss function L(f, Z), which quantifies the distance between model outputs f(X) = Ŷ and true outcomes Y. Common examples include mean squared error for regression and cross-entropy for classification. The expected value of the loss is the risk, and empirical risk minimization (ERM) is the learning strategy whereby we select whichever model attains the minimal loss within a given function class F. ERM is provably consistent (i.e., guaranteed to converge uniformly upon the best model in F) under two key assumptions (Vapnik & Chervonenkis, 1971): (1) samples are independently and identically distributed (i.i.d.); and (2) F is of bounded complexity.8
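
As a toy illustration of ERM under squared-error loss, the sketch below selects the empirical risk minimizer within a bounded function class, here polynomials of degree at most three. The data-generating process is invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set drawn i.i.d. from an unknown P(Z): y = sin(x) + noise.
x = rng.uniform(-3, 3, size=200)
y = np.sin(x) + rng.normal(scale=0.2, size=200)

def empirical_risk(coeffs, x, y):
    """Mean squared error of a polynomial model on the training sample."""
    preds = np.polyval(coeffs, x)
    return np.mean((preds - y) ** 2)

# Function class F: polynomials of degree <= 3 (bounded complexity).
# np.polyfit returns the least-squares solution, i.e. the empirical risk
# minimizer for squared-error loss within this class.
f_hat = np.polyfit(x, y, deg=3)
print("empirical risk of ERM solution:", empirical_risk(f_hat, x, y))
```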

The ERM approach provides the theoretical basis for all modern ML techniques, including support vector machines (Scholkopf & Smola, 2017), boosting (Schapire & Freund, 2012), and deep learning (Goodfellow, Bengio, & Courville, 2016).9 As noted in §1, these algorithms have proven incredibly effective at predicting outcomes for complex tasks like image classification and natural language processing. However, critics argue that ERM ignores important structural dependencies between predictors, effectively elevating correlation over causation. The problem is especially acute when variables are confounded. To cite a famous example, researchers trained a neural network to help triage pneumonia patients at Mount Sinai hospital in New York (Caruana et al., 2015). The model was an excellent predictor, easily outperforming all competitors. Upon close inspection, however, the researchers were surprised to discover that the algorithm assigned low probability of death to pneumonia patients with a history of asthma, a well-known risk factor for emergency room patients under acute pulmonary distress. The unexpected association was no simple mistake. Because asthmatics suffering from pneumonia are known to be high risk, doctors quickly send them to the intensive care unit (ICU) for monitoring. The extra attention they receive in the ICU lowers their overall probability of death. This confounding signal obscures a more complex causal picture that ERM is fundamentally incapable of capturing on its own.
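
The asthma effect is easy to reproduce in a hypothetical simulation (the numbers below are invented, not the Caruana et al. data): asthma raises the probability of ICU admission, ICU care lowers mortality, and a purely predictive model trained on the observational data therefore assigns asthma a protective (negative) coefficient.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 50_000

asthma = rng.binomial(1, 0.2, n)
# Under current practice, asthmatics are almost always sent to the ICU.
icu = rng.binomial(1, np.where(asthma == 1, 0.95, 0.20))
# Asthma raises the risk of death; ICU care lowers it.
logit = -2.0 + 1.0 * asthma - 2.5 * icu
death = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# An ERM-style predictor trained only on (asthma, death) learns the
# observational association, in which asthma looks protective.
model = LogisticRegression().fit(asthma.reshape(-1, 1), death)
print("coefficient on asthma:", model.coef_[0][0])   # negative
```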

Examples like this highlight the importance of interpretable explanations for high-stakes ML predictions such as those commonly found in clinical medicine (Watson et al., 2019). They also demonstrate the dangers of relying on ERM when the i.i.d. assumption fails. The external validity of a given model depends on structural facts about training and test environments (Pearl & Bareinboim, 2014), e.g. the assignment mechanism that dictates which patients are sent to the ICU. If we were to deploy the pneumonia triage algorithm in a new hospital where doctors are not already predisposed to provide extra care for asthma patients - perhaps a clinic where doctors rely exclusively on a high-performance ML model to prioritize treatment - then empirical risk may substantially underestimate the true generalization error. In light of these considerations, a number of prominent authors have advocated for an explicitly causal approach to statistical learning (Pearl, 2000; Peters, Janzing, & Scholkopf, 2017; Spirtes, Glymour, & Scheines, 2000; van der Laan & Rose, 2011). The basic strategy can be elucidated through the formalism of structural causal models (SCMs). A probabilistic SCM M is a tuple (U, V, F, P(u)), where U is a set of exogenous variables, i.e. unobserved background conditions; V is a set of endogenous variables, i.e. observed features; F is a set of deterministic functions mapping causes to direct effects; and P(u) is a probability distribution over U. An SCM can be visually depicted as a directed graph, where nodes are variables and edges denote direct causal relationships (see Fig. 2). A fully specified M provides a map from background conditions to a joint distribution over observables, M: U → P(V).

With SCMs, we can express the effects not just of conditioning on variables, but of intervening on them. In graphical terms, an intervention on a variable effectively deletes all incoming edges, resulting in the submodel M_x. Interventions are formally expressed by Pearl's (2000) do-operator. The interventional distribution P(Y | do(X = 1)) may deviate considerably from the observational distribution P(Y | X = 1) within a given M. For instance, if all and only men (Z = 1) take some drug (X = 1), then health outcomes Y could be the result of sex or treatment, since P(Y | X = 1) = P(Y | Z = 1). However, if we randomly assign treatment to patients independent of their sex, then we may get a very different value for P(Y | do(X = 1)), especially if there is a confounding effect between sex and outcomes, for example if men are more likely than women to respond to treatment. Only by breaking the association between X and Z can we disentangle the relevant from the spurious effects. This is the motivating logic behind randomized controlled trials (RCTs), which are widely used by scientists and regulatory agencies to establish treatment efficacy.10 The do-calculus provides a provably complete set of rules for reasoning about interventions (Shpitser & Pearl, 2008), including criteria for deciding whether and how causal effects can be estimated from observational data.
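
The sex/drug example can be simulated directly. In the sketch below (structural equations invented for illustration), conditioning on X = 1 in the observational regime mixes the treatment effect with the effect of sex, while intervening, i.e. randomizing X and thereby deleting the Z → X edge, recovers the average treatment effect alone.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

def simulate(randomize_treatment=False):
    """Sample from a toy SCM: Z (sex) -> X (drug), and both Z and X -> Y."""
    z = rng.binomial(1, 0.5, n)                   # Z: sex
    if randomize_treatment:
        x = rng.binomial(1, 0.5, n)               # do(X): edge Z -> X deleted
    else:
        x = z.copy()                              # observational: all and only men take the drug
    # Men (Z = 1) respond more strongly to treatment than women.
    p_y = 0.2 + 0.1 * x + 0.3 * x * z
    y = rng.binomial(1, p_y)
    return x, y

x_obs, y_obs = simulate(randomize_treatment=False)
x_int, y_int = simulate(randomize_treatment=True)

print("P(Y=1 | X=1)     =", y_obs[x_obs == 1].mean())  # ~0.6: treatment and sex effects entangled
print("P(Y=1 | do(X=1)) =", y_int[x_int == 1].mean())  # ~0.45: average treatment effect only
```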

Figure 2. Simple examples of causal graphs. Solid edges denote observed causal relationships, dashed edges unobserved. (a) A model with confounding between variables X and Y. (b) The same model after intervening on X, thereby eliminating all incoming causal effects.

Though the models we seek to explain with IML tools are typically ERM algorithms, the causal nature of this undertaking arguably demands an SCM approach. The mismatch between these two modelling strategies sets the stage for a number of conceptual problems. Sullivan (2020) argues that algorithmic opacity derives not from any inherent complexity in models or systems per se, but rather from the “link uncertainty” that results when there is little empirical evidence connecting the two levels. Even when such links are well-established, however, it is not always clear which level is the intended target of explanation. Causal reasoning, as formalized by SCMs, can help diagnose and resolve issues of link uncertainty by making the assumptions of any given IML tool more explicit.

§3 Ambiguous fidelity

One obvious desideratum for any IML tool is accuracy. We want explanations that are true, or at least probably approximately correct, to use Valiant’s memorable phrase (1984). This accords with the first of Woodward’s three criteria cited above. In this section, I argue that this uncontroversial goal is underspecified. Though the problem emerges for any IML approach, I will focus here on a longstanding dispute between proponents of marginal and conditional variable importance measures, two popular kinds of feature attribution methods. I show that the debate between these two camps is dissolved (rather than resolved) as soon as we recognize that each kind of measure is faithful to a different target. The question of which should be preferred for a given IML task cannot be answered without taking into account pragmatic information regarding the context, level of abstraction, and purpose of the underlying inquiry.

§3.1 Systems and models

I have argued that IML’s fundamental question Q poses a certain sort of causal problem. However, it is important to note how Q differs from more familiar problems in the natural and social sciences. Toward that end, I briefly review three well-known and interrelated challenges that complicate efforts to infer and quantify causal effects.

The problem of induction. Although commonly associated with Hume (1739, 1748) in the anglophone tradition, inductive skepticism goes back at least as far as Sextus Empiricus (Fl
