Self-Explaining Cognitive Agents

Aim: to develop explanation facilities that support explaining the behavior of an agent.

Self-explanation is useful, for example, for explaining events to human players in serious gaming and for debugging an agent program.

The agent programming paradigm has provided the tools for implementing cognitive agents that derive their choice of action from their beliefs and goals. An important observation is that beliefs and goals are also used by humans to explain their actions. The question thus arises whether the cognitive agents paradigm also provides a paradigm for self-explaining software. That is, are cognitive software agents able to explain their own behaviour?


Applications of cognitive agents that are able to explain their own behaviour are diverse and range from virtual training environments to the debugging of agent programs. Agents that can explain their behaviour are claimed to enhance training and learning in virtual training environments. Explanation serves as a basis for understanding and learning (Lombrozo, 2006; Lombrozo et al., 2008). For example, agent programmers who can ask a software agent why it performed a particular action may gain insight into the logic of an agent program or be able to identify bugs more easily. Interestingly, explanation facilities for the famous Mycin expert system were considered useful primarily as a resource to support developers debugging system reasoning (Haynes et al, 2009).

Theories of Explanation

Explanations promote understanding (Lombrozo, 2006). Understanding the process of explaining raises a number of fundamental questions:

  • What constitutes an explanation?
  • What makes some explanations better than others?
  • How are explanations generated?
  • When are explanations sought?

Attribution Theory

Attribution theory is concerned with how individuals interpret events and how this relates to their thinking and behavior. According to Heider (1958), a person can make two types of attribution:

  • Internal attribution: the inference that a person is behaving in a certain way because of something about the person, such as attitude, character or personality;
  • External attribution: the inference that a person is behaving in a certain way because of something about the situation he or she is in.

Attributions are also significantly driven by emotional and motivational factors. People also tend to ascribe less variability to other people than to themselves, seeing themselves as more multifaceted and less predictable than others.

Harold Kelley (1967) developed the covariational theory of attribution, which is based on the covariational principle. This principle says that people attribute behavior to the factors that are present when a behavior occurs and absent when it does not. According to Kelley, people make attribution decisions based on consensus information (how other people behave in similar situations), distinctiveness information (how different stimuli result in different behavior), and consistency information (how frequently the same behavior is observed in various situations).
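The covariation principle can be illustrated with a small decision rule. The sketch below is only an illustration: it assumes that each of the three information types has been reduced to a high/low judgment, and the function name `attribute` and the outcome labels are hypothetical. The mapping follows the common textbook reading of Kelley's model and is not taken from this text.

```python
def attribute(consensus_high: bool, distinctiveness_high: bool, consistency_high: bool) -> str:
    """Return a coarse attribution for an observed behaviour.

    Assumption: each information type is reduced to a high/low judgment.
    """
    if not consistency_high:
        # Behaviour that is not repeated across occasions is usually
        # put down to the particular circumstances.
        return "circumstance"
    if consensus_high and distinctiveness_high:
        # Most people react this way, and the actor reacts this way
        # only to this stimulus: the cause is sought in the situation.
        return "external"
    if not consensus_high and not distinctiveness_high:
        # Few others react this way, and the actor reacts this way
        # to many stimuli: the cause is sought in the person.
        return "internal"
    # Mixed patterns do not single out one cause in this simple sketch.
    return "ambiguous"
```

For instance, a person who always laughs at one particular comedian, as most people do, but rarely at other comedians would receive an external attribution under this rule.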

One of the interesting observations of attribution theory is that attributions are often biased. More specifically, "our perceptions of causality are often distorted by our needs and certain cognitive biases" (Heider, 1958). Several biases have been described in the literature:

  • the fundamental attribution error: the tendency to over-value dispositional (personality, character and ability) explanations for the observed behaviors of others while under-valuing situational explanations for those behaviors.
  • the spotlight effect: the tendency of an individual to overestimate the extent to which others are paying attention to the individual's appearance and behavior.
  • actor/observer asymmetry: the tendency to attribute other people’s behaviors to their dispositional factors while attributing one's own actions to situational factors.
  • self-serving bias: the tendency to attribute success to dispositional and internal factors and failure to external, uncontrollable factors.

It appears that some biases are more common in individualist cultures than in collectivist cultures. Individualist cultures value individuals more than groups, whereas this value preference is reversed in collectivist cultures. Individualist cultures tend to attribute a person’s behavior to internal factors, whereas collectivist cultures tend to attribute it to external factors. People from individualist cultures are more inclined to make the fundamental attribution error and to show the self-serving bias. People from collectivist cultures are more inclined to show the self-effacing bias, that is, attributing success to external factors and blaming failure on internal factors.

Explaining Intentional Behaviour

Attribution theory does not distinguish between intentional and unintentional behaviour but claims that all behaviour is explained by citing causes that are either internal or external to the agent ("personal" and "situational" causes). Malle, F. 2001 argues that attribution theory simplifies the conceptual structure and social functions of folk behaviour explanations. Any theory of explanation should account for the fundamental difference between intentional and unintentional behaviour. Dennett (1987) argued that one can take three different stances to explain any event: the physical, the design, and the intentional stance. According to Dennett, there is no principled reason for choosing between these different stances but only pragmatic reasons for selecting one rather than another. However, the dominant strategy for explaining human behaviour is the intentional stance, which explains behaviour mainly in terms of the beliefs and goals that the actor has.

According to Malle, F. 2001, the minimal conditions for classifying an action as being intentional are that the actor:

  • desired the outcome of the action,
  • believed the action produces the outcome,
  • intended to perform the action,
  • has the skill to perform the action, and
  • was aware of all of the above and that the action fulfills the intention.
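The five conditions above can be read as a conjunction: an action counts as intentional only if all of them hold. A minimal sketch of that reading follows, assuming each condition has already been judged true or false by an observer; the class and field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass(frozen=True)
class ActionJudgment:
    """An observer's yes/no judgment on each condition (hypothetical names)."""
    desired_outcome: bool
    believed_action_produces_outcome: bool
    intended_action: bool
    has_skill: bool
    was_aware: bool

    def missing_conditions(self) -> List[str]:
        # Names of the conditions that were judged not to hold.
        return [name for name, holds in asdict(self).items() if not holds]

    def is_intentional(self) -> bool:
        # Intentional only if no condition is missing.
        return not self.missing_conditions()
```

A lucky shot, for example, would be modelled as `ActionJudgment(True, True, True, False, True)`: the actor wanted and intended the result but lacked the skill, so `is_intentional()` returns `False` and `missing_conditions()` names the skill condition.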

Malle, F. 2001 distinguishes three explanation modes of intentional behavior: reasons, causal history of reasons (CHR), and enabling factors. Reasons refer to beliefs and goals, causal history of reasons refers to the origin of beliefs and goals, and enabling factors refer to, for example, the abilities of the actor. These different modes of explanation can be reliably distinguished when coding naturally occurring folk explanations of behaviour. We now briefly discuss the three modes that Malle distinguishes for explaining intentional behaviour.

Enabling Factors

The "enabling factors" mode explains intentional behaviour by means of the skills of the agent. According to Malle, F. 2001, folk psychology only recognizes skills as a necessary enabling factor. That is, an actor can only perform an action intentionally if that action is performed with skill rather than luck. Explanations of actions may also refer to how the skill was acquired (e.g. by citing that the actor has been practicing the skill all week or that the actor has an occupation that requires extensive training of the skill). Enabling factors explain how it was possible that the actor performed the action (intentionally) and clarify the performance rather than the motivation for intending an action. Such factors are often cited when an action is difficult to perform. Other enabling factors include persistence, opportunities, and removed obstacles, but these factors do not explain why the action was performed intentionally.

Summarizing, enabling factor explanations of intentional action cite among others:

  • the skill of the actor explicitly, e.g. she made a delicious cake because she knows how to bake.
  • training of the skill, e.g. she completed the exam within an hour because she has studied hard the last few weeks.
  • a reputation or an occupation of the actor, e.g. she made an amazing dinner because she is a chef.


Reasons

Reason explanations are the most frequently used explanation mode (80% of the time; Malle, F. 2001). Reasons are representational mental states such as beliefs, desires, and valuings that may lead to an intention to perform an action. Mental states only count as reasons if they played a role in the agent's reasoning toward forming an intention to act. Reasons thus can only be provided by taking the agent's subjective point of view. This implies that the agent that is being explained is assumed to be at least minimally aware of these reasons herself. Besides awareness, reason explanations need to satisfy basic rationality assumptions.

Malle uses the term minimal reasons for the belief that an action leads to an outcome and the desire for that outcome, both of which are required for judging an action to be intentional. Explanations of actions based on reasons can cite these minimal reasons but do not need to do so. Instead of pointing to the desire for the outcome of the action and/or a belief that the action will bring about the outcome, other reasons may be cited. Examples include desires for avoiding alternative outcomes, beliefs about the context in which the action is performed, beliefs about consequences, and valuings of the action itself (e.g. helping someone is good). Valuings are primarily used to indicate the inherent desirability of an action, from an agent's subjective point of view.

Summarizing, Malle, F. 2001 distinguishes three main types of reason explanations:

  • explanations that cite beliefs,
  • explanations that cite desires, and
  • explanations that cite valuings.

A key issue is how people select the reasons they cite in explanations from the possibly many alternatives that are available. For example, people often cite only a belief or a desire to explain an action. But how do they determine when to cite one or the other? Partly this selection is determined by the explainer's knowledge, the explainer's assumptions about the audience's knowledge, or by a desire to present the actor as being rational or moral. Interestingly, self-explanations of actors cite belief reasons to make the agent appear rational whereas observers tend to use desire reasons more than belief reasons.

Note that the intention to perform an action does not itself explain why the action is performed. Intentions are important, however, as they describe what the agent is intending to do. For intentional actions, the description of the action must present the action from the agent's perspective. Other descriptions of the same action may take a different perspective, as in e.g. the description that someone forgot to take his keys.

Causal History of Reasons

Causal history of reason explanations describe the context, background, or origin of reasons. Dispositions or personality traits provide one particular example of a CHR explanation. For example, someone being friendly may lead that person to invite someone else to her home. The situational context may also produce CHR explanations as in, for example, the explanation that staying home induced someone to watch television.

CHR explanations are provided more by observers than by actors, possibly because observers lack knowledge about others' reasons. CHR explanations may also be cited as a way to parsimoniously explain a series of intentional behaviors (as, for example, going to the supermarket often may be explained by the fact that the actor has three children). CHR explanations may sometimes be more informative than reason explanations, e.g. when the reasons for doing an action are obvious. CHR explanations can also be used to downplay the agent's reasoning process and are, for example, used more when explaining actions that are not high-valued.

Coding Scheme for Folk Explanations of Behaviour

Malle has created a very detailed coding scheme for classifying folk explanations of behaviour.


TODO: Discuss work of Dretske.

The User

A user in this context is a person who requests an explanation. The aim of providing an explanation is to provide a user with an understanding of what has happened. An adequate explanation takes the context and the user that asks for the explanation into account. Explanations accommodate novel information in the context of prior beliefs (Lombrozo, 2006). The user that asks for an explanation should be provided with relevant information he is not already aware of. At the same time, people usually need remarkably little information for a satisfying explanation (Keil, 2006). Moreover, people may prefer one explanation over another but are frequently not able to explain why (Kozhevnikov and Hegarty, 2001).

This poses several challenges for a software agent. One of the biggest challenges is that the software agent must be aware of what the user knows and does not yet know. An explanation should fill the gaps in the user's knowledge but should avoid providing information already available to the user. This means that a self-explaining agent needs to maintain a model of the user.
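A minimal sketch of how a user model could be used to fill only the gaps in the user's knowledge is given below. It assumes that facts are plain strings and that the agent can list candidate reasons for an action; the names `UserModel`, `select_reasons`, and `explain` are hypothetical and not part of any existing agent framework.

```python
from dataclasses import dataclass, field


@dataclass
class UserModel:
    """Facts the user is assumed to know already (facts are plain strings)."""
    known_facts: set = field(default_factory=set)

    def knows(self, fact: str) -> bool:
        return fact in self.known_facts

    def learn(self, facts) -> None:
        self.known_facts.update(facts)


def select_reasons(candidate_reasons, user):
    """Keep only the reasons the user is not assumed to know, preserving order."""
    return [reason for reason in candidate_reasons if not user.knows(reason)]


def explain(action, candidate_reasons, user):
    """Build an explanation from the reasons that are new to the user."""
    new_reasons = select_reasons(candidate_reasons, user)
    # After the explanation the user is assumed to know these reasons.
    user.learn(new_reasons)
    if not new_reasons:
        return f"I did '{action}' for reasons you already know."
    return f"I did '{action}' because " + " and ".join(new_reasons) + "."
```

Updating the model after each explanation is one possible design choice; it prevents the same reason from being repeated, at the cost of assuming that the user actually understood the explanation.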

The user might be another software agent. Work by Su et al., 2003 proposes an approach where agents provide explanations among themselves.

Explanation in Artificial Intelligence

Explanation facilities for intelligent systems were first introduced in the field of expert systems in Artificial Intelligence. Expert systems provide advice, and expert systems that support explanation are able to explain the advice provided. Different types of explanation in this context are:

  • trace explanations provide an explanation by means of the steps that lead to the advice.
  • justification explanations provide reasons for the advice.
  • strategy explanations explain the strategy used by the expert system to derive the advice.
  • terminological clarifications provide explanations of the concepts used in the advice.
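To make the notion of a trace explanation concrete, the sketch below records which rule fired at each step of a toy forward-chaining rule engine and turns that record into readable lines. The engine, the rule format `(name, premises, conclusion)`, and the example rules are assumptions made for illustration and do not describe any particular expert system.

```python
def forward_chain(facts, rules):
    """Derive new facts and record each rule firing.

    rules: list of (name, premises, conclusion) tuples (hypothetical format).
    Returns the set of known facts and the trace of fired rules.
    """
    known = set(facts)
    trace = []
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                trace.append((name, tuple(premises), conclusion))
                changed = True
    return known, trace


def trace_explanation(trace):
    """Render the recorded steps that led to the advice, one line per step."""
    return [
        f"{name}: {' and '.join(premises)} => {conclusion}"
        for name, premises, conclusion in trace
    ]
```

A justification or strategy explanation would need knowledge beyond this trace, for example why a rule is valid or why rules are tried in a given order.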

TODO: Swartout et al, 1991

TODO: Lacave and Diez, 2004

Dhaliwal and Benbasat, 1996 discuss the use and effects of knowledge-based system explanations. The paper also provides a framework for empirically evaluating the effects of explanation with respect to four categories: explanation use behavior, learning, perceptions, and judgmental decision making.

Explaining Agent Behaviour

We focus here in particular on explanations of the behaviour of agents, i.e. on cognitive agents that are able to self-explain their actions. This focus is particularly relevant in the context of e.g. virtual training environments (cf. Harbers, 2011) and in the context of debugging agent programs.

TODO: Johnson 1994

TODO: Tambe 1996

TODO: Gregor et al, 1999

TODO: Kaminka et al, 2001

TODO: Gomboc, 2005

TODO: Van Lent et al, 2004, Core et al, 2006

Taylor et al, 2006 present an extension of the Visualization Toolkit for Agents (VISTA). The authors recognize that different kinds and forms of explanations are useful for different tasks and users. They focus in particular on the following questions:

  • Causal Antecedent: Asks about states or events that have caused the current state,
  • Goal Orientation: Asks about motives or goals behind an action,
  • Enablement: Asks about a causal relationship between an activity and physical or social enablers,
  • Expectational: Asks about the causal antecedent of an event that did not occur,
  • Concept Completion: Who, what, where, and when questions, asking for the completion of a concept,
  • Quantification: Asks for an amount, and
  • Feature Specification: Asks about some property of a subject.

Strictly speaking, only the first four questions are requests for an explanation of agent behaviour.
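The question types listed above, and the observation that only the first four ask for an explanation of agent behaviour, could be encoded as follows. This is a sketch under the assumption that incoming questions have already been classified; the identifiers are hypothetical and are not taken from VISTA.

```python
from enum import Enum


class QuestionType(Enum):
    CAUSAL_ANTECEDENT = "causal antecedent"
    GOAL_ORIENTATION = "goal orientation"
    ENABLEMENT = "enablement"
    EXPECTATIONAL = "expectational"
    CONCEPT_COMPLETION = "concept completion"
    QUANTIFICATION = "quantification"
    FEATURE_SPECIFICATION = "feature specification"


# Only these four types request an explanation of agent behaviour.
BEHAVIOUR_EXPLANATION_TYPES = frozenset({
    QuestionType.CAUSAL_ANTECEDENT,
    QuestionType.GOAL_ORIENTATION,
    QuestionType.ENABLEMENT,
    QuestionType.EXPECTATIONAL,
})


def is_behaviour_explanation_request(question_type: QuestionType) -> bool:
    """True if the question asks for an explanation of agent behaviour."""
    return question_type in BEHAVIOUR_EXPLANATION_TYPES
```

An explanation facility could use such a check to decide whether a question should be routed to the agent's reasoning record or answered by a plain lookup.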

In order to address the questions listed, an explanatory facility needs to have access to the right kind of knowledge. Taylor et al, 2006 recognize that the behavioural trace data that is typically readily available in agent systems will not provide sufficient information to answer all of the questions. In addition to (i) behavioural trace data, they identify four more knowledge sources: (ii) agent design rationale (the rationale that went into the design of the agent performing the task), (iii) domain knowledge (background knowledge about objects and relationships in the domain), (iv) display ontology (knowledge about the display and information available to the user), and (v) explanation knowledge (knowledge about how to develop and present explanations).

Taylor et al, 2006 propose an architecture that introduces an explanation agent that is distinct from the agent whose behaviour needs to be explained. The main reason for doing so is to ensure generality. This choice avoids the need for any particular assumptions about the agent's architecture. Of course, by adding an explanatory facility outside the agent we also do not obtain a self-explaining cognitive agent. Another disadvantage is that the additional knowledge sources that need to be provided external to the agent itself may more or less duplicate the information in the total system. For example, domain knowledge will also need to be represented internally by the agent itself to be able to operate effectively. It is moreover difficult to see how e.g. Goal Orientation questions can be answered by VISTA without introducing additional assumptions about the information that is available in behavioural trace data.

Haynes et al, 2009 provide a design that can be reused to create intelligent agents capable of explaining themselves. This design includes ways to provide ontological, mechanistic, and operational explanations. Ontological explanations provide conceptual clarification, which appears to be very similar to the terminological clarification provided in expert systems. These explanations try to answer "what" explanation-seeking questions, which are categorized into identity, definition, relation, and event questions. Mechanistic explanations refer to the causes and consequences of events to explain why an event occurred and what followed as a result of an event. This type of explanation seems loosely related to the trace explanations of expert systems. Finally, operational explanations refer to the goals that motivated an agent to perform an action and the instrumental and procedural means (e.g. plans) for realizing goals. This mode of explanation seems most related to the justification explanations of expert systems. Haynes et al, 2009 provide design patterns for implementing explanation facilities for each of the types of explanation discussed.

Most of the questions provided as examples in the paper, however, suggest that explanations are requested about how to interpret and use the virtual environment more than about the behaviour of agents. Such questions do not ask for explanations of the agents' behaviour in the environment but rather concern user interaction with the environment itself. Operational explanations, for example, are similar to end-user documentation. The example system discussed, which is used to illustrate the various explanation components, also suggests that the explanation facilities are more useful for agent programmers than for end users. Haynes et al, 2009, however, also suggest that explanation for intelligent agents may be different from that of more traditional productivity software systems. They conclude that "it may be the case that providing information about how an agent works may be more important than information related to how the agent is used."

A complicating issue that Haynes et al, 2009 identify is that a user may request explanations about the virtual environment itself. Such questions may be difficult to answer because they may require knowledge that is not present in the agent either. Another issue that Haynes et al, 2009 identify concerns the computational overhead that explanation facilities (e.g. ones that trace agent behaviour over time) generate. Explanation components may create performance problems, and further research is required to ensure that the costs of explanation facilities are justified.

Design of Explanation Facilities

Several challenges must be addressed when designing an explanation facility that is able to explain the agent's behaviour in a virtual environment to a user:

  • Identify the explanation requirements of the user(s). Which explanations are sought by a user in the given application context?
  • Design of appropriate form and content of explanations. Which content provides a relevant explanation in the given context? How is this content best presented to a user? This design challenge is closely related to the previous challenge.
  • Design for explanation in context. Explanations of an agent's behaviour ideally relate action choices to the virtual environment in which the agent operates. This may require combining information about the agent's and the environment's state.
  • Design of a user model. To provide adequate explanations, a user model needs to be maintained that can be accessed by the explanation facility in order to select a mode of explanation (cf. Malle, F. 2001; a user model also is needed to select e.g. a particular reason when a reason explanation of an intentional action is provided).
  • Performance. Adding explanation facilities to an agent system will create performance overhead. In real-time systems this overhead may affect the behaviour of an agent as agents will need more time to decide on what to do next. Ideally, however, an agent system can be run with and without such facilities while producing the same or similar behaviour.
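One way to approach the performance challenge is to make the recording of explanation data an optional component that observes decisions without influencing them. The sketch below assumes a toy agent whose decision rule is invented for illustration; `NullRecorder`, `TraceRecorder`, `choose_action`, and `run` are hypothetical names.

```python
class NullRecorder:
    """Recorder that discards everything; used when explanation is switched off."""

    def record(self, step, action, reasons):
        pass


class TraceRecorder:
    """Recorder that keeps one entry per decision for later explanation."""

    def __init__(self):
        self.entries = []

    def record(self, step, action, reasons):
        self.entries.append((step, action, tuple(reasons)))


def choose_action(beliefs, goals):
    """Hypothetical decision rule: pursue the first goal not yet believed achieved."""
    for goal in goals:
        if goal not in beliefs:
            return goal, [f"I want '{goal}'", f"I do not yet believe '{goal}'"]
    return None, ["all my goals are believed to be achieved"]


def run(initial_beliefs, goals, steps, recorder=None):
    """Run the toy agent; the recorder only observes and never changes a decision."""
    if recorder is None:
        recorder = NullRecorder()
    beliefs = set(initial_beliefs)
    actions = []
    for step in range(steps):
        goal, reasons = choose_action(beliefs, goals)
        action = "idle" if goal is None else f"achieve {goal}"
        recorder.record(step, action, reasons)
        actions.append(action)
        if goal is not None:
            beliefs.add(goal)  # assumption: the action succeeds immediately
    return actions
```

Because the recorder has no access to the decision rule, the sequence of actions is the same with and without it. In a real-time system the time spent recording could still shift the timing of decisions, which this sketch does not model.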

Research in the early nineties (Swartout, 1991) revealed that much of the knowledge needed to explain an intelligent system was not explicitly present in the system itself but rather was part of the designer's knowledge used to develop the software system. In response, it was argued that explanations of an expert system should be made by referencing the design rationale underlying the system’s architecture. Design rationale refers to the entire design space explored by a development team including all of the design questions identified, the alternatives considered in response to these questions, and the criteria used to select a solution from these alternatives (Haynes et al, 2009).

More concretely, applied to explanations of the behaviour of agents, the agent system itself (ideally) will be able to provide explanations of why action choices were made by an agent, but may not be able to access the knowledge that is needed to answer why-not questions. Why-not questions "require, for example, that the explanation knowledge base include not only comprehensive information about what is in the agent and what behaviors it can perform, but also the rationale for why alternative structures and behaviors were not selected. Even in simpler cases [...], answering pragmatic queries requires examining the branching behavior of an agent and the situation that caused a particular branch to be followed." (Haynes et al, 2009).
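The simpler case mentioned in the quotation, examining which branch was followed and why, can be sketched by recording the rejected alternatives at decision time. The sketch assumes a toy agent that picks the first applicable option from a priority-ordered list; the `Decision` record and the functions `decide`, `why`, and `why_not` are hypothetical. It does not cover design rationale, which would have to come from outside the agent.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Decision:
    chosen: Optional[str]
    reason: str
    rejected: dict = field(default_factory=dict)  # alternative -> why it was not selected


def decide(options, beliefs):
    """Pick the first option whose precondition is believed.

    options: list of (action, precondition) pairs in priority order.
    """
    chosen, reason, rejected = None, "no option was applicable", {}
    for action, precondition in options:
        if chosen is not None:
            rejected[action] = f"'{chosen}' had higher priority and was applicable"
        elif precondition in beliefs:
            chosen, reason = action, f"I believed that {precondition}"
        else:
            rejected[action] = f"I did not believe that {precondition}"
    return Decision(chosen, reason, rejected)


def why(decision):
    return f"I chose '{decision.chosen}' because {decision.reason}."


def why_not(decision, action):
    if action == decision.chosen:
        return f"I did choose '{action}'."
    if action in decision.rejected:
        return f"I did not choose '{action}' because {decision.rejected[action]}."
    return f"'{action}' was never considered."
```

Storing the rejected branches makes why-not answers cheap to produce later, but it adds to the recording overhead at every decision.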

Evaluating the Effectiveness of Explanation Facilities

TODO: Lim et al, 2009


References

  • Heider, F. 1958. The Psychology of Interpersonal Relations. New York: John Wiley & Sons.
  • Kelley, H. H. 1973. The processes of causal attribution. American Psychologist, 28(2), 107-128.
  • Malle, B. F. 2004. How the Mind Explains Behavior: Folk Explanations, Meaning, and Social Interaction. MIT Press.