User interface softbots

This proposal summary contains excerpts that identify the deliverables for our work. Most of the text is gray, to allow for color annotations that indicate the status of specific items.


This proposal focuses on the concept of ibots, interface agents that interact with software applications through the graphical user interface, in essentially the same way that human users do. The proposal will produce the following results:

An environment for agent exploration and evaluation. From an agent's perspective, the user interface acts as a surrogate for the real world, providing a simplified, more tractable environment in which interesting problems can still be posed and solved. User interfaces vary, however, in the degree to which they facilitate problem solving. (Contrast word processors with games, for example, which deliberately contain barriers to easy use.) The current ibot substrate makes strong assumptions about the visual structure and dynamic behavior of the interface; we propose to relax these assumptions, to improve the scope and robustness of the substrate. This entails work in three areas:
  • Domains. We propose to build tools to help researchers develop detailed models ("domains") of the visual interface, to support focused testing of agent sensing and acting capabilities [software release (pattern/object definition interfaces); Ajay: thesis in preparation (PDDL translator).]
  • Control. We propose to extend the substrate to allow a tighter, better integrated interaction between a controller and the user interface; here the goal is to facilitate the evaluation of agents that use different control strategies to solve problems in a complex environment.
  • Evaluation. We propose to combine these efforts with our ongoing work on intelligent data analysis to produce a general evaluation environment for ibot performance [Rob: P:2002e, Penn State: A:1.]
The effectiveness of this work will be demonstrated in the areas of computational cognitive modeling and AI planning. The extended substrate and domain generation tools will allow cognitive models to interact with off-the-shelf applications; the evaluation environment as a whole will aid researchers in testing the ecological validity of their models with respect to real-world problems and environments. For planning researchers, the evaluation environment will act as the equivalent of a flexible, planner-independent testbed, with some important advantages over artificial testbeds: realistic problems, both simple and complex, become available at relatively low cost; problems are of interest both experimentally and practically; planner evaluations, being based on real applications, will have continuing relevance as long as such applications remain in use.

Cognition and tool use in the interface. Our initial work treated ibot/interface interaction as an engineering problem with relatively few constraints. Ensuring that the process remains cognitively faithful is a different, more difficult problem. We propose to rework the current ibot substrate toward the goal of a cognitively plausible architecture for problem solving in the interface, focusing on the phenomenon of tool use. While the issue of agent/environment interaction has received some attention in the agents literature, the more specific problem of characterizing and modeling the use of tools remains largely open. Fortunately, the literature of human and animal cognition considers this issue in some detail; we propose to adopt its conceptual framework, to produce a taxonomy of tools and tool-using behavior [P:2002b, P:2002f, P:2003a; TR:2002a.]

This taxonomy will help us build a cognitive framework, with a strong visual component, for understanding tool use. Drawing on research in cognitive modeling and computational vision, we will develop an end-to-end cognitive model that exhibits visually guided tool-using behavior in the user interface [P:2002a.] (We have extended this goal to consideration of agents in simulated physical environments; see Note 1.)


An environment for agent exploration and evaluation

. . .Our work will be applied to two specific problems: the generation of a testbed for the evaluation of AI planners, and the construction of sensors and effectors to allow computational cognitive models to interact directly with off-the-shelf software.

Domains.

. . .We propose to remedy these problems with a set of tools for defining and evaluating domains. Addressing the first and second problems requires building an interactive object specification tool, one that allows a developer to build symbolic and numerical descriptions of the objects in an application interface, based on existing and newly defined features, and to adjust the representation until it reaches an appropriate level of detail. Addressing the third and fourth problems entails internal changes to the image processing module to allow for user input in object identification [software release (pattern/object definition interfaces); Ajay: thesis in preparation (PDDL translator).] The issue here is management of SegMan's object recognition knowledge base, such that it can be extended without disproportionate effort.
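To make the idea of a combined symbolic and numerical object description concrete, the following is a minimal sketch, not the specification tool itself. The class name ObjectSpec, its fields, and the feature names (shape, width, height) are all hypothetical and chosen only for illustration; regions are assumed to arrive as plain dictionaries of features.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ObjectSpec:
    """Hypothetical description of one kind of interface object."""
    name: str
    # Symbolic features must match exactly, e.g. {"shape": "rectangle"}.
    symbolic: Dict[str, str] = field(default_factory=dict)
    # Numerical features must fall in an inclusive (low, high) range.
    numeric: Dict[str, Tuple[float, float]] = field(default_factory=dict)

    def matches(self, region: Dict[str, object]) -> bool:
        """Return True if a segmented screen region satisfies every constraint."""
        for key, value in self.symbolic.items():
            if region.get(key) != value:
                return False
        for key, (low, high) in self.numeric.items():
            measured = region.get(key)
            if measured is None or not (low <= measured <= high):
                return False
        return True

    def refine(self, **numeric: Tuple[float, float]) -> "ObjectSpec":
        """Return a more detailed spec with added or tightened numeric ranges."""
        return ObjectSpec(self.name, dict(self.symbolic),
                          {**self.numeric, **numeric})


# A coarse description accepts any rectangle, including a scrollbar.
coarse = ObjectSpec("button", symbolic={"shape": "rectangle"})
# Adjusting the level of detail: constrain the width to exclude scrollbars.
fine = coarse.refine(width=(40, 200))
```

Under these assumptions, refining a description is just the addition of constraints, which is one simple way a developer could adjust the representation step by step until it separates the objects of interest.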

Control.

SegMan is effective largely because its design can exploit the strong correspondences between planning assumptions and restrictions on the dynamic behavior of the user interface. Thus if a controller requires that the environment be discrete, deterministic, static, accessible, and so forth, as many theoretically motivated planners do, this does not impose an impossible burden on SegMan; most applications provide such an environment most of the time. These assumptions, however, do not hold universally. An interactive application may break a general guideline for many reasons. User interfaces often have variable response times; this can result in non-deterministic behavior for a controller whose execution cycle is too short. Exogenous events sometimes occur, in the form of notifications that new mail has arrived or a transient system error has been detected. State information, such as whether the paste buffer holds data, may not be directly accessible. (Even worse, some dynamic applications, such as games, work in direct opposition to many of the most common design guidelines for business and productivity applications, but these are beyond the scope of our proposed work.)
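The response-time problem can be illustrated with a small sketch. It is not SegMan code: the function wait_until_stable, its parameters, and the idea of treating two identical consecutive observations as "settled" are assumptions made for this example. The point is only that a controller can recover apparently deterministic behavior by sensing repeatedly instead of acting on the first observation after an action.

```python
from typing import Callable, Hashable, Optional


def wait_until_stable(sense: Callable[[], Hashable],
                      max_polls: int = 20,
                      required_repeats: int = 2,
                      pause: Callable[[], None] = lambda: None
                      ) -> Optional[Hashable]:
    """Poll `sense` until the same observation is seen
    `required_repeats` times in a row.

    Returns the stable observation, or None if the interface does not
    settle within `max_polls` polls. `pause` is called between polls;
    a real controller might pass a short sleep here.
    """
    last = None
    repeats = 0
    for _ in range(max_polls):
        current = sense()
        if repeats > 0 and current == last:
            repeats += 1
        else:
            last, repeats = current, 1
        if repeats >= required_repeats:
            return current
        pause()
    return None


# Simulated interface that takes a few cycles to finish opening a menu.
frames = iter(["menu closed", "menu opening",
               "menu open", "menu open", "menu open"])
settled = wait_until_stable(lambda: next(frames))
```

A controller with a fixed, short cycle would have acted on "menu opening"; the polling version acts only once the display stops changing, at the cost of extra sensing cycles.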

We propose to modularize the controller interface so that its internal interaction strategies can be activated and deactivated on demand, depending on the capabilities of a given controller. [Sameer: thesis in preparation.] This will allow for the selective relaxing of assumptions a controller makes about its environment. In addition, rather than following an ad hoc development process, we further propose to adopt a general modeling approach to representing interaction with the interface, in which restrictions on the environment are not assumptions, but rather parameters of the model. We propose to build an MDP-based representation of agent/user interface interaction, to support a clear understanding of the relationship between a controller and SegMan's controller interface. (See Note 2. Nevertheless we have made some progress toward a more restricted model for a related problem: identifying efficient mappings between interface controls and low-level user actions across platforms [Clarence: P:2003b].) This work will add a new chapter to research on formal models of human-computer interaction, as well as extending past work on the use of Markov models to describe user interaction in dynamic environments.
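As a rough illustration of treating an environmental restriction as a model parameter, the sketch below defines a toy MDP and solves it by value iteration. It is a hypothetical example, not the representation proposed here: the class InterfaceMDP, the parameter p_slow (the probability that the interface has not yet responded when the controller next senses), and the two actions are all invented for the example. Setting p_slow to 0 recovers the deterministic environment that many planners assume.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class InterfaceMDP:
    """Toy task: n_steps interface actions must succeed in sequence."""
    n_steps: int = 3
    p_slow: float = 0.0  # 0.0 means a fully deterministic interface

    def actions(self, s: int) -> Dict[str, Tuple[float, List[Tuple[int, float]]]]:
        """Map action name -> (cost in cycles, [(next_state, probability)])."""
        return {
            # Act and move on at once; fails if the interface is slow.
            "act": (1.0, [(s + 1, 1.0 - self.p_slow), (s, self.p_slow)]),
            # Act, then spend an extra cycle waiting; always succeeds.
            "act_then_wait": (2.0, [(s + 1, 1.0)]),
        }


def solve(mdp: InterfaceMDP, tol: float = 1e-10, max_iter: int = 10000):
    """Value iteration minimizing expected cycles to reach state n_steps."""
    n = mdp.n_steps
    values = [0.0] * (n + 1)       # values[n] is the goal, cost 0
    policy = [""] * n
    for _ in range(max_iter):
        delta = 0.0
        for s in range(n - 1, -1, -1):
            q = {a: cost + sum(p * values[s2] for s2, p in outcomes)
                 for a, (cost, outcomes) in mdp.actions(s).items()}
            best = min(q, key=q.get)
            delta = max(delta, abs(q[best] - values[s]))
            values[s], policy[s] = q[best], best
        if delta < tol:
            break
    return values, policy


fast_values, fast_policy = solve(InterfaceMDP(p_slow=0.0))
slow_values, slow_policy = solve(InterfaceMDP(p_slow=0.6))
```

In this toy model the optimal strategy changes with the parameter: with a deterministic interface the controller simply acts, while with a frequently slow interface it is cheaper in expectation to wait after every action. The same controller code is used in both cases; only the model parameter differs.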

Evaluation.

Our proposed extensions to Aide involve two components:

A planning testbed.

Our proposed work capitalizes on recent movement in the planning community toward common standards for domain and plan representation. Extensive use over a long period of time by planning researchers would be the most compelling evidence for the value of such a testbed. Nevertheless, in the shorter term informal evaluation methods can give us important feedback as to its effectiveness. Briefly, our plan involves the following:

An interface execution system for cognitive models.

Our proposed work will allow researchers to automatically develop realistic input scenarios, to evaluate the ecological validity of models with respect to real-world applications, and in general to treat cognitive modeling as a tool for user interface exploration, expanding the current boundaries of experimental practice.


Interface agent modeling

A taxonomy of tool use in the interface.

Our goal will be to develop a conceptual framework based on these informal (and even conflicting) characterizations, in which we can describe and differentiate specific agent activities in the interface as examples of tool use. More specifically, we propose to construct a taxonomy to describe tool-related behavior in the user interface [P:2002b, P:2002f, P:2003a.]

. . .Our work will flesh out this brief description to cover a much broader range of activities in the user interface. With widespread consistency in interface controls and their functionality, we believe it is possible to build a relatively comprehensive taxonomy along these lines. This will provide the groundwork for a more difficult task: building a computational model that can represent and reproduce such tool use in the interface.

Cognitive models and tool use in the interface.

We see a natural correspondence between the task-oriented properties of this vision model and the interaction requirements for intelligent tool use. We propose to build a new component in SegMan to replicate the functionality of the current image processing module. This new component will constitute a cognitively plausible vision model, based on the high-level vision concerns briefly laid out above [Kunal: thesis in preparation.] (A spin-off from this effort has had implications for human-robot interaction: [P:2003c.]) The modular structure of the current SegMan will facilitate development; we expect that visual routines, for example, can be constructed from elemental operators based on simple combinations of the existing interpretation rules. One advantage we have over previous work is that SegMan supports the development of an end-to-end model of vision, from early vision processes to high-level vision. The vision models associated with current unified cognitive models mainly address higher-level processing and thus will have limited fidelity in this situation; we will have the chance to explore how cognitive processing depends on lower-level vision results. A visual routines approach is also attractive in that it supports what Chapman calls visually guided activity. We believe this will be key to a model of effective tool use.

In addition to building a new model of vision in the SegMan substrate, we also propose to develop a controller that implements a cognitive model of tool use, relying on visual processing to guide its behavior [P:2002a; Thomas: thesis in preparation (tool simulation); Ergun: thesis in preparation (common-sense reasoning for tool use).] (See Note 4.) This model will need to accommodate a number of novel influences, including task context, work practice, and long-term behavioral policies. It will also require the ability to reason about ecological relationships between effectors, goals, and tools, at a level of detail deeper than usually considered in cognitive modeling (or agent planning) work. We expect to draw on current work with unified cognitive models and their close relatives, task analysis models, plus MDP modeling. . . Research in all of these areas addresses important issues for our work, in particular the relationship between sensing and acting.


Notes

Note 1: Our interest in tool use has extended beyond the user interface to representations of tool use in the real world. A technical report [TR:2002a] gives a relatively detailed overview of our current thinking. Ideally, we would like to build a detailed simulation of a physical environment in which a simulated robotic agent can learn how to use tools.

Note 2: We have attempted to develop a more sophisticated controller interface, and we have experimented more extensively with the current system. We have found that there is insufficient variability in standard user interfaces to justify a probabilistic model of user interaction. Behavior can be largely deterministic, with fixups only rarely needed. We have considered an MDP-based model for more dynamic interfaces, such as the driving game, but a move in this direction means that we abandon almost all of the "facilitating" properties we associate with user interfaces; it becomes equivalent to a much-simplified vision and robotics domain.

Note 3: Ajay's preliminary testing has shown that so-called "primitive-action" planning techniques [Wilkins and desJardins, AI Magazine, 2001] are not sufficiently powerful to reason about the hundreds of objects visible on the screen at one time. This is not set in stone, however; we are continuing to test different planners. An obvious solution to this potential problem, the addition of a focusing mechanism, puts too much of the responsibility of planning on an external system; what's left would not be of interest in planning research. We might try to develop novel planning techniques, possibly a knowledge-based planner, but this is too far beyond the scope of the proposed work. Instead, we are pursuing these alternatives:

Note 4: We have extended the scope of this area of our work. Thomas's HabilisDraw work supports a relatively high-level analysis of tool use, but of course has significant differences from physical tools. With Thomas and Ergun's new efforts to build a physical simulation system in which an agent can reason about the properties of simulated physical tools, we should be able to make much stronger connections.


Publications