A dynamic interactive account of human visual action understanding

Lead Research Organisation: Bangor University
Department Name: Sch of Psychology

Abstract

Daily life is filled with encounters with other people. Normally, we quickly and effortlessly understand the meaning of the actions they perform, a remarkable human capacity. Doing so is key to normal social life, because those actions provide vital clues about others' intentions, beliefs, and personalities. For example, on seeing a family member chopping vegetables in the kitchen, we know that he intends to cook a meal; seeing a friend opening an umbrella suggests that she believes it will soon rain; and observing a stranger make a donation at a shop entrance indicates that she may be an empathetic person. How action understanding is so readily achieved remains poorly understood.

Our project offers a novel view of human action understanding as arising from interactions of two mental processes. Perceptual systems gather evidence about the actions we see, extracting the objects, movements, body postures, and scene context that make up an action. Returning to the previous cooking example, these systems would locate and identify the knife, cutting board, vegetables, and other objects; compute the posture of the cook, his grasp of the knife, and its up-and-down movements; and describe the layout of the scene and identify it as a kitchen.

Evidence from these perceptual systems interacts with a mental library of "action frames", each of which captures the typical roles, relationships, and reasons that constitute an action. For example, an action frame for "cooking" captures our knowledge that this generally involves the manipulation of food ingredients, with certain tools and movements, with the goal of transforming them into an edible result, all of which typically takes place in a kitchen setting. Action frames also express some of our (normally unconscious) knowledge about probabilities related to actions. For example, we know that chopping motions are more likely to occur with a knife than a spoon; stirring often occurs in cooking but also in painting; and the kinds of actions that occur in a kitchen tend not to overlap with those typically seen in a garage. Action understanding arises when the activity of the perceptual systems and the action frames converges on a consistent interpretation, in which the key roles of the action frame are filled, and competing, less-likely action frames are excluded.
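As a concrete illustration of this idea (a minimal sketch only, not the project's actual representations), an action frame can be rendered as a simple data structure whose roles carry rough probabilities over possible fillers, and partial perceptual evidence can be scored against competing frames. All frame names, roles, and probabilities below are hypothetical placeholders.

    # Toy "action frames": each role maps plausible fillers to rough,
    # made-up probabilities. Purely illustrative.
    COOKING_FRAME = {
        "name": "cooking",
        "roles": {
            "tool":     {"knife": 0.7, "spoon": 0.2, "brush": 0.1},
            "patient":  {"vegetables": 0.6, "meat": 0.3, "paint": 0.1},
            "movement": {"chopping": 0.5, "stirring": 0.4, "pouring": 0.1},
            "setting":  {"kitchen": 0.8, "garden": 0.15, "garage": 0.05},
        },
    }

    PAINTING_FRAME = {
        "name": "painting",
        "roles": {
            "tool":     {"brush": 0.8, "knife": 0.1, "spoon": 0.1},
            "patient":  {"paint": 0.7, "vegetables": 0.15, "meat": 0.15},
            "movement": {"brushing": 0.5, "stirring": 0.4, "chopping": 0.1},
            "setting":  {"garage": 0.4, "garden": 0.3, "kitchen": 0.3},
        },
    }

    def frame_score(frame, evidence):
        """Score how well perceptual evidence fills a frame's roles.

        `evidence` maps roles to observed fillers, e.g. {"tool": "knife"}.
        Unobserved roles contribute nothing, so partial evidence is fine.
        """
        score = 1.0
        for role, filler in evidence.items():
            score *= frame["roles"].get(role, {}).get(filler, 0.01)
        return score

    evidence = {"tool": "knife", "movement": "chopping", "setting": "kitchen"}
    best = max([COOKING_FRAME, PAINTING_FRAME],
               key=lambda f: frame_score(f, evidence))
    print(best["name"])  # -> "cooking": the frame most consistent with the evidence

In this toy version, "understanding" is simply the frame whose roles are best filled by the evidence, while lower-scoring frames are excluded by comparison.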

We plan to test this framework in two ways. First, we have designed simple action-related tasks that require judgments from adult human volunteers, such as deciding whether two actions shown one after the other are the same, or whether a written label correctly describes an action picture. These tests are grouped under three broad themes. In brief, they examine: 1) the impact of degraded perceptual information on action understanding; 2) how expectations affect the efficiency of action understanding; and 3) how action frames "fill in" aspects of actions that we don't actually see. A fourth, cross-cutting theme assesses how mental "load" (e.g. visual distractions or juggling multiple mental tasks) affects action understanding. Our second approach is to model each of these tasks in detail with simple but powerful "neural network" computer models. These models let us express our predictions in a precise, quantitative way and generate new predictions about how action-understanding behaviour will unfold.
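To give a flavour of the second approach, the following is a minimal sketch of an interactive-activation style network in the spirit described above: perceptual feature units excite or inhibit competing action-frame units, which also suppress one another until one interpretation dominates. The architecture, weights, and parameters here are illustrative assumptions, not the project's actual models.

    # Hypothetical feature-to-frame connection weights: positive weights
    # mean a feature supports a frame, negative weights count against it.
    FRAMES = ["cooking", "painting"]
    W = {
        "cooking":  {"knife": 1.0, "chopping": 1.0, "kitchen": 0.8, "brush": -0.5},
        "painting": {"knife": -0.3, "chopping": -0.3, "kitchen": 0.1, "brush": 1.0},
    }

    def settle(observed, steps=30, inhibition=0.6, rate=0.2):
        """Let frame activations evolve until one interpretation wins."""
        act = {f: 0.0 for f in FRAMES}
        for _ in range(steps):
            new_act = {}
            for f in FRAMES:
                # Excitation from the observed perceptual features.
                excitation = sum(W[f][x] for x in observed)
                # Competing frames suppress each other (mutual inhibition).
                competition = inhibition * sum(act[g] for g in FRAMES if g != f)
                net = excitation - competition
                new_act[f] = max(0.0, act[f] + rate * (net - act[f]))
            act = new_act
        return act

    print(settle({"knife", "chopping", "kitchen"}))
    # "cooking" dominates while "painting" is suppressed to zero.

Manipulations like those in the themes above then become straightforward in such a model: degrading the input (dropping or noising features) weakens and slows convergence, while pre-activating a frame speeds it up.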

With this combined approach, we hope to demonstrate how our framework explains at least some of the human ability to understand others' actions efficiently. We see potential for this framework to inform future research in child development, group dynamics, social learning, artificial vision, and other disciplines with a stake in how human observers understand the meaning of others' actions and the learning opportunities they offer. We propose to assemble an international Consortium of interested researchers from these and related disciplines to accelerate those potential impacts.
