Towards a human-inspired control architecture for visually-guided action

Lead Research Organisation: University of Birmingham
Department Name: School of Psychology

Abstract

Despite the increasing sophistication of robotic systems, humans remain superior to robots in an impressive number of ways - not least when it comes to acting in natural environments. The proposed project aims at understanding a set of basic processes that are crucial for such interactions, with this understanding tested through the development of a computer model that is evaluated against human behaviour. The processes to be modelled concern our ability to shape an action to a task-relevant object in a complex (multi-object) environment. The proposed model is based on the premise that successful human actions depend on two forms of information: the abstract function of objects (this is something to cut with) and the visual properties of objects that are associated with action (the handle, but not the blade, can be grasped). Psychological evidence also indicates that the intended action for a given task can influence how human attention is directed: the selection of the object to act upon is not separate from the selection of which action to make. To capture this, the selection of objects and the selection of actions will be modelled as interactive processes.

Prior work by the applicant provides a promising backdrop to the project. In particular, a model of object selection has been generated (the SAIM model) which has been successfully applied to a large body of psychological data. A second model (the NAM model) has simulated how actions are selected for single objects. In this project the two models will be merged to simulate interactions between perceptual selection and action. The model will be realised using a neural-like architecture and tested against human data. In the longer term, the model will provide an architecture for the real-time control of a robot arm operating in a complex environment.

Description Contemporary approaches to visually-guided robotics typically follow a two-stage approach: first, the problem of object recognition is addressed; second, and independently of the perceptual process, actions are selected and parameterized. However, recent psychological evidence shows that human actions are not only guided by such an indirect processing route but are also directly guided by visual information (affordance-based processing). In previous work we developed a computational model of this dual-route architecture, the Naming and Action Model (NAM).
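
To make the dual-route idea concrete, here is a deliberately simplified sketch in Python. The route functions, the dictionaries and the arbitration rule are illustrative assumptions; NAM itself is a connectionist network in which the two routes interact continuously, not a symbolic program.

from dataclasses import dataclass

# Toy dual-route sketch (hypothetical names and mappings, not NAM's equations).

@dataclass
class VisualInput:
    shape: str        # coarse visual property, e.g. "elongated"
    identity: str     # recognised object category, e.g. "knife"

def indirect_route(stimulus: VisualInput) -> str:
    """Recognition first, then the action is retrieved from semantic knowledge."""
    semantic_actions = {"knife": "precision grip", "mug": "power grip"}
    return semantic_actions.get(stimulus.identity, "explore")

def direct_route(stimulus: VisualInput) -> str:
    """Affordance-based: visual properties map straight onto actions."""
    affordances = {"elongated": "power grip", "handle": "precision grip"}
    return affordances.get(stimulus.shape, "explore")

def select_action(stimulus: VisualInput) -> str:
    # In NAM both routes run in parallel and compete; letting the direct
    # route win when it produces a response is a crude stand-in for that
    # competition.
    direct, indirect = direct_route(stimulus), indirect_route(stimulus)
    return direct if direct != "explore" else indirect

print(select_action(VisualInput(shape="elongated", identity="knife")))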

However, NAM's functionality is very limited. In particular, NAM lacks the ability to generate action parameters, e.g. target locations for movements. In order to add this functionality we developed the Selective Attention for Action Model (SAAM). As a prototypical action parameter for SAAM we chose the contact points humans produce on flat objects when they aim to perform a transporting action. SAAM generates these contact points with a soft-constraint satisfaction approach in a connectionist framework. The first set of constraints takes into account the anatomy of the hand, e.g. the maximal possible distances between fingers. The second set of constraints (geometrical constraints) identifies suitable contact points on objects by using simple edge detectors. The third set of constraints ensures that the model generates only one location per finger. Interestingly, this third constraint enables SAAM to generate contact points on only one object in scenes with multiple objects. This mimics attentional behaviour that is guided by action-relevant properties of objects (hence the name of the model), a type of attentional guidance that has been found in recent experimental studies with humans. SAAM's reaction times (the time it takes to determine the contact points) also vary with the visual input: reaction times are shorter when an object is easy to grasp than when it affords a more difficult grasp. Finally, it is important to note that SAAM can be adapted to generate grasp responses for a robot hand by implementing the anatomy of the robot hand in the first set of constraints. A minimal sketch of the constraint-satisfaction scheme is given below.
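
The following sketch illustrates the flavour of this soft-constraint satisfaction scheme. The grid size, edge detector, update rule and parameter values are assumptions chosen for illustration, and the hand-anatomy constraint is omitted; the published model's equations differ.

import numpy as np

# Minimal soft-constraint sketch for contact-point selection, loosely
# following SAAM's constraint sets (illustrative, not the published model).

H, W = 32, 32
N_FINGERS = 3                      # thumb plus two fingers, for illustration

# Toy visual input: a flat bar to be grasped.
image = np.zeros((H, W))
image[14:18, 8:24] = 1.0

# Geometrical constraint: candidate contact points lie on object edges
# (a simple gradient-magnitude edge detector).
gy, gx = np.gradient(image)
edges = np.abs(gy) + np.abs(gx)

# One activation map per finger, initialised with weak random noise.
rng = np.random.default_rng(0)
maps = rng.random((N_FINGERS, H, W)) * 0.01

def step(maps, lr=0.4, inhib=1.5):
    """One relaxation step: excitatory support from the geometrical
    constraint, inhibition between finger maps at shared locations."""
    new = np.empty_like(maps)
    for f in range(N_FINGERS):
        rivals = maps.sum(axis=0) - maps[f]     # other fingers' activity
        # With inhib > 1 the competition is winner-take-all, so each
        # location is eventually claimed by at most one finger. (The
        # hand-anatomy constraint, excitation between finger maps at
        # anatomically plausible relative offsets, is omitted for brevity.)
        net = edges - inhib * rivals
        new[f] = np.clip((1 - lr) * maps[f] + lr * net, 0.0, 1.0)
    return new

for _ in range(200):
    maps = step(maps)

# Read out one contact point per finger as the peak of its map.
print([np.unravel_index(int(np.argmax(m)), m.shape) for m in maps])

Making the cross-map inhibition stronger than each map's self-decay is what turns the soft constraints into a selection mechanism: the finger maps settle on distinct contact points rather than all converging on the same edge location.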

To verify SAAM's simulation results we designed a novel experimental procedure: we place an object in front of participants and ask them to lift it and hold it in front of a camera so that their grasp is documented. We were able to show that human grasps match the contact points generated by SAAM. Moreover, we measured participants' reaction times (the time it takes participants to begin reaching for the object). The results indicate that the ease of grasping an object (e.g. a bar vs. a pyramid) influences reaction times, in line with the model's predictions. Hence, SAAM is a viable model of how humans determine contact points for stable grasps on flat objects.

Finally, it should be noted that the network architecture of SAAM was inspired by my work on visual attention. In this line of work I examined the role of attention in different tasks: visual search in hierarchical patterns; visual search across time and space; and translation-invariant object identification in multiple-object scenes. The corresponding connectionist models consist of combinations of excitatory and inhibitory connections. Interestingly, even though these models realize seemingly very different selection mechanisms, e.g. selecting objects vs. selecting contact points, the only major difference between the network architectures lies in the topology of the excitatory connections. These connections operate as "functional wiring" implementing the task at hand. Hence the architecture behind these models may represent a general neural processing principle that can be employed in a variety of tasks. Future research, e.g. the analysis of neural topologies in the brain, needs to collect evidence to support this speculation.

Another interesting implication of our work stems from the fact that SAAM's architecture is very similar to an earlier model developed in my lab, the Selective Attention for Identification Model (SAIM; Heinke & Humphreys, 2003). SAIM implements translation-invariant object recognition in multiple-object scenes and can model a wide range of experimental evidence on attention and its disorders.

In contrast to SAAM, SAIM selects the locations within a single object. Like SAAM, however, SAIM employs a soft-constraint satisfaction approach to implement its operation. Its constraints are: bright locations (a simple model assumption); a spatial cluster of such bright locations; and the selection of only one such cluster. Interestingly, the selection constraints in the two networks are implemented by the same inhibitory connections, while the two "functional" constraints, hand anatomy and spatial clustering, are implemented with excitatory connections. The two networks differ only in how these excitatory connections are wired: the architecture of SAAM and SAIM is the same, and their very different functionality arises solely from the wiring of the excitatory connections. It can therefore be speculated that this shared architecture represents a general principle of neural information processing. Future research needs to establish whether this is the case, e.g. by analysing neural pathways. The sketch below makes the idea concrete.
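
The following toy shows one settling routine in which the inhibitory competition is fixed and only the excitatory kernel, the "functional wiring", is exchanged. All names and parameter values are illustrative assumptions, not the published SAIM/SAAM equations.

import numpy as np
from scipy.signal import convolve2d

def settle(image, excitatory_kernel, steps=150, lr=0.3, inhib=0.05):
    rng = np.random.default_rng(1)
    act = image * 0.5 + rng.random(image.shape) * 0.01
    for _ in range(steps):
        # Task-specific excitation ("functional wiring"), gated by the
        # input so that only bright locations can stay active.
        excite = convolve2d(act, excitatory_kernel, mode="same") * image
        # Task-independent selection: global inhibitory competition that
        # lets only one coherent region of activity survive.
        competition = inhib * (act.sum() - act)
        act = np.clip((1 - lr) * act + lr * (excite - competition), 0.0, 1.0)
    return act

# SAIM-like wiring: nearest-neighbour excitation groups bright locations
# into one compact cluster, i.e. one object.
saim_kernel = np.ones((3, 3)) / 3.0

image = np.zeros((20, 20))
image[3:7, 3:7] = 1.0        # object A
image[12:16, 12:16] = 1.0    # object B

selected = settle(image, saim_kernel)
print("object A activity:", selected[3:7, 3:7].mean().round(2))
print("object B activity:", selected[12:16, 12:16].mean().round(2))
# One object survives the competition while the other is suppressed. A
# SAAM-like variant would keep settle() unchanged and only swap the
# kernel for one connecting anatomically plausible finger positions.
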
Exploitation Route The architecture developed in this project may lead to novel ways of controlling robots. Collaborative projects with robotics experts at the University of Birmingham are currently being discussed.
URL http://www.comp-psych.bham.ac.uk/projects.php