Learning Unconstrained Human Pose Estimation from Low-cost Approximate Annotation

Lead Research Organisation: University of Leeds
Department Name: Sch of Computing

Abstract

This research is in the area of computer vision: making computers that can understand what is happening in photographs and video. As humans we are fascinated by other humans, and capture endless images of their activities, for example photographs of our family on holiday, video of sports events or CCTV footage of people in a town center. A computer capable of understanding what people are doing in such images would be able to do many jobs for us, for example finding photos of our child waving, fast-forwarding to a goal in a football game, or spotting when someone starts a fight in the street. A fundamental task in achieving such aims is to get the computer to understand a person's pose: how are they standing, is their arm raised, where are they pointing? This pose estimation problem is easy for humans but very difficult for computers, because people vary so much in their pose, their body shape and the clothing they wear. Much work has tried to solve this problem, and it works well in particular settings, for example where people wear a special suit with markers to help find the limbs, but it does not work for real-world pictures because it uses simple stick-man models of humans. We will investigate better models of how humans look by teaching the computer with many example pictures. This approach of learning from pictures instead of building models by hand is showing great progress, but it needs example pictures where the pose has been marked, or annotated, by a human annotator. Because annotating pictures is slow and tiresome, current methods make do with a few hundred pictures, and this is not enough to learn all the ways a human can appear. We will overcome this problem by annotating pictures only roughly, in a way that is very fast, so that we can annotate many pictures at low cost.
We will then develop methods by which the computer can learn from this rough annotation, working out what the corresponding exact annotation would be by combining many pictures with information we already know, such as how the human body is put together. By having many images to learn from, and methods for making use of rough annotation, we will be able to build stronger models of how humans look as they change their pose. This will lead to pose estimation methods that work better in the real world and contribute to longer-term aims in understanding human activity from photographs and video.

Planned Impact

Impact beyond academia will initially accrue to commercial organizations involved in computer vision research and the provision of vision-based technology. There are a number of potential applications, including image and video indexing, sports, computer games, and security or care in the home. Results would be of economic advantage to companies engaged in these activities, and would benefit end-users in ways including easier access to imagery and a richer experience when interacting with photos, video or computer games. In the security domain, better understanding of people in imagery could improve wide-area tracking, automated monitoring of the environment and searching CCTV footage for events of interest. This is an area of significant and growing economic importance. In addition to economic impact through the commercialization of results by technology companies, there would be secondary impact in the reduction of commercial and public costs, for example a reduction in security staffing costs enabled by better automated monitoring of CCTV. Within the period of the project, the key impact will be technological advance and improvements in the robustness of methods. The use of a large and varied dataset (WP1) will ensure generality of results, so that they can be transferred to a wide range of applications. Development of core technologies in the project would contribute to full solutions with industrial-level robustness and computational efficiency in the medium term following completion of the project. Interested parties outside academia will be made aware of our research through presentation and demo sessions at the main conferences, where industry already has a strong presence. We will also present results at the BMVA technical meeting series, which has strong industry participation. Data, code and results will be distributed on the web (WP3), as has been done for previous work by the PI, to ensure easy availability to the international community.
The university press office will assist with dissemination of results to the wider public. Previous research by the PI has been exploited by industry and has been reported by the popular science press. The PI has some experience of presenting research to industry and the general public. Training in communication, media and knowledge transfer skills will be available to the RA through the university's staff development unit at no cost, and such skills would be transferable to ongoing employment. Exploitation of results will be managed with the assistance of the Keyworth Institute in the university, which has 15 years of experience in forming relationships with industry, and the Techtran group, which will offer services in the protection and commercialization of intellectual property.
