Teaching computers how to read body language
Monday, 24 July 2017

American researchers have demonstrated how computers can understand body poses and movements of multiple people from video in real time, including the pose of each individual's hands and fingers.

The method was developed with the help of the Panoptic Studio, a two-storey dome embedded with 500 video cameras at Carnegie Mellon University's (CMU) Robotics Institute.

Yaser Sheikh, associate professor of robotics at CMU, said these methods for tracking 2D human form and motion open up new ways for people and machines to interact with each other and for people to use machines to better understand the world around them.

"We communicate almost as much with the movement of our bodies as we do with our voice," Sheikh said. "But computers are more or less blind to it."

He said the ability to recognise hand poses, for instance, would make it possible for people to interact with computers in new and more natural ways, such as by pointing at things. And by monitoring body language, a self-driving car could get an early warning that a pedestrian is about to step into the street.

In sports analytics, real-time pose detection would make it possible for computers to track not only the position of each player on the field of play, as is now the case, but to know what players are doing with their arms, legs and heads at each point in time.

Sheikh said tracking multiple people in real time, particularly in social situations where they may be in contact with each other, presents a number of challenges. Simply using programs that track the pose of an individual does not work well when applied to each individual in a group, particularly when that group gets large. His team took what they call a 'bottom-up' approach, which first locates all the body parts in a scene (such as arms, legs and faces) and then associates those parts with particular individuals.
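To make the idea concrete, here is a minimal, hypothetical sketch of that bottom-up strategy: detect all candidate parts first, then greedily pair them into per-person skeletons using a pairwise affinity score. The coordinates and the simple distance-based affinity below are invented for illustration; CMU's system learns its association scores from data rather than using raw distance.

```python
# Illustrative sketch of bottom-up pose association (not CMU's actual code):
# detect all part candidates in the scene, then greedily pair them into
# per-person skeletons. All coordinates here are made up for the example.
import numpy as np

# Hypothetical detections: (x, y) image coordinates of candidate parts.
necks = np.array([(100, 80), (300, 85)], dtype=float)
wrists = np.array([(310, 200), (95, 210)], dtype=float)

def affinity(a, b):
    """Toy affinity: closer parts score higher. Real systems score the
    limb connecting the two parts with learned features, not distance."""
    return -np.linalg.norm(a - b)

# Score every neck-wrist pairing, then greedily assign best pairs first.
pairs = sorted(
    ((affinity(n, w), i, j)
     for i, n in enumerate(necks)
     for j, w in enumerate(wrists)),
    reverse=True,
)
used_necks, used_wrists, people = set(), set(), []
for score, i, j in pairs:
    if i not in used_necks and j not in used_wrists:
        used_necks.add(i)
        used_wrists.add(j)
        people.append({"neck": necks[i], "wrist": wrists[j]})

for k, person in enumerate(people):
    print(f"person {k}: neck={person['neck']}, wrist={person['wrist']}")
```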

The challenges for hand detection are greater. Because people use their hands to hold objects and make gestures, a camera is unlikely to see all parts of the hand at the same time. And unlike for the face and body, no large datasets exist of hand images annotated with the labels of parts and positions.

This is where the multi-camera Panoptic Studio came in handy, providing 500 views of a person's hand in each scene.

"The Panoptic Studio supercharges our research," Sheikh said. It is now being used to improve body, face and hand detectors by jointly training them.

[Robotics Institute researchers Gines Hidalgo Martinez and Hanbyul Joo demonstrate how a real-time detector understands hand gestures and tracks multiple people. Photo: CMU]