Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation
Submitted by Arjun Jain (@stencilman) on Wednesday, 11 May 2016
We propose a new hybrid architecture that consists of a deep Convolutional Network and a Markov Random Field. We show how this architecture is successfully applied to the challenging problem of articulated human pose estimation in monocular images. The architecture can exploit structural domain constraints such as geometric relationships between body joint locations. We show that joint training of these two model paradigms improves performance and allows us to significantly outperform existing state-of-the-art techniques.
Background knowledge of ConvNets and Markov Random Fields
Arjun Jain is the cofounder of Perceptive Code. Prior to this, he was a researcher with a special project team at Apple and a post-doctoral researcher in the Computer Science department at New York University’s Courant Institute. He received his Ph.D. in Computer Science from the Max Planck Institute for Informatics in Germany. Broadly, his research lies at the interface of computer graphics, computer vision, and machine learning, with a focus on human pose estimation and data-driven artistic content creation tools. Arjun has worked as a developer for several companies, including Yahoo! in Bangalore and Weta Digital in New Zealand. At Weta Digital, he served as a developer for the vision-based motion capture system. This system has been used in many feature films, and Arjun was credited for his work on Steven Spielberg’s The Adventures of Tintin. Arjun’s work has resulted in several academic publications and a patent, and has been featured by mainstream media, including New Scientist, Discovery, BCC, Vogue, Wired, India Today, and The Hollywood Reporter.