Low dimensional vs point-wise complex object detection, refinement and tracking

Date
2014
Publisher
University of Delaware
Abstract
In this dissertation, I focus on algorithms to detect, refine, and track complex visual objects, where complexity is determined by shape, appearance, or range of motion. Two classes of representations, low-dimensional and point-wise (non-parametric), are explored, compared, and analyzed by combining multiple features from both imaging and depth sensors. Such perception modules may provide crucial decision-making and planning information for intelligent systems, and I show links to several robotic application areas. First, a multimodal human detection algorithm that uses multiple sensor data sources is detailed. The human is represented in a low-dimensional space, and a ladar-camera architecture is constructed that uses visual and geometric cues to improve on the detection rate and speed of conventional human classifiers. Unlike existing approaches, the proposed human classifier makes no restrictive assumptions about the range scan positions and is therefore applicable to a wide range of real-life detection tasks.
Second, a modified graph cut-based method is proposed to refine complex objects (e.g., human hands, hiking trails for navigation, and household objects) from a rough estimate of the object pose given by a low-dimensional shape detector or tracker. The standard graph cut method is modified to incorporate color and shape distance terms, which are adaptively weighted at run time to favor the most informative cue under different visual conditions. This method is also extended to point-wise tracking of objects through iterative refinement, without estimating the object displacement over time. In addition, a novel algorithm that combines low- and high-level observations in a graph cut framework is developed to refine the low-dimensional representations of human shape provided by human detection or tracking methods. A multi-layer graph cut framework is introduced to combine low-level observations such as color and depth with high-level cues, including the estimated ground plane and point-wise shape confidence scores given by a classifier. Lastly, a point-wise tracking algorithm that fuses multiple cues obtained from depth and color cameras is developed. It employs a motion estimation technique in which displacements are calculated from the 3-D locations of image keypoints matched between frames. Color and point-wise shape cues are combined in the same feature vector to allow adaptive weighting in different parts of the scene. The tracker is generic in that it makes no assumptions about the shape of the object, and experiments demonstrate better dense point-wise tracking performance than comparable algorithms under large viewpoint changes and object deformations.
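As a rough illustration of how a displacement might be computed from the 3-D locations of keypoints matched between frames, the sketch below takes the per-axis median of the point-to-point differences. This is not the dissertation's method: the abstract does not specify the estimator, and the median, the function name `estimate_displacement`, and the sample coordinates are all assumptions made for the example.

```python
from statistics import median


def estimate_displacement(points_prev, points_curr):
    """Estimate a 3-D translation from matched keypoints (illustrative only).

    points_prev[i] and points_curr[i] are the (x, y, z) locations of the
    same keypoint in two consecutive frames. The per-axis median of the
    differences is an assumed estimator, chosen because it tolerates a
    minority of bad matches.
    """
    if len(points_prev) != len(points_curr):
        raise ValueError("matched keypoint lists must have equal length")
    if not points_prev:
        raise ValueError("at least one matched keypoint is required")
    # Difference vector (current minus previous) for every matched pair.
    diffs = [
        tuple(c - p for p, c in zip(prev, curr))
        for prev, curr in zip(points_prev, points_curr)
    ]
    # Transpose to per-axis lists and take the median along each axis.
    return tuple(median(axis) for axis in zip(*diffs))


# Hypothetical data: three consistent matches and one bad match.
prev = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 2.0), (1.0, 1.0, 2.0)]
curr = [(0.5, 0.0, 1.0), (1.5, 0.0, 1.0), (0.5, 1.0, 2.0), (9.0, 9.0, 9.0)]
displacement = estimate_displacement(prev, curr)
print(displacement)
```

A median is used here only because keypoint matching typically produces some outliers; a rigid-transform fit or a per-region estimate would be equally plausible readings of the abstract.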