Manifold, deep and adversarial learning for visual object detection

Date
2019
Publisher
University of Delaware
Abstract
In recent years, emerging technologies such as deep learning have become critical in enabling robots to interact with complex real-world environments. This dissertation focuses on developing new algorithms to improve object detection for mobile robots. We start by exploring unsupervised algorithms to detect class-agnostic object saliency using a novel graph model. Next, we enable detection of class-specific objects and extend it to tracking. Furthermore, we demonstrate a joint learning scheme for simultaneous detection and fine-grained classification. Finally, we present a framework to perform 3D visual object detection on monocular images.

First, we propose an unsupervised class-agnostic object detection approach that exploits a novel graph structure and background priors. The input image is represented as an undirected graph with superpixels as nodes. Feature vectors extracted from each node capture regional color, contrast, and texture information. A novel graph model effectively captures local and global saliency cues. To obtain more accurate results, we optimize the saliency map using a robust background measure. Comprehensive evaluations on benchmark datasets indicate that our algorithm consistently surpasses state-of-the-art unsupervised methods and performs favorably against supervised approaches.

Second, we present a deep visual object detector that operates without object proposals. We modify a generic object detection network to train deep neural network (DNN) models for bird and nest categories and extend it to enforce temporal continuity for tracking. The system achieves satisfactory speed and accuracy for both detection and tracking. We also contribute a new dataset for nest detection. The proposed detector is well suited for environmental robotic applications that demand real-time performance.

Next, we demonstrate a unified framework to detect and classify fine-grained objects. To evaluate performance, we have created a new benchmark for fine-grained recognition. Experiments show that our approach performs favorably against competitive methods. Our network structure offers desirable characteristics for practical computer vision applications and strikes a good balance among model size, computational complexity, and accuracy.

Moreover, we extend our work to RGB-D datasets, beginning with an algorithm that produces 3D shapes of a scene on a mobile device. The algorithm leverages stereo cameras to generate a full-resolution depth map of the scene, recording its 3D geometry. Quantitative analysis shows that this 3D imaging algorithm consistently outperforms existing methods. We also present a novel scheme for rendering dynamic depth-of-field (DoF) effects based on the generated depth map.

Lastly, we develop a framework to detect and classify 3D objects from monocular images. Our method leverages generative adversarial networks (GANs) to perform monocular depth estimation, and the GAN approach is flexible enough to extend to other computer vision tasks. In addition, we integrate both visual and structural cues into the feature map representation, which distinguishes our method from approaches that operate purely on LiDAR data and from those that learn depth from a monocular image alone. Experiments show that our approach performs favorably against competitive methods trained on LiDAR data.
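To make the superpixel-graph idea in the first study concrete, the following is a minimal, illustrative sketch of how an image can be represented as an undirected graph with superpixels as nodes and a feature vector attached to each node. The SLIC oversegmentation, the mean-CIELab color feature, the 4-neighborhood adjacency rule, and the file name example.jpg are assumptions chosen for brevity; they are not the dissertation's exact model or data.

# Illustrative sketch only: superpixel graph with per-node feature vectors.
# SLIC superpixels, mean CIELab color features, and 4-neighborhood adjacency
# are assumptions, not the dissertation's actual saliency model.
import numpy as np
import networkx as nx
from skimage import color, io
from skimage.segmentation import slic

def build_superpixel_graph(image_rgb, n_segments=300):
    # Oversegment the image; each superpixel becomes one graph node.
    labels = slic(image_rgb, n_segments=n_segments, compactness=10)
    lab = color.rgb2lab(image_rgb)

    graph = nx.Graph()
    for sp in np.unique(labels):
        mask = labels == sp
        # Node feature: mean CIELab color (a stand-in for the richer
        # color/contrast/texture descriptor described in the abstract).
        graph.add_node(int(sp), feature=lab[mask].mean(axis=0))

    # Connect spatially adjacent superpixels; edge weight = feature distance.
    right = np.stack([labels[:, :-1].ravel(), labels[:, 1:].ravel()], axis=1)
    down = np.stack([labels[:-1, :].ravel(), labels[1:, :].ravel()], axis=1)
    for a, b in np.unique(np.vstack([right, down]), axis=0):
        if a != b:
            dist = np.linalg.norm(graph.nodes[int(a)]["feature"]
                                  - graph.nodes[int(b)]["feature"])
            graph.add_edge(int(a), int(b), weight=float(dist))
    return graph

if __name__ == "__main__":
    img = io.imread("example.jpg")   # hypothetical input image
    g = build_superpixel_graph(img)
    print(g.number_of_nodes(), "superpixels,", g.number_of_edges(), "edges")

On such a graph, local saliency cues can be modeled as edge weights between neighboring superpixels and global cues as contrast against boundary (background-prior) nodes; the dissertation's actual graph model and background measure are more elaborate than this sketch.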