Human attention simulation on nature scenes in computer vision

Date
2017
Publisher
University of Delaware
Abstract
Human attention simulation is a long-standing problem in computer vision. Researchers either attempt to locate the objects in an image that are most interesting to humans or to predict where people will look in natural scenes. Accurate and reliable human attention detection can benefit numerous tasks, ranging from tracking and recognition in vision to image manipulation in graphics. For example, successful saliency detection algorithms facilitate automated image segmentation, more reliable object detection, and effective image thumbnailing and retargeting. However, state-of-the-art techniques neglect essential problems that keep saliency models from precisely simulating the human attention mechanism. On the input front, the images used for saliency detection fail to preserve the high-dimensional information of the scene; on the task front, the ground truth provided by different observers is inevitably inconsistent, resulting in uncertainty in prediction performance.

Regarding the input data, existing saliency detection approaches that use images as input are sensitive to foreground/background similarity, complex background textures, and occlusions. I explore the use of light fields as input for saliency detection. The proposed technique is enabled by the availability of commercial plenoptic cameras that capture the light field of a scene in a single shot. I show that the unique refocusing capability of light fields provides useful focusness, depth, and objectness cues, and I develop a new saliency detection algorithm tailored to light fields. To validate the approach, I acquire a light field database covering a range of indoor and outdoor scenes and generate the ground-truth saliency maps. Experiments show that the saliency detection scheme robustly handles challenging scenarios such as similar foreground and background, cluttered backgrounds, and complex occlusions, and achieves high accuracy and robustness.

As for methods that use high-dimensional data beyond regular images as saliency input, they are tailored to particular data types and adopt very different solution frameworks, both in the features they use and in how those features are processed. In this dissertation, I present a unified saliency detection framework that handles heterogeneous types of input data. The proposed approach builds dictionaries from data-specific features: I first select a group of potential foreground superpixels to build a primitive saliency dictionary, then prune the outliers in the dictionary and test on the remaining superpixels to iteratively refine it. Comprehensive experiments show that the proposed approach consistently outperforms state-of-the-art solutions on 2D (regular image), 3D (image with depth information), and 4D (light field) data.
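As a concrete illustration of the dictionary-based framework described above, the sketch below shows one plausible way to build a primitive foreground dictionary from superpixel features and refine it iteratively. The function name, input arrays, and least-squares reconstruction scoring are illustrative assumptions, not the dissertation's exact algorithm.

```python
import numpy as np

def iterative_dictionary_saliency(features, fg_cues, n_iters=5, dict_frac=0.2):
    """Sketch of dictionary-based saliency with iterative refinement.
    features: (N, D) data-specific feature per superpixel (e.g. color for 2D,
    depth for 3D, focusness for 4D light field input); fg_cues: (N,) initial
    foreground likelihoods. Both inputs and the scoring rule are illustrative."""
    n_atoms = max(1, int(dict_frac * len(features)))
    # Primitive dictionary: superpixels with the strongest foreground cues.
    idx = np.argsort(fg_cues)[::-1][:n_atoms]
    saliency = np.asarray(fg_cues, dtype=float)
    for _ in range(n_iters):
        D = features[idx]                                  # (K, D) dictionary atoms
        # Least-squares reconstruction of every superpixel from the dictionary.
        coefs, *_ = np.linalg.lstsq(D.T, features.T, rcond=None)
        err = np.linalg.norm(features - (D.T @ coefs).T, axis=1)
        saliency = 1.0 / (1.0 + err)                       # low error => foreground-like
        # Refine: re-select the most salient superpixels, pruning outlier atoms.
        idx = np.argsort(saliency)[::-1][:n_atoms]
    return saliency
```

In this sketch, a superpixel that the foreground dictionary reconstructs well receives high saliency, and each iteration rebuilds the dictionary from the currently most salient superpixels, which is one simple way to realize the prune-and-refine loop described above.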
Regarding the saliency detection task, tremendous effort has been devoted to building a universal saliency model across users despite their differences in gender, race, age, and so on. Yet recent psychology studies suggest that saliency is highly specific rather than universal: individuals exhibit heterogeneous gaze patterns when viewing an identical scene containing multiple salient objects. In this dissertation, I show that such heterogeneity is common and critical for reliable saliency prediction. The conducted study also produces the first database of personalized saliency maps (PSMs). I model the PSM based on the universal saliency map (USM) shared by different participants and adopt a multi-task CNN framework to estimate the discrepancy between the PSM and the USM. Comprehensive experiments demonstrate that the new PSM model and prediction scheme are effective and reliable.
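To make the decomposition concrete, the following sketch shows one possible multi-task network in which a shared encoder over the image and the USM predicts, for each participant, a discrepancy map that is added to the USM to obtain that participant's PSM. The class name, layer sizes, and architecture are hypothetical and serve only to illustrate the PSM ≈ USM + discrepancy formulation.

```python
import torch
import torch.nn as nn

class PSMDiscrepancyNet(nn.Module):
    """Hypothetical multi-task CNN: a shared encoder over the image and the USM,
    with one small head per participant predicting that person's discrepancy map."""

    def __init__(self, n_subjects):
        super().__init__()
        self.shared = nn.Sequential(                       # shared feature extractor
            nn.Conv2d(4, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        # One task-specific head per participant.
        self.heads = nn.ModuleList(
            [nn.Conv2d(32, 1, kernel_size=1) for _ in range(n_subjects)]
        )

    def forward(self, image, usm):
        # image: (B, 3, H, W) RGB; usm: (B, 1, H, W) universal saliency map.
        feat = self.shared(torch.cat([image, usm], dim=1))
        # PSM_i = USM + predicted discrepancy for participant i.
        return [usm + head(feat) for head in self.heads]
```

Sharing the encoder while keeping per-participant heads is one natural way to cast personalized saliency as a multi-task problem, since most of the signal is common across viewers and only the discrepancy is person-specific.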
Keywords
Applied sciences, Fixation prediction, Light field, Object detection, Saliency detection