Towards better systems for complex search tasks

Date
2016
Publisher
University of Delaware
Abstract
In many interactions with a search engine, users first formulate a query and examine the results, then reformulate the query one or more times until they either satisfy their information need or give up. Complex search tasks fall into this category. Unlike simpler tasks, in which users look for a particular homepage, a single piece of information, or the answer to a single specific question, complex search tasks often span multiple queries (i.e., a sequence of queries) and can span multiple sessions (i.e., multiple sequences of queries).

In this thesis, we present several efforts toward building more effective and robust retrieval systems for complex search tasks in situations where only small amounts of search history data are available. We start by investigating users’ preferences with respect to document comprehensiveness and topical relevance grade.

Then, using the findings from that experiment, we introduce heuristic data fusion methods that improve search results within a search session by leveraging the most recent search history and query logs.

Next, we go beyond simple average effectiveness by treating risk-sensitivity as an essential part of our retrieval systems. To that end, we present re-ranking approaches that exploit the “popularity” of documents, and we show that they produce results with improved robustness and effectiveness over a variety of retrieval systems used as baselines. Risk-sensitive ranking (or robustness-aware ranking) focuses on improving the robustness of the system by minimizing the risk of obtaining, for any topic, a result worse than that of the baseline system. In other words, robustness refers to the ability of the ranker to mitigate poor performance on individual queries while still striving to improve overall performance.

Our next endeavor goes beyond heuristic retrieval models. We propose a probabilistic data fusion framework for retrieval and ranking, inspired by the well-known probability ranking function, and we use it to address search over sessions as well as ad hoc search and novelty and diversity search.

Finally, in order to achieve high effectiveness for search over a session even in the absence of search history, we propose to simulate search interactions that provide data similar to what we would have obtained had the user previously interacted with the search engine (previous queries, top results returned for previous queries, etc.).
Keywords
Applied sciences, Complex search task, Diversity and novelty search, Information retrieval models, Robust search systems, Session search, Simulation of query reformulations