Modeling representation uncertainty in concept-based multimedia retrieval


Aly, Robin Benjamin Niko (2010) Modeling representation uncertainty in concept-based multimedia retrieval. thesis.

Abstract: This thesis considers concept-based multimedia retrieval, where documents
are represented by the occurrence of concepts (also referred to as semantic
concepts or high-level features). A concept can be thought of as a kind of
label, which is attached to (parts of) the multimedia documents in which
it occurs. Since concept-based document representations are user, language
and modality independent, using them for retrieval has great potential for
improving search performance. As collections grow quickly in both volume
and size, manually labeling concept occurrences becomes infeasible, and
so-called concept detectors are used to decide automatically whether concepts
occur in the documents.
The following fundamental problems in concept-based retrieval are identified
and addressed in this thesis. First, the concept detectors frequently make
mistakes while detecting concepts. Second, it is difficult for users to formulate
their queries since they are unfamiliar with the concept vocabulary, and
setting weights for each concept requires knowledge of the collection. Third,
for supporting retrieval of longer video segments, single concept occurrences
are not sufficient to differentiate relevant from non-relevant documents and
some notion of the importance of a concept in a segment is needed. Finally,
since current detection techniques lack performance, it is important
to be able to predict what search performance retrieval engines would yield if
detection performance improved.
The main contribution of this thesis is the uncertain document representation
ranking framework (URR). Based on the Nobel Prize-winning Portfolio
Selection Theory, the URR framework considers the distribution over all
possible concept-based document representations of a document given the
observed confidence scores of concept detectors. For a given score function,
documents are ranked by the expected score plus an additional term of the
variance of the score, which represents the risk attitude of the system.
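The ranking rule described above can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the abstract: concept occurrences are treated as independent binary variables, the detector confidence scores have already been converted to occurrence probabilities, and the score function is a simple weighted sum of concept occurrences. The names `urr_value`, `rank_documents` and the parameter `risk` are hypothetical and chosen only for illustration.

```python
def urr_value(probs, weights, risk):
    """Expected score plus a risk-weighted variance term for one document.

    probs   -- assumed P(concept occurs | detector score), one per concept
    weights -- assumed weight of each concept in a linear score function
    risk    -- hypothetical risk parameter: negative penalizes uncertain
               documents, positive rewards them, zero ranks by expectation
    """
    # For independent binary occurrences, a weighted sum has
    # mean sum(w * p) and variance sum(w^2 * p * (1 - p)).
    expected = sum(w * p for w, p in zip(weights, probs))
    variance = sum(w * w * p * (1.0 - p) for w, p in zip(weights, probs))
    return expected + risk * variance


def rank_documents(docs, weights, risk):
    """Return document ids sorted by descending URR-style ranking value."""
    return sorted(docs, key=lambda d: urr_value(docs[d], weights, risk),
                  reverse=True)


# Two shots with the same expected score but different uncertainty.
docs = {"shot_a": [0.5, 0.5], "shot_b": [0.9, 0.1]}
weights = [1.0, 1.0]
risk_averse_order = rank_documents(docs, weights, risk=-1.0)
risk_seeking_order = rank_documents(docs, weights, risk=1.0)
```

In this toy example both shots have an expected score of 1.0, so only the variance term separates them: a negative `risk` prefers the shot with the more decisive detector output, and a positive `risk` prefers the more uncertain one.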
User-friendly concept selection is achieved by re-using an annotated development
collection. Each video shot of the development collection is transformed
into a textual description which yields a collection of textual descriptions.
This collection is then searched with a textual query, which does not
require the user to know the concept vocabulary. The ranking of the
textual descriptions and the knowledge of the concept occurrences in the development
collection allow a selection of useful concepts together with their
weights.
The URR framework and the proposed concept selection method are used
to derive a shot and a video segment retrieval framework. For shot retrieval,
the probabilistic ranking framework for unobservable events is proposed. The
framework re-uses the well-known probability of relevance score function
from text retrieval. Because of the representation uncertainty, documents
are ranked by their expected retrieval score given the confidence scores from
the concept detectors.
For video segment retrieval, the uncertain concept language model is proposed
for retrieving news items – a particular video segment type. A news
item is modeled as a series of shots and represented by the frequency of each
selected concept. Using the parallel between concept frequencies and term
frequencies, a concept language model score function is derived from the language
modelling framework. The concept language model score function is
then used according to the URR framework and documents are ranked by
the expected concept language score plus an additional term of the score’s
variance.
The Monte Carlo Simulation method is used to predict the behavior of
current retrieval models under improved concept detector performance. First,
a probabilistic model of concept detector output is defined as two Gaussian
distributions, one for the shots in which the concept occurs and one for the
shots in which it does not. Randomly generating concept detector scores for a
collection with known concept occurrences and executing a search on the generated
output estimates the expected search performance given the model’s
parameters. By modifying the model parameters, the detector performance
can be improved and the future search performance can be predicted.
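The simulation procedure can be sketched as follows. This is a simplified illustration, not the thesis's implementation: it simulates a single concept, ranks shots directly by the generated detector score instead of running a full retrieval model, and uses one shared standard deviation for both Gaussians. The function names and all parameter values are assumptions made for the example.

```python
import random


def average_precision(ranked_labels):
    """Average precision of a ranked list of binary relevance labels."""
    hits, total = 0, 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def simulate_ap(labels, mu_pos, mu_neg, sigma, runs, seed=0):
    """Monte Carlo estimate of the expected average precision.

    labels -- known concept occurrences (True/False), one per shot
    mu_pos, mu_neg -- assumed means of the score distributions for shots
                      with and without the concept
    sigma  -- assumed shared standard deviation (a simplification)
    """
    rng = random.Random(seed)
    aps = []
    for _ in range(runs):
        # Draw one detector score per shot from the matching Gaussian.
        scores = [rng.gauss(mu_pos if y else mu_neg, sigma) for y in labels]
        order = sorted(range(len(labels)), key=lambda i: scores[i],
                       reverse=True)
        aps.append(average_precision([labels[i] for i in order]))
    return sum(aps) / runs


# 200 shots, 20 of which contain the concept (illustrative numbers).
labels = [True] * 20 + [False] * 180
weak_detector = simulate_ap(labels, mu_pos=0.0, mu_neg=0.0, sigma=1.0, runs=50)
strong_detector = simulate_ap(labels, mu_pos=3.0, mu_neg=0.0, sigma=1.0, runs=50)
```

Increasing the distance between `mu_pos` and `mu_neg` corresponds to a better detector, so comparing `weak_detector` with `strong_detector` shows how the estimated performance responds to the model parameters.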
Experiments on several collections of the TRECVid evaluation benchmark
showed that the URR framework often significantly improves the search
performance compared to several state-of-the-art baselines. The simulation
of concept detectors indicates that today’s video shot retrieval models will show
an acceptable performance once the detector performance is around 0.60
mean average precision. The simulation of video segment retrieval suggests
that this task is easier and will sooner be applicable to real-life applications.
Item Type: Thesis
Electrical Engineering, Mathematics and Computer Science (EEMCS)