Management of uncertain data : towards unattended integration


Keijzer, Ander de (2008) Management of uncertain data : towards unattended integration. thesis.

open access
Abstract: In recent years, the need to support uncertain data has increased. Sensor
applications, for example, must deal with the inherent uncertainty in
the sensor readings. Current database management systems are not
equipped to deal with this uncertainty, other than as a user-defined attribute.
This forces the user of the DBMS to take on the responsibility of managing
the uncertainty associated with the data.
In this thesis, we present a new data model, based on XML, that is capable
of storing uncertainty about elements and subtrees. The XML data
model is extended in such a way that probabilities can be associated with
elements and subtrees, dependence and independence of elements can be
expressed, and even the existence of entire elements or subtrees can be uncertain.
We give a sound semantic foundation for dealing with the uncertainty
associated with the data, and show how querying works under these semantics.
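The idea of attaching probabilities to alternative elements, and of querying over the resulting alternatives, can be illustrated with a minimal sketch. It is not the thesis's data model: the flat list of choice points, the names `possible_worlds` and `query_probability`, and the sample values are all assumptions made for illustration. The sketch also assumes that separate choice points are independent.

```python
from itertools import product
from math import prod

# Hypothetical, simplified stand-in for a probabilistic XML document.
# A choice point lists mutually exclusive alternatives as (probability, elements).
# An empty tuple of elements models "this element may not exist at all".
# Separate choice points are assumed to be independent of one another.
phone_choice = [(0.7, ("phone=1111",)), (0.3, ("phone=2222",))]
email_choice = [(0.6, ("email=john@example.org",)), (0.4, ())]
document = [phone_choice, email_choice]


def possible_worlds(doc):
    """Yield (probability, set of elements) for every combination of alternatives."""
    for combo in product(*doc):
        probability = prod(p for p, _ in combo)
        elements = frozenset(e for _, alternative in combo for e in alternative)
        yield probability, elements


def query_probability(doc, predicate):
    """Sum the probabilities of the worlds in which the predicate holds."""
    return sum(p for p, world in possible_worlds(doc) if predicate(world))
```

For example, `query_probability(document, lambda w: "phone=1111" in w)` adds up the two worlds that contain the first phone number, one with and one without the e-mail address.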
The probabilistic XML data model is used in an information integration
application. Decisions about equality are postponed if the integration system
is uncertain about equality. This uncertainty is stored using the probabilistic
XML data model, making the integration process itself unattended. The
amount of uncertainty arising from this integration can be large. We therefore
introduce knowledge rules that help decide on equality during the integration
phase. Using these rules, integrated documents contain less uncertainty
and are therefore smaller in size. We also introduce two measures with
which the amount of uncertainty in a document can be quantified. Uncertainty
density measures the amount of uncertainty in the database. The
second measure, answer decisiveness, quantifies the ease with which the most
likely possibilities in query results can be chosen.
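How postponed equality decisions inflate an integrated document, and how a rule can prune them, can be sketched as follows. Everything here is hypothetical: the function `integrate`, the rule `different_year`, the movie records and the fixed match probability of 0.5 are illustrative assumptions, and treating the pairs as independent is a simplification. The number of possibilities is used only as a crude indication of size; it is not the uncertainty density or answer decisiveness measure of the thesis.

```python
from itertools import product
from math import prod


def integrate(candidate_pairs, match_probability, rules=()):
    """Return (probability, {pair: same_object}) for every possible matching.

    Each rule takes a pair and returns True (same object), False (different
    objects) or None (no opinion). Pairs no rule decides stay uncertain.
    """
    per_pair = []
    for pair in candidate_pairs:
        verdict = next(
            (v for v in (rule(pair) for rule in rules) if v is not None), None
        )
        if verdict is None:
            # Decision postponed: keep both possibilities with their probabilities.
            p = match_probability(pair)
            per_pair.append([(p, pair, True), (1.0 - p, pair, False)])
        else:
            # A knowledge rule settles the question during integration.
            per_pair.append([(1.0, pair, verdict)])
    return [
        (prod(p for p, _, _ in combo), {pair: same for _, pair, same in combo})
        for combo in product(*per_pair)
    ]


def different_year(pair):
    """Hypothetical rule: records with different years are different objects."""
    (_, year_a), (_, year_b) = pair
    return False if year_a != year_b else None


pairs = [
    (("Jaws", 1975), ("Jaws", 1975)),
    (("Jaws", 1975), ("Jaws 2", 1978)),
    (("Alien", 1979), ("Aliens", 1986)),
]
without_rules = integrate(pairs, lambda pair: 0.5)
with_rules = integrate(pairs, lambda pair: 0.5, rules=[different_year])
```

In this toy setting, `without_rules` holds 8 possibilities and `with_rules` holds 2, because the rule decides two of the three pairs outright.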
At a later stage, when the user is querying the information source, and
therefore already actively using the system, feedback can be provided on
query results. This feedback is explained in the same semantic setting as
querying. Feedback statements can either be positive, i.e. the query result
can be observed in the real world, or negative, i.e. the query result cannot
be observed in the real world. We show that this feedback technique, if
used with caution, reduces the amount of uncertainty and lets the information
source converge to a correctly integrated document. To measure the quality
of query results, we adapted precision and recall for probabilistic data in such a
way that, for example, incorrect answers with a low probability do not have the
same negative impact as incorrect answers with a high probability.
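One way to picture feedback and the adapted quality measures is the sketch below. It assumes that feedback acts by discarding the possibilities that contradict the statement and renormalizing the rest, and that precision and recall weight each answer by its probability. Both are assumptions made for illustration, not the thesis's definitions, and the function names and sample data are hypothetical.

```python
def apply_feedback(worlds, answer, positive):
    """Condition a list of (probability, answers) worlds on one feedback statement.

    Positive feedback keeps the worlds whose query result contains `answer`;
    negative feedback keeps the worlds whose result does not contain it.
    """
    kept = [(p, answers) for p, answers in worlds if (answer in answers) == positive]
    total = sum(p for p, _ in kept)
    if total == 0:
        raise ValueError("feedback contradicts every remaining possible world")
    return [(p / total, answers) for p, answers in kept]


def answer_probabilities(worlds):
    """Probability of each answer: total probability of the worlds returning it."""
    result = {}
    for p, answers in worlds:
        for answer in answers:
            result[answer] = result.get(answer, 0.0) + p
    return result


def weighted_precision_recall(answer_probs, correct):
    """Probability-weighted precision and recall against a set of correct answers."""
    hit = sum(p for answer, p in answer_probs.items() if answer in correct)
    total = sum(answer_probs.values())
    precision = hit / total if total else 0.0
    recall = hit / len(correct) if correct else 0.0
    return precision, recall


# Hypothetical query result: three possible worlds, each with its own answer set.
worlds = [(0.5, {"Alice"}), (0.3, {"Alice", "Bob"}), (0.2, {"Bob"})]
correct = {"Alice"}

before = weighted_precision_recall(answer_probabilities(worlds), correct)
after = weighted_precision_recall(
    answer_probabilities(apply_feedback(worlds, "Bob", positive=False)), correct
)
```

Under these assumptions the incorrect answer "Bob" lowers precision in proportion to its probability, and the negative feedback on "Bob" removes every world that contains it. The same mechanism shows why caution is needed: a mistaken statement would discard the correct world just as permanently.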
Item Type: Thesis
Science and Technology (TNW)



Metis ID: 250864