Duplicate Detection in Probabilistic Data
Panse, Fabian and van Keulen, Maurice and de Keijzer, Ander and Ritter, Norbert (2009) Duplicate Detection in Probabilistic Data. In: 2nd International Workshop on New Trends in Information Integration, NTII, March 1-6, 2010, Long Beach, California, USA. (In Press)
| Abstract: | Collected data often contains uncertainties. Probabilistic databases have been proposed to manage uncertain data. To combine data from multiple autonomous probabilistic databases, an integration of probabilistic data has to be performed. Until now, however, data integration approaches have focused on the integration of certain source data (relational or XML). There is no work on the integration of uncertain source data so far. In this paper, we present a first step towards a concise consolidation of probabilistic data. We focus on duplicate detection as a representative and essential step in an integration process. We present techniques for identifying multiple probabilistic representations of the same real-world entities. |
| Item Type: | Conference or Workshop Item |
| Faculty: | Electrical Engineering, Mathematics and Computer Science (EEMCS); Science and Technology (TNW) |
| Link to this item: | http://purl.utwente.nl/publications/68597 |