Paper Description: MIP-0204

BibTeX entry:

author="A. Kemper, B. Stegmaier",
title="Evaluating Bestmatch-Joins on Streaming Data",
institution="Fakult{\"a}t f{\"u}r Mathematik und Informatik, Universit{\"a}t Passau",


The Internet constitutes a huge distributed information source. Data sources on the Internet are often very large or inherently infinite, e.g., dynamically generated data streams. New expressive query operators are needed to generate "interesting" data by combining these data sources. A common problem is finding the best-matching pairs of data objects according to user-defined multi-dimensional criteria. Traditional techniques do not give satisfactory results because a single "best" pair cannot be determined: diverse pairs, each best in a different aspect of the comparison, are all of interest. We propose the novel class of bestmatch-join (BMJ) operators to solve this problem. Unfortunately, the BMJ-operators are inherently blocking (pipeline-breakers), so in their basic form they are not applicable to streaming data or "infinite" data sources. To overcome this problem and to improve the quality of the result, we propose the constrained BMJ-operators. The constraints, in combination with physical properties of the data stream, i.e., being ordered according to a constrained join attribute, enable our new pipelined BMJ-algorithms, which are based on synchronously shifting windows over the data streams. Finally, we present the encouraging results of our experiments, which demonstrate the effectiveness of our approach.
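
The abstract does not spell out the operator's exact semantics, so the following is only a minimal sketch of one plausible reading: a pair "matches best" if no other pair is at least as good in every criterion and strictly better in one (Pareto dominance, smaller scores being better). The names `dominates`, `bestmatch_join`, and `abs_diff`, and the choice of per-dimension absolute difference as the criterion, are assumptions made for illustration and are not taken from the report.

```python
from itertools import product


def dominates(a, b):
    """True if score vector a is no worse than b in every dimension
    and strictly better in at least one (smaller is better)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def bestmatch_join(left, right, score):
    """Blocking sketch: keep every pair whose score vector is not
    dominated by the score vector of any other pair.
    Assumed semantics; quadratic in the number of pairs."""
    scored = [((l, r), score(l, r)) for l, r in product(left, right)]
    return [pair for pair, s in scored
            if not any(dominates(t, s) for _, t in scored)]


def abs_diff(l, r):
    """Hypothetical criterion: absolute difference per dimension."""
    return tuple(abs(x - y) for x, y in zip(l, r))


left = [(1, 10), (5, 2)]
right = [(2, 9), (5, 8)]
result = bestmatch_join(left, right, abs_diff)
print(result)  # [((1, 10), (2, 9)), ((5, 2), (5, 8))]
```

Two pairs survive here because each is best in a different dimension, which is the situation the abstract describes. The sketch also has to see every pair before it can emit any result, which illustrates why the basic operator is blocking; it does not attempt the constrained, window-based pipelined variant.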

Nathalie Vollstaedt