6. 矢量数据处理功能参考 — KingbaseES产品手册.pdf

Computing Discrete Fréchet Distance Thomas Eiter and Heikki Mannila Christian Doppler Labor für Expertensyteme Technische Universität Wien Paniglgasse 16, 1040 Wien E-mail: eiter@vexpert.dbai.tuwien.ac.at April 25, 1994 CD-TR 94/64 Computing Discrete Fréchet Distance∗ Thomas Eiter Heikki Mannila Information Systems Dept. Technical University of Vienna Dept. of Computer Science University of Helsinki Abstract The Fréchet distance between two curves in a metric space is a measure of the similarity between the curves. We present a discrete variation of this measure. It provides good approximations of the continuous measure and can be efficiently computed using a simple algorithm. We also consider variants of discrete Fréchet distance, and find an interesting connection to measuring distance between theories. keywords: Fréchet distance, metrics, curves, polygonal curves 1 Introduction Given two curves in a metric space, the Fréchet distance δF between them can be defined intuitively as follows. A man is walking a dog on a leash: the man can move on one curve, the dog on the other; both may vary their speed, but backtracking is not allowed. What is the length of the shortest leash that is sufficient for traversing both curves? The Fréchet distance is a measure of similarity between curves that takes into account the location and ordering of the points along the curves. Therefore it is often better than the well-known Hausdorff distance. This distance function was introduced by Fréchet in 1906 [6]. A fundamental study on the computational properties of the Fréchet distance was done by Alt and Godau [1]. They give an algorithm that computes the exact Fréchet distance between two polygonal curves in time O(pq log2 pq), where p and q are the number of segments on the polygonal curves. The algorithm is fairly involved, as it uses the parametric search technique. In this paper we describe a discrete variation of the Fréchet distance for polygonal curves. The variation is called the coupling distance δdF . It is based on the idea of ∗ Authors’ addresses: Thomas Eiter, Information Systems Department, Technical University of Vienna, Paniglgasse 16, A-1040 Vienna, Austria; Heikki Mannila, Department of Computer Science, University of Helsinki, P.O. Box 26 (Teollisuuskatu 23), SF-00014 University of Helsinki, Finland. email: eiter@vexpert.dbai.tuwien.ac.at, mannila@cs.helsinki.fi 1 looking at all possible couplings between the end points of the line segments of the polygonal curves. We show that δdF provides good approximations to δF . Specifically, we show that δdF is an upper bound for δF , and that the difference between these measures is bounded by the length of the longest edge of the polygonal curves. We also show that δdF can be computed in O(pq) time using a very simple algorithm. On the basis of these results, the following way of approximately computing the Fréchet distance between two arbitrary curves suggests itself: First compute proper polygonal approximations to the curves and then compute their coupling distance, rather than computing their exact Fréchet distance. We also briefly address variants of the discrete Fréchet distance, and find an interesting relation to a measure of distance between logical theories. 2 Discrete Fréchet distance Following [1], we define a curve as a continuous mapping f : [a, b] → V , where a, b ∈ < and a ≤ b and (V, d) is a metric space. Given two curves f : [a, b] → V and g : [a0 , b0 ] → V , their Fréchet distance is defined as δF (f, g) = inf max d(f (α(t)), g(β(t))), α,β t∈[0,1] where α (resp. β) is an arbitrary continuous nondecreasing function from [0, 1] onto [a, b] (resp. [a0 , b0 ]). In computing the Fréchet distance between arbitrary curves one typically approximates the curves by polygonal curves. A polygonal curve is a curve P : [0, n] → V , where n is a positive integer, such that for each i ∈ {0, 1, . . . , n − 1}, the restriction of P to the interval [i, i + 1] is affine, that is P (i + λ) = (1 − λ)P (i) + λP (i + 1). Let P : [0, n] → V be a polygonal curve. We denote the sequence (P (0), P (1), . . . , P (n)) of endpoints of the line segments of P by σ(P ). Let P and Q be polygonal curves and σ(P ) = (u1 , . . . , up ) and σ(Q) = (v1 , . . . , vq ) the corresponding sequences. A coupling L between P and Q is a sequence (ua1 , vb1 ), (ua2 , vb2 ), . . . , (uam , vbm ) of distinct pairs from σ(P ) × σ(Q) such that a1 = 1, b1 = 1, am = p, bm = q, and for all i = 1, . . . , q we have ai+1 = ai or ai+1 = ai + 1, and bi+1 = bi or bi+1 = bi . Thus, a coupling has to respect the order of the points in P and Q. The length ||L|| of the coupling L is the length of the longest link in L, that is, ||L|| = max d(uai , vbi ). i=1,...,m Given polygonal curves P and Q, their discrete Fréchet distance is defined to be δdF (P, Q) = min{||L|| | L is a coupling between P and Q}. 2 It is immediate that δdF (P, P ) = 0, δdF (P, Q) = δdF (Q, P ); furthermore, one can check that δdF (P, Q) = 0 implies P = Q and that δdF (P, Q) ≤ δdF (P, R) + δdF (R, Q). We thus have the following. Proposition 1 δdF defines a metric on the set of polygonal curves. The relationship of δdF to δF is captured by the following two lemmata, from which we immediately obtain the quality of approximation. Lemma 2 For all polygonal curves P and Q we have δF (P, Q) ≤ δdF (P, Q). Proof. A coupling with maximal edge r gives a way of walking around P and Q with leash at most r. 2 Let for any polygonal curve P = (u1 , . . . , up ) denote D(P ) = maxi=2,...,p d(ui−1 , ui ). Lemma 3 Let P : [0, n] → V and Q : [0, m] → V be polygonal curves. Then, δdF (P, Q) ≤ δF (P, Q) + max{D(P ), D(Q)}. Proof. Let α(t) (resp. β(t)) be a continuous nondecreasing function from [0, 1] to [0, n] (resp. [0, m]). Let σ(P ) = (u1 , . . . , up ) and σ(Q) = (v1 , . . . , vq ). For each point u ∈ σ(P ) let t(u) ∈ [0, 1] be the smallest value such that α(t(u)) = u, and for each point v ∈ σ(Q) let s(v) ∈ [0, 1] be the smallest value such that β(s(v)) = v. We construct a coupling R between P and Q iteratively. First add edge (u1 , v1 ) to R. Assume then that R is already a coupling between the sequences (u1 , . . . , ui ) and (v1 , . . . , vj ). We extend R by one edge as follows. If j = q or t(ui+1 ) < s(vj+1 ), then add the link (ui+1 , vj ). The endpoint vj of this link is the left endpoint of the line segment in which the point β(t(ui+1 )) lies. Thus for the length of this link we have the inequality d(ui+1 , vj ) ≤ d(α(t(ui+1 )), β(t(ui+1 ))) + D(Q). (1) If i = p or t(ui+1 ) > s(vj+1 ), then add the link (ui , vj+1 ). Again, the endpoint ui is the left endpoint of the line segment in which the point α(s(vj+1 )) lies, and we have the inequality d(ui , vj+1 ) ≤ d(α(s(vj+1 )), β(s(vj+1 ))) + D(Q). (2) Otherwise, we have t(ui+1 ) = s(vj+1 ), and we add the link (ui+1 , vj+1 ), whose length is bounded by d(α(t(ui+1 ), β(t(ui+1 ))). It is easy to see that the constructed sequence R is a coupling. The above inequalities 1 and 2 give us that ||R|| ≤ max d(α(t), β(t)) + max{D(P ), D(Q)}. t∈[0,1] 3 Since α and β were arbitrary, we obtain ||R|| ≤ δF (P, Q) + max{D(P ), D(Q)}. 2 Theorem 4 For any polygonal curves P and Q δF (P, Q) ≤ δdF (P, Q) ≤ δF (P, Q) + max{D(P ), D(Q)}. From this, we can regard the coupling measure δdF as a discrete version of the Fréchetdistance. More precisely, say that a polygonal curve Q is a refinement of the polygonal curve P , if σ(P ) = (u1 , . . . , un ) and for some i = 1, . . . , n we have σ(Q) = (u1 , . . . , ui , v, ui+1 , . . . , un ), and the point v is on the line segment between ui and ui+1 . Then, Proposition 5 Let P0 , P1 , . . . and Q0 , Q1 , . . . be sequences of polygonal curves such that Pi+1 (resp. Qi+1 ) is a refinement of Pi (resp. Qi ) for all i ≥ 0 and limi→∞ D(Pi ) = limi→∞ D(Qi ) = 0. Then, lim δdF (Pi , Qi ) = δF (P0 , Q0 ). i→∞ Proof. By Theorem 4. 2 In fact, from the results in [1] it can be seen that for any polygonal curves P = (u1 , . . . , up ) and Q = (v1 , . . . , vq ), there always exist sequences of refinements of P and Q leading to curves P 0 and Q0 , respectively, that both contain at most p + q points and satisfy δdF (P 0 , Q0 ) = δF (P, Q). 3 Computation An advantage of the coupling measure is its efficient computability by dynamic programming, without need of complicated data structures. The algorithm dF in Table 1 can be coded easily; the following lemma on its output is straightforward. Lemma 6 dF(P, Q) = δdF (P, Q) for any polygonal curves P and Q. Thus, we obtain the following result. Theorem 7 Let P : [0, n] → V and Q : [0, m] → V be polygonal curves. Denote σ(P ) = (u1 , . . . , up ) and σ(Q) = (v1 , . . . , vq ). The measure δdF (P, Q) can be computed in O(pq) time. Proof. By Lemma 6, dF(P, Q) computes δdF (P, Q). It is easy to see that the runtime of dF(P, Q) is O(pq); hence the result. 2 We remark that an algorithm of Godau for deciding δF (P, Q) ≤ ? [7, 1] may be used to approximate δF (P, Q) on a particular machine by doing a binary search over the range r of reals that the computer normally handles. This amounts to an O(pq log r) time algorithm for fixed range of reals, which works fine for practical purposes. However, the algorithm is much more involved than the simple algorithm dF from above. 4 Function dF(P, Q): real; input: return: polygonal curves P = (u1 , . . . , up ) and Q = (v1 , . . . , vq ). δdF (P, Q) ca : array [1..p, 1..q] of real; function c(i, j): real; begin if ca(i, j) > −1 then return ca(i, j) elsif i = 1 and j = 1 then ca(i, j) := d(u1 , v1 ) elsif i > 1 and j = 1 then ca(i, j) := max{ c(i − 1, 1), d(ui , v1 ) } elsif i = 1 and j > 1 then ca(i, j) := max{ c(1, j − 1), d(u1 , vj ) } elsif i > 1 and j > 1 then ca(i, j) := max{ min(c(i − 1, j), c(i − 1, j − 1), c(i, j − 1)), d(ui , vj ) } else ca(i, j) = ∞ return ca(i, j); end; begin for i = 1 to p do for j = 1 to q do ca(i, j) := −1.0; return c(p, q); end. Table 1: Algorithm computing the coupling measure 4 Variants of discrete Fréchet distance A variant of discrete Fréchet distance is obtained if the length of a coupling L between P and Q is measured by the sum of the links in L (denote this by ||L||s ), i.e., ||L||s = X d(uai , vbi ). i=1,...,m The implied measure δsdF (P, Q) gives the minimum of the total distance of an orderpreserving correspondence between points of P and Q, such that each point of P corresponds to at least one point of Q and vice versa. A dog-and-leash interpretation of this measure would be the minimum total length of leashes needed for men walking their dogs, such that all points on the routes P and Q are occupied and leashes do not cross over. Clearly, δsdF (P, P ) = 0, δsdF (P, Q) = δsdF (Q, P ), and that δsdF (P, Q) = 0 implies P = Q; the triangle inequality, however, fails in general. It is easy to see that δsdF is computed by the variant of algorithm dF in Table 1 obtained by replacing “max” operations with additions; hence, δsdF can be computed in time O(pq). It is interesting to note a connection between measuring the distance of curves and the distance of logical theories. If one gives up on order-preservation of couplings, 5 then the measures defined as δdF and δsdF resemble two measures for distance between logical theories. The motivation for studying such measures comes from the notion of truthlikeness in philosophy of science [11] and from artificial intelligence, where the task of quantifying the distance of theories is important for example for machine learning and theory approximation [10, 13, 12, 8, 3, 4]. Theories viewed as sets of models naturally correspond to curves viewed as sets of points. Given a metric on the models, the distance between the sets of models may be interpreted as a measure of similarity between theories [11]. Let an unordered coupling between P and Q be any sequence U = (ua1 , vb1 ), (ua2 , vb2 ), . . . , (uam , vbm ) of distinct pairs from σ(P ) × σ(Q) such that every point in P (resp. Q) occurs among the uai (resp. vbi ), and let ||U ||, ||U ||s be defined analogous as in the case of ordered couplings. Define u δdF (P, Q) = min{||U || | U is an unordered coupling between P and Q}, u δsdF (P, Q) = min{||U ||s | U is an unordered coupling between P and Q}. u Both measures have been proposed for measuring theory distance. Notice that δdF u collapses with the Hausdorff distance. The measure δsdF describes the minimum cost of a correspondence between the points of P and Q; it can be seen in the spirit of the unordered similarity measures in [2]. This measure, called link measure in the context of [5], does not define a metric, since the triangle does not hold in general. u Intuitively, the computation of δsdF seems to be involved, and one might suspect NPu hardness of this problem. Somewhat surprising, however, is that δsdF can be computed in polynomial time by using graph matching techniques (cf. [9] for a background). Given P , σ(P ) = (u1 , . . . , up ) and Q, σ(Q) = (v1 , . . . , vq ), define a complete bipartite graph G0 = (A ∪ B, E), where A = {a1 , . . . , ap } and B = {b1 , . . . , bq }, in which each edge e = {ai , bj } has weight w(e) = d(ui , vj ). Let G1 be a zero-weight copy of G0 , and let G be the graph obtained from G0 ∪ G1 by connecting each node from G to its copy by weight equal to its nearest neighbor in G. Then, the following property of G holds. Theorem 8 Let P and Q be polygonal curves. The cost of a minimum perfect matching u (P, Q). in the graph G for P and Q is identical to δsdF u (A proof can be found in [5].) Consequently, δsdF can be efficiently computed by applying a minimum cost perfect matching algorithm, which is feasible in polynomial time (cf. [9]). 5 Conclusion We have presented a discrete variant of the Fréchet distance between curves in a metric space, and we described a simple and efficient algorithm for computing this measure. Besides its own interest, discrete Fréchet distance may be used for approximately computing the Fréchet distance between two arbitrary curves, as an alternative to using 6 the exact Fréchet distance between a polygonal approximation of the curves or an approximation of this value. Moreover, we found an interesting connection between distance of curves and logical theories. This connection suggests that distance measures for curves (sets of points) may be fruitfully applied to logical theories (sets of models). Acknowledgements We thank Michael Godau and Hannu Toivonen for useful comments on this paper. References [1] H. Alt and M. Godau. Measuring the resemblance of polygonal curves. In Proc. 8th Annu. ACM Sympos. Comput. Geom., pages 102–109, 1992. [2] H. Alt, K. Mehlhorn, H. Wagener, and E. Welzl. Congruence, similarity and symmetries of geometric objects. Discrete Comput. Geom., 3:237–256, 1988. [3] M. Cadoli. Semantical and Computational Aspects of Horn Approximations. In Proceedings IJCAI-93, pages 39–44, Chambery, France, August 1993. [4] M. Cadoli and M. Schaerf. Tractable Reasoning via Approximation. Artificial Intelligence, 1993. To appear. [5] T. Eiter and H. Mannila. Theory Distance and Similarity: Measures and Computation. Technical Report CD-TR 94/60, CD-Laboratory for Expert Systems, TU Vienna, Austria, February 1994. [6] M. Fréchet. Sur quelques points du calcul fonctionnel. Rendiconti del Circolo Mathematico di Palermo, 22:1–74, 1906. [7] M. Godau. A natural metric for curves – computing the distance for polygonal chains and approximation algorithms. In Proc. 8th Sympos. Theor. Aspects of Comp. STACS, LNCS vol. 480, pages 127–136, 1991. [8] R. Greiner and D. Schuurmans. Learning Useful Horn Approximations. In Proceedings of the Third International Conference on Knowledge Representation and Reasoning (KR-92), pages 383–392, 1992. [9] K. Mehlhorn. Graph Algorithms and NP-Completeness, volume 2 of Data Structures and Algorithms. Springer, 1984. [10] B. K. Natarajan. Machine Learning: A Theoretical Approach. Morgan Kaufmann, San Mateo, CA, May 1991. [11] I. Niiniluoto. Truthlikeness. D. Reidel Pub. Comp., Dordrecht, Holland, 1987. [12] B. Selman and H. Kautz. Knowledge Compilation Using Horn Approximations. In Proceedings AAAI-91, pages 904–909, Anaheim, August 1991. [13] S. Wrobel. On the Proper Definition of Minimality in Specialization and Theory Revision. In P. B. Brazdil, editor, Machine Learning: ECML-93. European Conference on Machine Learning, Lecture Notes in AI 667, pages 65–82. Springer-Verlag, 1993. 7