Distance Functions on Hierarchies Eftychia Baikousi.

Presentation titled "Distance Functions on Hierarchies Eftychia Baikousi." Presentation transcript:

1 Distance Functions on Hierarchies Eftychia Baikousi

2 Outline
- Definition of metric & similarity
- Various Distance Functions
  - Minkowski
  - Set based
  - Edit distance
- Basic concept of OLAP
  - Lattice
  - Distance in same level of hierarchy
  - Distance in different level of hierarchy

3 Definition of metric
A distance function on a given set M is a function $d: M \times M \to \mathbb{R}$ that satisfies the following conditions:
- $d(x,y) \ge 0$, and $d(x,y) = 0$ iff $x = y$. The distance is positive between two different points and is zero precisely from a point to itself.
- It is symmetric: $d(x,y) = d(y,x)$. The distance between x and y is the same in either direction.
- It satisfies the triangle inequality: $d(x,z) \le d(x,y) + d(y,z)$. The distance between two points is the shortest distance along any path.
A function that satisfies all three conditions is a metric.
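The three conditions can be checked mechanically on a finite sample of points. The following is only a sketch: the helper name `is_metric` and the tolerance `tol` are assumptions introduced for illustration, and passing the check on a sample does not prove that a function is a metric on the whole set.

```python
from itertools import product


def is_metric(points, d, tol=1e-12):
    """Check the three metric conditions for d on a finite list of points."""
    for x, y in product(points, repeat=2):
        dxy = d(x, y)
        # Non-negativity, and zero distance exactly when x == y.
        if dxy < 0 or (dxy <= tol) != (x == y):
            return False
        # Symmetry.
        if abs(dxy - d(y, x)) > tol:
            return False
    # Triangle inequality over all triples.
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return False
    return True


print(is_metric([0, 1, 2, 5], lambda a, b: abs(a - b)))    # True
print(is_metric([0, 1, 2, 5], lambda a, b: (a - b) ** 2))  # False
```

The squared difference fails because d(0,2) = 4 exceeds d(0,1) + d(1,2) = 2, which violates the triangle inequality.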

4 Definition of similarity metric
Let s(x,y) be the similarity between two points x and y. Then the following properties hold:
- $s(x,y) = 1$ only if $x = y$ (with $0 \le s \le 1$)
- $s(x,y) = s(y,x)$ for all x and y (symmetry)
- The triangle inequality does not hold in general

6 Minkowski Family
- norm-1, City-Block, Manhattan: $L_1(x,y) = \sum_i |x_i - y_i|$
- norm-2, Euclidean: $L_2(x,y) = \left(\sum_i |x_i - y_i|^2\right)^{1/2}$
- norm-p, Minkowski: $L_p(x,y) = \left(\sum_i |x_i - y_i|^p\right)^{1/p}$
- infinity norm: $L_\infty(x,y) = \lim_{p \to \infty} \left(\sum_i |x_i - y_i|^p\right)^{1/p} = \max_i |x_i - y_i|$
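The whole family can be written as one function of p. This is a minimal sketch; the function name `minkowski` and the use of `math.inf` to select the infinity norm are choices made here for illustration.

```python
import math


def minkowski(x, y, p):
    """L_p distance between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        # Limit case p -> infinity: the largest coordinate difference.
        return max(diffs, default=0)
    return sum(v ** p for v in diffs) ** (1 / p)


print(minkowski([0, 0], [3, 4], 1))         # Manhattan: 7.0
print(minkowski([0, 0], [3, 4], 2))         # Euclidean: 5.0
print(minkowski([0, 0], [3, 4], math.inf))  # infinity norm: 4
```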

7 Set Based
- Simple matching coefficient
- Jaccard coefficient
- Extended Jaccard, Tanimoto (vector based)
- Cosine (vector based)
- Dice's coefficient
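The formulas for these coefficients did not survive in the transcript. The sketch below uses the usual textbook definitions, which is an assumption about what the slide showed; all function names are illustrative. The vector-based functions assume non-zero vectors.

```python
import math


def simple_matching(x, y):
    """Fraction of positions where two equal-length binary vectors agree."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def jaccard(a, b):
    """|A intersect B| / |A union B| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dice(a, b):
    """2 |A intersect B| / (|A| + |B|) for two collections treated as sets."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


def tanimoto(x, y):
    """Extended Jaccard: x.y / (|x|^2 + |y|^2 - x.y)."""
    xy = dot(x, y)
    return xy / (dot(x, x) + dot(y, y) - xy)


def cosine(x, y):
    """x.y / (|x| |y|)."""
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))


print(jaccard({1, 2, 3}, {2, 3, 4}))                 # 0.5
print(simple_matching([1, 0, 0, 1], [1, 1, 0, 0]))   # 0.5
```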

8 Edit Distance: Levenshtein distance
The edit distance between two strings $x = x_1 \dots x_n$ and $y = y_1 \dots y_m$ is defined as the minimum number of atomic edit operations needed to transform x into y:
- Insert: $ins(x,i,c) = x_1 x_2 \dots x_i \, c \, x_{i+1} \dots x_n$
- Delete: $del(x,i) = x_1 x_2 \dots x_{i-1} x_{i+1} \dots x_n$
- Replace: $rep(x,i,c) = x_1 x_2 \dots x_{i-1} \, c \, x_{i+1} \dots x_n$
Every edit operation o is assigned the cost $c(o) = 1$.
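With unit costs, the minimum number of operations can be computed by the standard dynamic-programming recurrence. The sketch below keeps only two rows of the table; the function name `levenshtein` is chosen here for illustration.

```python
def levenshtein(x, y):
    """Minimum number of unit-cost inserts, deletes and replaces turning x into y."""
    # prev[j] holds the distance between the processed prefix of x and y[:j].
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]  # turning x[:i] into the empty string takes i deletions
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cx
                curr[j - 1] + 1,           # insert cy
                prev[j - 1] + (cx != cy),  # replace, free if the characters match
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("MARTHA", "MARHTA"))   # 2
```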

9 Edit distances: Needleman-Wunsch distance or Sellers algorithm
- Insert
  - a character: $ins(x,i,c) = x_1 x_2 \dots x_i \, c \, x_{i+1} \dots x_n$, with cost(o) = 1
  - a gap: $ins_g(x,i,g) = x_1 x_2 \dots x_i \, g \, x_{i+1} \dots x_n$, with cost(o) = g
- Delete
  - a character: $del(x,i) = x_1 x_2 \dots x_{i-1} x_{i+1} \dots x_n$, with cost(o) = 1
  - a gap: $del_g(x,i) = x_1 x_2 \dots x_{i-1} x_{i+1} \dots x_n$, with cost(o) = g
- Replace
  - a character: $rep(x,i,c) = x_1 x_2 \dots x_{i-1} \, c \, x_{i+1} \dots x_n$, with cost(o) = 1

10 Edit distances: Jaro distance
Given two strings s and t, let
- s' = the characters in s that are common with t
- t' = the characters in t that are common with s
- $T_{s,t}$ = the number of transpositions of characters in s' relative to t'

11 Edit distances: Jaro distance, example
Let s = MARTHA and t = MARHTA
- |s'| = 6
- |t'| = 6
- $T_{s,t}$ = 2/2 = 1, since the mismatched characters are T/H and H/T
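The Jaro formula itself is missing from the transcript. The sketch below implements the widely used definition, (|s'|/|s| + |t'|/|t| + (|s'| - T)/|s'|) / 3, with a matching window of max(|s|,|t|)/2 - 1 positions; this is an assumption about the intended formula. For MARTHA and MARHTA it gives about 0.9444, not the 0.8055 quoted in the Jaro-Winkler example of this deck. That value is reproduced if the transposition term is divided by 2|s'| instead of |s'|, so the hypothetical flag `halve_transposition_term` is included to show both readings.

```python
def jaro(s, t, halve_transposition_term=False):
    """Jaro similarity; the flag selects a variant that halves the third term."""
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_flags = [False] * len(s)
    t_flags = [False] * len(t)
    # A character of s is "common" if an equal, unused character of t
    # lies within the matching window around its position.
    for i, ch in enumerate(s):
        lo = max(0, i - window)
        hi = min(len(t), i + window + 1)
        for j in range(lo, hi):
            if not t_flags[j] and t[j] == ch:
                s_flags[i] = t_flags[j] = True
                break
    s_common = [c for c, f in zip(s, s_flags) if f]
    t_common = [c for c, f in zip(t, t_flags) if f]
    m = len(s_common)
    if m == 0:
        return 0.0
    # T = half the number of positions where s' and t' disagree.
    transpositions = sum(a != b for a, b in zip(s_common, t_common)) / 2
    third = (m - transpositions) / m
    if halve_transposition_term:
        third /= 2  # assumed variant that matches the 0.8055 in the deck
    return (m / len(s) + m / len(t) + third) / 3


print(round(jaro("MARTHA", "MARHTA"), 4))        # 0.9444
print(round(jaro("MARTHA", "MARHTA", True), 4))  # 0.8056
```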

12 Edit distances: Jaro-Winkler
JWS(s,t) = Jaro(s,t) + prefixLength * PREFIXSCALE * (1.0 - Jaro(s,t))
where:
- prefixLength: the length of the common prefix at the start of the two strings
- PREFIXSCALE: a constant scaling factor that gives more favourable ratings to strings that match from the beginning, for a set prefix length

13 Edit distances: Jaro-Winkler, example
Let s = MARTHA, t = MARHTA and PREFIXSCALE = 0.1
- Jaro(s,t) = 0.8055
- prefixLength = 3
JWS(s,t) = Jaro(s,t) + prefixLength * PREFIXSCALE * (1.0 - Jaro(s,t)) = 0.8055 + 3 * 0.1 * (1 - 0.8055) = 0.86385
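The arithmetic of the example can be reproduced with a small helper that takes the Jaro score as an input, so it does not depend on which Jaro variant produced 0.8055. The function names and the cap `max_prefix=4` are assumptions: the deck only says that the prefix length is limited to "a set prefix length", and 4 is a commonly used value.

```python
def common_prefix_length(s, t, max_prefix=4):
    """Length of the shared prefix of s and t, capped at max_prefix."""
    n = 0
    for a, b in zip(s, t):
        if a != b or n == max_prefix:
            break
        n += 1
    return n


def jaro_winkler(jaro_score, s, t, prefix_scale=0.1, max_prefix=4):
    """JWS = Jaro + prefixLength * PREFIXSCALE * (1 - Jaro)."""
    prefix = common_prefix_length(s, t, max_prefix)
    return jaro_score + prefix * prefix_scale * (1.0 - jaro_score)


print(common_prefix_length("MARTHA", "MARHTA"))              # 3 ("MAR")
print(round(jaro_winkler(0.8055, "MARTHA", "MARHTA"), 5))    # 0.86385
```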

15 Basic OLAP Concepts
OLAP concerns the analysis of measurable quantities (measures)
- sales, inventory, profit, ...
Dimensions: parameters that define the context of the measures
- date, product, location, salesperson, ...
Cubes: combinations of dimensions that determine some measures
- The cube defines a multidimensional space of dimensions, with the measures being points of that space

16 Cubes for OLAP
[Figure: data cube with dimensions REGION (N, S, W), PRODUCT (Juice, Cola, Soap) and MONTH (Jan); sample cell values 10 and 13]

17 Cubes for OLAP

18 Basic OLAP Concepts
The data are considered to be stored in a multi-dimensional array, which is also called a cube or hypercube.
The cube is a group of data cells. Each cell is uniquely identified by the corresponding values of the cube's dimensions.
The contents of a cell are called measures and represent the measured values of the real world.

19 Level hierarchies for OLAP
A dimension models all the ways in which the data can be aggregated with respect to a specific parameter of their content.
- Date, Product, Location, Salesperson, ...
Each dimension has an associated hierarchy of levels of data aggregation. This means that the dimension can be viewed at several levels of granularity.
- Date: day, week, month, year, ...

20 Level Hierarchies
Level hierarchies: each dimension is organized into different levels of granularity.
The user can navigate from one level to another, creating new cubes each time.
Terminology note: the Greek term used in the original slides, "αδρομέρεια" (coarseness), is the opposite of "λεπτομέρεια" (detail); the slide stresses that "αδρομέρεια" is the correct term.

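21 Cubes & dimension hierarchies for OLAP
Dimensions: Product, Region, Date
[Figure: cube of Sales volume with axes Month, Region and Product]
Dimension hierarchies, from coarsest to most detailed level:
- Product: Industry, Category, Product
- Region: Country, Region, City, Store
- Date: Year, Quarter, Month, Week, Day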

23 Lattice
A lattice is a partially ordered set (poset) in which every pair of elements has a unique supremum and a unique infimum.
The hierarchy of levels is formally defined as a lattice $(L, <)$
- such that $L = (L_1, \dots, L_n, ALL)$ is a finite set of levels and
- $<$ is a partial order defined among the levels of L
- such that $L_1 < L_i < ALL$ for all $1 \le i \le n$.
The upper bound is always the level ALL, so that we can group all values into the single value 'all'.
The lower bound of the lattice is the most detailed level of the dimension.

25 Distances in the same level of Hierarchy
Consider a dimension D with hierarchy levels $L_1 < L_i < ALL$ and two specific values x and y such that $x, y \in L_i$.
[Figure: hierarchy with levels L_1, L_2 and ALL]

26 Distances in the same level of Hierarchy
- Explicit
- Minkowski
- Set Based
- Highway
- With respect to the detailed level
- Attribute Based

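27 Distances in the same level of Hierarchy
Explicit assignment
- $n^2$ distances for the n values of $dom(L_i)$
Minkowski family
- reduces to the Manhattan distance $|x - y|$, because a single value has only one coordinate and so $(|x - y|^p)^{1/p} = |x - y|$ for every p
Set based family
- reduced to {0, 1}, where [definition not preserved in the transcript]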

28 Distances in the same level of Hierarchy
Highway distance
- Let the values of level $L_i$ form a set of k clusters, where each cluster has a representative $r_k$
- $dist(x, y) = dist(x, r_x) + dist(r_x, r_y) + dist(y, r_y)$
- Specify $k^2$ distances $dist(r_x, r_y)$ and k distances $dist(x, r_x)$
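The highway formula can be sketched with three lookup tables. Everything concrete here is hypothetical: the city names, the numbers, the function name `highway_distance` and the table layout are made up for illustration. Returning 0 when x equals y is an added assumption, since the formula as written would give 2 * dist(x, r_x) in that case.

```python
def highway_distance(x, y, representative, to_rep, between_reps):
    """dist(x, y) = dist(x, r_x) + dist(r_x, r_y) + dist(y, r_y)."""
    if x == y:
        return 0.0  # assumption: identical values are at distance zero
    rx, ry = representative[x], representative[y]
    # Values in the same cluster share a representative, so the middle term is 0.
    hop = 0.0 if rx == ry else between_reps[frozenset((rx, ry))]
    return to_rep[x] + hop + to_rep[y]


# Hypothetical level "City" with two clusters represented by Athens and Rome.
representative = {"Athens": "Athens", "Piraeus": "Athens",
                  "Rome": "Rome", "Ostia": "Rome"}
to_rep = {"Athens": 0, "Piraeus": 10, "Rome": 0, "Ostia": 25}
between_reps = {frozenset(("Athens", "Rome")): 1050}

print(highway_distance("Piraeus", "Ostia", representative, to_rep, between_reps))   # 1085
print(highway_distance("Piraeus", "Athens", representative, to_rep, between_reps))  # 10.0
```

Storing representative pairs as `frozenset` keys keeps the table symmetric without listing each pair twice.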

29 Distances in the same level of Hierarchy
With respect to the detailed level
- [formula not preserved in the transcript]
- f is a function that picks one of the descendants
Attribute based
- For every level L there exist attributes, so that every value $v \in dom(L)$ is described by a vector $[v_1 \dots v_n]$
- Distance can be defined with respect to the attributes

31 Distances in different levels of Hierarchy
- Explicit
- $dist_1 + dist_2$
- $dist_3 + dist_4$
- With respect to the detailed level
- With respect to their least common ancestor
- Highway
- Attribute Based

32 Distances in different levels of Hierarchy
Consider a dimension D with hierarchy levels $L_1 < L_i < ALL$ and two specific values x and y such that $x \in L_x$, $y \in L_y$ and $L_x < L_y$, together with
- the ancestor of x in level $L_y$
- a descendant of y in level $L_x$
[Figure: x at level L_x and y at level L_y, with the nodes labelled x_y and y_x and the distances dist_1, dist_2, dist_3 and dist_4]

33 Distances in different levels of Hierarchy
Explicit assignment
- define $dist_{L_x,L_y}(x, y)$ for all $x \in L_x$, $y \in L_y$
$dist_1 + dist_2$
- [formula not preserved in the transcript]
- where one of the terms is a distance of two values from the same level of hierarchy
- special case: if y is an ancestor of x, then $dist_2 = 0$
[Figure: same diagram as on the previous slide]
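One possible reading of the dist_1 + dist_2 route is sketched below: dist_1 is taken as the distance from x up to its ancestor in level L_y, and dist_2 as the same-level distance from that ancestor to y. This reading is inferred from the special case (dist_2 = 0 when y is an ancestor of x) and is not stated in the transcript. The hierarchy, the unit costs and all function names are hypothetical.

```python
def ancestor_at_level(x, target_level, parent, level_of):
    """Climb from x through parent links until the target level is reached."""
    node = x
    while level_of[node] != target_level:
        node = parent[node]  # raises KeyError if no ancestor exists at that level
    return node


def cross_level_distance(x, y, parent, level_of, up_cost, same_level_dist):
    """Assumed reading: dist_1 (x to its ancestor) + dist_2 (ancestor to y)."""
    x_anc = ancestor_at_level(x, level_of[y], parent, level_of)
    dist1 = up_cost(x, x_anc)
    # Special case from the slide: y is the ancestor of x, so dist_2 = 0.
    dist2 = 0.0 if x_anc == y else same_level_dist(x_anc, y)
    return dist1 + dist2


# Hypothetical two-level hierarchy City < Country.
parent = {"Athens": "Greece", "Rome": "Italy"}
level_of = {"Athens": "City", "Rome": "City",
            "Greece": "Country", "Italy": "Country"}
up_cost = lambda child, ancestor: 1.0            # made-up constant climb cost
same_level = lambda a, b: 0.0 if a == b else 1.0  # made-up {0, 1} distance

print(cross_level_distance("Athens", "Italy", parent, level_of, up_cost, same_level))   # 2.0
print(cross_level_distance("Athens", "Greece", parent, level_of, up_cost, same_level))  # 1.0
```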

34 Distances in different levels of Hierarchy
$dist_3 + dist_4$
- [formula not preserved in the transcript]
- where one of the terms is a distance of two values from the same level of hierarchy
- special case: if y is an ancestor of x, then $dist_4 = 0$
[Figure: same diagram as on the previous slides]

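35 Distances in different levels of Hierarchy
With respect to the detailed level
- Let $x_1$ and $y_1$ be [definitions not preserved in the transcript]
- [formula not preserved in the transcript]
- where $dist(x_1, y_1)$ is a distance of two values from the same level of hierarchy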

36 Distances in different levels of Hierarchy
With respect to their common ancestor
- Let $L_z$ be the level of hierarchy where x and y have their first common ancestor
- [formula not preserved in the transcript]
- the number of "hops" needed to reach the first common ancestor
- normalized according to the height of the level
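Counting hops to the first common ancestor is straightforward on a parent map. The sketch below is an illustration only: the exact formula is missing from the transcript, so dividing the total number of hops by twice the hierarchy height is just one plausible normalization, and the hierarchy and function names are hypothetical.

```python
def hops_to_first_common_ancestor(x, y, parent):
    """Return (ancestor, hops from x, hops from y) for the first common ancestor."""
    hops_from_x = {}
    node, hops = x, 0
    while node is not None:
        hops_from_x[node] = hops
        node = parent.get(node)  # None once the top of the hierarchy is passed
        hops += 1
    node, hops = y, 0
    while node is not None:
        if node in hops_from_x:
            return node, hops_from_x[node], hops
        node = parent.get(node)
        hops += 1
    raise ValueError("x and y have no common ancestor")


def normalized_hop_distance(x, y, parent, height):
    """Assumed normalization: total hops divided by the maximum possible, 2 * height."""
    _, hx, hy = hops_to_first_common_ancestor(x, y, parent)
    return (hx + hy) / (2 * height)


# Hypothetical hierarchy City < Country < Continent < ALL (height 3).
parent = {"Athens": "Greece", "Patras": "Greece", "Rome": "Italy",
          "Greece": "Europe", "Italy": "Europe", "Europe": "ALL"}

print(hops_to_first_common_ancestor("Athens", "Italy", parent))  # ('Europe', 2, 1)
print(normalized_hop_distance("Athens", "Italy", parent, 3))     # 0.5
```

Because it works on parent links only, the same function handles values from the same level and from different levels.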

37 Distances in different levels of Hierarchy
Highway distance
- Let every level $L_i$ be clustered into $k_i$ clusters, where every cluster has its own representative $r_{k_i}$
Attribute based
- For every level L there exist attributes, so that every value $v \in dom(L)$ is described by a vector $[v_1 \dots v_n]$
- Distance can be defined with respect to the attributes

38 Types of Levels
Nominal (=, ≠)
- values hold the distinctness property
- values can be explicitly distinguished
Ordinal (<, >)
- values hold the distinctness property & the order property
- values abide by an order
Interval (+, -)
- values hold the distinctness, order & addition properties
- a unit of measurement exists
- the difference between two values is meaningful

