Β4: ΣΕΜΙΝΑΡΙΟ ΤΜΗΜΑΤΟΣ ΚΑΙ CEID SOCIAL HOUR: 'k Nearest-Neighbor (k NN) and All k Nearest-Neighbor (AkNN) Searching in High
Dimensions: Foundations, Applications and New Challenges”, Καθηγητής Σιούτας Σπύρος, ΤΜΗΥΠ
Σεμινάριο Τμήματος & CEID social hour
Ημερομηνία-χώρος: Παρασκευή 01 Νοεμβρίου, 3-5μμ, Κτίριο Β (Αίθουσα Β4)
Ομιλητής: Καθηγητής Σιούτας Σπύρος, ΤΜΗΥΠ
Τίτλος: k Nearest-Neighbor (k NN) and All k Nearest-Neighbor (AkNN) Searching in High Dimensions: Foundations, Applications and New Challenges
Περίληψη: Numerous modern applications, from social networking to astronomy, need efficient answering of k Nearest-Neighbor (k NN) queries on spatial data.
One challenging and extremely demanding variation of k NN is the All k Nearest-Neighbor Query (AkNN), or k Nearest- Neighbor Join, that takes as input two datasets and, for each object of the first one, returns the k
nearest-neighbors from the second one. It is a combination of the k nearest-neighbor and join queries and is computationally demanding. Especially, when the datasets involved fall in the category of Big Data, a single machine cannot efficiently process it. Only in the last few years, papers proposing solutions for distributed computing environments have appeared in the literature. In this talk, we focus on both, the classic k Nearest-Neighbor (k NN) as well as the All k Nearest-Neighbor (AkNN) Queries. The later requires parallel and distributed algorithms and for this reason we use the Apache Hadoop framework. More specifically, we focus on an algorithm that was recently presented in the literature and propose improvements to tackle three major challenges that distributed processing faces: improvement of load balancing (we implement an adaptive partitioning scheme based on Quadtrees), acceleration of local processing (we prune points during calculations by utilizing plane-sweep processing), and reduction of network traffic (we restructure and reduce the output size of the most demanding phase of computation). Moreover, by using real 2D and 3D datasets, we experimentally study the effect of each improvement and their combinations on the performance of this literature algorithm. Experiments show that by carefully addressing the three aforementioned issues, one can achieve significantly better performance. Thereby, we conclude to a new scalable algorithm that adapts to the data distribution and significantly outperforms its predecessor. Moreover, we present an experimental comparison of our algorithm against other well-known MapReduce algorithms for the same query and show that these algorithms are also significantly outperformed.
Σχετικά με τον ομιλητή: https://www.ceid.upatras.gr/webpages/faculty/sioutas/
Μπορεί να δημοσιευθούν φωτογραφίες στη σελίδα του Τμήματος και στη σελίδα στο Facebook.
Ευχαριστούμε τη Citrix για την ευγενική υποστήριξη των εκδηλώσεων.