Posts

Showing posts from March, 2010

KNN Algorithm and KD-Trees

Image
I am taking a Pattern Recognition course this semester, it's not really one of my favourite topics but I am doing OK. We had an assignment last week to test and compare different classification algorithms. One of the algorithms we were asked to implement was KNN (K Nearest Neighbours) . We were asked to find a way that makes searching for the K Nearest Neighbours faster; that's what this post about. The problem briefly is that: Given two sets of K dimensional points find the N nearest points (using Euclidean distance or any other measurement) to a K dimensional point X. The Naive Solution: Go through the list, find the distance between X and all the points in the two sets, get the smallest N elements in the distances list... Simple enough. As part of our assignment we were given a dataset of 100,000 points which proved this algorithm to be very slow. My Naive WRONG Solution: I thought that it was easy to index the points in a way that makes it efficient to find the ne