Showing posts from March, 2010

KNN Algorithm and KD-Trees

I am taking a Pattern Recognition course this semester, it's not really one of my favourite topics but I am doing OK. We had an assignment last week to test and compare different classification algorithms. One of the algorithms we were asked to implement was KNN (K Nearest Neighbours). We were asked to find a way that makes searching for the K Nearest Neighbours faster; that's what this post about.

The problem briefly is that: Given two sets of K dimensional points find the N nearest points (using Euclidean distance or any other measurement) to a K dimensional point X.

The Naive Solution:
Go through the list, find the distance between X and all the points in the two sets, get the smallest N elements in the distances list... Simple enough.
As part of our assignment we were given a dataset of 100,000 points which proved this algorithm to be very slow.

My Naive WRONG Solution:
I thought that it was easy to index the points in a way that makes it efficient to find the nearest points…