K Nearest Neighbours (kNN) is a basic classification algorithm. The idea probably grew out of the Rote classifier, which is about as simple as the point system in 'Whose Line Is It Anyway?': the system memorizes the whole training set and classifies only those items whose values exactly match a training example. The obvious disadvantage is that many objects remain unclassified. The "next generation" of the concept classifies a new object using the value of the single nearest point in the dataset. Compared to the previous approach this is a huge improvement, but the system is still vulnerable to noise and outliers.
kNN is (compared to the previous strategies) a bit more sophisticated. The algorithm finds a group of k objects in the training set that are closest to the new object under some distance measure, and assigns the new object to a class (cluster) based on those neighbours, respecting any weights given to them. The important issues are the choice of k, the distance measure, and the weighting scheme for the neighbours.
These parameters strongly influence the results, and I am going to write another post discussing them in more detail.
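To give a feel for the weighting issue, here is a minimal sketch of one common scheme (my own example, not specified above): each neighbour votes with weight 1/distance, so closer points count for more than distant ones.

```python
# Distance-weighted kNN voting sketch. All names here are illustrative.
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def weighted_knn(train, point, k=3, eps=1e-9):
    """train: list of (features, label) pairs."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], point))[:k]
    weights = Counter()
    for features, label in nearest:
        # Each neighbour's vote is scaled by the inverse of its distance;
        # eps avoids division by zero for exact matches.
        weights[label] += 1.0 / (euclidean(features, point) + eps)
    return weights.most_common(1)[0][0]

train = [((0.0, 0.0), "A"), ((1.0, 1.0), "B"), ((1.2, 1.0), "B")]
print(weighted_knn(train, (0.1, 0.1), k=3))  # → A
```

Note that with k=3 a plain majority vote would pick "B" here (two B neighbours against one A), but the single very close A point outweighs both distant B points.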
The procedure goes: compute the distance from the new object to every object in the training set, select the k closest ones, and assign the class that wins the (possibly weighted) vote among those neighbours.
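The steps above can be sketched in a few lines of Python (function and variable names are mine, chosen for illustration):

```python
# Minimal kNN classifier: distance computation, k-selection, majority vote.
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, new_point, k=3):
    """train: list of (features, label) pairs."""
    # 1. Sort the training set by distance to the new object.
    neighbours = sorted(train, key=lambda item: euclidean(item[0], new_point))
    # 2. Keep the k closest ones.
    k_nearest = neighbours[:k]
    # 3. Majority vote among their labels.
    votes = Counter(label for _, label in k_nearest)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.9), "B")]
print(knn_classify(train, (1.1, 0.9), k=3))  # → A
```

Sorting the whole training set for every query already hints at the cost problem discussed next; in practice a partial selection (or a spatial index) does the same job faster.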
Although building a kNN model is not a difficult task, the cost of classification is relatively high. Comparing every new object against the whole training set (lazy learning) is responsible for this, and it is especially visible on large datasets. There are techniques that reduce the amount of computation, ranging from simply editing the training set (sometimes the results are even better than classification with the full database) to proximity graphs.
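One simple editing technique is Wilson editing: drop every training point that disagrees with the majority vote of its own k nearest neighbours, which removes noisy points near class boundaries. A sketch, under the assumption of plain Euclidean kNN (the helper names are mine):

```python
# Wilson editing sketch: remove training points misclassified by their
# own k nearest neighbours. Illustrative code, not a tuned implementation.
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(train, point, k=3):
    nearest = sorted(train, key=lambda item: euclidean(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def wilson_edit(train, k=3):
    kept = []
    for i, (features, label) in enumerate(train):
        # Classify each point using all *other* training points.
        others = train[:i] + train[i + 1:]
        if knn_classify(others, features, k) == label:
            kept.append((features, label))
    return kept

train = [((0.0, 0.0), "A"), ((0.1, 0.1), "A"), ((0.2, 0.0), "A"),
         ((5.0, 5.0), "B"), ((5.1, 5.0), "B"), ((5.0, 5.2), "B"),
         ((0.1, 0.0), "B")]  # last point is label noise inside the A cluster
edited = wilson_edit(train, k=3)
print(len(train), "->", len(edited))  # → 7 -> 6
```

The noisy "B" point sitting inside the "A" cluster is voted out, so later queries near that cluster are no longer pulled toward the wrong class, and every query compares against a smaller set.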
Sources: [Top 10 Algorithms in Data Mining, Springer, 2008]
admin November 18th, 2016