CRISP-DM stands for Cross-Industry Standard Process for Data Mining. It is a methodology for running data mining projects: data exploration, like other business processes, needs a general guide to follow.
The basic methodology is split into four parts:
Data mining is, in general, a mechanism that lets us make better decisions in the future by analysing past data (in a very fancy way). There are two points in the data mining process where we have to be careful: when we discover a pattern that is false, and when the pattern is true but useless. The first is a direct danger, because business decisions made on a false basis simply cost money (sometimes an awful lot of money). The second hides an additional trap: it only becomes clear that the rule is useless after implementing it, when the system simply fails the reality check. Sticking to the methodology gives us a mechanism to minimize the probability of making such mistakes.
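According to crisp-dm.org, CRISP-DM is an open methodology meant to keep the industrial data mining process close to a general problem-solving strategy for business and research. The process is divided into 6 steps: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.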
It is very important to note that each step is closely tied to the results of the previous one, and that it is necessary to jump between steps several times (not only in the order presented above!). It is also natural for the result of one step to send the project back to its starting point, forcing us to re-evaluate some opinions or earlier designs.
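The back-and-forth movement between steps can be sketched in a few lines of Python. This is only an illustration, not part of the CRISP-DM specification: the phase names follow the standard CRISP-DM reference model, while `Phase`, `ALLOWED`, `is_valid_path` and `count_backward_moves` are hypothetical names made up for this example. The transition table is an assumption based on the arrows in the commonly reproduced CRISP-DM diagram; real projects may jump in other ways too.

```python
from enum import Enum


class Phase(Enum):
    """The six CRISP-DM phases, numbered in their nominal order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Assumed transitions, taken from the arrows of the usual CRISP-DM diagram.
ALLOWED = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING, Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: set(),
}


def is_valid_path(path):
    """Return True if every consecutive pair of phases is an allowed move."""
    return all(nxt in ALLOWED[cur] for cur, nxt in zip(path, path[1:]))


def count_backward_moves(path):
    """Count how often the project steps back to an earlier phase."""
    return sum(1 for cur, nxt in zip(path, path[1:]) if nxt.value < cur.value)


# A made-up project history: two local loops and one full restart
# after evaluation, before the model is finally deployed.
example_path = [
    Phase.BUSINESS_UNDERSTANDING, Phase.DATA_UNDERSTANDING,
    Phase.BUSINESS_UNDERSTANDING, Phase.DATA_UNDERSTANDING,
    Phase.DATA_PREPARATION, Phase.MODELING,
    Phase.DATA_PREPARATION, Phase.MODELING,
    Phase.EVALUATION, Phase.BUSINESS_UNDERSTANDING,
    Phase.DATA_UNDERSTANDING, Phase.DATA_PREPARATION,
    Phase.MODELING, Phase.EVALUATION, Phase.DEPLOYMENT,
]

if __name__ == "__main__":
    print(is_valid_path(example_path))
    print(count_backward_moves(example_path))
```

In this invented history the project visits fifteen phases to get through six steps, which is the point of the paragraph above: the nominal order is a guide, not a straight line.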
[M. Berry, G. Linoff, „Data Mining Techniques”, Wiley, 2004.]
[D. Larose, „Odkrywanie wiedzy z danych”, PWN, 2006, 5.]
admin November 21st, 2016