CRISP-DM stands for CRoss-Industry Standard Process for Data Mining. It is a methodology for running data mining projects: data exploration, like other business processes, needs a general guide to follow.

The basic methodology is split into four parts:

  1. problem identification
  2. data preprocessing (turning data into information, whatever that means)
  3. data exploration
  4. evaluation (result examination)

Data mining is, in general, a mechanism that lets us make better decisions in the future by analysing past data (in a very fancy way). There are two situations in the data mining process where we have to be careful: when we discover a pattern that is false, and when the pattern is true but useless. The first is a direct danger, because business decisions made on a false basis simply cost money (sometimes an awful lot of money). The second has an additional, hidden trap, because it only becomes clear that the rule is useless after it has been implemented: the system simply doesn't pass the reality check. Sticking to the methodology gives us a mechanism to minimize the probability of making such mistakes.
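To see how easily a false pattern can show up, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the data is pure random noise, and the names `accuracy` and `best_rule` are hypothetical helpers, not part of any methodology or library. The "discovered" rule is simply the noise column that happens to agree with the target most often on past data; it is then checked against rows it has never seen.

```python
import random


def accuracy(feature, target):
    """Fraction of rows where the feature value equals the target value."""
    return sum(f == t for f, t in zip(feature, target)) / len(target)


def best_rule(n_rows=100, n_features=200, seed=0):
    """Pick the best-looking noise column on past data, then re-check it on new data.

    Returns (accuracy on the past half, accuracy on the unseen half).
    """
    rng = random.Random(seed)
    # Target and features are independent coin flips, so no real pattern exists.
    target = [rng.randint(0, 1) for _ in range(n_rows)]
    features = [
        [rng.randint(0, 1) for _ in range(n_rows)] for _ in range(n_features)
    ]
    half = n_rows // 2
    # "Discover" the pattern: the column that matches the target most often
    # on the first half of the rows (our past data).
    best = max(features, key=lambda f: accuracy(f[:half], target[:half]))
    return (
        accuracy(best[:half], target[:half]),
        accuracy(best[half:], target[half:]),
    )


past_acc, new_acc = best_rule()
print(f"accuracy on past data: {past_acc:.2f}, on new data: {new_acc:.2f}")
```

Because the data is noise, any score above 0.5 on past data is a false pattern by construction, and the score on new rows has no reason to stay that high. This is a toy version of the reality check, done before the decision costs money rather than after.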

According to crisp-dm.org, CRISP-DM is an open methodology meant to keep the industrial data mining process close to a general problem-solving strategy for business and research. The process is divided into 6 steps:

  1. business problem and condition understanding
  2. data understanding
  3. data preparation
  4. modelling
  5. evaluation
  6. implementation

It is very important to notice that each step is strictly connected with the results of the previous one, and that it is necessary to jump between steps several times (not only in the order presented above!). It is also natural for the result of one step to send us back to the starting point of the project to reevaluate some opinions or preliminary designs.
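The back-and-forth between steps can be sketched as a small Python model. This is only an illustration under a stated assumption: the rule in `is_allowed` (move one step forward, or go back to any earlier step) is a simplification chosen for this example, not an official CRISP-DM rule, and the names `Phase`, `is_allowed` and `walk` are hypothetical.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The six steps, numbered as in the list above."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELLING = 4
    EVALUATION = 5
    IMPLEMENTATION = 6


def is_allowed(current, target):
    """Assumed rule: advance exactly one step, or return to any earlier step."""
    return target == current + 1 or target < current


def walk(path):
    """Check every jump in a project history and return the step names."""
    for current, target in zip(path, path[1:]):
        if not is_allowed(current, target):
            raise ValueError(
                f"cannot jump from {current.name} to {target.name}"
            )
    return [phase.name for phase in path]


# A hypothetical project history with two returns to earlier steps.
history = [
    Phase.BUSINESS_UNDERSTANDING,
    Phase.DATA_UNDERSTANDING,
    Phase.DATA_PREPARATION,
    Phase.MODELLING,
    Phase.DATA_PREPARATION,        # the model needs differently prepared data
    Phase.MODELLING,
    Phase.EVALUATION,
    Phase.BUSINESS_UNDERSTANDING,  # evaluation sends us back to the start
    Phase.DATA_UNDERSTANDING,
]

for name in walk(history):
    print(name)
```

Skipping ahead, for example from data understanding straight to evaluation, raises a `ValueError` in this sketch, which reflects the point that each step depends on the results of the previous one.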

[M. Berry, G. Linoff, „Data Mining Techniques”, Wiley, 2004.]

[D. Larose, „Odkrywanie wiedzy z danych”, PWN, 2006, 5.]

November 21st, 2016

Posted In: CRISP-DM, web content, web mining, YouTube
