Web content mining is the part of the data mining domain that is closest to the classic definition of data mining (DM). Its main areas correspond to similar areas of classic data mining:

  • automatic content extraction from web pages
  • integration of the information
  • opinion and reviews extraction
  • knowledge synthesis
  • noise detection and segmentation

In short, the web content mining areas listed above are solutions to more or less complicated problems connected with automating web usage. They improve several aspects of everyday Internet life, both technical and non-technical.
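To make the first and last items of the list a bit more concrete, here is a minimal sketch of content extraction with very naive noise filtering, using only Python's standard `html.parser` module. The class name `TextExtractor`, the helper `extract_text`, the `NOISE_TAGS` set and the sample page are all my own assumptions for illustration; real noise detection and page segmentation are far more involved than skipping a few tags.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping tags treated here as noise."""

    # Assumption: these tags rarely carry the main content of a page.
    NOISE_TAGS = {"script", "style", "nav", "footer"}

    def __init__(self):
        super().__init__()
        self._noise_depth = 0  # > 0 while we are inside a noise tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.NOISE_TAGS:
            self._noise_depth += 1

    def handle_endtag(self, tag):
        if tag in self.NOISE_TAGS and self._noise_depth > 0:
            self._noise_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep only non-empty text found outside noise tags.
        if text and self._noise_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the page text with the assumed noise tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample page, made up for this example.
sample = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<nav>Home | About</nav>"
    "<p>Great phone, battery lasts two days.</p>"
    "<script>track()</script>"
    "</body></html>"
)

print(extract_text(sample))
```

The navigation bar, the stylesheet and the script are dropped, and only the review sentence is kept. A fixed tag list is the simplest possible heuristic; it is only meant to show what "extraction" and "noise" mean in practice.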

Web mining is, broadly, a branch of data mining. Before introducing web mining, I want to take one step back and present some thoughts about data mining.

Data mining, or data exploration, is a set of techniques used to automatically discover non-trivial relations, patterns and schemes in large data collections. In other words, we are looking for deeply hidden knowledge in very large datasets (in the web mining case, the Internet), and we only accept automatic solutions. Why? For better understanding: with such a mechanism in place, we can ask much more difficult questions than we could with, for example, plain SQL queries.

At this point, we can say that web mining is data mining with the Internet as the dataset.

Let’s take a short look at the applications of web mining:

  • data classification (e.g. customers’ sentiment, reviews…)
  • natural language processing (NLP, not to be confused with neuro-linguistic programming)
  • web personalization
  • knowledge management
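
As an illustration of the first application, here is a toy sketch of sentiment classification for reviews. It simply counts words from two tiny hand-made lexicons; the word lists `POSITIVE` and `NEGATIVE` and the function name `classify_sentiment` are assumptions made up for this example, and real systems typically rely on trained models rather than fixed word lists.

```python
import re

# Assumption: tiny hand-made lexicons, for illustration only.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}


def classify_sentiment(review):
    """Label a review by counting positive and negative lexicon words."""
    words = re.findall(r"[a-z']+", review.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for review in [
    "Great camera, I love it",
    "Terrible battery and poor screen",
    "It arrived on Monday",
]:
    print(review, "->", classify_sentiment(review))
```

Even this crude counter shows the general shape of the task: text collected from the web goes in, and a label that can be aggregated or queried comes out.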


