A priori is algorithm used in affinity analysis. A set of rules is generated, usually implications, that describe dataset. Finding frequent datasets from the transaction database is a popular task among data mining appliances, which isn’t as simple as it initially seems. The reason is computational complexity, which goes extremely extremely high when it comes to very large databases (I like the fancy way they described it in Top 10 Algorithm in DM: combinatorial explosion).

The idea of A priori is: find frequent itemsets (frequent means one with previously assigned level of support) and then generate rules that comply previously assigned level of confidence. There are candidate itemsets generated and they are the base to find n-element frequent itemsets (1st step in procedure is to find one-element frequent itemsets, and then repeat, eliminating itemsets which support is not sufficient). the procedure of generating candidate and frequent sets is repeated simply for the possible number. The main point exploits monotonicity: “if an itemset is not frequent, any of its superset is never frequent” (again Top 10 Algo in DM). Smart way to eliminate itemsets.

A priori is one of the most important algorithms in data mining. The other ideas to make it even more efficient are e.g. the new ways to create candidate itemsets – hashing techniques (smaller candidate itemesets), partitioning (divide the problem into smaller ones and explore them separately – if only real-life problems work this way!) or sampling. Important improvement of A priori algorithm is FP-growth algorithm, which supports compression (without losing important information) and then partitioning.

Despite of the fact A priori is rather simple, easy implementation and proper results make it serious solution in many problems.

admin November 25th, 2016

Posted In: web content, web mining, YouTube