Association rules are implications of the form X => Y, where X and Y are sets of items. Every association rule X => Y has a support and a confidence.
The association rule mining process consists of two main stages. In the first stage, all frequent itemsets (Frequent Itemsets) are found in the data collection. In the second stage, association rules (Association Rules) are generated from these frequent itemsets.
The first stage of association rule mining finds all frequent itemsets (also called Large Itemsets) in the original data collection.
"Frequent" means that an itemset occurs often enough relative to the total number of records. The relative frequency with which an itemset occurs is called its support. Taking a 2-itemset containing the two items A and B as an example, the support of {A, B} is given by formula (1):

support({A, B}) = (number of records containing both A and B) / (total number of records)    (1)

If the support is greater than or equal to the chosen minimum support threshold (Minimum Support), {A, B} is called a frequent itemset.
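The support calculation can be sketched in a few lines of Python. This is only an illustration: the transaction data, the function name support and the threshold of 0.5 are assumptions made for the example, not part of the original article.

```python
# Hypothetical toy data: each record is the set of items in one transaction.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "D"},
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for record in records if itemset <= record)
    return hits / len(records)


min_support = 0.5  # assumed threshold, chosen only for illustration
s_ab = support({"A", "B"}, transactions)  # {A, B} appears in 3 of the 5 records
is_frequent = s_ab >= min_support
print(s_ab, is_frequent)
```

Here {A, B} occurs in three of the five records, so it meets the assumed threshold and counts as a frequent 2-itemset.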
A k-itemset that satisfies the minimum support is called a frequent k-itemset (Frequent k-itemset), generally written as Large k or Frequent k. The algorithm then generates Large k+1 from the itemsets in Large k, and repeats this until no larger frequent itemsets can be found.
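The level-by-level search from Large k to Large k+1 might look roughly like the following Python sketch. It follows the usual Apriori-style join and prune steps, which the article does not name explicitly, so treat that choice, along with the toy data and the function name frequent_itemsets, as assumptions.

```python
from itertools import combinations

# Hypothetical toy data: each record is the set of items in one transaction.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "D"},
]


def frequent_itemsets(records, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(records)

    def support(itemset):
        return sum(1 for r in records if itemset <= r) / n

    def keep_frequent(candidates):
        kept = {}
        for candidate in candidates:
            s = support(candidate)
            if s >= min_support:
                kept[candidate] = s
        return kept

    # Large 1: frequent single items.
    current = keep_frequent({frozenset([item]) for r in records for item in r})
    result = {}
    k = 1
    while current:
        result.update(current)
        # Join: unite pairs of frequent k-itemsets that differ in one item.
        joined = {a | b for a in current for b in current if len(a | b) == k + 1}
        # Prune: every k-subset of a candidate must itself be frequent.
        pruned = {
            c for c in joined
            if all(frozenset(sub) in current for sub in combinations(c, k))
        }
        current = keep_frequent(pruned)  # Large k+1
        k += 1
    return result


for itemset, s in frequent_itemsets(transactions, 0.5).items():
    print(sorted(itemset), s)
```

The loop stops as soon as a level produces no frequent itemsets, which matches the stopping condition described above.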
The second stage of association rule mining generates the association rules (Association Rules). Candidate rules are built from the frequent k-itemsets found in the previous stage. A minimum confidence threshold (Minimum Confidence) is then applied: if the confidence of a candidate rule meets the minimum confidence, the rule is called an association rule.
For example, the confidence of the rule A => B generated from the frequent itemset {A, B} is given by formula (2):

confidence(A => B) = support({A, B}) / support({A})    (2)

If the confidence is greater than or equal to the minimum confidence, A => B is called an association rule.
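As a rough sketch of this second stage, the Python below computes the confidence of each rule that can be formed by splitting a frequent itemset into an antecedent and a consequent. The data, the function names and the minimum confidence of 0.7 are assumptions for illustration only.

```python
from itertools import combinations

# Hypothetical toy data: each record is the set of items in one transaction.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "D"},
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for r in records if itemset <= r) / len(records)


def confidence(antecedent, consequent, records):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    # The antecedent comes from a frequent itemset, so its support is non-zero.
    return support(both, records) / support(antecedent, records)


def rules_from_itemset(itemset, records, min_confidence):
    """List (antecedent, consequent, confidence) for every split of `itemset`
    whose confidence meets min_confidence."""
    itemset = frozenset(itemset)
    found = []
    for size in range(1, len(itemset)):
        for left in combinations(sorted(itemset), size):
            antecedent = frozenset(left)
            consequent = itemset - antecedent
            c = confidence(antecedent, consequent, records)
            if c >= min_confidence:
                found.append((antecedent, consequent, c))
    return found


for lhs, rhs, c in rules_from_itemset({"A", "B"}, transactions, 0.7):
    print(sorted(lhs), "=>", sorted(rhs), round(c, 2))
```

In this toy data A appears in four records and {A, B} in three, so the confidence of A => B is about 0.75 and the rule passes the assumed threshold.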
Based on the categories of variables processed in the rules
Based on the variables they handle, association rules can be divided into Boolean and numerical types. The values handled by Boolean association rules are discrete and categorical, and the rules show the relationships between these variables. Numerical association rules can be combined with multi-dimensional or multi-layer association rules to handle numeric fields, either by dividing the values into ranges dynamically or by processing the original data directly. Numerical association rules can of course also include categorical variables. For example: Gender = "Female" => Occupation = "Secretary" is a Boolean association rule; Gender = "Female" => avg (income) = 2300 involves income, which is numeric, so it is a numerical association rule.
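The difference between the two example rules can be made concrete with a small Python sketch. The records and the helper name matching are invented for illustration; the incomes were chosen so that the average happens to equal the 2300 used in the example above.

```python
from statistics import mean

# Hypothetical records with one numeric field (Income) and two categorical ones.
people = [
    {"Gender": "Female", "Occupation": "Secretary", "Income": 2000},
    {"Gender": "Female", "Occupation": "Secretary", "Income": 2600},
    {"Gender": "Male", "Occupation": "Engineer", "Income": 3000},
    {"Gender": "Female", "Occupation": "Engineer", "Income": 2300},
]


def matching(records, field, value):
    """Records whose `field` equals `value`."""
    return [r for r in records if r[field] == value]


women = matching(people, "Gender", "Female")

# Boolean rule: Gender = "Female" => Occupation = "Secretary".
# Both sides are categorical, so the rule is scored by counting records.
boolean_confidence = len(matching(women, "Occupation", "Secretary")) / len(women)

# Numerical rule: Gender = "Female" => avg(income).
# The right-hand side is an aggregate over a numeric field.
avg_income = mean([r["Income"] for r in women])

print(boolean_confidence, avg_income)
```

The Boolean rule only needs counts of matching records, while the numerical rule has to work with the values of the numeric field themselves.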
Based on the abstraction level of data in rules
Based on the abstraction level of the data in the rules, association rules can be divided into single-layer and multi-layer association rules. Single-layer association rules ignore the fact that the actual data has several different levels; multi-layer association rules take the multi-level nature of the data fully into account. For example: IBM desktop => Sony printer is a single-layer association rule on detailed data; desktop => Sony printer is a multi-layer association rule between a higher level and a detail level.
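One common way to support multi-layer rules is to add each item's higher-level categories to the transaction before counting. The Python sketch below assumes a hypothetical item hierarchy (the parent mapping) and hypothetical transactions; only the IBM desktop, desktop and Sony printer items come from the example above.

```python
# Hypothetical item hierarchy: detailed item -> higher-level category.
parent = {
    "IBM desktop": "desktop",
    "Dell desktop": "desktop",
    "Sony printer": "printer",
    "HP printer": "printer",
}

# Hypothetical transactions recorded at the detail level.
transactions = [
    {"IBM desktop", "Sony printer"},
    {"Dell desktop", "Sony printer"},
    {"IBM desktop", "HP printer"},
    {"Dell desktop"},
]


def extend_with_ancestors(record, hierarchy):
    """Return the record plus every higher-level category of its items."""
    extended = set(record)
    for item in record:
        node = item
        while node in hierarchy:
            node = hierarchy[node]
            extended.add(node)
    return extended


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for r in records if itemset <= r) / len(records)


extended = [extend_with_ancestors(r, parent) for r in transactions]

# Single-layer: both items are at the detail level.
single_layer = support({"IBM desktop", "Sony printer"}, extended)
# Multi-layer: "desktop" is a higher level, "Sony printer" a detail level.
multi_layer = support({"desktop", "Sony printer"}, extended)
print(single_layer, multi_layer)
```

In this toy data the generalized itemset has higher support than the detailed one, which is why an itemset can be frequent at a higher level even when its detailed versions are not.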
Based on the dimensionality of the data involved in the rules
Based on the dimensions of the data involved, association rules can be divided into single-dimensional and multi-dimensional rules. Single-dimensional association rules involve only one dimension of the data, such as the items purchased by a user; multi-dimensional association rules involve several dimensions. In other words, single-dimensional association rules describe relationships within a single attribute, while multi-dimensional association rules describe relationships between different attributes. For example: Beer => Diapers involves only the items purchased by the user; Gender = "Female" => Occupation = "Secretary" involves information from two fields, so it is an association rule over two dimensions.
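A simple way to handle multi-dimensional data with the same itemset machinery is to encode each field and value as a single "field=value" item. The Python sketch below shows this under assumed data; the encoding and the helper names to_items and dimensions are illustrative choices, not something the article prescribes.

```python
# Hypothetical records with two attributes (dimensions).
records = [
    {"Gender": "Female", "Occupation": "Secretary"},
    {"Gender": "Female", "Occupation": "Engineer"},
    {"Gender": "Male", "Occupation": "Engineer"},
    {"Gender": "Female", "Occupation": "Secretary"},
]


def to_items(record):
    """Encode each field of a record as one 'field=value' item."""
    return {f"{field}={value}" for field, value in record.items()}


def dimensions(itemset):
    """Set of fields (dimensions) that an encoded itemset refers to."""
    return {item.split("=", 1)[0] for item in itemset}


transactions = [to_items(r) for r in records]

rule_items = {"Gender=Female", "Occupation=Secretary"}
rule_support = sum(1 for t in transactions if rule_items <= t) / len(transactions)
is_multi_dimensional = len(dimensions(rule_items)) > 1
print(rule_support, sorted(dimensions(rule_items)), is_multi_dimensional)
```

A rule such as Beer => Diapers would instead draw all of its items from one field, so its set of dimensions would have a single element.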