Home >Database >Mysql Tutorial >How to use MySQL database for association rule mining?

How to use MySQL database for association rule mining?

PHPz
PHPzOriginal
2023-07-12 20:06:07833browse

How to use MySQL database for association rule mining?

Introduction:
Association rule mining is a data mining technology used to discover the correlation between items in the data set. MySQL is a widely used relational database management system with powerful data processing and query functions. This article will introduce how to use MySQL database for association rule mining, including data preparation, association rule mining algorithm, SQL statement implementation, and code examples.

1. Data preparation
Before mining association rules, you first need to prepare a suitable data set. The data set is the basis for association rule mining, which contains the transactions and item sets that need to be mined. In MySQL, data sets can be stored by creating data tables. For example, assuming we want to mine association rules in shopping basket data, we can create a data table named "transactions" to store each customer's shopping records, where each record contains multiple purchases by a customer.

CREATE TABLE transactions (
customer_id INT,
item_id INT
);

Then insert the shopping basket data into the data table:

INSERT INTO transactions (customer_id, item_id) VALUES
(1, 101),
(1, 102),
(1, 103),
(2, 101),
(2, 104),
(3, 102),
(3, 105),
(4, 101),
(4, 103),
(4, 104);

2. Association rule mining algorithm
Common association rule mining algorithms include Apriori algorithm and FP-Growth algorithm. The Apriori algorithm is an iterative algorithm based on candidate sets that discovers frequent item sets and association rules by gradually generating candidate sets and calculating support thresholds. FP-Growth algorithm is a prefix tree-based algorithm that can efficiently mine frequent item sets and association rules. In MySQL, we can use SQL statements to implement these two algorithms.

3. SQL statement implementation

  1. Apriori algorithm
    The Apriori algorithm includes two steps: frequent item set generation and association rule generation. First, generate frequent item sets through the following SQL statement:

SELECT item_id, COUNT(*) AS support
FROM transactions
GROUP BY item_id
HAVING support >= min_support;

Among them, "item_id" is the item in the item set, "support" is the support of the item set, and "min_support" is the set minimum support threshold. This SQL statement will return frequent itemsets that meet the minimum support requirements.

Then, generate association rules through the following SQL statement:

SELECT t1.item_id AS antecedent, t2.item_id AS consequent,

   COUNT(*) / (SELECT COUNT(*) FROM transactions) AS confidence

FROM transactions AS t1, transactions AS t2
WHERE t1.item_id != t2.item_id
GROUP BY t1.item_id, t2.item_id
HAVING confidence >= min_confidence;

Among them, "antecedent" is the preceding item of the rule , "consequent" is the consequent of the rule, "confidence" is the confidence of the rule, and "min_confidence" is the minimum confidence threshold set. This SQL statement will return association rules that meet the minimum confidence requirements.

  1. FP-Growth algorithm
    FP-Growth algorithm mines frequent item sets and association rules by building a prefix tree. In MySQL, the FP-Growth algorithm can be implemented using temporary tables and user-defined variables.

First, create a temporary table to store the frequent itemset of the item:

CREATE TEMPORARY TABLE frequent_items (
item_id INT,
support INT
);

Then, generate frequent itemsets through the following SQL statement:

INSERT INTO frequent_items
SELECT item_id, COUNT(*) AS support
FROM transactions
GROUP BY item_id
HAVING support >= min_support;

Next, create a user-defined variable to store the set of frequent items:

SET @frequent_items = '';

Then, Generate association rules through the following SQL statement:

SELECT t1.item_id AS antecedent, t2.item_id AS consequent,

   COUNT(*) / (SELECT COUNT(*) FROM transactions) AS confidence

FROM transactions AS t1, transactions AS t2
WHERE t1.item_id ! = t2.item_id
AND FIND_IN_SET(t1.item_id, @frequent_items) > 0
AND FIND_IN_SET(t2.item_id, @frequent_items) > 0
GROUP BY t1.item_id, t2.item_id
HAVING confidence >= min_confidence;

Finally, update the user-defined variables through the following SQL statement:

SET @frequent_items = (SELECT GROUP_CONCAT(item_id) FROM frequent_items);

4. Code Example
The following is a code example for association rule mining using MySQL database:

--Create data table
CREATE TABLE transactions (
customer_id INT,
item_id INT
);

--Insert shopping basket data
INSERT INTO transactions (customer_id, item_id) VALUES
(1, 101),
(1, 102),
(1, 103),
(2, 101),
(2, 104),
(3, 102),
(3, 105),
(4, 101) ,
(4, 103),
(4, 104);

-- Apriori algorithm
-- Generate frequent item sets
SELECT item_id, COUNT(*) AS support
FROM transactions
GROUP BY item_id
HAVING support >= 2;

-- Generate association rules
SELECT t1.item_id AS antecedent, t2.item_id AS consequent,

   COUNT(*) / (SELECT COUNT(*) FROM transactions) AS confidence

FROM transactions AS t1, transactions AS t2
WHERE t1.item_id != t2.item_id
GROUP BY t1.item_id, t2.item_id
HAVING confidence >= 0.5;

-- FP-Growth algorithm
-- Create temporary table
CREATE TEMPORARY TABLE frequent_items (
item_id INT,
support INT
);

-- Generate frequent itemsets
INSERT INTO frequent_items
SELECT item_id, COUNT(*) AS support
FROM transactions
GROUP BY item_id
HAVING support >= 2;

--Create user-defined variables
SET @frequent_items = '';

--Generate association rules
SELECT t1.item_id AS antecedent, t2.item_id AS consequent,

   COUNT(*) / (SELECT COUNT(*) FROM transactions) AS confidence

FROM transactions AS t1, transactions AS t2
WHERE t1.item_id != t2.item_id
AND FIND_IN_SET(t1.item_id, @frequent_items) > 0
AND FIND_IN_SET(t2.item_id, @frequent_items) > 0
GROUP BY t1.item_id, t2.item_id
HAVING confidence >= 0.5;

Conclusion:
Through the introduction of this article, we understand how to use MySQL Database for association rule mining. Both the Apriori algorithm and the FP-Growth algorithm can be implemented through SQL statements. I hope this article will be helpful to you when using MySQL for association rule mining.

The above is the detailed content of How to use MySQL database for association rule mining?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn