What are the six steps of data mining?
Data mining is the non-trivial process of extracting valid, novel, potentially useful, and ultimately understandable patterns from large amounts of data. The steps are:
1. Define the problem;
2. Prepare data;
3. Browse data;
4. Generate the model;
5. Browse and verify the model;
6. Deploy and update the model.
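As a rough illustration, the six steps can be walked through on a toy problem. The sketch below is not a prescribed workflow: the data points, the train/hold-out split, and the choice of a least-squares line as the "model" are all assumptions made for the example.

```python
from statistics import mean

# 1. Define the problem: predict y from x (hypothetical toy data).
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.0), (5.0, 9.8), (6.0, 12.1)]

# 2. Prepare data: keep the last two points aside for verification.
train, holdout = data[:4], data[4:]

# 3. Browse data: look at simple summary statistics.
xs = [x for x, _ in train]
ys = [y for _, y in train]
x_bar, y_bar = mean(xs), mean(ys)

# 4. Generate the model: fit a least-squares line y = slope * x + intercept.
slope = sum((x - x_bar) * (y - y_bar) for x, y in train) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar


def predict(x):
    """Return the fitted line's prediction for x."""
    return slope * x + intercept


# 5. Browse and verify the model: mean absolute error on the hold-out set.
mae = mean([abs(predict(x) - y) for x, y in holdout])

# 6. Deploy and update the model: here, "deployment" is just calling predict();
#    updating would mean refitting when new data arrives.
print(f"slope={slope:.2f}, intercept={intercept:.2f}, hold-out MAE={mae:.2f}")
```

In a real project each step would be far larger, but the order of operations is the same.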
In practice, data mining usually involves data collection, data integration, data reduction (specification), data cleaning, data transformation, feature extraction or selection, the mining process itself, pattern evaluation, and knowledge representation:
1. Data collection: From the data obtained, abstract its characteristic information and store the collected information in a database. Choose a suitable type of data warehouse for data storage and management.
2. Data integration: Combine and classify data that comes from different sources and in different formats.
3. Data reduction (data specification): When the data set or the data values are large, reduction techniques can be used to obtain a reduced representation of the data set, for example by rescaling each value as (data value - data average) / data standard deviation. The reduced data is much smaller but stays close to the integrity of the original data, so the results of mining the reduced data are basically consistent with the results before reduction.
4. Data cleaning: Some data is incomplete (missing values), some contains noise (errors, outliers), and some is inconsistent (for example, different units). Cleaning tools can be used to obtain complete, correct, and consistent data.
5. Data transformation: Convert the data into a form suitable for data mining through smoothing, aggregation, data generalization, standardization, and so on.
6. Feature extraction or feature selection: Feature extraction is mostly used in computer vision and image processing. Feature selection removes irrelevant and redundant features to prevent over-fitting and improve model accuracy. Common dimensionality-reduction methods include PCA.
7. Data mining process: Analyze the data in the data warehouse, select appropriate data mining tools, apply statistical methods, and use the corresponding data mining algorithms.
8. Pattern evaluation: From a business perspective, verify the correctness of the results of data analysis and data mining.
9. Knowledge representation: Present the results of data mining to users in a visual way.
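The cleaning and standardization steps above can be sketched with the Python standard library. This is a minimal sketch under stated assumptions: the helper names `fill_missing` and `z_score` are hypothetical, filling missing values with the mean is only one of several possible cleaning strategies, and the sample values are made up.

```python
from statistics import mean, pstdev


def fill_missing(values):
    """Replace None entries with the mean of the observed values (one possible strategy)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def z_score(values):
    """Rescale values as (value - mean) / standard deviation (population form)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so there is no spread to scale by.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


raw = [2.0, None, 4.0, 6.0]      # hypothetical column with one missing value
cleaned = fill_missing(raw)      # the missing entry becomes the mean of 2, 4, 6
scaled = z_score(cleaned)        # zero mean, unit standard deviation

print(cleaned)
print([round(v, 3) for v in scaled])
```

After standardization the column has mean 0 and standard deviation 1, which keeps features measured in different units comparable during mining.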