What are the basic steps of data mining

王林
2021-05-10 15:36:26

The basic steps of data mining are: 1. Define the problem; 2. Establish a data mining library; 3. Analyze the data; 4. Prepare the data; 5. Build the model; 6. Evaluate the model; 7. Implement.

Operating environment for this article: Windows 10, ThinkPad T480.

The specific steps are as follows:

1. Define the problem

The first and most important requirement before starting knowledge discovery is to understand the data and the business problem. You must define your goal clearly, that is, decide what you want to achieve. For example, if you want to improve how your email service is used, the goal might be to "increase the user utilization rate" or to "increase the value of each use." The models built for these two goals are almost completely different, so you must decide which one you are pursuing.

2. Establish a data mining library

Establishing a data mining library includes the following steps: data collection, data description, data selection, data quality assessment and data cleaning, merging and integration, building metadata, loading the data mining library, and maintaining the data mining library.
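As a rough illustration of the cleaning and merging steps, here is a minimal sketch in plain Python. The two sources (`customers`, `orders`), their field names, and the cleaning rules are all hypothetical, and a real data mining library would normally live in a database rather than in lists.

```python
# Hypothetical raw records from two sources; all field names are illustrative.
customers = [
    {"id": 1, "name": " Alice ", "age": "34"},
    {"id": 2, "name": "Bob", "age": ""},      # missing age
    {"id": 2, "name": "Bob", "age": ""},      # duplicate row
    {"id": 3, "name": "Carol", "age": "29"},
]
orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 3, "amount": 45.5},
]


def clean(rows):
    """Data cleaning: trim names, parse ages, drop rows with a repeated id."""
    seen = set()
    cleaned = []
    for row in rows:
        if row["id"] in seen:
            continue  # keep only the first row for each id
        seen.add(row["id"])
        age = row["age"].strip()
        cleaned.append({
            "id": row["id"],
            "name": row["name"].strip(),
            "age": int(age) if age else None,  # None marks a missing value
        })
    return cleaned


def merge(customer_rows, order_rows):
    """Merging and integration: attach each customer's total order amount."""
    totals = {}
    for order in order_rows:
        key = order["customer_id"]
        totals[key] = totals.get(key, 0.0) + order["amount"]
    return [dict(c, total_amount=totals.get(c["id"], 0.0)) for c in customer_rows]


def missing_counts(rows):
    """Data quality assessment: count missing (None) values per field."""
    counts = {}
    for row in rows:
        for field, value in row.items():
            counts[field] = counts.get(field, 0) + (value is None)
    return counts


mining_table = merge(clean(customers), orders)
print(mining_table)
print(missing_counts(mining_table))
```

Keeping cleaning, merging, and quality checks as separate functions makes it easier to rerun one of them when a source changes.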

3. Analyze data

The purpose of analysis is to find the data fields that have the greatest impact on the predicted output and to decide whether derived fields need to be defined. If the data set contains hundreds or thousands of fields, browsing and analyzing the data is time-consuming and tiring, so choose a tool with a good interface and powerful functions to help with these tasks.
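One simple way to screen fields is to rank them by the strength of their linear relationship with the output. The sketch below uses the absolute Pearson correlation as that measure. This is an assumption for illustration, not a method the article prescribes, and it misses non-linear effects. The sample rows and field names are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant field carries no linear signal
    return cov / (spread_x * spread_y)


def rank_fields(rows, target):
    """Sort candidate fields by |correlation| with the target, strongest first."""
    targets = [row[target] for row in rows]
    scores = {
        field: abs(pearson([row[field] for row in rows], targets))
        for field in rows[0]
        if field != target
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Illustrative data: "spend" is the output we want to predict.
rows = [
    {"visits": 1, "age": 40, "spend": 10},
    {"visits": 2, "age": 25, "spend": 21},
    {"visits": 3, "age": 33, "spend": 29},
    {"visits": 4, "age": 51, "spend": 42},
    {"visits": 5, "age": 28, "spend": 50},
]
ranking = rank_fields(rows, "spend")
for field, score in ranking:
    print(f"{field}: {score:.3f}")
```

In this toy data `visits` ranks far above `age`, so it would be the first field to look at more closely.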

4. Prepare data

This is the last step of data preparation before building the model. This step can be divided into four parts: selecting variables, selecting records, creating new variables, and converting variables.
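The four parts can be sketched on a tiny, made-up record set as follows. The field names, the adults-only filter, and the specific transformations are illustrative assumptions rather than rules from the article.

```python
import math

# Hypothetical input records; every field name here is illustrative.
records = [
    {"age": 34, "income": 52000.0, "visits": 12, "region": "north", "internal_id": "a1"},
    {"age": 17, "income": 0.0, "visits": 3, "region": "south", "internal_id": "b2"},
    {"age": 45, "income": 88000.0, "visits": 30, "region": "south", "internal_id": "c3"},
]


def prepare(rows):
    """Apply the four preparation parts to each record."""
    prepared = []
    for row in rows:
        # Selecting records: keep adults only (an assumed business rule).
        if row["age"] < 18:
            continue
        prepared.append({
            # Selecting variables: keep age and visits, drop internal_id.
            "age": row["age"],
            "visits": row["visits"],
            # Creating a new variable: income per visit (0.0 if no visits).
            "income_per_visit": row["income"] / row["visits"] if row["visits"] else 0.0,
            # Converting variables: log-scale income, encode region as 0/1.
            "log_income": math.log1p(row["income"]),
            "region_is_south": 1 if row["region"] == "south" else 0,
        })
    return prepared


prepared = prepare(records)
for row in prepared:
    print(row)
```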

5. Build the model

Building a model is an iterative process. Different models need to be examined carefully to determine which one is most useful for the business problem at hand. Training and testing a data mining model requires splitting the data into at least two parts: one to build (train) the model and one to test it. Sometimes a third data set, called the validation set, is also held out: because the test set may have influenced the characteristics of the model, an independent data set is needed to verify the model's accuracy.
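A minimal sketch of such a split, using only the standard library. The 60/20/20 proportions, the fixed seed, and the function name `split_dataset` are assumptions for illustration. The naming follows the text above: the test set is used while iterating on the model, and the validation set is kept aside for the independent check.

```python
import random


def split_dataset(rows, train=0.6, test=0.2, seed=0):
    """Shuffle rows and split them into train, test, and validation parts.

    Whatever is left after the train and test shares becomes the
    validation set.
    """
    if train <= 0 or test <= 0 or train + test >= 1:
        raise ValueError("train and test must be positive and sum to less than 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_train = int(len(shuffled) * train)
    n_test = int(len(shuffled) * test)
    train_set = shuffled[:n_train]
    test_set = shuffled[n_train:n_train + n_test]
    validation_set = shuffled[n_train + n_test:]
    return train_set, test_set, validation_set


data = list(range(100))  # stand-in for 100 prepared records
train_set, test_set, validation_set = split_dataset(data)
print(len(train_set), len(test_set), len(validation_set))
```

Shuffling before splitting matters when the records are stored in a meaningful order, for example by date or by customer segment. Otherwise the parts would not be comparable.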

6. Evaluate the model

After the model is built, you need to evaluate its results and explain its value. The accuracy measured on the test set holds only for data like that used to build the model. In practice you also need to understand the types of errors the model makes and the costs they cause. Experience has shown that a model that validates well is not necessarily a correct model. The direct cause is the set of assumptions implicit in building the model, so it is important to test the model directly in the real world. Apply it on a small scale first, collect test data, and roll it out widely only once the results are satisfactory.
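To see why error types and their costs matter beyond accuracy, consider the following sketch. It compares two hypothetical models with identical accuracy on made-up labels. The cost figures (a missed positive costing 20 times a false alarm) are assumptions chosen purely for illustration.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary 0/1 labels."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if a and p:
            counts["tp"] += 1
        elif p:
            counts["fp"] += 1  # predicted positive, actually negative
        elif a:
            counts["fn"] += 1  # predicted negative, actually positive
        else:
            counts["tn"] += 1
    return counts


def accuracy(counts):
    """Share of all predictions that were correct."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


def total_cost(counts, cost_fp, cost_fn):
    """Weight each error type by its (assumed) business cost."""
    return counts["fp"] * cost_fp + counts["fn"] * cost_fn


actual  = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
model_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # misses one positive
model_b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # raises one false alarm

counts_a = confusion_counts(actual, model_a)
counts_b = confusion_counts(actual, model_b)

# Assumed costs: a false alarm costs 1 unit, a missed positive costs 20.
for name, counts in (("A", counts_a), ("B", counts_b)):
    print(name, accuracy(counts), total_cost(counts, cost_fp=1, cost_fn=20))
```

Both models are right 9 times out of 10, yet under these assumed costs model A is twenty times more expensive than model B.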

7. Implementation

Once the model is built and validated, there are two main ways to use it. The first is to give analysts a reference for their work; the second is to apply the model to different data sets.
