
Laptop Price Prediction with ML

Linda Hamilton
2025-01-03 10:13:41

In my previous post, I wrote a web-scraping script that collects laptop data from PCComponentes and saves it to a CSV.

The idea came up while I was trying to build a Machine Learning model that predicts a laptop's price from its components. While researching, I found a public dataset that could be used to train the model, but it had a problem: its prices dated back to 2015, which made it of little use.

For this reason, I decided to build a DataFrame directly from the PCComponentes website, which would allow me to have updated and reliable data. Additionally, this process could be automated in the future (at least until PCComponentes changes the structure of its website).

Let's get into it!


DataFrame data processing

Before training the model, the data has to be organized and cleaned so that it is easier to read and process. For this, we will use the NumPy, pandas and Matplotlib libraries, which are widely used for data analysis and processing.

First, we import these libraries and load the generated CSV:

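import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the CSV produced by the scraper (the filename is a placeholder)
data = pd.read_csv('laptops.csv')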

Then, we delete the rows with empty or null values:

data = data.dropna()
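Before dropping rows, it can help to check how many values are missing in each column, so you know how much data `dropna()` will discard. A minimal sketch on a small made-up DataFrame (the column names mirror the ones used in this post, the values are invented):

```python
import numpy as np
import pandas as pd

# Toy data standing in for the scraped CSV (values are invented)
data = pd.DataFrame({
    'Cpu': ['intel core i7-13700h', None, 'amd ryzen 5 7535hs'],
    'Gpu': ['nvidia geforce rtx 4060', 'intel iris xe', None],
    'Price': [1299.0, 649.0, np.nan],
})

# Number of missing values per column
missing_per_column = data.isna().sum()
print(missing_per_column)

# Drop every row that has at least one missing value
rows_before = len(data)
data = data.dropna()
print(f'Dropped {rows_before - len(data)} of {rows_before} rows')
```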

Data analysis and filtering

Let's start by analyzing the different types of CPUs available. To view them, we will use the Seaborn library:

import seaborn as sns
sns.countplot(data=data, x='Cpu')  # one bar per CPU model

[Figure: count plot of the CPU column]

Here we see that there are 207 different CPU models. Training a model with all of these values could be problematic, since many of them would be irrelevant and would add noise that hurts the model's performance.

Instead of removing the entire column, we group the CPUs into a smaller set of relevant categories:

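def cpu_type_define(text):
    # Group a CPU description such as "intel core i7-13700h" into a coarse category
    words = text.lower().split(' ')
    if words[0] == 'intel':
        # Core i3 and CPUs whose last word has no "i" model count as low-end
        if len(words) > 1 and 'i' in words[-1]:
            family = words[-1].split('-')[0]
            if family == 'i3':
                return 'low-end intel processor'
            return words[0] + ' ' + words[1] + ' ' + family
        return 'low-end intel processor'
    elif words[0] == 'amd':
        # Ryzen 3 and non-Ryzen AMD CPUs count as low-end
        if len(words) > 2 and words[1] == 'ryzen':
            if words[2] == '3':
                return 'low-end amd processor'
            return words[0] + ' ' + words[1] + ' ' + words[2]
        return 'low-end amd processor'
    elif 'm' in words[0]:
        # Apple M-series chips (first word contains an "m")
        return 'Mac Processor'
    else:
        return 'Other Processor'

data['Cpu'] = data['Cpu'].apply(cpu_type_define)
sns.histplot(data=data, x='Cpu')
data['Cpu'].value_counts()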

Resulting in:

[Figure: distribution of the grouped CPU categories]
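A complementary, data-driven way to reduce the number of categories is to keep only the values that appear often enough and merge the rest into a single "Other" bucket. This is not what the rule-based `cpu_type_define` above does; it is a minimal sketch, where `lump_rare_categories` is a hypothetical helper and the threshold of 2 is an arbitrary choice for the toy data:

```python
import pandas as pd

def lump_rare_categories(series: pd.Series, min_count: int, other_label: str = 'Other') -> pd.Series:
    """Hypothetical helper: replace categories seen fewer than `min_count` times."""
    counts = series.value_counts()
    frequent = counts[counts >= min_count].index
    # Keep frequent values as they are and replace everything else
    return series.where(series.isin(frequent), other_label)

cpus = pd.Series([
    'intel core i7', 'intel core i7', 'amd ryzen 7', 'amd ryzen 7',
    'amd ryzen 7', 'intel core ultra 9', 'Mac Processor',
])
lumped = lump_rare_categories(cpus, min_count=2)
print(lumped.value_counts())
```

A frequency threshold adapts automatically when new models show up in the scraped data, while hand-written rules like the ones above encode domain knowledge (for example, that a Core i3 is low-end). The two can also be combined.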


GPU filtering

We carry out a similar process with graphics cards (GPU), reducing the number of categories to avoid noise in the data:

import re

def gpu_type_define(text):
    # Group a GPU description into a coarse performance tier
    if 'rtx' in text:
        # Use the four-digit model number, e.g. 4060 in "rtx 4060 8gb"
        match = re.search(r'\d{4}', text)
        num = int(match.group()) if match else 0

        if num in (4080, 4090, 3080):
            return 'Nvidia high-end'
        elif num in (4070, 3070, 4060, 2080):
            return 'Nvidia mid-range'
        elif num in (3050, 3060, 4050, 2070):
            return 'Nvidia low-end'
        else:
            return 'Other nvidia graphic card'

    elif 'radeon' in text:
        # Radeon RX cards are treated as high-end, other Radeon graphics as low-end
        if 'rx' in text:
            return 'AMD high-end'
        else:
            return 'AMD low-end'

    elif 'gpu' in text:
        # Descriptions containing "gpu" are treated as Apple integrated graphics
        return 'Apple integrated graphics'

    # Anything else (e.g. integrated Intel graphics) keeps its original name
    return text

data['Gpu'] = data['Gpu'].apply(gpu_type_define)
sns.histplot(data=data, x='Gpu')
data['Gpu'].value_counts()

Result:

[Figure: distribution of the grouped GPU categories]
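The chain of `if`/`elif` comparisons in `gpu_type_define` can also be written as a lookup table, which makes the tier boundaries easier to read and edit. A sketch that keeps the same grouping of RTX models, assuming the description contains a four-digit model number; the dictionary, the regular expression, the labels and the name `rtx_tier` are choices made for this example:

```python
import re

# Tier per RTX model number (same model grouping as in gpu_type_define)
RTX_TIERS = {
    4090: 'Nvidia high-end', 4080: 'Nvidia high-end', 3080: 'Nvidia high-end',
    4070: 'Nvidia mid-range', 3070: 'Nvidia mid-range',
    4060: 'Nvidia mid-range', 2080: 'Nvidia mid-range',
    4050: 'Nvidia low-end', 3060: 'Nvidia low-end',
    3050: 'Nvidia low-end', 2070: 'Nvidia low-end',
}
OTHER_NVIDIA = 'Other nvidia graphic card'

def rtx_tier(text: str) -> str:
    # Take the first four-digit number, e.g. 4060 in "rtx 4060 8gb"
    match = re.search(r'\d{4}', text)
    if match is None:
        return OTHER_NVIDIA
    # Models missing from the table fall back to the generic label
    return RTX_TIERS.get(int(match.group()), OTHER_NVIDIA)

for gpu in ['nvidia geforce rtx 4060 8gb', 'nvidia rtx 3080 ti', 'nvidia rtx a500']:
    print(gpu, '->', rtx_tier(gpu))
```

Adding a new generation of cards then only means adding entries to the dictionary.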


Storage and RAM cleanup

To simplify the storage data, we convert every capacity to gigabytes and add up the space of all drives into a single value:

def filter_ssd(text):
    # After the replacements below, a value looks like "1000 + 512"
    discs = text.split('+')
    # Keep the digits of each drive and add the capacities together (in GB)
    return sum(int(''.join(char for char in disc if char.isdigit())) for disc in discs)

# Normalize the units to gigabytes and drop the drive-type labels
data['SSD'] = data['SSD'].str.replace('tb', '000')
data['SSD'] = data['SSD'].str.replace('gb', '')
data['SSD'] = data['SSD'].str.replace('emmc', '')
data['SSD'] = data['SSD'].str.replace('ssd', '')
data['SSD'] = data['SSD'].apply(filter_ssd)
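The string replacements above depend on their order (for example, `'tb'` has to become `'000'` before the digits are joined). An alternative sketch reads each capacity together with its unit using a regular expression and converts terabytes to gigabytes explicitly. The input format (`'1tb ssd + 512gb ssd'`) is an assumption about how the scraped strings look, and `storage_to_gb` is a hypothetical name:

```python
import re

# A number (optionally with decimals) followed by a unit, e.g. "512gb" or "1 tb"
CAPACITY = re.compile(r'(\d+(?:\.\d+)?)\s*(tb|gb)')

def storage_to_gb(text: str) -> int:
    """Sum the capacity of every drive listed in `text`, in gigabytes."""
    total = 0.0
    for amount, unit in CAPACITY.findall(text.lower()):
        # 1 TB is counted as 1000 GB, as in the replacement approach
        total += float(amount) * (1000 if unit == 'tb' else 1)
    return int(total)

print(storage_to_gb('1tb ssd + 512gb ssd'))
print(storage_to_gb('128gb emmc'))
```

It would be applied the same way, with `data['SSD'].apply(storage_to_gb)`, and it also copes with capacities such as `'1.5 TB'`.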

Finally, we filter the RAM values to keep only the numbers.
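One way to do this with pandas, sketched on a toy DataFrame and assuming the RAM column (named `Ram` here, a hypothetical name) holds strings such as `'16gb'` or `'32gb ddr5'`, is to extract the first run of digits:

```python
import pandas as pd

# Toy data; the column name and the string format are assumptions
data = pd.DataFrame({'Ram': ['16gb', '8 gb', '32gb ddr5']})

# Keep the first run of digits and store it as an integer number of GB
data['Ram'] = data['Ram'].str.extract(r'(\d+)', expand=False).astype(int)
print(data['Ram'].tolist())
```

Values without any digit would become `NaN` after `str.extract`, and `astype(int)` would then raise an error, so such rows would need to be dropped first.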


Non-numeric data encoding

Before training the model, the non-numeric columns have to be transformed into numbers that the algorithm can interpret. For this, we use ColumnTransformer and OneHotEncoder from the scikit-learn (sklearn) library.
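A minimal sketch of this transformation on toy data. The column names follow the ones used above plus a hypothetical `Ram` column, and `handle_unknown='ignore'` is a choice made here so that categories not seen during training do not raise an error at prediction time:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy features after the cleaning steps (values are invented)
X = pd.DataFrame({
    'Cpu': ['intel core i7', 'amd ryzen 7', 'Mac Processor'],
    'Gpu': ['Nvidia mid-range', 'AMD low-end', 'Apple integrated graphics'],
    'SSD': [1512, 512, 256],
    'Ram': [16, 16, 8],
})

preprocessor = ColumnTransformer(
    transformers=[
        # One binary column per category in Cpu and Gpu
        ('categorical', OneHotEncoder(handle_unknown='ignore'), ['Cpu', 'Gpu']),
    ],
    remainder='passthrough',  # keep the numeric columns (SSD, Ram) unchanged
)

X_encoded = preprocessor.fit_transform(X)
print(X_encoded.shape)
```

Each of the three CPU and three GPU categories becomes its own column, and the two numeric columns are appended unchanged, giving 8 columns in total.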

Model training

I tested several Machine Learning algorithms to see which one performed best according to the coefficient of determination (R² score). Here are the results:

Model                        R² Score
Logistic Regression          -4086280.26
Random Forest                0.8025
ExtraTreeRegressor           0.7531
GradientBoostingRegressor    0.8025
XGBRegressor                 0.7556
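For reference, the coefficient of determination compares the model's squared errors with those of a baseline that always predicts the mean price. Here y_i is the actual price of laptop i, ŷ_i the predicted price and ȳ the mean of the actual prices:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

A perfect model scores 1, and always predicting the mean scores 0. There is no lower bound: predictions that are far worse than the mean baseline give large negative values, like the one obtained with Logistic Regression.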

The best results were obtained with Random Forest and GradientBoostingRegressor, both with an R² of 0.8025, the highest in the table.
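A comparison like this can be scripted as a loop over candidate regressors, each scored with `r2_score` on a held-out test split. The sketch below uses synthetic data from `make_regression` instead of the laptop dataset, so its scores say nothing about the results above, and it only includes the two best models from the table:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded laptop features and prices
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    'Random Forest': RandomForestRegressor(random_state=0),
    'GradientBoostingRegressor': GradientBoostingRegressor(random_state=0),
}

# Fit each model on the training split and score it on the test split
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f'{name}: {score:.4f}')
```

Scoring on data the models did not see during training is what makes the comparison meaningful; scoring on the training set would reward overfitting.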

To improve the result further, I combined these two algorithms with a Voting Regressor, which reached an R² score of 0.8085.
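scikit-learn's `VotingRegressor` fits each estimator and averages their predictions. A minimal sketch that puts it behind the one-hot preprocessing in a single `Pipeline`; the toy data, the `Ram` and `Price` column names and the default hyperparameters are placeholders, not the settings behind the 0.8085 score:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, VotingRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy training data (invented values; 'Price' is a hypothetical target column)
data = pd.DataFrame({
    'Cpu': ['intel core i7', 'amd ryzen 7', 'intel core i5', 'amd ryzen 5'] * 5,
    'Gpu': ['Nvidia mid-range', 'Nvidia low-end', 'intel iris xe', 'AMD low-end'] * 5,
    'SSD': [1000, 512, 512, 256] * 5,
    'Ram': [16, 16, 8, 8] * 5,
    'Price': [1400, 1100, 800, 650] * 5,
})
X = data.drop(columns='Price')
y = data['Price']

preprocessor = ColumnTransformer(
    [('categorical', OneHotEncoder(handle_unknown='ignore'), ['Cpu', 'Gpu'])],
    remainder='passthrough',
)

# Average the predictions of the two best models from the table
voting = VotingRegressor([
    ('random_forest', RandomForestRegressor(random_state=0)),
    ('gradient_boosting', GradientBoostingRegressor(random_state=0)),
])

model = Pipeline([('preprocess', preprocessor), ('regressor', voting)])
model.fit(X, y)
predictions = model.predict(X.head(2))
print(predictions)
```

Keeping the encoder and the regressor in one pipeline means the fitted object can take raw cleaned rows directly, which is convenient when the model is later called from a web application.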

Conclusion

The model trained with the Voting Regressor performed best. It is now ready to be integrated into a web application, which I will explain in detail in the next post.

Link to the project
