
Implement a simple linear regression method in R and explain its concepts

WBOY
2024-01-22 23:09:11

Simple linear regression is a statistical method for studying the relationship between two continuous variables. One variable is called the independent variable (x) and the other is called the dependent variable (y). We assume that the relationship between the two is linear and try to find a linear function that predicts the response value (y) as accurately as possible from the value of the independent variable (x). Fitting a straight line to the data gives this predictive model, which can be used to understand and predict how the dependent variable changes as the independent variable changes.
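Written as a formula, the model usually assumed in simple linear regression looks like the following. The symbols for the intercept, the slope and the error term are notation introduced here for illustration; they do not appear in the original text.

```latex
y = \beta_0 + \beta_1 x + \varepsilon,
\qquad
\hat{y} = b_0 + b_1 x
```

Here \beta_0 is the intercept, \beta_1 is the slope, \varepsilon is the random error, and \hat{y} is the value predicted by the fitted line with estimated coefficients b_0 and b_1.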

To understand this concept, we can use a salary data set, which gives the value of the dependent variable (salary) for each value of the independent variable (years of experience).

Salary Data Set

Years of Experience    Annual Salary
1.1                    39343.00
1.3                    46205.00
1.5                    37731.00
2.0                    43525.00
2.2                    39891.00
2.9                    56642.00
3.0                    60150.00
3.2                    54445.00
3.2                    64445.00
3.7                    57189.00

For generality, we define:

x as the feature vector, that is, x = [x_1, x_2, ..., x_n],

y as the response vector, that is, y = [y_1, y_2, ..., y_n],

for n observations (in the above example, n = 10).
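As a minimal sketch, the ten observations from the salary table can be entered directly in R as the vectors x and y. The column names YearsExperience and Salary match those used in the article's code; entering the data by hand instead of reading a file is a choice made here so that the snippet runs on its own.

```r
# Feature vector x (years of experience) and response vector y (salary),
# typed in from the salary table
x <- c(1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7)
y <- c(39343, 46205, 37731, 43525, 39891,
       56642, 60150, 54445, 64445, 57189)

# Combine them into a data frame with one row per observation
dataset <- data.frame(YearsExperience = x, Salary = y)

# Number of observations n
n <- nrow(dataset)
print(n)

# Base-R scatterplot of salary against years of experience
plot(dataset$YearsExperience, dataset$Salary,
     xlab = "Years of experience", ylab = "Salary",
     main = "Salary vs Experience")
```

The same data frame could also be written out with write.csv(dataset, "salary.csv", row.names = FALSE) if a CSV file is preferred.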

[Figure: Scatterplot of the given data set]

Now we have to find a line that fits the scatterplot above, so that we can predict the response y for any value of x.

The line of best fit is called the regression line.
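The article does not say how the regression line is chosen. R's lm() uses ordinary least squares, which picks the intercept and slope that minimize the sum of squared vertical distances between the points and the line. The sketch below computes both coefficients by hand for the ten salary observations and compares them with lm(); the variable names b0, b1 and fit are illustrative only.

```r
# The ten observations from the salary table
x <- c(1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7)
y <- c(39343, 46205, 37731, 43525, 39891,
       56642, 60150, 54445, 64445, 57189)

# Least-squares slope: sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Least-squares intercept: the line passes through (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

# Fit the same model with lm() and compare the coefficients
fit <- lm(y ~ x)
print(c(intercept = b0, slope = b1))
print(coef(fit))
print(all.equal(unname(coef(fit)), c(b0, b1)))
```

Here all ten points are used for the fit, so the coefficients will differ from those of a model fitted on a training subset only.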

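The following R code implements simple linear regression:

# Load the data; salary.csv must contain the columns YearsExperience and Salary
dataset <- read.csv('salary.csv')

# Install the packages once if they are missing, then load them
# install.packages(c('caTools', 'ggplot2'))
library(caTools)
library(ggplot2)

# Split the data into a training set (70%) and a test set (30%);
# the seed only makes the random split reproducible
set.seed(123)
split <- sample.split(dataset$Salary, SplitRatio = 0.7)
trainingset <- subset(dataset, split == TRUE)
testset <- subset(dataset, split == FALSE)

# Fit the regression line on the training set and show intercept and slope
lm.r <- lm(formula = Salary ~ YearsExperience, data = trainingset)
coef(lm.r)

# Predict the salaries of the test set
ypred <- predict(lm.r, newdata = testset)

# Training set: observed points (red) and fitted regression line (blue)
ggplot() +
  geom_point(aes(x = trainingset$YearsExperience, y = trainingset$Salary),
             colour = 'red') +
  geom_line(aes(x = trainingset$YearsExperience,
                y = predict(lm.r, newdata = trainingset)),
            colour = 'blue') +
  ggtitle('Salary vs Experience (Training set)') +
  xlab('Years of experience') +
  ylab('Salary')

# Test set: observed points (red) against the same fitted line (blue),
# which is still drawn from the training data because the model does not change
ggplot() +
  geom_point(aes(x = testset$YearsExperience, y = testset$Salary),
             colour = 'red') +
  geom_line(aes(x = trainingset$YearsExperience,
                y = predict(lm.r, newdata = trainingset)),
            colour = 'blue') +
  ggtitle('Salary vs Experience (Test set)') +
  xlab('Years of experience') +
  ylab('Salary')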

[Figure: Visualization of the training set results]
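The script computes ypred for the test set but does not go on to use it. One possible way to use such predictions, sketched here under stated assumptions, is to measure the prediction error on the held-out rows and to predict the salary for a new value of x. This sketch is not the article's method: it enters the ten table rows by hand, replaces caTools with a base-R split using sample(), and the seed, the root mean squared error metric and the example value of 2.5 years are all choices made here for illustration.

```r
# The ten observations from the salary table
dataset <- data.frame(
  YearsExperience = c(1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7),
  Salary = c(39343, 46205, 37731, 43525, 39891,
             56642, 60150, 54445, 64445, 57189)
)

# Base-R split: 7 random rows for training, the remaining 3 for testing
set.seed(42)  # arbitrary seed, only for reproducibility
train_idx <- sample(nrow(dataset), size = 7)
trainingset <- dataset[train_idx, ]
testset <- dataset[-train_idx, ]

# Fit on the training rows and predict the held-out rows
fit <- lm(Salary ~ YearsExperience, data = trainingset)
ypred <- predict(fit, newdata = testset)

# Root mean squared error of the predictions on the test set
rmse <- sqrt(mean((testset$Salary - ypred)^2))
print(rmse)

# Predicted salary for a hypothetical employee with 2.5 years of experience
new_obs <- data.frame(YearsExperience = 2.5)
new_pred <- predict(fit, newdata = new_obs)
print(new_pred)
```

With only three test rows the error estimate is very noisy, so it should be read as an illustration of the workflow, not as a reliable measure of model quality.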


Statement:
This article is reproduced from 163.com. If there is any infringement, please contact admin@php.cn to have it removed.