Simple linear regression implemented in PHP
Jul 21, 2016

In Part 1 of this two-part series ("Simple Linear Regression in PHP"), I explained why math libraries are useful for PHP. I also demonstrated how to develop and implement core parts of a simple linear regression algorithm using PHP as the implementation language.

The goal of this article is to show you how to use the SimpleLinearRegression class discussed in Part 1 to build an important data research tool.

Brief Review: Concepts

The basic goal of simple linear regression modeling is to find the best-fitting straight line through a two-dimensional plane of paired X and Y values (that is, X and Y measurements). Once the line has been found using the least squares method, various statistical tests can be performed to determine how well the line accounts for the observed deviations in the Y values.

The linear equation (y = mx + b) has two parameters that must be estimated from the supplied X and Y data: the slope (m) and the y-intercept (b). Once these two parameters have been estimated, you can plug observed X values into the linear equation and examine the Y predictions it generates.

To estimate the m and b parameters with the least squares method, you need to find the values of m and b that minimize the difference between the observed and predicted values of Y across all X values. The difference between an observed and a predicted value is called the error, y_i - (m x_i + b). If you square each error value and then sum the squared errors, the result is the sum of squared prediction errors. Determining the best fit by least squares means finding the estimates of m and b that minimize this sum.
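As a minimal sketch of the quantity being minimized, the following PHP function sums the squared errors for one candidate pair of m and b. The function name and the sample data are illustrative assumptions and are not taken from the SimpleLinearRegression class.

```php
<?php
// Sum of squared prediction errors for a candidate line y = m*x + b.
// $x and $y are parallel arrays of observations.
function sumSquaredErrors(array $x, array $y, float $m, float $b): float
{
    $sse = 0.0;
    foreach ($x as $i => $xi) {
        $error = $y[$i] - ($m * $xi + $b); // observed minus predicted
        $sse += $error * $error;
    }
    return $sse;
}

// Illustrative data that lies exactly on y = 2x + 1.
$x = [1, 2, 3, 4];
$y = [3, 5, 7, 9];

echo sumSquaredErrors($x, $y, 2.0, 1.0), "\n"; // perfect fit: 0
echo sumSquaredErrors($x, $y, 1.0, 1.0), "\n"; // worse fit: 30
```

A smaller sum means the candidate line sits closer to the observations, which is why the estimates that minimize it define the best fit.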

Two basic approaches can be used to find the estimates of m and b that satisfy the least squares criterion. In the first, you use a numerical search procedure that tries different values of m and b and evaluates them, eventually settling on the estimates that yield the smallest sum of squared errors. The second is to use calculus to derive equations for estimating m and b. I won't go into the calculus involved in deriving these equations, but I did use these analytical equations in the SimpleLinearRegression class to find the least squares estimates of m and b (see the getSlope() and getYIntercept() methods of the SimpleLinearRegression class).
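The analytical approach can be sketched with the standard textbook closed-form estimates: the slope is the sum of cross-products of deviations divided by the sum of squared X deviations, and the intercept follows from the two means. The function below is a standalone illustration under that assumption; it is not the code of getSlope() or getYIntercept() from Part 1.

```php
<?php
// Closed-form least squares estimates for y = m*x + b.
// leastSquaresFit() is a hypothetical helper written for this example.
function leastSquaresFit(array $x, array $y): array
{
    $n = count($x);
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;

    $sxy = 0.0; // sum of cross-products of deviations
    $sxx = 0.0; // sum of squared X deviations
    foreach ($x as $i => $xi) {
        $sxy += ($xi - $meanX) * ($y[$i] - $meanY);
        $sxx += ($xi - $meanX) ** 2;
    }

    $m = $sxy / $sxx;          // slope
    $b = $meanY - $m * $meanX; // y-intercept
    return ['slope' => $m, 'intercept' => $b];
}

$fit = leastSquaresFit([1, 2, 3, 4], [3, 5, 7, 9]);
printf("y = %.2fx + %.2f\n", $fit['slope'], $fit['intercept']);
```

Because the estimates come from a formula, no search over candidate values of m and b is needed.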

Even if you have equations for finding the least squares estimates of m and b, plugging those parameters into the linear equation does not guarantee a straight line that fits the data well. The next step in the simple linear regression process is to determine whether the remaining prediction error is acceptable.

You can use a statistical decision procedure to reject the alternative hypothesis that a straight line fits the data. This procedure is based on computing a T statistic and using a probability function to find the probability of observing a value that large by chance. As mentioned in Part 1, the SimpleLinearRegression class generates a number of summary values. One of the most important is the T statistic, which measures how well the linear equation fits the data. If the fit is good, the T statistic tends to be large; if the T value is small, you should replace your linear equation with a default model that assumes the mean of the Y values is the best predictor (because the mean of a set of values can often be a useful predictor of the next observation).

To test whether the T statistic is large enough to justify not using the mean Y value as the best predictor, you need to calculate the probability of obtaining that T statistic by chance. If the probability is low, you can reject the null hypothesis that the mean is the best predictor and, accordingly, be more confident that a simple linear model is a good fit to the data. (See Part 1 for more on calculating the probability of a T statistic.)
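For orientation, here is a minimal sketch of the standard textbook T statistic for the slope: the estimated slope divided by its standard error, with n - 2 degrees of freedom. Whether the SimpleLinearRegression class from Part 1 computes it in exactly this form is an assumption, and the function name and data are illustrative only.

```php
<?php
// Standard t statistic for the slope of a simple linear regression.
// slopeTStatistic() is a hypothetical helper, not a method from Part 1.
function slopeTStatistic(array $x, array $y): array
{
    $n = count($x);
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;

    $sxy = 0.0;
    $sxx = 0.0;
    foreach ($x as $i => $xi) {
        $sxy += ($xi - $meanX) * ($y[$i] - $meanY);
        $sxx += ($xi - $meanX) ** 2;
    }
    $m = $sxy / $sxx;          // least squares slope
    $b = $meanY - $m * $meanX; // least squares intercept

    // Sum of squared prediction errors around the fitted line.
    $sse = 0.0;
    foreach ($x as $i => $xi) {
        $sse += ($y[$i] - ($m * $xi + $b)) ** 2;
    }

    $df = $n - 2;                        // degrees of freedom
    $stdErr = sqrt(($sse / $df) / $sxx); // standard error of the slope
    return ['t' => $m / $stdErr, 'df' => $df];
}

$result = slopeTStatistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]);
printf("t = %.4f with %d degrees of freedom\n", $result['t'], $result['df']);
```

The resulting t value and degrees of freedom are what a Student T probability function needs in order to return the probability of seeing a value that large by chance.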

Back to the statistical decision procedure: it tells you when to reject the null hypothesis, but it does not tell you whether to accept the alternative hypothesis. In a research setting, the linear-model alternative hypothesis has to be established through theoretical and statistical arguments.

The data research tool you will build implements the statistical decision procedure for linear models (the T test) and provides summary data that can be used to construct the theoretical and statistical arguments needed to establish a linear model. It can be classed as a decision support tool that knowledge workers use to study patterns in small to medium-sized data sets.

From a learning perspective, simple linear regression modeling is worth studying because it is a gateway to understanding more advanced forms of statistical modeling. For example, many core concepts in simple linear regression lay a good foundation for understanding multiple regression, factor analysis, and time series analysis.

Simple linear regression is also a versatile modeling technique. It can be used to model curvilinear data by transforming the raw data (usually with a logarithmic or power transformation). These transformations linearize the data so that it can be modeled using simple linear regression. The resulting linear model will be represented as a linear formula related to the transformed values.
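As an illustration of linearizing by transformation, the sketch below fits an exponential curve y = a * exp(b * x) by regressing ln(y) on x: the fitted slope estimates b and the exponential of the fitted intercept estimates a. The helper fitLine() and the generated data are assumptions made for this example.

```php
<?php
// Closed-form least squares fit; returns [slope, intercept].
// fitLine() is a hypothetical helper written for this example.
function fitLine(array $x, array $y): array
{
    $n = count($x);
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;
    $sxy = 0.0;
    $sxx = 0.0;
    foreach ($x as $i => $xi) {
        $sxy += ($xi - $meanX) * ($y[$i] - $meanY);
        $sxx += ($xi - $meanX) ** 2;
    }
    $slope = $sxy / $sxx;
    return [$slope, $meanY - $slope * $meanX];
}

// Curvilinear data generated from y = 2 * exp(0.5 * x).
$x = [0, 1, 2, 3];
$y = array_map(function ($xi) { return 2 * exp(0.5 * $xi); }, $x);

// Log-transform Y so that ln(y) = ln(a) + b * x is linear in x.
$logY = array_map('log', $y);
[$slope, $intercept] = fitLine($x, $logY);

$a = exp($intercept); // back-transform the intercept
$b = $slope;
printf("y = %.3f * exp(%.3f x)\n", $a, $b);
```

Note that the fitted line describes the transformed values; predictions on the original scale require applying the inverse transformation, as done for the intercept here.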

Probability function

In the previous article, I got around the problem of implementing probability functions in PHP by asking R to find the probability value. I wasn't completely satisfied with this solution, so I started researching the question: what is needed to develop a probability function based on PHP.

I started looking online for information and code. One source for both is "Probability Functions" in the book Numerical Recipes in C. I reimplemented some of the probability function code (the gammln.c and betai.c functions) in PHP, but I still wasn't satisfied with the result: it seemed to take more code than some other implementations. I also needed the inverse probability functions.

Luckily, I stumbled upon John Pezzullo's Interactive Statistical Calculation. John's page on Probability Distribution Functions has all the functions I needed, implemented in JavaScript to make them easy to study.

I ported the Student T and Fisher F functions to PHP. I changed the API a bit to conform to Java naming style and embedded all functions into a class called Distribution. A great feature of this implementation is the doCommonMath method, which is reused by all functions in this library. Other tests that I didn't bother to implement (normality test and chi-square test) also use the doCommonMath method.

Another aspect of this port is also worth noting. In JavaScript, you can assign dynamically computed values to instance variables, for example:

var PiD2 = pi() / 2

You cannot do this in PHP: only simple constant values can be assigned to instance variables in a class declaration. Hopefully this limitation will be removed in PHP 5.

Note that the code in Listing 1 does not define instance variables. This is because, in the JavaScript version, they are assigned dynamically computed values.
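One common way to work around this restriction is to assign computed values in the constructor. The class below is a hypothetical illustration of that pattern and is not part of the Distribution class.

```php
<?php
// Hypothetical class for illustration; not the Distribution class from Listing 1.
class AngleConstants
{
    // A function call is not allowed as a property initializer,
    // so "public $piD2 = pi() / 2;" would be a compile-time error.
    public $piD2;

    public function __construct()
    {
        // Computed values are assigned when the object is created instead.
        $this->piD2 = pi() / 2;
    }
}

$constants = new AngleConstants();
echo $constants->piD2, "\n";
```

The alternative used in Listing 1 is simply to compute the value inline, as in pi()/2, wherever it is needed.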

Listing 1. Implementing the probability functions


// [The beginning of this listing is missing from the source.]
      ... doCommonMath($cth * $cth, 2, $df - 3, -1)) / (pi()/2);
    } else {
      return 1 - $sth * $this->doCommonMath($cth * $cth, 1, $df - 3, -1);
    }
  }

  function getInverseStudentT($p, $df) {
    $v  = 0.5;
    $dv = 0.5;
    $t  = 0;
    while($dv > 1e-6) {
      $t  = (1 / $v) - 1;
      $dv = $dv / 2;
      if ( $this->getStudentT($t, $df) > $p) {
        $v = $v - $dv;
      } else {
        $v = $v + $dv;
      }
    }
    return $t;
  }

  function getFisherF($f, $n1, $n2) {
    // implemented but not shown
  }

  function getInverseFisherF($p, $n1, $n2) {
    // implemented but not shown
  }

}
?>
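The inverse function in Listing 1 is a bisection search: it maps a search variable v in (0, 1) to a candidate t = 1/v - 1 and halves the step on every pass. The standalone sketch below reproduces that loop against a probability function with a known closed form, the two-tailed Student T probability for 1 degree of freedom, so the result can be checked by hand. The function names are assumptions made for this example.

```php
<?php
// Two-tailed probability of Student's t with 1 degree of freedom,
// which has the closed form 1 - atan(|t|) / (pi / 2).
function twoTailedProbabilityDf1(float $t): float
{
    return 1 - atan(abs($t)) / (M_PI / 2);
}

// Bisection search mirroring the loop in getInverseStudentT():
// $probability is any function that decreases as t grows.
function inverseByBisection(callable $probability, float $p): float
{
    $v  = 0.5;
    $dv = 0.5;
    $t  = 0.0;
    while ($dv > 1e-6) {
        $t  = (1 / $v) - 1; // v near 0 gives a large t, v near 1 gives t near 0
        $dv = $dv / 2;
        if ($probability($t) > $p) {
            $v = $v - $dv;  // probability too high: try a larger t
        } else {
            $v = $v + $dv;  // probability too low: try a smaller t
        }
    }
    return $t;
}

// With 1 degree of freedom, a two-tailed probability of 0.5 corresponds to t = 1.
$t = inverseByBisection('twoTailedProbabilityDf1', 0.5);
printf("t = %.4f\n", $t);
```

The search stops once the step size falls below 1e-6, so the returned t is an approximation whose accuracy is bounded by that tolerance.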

Graphic output

The output methods implemented so far all display summary values in HTML format. It is also appropriate to display scatter plots or line plots of the data in GIF, JPEG, or PNG format.

Rather than write the code to generate line and scatter plots myself, I decided to use a PHP-based graphics library called JpGraph. JpGraph is being actively developed by Johan Persson, whose project website describes it this way:

Whether it’s a “quick and dirty” graph with minimal code, or a complex professional graph that requires very fine-grained control, JpGraph makes drawing them simple. JpGraph is equally suitable for scientific and business type graphs.

The JpGraph distribution includes a number of example scripts that can be customized for specific needs. Using JpGraph in the data research tool was as simple as finding a sample script that did something similar to what I needed and adapting it.

The script in Listing 3 is extracted from the sample data research tool (explore.php) and demonstrates how to call the library and populate the Line and Scatter classes with data from the SimpleLinearRegression analysis. The comments in this code were written by Johan Persson (who does a great job of documenting the JpGraph codebase).

Listing 3. Details of functions from the sample data research tool explore.php


// [The beginning of this listing is missing from the source.]
$graph->SetScale("linlin");

// Setup title
$graph->title->Set("$title");
$graph->img->SetMargin(50,20,20,40);
$graph->xaxis->SetTitle("$x_name","center");
$graph->yaxis->SetTitleMargin(30);
$graph->yaxis->title->Set("$y_name");

$graph->title->SetFont(FF_FONT1,FS_BOLD);

// make sure that the X-axis is always at the
// bottom at the plot and not just at Y=0 which is
// the default position
$graph->xaxis->SetPos('min');

// Create the scatter plot with some nice colors
$sp1 = new ScatterPlot($slr->Y, $slr->X);
$sp1->mark->SetType(MARK_FILLEDCIRCLE);
$sp1->mark->SetFillColor("red");
$sp1->SetColor("blue");
$sp1->SetWeight(3);
$sp1->mark->SetWidth(4);

// Create the regression line
$lplot = new LinePlot($slr->PredictedY, $slr->X);
$lplot->SetWeight(2);
$lplot->SetColor('navy');

// Add the plots to the graph
$graph->Add($sp1);
$graph->Add($lplot);

// ... and stroke
$graph_name = "temp/test.png";
$graph->Stroke($graph_name);
?>

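Data research script

The data research tool consists of a single script (explore.php) that calls methods of the SimpleLinearRegressionHTML class and the JpGraph library.

The script uses simple processing logic. The first part of the script performs basic validation on the submitted form data. If the form data passes validation, the second part of the script is executed.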
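The validate-then-analyze flow might look something like the sketch below. The source does not show how explore.php receives or validates its data, so the input format (one "x,y" pair per line), the function name, and the minimum of three observations are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: parse "x,y" pairs, one per line, from a submitted text field.
// Returns ['X' => [...], 'Y' => [...]] or null when validation fails.
function parseObservations(string $raw): ?array
{
    $X = [];
    $Y = [];
    foreach (preg_split('/\R/', trim($raw)) as $line) {
        $parts = array_map('trim', explode(',', $line));
        if (count($parts) !== 2 || !is_numeric($parts[0]) || !is_numeric($parts[1])) {
            return null; // malformed line
        }
        $X[] = (float) $parts[0];
        $Y[] = (float) $parts[1];
    }
    // A T test on the slope needs n - 2 > 0 degrees of freedom.
    return count($X) >= 3 ? ['X' => $X, 'Y' => $Y] : null;
}

$data = parseObservations("1,2\n2,4\n3,5");
if ($data === null) {
    echo "Invalid input\n"; // first part: validation failed, stop here
} else {
    // second part: the analysis and output code would run here
    echo count($data['X']), " observations accepted\n";
}
```

Keeping validation in one function that returns either clean arrays or null makes the two-part structure of the script explicit.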

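The second part of the script contains the code that analyzes the data and displays the summary results in HTML and graphic formats. Listing 4 shows the basic structure of the explore.php script:

Listing 4. Structure of explore.php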


// [The beginning of this listing is missing from the source, and the HTML
// markup inside the echo strings was stripped during extraction.]
  ... $title";

  $slr->showTableSummary($x_name, $y_name);
  echo "";

  $slr->showAnalysisOfVariance();
  echo "";

  $slr->showParameterEstimates($x_name, $y_name);
  echo "";

  $slr->showFormula($x_name, $y_name);
  echo "";

  $slr->showRValues($x_name, $y_name);
  echo "";

  include ("jpgraph/jpgraph.php");
  include ("jpgraph/jpgraph_scatter.php");
  include ("jpgraph/jpgraph_line.php");

  // The code for displaying the graphics is inline in the
  // explore.php script. The code for these two line plots
  // finishes off the script:

  // Omitted code for displaying scatter plus line plot
  // Omitted code for displaying residuals plot

}
?>
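The two omitted plots need predicted Y values (for the regression line) and residuals (for the residuals plot). As a hedged sketch of how such arrays can be derived from a fitted slope and intercept, the helper below is a hypothetical standalone function; in explore.php the predicted values come from the $slr object (for example $slr->PredictedY in Listing 3).

```php
<?php
// Predicted Y values and residuals (observed minus predicted) for a fitted line.
// predictedAndResiduals() is a hypothetical helper written for this example.
function predictedAndResiduals(array $x, array $y, float $m, float $b): array
{
    $predicted = [];
    $residuals = [];
    foreach ($x as $i => $xi) {
        $predicted[$i] = $m * $xi + $b;
        $residuals[$i] = $y[$i] - $predicted[$i];
    }
    return [$predicted, $residuals];
}

// Illustrative data whose least squares line is y = 0.6x + 2.2.
$x = [1, 2, 3, 4, 5];
$y = [2, 4, 5, 4, 5];
[$predicted, $residuals] = predictedAndResiduals($x, $y, 0.6, 2.2);

foreach ($x as $i => $xi) {
    printf("x=%d  predicted=%.1f  residual=%+.1f\n", $xi, $predicted[$i], $residuals[$i]);
}
```

For a least squares line with an intercept, the residuals sum to zero, so a residuals plot should scatter around the horizontal axis without an obvious pattern when the linear model is adequate.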
