Simple linear regression implemented in PHP_PHP tutorial
In Part 1 of this two-part series ("Simple Linear Regression in PHP"), I explained why math libraries are useful for PHP. I also demonstrated how to develop and implement core parts of a simple linear regression algorithm using PHP as the implementation language.
The goal of this article is to show you how to use the SimpleLinearRegression class discussed in Part 1 to build an important data research tool.
Brief Review: Concepts
The basic goal behind simple linear regression modeling is to find the best-fitting straight line from a two-dimensional plane consisting of pairs of X and Y values (i.e., X and Y measurements). Once the line is found using the minimum variance method, various statistical tests can be performed to determine how well the line fits the observed deviation from the Y value.
The linear equation (y = mx + b) has two parameters that must be estimated based on the X and Y data provided, they are the slope (m) and the y-intercept (b). Once these two parameters are estimated, you can enter the observed values into the linear equation and observe the Y predictions generated by the equation.
To use the minimum variance method to estimate the m and b parameters, you need to find the estimated values of m and b such that they minimize the observed and predicted values of Y for all X values. The difference between the observed and predicted values is called the error ( y i- (mx i+ b) ), and if you square each error value and then sum these residuals, the result is a prediction squared Bad number. Using the minimum variance method to determine the best fit involves finding estimates of m and b that minimize the prediction variance.
Two basic methods can be used to find the estimates m and b that satisfy the minimum variance method. In the first approach, one can use a numerical search process to set different values of m and b and evaluate them, ultimately deciding on the estimate that yields the minimum variance. The second method is to use calculus to find equations for estimating m and b. I'm not going to get into the calculus involved in deriving these equations, but I did use these analytical equations in the SimpleLinearRegression class to find least square estimates of m and b (see getSlope() and getYIntercept in the SimpleLinearRegression class method).
Even if you have an equation that can be used to find the least squares estimate of m and b, it does not mean that if you plug these parameters into a linear equation, the result will be a straight line that fits the data well. The next step in this simple linear regression process is to determine whether the remaining prediction variance is acceptable.
You can use the statistical decision process to reject the alternative hypothesis that the straight line fits the data. This process is based on the calculation of the T statistic, using a probability function to find the probability of a randomly large observation. As mentioned in Part 1, the SimpleLinearRegression class generates a number of summary values, one of the important summary values is the T statistic, which measures how well the linear equation fits the data. If the fit is good, the T statistic will tend to be a large value; if the T value is small, you should replace your linear equation with a default model that assumes that the mean of the Y values is the best predictor (because The average of a set of values can often be a useful predictor of the next observation).
To test whether the T statistic is large enough to not use the average Y value as the best predictor, you need to calculate the probability of obtaining the T statistic randomly. If the probability is low, then the null assumption that the mean is the best predictor can be dispensed with, and accordingly one can be confident that a simple linear model is a good fit to the data. (See Part 1 for more information on calculating the probability of a T-statistic.)
Back to discussing the statistical decision-making process. It tells you when not to adopt the null hypothesis, but it does not tell you whether to accept the alternative hypothesis. In a research setting, linear model alternative hypotheses need to be established through theoretical and statistical parameters.
The data research tool you will build implements a statistical decision-making process for linear models (T-tests) and provides summary data that can be used to construct the theoretical and statistical parameters needed to build linear models. Data research tools can be classified as decision support tools for knowledge workers to study patterns in small to medium-sized data sets.
From a learning perspective, simple linear regression modeling is worth studying as it is the only way to understand more advanced forms of statistical modeling. For example, many core concepts in simple linear regression establish a good foundation for understanding multiple regression (Multiple Regression), factor analysis (Factor Analysis), and time series (Time Series).
Simple linear regression is also a versatile modeling technique. It can be used to model curvilinear data by transforming the raw data (usually with a logarithmic or power transformation). These transformations linearize the data so that it can be modeled using simple linear regression. The resulting linear model will be represented as a linear formula related to the transformed values.
Back to top
Probability function
In the previous article, I got around the problem of implementing probability functions in PHP by asking R to find the probability value. I wasn't completely satisfied with this solution, so I started researching the question: what is needed to develop a probability function based on PHP.
I started looking online for information and code. One source for both is Probability Functions in the book Numerical Recipes in C. I reimplemented some probability function code (gammln.c and betai.c functions) in PHP, but I'm still not satisfied with the results. It seems to have a bit more code than some other implementations. Additionally, I need the inverse probability function.
Luckily, I stumbled upon John Pezzullo’s Interactive Statistical Calculation. John's website on Probability Distribution Functions has all the functions I need, implemented in JavaScript to make learning easier.
I ported the Student T and Fisher F functions to PHP. I changed the API a bit to conform to Java naming style and embedded all functions into a class called Distribution. A great feature of this implementation is the doCommonMath method, which is reused by all functions in this library. Other tests that I didn't bother to implement (normality test and chi-square test) also use the doCommonMath method.
Another aspect of this transplant is also worth noting. By using JavaScript, users can assign dynamically determined values to instance variables, such as:
var PiD2 = pi() / 2
You cannot do this in PHP. Only simple constant values can be assigned to instance variables. Hopefully this flaw will be resolved in PHP5.
Note that the code in Listing 1 does not define instance variables — this is because in the JavaScript version, they are dynamically assigned values.
List 1. Implement probability function
doCommonMath($cth * $cth, 2, $df - 3, -1)) / (pi()/2); } else { return 1 - $sth * $this->doCommonMath($cth * $cth, 1, $df - 3, -1); } } function getInverseStudentT($p, $df) { $v = 0.5; $dv = 0.5; $t = 0; while($dv > 1e-6) { $t = (1 / $v) - 1; $dv = $dv / 2; if ( $this->getStudentT($t, $df) > $p) { $v = $v - $dv; } else { $v = $v + $dv; } } return $t; } function getFisherF($f, $n1, $n2) { // implemented but not shown } function getInverseFisherF($p, $n1, $n2) { // implemented but not shown } } ?>
Back to top
Graphic output
The output methods you have implemented so far all display summary values in HTML format. It is also suitable for displaying scatter plots or line plots of these data in GIF, JPEG or PNG format.
Rather than writing the code to generate line and distribution plots myself, I thought it would be better to use a PHP-based graphics library called JpGraph. JpGraph is being actively developed by Johan Persson, whose project website describes it this way:
Whether it’s a “quick and dirty” graph with minimal code, or a complex professional graph that requires very fine-grained control, JpGraph makes drawing them simple. JpGraph is equally suitable for scientific and business type graphs.
The JpGraph distribution includes a number of example scripts that can be customized to your specific needs. Using JpGraph as a data research tool is as simple as finding a sample script that does something similar to what I need and adapting it to fit my specific needs.
The script in Listing 3 is extracted from the sample data exploration tool (explore.php) and demonstrates how to call the library and populate the Line and Scatter classes with data from the SimpleLinearRegression analysis. The comments in this code were written by Johan Persson (who does a great job documenting the JPGraph codebase).
Listing 3. Details of functions from the sample data research tool explore.php
SetScale("linlin"); // Setup title $graph->title->Set("$title"); $graph->img->SetMargin(50,20,20,40); $graph->xaxis->SetTitle("$x_name","center"); $graph->yaxis->SetTitleMargin(30); $graph->yaxis->title->Set("$y_name"); $graph->title->SetFont(FF_FONT1,FS_BOLD); // make sure that the X-axis is always at the // bottom at the plot and not just at Y=0 which is // the default position $graph->xaxis->SetPos('min'); // Create the scatter plot with some nice colors $sp1 = new ScatterPlot($slr->Y, $slr->X); $sp1->mark->SetType(MARK_FILLEDCIRCLE); $sp1->mark->SetFillColor("red"); $sp1->SetColor("blue"); $sp1->SetWeight(3); $sp1->mark->SetWidth(4); // Create the regression line $lplot = new LinePlot($slr->PredictedY, $slr->X); $lplot->SetWeight(2); $lplot->SetColor('navy'); // Add the pltos to the line $graph->Add($sp1); $graph->Add($lplot); // ... and stroke $graph_name = "temp/test.png"; $graph->Stroke($graph_name); ?> ?>
回页首
数据研究脚本
该数据研究工具由单个脚本( explore.php)构成,该脚本调用 SimpleLinearRegressionHTML 类和 JpGraph 库的方法。
该脚本使用了简单的处理逻辑。该脚本的第一部分对所提交的表单数据执行基本验证。如果这些表单数据通过验证,则执行该脚本的第二部分。
该脚本的第二部分所包含的代码用于分析数据,并以 HTML 和图形格式显示汇总结果。 清单 4中显示了 explore.php脚本的基本结构:
清单 4. explore.php 的结构
$title"; $slr->showTableSummary($x_name, $y_name); echo "
"; $slr->showAnalysisOfVariance(); echo "
"; $slr->showParameterEstimates($x_name, $y_name); echo "
"; $slr->showFormula($x_name, $y_name); echo "
"; $slr->showRValues($x_name, $y_name); echo "
"; include ("jpgraph/jpgraph.php"); include ("jpgraph/jpgraph_scatter.php"); include ("jpgraph/jpgraph_line.php"); // The code for displaying the graphics is inline in the // explore.php script. The code for these two line plots // finishes off the script: // Omitted code for displaying scatter plus line plot // Omitted code for displaying residuals plot } ?>

What’s still popular is the ease of use, flexibility and a strong ecosystem. 1) Ease of use and simple syntax make it the first choice for beginners. 2) Closely integrated with web development, excellent interaction with HTTP requests and database. 3) The huge ecosystem provides a wealth of tools and libraries. 4) Active community and open source nature adapts them to new needs and technology trends.

PHP and Python are both high-level programming languages that are widely used in web development, data processing and automation tasks. 1.PHP is often used to build dynamic websites and content management systems, while Python is often used to build web frameworks and data science. 2.PHP uses echo to output content, Python uses print. 3. Both support object-oriented programming, but the syntax and keywords are different. 4. PHP supports weak type conversion, while Python is more stringent. 5. PHP performance optimization includes using OPcache and asynchronous programming, while Python uses cProfile and asynchronous programming.

PHP is mainly procedural programming, but also supports object-oriented programming (OOP); Python supports a variety of paradigms, including OOP, functional and procedural programming. PHP is suitable for web development, and Python is suitable for a variety of applications such as data analysis and machine learning.

PHP originated in 1994 and was developed by RasmusLerdorf. It was originally used to track website visitors and gradually evolved into a server-side scripting language and was widely used in web development. Python was developed by Guidovan Rossum in the late 1980s and was first released in 1991. It emphasizes code readability and simplicity, and is suitable for scientific computing, data analysis and other fields.

PHP is suitable for web development and rapid prototyping, and Python is suitable for data science and machine learning. 1.PHP is used for dynamic web development, with simple syntax and suitable for rapid development. 2. Python has concise syntax, is suitable for multiple fields, and has a strong library ecosystem.

PHP remains important in the modernization process because it supports a large number of websites and applications and adapts to development needs through frameworks. 1.PHP7 improves performance and introduces new features. 2. Modern frameworks such as Laravel, Symfony and CodeIgniter simplify development and improve code quality. 3. Performance optimization and best practices further improve application efficiency.

PHPhassignificantlyimpactedwebdevelopmentandextendsbeyondit.1)ItpowersmajorplatformslikeWordPressandexcelsindatabaseinteractions.2)PHP'sadaptabilityallowsittoscaleforlargeapplicationsusingframeworkslikeLaravel.3)Beyondweb,PHPisusedincommand-linescrip

PHP type prompts to improve code quality and readability. 1) Scalar type tips: Since PHP7.0, basic data types are allowed to be specified in function parameters, such as int, float, etc. 2) Return type prompt: Ensure the consistency of the function return value type. 3) Union type prompt: Since PHP8.0, multiple types are allowed to be specified in function parameters or return values. 4) Nullable type prompt: Allows to include null values and handle functions that may return null values.


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

SublimeText3 Linux new version
SublimeText3 Linux latest version

Dreamweaver Mac version
Visual web development tools

ZendStudio 13.5.1 Mac
Powerful PHP integrated development environment

SecLists
SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

SublimeText3 Mac version
God-level code editing software (SublimeText3)