After thinking hard about it for a few days, I finally figured out the reasoning behind it. I'm writing it down here and ask experts to correct me.
The idea behind a collection program is very simple: open a page, usually a list page, get the addresses of all the links in it, then open the links one by one and look for the content we are interested in. Whatever is found is put into the database or processed in some other way. Let's walk through a very simple example.
First, decide on a page to collect, usually a list page. The target here is: http://www.jb51.net/article/11/index.htm. This is a list page, and our goal is to collect all the articles on it.
Given a list page, the first step is to open it and bring its content into our program. Generally, one of the two functions fopen or file_get_contents is used; we use fopen as the example here. How do we open it? It's very simple: $source=fopen("http://www.jb51.net/article/11/index.htm",'r'); At this point the content has been brought into our program. Note that the $source obtained is a resource, not text that can be processed, so the function fread is used to read the content into a variable. This time it really is editable text. Example:
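A minimal sketch of the open-and-read step, not the author's exact code: the function name read_page and the 8192-byte chunk size are assumptions made for illustration. It also assumes allow_url_fopen is enabled when the argument is an http URL.

```php
<?php
// Hypothetical helper: read everything behind $url into a string.
function read_page(string $url): string
{
    // fopen returns a stream resource, not text.
    $source = fopen($url, 'r');
    if ($source === false) {
        throw new RuntimeException("Could not open $url");
    }

    // fread returns at most the requested number of bytes per call,
    // so keep reading until the end of the stream is reached.
    $content = '';
    while (!feof($source)) {
        $chunk = fread($source, 8192);
        if ($chunk === false) {
            break;
        }
        $content .= $chunk;
    }

    fclose($source);
    return $content; // now an ordinary, editable string
}
```

Calling read_page("http://www.jb51.net/article/11/index.htm") would then return the HTML of the list page as a string. file_get_contents does the open, read, and close in a single call.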
$c//www.jb51.net/article/7/all/545.1.htm)]. By looking at the source code, we can see that the link addresses of the articles inside all look like this
We can write regular expressions. $count=preg_match_all("/
Here $art_list[1][$s] holds the link address of an article, and $art_list[2][$s] holds the title of that article. At this point, half the battle is won.
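As a rough illustration of how preg_match_all fills such an array, here is a sketch. The article's own pattern is cut off, so the pattern below (plain anchors with a double-quoted href) and the function name extract_links are assumptions, not the original expression.

```php
<?php
// Hypothetical helper: collect link addresses and titles from $html.
function extract_links(string $html): array
{
    // Assumed pattern: group 1 is the href value, group 2 is the link text.
    $pattern = '/<a\s+href="([^"]+)"[^>]*>(.*?)<\/a>/is';

    // With the default PREG_PATTERN_ORDER flag, $art_list[1] lists every
    // group-1 match and $art_list[2] lists every group-2 match.
    $count = preg_match_all($pattern, $html, $art_list);
    if ($count === false) {
        throw new RuntimeException('The regular expression failed');
    }

    return [$count, $art_list];
}
```

After [$count, $art_list] = extract_links($html), the pair $art_list[1][$s] and $art_list[2][$s] is the address and title of link number $s, for $s from 0 to $count - 1. On a real page the pattern usually has to be narrowed to the markup of the article list so that navigation links are not picked up.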
Then use a for loop to visit each link in turn and get the content in the same way as the title. Up to this point, everything is similar to the tutorials I found online, but when it comes to this for loop, the online tutorials are terrible; I haven't found a single article that explains it clearly. At the beginning, I used js to help with the loop, or used … Let me give you an example. At the beginning, I did this:
for($i=0;$i… The middle part, which collects the content, is omitted.
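The general shape of such a loop can be sketched as follows. This is not the author's loop, which is cut off. The function name collect_articles and the $fetch callable are hypothetical; $fetch stands for whatever opens one link and returns its content, or null when the request fails.

```php
<?php
// Hypothetical helper: visit every link and gather what was fetched.
function collect_articles(array $links, array $titles, callable $fetch): array
{
    $articles = [];
    $count = count($links);

    for ($i = 0; $i < $count; $i++) {
        // Fetch one article page; a null result means the request failed.
        $body = $fetch($links[$i]);
        if ($body === null) {
            continue; // skip this link and carry on with the next one
        }

        $articles[] = [
            'url'   => $links[$i],
            'title' => $titles[$i] ?? '',
            'body'  => $body,
        ];
    }

    return $articles;
}
```

Passing the fetch step in as a callable keeps the loop independent of how a page is actually opened, so one failing link does not stop the rest of the collection.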
After one page has been collected, the next page has to be collected.
But it doesn't work when I use fopen to open the link: the request fails or something like that, and it doesn't work with js either. In the end I learned that we need to use this: echo "}
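Separately from the fix the author arrived at, which is cut off here, one way to make a failing fopen visible instead of fatal is sketched below. The function name fetch_or_null, the 10-second timeout, and the User-Agent string are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: return the page content, or null if it cannot be read.
function fetch_or_null(string $url, int $timeoutSeconds = 10): ?string
{
    // Options for the http wrapper; other wrappers ignore them.
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'Mozilla/5.0 (compatible; example-collector)',
        ],
    ]);

    // @ suppresses the warning; the false return value is checked instead.
    $source = @fopen($url, 'r', false, $context);
    if ($source === false) {
        return null;
    }

    $content = stream_get_contents($source);
    fclose($source);

    return $content === false ? null : $content;
}
```

A loop over the collected links can then test the result for null and move on to the next link rather than stopping at the first failed request.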
My head is a bit muddled and the writing is a bit messy, so please make do with it. In the eyes of experts this may be no big deal, but for novices like me it's really helpful.
