


Teach you step by step how to do a keyword matching project (search engine) ---- Day 20
Guest appearances: Diaosi's Cheating Form Artifact; Those Things About Databases
Object-oriented sublimation: Understanding Object Orientation - A Newcomer's First Acquaintance; Object-Oriented Extras - Sleepwalking Thoughts (1); Understanding Object Orientation - How to Find Classes
Load balancing: Load Balancing - Understanding the Concepts; Load Balancing - Implementation and Configuration (Nginx)
Complaints: Some readers have told me that the articles get harder to follow toward the end and that they cannot keep up with the pace. Some asked why Xiao Shuai Shuai's skills improve so quickly, and wondered whether they themselves were just slow. Others read only the text and skip the code, because the code is too hard to understand.
I have been thinking about this for the past few days, which is why I launched the object-oriented lessons. I hope they help those who are falling behind. To be honest, if readers don't give feedback, I can only pace the course by how I imagine Xiao Shuai Shuai would progress.
Day 20
Starting point: Teach you step by step how to do a keyword matching project (search engine) ---- Day 1
Review: Teach you step by step how to do a keyword matching project (search engine) ---- Day 19
Last time, Xiao Shuai Shuai wrote a first version of the word segmentation algorithm. When he showed it to Boss Yu, he was told to rewrite it.
The reasons were as follows:
1. How can it be tested, and where does the test data come from?
2. Is Splitter doing too much?
3. What happens when a keyword contains repeated phrases, as in 连衣裙xxl裙连衣裙?
With these questions in mind, Xiao Shuai Shuai began to refactor.
First, he noticed that telling Chinese, English, and mixed Chinese-English text apart, together with calculating length, was a responsibility of its own. He extracted it into a class:
<?php
class UTF8
{
    /**
     * Check whether the string starts with, ends with, or contains a run of
     * UTF-8 encoded Chinese characters (three-byte sequences whose lead byte
     * is 228-233).
     * @param $char
     * @return bool
     */
    public static function is($char)
    {
        // One three-byte character: a lead byte, then two continuation bytes (128-191).
        $chinese = "[" . chr(228) . "-" . chr(233) . "][" . chr(128) . "-" . chr(191) . "][" . chr(128) . "-" . chr(191) . "]";
        return (preg_match("/^(" . $chinese . ")/", $char)
            || preg_match("/(" . $chinese . ")$/", $char)
            || preg_match("/(" . $chinese . "){2,}/", $char));
    }

    /**
     * Count the number of characters in a UTF-8 string.
     * @param $char
     * @return float|int
     */
    public static function length($char)
    {
        // Chinese text: assume three bytes per character. This is only an
        // approximation for mixed Chinese-English strings.
        if (self::is($char)) {
            return ceil(strlen($char) / 3);
        }
        return strlen($char);
    }

    /**
     * Check whether the word is a phrase (more than one character).
     * @param $word
     * @return bool
     */
    public static function isPhrase($word)
    {
        if (self::length($word) <= 1) {
            return false;
        }
        return true;
    }
}
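The length calculation above assumes three bytes per character whenever Chinese text is detected, so a mixed string such as 连衣裙xxl is only approximated. As a minimal sketch of an alternative (not the approach used in this series), the hypothetical helper countChars() below counts code points with PCRE's u modifier instead. It assumes the source file and the input are UTF-8 encoded.

```php
<?php
/**
 * Hypothetical helper (not part of the original article): count the
 * characters in a UTF-8 string by matching one code point at a time.
 */
function countChars(string $text): int
{
    // With the "u" modifier, "." matches one whole UTF-8 code point;
    // the "s" modifier lets it match newlines as well.
    $count = preg_match_all('/./us', $text);

    // preg_match_all() returns false when $text is not valid UTF-8.
    return $count === false ? 0 : $count;
}

echo countChars('连衣裙xxl'), "\n"; // prints 6: three Chinese characters plus three letters
```

A counter like this could back the phrase check directly, since "more than one character" then means the same thing for Chinese, English, and mixed input.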
Xiao Shuai Shuai also realized that the dictionary might come from several sources, such as the test data I provided. Handling this solves the testability problem Boss Yu raised. So he cut the dictionary source out into classes of its own, as follows:
<?php
class DBSegmentation
{
    public $cid;

    /**
     * Get the phrase data used for word segmentation under a category.
     * @return array
     */
    public function transferDictionary()
    {
        $ret = array();
        // Note: $this->cid is interpolated directly into the SQL, so it must be trusted or escaped.
        $sql = "select word from category_linklist where cid='$this->cid'";
        // DB::makeArray() is the project's own helper; each row is treated as a comma-separated string.
        $rows = DB::makeArray($sql);
        foreach ($rows as $strWords) {
            $words = explode(",", $strWords);
            foreach ($words as $word) {
                // Keep only entries longer than one character.
                if (UTF8::isPhrase($word)) {
                    $ret[] = $word;
                }
            }
        }
        return $ret;
    }
}

class TestSegmentation
{
    public function transferDictionary()
    {
        // Fixed test data in the same comma-separated format as the database rows.
        $rows = array(
            "连衣裙,连衣",
            "XXL,xxl,加大,加大码",
            "X码,中码",
            "外套,衣,衣服,外衣,上衣",
            "女款,女士,女生,女性"
        );
        $ret = array();
        foreach ($rows as $strWords) {
            $words = explode(",", $strWords);
            foreach ($words as $word) {
                if (UTF8::isPhrase($word)) {
                    $ret[] = $word;
                }
            }
        }
        return $ret;
    }
}
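Both classes expose the same transferDictionary() method, which is what lets the splitter accept either one. A minimal sketch of making that contract explicit is shown below. The names DictionarySource, ArrayDictionarySource, and flattenRows() are assumptions for illustration and do not appear in the original code; the phrase check is a stand-in for UTF8::isPhrase().

```php
<?php
/**
 * Hypothetical interface (the name is an assumption): anything that can
 * hand the splitter a flat list of phrases.
 */
interface DictionarySource
{
    public function transferDictionary(): array;
}

/**
 * Shared helper: expand "a,b,c" rows and keep entries longer than one character.
 */
function flattenRows(array $rows): array
{
    $phrases = [];
    foreach ($rows as $row) {
        foreach (explode(',', $row) as $word) {
            // Stand-in for UTF8::isPhrase(): count UTF-8 code points.
            if (preg_match_all('/./us', $word) > 1) {
                $phrases[] = $word;
            }
        }
    }
    return $phrases;
}

/**
 * In-memory source for tests. A database-backed source would implement
 * the same interface and pass its rows to flattenRows().
 */
final class ArrayDictionarySource implements DictionarySource
{
    private $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function transferDictionary(): array
    {
        return flattenRows($this->rows);
    }
}

$source = new ArrayDictionarySource(['外套,衣,衣服', 'X码,中码']);
print_r($source->transferDictionary()); // the single character 衣 is filtered out
```

Under this sketch the duplicated filtering loop lives in one place, and a test can pass any rows it likes without touching the database.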
With that out of the way, Splitter can focus solely on word segmentation. The code is as follows:
<?php
class Splitter
{
    public $keyword;
    private $dictionary = array();

    public function setDictionary($dictionary = array())
    {
        // Sort longest phrases first so that "连衣裙" is matched before its prefix "连衣".
        usort($dictionary, function ($a, $b) {
            $lengthA = UTF8::length($a);
            $lengthB = UTF8::length($b);
            if ($lengthA == $lengthB) {
                return 0;
            }
            return ($lengthA < $lengthB) ? 1 : -1;
        });
        $this->dictionary = $dictionary;
    }

    public function getDictionary()
    {
        return $this->dictionary;
    }

    /**
     * Split the keyword into phrases or single words.
     * @return KeywordEntity $keywordEntity
     */
    public function split()
    {
        $remainKeyword = $this->keyword;
        $keywordEntity = new KeywordEntity($this->keyword);
        foreach ($this->dictionary as $phrase) {
            // preg_quote() keeps regex metacharacters in a phrase from breaking the pattern.
            $matchTimes = preg_match_all("/" . preg_quote($phrase, "/") . "/", $remainKeyword, $matches);
            if ($matchTimes > 0) {
                $keywordEntity->addElement($phrase, $matchTimes);
                // Replace matched phrases with a "::" marker (assumes "::" never occurs in a keyword).
                $remainKeyword = str_replace($phrase, "::", $remainKeyword);
            }
        }
        // Whatever is left between the markers becomes stand-alone words.
        $remainKeywords = explode("::", $remainKeyword);
        foreach ($remainKeywords as $splitWord) {
            if ($splitWord !== '') {
                $keywordEntity->addElement($splitWord);
            }
        }
        return $keywordEntity;
    }
}

class KeywordEntity
{
    public $keyword;
    public $elements = array();

    public function __construct($keyword)
    {
        $this->keyword = $keyword;
    }

    public function addElement($word, $times = 1)
    {
        if (isset($this->elements[$word])) {
            $this->elements[$word]->times += $times;
        } else {
            // Key the element by its word so isset() and calculateWeight() can find it.
            $this->elements[$word] = new KeywordElement($word, $times);
        }
    }

    /**
     * Calculate the weight of a word within the UTF-8 keyword.
     * @param string $word
     * @return float
     */
    public function calculateWeight($word)
    {
        $element = $this->elements[$word];
        // Share of the keyword's bytes covered by all occurrences of the word.
        return round(strlen($element->word) * $element->times / strlen($this->keyword), 3);
    }
}

class KeywordElement
{
    public $word;
    public $times;

    public function __construct($word, $times)
    {
        $this->word = $word;
        $this->times = $times;
    }
}
He also handed the weight calculation over to a dedicated class, KeywordEntity.
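To make the weight formula concrete, here is a small worked example. It applies the same expression as calculateWeight() (byte length of the word, times its occurrences, divided by the byte length of the keyword) as a free function; the name weightOf() is hypothetical. It assumes UTF-8 source encoding and a non-empty keyword.

```php
<?php
/**
 * Same formula as KeywordEntity::calculateWeight(), written as a
 * stand-alone function for illustration (the name is hypothetical).
 */
function weightOf(string $word, int $times, string $keyword): float
{
    // strlen() counts bytes: 3 per Chinese character in UTF-8, 1 per ASCII letter.
    return round(strlen($word) * $times / strlen($keyword), 3);
}

$keyword = '连衣裙xxl裙连衣裙';             // 9 + 3 + 3 + 9 = 24 bytes
echo weightOf('连衣裙', 2, $keyword), "\n"; // 9 * 2 / 24 = 0.75
echo weightOf('xxl', 1, $keyword), "\n";    // 3 * 1 / 24 = 0.125
echo weightOf('裙', 1, $keyword), "\n";     // 3 * 1 / 24 = 0.125
```

Because every byte of the keyword belongs to exactly one element, the three weights add up to 1.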
After Xiao Shuai Shuai finished writing, he also wrote a test example:
<?php
$segmentation = new TestSegmentation();
$splitter = new Splitter();
$splitter->setDictionary($segmentation->transferDictionary());
$splitter->keyword = "连衣裙xxl裙连衣裙";
$keywordEntity = $splitter->split();
var_dump($keywordEntity);
This way, even if the algorithm changes later, the code can take the change calmly.
Xiao Shuai Shuai took a lesson from this: when a class seems to be doing too many things, consider the single responsibility principle.
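As an illustration of how Boss Yu's third question (repeated and overlapping phrases) could be checked automatically, here is a stand-alone sketch of longest-phrase-first splitting with an assertion. The function splitKeyword() is hypothetical and is not the Splitter class from this article; it assumes the dictionary contains no empty strings and that keywords never contain a NUL byte, which is used as the cut marker.

```php
<?php
/**
 * Hypothetical stand-alone sketch of longest-phrase-first splitting.
 * Returns [phrase => occurrences]; leftover fragments count once each.
 */
function splitKeyword(string $keyword, array $dictionary): array
{
    // Longest phrases first, so "连衣裙" is consumed before "连衣".
    usort($dictionary, function ($a, $b) {
        return strlen($b) <=> strlen($a);
    });

    $found = [];
    $remaining = $keyword;
    foreach ($dictionary as $phrase) {
        $times = substr_count($remaining, $phrase);
        if ($times > 0) {
            $found[$phrase] = $times;
            // "\0" marks each cut; it is assumed never to appear in a keyword.
            $remaining = str_replace($phrase, "\0", $remaining);
        }
    }

    // Anything left between the cut markers becomes a stand-alone word.
    foreach (explode("\0", $remaining) as $fragment) {
        if ($fragment !== '') {
            $found[$fragment] = ($found[$fragment] ?? 0) + 1;
        }
    }
    return $found;
}

$result = splitKeyword('连衣裙xxl裙连衣裙', ['连衣', '连衣裙', 'xxl']);
assert($result == ['连衣裙' => 2, 'xxl' => 1, '裙' => 1]);
print_r($result);
```

With a check like this in place, a change to the matching order or to the dictionary source shows up as a failed assertion rather than as a surprise in the output.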
Single Responsibility Principle: A class should have only one reason to change, that is, only one responsibility. Each responsibility is an axis of change. If a class has more than one responsibility, those responsibilities become coupled, which leads to fragile designs: a change to one responsibility may affect the others. Coupled responsibilities also hurt reusability. A typical application is separating logic from the interface. (From Baidu Encyclopedia)
When Boss Yu asked whether there were other word segmentation algorithms they could use, Xiao Shuai Shuai was delighted, because the code was now in such good shape.
How will Xiao Shuai Shuai work with third-party word segmentation extensions? Stay tuned for the next installment: Teach you step by step how to do a keyword matching project (search engine) ---- Day 21
