
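PHP similar_text(), levenshtein(), and lcs() with Chinese character support

PHP's native similar_text() and levenshtein() functions handle Chinese characters poorly: they compare strings byte by byte, while a Chinese character occupies several bytes in UTF-8. So I wrote my own versions.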
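To make the problem concrete, here is a minimal sketch of how the byte-oriented built-ins see Chinese text. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; the sample strings are illustrative only.

```php
<?php
// Each of these Chinese characters occupies 3 bytes in UTF-8.
$a = '中国';
$b = '中文';

// strlen() counts bytes, mb_strlen() counts characters.
echo strlen($a), "\n";             // 6
echo mb_strlen($a, 'UTF-8'), "\n"; // 2

// The strings differ by one character, but levenshtein() works on bytes,
// and all three bytes of the second character differ.
echo levenshtein($a, $b), "\n";    // 3
```

A character-aware distance for this pair should be 1, which is what the versions below aim for.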


similar_text() for Chinese characters

<?php
// Split a UTF-8 string into an array of characters.
function split_str($str) {
    // "u" makes "." match whole UTF-8 characters; "s" lets it match newlines as well.
    preg_match_all("/./us", $str, $arr);
    return $arr[0];
}

// Similarity check: returns the number of distinct characters that occur in both strings.
// Note: unlike the native similar_text(), character order and repetition are ignored.
function similar_text_cn($str1, $str2) {
    $arr_1 = array_unique(split_str($str1));
    $arr_2 = array_unique(split_str($str2));
    $similarity = count($arr_2) - count(array_diff($arr_2, $arr_1));

    return $similarity;
}
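The listing above measures overlap between the two character sets, so repeated characters and ordering do not affect the result. The sketch below shows the same idea in compact form so it can run on its own; `shared_char_count` is a hypothetical name, and the code assumes PHP 7+ and valid UTF-8 input.

```php
<?php
// Hypothetical helper: the set-overlap idea of similar_text_cn() in compact form.
function shared_char_count(string $a, string $b): int {
    // preg_split with an empty pattern and the "u" modifier yields one entry per character.
    $set_a = array_unique(preg_split('//u', $a, -1, PREG_SPLIT_NO_EMPTY));
    $set_b = array_unique(preg_split('//u', $b, -1, PREG_SPLIT_NO_EMPTY));

    // Count the distinct characters of $b that also occur in $a.
    return count(array_intersect($set_b, $set_a));
}

echo shared_char_count('我爱北京', '我爱上海'), "\n"; // 2 (我, 爱)
echo shared_char_count('北京北京', '京北'), "\n";     // 2 (order and repeats ignored)
```

Because the score is a raw count, comparing results across string pairs of different lengths would need some normalization, which the original does not define.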

levenshtein() for Chinese characters

<?php
// Split a multibyte string into an array of characters.
function mbStringToArray($string, $encoding = 'UTF-8') {
    $arrayResult = array();

    // Peel off the first character until the string is empty.
    while ($iLen = mb_strlen($string, $encoding)) {
        array_push($arrayResult, mb_substr($string, 0, 1, $encoding));
        $string = mb_substr($string, 1, $iLen, $encoding);
    }

    return $arrayResult;
}

// Edit distance, computed on characters rather than bytes.
function levenshtein_cn($str1, $str2, $costReplace = 1, $encoding = 'UTF-8') {
    $count_same_letter = 0; // number of matching character pairs seen while filling the table
    $d = array();

    $mb_len1 = mb_strlen($str1, $encoding);
    $mb_len2 = mb_strlen($str2, $encoding);

    $mb_str1 = mbStringToArray($str1, $encoding);
    $mb_str2 = mbStringToArray($str2, $encoding);

    // First column: delete every character of the $str1 prefix.
    for ($i1 = 0; $i1 <= $mb_len1; $i1++) {
        $d[$i1] = array();
        $d[$i1][0] = $i1;
    }

    // First row: insert every character of the $str2 prefix.
    for ($i2 = 0; $i2 <= $mb_len2; $i2++) {
        $d[0][$i2] = $i2;
    }

    for ($i1 = 1; $i1 <= $mb_len1; $i1++) {
        for ($i2 = 1; $i2 <= $mb_len2; $i2++) {
            // Byte-wise variant, which breaks on multibyte characters:
            // $cost = ($str1[$i1 - 1] == $str2[$i2 - 1]) ? 0 : 1;
            if ($mb_str1[$i1 - 1] === $mb_str2[$i2 - 1]) {
                $cost = 0;
                $count_same_letter++;
            } else {
                $cost = $costReplace; // substitution
            }

            $d[$i1][$i2] = min(
                $d[$i1 - 1][$i2] + 1,        // deletion
                $d[$i1][$i2 - 1] + 1,        // insertion
                $d[$i1 - 1][$i2 - 1] + $cost // match or substitution
            );
        }
    }

    return $d[$mb_len1][$mb_len2];
    // Alternative: return the match counter as well.
    // return array('distance' => $d[$mb_len1][$mb_len2], 'count_same_letter' => $count_same_letter);
}
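The full table above is only ever read one row back, so the same recurrence can be run with two rows. The following is a sketch of that space-saving variant, not the author's code: `levenshtein_two_rows` is a hypothetical name, the substitution cost is fixed at 1, and it assumes PHP 7.4+ (for `mb_str_split`) and valid UTF-8 input.

```php
<?php
// Sketch: character-level edit distance keeping only two DP rows.
function levenshtein_two_rows(string $s, string $t): int {
    $a = mb_str_split($s, 1, 'UTF-8');
    $b = mb_str_split($t, 1, 'UTF-8');

    // Row 0: distance from the empty prefix of $a to each prefix of $b.
    $prev = range(0, count($b));

    foreach ($a as $i => $ca) {
        // First cell: delete the first $i + 1 characters of $a.
        $curr = [$i + 1];
        foreach ($b as $j => $cb) {
            $cost = ($ca === $cb) ? 0 : 1;
            $curr[] = min(
                $prev[$j + 1] + 1, // deletion of $ca
                $curr[$j] + 1,     // insertion of $cb
                $prev[$j] + $cost  // match or substitution
            );
        }
        $prev = $curr;
    }

    return $prev[count($b)];
}

echo levenshtein_two_rows('中国人', '中文人'), "\n";  // 1
echo levenshtein_two_rows('kitten', 'sitting'), "\n"; // 3
```

Memory use drops from one row per character of the first string to two rows in total, which matters mainly for long inputs.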
 

Longest common subsequence: LCS()

<?php
// Longest common subsequence length, single-byte (English) version.
function LCS_en($str_1, $str_2) {
    $len_1 = strlen($str_1);
    $len_2 = strlen($str_2);
    $len = $len_1 > $len_2 ? $len_1 : $len_2;

    // Zero-fill the first row and first column of the DP table.
    $dp = array();
    for ($i = 0; $i <= $len; $i++) {
        $dp[$i] = array();
        $dp[$i][0] = 0;
        $dp[0][$i] = 0;
    }

    for ($i = 1; $i <= $len_1; $i++) {
        for ($j = 1; $j <= $len_2; $j++) {
            if ($str_1[$i - 1] == $str_2[$j - 1]) {
                // Matching characters extend the subsequence by one.
                $dp[$i][$j] = $dp[$i - 1][$j - 1] + 1;
            } else {
                // Otherwise carry over the better of the two neighbours.
                $dp[$i][$j] = $dp[$i - 1][$j] > $dp[$i][$j - 1] ? $dp[$i - 1][$j] : $dp[$i][$j - 1];
            }
        }
    }

    return $dp[$len_1][$len_2];
}

// Split a multibyte string into an array of characters.
// Same helper as in the levenshtein() listing; define it only once if both listings are loaded together.
function mbStringToArray($string, $encoding = 'UTF-8') {
    $arrayResult = array();

    while ($iLen = mb_strlen($string, $encoding)) {
        array_push($arrayResult, mb_substr($string, 0, 1, $encoding));
        $string = mb_substr($string, 1, $iLen, $encoding);
    }

    return $arrayResult;
}

// Longest common subsequence length, multibyte (Chinese) version.
function LCS_cn($str1, $str2, $encoding = 'UTF-8') {
    $mb_len1 = mb_strlen($str1, $encoding);
    $mb_len2 = mb_strlen($str2, $encoding);

    $mb_str1 = mbStringToArray($str1, $encoding);
    $mb_str2 = mbStringToArray($str2, $encoding);

    $len = $mb_len1 > $mb_len2 ? $mb_len1 : $mb_len2;

    // Zero-fill the first row and first column of the DP table.
    $dp = array();
    for ($i = 0; $i <= $len; $i++) {
        $dp[$i] = array();
        $dp[$i][0] = 0;
        $dp[0][$i] = 0;
    }

    for ($i = 1; $i <= $mb_len1; $i++) {
        for ($j = 1; $j <= $mb_len2; $j++) {
            if ($mb_str1[$i - 1] == $mb_str2[$j - 1]) {
                $dp[$i][$j] = $dp[$i - 1][$j - 1] + 1;
            } else {
                $dp[$i][$j] = $dp[$i - 1][$j] > $dp[$i][$j - 1] ? $dp[$i - 1][$j] : $dp[$i][$j - 1];
            }
        }
    }

    return $dp[$mb_len1][$mb_len2];
}
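Both LCS functions above return only the length. If the subsequence itself is needed, the same table can be walked backwards from its bottom-right cell. The sketch below is one way to do that and goes beyond the original listing: `lcs_string` is a hypothetical name, and the code assumes PHP 7.4+ (for `mb_str_split`) and valid UTF-8 input. When several subsequences share the maximum length, it returns just one of them.

```php
<?php
// Sketch: recover one longest common subsequence, not just its length.
function lcs_string(string $s, string $t): string {
    $a = mb_str_split($s, 1, 'UTF-8');
    $b = mb_str_split($t, 1, 'UTF-8');
    $n = count($a);
    $m = count($b);

    // $dp[$i][$j] = LCS length of the first $i characters of $a and the first $j of $b.
    $dp = array_fill(0, $n + 1, array_fill(0, $m + 1, 0));
    for ($i = 1; $i <= $n; $i++) {
        for ($j = 1; $j <= $m; $j++) {
            $dp[$i][$j] = ($a[$i - 1] === $b[$j - 1])
                ? $dp[$i - 1][$j - 1] + 1
                : max($dp[$i - 1][$j], $dp[$i][$j - 1]);
        }
    }

    // Walk back from the bottom-right cell, collecting matched characters.
    $out = [];
    $i = $n;
    $j = $m;
    while ($i > 0 && $j > 0) {
        if ($a[$i - 1] === $b[$j - 1]) {
            $out[] = $a[$i - 1];
            $i--;
            $j--;
        } elseif ($dp[$i - 1][$j] >= $dp[$i][$j - 1]) {
            $i--;
        } else {
            $j--;
        }
    }

    // Characters were collected from the end, so reverse them.
    return implode('', array_reverse($out));
}

echo lcs_string('我爱北京天安门', '我在北京看天安门'), "\n"; // 我北京天安门
```

The length of the returned string, measured with `mb_strlen`, matches what a length-only LCS function reports for the same inputs.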

 

 
