[Comparison] PHP: detecting duplicate lines in a submitted text, which approach is better?
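I wrote the two functions below to check whether a submitted block of text contains repeated lines, and ran into a few problems:
(1) in_array() sometimes misbehaves with Chinese text: a line that is clearly already in the array is reported as absent, and this happens more often with long text.
(2) Repeating a short paragraph 3 to 4 times is sometimes acceptable, but comparing with similar_text rejects the submission as soon as a single repetition occurs. How can this be improved?
(3) Is there a better approach altogether? Any suggestions are appreciated.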
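A minimal sketch for points (1) and (3), under assumptions the post does not confirm. A likely cause of (1) is line endings rather than Chinese characters as such: browsers submit textarea content with "\r\n" line breaks, so explode("\n", ...) leaves a trailing "\r" on every line except the last, and the last line then never equals an earlier copy (the "\r" replacement in hasRepeatLine below is commented out). Invisible characters such as the full-width space U+3000 can cause the same mismatch. The sketch strips these, counts each normalized line in an associative array (a hash lookup instead of a linear in_array() scan for every line), and measures length in characters rather than bytes. The names normalizeLine and hasTooManyRepeats, the character tiers, and the per-line limits are hypothetical; the limits mirror hasRepeatLine for a single repeated line, but are applied to each line separately instead of one shared counter per tier.

```php
<?php
// Hypothetical helper: normalize one line before comparing.
// Strips the same characters hasRepeatLine removes, plus "\r" and the
// full-width space U+3000 (assumed sources of false mismatches).
function normalizeLine(string $line): string
{
    return str_replace(["\r", "\t", " ", "\u{3000}", "@", "#", "。", "，", ".", ","], '', $line);
}

// Hypothetical replacement for hasRepeatLine(): returns true if any single
// line occurs more often than its length tier allows.
function hasTooManyRepeats(string $text): bool
{
    $counts = [];                                   // normalized line => occurrences
    foreach (explode("\n", $text) as $raw) {
        $line = normalizeLine($raw);
        if ($line === '') {
            continue;                               // ignore blank lines
        }
        $counts[$line] = ($counts[$line] ?? 0) + 1; // hash lookup, no in_array()
    }

    foreach ($counts as $line => $n) {
        // Numeric-string keys become ints in PHP arrays, so cast back to string.
        // Count UTF-8 characters with PCRE (false on invalid UTF-8 becomes 0).
        $len = (int) preg_match_all('/./us', (string) $line);
        // Hypothetical tiers, roughly the 12- and 50-byte limits of
        // hasRepeatLine for Chinese text (3 bytes per character in UTF-8).
        if ($len <= 4) {
            $max = 5;                               // short lines: up to 5 occurrences
        } elseif ($len <= 16) {
            $max = 4;                               // medium lines: up to 4
        } else {
            $max = 3;                               // long lines: up to 3
        }
        if ($n > $max) {
            return true;
        }
    }
    return false;
}
```

Counting per line changes the behavior slightly: two different short lines that each appear four times are accepted here, whereas the shared $countShort in hasRepeatLine reaches 6 repeats in total and rejects the text.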
```php
<?php
function hasSimilarText($string)
{
    $lineArr = explode("\n", $string);
    $arrStr = $arrLen = array();
    foreach ($lineArr as $k => $v)
    {
        $arrLen[] = strlen($v); // length in bytes, not characters
        $arrStr[] = $v;
    }

    foreach ($arrStr as $k1 => $v1)
    {
        foreach ($arrStr as $k2 => $v2)
        {
            if ($k1 == $k2) continue;
            // Skip lines longer than 100 bytes. The original condition also
            // had a lower bound that was lost when the post was rendered.
            if ($arrLen[$k2] > 100) continue;
            similar_text($v1, $v2, $pct);
            if ($pct > 90) return true;
        }
    }
    return false;
}


/* Duplicate paragraph detection */
function hasRepeatLine($string)
{
    $string = str_replace(array("\t", " ", "@", "#", "。", "，", ".", ","), '', $string);
    //$string = str_replace("\r", "\n", $string);
    $lineArr = explode("\n", $string);
    $countShort = $countMiddle = $countLong = 0;
    $arr = array();

    foreach ($lineArr as $lineString)
    {
        $length = strlen($lineString);
        // The original minimum-length check was lost in rendering;
        // here only empty lines are skipped.
        if ($length == 0) continue;
        if (in_array($lineString, $arr))
        {
            if ($length <= 12)
            {
                $countShort++;
                if ($countShort > 4) return true; // 5 times
            } elseif ($length > 12 && $length <= 50) {
                $countMiddle++;
                if ($countMiddle > 3) return true; // 4 times
            } elseif ($length > 50) { // original upper bound lost in rendering
                $countLong++;
                if ($countLong > 2) return true; // 3 times
            }
        }
        // The post is cut off here; recording each line is assumed.
        $arr[] = $lineString;
    }
    return false;
}
```
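For point (2), one option, sketched here under stated assumptions and not tested against real submissions, is to keep similar_text() but stop returning on the first pair above 90%. Instead, count how many other lines each line is similar to, and reject only when that count exceeds a limit. The function name hasTooManySimilar and the parameters $threshold, $maxSimilar and $maxLen are hypothetical. Note that similar_text() works on bytes, so for UTF-8 Chinese text the percentage is computed over bytes rather than characters, and the PHP manual gives its complexity as O(N^3) in the string length, which is why long lines are skipped as in the original.

```php
<?php
// Hypothetical: returns true if some line is more than $threshold percent
// similar to more than $maxSimilar other lines. Lines longer than $maxLen
// bytes are skipped because similar_text() is expensive on long strings.
function hasTooManySimilar(string $text, float $threshold = 90.0, int $maxSimilar = 3, int $maxLen = 100): bool
{
    $lines = [];
    foreach (explode("\n", $text) as $raw) {
        $line = trim($raw);                  // also drops a trailing "\r"
        if ($line !== '' && strlen($line) <= $maxLen) {
            $lines[] = $line;
        }
    }

    $n = count($lines);
    $similarCount = array_fill(0, $n, 0);    // per-line count of similar siblings
    for ($i = 0; $i < $n; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {   // each pair is compared once
            similar_text($lines[$i], $lines[$j], $pct);
            if ($pct > $threshold) {
                $similarCount[$i]++;
                $similarCount[$j]++;
                if ($similarCount[$i] > $maxSimilar || $similarCount[$j] > $maxSimilar) {
                    return true;
                }
            }
        }
    }
    return false;
}
```

With the default $maxSimilar = 3, the same short line may appear up to four times before the text is rejected, which matches the "3 to 4 repetitions are acceptable" requirement. Comparing each pair only once also halves the similar_text() calls compared with the double foreach in hasSimilarText.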