Function code for calculating string similarity in PHP

WBOY
2016-07-21 15:13:51

similar_text: Calculates the similarity between two strings
int similar_text ( string $first , string $second [, float &$percent ] )
$first Required. The first string to compare.
$second Required. The second string to compare.
$percent Optional. A variable, passed by reference, that receives the similarity as a percentage.
The function returns the number of matching characters found in the two strings.

The similarity between the two strings is calculated as described in Oliver [1993]. Note that this implementation does not use a stack as in Oliver's pseudocode, but makes recursive calls instead, which may or may not speed up the whole process. Note also that the complexity of this algorithm is O(N**3), where N is the length of the longer string.
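The recursive idea can be sketched in plain PHP. This is only an illustration under stated assumptions, not the C code that PHP actually ships: the names similar_text_sketch() and longest_common_substring() are hypothetical, and the sketch compares strings byte by byte. It finds the first longest common substring, counts its length, and then recurses on the pieces to the left and to the right of that substring.

```php
<?php
// Hypothetical helper: returns [start in $a, start in $b, length] of the
// first longest common substring of $a and $b (length 0 if none exists).
function longest_common_substring(string $a, string $b): array
{
    $best = [0, 0, 0];
    $lenA = strlen($a);
    $lenB = strlen($b);
    for ($i = 0; $i < $lenA; $i++) {
        for ($j = 0; $j < $lenB; $j++) {
            // Count how many bytes match starting at $a[$i] and $b[$j].
            $k = 0;
            while ($i + $k < $lenA && $j + $k < $lenB && $a[$i + $k] === $b[$j + $k]) {
                $k++;
            }
            if ($k > $best[2]) {
                $best = [$i, $j, $k];
            }
        }
    }
    return $best;
}

// Hypothetical sketch of the recursive matching-character count.
function similar_text_sketch(string $a, string $b): int
{
    if ($a === '' || $b === '') {
        return 0;
    }
    [$posA, $posB, $len] = longest_common_substring($a, $b);
    if ($len === 0) {
        return 0;
    }
    // Count the common substring, then recurse on both sides of it.
    return $len
        + similar_text_sketch(substr($a, 0, $posA), substr($b, 0, $posB))
        + similar_text_sketch(substr($a, $posA + $len), substr($b, $posB + $len));
}

echo similar_text_sketch("abcdefg", "aeg"), "\n"; // 3
```

The three nested loops (two start positions plus the matching run) are where the cubic worst case comes from.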

For example, we want to find the similarity between the string abcdefg and the string aeg:

The code is as follows:

$first = "abcdefg";
$second = "aeg";
echo similar_text($first, $second);

This outputs 3. To get the similarity as a percentage, pass a variable as the third parameter:

$first = "abcdefg";
$second = "aeg";
similar_text($first, $second, $percent);
echo $percent;

This outputs 60.
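The percentage is the number of matching characters times two, divided by the combined length of both strings, times 100. A short check of that relationship, where $matches and $expected are illustrative variable names:

```php
<?php
$first = "abcdefg";
$second = "aeg";

// The return value is the match count; $percent is filled in by reference.
$matches = similar_text($first, $second, $percent);

// Recompute the percentage by hand from the match count and the lengths.
$expected = $matches * 2 * 100 / (strlen($first) + strlen($second));

echo $matches, "\n";  // 3
echo $percent, "\n";  // 60
echo $expected, "\n"; // 60
```

Here 3 matching characters out of a combined length of 10 give 3 * 2 * 100 / 10 = 60.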


To summarize the similar_text() function: it calculates the number of matching characters in two strings, and can also report their similarity as a percentage. The levenshtein() function introduced next is faster than similar_text(). However, similar_text() usually gives a more accurate similarity result with less adjustment needed. Consider levenshtein() when speed matters more than accuracy and the strings are short.

Instructions for use

First read the description of the levenshtein() function in the manual:

The levenshtein() function returns the Levenshtein distance between two strings.

Levenshtein distance, also known as edit distance, refers to the minimum number of edit operations required between two strings to convert one into the other. Permitted editing operations include replacing one character with another, inserting a character, and deleting a character.

For example, converting kitten to sitting takes three operations:

sitten (k→s)
sittin (e→i)
sitting (→g, inserted at the end)

By default, the levenshtein() function gives each operation (replacement, insertion, and deletion) equal weight. However, you can define the cost of each operation by setting the optional insert, replace, and delete parameters.
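The distance can be computed with the classic dynamic-programming table. The following is a minimal sketch, assuming byte-wise comparison; levenshtein_sketch() and its parameter names are hypothetical, and it is not PHP's internal implementation. Each cell holds the cheapest way to turn a prefix of the first string into a prefix of the second.

```php
<?php
// Hypothetical sketch: weighted edit distance from $from to $to.
function levenshtein_sketch(
    string $from,
    string $to,
    int $insertCost = 1,
    int $replaceCost = 1,
    int $deleteCost = 1
): int {
    $m = strlen($from);
    $n = strlen($to);

    // Row 0: building the first $j bytes of $to from an empty string.
    $prev = [];
    for ($j = 0; $j <= $n; $j++) {
        $prev[$j] = $j * $insertCost;
    }

    for ($i = 1; $i <= $m; $i++) {
        // Column 0: deleting the first $i bytes of $from.
        $curr = [$i * $deleteCost];
        for ($j = 1; $j <= $n; $j++) {
            $same = $from[$i - 1] === $to[$j - 1];
            $curr[$j] = min(
                $prev[$j - 1] + ($same ? 0 : $replaceCost), // keep or replace
                $prev[$j] + $deleteCost,                    // delete from $from
                $curr[$j - 1] + $insertCost                 // insert into $from
            );
        }
        $prev = $curr;
    }

    return $prev[$n];
}

echo levenshtein_sketch("kitten", "sitting"), "\n"; // 3
```

Keeping only two rows of the table is a design choice that holds memory proportional to the length of the second string rather than to the product of both lengths.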

Syntax:

levenshtein(string1,string2,insert,replace,delete)

Parameter Description

•string1 Required. The first string to compare.
•string2 Required. The second string to compare.
•insert Optional. The cost of inserting a character. The default is 1.
•replace Optional. The cost of replacing a character. The default is 1.
•delete Optional. The cost of deleting a character. The default is 1.

Tips and Notes

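•Before PHP 8.0, the levenshtein() function returns -1 if one of the strings exceeds 255 characters; later versions have no such limit.
•The levenshtein() function is case sensitive: it compares bytes, so an uppercase letter and its lowercase form count as different characters.
•The levenshtein() function is faster than the similar_text() function. However, similar_text() usually gives a more accurate similarity result with less adjustment needed.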
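
How levenshtein() treats letter case can be checked directly. In this small sketch, $caseSensitive and $caseInsensitive are illustrative names and the two strings differ only in case. Lowercasing both with strtolower() first is one way to ignore case, assuming plain ASCII input:

```php
<?php
$a = "Hello World";
$b = "hello world";

// Byte-wise comparison: "H" vs "h" and "W" vs "w" each cost one replacement.
$caseSensitive = levenshtein($a, $b);

// Normalizing case before comparing removes those two differences.
$caseInsensitive = levenshtein(strtolower($a), strtolower($b));

echo $caseSensitive, "\n";   // 2
echo $caseInsensitive, "\n"; // 0
```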
Example

The code is as follows:

<?php
echo levenshtein("Hello World","ello World");
echo "<br>";
echo levenshtein("Hello World","ello World",10,20,30);
?>

Output: 1 30 (the only edit needed is deleting the leading H, so the second call returns the delete cost of 30)
