PHP levenshtein()

王林
2024-08-29 12:54:33

levenshtein() is a built-in PHP function that computes the Levenshtein distance between two strings. The Levenshtein distance is the minimum number of single-character edits (replacements, insertions, or deletions) needed to transform the first string into the second.
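As a small illustration of this definition, consider the pair "kitten" and "sitting" (a commonly used textbook pair, not taken from this article). The comments list one possible sequence of three single-character edits:

```php
<?php
// kitten  -> sitten   (replace 'k' with 's')
// sitten  -> sittin   (replace 'e' with 'i')
// sittin  -> sitting  (insert 'g' at the end)
$distance = levenshtein('kitten', 'sitting');

echo $distance, "\n"; // prints 3
```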

By default, PHP gives all three operations (replace, insert, delete) the same cost of 1. The optional parameters let us set a different cost for each operation. The algorithm used by this function has a time complexity of O(a*b), where a and b are the lengths of the strings str1 and str2 respectively.
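To make the O(a*b) cost and the per-operation weights more concrete, here is a minimal sketch of the usual dynamic-programming approach. The function name lev_sketch() and the sample word pairs are assumptions made for illustration; this is not PHP's actual internal implementation, only a sketch that is expected to agree with the built-in function on these inputs.

```php
<?php
// Hypothetical helper (not part of PHP): a two-row dynamic-programming sketch
// of the weighted edit distance. It compares bytes, one position at a time.
function lev_sketch(string $a, string $b, int $ins = 1, int $rep = 1, int $del = 1): int
{
    $m = strlen($a);
    $n = strlen($b);

    // Row 0: turning the empty prefix of $a into the first $j bytes of $b
    // takes $j insertions.
    $prev = [];
    for ($j = 0; $j <= $n; $j++) {
        $prev[$j] = $j * $ins;
    }

    // The nested loops visit every ($i, $j) pair once, hence O(a*b).
    for ($i = 1; $i <= $m; $i++) {
        // Turning the first $i bytes of $a into the empty string takes $i deletions.
        $curr = [$i * $del];
        for ($j = 1; $j <= $n; $j++) {
            $same = ($a[$i - 1] === $b[$j - 1]);
            $curr[$j] = min(
                $prev[$j - 1] + ($same ? 0 : $rep), // keep or replace
                $curr[$j - 1] + $ins,               // insert $b[$j - 1]
                $prev[$j] + $del                    // delete $a[$i - 1]
            );
        }
        $prev = $curr;
    }

    return $prev[$n];
}

// Argument order after the two strings: insert cost, replace cost, delete cost.
$cases = [
    ['cat', 'cart', 3, 1, 1], // needs one insertion, which costs 3
    ['cart', 'cat', 3, 1, 1], // needs one deletion, which costs 1
    ['cat', 'cut', 1, 5, 1],  // delete + insert (2) is cheaper than one replacement (5)
];
foreach ($cases as [$s1, $s2, $ins, $rep, $del]) {
    echo "$s1 -> $s2: ", lev_sketch($s1, $s2, $ins, $rep, $del),
        ' (built-in: ', levenshtein($s1, $s2, $ins, $rep, $del), ")\n";
}
```

Note that with unequal costs the result depends on the direction of the transformation: "cat" to "cart" and "cart" to "cat" no longer give the same value.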

There are a few things to note about this function though:

  • The levenshtein() function is case-sensitive: an uppercase letter and its lowercase form count as different characters.
  • There is a similar function called similar_text(). Compared with it, levenshtein() is a lot faster, but similar_text() is likely to give a more accurate result with fewer modifications needed. Of the two, similar_text() is the more expensive.
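A quick check of how letter case affects the result. The sample strings are assumptions chosen for illustration; lowercasing both inputs with strtolower() is one simple way to get a case-insensitive comparison for ASCII text.

```php
<?php
$a = 'Giraffe';
$b = 'giraffe';

// 'G' and 'g' are different characters, so one replacement is needed.
$caseSensitive = levenshtein($a, $b);

// Lowercasing both inputs first makes the comparison ignore case.
$caseInsensitive = levenshtein(strtolower($a), strtolower($b));

echo $caseSensitive, "\n";   // prints 1
echo $caseInsensitive, "\n"; // prints 0
```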

Syntax and Parameters

Here we discuss the syntax and parameters:

Syntax:

levenshtein(str1,str2,insert,replace,delete)

Parameters:

  • str1: Mandatory. The first string to be compared.
  • str2: Mandatory. The second string, which str1 is compared with.
  • insert: Optional. The cost of inserting a character.
  • replace: Optional. The cost of replacing a character.
  • delete: Optional. The cost of deleting a character.

The default value for each of the last 3 parameters is 1.

Return Value: This function returns the Levenshtein distance between the two input strings as an integer. In PHP versions before 8.0, it returns -1 if either of the input strings is longer than 255 characters; PHP 8.0 and later do not have this limit.
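If the code has to run on a PHP version where the -1 return value can still occur, one option is to turn it into an explicit null so that it is never mistaken for a real distance. The sketch below uses a hypothetical wrapper named levenshtein_or_null(); it is not a PHP built-in, and the assumption is that the 255-character limit only applies before PHP 8.0.

```php
<?php
// Hypothetical wrapper (not a PHP built-in): returns null instead of -1
// when levenshtein() refuses an over-long argument on an older PHP version.
function levenshtein_or_null(string $s1, string $s2): ?int
{
    $distance = levenshtein($s1, $s2);
    return $distance < 0 ? null : $distance;
}

$long = str_repeat('a', 300);
$result = levenshtein_or_null($long, $long . 'b');

// Assumption: on PHP 8.0 and later this holds a real distance,
// while older versions yield null here.
var_dump($result);
```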

Examples of PHP levenshtein()

Let us take a few examples to understand the working of levenshtein function.

Example #1

Code:

<?php
// PHP code to determine levenshtein distance
// between 2 strings $s1 and $s2
$s1 = 'rdo';
$s2 = 'rst';
print_r(levenshtein($s1, $s2));
?>

Output:

2

This is a basic example where the 2 input strings s1 and s2 are single words of 3 letters each. The levenshtein function compares these 2 strings character by character. Both start with “r”, but the second and third letters differ. So to make the first string the same as the second string we need to replace “d” with “s” and “o” with “t”, hence the output 2.

Example #2

Code:

<?php
// PHP code to determine levenshtein distance
// between 2 strings $s1 and $s2
$s1 = 'first string';
$s2 = 'second string';
print_r(levenshtein($s1, $s2));
?>

Output:

6

In this basic example, we find the Levenshtein distance between the 2 input strings s1 and s2. Both strings end with the same text, “ string”, so those characters match and cost nothing. The difference lies between the words “first” and “second”. The only letter they share is “s”, and it sits at a different position in each word, so lining it up would cost more than it saves. The cheapest transformation replaces the 5 letters of “first” with “s,e,c,o,n” and inserts a final “d”: 5 replacements plus 1 insertion. So the levenshtein function returns 6.
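One way to sanity-check this result is to drop the shared ending and compare only the two words that differ. Since matching characters cost nothing, both calls in this small sketch are expected to return the same distance:

```php
<?php
$full  = levenshtein('first string', 'second string');
$words = levenshtein('first', 'second');

echo $full, "\n";  // prints 6
echo $words, "\n"; // prints 6
```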

Example #3

Code:

<?php
// PHP code to determine levenshtein distance
// between $s1 and $s2
$s1 = 'Common Three Words';
$s2 = 'Common Words';
echo("The Levenshtein distance is: ");
print_r(levenshtein($s1, $s2));
?>

Output:

The Levenshtein distance is: 6

In this example, the first string has 3 words whereas the second string has only 2, and both words of the second string are already present in the first. Hence the only difference is the word “Three”, which has 5 characters. An interesting thing to notice here is that the output is 6, not 5: the extra space is also counted as a character, so 6 deletions are needed.

Example #4

Code:

<?php
// Giving a misspelled word as input
$ip = 'giraffee';
// Sample array of correctly spelled words to compare with
$word_list = array('cat','dog','cow','elephant',
'giraffe','eagle','pigeon','parrot','rabbit');
// Shortest distance is not found yet
$short = -1;
// Closest word is not found yet
$closest = '';
// Looping through the array to find the closest word
foreach ($word_list as $word) {
    // Calculating the levenshtein distance between
    // the input word and the current word
    $levn = levenshtein($ip, $word);
    // To check for an exact match
    if ($levn == 0) {
        // This is the closest one, which is a perfect match
        $closest = $word;
        $short = 0;
        // Here we break from the foreach loop
        // when the exact match is found
        break;
    }
    // When this distance is not greater than the shortest distance
    // found so far, or if no shortest distance has been set yet
    if ($levn <= $short || $short < 0) {
        // Setting the shortest distance and the word having
        // the closest match to the input word
        $closest = $word;
        $short = $levn;
    }
}
echo "Input word: $ip\n";
if ($short == 0) {
    echo "The closest/exact match found to the input word is: $closest\n";
} else {
    echo "Did you mean to spell: $closest?\n";
}
?>

Output:

Input word: giraffee
Did you mean to spell: giraffe?

The above example shows one of the practical cases where the levenshtein function can be used. Here we help the user correct a misspelled word by comparing it with a pre-defined array of correctly spelled words.

We start with an input word that is misspelt (giraffee). We then define an array of correct animal names, which also contains the correct spelling of the input word (giraffe). A foreach loop iterates through the array and uses the levenshtein function to find the word closest to the input. The loop breaks early only when an exact match (distance 0) is found; otherwise it runs through the whole list and keeps the word with the smallest distance. At the end, we check the shortest distance stored in short: if it is 0, an exact match was found and is printed; otherwise the closest word is offered as a suggestion.
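The same idea can be wrapped in a reusable function. The sketch below is one possible variant, not the article's code: closest_word() and its $maxDistance cut-off are hypothetical names introduced here so that an input with no reasonable match yields null instead of a misleading suggestion.

```php
<?php
// Hypothetical helper (not a PHP built-in): returns the candidate with the
// smallest Levenshtein distance, or null if none is within $maxDistance edits.
function closest_word(string $input, array $candidates, int $maxDistance = 2): ?string
{
    $best = null;
    $bestDistance = PHP_INT_MAX;

    foreach ($candidates as $candidate) {
        $distance = levenshtein($input, $candidate);
        if ($distance < $bestDistance) {
            $best = $candidate;
            $bestDistance = $distance;
        }
        if ($distance === 0) {
            break; // exact match, nothing can be closer
        }
    }

    return $bestDistance <= $maxDistance ? $best : null;
}

$animals = ['cat', 'dog', 'cow', 'elephant', 'giraffe', 'eagle'];

var_dump(closest_word('giraffee', $animals));  // string(7) "giraffe"
var_dump(closest_word('xylophone', $animals)); // NULL, nothing is within 2 edits
```

Unlike the loop above, this variant keeps the first word that reaches the smallest distance rather than the last one; either choice is reasonable when several words tie.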

Conclusion

In short, the levenshtein function returns an integer: the Levenshtein distance obtained by comparing the 2 input strings character by character. The first two parameters are the input strings, which are mandatory, and the last 3 parameters are optional and represent the cost of the insert, replace and delete operations.
