search
HomeBackend DevelopmentPHP TutorialWord segmentation algorithm php unary word segmentation algorithm

Copy code The code is as follows:


/**
* Uniary word segmentation algorithm
* UTF8 encodes the next character. If the ASCII code of the first character is not greater than 192, it will only occupy 1 byte.
* If the ASCII code of the first character is greater than 192 and less than 224, it will occupy 2 bytes. Otherwise, it will occupy 3 characters. Section
* Uniary word segmentation needs to add ft_min_word_len=1 in the my.ini file of mysql
* You can use the mysql query statement show variables like '%ft%' to view the mysql full-text search related settings
*
* @access global
* @param string $str
* @param boolean $unique Whether to remove duplicate values ​​
* @param boolean $merge Whether to merge additional values ​​
* @return array
*/
function seg_word($str,$unique=false,$merge=true)
{
$str = trim(strip_tags($str ));
$strlen = strlen($str);
if($strlen == 0) return array();
$spc = ' ';
//Add characters to be filtered as needed
$search = array( ',', '/', '\', '.', ';', ':', ''', '!', '~','"', '`', '^', '( ', ')', '?', '-', "t", "n", ''', '', "r", "rn", '$', '& ', '%', '#', '@', '+', '=', '{', '}', '[', ']', ')', '(', '.', '. ', ', '! ', ';', '"', '"', ''', ''', ', ', '—', ' , '《', '》', '-', '…', '【', '】',':');
$numpairs = array('1'=>'一','2'= >'two','3'=>'three','4'=>'four','5'=>'five','6'=>'six','7'= >'seven','8'=>'eight','9'=>'nine','0'=>'zero');
$str = alab_num($str);
$str = str_replace($search,' ',$str);
$ord = $i = $k = 0;
$prechar = 0;// 0-blank 1-English and symbol 2-Chinese
$result = array( );
$annex = array();
while($ord = ord($str[$i]))
{
//1 byte character
if ($ord {
// Remove empty strings
if($ord $prechar=0;
$i++;
$k++;
continue;
}
//Additional Chinese uppercase number conversion
if(isset($numpairs[$ str[$i]])) {
$annex[]=$numpairs[$str[$i]];
}
//If the preceding character is Chinese
if( $prechar == 2 ){
$result[+ +$k] = $str[$i];
}
else {
$result[$k] .= $str[$i];
}
$prechar = 1;
$i++;
}
else / /2-3 byte characters (Chinese)
{
if($ord $step = 2;
else
$step = 3;
$c = substr($str,$i,$step) ;
if(false !== $key = array_search($c,$numpairs)){
$annex[] = $key;
}
if ($prechar != 0) {
$result[++$k ] = $c;
}
else {
$result[$k] .= $c;
}
$prechar = 2;
$i+=$step;
}
}
$result = $merge ? array_merge( $result,$annex) : $result ;
return $unique ? array_unique($result) : $result ;
}

The above introduces the word segmentation algorithm PHP unary word segmentation algorithm, including the content of word segmentation algorithm. I hope it will be helpful to friends who are interested in PHP tutorials.

Statement
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
What data can be stored in a PHP session?What data can be stored in a PHP session?May 02, 2025 am 12:17 AM

PHPsessionscanstorestrings,numbers,arrays,andobjects.1.Strings:textdatalikeusernames.2.Numbers:integersorfloatsforcounters.3.Arrays:listslikeshoppingcarts.4.Objects:complexstructuresthatareserialized.

How do you start a PHP session?How do you start a PHP session?May 02, 2025 am 12:16 AM

TostartaPHPsession,usesession_start()atthescript'sbeginning.1)Placeitbeforeanyoutputtosetthesessioncookie.2)Usesessionsforuserdatalikeloginstatusorshoppingcarts.3)RegeneratesessionIDstopreventfixationattacks.4)Considerusingadatabaseforsessionstoragei

What is session regeneration, and how does it improve security?What is session regeneration, and how does it improve security?May 02, 2025 am 12:15 AM

Session regeneration refers to generating a new session ID and invalidating the old ID when the user performs sensitive operations in case of session fixed attacks. The implementation steps include: 1. Detect sensitive operations, 2. Generate new session ID, 3. Destroy old session ID, 4. Update user-side session information.

What are some performance considerations when using PHP sessions?What are some performance considerations when using PHP sessions?May 02, 2025 am 12:11 AM

PHP sessions have a significant impact on application performance. Optimization methods include: 1. Use a database to store session data to improve response speed; 2. Reduce the use of session data and only store necessary information; 3. Use a non-blocking session processor to improve concurrency capabilities; 4. Adjust the session expiration time to balance user experience and server burden; 5. Use persistent sessions to reduce the number of data read and write times.

How do PHP sessions differ from cookies?How do PHP sessions differ from cookies?May 02, 2025 am 12:03 AM

PHPsessionsareserver-side,whilecookiesareclient-side.1)Sessionsstoredataontheserver,aremoresecure,andhandlelargerdata.2)Cookiesstoredataontheclient,arelesssecure,andlimitedinsize.Usesessionsforsensitivedataandcookiesfornon-sensitive,client-sidedata.

How does PHP identify a user's session?How does PHP identify a user's session?May 01, 2025 am 12:23 AM

PHPidentifiesauser'ssessionusingsessioncookiesandsessionIDs.1)Whensession_start()iscalled,PHPgeneratesauniquesessionIDstoredinacookienamedPHPSESSIDontheuser'sbrowser.2)ThisIDallowsPHPtoretrievesessiondatafromtheserver.

What are some best practices for securing PHP sessions?What are some best practices for securing PHP sessions?May 01, 2025 am 12:22 AM

The security of PHP sessions can be achieved through the following measures: 1. Use session_regenerate_id() to regenerate the session ID when the user logs in or is an important operation. 2. Encrypt the transmission session ID through the HTTPS protocol. 3. Use session_save_path() to specify the secure directory to store session data and set permissions correctly.

Where are PHP session files stored by default?Where are PHP session files stored by default?May 01, 2025 am 12:15 AM

PHPsessionfilesarestoredinthedirectoryspecifiedbysession.save_path,typically/tmponUnix-likesystemsorC:\Windows\TemponWindows.Tocustomizethis:1)Usesession_save_path()tosetacustomdirectory,ensuringit'swritable;2)Verifythecustomdirectoryexistsandiswrita

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

MantisBT

MantisBT

Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

SublimeText3 Linux new version

SublimeText3 Linux new version

SublimeText3 Linux latest version

VSCode Windows 64-bit Download

VSCode Windows 64-bit Download

A free and powerful IDE editor launched by Microsoft

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

MinGW - Minimalist GNU for Windows

MinGW - Minimalist GNU for Windows

This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.