What should I do if php cannot load scws?

藏色散人 (original)
2021-09-06 09:06:23

PHP cannot load SCWS when SCWS was not installed successfully. The fix: 1. download "scws-1.2.1.tar.bz2"; 2. build and install it with "make install"; 3. install the SCWS PHP extension; 4. install the dictionary.

The operating environment of this article: Windows 7 system, PHP version 5.4, Dell G3 computer.

If PHP cannot load SCWS, reinstall it by following the steps below. They cover installing and using SCWS, the open source Chinese word segmentation system for PHP.

1. Introduction to SCWS

SCWS is the acronym for Simple Chinese Word Segmentation (i.e. a simple Chinese word segmentation system).

This is a mechanical Chinese word segmentation engine based on a word frequency dictionary; it can split a whole passage of Chinese text into words with reasonable accuracy. The word is the smallest meaningful unit in Chinese, but written Chinese does not separate words with spaces as English does. Segmenting text accurately and quickly has therefore always been a hard problem in Chinese word segmentation.

SCWS is written in pure C and does not depend on any external libraries. It can be embedded in applications directly as a dynamic link library. Supported Chinese encodings include GBK and UTF-8. A PHP extension module is also provided, so word segmentation can be used quickly and easily from PHP.

The segmentation algorithm has few innovative elements. It uses a word frequency dictionary collected by the author, supplemented by rule-based recognition of proper nouns, personal names, place names, numbers and years, to achieve basic word segmentation. In limited testing, accuracy is between 90% and 95%, which is generally enough for small search engines, keyword extraction and similar uses. The first prototype version was released in late 2005.

SCWS was developed by hightman and released as open source under the BSD license. The source code is hosted on GitHub.

2. SCWS installation

The code is as follows:

# wget -c http://www.xunsearch.com/scws/down/scws-1.2.1.tar.bz2
# tar jxvf scws-1.2.1.tar.bz2
# cd scws-1.2.1
# ./configure --prefix=/usr/local/scws
# make && make install
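
Because a failed build is the usual reason PHP cannot load SCWS, it can help to confirm the install before moving on. The following is a minimal sketch: the function name check_scws_install is hypothetical, and the list of files it looks for (bin/scws and lib/libscws.*) is an assumption about what "make install" places under the prefix.

```shell
#!/bin/sh
# Sketch: report which expected files are missing under an SCWS prefix.
# The expected file list is an assumption, not an official manifest.
check_scws_install() {
    prefix="${1:-/usr/local/scws}"
    missing=0
    # Command-line tool
    if [ ! -x "$prefix/bin/scws" ]; then
        echo "missing: $prefix/bin/scws"
        missing=1
    fi
    # Library files (any libscws.* file counts)
    if ! ls "$prefix"/lib/libscws.* >/dev/null 2>&1; then
        echo "missing: $prefix/lib/libscws.*"
        missing=1
    fi
    if [ "$missing" -eq 0 ]; then
        echo "ok: SCWS found under $prefix"
    fi
    return "$missing"
}
```

Calling check_scws_install /usr/local/scws prints one line per missing file and returns non-zero if anything is absent; in that case, rerun the configure and make steps and read their output for errors.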

3. SCWS PHP extension installation

The code is as follows:

# cd ./phpext
# phpize
# ./configure --with-scws=/usr/local/scws --with-php-config=/usr/local/php5410/bin/php-config
# make && make install
# echo "[scws]" >> /usr/local/php5410/etc/php.ini
# echo "extension = scws.so" >> /usr/local/php5410/etc/php.ini
# echo "scws.default.charset = utf-8" >> /usr/local/php5410/etc/php.ini
# echo "scws.default.fpath = /usr/local/scws/etc/" >> /usr/local/php5410/etc/php.ini
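
If PHP still does not load the extension after these steps, one thing worth checking is whether the php.ini that was edited really enables it, and whether that file is the one PHP reads. The following is a rough sketch; the helper name ini_enables_scws is hypothetical, and the paths in the comments are the ones used in this article.

```shell
#!/bin/sh
# Sketch: succeed if the given php.ini has an uncommented
# "extension = scws.so" line (spacing and optional quotes are tolerated).
ini_enables_scws() {
    ini_file="$1"
    grep -Eq '^[[:space:]]*extension[[:space:]]*=[[:space:]]*"?scws(\.so)?"?[[:space:]]*$' "$ini_file"
}

# Possible follow-up checks, assuming the PHP prefix used in this article:
#   ini_enables_scws /usr/local/php5410/etc/php.ini && echo "enabled in php.ini"
#   /usr/local/php5410/bin/php --ini            # shows which php.ini is loaded
#   /usr/local/php5410/bin/php -m | grep scws   # lists scws if it is loaded
```

A line that starts with ";" is a comment in php.ini, so the helper deliberately ignores it. If the helper succeeds but "php -m" does not list scws, the web server or CLI is probably reading a different php.ini.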

4. Dictionary installation

The code is as follows:

# wget http://www.xunsearch.com/scws/down/scws-dict-chs-utf8.tar.bz2
# tar jxvf scws-dict-chs-utf8.tar.bz2 -C /usr/local/scws/etc/
# chown www:www /usr/local/scws/etc/dict.utf8.xdb
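
A missing or unreadable dictionary can look like a broken extension, so a quick check of the data directory may save time. This sketch assumes the UTF-8 dictionary and rule file names used elsewhere in this article (dict.utf8.xdb and rules.utf8.ini); the function name check_scws_data is hypothetical.

```shell
#!/bin/sh
# Sketch: confirm the UTF-8 dictionary and rule file are readable
# in the directory that scws.default.fpath points to.
check_scws_data() {
    dir="${1:-/usr/local/scws/etc}"
    status=0
    for f in dict.utf8.xdb rules.utf8.ini; do
        if [ ! -r "$dir/$f" ]; then
            echo "not readable: $dir/$f"
            status=1
        fi
    done
    return "$status"
}
```

The check applies to the user who runs it. Since the files are handed to the www user above, running it as that user gives the most relevant answer.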

5. PHP example code. For details, see the official SCWS API documentation.

The code is as follows:

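<?php
// Instantiate the core segmenter class
$so = scws_new();
// Set the character encoding used for segmentation
$so->set_charset('utf-8');
// Set the dictionary (the UTF-8 dictionary here)
$so->set_dict('/usr/local/scws/etc/dict.utf8.xdb');
// Set the rule file
$so->set_rule('/usr/local/scws/etc/rules.utf8.ini');
// Strip punctuation from the results
$so->set_ignore(true);
// Enable compound splitting, e.g. "中国人" yields three words: "中国", "人", "中国人"
$so->set_multi(true);
// Automatically group leftover single characters into two-character words
$so->set_duality(true);
// The text to segment
$so->send_text("欢迎来到火星时代IT开发");
// Fetch the results batch by batch; to extract high-frequency words, use get_tops() instead
while ($tmp = $so->get_result())
{
    print_r($tmp);
}
$so->close();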

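Description of the returned array:

Each entry in the result contains the following fields:

word   _string_  the word itself
idf    _float_   inverse document frequency (IDF)
off    _int_     position of the word in the original text
attr   _string_  part of speech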
