ThinkPHP 3.2: using SCWS Chinese word segmentation to extract keywords

SCWS stands for Simple Chinese Word Segmentation, a simple Chinese word segmentation system.
1. Download the PHP classes officially provided by SCWS (PSCWS4, the fourth version of the pure-PHP implementation, is used here)
http://www.xunsearch.com/scws/down/pscws4-20081221.tar.bz2
Download the XDB dictionary file (the UTF-8 Simplified Chinese dictionary package is used here)
http://www.xunsearch.com/scws/down/scws-dict-chs-utf8.tar.bz2
2. Unpack the PSCWS4 archive and copy two class files into the ThinkPHP/Library/Org/Util directory: pscws4.class.php, renamed here to Pscws.class.php, and xdb_r.class.php, renamed here to the uppercase XDB_R.class.php.
3. Then modify Pscws.class.php
Add the namespace

namespace Org\Util;

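Change the name of the class to Pscws (the stock class is named PSCWS4).

Delete the require_once line that includes the XDB reader file (in the stock file it reads require_once (dirname(__FILE__) . '/xdb_r.class.php');). Once both classes share the Org\Util namespace, the framework's autoloader can locate XDB_R.class.php on its own.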

Modify XDB_R.class.php
Add the namespace

namespace Org\Util;
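
Why the require_once can go is easier to see with a small experiment. The sketch below is not ThinkPHP code: it registers a hypothetical stand-in autoloader that only records which class name PHP asks for, and the file path it prints is an assumed ThinkPHP-style mapping used for illustration. It relies on the ::class constant, so it assumes PHP 5.5 or later.

```php
<?php
namespace Org\Util;

// Hypothetical stand-in for a framework autoloader: it loads nothing and
// only records the fully qualified class name it was asked for, together
// with the relative path a ThinkPHP-style layout would map it to.
$requested = array();
spl_autoload_register(function ($class) use (&$requested) {
    $requested[] = $class . ' -> ' . str_replace('\\', '/', $class) . '.class.php';
});

// An unqualified class name inside a namespaced file is resolved against
// that namespace, so XDB_R here means Org\Util\XDB_R.
$name = XDB_R::class;

// class_exists() triggers the autoloader for the resolved name.
// The class is never defined in this sketch, so the result is false.
$found = class_exists($name);

echo $name, "\n";
echo $requested[0], "\n";
```

In the real setup, ThinkPHP's own autoloader receives the same fully qualified name and looks for the file under its Library directory, which is why both files need the matching namespace line and the uppercase file name.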

4. Unzip the XDB dictionary file
Create a new dict folder in the Public/admin directory, then extract dict.utf8.xdb from the XDB dictionary package into that dict directory. Copy rules.utf8.ini from the etc directory of the PSCWS4 package into the same directory.
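5. Add a constant definition to the entry file. It holds the path to the directory containing the dictionary file and the rules file. Note that ThinkPHP 3.2 itself reads CONF_PATH as the application configuration directory, so defining it this way can change where the framework looks for config files; a differently named constant avoids the clash.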

define("CONF_PATH", dirname(__FILE__)."/Public/admin/dict/");
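
A wrong path is the most likely failure at this point, so it can help to check that both files are readable before handing them to the segmenter. The following is a minimal sketch: missing_scws_files() is a hypothetical helper, not part of SCWS or ThinkPHP, and the temporary directory only stands in for the real dict folder.

```php
<?php
// Hypothetical helper: returns the names of the required SCWS files that
// are missing or unreadable under $dir.
function missing_scws_files($dir)
{
    $required = array('dict.utf8.xdb', 'rules.utf8.ini');
    $missing = array();
    foreach ($required as $name) {
        if (!is_readable(rtrim($dir, '/') . '/' . $name)) {
            $missing[] = $name;
        }
    }
    return $missing;
}

// Demonstration against a temporary directory that holds only the rules file.
$dir = sys_get_temp_dir() . '/scws_demo_' . uniqid();
mkdir($dir);
touch($dir . '/rules.utf8.ini');

$missing = missing_scws_files($dir);
echo implode(',', $missing), "\n";

// Clean up the demonstration files.
unlink($dir . '/rules.utf8.ini');
rmdir($dir);
```

In the project itself the same check could be run against the directory named by the constant defined above.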

6. Create a private method in the IndexController.class.php controller for other methods to call

    /**
     * Chinese word segmentation.
     *
     * @param string   $title Text to segment
     * @param int|null $num   Number of keywords to return; optional
     * @return string Comma-separated keywords
     */
    private function get_tags($title, $num = null)
    {
        $pscws = new \Org\Util\Pscws('utf8');
        $pscws->set_dict(CONF_PATH . 'dict.utf8.xdb');   // XDB dictionary
        $pscws->set_rule(CONF_PATH . 'rules.utf8.ini');  // rule set
        $pscws->set_ignore(true);                        // skip punctuation
        $pscws->send_text($title);
        $words = $pscws->get_tops($num);
        $pscws->close();

        $tags = array();
        // Guard in case get_tops() does not return an array (e.g. empty input).
        if (is_array($words)) {
            foreach ($words as $val) {
                $tags[] = $val['word'];
            }
        }
        return implode(',', $tags);
    }

    /**
     * Product search results page.
     */
    public function search()
    {
        $rzt = $this->get_tags("新款 牛漆皮小尖头直跟高跟单鞋910033 灰羊猄(7.31发货) 39");
        print_r($rzt);
    }

The displayed result is:

漆皮,单鞋,尖头,高跟,新款,发货,910033,7.31,39
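
The result above still contains purely numeric tokens (910033, 7.31, 39), which are rarely useful as search keywords. As an optional follow-up, the sketch below filters them out of the comma-separated string. drop_numeric_tags() is a hypothetical helper written for illustration and is not part of SCWS.

```php
<?php
// Hypothetical helper: drop empty and purely numeric tokens from a
// comma-separated tag string such as the one returned by get_tags().
function drop_numeric_tags($csv)
{
    $kept = array();
    foreach (explode(',', $csv) as $tag) {
        if ($tag !== '' && !is_numeric($tag)) {
            $kept[] = $tag;
        }
    }
    return implode(',', $kept);
}

$tags = drop_numeric_tags('漆皮,单鞋,尖头,高跟,新款,发货,910033,7.31,39');
echo $tags, "\n";
```

For the sample string this leaves 漆皮,单鞋,尖头,高跟,新款,发货. Whether numeric tokens such as a product code should be kept depends on the search use case, so treat the filter as one possible choice.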
