
How to use a custom extension dictionary with the IKAnalyzer tokenizer

By 坏嘻嘻 (original article), 2018-09-14 16:54:57

This article explains how to configure a custom extension dictionary for the IKAnalyzer Chinese word segmenter.

After you download the complete IKAnalyzer distribution, the IK Analyzer package contains:
1. "IKAnalyzer Chinese Word Segmenter V2012 User Manual"
2. IKAnalyzer2012.jar (the main jar)
3. IKAnalyzer.cfg.xml (the segmenter's extension configuration file)
4. stopword.dic (the stop-word dictionary)
5. LICENSE.TXT and NOTICE.TXT (Apache license and notice files)

Installation and deployment are simple: put IKAnalyzer2012.jar in the project's lib directory, and place IKAnalyzer.cfg.xml and stopword.dic in the classpath root (for a web project this is usually WEB-INF/classes, the same directory that holds the configuration files for Hibernate, log4j, and similar libraries).
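A quick way to check the deployment is to confirm that both files resolve from the classpath root at runtime. The sketch below uses only the standard ClassLoader.getResource method; the class IkDeploymentCheck and its missingResources method are hypothetical helpers, not part of IKAnalyzer.

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of IKAnalyzer): lists resources missing from the classpath root.
class IkDeploymentCheck {

    static List<String> missingResources(ClassLoader loader, String... names) {
        List<String> missing = new ArrayList<>();
        for (String name : names) {
            // getResource takes a path relative to the classpath root (no leading slash),
            // e.g. a file placed directly in WEB-INF/classes of a web application.
            URL url = loader.getResource(name);
            if (url == null) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        ClassLoader loader = IkDeploymentCheck.class.getClassLoader();
        List<String> missing = missingResources(loader, "IKAnalyzer.cfg.xml", "stopword.dic");
        if (missing.isEmpty()) {
            System.out.println("IKAnalyzer.cfg.xml and stopword.dic found on the classpath root");
        } else {
            System.out.println("Not found on the classpath root: " + missing);
        }
    }
}
```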

To add a custom extension dictionary, open IKAnalyzer.cfg.xml and remove the comment markers around the extension dictionary entry.
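IKAnalyzer.cfg.xml uses the XML format of java.util.Properties (it declares the standard properties DTD), so the effect of uncommenting the entry can be checked with Properties.loadFromXML. This is a minimal sketch, assuming the keys ext_dict and ext_stopwords and semicolon-separated file lists as in the IK Analyzer 2012 distribution; the class IkConfigReader and the sample XML are illustrative, not taken from the package.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Illustrative reader (not IKAnalyzer's own code) for an IKAnalyzer.cfg.xml-style file.
class IkConfigReader {

    // Splits a semicolon-separated value such as "ext.dic;" into file names, skipping blanks.
    static List<String> splitDictList(String value) {
        List<String> files = new ArrayList<>();
        if (value == null) {
            return files; // key absent, e.g. the entry is still commented out
        }
        for (String part : value.split(";")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                files.add(name);
            }
        }
        return files;
    }

    // Parses the XML properties document and returns the files listed under the given key.
    static List<String> dictFiles(InputStream cfg, String key) {
        Properties props = new Properties();
        try {
            props.loadFromXML(cfg); // closes the stream when done
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return splitDictList(props.getProperty(key));
    }

    public static void main(String[] args) {
        // Assumed layout of the file after the extension-dictionary entry is uncommented.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
                + "<properties>\n"
                + "  <comment>IK Analyzer extension configuration</comment>\n"
                + "  <entry key=\"ext_dict\">ext.dic;</entry>\n"
                + "  <entry key=\"ext_stopwords\">stopword.dic;</entry>\n"
                + "</properties>\n";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println("Extension dictionaries: " + dictFiles(in, "ext_dict"));
    }
}
```

While the entry is still inside `<!-- -->`, getProperty returns null and the sketch reports an empty list, which is the situation the uncommenting step fixes.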


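Then create a new file named ext.dic in the classpath root (in a typical Java project, the src folder, whose contents are copied to the classes directory when the project is built). Open ext.dic in a text editor such as Notepad and enter your custom words, one word per line, saving the file as UTF-8.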
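IK Analyzer reads its dictionaries as UTF-8 text, one entry per line. Older versions of Windows Notepad add a byte-order mark (BOM) when saving as UTF-8, which is commonly reported to stop the first word in ext.dic from being recognized. As an alternative, the file can be generated from Java; the class ExtDicWriter, the src/ext.dic path, and the sample words below are placeholders for illustration, not part of IKAnalyzer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: writes an IK-style extension dictionary, one word per line, UTF-8 without a BOM.
class ExtDicWriter {

    static void writeDictionary(Path file, List<String> words) {
        List<String> lines = new ArrayList<>();
        for (String word : words) {
            String entry = word.trim();
            if (!entry.isEmpty()) {
                lines.add(entry); // drop blank entries
            }
        }
        try {
            // Explicit UTF-8; Java's UTF-8 encoder never writes a byte-order mark.
            Files.write(file, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder path: ext.dic in the project's src folder (the classpath root before the build).
        Path dic = Paths.get("src", "ext.dic");
        Files.createDirectories(dic.getParent());
        // First word: Chinese for "Chinese word segmentation", written as Unicode escapes
        // so the source compiles under any default platform encoding.
        writeDictionary(dic, List.of("\u4e2d\u6587\u5206\u8bcd", "IKAnalyzer"));
        System.out.println("Wrote " + Files.readAllLines(dic, StandardCharsets.UTF_8).size()
                + " entries to " + dic);
    }
}
```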



This article was published on the PHP Chinese website.

Statement:
This article was contributed voluntarily by a user, and the copyright belongs to the original author. The site assumes no legal responsibility for its content. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn