How to use the IKAnalyzer tokenizer with a custom extension dictionary
This article explains how to configure a custom extension dictionary for the IKAnalyzer word segmenter.
The complete IKAnalyzer distribution package, once downloaded, contains:
1. "IKAnalyzer Chinese Word Segmenter V2012 User Manual"
2. IKAnalyzer2012.jar (main jar package)
3. IKAnalyzer.cfg.xml (word segmenter extension configuration file)
4. stopword.dic (stop word dictionary)
5. LICENSE.TXT; NOTICE.TXT (Apache license and notice files)
Installation and deployment are very simple: put IKAnalyzer2012.jar in the lib directory of the project,
and place the IKAnalyzer.cfg.xml and stopword.dic files in the class root directory (for web projects, usually the
WEB-INF/classes directory, the same place as the Hibernate, log4j, and other configuration files).
To add a custom extension dictionary, open IKAnalyzer.cfg.xml
and uncomment the extension dictionary entry.
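As a rough sketch, the configuration file might look like this once the extension dictionary entry is uncommented. It assumes the default layout shipped with the 2012 distribution; the comments are paraphrased in English, and your copy may differ in wording. The keys ext_dict and ext_stopwords take semicolon-separated file names that are resolved from the class root directory.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- Extension dictionary: this entry is commented out by default -->
    <entry key="ext_dict">ext.dic;</entry>
    <!-- Extension stop word dictionary -->
    <entry key="ext_stopwords">stopword.dic;</entry>
</properties>
```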
At the same time, create a new file named ext.dic in the class root directory (in the source tree, the src folder). Once the file has been created, open ext.dic in Notepad.
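The IK Analyzer user manual asks for dictionary files to be saved as UTF-8 without a BOM, one term per line, and some editors (older versions of Windows Notepad among them) prepend a BOM when saving as UTF-8. As an alternative to hand editing, here is a minimal sketch that generates the file from Java. The class name ExtDictWriter, the src/ext.dic path, and the sample terms are illustrative assumptions, not part of IKAnalyzer.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of IKAnalyzer: writes an extension dictionary file.
class ExtDictWriter {

    // Writes one term per line in UTF-8. Java's UTF-8 encoder does not emit a BOM.
    static void writeDictionary(Path target, List<String> terms) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (String term : terms) {
                String trimmed = term.trim();
                if (trimmed.isEmpty()) {
                    continue; // skip blank entries so the file has no empty lines
                }
                writer.write(trimmed);
                writer.write('\n');
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed location: the class root directory in the source tree.
        Path target = Paths.get("src", "ext.dic");
        if (target.getParent() != null) {
            Files.createDirectories(target.getParent());
        }
        // Sample terms for illustration only.
        writeDictionary(target, Arrays.asList("云计算", "大数据"));
    }
}
```

After the project is rebuilt, ext.dic should end up next to IKAnalyzer.cfg.xml in the class root directory, where the ext_dict entry can find it.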