10 recommended articles about Chinese word segmenters

黄舟 (original)
2017-06-12 11:38:07

This article has two goals: to learn how to use 11 major open-source Java Chinese word segmenters, and to compare their segmentation results. It gives the usage of each segmenter together with code for comparing the results; which one performs best is left for readers to judge against their own application scenarios. Because the segmenters are used differently and define different interfaces, the article first defines a unified interface: /** Get all word segmentation results of the text, to compare the results of different word segmenters. @author Yang Shangchuan */ public interface WordSegmenter { /** Get all word segmentation results of the text ...

1. Detailed guide to using 11 open-source Java Chinese word segmenters, with a comparison of their segmentation results

Introduction: This article has two goals: to learn how to use 11 major open-source Java Chinese word segmenters, and to compare their segmentation results. It gives the usage of each segmenter together with code for comparing the results; which one is better is left for readers to judge against their own application scenarios. Because the segmenters are used differently and define different interfaces, the article first defines a unified interface: /** Get all the word segmentation results of the text, to compare the results of different word segmenters. @author Yang Shangchuan ...
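The idea of a unified interface can be sketched as follows. This is a minimal illustration, not the interface from the article: the method signature in the source is cut off, so `segment`, `compare`, and both implementations are hypothetical names. The two toy segmenters (one token per character, and forward maximum matching against a tiny hand-made dictionary) stand in for real libraries so the example runs without dependencies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SegmenterComparison {

    /** Hypothetical unified interface; the original article's signature is truncated. */
    interface WordSegmenter {
        List<String> segment(String text);
    }

    /** Toy segmenter: emits every UTF-16 char as its own token. */
    static final class CharacterSegmenter implements WordSegmenter {
        @Override
        public List<String> segment(String text) {
            List<String> words = new ArrayList<>();
            for (int i = 0; i < text.length(); i++) {
                words.add(String.valueOf(text.charAt(i)));
            }
            return words;
        }
    }

    /** Toy segmenter: forward maximum matching against a fixed dictionary. */
    static final class ForwardMaxMatchSegmenter implements WordSegmenter {
        private final Set<String> dictionary;
        private final int maxLen;

        ForwardMaxMatchSegmenter(Set<String> dictionary) {
            this.dictionary = dictionary;
            int longest = 1;
            for (String word : dictionary) {
                longest = Math.max(longest, word.length());
            }
            this.maxLen = longest;
        }

        @Override
        public List<String> segment(String text) {
            List<String> words = new ArrayList<>();
            int i = 0;
            while (i < text.length()) {
                int end = Math.min(text.length(), i + maxLen);
                // Shrink the candidate until it is a dictionary word or a single char.
                while (end > i + 1 && !dictionary.contains(text.substring(i, end))) {
                    end--;
                }
                words.add(text.substring(i, end));
                i = end;
            }
            return words;
        }
    }

    /** Runs every segmenter on the same text and collects the results by name. */
    static Map<String, List<String>> compare(String text, Map<String, WordSegmenter> segmenters) {
        Map<String, List<String>> results = new LinkedHashMap<>();
        for (Map.Entry<String, WordSegmenter> entry : segmenters.entrySet()) {
            results.put(entry.getKey(), entry.getValue().segment(text));
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, WordSegmenter> segmenters = new LinkedHashMap<>();
        segmenters.put("character", new CharacterSegmenter());
        segmenters.put("forward-max-match", new ForwardMaxMatchSegmenter(Set.of("中文", "分词")));
        compare("中文分词器", segmenters)
                .forEach((name, words) -> System.out.println(name + ": " + words));
    }
}
```

With an interface like this, each real segmenter would be wrapped in a small adapter class, and the comparison code would not need to know which library produced the words.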

2. Write a simple Chinese word segmenter in Python

Introduction: After unzipping, take out the following files. Training data: icwb2-data/training/pku_training.utf8. Test data: icwb2-data/testing/pku_test.utf8. Correct word segmentation result: icw...
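The training file in icwb2-data is pre-segmented text, with the words on each line separated by whitespace. The teaser does not show how the Python article uses it, so the following is only a sketch of one plausible first step, counting word frequencies to build a dictionary. It is written in Java for consistency with the first article, and `TrainingDictionary` and `countWords` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TrainingDictionary {

    /** Counts how often each word occurs in pre-segmented, whitespace-separated lines. */
    static Map<String, Integer> countWords(List<String> segmentedLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : segmentedLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            for (String word : trimmed.split("\\s+")) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hand-made sample lines in the same whitespace-separated layout.
        List<String> sample = List.of("中文  分词  器", "中文  文本");
        System.out.println(countWords(sample));
    }
}
```

For the real file, the lines could be loaded with `Files.readAllLines(path, StandardCharsets.UTF_8)` and passed to `countWords`.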

3. Integrating carrot2 with solr4.4.0 to support Chinese, and how to add your own Chinese word segmenter

Introduction: carrot2 supports Chinese out of the box, but the language must be specified with the parameter carrot.lang=CHINESE_SIMPLIFIED. For the languages supported by carrot2, see http://doc.carrot2.org/#div.attribute.lingo.MultilingualClustering.defaultLanguage. However, the word segmentation class that carrot2 uses by default is org.apache.luc...

4. Robbe-1.6.0 Release

Introduction: Robbe is a high-performance PHP Chinese word segmentation extension built on the Friso Chinese word segmenter. It supports segmenting text in both UTF-8 and GBK encodings. Robbe-1.6.0: 1. Changed the interface to work with Friso-1.6.0. 2. Modified the UTF-8 test program, added several configuration test options, and added a GBK test program. 3. Changed rb_split; you can customize the return...

