search
HomeBackend DevelopmentPython TutorialPython + wordcloud + jieba learn to generate Chinese word cloud in ten minutes

Foregoing

Two Pythonclass libraries needed for this article

jieba: Chinese word segmentation tool

wordcloud: Word cloud generation tool under Python

In the last lesson, we learnedhow to make an English word cloud. In this article, we will explain how to make a Chinese word cloud. After reading this article, you will learn How to generate a word cloud from any Chinese text

Python + wordcloud + jieba learn to generate Chinese word cloud in ten minutes

Introduction to code composition

The code part comes from other people’s blogs, but due to bugs Or for reasons of operating efficiency, I have made major changes to the code

The first part of the code sets most of the parameters needed to run the code. You can easily use the code directly without making too many modifications.

The second part is some settings of jieba. Of course, you can also use the isCN parameter to cancel Chinese word segmentation

The third part is the settings of wordcloud, including image display and saving

##Use the code by comment ##
关于该程序的使用,你可以直接读注释在数分钟内学会如何使用它
# - * - coding: utf - 8 -*-
from os import path
from scipy.misc import imread
import matplotlib.pyplot as plt
import jieba
# jieba.load_userdict("txt\userdict.txt")
# 添加用户词库为主词典,原词典变为非主词典
from wordcloud import WordCloud, ImageColorGenerator
# 获取当前文件路径
# __file__ 为当前文件, 在ide中运行此行会报错,可改为
# d = path.dirname('.')
d = path.dirname(__file__)
stopwords = {}
isCN = 1 #默认启用中文分词
back_coloring_path = "img/lz1.jpg" # 设置背景图片路径
text_path = 'txt/lz.txt' #设置要分析的文本路径
font_path = 'D:\Fonts\simkai.ttf' # 为matplotlib设置中文字体路径没
stopwords_path = 'stopwords\stopwords1893.txt' # 停用词词表
imgname1 = "WordCloudDefautColors.png" # 保存的图片名字1(只按照背景图片形状)
imgname2 = "WordCloudColorsByImg.png"# 保存的图片名字2(颜色按照背景图片颜色布局生成)
my_words_list = ['路明非'] # 在结巴的词库中添加新词
back_coloring = imread(path.join(d, back_coloring_path))# 设置背景图片
# 设置词云属性
wc = WordCloud(font_path=font_path,  # 设置字体
               background_color="white",  # 背景颜色
               max_words=2000,  # 词云显示的最大词数
               mask=back_coloring,  # 设置背景图片
               max_font_size=100,  # 字体最大值
               random_state=42,
               width=1000, height=860, margin=2,# 设置图片默认的大小,但是如果使用背景图片的话,那么保存的图片大小将会按照其大小保存,margin为词语边缘距离
               )
# 添加自己的词库分词
def add_word(list):
    for items in list:
        jieba.add_word(items)
add_word(my_words_list)
text = open(path.join(d, text_path)).read()
def jiebaclearText(text):
    mywordlist = []
    seg_list = jieba.cut(text, cut_all=False)
    liststr="/ ".join(seg_list)
    f_stop = open(stopwords_path)
    try:
        f_stop_text = f_stop.read( )
        f_stop_text=unicode(f_stop_text,'utf-8')
    finally:
        f_stop.close( )
    f_stop_seg_list=f_stop_text.split('\n')
    for myword in liststr.split('/'):
        if not(myword.strip() in f_stop_seg_list) and len(myword.strip())>1:
            mywordlist.append(myword)
    return ''.join(mywordlist)
if isCN:
    text = jiebaclearText(text)
# 生成词云, 可以用generate输入全部文本(wordcloud对中文分词支持不好,建议启用中文分词),也可以我们计算好词频后使用generate_from_frequencies函数
wc.generate(text)
# wc.generate_from_frequencies(txt_freq)
# txt_freq例子为[('词a', 100),('词b', 90),('词c', 80)]
# 从背景图片生成颜色值
image_colors = ImageColorGenerator(back_coloring)
plt.figure()
# 以下代码显示图片
plt.imshow(wc)
plt.axis("off")
plt.show()
# 绘制词云
# 保存图片
wc.to_file(path.join(d, imgname1))
image_colors = ImageColorGenerator(back_coloring)
plt.imshow(wc.recolor(color_func=image_colors))
plt.axis("off")
# 绘制背景图片为颜色的图片
plt.figure()
plt.imshow(back_coloring, cmap=plt.cm.gray)
plt.axis("off")
plt.show()
# 保存图片
wc.to_file(path.join(d, imgname2))

Python + wordcloud + jieba learn to generate Chinese word cloud in ten minutes

Python + wordcloud + jieba learn to generate Chinese word cloud in ten minutes

Python + wordcloud + jieba learn to generate Chinese word cloud in ten minutes

##Summary

If you want to use this code to generate English words Cloud, then you need to set the isCN parameter to 0 and provide an English stop word list.

The above is the detailed content of Python + wordcloud + jieba learn to generate Chinese word cloud in ten minutes. For more information, please follow other related articles on the PHP Chinese website!

Statement
This article is reproduced at:CSDN. If there is any infringement, please contact admin@php.cn delete
How do you append elements to a Python array?How do you append elements to a Python array?Apr 30, 2025 am 12:19 AM

InPython,youappendelementstoalistusingtheappend()method.1)Useappend()forsingleelements:my_list.append(4).2)Useextend()or =formultipleelements:my_list.extend(another_list)ormy_list =[4,5,6].3)Useinsert()forspecificpositions:my_list.insert(1,5).Beaware

How do you debug shebang-related issues?How do you debug shebang-related issues?Apr 30, 2025 am 12:17 AM

The methods to debug the shebang problem include: 1. Check the shebang line to make sure it is the first line of the script and there are no prefixed spaces; 2. Verify whether the interpreter path is correct; 3. Call the interpreter directly to run the script to isolate the shebang problem; 4. Use strace or trusts to track the system calls; 5. Check the impact of environment variables on shebang.

How do you remove elements from a Python array?How do you remove elements from a Python array?Apr 30, 2025 am 12:16 AM

Pythonlistscanbemanipulatedusingseveralmethodstoremoveelements:1)Theremove()methodremovesthefirstoccurrenceofaspecifiedvalue.2)Thepop()methodremovesandreturnsanelementatagivenindex.3)Thedelstatementcanremoveanitemorslicebyindex.4)Listcomprehensionscr

What data types can be stored in a Python list?What data types can be stored in a Python list?Apr 30, 2025 am 12:07 AM

Pythonlistscanstoreanydatatype,includingintegers,strings,floats,booleans,otherlists,anddictionaries.Thisversatilityallowsformixed-typelists,whichcanbemanagedeffectivelyusingtypechecks,typehints,andspecializedlibrarieslikenumpyforperformance.Documenti

What are some common operations that can be performed on Python lists?What are some common operations that can be performed on Python lists?Apr 30, 2025 am 12:01 AM

Pythonlistssupportnumerousoperations:1)Addingelementswithappend(),extend(),andinsert().2)Removingitemsusingremove(),pop(),andclear().3)Accessingandmodifyingwithindexingandslicing.4)Searchingandsortingwithindex(),sort(),andreverse().5)Advancedoperatio

How do you create multi-dimensional arrays using NumPy?How do you create multi-dimensional arrays using NumPy?Apr 29, 2025 am 12:27 AM

Create multi-dimensional arrays with NumPy can be achieved through the following steps: 1) Use the numpy.array() function to create an array, such as np.array([[1,2,3],[4,5,6]]) to create a 2D array; 2) Use np.zeros(), np.ones(), np.random.random() and other functions to create an array filled with specific values; 3) Understand the shape and size properties of the array to ensure that the length of the sub-array is consistent and avoid errors; 4) Use the np.reshape() function to change the shape of the array; 5) Pay attention to memory usage to ensure that the code is clear and efficient.

Explain the concept of 'broadcasting' in NumPy arrays.Explain the concept of 'broadcasting' in NumPy arrays.Apr 29, 2025 am 12:23 AM

BroadcastinginNumPyisamethodtoperformoperationsonarraysofdifferentshapesbyautomaticallyaligningthem.Itsimplifiescode,enhancesreadability,andboostsperformance.Here'showitworks:1)Smallerarraysarepaddedwithonestomatchdimensions.2)Compatibledimensionsare

Explain how to choose between lists, array.array, and NumPy arrays for data storage.Explain how to choose between lists, array.array, and NumPy arrays for data storage.Apr 29, 2025 am 12:20 AM

ForPythondatastorage,chooselistsforflexibilitywithmixeddatatypes,array.arrayformemory-efficienthomogeneousnumericaldata,andNumPyarraysforadvancednumericalcomputing.Listsareversatilebutlessefficientforlargenumericaldatasets;array.arrayoffersamiddlegro

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

MantisBT

MantisBT

Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

MinGW - Minimalist GNU for Windows

MinGW - Minimalist GNU for Windows

This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.

SublimeText3 English version

SublimeText3 English version

Recommended: Win version, supports code prompts!

PhpStorm Mac version

PhpStorm Mac version

The latest (2018.2.1) professional PHP integrated development tool

EditPlus Chinese cracked version

EditPlus Chinese cracked version

Small size, syntax highlighting, does not support code prompt function