Tutorial on how to generate a word cloud using Python

巴扎黑
2017-06-23 15:33:12

I have been busy with final exam review lately and have spent my spare time working with the Scrapy framework. Today I will show how to generate a word cloud with Python. There are plenty of word cloud generators online, but writing one yourself in Python is more satisfying.

Today's word cloud is built from the lyrics of inspirational songs. We collected about 20 songs from Baidu Wenku, including familiar ones such as "Stubborn" and "The Sea and the Sky".

The Python libraries used are jieba (a Chinese word segmentation library), wordcloud, matplotlib, PIL, and numpy.

The first step is to read the lyrics. I saved them in a text file, 励志歌曲歌词.txt ("inspirational song lyrics"), in the same directory as the script.

Now let's read it:

#encoding=gbk
# Read the whole lyrics file into one string
with open('./励志歌曲歌词.txt', 'r') as f:
    lyric = f.read()
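
The read above relies on the platform's default text encoding, which differs between systems. As a minimal sketch, assuming the lyrics file is saved as either UTF-8 or GBK (the article does not say which), a small helper can try each candidate in turn. The name read_text and the list of candidate encodings are illustrative, not part of the original script.

```python
def read_text(path, encodings=("utf-8", "gbk")):
    """Return the file's text, trying each candidate encoding in order."""
    last_error = None
    for enc in encodings:
        try:
            with open(path, "r", encoding=enc) as f:
                return f.read()
        except UnicodeDecodeError as err:
            # This encoding does not fit the file; try the next candidate
            last_error = err
    if last_error is None:
        raise ValueError("no encodings given")
    raise last_error

# Hypothetical usage with the article's file name:
# lyric = read_text('./励志歌曲歌词.txt')
```

UTF-8 is tried first because GBK-encoded Chinese text is almost never valid UTF-8, so a wrong guess usually fails loudly rather than producing garbled text.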

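The #encoding=gbk line at the top declares the encoding of the script itself. The script contains Chinese characters and is evidently saved as GBK, so without the declaration Python reports SyntaxError: Non-UTF-8 code starting with '\xc0'.

Next, we use jieba to segment the lyrics and extract the 50 highest-weighted keywords with its TextRank implementation: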

import jieba.analyse

# TextRank keyword extraction: a list of (word, weight) pairs
result = jieba.analyse.textrank(lyric, topK=50, withWeight=True)
keywords = dict()
for i in result:
    keywords[i[0]] = i[1]
print(keywords)

Running this prints a dictionary that maps each keyword to its weight.
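
Note that TextRank weights are graph-based scores, not raw counts. If plain word frequency is what you want, a rough alternative is to count the segmented tokens directly. The sketch below is not the article's method: top_frequencies, min_length, and the sample tokens are illustrative names and data, and in the article's setting the token list could come from jieba.lcut(lyric).

```python
from collections import Counter

def top_frequencies(tokens, top_k=50, min_length=2):
    """Map the top_k most common tokens to their raw counts."""
    cleaned = (tok.strip() for tok in tokens)
    # Skip whitespace and very short tokens, which are rarely meaningful
    counts = Counter(t for t in cleaned if len(t) >= min_length)
    return dict(counts.most_common(top_k))

# Hypothetical, already-segmented input
tokens = ["dream", "sky", "dream", "sea", "sky", "dream", "a"]
print(top_frequencies(tokens, top_k=2))  # {'dream': 3, 'sky': 2}
```

The resulting dictionary has the same word-to-number shape as keywords, so it can be passed to generate_from_frequencies in the same way.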

Then we can generate the word cloud with libraries such as wordcloud.

First, find a picture to use as the shape of the word cloud:

from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
from wordcloud import WordCloud, ImageColorGenerator

# Load the shape picture; pure white areas of the mask are left empty
image = Image.open('./tim.jpg')
graph = np.array(image)
# A font with Chinese glyphs is needed to render the Chinese keywords
wc = WordCloud(font_path='./fonts/simhei.ttf', background_color='white', max_words=50, mask=graph)
wc.generate_from_frequencies(keywords)
# Recolor the words using the colors of the picture
image_color = ImageColorGenerator(graph)
plt.imshow(wc.recolor(color_func=image_color))
plt.axis("off")
plt.show()
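
The mask only excludes pixels that are pure white, so a picture with an off-white or slightly noisy background may let words spill outside the intended shape. A possible workaround, sketched here under the assumption that the background is simply the brightest part of the picture, is to binarize the array before using it as the mask. The helper name to_binary_mask and the threshold of 240 are illustrative choices, not something the original tutorial uses.

```python
import numpy as np

def to_binary_mask(pixels, threshold=240):
    """Return a uint8 mask: 255 where the pixel is near-white, 0 elsewhere."""
    arr = np.asarray(pixels, dtype=float)
    if arr.ndim == 3:
        # Average the colour channels (ignoring alpha, if any) to get brightness
        arr = arr[..., :3].mean(axis=-1)
    return np.where(arr >= threshold, 255, 0).astype(np.uint8)

# Tiny hypothetical 2x2 RGB image: one off-white pixel, three dark ones
demo = [[[250, 248, 245], [10, 10, 10]],
        [[30, 60, 90], [0, 0, 0]]]
print(to_binary_mask(demo).tolist())  # [[255, 0], [0, 0]]
```

The binarized array would be passed as mask=, while the original colour array (graph) should still be the one given to ImageColorGenerator, since the mask no longer carries colour information.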

Save the generated image

wc.to_file('dream.png')


Full code:

#encoding=gbk
import jieba.analyse
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
from wordcloud import WordCloud, ImageColorGenerator

# Read the whole lyrics file into one string
with open('./励志歌曲歌词.txt', 'r') as f:
    lyric = f.read()

# TextRank keyword extraction: a list of (word, weight) pairs
result = jieba.analyse.textrank(lyric, topK=50, withWeight=True)
keywords = dict()
for i in result:
    keywords[i[0]] = i[1]
print(keywords)

# Load the shape picture; pure white areas of the mask are left empty
image = Image.open('./tim.jpg')
graph = np.array(image)
# A font with Chinese glyphs is needed to render the Chinese keywords
wc = WordCloud(font_path='./fonts/simhei.ttf', background_color='white', max_words=50, mask=graph)
wc.generate_from_frequencies(keywords)
# Recolor the words using the colors of the picture
image_color = ImageColorGenerator(graph)
plt.imshow(wc.recolor(color_func=image_color))
plt.axis("off")
plt.show()
# Save the generated image
wc.to_file('dream.png')
