Tutorial on how to generate a word cloud using Python
I have been busy with final exam review and have also spent some time working with the Scrapy framework. Today I will show how to generate a word cloud with Python. There are plenty of word cloud generators online, but writing one yourself in Python is more satisfying.
The word cloud we will generate today is built from the lyrics of inspirational songs. I collected about 20 songs from Baidu Library, including familiar ones such as "Stubborn" and "The Sea and the Sky".
The Python libraries used are jieba (a Chinese word segmentation library), wordcloud, matplotlib, PIL, and numpy.
The first step is to read the lyrics. I saved them in a text file, 励志歌曲歌词.txt, in the same directory as the script.
Now let's read it:
#encoding=gbk
# Read the whole lyrics file into one string.
with open('./励志歌曲歌词.txt', 'r') as f:
    lyric = f.read()
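The #encoding=gbk line declares that the script file itself is saved in GBK. Without it, Python 3 assumes the source is UTF-8 and stops with SyntaxError: Non-UTF-8 code starting with '\xc0' when it reaches the Chinese file name. The declaration must be on the first or second line of the script.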
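The encoding of the lyrics file is a separate question from the encoding of the script. The sketch below is one way to read the file without depending on the platform default. The helper name read_lyrics and the list of candidate encodings are my own assumptions, not part of the original script.

```python
from pathlib import Path


def read_lyrics(path, encodings=("utf-8", "gbk")):
    """Return the file's text, trying each candidate encoding in order.

    Hypothetical helper: the candidate encodings are an assumption about
    how the lyrics file might have been saved.
    """
    raw = Path(path).read_bytes()
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue  # wrong guess, try the next encoding
    raise ValueError(f"could not decode {path!r} with any of {encodings}")
```

With this helper, the reading step becomes lyric = read_lyrics('./励志歌曲歌词.txt'). UTF-8 is tried first because GBK-encoded Chinese text is almost never valid UTF-8, so a wrong guess fails quickly instead of producing garbled text.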
Next, we use jieba to segment the lyrics and extract the 50 most important words. Note that jieba.analyse.textrank ranks words by TextRank weight, not by raw frequency.
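import jieba.analyse

# Top 50 keywords as (word, weight) pairs, ranked by TextRank.
result = jieba.analyse.textrank(lyric, topK=50, withWeight=True)

# Convert the pairs into a {word: weight} dictionary for wordcloud.
keywords = dict()
for word, weight in result:
    keywords[word] = weight
print(keywords)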
Running this prints a dictionary that maps each keyword to its weight.
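If you would rather size the words by how often they actually occur, a plain count is an alternative to the TextRank weights. The sketch below uses only the standard library. The tokens list is hypothetical sample data standing in for the output of a segmenter such as jieba.lcut(lyric), and dropping single-character tokens is my own simplification.

```python
from collections import Counter

# Hypothetical tokens, standing in for the output of jieba.lcut(lyric).
tokens = ["梦想", "的", "世界", "梦想", "坚持", "的", "世界", "梦想"]

# Drop single-character tokens, which are mostly particles such as "的".
words = [t for t in tokens if len(t) > 1]

# Keep the 50 most frequent words as a {word: count} dictionary.
keywords_by_count = dict(Counter(words).most_common(50))
print(keywords_by_count)
```

A dictionary like keywords_by_count can be passed to generate_from_frequencies in the same way as the TextRank dictionary, since that method only needs a mapping from words to numbers.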
Then we can generate the word cloud with the wordcloud library.
First, find a picture to use as the shape of the word cloud:
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
from wordcloud import WordCloud, ImageColorGenerator

# Load the picture; its array serves as both the mask and the color source.
image = Image.open('./tim.jpg')
graph = np.array(image)

# font_path should point to a font with Chinese glyphs so the words render correctly.
wc = WordCloud(font_path='./fonts/simhei.ttf', background_color='White', max_words=50, mask=graph)
wc.generate_from_frequencies(keywords)

# Recolor the words with the colors of the picture, then display the result.
image_color = ImageColorGenerator(graph)
plt.imshow(wc.recolor(color_func=image_color))
plt.axis("off")
plt.show()
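The mask works because wordcloud treats pure white pixels (value 255) as off limits and draws words only in the remaining area, so the picture should have a white background. If you have no suitable picture, a mask can also be built directly with numpy. The sketch below draws a circle; the function name circle_mask and its default sizes are my own assumptions.

```python
import numpy as np


def circle_mask(size=400, margin=10):
    """Return a size x size uint8 array: 0 inside a circle, 255 (white) outside."""
    y, x = np.ogrid[:size, :size]
    center = (size - 1) / 2
    radius = size / 2 - margin
    # True for every pixel that lies outside the circle.
    outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
    return np.where(outside, 255, 0).astype(np.uint8)


graph = circle_mask()
print(graph.shape, graph.dtype)
```

This array can be passed as mask=graph when constructing WordCloud. It carries no color information, so the ImageColorGenerator recolor step does not apply to it and the words keep the library's default colors.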
Save the generated image:
wc.to_file('dream.png')
Full code:
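#encoding=gbk
import jieba.analyse
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
from wordcloud import WordCloud, ImageColorGenerator

# Read the whole lyrics file into one string.
with open('./励志歌曲歌词.txt', 'r') as f:
    lyric = f.read()

# Top 50 keywords as (word, weight) pairs, converted to a dictionary.
result = jieba.analyse.textrank(lyric, topK=50, withWeight=True)
keywords = dict()
for word, weight in result:
    keywords[word] = weight
print(keywords)

# Load the picture; its array serves as both the mask and the color source.
image = Image.open('./tim.jpg')
graph = np.array(image)

# font_path should point to a font with Chinese glyphs so the words render correctly.
wc = WordCloud(font_path='./fonts/simhei.ttf', background_color='White', max_words=50, mask=graph)
wc.generate_from_frequencies(keywords)

# Recolor the words with the colors of the picture, then display and save the result.
image_color = ImageColorGenerator(graph)
plt.imshow(wc.recolor(color_func=image_color))
plt.axis("off")
plt.show()
wc.to_file('dream.png')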