I wrote a small Python program of about 70 lines that calculates document similarity using the gensim package.
The input is 88 papers.
The program preprocesses the documents (removing unnecessary symbols, tokenizing, and so on), computes their tf-idf values, and builds a tf-idf model and a similarity index over the 88 papers. Up to this point the program runs normally, but an error is reported when the index is used:
What causes this? Thank you~
The following is part of the source code that runs without problems:
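from gensim import corpora, models, similarities

# Tokenize:
texts = [[word for word in document.split()] for document in documents]
# Build a dictionary from all documents
dictionary = corpora.Dictionary(texts)
# Build the bag-of-words corpus
corpus = [dictionary.doc2bow(text) for text in texts]
# Build the tf-idf model from the corpus
tfidf_model = models.TfidfModel(corpus)
# Compute the tf-idf vector of every document
tfidfs = tfidf_model[corpus]
# Build the similarity index over the tf-idf vectors
index = similarities.SparseMatrixSimilarity(tfidfs, num_features=88075)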
The error occurs when running this code:
# Build the bag-of-words vector for the target document
content = 'A student of music needs as long and as arduous a training to become a performer as a medical student needs to become a doctor'
content = content.lower().split()
test = dictionary.doc2bow(content)
# Compute the tf-idf vector of the target document
test_tfidf = tfidf_model[test]
sims = index[test_tfidf]  # this is the line that raises the error!
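For anyone who wants to see what the two snippets above compute without installing gensim, here is a minimal pure-Python sketch of the same steps: token ids, bag-of-words counts, tf-idf weights, and a cosine-similarity query. This is not gensim's implementation. The helper names (`build_vocab`, `to_bow`, `tfidf`, `cosine`) and the three toy documents are made up for illustration, and the weighting (tf × log2(N/df), then L2 normalization) is only assumed to mirror gensim's defaults.

```python
import math
from collections import Counter

def build_vocab(texts):
    """Map each distinct token to an integer id (hypothetical stand-in for corpora.Dictionary)."""
    vocab = {}
    for text in texts:
        for token in text:
            vocab.setdefault(token, len(vocab))
    return vocab

def to_bow(vocab, tokens):
    """Count known tokens; tokens missing from the vocabulary are dropped."""
    return Counter(vocab[t] for t in tokens if t in vocab)

def tfidf(bow, doc_freq, n_docs):
    """Weight counts by log2(N / df) and L2-normalize the result."""
    weights = {i: tf * math.log2(n_docs / doc_freq[i]) for i, tf in bow.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {i: w / norm for i, w in weights.items()} if norm else {}

def cosine(a, b):
    """Dot product of two sparse vectors that are already L2-normalized."""
    return sum(w * b.get(i, 0.0) for i, w in a.items())

# Toy documents, invented for this sketch.
documents = [
    "a student of music needs long training",
    "a medical student needs to become a doctor",
    "the weather is sunny today",
]
texts = [doc.lower().split() for doc in documents]
vocab = build_vocab(texts)
bows = [to_bow(vocab, text) for text in texts]

# Document frequency: in how many documents each token id appears.
doc_freq = Counter(i for bow in bows for i in bow)
index = [tfidf(bow, doc_freq, len(texts)) for bow in bows]

# Query the "index": one similarity score per indexed document.
query = tfidf(to_bow(vocab, "a music student".split()), doc_freq, len(texts))
sims = [cosine(query, doc_vec) for doc_vec in index]
print(sims)
```

With these toy documents the first document scores highest, the second scores lower, and the third scores zero because it shares no words with the query.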
ringa_lee 2017-05-18 10:49:38
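What is your Python version? Which version of gensim are you currently using, and does it match the stable version tested on the official site? Also, I recommend using a Unix-like system. gensim is built on NumPy and SciPy, and both are difficult to install on Windows. Even once they are installed, problems may still occur.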
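A quick way to collect the version information asked about here is a small script along these lines. It is only a sketch: `report_versions` is a hypothetical helper name, and it relies on the standard-library `importlib.metadata` module, which assumes Python 3.8 or later.

```python
import platform
from importlib import metadata

def report_versions(packages=("gensim", "numpy", "scipy")):
    """Return the Python version and each package's installed version (None if missing)."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions

for name, version in report_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Posting this output together with the full traceback makes it much easier to tell whether the problem is tied to a particular version or platform.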
曾经蜡笔没有小新 2017-05-18 10:49:38
This error may also be caused by the Windows operating system. If you paste the error into Google, you will find many solutions, such as this one:
某草草 2017-05-18 10:49:38
http://www.wiki-errors.com/do... Just download and install it. Then go back to Baidu, to be on the safe side.