Use of gensim library word2vec in Python-Python Tutorial-php.cn

Home

Backend Development

Python Tutorial

Use of gensim library word2vec in Python

不言

May 08, 2018 pm 02:24 PM

python

This article mainly introduces the use of the gensim library word2vec in Python. It has a certain reference value. Now I share it with you. Friends in need can refer to it.

pip install gensim After installing the library , you can import and use:

1. Training model definition

from gensim.models import Word2Vec 
model = Word2Vec(sentences, sg=1, size=100, window=5, min_count=5, negative=3, sample=0.001, hs=1, workers=4)

Parameter explanation:

1.sg=1 is the skip-gram algorithm, which is sensitive to low-frequency words; the default sg=0 is the CBOW algorithm.

2.size is the dimension of the output word vector. If the value is too small, the word mapping will affect the results due to conflicts. If the value is too large, it will consume memory and slow down the algorithm calculation. Generally, the value is 100 to between 200.

3.window is the maximum distance between the current word and the target word in the sentence. 3 means looking at 3-b words before the target word and b words after it (b is random between 0-3 ).

4.min_count is used to filter words. Words with a frequency less than min-count will be ignored. The default value is 5.

5. Negative and sample can be fine-tuned based on the training results. Sample indicates that higher frequency words are randomly downsampled to the set threshold. The default value is 1e-3.

6.hs=1 means hierarchical softmax will be used. By default hs=0 and negative is not 0, negative sampling will be selected.

7. Workers control the parallelism of training. This parameter is only valid after Cpython is installed, otherwise only a single core can be used.

For detailed parameter description, please view the word2vec source code.

2. Saving and loading the model after training

model.save(fname) 
model = Word2Vec.load(fname)

3. Model use (word similarity calculation, etc.)

model.most_similar(positive=[&#39;woman&#39;, &#39;king&#39;], negative=[&#39;man&#39;]) 
#输出[(&#39;queen&#39;, 0.50882536), ...] 
 
model.doesnt_match("breakfast cereal dinner lunch".split()) 
#输出&#39;cereal&#39; 
 
model.similarity(&#39;woman&#39;, &#39;man&#39;) 
#输出0.73723527 
 
model[&#39;computer&#39;] # raw numpy vector of a word 
#输出array([-0.00449447, -0.00310097, 0.02421786, ...], dtype=float32)

Other contents will not be described in detail, please refer to the details The official description of gensim's word2vec is very detailed.

The above is the detailed content of Use of gensim library word2vec in Python. For more information, please follow other related articles on the PHP Chinese website!

Statement

The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Explain the performance differences in element-wise operations between lists and arrays.May 06, 2025 am 12:15 AM

Arraysarebetterforelement-wiseoperationsduetofasteraccessandoptimizedimplementations.1)Arrayshavecontiguousmemoryfordirectaccess,enhancingperformance.2)Listsareflexiblebutslowerduetopotentialdynamicresizing.3)Forlargedatasets,arrays,especiallywithlib

How can you perform mathematical operations on entire NumPy arrays efficiently?May 06, 2025 am 12:15 AM

Mathematical operations of the entire array in NumPy can be efficiently implemented through vectorized operations. 1) Use simple operators such as addition (arr 2) to perform operations on arrays. 2) NumPy uses the underlying C language library, which improves the computing speed. 3) You can perform complex operations such as multiplication, division, and exponents. 4) Pay attention to broadcast operations to ensure that the array shape is compatible. 5) Using NumPy functions such as np.sum() can significantly improve performance.

How do you insert elements into a Python array?May 06, 2025 am 12:14 AM

In Python, there are two main methods for inserting elements into a list: 1) Using the insert(index, value) method, you can insert elements at the specified index, but inserting at the beginning of a large list is inefficient; 2) Using the append(value) method, add elements at the end of the list, which is highly efficient. For large lists, it is recommended to use append() or consider using deque or NumPy arrays to optimize performance.

How can you make a Python script executable on both Unix and Windows?May 06, 2025 am 12:13 AM

TomakeaPythonscriptexecutableonbothUnixandWindows:1)Addashebangline(#!/usr/bin/envpython3)andusechmod xtomakeitexecutableonUnix.2)OnWindows,ensurePythonisinstalledandassociatedwith.pyfiles,oruseabatchfile(run.bat)torunthescript.

What should you check if you get a 'command not found' error when trying to run a script?May 06, 2025 am 12:03 AM

When encountering a "commandnotfound" error, the following points should be checked: 1. Confirm that the script exists and the path is correct; 2. Check file permissions and use chmod to add execution permissions if necessary; 3. Make sure the script interpreter is installed and in PATH; 4. Verify that the shebang line at the beginning of the script is correct. Doing so can effectively solve the script operation problem and ensure the coding process is smooth.

Why are arrays generally more memory-efficient than lists for storing numerical data?May 05, 2025 am 12:15 AM

Arraysaregenerallymorememory-efficientthanlistsforstoringnumericaldataduetotheirfixed-sizenatureanddirectmemoryaccess.1)Arraysstoreelementsinacontiguousblock,reducingoverheadfrompointersormetadata.2)Lists,oftenimplementedasdynamicarraysorlinkedstruct

How can you convert a Python list to a Python array?May 05, 2025 am 12:10 AM

ToconvertaPythonlisttoanarray,usethearraymodule:1)Importthearraymodule,2)Createalist,3)Usearray(typecode,list)toconvertit,specifyingthetypecodelike'i'forintegers.Thisconversionoptimizesmemoryusageforhomogeneousdata,enhancingperformanceinnumericalcomp

Can you store different data types in the same Python list? Give an example.May 05, 2025 am 12:10 AM

Python lists can store different types of data. The example list contains integers, strings, floating point numbers, booleans, nested lists, and dictionaries. List flexibility is valuable in data processing and prototyping, but it needs to be used with caution to ensure the readability and maintainability of the code.

See all articles

Hot AI Tools

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress images for free

Clothoff.io

AI clothes remover

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

How to fix KB5055523 fails to install in Windows 11?

3 weeks agoByDDD

How to fix KB5055518 fails to install in Windows 10?

3 weeks agoByDDD

Roblox: Dead Rails - How To Tame Wolves

4 weeks agoByDDD

Strength Levels for Every Enemy & Monster in R.E.P.O.

4 weeks agoBy尊渡假赌尊渡假赌尊渡假赌

Roblox: Grow A Garden - Complete Mutation Guide

2 weeks agoByDDD

Hot Tools

Notepad++7.3.1

Easy-to-use and free code editor

MantisBT

Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

WebStorm Mac version

Useful JavaScript development tools

PhpStorm Mac version

The latest (2018.2.1) professional PHP integrated development tool

SecLists

SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

Hot Topics

1660

1416

1310

1260

1233