Calculating Cosine Similarity Between Sentence Strings
Given two strings representing sentences, we want to calculate their cosine similarity using only the Python standard library. The implementation below needs nothing beyond the re, math, and collections modules.
Cosine similarity is the cosine of the angle between two vectors, which here represent documents or sentences in a vector space. For word-count vectors, whose components are never negative, the value ranges from 0 (no words in common) to 1 (the same words in the same proportions). A value close to 1 indicates that the sentences are similar, while a value close to 0 suggests they differ.
Step 1: Tokenization and Vectorization
To calculate cosine similarity, we must convert the sentences into vectors. We use a simple word-based tokenizer that splits the sentences into words and counts their occurrences:
<code class="python">import re
from collections import Counter

# Match runs of word characters (letters, digits, underscore).
WORD = re.compile(r"\w+")

def text_to_vector(text):
    words = WORD.findall(text)
    return Counter(words)</code>
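To make the tokenizer's behavior concrete, the sketch below runs the same regular expression on a short sample sentence. Note that the counts are case-sensitive. The helper `text_to_vector_ci` is a hypothetical variant, not part of the original code, that lowercases the text first so that "The" and "the" are counted as the same word; whether that is desirable depends on the use case.

```python
import re
from collections import Counter

WORD = re.compile(r"\w+")

def text_to_vector(text):
    # Same tokenizer as in the article: case-sensitive word counts.
    return Counter(WORD.findall(text))

def text_to_vector_ci(text):
    # Hypothetical variant (an assumption): lowercase before counting.
    return Counter(WORD.findall(text.lower()))

sample = "The cat saw the other cat."
print(text_to_vector(sample))     # "The" and "the" are separate keys
print(text_to_vector_ci(sample))  # both merge into "the"
```

Punctuation is dropped in both versions because `\w+` matches only letters, digits, and underscores.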
Step 2: Calculating Cosine Similarity
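The cosine similarity formula is:
cosine = (Numerator) / (Denominator)
where the numerator is the dot product of the two vectors (the sum, over the words they share, of the product of the two counts) and the denominator is the product of their Euclidean lengths: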
<code class="python">import math

def get_cosine(vec1, vec2):
    # Only words present in both vectors contribute to the dot product.
    intersection = set(vec1.keys()) & set(vec2.keys())
    numerator = sum(vec1[x] * vec2[x] for x in intersection)

    # Squared Euclidean length of each vector.
    sum1 = sum(count ** 2 for count in vec1.values())
    sum2 = sum(count ** 2 for count in vec2.values())
    denominator = math.sqrt(sum1) * math.sqrt(sum2)

    # An empty vector has zero length; return 0.0 rather than divide by zero.
    if not denominator:
        return 0.0
    return float(numerator) / denominator</code>
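A quick way to gain confidence in the formula is to check its boundary cases. The following is a minimal, self-contained sketch (a compact restatement of the function, with made-up sample vectors) that compares a vector with itself, with a vector that shares no words, and with an empty vector.

```python
import math
from collections import Counter

def get_cosine(vec1, vec2):
    # Dot product over shared words, divided by the product of the lengths.
    numerator = sum(vec1[w] * vec2[w] for w in vec1.keys() & vec2.keys())
    norm1 = math.sqrt(sum(c * c for c in vec1.values()))
    norm2 = math.sqrt(sum(c * c for c in vec2.values()))
    denominator = norm1 * norm2
    return numerator / denominator if denominator else 0.0

a = Counter({"foo": 2, "bar": 1})              # illustrative counts
same = get_cosine(a, a)                        # identical vectors
disjoint = get_cosine(a, Counter({"baz": 3}))  # no shared words
empty = get_cosine(a, Counter())               # zero-length vector

print(math.isclose(same, 1.0), disjoint, empty)
```

The identical-vector case is compared with `math.isclose` because floating-point rounding in the square roots can leave the result a hair below 1.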
Step 3: Example Usage
Using the above functions, we can calculate the cosine similarity between two sentences:
<code class="python">text1 = "This is a foo bar sentence."
text2 = "This sentence is similar to a foo bar sentence."

vector1 = text_to_vector(text1)
vector2 = text_to_vector(text2)

cosine = get_cosine(vector1, vector2)
print("Cosine:", cosine)</code>
For these two sentences the script prints a cosine of approximately 0.86, which reflects the large overlap in their vocabulary.
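As a sanity check, the sketch below reproduces the calculation by hand from the word counts of the two example sentences. The dictionaries are written out manually here (an illustration, not output copied from the functions) so that each term of the formula is visible.

```python
import math

# Word counts for the two example sentences (case-sensitive, punctuation dropped).
v1 = {"This": 1, "is": 1, "a": 1, "foo": 1, "bar": 1, "sentence": 1}
v2 = {"This": 1, "sentence": 2, "is": 1, "similar": 1, "to": 1,
      "a": 1, "foo": 1, "bar": 1}

# Six shared words: five contribute 1 * 1 and "sentence" contributes 1 * 2.
dot = sum(v1[w] * v2[w] for w in v1.keys() & v2.keys())  # 7
sq1 = sum(c * c for c in v1.values())                    # 6
sq2 = sum(c * c for c in v2.values())                    # 7 * 1 + 2 * 2 = 11

cosine = dot / math.sqrt(sq1 * sq2)                      # 7 / sqrt(66)
print(round(cosine, 4))  # 0.8616
```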