How to Calculate Cosine Similarity Between Sentences in Python Without External Libraries?

DDD
2024-10-30 07:48:28

Calculating Cosine Similarity Between Sentence Strings

Given two strings representing sentences, we want to calculate their cosine similarity without using external libraries. The following Python implementation relies only on the standard library (re, collections, and math).

Cosine similarity is the cosine of the angle between two vectors, which here represent documents or sentences in a vector space. For word-count vectors, whose components are never negative, the value ranges from 0 (no words in common) to 1 (the same words in the same proportions). The closer the value is to 1, the more similar the sentences.

Step 1: Tokenization and Vectorization

To calculate cosine similarity, we must first convert the sentences into vectors. We use a simple word-based tokenizer that splits each sentence into words and counts their occurrences. Note that it is case-sensitive, so "This" and "this" are counted as different words:

<code class="python">import re
from collections import Counter

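# Matches runs of word characters (letters, digits, underscore); punctuation is dropped.
WORD = re.compile(r"\w+")

def text_to_vector(text):
    # Map each word to the number of times it occurs in the text.
    words = WORD.findall(text)
    return Counter(words)</code>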
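
To make the tokenizer's output concrete, the sketch below prints the vector for a short sentence. It also includes a hypothetical variant, text_to_vector_ci, which is not part of the original code and is only one possible way to make the counting case-insensitive by lowercasing the text first.

```python
import re
from collections import Counter

WORD = re.compile(r"\w+")

def text_to_vector(text):
    # Same tokenizer as above: case-sensitive word counts.
    return Counter(WORD.findall(text))

def text_to_vector_ci(text):
    # Hypothetical variant (an assumption, not from the article):
    # lowercase first so "This" and "this" share one count.
    return Counter(WORD.findall(text.lower()))

print(text_to_vector("This is this."))     # Counter({'This': 1, 'is': 1, 'this': 1})
print(text_to_vector_ci("This is this."))  # Counter({'this': 2, 'is': 1})
```

Whether lowercasing is appropriate depends on the data. For proper nouns or code identifiers, the case-sensitive version may be the better choice.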

Step 2: Calculating Cosine Similarity

The cosine similarity formula is:

cosine = (Numerator) / (Denominator)

where:

  • Numerator is the dot product of the two vectors.
  • Denominator is the product of the magnitudes of the two vectors.
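
Written out for two word-count vectors, the formula can be expressed as follows. The symbols A, B, and w are notation introduced here for illustration: A_w and B_w are the counts of word w in the first and second sentence.

```latex
\cos(\theta) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
             = \frac{\sum_{w} A_w B_w}{\sqrt{\sum_{w} A_w^2} \; \sqrt{\sum_{w} B_w^2}}
```

A word that is missing from one sentence has a count of zero there, so only words shared by both sentences contribute to the numerator.
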
<code class="python">import math

def get_cosine(vec1, vec2):
    # Dot product: only words present in both vectors contribute.
    intersection = set(vec1) & set(vec2)
    numerator = sum(vec1[x] * vec2[x] for x in intersection)

    # Magnitude of each vector: square root of the sum of squared counts.
    sum1 = sum(count ** 2 for count in vec1.values())
    sum2 = sum(count ** 2 for count in vec2.values())
    denominator = math.sqrt(sum1) * math.sqrt(sum2)

    # An empty vector has zero magnitude; return 0.0 instead of dividing by zero.
    if not denominator:
        return 0.0
    return numerator / denominator</code>
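
A few quick checks help confirm how the function behaves at the boundaries. The sketch below repeats get_cosine so that it runs on its own. The sample Counter inputs (same, disjoint, empty) are made up for illustration.

```python
import math
from collections import Counter

def get_cosine(vec1, vec2):
    intersection = set(vec1) & set(vec2)
    numerator = sum(vec1[x] * vec2[x] for x in intersection)
    sum1 = sum(v ** 2 for v in vec1.values())
    sum2 = sum(v ** 2 for v in vec2.values())
    denominator = math.sqrt(sum1) * math.sqrt(sum2)
    if not denominator:
        return 0.0
    return numerator / denominator

# Illustrative inputs (assumptions, not from the article).
same = Counter({"foo": 2, "bar": 1})
disjoint = Counter({"baz": 3})
empty = Counter()

print(get_cosine(same, same))      # 1.0 up to floating-point rounding
print(get_cosine(same, disjoint))  # 0.0: no shared words
print(get_cosine(same, empty))     # 0.0: zero-magnitude guard
```

Because of floating-point rounding, comparing a vector with itself may give a value very slightly below 1.0, so math.isclose is a safer check than == in tests.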

Step 3: Example Usage

Using the above functions, we can calculate the cosine similarity between two sentences:

<code class="python">text1 = "This is a foo bar sentence."
text2 = "This sentence is similar to a foo bar sentence."

vector1 = text_to_vector(text1)
vector2 = text_to_vector(text2)

cosine = get_cosine(vector1, vector2)

print("Cosine:", cosine)</code>

The output is approximately 0.8616. This is close to 1, which indicates that the two sentences are similar.
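
The result can be checked by hand. The first sentence has six distinct words, each occurring once. The second has eight distinct words, with "sentence" occurring twice. The five shared words "This", "is", "a", "foo", and "bar" each contribute 1 × 1 to the dot product, and "sentence" contributes 1 × 2. A minimal sketch of that arithmetic, using variable names chosen here for illustration:

```python
import math

# Dot product: five shared words counted once in both, plus "sentence" (1 * 2).
dot = 5 * (1 * 1) + 1 * 2            # 7

# Magnitudes: six counts of 1 for text1; seven counts of 1 and one count of 2 for text2.
mag1 = math.sqrt(6 * 1 ** 2)         # sqrt(6)
mag2 = math.sqrt(7 * 1 ** 2 + 2 ** 2)  # sqrt(11)

print(round(dot / (mag1 * mag2), 4))  # 0.8616
```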
