How to Calculate Cosine Similarity Between Sentence Strings in Python Without External Libraries?

Linda Hamilton
2024-10-31 14:30:02

Calculating Cosine Similarity of Sentence Strings without External Libraries

Cosine similarity between two text strings can be calculated in Python using only the standard library. Each sentence is represented as a vector of word counts, and the two vectors are compared with the standard cosine similarity formula:

cos(θ) = (A · B) / (||A|| · ||B||)

Where:

  • A and B are the two vectors representing the sentences (here, their word counts).
  • A · B is the dot product of vectors A and B.
  • ||A|| and ||B|| are the respective magnitudes of vectors A and B.
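
As a quick numeric check of the formula, the sketch below evaluates it for two small made-up vectors. The values and the helper name `cosine` are illustrative assumptions, not part of the implementation in this article.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two equal-length numeric sequences."""
    dot = sum(x * y for x, y in zip(a, b))      # A · B
    norm_a = math.sqrt(sum(x * x for x in a))   # ||A||
    norm_b = math.sqrt(sum(y * y for y in b))   # ||B||
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)

A = [1, 2, 0]
B = [2, 1, 1]
# Dot product is 4, magnitudes are sqrt(5) and sqrt(6).
print(round(cosine(A, B), 4))  # 4 / sqrt(30), about 0.7303
```

A value of 1 means the vectors point in the same direction, and 0 means they share no non-zero components.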

Implementation

The following Python code provides a practical implementation of this formula:

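<code class="python">import math
import re
from collections import Counter

# Matches runs of word characters (letters, digits, underscore).
WORD = re.compile(r"\w+")


def get_cosine(vec1, vec2):
    """Return the cosine similarity of two word-count mappings."""
    # Only words present in both vectors contribute to the dot product.
    intersection = vec1.keys() & vec2.keys()
    numerator = sum(vec1[x] * vec2[x] for x in intersection)

    # Magnitudes: square root of the sum of squared counts.
    sum1 = sum(count ** 2 for count in vec1.values())
    sum2 = sum(count ** 2 for count in vec2.values())
    denominator = math.sqrt(sum1) * math.sqrt(sum2)

    # An empty vector has zero magnitude; return 0.0 instead of dividing by zero.
    if not denominator:
        return 0.0
    return numerator / denominator


def text_to_vector(text):
    """Tokenize text and count how often each word occurs."""
    words = WORD.findall(text)
    return Counter(words)</code>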
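
To make the vector representation concrete, the following sketch shows what text_to_vector produces for a made-up sentence. It repeats the same pattern and function so that it runs on its own; the sample sentence is an illustrative assumption.

```python
import re
from collections import Counter

WORD = re.compile(r"\w+")  # same pattern as in the implementation above

def text_to_vector(text):
    return Counter(WORD.findall(text))

vec = text_to_vector("The cat saw the other cat .")
print(vec["cat"])               # 2: the word occurs twice
print(vec["The"], vec["the"])   # 1 1: matching is case-sensitive
print(vec["."])                 # 0: punctuation is not matched by \w+
print(vec["dog"])               # 0: a Counter returns 0 for missing words
```

Because "The" and "the" are counted separately, lowercasing the text first (for example with text.lower()) would merge them. That is an optional normalization and is not part of the code above.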

To use this code, convert the sentence strings into vectors using the text_to_vector function and then calculate the cosine similarity using the get_cosine function:

<code class="python">text1 = "This is a foo bar sentence ."
text2 = "This sentence is similar to a foo bar sentence ."

vector1 = text_to_vector(text1)
vector2 = text_to_vector(text2)

cosine = get_cosine(vector1, vector2)

print("Cosine:", cosine)</code>

Running this prints the cosine similarity of the two sentences, approximately 0.8616. Note that the vectors use raw word counts: tf-idf weighting is not included in this implementation, but can be added if a suitable corpus is available.
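
One possible way to add tf-idf weighting with only the standard library is sketched below. The three-sentence corpus, the helper names `build_idf` and `tfidf_vector`, and the smoothed idf formula log((1 + N) / (1 + df)) + 1 are all assumptions made for illustration; other idf variants are equally valid.

```python
import math
import re
from collections import Counter

WORD = re.compile(r"\w+")

def tokenize(text):
    return WORD.findall(text)

def build_idf(corpus):
    """Map each word in the corpus to a smoothed inverse document frequency."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenize(doc)))  # count each word once per document
    # Assumed smoothing: log((1 + N) / (1 + df)) + 1
    return {w: math.log((1 + n_docs) / (1 + df)) + 1 for w, df in doc_freq.items()}

def tfidf_vector(text, idf, default_idf):
    """Weight each word count by its idf; unseen words get default_idf."""
    counts = Counter(tokenize(text))
    return {w: c * idf.get(w, default_idf) for w, c in counts.items()}

def get_cosine(vec1, vec2):
    numerator = sum(vec1[w] * vec2[w] for w in vec1.keys() & vec2.keys())
    norm1 = math.sqrt(sum(v * v for v in vec1.values()))
    norm2 = math.sqrt(sum(v * v for v in vec2.values()))
    denominator = norm1 * norm2
    return numerator / denominator if denominator else 0.0

# Hypothetical corpus; in practice this would be a larger document collection.
corpus = [
    "This is a foo bar sentence .",
    "This sentence is similar to a foo bar sentence .",
    "A completely different line about cats .",
]
idf = build_idf(corpus)
unseen = math.log(1 + len(corpus)) + 1  # idf for a word with df = 0

score = get_cosine(
    tfidf_vector(corpus[0], idf, unseen),
    tfidf_vector(corpus[1], idf, unseen),
)
print("tf-idf cosine:", score)
```

With this weighting, words that appear in few documents contribute more to the similarity than words that appear in most of them.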
