Calculating Cosine Similarity for Sentence Strings
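Cosine similarity measures how closely two vectors point in the same direction: it is the cosine of the angle between them. In text processing, each sentence can be turned into a vector of word counts, and the cosine of those two vectors indicates how similar the sentences are. Because word counts are never negative, the result ranges from 0 (no words in common) to 1 (identical word proportions). To calculate it for two strings without external libraries, split each sentence into words and count how often each word occurs. Then multiply the counts of the words the two sentences share and sum the products (the dot product). Finally, divide that sum by the product of the two vectors' lengths, where each length is the square root of that vector's sum of squared counts.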
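In symbols, write $A_w$ and $B_w$ for the number of times word $w$ occurs in the first and second sentence. Sums run over all words; a word missing from a sentence has count 0, so only shared words contribute to the numerator:

```latex
\cos(\theta) = \frac{\mathbf{A}\cdot\mathbf{B}}{\lVert\mathbf{A}\rVert\,\lVert\mathbf{B}\rVert}
             = \frac{\sum_{w} A_w B_w}{\sqrt{\sum_{w} A_w^{2}}\;\sqrt{\sum_{w} B_w^{2}}}
```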
A simple Python implementation:
<code class="python">import math import re from collections import Counter WORD = re.compile(r"\w+") def get_cosine(vec1, vec2): intersection = set(vec1.keys()) & set(vec2.keys()) numerator = sum([vec1[x] * vec2[x] for x in intersection]) sum1 = sum([vec1[x] ** 2 for x in list(vec1.keys())]) sum2 = sum([vec2[x] ** 2 for x in list(vec2.keys())]) denominator = math.sqrt(sum1) * math.sqrt(sum2) if not denominator: return 0.0 else: return float(numerator) / denominator def text_to_vector(text): words = WORD.findall(text) return Counter(words)</code>
Example usage:
<code class="python">text1 = "This is a foo bar sentence ." text2 = "This sentence is similar to a foo bar sentence ." vector1 = text_to_vector(text1) vector2 = text_to_vector(text2) cosine = get_cosine(vector1, vector2) print("Cosine:", cosine)</code>
Output (rounded to 12 decimal places; Python 3 prints a few more digits):
Cosine: 0.861640436855
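The printed value can be checked by hand. Because `\w+` matches only word characters, the trailing periods are dropped. text1 gives six words that each occur once, so its squared norm is 6. text2 gives eight distinct words with "sentence" occurring twice, so its squared norm is 7 + 2² = 11. The shared words contribute 1 + 1 + 1 + 1 + 1 + 1·2 = 7 to the dot product:

```latex
\cos(\theta) = \frac{7}{\sqrt{6}\,\sqrt{11}} = \frac{7}{\sqrt{66}} \approx 0.861640436855
```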
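Note that this implementation uses raw word counts and does not apply TF-IDF (term frequency-inverse document frequency) weighting. TF-IDF gives less weight to words that appear in many documents, such as "is" or "a". This usually makes similarity scores more meaningful when comparing texts across a larger collection.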
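The following is a minimal TF-IDF sketch in the same dependency-free style. It assumes that a small corpus of documents is available for estimating how common each word is. Several parts are illustrative choices, not part of the original code: the helper names (`tokenize`, `idf_weights`, `tfidf_vector`, `cosine`), the lowercasing, the smoothed IDF formula, and the made-up third corpus sentence.

```python
import math
import re
from collections import Counter

WORD = re.compile(r"\w+")


def tokenize(text):
    # Lowercased here (unlike the original text_to_vector) so "This" and "this" match
    return WORD.findall(text.lower())


def idf_weights(documents):
    # Document frequency: in how many documents each word appears
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))
    n_docs = len(documents)
    # Smoothed IDF, log((1 + N) / (1 + df)) + 1: one common variant (an assumption)
    return {word: math.log((1 + n_docs) / (1 + count)) + 1 for word, count in df.items()}


def tfidf_vector(text, idf):
    # Term frequency times IDF; words absent from the corpus get weight 0
    tf = Counter(tokenize(text))
    return {word: count * idf.get(word, 0.0) for word, count in tf.items()}


def cosine(vec1, vec2):
    # Same formula as get_cosine, applied to weighted vectors
    numerator = sum(vec1[w] * vec2[w] for w in set(vec1) & set(vec2))
    norm1 = math.sqrt(sum(v * v for v in vec1.values()))
    norm2 = math.sqrt(sum(v * v for v in vec2.values()))
    if not norm1 or not norm2:
        return 0.0
    return numerator / (norm1 * norm2)


# Hypothetical three-document corpus; the third sentence is made up for illustration
corpus = [
    "This is a foo bar sentence .",
    "This sentence is similar to a foo bar sentence .",
    "This is an unrelated sentence about something else .",
]
idf = idf_weights(corpus)
v1 = tfidf_vector(corpus[0], idf)
v2 = tfidf_vector(corpus[1], idf)
score = cosine(v1, v2)
print("TF-IDF cosine:", score)
```

With this formula, words that occur in every document ("this", "is", "sentence") get the minimum weight of 1. Rarer shared words such as "foo" and "bar" count for more.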