
How can Elasticsearch be used to achieve fuzzy matching for email and telephone numbers?

Susan Sarandon
2024-10-28 06:08:30

Fuzzy Matching Email and Telephone in Elasticsearch

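Elasticsearch can match email addresses that end with a given domain, or telephone numbers that start with a given prefix, by indexing those fields with custom analyzers. Strictly speaking this is partial (substring and prefix) matching rather than edit-distance fuzziness.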

The approach defines a separate index-time and search-time analyzer for each field. For email, the index analyzer emits n-grams, so any fragment of the address within the configured length range can be matched. For telephone numbers, an edge n-gram analyzer indexes prefixes of increasing length, so a prefix search becomes an exact token lookup.

Implementation details:

Analyzer Definitions for Emails:

  • index_email_analyzer: Splits each email value into n-grams (substrings) of 1 to 20 characters, so that any part of the address can be matched (e.g., the n-grams of "@gmail.com" include "@", "@g", "@gm", "@gma", and so on up to the full string).
  • search_email_analyzer: Applied to the search input. It does not generate n-grams, so the query text is compared as-is against the indexed n-grams (e.g., a search for "@gmail.com" matches any email whose indexed n-grams include "@gmail.com").
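
To make the email n-gram idea concrete, here is a minimal Python sketch that mimics what an n-gram index analyzer produces. It is an illustration only, not Elasticsearch code: the helper name `ngrams` is hypothetical, and it assumes the whole address is treated as a single token before n-grams are generated, which the article does not specify.

```python
def ngrams(text, min_gram=1, max_gram=20):
    """Return every substring of text whose length is between min_gram and max_gram."""
    tokens = []
    for start in range(len(text)):
        for length in range(min_gram, max_gram + 1):
            end = start + length
            if end > len(text):
                break  # longer n-grams would run past the end of the text
            tokens.append(text[start:end])
    return tokens


# Index time: the address is stored as a set of n-grams.
indexed = set(ngrams("john@gmail.com"))

# Search time: the query text is looked up unchanged.
print("@gmail.com" in indexed)  # True
print("@yahoo.com" in indexed)  # False
```

Because the search side performs no n-gram expansion, a fragment matches only if it appears verbatim in the indexed address.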

Analyzer Definitions for Telephones:

  • index_phone_analyzer: Tokenizes telephone numbers, extracting all possible prefixes, ensuring matches for partial input (e.g., searching for "136" will match "1362435647").
  • search_phone_analyzer: Applied to the search input. It leaves the input as typed rather than expanding it into prefixes, so a search for "136" is compared against the prefixes indexed for "1362435647" ("1", "13", "136", and so on) and matches the token "136".
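
The prefix behaviour can be sketched the same way in Python. This is an illustration of edge n-grams in general, not the article's analyzer: the helper name `edge_ngrams` and the `max_gram` default of 15 are assumptions.

```python
def edge_ngrams(text, min_gram=1, max_gram=15):
    """Return the prefixes of text with lengths from min_gram up to max_gram."""
    longest = min(len(text), max_gram)
    return [text[:n] for n in range(min_gram, longest + 1)]


# Index time: only prefixes of the number are stored.
indexed = set(edge_ngrams("1362435647"))

# Search time: the typed digits are looked up unchanged.
print("136" in indexed)  # True: "136" is a prefix of the number
print("243" in indexed)  # False: "243" only occurs in the middle
```

Unlike the email case, fragments from the middle of a number do not match, which is what makes this a prefix-only search.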

Example Index and Query:

PUT myindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        ...
        "index_email_analyzer": { ... },
        "search_email_analyzer": { ... },
        "index_phone_analyzer": { ... },
        "search_phone_analyzer": { ... }
        ...
      }
    }
  },
  "mappings": {
    "your_type": {
      "properties": {
        "email": {
          "type": "string",
          "analyzer": "index_email_analyzer",
          "search_analyzer": "search_email_analyzer"
        },
        "phone": {
          "type": "string",
          "analyzer": "index_phone_analyzer",
          "search_analyzer": "search_phone_analyzer"
        }
      }
    }
  }
}
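
The four analyzer bodies are elided above. As a rough guide to what such definitions could look like, the Python sketch below builds one possible `analysis` section and prints it as JSON. Everything in it is an assumption rather than the article's configuration: the names `email_ngram_filter` and `phone_prefix_tokenizer` are hypothetical, as are the `keyword` tokenizer, the `lowercase` filter, and the phone `max_gram` of 15. The `ngram` and `edge_ngram` type names follow current Elasticsearch naming, and recent versions may additionally require raising `index.max_ngram_diff` to allow a 1 to 20 n-gram range.

```python
import json

# Hypothetical analysis settings: one possible way to define the four
# analyzers, not the definitions elided in the article.
analysis = {
    "filter": {
        # Expands a token into all substrings of 1 to 20 characters.
        "email_ngram_filter": {"type": "ngram", "min_gram": 1, "max_gram": 20}
    },
    "tokenizer": {
        # Emits prefixes of 1 to 15 characters.
        "phone_prefix_tokenizer": {"type": "edge_ngram", "min_gram": 1, "max_gram": 15}
    },
    "analyzer": {
        "index_email_analyzer": {
            "type": "custom",
            "tokenizer": "keyword",
            "filter": ["lowercase", "email_ngram_filter"],
        },
        "search_email_analyzer": {
            "type": "custom",
            "tokenizer": "keyword",
            "filter": ["lowercase"],
        },
        "index_phone_analyzer": {
            "type": "custom",
            "tokenizer": "phone_prefix_tokenizer",
        },
        "search_phone_analyzer": {
            "type": "custom",
            "tokenizer": "keyword",
        },
    },
}

print(json.dumps({"settings": {"analysis": analysis}}, indent=2))
```

The index analyzers do the expansion work, while the search analyzers keep the input as a single token so that it can be looked up directly.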

POST myindex/_search
{
  "query": {
    "term": { "email": "@gmail.com" }
  }
}

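Because the n-grams and prefixes are computed at index time, these searches are plain token lookups rather than wildcard scans, which keeps queries fast at the cost of a larger index. The n-gram length limits can be tuned to balance index size against how short or long a matchable fragment may be.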
