How can I achieve efficient fuzzy matching for email addresses and phone numbers within Elasticsearch?

Susan Sarandon · Original · 2024-10-31 09:19:01

Elasticsearch Fuzzy Email or Telephone Matching

Question:

How can fuzzy matching be implemented for email addresses or telephone numbers using Elasticsearch? Specifically, how can one match all emails ending with "@gmail.com" or all telephone numbers starting with "136"?

Answer:

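Use custom analyzers, with a different analyzer at index time than at search time. At index time each value is broken into n-grams (emails) or edge n-grams, that is prefixes (phone numbers). At search time the query text is kept whole, so a single token lookup finds every value that contains, or starts with, that text.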

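Email Fuzzy Matching:

Define two analyzers:

  • Index analyzer: index_email_analyzer

    • Standard tokenizer (splits an address such as john.doe@gmail.com into john.doe and gmail.com, dropping the "@")
    • Lowercase filter, then an n-gram filter (name_ngram_filter)
    • Min gram: 1
    • Max gram: 20
  • Search analyzer: search_email_analyzer

    • Standard tokenizer
    • Lowercase filter (no n-grams, so the query stays a whole token)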
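To see what the email analyzers store and what they look up, here is a minimal Python sketch. It is a simulation, not Elasticsearch: the regex tokenizer is an assumed approximation of the standard tokenizer that only holds for simple addresses, and the function names are hypothetical.

```python
import re


def approx_standard_tokenize(text: str) -> list[str]:
    # ASSUMPTION: rough stand-in for the "standard" tokenizer on simple
    # email addresses. Keeps runs of word characters and ".", drops "@".
    # The real tokenizer follows Unicode UAX #29 word-break rules.
    return [t.strip(".") for t in re.findall(r"[\w.]+", text) if t.strip(".")]


def ngrams(token: str, min_gram: int = 1, max_gram: int = 20) -> list[str]:
    # Every substring of length min_gram..max_gram, like the "ngram" filter.
    return [token[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(token) - n + 1)]


def index_email(email: str) -> set[str]:
    # Mirrors index_email_analyzer: tokenize -> lowercase -> ngram(1..20)
    return {g for tok in approx_standard_tokenize(email) for g in ngrams(tok.lower())}


def search_email(query: str) -> list[str]:
    # Mirrors search_email_analyzer: tokenize -> lowercase, no n-grams
    return [tok.lower() for tok in approx_standard_tokenize(query)]


stored = index_email("John.Doe@Gmail.com")
print(search_email("@gmail.com"))    # ['gmail.com']
print("gmail.com" in stored)         # True
print("@gmail.com" in stored)        # False: "@" never reaches the index
```

The last line is why a query that is not analyzed cannot find "@gmail.com": the "@" is removed before any n-gram is built.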

Telephone Number Fuzzy Matching:

Define two analyzers:

  • Index analyzer: index_phone_analyzer

    • Digit-only character filter (strips everything except digits, so 136-1234-5678 becomes 13612345678)
    • Edge n-gram tokenizer, emitting every prefix of the digits
    • Min gram: 1
    • Max gram: 15
  • Search analyzer: search_phone_analyzer

    • Digit-only character filter
    • Keyword tokenizer (the whole query becomes one token)
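The phone analyzers can be simulated the same way. This Python sketch assumes the character filter and edge n-gram tokenizer behave as configured above (digits only, prefixes of length 1 to 15). The function names are hypothetical, and the no-op trim filter is omitted.

```python
import re


def digit_only(text: str) -> str:
    # Mirrors the "digit_only" char_filter: pattern_replace \D+ -> ""
    return re.sub(r"\D+", "", text)


def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 15) -> list[str]:
    # Prefixes of length min_gram..max_gram, like the edge_ngram tokenizer.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def index_phone(phone: str) -> list[str]:
    # Mirrors index_phone_analyzer: digit_only -> edge n-grams (1..15)
    return edge_ngrams(digit_only(phone))


def search_phone(query: str) -> list[str]:
    # Mirrors search_phone_analyzer: digit_only -> keyword tokenizer,
    # i.e. the whole remaining input as a single token.
    digits = digit_only(query)
    return [digits] if digits else []


stored = index_phone("136-1234-5678")
print(stored[:4])                        # ['1', '13', '136', '1361']
print(search_phone("136")[0] in stored)  # True
print(search_phone("137")[0] in stored)  # False
```

Because only prefixes are indexed, "136" matches numbers that start with 136 but not numbers that merely contain it.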

Index Example:

The definition below uses typeless mappings and the text field type (Elasticsearch 7.x and later). Two details are easy to get wrong. The backslash in the char_filter pattern must be escaped as \\D+ inside JSON. And index.max_ngram_diff must be raised to at least 19 (max_gram minus min_gram), because the n-gram filter's default allowed difference is 1.

PUT myindex
{
  "settings": {
    "index": {
      "max_ngram_diff": 19
    },
    "analysis": {
      "analyzer": {
        "email_url_analyzer": {
          "type": "custom",
          "tokenizer": "uax_url_email",
          "filter": [ "trim" ]
        },
        "index_phone_analyzer": {
          "type": "custom",
          "char_filter": [ "digit_only" ],
          "tokenizer": "digit_edge_ngram_tokenizer",
          "filter": [ "trim" ]
        },
        "search_phone_analyzer": {
          "type": "custom",
          "char_filter": [ "digit_only" ],
          "tokenizer": "keyword",
          "filter": [ "trim" ]
        },
        "index_email_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "name_ngram_filter", "trim" ]
        },
        "search_email_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "trim" ]
        }
      },
      "char_filter": {
        "digit_only": {
          "type": "pattern_replace",
          "pattern": "\\D+",
          "replacement": ""
        }
      },
      "tokenizer": {
        "digit_edge_ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 15,
          "token_chars": [ "digit" ]
        }
      },
      "filter": {
        "name_ngram_filter": {
          "type": "ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "email": {
        "type": "text",
        "analyzer": "index_email_analyzer",
        "search_analyzer": "search_email_analyzer"
      },
      "phone": {
        "type": "text",
        "analyzer": "index_phone_analyzer",
        "search_analyzer": "search_phone_analyzer"
      }
    }
  }
}

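Search Queries:

  • Match all emails ending with "@gmail.com". Use a match query so that the search analyzer runs. It reduces "@gmail.com" to the single token gmail.com, which is one of the indexed n-grams. A term query for "@gmail.com" is not analyzed and finds nothing, because the standard tokenizer never indexes the "@".
POST myindex/_search
{
  "query": {
    "match": { "email": "@gmail.com" }
  }
}
  • Match all telephone numbers starting with "136". A match query also strips punctuation, so "136-" behaves the same way.
POST myindex/_search
{
  "query": {
    "match": { "phone": "136" }
  }
}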
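The difference between term and match queries can be reproduced with a toy in-memory inverted index. This Python sketch is an illustration under stated assumptions, not Elasticsearch itself. MiniIndex and the analyzer functions are hypothetical. The email tokenizer is a rough regex approximation of the standard tokenizer. Match is modeled with its default OR operator.

```python
import re
from collections import defaultdict
from typing import Callable

Analyzer = Callable[[str], list[str]]


class MiniIndex:
    """Toy inverted index for one field with separate index/search analyzers."""

    def __init__(self, index_analyzer: Analyzer, search_analyzer: Analyzer):
        self.index_analyzer = index_analyzer
        self.search_analyzer = search_analyzer
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, doc_id: int, value: str) -> None:
        for token in self.index_analyzer(value):
            self.postings[token].add(doc_id)

    def term(self, value: str) -> set[int]:
        # Like a "term" query: the value is looked up verbatim, unanalyzed.
        return set(self.postings.get(value, set()))

    def match(self, text: str) -> set[int]:
        # Like a "match" query with the default OR operator: run the search
        # analyzer, then union the postings of every resulting token.
        hits: set[int] = set()
        for token in self.search_analyzer(text):
            hits |= self.postings.get(token, set())
        return hits


def email_tokens(text: str) -> list[str]:
    # ASSUMPTION: approximates standard tokenizer + lowercase for simple
    # emails. "@" splits tokens and is dropped; "." inside a token is kept.
    return [t.strip(".") for t in re.findall(r"[\w.]+", text.lower()) if t.strip(".")]


def email_index_analyzer(text: str) -> list[str]:
    # n-grams of each token, min_gram 1, max_gram 20
    return [tok[i:i + n] for tok in email_tokens(text)
            for n in range(1, 21) for i in range(len(tok) - n + 1)]


def phone_index_analyzer(text: str) -> list[str]:
    digits = re.sub(r"\D+", "", text)  # digit_only char filter
    return [digits[:n] for n in range(1, min(15, len(digits)) + 1)]  # edge n-grams


def phone_search_analyzer(text: str) -> list[str]:
    digits = re.sub(r"\D+", "", text)  # digit_only char filter
    return [digits] if digits else []  # keyword tokenizer: one token


emails = MiniIndex(email_index_analyzer, email_tokens)
phones = MiniIndex(phone_index_analyzer, phone_search_analyzer)
for doc_id, (email, phone) in enumerate([
    ("Alice@Gmail.com", "136-1234-5678"),
    ("bob@example.org", "137 0000 1111"),
    ("carol.w@gmail.com", "(136) 9999 0000"),
]):
    emails.add(doc_id, email)
    phones.add(doc_id, phone)

print(emails.term("@gmail.com"))   # set()
print(emails.match("@gmail.com"))  # {0, 2}
print(phones.term("136-"))         # set()
print(phones.match("136-"))        # {0, 2}
```

A term query skips analysis, so "@gmail.com" and "136-" are looked up verbatim and miss. A match query normalizes them with the search analyzer first and then finds the stored n-grams.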

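With these analyzers the expensive work happens at index time. Every email token is stored as all of its 1- to 20-character n-grams, and every phone number as all of its digit prefixes, so each search is a cheap lookup of a single token. The trade-offs are a larger index and matching that is substring-based (email) or prefix-based (phone) rather than edit-distance fuzzy. For example, the email query above also matches any address whose domain merely contains gmail.com, such as user@notgmail.com.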
