
How to Perform Fuzzy Matching of Email Addresses and Telephone Numbers Using Elasticsearch?

Linda Hamilton
2024-11-01 05:33:27


Fuzzy Matching Email or Telephone Using Elasticsearch

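Elasticsearch has no dedicated email or phone matcher. Instead, its term-level queries (wildcard, prefix and fuzzy) and its n-gram analyzers can be used to find partial or approximate matches for email addresses and telephone numbers.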

Email Matching

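To match email addresses ending with a specific domain (e.g., @gmail.com), use a wildcard query. A term query does not work here: it looks for an exact value and does not interpret patterns, so ".*@gmail.com" would only match that literal string.

<code class="json">{
    "query": {
        "wildcard": {
            "email": "*@gmail.com"
        }
    }
}</code>

Because the pattern begins with *, Elasticsearch has to check every term in the field, so this query can be slow on large indices.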
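The wildcard and prefix queries in this article compare a pattern against the whole stored value. That assumes email and tel are keyword fields rather than analyzed text. A minimal index mapping under that assumption could look like this. The field names come from the examples; the rest is a sketch.

<code class="json">{
  "mappings": {
    "properties": {
      "email": { "type": "keyword" },
      "tel": { "type": "keyword" }
    }
  }
}</code>

On a text field, the standard analyzer would split "sales@gmail.com" into the separate terms "sales" and "gmail.com". A pattern such as "*@gmail.com" would then match neither of them.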

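To match emails that begin with a specific string, such as sales@, use a wildcard query with a trailing *. A match query cannot do this. It passes the text through the field's analyzer and does not treat * as a wildcard, and its operator option only controls how multiple search terms are combined.

<code class="json">{
    "query": {
        "wildcard": {
            "email": "sales@*"
        }
    }
}</code>

To match a string anywhere in the address, put wildcards on both sides (for example, "*sales*"). This carries the same cost as a leading wildcard.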
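Keyword fields are case-sensitive, so "*@gmail.com" will not match an address stored as "Sales@Gmail.com". On Elasticsearch 7.10 or later, the wildcard query accepts a case_insensitive flag. Here is a sketch of the domain search using it:

<code class="json">{
    "query": {
        "wildcard": {
            "email": {
                "value": "*@gmail.com",
                "case_insensitive": true
            }
        }
    }
}</code>

On older versions, a simple alternative is to lowercase addresses before indexing and to lowercase the search pattern as well.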

Telephone Matching

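To match telephone numbers by their leading digits, use a prefix query. The prefix value is taken literally and must not end with *. A value of "136*" would only find numbers that actually begin with the characters "136*".

<code class="json">{
    "query": {
        "prefix": {
            "tel": "136"
        }
    }
}</code>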

This will match all phone numbers starting with "136".
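A prefix query only matches exact leading digits. To tolerate a mistyped digit, use the fuzzy query, which matches terms within a given edit distance. In this sketch, 13612345678 is a made-up example number:

<code class="json">{
    "query": {
        "fuzzy": {
            "tel": {
                "value": "13612345678",
                "fuzziness": 1
            }
        }
    }
}</code>

With fuzziness set to 1, a stored number still matches if it differs by one substituted, inserted, deleted or transposed digit. The maximum edit distance Elasticsearch supports is 2.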

Performance Optimization

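Wildcard queries, especially those with a leading *, have to scan the field's term dictionary, so they slow down as the index grows. A faster approach is to do the work at index time with a custom analyzer built on n-gram or edge n-gram tokenization. Each value is broken into short overlapping fragments. For example, with n-grams of 3 characters or more, "gmail" yields "gma", "mai", "ail", "gmai", "mail" and "gmail". A partial string can then be found with an ordinary match query. The cost is a larger index, since every fragment is stored. Edge n-grams keep only the leading fragments, which suits prefix searches such as phone numbers.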
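To see the fragments an n-gram filter produces, you can call the _analyze API without creating an index. Below is a sketch of a request body for POST /_analyze. It uses the built-in standard tokenizer and an inline ngram filter with illustrative lengths of 3 and 4:

<code class="json">{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    { "type": "ngram", "min_gram": 3, "max_gram": 4 }
  ],
  "text": "Gmail"
}</code>

The response should list the fragments gma, gmai, mai, mail and ail. A partial search term such as "mail" can therefore match the stored value directly.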

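Email Analyzer Configuration:

Since Elasticsearch 7.0, the index setting index.max_ngram_diff defaults to 1. A min_gram of 3 and a max_gram of 20 (a difference of 17) is therefore rejected unless the limit is raised in the same settings block:

<code class="json">{
  "settings": {
    "index": {
      "max_ngram_diff": 17
    },
    "analysis": {
      "analyzer": {
        "email_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "name_ngram_filter",
            "trim"
          ]
        }
      },
      "filter": {
        "name_ngram_filter": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 20
        }
      }
    }
  }
}</code>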
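An analyzer only takes effect once it is attached to a field. Below is a sketch of the mapping, sent in the same index-creation request as the settings above. It sets search_analyzer to the built-in standard analyzer so that the search text is not itself split into n-grams. Otherwise, almost any three-letter overlap would count as a hit.

<code class="json">{
  "mappings": {
    "properties": {
      "email": {
        "type": "text",
        "analyzer": "email_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}</code>

With this mapping, a plain match query for "gmail" finds addresses such as "sales@gmail.com". This works because "gmail" is one of the fragments stored for the token "gmail.com".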

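Telephone Analyzer Configuration:

The char filter strips every non-digit character, so "136-1234-5678" is indexed as "13612345678". The edge n-gram tokenizer then emits the leading fragments of 3 to 15 digits. In JSON, the regular expression must be written as \\D+, because a lone \D is not a valid JSON escape. The tokenizer type is spelled edge_ngram; the older camel-case edgeNGram name is deprecated.

<code class="json">{
  "settings": {
    "analysis": {
      "analyzer": {
        "phone_analyzer": {
          "type": "custom",
          "char_filter": [
            "digit_only"
          ],
          "tokenizer": "digit_edge_ngram_tokenizer",
          "filter": [
            "trim"
          ]
        }
      },
      "char_filter": {
        "digit_only": {
          "type": "pattern_replace",
          "pattern": "\\D+",
          "replacement": ""
        }
      },
      "tokenizer": {
        "digit_edge_ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 3,
          "max_gram": 15,
          "token_chars": [
            "digit"
          ]
        }
      }
    }
  }
}</code>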
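phone_analyzer must also be attached to a field, and the search side should not produce edge n-grams of its own. Otherwise a query for "1369" would also be reduced to "136" and match every number starting with 136. The sketch below uses the built-in keyword analyzer at search time. It assumes the search text is already digits only, since the keyword analyzer does not strip separators.

<code class="json">{
  "mappings": {
    "properties": {
      "tel": {
        "type": "text",
        "analyzer": "phone_analyzer",
        "search_analyzer": "keyword"
      }
    }
  }
}</code>

With this mapping, a match query for "1361" returns numbers that begin with 1361. Queries shorter than min_gram (3 digits) match nothing.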

