Elasticsearch supports partial ("fuzzy") matching of email addresses and telephone numbers through its term-level pattern queries, such as regexp, wildcard, and prefix.
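To match email addresses ending with a specific domain (e.g., @gmail.com), use a regexp query. A term query only matches exact values and does not interpret patterns. The pattern is matched against whole indexed terms, so this assumes email is a keyword field; the @ and . characters are escaped so that they match literally:
<code class="json">{ "query": { "regexp": { "email": { "value": ".*\\@gmail\\.com" } } } }</code>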
Or, to match emails containing a specific string, use a wildcard query (a match query does not expand the * character):
<code class="json">{ "query": { "wildcard": { "email": { "value": "*sales@*" } } } }</code>
For prefix matching of telephone numbers, use a prefix query. The value is the literal prefix, with no trailing wildcard:
<code class="json">{ "query": { "prefix": { "tel": { "value": "136" } } } }</code>
This will match all phone numbers starting with "136".
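The three query shapes above can also be built programmatically. The following is a minimal Python sketch, not an official client API: the helper names (escape_lucene_regex, email_domain_query, email_contains_query, phone_prefix_query) are hypothetical, and the field names email and tel are taken from the examples above and assumed to be keyword fields. The helpers only produce request bodies as dictionaries; sending them to a cluster is left to whichever client you use.

```python
import json

# Characters that are reserved in Lucene regular expressions, including the
# optional operators (# @ & < > ~) that may be enabled for regexp queries.
LUCENE_REGEX_RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def escape_lucene_regex(text):
    """Backslash-escape reserved characters so they match literally."""
    return "".join("\\" + ch if ch in LUCENE_REGEX_RESERVED else ch for ch in text)


def email_domain_query(domain, field="email"):
    """regexp query: values of `field` that end with '@<domain>'."""
    pattern = ".*" + escape_lucene_regex("@" + domain)
    return {"query": {"regexp": {field: {"value": pattern}}}}


def email_contains_query(fragment, field="email"):
    """wildcard query: values of `field` that contain `fragment`."""
    # Assumption for this sketch: the fragment itself holds no wildcards.
    if "*" in fragment or "?" in fragment:
        raise ValueError("fragment must not contain wildcard characters")
    return {"query": {"wildcard": {field: {"value": "*" + fragment + "*"}}}}


def phone_prefix_query(prefix, field="tel"):
    """prefix query: values of `field` that start with `prefix`."""
    return {"query": {"prefix": {field: {"value": prefix}}}}


if __name__ == "__main__":
    for body in (
        email_domain_query("gmail.com"),
        email_contains_query("sales@"),
        phone_prefix_query("136"),
    ):
        print(json.dumps(body))
```

Escaping the user-supplied domain keeps characters such as . and @ from being read as regular-expression operators, which would otherwise widen the match.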
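Pattern queries, especially those with a leading wildcard, have to scan many indexed terms and become slow on large indexes. To improve performance, consider using custom analyzers that leverage n-gram or edge n-gram tokenization. These break the text into short overlapping tokens at index time, so a partial match becomes an ordinary term lookup instead of a scan over the terms in the field.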
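To make the tokenization concrete, here is a small Python sketch that imitates what n-gram and edge n-gram tokenization produce for a single word. It is an illustration only: the function names ngrams and edge_ngrams are hypothetical, and the order in which Elasticsearch emits tokens may differ from the order shown here.

```python
def ngrams(text, min_gram, max_gram):
    """Return every substring of `text` whose length is min_gram..max_gram."""
    tokens = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(text):
                break  # longer grams would run past the end of the text
            tokens.append(text[start:start + size])
    return tokens


def edge_ngrams(text, min_gram, max_gram):
    """Return only the grams anchored at the start of `text` (its prefixes)."""
    longest = min(max_gram, len(text))
    return [text[:size] for size in range(min_gram, longest + 1)]


if __name__ == "__main__":
    print(ngrams("gmail", 3, 4))
    print(edge_ngrams("13612345678", 3, 5))
```

An n-gram index supports "contains" style lookups at the cost of many more tokens, while edge n-grams only support "starts with" lookups and stay much smaller, which is why the phone configuration uses the latter.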
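Email Analyzer Configuration (on Elasticsearch 7 and later, index.max_ngram_diff must be raised because the allowed gap between min_gram and max_gram defaults to 1):
<code class="json">{ "settings": { "index": { "max_ngram_diff": 17 }, "analysis": { "analyzer": { "email_analyzer": { "type": "custom", "tokenizer": "standard", "filter": [ "lowercase", "name_ngram_filter", "trim" ] } }, "filter": { "name_ngram_filter": { "type": "ngram", "min_gram": 3, "max_gram": 20 } } } } }</code>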
Telephone Analyzer Configuration:
<code class="json">{ "settings": { "analysis": { "analyzer": { "phone_analyzer": { "type": "custom", "char_filter": [ "digit_only" ], "tokenizer": "digit_edge_ngram_tokenizer", "filter": [ "trim" ] } }, "char_filter": { "digit_only": { "type": "pattern_replace", "pattern": "\\D+", "replacement": "" } }, "tokenizer": { "digit_edge_ngram_tokenizer": { "type": "edge_ngram", "min_gram": 3, "max_gram": 15, "token_chars": [ "digit" ] } } } } }</code>
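The effect of the telephone analyzer can be previewed without a cluster. The sketch below imitates its two stages in plain Python: the digit_only character filter (strip every non-digit) followed by edge n-gram tokenization with min_gram 3 and max_gram 15. The function name phone_analyzer_tokens is hypothetical, and this is an approximation of the analyzer's output rather than Elasticsearch's own implementation.

```python
import re

MIN_GRAM = 3   # mirrors "min_gram" in the tokenizer configuration
MAX_GRAM = 15  # mirrors "max_gram" in the tokenizer configuration


def phone_analyzer_tokens(raw):
    """Approximate the tokens the phone analyzer would index for `raw`."""
    # Stage 1: the pattern_replace char filter removes all non-digit runs.
    digits = re.sub(r"\D+", "", raw)
    # Stage 2: the edge n-gram tokenizer emits prefixes of the digit string.
    longest = min(MAX_GRAM, len(digits))
    return [digits[:size] for size in range(MIN_GRAM, longest + 1)]


if __name__ == "__main__":
    print(phone_analyzer_tokens("136-1234-5678"))
```

With these tokens indexed, a search for 136 becomes an exact token match. A search string passed through the same analyzer is itself expanded into prefixes, so setups like this are commonly paired with a simpler search-time analyzer; that choice is not part of the configuration shown above.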