
Some options when creating an Elasticsearch index

WBOY | Original | 2016-07-06 13:53:08 | 1260 views

I want to use Elasticsearch to build site search for the articles on my blog. The backend is written in PHP.

The articles table has the following fields:

<code>id     title     content     user_id    created_at     updated_at</code>

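I want to index three of these fields: title, content, and updated_at.

Below is a demo I wrote, based on the official elasticsearch-php client documentation, that creates the index blog and the type article. Word segmentation uses the IK analyzer.

Some of the options are not clear to me. My four specific questions are in the comments of the code below. I would appreciate any help, thanks.

Official documentation: https://www.elastic.co/guide/en/elasticsearch/client/php-api/current/_index_management_operations.html#_create_an_index_advanced_example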

<code>        $params = [
            'index' => 'blog',
            'body' => [
                'settings' => [
                    'number_of_shards' => 1,
                    'number_of_replicas' => 0,
                    'analysis' => [
                        'filter' => [
                            // 1. Should the two "shingle"s here be changed to "article"?
                            'shingle' => [
                                'type' => 'shingle'
                            ]
                        ],

                        // 2. What do the entries in char_filter mean, including pre_negs and post_negs?
                        'char_filter' => [

                            'pre_negs' => [
                                'type' => 'pattern_replace',
                                'pattern' => '(\\w+)\\s+((?i:never|no|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|cant|couldnt|shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint))\\b',
                                'replacement' => '~$1 $2'
                            ],
                            'post_negs' => [
                                'type' => 'pattern_replace',
                                'pattern' => '\\b((?i:never|no|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|cant|couldnt|shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint))\\s+(\\w+)',
                                'replacement' => '$1 ~$2'
                            ]
                        ],

                        // 3. Does the analyzer need to be changed in any way?
                        'analyzer' => [
                            'blog' => [
                                'type' => 'custom',
                                'tokenizer' => 'standard',
                                'filter' => ['lowercase', 'stop', 'kstem']
                            ]
                        ]
                    ]
                ],
                'mappings' => [
                    'article' => [
                        "_all" => [
                            "analyzer" => "ik_max_word",
                            "search_analyzer" => "ik_max_word",
                            "term_vector" => "no",
                            "store" => "false"
                        ],
                        'properties' => [
                            'title' => [
                                'type' => 'string',
                                'store' => 'no',
                                'term_vector' => 'with_positions_offsets',
                                'analyzer' => 'ik_max_word',
                                'search_analyzer' => 'ik_max_word',
                                'include_in_all' => 'true',
                                'boost' => 9
                            ],
                            'content' => [
                                'type' => 'string',
                                'store' => 'no',
                                'term_vector' => 'with_positions_offsets',
                                'analyzer' => 'ik_max_word',
                                'search_analyzer' => 'ik_max_word',
                                'include_in_all' => 'true',
                                'boost' => 8
                            ],
                            // 4. The time field is only used for sorting search results. How should the options below be filled in?
                            'updated_at' => [
                                'type' => '',
                                'store' => '',
                                'term_vector' => '',
                                'analyzer' => '',
                                'search_analyzer' => '',
                                'include_in_all' => '',
                                'boost' => ''
                            ]
                        ]
                    ]


                ]
            ]
        ];
        $client->indices()->create($params);</code>
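About question 2: pre_negs and post_negs are pattern_replace character filters, which rewrite the raw text with a regular expression before it reaches the tokenizer. pre_negs puts a "~" in front of the word that comes directly before one of the listed English negation words (never, not, dont, and so on), and post_negs does the same for the word directly after one. As written, no analyzer in these settings lists them under a char_filter key, so they do not take part in indexing at all. The sketch below imitates the two filters with PHP's preg_replace so their effect can be tried without a cluster. It is an approximation: Elasticsearch uses Java regular expressions, although these particular patterns behave the same in PCRE. The function names preNegs and postNegs and the constant NEG_WORDS are hypothetical.

```php
<?php
// Sketch: approximates the question's two pattern_replace char filters
// with preg_replace (PCRE instead of Elasticsearch's Java regex).

// The negation words listed in both filters.
const NEG_WORDS = 'never|no|nothing|nowhere|noone|none|not|havent|hasnt|hadnt|cant|couldnt|shouldnt|wont|wouldnt|dont|doesnt|didnt|isnt|arent|aint';

// pre_negs: prefixes "~" to the word BEFORE a negation word.
function preNegs(string $text): string
{
    return preg_replace('/(\w+)\s+((?i:' . NEG_WORDS . '))\b/', '~$1 $2', $text);
}

// post_negs: prefixes "~" to the word AFTER a negation word.
function postNegs(string $text): string
{
    return preg_replace('/\b((?i:' . NEG_WORDS . '))\s+(\w+)/', '$1 ~$2', $text);
}

$text = 'this is not good';
echo preNegs($text), PHP_EOL;           // this ~is not good
echo postNegs($text), PHP_EOL;          // this is not ~good
echo postNegs(preNegs($text)), PHP_EOL; // this ~is not ~good
```

For an analyzer to apply them, it would list them as 'char_filter' => ['pre_negs', 'post_negs']. Either way they only match the listed English words, so Chinese text passes through unchanged.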

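About question 4: one possible mapping, sketched under two assumptions: Elasticsearch 2.x (the version whose string type, _all field and include_in_all option the mapping above relies on), and updated_at values stored as MySQL DATETIME strings such as 2016-07-06 13:53:08. A field used only for sorting can be mapped as date; analyzer, search_analyzer and term_vector apply to analyzed text, so they are left out. The format string, the example search body, and the variable names $updatedAtMapping and $searchParams are illustrative, not taken from the question.

```php
<?php
// Sketch, assuming Elasticsearch 2.x and MySQL DATETIME values
// such as "2016-07-06 13:53:08". Only the updated_at entry is shown.
$updatedAtMapping = [
    'updated_at' => [
        'type'           => 'date',
        // Joda-Time pattern for MySQL DATETIME; epoch_millis additionally
        // accepts millisecond timestamps.
        'format'         => 'yyyy-MM-dd HH:mm:ss||epoch_millis',
        // Keep the timestamp out of the _all field, since it is not searched.
        'include_in_all' => false,
    ],
];

// Hypothetical search request: full-text match on title and content,
// newest articles first.
$searchParams = [
    'index' => 'blog',
    'type'  => 'article',
    'body'  => [
        'query' => [
            'multi_match' => [
                'query'  => 'elasticsearch php', // placeholder keywords
                'fields' => ['title', 'content'],
            ],
        ],
        'sort' => [
            ['updated_at' => ['order' => 'desc']],
        ],
    ],
];

// With a configured client: $response = $client->search($searchParams);
echo json_encode($searchParams['body'], JSON_PRETTY_PRINT), PHP_EOL;
```

The updated_at entry would take the place of the one under properties in the article mapping above, and the search array would be passed to $client->search().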