Elasticsearch Version: 5.4
Elasticsearch Quick Start Part 1: Getting Started with Elasticsearch
Elasticsearch Quick Start Part 2: Elasticsearch and Kibana installation
Elasticsearch Quick Start Part 3: Elasticsearch index and document operations
Elasticsearch Quick Start Part 4: Elasticsearch document query
Elasticsearch is a highly scalable open-source full-text search and analytics engine. It can store, search, and analyze large volumes of data quickly and in near real time. It is generally used as the underlying engine/technology that powers applications with complex search features and requirements.
Elasticsearch can be used in these places:
Suppose you run an online store and want to let customers search the products you sell. In this case, you can use Elasticsearch to store your entire product catalog and inventory, provide search, and offer autocomplete suggestions as customers type.
Suppose you want to collect log or transaction data and analyze it to find trends, statistics, summaries, or anomalies. In this case, you can use Logstash (part of the Elasticsearch/Logstash/Kibana stack) to collect, aggregate, and parse your data, and then have Logstash feed this data into Elasticsearch. Once the data is in Elasticsearch, you can run searches and aggregations to mine out the information you are interested in.
Suppose you run a price alerting platform that lets price-savvy customers specify a rule such as "I am interested in buying a specific electronic gadget, and I want to be notified if any vendor's price drops below $X within the next month". In this case, you can push the vendors' prices into Elasticsearch, use its reverse-search (percolation) capability to match price changes against customer queries, and notify customers once a match is found.
Suppose you have analytics (business intelligence) needs and want to quickly investigate, analyze, visualize, and ask ad-hoc questions of large amounts of data (think millions or billions of records). In this case, you can use Elasticsearch to store the data and then use Kibana (part of the Elasticsearch stack) to build custom dashboards that visualize the aspects of the data that matter to you. You can also use Elasticsearch's aggregation functionality to run complex business intelligence queries against the data.
For the rest of this tutorial, I will walk you through getting Elasticsearch up and running and show you some basic operations such as indexing, searching, and modifying data. By the end, you should have a good understanding of what Elasticsearch is and how it works, and hopefully be inspired to use it both to build sophisticated search applications and to mine useful insights from your data.
Basic Concepts
There are some concepts that are the core of Elasticsearch . Understanding these concepts from the beginning will greatly aid later learning.
Near Real Time (NRT)
Elasticsearch is a near-real-time search platform. This means there is only a slight delay (usually one second) between the time a document is indexed and the time it becomes searchable.
Cluster
A cluster is a collection of one or more nodes (servers) that together hold all of the data and provide federated indexing and search capabilities across all nodes. A cluster is identified by a unique name, which defaults to "elasticsearch". The name matters: a node can only belong to one cluster, and it joins a cluster by that cluster's name.
Do not reuse the same cluster name across different environments, or nodes may end up joining the wrong cluster. For example, you could use the cluster names logging-dev, logging-stage, and logging-prod for your development, staging, and production environments respectively.
Note that a cluster with only a single node is perfectly valid. You can also run multiple independent clusters, each with its own unique cluster name.
Node
A node is a single server that is part of a cluster, stores data, and participates in the cluster's indexing and search operations. Like a cluster, a node is identified by a name, which by default is a random UUID (Universally Unique IDentifier) assigned to the node at startup. You can define your own node names if you do not want the defaults. Names matter for administration, because they let you identify which servers on your network correspond to which nodes in the cluster.
A node can join a specific cluster by configuring the cluster name. By default, nodes join a cluster named elasticsearch, which means that if you start a number of nodes on your network and they can communicate with each other, they will all automatically form a single cluster named elasticsearch.
Index
An index is a collection of documents with somewhat similar characteristics, for example an index of customer data, an index of a product catalog, and an index of order data. An index is identified by a name (which must be all lowercase) that is used when indexing, searching, updating, and deleting its documents. Within a single cluster, you can define as many indexes as you need.
Type
An index can define one or more types. A type is a logical category/partition of your index, whose semantics are entirely up to you. Typically, a type is defined for documents that share a common set of fields. For example, a blogging platform might store all of its data in a single index, and within that index define a type for user data, a type for blog data, and a type for comment data.
Document
A document is the basic unit of information that can be indexed. For example, you can have one document for a single customer, another for a single product, and yet another for a single order. Documents are expressed in JSON. An index/type can store a large number of documents. Note that although a document physically resides in an index, it must be indexed/assigned to a type within that index.
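For example, a customer document might look like the following. Elasticsearch imposes no fixed schema here, so the field names below are purely illustrative:

```python
import json

# A hypothetical customer document; the field names are illustrative,
# not mandated by Elasticsearch.
customer = {
    "name": "John Doe",
    "email": "john@example.com",
    "joined": "2017-05-04",
    "orders": 12,
}

# Documents are sent to and returned from Elasticsearch as JSON.
body = json.dumps(customer)
print(body)
```

The same JSON body is what you would PUT or POST to an index/type endpoint when indexing the document.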
Shards & replicas
An index may store a massive amount of data that exceeds the hardware limits of a single node. For example, an index holding 1 billion documents and occupying 1 TB of disk space may not fit on a single node's disk, or the single node may be too slow to serve search requests against it.
To solve this problem, Elasticsearch provides the ability to subdivide an index into multiple pieces called shards. When you create an index, you can simply define the number of shards you want. Each shard is itself a fully functional index that can be hosted on any node in the cluster.
Sharding is important for two main reasons:
It allows you to horizontally split/scale your content volume.
It allows you to distribute operations to shards on multiple nodes in parallel, thereby improving performance or throughput.
The mechanics of how shards are distributed, and how their documents are aggregated back into search results, are completely managed by Elasticsearch and are transparent to the user.
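As a sketch of how a document ends up on a particular shard: Elasticsearch routes a document by hashing its routing value (the document ID by default) and taking the result modulo the number of primary shards. The real implementation uses Murmur3; the sketch below substitutes Python's zlib.crc32 as a stand-in hash, so it only illustrates the idea, not the exact placement Elasticsearch would compute.

```python
# Illustrative sketch of shard routing. Elasticsearch computes roughly
# murmur3(routing) % number_of_primary_shards; crc32 here is a stand-in
# hash used for demonstration only.
import zlib

def route_to_shard(doc_id: str, num_primary_shards: int) -> int:
    """Return the primary shard a document would land on (sketch)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

# The same ID always routes to the same shard, and distinct IDs
# spread across the available shards.
shards = {route_to_shard(f"doc-{i}", 5) for i in range(100)}
print(sorted(shards))
```

This is also why the number of primary shards cannot be changed after index creation: changing the modulus would invalidate the placement of every existing document.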
In a network/cloud environment where failures can occur at any time, it is very useful to have a failover mechanism in case a shard or node goes offline or disappears for any reason. To this end, Elasticsearch allows you to make one or more copies of an index's shards; these are called replica shards, or simply replicas.
Replicas are important for two main reasons:
It provides high availability in case a shard/node fails. For this reason, note that a replica shard is never allocated on the same node as the original/primary shard it was copied from.
It allows you to scale search volume/throughput since searches can be performed in parallel on all replicas.
To summarize, each index can be split into multiple shards. An index can also be replicated zero times (meaning no replicas) or more times. Once replicated, each index will have primary shards (the original shards that were replicated) and replica shards (the copies of the primary shards). The number of shards and replicas can be defined per index at the time the index is created. After the index is created, you may dynamically change the number of replicas at any time, but you cannot change the number of shards after the fact.
By default, each index is allocated 5 primary shards and 1 replica, which means that if you have at least two nodes in your cluster, your index will have 5 primary shards and 5 replica shards, for a total of 10 shards.
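The arithmetic above can be checked directly: each primary shard is copied once per configured replica, so the total shard count is primaries times one plus the replica count.

```python
def total_shards(primaries: int, replicas_per_primary: int) -> int:
    """Total shards: each primary plus its configured number of replica copies."""
    return primaries * (1 + replicas_per_primary)

# Elasticsearch 5.x defaults: 5 primary shards, 1 replica per primary.
print(total_shards(5, 1))  # 10 shards: 5 primary + 5 replica
```

Raising the replica count later (say to 2 per primary) would yield 15 shards for the same index, without touching the primaries.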
Each Elasticsearch shard is a Lucene index, and a single Lucene index can hold a limited number of documents: as of LUCENE-5843, at most 2,147,483,519 (= Integer.MAX_VALUE - 128). You can use the _cat/shards API to monitor shard sizes.
Summary
1. Why not use a relational database for search? Because implementing search on top of a relational database performs poorly and cannot do tokenized (word-segmented) full-text search.
2. What are full-text search, an inverted index, and Lucene? These have been covered before; see "[Teaching you step-by-step full-text retrieval] A preliminary exploration of Apache Lucene".
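As a toy illustration of the inverted index idea behind Lucene: instead of scanning every document for a term, the index maps each term to the set of documents containing it. A real implementation also handles tokenization, stemming, relevance scoring, and much more.

```python
from collections import defaultdict

# Toy corpus: document ID -> text.
docs = {
    1: "elasticsearch is a search engine",
    2: "lucene powers elasticsearch",
    3: "relational databases are not search engines",
}

# Inverted index: term -> set of document IDs containing that term.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

# Looking up a term is now a dictionary access, not a full scan.
print(sorted(index["search"]))         # [1, 3]
print(sorted(index["elasticsearch"]))  # [1, 2]
```

This lookup-by-term structure is what makes full-text queries fast even over very large document collections.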
3. Characteristics of Elasticsearch
It can run as a distributed cluster and process massive amounts of data in near real time;
It works out of the box and is simple to use; when the data volume is modest, operating it is not complicated;
It has capabilities that relational databases lack, such as full-text search, synonym handling, relevance ranking, complex data analysis, and near-real-time processing of massive data;
It is based on Lucene but hides the complexity, exposing a simple and easy-to-use RESTful API as well as a Java API.
4. The core concepts of Elasticsearch
Cluster: The cluster contains multiple nodes, and which cluster each node belongs to is determined by configuration (the default is elasticsearch)
Node: a node in the cluster. By default a node automatically joins the cluster named "elasticsearch". Each Elasticsearch service instance is one node; for example, if one machine starts two es services, that machine has two nodes.
Index: an index, equivalent to a MySQL database; it holds a collection of documents with a similar structure.
Type: a type, equivalent to a MySQL table; a logical classification of data within an index.
Document: a document, equivalent to a row in a MySQL table; the smallest unit of data in es.
shard: a shard. A single machine cannot store huge amounts of data, so es can split the data of one index into multiple shards and distribute them across multiple servers.
replica: a replica, i.e. a copy of a shard, kept to guard against downtime and shard loss; the minimum high-availability configuration is 2 servers.
The above is the detailed content of What is Elasticsearch? Where can Elasticsearch be used?. For more information, please follow other related articles on the PHP Chinese website!
