Deploying Elasticsearch on Memory-Backed Storage: 100M+ Documents, Full-Text Search in 100 ms

WBOY | Original | 2024-06-07 11:11:48

1. Mount a memory-backed storage directory on the host

Create the directory to mount:

mkdir /mnt/memory_storage

Mount a tmpfs file system:

mount -t tmpfs -o size=800G tmpfs /mnt/memory_storage

tmpfs allocates space on demand: storing 100G of data uses only 100G of memory. The host node has 2T of memory, and 800G of it is allocated here for Elasticsearch data.

Create the data directories in advance:

mkdir /mnt/memory_storage/elasticsearch-data-es-jfs-prod-es-default-0
mkdir /mnt/memory_storage/elasticsearch-data-es-jfs-prod-es-default-1
mkdir /mnt/memory_storage/elasticsearch-data-es-jfs-prod-es-default-2

If these directories are not created in advance and given read and write permissions, the Elasticsearch components cannot start, and multiple nodes end up using the same data directory.
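The directory setup above can be scripted when the cluster has more than a few nodes. This is a minimal sketch that assumes the directory naming pattern shown above; the function name create_data_dirs and its arguments are illustrative and not part of the original setup.

```shell
# create_data_dirs BASE COUNT
# Creates one data directory per Elasticsearch pod under BASE.
# The name prefix mirrors the PVC names used in this article.
create_data_dirs() {
  base="$1"
  count="$2"
  i=0
  while [ "$i" -lt "$count" ]; do
    mkdir -p "${base}/elasticsearch-data-es-jfs-prod-es-default-${i}"
    i=$((i + 1))
  done
}

# Example for a 3-node cluster on the tmpfs mount:
# create_data_dirs /mnt/memory_storage 3
```

The permission step still has to be run after the directories exist.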

Set the directory permissions:

chmod -R 777 /mnt/memory_storage

Test I/O bandwidth with dd

dd if=/dev/zero of=/mnt/memory_storage/dd.txt bs=4M count=2500
2500+0 records in
2500+0 records out
10485760000 bytes (10 GB, 9.8 GiB) copied, 3.53769 s, 3.0 GB/s

Clean up the test file:

rm -rf /mnt/memory_storage/dd.txt

Test I/O bandwidth with fio

fio --name=test --filename=/mnt/memory_storage/fio_test_file --size=10G --rw=write --bs=4M --numjobs=1 --runtime=60 --time_based
Run status group 0 (all jobs):
  WRITE: bw=2942MiB/s (3085MB/s), 2942MiB/s-2942MiB/s (3085MB/s-3085MB/s), io=172GiB (185GB), run=60001-60001msec

Clean up the test file:

rm -rf /mnt/memory_storage/fio_test_file

Test memory bandwidth with mbw

mbw 10000
Long uses 8 bytes. Allocating 2*1310720000 elements = 20971520000 bytes of memory.
Using 262144 bytes as blocks for memcpy block copy test.
Getting down to business... Doing 10 runs per test.
0    Method: MEMCPY   Elapsed: 1.62143  MiB: 10000.00000  Copy: 6167.380 MiB/s
1    Method: MEMCPY   Elapsed: 1.63542  MiB: 10000.00000  Copy: 6114.656 MiB/s
2    Method: MEMCPY   Elapsed: 1.63345  MiB: 10000.00000  Copy: 6121.997 MiB/s
3    Method: MEMCPY   Elapsed: 1.63715  MiB: 10000.00000  Copy: 6108.161 MiB/s
4    Method: MEMCPY   Elapsed: 1.64429  MiB: 10000.00000  Copy: 6081.667 MiB/s
5    Method: MEMCPY   Elapsed: 1.62772  MiB: 10000.00000  Copy: 6143.574 MiB/s
6    Method: MEMCPY   Elapsed: 1.60684  MiB: 10000.00000  Copy: 6223.379 MiB/s
7    Method: MEMCPY   Elapsed: 1.62499  MiB: 10000.00000  Copy: 6153.876 MiB/s
8    Method: MEMCPY   Elapsed: 1.63967  MiB: 10000.00000  Copy: 6098.770 MiB/s
9    Method: MEMCPY   Elapsed: 2.97213  MiB: 10000.00000  Copy: 3364.588 MiB/s
AVG  Method: MEMCPY   Elapsed: 1.76431  MiB: 10000.00000  Copy: 5667.937 MiB/s
0    Method: DUMB     Elapsed: 1.01521  MiB: 10000.00000  Copy: 9850.140 MiB/s
1    Method: DUMB     Elapsed: 0.85378  MiB: 10000.00000  Copy: 11712.605 MiB/s
2    Method: DUMB     Elapsed: 0.82487  MiB: 10000.00000  Copy: 12123.167 MiB/s
3    Method: DUMB     Elapsed: 0.84520  MiB: 10000.00000  Copy: 11831.463 MiB/s
4    Method: DUMB     Elapsed: 0.83050  MiB: 10000.00000  Copy: 12040.968 MiB/s
5    Method: DUMB     Elapsed: 0.84932  MiB: 10000.00000  Copy: 11774.194 MiB/s
6    Method: DUMB     Elapsed: 0.82491  MiB: 10000.00000  Copy: 12122.505 MiB/s
7    Method: DUMB     Elapsed: 1.44235  MiB: 10000.00000  Copy: 6933.144 MiB/s
8    Method: DUMB     Elapsed: 2.68656  MiB: 10000.00000  Copy: 3722.225 MiB/s
9    Method: DUMB     Elapsed: 8.44667  MiB: 10000.00000  Copy: 1183.898 MiB/s
AVG  Method: DUMB     Elapsed: 1.86194  MiB: 10000.00000  Copy: 5370.750 MiB/s
0    Method: MCBLOCK  Elapsed: 4.52486  MiB: 10000.00000  Copy: 2210.013 MiB/s
1    Method: MCBLOCK  Elapsed: 4.82467  MiB: 10000.00000  Copy: 2072.683 MiB/s
2    Method: MCBLOCK  Elapsed: 0.84797  MiB: 10000.00000  Copy: 11792.870 MiB/s
3    Method: MCBLOCK  Elapsed: 0.84980  MiB: 10000.00000  Copy: 11767.516 MiB/s
4    Method: MCBLOCK  Elapsed: 0.87665  MiB: 10000.00000  Copy: 11407.113 MiB/s
5    Method: MCBLOCK  Elapsed: 0.85952  MiB: 10000.00000  Copy: 11634.468 MiB/s
6    Method: MCBLOCK  Elapsed: 0.84132  MiB: 10000.00000  Copy: 11886.154 MiB/s
7    Method: MCBLOCK  Elapsed: 0.84970  MiB: 10000.00000  Copy: 11768.915 MiB/s
8    Method: MCBLOCK  Elapsed: 0.86918  MiB: 10000.00000  Copy: 11505.150 MiB/s
9    Method: MCBLOCK  Elapsed: 0.85996  MiB: 10000.00000  Copy: 11628.434 MiB/s
AVG  Method: MCBLOCK  Elapsed: 1.62036  MiB: 10000.00000  Copy: 6171.467 MiB/s

When memory is mounted as a file system, its I/O bandwidth appears to reach only about half of the raw memory bandwidth.
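The "about half" estimate can be checked against the measurements above. This small sketch compares the fio write bandwidth (2942 MiB/s) with the mbw MEMCPY average (5667.937 MiB/s); the variable names are illustrative, and choosing MEMCPY as the baseline is an assumption, since the three mbw methods give different averages.

```shell
# Bandwidth figures copied from the fio and mbw runs above (MiB/s).
tmpfs_bw=2942
memcpy_bw=5667.937

# Ratio of tmpfs write bandwidth to raw memcpy bandwidth.
ratio=$(awk -v a="$tmpfs_bw" -v b="$memcpy_bw" 'BEGIN { printf "%.2f", a / b }')
echo "tmpfs / memcpy bandwidth ratio: $ratio"
```

With these inputs the ratio comes out at 0.52, which is consistent with the observation.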

2. Create PVCs in the Kubernetes cluster

Set the environment variables:

export NAMESPACE=data-center
export PVC_NAME=elasticsearch-data-es-jfs-prod-es-default-0

Create the PV and PVC:

kubectl create -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ${PVC_NAME}
  namespace: ${NAMESPACE}
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 800Gi
  hostPath:
    path: /mnt/memory_storage/${PVC_NAME}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${PVC_NAME}
  namespace: ${NAMESPACE}
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 800Gi
EOF

Change the PVC_NAME variable and repeat this step to create at least 3 PVCs. I ended up creating 20 PVCs, which together provide more than 15TB of storage.
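Repeating the PV and PVC creation by hand gets tedious with 20 volumes. As a sketch, the manifest can be wrapped in a function and applied in a loop. The function name render_pv_pvc is hypothetical, and one deliberate difference from the manifest above is that the PV carries no namespace, because PersistentVolumes are cluster-scoped.

```shell
# render_pv_pvc NAME NAMESPACE
# Prints the PV and PVC manifests from this section for one volume.
render_pv_pvc() {
  name="$1"
  namespace="$2"
  cat <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ${name}
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 800Gi
  hostPath:
    path: /mnt/memory_storage/${name}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 800Gi
EOF
}

# Example: create volumes 0..2 (requires kubectl access to the cluster):
# for i in 0 1 2; do
#   render_pv_pvc "elasticsearch-data-es-jfs-prod-es-default-${i}" "$NAMESPACE" | kubectl create -f -
# done
```

Printing the manifest first also makes it easy to review the output before anything is sent to the cluster.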

3. Deploy Elasticsearch and related components

Some details are omitted here; see Storing Elasticsearch Data in JuiceFS [1].

Deploy Elasticsearch

cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  namespace: $NAMESPACE
  name: es-jfs-prod
spec:
  version: 8.3.2
  image: hubimage/elasticsearch:8.3.2
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  nodeSets:
  - name: default
    count: 3
    config:
      node.store.allow_mmap: false
      index.store.type: niofs
    podTemplate:
      spec:
        nodeSelector:
          servertype: Ascend910B-24
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
            runAsUser: 0
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        - name: install-plugins
          command:
          - sh
          - -c
          - |
            bin/elasticsearch-plugin install --batch https://get.infini.cloud/elasticsearch/analysis-ik/8.3.2
          securityContext:
            runAsUser: 0
            runAsGroup: 0
        containers:
        - name: elasticsearch
          readinessProbe:
            exec:
              command:
              - bash
              - -c
              - /mnt/elastic-internal/scripts/readiness-probe-script.sh
            failureThreshold: 10
            initialDelaySeconds: 30
            periodSeconds: 30
            successThreshold: 1
            timeoutSeconds: 30
          env:
          - name: "ES_JAVA_OPTS"
            value: "-Xms31g -Xmx31g"
          - name: "NSS_SDB_USE_CACHE"
            value: "no"
          resources:
            requests:
              cpu: 8
              memory: 64Gi
EOF

View the Elasticsearch password:

kubectl -n $NAMESPACE get secret es-jfs-prod-es-elastic-user -o go-template='{{.data.elastic | base64decode}}'
xxx

The default username is elastic.

Deploy Metricbeat

kubectl apply -f - <<EOF
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: es-jfs-prod
  namespace: $NAMESPACE
spec:
  type: metricbeat
  version: 8.3.2
  elasticsearchRef:
    name: es-jfs-prod
  config:
    metricbeat:
      autodiscover:
        providers:
        - type: kubernetes
          scope: cluster
          hints.enabled: true
          templates:
          - config:
            - module: kubernetes
              metricsets:
              - event
              period: 10s
    processors:
    - add_cloud_metadata: {}
    logging.json: true
  deployment:
    podTemplate:
      spec:
        serviceAccountName: metricbeat
        automountServiceAccountToken: true
        # required to read /etc/beat.yml
        securityContext:
          runAsUser: 0
EOF

Deploy Kibana

cat <<EOF | kubectl apply -f -
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  namespace: $NAMESPACE
  name: es-jfs-prod
spec:
  version: 8.3.2
  count: 1
  image: hubimage/kibana:8.3.2
  elasticsearchRef:
    name: es-jfs-prod
  http:
    tls:
      selfSignedCertificate:
        disabled: true
EOF

View Elasticsearch cluster information

[Figure: Elasticsearch cluster information]

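4. Import data

Create the index

On the Dev Tools page of the Elasticsearch management UI, run:

PUT /bayou_tt_articles
{
  "settings": {
    "index": {
      "number_of_shards": 30,
      "number_of_replicas": 1,
      "refresh_interval": "120s",
      "translog.durability": "async",
      "translog.sync_interval": "120s",
      "translog.flush_threshold_size": "2048M"
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "ik_smart"
      }
    }
  }
}

Two points to note:

Keep each shard between 10 and 50G in size. The data to import here amounts to several hundred GB, so number_of_shards is set to 30.
Set the number of replicas to at least 1 so that pod data is not lost during a rolling update. When a pod's IP changes, Elasticsearch treats it as a new node and cannot reuse the old data. If no replica is available to rebuild the shard from at that point, data is lost.

Install the import tool

The elasticdump container can also be used; an example appears later in this article. Here I installed it with npm:

apt-get install npm -y
npm install elasticdump -g

Import the data

Import the data with elasticdump. The limit option sets the number of documents imported per batch. The default of 100 is too small; make it as large as possible while the import still succeeds.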
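A minimal sketch of an import, assuming elasticdump is installed as above and mirroring the flags of the export command used later in this article. The input file path is hypothetical, and the credentials and address are the same placeholders the article uses elsewhere.

```shell
# Hypothetical values: adjust the file path, credentials, and endpoint.
INPUT_FILE="${INPUT_FILE:-/data/bayou_tt_articles.json}"
ES_URL="${ES_URL:-http://elastic:xxx@x.x.x.x:31391/bayou_tt_articles}"
LIMIT="${LIMIT:-10000}"

# Build the command first so it can be reviewed before running.
import_cmd="elasticdump --limit ${LIMIT} --input=${INPUT_FILE} --output=${ES_URL} --type=data"
echo "$import_cmd"

# Uncomment to run the import once the values are correct:
# $import_cmd
```

The batch size of 10000 is only a starting point; raise or lower it depending on whether batches are accepted by the cluster.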

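View the indexing rate

[Figure: indexing rate]

The indexing rate exceeds 10,000 documents per second, and the ceiling is far higher than that: according to stress-test results in the community documentation, a single node can sustain at least 20,000 documents per second.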

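5. Testing and verification

Full-text search is significantly faster

[Figure: full-text search latency with JuiceFS storage and with SSD nodes]

The figure above shows full-text search taking 18 s on JuiceFS storage and 5 s on Elasticsearch with SSD nodes. The figure below shows full-text search on memory-backed Elasticsearch taking about 100 ms.

[Figure: full-text search latency with memory storage]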

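Updating Elasticsearch does not lose data

The Elasticsearch pods had been allocated too much CPU and memory, so I adjusted them to 32 CPU cores and 64 GB of memory. Elasticsearch stayed available throughout the rolling update, and no data was lost.

Be sure to set the number of replicas to at least 1, and avoid restarting pods manually where possible, even though pods are updated on their original nodes.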

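Nodes can be scaled out smoothly

[Figure: Elasticsearch nodes after scaling out]

Because the business needs about 10T of Elasticsearch storage in total, I went on to increase the node count to 10. Elasticsearch migrates the index shards automatically and spreads them evenly across the nodes.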
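To make the shard layout concrete, here is a rough calculation using the index settings shown earlier (30 primary shards, 1 replica) and the 10 nodes after scaling out. It assumes that the 300+ GB figure from the summary is the primary data size, which the article does not state explicitly.

```shell
primary_shards=30   # number_of_shards from the index settings
replicas=1          # number_of_replicas from the index settings
nodes=10            # node count after scaling out
data_gb=300         # assumed primary data size (300+ GB per the summary)

total_shards=$((primary_shards * (1 + replicas)))
shards_per_node=$((total_shards / nodes))
gb_per_primary=$((data_gb / primary_shards))

echo "total shard copies: $total_shards"
echo "shard copies per node: $shards_per_node"
echo "approx. GB per primary shard: $gb_per_primary"
```

Under these assumptions there are 60 shard copies, 6 per node, and roughly 10 GB per primary shard, which sits at the lower end of the 10 to 50G guideline.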

Index export reaches 10,000 documents per second

docker run --rm -ti elasticdump/elasticsearch-dump --limit 10000 --input=http://elastic:xxx@x.x.x.x:31391/bayou_tt_articles --output=/data/es-bayou_tt_articles-output.json --type=data

Wed, 29 May 2024 01:41:23 GMT | got 10000 objects from source elasticsearch (offset: 0)
Wed, 29 May 2024 01:41:23 GMT | sent 10000 objects to destination file, wrote 10000
Wed, 29 May 2024 01:41:24 GMT | got 10000 objects from source elasticsearch (offset: 10000)
Wed, 29 May 2024 01:41:24 GMT | sent 10000 objects to destination file, wrote 10000
Wed, 29 May 2024 01:41:25 GMT | got 10000 objects from source elasticsearch (offset: 20000)
Wed, 29 May 2024 01:41:25 GMT | sent 10000 objects to destination file, wrote 10000
Wed, 29 May 2024 01:41:25 GMT | got 10000 objects from source elasticsearch (offset: 30000)

Export runs at about 10,000 documents per second, so 100 million documents take about 3 h. That is basically enough for index backup and migration.
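The 3 h estimate follows directly from the observed rate. A quick check, with illustrative variable names:

```shell
total_docs=100000000   # 100 million documents
docs_per_sec=10000     # observed export rate

seconds=$((total_docs / docs_per_sec))
hours=$(awk -v s="$seconds" 'BEGIN { printf "%.1f", s / 3600 }')
echo "estimated export time: ${seconds} s (~${hours} h)"
```

This gives 10000 s, or about 2.8 h, assuming the rate stays constant for the whole export.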

Elasticsearch pods do not drift to other nodes when updated

Pod placement before the update:

NAME                                           READY   STATUS    RESTARTS      AGE   IP               NODE        NOMINATED NODE   READINESS GATES
es-jfs-prod-beat-metricbeat-7fbdd657c4-djgg6   1/1     Running   6 (32m ago)   18h   10.244.54.5      ascend-01   <none>           <none>
es-jfs-prod-es-default-0                       1/1     Running   0             28m   10.244.46.82     ascend-07   <none>           <none>
es-jfs-prod-es-default-1                       1/1     Running   0             29m   10.244.23.77     ascend-53   <none>           <none>
es-jfs-prod-es-default-2                       1/1     Running   0             31m   10.244.49.65     ascend-20   <none>           <none>
es-jfs-prod-es-default-3                       1/1     Running   0             32m   10.244.54.14     ascend-01   <none>           <none>
es-jfs-prod-es-default-4                       1/1     Running   0             34m   10.244.100.239   ascend-40   <none>           <none>
es-jfs-prod-es-default-5                       1/1     Running   0             35m   10.244.97.201    ascend-39   <none>           <none>
es-jfs-prod-es-default-6                       1/1     Running   0             37m   10.244.101.156   ascend-38   <none>           <none>
es-jfs-prod-es-default-7                       1/1     Running   0             39m   10.244.19.101    ascend-49   <none>           <none>
es-jfs-prod-es-default-8                       1/1     Running   0             40m   10.244.16.109    ascend-46   <none>           <none>
es-jfs-prod-es-default-9                       1/1     Running   0             41m   10.244.39.119    ascend-15   <none>           <none>
es-jfs-prod-kb-75f7bbd96-6tcrn                 1/1     Running   0             18h   10.244.1.164     ascend-22   <none>           <none>

Pod placement after the update:

NAME                                           READY   STATUS    RESTARTS      AGE     IP               NODE        NOMINATED NODE   READINESS GATES
es-jfs-prod-beat-metricbeat-7fbdd657c4-djgg6   1/1     Running   6 (50m ago)   18h     10.244.54.5      ascend-01   <none>           <none>
es-jfs-prod-es-default-0                       1/1     Running   0             72s     10.244.46.83     ascend-07   <none>           <none>
es-jfs-prod-es-default-1                       1/1     Running   0             2m35s   10.244.23.78     ascend-53   <none>           <none>
es-jfs-prod-es-default-2                       1/1     Running   0             3m59s   10.244.49.66     ascend-20   <none>           <none>
es-jfs-prod-es-default-3                       1/1     Running   0             5m34s   10.244.54.15     ascend-01   <none>           <none>
es-jfs-prod-es-default-4                       1/1     Running   0             7m21s   10.244.100.240   ascend-40   <none>           <none>
es-jfs-prod-es-default-5                       1/1     Running   0             8m44s   10.244.97.202    ascend-39   <none>           <none>
es-jfs-prod-es-default-6                       1/1     Running   0             10m     10.244.101.157   ascend-38   <none>           <none>
es-jfs-prod-es-default-7                       1/1     Running   0             11m     10.244.19.102    ascend-49   <none>           <none>
es-jfs-prod-es-default-8                       1/1     Running   0             13m     10.244.16.110    ascend-46   <none>           <none>
es-jfs-prod-es-default-9                       1/1     Running   0             14m     10.244.39.120    ascend-15   <none>           <none>
es-jfs-prod-kb-75f7bbd96-6tcrn                 1/1     Running   0             18h     10.244.1.164     ascend-22   <none>           <none>

This put one of my concerns to rest: if Elasticsearch pods drifted to other nodes on restart, would leftover shard data on the old nodes make memory usage grow without bound? The answer is no. The ECK Operator appears to restart each pod on its original node, and the mounted hostPath data remains valid for the new pod. Data is lost only when the host node itself reboots.
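Comparing the two listings by eye works for a dozen pods, but the check can also be automated. The sketch below assumes kubectl access to the cluster; the function names snapshot_pod_nodes and moved_pods and the snapshot file names are hypothetical.

```shell
# snapshot_pod_nodes NAMESPACE
# Prints "pod-name node-name" pairs, one per line (requires kubectl).
snapshot_pod_nodes() {
  kubectl -n "$1" get pods --no-headers \
    -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName
}

# moved_pods BEFORE_FILE AFTER_FILE
# Prints pods whose node differs between the two snapshots.
moved_pods() {
  awk 'NR == FNR { node[$1] = $2; next }
       ($1 in node) && node[$1] != $2 { print $1, node[$1], "->", $2 }' "$1" "$2"
}

# Example:
# snapshot_pod_nodes "$NAMESPACE" > before.txt
# ... perform the rolling update ...
# snapshot_pod_nodes "$NAMESPACE" > after.txt
# moved_pods before.txt after.txt   # no output means no pod changed nodes
```

Matching on pod name works here because the pods keep stable names across restarts, as the listings above show.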

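6. Summary

AI compute nodes have a lot of idle CPU and memory. Using these large-memory hosts to run short-lived, high-performance applications on memory-backed storage improves resource utilization.

This article described how to deploy Elasticsearch on hostPath-based memory storage to provide high-performance queries. In particular:

Mount memory as a directory on the host
Create hostPath-based PVCs that place data in that directory
Deploy Elasticsearch with the ECK Operator
Updating Elasticsearch does not lose data, but multiple host nodes must not be rebooted at the same time
For full-text search over 300+ GB and 100M+ documents, JuiceFS-backed storage took 18 s, SSD nodes took 5 s, and memory-backed nodes took 100 ms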

References

[1] Storing Elasticsearch Data in JuiceFS: https://www.chenshaowen.com/blog/store-elasticsearch-data-in-juicefs.html
