Backing up Elasticsearch with Curator

Short and to the point, I can't be bothered to write much...

So, let's try to back up Elasticsearch 5.0 with Curator 4.2.
What we have:
2 nodes:
1) vapp-cn1
2) vapp-cn2

The backup repository directory exists on both hosts, at
/backup/el_backup/front
The user Elasticsearch runs as has permissions on it, and elasticsearch.yml on both nodes contains:
path.repo: ["/backup/el_backup/front"]
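The post does not show how the elbackup repository itself was registered. A minimal sketch of one way to do it, assuming the standard snapshot REST API (PUT /_snapshot/<repository>) with a repository of type "fs"; the host, port, repository name and path come from this post, while the helper function names are hypothetical. Only the Python standard library is used.

```python
import json
import urllib.request

# Values taken from this post; adjust for your cluster.
ES_URL = "http://vapp-cn1:9200"
REPO = "elbackup"
LOCATION = "/backup/el_backup/front"  # must also be listed in path.repo on every node


def repo_registration_body(location: str) -> dict:
    """Hypothetical helper: body for PUT /_snapshot/<repo> describing an "fs" repository."""
    return {"type": "fs", "settings": {"location": location}}


def register_repository(es_url: str, repo: str, location: str) -> dict:
    """Hypothetical helper: register the repository and return Elasticsearch's JSON reply.

    Raises urllib.error.HTTPError on a non-2xx response.
    """
    req = urllib.request.Request(
        f"{es_url}/_snapshot/{repo}",
        data=json.dumps(repo_registration_body(location)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling register_repository(ES_URL, REPO, LOCATION) once against any node should be enough, since the repository definition lives in the cluster state; a successful call returns {"acknowledged": true}.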

Curator setup. We are backing up all indices, hence the '.*' regex filter (the description text is left over from Curator's stock snapshot example):

1. snapshot-script.yml

actions:
  1:
    action: snapshot
    description: >-
      Snapshot logstash- prefixed indices older than 1 day (based on index
      creation_date) with the default snapshot name pattern of
      'curator-%Y%m%d%H%M%S'. Wait for the snapshot to complete. Do not skip
      the repository filesystem access check. Use the other options to create
      the snapshot.
    options:
      repository: elbackup
      # Leaving name blank will result in the default 'curator-%Y%m%d%H%M%S'
      name: esmcnfront-03-02-2017-182128
      ignore_unavailable: False
      include_global_state: True
      partial: False
      wait_for_completion: True
      skip_repo_fs_check: False
      timeout_override: 21600
      continue_if_exception: False
      disable_action: False
      exclude:
    filters:
    - filtertype: pattern
      kind: regex
      value: '.*'
      exclude:
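For reference, a rough sketch of what this action asks Elasticsearch to do, written as a plain snapshot API call (PUT /_snapshot/<repository>/<snapshot>). It is not Curator's actual code; it assumes the filtered index names are passed as a comma-separated list and only maps the options that have a direct snapshot-API equivalent. It also shows strftime expansion of a name pattern (the default 'curator-%Y%m%d%H%M%S' from the config comment is such a pattern): a snapshot name must be unique within a repository, so the fixed name esmcnfront-03-02-2017-182128 can only be created once. The pattern 'esmcnfront-%d-%m-%Y-%H%M%S' and the helper names are hypothetical.

```python
from datetime import datetime
from urllib.parse import urlencode


def snapshot_name(pattern: str, when: datetime) -> str:
    """Hypothetical helper: expand a strftime-style snapshot name pattern."""
    return when.strftime(pattern)


def snapshot_request(repo: str, name: str, indices: list, options: dict) -> tuple:
    """Hypothetical helper: build (path, body) for PUT /_snapshot/<repo>/<name>.

    Curator-only options (skip_repo_fs_check, timeout_override, ...) are ignored here.
    """
    query = urlencode({"wait_for_completion": str(options["wait_for_completion"]).lower()})
    path = f"/_snapshot/{repo}/{name}?{query}"
    body = {
        "indices": ",".join(indices),
        "ignore_unavailable": options["ignore_unavailable"],
        "include_global_state": options["include_global_state"],
        "partial": options["partial"],
    }
    return path, body
```

For example, snapshot_name('esmcnfront-%d-%m-%Y-%H%M%S', datetime(2017, 2, 3, 18, 21, 28)) yields the name used in the config above.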

2. curator.yml

client:
  hosts:
    - vapp-cn1
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  aws_key:
  aws_secret_key:
  aws_region:
  ssl_no_validate: False
  http_auth:
  timeout: 30
  master_only: False

logging:
  loglevel: INFO
  logfile:
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']

Run the whole thing:

[root@vapp-cn1 curator]# curator --config ./curator.yml ./snapshot-script.yml
2017-02-03 19:53:28,656 INFO Preparing Action ID: 1, "snapshot"
2017-02-03 19:53:28,665 INFO Trying Action ID: 1, "snapshot": Snapshot logstash- prefixed indices older than 1 day (based on index creation_date) with the default snapshot name pattern of 'curator-%Y%m%d%H%M%S'. Wait for the snapshot to complete. Do not skip the repository filesystem access check. Use the other options to create the snapshot.
2017-02-03 19:53:28,730 ERROR Failed to complete action: snapshot. <class 'curator.exceptions.ActionError'>: Failed to verify all nodes have repository access: --- Got a 500 response from Elasticsearch. Error message: repository_verification_exception

Let's look at the Elasticsearch log:

[2017-02-03T19:53:28,728][WARN ][r.suppressed ] path: /_snapshot/elbackup/_verify, params: {repository=elbackup}
org.elasticsearch.transport.RemoteTransportException: [vapp-cn2.gksm.local][10.196.2.56:9300][cluster:admin/repository/verify]
Caused by: org.elasticsearch.repositories.RepositoryVerificationException: [elbackup] [[KEbls__wR_yr_ih1VCsnUw, 'RemoteTransportException[[vapp-cn1.gksm.local][10.196.2.55:9300][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[elbackup] a file written by master to the store [/backup/el_backup/front] cannot be accessed on the node [{vapp-cn1.gksm.local}{KEbls__wR_yr_ih1VCsnUw}{IKiOqQ17Tgev50PVqDyFhw}{10.196.2.55}{10.196.2.55:9300}]. This might indicate that the store [/backup/el_backup/front] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node];']]
    at org.elasticsearch.action.admin.cluster.repositories.verify.TransportVerifyRepositoryAction$1.onResponse(TransportVerifyRepositoryAction.java:74) ~[elasticsearch-5.0.0.jar:5.0.0]
    at org.elasticsearch.action.admin.cluster.repositories.verify.TransportVerifyRepositoryAction$1.onResponse(TransportVerifyRepositoryAction.java:70) ~[elasticsearch-5.0.0.jar:5.0.0]
    at org.elasticsearch.repositories.RepositoriesService$3.onResponse(RepositoriesService.java:223) ~[elasticsearch-5.0.0.jar:5.0.0]
    at org.elasticsearch.repositories.RepositoriesService$3.onResponse(RepositoriesService.java:213) ~[elasticsearch-5.0.0.jar:5.0.0]
    at org.elasticsearch.repositories.VerifyNodeRepositoryAction.finishVerification(VerifyNodeRepositoryAction.java:112) ~[elasticsearch-5.0.0.jar:5.0.0]
    at org.elasticsearch.repositories.VerifyNodeRepositoryAction$1.handleException(VerifyNodeRepositoryAction.java:103) ~[elasticsearch-5.0.0.jar:5.0.0]
    at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:954) ~[elasticsearch-5.0.0.jar:5.0.0]
    at org.elasticsearch.transport.TcpTransport.lambda$handleException$15(TcpTransport.java:1277) ~[elasticsearch-5.0.0.jar:5.0.0]
    at org.elasticsearch.threadpool.ThreadPool.lambda$static$0(ThreadPool.java:147) ~[elasticsearch-5.0.0.jar:5.0.0]
    at org.elasticsearch.transport.TcpTransport.handleException(TcpTransport.java:1275) [elasticsearch-5.0.0.jar:5.0.0]
    at org.elasticsearch.transport.TcpTransport.handlerResponseError(TcpTransport.java:1267) [elasticsearch-5.0.0.jar:5.0.0]
    at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1211) [elasticsearch-5.0.0.jar:5.0.0]
    at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74) [transport-netty4-5.0.0.jar:5.0.0]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:372) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:358) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:350) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293) [netty-codec-4.1.5.Final.jar:4.1.5.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:280) [netty-codec-4.1.5.Final.jar:4.1.5.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:396) [netty-codec-4.1.5.Final.jar:4.1.5.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248) [netty-codec-4.1.5.Final.jar:4.1.5.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:372) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:358) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:350) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:372) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:358) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:129) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:610) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:513) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:467) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:437) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:873) [netty-common-4.1.5.Final.jar:4.1.5.Final]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_65]

So the next step is to verify the repository....

[root@vapp-cn1 curator]# curl -XPOST http://vapp-cn1:9200/_snapshot/elbackup/_verify

{"error":{"root_cause":[{"type":"repository_verification_exception","reason":"[elbackup] [[KEbls__wR_yr_ih1VCsnUw, 'RemoteTransportException[[vapp-cn1.gksm.local][10.196.2.55:9300][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[elbackup] a file written by master to the store [/backup/el_backup/front] cannot be accessed on the node [{vapp-cn1.gksm.local}{KEbls__wR_yr_ih1VCsnUw}{IKiOqQ17Tgev50PVqDyFhw}{10.196.2.55}{10.196.2.55:9300}]. This might indicate that the store [/backup/el_backup/front] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node];']]"}],"type":"repository_verification_exception","reason":"[elbackup] [[KEbls__wR_yr_ih1VCsnUw, 'RemoteTransportException[[vapp-cn1.gksm.local][10.196.2.55:9300][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[elbackup] a file written by master to the store [/backup/el_backup/front] cannot be accessed on the node [{vapp-cn1.gksm.local}{KEbls__wR_yr_ih1VCsnUw}{IKiOqQ17Tgev50PVqDyFhw}{10.196.2.55}{10.196.2.55:9300}]. This might indicate that the store [/backup/el_backup/front] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node];']]"},"
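The same check can be scripted. A minimal sketch, assuming the standard POST /_snapshot/<repository>/_verify endpoint used above: on success Elasticsearch lists the nodes that could access the repository under "nodes"; on failure it answers with an error object like the one shown, where the root_cause reason is the useful part. The function names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def summarize_verify(reply: dict) -> tuple:
    """Hypothetical helper: (ok, details) with node names on success, reasons on failure."""
    if "nodes" in reply:
        return True, sorted(node["name"] for node in reply["nodes"].values())
    causes = reply.get("error", {}).get("root_cause", [])
    return False, [cause.get("reason", "") for cause in causes]


def verify_repository(es_url: str, repo: str) -> tuple:
    """Hypothetical helper: POST /_snapshot/<repo>/_verify and summarize the reply."""
    req = urllib.request.Request(f"{es_url}/_snapshot/{repo}/_verify", data=b"", method="POST")
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            reply = json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # On failure Elasticsearch still sends a JSON error body (a 500 in this post).
        reply = json.loads(err.read().decode("utf-8"))
    return summarize_verify(reply)
```

With the cluster from this post, verify_repository("http://vapp-cn1:9200", "elbackup") would be expected to return False together with the "a file written by master ... cannot be accessed" reason.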

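Long story short, damn, I wrote way too much (shame to delete it now), but in the end all you need to do is set the skip_repo_fs_check option to True. Bear in mind that this only skips the check and does not remove its cause: the error says the store is not shared between the nodes, and an "fs" repository is meant to be one shared filesystem mounted at the same path on all master and data nodes.

So: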

actions:
  1:
    action: snapshot
    description: >-
      Snapshot logstash- prefixed indices older than 1 day (based on index
      creation_date) with the default snapshot name pattern of
      'curator-%Y%m%d%H%M%S'. Wait for the snapshot to complete. Do not skip
      the repository filesystem access check. Use the other options to create
      the snapshot.
    options:
      repository: elbackup
      # Leaving name blank will result in the default 'curator-%Y%m%d%H%M%S'
      name: esmcnfront-03-02-2017-182128
      ignore_unavailable: False
      include_global_state: True
      partial: False
      wait_for_completion: True
      skip_repo_fs_check: True
      timeout_override: 21600
      continue_if_exception: False
      disable_action: False
      exclude:
    filters:
    - filtertype: pattern
      kind: regex
      value: '.*'
      exclude:
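Since the access check is now skipped, it seems worth confirming afterwards that the snapshot really finished. A minimal sketch, assuming the GET /_snapshot/<repository>/_all endpoint, whose reply lists snapshots with "snapshot" (name) and "state" fields (SUCCESS, PARTIAL, FAILED, IN_PROGRESS and so on); the helper names are hypothetical.

```python
import json
import urllib.request


def not_successful(reply: dict) -> dict:
    """Hypothetical helper: map snapshot name -> state for every snapshot not in SUCCESS."""
    return {
        snap["snapshot"]: snap["state"]
        for snap in reply.get("snapshots", [])
        if snap.get("state") != "SUCCESS"
    }


def fetch_snapshots(es_url: str, repo: str) -> dict:
    """Hypothetical helper: GET /_snapshot/<repo>/_all and return the parsed reply."""
    with urllib.request.urlopen(f"{es_url}/_snapshot/{repo}/_all", timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

not_successful(fetch_snapshots("http://vapp-cn1:9200", "elbackup")) should come back as an empty dict when every snapshot in the repository succeeded.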

P.S. I'm leaving this post up in case I ever figure out why the hell this curl -XPOST http://vapp-cn1:9200/_snapshot/elbackup/_verify doesn't work....!!
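One way to dig into the P.S. question by hand: as the error message itself says, verification means a file written to the store by the master must be readable on every other node. A rough sketch of the same idea, assuming you can run Python on both nodes as the user Elasticsearch runs as; the marker file naming and the helper names are hypothetical.

```python
import os
import socket
import uuid

STORE = "/backup/el_backup/front"  # repository location from this post


def write_marker(store: str) -> str:
    """Hypothetical helper: create a uniquely named marker file in the store, return its name."""
    name = f"marker-{uuid.uuid4().hex}"
    with open(os.path.join(store, name), "w") as fh:
        fh.write(f"written by {socket.gethostname()}\n")
    return name


def marker_visible(store: str, name: str) -> bool:
    """Hypothetical helper: True if the marker exists and is readable from this node."""
    path = os.path.join(store, name)
    return os.path.isfile(path) and os.access(path, os.R_OK)
```

Call write_marker(STORE) on one node, then marker_visible(STORE, <printed name>) on the other, and delete the marker afterwards. If the second call returns False while permissions look fine, /backup/el_backup/front is most likely a separate local directory on each host rather than one shared mount.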


Temporary blog... travel
