Short and to the point, I can't be bothered to write this up at length...
So, let's try to back up Elasticsearch 5.0 with Curator 4.2.
What we have:
2 nodes
1) vapp-cn1
2) vapp-cn2
The backup repository exists on both hosts and lives at
/backup/el_backup/front
The user Elasticsearch runs under has permissions on it, and on both nodes elasticsearch.yml contains:
path.repo: ["/backup/el_backup/front"]
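For reference, the elbackup repository that Curator points at below would have been registered with something along these lines (a sketch: the repository name and path come from the post, the compress setting is just an assumption):

curl -XPUT 'http://vapp-cn1:9200/_snapshot/elbackup' -H 'Content-Type: application/json' -d '
{
  "type": "fs",
  "settings": {
    "location": "/backup/el_backup/front",
    "compress": true
  }
}'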
Setting up Curator. We are going to back up all the indices, so:
1. snapshot-script.yml
actions:
  1:
    action: snapshot
    description: >-
      Snapshot logstash- prefixed indices older than 1 day (based on index
      creation_date) with the default snapshot name pattern of
      'curator-%Y%m%d%H%M%S'. Wait for the snapshot to complete. Do not skip
      the repository filesystem access check. Use the other options to create
      the snapshot.
    options:
      repository: elbackup
      # Leaving name blank will result in the default 'curator-%Y%m%d%H%M%S'
      name: esmcnfront-03-02-2017-182128
      ignore_unavailable: False
      include_global_state: True
      partial: False
      wait_for_completion: True
      skip_repo_fs_check: False
      timeout_override: 21600
      continue_if_exception: False
      disable_action: False
      exclude:
    filters:
    - filtertype: pattern
      kind: regex
      value: '.*'
      exclude:
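Note that the description text still talks about "logstash- prefixed indices older than 1 day" (it is left over from the stock Curator example), while the actual filter regex '.*' grabs every index, which is what we want here. If you only wanted the logstash- indices, the filter block would look more like this (a sketch based on Curator's stock pattern filter):

    filters:
    - filtertype: pattern
      kind: prefix
      value: logstash-
      exclude: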
2. curator.yml
client:
  hosts:
    - vapp-cn1
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  aws_key:
  aws_secret_key:
  aws_region:
  ssl_no_validate: False
  http_auth:
  timeout: 30
  master_only: False

logging:
  loglevel: INFO
  logfile:
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']
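Before the real run it does not hurt to do two quick sanity checks from the same host: confirm the elbackup repository is actually registered on the cluster, and let Curator do a --dry-run, which only reports what the filters would match without creating anything:

curl 'http://vapp-cn1:9200/_snapshot/elbackup?pretty'
curator --config ./curator.yml --dry-run ./snapshot-script.yml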
Now we run the whole thing:
[root@vapp-cn1 curator]# curator --config ./curator.yml ./snapshot-script.yml
2017-02-03 19:53:28,656 INFO Preparing Action ID: 1, "snapshot"
2017-02-03 19:53:28,665 INFO Trying Action ID: 1, "snapshot": Snapshot logstash- prefixed indices older than 1 day (based on index creation_date) with the default snapshot name pattern of 'curator-%Y%m%d%H%M%S'. Wait for the snapshot to complete. Do not skip the repository filesystem access check. Use the other options to create the snapshot.
2017-02-03 19:53:28,730 ERROR Failed to complete action: snapshot. <class 'curator.exceptions.ActionError'>: Failed to verify all nodes have repository access: --- Got a 500 response from Elasticsearch. Error message: repository_verification_exception
Let's look at the Elasticsearch log:
[2017-02-03T19:53:28,728][WARN ][r.suppressed ] path: /_snapshot/elbackup/_verify, params: {repository=elbackup}
org.elasticsearch.transport.RemoteTransportException: [vapp-cn2.gksm.local][10.196.2.56:9300][cluster:admin/repository/verify]
Caused by: org.elasticsearch.repositories.RepositoryVerificationException: [elbackup] [[KEbls__wR_yr_ih1VCsnUw, 'RemoteTransportException[[vapp-cn1.gksm.local][10.196.2.55:9300][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[elbackup] a file written by master to the store [/backup/el_backup/front] cannot be accessed on the node [{vapp-cn1.gksm.local}{KEbls__wR_yr_ih1VCsnUw}{IKiOqQ17Tgev50PVqDyFhw}{10.196.2.55}{10.196.2.55:9300}]. This might indicate that the store [/backup/el_backup/front] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node];']]
at org.elasticsearch.action.admin.cluster.repositories.verify.TransportVerifyRepositoryAction$1.onResponse(TransportVerifyRepositoryAction.java:74) ~[elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.action.admin.cluster.repositories.verify.TransportVerifyRepositoryAction$1.onResponse(TransportVerifyRepositoryAction.java:70) ~[elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.repositories.RepositoriesService$3.onResponse(RepositoriesService.java:223) ~[elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.repositories.RepositoriesService$3.onResponse(RepositoriesService.java:213) ~[elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.repositories.VerifyNodeRepositoryAction.finishVerification(VerifyNodeRepositoryAction.java:112) ~[elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.repositories.VerifyNodeRepositoryAction$1.handleException(VerifyNodeRepositoryAction.java:103) ~[elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:954) ~[elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.transport.TcpTransport.lambda$handleException$15(TcpTransport.java:1277) ~[elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.threadpool.ThreadPool.lambda$static$0(ThreadPool.java:147) ~[elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.transport.TcpTransport.handleException(TcpTransport.java:1275) [elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.transport.TcpTransport.handlerResponseError(TcpTransport.java:1267) [elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1211) [elasticsearch-5.0.0.jar:5.0.0]
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74) [transport-netty4-5.0.0.jar:5.0.0]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:372) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:358) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:350) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293) [netty-codec-4.1.5.Final.jar:4.1.5.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:280) [netty-codec-4.1.5.Final.jar:4.1.5.Final]
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:396) [netty-codec-4.1.5.Final.jar:4.1.5.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248) [netty-codec-4.1.5.Final.jar:4.1.5.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:372) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:358) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:350) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:372) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:358) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:129) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:610) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:513) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:467) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:437) [netty-transport-4.1.5.Final.jar:4.1.5.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:873) [netty-common-4.1.5.Final.jar:4.1.5.Final]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_65]
So it comes down to this: the repository needs to be verified...
[root@vapp-cn1 curator]# curl -XPOST http://vapp-cn1:9200/_snapshot/elbackup/_verify
{"error":{"root_cause":[{"type":"repository_verification_exception","reason":"[elbackup] [[KEbls__wR_yr_ih1VCsnUw, 'RemoteTransportException[[vapp-cn1.gksm.local][10.196.2.55:9300][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[elbackup] a file written by master to the store [/backup/el_backup/front] cannot be accessed on the node [{vapp-cn1.gksm.local}{KEbls__wR_yr_ih1VCsnUw}{IKiOqQ17Tgev50PVqDyFhw}{10.196.2.55}{10.196.2.55:9300}]. This might indicate that the store [/backup/el_backup/front] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node];']]"}],"type":"repository_verification_exception","reason":"[elbackup] [[KEbls__wR_yr_ih1VCsnUw, 'RemoteTransportException[[vapp-cn1.gksm.local][10.196.2.55:9300][internal:admin/repository/verify]]; nested: RepositoryVerificationException[[elbackup] a file written by master to the store [/backup/el_backup/front] cannot be accessed on the node [{vapp-cn1.gksm.local}{KEbls__wR_yr_ih1VCsnUw}{IKiOqQ17Tgev50PVqDyFhw}{10.196.2.55}{10.196.2.55:9300}]. This might indicate that the store [/backup/el_backup/front] is not shared between this node and the master node or that permissions on the store don't allow reading files written by the master node];']]"},"
Long story short, I've already written quite a bit here (and it feels like a waste to delete it), but in the end all you have to do is flip the skip_repo_fs_check option, i.e. tell Curator to skip the repository filesystem access check.
So:
actions:
  1:
    action: snapshot
    description: >-
      Snapshot logstash- prefixed indices older than 1 day (based on index
      creation_date) with the default snapshot name pattern of
      'curator-%Y%m%d%H%M%S'. Wait for the snapshot to complete. Do not skip
      the repository filesystem access check. Use the other options to create
      the snapshot.
    options:
      repository: elbackup
      # Leaving name blank will result in the default 'curator-%Y%m%d%H%M%S'
      name: esmcnfront-03-02-2017-182128
      ignore_unavailable: False
      include_global_state: True
      partial: False
      wait_for_completion: True
      skip_repo_fs_check: True
      timeout_override: 21600
      continue_if_exception: False
      disable_action: False
      exclude:
    filters:
    - filtertype: pattern
      kind: regex
      value: '.*'
      exclude:
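Assuming the run now completes, whether the snapshot actually landed in the repository can be checked with the regular snapshot API (the snapshot name is the one set in the name option above):

curl 'http://vapp-cn1:9200/_snapshot/elbackup/esmcnfront-03-02-2017-182128?pretty'
# or list everything in the repository
curl 'http://vapp-cn1:9200/_snapshot/elbackup/_all?pretty'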
P.S. I'll leave this post up in case I ever figure out why the hell that curl -XPOST http://vapp-cn1/_snapshot/elbackup/_verify doesn't work....!!