一、简介

etcd就是个分布式非关系型数据库.
3 个节点组成的集群,可以容忍 1 个节点故障。
生成环境中,不推荐使用单个节点的 etcd 集群。

  1. etcd 支持存储多个版本的数据,允许查询指定 key 历史版本的数据。
  2. etcd 为了控制数据总空间,会周期性的清理数据的历史版本。
  3. etcd 不支持修改旧版本的数据。
  4. etcd 中,数据以二进制的方式存储在磁盘中。

二、安装

参考链接:Releases · etcd-io/etcd · GitHub

2.1 使用脚本部署

ETCD_VER=v3.4.20

# choose either URL
GOOGLE_URL=https://storage.googleapis.com/etcd
GITHUB_URL=https://github.com/etcd-io/etcd/releases/download
DOWNLOAD_URL=${GOOGLE_URL}

rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
rm -rf /tmp/etcd-download-test && mkdir -p /tmp/etcd-download-test

curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xzvf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz -C /tmp/etcd-download-test --strip-components=1
rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz

cp /tmp/etcd-download-test/etcd /usr/bin/
cp /tmp/etcd-download-test/etcdctl /usr/bin/

etcd --version
etcdctl version

使用etcdctlv3的版本时,需设置环境变量ETCDCTL_API=3

vim /etc/profile

...
ETCDCTL_API=3
...

###
source /etc/profile

2.2 检查

[root@k8s-master][16:09:03][FAIL] ~/etcdctl/etcd-v3.4.20-linux-amd64 
#etcd --version
etcd Version: 3.4.20
Git SHA: 1e26823
Go Version: go1.16.15
Go OS/Arch: linux/amd64

[root@k8s-master][16:09:05][OK] ~/etcdctl/etcd-v3.4.20-linux-amd64 
#etcdctl version
etcdctl version: 3.4.20
API version: 3.4

三、使用

3.1 帮助信息

#etcdctl --help
NAME:
	etcdctl - A simple command line client for etcd3.

USAGE:
	etcdctl [flags]

VERSION:
	3.4.20

API VERSION:
	3.4


COMMANDS:
	alarm disarm		Disarms all alarms
	alarm list		Lists all alarms
	auth disable		Disables authentication
	auth enable		Enables authentication
	check datascale		Check the memory usage of holding data for different workloads on a given server endpoint.
	check perf		Check the performance of the etcd cluster
	compaction		Compacts the event history in etcd
	defrag			Defragments the storage of the etcd members with given endpoints
	del			Removes the specified key or range of keys [key, range_end)
	elect			Observes and participates in leader election
	endpoint hashkv		Prints the KV history hash for each endpoint in --endpoints
	endpoint health		Checks the healthiness of endpoints specified in `--endpoints` flag
	endpoint status		Prints out the status of endpoints specified in `--endpoints` flag
	get			Gets the key or a range of keys
	help			Help about any command
	lease grant		Creates leases
	lease keep-alive	Keeps leases alive (renew)
	lease list		List all active leases
	lease revoke		Revokes leases
	lease timetolive	Get lease information
	lock			Acquires a named lock
	make-mirror		Makes a mirror at the destination etcd cluster
	member add		Adds a member into the cluster
	member list		Lists all members in the cluster
	member promote		Promotes a non-voting member in the cluster
	member remove		Removes a member from the cluster
	member update		Updates a member in the cluster
	migrate			Migrates keys in a v2 store to a mvcc store
	move-leader		Transfers leadership to another etcd cluster member.
	put			Puts the given key into the store
	role add		Adds a new role
	role delete		Deletes a role
	role get		Gets detailed information of a role
	role grant-permission	Grants a key to a role
	role list		Lists all roles
	role revoke-permission	Revokes a key from a role
	snapshot restore	Restores an etcd member snapshot to an etcd directory
	snapshot save		Stores an etcd node backend snapshot to a given file
	snapshot status		Gets backend snapshot status of a given file
	txn			Txn processes all the requests in one transaction
	user add		Adds a new user
	user delete		Deletes a user
	user get		Gets detailed information of a user
	user grant-role		Grants a role to a user
	user list		Lists all users
	user passwd		Changes password of user
	user revoke-role	Revokes a role from a user
	version			Prints the version of etcdctl
	watch			Watches events stream on keys or prefixes

OPTIONS:
      --cacert=""				verify certificates of TLS-enabled secure servers using this CA bundle
      --cert=""					identify secure client using this TLS certificate file
      --command-timeout=5s			timeout for short running command (excluding dial timeout)
      --debug[=false]				enable client-side debug logging
      --dial-timeout=2s				dial timeout for client connections
  -d, --discovery-srv=""			domain name to query for SRV records describing cluster endpoints
      --discovery-srv-name=""			service name to query when using DNS discovery
      --endpoints=[127.0.0.1:2379]		gRPC endpoints
  -h, --help[=false]				help for etcdctl
      --hex[=false]				print byte strings as hex encoded strings
      --insecure-discovery[=true]		accept insecure SRV records describing cluster endpoints
      --insecure-skip-tls-verify[=false]	skip server certificate verification (CAUTION: this option should be enabled only for testing purposes)
      --insecure-transport[=true]		disable transport security for client connections
      --keepalive-time=2s			keepalive time for client connections
      --keepalive-timeout=6s			keepalive timeout for client connections
      --key=""					identify secure client using this TLS key file
      --password=""				password for authentication (if this option is used, --user option shouldn't include password)
      --user=""					username[:password] for authentication (prompt if password is not supplied)
  -w, --write-out="simple"			set the output format (fields, json, protobuf, simple, table)

3.2 指定etcd集群

HOST_1=10.240.0.17
HOST_2=10.240.0.18
HOST_3=10.240.0.19
ENDPOINTS=$HOST_1:2379,$HOST_2:2379,$HOST_3:2379

etcdctl --endpoints=$ENDPOINTS member list

3.3. 增删改查

3.3.1 增

 etcdctl --endpoints=$ENDPOINTS put foo "Hello World!"

3.3.2 查 

etcdctl --endpoints=$ENDPOINTS get foo
etcdctl --endpoints=$ENDPOINTS --write-out="json" get foo 

基于相同前缀查找 

etcdctl --endpoints=$ENDPOINTS put web1 value1
etcdctl --endpoints=$ENDPOINTS put web2 value2
etcdctl --endpoints=$ENDPOINTS put web3 value3

etcdctl --endpoints=$ENDPOINTS get web --prefix

3.3.3 删 

etcdctl --endpoints=$ENDPOINTS put key myvalue
etcdctl --endpoints=$ENDPOINTS del key

etcdctl --endpoints=$ENDPOINTS put k1 value1
etcdctl --endpoints=$ENDPOINTS put k2 value2
etcdctl --endpoints=$ENDPOINTS del k --prefix

3.3.4 集群状态

集群状态主要是etcdctl endpoint status 和etcdctl endpoint health两条命令。

etcdctl --write-out=table --endpoints=$ENDPOINTS endpoint status

+------------------+------------------+---------+---------+-----------+-----------+------------+
|     ENDPOINT     |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+------------------+------------------+---------+---------+-----------+-----------+------------+
| 10.240.0.17:2379 | 4917a7ab173fabe7 | 3.0.0   | 45 kB   | true      |         4 |      16726 |
| 10.240.0.18:2379 | 59796ba9cd1bcd72 | 3.0.0   | 45 kB   | false     |         4 |      16726 |
| 10.240.0.19:2379 | 94df724b66343e6c | 3.0.0   | 45 kB   | false     |         4 |      16726 |
+------------------+------------------+---------+---------+-----------+-----------+------------+

etcdctl --endpoints=$ENDPOINTS endpoint health

10.240.0.17:2379 is healthy: successfully committed proposal: took = 3.345431ms
10.240.0.19:2379 is healthy: successfully committed proposal: took = 3.767967ms
10.240.0.18:2379 is healthy: successfully committed proposal: took = 4.025451ms
  • ENDPOINT:etcd 实例的访问端点。
  • ID:etcd 实例的唯一标识符。
  • VERSION:etcd 实例的版本号。
  • DB SIZE:etcd 数据库的大小。
  • IS LEADER:该实例是否是当前集群的领导者。
  • RAFT TERM:当前的 Raft 任期。
  • RAFT INDEX:当前的 Raft 日志索引。

其中以下两个值帮我们判断etcd集群数据一致性:

  • DB SIZE:

    • 含义DB SIZE 表示 etcd 数据库文件的大小。这是 etcd 实例当前存储的数据量的物理大小。
    • 作用:此值用于监控 etcd 存储的使用情况,可以帮助管理员确定是否需要扩展存储或者进行数据清理。大多数情况下,数据库的大小是定期增长的,具体取决于 etcd 中存储的数据量和写入速率。
  • RAFT INDEX:

    • 含义RAFT INDEX 表示当前 etcd 实例的 Raft 日志索引。Raft 是 etcd 用于分布式一致性的共识算法。RAFT INDEX 代表当前日志条目的索引位置。
    • 作用:此值用于了解 etcd 集群内的日志复制状态和一致性状态。较高的 RAFT INDEX 可能意味着大量的操作日志需要复制到其他 etcd 节点。这也用于帮助管理员排查和调试一致性问题。

3.3.5 集群成员

跟集群成员相关的命令如下:

member add          Adds a member into the cluster
member remove    Removes a member from the cluster
member update     Updates a member in the cluster
member list            Lists all members in the cluster 

 例如 etcdctl member list列出集群成员的命令。

etcdctl --endpoints=http://172.16.5.4:12379 member list -w table

+-----------------+---------+-------+------------------------+-----------------------------------------------+
|       ID        | STATUS  | NAME  |       PEER ADDRS       |                 CLIENT ADDRS                  |
+-----------------+---------+-------+------------------------+-----------------------------------------------+
| c856d92a82ba66a | started | etcd0 | http://172.16.5.4:2380 | http://172.16.5.4:2379,http://172.16.5.4:4001 |
+-----------------+---------+-------+------------------------+-----------------------------------------------+

3.4 指定授权文件

在执行etcdctl命令时需要指定认证授权文件, 所以将认证授权步骤 别名至 etcdctl 简化操作

# 指定ETCDCTL_API版本为3
$ export ETCDCTL_API=3

# 创建etcdctl别名,指定监听地址,和证书
$ alias etcdctl='etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key'

3.4.1 查看etcd集群的成员节点

#etcdctl member list -w table
+------------------+---------+------------+------------------------+------------------------+------------+
|        ID        | STATUS  |    NAME    |       PEER ADDRS       |      CLIENT ADDRS      | IS LEARNER |
+------------------+---------+------------+------------------------+------------------------+------------+
| 8dc8eb40f5ed7ad6 | started | k8s-master | https://10.0.0.16:2380 | https://10.0.0.16:2379 |      false |
+------------------+---------+------------+------------------------+------------------------+------------+

3.4.2 查看etcd集群节点状态

[root@k8s-master][16:19:15][OK] ~/etcdctl/etcd-v3.4.20-linux-amd64 
#etcdctl endpoint status -w table
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|        ENDPOINT        |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://127.0.0.1:2379 | 8dc8eb40f5ed7ad6 |   3.5.3 |   46 MB |      true |      false |        10 |     380897 |             380897 |        |
+------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

[root@k8s-master][16:20:35][OK] ~/etcdctl/etcd-v3.4.20-linux-amd64 
#etcdctl endpoint health -w table
+------------------------+--------+-------------+-------+
|        ENDPOINT        | HEALTH |    TOOK     | ERROR |
+------------------------+--------+-------------+-------+
| https://127.0.0.1:2379 |   true | 11.021122ms |       |
+------------------------+--------+-------------+-------+

3.5 备份数据

# 字符串拼接用于定时任务
etcdctl snapshot save `hostname`-etcd_`date +%Y%m%d%H%M`.db

3.6 恢复快照

#停止etcd和apiserver
## 移走当前数据目录
mv /var/lib/etcd/ /var/lib/etcd.bak

#恢复快照
etcdctl snapshot restore `hostname`-etcd_`date +%Y%m%d%H%M`.db --data-dir=/var/lib/etcd/

二进制部署的ETCD恢复快照

在这里插入图片描述

四、故障排查

journalctl -u etcd > a.log导出日志慢慢分析
Logo

开放原子开发者工作坊旨在鼓励更多人参与开源活动,与志同道合的开发者们相互交流开发经验、分享开发心得、获取前沿技术趋势。工作坊有多种形式的开发者活动,如meetup、训练营等,主打技术交流,干货满满,真诚地邀请各位开发者共同参与!

更多推荐