Ultra-detailed: Installing and Deploying Prometheus on CentOS 7 and Basic Usage (Exporters, Probes, Alerts)

Install and deploy Prometheus on CentOS 7, and configure node_exporter, probes, and alerting rules

-Jason Liu-

Published 2024-01-03 16:37:55

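Series

Chapter 1 (this article): Ultra-detailed: Installing and Deploying Prometheus on CentOS 7 and Basic Usage (Exporters, Probes, Alerts)
Chapter 2: Ultra-detailed: Configuring DingTalk and Email Alerts with Prometheus Alertmanager on CentOS 7 (Verified Hands-on)
Chapter 3: Using Prometheus PushGateway on CentOS 7
Chapter 4: Self-Service Service Discovery with Prometheus and Consul
Chapter 5: Collecting Error Logs with Prometheus and mtail on CentOS 7
Extra: Monitoring Process Status with Prometheus Process-exporter on CentOS 7
Extra: Monitoring Windows Hosts with Prometheus on CentOS 7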

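Introduction to Prometheus

Prometheus is an open-source monitoring and alerting system that collects, stores, and queries time-series data.

Features

Multi-dimensional data model
Flexible query language
Support for a wide range of charts and dashboards, e.g. Grafana
Efficient storage and retrieval
Alerting and notification management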

Architecture

✨Prometheus Server

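Prometheus Server is the core component of Prometheus. It is responsible for collecting, storing, and querying monitoring data.
Prometheus Server can manage its monitoring targets through static configuration or dynamically through service discovery, and it pulls data from those targets. It then stores the collected data: Prometheus Server is itself a time-series database that writes the samples to local disk as time series.
Finally, Prometheus Server provides its own query language, PromQL, for querying and analyzing the data. Its built-in Expression Browser web UI lets you run PromQL queries and visualize the results directly.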

✨Exporter

An exporter exposes the monitoring data it collects to Prometheus Server as an HTTP endpoint. Prometheus Server obtains the data simply by requesting that endpoint.

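Since an exporter boils down to serving plain text over HTTP, here is a minimal sketch of one, assuming a hypothetical exporter with made-up metric names (demo_requests_total, demo_temperature_celsius) and an arbitrary port 8000. It renders metrics in the Prometheus text exposition format and serves them on /metrics using only the Python standard library; real exporters such as node_exporter are far more complete.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical metrics: name -> (type, help text, current value).
# A real exporter would read these values from the system on every scrape.
METRICS = {
    "demo_requests_total": ("counter", "Total number of demo requests handled.", 42),
    "demo_temperature_celsius": ("gauge", "Current demo temperature in Celsius.", 21.5),
}


def render_metrics(metrics):
    """Render metrics in the Prometheus text exposition format."""
    lines = []
    for name, (metric_type, help_text, value) in metrics.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"  # the last line must end with a line feed


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Port 8000 is an arbitrary choice for this sketch; add it as a target in prometheus.yml.
    HTTPServer(("0.0.0.0", 8000), MetricsHandler).serve_forever()
```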
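✨Job and Instance

To collect different kinds of metrics (for example, for hosts, MySQL, or Nginx), you run the corresponding collector program (an exporter) and tell Prometheus Server the address of each exporter instance.

In Prometheus, every HTTP endpoint that exposes metric samples is called an instance; it is the concrete target being monitored. A task that monitors a set of instances is called a job. Each job covers one kind of target, can include multiple instances, and performs the same actions on all of them. A job's instances can be listed directly in the configuration file, or the job can discover them dynamically from Consul, Kubernetes, and similar systems; this process is called service discovery.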

✨AlertManager

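Prometheus evaluates alerting rules but cannot send alert notifications on its own; it relies on Alertmanager for that. Alerting rules are configured in Prometheus, and when a rule fires, Prometheus pushes the alert to Alertmanager. Alertmanager then routes it according to its configured routes, sending alerts to different receivers depending on, for example, their severity, and it can notify via email, WeCom (Enterprise WeChat), DingTalk, and other channels. In other words, Prometheus acts as the client and Alertmanager processes the alert notifications it sends: it groups and deduplicates them, then routes them to the appropriate receiver.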

✨PushGateway

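Because Prometheus collects data with a pull model, the network must allow Prometheus Server to talk directly to each exporter. When that is not possible, PushGateway can act as a relay: jobs inside the internal network push their metrics to PushGateway, and Prometheus Server then pulls them from PushGateway in the usual way.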

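To make the push flow concrete, here is a Python sketch that builds a push request using PushGateway's /metrics/job/<job> URL scheme. The gateway address, job name (backup_job), and metric (backup_last_success_timestamp) are hypothetical; PushGateway listens on port 9091 by default. A POST replaces only metrics with the same name within the group, while a PUT (not shown) replaces the whole group.

```python
import urllib.parse
import urllib.request


def build_push_request(gateway, job, metrics_text, instance=None):
    """Build a POST that pushes metrics_text into the group /metrics/job/<job>[/instance/<instance>]."""
    path = "/metrics/job/" + urllib.parse.quote(job, safe="")
    if instance is not None:
        path += "/instance/" + urllib.parse.quote(instance, safe="")
    if not metrics_text.endswith("\n"):
        metrics_text += "\n"  # the text format must end with a line feed
    return urllib.request.Request(
        gateway.rstrip("/") + path,
        data=metrics_text.encode("utf-8"),
        headers={"Content-Type": "text/plain; version=0.0.4"},
        method="POST",  # POST replaces only same-named metrics in the group
    )


if __name__ == "__main__":
    # Hypothetical gateway address and metric.
    req = build_push_request(
        "http://pushgateway.example:9091",
        "backup_job",
        "# TYPE backup_last_success_timestamp gauge\nbackup_last_success_timestamp 1700000000",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        print("pushed, HTTP status", resp.status)
```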
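✨The four metric types

To see the data an exporter currently exposes, open port 9100 and the /metrics path on a machine running node_exporter. The lines beginning with # TYPE show each metric's type. For example:

http://192.168.168.12:9100/metrics
Press Ctrl+F to search for a specific type.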

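Instead of searching with Ctrl+F, the # TYPE lines can also be listed by a script. This Python sketch assumes the example node_exporter address above is reachable; metric_types is a hypothetical helper that only inspects lines of the form "# TYPE <name> <type>" and ignores everything else.

```python
import urllib.request


def metric_types(exposition_text):
    """Map metric name -> type from the '# TYPE <name> <type>' lines."""
    types = {}
    for line in exposition_text.splitlines():
        if line.startswith("# TYPE "):
            parts = line.split()
            if len(parts) == 4:
                types[parts[2]] = parts[3]
    return types


def fetch_metrics(url, timeout=5):
    """Download the raw /metrics page as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    text = fetch_metrics("http://192.168.168.12:9100/metrics")
    for name, metric_type in sorted(metric_types(text).items()):
        print(f"{metric_type:10} {name}")
```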
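1. Counter

A counter only ever increases unless it is reset (for example, when the process restarts). It is commonly used to count how many times an event has occurred. Counter metric names should end with the _total suffix.

2. Gauge

A gauge reflects the current state of the system, so its samples can go up as well as down.

3. Histogram

A histogram describes the distribution of observations (for example, request durations). It counts observations into configurable buckets, exposed as cumulative _bucket series together with _sum and _count; quantiles are computed on the Prometheus server with histogram_quantile().

4. Summary

A summary also describes the distribution of observations, but any configured quantiles are computed on the client side and exposed directly, together with _sum and _count.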

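The counter and histogram semantics above can be illustrated with a small Python sketch. counter_increase mimics how Prometheus treats a drop in a counter as a reset (the value after the drop counts as new increase), and cumulative_buckets shows why histogram buckets are cumulative: each le bucket counts every observation at or below its bound. Both helper names are hypothetical and simplified; they ignore timestamps and extrapolation.

```python
def counter_increase(values):
    """Total increase across consecutive counter samples.

    A drop between two samples is treated as a counter reset: the counter
    restarted from 0, so the whole new value counts as increase.
    """
    total = 0.0
    for prev, cur in zip(values, values[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def cumulative_buckets(observations, upper_bounds):
    """Histogram bucket counts as Prometheus exposes them: each 'le' bucket counts
    every observation <= its upper bound, and '+Inf' counts all observations."""
    buckets = {str(b): sum(1 for v in observations if v <= b) for b in upper_bounds}
    buckets["+Inf"] = len(observations)
    return buckets


if __name__ == "__main__":
    # Hypothetical counter samples with a process restart between 20 and 3.
    print(counter_increase([10, 15, 20, 3, 8]))
    # Hypothetical request durations in seconds.
    print(cumulative_buckets([0.05, 0.2, 0.7, 3.0], [0.1, 0.5, 1.0]))
```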
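Prometheus vs. Zabbix

Zabbix	Prometheus
Traditional agent-based model	Time-series data model
Custom query language	PromQL
Relational database	Time-series database (TSDB)
Web-based frontend	Integration with tools such as Grafana
Monitoring of local machines, networks, and infrastructure	Monitoring of cloud, containers, and microservices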

I. Deploying Prometheus

Prometheus download page: https://prometheus.io/download/

1. Create the /data/apps directory for downloaded software

mkdir -p /data/apps
cd /data/apps

2. Download online

wget https://github.com/prometheus/prometheus/releases/download/v2.37.2/prometheus-2.37.2.linux-amd64.tar.gz

or (possibly faster):

wget https://githubfast.com/prometheus/prometheus/releases/download/v2.37.2/prometheus-2.37.2.linux-amd64.tar.gz

If you still cannot download it, use my Lanzou Cloud share:

https://wwuy.lanzouo.com/i182T1ktinsf
Password: 5229

3. Extract to /usr/local/ and rename the directory

tar -xzvf prometheus-2.37.2.linux-amd64.tar.gz -C /usr/local

cd /usr/local

mv prometheus-2.37.2.linux-amd64 prometheus

4. Check the Prometheus version

cd /usr/local/prometheus

./prometheus  --version

Command to validate prometheus.yml:

./promtool check config prometheus.yml

5. Create the local TSDB data directory for Prometheus

mkdir -p /var/lib/prometheus

6. Manage Prometheus with systemctl

vim /usr/lib/systemd/system/prometheus.service

Add the following:

[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target

[Service]
# With Type=notify the service keeps restarting
Type=simple
User=root
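# --storage.tsdb.path is optional; by default data is stored in ./data under the working directory
# --web.enable-lifecycle enables the HTTP /-/reload endpoint used later for hot reloads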
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --storage.tsdb.path=/var/lib/prometheus --web.enable-lifecycle
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure

[Install]
WantedBy=multi-user.target

7. Enable Prometheus at boot and start it

systemctl enable prometheus && systemctl start prometheus

8. Check the Prometheus service status

systemctl status prometheus

9. Open the Prometheus web UI

In a browser, go to http://<your-IP>:9090

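Besides the browser check, Prometheus exposes /-/healthy and /-/ready endpoints that are handy for scripted checks. A minimal Python sketch, assuming Prometheus listens on 127.0.0.1:9090 (adjust to your IP); endpoint_ok is a hypothetical helper that returns False instead of raising when the server cannot be reached.

```python
import urllib.error
import urllib.request


def endpoint_ok(base_url, path, timeout=3.0):
    """Return True if GET base_url+path answers with HTTP 200, otherwise False."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + path, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    base = "http://127.0.0.1:9090"  # replace with your Prometheus address
    for path in ("/-/healthy", "/-/ready"):
        print(path, "OK" if endpoint_ok(base, path) else "FAILED")
```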
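II. Deploying node-exporter

About node-exporter

node_exporter collects system metrics from the machine it runs on. It is one of the exporters provided by the Prometheus project; besides node_exporter, the project also offers exporters for Consul, Memcached, HAProxy, MySQL (mysqld_exporter), and more.

Deploying node-exporter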

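1. Download node_exporter on each machine to be monitored, extract it, and rename the directory

wget https://github.com/prometheus/node_exporter/releases/download/v1.4.0/node_exporter-1.4.0.linux-amd64.tar.gz

tar -zxvf node_exporter-1.4.0.linux-amd64.tar.gz -C /usr/local/

cd /usr/local/

mv node_exporter-1.4.0.linux-amd64 node_exporter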

If the download is slow, try:

wget https://githubfast.com/prometheus/node_exporter/releases/download/v1.4.0/node_exporter-1.4.0.linux-amd64.tar.gz

If you still cannot download it, use my Lanzou Cloud share:

https://wwuy.lanzouo.com/ix4v51ktjfvg
Password: 3teo

2. Manage node_exporter with systemctl

vim /usr/lib/systemd/system/node_exporter.service

Add the following:

[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=root
ExecStart=/usr/local/node_exporter/node_exporter
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure

[Install]
WantedBy=multi-user.target

3. Enable node_exporter at boot and start it

systemctl enable node_exporter && systemctl start node_exporter

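4. Add the node to Prometheus (on the Prometheus host)

Add the machine to be monitored to the Prometheus Server configuration file:

vim /usr/local/prometheus/prometheus.yml

Append the following job under scrape_configs: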

  - job_name: "node1"
    static_configs:
    - targets: ['<monitored-host-IP>:9100']

5. Validate prometheus.yml

/usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml

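6. Hot-reload the Prometheus configuration (this works because the unit file starts Prometheus with --web.enable-lifecycle)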

curl  -X POST http://127.0.0.1:9090/-/reload
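The same reload request can be sent from Python, for example from a deployment script. This sketch assumes Prometheus runs locally on port 9090; if Prometheus was started without --web.enable-lifecycle, the endpoint rejects the request and urlopen raises an HTTPError.

```python
import urllib.request


def build_reload_request(base_url="http://127.0.0.1:9090"):
    """POST /-/reload asks Prometheus to re-read prometheus.yml and the rule files."""
    return urllib.request.Request(base_url.rstrip("/") + "/-/reload", method="POST")


if __name__ == "__main__":
    with urllib.request.urlopen(build_reload_request(), timeout=10) as resp:
        print("reload HTTP status:", resp.status)
```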

7. Open the Prometheus web UI and confirm that the node is being monitored


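The Targets page is also available as JSON through the HTTP API at /api/v1/targets. This Python sketch, assuming Prometheus on 127.0.0.1:9090, prints each active target's job, instance, and health (up, down, or unknown); summarize_targets is a hypothetical helper name.

```python
import json
import urllib.request


def summarize_targets(api_response):
    """Return sorted (job, instance, health) tuples from a /api/v1/targets response."""
    rows = []
    for target in api_response["data"]["activeTargets"]:
        labels = target["labels"]
        rows.append((labels.get("job", ""), labels.get("instance", ""), target["health"]))
    return sorted(rows)


if __name__ == "__main__":
    with urllib.request.urlopen("http://127.0.0.1:9090/api/v1/targets", timeout=5) as resp:
        data = json.load(resp)
    for job, instance, health in summarize_targets(data):
        print(f"{job:15} {instance:25} {health}")
```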
8. View the metrics exposed over HTTP

Open http://<monitored-host-IP>:9100/metrics to see exactly what data the exporter exposes.

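9. PromQL queries for node_exporter

On the Prometheus web UI home page, click Graph, enter a PromQL expression in the input box, execute it, and then click the Graph tab below the input box to view the chart.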

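Any of the queries below can also be run through the HTTP API endpoint /api/v1/query instead of the Graph page. A minimal Python sketch, assuming Prometheus on 127.0.0.1:9090; it URL-encodes the PromQL expression and handles instant-vector results only (a query such as time() returns a scalar and is rejected by this helper).

```python
import json
import urllib.parse
import urllib.request


def instant_query_url(base_url, promql):
    """Build the /api/v1/query URL with the PromQL expression URL-encoded."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def instant_query(base_url, promql, timeout=5):
    """Run an instant query and return (labels, value) pairs for a vector result."""
    with urllib.request.urlopen(instant_query_url(base_url, promql), timeout=timeout) as resp:
        body = json.load(resp)
    if body.get("status") != "success" or body["data"]["resultType"] != "vector":
        raise RuntimeError(f"unexpected response: {body}")
    # Each vector element looks like {"metric": {...labels...}, "value": [timestamp, "string value"]}.
    return [(item["metric"], float(item["value"][1])) for item in body["data"]["result"]]


if __name__ == "__main__":
    query = 'count(node_cpu_seconds_total{mode="system"}) by (instance)'
    for labels, value in instant_query("http://127.0.0.1:9090", query):
        print(labels.get("instance"), value)
```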
Some useful PromQL queries:

System information

node_uname_info

System uptime in seconds

sum(time() - node_boot_time_seconds)by(instance)

System boot time (Unix timestamp)

node_boot_time_seconds

Current time (Unix timestamp)

time()

Number of CPU cores

count(node_cpu_seconds_total{mode='system'}) by (instance)

CPU usage (%)

(1 - sum(rate(node_cpu_seconds_total{mode="idle"}[1m])) by (instance) / sum(rate(node_cpu_seconds_total[1m])) by (instance) ) * 100

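To see what the CPU formula computes, note that both rate() calls use the same 1m window, so the per-second divisor cancels and the result is 1 minus the idle share of all CPU time that elapsed. This Python sketch applies that arithmetic to two hypothetical snapshots of node_cpu_seconds_total summed per mode; cpu_usage_percent is a hypothetical helper name.

```python
def cpu_usage_percent(before, after):
    """CPU usage between two snapshots of cumulative CPU seconds per mode.

    before/after map a mode name (idle, user, system, ...) to seconds summed over
    all CPUs, like sum by (mode) (node_cpu_seconds_total) at two points in time.
    """
    delta_total = sum(after[mode] - before[mode] for mode in after)
    delta_idle = after["idle"] - before["idle"]
    if delta_total <= 0:
        raise ValueError("the second snapshot must be later than the first")
    return (1 - delta_idle / delta_total) * 100


if __name__ == "__main__":
    # Hypothetical snapshots of a 2-CPU host taken one minute apart (120 CPU-seconds elapse).
    before = {"idle": 1000.0, "user": 200.0, "system": 100.0, "iowait": 5.0}
    after = {"idle": 1090.0, "user": 220.0, "system": 110.0, "iowait": 5.0}
    print(f"{cpu_usage_percent(before, after):.1f}%")
```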
Memory usage (%)

 (1- (node_memory_Buffers_bytes + node_memory_Cached_bytes + node_memory_MemFree_bytes) / node_memory_MemTotal_bytes) * 100


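The memory formula treats MemFree, Buffers, and Cached as reclaimable. As a sketch, the same arithmetic can be applied directly to /proc/meminfo on a Linux host (values there are mostly in kB, whereas node_exporter reports bytes, but the ratio is the same). Both helper names are hypothetical.

```python
def memory_usage_percent(meminfo):
    """Same arithmetic as the PromQL above: (1 - (Buffers + Cached + MemFree) / MemTotal) * 100."""
    reclaimable = meminfo["Buffers"] + meminfo["Cached"] + meminfo["MemFree"]
    return (1 - reclaimable / meminfo["MemTotal"]) * 100


def read_meminfo(path="/proc/meminfo"):
    """Parse /proc/meminfo into {field: value}; most values are in kB."""
    values = {}
    with open(path) as f:
        for line in f:
            key, rest = line.split(":", 1)
            values[key] = int(rest.split()[0])
    return values


if __name__ == "__main__":
    print(f"memory usage: {memory_usage_percent(read_meminfo()):.1f}%")
```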
Total memory of the node (GiB)

node_memory_MemTotal_bytes/1024/1024/1024

Disk usage (%) of ext4 and xfs filesystems

(1 - node_filesystem_avail_bytes{fstype=~"ext4|xfs"} /
node_filesystem_size_bytes{fstype=~"ext4|xfs"}) * 100
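node_filesystem_size_bytes and node_filesystem_avail_bytes come from the filesystem statistics the kernel reports for each mount point. This Python sketch approximates the same disk usage ratio for one mount point with os.statvfs on Linux; it uses f_frsize as the block size, which may differ slightly from node_exporter's calculation on unusual filesystems, so treat it as an approximation. disk_usage_percent is a hypothetical helper name.

```python
import os


def disk_usage_percent(mount_point="/"):
    """Approximate (1 - avail / size) * 100 for one mount point using statvfs."""
    st = os.statvfs(mount_point)
    size = st.f_blocks * st.f_frsize   # comparable to node_filesystem_size_bytes
    avail = st.f_bavail * st.f_frsize  # comparable to node_filesystem_avail_bytes (excludes root-reserved blocks)
    if size == 0:
        return 0.0  # pseudo filesystems report no blocks
    return (1 - avail / size) * 100


if __name__ == "__main__":
    for mount_point in ("/", "/boot"):
        if os.path.ismount(mount_point):
            print(f"{mount_point}: {disk_usage_percent(mount_point):.1f}%")
```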

Disk I/O

Disk reads (completed read operations per second)

sum by (instance) (rate(node_disk_reads_completed_total[5m]))

Disk writes (completed write operations per second)

sum by (instance) (rate(node_disk_writes_completed_total[5m]))

Network bandwidth
Inbound (download) bandwidth in bytes per second, excluding bond and loopback interfaces

sum by(instance) (irate(node_network_receive_bytes_total{device!~"bond.*?|lo"}[5m]))

Outbound (upload) bandwidth in bytes per second, excluding bond and loopback interfaces

sum by(instance) (irate(node_network_transmit_bytes_total{device!~"bond.*?|lo"}[5m]))
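The bandwidth queries use irate(), which looks only at the last two samples in the window, whereas the disk queries use rate(), which averages over the whole window. This simplified Python sketch shows the difference on hypothetical (timestamp, counter value) samples; it ignores counter resets and the extrapolation that the real rate() performs at the window edges.

```python
def simple_rate(samples):
    """Average per-second increase between the first and last sample in the window
    (rate() without counter-reset handling or edge extrapolation)."""
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)


def simple_irate(samples):
    """Per-second increase between the last two samples only, as irate() does."""
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    return (v_last - v_prev) / (t_last - t_prev)


if __name__ == "__main__":
    # Hypothetical byte counter scraped every 15 s, with a burst in the last interval.
    samples = [(0, 0), (15, 1500), (30, 3000), (45, 4500), (60, 12000)]
    print(simple_rate(samples), simple_irate(samples))
```

irate() reacts faster to bursts like the last interval here, while rate() smooths them out, which is why rate() is usually preferred for alerting.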

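III. blackbox_exporter probes

Introduction

blackbox_exporter is an official Prometheus exporter that probes endpoints over HTTP, HTTPS, DNS, TCP, and ICMP to measure availability and related metrics, similar to items in Zabbix.

Official blackbox_exporter documentation:
https://github.com/prometheus/blackbox_exporter

blackbox_exporter can perform:

HTTP GET probes
TCP port probes
ICMP host probes
HTTP POST probes
SSL certificate expiry checks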

Deploying blackbox_exporter

1. Download blackbox_exporter on the host that will run the probes (it checks its targets over the network, so it does not have to run on the monitored machines)

wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.22.0/blackbox_exporter-0.22.0.linux-amd64.tar.gz

tar -zxvf blackbox_exporter-0.22.0.linux-amd64.tar.gz -C /usr/local/

cd /usr/local/

mv blackbox_exporter-0.22.0.linux-amd64 blackbox_exporter

If the download is slow, try:

wget https://githubfast.com/prometheus/blackbox_exporter/releases/download/v0.22.0/blackbox_exporter-0.22.0.linux-amd64.tar.gz

2. Check the blackbox_exporter version

cd /usr/local/blackbox_exporter

./blackbox_exporter  --version

3. Manage blackbox_exporter with systemctl

vim /usr/lib/systemd/system/blackbox_exporter.service

Add the following:

[Unit]
Description=blackbox_exporter
After=network.target

[Service]
User=root
Type=simple
ExecStart=/usr/local/blackbox_exporter/blackbox_exporter --config.file=/usr/local/blackbox_exporter/blackbox.yml
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure

[Install]
WantedBy=multi-user.target

4. Start blackbox_exporter, enable it at boot, and confirm that it is running

systemctl start blackbox_exporter && systemctl enable blackbox_exporter

ps -ef | grep blackbox_exporter

5. Test access over HTTP (blackbox_exporter listens on port 9115 by default)

http://<blackbox_exporter-host-IP>:9115


6. blackbox_exporter configuration file

Unless you have special requirements, the default configuration is fine:

cat /usr/local/blackbox_exporter/blackbox.yml

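All of the following steps are performed on the Prometheus Server.
Add the blackbox_exporter jobs to prometheus.yml under scrape_configs, and pay close attention to YAML syntax (indentation in particular).

vim /usr/local/prometheus/prometheus.yml

ICMP monitoring of host liveness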

# ICMP ping monitoring
  - job_name: crawler_status
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets: ['223.5.5.5','114.114.114.114']
        labels:
          instance: node_status   # overwritten by the instance relabel rule below
          group: 'icmp-node'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: <blackbox_exporter-host-IP>:9115

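The three relabel rules above are what make blackbox_exporter work: the original target becomes the ?target= parameter and the instance label, and __address__ is rewritten so that Prometheus actually scrapes blackbox_exporter. This Python sketch mimics that sequence for one target and builds the resulting probe URL; the exporter address 192.168.1.10:9115 is a hypothetical example, and blackbox_scrape only models these three rules, not Prometheus's full relabeling engine.

```python
from urllib.parse import urlencode


def blackbox_scrape(target, exporter_address, module, metrics_path="/probe"):
    """Apply the three relabel rules from the job above to one target.

    Returns the resulting instance label and the URL Prometheus would scrape.
    """
    labels = {"__address__": target}
    # Rule 1: __address__ -> __param_target (becomes the ?target= query parameter).
    labels["__param_target"] = labels["__address__"]
    # Rule 2: __param_target -> instance (the label shown in the UI and in alerts).
    labels["instance"] = labels["__param_target"]
    # Rule 3: overwrite __address__ so the scrape goes to blackbox_exporter itself.
    labels["__address__"] = exporter_address
    # params: {module: [...]} from the job plus __param_target end up as URL parameters.
    query = urlencode(sorted({"module": module, "target": labels["__param_target"]}.items()))
    return labels["instance"], f"http://{labels['__address__']}{metrics_path}?{query}"


if __name__ == "__main__":
    print(blackbox_scrape("223.5.5.5", "192.168.1.10:9115", "icmp"))
```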
TCP port monitoring

# TCP port monitoring
  - job_name: tcp_port
    metrics_path: /probe
    params:
      module: [tcp_connect]
    file_sd_configs:
      - files: ['/usr/local/prometheus/conf.d/tcp_port/*.yml']
        refresh_interval: 10s
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: <blackbox_exporter-host-IP>:9115

Edit the TCP targets file

The job above reads its targets from this directory via file_sd_configs, so create the file here:

mkdir -p /usr/local/prometheus/conf.d/tcp_port

vim /usr/local/prometheus/conf.d/tcp_port/tcp_port.yml

Add the following, replacing the IPs and ports with the ones you want to monitor:

- targets: ['192.168.100.234:18080','192.168.100.235:22']
  labels:
    group: 'tcp port'

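When there are many ports to monitor, the targets file can be generated by a script. This Python sketch renders a target group in the same YAML shape as tcp_port.yml and writes it through a temporary file plus os.replace, so Prometheus never reads a half-written file (the temporary name ends in .tmp and therefore does not match the *.yml pattern). Both helper names are hypothetical.

```python
import os


def file_sd_yaml(targets, labels):
    """Render one file_sd target group in the same YAML shape as tcp_port.yml."""
    quoted = ",".join(f"'{t}'" for t in targets)
    lines = [f"- targets: [{quoted}]", "  labels:"]
    for key, value in sorted(labels.items()):
        lines.append(f"    {key}: '{value}'")
    return "\n".join(lines) + "\n"


def write_atomically(path, content):
    """Write to a temporary file first, then rename it over the target in one step."""
    tmp_path = path + ".tmp"  # ends in .tmp, so it does not match the *.yml pattern
    with open(tmp_path, "w") as f:
        f.write(content)
    os.replace(tmp_path, path)


if __name__ == "__main__":
    content = file_sd_yaml(["192.168.100.234:18080", "192.168.100.235:22"], {"group": "tcp port"})
    write_atomically("/usr/local/prometheus/conf.d/tcp_port/tcp_port.yml", content)
```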
HTTP GET monitoring

# HTTP GET monitoring
  - job_name: http_get
    metrics_path: /probe
    params:
      module: [http_2xx]
    file_sd_configs:
      - files: ['/usr/local/prometheus/conf.d/http_get/*.yml']
        refresh_interval: 10s
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: <blackbox_exporter-host-IP>:9115

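Edit the http_get targets file

The job above reads its targets from this directory via file_sd_configs, so create the file here:

mkdir -p /usr/local/prometheus/conf.d/http_get

vim /usr/local/prometheus/conf.d/http_get/http_get.yml

Define the URLs you want to monitor: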

- targets:
  - http://192.168.100.234:18080/
  labels:
    name: 'http_get'
- targets:
  - https://www.sohu.com/
  labels:
    name: 'http_get'

Restart Prometheus

systemctl restart prometheus

Open the Prometheus web UI and check the Targets page


IV. Configuring Prometheus alerting rules

1. Create the rules directory

mkdir -p /usr/local/prometheus/rules/

2. Edit the rule file

vim /usr/local/prometheus/rules/rules.yml

groups:
- name: http_status_code
  rules:
  - alert: probe_http_status_code
    expr: probe_http_status_code != 200
    for: 1m
    labels:
     severity: critical
    annotations:
     summary: "{{ $labels.instance }} returned an abnormal status code"
     description: "{{ $labels.instance }} website access failed!!! (value: {{ $value }})"

- name: icmp_ping_status
  rules:
  - alert: icmp_ping_status
    expr: probe_icmp_duration_seconds{phase="rtt"}  == 0
    for: 1m
    labels:
     severity: critical
    annotations:
     summary: "Host {{ $labels.instance }} ICMP abnormal"
     description: "{{ $labels.instance }} ICMP abnormal!!! (value: {{ $value }})"
     value: '{{ $value }}'
## High latency (RTT above 5 ms)
- name:  link_delay_high
  rules:
  - alert: link_delay_high
    expr: probe_icmp_duration_seconds{phase="rtt"}  >0.005
    for: 1m
    labels:
     severity: critical
    annotations:
     summary: "{{ $labels.instance }} high latency!"
     description: "{{ $labels.instance }} high latency!!! (value: {{ $value }})"

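The for: 1m clause means an alert whose expression is true first becomes pending and only turns firing once the condition has held for at least one minute; if the condition clears in between, the alert resets. This Python sketch simulates that state machine, assuming rules are evaluated every 15 seconds as in the evaluation_interval of the stock prometheus.yml; alert_states is a hypothetical helper, not part of Prometheus.

```python
def alert_states(condition_results, for_seconds, interval_seconds):
    """Simulate an alerting rule with a `for` duration.

    condition_results[i] says whether the rule expression returned a result at the
    i-th evaluation (evaluations are interval_seconds apart).
    Returns the alert state after each evaluation.
    """
    states = []
    active_since = None
    for i, condition_true in enumerate(condition_results):
        now = i * interval_seconds
        if not condition_true:
            active_since = None  # condition cleared: the alert resets
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now  # first evaluation with the condition true: pending
        states.append("firing" if now - active_since >= for_seconds else "pending")
    return states


if __name__ == "__main__":
    # for: 1m with a 15 s evaluation interval.
    print(alert_states([False, True, True, True, True, True, False], 60, 15))
```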
3. Validate the rule file

/usr/local/prometheus/promtool check rules /usr/local/prometheus/rules/rules.yml

4. Reference the rules directory in the Prometheus configuration file

vim  /usr/local/prometheus/prometheus.yml

Find the rule_files line and change it to:

rule_files: ['/usr/local/prometheus/rules/*.yml']

5. Restart Prometheus

systemctl restart prometheus

6. Open the Prometheus web UI and check the Rules page

Then check the Alerts page

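The Alerts page has an HTTP API counterpart at /api/v1/alerts. This Python sketch, assuming Prometheus on 127.0.0.1:9090, lists the currently firing alerts with their instance and summary annotation and skips pending ones; firing_alerts is a hypothetical helper name.

```python
import json
import urllib.request


def firing_alerts(api_response):
    """Return (alertname, instance, summary) for alerts in the "firing" state."""
    rows = []
    for alert in api_response["data"]["alerts"]:
        if alert["state"] != "firing":
            continue  # skip alerts that are still pending
        labels = alert["labels"]
        rows.append((labels.get("alertname", ""), labels.get("instance", ""),
                     alert.get("annotations", {}).get("summary", "")))
    return rows


if __name__ == "__main__":
    with urllib.request.urlopen("http://127.0.0.1:9090/api/v1/alerts", timeout=5) as resp:
        data = json.load(resp)
    for name, instance, summary in firing_alerts(data):
        print(f"{name:25} {instance:30} {summary}")
```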
Next article: Ultra-detailed: Configuring DingTalk and Email Alerts with Prometheus Alertmanager on CentOS 7 (Verified Hands-on)

