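After thinking it over for a long time, I decided to go with Prometheus for monitoring. It is built on a time-series data model in which every series is identified by key-value labels, which makes it well suited to trend analysis, and queries are fast. It collects data over HTTP, mainly by pulling from targets, with push supported as a second path (in Prometheus this goes through the Pushgateway), so it is easy to extend. The community is large, and there are many high-quality exporters and integrations, both official and community-maintained.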
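Single-node initial installation of the Prometheus server
- Download the latest release from the official website.
- Copy it to the target directory and unpack it; it runs as-is, so I won't go into detail here.
- Edit the configuration file: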
egrep -v '^$|#' prometheus.yml
# my global config
global:
  scrape_interval: 15s # how often targets are scraped
  evaluation_interval: 15s # how often rules are evaluated
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs: # scrape configuration
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus' # job name
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs: # monitored targets
      - targets: ['192.168.1.250:9090']
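Once the server is running with this configuration, one way to confirm that it is scraping itself is to ask its HTTP API for the built-in up metric. The following is a minimal Python sketch and not part of the original setup: the helper names (query_url, up_instances, fetch_up) are made up for illustration, and the base URL assumes the server listens on 192.168.1.250:9090, as in the targets entry.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the server address from the targets entry in prometheus.yml.
PROMETHEUS_URL = "http://192.168.1.250:9090"


def query_url(base_url, expr):
    """Build the instant-query URL (/api/v1/query) for a PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def up_instances(payload):
    """Map each instance label to True/False from the JSON body of an `up` query."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload}")
    return {
        sample["metric"].get("instance", "?"): sample["value"][1] == "1"
        for sample in payload["data"]["result"]
    }


def fetch_up(base_url=PROMETHEUS_URL, timeout=5):
    """Run `up` against a live server; needs network access to Prometheus."""
    with urllib.request.urlopen(query_url(base_url, "up"), timeout=timeout) as resp:
        return up_instances(json.load(resp))
```

Calling fetch_up() from a machine that can reach the server returns a dictionary keyed by instance; with only the default job configured, it should contain a single entry for the server itself.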
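Batch deployment of the collector: node_exporter for basic server metrics (very comprehensive)
- I have 28 machines, so I first downloaded node_exporter to my own machine, put it on an FTP server, and used Ansible to install it everywhere in one go.
- After installation, the default port is 9100.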
- name: node_exporter_install
  hosts: all
  tasks:
    - name: Create group prometheus
      group:
        name: prometheus
        state: present
    - name: Create user prometheus
      user:
        name: prometheus
        shell: /sbin/nologin # service account, no interactive login
        groups: prometheus
    - name: Decompression file to /usr/local/node_exporter
      unarchive:
        src: ftp://192.168.1.254/software/node_exporter.tar.gz
        dest: /usr/local
        remote_src: yes # the managed host downloads the archive itself
    - name: Create prometheus service
      copy:
        src: /srv/ftp/software/systemctl_file/node_exporter.service
        dest: /usr/lib/systemd/system/node_exporter.service
    - name: Start server and enable the server
      systemd:
        state: started
        name: node_exporter
        enabled: yes
        daemon_reload: yes # make systemd re-read the unit file copied above
- Check that an exporter responds by running curl 192.168.1.2:9100/metrics and confirming that metrics come back.
- Then edit the server's configuration file and add the monitoring targets:
static_configs:
  - targets: ['192.168.1.250:9090','192.168.1.2:9100','192.168.1.3:9100','192.168.1.10:9100','192.168.1.11:9100','192.168.1.12:9100','192.168.1.13:9100','192.168.1.160:9100','192.168.1.161:9100','192.168.1.162:9100','192.168.1.167:9100','192.168.1.168:9100','192.168.1.155:9100','192.168.1.156:9100','192.168.1.180:9100','192.168.1.181:9100','192.168.1.199:9100','192.168.1.200:9100','192.168.1.201:9100','192.168.1.202:9100','192.168.1.203:9100','192.168.1.204:9100','192.168.1.210:9100','192.168.1.211:9100','192.168.1.212:9100','192.168.1.217:9100','192.168.1.218:9100','192.168.1.219:9100']
- Restart the service, then open the Targets page in the Prometheus web UI. All 28 targets now show as up.
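Typing a target list of that length by hand is error-prone, so one option is to generate the static_configs fragment from a list of host IPs. This is only a sketch under stated assumptions: build_targets and render_static_configs are hypothetical helper names, and the node list used in the note after the code is deliberately shortened rather than the full inventory.

```python
def build_targets(server, node_ips, node_port=9100):
    """Return the scrape targets: the Prometheus server first, then host:port per node."""
    return [server] + [f"{ip}:{node_port}" for ip in node_ips]


def render_static_configs(targets):
    """Render the targets as a static_configs fragment with single-quoted entries."""
    quoted = ",".join(f"'{target}'" for target in targets)
    return f"static_configs:\n  - targets: [{quoted}]"
```

For example, render_static_configs(build_targets("192.168.1.250:9090", ["192.168.1.2", "192.168.1.3"])) produces a two-line fragment that can be pasted under the job in prometheus.yml, with the indentation adjusted to match the surrounding file.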
- Memory usage (%):
((node_memory_MemTotal_bytes - node_memory_MemFree_bytes - node_memory_Cached_bytes - node_memory_Buffers_bytes - node_memory_Slab_bytes)/node_memory_MemTotal_bytes) * 100
- Disk usage (%):
(node_filesystem_size_bytes{fstype="xfs"} - node_filesystem_free_bytes{fstype="xfs"})/node_filesystem_size_bytes{fstype="xfs"} * 100
- Outbound traffic on scheduler 1, in MiB/s:
irate(node_network_transmit_bytes_total{instance="192.168.1.2:9100", device="eth0"}[30s]) /1024 /1024 > 0
- Inbound traffic on scheduler 1, in MiB/s:
irate(node_network_receive_bytes_total{instance="192.168.1.2:9100", device="eth0"}[30s]) /1024 /1024 > 0
- For scheduler 2, use the same queries with its own IP and port in the instance label.
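Since the traffic queries differ only in the instance label and the direction, they can be generated instead of copied. In node_exporter, node_network_receive_bytes_total counts bytes arriving on an interface and node_network_transmit_bytes_total counts bytes leaving it. The sketch below assumes that mapping; traffic_query is a hypothetical helper, and the second address in the usage note is a placeholder, not a host taken from this setup.

```python
# Direction -> node_exporter counter: receive is inbound, transmit is outbound.
METRIC_BY_DIRECTION = {
    "in": "node_network_receive_bytes_total",
    "out": "node_network_transmit_bytes_total",
}


def traffic_query(direction, instance, device="eth0", window="30s"):
    """Build a PromQL expression for per-second traffic in MiB/s, hiding zero values."""
    metric = METRIC_BY_DIRECTION[direction]
    return (
        f'irate({metric}{{instance="{instance}", device="{device}"}}[{window}])'
        " /1024 /1024 > 0"
    )
```

traffic_query("out", "192.168.1.2:9100") gives the outbound query for scheduler 1; passing another instance such as "192.168.1.3:9100" (placeholder) gives the equivalent for a second scheduler.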
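Graphical dashboards with Grafana
- Installation is simple enough that I won't go into it here. Default port: 3000.
- After installation, the default username and password are both admin; change the password right away.
- Configuration -> Data Sources -> Prometheus -> Name (project name) -> HTTP URL (the Prometheus address) -> Save & Test.
- Create -> Dashboard -> Choose Visualization, then design the panels around your own metrics.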
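The data source step can also be scripted through Grafana's HTTP API rather than clicked through the UI. The following is a rough sketch, not something the setup above used: it assumes Grafana is reachable at http://localhost:3000 (a placeholder), that basic authentication with the admin account is enabled, and that the data source is created with a POST to /api/datasources. The helper names are hypothetical.

```python
import base64
import json
import urllib.request


def datasource_payload(name, prometheus_url):
    """JSON body describing a Prometheus data source proxied through Grafana."""
    return {
        "name": name,
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",
        "isDefault": True,
    }


def build_request(grafana_url, user, password, payload):
    """Prepare (but do not send) the POST request that creates the data source."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

Passing the prepared request to urllib.request.urlopen sends it. Use the changed admin password here, not the default one.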