Querying CPU, Memory, and Network Metrics of Pods in a K8S Cluster with Prometheus


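The Kubernetes kubelet component has cAdvisor built in. cAdvisor exposes metrics for the containers on each Node in a format that Prometheus can scrape, and further useful figures can be computed from these metrics.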

Collecting the kubelet's cAdvisor Metrics

Once the corresponding target is configured in the Prometheus configuration file, these metrics can be queried from Prometheus:

    - job_name: 'kubernetes-cadvisor'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
        target_label: __metrics_path__
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
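The relabel rules above turn each discovered node into a scrape URL that goes through the apiserver proxy. The following is a minimal sketch of that mapping. The function names `relabel_cadvisor_target` and `scrape_url` are hypothetical. The sketch assumes relabel regexes are fully anchored and that only the single `${1}` capture group is needed. It is not how Prometheus implements relabeling internally.

```python
import re

# Hypothetical helper: replays the three relabel rules of the
# 'kubernetes-cadvisor' job for one discovered node. Prometheus anchors
# relabel regexes (full match) and expands ${1}; this sketch does the same
# for the single capture group used in this job.

def relabel_cadvisor_target(node_name, meta_labels):
    labels = {}
    # Rule 1: scrape through the apiserver instead of the node address.
    labels["__address__"] = "kubernetes.default.svc:443"
    # Rule 2: build the per-node proxy path from __meta_kubernetes_node_name.
    m = re.fullmatch(r"(.+)", node_name)
    if m:
        labels["__metrics_path__"] = (
            "/api/v1/nodes/${1}/proxy/metrics/cadvisor".replace("${1}", m.group(1))
        )
    # Rule 3 (labelmap): copy node labels, dropping the meta prefix.
    for key, value in meta_labels.items():
        mm = re.fullmatch(r"__meta_kubernetes_node_label_(.+)", key)
        if mm:
            labels[mm.group(1)] = value
    return labels


def scrape_url(labels, scheme="https"):
    # The URL Prometheus ends up requesting: scheme + __address__ + __metrics_path__.
    return f"{scheme}://{labels['__address__']}{labels['__metrics_path__']}"
```

For a node named k8s-master01, `scrape_url(relabel_cadvisor_target("k8s-master01", {}))` yields the same path that the curl command below requests, with kubernetes.default.svc:443 as the host.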

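The same metrics can be read directly, in Prometheus exposition format, through the apiserver's node proxy endpoint, which forwards the request to the kubelet. The request must be authenticated. In the command below, IP is the apiserver address and $TOKEN is a placeholder for a bearer token that is allowed to access nodes/proxy: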

    $ curl -k -H "Authorization: Bearer $TOKEN" https://IP:6443/api/v1/nodes/k8s-master01/proxy/metrics/cadvisor

    # HELP cadvisor_version_info A metric with a constant '1' value labeled by kernel version, OS version, docker version, cadvisor version & cadvisor revision.
    # TYPE cadvisor_version_info gauge
    cadvisor_version_info{cadvisorRevision="",cadvisorVersion="",dockerVersion="17.03.0-ce",kernelVersion="4.9.148",osVersion="CentOS Linux 7 (Core)"} 1
    # HELP container_cpu_load_average_10s Value of container cpu load average over the last 10 seconds.
    # TYPE container_cpu_load_average_10s gauge
    container_cpu_load_average_10s{container_name="",id="/",image="",name="",namespace="",pod_name=""} 0
    container_cpu_load_average_10s{container_name="",id="/kubepods",image="",name="",namespace="",pod_name=""} 0
    container_cpu_load_average_10s{container_name="",id="/kubepods/besteffort",image="",name="",namespace="",pod_name=""} 0
    container_cpu_load_average_10s{container_name="",id="/kubepods/besteffort/pod99bbaaff-0f25-11e9-bbb3-28b4484d8d14",image="",name="",namespace="",pod_name=""} 0

The pod_name and container_name labels shown here are from older Kubernetes releases. Since Kubernetes 1.16, cAdvisor metrics use pod and container instead, so the queries below need their label names adjusted on newer clusters.
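Each non-comment line of this output is one sample: a metric name, an optional label set in braces, and a value. The following is a minimal sketch of splitting such lines into `(name, labels, value)` tuples. The function `parse_exposition` is hypothetical. The sketch skips `# HELP`/`# TYPE` lines, ignores optional timestamps, and does not unescape label values, so it is not a full implementation of the exposition format.

```python
import re

# name{label="value",...} value [timestamp]  -- the timestamp is ignored here.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# label="value" pairs; escaped characters inside values are kept as-is.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = SAMPLE_RE.match(line)
        if not m:
            continue  # not a sample line this sketch understands
        name, label_str, value = m.groups()
        labels = dict(LABEL_RE.findall(label_str or ""))
        samples.append((name, labels, float(value)))  # float() also accepts +Inf/NaN
    return samples
```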

Calculating Pod CPU Usage

The man page of top defines CPU usage as follows:

1. %CPU -- CPU Usage
The task's share of the elapsed CPU time since the last screen update, expressed as a percentage of total CPU time.

In a true SMP environment, if a process is multi-threaded and top is not operating in Threads mode, amounts greater than 100% may be reported. You toggle Threads mode with the `H' inter-active command.

Also for multi-processor environments, if Irix mode is Off, top will operate in Solaris mode where a task's cpu usage will be divided by the total number of CPUs. You toggle Irix/Solaris modes with the `I' interactive command.

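In other words, CPU usage is the ratio of the CPU time a process consumed over a recent interval to the total CPU time in that interval. With multiple CPUs or cores, the total CPU time is the sum across all of them.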
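The definition above can be expressed as a short calculation over two readings of a cumulative CPU-time counter. The following is a minimal sketch. The function `cpu_percent` and its parameters are hypothetical. `solaris_mode=True` divides by the number of CPUs, as top does when Irix mode is off.

```python
def cpu_percent(cpu_seconds_start, cpu_seconds_end, wall_start, wall_end,
                num_cpus=1, solaris_mode=False):
    """CPU usage between two readings of a cumulative CPU-time counter."""
    elapsed = wall_end - wall_start
    if elapsed <= 0:
        raise ValueError("wall-clock interval must be positive")
    # CPU-seconds consumed per wall-clock second, i.e. how many cores were busy.
    share = (cpu_seconds_end - cpu_seconds_start) / elapsed
    if solaris_mode:
        # Normalize to the total capacity of all CPUs (never exceeds 100%).
        share /= num_cpus
    return share * 100
```

For example, a process that used 3 CPU-seconds in 2 wall-clock seconds shows 150% in Irix mode. On a 4-CPU machine in Solaris mode, it shows 37.5%.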
The metrics collected by cAdvisor in the kubelet, and their meanings, are listed in: Monitoring cAdvisor with Prometheus.

The ones used in this article are:

Metric name	Type	Description	Unit
container_cpu_usage_seconds_total	Counter	Cumulative cpu time consumed	seconds
container_spec_cpu_quota	Gauge	CPU quota of the container	microseconds (per CFS period)
container_memory_rss	Gauge	Size of RSS	bytes
container_spec_memory_limit_bytes	Gauge	Memory limit for the container	bytes
container_fs_usage_bytes	Gauge	Number of bytes that are consumed by the container on this filesystem	bytes

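container_cpu_usage_seconds_total is the cumulative CPU time consumed by a container. Because it is a counter, its per-second rate is the number of CPU-seconds used per second, which equals the number of cores kept busy. Dividing that by the CPU time available per second gives the container's CPU usage.
The CPU time a Pod uses per second, averaged over the last minute, is given by the query below. Note that rate() returns a per-second value, not a total for the minute:

    sum(rate(container_cpu_usage_seconds_total{image!=""}[1m])) by (pod_name, namespace)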
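rate() turns a monotonically increasing counter into a per-second rate and treats any decrease as a counter reset, for example after a container restart. The following is a simplified sketch of that idea. The function `simple_rate` is hypothetical. Unlike Prometheus's rate(), it does not extrapolate to the edges of the range window, so its results can differ slightly.

```python
def simple_rate(samples):
    """Per-second increase of a counter.

    samples: list of (timestamp_seconds, counter_value), sorted by time.
    Simplified: no extrapolation to the window boundaries.
    """
    if len(samples) < 2:
        return None  # a rate needs at least two samples in the window
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so count "cur" in full.
        increase += cur - prev if cur >= prev else cur
    duration = samples[-1][0] - samples[0][0]
    return increase / duration
```

With samples of container_cpu_usage_seconds_total taken every 15 seconds, a result of 0.05 means the container used about 5% of one core over the window.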

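container_spec_cpu_quota is the container's CFS CPU quota. Its value is the container's CPU limit in cores multiplied by the CFS period, which defaults to 100000 microseconds (100 ms). Containers without a CPU limit have no quota, so this calculation only applies to Pods that set CPU limits.
Therefore, the CPU time available to a Pod per second equals its number of CPU cores:

    (sum(container_spec_cpu_quota{image!=""}/100000) by (pod_name, namespace))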
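The division by 100000 converts a quota into cores. The following is a minimal sketch of that conversion, assuming the default 100 ms CFS period. The names `quota_to_cores`, `pod_cores`, and `DEFAULT_CFS_PERIOD_US` are hypothetical. On nodes configured with a different period, the divisor would have to change accordingly.

```python
# Assumption: the default CFS period of 100 ms (100000 microseconds),
# which is where the 100000 in the query comes from.
DEFAULT_CFS_PERIOD_US = 100_000


def quota_to_cores(quota_us, period_us=DEFAULT_CFS_PERIOD_US):
    """Convert a CFS quota (microseconds of CPU per period) into CPU cores."""
    if quota_us <= 0:
        # No quota means no CPU limit (cgroup v1 writes -1 to cpu.cfs_quota_us).
        return None
    return quota_us / period_us


def pod_cores(container_quotas_us, period_us=DEFAULT_CFS_PERIOD_US):
    """Sum the cores of every limited container in a Pod, like sum(...) by (pod)."""
    cores = [quota_to_cores(q, period_us) for q in container_quotas_us]
    return sum(c for c in cores if c is not None)
```

A container limit of 500m corresponds to a quota of 50000, that is, 0.5 cores.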

Dividing the first result by the second gives the Pod's CPU usage as a percentage of its CPU limit:

    sum(rate(container_cpu_usage_seconds_total{image!=""}[1m])) by (pod_name, namespace) / (sum(container_spec_cpu_quota{image!=""}/100000) by (pod_name, namespace)) * 100
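The query aggregates both sides by (pod_name, namespace) and then divides series whose label sets match. The following is a minimal sketch of that grouping and matching on plain Python data. The helpers `sum_by` and `pod_cpu_percent` are hypothetical. Inputs are assumed to be lists of `(labels_dict, value)` pairs: per-container usage rates in cores and per-container quotas in microseconds.

```python
from collections import defaultdict


def sum_by(series, keys):
    """Like PromQL sum(...) by (keys): add up values that share the same key labels."""
    out = defaultdict(float)
    for labels, value in series:
        out[tuple(labels.get(k, "") for k in keys)] += value
    return dict(out)


def pod_cpu_percent(usage_rates, quotas, period_us=100_000):
    keys = ("pod_name", "namespace")
    used = sum_by(usage_rates, keys)                                # cores in use
    limit = sum_by([(l, q / period_us) for l, q in quotas], keys)   # cores allowed
    # As with "/" between two vectors, only groups present on both sides survive.
    # Zero limits are skipped here; PromQL would return +Inf or NaN instead.
    return {k: used[k] / limit[k] * 100
            for k in used if k in limit and limit[k] != 0}
```

For example, two containers using 0.25 and 0.05 cores, with limits of 0.5 and 0.1 cores, give the Pod 50% CPU usage.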

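Calculating Pod Memory Usage

Pod memory usage is much simpler to compute: divide the memory actually in use by the memory limit. Containers without a memory limit report a limit of 0, which makes the division +Inf. The != +inf filter removes those Pods from the result:

    sum(container_memory_rss{image!=""}) by(pod_name, namespace) / sum(container_spec_memory_limit_bytes{image!=""}) by(pod_name, namespace) * 100 != +inf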
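The following is a minimal sketch of the same calculation on per-Pod totals, showing why Pods without a limit drop out. The function `pod_memory_percent` is hypothetical. It assumes both inputs are dicts keyed by `(pod_name, namespace)` that have already been summed per Pod.

```python
def pod_memory_percent(rss_bytes, limit_bytes):
    """rss_bytes / limit_bytes: dicts keyed by (pod_name, namespace), summed per Pod."""
    result = {}
    for pod, rss in rss_bytes.items():
        if pod not in limit_bytes:
            continue  # no matching series on the right-hand side of "/"
        limit = limit_bytes[pod]
        if limit == 0:
            # No memory limit: PromQL yields +Inf here (for non-zero usage),
            # which "!= +inf" filters out, so this sketch skips the Pod.
            continue
        result[pod] = rss / limit * 100
    return result
```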

Pod Filesystem Usage

Filesystem space used by each Pod, in GiB:

    sum(container_fs_usage_bytes{image!=""}) by(pod_name, namespace) / 1024 / 1024 / 1024
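Dividing by 1024 three times converts bytes to GiB after the per-Pod sum. The following is a minimal sketch of that arithmetic. The function `fs_usage_gib` is hypothetical. Input is assumed to be `(labels_dict, bytes)` pairs, for example one per filesystem device of each container.

```python
GIB = 1024 ** 3  # 1 GiB = 1024 * 1024 * 1024 bytes


def fs_usage_gib(series):
    """Sum bytes per (pod_name, namespace), then convert the totals to GiB."""
    totals = {}
    for labels, value in series:
        key = (labels.get("pod_name", ""), labels.get("namespace", ""))
        totals[key] = totals.get(key, 0.0) + value
    return {k: v / GIB for k, v in totals.items()}
```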

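Pod Network Usage

Bytes transmitted per second, averaged over the last minute:

    sum(rate(container_network_transmit_bytes_total{image!=""}[1m])) by (pod_name, namespace)

Bytes received per second, averaged over the last minute:

    sum(rate(container_network_receive_bytes_total{image!=""}[1m])) by (pod_name, namespace)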
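Any of the queries in this article can also be run programmatically through the Prometheus HTTP API's instant-query endpoint, /api/v1/query. The following is a minimal standard-library sketch. The function names are hypothetical, and the Prometheus address you pass in (for example http://prometheus:9090) is an assumption about your deployment. The sketch keys each result by its full label set and does no authentication or retries.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    # Instant query endpoint of the Prometheus HTTP API.
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(body):
    """Turn an instant-vector response into {sorted label tuple: float value}."""
    data = json.loads(body)
    if data.get("status") != "success":
        raise RuntimeError(data.get("error", "query failed"))
    result = {}
    for item in data["data"]["result"]:
        # Each element has "metric" (labels) and "value": [timestamp, "string value"].
        result[tuple(sorted(item["metric"].items()))] = float(item["value"][1])
    return result


def query(base_url, promql, timeout=10):
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(resp.read().decode("utf-8"))
```

For example, `query("http://prometheus:9090", 'sum(rate(container_network_receive_bytes_total{image!=""}[1m])) by (pod_name, namespace)')` would return received bytes per second for each Pod.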

This article was originally published by 跟着大数据和AI去旅行 on Jianshu (简书). This repost includes additions, deletions, and error corrections.
