Back to Basics: Building k8s Monitoring from Scratch, thanos-sidecar (Part 7)

Preface

This article takes a detailed look at thanos-sidecar.

Environment

Component          Version
Operating system   Ubuntu 22.04.4 LTS
docker             24.0.7
thanos             0.36.1

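Thanos overview

For our purposes, Thanos has four main components:

  • receive: deployed standalone. It exposes a write API; Prometheus pushes data to this API, and receive ships it on to object storage
  • sidecar: deployed alongside Prometheus as its sidecar. It uploads Prometheus's local data to object storage
  • query: deployed standalone. A Prometheus-compatible query component that merges results from different sources and can read from both Sidecar and Store
  • store: deployed standalone. It serves the data held in object storage through an API, which query uses to read historical data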

Sidecar mode

The sidecar is bound to one Prometheus instance and handles that instance's monitoring data.

prometheus_7_1.png

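1. Installing the sidecar on k8s

1.1 Modifying the prometheus configmap

Add the external label. It matters because Thanos uses external labels to tell Prometheus instances apart: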

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-cm
  labels:
    name: prometheus-cm
  namespace: prometheus
data:
  prometheus.yml: |-
    global:
      scrape_interval: 5s
      evaluation_interval: 5s

      # added: external labels
      external_labels:
        cluster: "prometheus-k8s"
      # end of addition

    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
        - targets: ['localhost:9090']

      - job_name: "prometheus-kube-state-metrics"
        static_configs:
          - targets: ["kube-state-metrics.kube-system:8080"]

1.2 Modifying the prometheus deployment

Add the thanos sidecar container:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-deploy
  namespace: prometheus
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: registry.cn-beijing.aliyuncs.com/wilsonchai/prometheus:v2.54.1
          args:
            - "--storage.tsdb.retention.time=12h"
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus/"
            - "--storage.tsdb.min-block-duration=30m"
            - "--storage.tsdb.max-block-duration=30m"
            - --web.enable-lifecycle
          ports:
            - containerPort: 9090
          resources:
            requests:
              cpu: 500m
              memory: 500M
            limits:
              cpu: 1
              memory: 1Gi
          volumeMounts:
            - name: prometheus-config
              mountPath: /etc/prometheus/
            - name: prometheus-data
              mountPath: /prometheus
        # added: thanos-sidecar
        - name: thanos
          image: registry.cn-beijing.aliyuncs.com/wilsonchai/thanos:0.36.1
          args:
            - "sidecar"
            - "--prometheus.url=http://localhost:9090"
            - "--tsdb.path=/prometheus"
          volumeMounts:
            - name: prometheus-data
              mountPath: /prometheus
        # end of addition
      volumes:
        - name: prometheus-config
          configMap:
            defaultMode: 420
            name: prometheus-cm
        - emptyDir: {}
          name: prometheus-data

1.3 Adding a service for thanos

apiVersion: v1
kind: Service
metadata:
  name: thanos-sidecar-service
  namespace: prometheus
spec:
  ports:
    - name: thanos-sidecar-port
      port: 10901
      protocol: TCP
      targetPort: 10901
  selector:
    app: prometheus
  type: NodePort

Following the same pattern, modify the other prometheus, the one dedicated to scraping node metrics.

2. Deploying thanos-query

docker run -d --net=host \
  --name thanos-query \
  registry.cn-beijing.aliyuncs.com/wilsonchai/thanos:0.36.1 \
  query \
    --http-address "0.0.0.0:39090" \
    --grpc-address "0.0.0.0:39091" \
    --store "192.168.49.2:30139" \
    --store "192.168.49.2:31165"

Note the two addresses 192.168.49.2:30139 and 192.168.49.2:31165. The IP is the node IP of the node running the thanos-sidecar pod, and the port is the NodePort exposed by the service.
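If you don't have the NodePort to hand, it can be looked up from the service. The sketch below assumes kubectl access to the cluster and the service name from section 1.3. Both `nodeport_of` and `store_addr` are hypothetical helper names introduced here for illustration, not part of Thanos or kubectl.

```shell
# Hypothetical helper: print the NodePort of the first port of a Service.
# Arguments: namespace, service name. Needs kubectl access to the cluster.
nodeport_of() {
  kubectl -n "$1" get svc "$2" -o jsonpath='{.spec.ports[0].nodePort}'
}

# Hypothetical helper: join a node IP and a port into the host:port form
# passed to --store above.
store_addr() {
  printf '%s:%s\n' "$1" "$2"
}

# Assumed usage, with the names from this article:
# store_addr 192.168.49.2 "$(nodeport_of prometheus thanos-sidecar-service)"
```

The node IP itself still has to come from your own environment; 192.168.49.2 is just the value used in this setup.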

Open the thanos-query page to check:

prometheus_7_2.png

3. Deploying the minio object store

prometheus_7_3.png

3.1 Deploy it the same way as in the receive setup

3.2 Adding the sidecar configmap

First prepare bucket.yml. Since thanos-sidecar runs inside k8s, package it as a configmap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: bucket-cm
  labels:
    name: bucket-cm
  namespace: prometheus
data:
  bucket.yml: |-
    type: S3
    config:
      bucket: "wilson-test"
      endpoint: "10.22.11.156:9090"
      access_key: "zzUrkBzyqcCDXySsMLlS"
      secret_key: "nWCcztESnxnUZIKSKsELGEFdg6l6fjzhtqkARJB8"
      insecure: true

3.3 Modifying thanos-sidecar

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-deploy
  namespace: prometheus
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: registry.cn-beijing.aliyuncs.com/wilsonchai/prometheus:v2.54.1
          args:
            - "--storage.tsdb.retention.time=12h"
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus/"
            - "--storage.tsdb.min-block-duration=30m"
            - "--storage.tsdb.max-block-duration=30m"
            - --web.enable-lifecycle
          ports:
            - containerPort: 9090
          resources:
            requests:
              cpu: 500m
              memory: 500M
            limits:
              cpu: 1
              memory: 1Gi
          volumeMounts:
            - name: prometheus-config
              mountPath: /etc/prometheus/
            - name: prometheus-data
              mountPath: /prometheus
        - name: thanos
          image: registry.cn-beijing.aliyuncs.com/wilsonchai/thanos:0.36.1
          args:
            - "sidecar"
            - "--prometheus.url=http://localhost:9090"
            - "--tsdb.path=/prometheus"
            - "--objstore.config-file=/etc/thanos/bucket.yml"
          volumeMounts:
            - name: prometheus-data
              mountPath: /prometheus
            - name: bucket-config
              mountPath: /etc/thanos/
      volumes:
        - name: prometheus-config
          configMap:
            defaultMode: 420
            name: prometheus-cm
        - name: bucket-config
          configMap:
            defaultMode: 420
            name: bucket-cm
        - emptyDir: {}
          name: prometheus-data

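Blocks are uploaded to object storage only once every 30m (the block duration configured above), so let's carry on with the next steps and come back later to check whether any files have reached minio.

4. Deploying thanos-store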

prometheus_7_4.png

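Deploy it the same way as in the receive setup.

Update the thanos-query configuration to add the thanos-store address (remove the old container first, since the container name is reused):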

docker run -d --net=host \
  --name thanos-query \
  registry.cn-beijing.aliyuncs.com/wilsonchai/thanos:0.36.1 \
  query \
    --http-address "0.0.0.0:39090" \
    --grpc-address "0.0.0.0:39091" \
    --store "192.168.49.2:30139" \
    --store "192.168.49.2:31165" \
    --store "10.22.11.156:10901"

Once it is added, check the thanos-query web page:

prometheus_7_5.png

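5. Adjusting pod permissions

Everything is in place, so go back and see whether any files have reached minio. The bucket turns out to be completely empty. What happened? Take a look at the thanos-sidecar logs: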

▶ kubectl -n prometheus logs prometheus-deploy-6f8c5549b9-rqqk6 -c thanos
...
ts=2024-10-30T06:03:23.704299583Z caller=sidecar.go:410 level=warn err="upload 01JBDQNT0RZH4GFCFC564RWZT7: hard link block: hard link file chunks/000001: link /prometheus/01JBDQNT0RZH4GFCFC564RWZT7/chunks/000001 /prometheus/thanos/upload/01JBDQNT0RZH4GFCFC564RWZT7/chunks/000001: operation not permitted" uploaded=0
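The `uploaded=0` field at the end of that warning is the quickest signal that nothing has been shipped. As a minimal sketch, assuming log lines shaped like the one above, the value can be pulled out of the log stream. `last_uploaded` is a hypothetical helper name, not a Thanos command.

```shell
# Hypothetical helper: read log lines on stdin and print the value of the
# last "uploaded=N" field seen, or nothing if no line carries that field.
last_uploaded() {
  grep -o 'uploaded=[0-9][0-9]*' | tail -n 1 | cut -d= -f2
}

# Assumed usage (the pod name will differ in your cluster):
# kubectl -n prometheus logs prometheus-deploy-6f8c5549b9-rqqk6 -c thanos | last_uploaded
```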

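What's going on? A permissions problem. Let's calmly work through thanos-sidecar's upload logic.

  • First, the data files are produced by prometheus. thanos-sidecar should upload those files directly. That is the simplest strategy, since copying them into its own directory would cost extra disk space
  • The pod has 2 containers, so their processes start under different users and groups. On top of that, prometheus and thanos-sidecar share the directory /prometheus, so the subdirectories and files each container creates there end up with different ownership. The initial diagnosis is that the 2 containers' different startup users cause the permission problem
  • Logging in to the prometheus container and looking at /prometheus confirms it

    /prometheus $ ls -lrt
    total 44
    -rw-r--r--    1 nobody   nobody       20001 Oct 30 02:46 queries.active
    -rw-r--r--    1 nobody   nobody           0 Oct 30 02:46 lock
    -rw-r--r--    1 1001     root            37 Oct 30 03:31 thanos.shipper.json
    drwxr-xr-x    3 nobody   nobody        4096 Oct 30 03:31 01JBDQNT0RZH4GFCFC564RWZT7
  • Going by the log, the source files live under /prometheus, and thanos-sidecar creates hard links to them under /prometheus/thanos/. Check a source file first

    /prometheus/01JBDQNT0RZH4GFCFC564RWZT7/chunks $ ls -lrt
    total 96
    -rw-r--r--    1 nobody   nobody       88911 Oct 30 03:31 000001
  • The source file grants write permission to neither group nor others. Then it clicked: on Linux with fs.protected_hardlinks enabled, a user who does not own a file needs both read and write access to it before they can hard-link it. A quick test confirms this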
▶ id
uid=1000(wilson) gid=1000(wilson) groups=1000(wilson)

▶ touch /tmp/test

▶ sudo chown root.root /tmp/test

▶ sudo chmod 644 /tmp/test

▶ ln /tmp/test /tmp/ttttt
ln: failed to create hard link '/tmp/ttttt' => '/tmp/test': Operation not permitted

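At this point the problem is fairly clear. The 2 containers in the pod run as different users, so the files they create have different owners, yet they share one directory. The data files prometheus creates have mode 644, with no write permission for other users. thanos-sidecar needs to hard-link those data files into its own directory, and without write permission the hard link fails.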
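The rule behind that failure can be written as a small precheck. This is a simplified sketch of the `fs.protected_hardlinks` condition, assuming the sysctl is enabled on the node. It ignores the kernel's extra checks around setuid/setgid files, and `can_hardlink` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the current user would be allowed to
# hard-link the given file under fs.protected_hardlinks, i.e. the file is
# a regular file and the user either owns it or can both read and write it.
can_hardlink() {
  [ -f "$1" ] || return 1
  [ -O "$1" ] || { [ -r "$1" ] && [ -w "$1" ]; }
}

# Assumed usage, run inside the thanos container:
# can_hardlink /prometheus/01JBDQNT0RZH4GFCFC564RWZT7/chunks/000001 || echo "link would fail"
```

Run as uid 1001 against the 644 file owned by nobody, both branches fail, which matches the "operation not permitted" in the sidecar log.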

There are many possible fixes. Here is the simplest one: since everything runs in a single k8s pod, just start the containers as the same user:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: prometheus
  name: prometheus-deploy
  namespace: prometheus
spec:
...
  template:
...
    spec:
      securityContext:
        runAsUser: 555
      containers:
...

Add a securityContext with an arbitrary user id (I picked 555 here), then restart and log in to prometheus again to check:

prometheus_7_6.png

Problem solved.
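Besides the directory listing, one way to double-check the fix is to compare the uid each container actually runs as. The sketch below assumes kubectl access and that both images ship an `id` binary. `all_same` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only when every argument is the same string.
all_same() {
  first=$1
  for v in "$@"; do
    [ "$v" = "$first" ] || return 1
  done
}

# Assumed usage against the deployment from this article:
# all_same \
#   "$(kubectl -n prometheus exec deploy/prometheus-deploy -c prometheus -- id -u)" \
#   "$(kubectl -n prometheus exec deploy/prometheus-deploy -c thanos -- id -u)" \
#   && echo "both containers share one uid"
```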

That wraps up this article.
My knowledge is limited, so if I have slipped up anywhere, please don't hesitate to point it out...
