Preface
This article takes a detailed look at thanos-sidecar.
Environment
| Component | Version |
|---|---|
| Operating system | Ubuntu 22.04.4 LTS |
| docker | 24.0.7 |
| thanos | 0.36.1 |
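Thanos overview
Thanos has 4 main components:
- receive: deployed standalone. It exposes a write API; Prometheus pushes data to receive through this API, and receive stores it in object storage.
- sidecar: deployed together with Prometheus as its sidecar. It uploads Prometheus's local data to object storage.
- query: deployed standalone. A Prometheus-compatible query component that aggregates query results from different sources; it can read data from both Sidecar and Store.
- store: deployed standalone. It serves the data held in object storage through an API, which query uses to read historical data.
Sidecar mode
The Sidecar is bound to a Prometheus instance and handles the monitoring data of the Prometheus it is bound to.
1. Installing the sidecar on k8s
1.1 Modify the Prometheus ConfigMap
Add the all-important external label.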
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-cm
  labels:
    name: prometheus-cm
  namespace: prometheus
data:
  prometheus.yml: |-
    global:
      scrape_interval: 5s
      evaluation_interval: 5s
      # Added: external labels
      external_labels:
        cluster: "prometheus-k8s"
      # End of addition
    scrape_configs:
      - job_name: 'prometheus'
        static_configs:
          - targets: ['localhost:9090']
      - job_name: "prometheus-kube-state-metrics"
        static_configs:
          - targets: ["kube-state-metrics.kube-system:8080"]
1.2 Modify the Prometheus Deployment
Add the thanos sidecar container.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-deploy
  namespace: prometheus
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: registry.cn-beijing.aliyuncs.com/wilsonchai/prometheus:v2.54.1
        args:
        - "--storage.tsdb.retention.time=12h"
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus/"
        - "--storage.tsdb.min-block-duration=30m"
        - "--storage.tsdb.max-block-duration=30m"
        - --web.enable-lifecycle
        ports:
        - containerPort: 9090
        resources:
          requests:
            cpu: 500m
            memory: 500M
          limits:
            cpu: 1
            memory: 1Gi
        volumeMounts:
        - name: prometheus-config
          mountPath: /etc/prometheus/
        - name: prometheus-data
          mountPath: /prometheus
      # Added: thanos-sidecar
      - name: thanos
        image: registry.cn-beijing.aliyuncs.com/wilsonchai/thanos:0.36.1
        args:
        - "sidecar"
        - "--prometheus.url=http://localhost:9090"
        - "--tsdb.path=/prometheus"
        volumeMounts:
        - name: prometheus-data
          mountPath: /prometheus
      # End of addition
      volumes:
      - name: prometheus-config
        configMap:
          defaultMode: 420
          name: prometheus-cm
      - emptyDir: {}
        name: prometheus-data
1.3 Add a Service for thanos
apiVersion: v1
kind: Service
metadata:
  name: thanos-sidecar-service
  namespace: prometheus
spec:
  ports:
  - name: thanos-sidecar-port
    port: 10901
    protocol: TCP
    targetPort: 10901
  selector:
    app: prometheus
  type: NodePort
Following the same pattern, modify the other Prometheus instance, the one dedicated to collecting node metrics.
2. Deploy thanos-query
docker run -d --net=host \
--name thanos-query \
registry.cn-beijing.aliyuncs.com/wilsonchai/thanos:0.36.1 \
query \
--http-address "0.0.0.0:39090" \
--grpc-address "0.0.0.0:39091" \
--store "192.168.49.2:30139" \
--store "192.168.49.2:31165"
Note the addresses 192.168.49.2:30139 and 192.168.49.2:31165: the IP is the IP of the node running the thanos-sidecar pod, and the port is the NodePort exposed by the Service.
Open the thanos-query page to check.
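The NodePort does not have to be read off by hand. As a minimal sketch, assuming kubectl access to the cluster and the Service name thanos-sidecar-service from section 1.3, the helper below builds the ip:port value for --store from a node IP you pass in. The function name sidecar_store_endpoint is hypothetical, not part of Thanos or kubectl.

```shell
# Hypothetical helper: print "<node-ip>:<nodePort>" for the sidecar Service.
# Assumes the Service lives in the "prometheus" namespace and exposes a
# single port, as in section 1.3.
sidecar_store_endpoint() {
  node_ip="$1"
  svc="${2:-thanos-sidecar-service}"
  # Ask the API server for the NodePort assigned to the first Service port.
  node_port="$(kubectl -n prometheus get svc "$svc" \
    -o jsonpath='{.spec.ports[0].nodePort}')" || return 1
  [ -n "$node_port" ] || return 1
  printf '%s:%s\n' "$node_ip" "$node_port"
}
```

For example, `sidecar_store_endpoint 192.168.49.2` prints an address that can be passed to thanos-query as a --store value.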
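3. Deploy the object store MinIO
3.1 Deploy it the same way as in the receive setup
3.2 Add a ConfigMap for the sidecar
First prepare bucket.yml. Because thanos-sidecar runs inside k8s, package it as a ConfigMap.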
apiVersion: v1
kind: ConfigMap
metadata:
  name: bucket-cm
  labels:
    name: bucket-cm
  namespace: prometheus
data:
  bucket.yml: |-
    type: S3
    config:
      bucket: "wilson-test"
      endpoint: "10.22.11.156:9090"
      access_key: "zzUrkBzyqcCDXySsMLlS"
      secret_key: "nWCcztESnxnUZIKSKsELGEFdg6l6fjzhtqkARJB8"
      insecure: true
3.3 Modify thanos-sidecar
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-deploy
  namespace: prometheus
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: registry.cn-beijing.aliyuncs.com/wilsonchai/prometheus:v2.54.1
        args:
        - "--storage.tsdb.retention.time=12h"
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus/"
        - "--storage.tsdb.min-block-duration=30m"
        - "--storage.tsdb.max-block-duration=30m"
        - --web.enable-lifecycle
        ports:
        - containerPort: 9090
        resources:
          requests:
            cpu: 500m
            memory: 500M
          limits:
            cpu: 1
            memory: 1Gi
        volumeMounts:
        - name: prometheus-config
          mountPath: /etc/prometheus/
        - name: prometheus-data
          mountPath: /prometheus
      - name: thanos
        image: registry.cn-beijing.aliyuncs.com/wilsonchai/thanos:0.36.1
        args:
        - "sidecar"
        - "--prometheus.url=http://localhost:9090"
        - "--tsdb.path=/prometheus"
        - "--objstore.config-file=/etc/thanos/bucket.yml"
        volumeMounts:
        - name: prometheus-data
          mountPath: /prometheus
        - name: bucket-config
          mountPath: /etc/thanos/
      volumes:
      - name: prometheus-config
        configMap:
          defaultMode: 420
          name: prometheus-cm
      - name: bucket-config
        configMap:
          defaultMode: 420
          name: bucket-cm
      - emptyDir: {}
        name: prometheus-data
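A block is uploaded to object storage only after Prometheus has written it to disk, and the block duration above is set to 30m. So let's carry on with the next steps and come back later to check whether any files have been uploaded to MinIO.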
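While waiting, the sidecar's own bookkeeping can give a quicker signal than the MinIO console. The directory listing in section 5 shows that a thanos.shipper.json file appears in the TSDB directory. The sketch below assumes that this file records each uploaded block by its ULID (a 26-character ID such as 01JBDQNT0RZH4GFCFC564RWZT7) and simply counts the ULIDs it finds. Both the function name count_uploaded_blocks and the assumed file layout are illustrative, not confirmed against the Thanos source.

```shell
# Hypothetical helper: count block ULIDs read from stdin.
# Assumption: thanos.shipper.json lists every uploaded block by its ULID.
count_uploaded_blocks() {
  # ULIDs use Crockford base32: digits and upper-case letters except I, L, O, U.
  # "|| true" keeps the pipeline healthy when grep finds no match at all.
  { grep -oE '[0-9A-HJKMNP-TV-Z]{26}' || true; } | wc -l | tr -d '[:space:]'
}
```

One way to feed it, with the pod name taken from `kubectl -n prometheus get pods`: pipe the output of `kubectl -n prometheus exec <pod> -c thanos -- cat /prometheus/thanos.shipper.json` into `count_uploaded_blocks`. If the count stays at 0 after the first block has been written, the uploads are probably failing and the sidecar log is the next place to look.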
4. Deploy thanos-store
Deploy it the same way as in the receive setup.
Update the thanos-query configuration to add the thanos-store address.
docker run -d --net=host \
--name thanos-query \
registry.cn-beijing.aliyuncs.com/wilsonchai/thanos:0.36.1 \
query \
--http-address "0.0.0.0:39090" \
--grpc-address "0.0.0.0:39091" \
--store "192.168.49.2:30139" \
--store "192.168.49.2:31165" \
--store "10.22.11.156:10901"
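Once it is added, check the thanos-query web page.
5. Adjusting pod permissions
Everything is in place, so let's go back and see whether any files were uploaded to MinIO. The bucket is completely empty. What happened? Let's look at the thanos-sidecar log.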
▶ kubectl -n prometheus logs prometheus-deploy-6f8c5549b9-rqqk6 -c thanos
...
ts=2024-10-30T06:03:23.704299583Z caller=sidecar.go:410 level=warn err="upload 01JBDQNT0RZH4GFCFC564RWZT7: hard link block: hard link file chunks/000001: link /prometheus/01JBDQNT0RZH4GFCFC564RWZT7/chunks/000001 /prometheus/thanos/upload/01JBDQNT0RZH4GFCFC564RWZT7/chunks/000001: operation not permitted" uploaded=0
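What is going on? A permission error. Let's calmly work through how thanos-sidecar uploads data.
- First, the data files are produced by Prometheus. thanos-sidecar should upload those files directly; that is the simplest strategy, since it avoids copying them into its own directory and the extra disk usage that would bring.
- Because the pod has 2 containers, their processes start as different users and groups, and Prometheus and thanos-sidecar share the same directory, /prometheus. The subdirectories and files the 2 containers create there end up with inconsistent ownership and permissions. The initial diagnosis: the 2 containers' different start users cause the permission problem.
- Logging in to the Prometheus container and entering /prometheus confirms it:
/prometheus $ ls -lrt
total 44
-rw-r--r-- 1 nobody nobody 20001 Oct 30 02:46 queries.active
-rw-r--r-- 1 nobody nobody 0 Oct 30 02:46 lock
-rw-r--r-- 1 1001 root 37 Oct 30 03:31 thanos.shipper.json
drwxr-xr-x 3 nobody nobody 4096 Oct 30 03:31 01JBDQNT0RZH4GFCFC564RWZT7
- According to the log, the source file lives under /prometheus, and thanos-sidecar creates a hard link to it under /prometheus/thanos/. Check the source file first: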
/prometheus/01JBDQNT0RZH4GFCFC564RWZT7/chunks $ ls -lrt
total 96
-rw-r--r-- 1 nobody nobody 88911 Oct 30 03:31 000001
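- The source file grants write permission only to its owner; group and others can only read it. Then it hit me: on Linux with fs.protected_hardlinks enabled, which many distributions turn on by default, a user who does not own a file needs both read and write permission on it to create a hard link. A quick check: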
▶ id
uid=1000(wilson) gid=1000(wilson) groups=1000(wilson)
▶ touch /tmp/test
▶ sudo chown root.root /tmp/test
▶ sudo chmod 644 /tmp/test
▶ ln /tmp/test /tmp/ttttt
ln: failed to create hard link '/tmp/ttttt' => '/tmp/test': Operation not permitted
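At this point the picture is fairly clear. The 2 containers in the pod run as different users, so the files they create have different owners, yet they share the same directory. The data files Prometheus creates have mode 644, which gives no write permission to group or others. thanos-sidecar needs to hard-link those data files into its own directory, and because it has no write permission on them, creating the hard link fails.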
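The rule behind this failure can be written down as a small check. The sketch below is a simplified approximation, assuming a Linux node with fs.protected_hardlinks=1: it reports whether the current user either owns a file or can both read and write it. The function name can_hardlink is hypothetical, and the kernel applies further conditions (for example on special files and setuid binaries) that are not modelled here.

```shell
# Hypothetical helper: succeed if the current user would plausibly be allowed
# to hard-link the given file under fs.protected_hardlinks=1.
can_hardlink() {
  src="$1"
  # A missing file cannot be linked at all.
  [ -e "$src" ] || return 1
  # The owner of the file is always allowed to link it.
  [ -O "$src" ] && return 0
  # Anyone else needs read AND write access to the file.
  [ -r "$src" ] && [ -w "$src" ]
}
```

Run as the sidecar's user (uid 1001 in the listing above) against /prometheus/01JBDQNT0RZH4GFCFC564RWZT7/chunks/000001, the check fails, because that file is owned by nobody with mode 644. That matches the "operation not permitted" error in the log.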
There are many possible fixes; here is the simplest one. Since both containers live in a single k8s pod, it is enough to start them as the same user:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: prometheus
  name: prometheus-deploy
  namespace: prometheus
spec:
  ...
  template:
    ...
    spec:
      securityContext:
        runAsUser: 555
      containers:
      ...
Add a securityContext with an arbitrary user ID (I picked 555 here), then restart and log in to the Prometheus container again to check.
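The same check can be scripted instead of logging in by hand. This is a minimal sketch, assuming kubectl access, the container names prometheus and thanos from the Deployment above, and an id binary in both images. The function name containers_share_uid is hypothetical.

```shell
# Hypothetical helper: succeed only if both containers of a pod run as the same uid.
containers_share_uid() {
  pod="$1"
  # Ask each container which numeric uid its processes run as.
  uid_prom="$(kubectl -n prometheus exec "$pod" -c prometheus -- id -u)" || return 1
  uid_thanos="$(kubectl -n prometheus exec "$pod" -c thanos -- id -u)" || return 1
  echo "prometheus=$uid_prom thanos=$uid_thanos"
  [ "$uid_prom" = "$uid_thanos" ]
}
```

Pass it the pod name reported by `kubectl -n prometheus get pods`; a zero exit status means both containers report the same uid.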
Problem solved.
That concludes this article.
My knowledge is limited, so if I have slipped up anywhere, corrections are very welcome.