I recently ran into a memory-related problem whose debugging turned out to be quite interesting, so I'm writing it down here in the hope that it helps someone.
The Problem
The memory usage of one container in a pod kept growing. It started at around 60Mi, and after 2 days of running, kubectl top pod --containers showed that it had reached 400Mi.
However, docker stats reported a normal value. For example:
- Via kubectl, memory usage is 19Mi:
[@ ~]$kubectl top pod nginx-deployment-66979f666d-wd24b --containers
POD NAME CPU(cores) MEMORY(bytes)
nginx-deployment-66979f666d-wd24b nginx 500m 19Mi
- But via docker, memory usage is a normal 2.461Mi:
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
5d14f804062d k8s_nginx_nginx-deployment-66979f666d-wd24b_default_65cad64e-9696-4a3a-9bc6-08f93c6d263b_0 49.94% 2.461MiB / 20MiB 12.30% 0B / 0B 0B / 0B 4
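A simple way to confirm the divergence over time is to sample both views periodically and compare; a minimal sketch (pod name and container ID are the ones from the example above, the 60s interval is arbitrary):
# Sample both views once a minute; the kubectl number keeps climbing while the docker number stays flat.
while true; do
  date
  kubectl top pod nginx-deployment-66979f666d-wd24b --containers
  docker stats --no-stream 5d14f804062d
  sleep 60
done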
Debugging Process
1. Why do kubectl top and docker stats disagree?
Because the two tools compute memory usage differently. The following command shows the container's detailed memory stats:
curl --unix-socket /var/run/docker.sock "http:/v1.24/containers/5d14f804062d/stats"
{
"memory_stats":{
"usage":20332544,
"max_usage":20971520,
"stats":{
"active_anon":1990656,
"active_file":17420288,
"cache":17608704,
"dirty":0,
"hierarchical_memory_limit":20971520,
"hierarchical_memsw_limit":20971520,
"inactive_anon":4096,
"inactive_file":184320,
"mapped_file":4096,
"pgfault":7646701079,
"pgmajfault":0,
"pgpgin":1265901953,
"pgpgout":1265897156,
"rss":2039808,
"rss_huge":0,
"total_active_anon":1990656,
"total_active_file":17420288,
"total_cache":17608704,
"total_dirty":0,
"total_inactive_anon":4096,
"total_inactive_file":184320,
"total_mapped_file":4096,
"total_pgfault":7646701079,
"total_pgmajfault":0,
"total_pgpgin":1265901953,
"total_pgpgout":1265897156,
"total_rss":2039808,
"total_rss_huge":0,
"total_unevictable":0,
"total_writeback":0,
"unevictable":0,
"writeback":0
},
"failcnt":10181490,
"limit":20971520
},
}
kubectl top reports "usage": 20332544, which is roughly the 19Mi shown above (not an exact match).
docker stats reports usage - cache, i.e. 20332544 - 17608704, roughly 2.5Mi.
We can also see that cache keeps growing over time.
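Both numbers can be reproduced from the same API response; a minimal sketch, assuming jq is installed on the node (stream=false asks the stats endpoint for a single snapshot instead of a stream):
# Pull usage, cache and usage - cache out of the memory_stats block shown above.
curl -s --unix-socket /var/run/docker.sock \
  "http://localhost/v1.24/containers/5d14f804062d/stats?stream=false" \
  | jq '.memory_stats | {usage, cache: .stats.cache, "usage - cache": (.usage - .stats.cache)}'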
So two questions arise: what is cache, and why does it keep growing?
What is cache?
From https://docs.docker.com/confi...:
cache
The amount of memory used by the processes of this control group that can be associated precisely with a block on a block device. When you read from and write to files on disk, this amount increases. This is the case if you use "conventional" I/O (open, read, write syscalls) as well as mapped files (with mmap). It also accounts for the memory used by tmpfs mounts, though the reasons are unclear.
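The values under memory_stats.stats come straight from the kernel's memory cgroup, so they can also be read on the node itself. A sketch assuming cgroup v1; the exact cgroup path depends on the cgroup driver, so it is located here by searching for the container ID:
# Find the container's memory cgroup and print the page-cache related counters.
CID=$(docker inspect 5d14f804062d --format '{{.Id}}')
CGROUP_DIR=$(find /sys/fs/cgroup/memory -type d -name "*${CID}*" 2>/dev/null | head -n 1)
grep -E '^(cache|rss|active_file|inactive_file) ' "${CGROUP_DIR}/memory.stat"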
2. Why does cache keep growing?
The container in question runs very simple logic: a few trivial operations plus writing logs to a log file. Having ruled out the business logic, could the logging be the cause?
Experimentation showed that when log writing is paused, cache stops growing. The log file lives on an EmptyDir volume, i.e. on the host's disk. docker inspect shows the mount path (a small reproduction sketch follows the mount listing):
"Mounts": [
{
"Type": "bind",
"Source": "/var/lib/kubelet/pods/65cad64e-9696-4a3a-9bc6-08f93c6d263b/volumes/kubernetes.io~empty-dir/nginx-vol",
"Destination": "/tmp/memorytest",
"Mode": "Z",
"RW": true,
"Propagation": "rprivate"
},
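A quick reproduction, run inside the container, makes the link to logging obvious: keep appending to a file on the emptyDir mount and cache climbs; stop the loop and cache levels off. (Sketch: /tmp/memorytest is the Destination above, test.log is just a throwaway file name.)
# Inside the container: continuously append log lines to the emptyDir-backed file.
while true; do echo "$(date) some log line" >> /tmp/memorytest/test.log; done
# On the node, watch memory_stats.stats.cache (or kubectl top) while the loop runs,
# then stop it and observe that cache stops growing.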
So the next question is: why does writing logs to disk increase the memory cache?
3. Why does writing to disk increase the memory cache?
The reason is that Linux is borrowing unused memory for disk caching. See https://www.linuxatemyram.com... for a detailed analysis.
Note that
disk caching only borrows the ram that applications don't currently want. It will not use swap. If applications want more memory, they just take it back from the disk cache. They will not start swapping.
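This is easy to observe on any Linux box: the buff/cache column in free shrinks as soon as the memory is needed elsewhere. A sketch (needs root; dropping caches is harmless but throws the cache away, so avoid it on busy production nodes):
free -m                                            # note the buff/cache column
sync; echo 3 | sudo tee /proc/sys/vm/drop_caches   # ask the kernel to release clean page cache
free -m                                            # buff/cache shrinks, "used" stays about the same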
Since logging drives memory usage up, could it push usage all the way to the memory limit and get the container killed?
4. Will the growing memory usage lead to an OOMKill?
No, for two reasons:
- As described above, when memory approaches the limit, applications take it back from the disk cache. In other words, near the limit the logs are still written to disk, but the memory cache stops growing.
- In k8s, a container's memory resource limit is passed down to docker, equivalent to docker run --memory. In other words, the container is only OOMKilled when memory in docker's sense exceeds the limit.
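This mapping is easy to verify on the node: the pod's memory limit shows up as HostConfig.Memory on the docker container, exactly what docker run --memory would set (sketch; container ID and limit are from the example above):
docker inspect 5d14f804062d --format '{{.HostConfig.Memory}}'
# 20971520  -> 20MiB, matching hierarchical_memory_limit in the stats output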
Reference Links:
https://docs.docker.com/confi...
https://www.linuxatemyram.com...
https://www.ibm.com/support/p...