Collect app dumps on Kubernetes
Background
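When the application in a container crashes, the container terminates with it (for example after the process is killed with SIGKILL), and any dump file written to the container's own filesystem is lost once the container is removed. To keep dump files available for later analysis of the crash, a mechanism is needed to collect and store them.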
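How it works
- Mount a hostPath volume on the application's dump output path. For a Java program, the dump path can usually be set at startup with -XX:HeapDumpPath=$HEAP_DUMP_PATH, so the dump file is written to the disk of the node;
- Run a DaemonSet that watches the dump directory on each node and, when it detects events such as CREATE or MODIFY in that directory, syncs the directory to the dumps file server.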
Design
Application pod: mount a hostPath volume on the dump path
// Program will crash if JVM's maximum memory is less than 12MB
public class HelloWorld {
    static final int SIZE = 2 * 1024 * 1024; // 2M ints * 4 bytes = 8 MB

    public static void main(String[] args) {
        try {
            System.out.println("Hello World");
            Thread.sleep(30 * 1000);
            int[] i = new int[SIZE]; // OutOfMemoryError is an Error, so the catch below does not handle it
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
# run.sh
java -Xmx12m -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintGCApplicationStoppedTime -XX:HeapDumpPath=dumps/target -jar HelloWorld.jar
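Every pod scheduled on a node writes into the same hostPath directory, so dumps from different pods can end up side by side. The following is a minimal sketch of how run.sh could derive a per-pod HEAP_DUMP_PATH. It assumes the volume is mounted at /app/dumps and that the pod name is available as $HOSTNAME (the Kubernetes default); heap_dump_path is a hypothetical helper, not part of the original script.

```shell
#!/usr/bin/env bash
# Sketch: build a per-pod heap dump directory under the mounted volume.
# heap_dump_path is a hypothetical helper, not part of the original run.sh.
heap_dump_path() {
    local base="${1%/}"   # mount path of the hostPath volume, trailing slash removed
    local pod="$2"        # pod name; by default the container hostname equals it
    printf '%s/%s\n' "$base" "$pod"
}

# Possible use in run.sh (kept as comments so sourcing this file has no side effects):
#   HEAP_DUMP_PATH="$(heap_dump_path /app/dumps "$HOSTNAME")"
#   mkdir -p "$HEAP_DUMP_PATH"   # create the directory before the JVM needs it
#   java -Xmx12m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath="$HEAP_DUMP_PATH" -jar HelloWorld.jar
```

When HeapDumpPath points at an existing directory, the JVM names the file java_pid<pid>.hprof inside it, so the per-pod directory is what keeps dumps from different pods apart.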
spec:
  containers:
  - image: rhzx3519/ubuntu:dumps
    name: ubuntu
    volumeMounts:
    - name: vardumps
      mountPath: /app/dumps # mounted on the app's dump path
  volumes:
  - name: vardumps
    hostPath:
      path: /var/opt/dumps # dumps directory on the node
Monitoring pod
# monitor.sh
# $filename is the directory to watch; $script is the script to run on each event
inotifywait -mrq --format '%e' --event create,delete,modify "$filename" | while read -r event
do
    case $event in
        MODIFY|CREATE|DELETE) bash "$script" ;;
    esac
done
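A heap dump is written over many MODIFY events, so a script triggered on the first event may see a half-written file. One possible guard, sketched here under the assumption that a dump is complete once its size stops changing, is to poll the file size before uploading. wait_until_stable and its parameters are hypothetical names, not part of the original monitor.sh.

```shell
#!/usr/bin/env bash
# wait_until_stable FILE [INTERVAL_SECONDS] [MAX_CHECKS]
# Returns 0 once FILE reports the same size on two consecutive checks.
# Returns 1 if FILE is missing or is still changing after MAX_CHECKS checks.
wait_until_stable() {
    local file="$1" interval="${2:-2}" max_checks="${3:-30}"
    local prev=-1 size i
    for ((i = 0; i < max_checks; i++)); do
        [ -f "$file" ] || return 1
        # wc -c is used instead of stat because stat flags differ between platforms
        size=$(( $(wc -c < "$file") ))
        if [ "$size" -eq "$prev" ]; then
            return 0
        fi
        prev=$size
        sleep "$interval"
    done
    return 1
}
```

To use it from the watch loop, inotifywait would need to print the path as well as the event (for example with a format such as '%e %w%f'), and the loop could then call wait_until_stable on that path before running the upload script.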
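# upload.sh
# clear and upload are helper functions defined elsewhere in the script
function main() {
    (cd "$dumpsDir" && clear) # recursively clean up dump files
    upload # upload the dump files to the file server host
}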
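The body of the cleanup step is not shown above. As an illustration only, the sketch below assumes that cleaning means deleting heap dumps older than a retention window, so that the node's disk does not fill up. clear_old_dumps, the *.hprof pattern, and the default of 1440 minutes are all assumptions rather than the original implementation.

```shell
#!/usr/bin/env bash
# clear_old_dumps DIR [MINUTES]
# Recursively deletes *.hprof files under DIR whose last modification is
# more than MINUTES minutes old. Other files and all directories are kept.
clear_old_dumps() {
    local dir="$1" minutes="${2:-1440}"   # default retention: one day
    [ -d "$dir" ] || return 1
    # -mmin and -delete are supported by GNU and BSD find
    find "$dir" -type f -name '*.hprof' -mmin +"$minutes" -delete
}

# Possible use inside main():
#   clear_old_dumps "$dumpsDir" 1440
```

Giving the function its own name also avoids shadowing the standard clear command, which resets the terminal.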
File server
The file server is implemented with nginx.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: file-server
  name: file-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: file-server
  template:
    metadata:
      labels:
        app: file-server
    spec:
      containers:
      - image: nginx:stable
        name: file-server
        volumeMounts:
        - mountPath: /etc/nginx/conf.d
          name: config
          readOnly: true
        - mountPath: /data/repo
          name: file-storage
        ports:
        - containerPort: 80
          protocol: TCP
          name: http
        - containerPort: 22
          protocol: TCP
          name: ftp
      volumes:
      - name: file-storage
        emptyDir: {}
      - name: config
        configMap:
          name: nginx-conf-configmap
---
# nginx
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-conf-configmap
data:
  nginx.conf: |-
    server {
      listen 80;
      listen [::]:80;
      server_name localhost;
      charset utf-8;
      # download
      autoindex on; # enable directory listing output
      autoindex_exact_size off; # output file sizes rounded to kilobytes, megabytes, and gigabytes
      autoindex_localtime on; # output local times in the directory
      autoindex_format html; # render the directory listing as HTML in the browser
      location / {
        root /data/repo/;
      }
    }
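With root set to /data/repo/ under location /, nginx serves a file stored at /data/repo/<path> under the URL path /<path>. The sketch below turns a stored file path into its download URL. dump_url is a hypothetical helper, and the host name file-server assumes a Service of that name in front of the Deployment, which is not defined in the manifests above.

```shell
#!/usr/bin/env bash
# dump_url BASE_URL ROOT FILE
# Prints the URL under which nginx serves FILE, given the configured root.
# Returns 1 if FILE is not located under ROOT. No URL encoding is applied.
dump_url() {
    local base="${1%/}" root="${2%/}" file="$3"
    case "$file" in
        "$root"/*) printf '%s/%s\n' "$base" "${file#"$root"/}" ;;
        *) return 1 ;;
    esac
}

# Possible use (host name and file path are illustrative):
#   curl -fO "$(dump_url http://file-server /data/repo/ /data/repo/node-1/java_pid1.hprof)"
```

Because autoindex is on, opening a directory URL in a browser lists the dumps stored under it.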