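Today I went through the logs of a PVE cluster and found a persistent error. A single message had bloated /var/log/syslog to 600M. It is essentially this one error: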
<14>Nov 15 09:20:10 xnode010 lxcfs[2201]: proc_fuse.c: 1018: proc_stat_read: cpu0 from /lxc/113/ns has unexpected cpu time: 20509567 in /proc/stat, 25054664 in cpuacct.usage_all; unable to determine idle time
<14>Nov 15 09:20:08 xnode010 lxcfs[2201]: proc_fuse.c: 1018: proc_stat_read: cpu0 from /lxc/113/ns has unexpected cpu time: 20509565 in /proc/stat, 25054662 in cpuacct.usage_all; unable to determine idle time
<14>Nov 15 09:20:03 xnode010 lxcfs[2201]: proc_fuse.c: 1018: proc_stat_read: cpu0 from /lxc/113/ns has unexpected cpu time: 20509559 in /proc/stat, 25054653 in cpuacct.usage_all; unable to determine idle time
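To get a rough idea of how many lines this one message contributes, a small helper like the following could be used. This is only a sketch: the function name `count_idle_spam` is made up for illustration, and the default path is the /var/log/syslog mentioned above.

```shell
# Count the lines in a log file that contain the lxcfs idle-time message.
# count_idle_spam is a hypothetical helper name, not part of PVE or lxcfs.
count_idle_spam() {
    logfile="${1:-/var/log/syslog}"
    # -F: match a fixed string, -c: print the number of matching lines.
    # grep exits non-zero when nothing matches, so "|| true" keeps the
    # function usable under "set -e" (the count is still printed as 0).
    grep -c -F -- 'unable to determine idle time' "$logfile" || true
}
```

Calling `count_idle_spam` with no argument reads /var/log/syslog. Passing a path such as a rotated log file counts that file instead.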
At first glance this looks like a CPU usage problem, but on closer inspection CPU usage is actually very low:
# top -bn1 | head
top - 09:45:45 up 159 days, 20:00, 3 users, load average: 0.20, 0.16, 0.17
Tasks: 733 total, 1 running, 732 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.4 sy, 0.0 ni, 99.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 257551.1 total, 140682.6 free, 8178.7 used, 108689.9 buff/cache
MiB Swap: 8192.0 total, 8034.0 free, 158.0 used. 242308.6 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2302 root 10 -10 3039908 361588 12140 S 5.9 0.1 2633:36 ovs-vswitchd
363712 root rt 0 580032 184640 51464 S 5.9 0.1 4208:06 corosync
2198179 root 20 0 0 0 0 I 5.9 0.0 0:45.37 kworker/u80:2-ixgbe
The CPU assignments of the containers do not overload any single CPU either:
# pct cpusets
----------------------------------------------------------------------------------------------------------------
100: 2
106: 3 24
110: 0 15
113: 6 26 36 38
149: 9 10 12 17
190: 8 13
----------------------------------------------------------------------------------------------------------------
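The messages come from lxcfs.service.
The official forum says they are caused by running lxcfs with '-l', but in practice they also show up without '-l' (the service here is started without it):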
https://forum.proxmox.com/threads/syslog-is-spammed-with-unable-to-determine-idle-time.55032/
# systemctl status lxcfs.service
● lxcfs.service - FUSE filesystem for LXC
Loaded: loaded (/lib/systemd/system/lxcfs.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2021-06-08 13:44:57 CST; 5 months 7 days ago
Docs: man:lxcfs(1)
Main PID: 2201 (lxcfs)
Tasks: 11 (limit: 308995)
Memory: 28.5M
CGroup: /system.slice/lxcfs.service
└─2201 /usr/bin/lxcfs /var/lib/lxcfs
Nov 15 09:35:27 xnode010 lxcfs[2201]: proc_fuse.c: 1018: proc_stat_read: cpu0 from /lxc/113/ns has unexpected cpu time: 20511032 in /proc/stat, 25056479 in cp
Nov 15 09:35:29 xnode010 lxcfs[2201]: proc_fuse.c: 1018: proc_stat_read: cpu0 from /lxc/113/ns has unexpected cpu time: 20511032 in /proc/stat, 25056480 in cp
Nov 15 09:35:34 xnode010 lxcfs[2201]: proc_fuse.c: 1018: proc_stat_read: cpu0 from /lxc/113/ns has unexpected cpu time: 20511038 in /proc/stat, 25056488 in cp
Nov 15 09:35:39 xnode010 lxcfs[2201]: proc_fuse.c: 1018: proc_stat_read: cpu0 from /lxc/113/ns has unexpected cpu time: 20511043 in /proc/stat, 25056497 in cp
Nov 15 09:35:44 xnode010 lxcfs[2201]: proc_fuse.c: 1018: proc_stat_read: cpu0 from /lxc/113/ns has unexpected cpu time: 20511049 in /proc/stat, 25056505 in cp
Since this is a false alarm, the simplest fix is to keep it out of the log.
Create a new configuration file:
# cat /etc/rsyslog.d/pve-local.conf
# filter out
:msg, contains, "unable to determine idle time" stop
Restart the logging service for the change to take effect:
# systemctl restart rsyslog
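The rule above is an rsyslog property-based filter: it checks whether the message text contains the given substring, and `stop` discards the message if it does. The match itself can be approximated in plain shell as below. This is a sketch of the substring test only, not of rsyslog, and `would_stop` is a hypothetical name.

```shell
# Emulate the "contains" test of the rsyslog rule on a single message string.
# would_stop is a hypothetical helper; it returns 0 if the rule would match.
would_stop() {
    case "$1" in
        *"unable to determine idle time"*) return 0 ;;  # rule matches, message is dropped
        *) return 1 ;;                                  # rule does not match, message is kept
    esac
}

# Example: one message the rule drops and one it keeps.
for msg in \
    "proc_stat_read: cpu0 from /lxc/113/ns has unexpected cpu time; unable to determine idle time" \
    "some other lxcfs message"
do
    if would_stop "$msg"; then
        echo "dropped: $msg"
    else
        echo "kept: $msg"
    fi
done
```

Because the filter matches on a substring of the message alone, it drops this message no matter which program logs it, which is worth keeping in mind if other services use similar wording.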