NFS nfs_inode_cache leak and excessive memory usage

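1. Symptoms

After running an application that uses an NFS file system, memory leaks slowly
until all of the server's memory is exhausted. The system then spawns multiple pdflush processes that take up to 99% of the CPU.
At that point the whole system becomes extremely slow; even typing a command takes a long time.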

2. Investigation

Find the processes using the most memory

# top -m
Tasks: 428 total,   1 running, 427 sleeping,   0 stopped,   0 zombie
Cpu(s):  5.4%us,  0.1%sy,  0.0%ni, 94.4%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32915200k total,  1576064k used, 31339136k free,   284588k buffers
Swap:  8385920k total,        0k used,  8385920k free,   317440k cached


  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                       
14099 root      15   0  144m  96m 2160 S  0.0  0.3 309:57.67 your_program

The process using the most memory is the application that uses NFS.

Check the memory mapped by the process

# pmap <PID>
...
00007fffbebfd000     12K r-x--    [ anon ]
ffffffffff600000   8192K -----    [ anon ]
 total           154604K

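However, the application itself does not actually use much memory, so at this point we start to suspect the system.

Check system memory

Sample memory usage every two seconds to see how fast memory is being consumed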

# vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 31317192 284620 337800    0    0     0     0    0    1  5  0 94  0  0
...
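
To put a number on how fast free memory shrinks, the free column of consecutive vmstat samples can be subtracted. The following is a minimal sketch and not part of the original investigation: the function name free_delta is made up for illustration, and it assumes the column layout shown above, where free is the fourth field.

```shell
# Print the change in the "free" column (KiB) between consecutive vmstat samples.
# Reads vmstat output on stdin; header lines are skipped because their first
# field is not a number. free_delta is an illustrative name, not a standard tool.
free_delta() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 4 {
             if (seen) printf "%d\n", $4 - prev
             prev = $4
             seen = 1
         }'
}
```

It could be used as `vmstat 2 10 | free_delta`; a long run of negative values while the application is idle would be consistent with a leak.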

Next, find out what is actually using so much memory, starting with kernel space

# vmstat -m 
Cache                       Num  Total   Size  Pages
nfs_direct_cache              0      0    136     28
nfs_write_data              189    207    832      9
nfs_read_data                32     40    768      5
nfs_inode_cache          1027389 1027389   1032      3
nfs_page                    304    360    128     30
...
kmem_cache                  144    144   2688      1

nfs_inode_cache turns out to be abnormal and is occupying a large amount of memory.
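
With more than a hundred caches in the listing, the large one is easier to spot if each row is converted to a size. The sketch below multiplies the Total and Size columns of the `vmstat -m` output and sorts the result; slab_usage_mib is a hypothetical helper name, and the result is only an approximation because it ignores per-slab overhead.

```shell
# Rank slab caches by approximate memory use (objects x object size, in MiB).
# Reads `vmstat -m` output on stdin; assumes the columns Cache Num Total Size Pages.
# slab_usage_mib is an illustrative name, not an existing command.
slab_usage_mib() {
    awk 'NF == 5 && $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ {
             printf "%s %.1f\n", $1, $3 * $4 / 1048576
         }' | LC_ALL=C sort -k2,2 -nr
}
```

For example, `vmstat -m | slab_usage_mib | head` would list the ten largest caches. For the nfs_inode_cache row above this gives roughly 1011 MiB (1027389 objects of 1032 bytes).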

Use the slabtop command to view kernel slab cache information

# slabtop -o -s c | head
 Active / Total Objects (% used)    : 3318152 / 3333777 (99.5%)
 Active / Total Slabs (% used)      : 561213 / 561243 (100.0%)
 Active / Total Caches (% used)     : 92 / 145 (63.4%)
 Active / Total Size (% used)       : 1848724.93K / 1851100.14K (99.9%)
 Minimum / Average / Maximum Object : 0.02K / 0.55K / 128.00K


  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME                   
1014219 1014219 100%    1.01K 338073        3   1352292K nfs_inode_cache
1021475 1021475 100%    0.52K 145925        7    583700K radix_tree_node

We can see that nfs_inode_cache is using about 1.3 GiB (1352292K).
Under normal conditions nfs_inode_cache should be around 30 MiB, so its size here is abnormal.

3. Cause

Red Hat's official explanation:
https://access.redhat.com/site/solutions/64667
"nfs_inode_cache usage is high compared to older kernels on RHEL 5 and RHEL 6,
and possibly system hangs with pdflush / flush consuming CPU"

Issue
. nfs_inode_cache usage is high compared to older kernels.
. pdflush or flush processes consuming 100% CPU, or soft lockups with pdflush or flush process running in nfs_flush_inode
. nfs_inode_cache grows uncontrolled and memory pressure does not release the memory.
. Under certain type of load the NFS performance from RHEL 6.1 client becomes very slow.
The case scenario: after writing a large number of small sized files in a sequence,
the write speed for large sized files becomes very slow, around 5 MB/sec or even less.
When writing large sized files is not preceded by a sequence of writing large number of small sized files
the speed is normal – around 200 MB/sec.
. nfs_inode_cache getting very big and eating the total amount of RAM
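
The write pattern described in the list above (many small files followed by one large file) can be approximated with a short probe. This is only a rough sketch under assumptions: nfs_write_probe and its parameters are invented for illustration, the target directory is assumed to sit on the NFS mount under test, `conv=fsync` is a GNU dd option, and timing is limited to whole seconds.

```shell
# Write many small files, then time one large sequential write, in directory $1.
# $2 = number of small files, $3 = size of the large file in MiB.
# nfs_write_probe and its defaults are illustrative, not taken from Red Hat.
nfs_write_probe() {
    dir=$1
    count=${2:-1000}
    large_mib=${3:-64}
    i=0
    while [ "$i" -lt "$count" ]; do
        printf 'x' > "$dir/small_$i" || return 1
        i=$((i + 1))
    done
    start=$(date +%s)
    # conv=fsync (GNU dd) forces the data out before dd exits.
    dd if=/dev/zero of="$dir/large.bin" bs=1048576 count="$large_mib" conv=fsync 2>/dev/null || return 1
    end=$(date +%s)
    echo "wrote ${large_mib} MiB in $((end - start)) s"
}
```

Running it once with a small file count and once with a large one, on the same mount, gives a crude comparison of the large-file write time in the two situations.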

Environment
Red Hat Enterprise Linux 5.7 (2.6.18-274)
Red Hat Enterprise Linux 6.1 (2.6.32-131.0.15)
NFS client
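
A quick way to see whether a client runs one of the two kernel builds named above is to compare the output of `uname -r` against them. The sketch below does only that: is_listed_kernel is a hypothetical helper, and a non-match does not prove a kernel is unaffected, since the excerpt lists just these two builds.

```shell
# Succeed (exit 0) if the kernel release string starts with one of the two
# builds listed in the Environment section. is_listed_kernel is an
# illustrative name; other kernels may or may not be affected.
is_listed_kernel() {
    case "$1" in
        2.6.18-274*|2.6.32-131.0.15*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: `is_listed_kernel "$(uname -r)" && echo "kernel build is one of those listed"`.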

So the problem is caused by a kernel bug.

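4. Solutions

Upgrade the kernel
Periodically release the memory by hand

/proc is a virtual file system; reading and writing its files is one way of communicating with the kernel.
In other words, the behavior of the running kernel can be adjusted by modifying files under /proc.
So we can release memory by writing to /proc/sys/vm/drop_caches.
The steps are as follows: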

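# cat /proc/sys/vm/drop_caches               [First, note that the value of /proc/sys/vm/drop_caches defaults to 0.]
0
# sync                                       [Run sync by hand. The sync command runs the sync subroutine.
                                               If the system must be stopped, run sync to ensure file system integrity.
                                               sync writes all unwritten system buffers to disk,
                                               including modified i-nodes, delayed block I/O and read-write mapped files.]
# echo 2 > /proc/sys/vm/drop_caches          [Set /proc/sys/vm/drop_caches to 2.]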
# cat /proc/sys/vm/drop_caches
2
# free -m
             total       used       free     shared    buffers     cached
Mem:           249         66        182          0          0         11
-/+ buffers/cache:         55        194
Swap:          511          0        511

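Running free again shows used at 66 MB, free at 182 MB, buffers at 0 MB and cached at 11 MB,
so the memory has been released effectively. Writing 2 asks the kernel to free reclaimable slab objects (dentries and inodes), which is where nfs_inode_cache lives; 1 frees the page cache and 3 frees both.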
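
To see how much was actually released, the relevant counters can be read from /proc/meminfo before and after the drop. This is a small sketch: meminfo_kib is a hypothetical helper name, and it assumes the usual `Key:   value kB` layout of /proc/meminfo.

```shell
# Print one /proc/meminfo value in kB, for example MemFree or Slab.
# $1 = key without the colon, $2 = file to read (defaults to /proc/meminfo).
# meminfo_kib is an illustrative name, not an existing command.
meminfo_kib() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Example use (as root):
#   before=$(meminfo_kib Slab)
#   sync; echo 2 > /proc/sys/vm/drop_caches
#   after=$(meminfo_kib Slab)
#   echo "slab shrank by $((before - after)) kB"
```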

This can also be set up as a scheduled task

# crontab -e
0 */4 * * * sync && echo 2 > /proc/sys/vm/drop_caches
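
Dropping caches every four hours regardless of need also throws away useful cached inodes. One alternative, sketched here under assumptions, is to drop them only when nfs_inode_cache has grown past a limit. drop_if_over is a hypothetical function; it estimates the cache size from /proc/slabinfo as number of objects times object size, and its third and fourth arguments exist only so that it can be tried against ordinary files.

```shell
# Drop dentries and inodes only when one slab cache exceeds a limit.
# $1 = cache name, $2 = limit in KiB,
# $3 = slabinfo file (default /proc/slabinfo, readable by root),
# $4 = control file (default /proc/sys/vm/drop_caches, writable by root).
# drop_if_over is an illustrative name; the size is objects x object size.
drop_if_over() {
    slabinfo=${3:-/proc/slabinfo}
    target=${4:-/proc/sys/vm/drop_caches}
    kib=$(awk -v n="$1" '$1 == n { printf "%d\n", $3 * $4 / 1024 }' "$slabinfo")
    [ -n "$kib" ] || return 2    # cache not found in slabinfo
    if [ "$kib" -gt "$2" ]; then
        sync && echo 2 > "$target"
    fi
}
```

Saved in a script, it could be called from cron as `drop_if_over nfs_inode_cache 30720`, where 30720 KiB corresponds to the 30 MiB that the article treats as a normal size; the right limit for a given server is a judgment call.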
