reclaim any target process anytime, anywhere....echo file > /proc/PID/reclaim Reclaim anonymous pages only....echo anon > /proc/PID/reclaim Reclaim all pages echo all > /proc/PID/reclaim Some pages could be shared...In such scenario, per-process reclaim is rather coarse-grained and now supports more fine-grained reclaim...echo [addr] [size-byte] > /proc/pid/reclaim
= gfp_mask & __GFP_DIRECT_RECLAIM; const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER; struct...ac->preferred_zoneref->zone) goto nopage; if (gfp_mask & __GFP_KSWAPD_RECLAIM) wake_all_kswapds.... */ if (compact_result == COMPACT_DEFERRED) goto nopage; /* * Looks like reclaim/...can_direct_reclaim) goto nopage; /* Make sure we know about allocations which stall for too long...*/ if (current->flags & PF_MEMALLOC) goto nopage; /* Try direct reclaim and then allocating */
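1. Reclaim-related commands 2. State before running reclaim 3. State after running reclaim 1. Reclaim-related commands Check the current partition information: # /opt/oracle.SupportTools.../reclaimdisks.sh -check Before one-click deployment, reclaim must first be run on all DB nodes: # /opt/oracle.SupportTools/reclaimdisks.sh -free...-reclaim 2. State before running reclaim Check the current partition information: # /opt/oracle.SupportTools/reclaimdisks.sh -check Model is...LVDoNotRemoveOrUse VGExaDb -wi-a----- 1.00g 3. Running reclaim...state afterwards Run reclaim: # /opt/oracle.SupportTools/reclaimdisks.sh -free -reclaim Then query the vgs/pvs/lvs information again, and you will find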
zone_allows_reclaim(ac->preferred_zoneref->zone, zone)) continue; Source path: linux-4.12\mm\page_alloc.c...(zone->zone_pgdat, gfp_mask, order); switch (ret) { case NODE_RECLAIM_NOSCAN: /* did not scan...: /* did we reclaim enough */ if (zone_watermark_ok(zone, order, mark, ac_classzone_idx...zone_allows_reclaim(ac->preferred_zoneref->zone, zone)) continue; ret = node_reclaim(zone->zone_pgdat...; case NODE_RECLAIM_FULL: /* scanned but unreclaimable */ continue; default: /* did
zone_reclaim_mode Zone_reclaim_mode allows someone to set more or less aggressive approaches to reclaim...If it is set to zero then no zone reclaim occurs....This is value ORed together of 1 = Zone reclaim on 2 = Zone reclaim writes dirty pages out 4...= Zone reclaim swaps pages zone_reclaim_mode is disabled by default....Note: zone_reclaim_mode defaults to 0, meaning zone reclaim is not enabled; 1 enables zone reclaim, which reclaims memory from the local node. min_free_kbytes sets the minimum amount of free memory the kernel tries to keep available.
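Because the value is a bitmask, a setting such as 5 combines "zone reclaim on" with "zone reclaim swaps pages". The following is a minimal user-space sketch of that decoding, assuming only the three bit values quoted in the snippet above; the enum, struct, and function names are hypothetical and are not kernel identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values as listed in the snippet; the names are made up for this sketch. */
enum {
    ZRM_RECLAIM_ON  = 1, /* zone reclaim on */
    ZRM_WRITE_DIRTY = 2, /* zone reclaim writes dirty pages out */
    ZRM_SWAP        = 4, /* zone reclaim swaps pages */
};

/* The three documented bits together cover the values 0..7. */
static_assert((ZRM_RECLAIM_ON | ZRM_WRITE_DIRTY | ZRM_SWAP) == 7,
              "documented bits should OR to 7");

struct zrm_flags {
    bool reclaim_on;
    bool write_dirty;
    bool swap;
};

/* Split a zone_reclaim_mode value into its individual options. */
static struct zrm_flags decode_zone_reclaim_mode(unsigned int mode)
{
    struct zrm_flags f = {
        .reclaim_on  = (mode & ZRM_RECLAIM_ON) != 0,
        .write_dirty = (mode & ZRM_WRITE_DIRTY) != 0,
        .swap        = (mode & ZRM_SWAP) != 0,
    };
    return f;
}
```

Calling `decode_zone_reclaim_mode(0)` yields all three fields false, which matches the "no zone reclaim occurs" default described above.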
* Ensure kswapd doesn't accidentally go to sleep as long as we loop */ if (gfp_mask & __GFP_KSWAPD_RECLAIM...can_direct_reclaim) goto nopage; Source path: linux-4.12\mm\page_alloc.c#3817 Call the __alloc_pages_direct_reclaim...function to reclaim pages directly; /* Try direct reclaim and then allocating */ page = __alloc_pages_direct_reclaim(gfp_mask...can_direct_reclaim) goto nopage; /* Make sure we know about allocations which stall for too long...*/ if (current->flags & PF_MEMALLOC) goto nopage; /* Try direct reclaim and then allocating */
Call the wake_all_kswapds function to reclaim physical memory pages asynchronously; "asynchronous" here means that the pages are reclaimed by waking the "reclaim threads"; if (gfp_mask & __GFP_KSWAPD_RECLAIM...got_pg; Source path: linux-4.12\mm\page_alloc.c#3743 IV. Direct memory allocation ---- The order of the requested physical pages, when the following 3 conditions are met: can_direct_reclaim...to ignore * watermarks, as the ALLOC_NO_WATERMARKS attempt didn't yet happen. */ if (can_direct_reclaim...want to heavily disrupt the * system, so we fail the allocation instead of entering * direct reclaim.... */ if (compact_result == COMPACT_DEFERRED) goto nopage; /* * Looks like reclaim/
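To mitigate this, the kubelet can define a minimum-reclaim for each resource....Once the kubelet detects resource pressure, it tries to reclaim at least minimum-reclaim of that resource so that usage returns to the desired range....eviction-hard=memory.available<500Mi,nodefs.available<1Gi,imagefs.available<100Gi --eviction-minimum-reclaim...By default, eviction-minimum-reclaim is 0 for all resources.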
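The arithmetic behind minimum-reclaim can be sketched as follows. This is only an illustration of the idea described above, under the assumption that the kubelet keeps reclaiming until the available amount reaches the eviction threshold plus the minimum-reclaim value. The function name is hypothetical, and the minimum-reclaim values in the usage note are made up because the snippet elides the real ones.

```c
#include <assert.h>
#include <stdint.h>

/*
 * How much of a resource must be freed once the eviction threshold is
 * crossed. All quantities use the same unit (for example MiB).
 * Assumption: reclaim continues until available >= threshold + minimum_reclaim.
 */
static uint64_t amount_to_reclaim(uint64_t available, uint64_t threshold,
                                  uint64_t minimum_reclaim)
{
    /* Guard against overflow when computing the reclaim target. */
    assert(threshold <= UINT64_MAX - minimum_reclaim);

    if (available >= threshold)
        return 0; /* no resource pressure, nothing to reclaim */

    return threshold + minimum_reclaim - available;
}
```

With the `memory.available<500Mi` threshold from the snippet and a hypothetical minimum-reclaim of 100 MiB, a node with 450 MiB available would need `amount_to_reclaim(450, 500, 100)`, that is 150 MiB, to be freed. With the default minimum-reclaim of 0 the same node only needs 50 MiB, which leaves it sitting right at the threshold.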
zone_allows_reclaim(ac->preferred_zoneref->zone, zone)) continue; /* reclaim "memory pages not mapped into a process's virtual address space" from the node,...then check the watermark */ ret = node_reclaim(zone->zone_pgdat, gfp_mask, order); switch (ret) { case NODE_RECLAIM_NOSCAN...: /* did not scan */ continue; case NODE_RECLAIM_FULL: /* scanned but unreclaimable */...can_direct_reclaim) goto nopage; /* Avoid recursion of direct reclaim */ if (current->flags & PF_MEMALLOC...) goto nopage; /* Try direct reclaim and then allocating */ /* direct page reclaim, then allocate pages */ page = __alloc_pages_direct_reclaim
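zone_watermark_fast(zone, order, mark, ac_classzone_idx(ac), alloc_flags)) { ret = node_reclaim...; case NODE_RECLAIM_FULL: continue; default: if (zone_watermark_ok(zone,...As this figure shows: if the number of free pages falls below the min value, the zone is severely short of pages and reclaim pressure is high; the application's memory writes are blocked and reclaim happens directly in the application's process context, i.e. direct reclaim....Watermark tuning on Android To avoid direct reclaim, free memory must always stay above the min value....However, the default gap between low and min in Linux is fairly small, so direct reclaim is easily triggered.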
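In a context that cannot sleep, requesting memory with an ordinary flag such as GFP_KERNEL may trigger direct memory reclaim and therefore sleep, so GFP_KERNEL is only suitable for calls from process context:...#define GFP_KERNEL \ (__GFP_RECLAIM | __GFP_IO | __GFP_FS) #define __GFP_RECLAIM \ ((__force...gfp_t)(___GFP_DIRECT_RECLAIM|___GFP_KSWAPD_RECLAIM)) Memory watermarks, PF_MEMALLOC and GFP_ATOMIC So does GFP_ATOMIC merely mean "cannot sleep"...The answer is no: GFP_ATOMIC is also tied to the memory reclaim watermarks. Below is a well-known figure illustrating watermarks, which the author copied from the web rather than redrawing:...In Linux, memory has 3 watermarks: HIGH: when the system's free memory is above the HIGH watermark, it is at a relatively safe level and there is no rush to reclaim memory; LOW: when the system's free memory reaches the LOW watermark,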
* * __GFP_DIRECT_RECLAIM indicates that the caller may enter direct reclaim....* * __GFP_RECLAIM is shorthand to allow/forbid both direct and kswapd reclaim....((__force gfp_t)___GFP_DIRECT_RECLAIM) /* Caller can reclaim */ #define __GFP_KSWAPD_RECLAIM (...(__force gfp_t)___GFP_KSWAPD_RECLAIM) /* kswapd can wake */ #define __GFP_RECLAIM ((__force gfp_t)(__...__GFP_DIRECT_RECLAIM __GFP_KSWAPD_RECLAIM __GFP_RECLAIM __GFP_REPEAT automatically retries after an allocation failure, but stops after a number of attempts __GFP_NOFAIL
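, the GFP_ATOMIC flag should be used; for example, the kernel contains many kmalloc/GFP_ATOMIC examples: In a context that cannot sleep, requesting memory with an ordinary flag such as GFP_KERNEL may trigger direct memory reclaim...and therefore sleep, so GFP_KERNEL is only suitable for calls from process context: #define GFP_KERNEL \ (__GFP_RECLAIM | __GFP_IO | __GFP_FS)...#define __GFP_RECLAIM \ ((__force gfp_t)(___GFP_DIRECT_RECLAIM|___GFP_KSWAPD_RECLAIM)) Memory watermarks, PF_MEMALLOC...and GFP_ATOMIC HIGH: when the system's free memory is above the HIGH watermark, it is at a relatively safe level and there is no rush to reclaim memory; LOW: when the system's free memory reaches the LOW watermark, background...is started to reclaim memory, with the goal of bringing free memory back up to the HIGH watermark; MIN: the minimum free memory the system should retain; when free memory reaches this value, kswapd's background reclaim may be too slow, so ordinary allocations perform DIRECT RECLAIM themselves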
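The HIGH/LOW/MIN description above amounts to a small decision rule. Here is a minimal user-space sketch of it. It is a simplification for illustration, not the kernel's actual watermark check, and the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum demo_reclaim_action {
    DEMO_RECLAIM_NONE,       /* above LOW: no reclaim needed yet */
    DEMO_RECLAIM_BACKGROUND, /* at or below LOW: wake background reclaim */
    DEMO_RECLAIM_DIRECT,     /* at or below MIN: the allocator reclaims itself */
};

/* Decide which kind of reclaim an allocation would trigger. */
static enum demo_reclaim_action
demo_reclaim_action_for(uint64_t free_pages, uint64_t wmark_min,
                        uint64_t wmark_low)
{
    assert(wmark_min <= wmark_low); /* watermarks must be ordered */

    if (free_pages <= wmark_min)
        return DEMO_RECLAIM_DIRECT;
    if (free_pages <= wmark_low)
        return DEMO_RECLAIM_BACKGROUND;
    return DEMO_RECLAIM_NONE;
}

/* Background reclaim aims to bring free memory back up to HIGH. */
static bool demo_background_reclaim_done(uint64_t free_pages,
                                         uint64_t wmark_high)
{
    return free_pages >= wmark_high;
}
```

The narrower the gap between `wmark_min` and `wmark_low`, the less time background reclaim has to catch up before an allocation falls into the direct reclaim branch.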
* exceed the per-node dirty limit in the slowpath * (spread_dirty_pages unset) before going into reclaim...zone_allows_reclaim(ac->preferred_zoneref->zone, zone)) continue; ret = node_reclaim(zone->zone_pgdat..., gfp_mask, order); switch (ret) { case NODE_RECLAIM_NOSCAN: /* did not scan */ continue...; case NODE_RECLAIM_FULL: /* scanned but unreclaimable */ continue; default: /* did...we reclaim enough */ if (zone_watermark_ok(zone, order, mark, ac_classzone_idx(ac), alloc_flags
(gfp_mask); fs_reclaim_release(gfp_mask); /* (1.4.6) If __GFP_DIRECT_RECLAIM is specified, check whether the current context is non-atomic and may sleep...(node_reclaim_mode & RECLAIM_WRITE), /* defaults to 0 */ .may_unmap = !!...(node_reclaim_mode & RECLAIM_UNMAP), /* defaults to 0 */ .may_swap = 1, .reclaim_idx = gfp_zone(gfp_mask...NODE_RECLAIM_NOSCAN: /* did not scan */ continue; case NODE_RECLAIM_FULL: /* scanned...can_direct_reclaim) goto nopage; /* Avoid recursion of direct reclaim */ /* (10) avoid recursive reclaim */ if (
#define GFP_ATOMIC (__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM) #define GFP_KERNEL (__GFP_RECLAIM...__GFP_FS) #define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT) #define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM...) #define GFP_NOIO (__GFP_RECLAIM) #define GFP_NOFS (__GFP_RECLAIM | __GFP_IO) #define GFP_USER...(__GFP_RECLAIM | __GFP_IO | __GFP_FS | __GFP_HARDWALL) #define GFP_DMA __GFP_DMA #define GFP_DMA32...__GFP_RECLAIM = may reclaim, __GFP_IO = may perform I/O, and I/O operations can cause sleep GFP_ATOMIC: it also has three components, __GFP_HIGH = high priority, and so on. If you are interested, take a look at gfp.h
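The practical difference between these masks is which reclaim bits they carry. The sketch below models that in user space, assuming the flag compositions quoted in the snippets. The `DEMO_` names and the bit positions are placeholders chosen for illustration and do not match the kernel's real values.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int demo_gfp_t;

/* Placeholder bit positions, NOT the kernel's actual values. */
#define DEMO_GFP_IO             0x01u
#define DEMO_GFP_FS             0x02u
#define DEMO_GFP_HIGH           0x04u
#define DEMO_GFP_ATOMIC_BIT     0x08u
#define DEMO_GFP_DIRECT_RECLAIM 0x10u
#define DEMO_GFP_KSWAPD_RECLAIM 0x20u
#define DEMO_GFP_RECLAIM (DEMO_GFP_DIRECT_RECLAIM | DEMO_GFP_KSWAPD_RECLAIM)

/* Compositions mirroring the definitions quoted above. */
#define DEMO_GFP_ATOMIC (DEMO_GFP_HIGH | DEMO_GFP_ATOMIC_BIT | DEMO_GFP_KSWAPD_RECLAIM)
#define DEMO_GFP_KERNEL (DEMO_GFP_RECLAIM | DEMO_GFP_IO | DEMO_GFP_FS)
#define DEMO_GFP_NOWAIT (DEMO_GFP_KSWAPD_RECLAIM)
#define DEMO_GFP_NOIO   (DEMO_GFP_RECLAIM)
#define DEMO_GFP_NOFS   (DEMO_GFP_RECLAIM | DEMO_GFP_IO)

static_assert((DEMO_GFP_DIRECT_RECLAIM & DEMO_GFP_KSWAPD_RECLAIM) == 0,
              "the two reclaim bits must be distinct");

/* True if the caller may enter direct reclaim, and therefore may sleep. */
static bool demo_may_direct_reclaim(demo_gfp_t mask)
{
    return (mask & DEMO_GFP_DIRECT_RECLAIM) != 0;
}

/* True if the allocation is allowed to wake kswapd for background reclaim. */
static bool demo_may_wake_kswapd(demo_gfp_t mask)
{
    return (mask & DEMO_GFP_KSWAPD_RECLAIM) != 0;
}
```

In this model `DEMO_GFP_KERNEL` allows direct reclaim while `DEMO_GFP_ATOMIC` and `DEMO_GFP_NOWAIT` only allow waking kswapd, which is why the atomic variants are the ones usable from contexts that cannot sleep.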
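Roughly speaking, every thread that now needs to allocate memory has to call zone_reclaim to reclaim page cache from the inactive_list; this behavior is what is known as direct reclaim...The logic is shown in the figure below, so we should avoid doing direct reclaim....kswapd, and kswapd then performs background reclaim....In other words, the normal zone ran out of free pages, which triggered direct reclaim. But what if the DMA zone still has enough free pages at that moment?...For this, the kernel also provides an interface to users: vm.zone_reclaim_mode.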
zone_allows_reclaim(ac->preferred_zoneref->zone, zone)) /* reclaim is not allowed */ continue;...ret = node_reclaim(zone->zone_pgdat, gfp_mask, order); /* zone reclaim */ switch (ret) {...case NODE_RECLAIM_NOSCAN: /* did not scan */ continue; case...NODE_RECLAIM_FULL: /* scanned but unreclaimable */ continue;...default: /* did we reclaim enough */ if (zone_watermark_ok(zone, order
(GCC) ) #1 SMP Wed Aug 14 16:26:59 UTC 2019 III. Kernel parameters vm.overcommit_memory=1 vm.drop_caches=1 vm.zone_reclaim_mode...The fix was to adjust the primary node's kernel parameters vm.zone_reclaim_mode = 1 vm.min_free_kbytes = 512000 The primary node's current kernel parameter configuration is vm.overcommit_memory...=1 vm.drop_caches=1 vm.zone_reclaim_mode=0 vm.max_map_count=655360 vm.dirty_background_ratio=25 vm.dirty_ratio...and the min_free_kbytes parameter vm.overcommit_memory=0 vm.drop_caches=1 vm.zone_reclaim_mode=0 vm.max_map_count=655360...security_inode_permission+0x25/0x30 VI. Further kernel parameter tuning The problem first occurred on the primary node; after the min_free_kbytes and zone_reclaim_mode parameters were adjusted, it did not recur on the primary node
The direct memory reclaim execution path is: __alloc_pages_slowpath() -> __alloc_pages_direct_reclaim() -> __perform_reclaim() -> try_to_free_pages...unsigned long nr_reclaimed; struct scan_control sc = { /* intends to reclaim 32 page frames */ .nr_to_reclaim....may_unmap = 1, /* allow operations on non-file pages */ .may_swap = 1, }; /* * Do not enter reclaim...(nr_reclaimed); return nr_reclaimed; } The throttle_direct_reclaim() function mainly decides whether to join pgdat->pfmemalloc_wait...pfmemalloc_wait, and set the TASK_KILLABLE state, i.e. a TASK_UNINTERRUPTIBLE state that is allowed to respond to fatal signals */ static bool throttle_direct_reclaim