Modified 2024-05-06 11:17:39
8.8 Load Averages

CPU performance is one of the easier metrics to measure. The load average is the average number of processes currently ready to run. That is, it is an estimate of the number of processes that are capable of using the CPU at any given time. When thinking about a load average, keep in mind that most processes on your system are usually waiting for input (from the keyboard, mouse, or network, for example), meaning that most processes are not ready to run and should contribute nothing to the load average. Only processes that are actually doing something affect the load average.
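The text calls the load average an average but does not say how the averaging works. On Linux, each of the three figures that tools such as uptime report (1, 5, and 15 minutes) is an exponentially damped moving average. The kernel updates these averages roughly every 5 seconds. The following Python sketch models that update with floating-point numbers, whereas the kernel uses fixed-point arithmetic. It assumes that a count of ready-to-run processes is sampled at each step. The names `update_load` and `simulate` are illustrative, not kernel functions.

```python
import math

# Assumed model: every SAMPLE_INTERVAL seconds, each average decays by
# exp(-interval/window) and absorbs the current count of ready-to-run processes.
SAMPLE_INTERVAL = 5.0
WINDOWS = (60.0, 300.0, 900.0)  # 1-, 5-, and 15-minute windows, in seconds


def update_load(loads, active):
    """Fold one sample of `active` ready-to-run processes into each average."""
    updated = []
    for load, window in zip(loads, WINDOWS):
        decay = math.exp(-SAMPLE_INTERVAL / window)
        updated.append(load * decay + active * (1.0 - decay))
    return updated


def simulate(samples, start=(0.0, 0.0, 0.0)):
    """Replay a sequence of per-sample process counts, starting from `start`."""
    loads = list(start)
    for active in samples:
        loads = update_load(loads, active)
    return loads


if __name__ == "__main__":
    # One CPU-bound process for 5 minutes (60 samples) on a previously idle system.
    one, five, fifteen = simulate([1] * 60)
    print(f"load average: {one:.2f}, {five:.2f}, {fifteen:.2f}")
```

In this model, one process that is always ready to run pushes the 1-minute figure close to 1 within a few minutes. The 15-minute figure climbs much more slowly. This is why the three numbers can differ sharply when activity has just started or stopped.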




8.8.1 Using uptime

The uptime command tells you three load averages in addition to how long the kernel has been running:


$ uptime
... up 91 days, ... load average: 0.08, 0.03, 0.01

The three numbers at the end of the line are the load averages for the past 1 minute, 5 minutes, and 15 minutes, respectively. As you can see, this system isn’t very busy: An average of only 0.01 processes have been running across all processors for the past 15 minutes. In other words, if you had just one processor, it was only running userspace applications for 1 percent of the last 15 minutes. (Traditionally, most desktop systems would exhibit a load average of about 0 when you were doing anything except compiling a program or playing a game. A load average of 0 is usually a good sign, because it means that your processor isn’t being challenged and you’re saving power.)
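On Linux, the same three figures are also exposed in /proc/loadavg. That file contains two further fields: runnable/total scheduling entities, and the most recently assigned PID. The following minimal Python sketch parses that single line. The `LoadAvg` type and function names are illustrative. Python's standard `os.getloadavg()` returns just the three averages if that is all you need.

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    one: float        # 1-minute load average
    five: float       # 5-minute load average
    fifteen: float    # 15-minute load average
    runnable: int     # scheduling entities (processes/threads) currently runnable
    total: int        # scheduling entities that currently exist
    last_pid: int     # most recently assigned process ID


def parse_loadavg(text: str) -> LoadAvg:
    """Parse the single line of /proc/loadavg, e.g. '0.08 0.03 0.01 1/412 20365'."""
    one, five, fifteen, ratio, last_pid = text.split()[:5]
    runnable, total = ratio.split("/")
    return LoadAvg(float(one), float(five), float(fifteen),
                   int(runnable), int(total), int(last_pid))


def read_loadavg(path: str = "/proc/loadavg") -> LoadAvg:
    with open(path) as f:
        return parse_loadavg(f.read())


if __name__ == "__main__":
    la = read_loadavg()
    print(f"load average: {la.one:.2f}, {la.five:.2f}, {la.fifteen:.2f}")
```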






NOTE User interface components on current desktop systems tend to occupy more of the CPU than those in the past. For example, on Linux systems, a web browser’s Flash plugin can be a particularly notorious resource hog, and Flash applications can easily occupy much of a system’s CPU and memory due to poor all-around implementation.

If a load average goes up to around 1, a single process is probably using the CPU nearly all of the time. To identify that process, use the top command; the process will usually rise to the top of the display.


Most modern systems have more than one processor core or CPU, so multiple processes can easily run simultaneously. If you have two cores, a load average of 1 means that only one of the cores is likely active at any given time, and a load average of 2 means that both cores have just enough to do all of the time.
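Following the reasoning above, you can make load averages from machines with different core counts comparable by dividing by the number of cores. On that scale, about 1.0 means every core had just enough to do. The following Python sketch applies this rule of thumb. It assumes that `os.cpu_count()` (which counts logical CPUs, including hardware threads) is the right divisor for your machine. `os.getloadavg()` is available only on Unix-like systems.

```python
import os


def load_per_core(load: float, cores: int) -> float:
    """Scale a load average by core count; about 1.0 means every core stayed busy."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load / cores


if __name__ == "__main__":
    cores = os.cpu_count() or 1           # logical CPUs the OS reports
    one, five, fifteen = os.getloadavg()  # Unix only
    for label, value in (("1 min", one), ("5 min", five), ("15 min", fifteen)):
        print(f"{label:>6}: {value:5.2f} total, {load_per_core(value, cores):4.2f} per core")
```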





8.8.2 High Loads

A high load average does not necessarily mean that your system is having trouble. A system with enough memory and I/O resources can easily handle many running processes. If your load average is high and your system still responds well, don’t panic: The system just has a lot of processes sharing the CPU. The processes have to compete with each other for processor time, and as a result they’ll take longer to perform their computations than they would if they were each allowed to use the CPU all of the time. A web server is another case where a high load average can be normal: Its processes can start and terminate so quickly that the load average measurement mechanism can’t function effectively.






However, if you sense that the system is slow and the load average is high, you might be running into memory performance problems. When the system is low on memory, the kernel can start to thrash, or rapidly swap memory for processes to and from the disk. When this happens, many processes will become ready to run, but their memory might not be available, so they will remain in the ready-to-run state (and contribute to the load average) for much longer than they normally would.




We’ll now look at memory in much more detail.


8.9 Memory

One of the simplest ways to check your system’s memory status as a whole is to run the free command or view /proc/meminfo to see how much real memory is being used for caches and buffers. As we’ve just mentioned, performance problems can arise from memory shortages. If there isn’t much cache/buffer memory being used (and the rest of the real memory is taken), you may need more memory. However, it’s too easy to blame a shortage of memory for every performance problem on your machine.
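/proc/meminfo is a list of `Name: value [kB]` lines, so you can pull out the cache and buffer figures with very little code. The following Python sketch is a rough approximation in the spirit of free. The exact formula that free uses to compute "used" memory differs between versions. The summary keys are illustrative labels, not names used by free.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Map each /proc/meminfo field to its number (most are in kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[key.strip()] = int(parts[0])
    return info


def summarize(info: Dict[str, int]) -> Dict[str, int]:
    """Rough breakdown in kB: how much real memory goes to buffers and cache."""
    cache = info.get("Buffers", 0) + info.get("Cached", 0)
    return {
        "total": info["MemTotal"],
        "free": info["MemFree"],
        "buffers+cache": cache,
        "used (excluding buffers/cache)": info["MemTotal"] - info["MemFree"] - cache,
    }


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        for name, kb in summarize(parse_meminfo(f.read())).items():
            print(f"{name:32} {kb:>12} kB")
```

If buffers+cache is small and used memory is close to the total, that matches the "you may need more memory" situation described above.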





8.9.1 How Memory Works

Recall from Chapter 1 that the CPU has a memory management unit (MMU) that translates the virtual memory addresses used by processes into real ones. The kernel assists the MMU by breaking the memory used by processes into smaller chunks called pages. The kernel maintains a data structure, called a page table, that contains a mapping of a process’s virtual page addresses to real page addresses in memory. As a process accesses memory, the MMU translates the virtual addresses used by the process into real addresses based on the kernel’s page table.
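The following Python sketch is a toy version of the translation just described. It assumes 4096-byte pages and a single-level page table stored as a dict. Real page tables are multi-level structures, and the MMU performs the lookup in hardware. A virtual address splits into a page number and an offset. Only the page number is looked up, and the offset carries over unchanged.

```python
PAGE_SIZE = 4096  # bytes; a common page size, assumed for this example


def translate(vaddr: int, page_table: dict) -> int:
    """Translate a virtual address with a toy, single-level page table.

    `page_table` maps virtual page numbers to physical page (frame) numbers.
    A missing entry raises KeyError, standing in for a page fault.
    """
    page_number, offset = divmod(vaddr, PAGE_SIZE)
    frame = page_table[page_number]      # the lookup the MMU does in hardware
    return frame * PAGE_SIZE + offset    # the offset within the page is unchanged


if __name__ == "__main__":
    table = {0: 7, 1: 3}  # hypothetical: virtual page 0 -> frame 7, page 1 -> frame 3
    print(hex(translate(0x1234, table)))
```

Here 0x1234 is offset 0x234 in virtual page 1, which maps to frame 3. The result is therefore 3 × 4096 + 0x234, or 0x3234.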





A user process does not actually need all of its pages to be immediately available in order to run. The kernel generally loads and allocates pages as a process needs them; this system is known as on-demand paging or just demand paging. To see how this works, consider how a program starts and runs as a new process:




  1. The kernel loads the beginning of the program’s instruction code into memory pages.
  2. The kernel may allocate some working-memory pages to the new process.
  3. As the process runs, it might reach a point where the next instruction in its code isn’t in any of the pages that the kernel initially loaded. At this point, the kernel takes over, loads the necessary pages into memory, and then lets the program resume execution.
  4. Similarly, if the program requires more working memory than was initially allocated, the kernel handles it by finding free pages (or by making room) and assigning them to the process.

8.9.2 Page Faults

If a memory page is not ready when a process wants to use it, the process triggers a page fault. In the event of a page fault, the kernel takes control of the CPU from the process in order to get the page ready. There are two kinds of page faults: minor and major.
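The demand-paging steps listed earlier can be modeled as a replay of page accesses. Any first touch of a page that the kernel has not yet loaded counts as a page fault. The following Python sketch is a deliberately simplified model. It has no eviction, and every fault is treated the same, ignoring the minor/major distinction. The function name and the page numbers are hypothetical.

```python
def run_with_demand_paging(accesses, resident=()):
    """Replay page numbers; count a fault on the first touch of each unloaded page.

    `resident` holds the pages the kernel loaded up front when the program started.
    After a fault the page stays loaded (this toy model never evicts anything).
    """
    loaded = set(resident)
    faults = 0
    for page in accesses:
        if page not in loaded:
            faults += 1       # the kernel takes over and makes the page ready
            loaded.add(page)  # ...then the process resumes
    return faults, loaded


if __name__ == "__main__":
    # Hypothetical run: pages 0-1 loaded at start; the code later reaches pages 2 and 5.
    faults, loaded = run_with_demand_paging([0, 1, 0, 2, 2, 5, 1], resident={0, 1})
    print(faults, sorted(loaded))
```

Only the first accesses to pages 2 and 5 fault. Repeated accesses to pages that are already loaded cost nothing extra.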




Minor Page Faults

A minor page fault occurs when the desired page is actually in main memory but the MMU doesn’t know where it is. This can happen when the process requests more memory or when the MMU doesn’t have enough space to store all of the page locations for a process. In this case, the kernel tells the MMU about the page and permits the process to continue. Minor page faults aren’t such a big deal, and many occur as a process runs. Unless you need maximum performance from some memory-intensive program, you probably shouldn’t worry about them.





Major Page Faults

A major page fault occurs when the desired memory page isn’t in main memory at all, which means that the kernel must load it from the disk or some other slow storage mechanism. A lot of major page faults will bog the system down because the kernel must do a substantial amount of work to provide the pages, robbing normal processes of their chance to run.



Some major page faults are unavoidable, such as those that occur when you load the code from disk when running a program for the first time. The biggest problems happen when you start running out of memory and the kernel starts to swap pages of working memory out to the disk in order to make room for new pages.
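The difference between the two kinds of fault comes down to where the page is when the process touches it. The following Python sketch classifies accesses under a simplified model. An access faults if the page is not mapped. The fault is minor if the page is already somewhere in main memory, and major if it must first come from disk. The names and page numbers are hypothetical.

```python
from enum import Enum


class Fault(Enum):
    NONE = "none"
    MINOR = "minor"  # page is already in main memory; only the MMU mapping is missing
    MAJOR = "major"  # page must first be read from disk (program file or swap)


def classify(page, mapped, in_memory):
    """Classify one access under a simplified model of the two kinds of fault."""
    if page in mapped:
        return Fault.NONE
    return Fault.MINOR if page in in_memory else Fault.MAJOR


def count_faults(accesses, mapped=(), in_memory=()):
    """Replay page accesses and tally minor and major faults."""
    mapped, in_memory = set(mapped), set(in_memory)
    counts = {Fault.MINOR: 0, Fault.MAJOR: 0}
    for page in accesses:
        kind = classify(page, mapped, in_memory)
        if kind is not Fault.NONE:
            counts[kind] += 1
            in_memory.add(page)  # a major fault loads the page from disk first
            mapped.add(page)     # then the kernel tells the MMU about it
    return counts


if __name__ == "__main__":
    # Hypothetical: page 0 is mapped; page 1 is in memory (say, a shared library
    # another process loaded) but unmapped; page 2 exists only on disk.
    counts = count_faults([0, 1, 2, 2, 1], mapped={0}, in_memory={0, 1})
    print({kind.value: n for kind, n in counts.items()})
```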



Watching Page Faults

You can drill down to the page faults for individual processes with the ps, top, and time commands. The following command shows a simple example of how the time command displays page faults. (The output of the cal command doesn’t matter, so we’re discarding it with a redirect to /dev/null.)


$ /usr/bin/time cal > /dev/null
0.00user 0.00system 0:00.06elapsed 0%CPU (0avgtext+0avgdata 
648inputs+0outputs (2major+254minor)pagefaults 0swaps

As you can see from the pagefaults field, when this program ran, there were 2 major page faults and 254 minor ones. The major page faults occurred when the kernel needed to load the program from the disk for the first time. If you ran the command again, you probably wouldn’t get any major page faults because the kernel would have cached the pages from the disk.
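If you want these counts inside a script, you have two options. You can extract them from the `(Nmajor+Mminor)pagefaults` field of GNU time's default output. Alternatively, you can ask the kernel directly through `getrusage`, which is what the following Python sketch does in its second function. Python's standard `resource` module exposes the `ru_majflt` and `ru_minflt` fields (Unix only). The function names are illustrative. The regular expression assumes the default output format shown above.

```python
import re
import resource
import subprocess
import sys

# Matches the "(2major+254minor)pagefaults" field in GNU time's default output.
FAULTS_RE = re.compile(r"\((\d+)major\+(\d+)minor\)pagefaults")


def parse_time_faults(output: str):
    """Return (major, minor) from GNU time's default report, or None if absent."""
    match = FAULTS_RE.search(output)
    return None if match is None else (int(match.group(1)), int(match.group(2)))


def child_faults(argv):
    """Run a command and return its (major, minor) page faults.

    getrusage(RUSAGE_CHILDREN) accumulates over every child this process has
    waited for, so the counts are taken as a before/after difference.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return after.ru_majflt - before.ru_majflt, after.ru_minflt - before.ru_minflt


if __name__ == "__main__":
    print(parse_time_faults("648inputs+0outputs (2major+254minor)pagefaults 0swaps"))
    print(child_faults([sys.executable, "-c", "pass"]))
```

As with time, running the same command twice typically shows fewer major faults the second time, because the program's pages are already cached.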




If you’d rather see the page faults of processes as they’re running, use top or ps. When running top, use f to change the displayed fields and u to display the number of major page faults. (The results will show up in a new, nFLT column. You won’t see the minor page faults.)




When using ps, you can use a custom output format to view the page faults for a particular process. Here’s an example for process ID 20365:


$ ps -o pid,min_flt,maj_flt 20365
PID MINFL MAJFL
20365 834182 23

The MINFL and MAJFL columns show the numbers of minor and major page faults. Of course, you can combine this with any other process selection options, as described in the ps(1) manual page.
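The same per-process counters are available in /proc/<pid>/stat. According to proc(5), minflt is field 10 and majflt is field 12 of that file. The following Python sketch reads them. The one subtlety is field 2, the command name in parentheses, which can itself contain spaces. The sketch therefore splits only after the last closing parenthesis. The function names are illustrative.

```python
import os


def parse_stat_faults(stat_line: str):
    """Return (minflt, majflt) from the contents of /proc/<pid>/stat."""
    # Field 2 (comm) is parenthesized and may contain spaces: split after the last ')'.
    rest = stat_line.rpartition(")")[2].split()
    # rest[0] is field 3 (state), so field N is rest[N - 3].
    return int(rest[10 - 3]), int(rest[12 - 3])


def page_faults(pid: int):
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat_faults(f.read())


if __name__ == "__main__":
    pid = os.getpid()
    minflt, majflt = page_faults(pid)
    print(f"PID {pid}: MINFL={minflt} MAJFL={majflt}")
```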


Viewing page faults by process can help you zero in on certain problematic components. However, if you’re interested in your system performance as a whole, you need a tool to summarize CPU and memory action across all processes.



8.10 Monitoring CPU and Memory Performance with vmstat

Among the many tools available to monitor system performance, the vmstat command is one of the oldest, with minimal overhead. You’ll find it handy for getting a high-level view of how often the kernel is swapping pages in and out, how busy the CPU is, and how heavily I/O resources are being used.



The trick to unlocking the power of vmstat is to understand its output. For example, here’s some output from vmstat 2, which reports statistics every 2 seconds:


$ vmstat 2
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
2 0 320416 3027696 198636 1072568 0 0 1 1 2 0 15 2 83 0
2 0 320416 3027288 198636 1072564 0 0 0 1182 407 636 1 0 99 0
1 0 320416 3026792 198640 1072572 0 0 0 58 281 537 1 0 99 0
0 0 320416 3024932 198648 1074924 0 0 0 308 318 541 0 0 99 1
0 0 320416 3024932 198648 1074968 0 0 0 0 208 416 0 0 99 0
0 0 320416 3026800 198648 1072616 0 0 0 0 207 389 0 0 100 0

The output falls into categories: procs for processes, memory for memory usage, swap for the pages pulled in and out of swap, io for disk usage, system for interrupts (in) and context switches (cs) per second, which reflect how often the kernel switches into kernel code, and cpu for the time used by different parts of the system.
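When you want to track vmstat figures in a script, it helps to pair each data row with the names in the second header line. The following Python sketch does that. It takes the names from the header instead of hard-coding them, because newer vmstat versions add columns such as st. The function name is illustrative. The sample row is the second line of the output above.

```python
def parse_vmstat_row(header: str, row: str) -> dict:
    """Pair a vmstat data row with the column names from the second header line."""
    names = header.split()
    values = [int(v) for v in row.split()]
    if len(names) != len(values):
        raise ValueError(f"{len(names)} column names but {len(values)} values")
    return dict(zip(names, values))


HEADER = "r b swpd free buff cache si so bi bo in cs us sy id wa"

if __name__ == "__main__":
    sample = parse_vmstat_row(
        HEADER, "2 0 320416 3027288 198636 1072564 0 0 0 1182 407 636 1 0 99 0")
    busy = sample["us"] + sample["sy"]
    print(f"swap in/out: {sample['si']}/{sample['so']}, "
          f"CPU busy {busy}%, waiting on I/O {sample['wa']}%")
```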


The preceding output is typical for a system that isn’t doing much. You’ll usually start looking at the second line of output, because the first one is an average for the entire uptime of the system. For example, here the system has 320416KB of memory swapped out to the disk (swpd) and around 3025000KB (3 GB) of real memory free. Even though some swap space is in use, the zero-valued si (swap-in) and so (swap-out) columns report that the kernel is not currently swapping anything in or out from the disk. The buff column indicates the amount of memory that the kernel is using for disk buffers (see 4.2.5 Disk Buffering, Caching, and Filesystems).





On the far right, under the CPU heading, you see the distribution of CPU time in the us, sy, id, and wa columns. These list (in order) the percentage of time that the CPU is spending on user tasks, system (kernel) tasks, idle time, and waiting for I/O. In the preceding example, there aren’t too many user processes running (they’re using a maximum of 1 percent of the CPU); the kernel is doing practically nothing, while the CPU is sitting around doing nothing 99 percent of the time.




Now, watch what happens when a big program starts up sometime later (the first two lines occur right before the program runs):


Example 8-3. Memory activity


procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 320412 2861252 198920 1106804 0 0 0 0 2477 4481 25 2 72 
1 0 320412 2861748 198924 1105624 0 0 0 40 2206 3966 26 2 72 0
1 0 320412 2860508 199320 1106504 0 0 210 18 2201 3904 26 2 71 1
1 1 320412 2817860 199332 1146052 0 0 19912 0 2446 4223 26 3 63 8
2 2 320284 2791608 200612 1157752 202 0 4960 854 3371 5714 27 3 51 
1 1 320252 2772076 201076 1166656 10 0 2142 1190 4188 7537 30 3 53 
0 3 320244 2727632 202104 1175420 20 0 1890 216 4631 8706 36 4 46 

As you can see in Example 8-3, the CPU sees sustained usage, especially from user processes (the us column stays between 25 and 36 percent). Because there is enough free memory, the amount of cache and buffer space used starts to increase as the kernel starts to use the disk more.



Later on, we see something interesting: The kernel pulls some pages into memory that were once swapped out (the si column jumps from 0 to 202, and swpd shrinks slightly). This means that the program that just ran probably accessed some pages shared by another process. This is common; many processes use the code in certain shared libraries only when starting up.




Also notice from the b column that a few processes are blocked (prevented from running) while waiting for memory pages. Overall, the amount of free memory is decreasing, but it’s nowhere near being depleted. There’s also a fair amount of disk activity, as seen by the increasing numbers in the bi (blocks in) and bo (blocks out) columns.



The output is quite different when you run out of memory. As the free space depletes, both the buffer and cache sizes decrease because the kernel increasingly needs the space for user processes. Once there is nothing left, you’ll start to see activity in the so (swapped out) column as the kernel starts moving pages onto the disk, at which point nearly all of the other output columns change to reflect the amount of work that the kernel is doing. You see more system time, more data going in and out of the disk, and more processes blocked because the memory they want to use is not available (it has been swapped out).
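As described above, the telltale sign of running out of memory is activity in the so column. By contrast, swap-ins alone, as in Example 8-3, can be harmless. The following Python sketch flags a run of vmstat samples in which so stays nonzero for several consecutive intervals. The threshold of three samples is an assumed rule of thumb, not a vmstat feature. The "starved" figures are hypothetical.

```python
def sustained_swap_out(samples, min_run=3):
    """Return True if `so` is nonzero in at least `min_run` consecutive samples.

    `samples` is a sequence of dicts with an "so" key, one per vmstat line
    (skip the first line, which averages over the whole uptime).
    """
    run = 0
    for sample in samples:
        run = run + 1 if sample["so"] > 0 else 0
        if run >= min_run:
            return True
    return False


if __name__ == "__main__":
    # so values from Example 8-3: pages are swapped in, but nothing is swapped out.
    example_8_3 = [{"so": 0}] * 7
    # Hypothetical memory-starved system: the kernel keeps pushing pages to disk.
    starved = [{"so": v} for v in (0, 120, 340, 512, 298)]
    print(sustained_swap_out(example_8_3), sustained_swap_out(starved))
```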





We haven’t explained all of the vmstat output columns. You can dig deeper into them in the vmstat(8) manual page, but you might have to learn more about kernel memory management first from a class or a book like Operating System Concepts, 9th edition (Wiley, 2012) in order to understand them.



8.11 I/O Monitoring

By default, vmstat shows you some general I/O statistics. Although you can get very detailed per-partition resource usage with vmstat -d, you’ll get a lot of output from this option, which might be overwhelming. Instead, try starting out with a tool just for I/O called iostat.


8.11.1 Using iostat

Like vmstat, when run without any options, iostat shows the statistics for your machine’s current uptime:


$ iostat
[kernel information]
avg-cpu: %user %nice %system %iowait %steal %idle
 4.46 0.01 0.67 0.31 0.00 94.55
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 4.67 7.28 49.86 9493727 65011716
sde 0.00 0.00 0.00 1230 0

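The avg-cpu part at the top reports the same CPU utilization information as other utilities that you’ve seen in this chapter, so skip down to the bottom, which shows you the following for each device:

o tps Average number of data transfers (I/O requests) per second.

o kB_read/s Average number of kilobytes read per second.

o kB_wrtn/s Average number of kilobytes written per second.

o kB_read Total number of kilobytes read.

o kB_wrtn Total number of kilobytes written.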



Another similarity to vmstat is that you can give an interval argument, such as iostat 2, to give an update every 2 seconds. When using an interval, you might want to display only the device report by using the -d option (such as iostat -d 2).
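You may want to log iostat's device report from a script, for example with iostat -d 2. The following Python sketch turns the device rows into a dictionary keyed by device name. It takes column names from the most recent Device header line, because column sets vary between sysstat versions. It resets at blank lines and avg-cpu lines so that CPU figures are never mistaken for a device. With repeated interval reports, later reports overwrite earlier ones. The function name is illustrative. The sample is a copy of the device report shown earlier.

```python
def parse_iostat_devices(report: str) -> dict:
    """Parse iostat device rows into {device: {column: value}}."""
    devices = {}
    columns = None
    for line in report.splitlines():
        fields = line.split()
        if not fields or fields[0] == "avg-cpu:":
            columns = None                # a new section starts; forget old headers
        elif fields[0].rstrip(":") == "Device":
            columns = fields[1:]          # e.g. tps kB_read/s kB_wrtn/s ...
        elif columns and len(fields) == len(columns) + 1:
            devices[fields[0]] = {name: float(v) for name, v in zip(columns, fields[1:])}
    return devices


SAMPLE_REPORT = """\
avg-cpu: %user %nice %system %iowait %steal %idle
 4.46 0.01 0.67 0.31 0.00 94.55
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 4.67 7.28 49.86 9493727 65011716
sde 0.00 0.00 0.00 1230 0
"""

if __name__ == "__main__":
    for dev, stats in parse_iostat_devices(SAMPLE_REPORT).items():
        print(f"{dev}: {stats['tps']} transfers/s, {stats['kB_wrtn/s']} kB written/s")
```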


By default, the iostat output omits partition information. To show all of the partition information, use the -p ALL option. Because there are many partitions on a typical system, you’ll get a lot of output. Here’s part of what you might see:




$ iostat -p ALL
--snip--
Device: tps kB_read/s kB_wrtn/s kB_read
--snip--
sda 4.67 7.27 49.83 9496139
sda1 4.38 7.16 49.51 9352969 
sda2 0.00 0.00 0.00 6 
sda5 0.01 0.11 0.32 141884 
scd0 0.00 0.00 0.00 0 
sde 0.00 0.00 0.00 1230 

In this example, sda1, sda2, and sda5 are all partitions of the sda disk, so their read and write figures overlap with those of sda itself. However, the partition columns won’t necessarily add up to the disk column. Although a read from sda1 also counts as a read from sda, keep in mind that you can read from sda directly, such as when reading the partition table.
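You can check this point with the kB_read figures from the iostat -p ALL output above. The following Python snippet is simple arithmetic on those numbers. The output alone does not say what the leftover reads were. Direct reads of sda, such as reading the partition table, are one plausible source.

```python
# kB_read values taken from the iostat -p ALL output above
disk_read = 9496139  # sda
partition_reads = {"sda1": 9352969, "sda2": 6, "sda5": 141884}

partition_total = sum(partition_reads.values())
unattributed = disk_read - partition_total  # kB read from sda but not via a listed partition
print(partition_total, unattributed)
```

The partitions account for 9494859 kB. That leaves 1280 kB that were read from sda without going through one of its listed partitions.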


8.11.2 Per-Process I/O Utilization and Monitoring: iotop

If you need to dig even deeper to see I/O resources used by individual processes, the iotop tool can help. Using iotop is similar to using top. There is a continuously updating display that shows the processes using the most I/O, with a general summary at the top:


# iotop
Total DISK READ: 4.76 K/s | Total DISK WRITE: 333.31 K/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
 260 be/3 root 0.00 B/s 38.09 K/s 0.00 % 6.98 % [jbd2/sda1-
2611 be/4 juser 4.76 K/s 10.32 K/s 0.00 % 0.21 % zeitgeistdaemon
2636 be/4 juser 0.00 B/s 84.12 K/s 0.00 % 0.20 % zeitgeistfts
1329 be/4 juser 0.00 B/s 65.87 K/s 0.00 % 0.03 % soffice.b~ashpipe=6
6845 be/4 juser 0.00 B/s 812.63 B/s 0.00 % 0.00 % chromium-browser
19069 be/4 juser 0.00 B/s 812.63 B/s 0.00 % 0.00 % rhythmbox

Along with the user, command, and read/write columns, notice that there is a TID column (thread ID) instead of a process ID. The iotop tool is one of the few utilities that displays threads instead of processes.



The PRIO (priority) column indicates the I/O priority. It’s similar to the CPU priority that you’ve already seen, but it affects how quickly the kernel schedules I/O reads and writes for the process. In a priority such as be/4, the be part is the scheduling class, and the number is the priority level. As with CPU priorities, lower numbers are more important; for example, the kernel allows more time for I/O for a process with be/3 than one with be/4.






The kernel uses the scheduling class to add more control for I/O scheduling. You’ll see three scheduling classes from iotop:


o be Best-effort. The kernel does its best to fairly schedule I/O for this class. Most processes run under this I/O scheduling class.

o rt Real-time. The kernel schedules any real-time I/O before any other class of I/O, no matter what.

o idle Idle. The kernel performs I/O for this class only when there is no other I/O to be done. There is no priority level for the idle scheduling class.

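The following Python sketch parses iotop's PRIO values and puts a set of pending requests in the order that the rules above suggest. Real-time comes first, then best-effort by level (lower is more important), then idle. This is a strict-priority simplification. Within the best-effort class, a lower level gets a larger share of I/O time, not exclusive access, and real kernel I/O schedulers differ in the details. The request names "audio" and "backup" are hypothetical.

```python
# Simplified model of the three iotop scheduling classes described above.
CLASS_RANK = {"rt": 0, "be": 1, "idle": 2}  # rt is served first, idle last


def parse_prio(prio: str):
    """Split an iotop PRIO value such as 'be/4' into ('be', 4); 'idle' has no level."""
    cls, _, level = prio.partition("/")
    return (cls, int(level) if level else None)


def service_order(requests):
    """Order (name, prio) pairs by class first, then by level (lower first)."""
    def sort_key(item):
        cls, level = parse_prio(item[1])
        return (CLASS_RANK[cls], level if level is not None else 0)
    return [name for name, _ in sorted(requests, key=sort_key)]


if __name__ == "__main__":
    pending = [("backup", "idle"), ("zeitgeist-daemon", "be/4"),
               ("jbd2", "be/3"), ("audio", "rt/0")]
    print(service_order(pending))
```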

You can check and change the I/O priority for a process with the ionice utility; see the ionice(1) manual page for details. You probably will never need to worry about the I/O priority, though.


8.12 Per-Process Monitoring with pidstat

You’ve seen how you can monitor specific processes with utilities such as top and iotop. However, their displays refresh over time, and each update erases the previous output. The pidstat utility allows you to see the resource consumption of a process over time in the style of vmstat. Here’s a simple example for monitoring process 1329, updating every second:





$ pidstat -p 1329 1
Linux 3.2.0-44-generic-pae (duplex) 07/01/2015 _i686_ (4 CPU)
09:26:55 PM PID %usr %system %guest %CPU CPU Command
09:27:03 PM 1329 8.00 0.00 0.00 8.00 1 myprocess
09:27:04 PM 1329 0.00 0.00 0.00 0.00 3 myprocess
09:27:05 PM 1329 3.00 0.00 0.00 3.00 1 myprocess
09:27:06 PM 1329 8.00 0.00 0.00 8.00 3 myprocess
09:27:07 PM 1329 2.00 0.00 0.00 2.00 3 myprocess
09:27:08 PM 1329 6.00 0.00 0.00 6.00 2 myprocess

The default output shows the percentages of user and system time and the overall percentage of CPU time, and it even tells you which CPU the process was running on. (The %guest column here is somewhat odd: It’s the percentage of time that the process spent running something inside a virtual machine. Unless you’re running a virtual machine, don’t worry about this.)
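pidstat's line-per-interval output is easy to post-process. The following Python sketch averages the %CPU column for one PID. It locates the PID and %CPU columns from the header row. This matters because the timestamp is two fields with an AM/PM clock, as here, but one field with a 24-hour clock. The sketch also skips pidstat's own Average: lines. The function name is illustrative. The sample is the output shown above.

```python
def average_cpu(report: str, pid: int) -> float:
    """Average the %CPU column for `pid` across the sample rows of a pidstat report."""
    header = None
    values = []
    for line in report.splitlines():
        fields = line.split()
        if fields and fields[0] == "Average:":
            continue  # skip pidstat's own summary lines
        if "PID" in fields and "%CPU" in fields:
            header = fields  # column positions depend on the timestamp format
        elif header and len(fields) == len(header) and fields[header.index("PID")] == str(pid):
            values.append(float(fields[header.index("%CPU")]))
    if not values:
        raise ValueError(f"no samples for PID {pid}")
    return sum(values) / len(values)


SAMPLE_REPORT = """\
Linux 3.2.0-44-generic-pae (duplex) 07/01/2015 _i686_ (4 CPU)
09:26:55 PM PID %usr %system %guest %CPU CPU Command
09:27:03 PM 1329 8.00 0.00 0.00 8.00 1 myprocess
09:27:04 PM 1329 0.00 0.00 0.00 0.00 3 myprocess
09:27:05 PM 1329 3.00 0.00 0.00 3.00 1 myprocess
09:27:06 PM 1329 8.00 0.00 0.00 8.00 3 myprocess
09:27:07 PM 1329 2.00 0.00 0.00 2.00 3 myprocess
09:27:08 PM 1329 6.00 0.00 0.00 6.00 2 myprocess
"""

if __name__ == "__main__":
    print(f"PID 1329 averaged {average_cpu(SAMPLE_REPORT, 1329):.2f}% CPU")
```

For the six samples above, the total is 27 percentage points, so the average is 4.5 percent.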



Although pidstat shows CPU utilization by default, it can do much more. For example, you can use the -r option to monitor memory and -d to turn on disk monitoring. Try them out, and then look at the pidstat(1) manual page to see even more options for threads, context switching, or just about anything else that we’ve talked about in this chapter.



8.13 Further Topics

One reason there are so many tools to measure resource utilization is that a wide array of resource types are consumed in many different ways. In this chapter, you’ve seen CPU, memory, and I/O as system resources being consumed by processes, threads inside processes, and the kernel.



The other reason that the tools exist is that the resources are limited and, for a system to perform well, its components must strive to consume fewer resources. In the past, many users shared a machine, so it was necessary to make sure that each user had a fair share of resources. Now, although a modern desktop computer may not have multiple users, it still has many processes competing for resources. Likewise, high-performance network servers require intense system resource monitoring.





Further topics in resource monitoring and performance analysis include the following:


o sar (System Activity Reporter) The sar package has many of the continuous monitoring capabilities of vmstat, but it also records resource utilization over time. With sar, you can look back at a particular time to see what your system was doing. This is handy when you have a past system event that you want to analyze.


o acct (Process accounting) The acct package can record the processes and their resource utilization.


o Quotas. You can limit many system resources on a per-process or per-user basis. See /etc/security/limits.conf for some of the CPU and memory options; there’s also a limits.conf(5) manual page. This is a PAM feature, so processes are subject to this only if they’ve been started from something that uses PAM (such as a login shell). You can also limit the amount of disk space that a user can use with the quota system.

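The per-process limits that limits.conf configures are applied as ordinary resource limits (rlimits), so a process can inspect the ones it inherited. The following Python sketch does this with the standard `resource` module on Linux. The mapping of limits.conf items (cpu, as, nofile, nproc) to `RLIMIT_*` constants is the usual correspondence, but check limits.conf(5) on your system. Also note the units: limits.conf expresses cpu in minutes and as in kilobytes, while getrlimit reports seconds and bytes. The dictionary labels are illustrative.

```python
import resource

# limits.conf item -> the resource limit it sets (Linux)
LIMITS = {
    "cpu (seconds)": resource.RLIMIT_CPU,
    "as (bytes)": resource.RLIMIT_AS,
    "nofile": resource.RLIMIT_NOFILE,
    "nproc": resource.RLIMIT_NPROC,
}


def describe(value):
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {label: (soft, hard)} for this process, as strings."""
    return {name: tuple(describe(v) for v in resource.getrlimit(res))
            for name, res in LIMITS.items()}


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name:16} soft={soft:>12} hard={hard:>12}")
```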


If you’re interested in systems tuning and performance in particular, Systems Performance: Enterprise and the Cloud by Brendan Gregg (Prentice Hall, 2013) goes into much more detail.


We also haven’t yet touched on the many, many tools that can be used to monitor network resource utilization. To use those, you first have to understand how the network works. That’s where we’re headed next.




