Chapter 8. A Closer Look at Processes and Resource Utilization(第 8 章 深入瞭解進程和資源利用)
This chapter takes you deeper into the relationships between processes, the kernel, and system resources. There are three basic kinds of hardware resources: CPU, memory, and I/O. Processes vie for these resources, and the kernel’s job is to allocate resources fairly. The kernel itself is also a resource—a software resource that processes use to perform tasks such as creating new processes and communicating with other processes. Many of the tools that you see in this chapter are often thought of as performance-monitoring tools. They’re particularly helpful if your system is slowing to a crawl and you’re trying to figure out why. However, you shouldn’t get too distracted by performance; trying to optimize a system that’s already working correctly is often a waste of time. Instead, concentrate on understanding what the tools actually measure, and you’ll gain great insight into how the kernel works.
本章將深入介紹進程、內核和系統資源之間的關係。硬件資源主要有三種:CPU、內存和I/O。
進程爭奪這些資源,而內核的工作是公平地分配資源。
內核本身也是一種資源,進程可以使用它來執行任務,如創建新進程和與其他進程通信。本章中的許多工具通常被視為性能監控工具。
如果您的系統變得緩慢,您可以使用這些工具來找出原因。
然而,不要過於關注性能;試圖優化一個已經正常工作的系統通常是浪費時間。
相反,應該集中精力理解這些工具實際測量的內容,從而深入瞭解內核的工作原理。
8.1 Tracking Processes(追蹤進程)
You learned how to use ps in 2.16 Listing and Manipulating Processes to list processes running on your system at a particular time. The ps command lists current processes, but it does little to tell you how processes change over time. Therefore, it won’t really help you to determine which process is using too much CPU time or memory.
在2.16節“列出和操作進程”中,您已經學會了如何使用ps命令列出系統上某一時刻正在運行的進程。
ps命令列出當前進程,但它很少告訴您進程如何隨時間變化。
因此,它無法真正幫助您確定哪個進程使用了過多的CPU時間或內存。
The top program is often more useful than ps because it displays the current system status as well as many of the fields in a ps listing, and it updates the display every second. Perhaps most important is that top shows the most active processes (that is, those currently taking up the most CPU time) at the top of its display.
與ps相比,top程序通常更有用,因為它顯示當前系統狀態以及ps列表中的許多字段,並且每秒更新一次顯示。也許最重要的是,top會將最活躍的進程(即當前佔用最多CPU時間的進程)顯示在列表的頂部。
You can send commands to top with keystrokes. These are some of the most important commands:
o Spacebar. Updates the display immediately.
o M. Sorts by current resident memory usage.
o T. Sorts by total (cumulative) CPU usage.
o P. Sorts by current CPU usage (the default).
o u. Displays only one user’s processes.
o f. Selects different statistics to display.
o ?. Displays a usage summary for all top commands.
您可以使用按鍵向top發送命令。
以下是一些最重要的命令:
o 空格鍵:立即更新顯示。
o M:按當前駐留內存使用量排序。
o T:按總計(累計)CPU使用量排序。
o P:按當前CPU使用量排序(默認)。
o u:只顯示某一個用户的進程。
o f:選擇要顯示的統計信息。
o ?:顯示top所有命令的使用摘要。
Two other utilities for Linux, similar to top, offer an enhanced set of views and features: atop and htop. Most of the extra features are available from other utilities. For example, htop has many of the abilities of the lsof command described in the next section.
與 top 類似,Linux 上的另外兩個實用程序提供了一套增強的視圖和功能:atop 和 htop。
大多數額外的功能都可以從其他工具中獲得。
例如,htop 擁有下一節所述的 lsof 命令的許多功能。
8.2 Finding Open Files with lsof(用 lsof 查找打開的文件)
The lsof command lists open files and the processes using them. Because Unix places a lot of emphasis on files, lsof is among the most useful tools for finding trouble spots. But lsof doesn’t stop at regular files— it can list network resources, dynamic libraries, pipes, and more.
lsof 命令列出打開的文件和使用這些文件的進程。
由於 Unix 非常重視文件,因此 lsof 是查找故障點最有用的工具之一。
但 lsof 並不侷限於普通文件,它還能列出網絡資源、動態庫、管道等。
8.2.1 Reading the lsof Output(讀取 lsof 輸出)
Running lsof on the command line usually produces a tremendous amount of output. Below is a fragment of what you might see. This output includes open files from the init process as well as a running vi process:
在命令行上運行 lsof 通常會產生大量輸出。
下面是你可能看到的一個片段。
該輸出包括來自 init 進程和正在運行的 vi 進程的打開文件:
$ lsof
COMMAND   PID    USER   FD    TYPE   DEVICE   SIZE    NODE       NAME
init      1      root   cwd   DIR    8,1      4096    2          /
init      1      root   rtd   DIR    8,1      4096    2          /
init      1      root   mem   REG    8,1      47040   9705817    /lib/i386-linux-gnu/libnss_files-2.15.so
init      1      root   mem   REG    8,1      42652   9705821    /lib/i386-linux-gnu/libnss_nis-2.15.so
init      1      root   mem   REG    8,1      92016   9705833    /lib/i386-linux-gnu/libnsl-2.15.so
--snip--
vi        22728  juser  cwd   DIR    8,1      4096    14945078   /home/juser/w/c
vi        22728  juser  4u    REG    8,1      1288    1056519    /home/juser/w/c/f
--snip--
The output shows the following fields (listed in the top row):
輸出顯示了以下字段(按照頂部行的順序列出):
o COMMAND. The command name for the process that holds the file descriptor.
o PID. The process ID.
o USER. The user running the process.
o FD. This field can contain two kinds of elements. In the output above, the FD column shows the purpose of the file. The FD field can also list the file descriptor of the open file—a number that a process uses together with the system libraries and kernel to identify and manipulate a file.
o TYPE. The file type (regular file, directory, socket, and so on).
o DEVICE. The major and minor number of the device that holds the file.
o SIZE. The file’s size.
o NODE. The file’s inode number.
o NAME. The filename.
o COMMAND:持有文件描述符的進程的命令名稱。
o PID:進程ID。
o USER:運行該進程的用户。
o FD:該字段可以包含兩種類型的元素。在上面的輸出中,FD列顯示了文件的用途。FD字段還可以列出打開文件的文件描述符,這是一個進程與系統庫和內核一起使用的數字,用於標識和操作文件。
o TYPE:文件類型(普通文件、目錄、套接字等)。
o DEVICE:持有文件的設備的主要和次要編號。
o SIZE:文件的大小。
o NODE:文件的inode號。
o NAME:文件名。
The lsof(1) manual page contains a full list of what you might see for each field, but you should be able to figure out what you’re looking at just by looking at the output. For example, look at the entries with cwd in the FD field as highlighted in bold. These lines indicate the current working directories of the processes. Another example is the very last line, which shows a file that the user is currently editing with vi.
lsof(1)手冊頁包含了每個字段可能出現的完整列表,但是通過查看輸出,您應該能夠弄清楚您正在查看什麼。
例如,查看FD字段中以cwd加粗顯示的條目。
這些行指示了進程的當前工作目錄。
另一個例子是最後一行,顯示了用户當前正在使用vi編輯的文件。
8.2.2 Using lsof(使用 lsof)
There are two basic approaches to running lsof:
運行lsof有兩種基本方法:
o List everything and pipe the output to a command like less, and then search for what you’re looking for. This can take a while due to the amount of output generated.
o Narrow down the list that lsof provides with command-line options.
You can use command-line options to provide a filename as an argument and have lsof list only the entries that match the argument. For example, the following command displays entries for open files in /usr:
- 列出所有內容並將輸出導入到類似less的命令中,然後搜索你要查找的內容。由於生成的輸出量很大,這可能需要一些時間。
- 使用命令行選項縮小lsof提供的列表。
你可以使用命令行選項提供一個文件名作為參數,並讓lsof只列出與該參數匹配的條目。例如,下面的命令會顯示/usr中打開文件的條目:
$ lsof /usr
To list the open files for a particular process ID, run:
要列出特定進程 ID 的打開文件,請運行
$ lsof -p pid
For a brief summary of lsof’s many options, run lsof -h. Most options pertain to the output format. (See Chapter 10 for a discussion of the lsof network features.)
要了解lsof的許多選項的簡要概述,請運行lsof -h。大多數選項與輸出格式有關。
(有關lsof網絡功能的討論,請參見第10章。)
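By default, lsof ORs multiple selection criteria together; the -a option ANDs them instead. As a quick sketch (juser here is just a placeholder username), the following should list only the files under /usr that are held open by juser’s processes:
$ lsof -a -u juser /usr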
NOTE lsof is highly dependent on kernel information. If you upgrade your kernel and you’re not routinely updating everything, you might need to upgrade lsof. In addition, if you perform a distribution update to both the kernel and lsof, the updated lsof might not work until you reboot with the new kernel.
注意:lsof高度依賴於內核信息。
如果您升級了內核,而且您沒有定期更新所有內容,您可能需要升級lsof。
此外,如果您同時對內核和lsof進行了發行版更新,則更新後的lsof可能在您使用新內核重新啓動之前無法正常工作。
8.3 Tracing Program Execution and System Calls(追蹤程序執行和系統調用)
The tools we’ve seen so far examine active processes. However, if you have no idea why a program dies almost immediately after starting up, even lsof won’t help you. In fact, you’d have a difficult time even running lsof concurrently with a failed command.
到目前為止,我們看到的工具都是用於檢查活動進程的。
然而,如果您不知道為什麼一個程序在啓動後幾乎立即崩潰,即使是lsof也無法幫助您。
實際上,您甚至很難在命令失敗的同時運行lsof。
The strace (system call trace) and ltrace (library trace) commands can help you discover what a program attempts to do. These tools produce extraordinarily large amounts of output, but once you know what to look for, you’ll have more tools at your disposal for tracking down problems.
strace(系統調用跟蹤)和 ltrace(庫跟蹤)命令可以幫助您發現程序試圖做什麼。
這些工具產生了非常大量的輸出,但是一旦您知道要尋找什麼,您將擁有更多的工具來追蹤問題。
8.3.1 strace
Recall that a system call is a privileged operation that a user-space process asks the kernel to perform, such as opening and reading data from a file. The strace utility prints all the system calls that a process makes. To see it in action, run this command:
請回憶一下,系統調用是用户空間進程向內核請求執行的特權操作,例如打開和讀取文件中的數據。strace實用程序打印出進程所進行的所有系統調用。
要看它的實際效果,請運行以下命令:
$ strace cat /dev/null
In Chapter 1, you learned that when one process wants to start another process, it invokes the fork() system call to spawn a copy of itself, and then the copy uses a member of the exec() family of system calls to start running a new program. The strace command begins working on the new process (the copy of the original process) just after the fork() call. Therefore, the first lines of the output from this command should show execve() in action, followed by a memory initialization call, brk(), as follows:
在第 1 章中,我們瞭解到當一個進程想要啓動另一個進程時,它會調用 fork() 系統調用來生成一個自身的副本,然後副本使用 exec() 系列系統調用的一個成員來開始運行一個新程序。
就在 fork() 調用之後,strace 命令開始在新進程(原始進程的副本)上運行。
因此,該命令輸出的第一行應顯示 execve() 正在運行,隨後是內存初始化調用 brk(),如下所示:
execve("/bin/cat", ["cat", "/dev/null"], [/* 58 vars */]) = 0
brk(0) = 0x9b65000
The next part of the output deals primarily with loading shared libraries. You can ignore this unless you really want to know what the shared library system does.
輸出的下一部分主要涉及加載共享庫。
除非你真的想知道共享庫系統是做什麼的,否則可以忽略這部分內容。
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or
directory)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0xb77b5000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or
directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
--snip--
open("/lib/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\200^\1"...,
1024)= 1024
In addition, skip past the mmap output until you get to the lines that look like this:
此外,跳過 mmap 輸出,直到看到類似這樣的行:
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 6), ...}) = 0
open("/dev/null", O_RDONLY|O_LARGEFILE) = 3
fstat64(3, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
fadvise64_64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
read(3, "", 32768) = 0
close(3) = 0
close(1) = 0
close(2) = 0
exit_group(0) = ?
This part of the output shows the command at work. First, look at the open() call, which opens a file. The 3 is a result that means success (3 is the file descriptor that the kernel returns after opening the file). Below that, you see where cat reads from /dev/null (the read() call, which also has 3 as the file descriptor). Then there’s nothing more to read, so the program closes the file descriptor and exits with exit_group().
這部分輸出顯示了命令的運行情況。
首先看打開文件的 open() 調用。
3 代表成功的結果(3 是打開文件後內核返回的文件描述符)。
下面是 cat 從 /dev/null 讀取的內容(read()調用,文件描述符也是 3)。
然後就沒什麼可讀取的了,所以程序關閉了文件描述符,並通過 exit_group() 退出。
What happens when there’s a problem? Try strace cat not_a_file instead and examine the open() call in the resulting output:
出現問題時會怎樣?
試試 strace cat not_a_file,然後檢查輸出結果中的 open() 調用:
open("not_a_file", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or
directory)
Because open() couldn’t open the file, it returned -1 to signal an error. You can see that strace reports the exact error and gives you a small description of the error.
由於 open() 無法打開文件,它返回了-1 表示出錯。
你可以看到,strace 報告了確切的錯誤,並給出了錯誤的一小段描述。
Missing files are the most common problems with Unix programs, so if the system log and other log information aren’t very helpful and you have nowhere else to turn, strace can be of great use. You can even use it on daemons that detach themselves. For example:
文件丟失是 Unix 程序最常見的問題,因此如果系統日誌和其他日誌信息幫不上什麼忙,而你又無處求助,strace 就能派上大用場。
你甚至可以把它用在自行分離的守護進程上。例如:
$ strace -o crummyd_strace -ff crummyd
In this example, the -o option to strace logs the action of any child process that crummyd spawns into crummyd_strace.pid, where pid is the process ID of the child process.
在這個例子中,strace命令的-o選項將crummyd生成的任何子進程的操作記錄到crummyd_strace.pid文件中,其中pid是子進程的進程ID。
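You can also narrow a trace to particular system calls with the -e option. This is a rough sketch rather than a recipe; depending on your C library and strace version, you may see openat() rather than open() in the output:
$ strace -e trace=open,openat,read,write,close cat /dev/null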
8.3.2 ltrace(庫調用追蹤)
The ltrace command tracks shared library calls. The output is similar to that of strace, which is why we’re mentioning it here, but it doesn’t track anything at the kernel level. Be warned that there are many more shared library calls than system calls. You’ll definitely need to filter the output, and ltrace itself has many built-in options to assist you.
ltrace命令用於跟蹤共享庫調用。
輸出與strace類似,這也是為什麼我們在這裏提到它的原因,但它不會跟蹤內核級別的任何內容。
請注意,共享庫調用比系統調用要多得多。
您肯定需要過濾輸出,並且ltrace本身有許多內置選項可幫助您。
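One way to get your bearings before wading through a full library trace is a call summary. Most ltrace versions accept a -c option for this (treat the exact option set as version-dependent):
$ ltrace -c cat /dev/null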
NOTE See 15.1.4 Shared Libraries for more on shared libraries. The ltrace command doesn’t work on statically linked binaries.
注意 有關共享庫的更多信息,請參閲 15.1.4 共享庫。
ltrace 命令不適用於靜態鏈接的二進制文件。
8.4 Threads(線程)
In Linux, some processes are divided into pieces called threads. A thread is very similar to a process—it has an identifier (TID, or thread ID), and the kernel schedules and runs threads just like processes. However, unlike separate processes, which usually do not share system resources such as memory and I/O connections with other processes, all threads inside a single process share their system resources and some memory.
在Linux中,一些進程被劃分為稱為線程的片段。
線程與進程非常相似——它有一個標識符(TID,或線程ID),內核會像調度和運行進程一樣調度和運行線程。
然而,與通常不與其他進程共享系統資源(如內存和I/O連接)的獨立進程不同,單個進程內的所有線程共享其系統資源和一些內存。
8.4.1 Single-Threaded and Multithreaded Processes(單線程和多線程進程)
Many processes have only one thread. A process with one thread is single-threaded, and a process with more than one thread is multithreaded. All processes start out single-threaded. This starting thread is usually called the main thread. The main thread may then start new threads in order for the process to become multithreaded, similar to the way a process can call fork() to start a new process.
許多進程只有一個線程。只有一個線程的進程被稱為單線程進程,而有多個線程的進程被稱為多線程進程。
所有進程最初都是單線程的。這個起始線程通常被稱為主線程。
然後,主線程可以啓動新線程,使進程變為多線程,類似於進程可以調用fork()來啓動一個新進程。
NOTE It’s rare to refer to threads at all when a process is single-threaded. This book will not mention threads unless multithreaded processes make a difference in what you see or experience.
注意 當進程是單線程的時候,很少提到線程。
除非多線程進程會對你所見或體驗的內容產生影響,本書不會提到線程。
The primary advantage of a multithreaded process is that when the process has a lot to do, threads can run simultaneously on multiple processors, potentially speeding up computation. Although you can also achieve simultaneous computation with multiple processes, threads start faster than processes, and it is often easier and/or more efficient for threads to intercommunicate using their shared memory than it is for processes to communicate over a channel such as a network connection or a pipe.
多線程進程的主要優勢在於,當進程有很多事情要做時,線程可以在多個處理器上同時運行,從而可能加快計算速度。
雖然你也可以通過多個進程實現同時計算,但是線程比進程啓動更快,而且線程使用共享內存進行相互通信通常更容易和/或更高效,而進程之間的通信則需要使用網絡連接或管道等通道。
Some programs use threads to overcome problems managing multiple I/O resources. Traditionally, a process would sometimes use fork() to start a new subprocess in order to deal with a new input or output stream. Threads offer a similar mechanism without the overhead of starting a new process.
一些程序使用線程來解決管理多個I/O資源的問題。
傳統上,一個進程有時會使用fork()來啓動一個新的子進程,以處理新的輸入或輸出流。
線程提供了一種類似的機制,但不需要啓動一個新進程的開銷。
8.4.2 Viewing Threads(查看線程)
By default, the output from the ps and top commands shows only processes. To display the thread information in ps, add the m option. Here is some sample output:
默認情況下,ps 和 top 命令的輸出只顯示進程。要在 ps 中顯示線程信息,請添加 m 選項。下面是一些輸出示例:
Example 8-1. Viewing threads with ps m
例 8-1. 使用 ps m 查看線程
$ ps m
PID TTY STAT TIME COMMAND
3587 pts/3 - 0:00 bash➊
- - Ss 0:00 -
3592 pts/4 - 0:00 bash➋
- - Ss 0:00 -
12287 pts/8 - 0:54 /usr/bin/python /usr/bin/gm-notify➌
- - SL1 0:48 -
- - SL1 0:00 -
- - SL1 0:06 -
- - SL1 0:00 -
Example 8-1 shows processes along with threads. Each line with a number in the PID column (at ➊, ➋, and ➌) represents a process, as in the normal ps output. The lines with the dashes in the PID column represent the threads associated with the process. In this output, the processes at ➊ and ➋ have only one thread each, but process 12287 at ➌ is multithreaded with four threads.
例 8-1 顯示了進程和線程。
PID 列(➊、➋ 和 ➌)中帶有數字的每一行代表一個進程,與正常的 ps 輸出一樣。
PID 列中的破折號線代表與進程相關的線程。
在此輸出中,➊ 和 ➋ 處的進程各有一個線程,但 ➌ 處的進程 12287 是多線程的,有四個線程。
If you would like to view the thread IDs with ps, you can use a custom output format. This example shows only the process IDs, thread IDs, and command:
如果想用 ps 查看線程 ID,可以使用自定義輸出格式。本例只顯示了進程 ID、線程 ID 和命令:
Example 8-2. Showing process IDs and thread IDs with ps m
例 8-2. 用 ps m 顯示進程 ID 和線程 ID
$ ps m -o pid,tid,command
PID TID COMMAND
3587 - bash
- 3587 -
3592 - bash
- 3592 -
12287 - /usr/bin/python /usr/bin/gm-notify
- 12287 -
- 12288 -
- 12289 -
- 12295 -
The sample output in Example 8-2 corresponds to the threads shown in Example 8-1. Notice that the thread IDs of the single-threaded processes are identical to the process IDs; this is the main thread. For the multithreaded process 12287, thread 12287 is also the main thread.
在示例8-2中的示例輸出對應於示例8-1中顯示的線程。
請注意,單線程進程的線程ID與進程ID相同,這是主線程。
對於多線程進程12287,線程12287也是主線程。
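If you care about only one process, you can combine the custom output format with a process selection. As a sketch, this uses the UNIX-style -L thread option together with -p, with 12287 standing in for the PID from the example above:
$ ps -L -o pid,tid,command -p 12287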
NOTE Normally, you won’t interact with individual threads as you would processes. You need to know a lot about how a multithreaded program was written in order to act on one thread at a time, and even then, doing so might not be a good idea.
注意 通常情況下,您不會像對待進程那樣與單個線程進行交互。要對某個線程單獨進行操作,您需要非常瞭解該多線程程序的編寫方式,而且即便如此,這樣做也未必是個好主意。
Threads can confuse things when it comes to resource monitoring because individual threads in a multithreaded process can consume resources simultaneously. For example, top doesn’t show threads by default; you’ll need to press H to turn it on. For most of the resource monitoring tools that you’re about to see, you’ll have to do a little extra work to turn on the thread display.
線程在資源監控方面可能會引起混淆,因為多線程進程中的各個線程可以同時消耗資源。
例如,默認情況下,top不顯示線程;您需要按下H鍵來打開線程顯示。
對於即將看到的大多數資源監控工具,您需要做一些額外的工作來打開線程顯示。
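For example, most top versions also accept -H on the command line to start with the thread view already enabled, and you can combine it with -p to watch the threads of a single process (pid is a placeholder):
$ top -H -p pid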
8.5 Introduction to Resource Monitoring(資源監測簡介)
Now we’ll discuss some topics in resource monitoring, including processor (CPU) time, memory, and disk I/O. We’ll examine utilization on a systemwide scale, as well as on a per-process basis.
現在,我們將討論資源監控中的一些主題,包括處理器(CPU)時間、內存和磁盤 I/O。
我們將檢查整個系統和每個進程的利用率。
Many people touch the inner workings of the Linux kernel in the interest of improving performance. However, most Linux systems perform well under a distribution’s default settings, and you can spend days trying to tune your machine’s performance without meaningful results, especially if you don’t know what to look for. So rather than think about performance as you experiment with the tools in this chapter, think about seeing the kernel in action as it divides resources among processes.
為了提高性能,很多人都會接觸 Linux 內核的內部工作原理。
然而,大多數 Linux 系統在發行版的默認設置下性能良好,你可能要花費數天時間來調整機器的性能,卻得不到有意義的結果,尤其是如果你不知道要注意什麼的話。
因此,在使用本章中的工具進行實驗時,與其考慮性能,不如看看內核在進程間分配資源時的運行情況。
8.6 Measuring CPU Time(測量 CPU 時間)
To monitor one or more specific processes over time, use the -p option to top, with this syntax:
要在一段時間內監控一個或多個特定進程,請使用 top 的 -p 選項,語法如下:
$ top -p pid1 [-p pid2 ...]
To find out how much CPU time a command uses during its lifetime, use time. Most shells have a built-in time command that doesn’t provide extensive statistics, so you’ll probably need to run /usr/bin/time. For example, to measure the CPU time used by ls, run
要想知道一條命令在其生命週期內佔用了多少 CPU 時間,可以使用 time。
大多數 shell 都有一個內置的 time 命令,但並不提供大量的統計數據,所以你可能需要運行 /usr/bin/time。
例如,要測量 ls 佔用的 CPU 時間,運行
$ /usr/bin/time ls
After ls terminates, time should print output like that below. The key fields are in boldface:
ls 終止後,time 打印輸出應如下所示。關鍵字段用粗體表示:
0.05user 0.09system 0:00.44elapsed 31%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (125major+51minor)pagefaults 0swaps
o User time. The number of seconds that the CPU has spent running the program’s own code. On modern processors, some commands run so quickly, and therefore the CPU time is so low, that time rounds down to zero.
o 用户時間:CPU用於運行程序自身代碼的秒數。
在現代處理器上,某些命令運行得非常快,CPU時間也因此非常短,以至於time會將其向下取整顯示為零。
o System time. How much time the kernel spends doing the process’s work (for example, reading files and directories).
o 系統時間:內核執行進程工作(例如讀取文件和目錄)所花費的時間。
o Elapsed time. The total time it took to run the process from start to finish, including the time that the CPU spent doing other tasks. This number is normally not very useful for performance measurement, but subtracting the user and system time from elapsed time can give you a general idea of how long a process spends waiting for system resources.
o 經過的時間:從開始到結束運行進程所花費的總時間,包括CPU花費在其他任務上的時間。
這個數字通常對性能測量沒有太大用處,但從經過的時間中減去用户時間和系統時間,可以讓你大致瞭解進程等待系統資源的時間。
The remainder of the output primarily details memory and I/O usage. You’ll learn more about the page fault output in 8.9 Memory.
輸出的其餘部分主要詳細説明了內存和I/O使用情況。
你將在8.9節“內存”中瞭解更多關於頁面錯誤輸出的內容。
8.7 Adjusting Process Priorities(調整進程優先級)
You can change the way the kernel schedules a process in order to give the process more or less CPU time than other processes. The kernel runs each process according to its scheduling priority, which is a number between –20 and 20, with –20 being the foremost priority. (Yes, this can be confusing.)
您可以改變內核調度進程的方式,使該進程獲得比其他進程更多或更少的 CPU 時間。
內核會根據每個進程的調度優先級來運行進程,調度優先級是一個介於 -20 和 20 之間的數字,其中 -20 的優先級最高。
(是的,這可能會引起混淆)。
The ps -l command lists the current priority of a process, but it’s a little easier to see the priorities in action with the top command, as shown here:
ps -l 命令會列出進程的當前優先級,但使用top命令更容易看到優先級的實際變化,如下所示:
$ top
Tasks: 244 total, 2 running, 242 sleeping, 0 stopped, 0 zombie
Cpu(s): 31.7%us, 2.8%sy, 0.0%ni, 65.4%id, 0.2%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 6137216k total, 5583560k used, 553656k free, 72008k buffers
Swap: 4135932k total, 694192k used, 3441740k free, 767640k cached
  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
28883 bri   20   0 1280m 763m  32m S   58 12.7 213:00.65 chromium-browse
 1175 root  20   0  210m  43m  28m R   44  0.7  14292:35 Xorg
 4022 bri   20   0  413m 201m  28m S   29  3.4   3640:13 chromium-browse
 4029 bri   20   0  378m 206m  19m S    2  3.5  32:50.86 chromium-browse
 3971 bri   20   0  881m 359m  32m S    2  6.0 563:06.88 chromium-browse
 5378 bri   20   0  152m  10m 7064 S    1  0.2  24:30.21 compiz
 3821 bri   20   0  312m  37m  14m S    0  0.6  29:25.57 soffice.bin
 4117 bri   20   0  321m 105m  18m S    0  1.8  34:55.01 chromium-browse
 4138 bri   20   0  331m  99m  21m S    0  1.7 121:44.19 chromium-browse
 4274 bri   20   0  232m  60m  13m S    0  1.0  37:33.78 chromium-browse
 4267 bri   20   0 1102m 844m  11m S    0 14.1  29:59.27 chromium-browse
 2327 bri   20   0  301m  43m  16m S    0  0.7 109:55.65 unity-2d-shell
In the top output above, the PR (priority) column lists the kernel’s current schedule priority for the process. The higher the number, the less likely the kernel is to schedule the process if others need CPU time. The schedule priority alone does not determine the kernel’s decision to give CPU time to a process, and it changes frequently during program execution according to the amount of CPU time that the process consumes.
在上面的輸出中,PR(優先級)列顯示了內核對進程的當前調度優先級。
數字越高,如果其他進程需要CPU時間,內核調度該進程的可能性就越小。
調度優先級本身並不能決定內核是否將CPU時間分配給進程,而且它會根據進程消耗的CPU時間,在程序執行過程中頻繁變化。
Next to the priority column is the nice value (NI) column, which gives a hint to the kernel’s scheduler. This is what you care about when trying to influence the kernel’s decision. The kernel adds the nice value to the current priority to determine the next time slot for the process.
在優先級列旁邊是nice值(NI)列,它向內核的調度器提供了一個提示。
當您想要影響內核的決策時,這是您關心的內容。
內核將nice值添加到當前優先級,以確定進程的下一個時間片。
By default, the nice value is 0. Now, say you’re running a big computation in the background that you don’t want to bog down your interactive session. To have that process take a backseat to other processes and run only when the other tasks have nothing to do, you could change the nice value to 20 with the renice command (where pid is the process ID of the process that you want to change):
默認情況下,nice值為0。現在,假設您在後台運行一個大型計算任務,您不希望它影響您的交互會話。
為了讓該進程讓位於其他進程,只在其他任務無事可做時才運行,您可以使用renice命令將nice值更改為20(其中pid是您想要更改的進程的進程ID):
$ renice 20 pid
If you’re the superuser, you can set the nice value to a negative number, but doing so is almost always a bad idea because system processes may not get enough CPU time. In fact, you probably won’t need to alter nice values much because many Linux systems have only a single user, and that user does not perform much real computation. (The nice value was much more important back when there were many users on a single machine.)
如果你是超級用户,可以將 nice 值設置為負數,但這樣做幾乎總是個壞主意,因為系統進程可能得不到足夠的 CPU 時間。
事實上,你可能並不需要過多修改 nice 值,因為許多 Linux 系統只有一個用户,而且該用户並不執行很多實際計算。
(在一台機器上有很多用户的時候,nice 值要重要得多)。
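If you know ahead of time that a job should yield to everything else, you can start it with a nice value instead of renicing it later. Here heavy_computation is a placeholder for your own command:
$ nice -n 20 ./heavy_computation &
The child process inherits the nice value, so anything the job spawns is deprioritized as well.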
8.8 Load Averages(負載平均值)
CPU performance is one of the easier metrics to measure. The load average is the average number of processes currently ready to run. That is, it is an estimate of the number of processes that are capable of using the CPU at any given time. When thinking about a load average, keep in mind that most processes on your system are usually waiting for input (from the keyboard, mouse, or network, for example), meaning that most processes are not ready to run and should contribute nothing to the load average. Only processes that are actually doing something affect the load average.
CPU 性能是比較容易衡量的指標之一。
平均負載是指當前準備運行的進程的平均數量。
也就是説,它是對任何給定時間內能夠使用 CPU 的進程數量的估計。
在考慮平均負載時,請記住系統中的大多數進程通常都在等待輸入(例如來自鍵盤、鼠標或網絡的輸入),這意味着大多數進程都沒有準備好運行,因此不會對平均負載產生任何影響。
只有實際運行的進程才會影響平均負載。
8.8.1 Using uptime(使用 uptime)
The uptime command tells you three load averages in addition to how long the kernel has been running:
除了內核運行的時間外,uptime 命令還能告訴你三個負載平均值:
$ uptime
... up 91 days, ... load average: 0.08, 0.03, 0.01
The three bolded numbers are the load averages for the past 1 minute, 5 minutes, and 15 minutes, respectively. As you can see, this system isn’t very busy: An average of only 0.01 processes have been running across all processors for the past 15 minutes. In other words, if you had just one processor, it was only running userspace applications for 1 percent of the last 15 minutes. (Traditionally, most desktop systems would exhibit a load average of about 0 when you were doing anything except compiling a program or playing a game. A load average of 0 is usually a good sign, because it means that your processor isn’t being challenged and you’re saving power.)
三個加粗的數字分別是過去1分鐘、5分鐘和15分鐘的平均負載。
正如你所見,這個系統並不是很忙:過去15分鐘內,所有處理器上平均只有0.01個進程在運行。
換句話説,如果你只有一個處理器,在過去的15分鐘內,它只有1%的時間在運行用户空間應用程序。
(傳統上,除了編譯程序或玩遊戲之外,大多數桌面系統的負載平均值約為0。
負載平均值為0通常是一個好的跡象,因為這意味着你的處理器沒有受到挑戰,同時也節省了能量。)
NOTE User interface components on current desktop systems tend to occupy more of the CPU than those in the past. For example, on Linux systems, a web browser’s Flash plugin can be a particularly notorious resource hog, and Flash applications can easily occupy much of a system’s CPU and memory due to poor all-around implementation.
注意:當前桌面系統上的用户界面組件往往佔用的CPU資源比過去多。
例如,在Linux系統上,Web瀏覽器的Flash插件可能是一個特別臭名昭著的資源佔用者,由於實現不佳,Flash應用程序很容易佔用系統的大部分CPU和內存。
If a load average goes up to around 1, a single process is probably using the CPU nearly all of the time. To identify that process, use the top command; the process will usually rise to the top of the display.
如果平均負載上升到 1 左右,則可能是一個進程幾乎一直在使用 CPU。
要識別該進程,請使用 top 命令;該進程通常會出現在顯示屏的頂部。
Most modern systems have more than one processor core or CPU, so multiple processes can easily run simultaneously. If you have two cores, a load average of 1 means that only one of the cores is likely active at any given time, and a load average of 2 means that both cores have just enough to do all of the time.
大多數現代系統都有多個處理器核心或CPU,因此多個進程可以輕鬆同時運行。
如果你有兩個核心,負載平均值為1意味着任何給定時間只有一個核心處於活動狀態,負載平均值為2意味着兩個核心一直有足夠的工作量。
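To relate a load average to your own hardware, it helps to know how many cores the kernel sees. On most Linux systems, either of these reports that count:
$ nproc
$ grep -c ^processor /proc/cpuinfo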
8.8.2 High Loads(高負荷)
A high load average does not necessarily mean that your system is having trouble. A system with enough memory and I/O resources can easily handle many running processes. If your load average is high and your system still responds well, don’t panic: The system just has a lot of processes sharing the CPU. The processes have to compete with each other for processor time, and as a result they’ll take longer to perform their computations than they would if they were each allowed to use the CPU all of the time. Another case where you might see a high load average as normal is a web server, where processes can start and terminate so quickly that the load average measurement mechanism can’t function effectively.
一個高負載平均值並不一定意味着您的系統出現了問題。
具有足夠內存和I/O資源的系統可以輕鬆處理許多運行中的進程。
如果您的負載平均值很高,但系統仍然響應良好,不要驚慌:系統只是有很多進程共享CPU。
這些進程必須相互競爭處理器時間,因此它們執行計算的時間比如果它們每個都被允許始終使用CPU要長。
另一個可能正常情況下看到高負載平均值的情況是Web服務器,在這種情況下,進程可以快速啓動和終止,以至於負載平均值測量機制無法有效運作。
However, if you sense that the system is slow and the load average is high, you might be running into memory performance problems. When the system is low on memory, the kernel can start to thrash, or rapidly swap memory for processes to and from the disk. When this happens, many processes will become ready to run, but their memory might not be available, so they will remain in the ready-to-run state (and contribute to the load average) for much longer than they normally would.
然而,如果您感覺系統變慢,負載平均值很高,那麼可能是內存性能問題。
當系統內存不足時,內核可能會開始顛簸(thrash),也就是快速地把進程的內存頁在內存和磁盤之間換進換出。
當這種情況發生時,許多進程將準備好運行,但它們的內存可能不可用,因此它們將比通常更長時間保持在準備運行狀態(並對負載平均值做出貢獻)。
We’ll now look at memory in much more detail.
現在我們將更詳細地討論內存。
8.9 Memory(內存)
One of the simplest ways to check your system’s memory status as a whole is to run the free command or view /proc/meminfo to see how much real memory is being used for caches and buffers. As we’ve just mentioned, performance problems can arise from memory shortages. If there isn’t much cache/buffer memory being used (and the rest of the real memory is taken), you may need more memory. However, it’s too easy to blame a shortage of memory for every performance problem on your machine.
檢查系統內存狀態的最簡單方法之一是運行free命令或查看/proc/meminfo,以查看用於緩存和緩衝區的實際內存使用量。
正如我們剛才提到的,內存不足可能導致性能問題。
如果沒有使用很多緩存/緩衝區內存(而其餘的實際內存已被佔用),您可能需要更多的內存。
然而,很容易將內存不足歸咎於機器上的每個性能問題。
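As a minimal sketch of both approaches (output omitted here), you might run:
$ free -m
$ less /proc/meminfo
The -m option makes free report values in megabytes; /proc/meminfo contains the raw counters that free summarizes.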
8.9.1 How Memory Works(內存的工作原理)
Recall from Chapter 1 that the CPU has a memory management unit (MMU) that translates the virtual memory addresses used by processes into real ones. The kernel assists the MMU by breaking the memory used by processes into smaller chunks called pages. The kernel maintains a data structure, called a page table, that contains a mapping of a process’s virtual page addresses to real page addresses in memory. As a process accesses memory, the MMU translates the virtual addresses used by the process into real addresses based on the kernel’s page table.
回顧第1章中的內容,CPU具有一個內存管理單元(MMU),用於將進程使用的虛擬內存地址轉換為實際地址。
內核通過將進程使用的內存分割成稱為頁的較小塊來幫助MMU。
內核維護一個稱為頁表的數據結構,其中包含進程的虛擬頁地址與內存中的實際頁地址之間的映射。
當進程訪問內存時,MMU根據內核的頁表將進程使用的虛擬地址轉換為實際地址。
A user process does not actually need all of its pages to be immediately available in order to run. The kernel generally loads and allocates pages as a process needs them; this system is known as on-demand paging or just demand paging. To see how this works, consider how a program starts and runs as a new process:
用户進程實際上並不需要立即可用的所有頁來運行。
內核通常在進程需要它們時加載和分配頁;這個系統被稱為按需分頁或只是需求分頁。
為了瞭解這是如何工作的,請考慮程序作為新進程啓動和運行的方式:
- The kernel loads the beginning of the program’s instruction code into memory pages.
- The kernel may allocate some working-memory pages to the new process.
- As the process runs, it might reach a point where the next instruction in its code isn’t in any of the pages that the kernel initially loaded. At this point, the kernel takes over, loads the necessary pages into memory, and then lets the program resume execution.
- Similarly, if the program requires more working memory than was initially allocated, the kernel handles it by finding free pages (or by making room) and assigning them to the process.
- 內核將程序的指令代碼的開頭加載到內存頁中。
- 內核可能為新進程分配一些工作內存頁。
- 當進程運行時,它可能達到一個點,其中它的代碼中的下一條指令不在內核最初加載的任何頁中。此時,內核接管,將所需的頁加載到內存中,然後讓程序繼續執行。
- 類似地,如果程序需要的工作內存超過了最初分配的內存,內核會通過尋找空閒頁(或騰出空間)並將其分配給進程來處理。
8.9.2 Page Faults(頁錯誤)
If a memory page is not ready when a process wants to use it, the process triggers a page fault. In the event of a page fault, the kernel takes control of the CPU from the process in order to get the page ready. There are two kinds of page faults: minor and major.
如果一個進程想要使用的內存頁尚未準備好,那麼該進程將觸發一個頁錯誤。
在發生頁錯誤時,內核從進程那裏接管CPU,以準備好該頁。
有兩種類型的頁錯誤:次要頁錯誤和主要頁錯誤。
Minor Page Faults(次要頁錯誤)
A minor page fault occurs when the desired page is actually in main memory but the MMU doesn’t know where it is. This can happen when the process requests more memory or when the MMU doesn’t have enough space to store all of the page locations for a process. In this case, the kernel tells the MMU about the page and permits the process to continue. Minor page faults aren’t such a big deal, and many occur as a process runs. Unless you need maximum performance from some memory-intensive program, you probably shouldn’t worry about them.
當所需的頁實際上在主存中,但MMU不知道它在哪裏時,發生次要頁錯誤。
這可能發生在進程請求更多內存時,或者當MMU沒有足夠的空間來存儲進程的所有頁位置時。
在這種情況下,內核告訴MMU有關該頁的信息,並允許進程繼續執行。次要頁錯誤並不是什麼大問題,進程運行過程中會發生很多次。
除非您需要從一些內存密集型程序中獲得最大的性能,否則您可能不需要擔心它們。
Major Page Faults(主要頁錯誤)
A major page fault occurs when the desired memory page isn’t in main memory at all, which means that the kernel must load it from the disk or some other slow storage mechanism. A lot of major page faults will bog the system down because the kernel must do a substantial amount of work to provide the pages, robbing normal processes of their chance to run.
當所需的內存頁根本不在主存中時,發生主要頁錯誤,這意味着內核必須從磁盤或其他較慢的存儲機制中加載它。
大量的主要頁錯誤會拖慢系統,因為內核必須做大量的工作來提供頁,從而剝奪正常進程運行的機會。
Some major page faults are unavoidable, such as those that occur when you load the code from disk when running a program for the first time. The biggest problems happen when you start running out of memory and the kernel starts to swap pages of working memory out to the disk in order to make room for new pages.
一些主要頁錯誤是不可避免的,例如在首次運行程序時從磁盤加載代碼時發生的錯誤。
當您開始內存不足並且內核開始將工作內存的頁交換到磁盤以騰出空間來容納新的頁時,問題就變得更嚴重了。
Watching Page Faults(觀察頁錯誤)
You can drill down to the page faults for individual processes with the ps, top, and time commands. The following command shows a simple example of how the time command displays page faults. (The output of the cal command doesn’t matter, so we’re discarding it by redirecting that to /dev/null.)
您可以使用ps、top和time命令來查看各個進程的頁面錯誤。下面的命令展示了time命令如何顯示頁面錯誤的一個簡單示例。(cal命令的輸出並不重要,我們通過將其重定向到/dev/null來丟棄它。)
$ /usr/bin/time cal > /dev/null
0.00user 0.00system 0:00.06elapsed 0%CPU (0avgtext+0avgdata 3328maxresident)k
648inputs+0outputs (2major+254minor)pagefaults 0swaps
As you can see from the bolded text, when this program ran, there were 2 major page faults and 254 minor ones. The major page faults occurred when the kernel needed to load the program from the disk for the first time. If you ran the command again, you probably wouldn’t get any major page faults because the kernel would have cached the pages from the disk.
從加粗的文本中可以看出,該程序運行時發生了2次主要頁面錯誤和254次次要頁面錯誤。
主要頁面錯誤發生在內核第一次需要從磁盤加載該程序時。
如果再次運行該命令,可能就不會出現主要頁面錯誤了,因為內核已經緩存了從磁盤讀入的頁面。
If you’d rather see the page faults of processes as they’re running, use top or ps. When running top, use f to change the displayed fields and u to display the number of major page faults. (The results will show up in a new, nFLT column. You won’t see the minor page faults.)
如果您希望在進程運行時查看頁面錯誤,請使用top或ps命令。
在運行top時,使用f來更改顯示的字段,使用u來顯示主要頁面錯誤的數量。
(結果將顯示在一個新的nFLT列中,您將看不到次要頁面錯誤。)
When using ps, you can use a custom output format to view the page faults for a particular process. Here’s an example for process ID 20365:
在使用ps時,您可以使用自定義的輸出格式來查看特定進程的頁面錯誤。以下是針對進程ID 20365的示例:
$ ps -o pid,min_flt,maj_flt 20365
PID MINFL MAJFL
20365 834182 23
The MINFL and MAJFL columns show the numbers of minor and major page faults. Of course, you can combine this with any other process selection options, as described in the ps(1) manual page.
MINFL 和 MAJFL 列顯示次要和主要頁面故障的數量。
當然,您也可以將其與任何其他流程選擇選項相結合,詳見 ps(1) 手冊頁面。
Viewing page faults by process can help you zero in on certain problematic components. However, if you’re interested in your system performance as a whole, you need a tool to summarize CPU and memory action across all processes.
按進程查看頁面故障可以幫助你找到某些有問題的組件。
不過,如果你對系統的整體性能感興趣,就需要一個工具來彙總所有進程的 CPU 和內存運行情況。
8.10 Monitoring CPU and Memory Performance with vmstat(使用vmstat監控CPU和內存性能)
Among the many tools available to monitor system performance, the vmstat command is one of the oldest, with minimal overhead. You’ll find it handy for getting a high-level view of how often the kernel is swapping pages in and out, how busy the CPU is, and IO utilization.
在眾多可用於監控系統性能的工具中,vmstat命令是最古老且開銷最小的之一。
您會發現它非常方便,可以提供關於內核頁面交換頻率、CPU忙碌程度和IO利用率的整體視圖。
The trick to unlocking the power of vmstat is to understand its output. For example, here’s some output from vmstat 2, which reports statistics every 2 seconds:
解鎖vmstat的威力的關鍵在於理解其輸出。例如,這是使用vmstat 2命令每2秒報告一次統計數據的一些輸出:
$ vmstat 2
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
2 0 320416 3027696 198636 1072568 0 0 1 1 2 0 15 2 83 0
2 0 320416 3027288 198636 1072564 0 0 0 1182 407 636 1 0 99 0
1 0 320416 3026792 198640 1072572 0 0 0 58 281 537 1 0 99 0
0 0 320416 3024932 198648 1074924 0 0 0 308 318 541 0 0 99 1
0 0 320416 3024932 198648 1074968 0 0 0 0 208 416 0 0 99 0
0 0 320416 3026800 198648 1072616 0 0 0 0 207 389 0 0 100 0
The output falls into categories: procs for processes, memory for memory usage, swap for the pages pulled in and out of swap, io for disk usage, system for the number of times the kernel switches into kernel code, and cpu for the time used by different parts of the system.
輸出分為幾個類別:procs代表進程,memory代表內存使用情況,swap代表從交換區中換入和換出的頁面,io代表磁盤使用情況,system代表內核切換到內核代碼的次數,cpu代表系統不同部分使用的時間。
The preceding output is typical for a system that isn’t doing much. You’ll usually start looking at the second line of output—the first one is an average for the entire uptime of the system. For example, here the system has 320416KB of memory swapped out to the disk (swpd) and around 3025000KB (3 GB) of real memory free. Even though some swap space is in use, the zero-valued si (swap-in) and so (swap-out) columns report that the kernel is not currently swapping anything in or out from the disk. The buff column indicates the amount of memory that the kernel is using for disk buffers (see 4.2.5 Disk Buffering, Caching, and Filesystems).
前面的輸出對於一個沒有做太多事情的系統來説是典型的。
通常你會從輸出的第二行開始查看,第一行是整個系統運行時間的平均值。例如,在這個例子中,系統將320416KB的內存交換到磁盤(swpd),並且大約有3025000KB(3GB)的真實內存空閒。
儘管有一些交換空間在使用,但是零值的si(換入)和so(換出)列顯示內核當前沒有從磁盤中交換任何內容。
buff列表示內核用於磁盤緩衝區的內存量(參見4.2.5磁盤緩衝、緩存和文件系統)。
On the far right, under the CPU heading, you see the distribution of CPU time in the us, sy, id, and wa columns. These list (in order) the percentage of time that the CPU is spending on user tasks, system (kernel) tasks, idle time, and waiting for I/O. In the preceding example, there aren’t too many user processes running (they’re using a maximum of 1 percent of the CPU); the kernel is doing practically nothing, while the CPU is sitting around doing nothing 99 percent of the time.
在最右邊的CPU標題下,你可以看到CPU時間在us、sy、id和wa列中的分佈情況。
它們按順序列出了CPU在用户任務、系統(內核)任務、空閒時間和等待I/O上所花費的時間的百分比。
在前面的例子中,沒有太多用户進程在運行(它們最多使用1%的CPU);內核幾乎沒有做任何事情,而CPU在99%的時間裏都閒置。
Now, watch what happens when a big program starts up sometime later (the first two lines occur right before the program runs):
現在,看看當一個大程序在稍後啓動時會發生什麼(前兩行發生在程序運行之前):
Example 8-3. Memory activity
例子8-3.內存活動
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa
 1  0 320412 2861252 198920 1106804    0    0     0     0 2477 4481 25  2 72  0➊
 1  0 320412 2861748 198924 1105624    0    0     0    40 2206 3966 26  2 72  0
 1  0 320412 2860508 199320 1106504    0    0   210    18 2201 3904 26  2 71  1
 1  1 320412 2817860 199332 1146052    0    0 19912     0 2446 4223 26  3 63  8
 2  2 320284 2791608 200612 1157752  202    0  4960   854 3371 5714 27  3 51 18➋
 1  1 320252 2772076 201076 1166656   10    0  2142  1190 4188 7537 30  3 53 14
 0  3 320244 2727632 202104 1175420   20    0  1890   216 4631 8706 36  4 46 14
As you can see at ➊ in Example 8-3, the CPU starts to see some usage for an extended period, especially from user processes. Because there is enough free memory, the amount of cache and buffer space used starts to increase as the kernel starts to use the disk more.
正如你在示例8-3中所看到的,CPU開始在一個較長的時間內出現一些使用情況,尤其是來自用户進程。
由於有足夠的空閒內存,緩存和緩衝區使用的量開始增加,因為內核開始更多地使用磁盤。
Later on, we see something interesting: Notice at ➋ that the kernel pulls some pages into memory that were once swapped out (the si column). This means that the program that just ran probably accessed some pages shared by another process. This is common; many processes use the code in certain shared libraries only when starting up.
稍後,我們看到一些有趣的現象:請注意在➋處,內核將一些曾經被交換出去的頁面調入內存(si列)。
這意味着剛剛運行的程序可能訪問了其他進程共享的某些頁面。
這是很常見的;許多進程只在啓動時使用某些共享庫中的代碼。
Also notice from the b column that a few processes are blocked (prevented from running) while waiting for memory pages. Overall, the amount of free memory is decreasing, but it’s nowhere near being depleted. There’s also a fair amount of disk activity, as seen by the increasing numbers in the bi (blocks in) and bo (blocks out) columns.
還請注意b列中有一些進程被阻塞(無法運行),因為它們在等待內存頁面。總體而言,空閒內存的數量在減少,但遠未耗盡。
同時,磁盤活動也相當頻繁,可以從bi(塊輸入)和bo(塊輸出)列中看出。
The output is quite different when you run out of memory. As the free space depletes, both the buffer and cache sizes decrease because the kernel increasingly needs the space for user processes. Once there is nothing left, you’ll start to see activity in the so (swapped out) column as the kernel starts moving pages onto the disk, at which point nearly all of the other output columns change to reflect the amount of work that the kernel is doing. You see more system time, more data going in and out of the disk, and more processes blocked because the memory they want to use is not available (it has been swapped out).
當內存耗盡時,輸出會有很大的變化。
隨着空閒空間的減少,緩衝區和緩存大小也會減小,因為內核越來越需要這些空間來供用户進程使用。
一旦沒有剩餘空間,你將開始在so(交換出)列中看到活動,此時內核開始將頁面移到磁盤上,幾乎所有其他輸出列都會根據內核的工作量發生變化。
你會看到更多的系統時間,更多的數據進出磁盤,以及更多的進程被阻塞,因為它們想要使用的內存不可用(已經被交換出)。
We haven’t explained all of the vmstat output columns. You can dig deeper into them in the vmstat(8) manual page, but you might have to learn more about kernel memory management first from a class or a book like Operating System Concepts, 9th edition (Wiley, 2012) in order to understand them.
我們沒有解釋所有的vmstat輸出列。
你可以在vmstat(8)的手冊頁中深入瞭解它們,但為了理解它們,你可能需要先從課程或者像《操作系統概念》(第9版,Wiley,2012) 這樣的書籍中更多地瞭解內核內存管理。
8.11 I/O Monitoring(輸入/輸出監控)
By default, vmstat shows you some general I/O statistics. Although you can get very detailed per-partition resource usage with vmstat -d, you’ll get a lot of output from this option, which might be overwhelming. Instead, try starting out with a tool just for I/O called iostat.
默認情況下,vmstat 會顯示一些一般的 I/O 統計信息。
雖然使用 vmstat -d 可以獲得非常詳細的每個分區資源使用情況,但該選項會產生大量輸出,可能會讓人難以承受。
相反,你可以嘗試從名為 iostat 的 I/O 工具開始。
8.11.1 Using iostat
Like vmstat, when run without any options, iostat shows the statistics for your machine’s current uptime:
與 vmstat 一樣,在不帶任何選項的情況下運行時,iostat 會顯示機器當前正常運行時間的統計數據:
$ iostat
[kernel information]
avg-cpu: %user %nice %system %iowait %steal %idle
4.46 0.01 0.67 0.31 0.00 94.55
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               4.67         7.28        49.86    9493727   65011716
sde               0.00         0.00         0.00       1230          0
The avg-cpu part at the top reports the same CPU utilization information as other utilities that you’ve seen in this chapter, so skip down to the bottom, which shows you the following for each device:
o tps. Average number of data transfers (I/O requests) per second.
o kB_read/s, kB_wrtn/s. Average number of kilobytes read from and written to the device per second.
o kB_read, kB_wrtn. Total numbers of kilobytes read and written.
頂部的 avg-cpu 部分報告的 CPU 利用率信息與本章中的其他實用程序相同,因此請跳到底部,它將顯示每個設備的以下信息:
o tps:每秒平均數據傳輸(I/O 請求)次數。
o kB_read/s、kB_wrtn/s:每秒從設備讀取和寫入設備的平均千字節數。
o kB_read、kB_wrtn:讀取和寫入的千字節總數。
Another similarity to vmstat is that you can give an interval argument, such as iostat 2, to give an update every 2 seconds. When using an interval, you might want to display only the device report by using the -d option (such as iostat -d 2).
與vmstat類似的另一個特點是,你可以提供一個間隔參數,比如iostat 2,以便每2秒更新一次。
當使用間隔時,你可能希望只顯示設備報告,可以使用-d選項(比如iostat -d 2)。
By default, the iostat output omits partition information. To show all of the partition information, use the -p ALL option. Because there are many partitions on a typical system, you’ll get a lot of output. Here’s part of what you might see:
默認情況下,iostat輸出不包含分區信息。
要顯示所有分區信息,請使用-p ALL選項。
由於典型系統上有許多分區,你將得到大量輸出。以下是你可能看到的部分內容:
$ iostat -p ALL
--snip--
Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
--snip--
sda               4.67         7.27        49.83    9496139   65051472
sda1              4.38         7.16        49.51    9352969   64635440
sda2              0.00         0.00         0.00          6          0
sda5              0.01         0.11         0.32     141884     416032
scd0              0.00         0.00         0.00          0          0
--snip--
sde               0.00         0.00         0.00       1230          0
In this example, sda1, sda2, and sda5 are all partitions of the sda disk, so there will be some overlap between the read and written columns. However, the sum of the partition columns won’t necessarily add up to the disk column. Although a read from sda1 also counts as a read from sda, keep in mind that you can read from sda directly, such as when reading the partition table.
在本例中,sda1、sda2 和 sda5 都是 sda 磁盤的分區,因此讀取列和寫入列之間會有一些重疊。
不過,分區列的總和並不一定等於磁盤列。
雖然從 sda1 的讀取也算作從 sda 的讀取,但請記住,您可以直接從 sda 讀取,例如在讀取分區表時。
8.11.2 Per-Process I/O Utilization and Monitoring: iotop(每進程 I/O 利用率和監控:iotop)
If you need to dig even deeper to see I/O resources used by individual processes, the iotop tool can help. Using iotop is similar to using top. There is a continuously updating display that shows the processes using the most I/O, with a general summary at the top:
如果需要更深入地查看單個進程使用的 I/O 資源,iotop 工具可以提供幫助。
使用 iotop 與使用 top 類似。
它有一個持續更新的顯示屏,顯示使用最多 I/O 的進程,頂部有一個總的摘要:
# iotop
Total DISK READ: 4.76 K/s | Total DISK WRITE: 333.31 K/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
  260  be/3  root      0.00 B/s   38.09 K/s  0.00 %  6.98 %  [jbd2/sda1-8]
 2611  be/4  juser     4.76 K/s   10.32 K/s  0.00 %  0.21 %  zeitgeist-daemon
 2636  be/4  juser     0.00 B/s   84.12 K/s  0.00 %  0.20 %  zeitgeist-fts
 1329  be/4  juser     0.00 B/s   65.87 K/s  0.00 %  0.03 %  soffice.b~ashpipe=6
 6845  be/4  juser     0.00 B/s  812.63 B/s  0.00 %  0.00 %  chromium-browser
19069  be/4  juser     0.00 B/s  812.63 B/s  0.00 %  0.00 %  rhythmbox
Along with the user, command, and read/write columns, notice that there is a TID column (thread ID) instead of a process ID. The iotop tool is one of the few utilities that displays threads instead of processes.
隨着用户、命令和讀/寫列,注意到有一個TID列(線程ID)而不是進程ID。
iotop工具是為數不多顯示線程而不是進程的實用工具之一。
The PRIO (priority) column indicates the I/O priority. It’s similar to the CPU priority that you’ve already seen, but it affects how quickly the kernel schedules I/O reads and writes for the process. In a priority such as be/4, the be part is the scheduling class, and the number is the priority level. As with CPU priorities, lower numbers are more important; for example, the kernel allows more time for I/O for a process with be/3 than one with be/4.
PRIO(優先級)列指示I/O優先級。
它類似於你已經見過的CPU優先級,但它影響內核為進程調度I/O讀取和寫入的速度。
在像be/4這樣的優先級中,be部分是調度類,數字是優先級級別。
與CPU優先級一樣,較低的數字更重要;
例如,內核為具有be/3的進程允許更多的時間進行I/O,而不是具有be/4的進程。
The kernel uses the scheduling class to add more control for I/O scheduling. You’ll see three scheduling classes from iotop:
內核使用調度類來增加對I/O調度的更多控制。你將從iotop中看到三個調度類:
o be Best-effort. The kernel does its best to fairly schedule I/O for this class. Most processes run under this I/O scheduling class.
o rt Real-time. The kernel schedules any real-time I/O before any other class of I/O, no matter what.
o idle Idle. The kernel performs I/O for this class only when there is no other I/O to be done. There is no priority level for the idle scheduling class.
o be 最佳努力。內核盡其所能公平地為該類別調度I/O。大多數進程在此I/O調度類下運行。
o rt 實時。內核在任何其他I/O類別之前調度任何實時I/O。
o idle 空閒。內核僅在沒有其他I/O需要完成時才為此類別執行I/O操作。空閒調度類別沒有優先級級別。
You can check and change the I/O priority for a process with the ionice utility; see the ionice(1) manual page for details. You probably will never need to worry about the I/O priority, though.
你可以使用ionice實用程序來檢查和更改進程的I/O優先級;有關詳細信息,請參閲ionice(1)手冊頁。但是,你可能永遠不需要擔心I/O優先級。
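As a brief sketch of ionice usage (pid is again a placeholder), the first command reports a process’s current I/O class and priority, and the second moves it into the idle class; changing another user’s process requires root:
$ ionice -p pid
# ionice -c 3 -p pid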
8.12 Per-Process Monitoring with pidstat
You’ve seen how you can monitor specific processes with utilities such as top and iotop. However, this display refreshes over time, and each update erases the previous output. The pidstat utility allows you to see the resource consumption of a process over time in the style of vmstat. Here’s a simple example for monitoring process 1329, updating every second:
您已經瞭解到如何使用top和iotop等工具來監視特定的進程。
然而,這些顯示屏幕會隨時間刷新,每次更新都會清除之前的輸出。
pidstat工具允許您以vmstat的方式查看進程隨時間的資源消耗情況。
下面是一個簡單的示例,用於監視進程1329,每秒更新一次:
$ pidstat -p 1329 1
Linux 3.2.0-44-generic-pae (duplex) 07/01/2015 _i686_ (4 CPU)
09:26:55 PM PID %usr %system %guest %CPU CPU Command
09:27:03 PM 1329 8.00 0.00 0.00 8.00 1 myprocess
09:27:04 PM 1329 0.00 0.00 0.00 0.00 3 myprocess
09:27:05 PM 1329 3.00 0.00 0.00 3.00 1 myprocess
09:27:06 PM 1329 8.00 0.00 0.00 8.00 3 myprocess
09:27:07 PM 1329 2.00 0.00 0.00 2.00 3 myprocess
09:27:08 PM 1329 6.00 0.00 0.00 6.00 2 myprocess
The default output shows the percentages of user and system time and the overall percentage of CPU time, and it even tells you which CPU the process was running on. (The %guest column here is somewhat odd— it’s the percentage of time that the process spent running something inside a virtual machine. Unless you’re running a virtual machine, don’t worry about this.)
默認輸出顯示了用户和系統時間的百分比,以及CPU時間的總體百分比,甚至還告訴您進程在哪個CPU上運行。
(這裏的%guest列有點奇怪,它是進程在虛擬機內運行的時間百分比。除非您正在運行虛擬機,否則不必擔心這個。)
Although pidstat shows CPU utilization by default, it can do much more. For example, you can use the - r option to monitor memory and -d to turn on disk monitoring. Try them out, and then look at the pidstat(1) manual page to see even more options for threads, context switching, or just about anything else that we’ve talked about in this chapter.
雖然pidstat默認顯示CPU利用率,但它還可以做更多的事情。例如,您可以使用-r選項來監視內存,使用-d選項來開啓磁盤監視。
試試它們,然後查看pidstat(1)手冊頁面,以瞭解更多有關線程、上下文切換或本章中討論的其他任何內容的選項。
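For instance, sticking with the options just mentioned and the PID from the earlier example, these report per-second memory and disk statistics for that process:
$ pidstat -r -p 1329 1
$ pidstat -d -p 1329 1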
8.13 Further Topics(進一步的主題)
One reason there are so many tools to measure resource utilization is that a wide array of resource types are consumed in many different ways. In this chapter, you’ve seen CPU, memory, and I/O as system resources being consumed by processes, threads inside processes, and the kernel.
有很多工具用於測量資源利用情況的一個原因是,有各種各樣的資源類型以多種不同的方式被消耗。
在本章中,您已經看到了 CPU、內存和 I/O 作為系統資源被進程、進程內的線程和內核所消耗。
The other reason that the tools exist is that the resources are limited and, for a system to perform well, its components must strive to consume fewer resources. In the past, many users shared a machine, so it was necessary to make sure that each user had a fair share of resources. Now, although a modern desktop computer may not have multiple users, it still has many processes competing for resources. Likewise, high-performance network servers require intense system resource monitoring.
這些工具存在的另一個原因是資源是有限的,為了系統能夠良好運行,其組件必須努力消耗更少的資源。
過去,許多用户共享一台機器,因此有必要確保每個用户都有公平的資源份額。
現在,儘管現代台式計算機可能沒有多個用户,但仍然有許多進程競爭資源。
同樣,高性能網絡服務器需要進行強大的系統資源監控。
Further topics in resource monitoring and performance analysis include the following:
資源監控和性能分析的進一步主題包括以下內容:
o sar (System Activity Reporter) The sar package has many of the continuous monitoring capabilities of vmstat, but it also records resource utilization over time. With sar, you can look back at a particular time to see what your system was doing. This is handy when you have a past system event that you want to analyze.
o sar(系統活動報告器) sar 軟件包具有 vmstat 的許多連續監控功能,但它還記錄了資源利用情況的變化。
通過 sar,您可以回顧特定時間以查看系統的運行情況。當您有一個過去的系統事件需要分析時,這非常方便。
o acct (Process accounting) The acct package can record the processes and their resource utilization.
o acct(進程記賬) acct 軟件包可以記錄進程及其資源利用情況。
o Quotas. You can limit many system resources on a per-process or peruser basis. See /etc/security/limits.conf for some of the CPU and memory options; there’s also a limits.conf(5) manual page. This is a PAM feature, so processes are subject to this only if they’ve been started from something that uses PAM (such as a login shell). You can also limit the amount of disk space that a user can use with the quota system.
o 配額。您可以在每個進程或每個用户的基礎上限制許多系統資源。
有關 CPU 和內存選項,請參閲 /etc/security/limits.conf;還有一個 limits.conf(5) 手冊頁。
這是一個 PAM 功能,因此只有從使用 PAM 的東西(如登錄 shell)啓動的進程才受到此限制。
您還可以使用配額系統限制用户可以使用的磁盤空間的數量。
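As a hedged illustration of the limits.conf format (domain, type, item, value; juser is a placeholder username), a line like the following would cap how many processes that user can have at once:
juser    hard    nproc    100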
If you’re interested in systems tuning and performance in particular, Systems Performance: Enterprise and the Cloud by Brendan Gregg (Prentice Hall, 2013) goes into much more detail.
如果您對系統調優和性能特別感興趣,Brendan Gregg 的《系統性能:企業和雲計算》(Prentice Hall,2013)提供了更詳細的信息。
We also haven’t yet touched on the many, many tools that can be used to monitor network resource utilization. To use those, you first have to understand how the network works. That’s where we’re headed next.
我們還沒有涉及用於監控網絡資源利用情況的眾多工具。
要使用這些工具,首先必須瞭解網絡的工作原理。這就是我們接下來要討論的內容。