从单线程到多线程服务器
From a Single-Threaded to a Multithreaded Server
现在,服务器将依次处理每个请求,这意味着在处理完第一个连接之前,它不会处理第二个连接。如果服务器收到的请求越来越多,这种串行执行的效果会越来越差。如果服务器收到一个处理时间很长的请求,后续的请求即使能很快处理,也必须等待长请求处理完毕。我们需要解决这个问题,但首先让我们看看实际存在的问题。
Right now, the server will process each request in turn, meaning it won’t process a second connection until the first connection is finished processing. If the server received more and more requests, this serial execution would be less and less optimal. If the server receives a request that takes a long time to process, subsequent requests will have to wait until the long request is finished, even if the new requests can be processed quickly. We’ll need to fix this, but first we’ll look at the problem in action.
模拟慢请求
Simulating a Slow Request
我们将看看缓慢处理的请求如何影响对当前服务器实现发出的其他请求。示例 21-10 实现了对 /sleep 请求的处理,其中包含模拟的慢响应,该响应将导致服务器在响应前休眠五秒钟。
We’ll look at how a slowly processing request can affect other requests made to our current server implementation. Listing 21-10 implements handling a request to /sleep with a simulated slow response that will cause the server to sleep for five seconds before responding.
use std::{
    fs,
    io::{BufReader, prelude::*},
    net::{TcpListener, TcpStream},
    thread,
    time::Duration,
};
// --snip--
fn main() {
    let listener = TcpListener::bind("127.0.0.1:7878").unwrap();
    for stream in listener.incoming() {
        let stream = stream.unwrap();
        handle_connection(stream);
    }
}
fn handle_connection(mut stream: TcpStream) {
    // --snip--
    let buf_reader = BufReader::new(&stream);
    let request_line = buf_reader.lines().next().unwrap().unwrap();
    let (status_line, filename) = match &request_line[..] {
        "GET / HTTP/1.1" => ("HTTP/1.1 200 OK", "hello.html"),
        "GET /sleep HTTP/1.1" => {
            thread::sleep(Duration::from_secs(5));
            ("HTTP/1.1 200 OK", "hello.html")
        }
        _ => ("HTTP/1.1 404 NOT FOUND", "404.html"),
    };
    // --snip--
    let contents = fs::read_to_string(filename).unwrap();
    let length = contents.len();
    let response =
        format!("{status_line}\r\nContent-Length: {length}\r\n\r\n{contents}");
    stream.write_all(response.as_bytes()).unwrap();
}
既然有了三种情况,我们现在从 if 切换到了 match。我们需要显式地在 request_line 的切片上进行匹配,以便与字符串字面值进行模式匹配;match 不会像相等方法那样自动进行引用和解引用。
We switched from if to match now that we have three cases. We need to
explicitly match on a slice of request_line to pattern-match against the
string literal values; match doesn’t do automatic referencing and
dereferencing, like the equality method does.
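The point about matching on a slice can be seen in isolation with a small sketch. The helper name route is hypothetical and only covers two of the three cases; the sketch assumes nothing beyond the standard library. It shows that a String has to be turned into a &str (with either &request_line[..] or as_str) before string literal patterns can match it.

```rust
/// Map a request line to a (status line, file name) pair.
/// `route` is a hypothetical helper, not part of the original listing.
fn route(request_line: &str) -> (&'static str, &'static str) {
    match request_line {
        "GET / HTTP/1.1" => ("HTTP/1.1 200 OK", "hello.html"),
        _ => ("HTTP/1.1 404 NOT FOUND", "404.html"),
    }
}

fn main() {
    let request_line = String::from("GET / HTTP/1.1");

    // Writing `match request_line { "GET / HTTP/1.1" => ... }` would not
    // compile: the scrutinee is a `String`, but the patterns are `&str`
    // literals, and `match` does not insert the conversion for us.

    // Both of these expressions produce a `&str` the literals can match.
    let by_slice = route(&request_line[..]);
    let by_as_str = route(request_line.as_str());

    assert_eq!(by_slice, by_as_str);
    println!("{} -> {}", by_slice.0, by_slice.1);
}
```

By contrast, request_line == "GET / HTTP/1.1" compiles without the explicit slice, because the equality operator is implemented between String and &str.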
第一个分支与示例 21-9 中的 if 块相同。第二个分支匹配对 /sleep 的请求。收到该请求后,服务器将在渲染成功的 HTML 页面之前休眠五秒钟。第三个分支与示例 21-9 中的 else 块相同。
The first arm is the same as the if block from Listing 21-9. The second arm
matches a request to /sleep. When that request is received, the server will
sleep for five seconds before rendering the successful HTML page. The third arm
is the same as the else block from Listing 21-9.
你可以看到我们的服务器是多么原始:真正的库会以一种更简洁的方式处理多个请求的识别!
You can see how primitive our server is: Real libraries would handle the recognition of multiple requests in a much less verbose way!
使用 cargo run 启动服务器。然后,打开两个浏览器窗口:一个访问 http://127.0.0.1:7878,另一个访问 http://127.0.0.1:7878/sleep。如果你像以前一样多次输入 / URI,你会看到它响应很快。但如果你输入 /sleep 然后加载 /,你会看到 / 会一直等待直到 sleep 完成整整五秒的休眠后才加载。
Start the server using cargo run. Then, open two browser windows: one for
http://127.0.0.1:7878 and the other for http://127.0.0.1:7878/sleep. If you
enter the / URI a few times, as before, you’ll see it respond quickly. But if
you enter /sleep and then load /, you’ll see that / waits until sleep
has slept for its full five seconds before loading.
我们可以使用多种技术来避免请求在慢请求之后堆积,包括像我们在第 17 章中所做的那样使用 async;我们要实现的是线程池。
There are multiple techniques we could use to avoid requests backing up behind a slow request, including using async as we did in Chapter 17; the one we’ll implement is a thread pool.
使用线程池改善吞吐量
Improving Throughput with a Thread Pool
线程池(thread pool)是一组已派生并准备好等待处理任务的线程。当程序收到新任务时,它会将池中的一个线程分配给该任务,该线程将处理该任务。池中的剩余线程可用于处理在第一个线程处理期间进入的任何其他任务。当第一个线程处理完任务后,它会被返回到空闲线程池中,准备处理新任务。线程池允许你并发处理连接,从而增加服务器的吞吐量。
A thread pool is a group of spawned threads that are ready and waiting to handle a task. When the program receives a new task, it assigns one of the threads in the pool to the task, and that thread will process the task. The remaining threads in the pool are available to handle any other tasks that come in while the first thread is processing. When the first thread is done processing its task, it’s returned to the pool of idle threads, ready to handle a new task. A thread pool allows you to process connections concurrently, increasing the throughput of your server.
我们将池中线程的数量限制在一个较小的数字,以保护我们免受 DoS 攻击;如果我们的程序为每个进入的请求创建一个新线程,那么向我们服务器发出 1000 万个请求的人可能会耗尽我们服务器的所有资源并使请求处理陷于停顿,从而造成严重破坏。
We’ll limit the number of threads in the pool to a small number to protect us from DoS attacks; if we had our program create a new thread for each request as it came in, someone making 10 million requests to our server could wreak havoc by using up all our server’s resources and grinding the processing of requests to a halt.
因此,我们将让固定数量的线程在池中等待,而不是派生无限数量的线程。进入的请求被发送到池中进行处理。池将维护一个入站请求队列。池中的每个线程都会从这个队列中弹出一个请求,处理该请求,然后再向队列索要另一个请求。通过这种设计,我们可以并发处理多达 N 个请求,其中 N 是线程数。如果每个线程都在响应一个耗时较长的请求,后续请求仍然可以在队列中积压,但我们增加了在达到该点之前可以处理的耗时较长请求的数量。
Rather than spawning unlimited threads, then, we’ll have a fixed number of
threads waiting in the pool. Requests that come in are sent to the pool for
processing. The pool will maintain a queue of incoming requests. Each of the
threads in the pool will pop off a request from this queue, handle the request,
and then ask the queue for another request. With this design, we can process up
to N requests concurrently, where N is the number of threads. If each
thread is responding to a long-running request, subsequent requests can still
back up in the queue, but we’ve increased the number of long-running requests
we can handle before reaching that point.
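The queue-based design described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation this chapter builds: the function name run_on_workers, the use of u32 jobs that are simply squared, and the choice of a std::sync::mpsc channel wrapped in Arc<Mutex<...>> as the shared queue are all made up for the example.

```rust
use std::{
    sync::{mpsc, Arc, Mutex},
    thread,
};

/// Process `jobs` on `n` worker threads that all pull from one shared queue.
/// Returns `(input, input squared)` pairs, sorted by input.
/// Hypothetical helper for illustration only.
fn run_on_workers(n: usize, jobs: Vec<u32>) -> Vec<(u32, u32)> {
    // The job queue: one sending side, one receiving side shared by all workers.
    let (job_tx, job_rx) = mpsc::channel::<u32>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (result_tx, result_rx) = mpsc::channel();

    let workers: Vec<_> = (0..n)
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let result_tx = result_tx.clone();
            thread::spawn(move || loop {
                // The lock guard is a temporary, so the lock is released
                // as soon as one job has been popped off the queue.
                let next = job_rx.lock().unwrap().recv();
                match next {
                    Ok(job) => result_tx.send((job, job * job)).unwrap(),
                    Err(_) => break, // queue closed and fully drained
                }
            })
        })
        .collect();
    drop(result_tx); // only the workers hold result senders now

    for job in jobs {
        job_tx.send(job).unwrap();
    }
    drop(job_tx); // closing the queue lets the workers exit their loops

    for worker in workers {
        worker.join().unwrap();
    }
    let mut results: Vec<_> = result_rx.iter().collect();
    results.sort();
    results
}

fn main() {
    // At most 2 jobs are in flight at once; the rest wait in the queue.
    println!("{:?}", run_on_workers(2, vec![5, 3, 4, 1, 2]));
}
```

With n workers, at most n jobs run at the same time and any further jobs wait in the channel, which is the bounded-concurrency property the paragraph describes.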
这种技术只是提高 Web 服务器吞吐量的众多方法之一。你可能探索的其他选项包括 fork/join 模型、单线程异步 I/O 模型和多线程异步 I/O 模型。如果你对这个话题感兴趣,可以阅读更多关于其他解决方案的信息并尝试实现它们;对于像 Rust 这样的底层语言,所有这些选项都是可能的。
This technique is just one of many ways to improve the throughput of a web server. Other options you might explore are the fork/join model, the single-threaded async I/O model, and the multithreaded async I/O model. If you’re interested in this topic, you can read more about other solutions and try to implement them; with a low-level language like Rust, all of these options are possible.
在开始实现线程池之前,让我们先谈谈使用该池应该是什么样子的。当你尝试设计代码时,先编写客户端接口可以帮助指导你的设计。编写代码的 API,使其以你想要调用它的方式进行结构化;然后,在该结构内实现功能,而不是先实现功能然后再设计公共 API。
Before we begin implementing a thread pool, let’s talk about what using the pool should look like. When you’re trying to design code, writing the client interface first can help guide your design. Write the API of the code so that it’s structured in the way you want to call it; then, implement the functionality within that structure rather than implementing the functionality and then designing the public API.
类似于我们在第 12 章的项目中使用测试驱动开发的方式,这里我们将使用编译器驱动开发。我们将编写调用我们想要的函数的代码,然后查看编译器的错误,以确定接下来应该更改什么以使代码工作。然而,在此之前,我们将探讨我们不打算作为起点的技术。
Similar to how we used test-driven development in the project in Chapter 12, we’ll use compiler-driven development here. We’ll write the code that calls the functions we want, and then we’ll look at errors from the compiler to determine what we should change next to get the code to work. Before we do that, however, we’ll explore the technique we’re not going to use as a starting point.
为每个请求派生一个线程
Spawning a Thread for Each Request
首先,让我们探讨一下如果代码确实为每个连接创建一个新线程,它会是什么样子。如前所述,由于可能会派生无限数量的线程,这并不是我们的最终计划,但它是首先获得一个工作的多线程服务器的起点。然后,我们将添加线程池作为改进,对比这两个解决方案会更容易。
First, let’s explore how our code might look if it did create a new thread for every connection. As mentioned earlier, this isn’t our final plan due to the problems with potentially spawning an unlimited number of threads, but it is a starting point to get a working multithreaded server first. Then, we’ll add the thread pool as an improvement, and contrasting the two solutions will be easier.
示例 21-11 显示了对 main 进行的更改,以便在 for 循环中派生一个新线程来处理每个流。
Listing 21-11 shows the changes to make to main to spawn a new thread to
handle each stream within the for loop.
use std::{
    fs,
    io::{BufReader, prelude::*},
    net::{TcpListener, TcpStream},
    thread,
    time::Duration,
};
fn main() {
    let listener = TcpListener::bind("127.0.0.1:7878").unwrap();
    for stream in listener.incoming() {
        let stream = stream.unwrap();
        thread::spawn(|| {
            handle_connection(stream);
        });
    }
}
fn handle_connection(mut stream: TcpStream) {
    let buf_reader = BufReader::new(&stream);
    let request_line = buf_reader.lines().next().unwrap().unwrap();
    let (status_line, filename) = match &request_line[..] {
        "GET / HTTP/1.1" => ("HTTP/1.1 200 OK", "hello.html"),
        "GET /sleep HTTP/1.1" => {
            thread::sleep(Duration::from_secs(5));
            ("HTTP/1.1 200 OK", "hello.html")
        }
        _ => ("HTTP/1.1 404 NOT FOUND", "404.html"),
    };
    let contents = fs::read_to_string(filename).unwrap();
    let length = contents.len();
    let response =
        format!("{status_line}\r\nContent-Length: {length}\r\n\r\n{contents}");
    stream.write_all(response.as_bytes()).unwrap();
}
正如你在第 16 章中学到的,thread::spawn 将创建一个新线程,然后在闭包中在新线程中运行代码。如果你运行这段代码并在浏览器中加载 /sleep,然后在另外两个浏览器标签页中加载 /,你确实会看到对 / 的请求不必等待 /sleep 完成。然而,正如我们提到的,这最终会使系统不堪重负,因为你会无限制地创建新线程。
As you learned in Chapter 16, thread::spawn will create a new thread and then
run the code in the closure in the new thread. If you run this code and load
/sleep in your browser, then / in two more browser tabs, you’ll indeed see
that the requests to / don’t have to wait for /sleep to finish. However, as
we mentioned, this will eventually overwhelm the system because you’d be making
new threads without any limit.
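The effect of giving each job its own thread can be observed without a browser. The following sketch is a stand-in for the server, with sleeping threads in place of connections; the helper name completion_order and the 200 ms delay are assumptions chosen for illustration.

```rust
use std::{sync::mpsc, thread, time::Duration};

/// Start a slow job first and a fast job second, each on its own thread,
/// and return the names in the order the jobs finished.
/// Hypothetical helper for illustration only.
fn completion_order() -> Vec<&'static str> {
    let (tx, rx) = mpsc::channel();

    let jobs = [("slow", 200u64), ("fast", 0)];
    let mut handles = Vec::new();
    for (name, delay_ms) in jobs {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            // Stands in for the time spent handling a request.
            thread::sleep(Duration::from_millis(delay_ms));
            tx.send(name).unwrap();
        }));
    }
    drop(tx); // the spawned threads hold the only remaining senders

    for handle in handles {
        handle.join().unwrap();
    }
    // All senders are gone, so this iterator ends after the queued messages.
    rx.iter().collect()
}

fn main() {
    // The fast job is not stuck behind the slow one.
    println!("{:?}", completion_order());
}
```

The fast job reports first even though it was started second, which mirrors the / request no longer waiting on /sleep. Nothing in this sketch limits how many threads get created, which is the weakness the text points out.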
你可能还记得第 17 章中提到,这正是 async 和 await 大显身手的场景!在构建线程池时请记住这一点,并思考使用 async 会有哪些不同或相同之处。
You may also recall from Chapter 17 that this is exactly the kind of situation where async and await really shine! Keep that in mind as we build the thread pool and think about how things would look different or the same with async.
创建有限数量的线程
Creating a Finite Number of Threads
我们希望我们的线程池能以类似、熟悉的方式工作,这样从线程切换到线程池就不需要对使用我们 API 的代码进行大幅更改。示例 21-12 展示了我们要使用的 ThreadPool 结构体的假设接口,用来代替 thread::spawn。
We want our thread pool to work in a similar, familiar way so that switching
from threads to a thread pool doesn’t require large changes to the code that
uses our API. Listing 21-12 shows the hypothetical interface for a ThreadPool
struct we want to use instead of thread::spawn.
use std::{
    fs,
    io::{BufReader, prelude::*},
    net::{TcpListener, TcpStream},
    thread,
    time::Duration,
};
fn main() {
    let listener = TcpListener::bind("127.0.0.1:7878").unwrap();
    let pool = ThreadPool::new(4);
    for stream in listener.incoming() {
        let stream = stream.unwrap();
        pool.execute(|| {
            handle_connection(stream);
        });
    }
}
fn handle_connection(mut stream: TcpStream) {
    let buf_reader = BufReader::new(&stream);
    let request_line = buf_reader.lines().next().unwrap().unwrap();
    let (status_line, filename) = match &request_line[..] {
        "GET / HTTP/1.1" => ("HTTP/1.1 200 OK", "hello.html"),
        "GET /sleep HTTP/1.1" => {
            thread::sleep(Duration::from_secs(5));
            ("HTTP/1.1 200 OK", "hello.html")
        }
        _ => ("HTTP/1.1 404 NOT FOUND", "404.html"),
    };
    let contents = fs::read_to_string(filename).unwrap();
    let length = contents.len();
    let response =
        format!("{status_line}\r\nContent-Length: {length}\r\n\r\n{contents}");
    stream.write_all(response.as_bytes()).unwrap();
}
我们使用 ThreadPool::new 来创建一个具有可配置线程数量的新线程池,在本例中为四个。然后,在 for 循环中,pool.execute 具有与 thread::spawn 类似的接口,因为它接收一个闭包,该池应该为每个流运行该闭包。我们需要实现 pool.execute,使其接收闭包并将其交给池中的线程运行。这段代码还不能编译,但我们将尝试这样做,以便编译器可以指导我们如何修复它。
We use ThreadPool::new to create a new thread pool with a configurable number
of threads, in this case four. Then, in the for loop, pool.execute has a
similar interface as thread::spawn in that it takes a closure that the pool
should run for each stream. We need to implement pool.execute so that it
takes the closure and gives it to a thread in the pool to run. This code won’t
yet compile, but we’ll try so that the compiler can guide us in how to fix it.
使用编译器驱动开发构建 ThreadPool
Building ThreadPool Using Compiler-Driven Development
对 src/main.rs 进行示例 21-12 中的更改,然后让我们使用来自 cargo check 的编译器错误来驱动我们的开发。这是我们得到的第一个错误:
Make the changes in Listing 21-12 to src/main.rs, and then let’s use the
compiler errors from cargo check to drive our development. Here is the first
error we get:
$ cargo check
    Checking hello v0.1.0 (file:///projects/hello)
error[E0433]: failed to resolve: use of undeclared type `ThreadPool`
  --> src/main.rs:11:16
   |
11 |     let pool = ThreadPool::new(4);
   |                ^^^^^^^^^^ use of undeclared type `ThreadPool`
For more information about this error, try `rustc --explain E0433`.
error: could not compile `hello` (bin "hello") due to 1 previous error
太棒了!这个错误告诉我们我们需要一个 ThreadPool 类型或模块,所以我们现在就构建一个。我们的 ThreadPool 实现将独立于我们的 Web 服务器正在执行的工作类型。因此,让我们将 hello crate 从二进制 crate 切换为库 crate,以容纳我们的 ThreadPool 实现。在更改为库 crate 后,我们还可以将独立的线程池库用于我们想要使用线程池执行的任何工作,而不仅仅是为 Web 请求提供服务。
Great! This error tells us we need a ThreadPool type or module, so we’ll
build one now. Our ThreadPool implementation will be independent of the kind
of work our web server is doing. So, let’s switch the hello crate from a
binary crate to a library crate to hold our ThreadPool implementation. After
we change to a library crate, we could also use the separate thread pool
library for any work we want to do using a thread pool, not just for serving
web requests.
创建一个包含以下内容的 src/lib.rs 文件,这是我们目前可以拥有的 ThreadPool 结构体的最简单定义:
Create a src/lib.rs file that contains the following, which is the simplest
definition of a ThreadPool struct that we can have for now:
pub struct ThreadPool;
然后,编辑 main.rs 文件,通过在 src/main.rs 顶部添加以下代码,将 ThreadPool 从库 crate 引入作用域:
Then, edit the main.rs file to bring ThreadPool into scope from the library
crate by adding the following code to the top of src/main.rs:
use hello::ThreadPool;
use std::{
    fs,
    io::{BufReader, prelude::*},
    net::{TcpListener, TcpStream},
    thread,
    time::Duration,
};
fn main() {
    let listener = TcpListener::bind("127.0.0.1:7878").unwrap();
    let pool = ThreadPool::new(4);
    for stream in listener.incoming() {
        let stream = stream.unwrap();
        pool.execute(|| {
            handle_connection(stream);
        });
    }
}
fn handle_connection(mut stream: TcpStream) {
    let buf_reader = BufReader::new(&stream);
    let request_line = buf_reader.lines().next().unwrap().unwrap();
    let (status_line, filename) = match &request_line[..] {
        "GET / HTTP/1.1" => ("HTTP/1.1 200 OK", "hello.html"),
        "GET /sleep HTTP/1.1" => {
            thread::sleep(Duration::from_secs(5));
            ("HTTP/1.1 200 OK", "hello.html")
        }
        _ => ("HTTP/1.1 404 NOT FOUND", "404.html"),
    };
    let contents = fs::read_to_string(filename).unwrap();
    let length = contents.len();
    let response =
        format!("{status_line}\r\nContent-Length: {length}\r\n\r\n{contents}");
    stream.write_all(response.as_bytes()).unwrap();
}
这段代码仍然无法运行,但让我们再次检查它以获得我们需要解决的下一个错误:
This code still won’t work, but let’s check it again to get the next error that we need to address:
$ cargo check
    Checking hello v0.1.0 (file:///projects/hello)
error[E0599]: no function or associated item named `new` found for struct `ThreadPool` in the current scope
  --> src/main.rs:12:28
   |
12 |     let pool = ThreadPool::new(4);
   |                            ^^^ function or associated item not found in `ThreadPool`
For more information about this error, try `rustc --explain E0599`.
error: could not compile `hello` (bin "hello") due to 1 previous error
此错误表明接下来我们需要为 ThreadPool 创建一个名为 new 的关联函数。我们还知道 new 需要有一个可以接受 4 作为参数的参数,并应返回一个 ThreadPool 实例。让我们实现具有这些特征的最简单的 new 函数:
This error indicates that next we need to create an associated function named
new for ThreadPool. We also know that new needs to have one parameter
that can accept 4 as an argument and should return a ThreadPool instance.
Let’s implement the simplest new function that will have those
characteristics:
pub struct ThreadPool;
impl ThreadPool {
    pub fn new(size: usize) -> ThreadPool {
        ThreadPool
    }
}
我们选择 usize 作为 size 参数的类型,因为我们知道负数的线程数量没有任何意义。我们还知道我们将使用这个 4 作为线程集合中的元素数量,这正是 usize 类型的用途,如第 3 章“整数类型”一节中所述。
We chose usize as the type of the size parameter because we know that a
negative number of threads doesn’t make any sense. We also know we’ll use this
4 as the number of elements in a collection of threads, which is what the
usize type is for, as discussed in the “Integer Types” section in Chapter 3.
让我们再次检查代码:
Let’s check the code again:
$ cargo check
    Checking hello v0.1.0 (file:///projects/hello)
error[E0599]: no method named `execute` found for struct `ThreadPool` in the current scope
  --> src/main.rs:17:14
   |
17 |         pool.execute(|| {
   |         -----^^^^^^^ method not found in `ThreadPool`
For more information about this error, try `rustc --explain E0599`.
error: could not compile `hello` (bin "hello") due to 1 previous error
现在的错误是因为我们在 ThreadPool 上没有 execute 方法。回想一下“创建有限数量的线程”一节,我们决定我们的线程池应该具有类似于 thread::spawn 的接口。此外,我们将实现 execute 函数,使其接收它被给出的闭包并将其交给池中的空闲线程运行。
Now the error occurs because we don’t have an execute method on ThreadPool.
Recall from the “Creating a Finite Number of Threads” section that we decided
our thread pool should have an interface similar to thread::spawn. In
addition, we’ll implement the execute function so that it takes the closure
it’s given and gives it to an idle thread in the pool to run.
我们将在 ThreadPool 上定义 execute 方法以接收一个闭包作为参数。回想一下第 13 章中的“将捕获的值移出闭包”,我们可以通过三种不同的 trait 接收闭包作为参数:Fn、FnMut 和 FnOnce。我们需要决定在这里使用哪种闭包。我们知道最终将执行与标准库 thread::spawn 实现类似的操作,因此我们可以查看 thread::spawn 的签名对其参数有哪些约束。文档向我们展示了以下内容:
We’ll define the execute method on ThreadPool to take a closure as a
parameter. Recall from the “Moving Captured Values Out of Closures” section
in Chapter 13 that we can take closures as parameters with three different
traits: Fn, FnMut, and FnOnce. We need to decide which kind of closure to use
here. We know we’ll end up doing something similar to the standard library
thread::spawn implementation, so we can look at what bounds the signature of
thread::spawn has on its parameter. The documentation shows us the following:
pub fn spawn<F, T>(f: F) -> JoinHandle<T>
    where
        F: FnOnce() -> T,
        F: Send + 'static,
        T: Send + 'static,
F 类型参数是我们在这里关注的参数;T 类型参数与返回值有关,我们不关心。我们可以看到 spawn 使用 FnOnce 作为 F 的 trait 约束。这可能也是我们想要的,因为我们最终会将 execute 中获得的参数传递给 spawn。我们可以进一步确信 FnOnce 是我们要使用的 trait,因为运行请求的线程只会执行该请求的闭包一次,这与 FnOnce 中的 Once 相匹配。
The F type parameter is the one we’re concerned with here; the T type
parameter is related to the return value, and we’re not concerned with that. We
can see that spawn uses FnOnce as the trait bound on F. This is probably
what we want as well, because we’ll eventually pass the argument we get in
execute to spawn. We can be further confident that FnOnce is the trait we
want to use because the thread for running a request will only execute that
request’s closure one time, which matches the Once in FnOnce.
F 类型参数还具有 trait 约束 Send 和生命周期约束 'static,这在我们的情况下很有用:我们需要 Send 将闭包从一个线程转移到另一个线程,需要 'static 是因为我们不知道线程执行需要多长时间。让我们在 ThreadPool 上创建一个 execute 方法,它将接受一个具有这些约束的 F 类型泛型参数:
The F type parameter also has the trait bound Send and the lifetime bound
'static, which are useful in our situation: We need Send to transfer the
closure from one thread to another and 'static because we don’t know how long
the thread will take to execute. Let’s create an execute method on
ThreadPool that will take a generic parameter of type F with these bounds:
pub struct ThreadPool;
impl ThreadPool {
    // --snip--
    pub fn new(size: usize) -> ThreadPool {
        ThreadPool
    }
    pub fn execute<F>(&self, f: F)
    where
        F: FnOnce() + Send + 'static,
    {
    }
}
我们仍然在 FnOnce 后面使用 (),因为这个 FnOnce 代表一个不带参数并返回单元类型 () 的闭包。就像函数定义一样,返回类型可以从签名中省略,但即使我们没有参数,我们仍然需要括号。
We still use the () after FnOnce because this FnOnce represents a closure
that takes no parameters and returns the unit type (). Just like function
definitions, the return type can be omitted from the signature, but even if we
have no parameters, we still need the parentheses.
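To see what the FnOnce, Send, and 'static bounds permit in practice, here is a small sketch that uses the same kind of bounds as thread::spawn on a helper that hands its closure to a new thread. The helper name run_once is hypothetical, and unlike execute it keeps the return type T so the result can be inspected.

```rust
use std::thread;

/// Run `f` exactly once on a freshly spawned thread and return its result.
/// Hypothetical helper; the bounds mirror those on `thread::spawn`.
fn run_once<F, T>(f: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::spawn(f).join().unwrap()
}

fn main() {
    let greeting = String::from("hello");

    // `move` transfers ownership of `greeting` into the closure, so the
    // closure borrows nothing from `main` and satisfies `'static`.
    // The closure consumes `greeting`, so it can only be called once,
    // which is exactly what `FnOnce` allows.
    let shouted = run_once(move || {
        let mut owned = greeting;
        owned.push('!');
        owned
    });

    assert_eq!(shouted, "hello!");
    println!("{shouted}");
}
```

Dropping the move keyword here would make the closure borrow greeting from main, and the compiler would reject it because a borrow of a local variable cannot satisfy the 'static bound.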
同样,这是 execute 方法的最简单实现:它什么都不做,但我们只是试图让我们的代码编译。让我们再次检查它:
Again, this is the simplest implementation of the execute method: It does
nothing, but we’re only trying to make our code compile. Let’s check it again:
$ cargo check
    Checking hello v0.1.0 (file:///projects/hello)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.24s
编译通过了!但请注意,如果你尝试 cargo run 并在浏览器中发出请求,你将在浏览器中看到我们在本章开头看到的错误。我们的库实际上还没有调用传递给 execute 的闭包!
It compiles! But note that if you try cargo run and make a request in the
browser, you’ll see the errors in the browser that we saw at the beginning of
the chapter. Our library isn’t actually calling the closure passed to execute
yet!
注意:关于具有严格编译器的语言(如 Haskell 和 Rust),你可能会听到一种说法:“如果代码编译通过,它就能工作。”但这种说法并非普遍成立。我们的项目编译通过了,但它绝对什么也没做!如果我们正在构建一个真实的、完整的项目,现在是开始编写单元测试以检查代码是否既编译通过又具有我们想要的行为的好时机。
Note: A saying you might hear about languages with strict compilers, such as Haskell and Rust, is “If the code compiles, it works.” But this saying is not universally true. Our project compiles, but it does absolutely nothing! If we were building a real, complete project, this would be a good time to start writing unit tests to check that the code compiles and has the behavior we want.
思考一下:如果我们要执行的是 future 而不是闭包,这里会有什么不同?
Consider: What would be different here if we were going to execute a future instead of a closure?
在 new 中验证线程数量
Validating the Number of Threads in new
我们还没有对 new 和 execute 的参数做任何处理。让我们实现这些函数的函数体,并使其具备我们想要的行为。首先,让我们考虑一下 new。之前我们为 size 参数选择了一个无符号类型,因为具有负数线程的池没有任何意义。然而,具有零个线程的池也没有任何意义,但零是一个完全有效的 usize。我们将添加代码以在返回 ThreadPool 实例之前检查 size 是否大于零,并且如果接收到零,我们将使用 assert! 宏使程序 panic,如示例 21-13 所示。
We aren’t doing anything with the parameters to new and execute. Let’s
implement the bodies of these functions with the behavior we want. To start,
let’s think about new. Earlier we chose an unsigned type for the size
parameter because a pool with a negative number of threads makes no sense.
However, a pool with zero threads also makes no sense, yet zero is a perfectly
valid usize. We’ll add code to check that size is greater than zero before
we return a ThreadPool instance, and we’ll have the program panic if it
receives a zero by using the assert! macro, as shown in Listing 21-13.
pub struct ThreadPool;
impl ThreadPool {
    /// Create a new ThreadPool.
    ///
    /// The size is the number of threads in the pool.
    ///
    /// # Panics
    ///
    /// The `new` function will panic if the size is zero.
    pub fn new(size: usize) -> ThreadPool {
        assert!(size > 0);
        ThreadPool
    }
    // --snip--
    pub fn execute<F>(&self, f: F)
    where
        F: FnOnce() + Send + 'static,
    {
    }
}
我们还通过文档注释为我们的 ThreadPool 添加了一些文档。请注意,我们遵循了良好的文档实践,添加了一个部分来说明我们的函数可能发生 panic 的情况,如第 14 章所述。尝试运行 cargo doc --open 并单击 ThreadPool 结构体以查看 new 生成的文档是什么样子的!
We’ve also added some documentation for our ThreadPool with doc comments.
Note that we followed good documentation practices by adding a section that
calls out the situations in which our function can panic, as discussed in
Chapter 14. Try running cargo doc --open and clicking the ThreadPool struct
to see what the generated docs for new look like!
除了像我们在这里所做的那样添加 assert! 宏之外,我们还可以将 new 更改为 build 并返回一个 Result,就像我们在示例 12-9 的 I/O 项目中对 Config::build 所做的那样。但在这种情况下,我们认为尝试创建没有任何线程的线程池应该是一个不可恢复的错误。如果你觉得自己雄心勃勃,可以尝试编写一个名为 build 的函数,其签名如下,以便与 new 函数进行比较:
Instead of adding the assert! macro as we’ve done here, we could change new
into build and return a Result like we did with Config::build in the I/O
project in Listing 12-9. But we’ve decided in this case that trying to create a
thread pool without any threads should be an unrecoverable error. If you’re
feeling ambitious, try to write a function named build with the following
signature to compare with the new function:
pub fn build(size: usize) -> Result<ThreadPool, PoolCreationError> {
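One possible shape for that exercise is sketched below. The text only names PoolCreationError; defining it as a unit-like struct, deriving Debug and PartialEq for it, and giving it Display and Error implementations are all assumptions made here, and other designs (an enum, for instance) would work as well.

```rust
use std::fmt;

pub struct ThreadPool;

/// Assumption: the text names this type but does not define it.
/// A unit-like struct is one minimal choice.
#[derive(Debug, PartialEq)]
pub struct PoolCreationError;

impl fmt::Display for PoolCreationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "a thread pool needs at least one thread")
    }
}

impl std::error::Error for PoolCreationError {}

impl ThreadPool {
    /// Like `new`, but reports a zero size as an `Err` instead of panicking.
    pub fn build(size: usize) -> Result<ThreadPool, PoolCreationError> {
        if size == 0 {
            return Err(PoolCreationError);
        }
        Ok(ThreadPool)
    }
}

fn main() {
    // The caller decides how to react, rather than the library panicking.
    match ThreadPool::build(0) {
        Ok(_) => println!("pool created"),
        Err(e) => println!("could not create pool: {e}"),
    }
}
```

The trade-off is the usual one between panic and Result: build pushes the decision to the caller, while new with assert! treats a zero-sized pool as a programming error.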
创建存储线程的空间
Creating Space to Store the Threads
既然我们已经有办法知道我们拥有要在池中存储的有效线程数量,我们就可以在返回结构体之前创建这些线程并将它们存储在 ThreadPool 结构体中。但是我们如何“存储”一个线程呢?让我们再看看 thread::spawn 签名:
Now that we have a way to know we have a valid number of threads to store in
the pool, we can create those threads and store them in the ThreadPool struct
before returning the struct. But how do we “store” a thread? Let’s take another
look at the thread::spawn signature:
pub fn spawn<F, T>(f: F) -> JoinHandle<T>
    where
        F: FnOnce() -> T,
        F: Send + 'static,
        T: Send + 'static,
spawn 函数返回一个 JoinHandle<T>,其中 T 是闭包返回的类型。让我们也尝试使用 JoinHandle 看看会发生什么。在我们的例子中,我们传递给线程池的闭包将处理连接且不返回任何内容,因此 T 将是单元类型 ()。
The spawn function returns a JoinHandle<T>, where T is the type that the
closure returns. Let’s try using JoinHandle too and see what happens. In our
case, the closures we’re passing to the thread pool will handle the connection
and not return anything, so T will be the unit type ().
示例 21-14 中的代码可以编译,但它还没有创建任何线程。我们更改了 ThreadPool 的定义以持有一个 thread::JoinHandle<()> 实例的向量,用 size 容量初始化该向量,设置了一个将运行一些代码来创建线程的 for 循环,并返回了一个包含它们的 ThreadPool 实例。
The code in Listing 21-14 will compile, but it doesn’t create any threads yet.
We’ve changed the definition of ThreadPool to hold a vector of
thread::JoinHandle<()> instances, initialized the vector with a capacity of
size, set up a for loop that will run some code to create the threads, and
returned a ThreadPool instance containing them.
use std::thread;
pub struct ThreadPool {
    threads: Vec<thread::JoinHandle<()>>,
}
impl ThreadPool {
    // --snip--
    /// Create a new ThreadPool.
    ///
    /// The size is the number of threads in the pool.
    ///
    /// # Panics
    ///
    /// The `new` function will panic if the size is zero.
    pub fn new(size: usize) -> ThreadPool {
        assert!(size > 0);
        let mut threads = Vec::with_capacity(size);
        for _ in 0..size {
            // create some threads and store them in the vector
        }
        ThreadPool { threads }
    }
    // --snip--
    pub fn execute<F>(&self, f: F)
    where
        F: FnOnce() + Send + 'static,
    {
    }
}
我们在库 crate 中引入了 std::thread,因为我们在 ThreadPool 的向量中使用了 thread::JoinHandle 作为项的类型。
We’ve brought std::thread into scope in the library crate because we’re
using thread::JoinHandle as the type of the items in the vector in
ThreadPool.
一旦收到有效的大小,我们的 ThreadPool 就会创建一个可以容纳 size 个项的新向量。with_capacity 函数执行与 Vec::new 相同的任务,但有一个重要的区别:它在向量中预先分配空间。因为我们知道我们需要在向量中存储 size 个元素,所以预先进行这种分配比使用 Vec::new(它在插入元素时会调整自身大小)效率稍微高一点。
Once a valid size is received, our ThreadPool creates a new vector that can
hold size items. The with_capacity function performs the same task as
Vec::new but with an important difference: It pre-allocates space in the
vector. Because we know we need to store size elements in the vector, doing
this allocation up front is slightly more efficient than using Vec::new,
which resizes itself as elements are inserted.
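The difference between the two constructors can be checked directly with the capacity method. The helper name fill_preallocated is hypothetical; the sketch relies only on documented Vec behavior, namely that Vec::new does not allocate and that Vec::with_capacity reserves room for at least the requested number of elements.

```rust
/// Fill a pre-allocated vector with `size` elements and report its
/// capacity before and after the pushes. Hypothetical helper.
fn fill_preallocated(size: usize) -> (usize, usize) {
    let mut items: Vec<usize> = Vec::with_capacity(size);
    let before = items.capacity();
    for i in 0..size {
        items.push(i);
    }
    (before, items.capacity())
}

fn main() {
    // `Vec::new` allocates nothing until the first push.
    let empty: Vec<usize> = Vec::new();
    assert_eq!(empty.capacity(), 0);

    // `with_capacity` reserves at least `size` slots up front, so pushing
    // `size` elements does not need to grow the buffer.
    let (before, after) = fill_preallocated(4);
    assert!(before >= 4);
    assert_eq!(before, after);
    println!("capacity before: {before}, after: {after}");
}
```

For a pool of four workers the saving is negligible, which is why the text calls it only slightly more efficient.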
当你再次运行 cargo check 时,它应该会成功。
When you run cargo check again, it should succeed.
从 ThreadPool 发送代码到线程
Sending Code from the ThreadPool to a Thread
我们在示例 21-14 的 for 循环中留下了关于创建线程的注释。在这里,我们将看看我们如何实际创建线程。标准库提供了 thread::spawn 作为创建线程的一种方式,并且 thread::spawn 期望在线程创建后立即获取该线程应运行的一些代码。然而,在我们的例子中,我们希望创建线程并让它们等待我们稍后发送的代码。标准库的线程实现不包含任何执行此操作的方法;我们必须手动实现它。
We left a comment in the for loop in Listing 21-14 regarding the creation of
threads. Here, we’ll look at how we actually create threads. The standard
library provides thread::spawn as a way to create threads, and
thread::spawn expects to get some code the thread should run as soon as the
thread is created. However, in our case, we want to create the threads and have
them wait for code that we’ll send later. The standard library’s
implementation of threads doesn’t include any way to do that; we have to
implement it manually.
我们将通过在 ThreadPool 和线程之间引入一种管理这种新行为的新数据结构来实现这种行为。我们将这个数据结构称为 Worker,这是池化实现中的常用术语。Worker 获取需要运行的代码并在其线程中运行该代码。
We’ll implement this behavior by introducing a new data structure between the
ThreadPool and the threads that will manage this new behavior. We’ll call
this data structure Worker, which is a common term in pooling
implementations. The Worker picks up code that needs to be run and runs the
code in its thread.
想想在餐厅厨房里工作的人:工作人员等待客户点餐,然后他们负责接单并完成点单。
Think of people working in the kitchen at a restaurant: The workers wait until orders come in from customers, and then they’re responsible for taking those orders and filling them.
我们不会在线程池中存储 JoinHandle<()> 实例的向量,而是存储 Worker 结构体的实例。每个 Worker 将存储一个 JoinHandle<()> 实例。然后,我们将在 Worker 上实现一个方法,该方法将获取要运行的代码闭包并将其发送到已经运行的线程中执行。我们还将给每个 Worker 一个 id,以便我们在日志记录或调试时能够区分池中不同的 Worker 实例。
Instead of storing a vector of JoinHandle<()> instances in the thread pool,
we’ll store instances of the Worker struct. Each Worker will store a single
JoinHandle<()> instance. Then, we’ll implement a method on Worker that will
take a closure of code to run and send it to the already running thread for
execution. We’ll also give each Worker an id so that we can distinguish
between the different instances of Worker in the pool when logging or
debugging.
这是我们在创建 ThreadPool 时将发生的新过程。在以这种方式设置好 Worker 后,我们将实现将闭包发送到线程的代码:
Here is the new process that will happen when we create a ThreadPool. We’ll
implement the code that sends the closure to the thread after we have Worker
set up in this way:
- 定义一个持有 id 和 JoinHandle<()> 的 Worker 结构体。
- 将 ThreadPool 更改为持有 Worker 实例的向量。
- 定义一个 Worker::new 函数,它接收一个 id 编号并返回一个持有该 id 和通过空闭包派生的线程的 Worker 实例。
- 在 ThreadPool::new 中,使用 for 循环计数器生成一个 id,使用该 id 创建一个新的 Worker,并将该 Worker 存储在向量中。
- Define a Worker struct that holds an id and a JoinHandle<()>.
- Change ThreadPool to hold a vector of Worker instances.
- Define a Worker::new function that takes an id number and returns a Worker instance that holds the id and a thread spawned with an empty closure.
- In ThreadPool::new, use the for loop counter to generate an id, create a new Worker with that id, and store the Worker in the vector.
如果你准备好迎接挑战,请在查看示例 21-15 中的代码之前尝试自己实现这些更改。
If you’re up for a challenge, try implementing these changes on your own before looking at the code in Listing 21-15.
准备好了吗?这是示例 21-15,它是进行上述修改的一种方式。
Ready? Here is Listing 21-15 with one way to make the preceding modifications.
use std::thread;
pub struct ThreadPool {
    workers: Vec<Worker>,
}
impl ThreadPool {
    // --snip--
    /// Create a new ThreadPool.
    ///
    /// The size is the number of threads in the pool.
    ///
    /// # Panics
    ///
    /// The `new` function will panic if the size is zero.
    pub fn new(size: usize) -> ThreadPool {
        assert!(size > 0);
        let mut workers = Vec::with_capacity(size);
        for id in 0..size {
            workers.push(Worker::new(id));
        }
        ThreadPool { workers }
    }
    // --snip--
    pub fn execute<F>(&self, f: F)
    where
        F: FnOnce() + Send + 'static,
    {
    }
}
struct Worker {
    id: usize,
    thread: thread::JoinHandle<()>,
}
impl Worker {
    fn new(id: usize) -> Worker {
        let thread = thread::spawn(|| {});
        Worker { id, thread }
    }
}
我们将 ThreadPool 上字段的名称从 threads 更改为 workers,因为它现在持有的是 Worker 实例而不是 JoinHandle<()> 实例。我们将 for 循环中的计数器作为 Worker::new 的参数,并将每个新的 Worker 存储在名为 workers 的向量中。
We’ve changed the name of the field on ThreadPool from threads to workers
because it’s now holding Worker instances instead of JoinHandle<()>
instances. We use the counter in the for loop as an argument to
Worker::new, and we store each new Worker in the vector named workers.
外部代码(如 src/main.rs 中的服务器)不需要知道关于在 ThreadPool 内部使用 Worker 结构体的实现细节,因此我们将 Worker 结构体及其 new 函数设为私有。Worker::new 函数使用我们给它的 id 并存储一个 JoinHandle<()> 实例,该实例是通过使用空闭包派生新线程创建的。
External code (like our server in src/main.rs) doesn’t need to know the
implementation details regarding using a Worker struct within ThreadPool,
so we make the Worker struct and its new function private. The
Worker::new function uses the id we give it and stores a JoinHandle<()>
instance that is created by spawning a new thread using an empty closure.
注意:如果操作系统因为系统资源不足而无法创建线程,thread::spawn 将会 panic。这将导致我们的整个服务器 panic,即使某些线程的创建可能已经成功。为了简单起见,这种行为是可以接受的,但在生产级线程池实现中,你可能希望使用 std::thread::Builder 及其返回 Result 的 spawn 方法。
Note: If the operating system can’t create a thread because there aren’t enough system resources, thread::spawn will panic. That will cause our whole server to panic, even though the creation of some threads might succeed. For simplicity’s sake, this behavior is fine, but in a production thread pool implementation, you’d likely want to use std::thread::Builder and its spawn method that returns Result instead.
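As a rough sketch of the alternative mentioned in the note, std::thread::Builder::spawn returns an io::Result instead of panicking when the thread cannot be created. The helper name spawn_worker, the worker-N thread naming scheme, and the choice to return the id from the thread are assumptions made for illustration.

```rust
use std::{io, thread};

/// Try to spawn a named worker thread, returning the OS error to the
/// caller if thread creation fails. Hypothetical helper.
fn spawn_worker(id: usize) -> io::Result<thread::JoinHandle<usize>> {
    thread::Builder::new()
        .name(format!("worker-{id}")) // the name shows up in panic messages
        .spawn(move || {
            // A real worker would wait for jobs here; this one just
            // hands its id back so the result can be checked.
            id
        })
}

fn main() {
    match spawn_worker(0) {
        Ok(handle) => println!("worker {} finished", handle.join().unwrap()),
        // The caller can log, retry, or run with fewer threads.
        Err(e) => eprintln!("could not spawn worker: {e}"),
    }
}
```

A pool built this way could, for example, continue with fewer workers than requested rather than taking the whole server down.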
This code will compile and will store the number of Worker instances we specified as an argument to ThreadPool::new. But we’re still not processing the closure that we get in execute. Let’s look at how to do that next.
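The note about std::thread::Builder can be made concrete with a minimal sketch. The spawn_worker helper, the thread name, and the placeholder closure body are assumptions made for illustration and are not part of the chapter's ThreadPool; the only point is that Builder::spawn hands back an io::Result rather than panicking when the thread cannot be created.

```rust
use std::io;
use std::thread::{self, JoinHandle};

/// Hypothetical helper: spawns a named worker thread and reports a spawn
/// failure as an `io::Error` instead of panicking.
fn spawn_worker(id: usize) -> io::Result<JoinHandle<usize>> {
    thread::Builder::new()
        .name(format!("worker-{id}"))
        .spawn(move || {
            // Placeholder body; a real worker would wait for jobs here.
            id * 2
        })
}

fn main() {
    match spawn_worker(3) {
        Ok(handle) => {
            let result = handle.join().expect("worker thread panicked");
            println!("worker returned {result}");
        }
        // The caller decides what to do: retry, run with fewer workers, or give up.
        Err(e) => eprintln!("could not spawn worker: {e}"),
    }
}
```

A pool constructor built this way would probably return a Result itself, so that the caller can choose how to react to a partially created pool.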
Sending Requests to Threads via Channels
The next problem we’ll tackle is that the closures given to thread::spawn do absolutely nothing. Currently, we get the closure we want to execute in the execute method. But we need to give thread::spawn a closure to run when we create each Worker during the creation of the ThreadPool.
We want each Worker struct that we just created to fetch the code to run from a queue held in the ThreadPool and send that code to its thread to run.
The channels we learned about in Chapter 16, a simple way to communicate between two threads, would be perfect for this use case. We’ll use a channel to function as the queue of jobs, and execute will send a job from the ThreadPool to the Worker instances, each of which will send the job to its thread. Here is the plan:
- The ThreadPool will create a channel and hold on to the sender.
- Each Worker will hold on to the receiver.
- We’ll create a new Job struct that will hold the closures we want to send down the channel.
- The execute method will send the job it wants to execute through the sender.
- In its thread, the Worker will loop over its receiver and execute the closures of any jobs it receives.
Let’s start by creating a channel in ThreadPool::new and holding the sender in the ThreadPool instance, as shown in Listing 21-16. The Job struct doesn’t hold anything for now but will be the type of item we’re sending down the channel.
use std::{sync::mpsc, thread};
pub struct ThreadPool {
workers: Vec<Worker>,
sender: mpsc::Sender<Job>,
}
struct Job;
impl ThreadPool {
// --snip--
/// Create a new ThreadPool.
///
/// The size is the number of threads in the pool.
///
/// # Panics
///
/// The `new` function will panic if the size is zero.
pub fn new(size: usize) -> ThreadPool {
assert!(size > 0);
let (sender, receiver) = mpsc::channel();
let mut workers = Vec::with_capacity(size);
for id in 0..size {
workers.push(Worker::new(id));
}
ThreadPool { workers, sender }
}
// --snip--
pub fn execute<F>(&self, f: F)
where
F: FnOnce() + Send + 'static,
{
}
}
struct Worker {
id: usize,
thread: thread::JoinHandle<()>,
}
impl Worker {
fn new(id: usize) -> Worker {
let thread = thread::spawn(|| {});
Worker { id, thread }
}
}
In ThreadPool::new, we create our new channel and have the pool hold the sender. This will successfully compile.
Let’s try passing a receiver of the channel into each Worker as the thread pool creates the channel. We know we want to use the receiver in the thread that the Worker instances spawn, so we’ll reference the receiver parameter in the closure. The code in Listing 21-17 won’t quite compile yet.
use std::{sync::mpsc, thread};
pub struct ThreadPool {
workers: Vec<Worker>,
sender: mpsc::Sender<Job>,
}
struct Job;
impl ThreadPool {
// --snip--
/// Create a new ThreadPool.
///
/// The size is the number of threads in the pool.
///
/// # Panics
///
/// The `new` function will panic if the size is zero.
pub fn new(size: usize) -> ThreadPool {
assert!(size > 0);
let (sender, receiver) = mpsc::channel();
let mut workers = Vec::with_capacity(size);
for id in 0..size {
workers.push(Worker::new(id, receiver));
}
ThreadPool { workers, sender }
}
// --snip--
pub fn execute<F>(&self, f: F)
where
F: FnOnce() + Send + 'static,
{
}
}
// --snip--
struct Worker {
id: usize,
thread: thread::JoinHandle<()>,
}
impl Worker {
fn new(id: usize, receiver: mpsc::Receiver<Job>) -> Worker {
let thread = thread::spawn(|| {
receiver;
});
Worker { id, thread }
}
}
We’ve made some small and straightforward changes: We pass the receiver into Worker::new, and then we use it inside the closure.
When we try to check this code, we get this error:
$ cargo check
Checking hello v0.1.0 (file:///projects/hello)
error[E0382]: use of moved value: `receiver`
--> src/lib.rs:26:42
|
21 | let (sender, receiver) = mpsc::channel();
| -------- move occurs because `receiver` has type `std::sync::mpsc::Receiver<Job>`, which does not implement the `Copy` trait
...
25 | for id in 0..size {
| ----------------- inside of this loop
26 | workers.push(Worker::new(id, receiver));
| ^^^^^^^^ value moved here, in previous iteration of loop
|
note: consider changing this parameter type in method `new` to borrow instead if owning the value isn't necessary
--> src/lib.rs:47:33
|
47 | fn new(id: usize, receiver: mpsc::Receiver<Job>) -> Worker {
| --- in this method ^^^^^^^^^^^^^^^^^^^ this parameter takes ownership of the value
help: consider moving the expression out of the loop so it is only moved once
|
25 ~ let mut value = Worker::new(id, receiver);
26 ~ for id in 0..size {
27 ~ workers.push(value);
|
For more information about this error, try `rustc --explain E0382`.
error: could not compile `hello` (lib) due to 1 previous error
The code is trying to pass receiver to multiple Worker instances. This won’t work, as you’ll recall from Chapter 16: The channel implementation that Rust provides is multiple producer, single consumer. This means we can’t just clone the consuming end of the channel to fix this code. We also don’t want to send a message multiple times to multiple consumers; we want one list of messages shared by multiple Worker instances such that each message gets processed once.
Additionally, taking a job off the channel queue involves mutating the receiver, so the threads need a safe way to share and modify receiver; otherwise, we might get race conditions (as covered in Chapter 16).
Recall the thread-safe smart pointers discussed in Chapter 16: To share ownership across multiple threads and allow the threads to mutate the value, we need to use Arc<Mutex<T>>. The Arc type will let multiple Worker instances own the receiver, and Mutex will ensure that only one Worker gets a job from the receiver at a time. Listing 21-18 shows the changes we need to make.
use std::{
sync::{Arc, Mutex, mpsc},
thread,
};
// --snip--
pub struct ThreadPool {
workers: Vec<Worker>,
sender: mpsc::Sender<Job>,
}
struct Job;
impl ThreadPool {
// --snip--
/// Create a new ThreadPool.
///
/// The size is the number of threads in the pool.
///
/// # Panics
///
/// The `new` function will panic if the size is zero.
pub fn new(size: usize) -> ThreadPool {
assert!(size > 0);
let (sender, receiver) = mpsc::channel();
let receiver = Arc::new(Mutex::new(receiver));
let mut workers = Vec::with_capacity(size);
for id in 0..size {
workers.push(Worker::new(id, Arc::clone(&receiver)));
}
ThreadPool { workers, sender }
}
// --snip--
pub fn execute<F>(&self, f: F)
where
F: FnOnce() + Send + 'static,
{
}
}
// --snip--
struct Worker {
id: usize,
thread: thread::JoinHandle<()>,
}
impl Worker {
fn new(id: usize, receiver: Arc<Mutex<mpsc::Receiver<Job>>>) -> Worker {
// --snip--
let thread = thread::spawn(|| {
receiver;
});
Worker { id, thread }
}
}
In ThreadPool::new, we put the receiver in an Arc and a Mutex. For each new Worker, we clone the Arc to bump the reference count so that the Worker instances can share ownership of the receiver.
With these changes, the code compiles! We’re getting there!
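To see why a shared Arc<Mutex<Receiver<T>>> gives the "each message gets processed once" behavior, here is a small standalone sketch that is separate from the chapter's ThreadPool. The process_once function, the u32 payloads, and the second done channel used to collect results are all assumptions made for this illustration. It also exits each loop when recv returns an error, a technique the chapter introduces later.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Sends the numbers `0..jobs` through one channel to `workers` threads and
/// returns every number that some worker received, sorted.
fn process_once(workers: usize, jobs: u32) -> Vec<u32> {
    let (sender, receiver) = mpsc::channel::<u32>();
    let receiver = Arc::new(Mutex::new(receiver));
    // Hypothetical second channel, used only to report what was received.
    let (done_tx, done_rx) = mpsc::channel::<u32>();

    let mut handles = Vec::new();
    for _ in 0..workers {
        let receiver = Arc::clone(&receiver);
        let done_tx = done_tx.clone();
        handles.push(thread::spawn(move || loop {
            // The guard is a temporary, so the lock is released when this
            // statement ends.
            let message = receiver.lock().unwrap().recv();
            match message {
                Ok(n) => done_tx.send(n).unwrap(),
                Err(_) => break, // every sender is gone: no more work
            }
        }));
    }
    drop(done_tx); // keep only the workers' clones alive

    for n in 0..jobs {
        sender.send(n).unwrap();
    }
    drop(sender); // closes the job channel so the workers can exit

    for handle in handles {
        handle.join().unwrap();
    }
    let mut seen: Vec<u32> = done_rx.iter().collect();
    seen.sort();
    seen
}

fn main() {
    // Each number appears exactly once, regardless of which worker took it.
    println!("{:?}", process_once(4, 10));
}
```

Which worker handles which number varies from run to run; only the fact that every number shows up exactly once is guaranteed.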
Implementing the execute Method
Let’s finally implement the execute method on ThreadPool. We’ll also change Job from a struct to a type alias for a trait object that holds the type of closure that execute receives. As discussed in the “Type Synonyms and Type Aliases” section in Chapter 20, type aliases allow us to make long types shorter for ease of use. Look at Listing 21-19.
use std::{
sync::{Arc, Mutex, mpsc},
thread,
};
pub struct ThreadPool {
workers: Vec<Worker>,
sender: mpsc::Sender<Job>,
}
// --snip--
type Job = Box<dyn FnOnce() + Send + 'static>;
impl ThreadPool {
// --snip--
/// Create a new ThreadPool.
///
/// The size is the number of threads in the pool.
///
/// # Panics
///
/// The `new` function will panic if the size is zero.
pub fn new(size: usize) -> ThreadPool {
assert!(size > 0);
let (sender, receiver) = mpsc::channel();
let receiver = Arc::new(Mutex::new(receiver));
let mut workers = Vec::with_capacity(size);
for id in 0..size {
workers.push(Worker::new(id, Arc::clone(&receiver)));
}
ThreadPool { workers, sender }
}
pub fn execute<F>(&self, f: F)
where
F: FnOnce() + Send + 'static,
{
let job = Box::new(f);
self.sender.send(job).unwrap();
}
}
// --snip--
struct Worker {
id: usize,
thread: thread::JoinHandle<()>,
}
impl Worker {
fn new(id: usize, receiver: Arc<Mutex<mpsc::Receiver<Job>>>) -> Worker {
let thread = thread::spawn(|| {
receiver;
});
Worker { id, thread }
}
}
After creating a new Job instance using the closure we get in execute, we send that job down the sending end of the channel. We’re calling unwrap on send for the case that sending fails. This might happen if, for example, we stop all our threads from executing, meaning the receiving end has stopped receiving new messages. At the moment, we can’t stop our threads from executing: Our threads continue executing as long as the pool exists. The reason we use unwrap is that we know the failure case won’t happen, but the compiler doesn’t know that.
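The failure case that unwrap guards against can be reproduced in isolation. In this sketch, try_queue is a hypothetical helper (not part of the chapter's code) that returns the job to the caller when the receiving end of the channel has been dropped.

```rust
use std::sync::mpsc;

/// Hypothetical helper: gives the job back to the caller if it could not be
/// queued because the receiver no longer exists.
fn try_queue(sender: &mpsc::Sender<String>, job: String) -> Result<(), String> {
    sender.send(job).map_err(|mpsc::SendError(job)| job)
}

fn main() {
    let (sender, receiver) = mpsc::channel::<String>();

    // While the receiver is alive, sending succeeds even if nobody has
    // called `recv` yet, because the channel buffers the message.
    assert!(try_queue(&sender, "first job".to_string()).is_ok());

    drop(receiver);

    // With the receiver gone, `send` returns the value inside `SendError`.
    match try_queue(&sender, "second job".to_string()) {
        Ok(()) => println!("queued"),
        Err(job) => println!("could not queue {job:?}"),
    }
}
```

In the thread pool, the receiver lives inside the workers, so this error can only occur once every worker thread has stopped.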
But we’re not quite done yet! In the Worker, our closure being passed to thread::spawn still only references the receiving end of the channel. Instead, we need the closure to loop forever, asking the receiving end of the channel for a job and running the job when it gets one. Let’s make the change shown in Listing 21-20 to Worker::new.
use std::{
sync::{Arc, Mutex, mpsc},
thread,
};
pub struct ThreadPool {
workers: Vec<Worker>,
sender: mpsc::Sender<Job>,
}
type Job = Box<dyn FnOnce() + Send + 'static>;
impl ThreadPool {
/// Create a new ThreadPool.
///
/// The size is the number of threads in the pool.
///
/// # Panics
///
/// The `new` function will panic if the size is zero.
pub fn new(size: usize) -> ThreadPool {
assert!(size > 0);
let (sender, receiver) = mpsc::channel();
let receiver = Arc::new(Mutex::new(receiver));
let mut workers = Vec::with_capacity(size);
for id in 0..size {
workers.push(Worker::new(id, Arc::clone(&receiver)));
}
ThreadPool { workers, sender }
}
pub fn execute<F>(&self, f: F)
where
F: FnOnce() + Send + 'static,
{
let job = Box::new(f);
self.sender.send(job).unwrap();
}
}
struct Worker {
id: usize,
thread: thread::JoinHandle<()>,
}
// --snip--
impl Worker {
fn new(id: usize, receiver: Arc<Mutex<mpsc::Receiver<Job>>>) -> Worker {
let thread = thread::spawn(move || {
loop {
let job = receiver.lock().unwrap().recv().unwrap();
println!("Worker {id} got a job; executing.");
job();
}
});
Worker { id, thread }
}
}
Here, we first call lock on the receiver to acquire the mutex, and then we call unwrap to panic on any errors. Acquiring a lock might fail if the mutex is in a poisoned state, which can happen if some other thread panicked while holding the lock rather than releasing the lock. In this situation, calling unwrap to have this thread panic is the correct action to take. Feel free to change this unwrap to an expect with an error message that is meaningful to you.
If we get the lock on the mutex, we call recv to receive a Job from the channel. A final unwrap moves past any errors here as well, which might occur if the thread holding the sender has shut down, similar to how the send method returns Err if the receiver shuts down.
The call to recv blocks, so if there is no job yet, the current thread will wait until a job becomes available. The Mutex<T> ensures that only one Worker thread at a time is trying to request a job.
Our thread pool is now in a working state! Give it a cargo run and make some requests:
$ cargo run
Compiling hello v0.1.0 (file:///projects/hello)
warning: field `workers` is never read
--> src/lib.rs:7:5
|
6 | pub struct ThreadPool {
| ---------- field in this struct
7 | workers: Vec<Worker>,
| ^^^^^^^
|
= note: `#[warn(dead_code)]` on by default
warning: fields `id` and `thread` are never read
--> src/lib.rs:48:5
|
47 | struct Worker {
| ------ fields in this struct
48 | id: usize,
| ^^
49 | thread: thread::JoinHandle<()>,
| ^^^^^^
warning: `hello` (lib) generated 2 warnings
Finished `dev` profile [unoptimized + debuginfo] target(s) in 4.91s
Running `target/debug/hello`
Worker 0 got a job; executing.
Worker 2 got a job; executing.
Worker 1 got a job; executing.
Worker 3 got a job; executing.
Worker 0 got a job; executing.
Worker 2 got a job; executing.
Worker 1 got a job; executing.
Worker 3 got a job; executing.
Worker 0 got a job; executing.
Worker 2 got a job; executing.
Success! We now have a thread pool that executes connections asynchronously. There are never more than four threads created, so our system won’t get overloaded if the server receives a lot of requests. If we make a request to /sleep, the server will be able to serve other requests by having another thread run them.
Note: If you open /sleep in multiple browser windows simultaneously, they might load one at a time in five-second intervals. Some web browsers execute multiple instances of the same request sequentially for caching reasons. This limitation is not caused by our web server.
This is a good time to pause and consider how the code in Listings 21-18, 21-19, and 21-20 would be different if we were using futures instead of a closure for the work to be done. What types would change? How would the method signatures be different, if at all? What parts of the code would stay the same?
After learning about the while let loop in Chapter 17 and Chapter 19, you might be wondering why we didn’t write the Worker thread code as shown in Listing 21-21.
use std::{
sync::{Arc, Mutex, mpsc},
thread,
};
pub struct ThreadPool {
workers: Vec<Worker>,
sender: mpsc::Sender<Job>,
}
type Job = Box<dyn FnOnce() + Send + 'static>;
impl ThreadPool {
/// Create a new ThreadPool.
///
/// The size is the number of threads in the pool.
///
/// # Panics
///
/// The `new` function will panic if the size is zero.
pub fn new(size: usize) -> ThreadPool {
assert!(size > 0);
let (sender, receiver) = mpsc::channel();
let receiver = Arc::new(Mutex::new(receiver));
let mut workers = Vec::with_capacity(size);
for id in 0..size {
workers.push(Worker::new(id, Arc::clone(&receiver)));
}
ThreadPool { workers, sender }
}
pub fn execute<F>(&self, f: F)
where
F: FnOnce() + Send + 'static,
{
let job = Box::new(f);
self.sender.send(job).unwrap();
}
}
struct Worker {
id: usize,
thread: thread::JoinHandle<()>,
}
// --snip--
impl Worker {
fn new(id: usize, receiver: Arc<Mutex<mpsc::Receiver<Job>>>) -> Worker {
let thread = thread::spawn(move || {
while let Ok(job) = receiver.lock().unwrap().recv() {
println!("Worker {id} got a job; executing.");
job();
}
});
Worker { id, thread }
}
}
This code compiles and runs but doesn’t result in the desired threading behavior: A slow request will still cause other requests to wait to be processed. The reason is somewhat subtle: The Mutex struct has no public unlock method because the ownership of the lock is based on the lifetime of the MutexGuard<T> within the LockResult<MutexGuard<T>> that the lock method returns. At compile time, the borrow checker can then enforce the rule that a resource guarded by a Mutex cannot be accessed unless we hold the lock. However, this implementation can also result in the lock being held longer than intended if we aren’t mindful of the lifetime of the MutexGuard<T>.
The code in Listing 21-20 that uses let job = receiver.lock().unwrap().recv().unwrap(); works because with let, any temporary values used in the expression on the right-hand side of the equal sign are immediately dropped when the let statement ends. However, while let (and if let and match) does not drop temporary values until the end of the associated block. In Listing 21-21, the lock remains held for the duration of the call to job(), meaning other Worker instances cannot receive jobs.
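The difference in when the temporary MutexGuard is dropped can be observed directly with try_lock, which fails while the lock is held. The following is a minimal single-threaded sketch; the lock_held function and the Vec<u32> used as a stand-in job queue are assumptions for illustration only.

```rust
use std::sync::Mutex;

/// Reports whether the mutex was still locked (a) right after a `let`
/// statement and (b) inside the body of a `while let` loop.
fn lock_held(queue: &Mutex<Vec<u32>>) -> (bool, bool) {
    // `let`: the temporary guard is dropped when the statement ends.
    let _first = queue.lock().unwrap().pop();
    let held_after_let = queue.try_lock().is_err();

    // `while let`: the temporary guard lives until the end of the loop body,
    // so the lock is still held while the body runs.
    let mut held_in_while_let = false;
    while let Some(_job) = queue.lock().unwrap().pop() {
        held_in_while_let = queue.try_lock().is_err();
    }

    (held_after_let, held_in_while_let)
}

fn main() {
    let queue = Mutex::new(vec![1, 2, 3]);
    let (after_let, in_while_let) = lock_held(&queue);
    println!("lock held after let: {after_let}");
    println!("lock held inside while let body: {in_while_let}");
}
```

In a worker, the body of the loop is where job() runs, which is why the while let version serializes all the jobs behind the mutex.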
Graceful Shutdown and Cleanup
The code in Listing 21-20 is responding to requests asynchronously through the use of a thread pool, as we intended. The warnings about the workers, id, and thread fields that we never read directly are a reminder that we’re not cleaning anything up. When we use the less elegant ctrl-C method to halt the main thread, all other threads are stopped immediately as well, even if they’re in the middle of serving a request.
Next, we’ll implement the Drop trait to call join on each of the threads in the pool so that they can finish the requests they’re working on before closing. Then, we’ll implement a way to tell the threads they should stop accepting new requests and shut down. To see this code in action, we’ll modify our server to accept only two requests before gracefully shutting down its thread pool.
One thing to notice as we go: None of this affects the parts of the code that handle executing the closures, so everything here would be the same if we were using a thread pool for an async runtime.
Implementing the Drop Trait on ThreadPool
Let’s start with implementing Drop on our thread pool. When the pool is dropped, our threads should all join to make sure they finish their work. Listing 21-22 shows a first attempt at a Drop implementation; this code won’t quite work yet.
use std::{
sync::{Arc, Mutex, mpsc},
thread,
};
pub struct ThreadPool {
workers: Vec<Worker>,
sender: mpsc::Sender<Job>,
}
type Job = Box<dyn FnOnce() + Send + 'static>;
impl ThreadPool {
/// Create a new ThreadPool.
///
/// The size is the number of threads in the pool.
///
/// # Panics
///
/// The `new` function will panic if the size is zero.
pub fn new(size: usize) -> ThreadPool {
assert!(size > 0);
let (sender, receiver) = mpsc::channel();
let receiver = Arc::new(Mutex::new(receiver));
let mut workers = Vec::with_capacity(size);
for id in 0..size {
workers.push(Worker::new(id, Arc::clone(&receiver)));
}
ThreadPool { workers, sender }
}
pub fn execute<F>(&self, f: F)
where
F: FnOnce() + Send + 'static,
{
let job = Box::new(f);
self.sender.send(job).unwrap();
}
}
impl Drop for ThreadPool {
fn drop(&mut self) {
for worker in &mut self.workers {
println!("Shutting down worker {}", worker.id);
worker.thread.join().unwrap();
}
}
}
struct Worker {
id: usize,
thread: thread::JoinHandle<()>,
}
impl Worker {
fn new(id: usize, receiver: Arc<Mutex<mpsc::Receiver<Job>>>) -> Worker {
let thread = thread::spawn(move || {
loop {
let job = receiver.lock().unwrap().recv().unwrap();
println!("Worker {id} got a job; executing.");
job();
}
});
Worker { id, thread }
}
}
First, we loop through each of the thread pool workers. We use &mut for this because self is a mutable reference, and we also need to be able to mutate worker. For each worker, we print a message saying that this particular Worker instance is shutting down, and then we call join on that Worker instance’s thread. If the call to join fails, we use unwrap to make Rust panic and go into an ungraceful shutdown.
Here is the error we get when we compile this code:
$ cargo check
Checking hello v0.1.0 (file:///projects/hello)
error[E0507]: cannot move out of `worker.thread` which is behind a mutable reference
--> src/lib.rs:52:13
|
52 | worker.thread.join().unwrap();
| ^^^^^^^^^^^^^ ------ `worker.thread` moved due to this method call
| |
| move occurs because `worker.thread` has type `JoinHandle<()>`, which does not implement the `Copy` trait
|
note: `JoinHandle::<T>::join` takes ownership of the receiver `self`, which moves `worker.thread`
--> /rustc/1159e78c4747b02ef996e55082b704c09b970588/library/std/src/thread/mod.rs:1921:17
For more information about this error, try `rustc --explain E0507`.
error: could not compile `hello` (lib) due to 1 previous error
The error tells us we can’t call join because we only have a mutable borrow of each worker and join takes ownership of its argument. To solve this issue, we need to move the thread out of the Worker instance that owns thread so that join can consume the thread. One way to do this is to take the same approach we took in Listing 18-15. If Worker held an Option<thread::JoinHandle<()>>, we could call the take method on the Option to move the value out of the Some variant and leave a None variant in its place. In other words, a Worker that is running would have a Some variant in thread, and when we wanted to clean up a Worker, we’d replace Some with None so that the Worker wouldn’t have a thread to run.
However, the only time this would come up would be when dropping the Worker. In exchange, we’d have to deal with an Option<thread::JoinHandle<()>> anywhere we accessed worker.thread. Idiomatic Rust uses Option quite a bit, but when you find yourself wrapping something you know will always be present in an Option as a workaround like this, it’s a good idea to look for alternative approaches to make your code cleaner and less error-prone.
In this case, a better alternative exists: the Vec::drain method. It accepts a range parameter to specify which items to remove from the vector and returns an iterator of those items. Passing the .. range syntax will remove every value from the vector.
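As a quick illustration of how drain yields owned values even though we only hold a mutable reference to the vector, consider the following sketch. The Handles struct and its join_all method are hypothetical stand-ins for the pool, chosen so the example can return something observable.

```rust
use std::thread::{self, JoinHandle};

// Hypothetical stand-in for the pool: it only owns join handles.
struct Handles {
    threads: Vec<JoinHandle<usize>>,
}

impl Handles {
    /// Joins every thread through `&mut self` and returns their results.
    fn join_all(&mut self) -> Vec<usize> {
        let mut results = Vec::new();
        // `drain(..)` yields each handle by value and leaves the vector
        // empty, so `join` can take ownership of the handle.
        for handle in self.threads.drain(..) {
            results.push(handle.join().unwrap());
        }
        results
    }
}

fn main() {
    let mut handles = Handles {
        threads: (0..3).map(|i| thread::spawn(move || i * 10)).collect(),
    };
    println!("{:?}", handles.join_all());
    println!("handles left: {}", handles.threads.len());
}
```

Because the vector is empty afterward, there is no stale handle left behind that later code could try to use.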
So, we need to update the ThreadPool drop implementation like this:
use std::{
    sync::{Arc, Mutex, mpsc},
    thread,
};
pub struct ThreadPool {
    workers: Vec<Worker>,
    sender: mpsc::Sender<Job>,
}
type Job = Box<dyn FnOnce() + Send + 'static>;
impl ThreadPool {
    /// Create a new ThreadPool.
    ///
    /// The size is the number of threads in the pool.
    ///
    /// # Panics
    ///
    /// The `new` function will panic if the size is zero.
    pub fn new(size: usize) -> ThreadPool {
        assert!(size > 0);
        let (sender, receiver) = mpsc::channel();
        let receiver = Arc::new(Mutex::new(receiver));
        let mut workers = Vec::with_capacity(size);
        for id in 0..size {
            workers.push(Worker::new(id, Arc::clone(&receiver)));
        }
        ThreadPool { workers, sender }
    }
    pub fn execute<F>(&self, f: F)
    where
        F: FnOnce() + Send + 'static,
    {
        let job = Box::new(f);
        self.sender.send(job).unwrap();
    }
}
impl Drop for ThreadPool {
    fn drop(&mut self) {
        for worker in self.workers.drain(..) {
            println!("Shutting down worker {}", worker.id);
            worker.thread.join().unwrap();
        }
    }
}
struct Worker {
    id: usize,
    thread: thread::JoinHandle<()>,
}
impl Worker {
    fn new(id: usize, receiver: Arc<Mutex<mpsc::Receiver<Job>>>) -> Worker {
        let thread = thread::spawn(move || {
            loop {
                let job = receiver.lock().unwrap().recv().unwrap();
                println!("Worker {id} got a job; executing.");
                job();
            }
        });
        Worker { id, thread }
    }
}
This resolves the compiler error and does not require any other changes to our code. Note that, because drop can be called when panicking, the unwrap could also panic and cause a double panic, which immediately crashes the program and ends any cleanup in progress. This is fine for an example program, but it isn’t recommended for production code.
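One possible way to avoid the double panic mentioned above is to inspect the Result that join returns instead of calling unwrap on it. This is only a sketch of that idea under simplifying assumptions: the free function join_all and the bare Vec<JoinHandle<()>> are hypothetical and stand in for the pool's drop implementation.

```rust
use std::thread::{self, JoinHandle};

/// Joins every handle and returns how many threads had panicked, rather
/// than panicking itself on the first failure.
fn join_all(handles: &mut Vec<JoinHandle<()>>) -> usize {
    let mut panicked = 0;
    for (id, handle) in handles.drain(..).enumerate() {
        // `join` returns `Err` when the thread's closure panicked.
        if handle.join().is_err() {
            eprintln!("worker {id} panicked; continuing cleanup");
            panicked += 1;
        }
    }
    panicked
}

fn main() {
    let mut handles = vec![
        thread::spawn(|| {}),
        thread::spawn(|| {
            panic!("simulated failure");
        }),
    ];
    let panicked = join_all(&mut handles);
    println!("{panicked} worker(s) panicked");
}
```

The panicking thread still prints its own panic message to standard error; the difference is that the cleanup loop keeps going and joins the remaining threads.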
Signaling to the Threads to Stop Listening for Jobs
With all the changes we’ve made, our code compiles without any warnings. However, the bad news is that this code doesn’t function the way we want it to yet. The key is the logic in the closures run by the threads of the Worker instances: At the moment, we call join, but that won’t shut down the threads, because they loop forever looking for jobs. If we try to drop our ThreadPool with our current implementation of drop, the main thread will block forever, waiting for the first thread to finish.
To fix this problem, we’ll need a change in the ThreadPool drop implementation and then a change in the Worker loop.
First, we’ll change the ThreadPool drop implementation to explicitly drop the sender before waiting for the threads to finish. Listing 21-23 shows the changes to ThreadPool to explicitly drop sender. Unlike with the thread, here we do need to use an Option to be able to move sender out of ThreadPool with Option::take.
use std::{
sync::{Arc, Mutex, mpsc},
thread,
};
pub struct ThreadPool {
workers: Vec<Worker>,
sender: Option<mpsc::Sender<Job>>,
}
// --snip--
type Job = Box<dyn FnOnce() + Send + 'static>;
impl ThreadPool {
/// Create a new ThreadPool.
///
/// The size is the number of threads in the pool.
///
/// # Panics
///
/// The `new` function will panic if the size is zero.
pub fn new(size: usize) -> ThreadPool {
// --snip--
assert!(size > 0);
let (sender, receiver) = mpsc::channel();
let receiver = Arc::new(Mutex::new(receiver));
let mut workers = Vec::with_capacity(size);
for id in 0..size {
workers.push(Worker::new(id, Arc::clone(&receiver)));
}
ThreadPool {
workers,
sender: Some(sender),
}
}
pub fn execute<F>(&self, f: F)
where
F: FnOnce() + Send + 'static,
{
let job = Box::new(f);
self.sender.as_ref().unwrap().send(job).unwrap();
}
}
impl Drop for ThreadPool {
fn drop(&mut self) {
drop(self.sender.take());
for worker in self.workers.drain(..) {
println!("Shutting down worker {}", worker.id);
worker.thread.join().unwrap();
}
}
}
struct Worker {
id: usize,
thread: thread::JoinHandle<()>,
}
impl Worker {
fn new(id: usize, receiver: Arc<Mutex<mpsc::Receiver<Job>>>) -> Worker {
let thread = thread::spawn(move || {
loop {
let job = receiver.lock().unwrap().recv().unwrap();
println!("Worker {id} got a job; executing.");
job();
}
});
Worker { id, thread }
}
}
Dropping sender closes the channel, which indicates no more messages will be sent. When that happens, all the calls to recv that the Worker instances do in the infinite loop will return an error. In Listing 21-24, we change the Worker loop to gracefully exit the loop in that case, which means the threads will finish when the ThreadPool drop implementation calls join on them.
use std::{
    sync::{Arc, Mutex, mpsc},
    thread,
};

pub struct ThreadPool {
    workers: Vec<Worker>,
    sender: Option<mpsc::Sender<Job>>,
}

type Job = Box<dyn FnOnce() + Send + 'static>;

impl ThreadPool {
    /// Create a new ThreadPool.
    ///
    /// The size is the number of threads in the pool.
    ///
    /// # Panics
    ///
    /// The `new` function will panic if the size is zero.
    pub fn new(size: usize) -> ThreadPool {
        assert!(size > 0);

        let (sender, receiver) = mpsc::channel();
        let receiver = Arc::new(Mutex::new(receiver));

        let mut workers = Vec::with_capacity(size);
        for id in 0..size {
            workers.push(Worker::new(id, Arc::clone(&receiver)));
        }

        ThreadPool {
            workers,
            sender: Some(sender),
        }
    }

    pub fn execute<F>(&self, f: F)
    where
        F: FnOnce() + Send + 'static,
    {
        let job = Box::new(f);
        self.sender.as_ref().unwrap().send(job).unwrap();
    }
}

impl Drop for ThreadPool {
    fn drop(&mut self) {
        drop(self.sender.take());

        for worker in self.workers.drain(..) {
            println!("Shutting down worker {}", worker.id);
            worker.thread.join().unwrap();
        }
    }
}

struct Worker {
    id: usize,
    thread: thread::JoinHandle<()>,
}

impl Worker {
    fn new(id: usize, receiver: Arc<Mutex<mpsc::Receiver<Job>>>) -> Worker {
        let thread = thread::spawn(move || {
            loop {
                let message = receiver.lock().unwrap().recv();

                match message {
                    Ok(job) => {
                        println!("Worker {id} got a job; executing.");
                        job();
                    }
                    Err(_) => {
                        println!("Worker {id} disconnected; shutting down.");
                        break;
                    }
                }
            }
        });

        Worker { id, thread }
    }
}
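The channel behavior that the Worker loop relies on can be seen in isolation. The following is a minimal sketch, not part of the chapter's project: the function name drain_until_disconnected and the u32 message type are made up for illustration. It shows that recv keeps returning Ok for messages that were already sent and returns Err only once every sender is gone and the buffer is empty.

```rust
use std::sync::mpsc;

/// Receives until the channel disconnects and returns how many
/// messages arrived. (Hypothetical helper, for illustration only.)
fn drain_until_disconnected(receiver: mpsc::Receiver<u32>) -> usize {
    let mut received = 0;
    loop {
        match receiver.recv() {
            // A message that was sent before the sender was dropped.
            Ok(_) => received += 1,
            // All senders are gone and nothing is left in the buffer.
            Err(_) => break,
        }
    }
    received
}

fn main() {
    let (sender, receiver) = mpsc::channel();
    sender.send(1).unwrap();
    sender.send(2).unwrap();

    // Dropping the only sender closes the channel.
    drop(sender);

    let count = drain_until_disconnected(receiver);
    println!("received {count} messages before the channel closed");
}
```

The match in this helper has the same shape as the one in the Worker loop: the Ok arm does the work, and the Err arm is the signal to break out.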
To see this code in action, let’s modify main to accept only two requests before gracefully shutting down the server, as shown in Listing 21-25.
use hello::ThreadPool;
use std::{
    fs,
    io::{BufReader, prelude::*},
    net::{TcpListener, TcpStream},
    thread,
    time::Duration,
};

fn main() {
    let listener = TcpListener::bind("127.0.0.1:7878").unwrap();
    let pool = ThreadPool::new(4);

    for stream in listener.incoming().take(2) {
        let stream = stream.unwrap();

        pool.execute(|| {
            handle_connection(stream);
        });
    }

    println!("Shutting down.");
}

fn handle_connection(mut stream: TcpStream) {
    let buf_reader = BufReader::new(&stream);
    let request_line = buf_reader.lines().next().unwrap().unwrap();

    let (status_line, filename) = match &request_line[..] {
        "GET / HTTP/1.1" => ("HTTP/1.1 200 OK", "hello.html"),
        "GET /sleep HTTP/1.1" => {
            thread::sleep(Duration::from_secs(5));
            ("HTTP/1.1 200 OK", "hello.html")
        }
        _ => ("HTTP/1.1 404 NOT FOUND", "404.html"),
    };

    let contents = fs::read_to_string(filename).unwrap();
    let length = contents.len();

    let response =
        format!("{status_line}\r\nContent-Length: {length}\r\n\r\n{contents}");

    stream.write_all(response.as_bytes()).unwrap();
}
You wouldn’t want a real-world web server to shut down after serving only two requests. This code just demonstrates that the graceful shutdown and cleanup are in working order.
The take method is defined in the Iterator trait and limits the iteration to the first two items at most. The ThreadPool will go out of scope at the end of main, and the drop implementation will run.
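As a rough illustration of these two points, the sketch below uses stand-ins rather than the real server: Pool is a hypothetical type whose only job is to record when it is dropped, and the endless range 1.. plays the role of listener.incoming(). It shows take cutting an endless iterator short and the Drop implementation running only after the loop, when the value goes out of scope.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for ThreadPool that records when it is dropped.
struct Pool<'a> {
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Pool<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push("pool dropped".to_string());
    }
}

/// Handles at most `limit` items from an endless iterator and returns
/// the order in which things happened.
fn serve(limit: usize) -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let _pool = Pool { log: &log };

        // `1..` never ends by itself, much like `listener.incoming()`;
        // `take` stops the loop after `limit` items.
        for id in (1..).take(limit) {
            log.borrow_mut().push(format!("request {id}"));
        }

        log.borrow_mut().push("loop finished".to_string());
    } // `_pool` goes out of scope here, so its `drop` runs now.
    log.into_inner()
}

fn main() {
    for event in serve(2) {
        println!("{event}");
    }
}
```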
Start the server with cargo run and make three requests. The third request should fail with an error, and in your terminal you should see output similar to this:
$ cargo run
Compiling hello v0.1.0 (file:///projects/hello)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.41s
Running `target/debug/hello`
Worker 0 got a job; executing.
Shutting down.
Shutting down worker 0
Worker 3 got a job; executing.
Worker 1 disconnected; shutting down.
Worker 2 disconnected; shutting down.
Worker 3 disconnected; shutting down.
Worker 0 disconnected; shutting down.
Shutting down worker 1
Shutting down worker 2
Shutting down worker 3
You might see a different ordering of Worker IDs and messages printed. We can see how this code works from the messages: Worker instances 0 and 3 got the first two requests. The server stopped accepting connections after the second connection, and the Drop implementation on ThreadPool started executing before Worker 3 even started its job. Dropping the sender disconnects all the Worker instances and tells them to shut down. The Worker instances each print a message when they disconnect, and then the thread pool calls join to wait for each Worker thread to finish.
Notice one interesting aspect of this particular execution: The ThreadPool dropped the sender, and before any Worker received an error, we tried to join Worker 0. Worker 0 had not yet gotten an error from recv, so the main thread blocked, waiting for Worker 0 to finish. In the meantime, Worker 3 received a job, and then all threads received an error. When Worker 0 finished, the main thread waited for the rest of the Worker instances to finish. At that point, they had all exited their loops and stopped.
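The ordering described above depends on two facts: jobs that were sent before the sender was dropped are still delivered, and join blocks the calling thread until the worker has left its loop. The sketch below is a simplified model with a single worker and no Mutex; the function name run_worker_until_closed and the use of sleep durations as jobs are assumptions made for illustration.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Spawns one worker, queues two jobs, drops the sender, and joins.
/// Returns the events the worker recorded, in order.
/// (Hypothetical helper, for illustration only.)
fn run_worker_until_closed() -> Vec<String> {
    let (sender, receiver) = mpsc::channel::<u64>();

    let worker = thread::spawn(move || {
        let mut events = Vec::new();
        loop {
            match receiver.recv() {
                Ok(millis) => {
                    // Simulate a job that takes some time.
                    thread::sleep(Duration::from_millis(millis));
                    events.push(format!("finished job of {millis} ms"));
                }
                Err(_) => {
                    events.push("disconnected".to_string());
                    break;
                }
            }
        }
        events
    });

    sender.send(50).unwrap();
    sender.send(10).unwrap();

    // The channel is closed while the worker is most likely still busy.
    drop(sender);

    // join blocks here until the worker has run both queued jobs,
    // seen the disconnect, and returned.
    worker.join().unwrap()
}

fn main() {
    for event in run_worker_until_closed() {
        println!("{event}");
    }
}
```

The worker sees the error only after both queued jobs have run, which matches Worker 3 in the output above picking up its job after the shutdown had already begun.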
Congrats! We’ve now completed our project; we have a basic web server that uses a thread pool to respond asynchronously. We’re able to perform a graceful shutdown of the server, which cleans up all the threads in the pool.
Here’s the full code for reference:
// src/main.rs
use hello::ThreadPool;
use std::{
    fs,
    io::{BufReader, prelude::*},
    net::{TcpListener, TcpStream},
    thread,
    time::Duration,
};

fn main() {
    let listener = TcpListener::bind("127.0.0.1:7878").unwrap();
    let pool = ThreadPool::new(4);

    for stream in listener.incoming().take(2) {
        let stream = stream.unwrap();

        pool.execute(|| {
            handle_connection(stream);
        });
    }

    println!("Shutting down.");
}

fn handle_connection(mut stream: TcpStream) {
    let buf_reader = BufReader::new(&stream);
    let request_line = buf_reader.lines().next().unwrap().unwrap();

    let (status_line, filename) = match &request_line[..] {
        "GET / HTTP/1.1" => ("HTTP/1.1 200 OK", "hello.html"),
        "GET /sleep HTTP/1.1" => {
            thread::sleep(Duration::from_secs(5));
            ("HTTP/1.1 200 OK", "hello.html")
        }
        _ => ("HTTP/1.1 404 NOT FOUND", "404.html"),
    };

    let contents = fs::read_to_string(filename).unwrap();
    let length = contents.len();

    let response =
        format!("{status_line}\r\nContent-Length: {length}\r\n\r\n{contents}");

    stream.write_all(response.as_bytes()).unwrap();
}

// src/lib.rs
use std::{
    sync::{Arc, Mutex, mpsc},
    thread,
};

pub struct ThreadPool {
    workers: Vec<Worker>,
    sender: Option<mpsc::Sender<Job>>,
}

type Job = Box<dyn FnOnce() + Send + 'static>;

impl ThreadPool {
    /// Create a new ThreadPool.
    ///
    /// The size is the number of threads in the pool.
    ///
    /// # Panics
    ///
    /// The `new` function will panic if the size is zero.
    pub fn new(size: usize) -> ThreadPool {
        assert!(size > 0);

        let (sender, receiver) = mpsc::channel();
        let receiver = Arc::new(Mutex::new(receiver));

        let mut workers = Vec::with_capacity(size);
        for id in 0..size {
            workers.push(Worker::new(id, Arc::clone(&receiver)));
        }

        ThreadPool {
            workers,
            sender: Some(sender),
        }
    }

    pub fn execute<F>(&self, f: F)
    where
        F: FnOnce() + Send + 'static,
    {
        let job = Box::new(f);
        self.sender.as_ref().unwrap().send(job).unwrap();
    }
}

impl Drop for ThreadPool {
    fn drop(&mut self) {
        // Close the channel first so the workers' recv calls return errors.
        drop(self.sender.take());

        // Same drain-based shutdown as in Listing 21-24.
        for worker in self.workers.drain(..) {
            println!("Shutting down worker {}", worker.id);
            worker.thread.join().unwrap();
        }
    }
}

struct Worker {
    id: usize,
    thread: thread::JoinHandle<()>,
}

impl Worker {
    fn new(id: usize, receiver: Arc<Mutex<mpsc::Receiver<Job>>>) -> Worker {
        let thread = thread::spawn(move || {
            loop {
                let message = receiver.lock().unwrap().recv();

                match message {
                    Ok(job) => {
                        println!("Worker {id} got a job; executing.");
                        job();
                    }
                    Err(_) => {
                        println!("Worker {id} disconnected; shutting down.");
                        break;
                    }
                }
            }
        });

        Worker { id, thread }
    }
}
We could do more here! If you want to continue enhancing this project, here are some ideas:
- Add more documentation to ThreadPool and its public methods.
- Add tests of the library’s functionality.
- Change calls to unwrap to more robust error handling.
- Use ThreadPool to perform some task other than serving web requests.
- Find a thread pool crate on crates.io and implement a similar web server using the crate instead. Then, compare its API and robustness to the thread pool we implemented.
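Two of the ideas above, replacing unwrap with error handling and using the pool for something other than web requests, might look like the following sketch. It is one possible direction, not the chapter's code: the PoolError type, the build constructor, the Result-returning execute, and the sum_of_squares function are all names invented here, and the Worker struct is collapsed into plain join handles to keep the example short.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical error type; the chapter's pool panics instead.
#[derive(Debug, PartialEq)]
pub enum PoolError {
    ZeroSize,
    Closed,
}

pub struct ThreadPool {
    workers: Vec<thread::JoinHandle<()>>,
    sender: Option<mpsc::Sender<Job>>,
}

impl ThreadPool {
    /// Like `new`, but reports a zero size as an error instead of panicking.
    pub fn build(size: usize) -> Result<ThreadPool, PoolError> {
        if size == 0 {
            return Err(PoolError::ZeroSize);
        }
        let (sender, receiver) = mpsc::channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers = (0..size)
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // Stop on a poisoned lock rather than unwrapping it.
                    let message = match receiver.lock() {
                        Ok(guard) => guard.recv(),
                        Err(_) => break,
                    };
                    match message {
                        Ok(job) => job(),
                        Err(_) => break, // channel closed: shut down
                    }
                })
            })
            .collect();
        Ok(ThreadPool { workers, sender: Some(sender) })
    }

    /// Returns an error instead of panicking if the job cannot be queued.
    pub fn execute<F>(&self, f: F) -> Result<(), PoolError>
    where
        F: FnOnce() + Send + 'static,
    {
        let sender = self.sender.as_ref().ok_or(PoolError::Closed)?;
        sender.send(Box::new(f)).map_err(|_| PoolError::Closed)
    }
}

impl Drop for ThreadPool {
    fn drop(&mut self) {
        drop(self.sender.take());
        for worker in self.workers.drain(..) {
            // Ignore a panicked worker rather than panicking inside drop.
            let _ = worker.join();
        }
    }
}

/// A non-web task: square each input on the pool and add up the results.
pub fn sum_of_squares(inputs: Vec<u64>, threads: usize) -> Result<u64, PoolError> {
    let pool = ThreadPool::build(threads)?;
    let (results, collected) = mpsc::channel();
    for n in inputs {
        let results = results.clone();
        pool.execute(move || {
            let _ = results.send(n * n);
        })?;
    }
    drop(results); // only the jobs' clones keep the results channel open
    drop(pool); // graceful shutdown: waits for every queued job
    Ok(collected.iter().sum())
}

fn main() {
    match sum_of_squares(vec![1, 2, 3, 4], 2) {
        Ok(total) => println!("sum of squares: {total}"),
        Err(err) => eprintln!("pool error: {err:?}"),
    }
}
```

Because sum_of_squares returns a value, it also gives a natural starting point for the testing idea: a test can compare its result against a known total.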
Summary
Well done! You’ve made it to the end of the book! We want to thank you for joining us on this tour of Rust. You’re now ready to implement your own Rust projects and help with other people’s projects. Keep in mind that there is a welcoming community of other Rustaceans who would love to help you with any challenges you encounter on your Rust journey.