Improving Our I/O Project - The Rust Programming Language (Simplified Chinese Edition)

Improving Our I/O Project

With this new knowledge about iterators, we can improve the I/O project in Chapter 12 by using iterators to make parts of the code clearer and more concise. Let’s look at how iterators can improve our implementation of the Config::build function and the search function.

Removing a `clone` Using an Iterator

In Listing 12-6, we added code that took a slice of String values and created an instance of the Config struct by indexing into the slice and cloning the values, allowing the Config struct to own those values. In Listing 13-17, we’ve reproduced the implementation of the Config::build function as it was in Listing 12-23.

use std::env;
use std::error::Error;
use std::fs;
use std::process;

use minigrep::{search, search_case_insensitive};

fn main() {
    let args: Vec<String> = env::args().collect();

    let config = Config::build(&args).unwrap_or_else(|err| {
        println!("Problem parsing arguments: {err}");
        process::exit(1);
    });

    if let Err(e) = run(config) {
        println!("Application error: {e}");
        process::exit(1);
    }
}

pub struct Config {
    pub query: String,
    pub file_path: String,
    pub ignore_case: bool,
}

impl Config {
    fn build(args: &[String]) -> Result<Config, &'static str> {
        if args.len() < 3 {
            return Err("not enough arguments");
        }

        let query = args[1].clone();
        let file_path = args[2].clone();

        let ignore_case = env::var("IGNORE_CASE").is_ok();

        Ok(Config {
            query,
            file_path,
            ignore_case,
        })
    }
}

fn run(config: Config) -> Result<(), Box<dyn Error>> {
    let contents = fs::read_to_string(config.file_path)?;

    let results = if config.ignore_case {
        search_case_insensitive(&config.query, &contents)
    } else {
        search(&config.query, &contents)
    };

    for line in results {
        println!("{line}");
    }

    Ok(())
}

At the time, we said not to worry about the inefficient clone calls because we would remove them in the future. Well, that time is now!

We needed clone here because the parameter args is a slice with String elements, but the build function doesn’t own args. To return ownership of a Config instance, we had to clone the values for the query and file_path fields of Config so that the Config instance can own its values.

With our new knowledge about iterators, we can change the build function to take ownership of an iterator as its argument instead of borrowing a slice. We’ll use the iterator functionality instead of the code that checks the length of the slice and indexes into specific locations. This will clarify what the Config::build function is doing because the iterator will access the values.

Once Config::build takes ownership of the iterator and stops using indexing operations that borrow, we can move the String values from the iterator into Config rather than calling clone and making a new allocation.
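The difference between the two approaches can be sketched in isolation. The helper names pair_from_slice and pair_from_iter below are hypothetical and are not part of minigrep; they only illustrate that a borrowed slice forces a clone, while an owned iterator hands out String values that can be moved.

```rust
/// Borrowing a slice: the function does not own the `String`s,
/// so it has to clone them to return owned values.
fn pair_from_slice(args: &[String]) -> Option<(String, String)> {
    if args.len() < 3 {
        return None;
    }
    // Writing `args[1]` without `.clone()` would try to move a value
    // out of borrowed data, which the compiler rejects.
    Some((args[1].clone(), args[2].clone()))
}

/// Owning an iterator: each `next` call yields an owned `String`
/// that is moved into the result, with no extra allocation.
fn pair_from_iter(mut args: impl Iterator<Item = String>) -> Option<(String, String)> {
    args.next()?; // skip the program name
    let query = args.next()?;
    let file_path = args.next()?;
    Some((query, file_path))
}
```

Both helpers return the same pair for the same input; only the second avoids copying the string data.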

Using the Returned Iterator Directly

Open your I/O project’s src/main.rs file, which should look like this:

Filename: src/main.rs

use std::env;
use std::error::Error;
use std::fs;
use std::process;

use minigrep::{search, search_case_insensitive};

fn main() {
    let args: Vec<String> = env::args().collect();

    let config = Config::build(&args).unwrap_or_else(|err| {
        eprintln!("Problem parsing arguments: {err}");
        process::exit(1);
    });

    // --snip--

    if let Err(e) = run(config) {
        eprintln!("Application error: {e}");
        process::exit(1);
    }
}

pub struct Config {
    pub query: String,
    pub file_path: String,
    pub ignore_case: bool,
}

impl Config {
    fn build(args: &[String]) -> Result<Config, &'static str> {
        if args.len() < 3 {
            return Err("not enough arguments");
        }

        let query = args[1].clone();
        let file_path = args[2].clone();

        let ignore_case = env::var("IGNORE_CASE").is_ok();

        Ok(Config {
            query,
            file_path,
            ignore_case,
        })
    }
}

fn run(config: Config) -> Result<(), Box<dyn Error>> {
    let contents = fs::read_to_string(config.file_path)?;

    let results = if config.ignore_case {
        search_case_insensitive(&config.query, &contents)
    } else {
        search(&config.query, &contents)
    };

    for line in results {
        println!("{line}");
    }

    Ok(())
}

We’ll first change the start of the main function that we had in Listing 12-24 to the code in Listing 13-18, which this time uses an iterator. This won’t compile until we update Config::build as well.

use std::env;
use std::error::Error;
use std::fs;
use std::process;

use minigrep::{search, search_case_insensitive};

fn main() {
    let config = Config::build(env::args()).unwrap_or_else(|err| {
        eprintln!("Problem parsing arguments: {err}");
        process::exit(1);
    });

    // --snip--

    if let Err(e) = run(config) {
        eprintln!("Application error: {e}");
        process::exit(1);
    }
}

pub struct Config {
    pub query: String,
    pub file_path: String,
    pub ignore_case: bool,
}

impl Config {
    fn build(args: &[String]) -> Result<Config, &'static str> {
        if args.len() < 3 {
            return Err("not enough arguments");
        }

        let query = args[1].clone();
        let file_path = args[2].clone();

        let ignore_case = env::var("IGNORE_CASE").is_ok();

        Ok(Config {
            query,
            file_path,
            ignore_case,
        })
    }
}

fn run(config: Config) -> Result<(), Box<dyn Error>> {
    let contents = fs::read_to_string(config.file_path)?;

    let results = if config.ignore_case {
        search_case_insensitive(&config.query, &contents)
    } else {
        search(&config.query, &contents)
    };

    for line in results {
        println!("{line}");
    }

    Ok(())
}

The env::args function returns an iterator! Rather than collecting the iterator values into a vector and then passing a slice to Config::build, now we’re passing ownership of the iterator returned from env::args to Config::build directly.

Next, we need to update the definition of Config::build. Let’s change the signature of Config::build to look like Listing 13-19. This still won’t compile, because we need to update the function body.

use std::env;
use std::error::Error;
use std::fs;
use std::process;

use minigrep::{search, search_case_insensitive};

fn main() {
    let config = Config::build(env::args()).unwrap_or_else(|err| {
        eprintln!("Problem parsing arguments: {err}");
        process::exit(1);
    });

    if let Err(e) = run(config) {
        eprintln!("Application error: {e}");
        process::exit(1);
    }
}

pub struct Config {
    pub query: String,
    pub file_path: String,
    pub ignore_case: bool,
}

impl Config {
    fn build(
        mut args: impl Iterator<Item = String>,
    ) -> Result<Config, &'static str> {
        // --snip--
        if args.len() < 3 {
            return Err("not enough arguments");
        }

        let query = args[1].clone();
        let file_path = args[2].clone();

        let ignore_case = env::var("IGNORE_CASE").is_ok();

        Ok(Config {
            query,
            file_path,
            ignore_case,
        })
    }
}

fn run(config: Config) -> Result<(), Box<dyn Error>> {
    let contents = fs::read_to_string(config.file_path)?;

    let results = if config.ignore_case {
        search_case_insensitive(&config.query, &contents)
    } else {
        search(&config.query, &contents)
    };

    for line in results {
        println!("{line}");
    }

    Ok(())
}

The standard library documentation for the env::args function shows that the type of the iterator it returns is std::env::Args, and that type implements the Iterator trait and returns String values.

We’ve updated the signature of the Config::build function so that the parameter args has a generic type with the trait bounds impl Iterator<Item = String> instead of &[String]. This usage of the impl Trait syntax we discussed in the “Using Traits as Parameters” section of Chapter 10 means that args can be any type that implements the Iterator trait and returns String items.
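As a rough illustration of what “any type that implements the Iterator trait and returns String items” means, the sketch below writes the same bound two ways. The functions count_args_impl and count_args_generic are hypothetical names used only for this example.

```rust
/// `impl Trait` in argument position: accepts any iterator of `String`s.
fn count_args_impl(args: impl Iterator<Item = String>) -> usize {
    args.count()
}

/// The longer generic form with an explicit trait bound,
/// which accepts the same set of argument types.
fn count_args_generic<I>(args: I) -> usize
where
    I: Iterator<Item = String>,
{
    args.count()
}
```

Either function can be called with std::env::args(), with a Vec<String> turned into an iterator via into_iter(), or with an adapter chain such as map, as long as the items produced are String values. This flexibility also makes it easy to call such a function from a test without real command line arguments.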

Because we’re taking ownership of args and we’ll be mutating args by iterating over it, we can add the mut keyword into the specification of the args parameter to make it mutable.
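The need for mut follows from the signature of Iterator::next, which takes &mut self because advancing an iterator changes its internal state. A minimal sketch, using a hypothetical helper named second_item:

```rust
/// Returns the second item of the iterator, if there is one.
/// The parameter must be declared `mut` because `next` takes `&mut self`;
/// without `mut`, the compiler reports that `items` cannot be borrowed as mutable.
fn second_item(mut items: impl Iterator<Item = String>) -> Option<String> {
    items.next(); // discard the first item
    items.next() // hand back the second item, already owned
}
```

Since the function owns the iterator, marking the parameter mut affects only the function body; callers are unaffected.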

Using `Iterator` Trait Methods

Next, we’ll fix the body of Config::build. Because args implements the Iterator trait, we know we can call the next method on it! Listing 13-20 updates the code from Listing 12-23 to use the next method.

use std::env;
use std::error::Error;
use std::fs;
use std::process;

use minigrep::{search, search_case_insensitive};

fn main() {
    let config = Config::build(env::args()).unwrap_or_else(|err| {
        eprintln!("Problem parsing arguments: {err}");
        process::exit(1);
    });

    if let Err(e) = run(config) {
        eprintln!("Application error: {e}");
        process::exit(1);
    }
}

pub struct Config {
    pub query: String,
    pub file_path: String,
    pub ignore_case: bool,
}

impl Config {
    fn build(
        mut args: impl Iterator<Item = String>,
    ) -> Result<Config, &'static str> {
        args.next();

        let query = match args.next() {
            Some(arg) => arg,
            None => return Err("Didn't get a query string"),
        };

        let file_path = match args.next() {
            Some(arg) => arg,
            None => return Err("Didn't get a file path"),
        };

        let ignore_case = env::var("IGNORE_CASE").is_ok();

        Ok(Config {
            query,
            file_path,
            ignore_case,
        })
    }
}

fn run(config: Config) -> Result<(), Box<dyn Error>> {
    let contents = fs::read_to_string(config.file_path)?;

    let results = if config.ignore_case {
        search_case_insensitive(&config.query, &contents)
    } else {
        search(&config.query, &contents)
    };

    for line in results {
        println!("{line}");
    }

    Ok(())
}

Remember that the first value in the return value of env::args is the name of the program. We want to ignore that and get to the next value, so first we call next and do nothing with the return value. Then, we call next to get the value we want to put in the query field of Config. If next returns Some, we use a match to extract the value. If it returns None, it means not enough arguments were given, and we return early with an Err value. We do the same thing for the file_path value.
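The match expressions can also be written more compactly. The following is one possible variation, not the version used in this chapter: it relies on Option::ok_or to turn None into an Err and on the ? operator to return early. The struct and error messages are copied from the listing above so that the sketch stands on its own.

```rust
use std::env;

pub struct Config {
    pub query: String,
    pub file_path: String,
    pub ignore_case: bool,
}

impl Config {
    pub fn build(
        mut args: impl Iterator<Item = String>,
    ) -> Result<Config, &'static str> {
        // Skip the program name; its value is not needed.
        args.next();

        // `ok_or` converts `Option<String>` into `Result<String, &'static str>`,
        // and `?` returns the `Err` early when the argument is missing.
        let query = args.next().ok_or("Didn't get a query string")?;
        let file_path = args.next().ok_or("Didn't get a file path")?;

        let ignore_case = env::var("IGNORE_CASE").is_ok();

        Ok(Config {
            query,
            file_path,
            ignore_case,
        })
    }
}
```

Whether this reads better than the explicit match is a matter of taste; the behavior for missing arguments is the same.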

Clarifying Code with Iterator Adapters

We can also take advantage of iterators in the search function in our I/O project, which is reproduced here in Listing 13-21 as it was in Listing 12-19.

pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    let mut results = Vec::new();

    for line in contents.lines() {
        if line.contains(query) {
            results.push(line);
        }
    }

    results
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn one_result() {
        let query = "duct";
        let contents = "\
Rust:
safe, fast, productive.
Pick three.";

        assert_eq!(vec!["safe, fast, productive."], search(query, contents));
    }
}

We can write this code in a more concise way using iterator adapter methods. Doing so also lets us avoid having a mutable intermediate results vector. The functional programming style prefers to minimize the amount of mutable state to make code clearer. Removing the mutable state might enable a future enhancement to make searching happen in parallel because we wouldn’t have to manage concurrent access to the results vector. Listing 13-22 shows this change.

pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    contents
        .lines()
        .filter(|line| line.contains(query))
        .collect()
}

pub fn search_case_insensitive<'a>(
    query: &str,
    contents: &'a str,
) -> Vec<&'a str> {
    let query = query.to_lowercase();
    let mut results = Vec::new();

    for line in contents.lines() {
        if line.to_lowercase().contains(&query) {
            results.push(line);
        }
    }

    results
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn case_sensitive() {
        let query = "duct";
        let contents = "\
Rust:
safe, fast, productive.
Pick three.
Duct tape.";

        assert_eq!(vec!["safe, fast, productive."], search(query, contents));
    }

    #[test]
    fn case_insensitive() {
        let query = "rUsT";
        let contents = "\
Rust:
safe, fast, productive.
Pick three.
Trust me.";

        assert_eq!(
            vec!["Rust:", "Trust me."],
            search_case_insensitive(query, contents)
        );
    }
}

Recall that the purpose of the search function is to return all lines in contents that contain the query. Similar to the filter example in Listing 13-16, this code uses the filter adapter to keep only the lines for which line.contains(query) returns true. We then collect the matching lines into another vector with collect. Much simpler! Feel free to make the same change to use iterator methods in the search_case_insensitive function as well.
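For reference, one way the same change might look in search_case_insensitive is sketched below. It keeps the signature and the lowercase comparison from the loop-based version and only swaps the loop for filter and collect.

```rust
pub fn search_case_insensitive<'a>(
    query: &str,
    contents: &'a str,
) -> Vec<&'a str> {
    // Lowercase the query once, outside the closure.
    let query = query.to_lowercase();

    contents
        .lines()
        // Keep only lines whose lowercase form contains the lowercase query.
        .filter(|line| line.to_lowercase().contains(&query))
        .collect()
}
```

The existing case_insensitive test should pass unchanged, because the function still returns a Vec<&str> of the original, unmodified lines.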

For a further improvement, return an iterator from the search function by removing the call to collect and changing the return type to impl Iterator<Item = &'a str> so that the function becomes an iterator adapter. Note that you’ll also need to update the tests! Search through a large file using your minigrep tool before and after making this change to observe the difference in behavior. Before this change, the program won’t print any results until it has collected all of the results, but after the change, the results will be printed as each matching line is found because the for loop in the run function is able to take advantage of the laziness of the iterator.
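A minimal sketch of that lazy version follows. It makes two assumptions that go beyond the description above: query is given the same lifetime 'a as contents, and the closure is marked move, so that the returned iterator can hold on to query regardless of the edition the crate is compiled with.

```rust
pub fn search<'a>(
    query: &'a str,
    contents: &'a str,
) -> impl Iterator<Item = &'a str> {
    // No `collect` here: nothing is searched until the caller asks for items.
    contents
        .lines()
        .filter(move |line| line.contains(query))
}
```

A test would then either collect the result before comparing, for example `search(query, contents).collect::<Vec<_>>()`, or step through the iterator with next. The for loop in run can consume the returned iterator directly.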

Choosing Between Loops and Iterators

The next logical question is which style you should choose in your own code and why: the original implementation in Listing 13-21 or the version using iterators in Listing 13-22 (assuming we’re collecting all the results before returning them rather than returning the iterator). Most Rust programmers prefer to use the iterator style. It’s a bit tougher to get the hang of at first, but once you get a feel for the various iterator adapters and what they do, iterators can be easier to understand. Instead of fiddling with the various bits of looping and building new vectors, the code focuses on the high-level objective of the loop. This abstracts away some of the commonplace code so that it’s easier to see the concepts that are unique to this code, such as the filtering condition each element in the iterator must pass.

But are the two implementations truly equivalent? The intuitive assumption might be that the lower-level loop will be faster. Let’s talk about performance.
