The Slice Type

Slices let you reference a contiguous sequence of elements in a collection. A slice is a kind of reference, so it does not have ownership.

Here’s a small programming problem: Write a function that takes a string of words separated by spaces and returns the first word it finds in that string. If the function doesn’t find a space in the string, the whole string must be one word, so the entire string should be returned.

Note: For the purposes of introducing slices, we are assuming ASCII only in this section; a more thorough discussion of UTF-8 handling is in the “Storing UTF-8 Encoded Text with Strings” section of Chapter 8.

Let’s work through how we’d write the signature of this function without using slices, to understand the problem that slices will solve:

fn first_word(s: &String) -> ?

The first_word function has a parameter of type &String. We don’t need ownership, so this is fine. (In idiomatic Rust, functions do not take ownership of their arguments unless they need to, and the reasons for that will become clear as we keep going.) But what should we return? We don’t really have a way to talk about part of a string. However, we could return the index of the end of the word, indicated by a space. Let’s try that, as shown in Listing 4-7.

fn first_word(s: &String) -> usize {
    let bytes = s.as_bytes();

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return i;
        }
    }

    s.len()
}

fn main() {}

Because we need to go through the String element by element and check whether a value is a space, we’ll convert our String to an array of bytes using the as_bytes method.

    let bytes = s.as_bytes();

Next, we create an iterator over the array of bytes using the iter method:

    for (i, &item) in bytes.iter().enumerate() {

We’ll discuss iterators in more detail in Chapter 13. For now, know that iter is a method that returns each element in a collection and that enumerate wraps the result of iter and returns each element as part of a tuple instead. The first element of the tuple returned from enumerate is the index, and the second element is a reference to the element. This is a bit more convenient than calculating the index ourselves.

Because the enumerate method returns a tuple, we can use patterns to destructure that tuple. We’ll be discussing patterns more in Chapter 6. In the for loop, we specify a pattern that has i for the index in the tuple and &item for the single byte in the tuple. Because we get a reference to the element from .iter().enumerate(), we use & in the pattern.

Inside the for loop, we search for the byte that represents the space by using the byte literal syntax. If we find a space, we return the position. Otherwise, we return the length of the string by using s.len().

        if item == b' ' {
            return i;
        }
    }

    s.len()
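To make the tuple pattern and the byte literal more concrete, here is a small sketch. The helper name `space_positions` is hypothetical and not part of the listing above; instead of returning at the first space, it collects the index of every space byte so the values produced by `enumerate` are easy to inspect.

```rust
// Hypothetical helper (not from the original listing): collect the byte
// index of every ASCII space in `s`.
fn space_positions(s: &String) -> Vec<usize> {
    let mut positions = Vec::new();

    // `iter()` yields `&u8`; `enumerate()` wraps each item as `(usize, &u8)`.
    // The `&item` pattern strips the reference, so `item` is a plain `u8`.
    for (i, &item) in s.as_bytes().iter().enumerate() {
        // `b' '` is a byte literal of type `u8` with the value 32.
        if item == b' ' {
            positions.push(i);
        }
    }

    positions
}

fn main() {
    assert_eq!(b' ', 32u8);

    let s = String::from("hello big world");
    assert_eq!(space_positions(&s), vec![5, 9]);

    let single = String::from("hello");
    assert!(space_positions(&single).is_empty());
}
```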

We now have a way to find out the index of the end of the first word in the string, but there’s a problem. We’re returning a usize on its own, but it’s only a meaningful number in the context of the &String. In other words, because it’s a separate value from the String, there’s no guarantee that it will still be valid in the future. Consider the program in Listing 4-8 that uses the first_word function from Listing 4-7.

fn first_word(s: &String) -> usize {
    let bytes = s.as_bytes();

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return i;
        }
    }

    s.len()
}

fn main() {
    let mut s = String::from("hello world");

    let word = first_word(&s); // word will get the value 5

    s.clear(); // this empties the String, making it equal to ""

    // word still has the value 5 here, but s no longer has any content that we
    // could meaningfully use with the value 5, so word is now totally invalid!
}

This program compiles without any errors and would also do so if we used word after calling s.clear(). Because word isn’t connected to the state of s at all, word still contains the value 5. We could use that value 5 with the variable s to try to extract the first word out, but this would be a bug because the contents of s have changed since we saved 5 in word.
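As a rough illustration of how the stale index misbehaves, the sketch below uses the index after the string has been cleared. The helper `stale_lookup` is a hypothetical name introduced here. It relies on the standard `str::get` method, which returns `None` for an out-of-range request, whereas indexing with `&s[..word]` would panic at runtime.

```rust
fn first_word(s: &String) -> usize {
    let bytes = s.as_bytes();

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return i;
        }
    }

    s.len()
}

// Hypothetical helper for illustration: compute the index, clear the
// string, then try to use the now-stale index.
fn stale_lookup() -> (usize, bool) {
    let mut s = String::from("hello world");
    let word = first_word(&s); // 5

    s.clear(); // `s` is now "", but `word` is still 5

    // The compiler accepts this; only at runtime do we learn that the
    // index no longer refers to anything inside `s`.
    let still_usable = s.get(..word).is_some();

    (word, still_usable)
}

fn main() {
    let (word, still_usable) = stale_lookup();
    println!("word = {word}, still usable = {still_usable}");
}
```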

Having to worry about the index in word getting out of sync with the data in s is tedious and error-prone! Managing these indices is even more brittle if we write a second_word function. Its signature would have to look like this:

fn second_word(s: &String) -> (usize, usize) {

Now we’re tracking a starting and an ending index, and we have even more values that were calculated from data in a particular state but aren’t tied to that state at all. We have three unrelated variables floating around that need to be kept in sync.

Luckily, Rust has a solution to this problem: string slices.

String Slices

A string slice is a reference to a contiguous sequence of the elements of a String, and it looks like this:

fn main() {
    let s = String::from("hello world");

    let hello = &s[0..5];
    let world = &s[6..11];
}

Rather than a reference to the entire String, hello is a reference to a portion of the String, specified in the extra [0..5] bit. We create slices using a range within square brackets by specifying [starting_index..ending_index], where starting_index is the first position in the slice and ending_index is one more than the last position in the slice. Internally, the slice data structure stores the starting position and the length of the slice, which corresponds to ending_index minus starting_index. So, in the case of let world = &s[6..11];, world would be a slice that contains a pointer to the byte at index 6 of s with a length value of 5.

Figure 4-7 shows this in a diagram.

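Three tables: a table representing the stack data of s, which points to the byte at index 0 in a table of the string data "hello world" on the heap. The third table represents the stack data of the slice world, which has a length value of 5 and points to byte 6 of the heap data table.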

Figure 4-7: A string slice referring to part of a String
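The pointer-plus-length layout can be checked from code. The following is only a sketch: `offset_in` is a hypothetical helper that assumes `part` was sliced out of `whole`, and the size comparison reflects how current compilers represent `&str` rather than something this chapter states.

```rust
use std::mem::size_of;

// Hypothetical helper: how many bytes into `whole` does `part` begin?
// Assumes `part` really is a slice taken from `whole`.
fn offset_in(whole: &str, part: &str) -> usize {
    part.as_ptr() as usize - whole.as_ptr() as usize
}

fn main() {
    let s = String::from("hello world");
    let world = &s[6..11];

    // The slice's pointer refers to byte 6 of the String's heap data...
    assert_eq!(offset_in(&s, world), 6);
    // ...and its length is ending_index minus starting_index.
    assert_eq!(world.len(), 11 - 6);

    // On current compilers a `&str` occupies two machine words:
    // one for the pointer and one for the length.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());

    println!("offset = {}, len = {}", offset_in(&s, world), world.len());
}
```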

With Rust’s .. range syntax, if you want to start at index 0, you can drop the value before the two periods. In other words, these are equal:

#![allow(unused)]
fn main() {
    let s = String::from("hello");

    let slice = &s[0..2];
    let slice = &s[..2];
}

By the same token, if your slice includes the last byte of the String, you can drop the trailing number. That means these are equal:

#![allow(unused)]
fn main() {
    let s = String::from("hello");

    let len = s.len();

    let slice = &s[3..len];
    let slice = &s[3..];
}

You can also drop both values to take a slice of the entire string. So, these are equal:

#![allow(unused)]
fn main() {
    let s = String::from("hello");

    let len = s.len();

    let slice = &s[0..len];
    let slice = &s[..];
}

Note: String slice range indices must occur at valid UTF-8 character boundaries. If you attempt to create a string slice in the middle of a multibyte character, your program will exit with an error.
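To see what a character boundary means in practice, consider the sketch below. It steps outside this section's ASCII-only assumption on purpose. The helper `checked_prefix` is a hypothetical name; it uses the standard `str::get` method, which returns `None` where the indexing syntax would panic.

```rust
// Hypothetical helper: a non-panicking way to ask for the first `end` bytes.
fn checked_prefix(s: &str, end: usize) -> Option<&str> {
    s.get(..end)
}

fn main() {
    // U+00E9 (e with an acute accent) takes two bytes in UTF-8,
    // so it occupies byte indices 1 and 2 of this string.
    let s = "h\u{e9}llo";
    assert_eq!(s.len(), 6);

    // Index 2 falls in the middle of that character.
    assert!(!s.is_char_boundary(2));
    assert_eq!(checked_prefix(s, 2), None);

    // Index 3 is the start of the next character, so the slice is valid.
    assert!(s.is_char_boundary(3));
    assert_eq!(checked_prefix(s, 3), Some("h\u{e9}"));

    // By contrast, `&s[0..2]` would panic at runtime.
}
```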

With all this information in mind, let’s rewrite first_word to return a slice. The type that signifies “string slice” is written as &str:

fn first_word(s: &String) -> &str {
    let bytes = s.as_bytes();

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return &s[0..i];
        }
    }

    &s[..]
}

fn main() {}

We get the index for the end of the word the same way we did in Listing 4-7, by looking for the first occurrence of a space. When we find a space, we return a string slice using the start of the string and the index of the space as the starting and ending indices.

Now when we call first_word, we get back a single value that is tied to the underlying data. The value is made up of a reference to the starting point of the slice and the number of elements in the slice.
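For comparison, the same behavior can be sketched with the standard library's `str::find`, which returns the byte index of the first match as an `Option<usize>`. This is an alternative formulation offered for illustration, not the version this chapter builds on.

```rust
// Alternative sketch: let `str::find` locate the first space instead of
// looping over the bytes by hand.
fn first_word(s: &String) -> &str {
    match s.find(' ') {
        // A space was found at byte index `i`: return everything before it.
        Some(i) => &s[..i],
        // No space: the whole string is one word.
        None => &s[..],
    }
}

fn main() {
    let two_words = String::from("hello world");
    let one_word = String::from("hello");

    assert_eq!(first_word(&two_words), "hello");
    assert_eq!(first_word(&one_word), "hello");
}
```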

Returning a slice would also work for a second_word function:

fn second_word(s: &String) -> &str {
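The chapter gives only the signature. One possible body is sketched below, under the assumptions that words are separated by single ASCII spaces and that an empty slice is an acceptable result when there is no second word; neither choice comes from the original text.

```rust
// A possible body for `second_word`; the behavior for missing words is an
// assumption made for this sketch.
fn second_word(s: &String) -> &str {
    let bytes = s.as_bytes();
    let mut start = None;

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            match start {
                // First space: the second word begins just after it.
                None => start = Some(i + 1),
                // Second space: the second word ends here.
                Some(begin) => return &s[begin..i],
            }
        }
    }

    match start {
        // Only one space was found: the second word runs to the end.
        Some(begin) => &s[begin..],
        // No space at all: there is no second word.
        None => "",
    }
}

fn main() {
    assert_eq!(second_word(&String::from("hello world")), "world");
    assert_eq!(second_word(&String::from("one two three")), "two");
    assert_eq!(second_word(&String::from("hello")), "");
}
```

Because the result is a single slice tied to `s`, there is no pair of indices for the caller to keep in sync.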

We now have a straightforward API that’s much harder to mess up because the compiler will ensure that the references into the String remain valid. Remember the bug in the program in Listing 4-8, when we got the index to the end of the first word but then cleared the string so our index was invalid? That code was logically incorrect but didn’t show any immediate errors. The problems would show up later if we kept trying to use the first word index with an emptied string. Slices make this bug impossible and let us know much sooner that we have a problem with our code. Using the slice version of first_word will throw a compile-time error:

fn first_word(s: &String) -> &str {
    let bytes = s.as_bytes();

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return &s[0..i];
        }
    }

    &s[..]
}

fn main() {
    let mut s = String::from("hello world");

    let word = first_word(&s);

    s.clear(); // error!

    println!("the first word is: {word}");
}

Here’s the compiler error:

$ cargo run
   Compiling ownership v0.1.0 (file:///projects/ownership)
error[E0502]: cannot borrow `s` as mutable because it is also borrowed as immutable
  --> src/main.rs:18:5
   |
16 |     let word = first_word(&s);
   |                           -- immutable borrow occurs here
17 |
18 |     s.clear(); // error!
   |     ^^^^^^^^^ mutable borrow occurs here
19 |
20 |     println!("the first word is: {word}");
   |                                   ---- immutable borrow later used here

For more information about this error, try `rustc --explain E0502`.
error: could not compile `ownership` (bin "ownership") due to 1 previous error

Recall from the borrowing rules that if we have an immutable reference to something, we cannot also take a mutable reference. Because clear needs to truncate the String, it needs to get a mutable reference. The println! after the call to clear uses the reference in word, so the immutable reference must still be active at that point. Rust disallows the mutable reference in clear and the immutable reference in word from existing at the same time, and compilation fails. Not only has Rust made our API easier to use, but it has also eliminated an entire class of errors at compile time!
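One way to satisfy the borrow checker, sketched here, is to copy the word into its own `String` before clearing the original. The helper `detach_first_word` is a hypothetical name; the approach trades an allocation for independence from `s`. Another option is simply to finish using `word` before calling `clear`.

```rust
fn first_word(s: &String) -> &str {
    let bytes = s.as_bytes();

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return &s[0..i];
        }
    }

    &s[..]
}

// Hypothetical helper: return an owned copy of the first word and then
// empty the original string.
fn detach_first_word(s: &mut String) -> String {
    // `to_string` copies the bytes, so the immutable borrow taken by
    // `first_word` ends at the end of this statement.
    let word = first_word(s).to_string();

    // No immutable borrow is alive here, so the mutable borrow is allowed.
    s.clear();

    word
}

fn main() {
    let mut s = String::from("hello world");
    let word = detach_first_word(&mut s);

    println!("the first word is: {word}");
    assert!(s.is_empty());
}
```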

String Literals as Slices

Recall that we talked about string literals being stored inside the binary. Now that we know about slices, we can properly understand string literals:

#![allow(unused)]
fn main() {
    let s = "Hello, world!";
}

The type of s here is &str: It’s a slice pointing to that specific point of the binary. This is also why string literals are immutable; &str is an immutable reference.

String Slices as Parameters

Knowing that you can take slices of literals and String values leads us to one more improvement on first_word, and that’s its signature:

fn first_word(s: &String) -> &str {

A more experienced Rustacean would write the signature shown in Listing 4-9 instead because it allows us to use the same function on both &String values and &str values.

fn first_word(s: &str) -> &str {

If we have a string slice, we can pass that directly. If we have a String, we can pass a slice of the String or a reference to the String. This flexibility takes advantage of deref coercions, a feature we will cover in the “Using Deref Coercions in Functions and Methods” section of Chapter 15.

Defining a function to take a string slice instead of a reference to a String makes our API more general and useful without losing any functionality:

fn first_word(s: &str) -> &str {
    let bytes = s.as_bytes();

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return &s[0..i];
        }
    }

    &s[..]
}

fn main() {
    let my_string = String::from("hello world");

    // `first_word` works on slices of `String`s, whether partial or whole.
    let word = first_word(&my_string[0..6]);
    let word = first_word(&my_string[..]);
    // `first_word` also works on references to `String`s, which are equivalent
    // to whole slices of `String`s.
    let word = first_word(&my_string);

    let my_string_literal = "hello world";

    // `first_word` works on slices of string literals, whether partial or
    // whole.
    let word = first_word(&my_string_literal[0..6]);
    let word = first_word(&my_string_literal[..]);

    // Because string literals *are* string slices already,
    // this works too, without the slice syntax!
    let word = first_word(my_string_literal);
}

Other Slices

String slices, as you might imagine, are specific to strings. But there’s a more general slice type too. Consider this array:

#![allow(unused)]
fn main() {
    let a = [1, 2, 3, 4, 5];
}

Just as we might want to refer to part of a string, we might want to refer to part of an array. We’d do so like this:

#![allow(unused)]
fn main() {
    let a = [1, 2, 3, 4, 5];

    let slice = &a[1..3];

    assert_eq!(slice, &[2, 3]);
}

This slice has the type &[i32]. It works the same way as string slices do, by storing a reference to the first element and a length. You’ll use this kind of slice for all sorts of other collections. We’ll discuss these collections in detail when we talk about vectors in Chapter 8.
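As with `&str` parameters, a function that accepts `&[i32]` works with more than one kind of collection. The sketch below uses a hypothetical `sum` helper to show the same function handling part of an array, a whole array, and a vector.

```rust
// Hypothetical helper: add up the elements of any `i32` slice.
fn sum(values: &[i32]) -> i32 {
    let mut total = 0;

    for &value in values.iter() {
        total += value;
    }

    total
}

fn main() {
    let a = [1, 2, 3, 4, 5];
    let v = vec![10, 20, 30];

    // Part of an array: elements at indices 1 and 2.
    assert_eq!(sum(&a[1..3]), 5);
    // A reference to the whole array coerces to a slice.
    assert_eq!(sum(&a), 15);
    // Vectors can be sliced with the same range syntax...
    assert_eq!(sum(&v[..2]), 30);
    // ...and a reference to a vector coerces to a slice of all its elements.
    assert_eq!(sum(&v), 60);
}
```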

Summary

The concepts of ownership, borrowing, and slices ensure memory safety in Rust programs at compile time. The Rust language gives you control over your memory usage in the same way as other systems programming languages. But having the owner of data automatically clean up that data when the owner goes out of scope means you don’t have to write and debug extra code to get this control.

Ownership affects how lots of other parts of Rust work, so we’ll talk about these concepts further throughout the rest of the book. Let’s move on to Chapter 5 and look at grouping pieces of data together in a struct.
