使用 String 存储 UTF-8 编码的文本
Storing UTF-8 Encoded Text with Strings
我们在第 4 章讨论过字符串,但现在我们要更深入地研究它们。新 Rust 用户通常会在字符串上遇到困难,原因有三点:Rust 倾向于暴露可能的错误、字符串是比许多程序员想象中更复杂的数据结构,以及 UTF-8。当你从其他编程语言转到 Rust 时,这些因素结合在一起可能会让你觉得困难。
We talked about strings in Chapter 4, but we’ll look at them in more depth now. New Rustaceans commonly get stuck on strings for a combination of three reasons: Rust’s propensity for exposing possible errors, strings being a more complicated data structure than many programmers give them credit for, and UTF-8. These factors combine in a way that can seem difficult when you’re coming from other programming languages.
我们在集合的上下文中讨论字符串,是因为字符串被实现为字节集合,并提供了一些当这些字节被解释为文本时提供有用功能的方法。在本节中,我们将讨论每个集合类型都有的对 String 的操作,例如创建、更新和读取。我们还将讨论 String 与其他集合的不同之处,即由于人类和计算机解释 String 数据的方式不同,对 String 进行索引是如何变得复杂的。
We discuss strings in the context of collections because strings are implemented as a collection of bytes, plus some methods to provide useful functionality when those bytes are interpreted as text. In this section, we’ll talk about the operations on String that every collection type has, such as creating, updating, and reading. We’ll also discuss the ways in which String is different from the other collections, namely, how indexing into a String is complicated by the differences between how people and computers interpret String data.
定义字符串
Defining Strings
我们首先定义术语“字符串”的含义。Rust 的核心语言中只有一种字符串类型,即字符串切片 str,通常以其借用形式 &str 出现。在第 4 章中,我们讨论了字符串切片,它们是对存储在别处的某些 UTF-8 编码的字符串数据的引用。例如,字符串字面量存储在程序的二进制文件中,因此它们是字符串切片。
We’ll first define what we mean by the term string. Rust has only one string type in the core language, which is the string slice str that is usually seen in its borrowed form, &str. In Chapter 4, we talked about string slices, which are references to some UTF-8 encoded string data stored elsewhere. String literals, for example, are stored in the program’s binary and are therefore string slices.
String 类型由 Rust 标准库提供,而不是编码在核心语言中,它是一种可增长、可变、拥有所有权且采用 UTF-8 编码的字符串类型。当 Rust 用户在 Rust 中提到“字符串”时,他们可能指的是 String 或字符串切片 &str 类型,而不仅仅是其中一种。虽然本节主要讨论 String,但 Rust 标准库中大量使用了这两种类型,且 String 和字符串切片都是 UTF-8 编码的。
The String type, which is provided by Rust’s standard library rather than coded into the core language, is a growable, mutable, owned, UTF-8 encoded string type. When Rustaceans refer to “strings” in Rust, they might be referring to either the String or the string slice &str types, not just one of those types. Although this section is largely about String, both types are used heavily in Rust’s standard library, and both String and string slices are UTF-8 encoded.
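As a rough illustration of how the two types relate, the sketch below passes both a string literal and a borrowed String to the same function. The helper name `byte_len` is hypothetical and exists only for this example.

```rust
// Hypothetical helper: accepts any string slice, whether it comes from
// a literal or is borrowed from an owned String.
fn byte_len(text: &str) -> usize {
    text.len()
}

fn main() {
    // A string literal is a &str that points into the program's binary.
    let literal: &str = "hello";

    // A String owns its UTF-8 data and can grow.
    let mut owned: String = String::from("hello");
    owned.push_str(", world");

    // A &String can be passed where a &str is expected.
    println!("{}", byte_len(literal));
    println!("{}", byte_len(&owned));
}
```

Taking `&str` as the parameter type is a common choice because it lets callers pass either kind of string without giving up ownership.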
创建一个新的 String
Creating a New String
Vec<T> 上的许多相同操作也可用于 String,因为 String 实际上被实现为对字节 vector 的包装,并具有一些额外的保证、限制和功能。Vec<T> 和 String 以相同方式工作的函数示例是创建实例的 new 函数,如示例 8-11 所示。
Many of the same operations available with Vec<T> are available with String as well because String is actually implemented as a wrapper around a vector of bytes with some extra guarantees, restrictions, and capabilities. An example of a function that works the same way with Vec<T> and String is the new function to create an instance, shown in Listing 8-11.
fn main() {
    let mut s = String::new();
}
这一行创建了一个名为 s 的新的空字符串,然后我们可以向其中加载数据。通常,我们会希望字符串以一些初始数据开始。为此,我们使用 to_string 方法,该方法可用于任何实现了 Display trait 的类型,字符串字面量就是如此。示例 8-12 展示了两个例子。
This line creates a new, empty string called s, into which we can then load data. Often, we’ll have some initial data with which we want to start the string. For that, we use the to_string method, which is available on any type that implements the Display trait, as string literals do. Listing 8-12 shows two examples.
fn main() {
    let data = "initial contents";

    let s = data.to_string();

    // The method also works on a literal directly:
    let s = "initial contents".to_string();
}
这段代码创建了一个包含 initial contents 的字符串。
This code creates a string containing initial contents.
我们也可以使用 String::from 函数从字符串字面量创建 String。示例 8-13 中的代码等同于示例 8-12 中使用 to_string 的代码。
We can also use the function String::from to create a String from a string literal. The code in Listing 8-13 is equivalent to the code in Listing 8-12 that uses to_string.
fn main() {
    let s = String::from("initial contents");
}
因为字符串用途广泛,所以我们可以为字符串使用许多不同的泛型 API,为我们提供了很多选择。其中一些看起来可能多余,但它们都有各自的用武之地!在这种情况下,String::from 和 to_string 做的是相同的事情,所以你选择哪一个纯粹是风格和可读性的问题。
Because strings are used for so many things, we can use many different generic APIs for strings, providing us with a lot of options. Some of them can seem redundant, but they all have their place! In this case, String::from and to_string do the same thing, so which one you choose is a matter of style and readability.
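To make the equivalence concrete, here is a minimal sketch that builds the same String several ways and compares the results. The function name `build_all` is made up for this example, and `to_owned` is an additional standard-library method that the text above does not mention.

```rust
// Hypothetical helper: builds the same String three different ways.
// `to_owned` is an extra option beyond the two discussed in the text.
fn build_all(text: &str) -> (String, String, String) {
    (text.to_string(), String::from(text), text.to_owned())
}

fn main() {
    let (a, b, c) = build_all("initial contents");

    // All three values hold the same contents.
    assert_eq!(a, b);
    assert_eq!(b, c);
    println!("{a}");
}
```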
请记住,字符串是 UTF-8 编码的,因此我们可以将任何正确编码的数据包含在其中,如示例 8-14 所示。
Remember that strings are UTF-8 encoded, so we can include any properly encoded data in them, as shown in Listing 8-14.
fn main() {
    let hello = String::from("السلام عليكم");
    let hello = String::from("Dobrý den");
    let hello = String::from("Hello");
    let hello = String::from("שלום");
    let hello = String::from("नमस्ते");
    let hello = String::from("こんにちは");
    let hello = String::from("안녕하세요");
    let hello = String::from("你好");
    let hello = String::from("Olá");
    let hello = String::from("Здравствуйте");
    let hello = String::from("Hola");
}
所有这些都是有效的 String 值。
All of these are valid String values.
更新 String
Updating a String
如果向 String 中推入更多数据,它的尺寸可以增长,其内容也可以改变,就像 Vec<T> 的内容一样。此外,你可以方便地使用 + 运算符或 format! 宏来拼接 String 值。
A String can grow in size and its contents can change, just like the contents of a Vec<T>, if you push more data into it. In addition, you can conveniently use the + operator or the format! macro to concatenate String values.
使用 push_str 或 push 追加内容
Appending with push_str or push
我们可以通过使用 push_str 方法追加字符串切片来增加 String 的长度,如示例 8-15 所示。
We can grow a String by using the push_str method to append a string slice, as shown in Listing 8-15.
fn main() {
    let mut s = String::from("foo");
    s.push_str("bar");
}
在这两行代码之后,s 将包含 foobar。push_str 方法采用字符串切片,因为我们不一定希望获取参数的所有权。例如,在示例 8-16 的代码中,我们希望在将 s2 的内容追加到 s1 后,仍然能够使用 s2。
After these two lines, s will contain foobar. The push_str method takes a string slice because we don’t necessarily want to take ownership of the parameter. For example, in the code in Listing 8-16, we want to be able to use s2 after appending its contents to s1.
fn main() {
    let mut s1 = String::from("foo");
    let s2 = "bar";
    s1.push_str(s2);
    println!("s2 is {s2}");
}
如果 push_str 方法获取了 s2 的所有权,我们就无法在最后一行打印它的值。然而,这段代码如我们所愿地工作!
If the push_str method took ownership of s2, we wouldn’t be able to print its value on the last line. However, this code works as we’d expect!
push 方法将单个字符作为参数,并将其添加到 String 中。示例 8-17 使用 push 方法将字母 l 添加到 String 中。
The push method takes a single character as a parameter and adds it to the String. Listing 8-17 adds the letter l to a String using the push method.
fn main() {
    let mut s = String::from("lo");
    s.push('l');
}
结果,s 将包含 lol。
As a result, s will contain lol.
使用 + 或 format! 拼接
Concatenating with + or format!
通常,你会想要组合两个现有的字符串。一种方法是使用 + 运算符,如示例 8-18 所示。
Often, you’ll want to combine two existing strings. One way to do so is to use the + operator, as shown in Listing 8-18.
fn main() {
    let s1 = String::from("Hello, ");
    let s2 = String::from("world!");
    let s3 = s1 + &s2; // note s1 has been moved here and can no longer be used
}
字符串 s3 将包含 Hello, world!。s1 在相加后不再有效的原因,以及我们使用 s2 引用的原因,与使用 + 运算符时调用的方法签名有关。+ 运算符使用 add 方法,其签名看起来像这样:
The string s3 will contain Hello, world!. The reason s1 is no longer valid after the addition, and the reason we used a reference to s2, has to do with the signature of the method that’s called when we use the + operator. The + operator uses the add method, whose signature looks something like this:
fn add(self, s: &str) -> String {
在标准库中,你会看到 add 是使用泛型和关联类型定义的。在这里,我们替换了具体类型,这正是我们使用 String 值调用此方法时发生的情况。我们将在第 10 章讨论泛型。这个签名提供了我们理解 + 运算符棘手部分所需的线索。
In the standard library, you’ll see add defined using generics and associated types. Here, we’ve substituted in concrete types, which is what happens when we call this method with String values. We’ll discuss generics in Chapter 10. This signature gives us the clues we need in order to understand the tricky bits of the + operator.
首先,s2 有一个 &,这意味着我们将第二个字符串的引用添加到第一个字符串中。这是因为 add 函数中的 s 参数:我们只能将字符串切片添加到 String;我们不能将两个 String 值相加。但是等一下——&s2 的类型是 &String,而不是 add 的第二个参数中指定的 &str。那么,为什么示例 8-18 可以编译呢?
First, s2 has an &, meaning that we’re adding a reference to the second string to the first string. This is because of the s parameter in the add function: We can only add a string slice to a String; we can’t add two String values together. But wait—the type of &s2 is &String, not &str, as specified in the second parameter to add. So, why does Listing 8-18 compile?
我们能够在 add 调用中使用 &s2 的原因是编译器可以将 &String 参数强制转换为 &str。当我们调用 add 方法时,Rust 使用了解引用强制转换(deref coercion),在这里它将 &s2 转换为 &s2[..]。我们将在第 15 章更深入地讨论解引用强制转换。因为 add 不获取 s 参数的所有权,所以在此操作之后 s2 仍将是一个有效的 String。
The reason we’re able to use &s2 in the call to add is that the compiler can coerce the &String argument into a &str. When we call the add method, Rust uses a deref coercion, which here turns &s2 into &s2[..]. We’ll discuss deref coercion in more depth in Chapter 15. Because add does not take ownership of the s parameter, s2 will still be a valid String after this operation.
其次,我们可以在签名中看到 add 获取了 self 的所有权,因为 self 没有 &。这意味着示例 8-18 中的 s1 将被移动到 add 调用中,并且在那之后将不再有效。因此,虽然 let s3 = s1 + &s2; 看起来像它会复制两个字符串并创建一个新字符串,但该语句实际上获取了 s1 的所有权,追加了 s2 内容的副本,然后返回结果的所有权。换句话说,它看起来像是在进行大量的复制,但事实并非如此;该实现比复制更有效。
Second, we can see in the signature that add takes ownership of self because self does not have an &. This means s1 in Listing 8-18 will be moved into the add call and will no longer be valid after that. So, although let s3 = s1 + &s2; looks like it will copy both strings and create a new one, this statement actually takes ownership of s1, appends a copy of the contents of s2, and then returns ownership of the result. In other words, it looks like it’s making a lot of copies, but it isn’t; the implementation is more efficient than copying.
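The ownership behavior described above can be sketched as follows. The function name `join` is hypothetical; it simply mirrors the shape of the add signature, taking the first string by value and the second by reference. Cloning s1 before the call is one possible way to keep using it afterward, at the cost of an extra allocation.

```rust
// Hypothetical helper shaped like `add`: the first argument is moved in,
// the second is only borrowed.
fn join(first: String, second: &str) -> String {
    first + second
}

fn main() {
    let s1 = String::from("Hello, ");
    let s2 = String::from("world!");

    // Pass a clone so that s1 itself is not moved.
    let s3 = join(s1.clone(), &s2);

    // s1 and s2 are both still valid here.
    println!("{s1}{s2} -> {s3}");
}
```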
如果我们需要拼接多个字符串,+ 运算符的行为会变得难以处理:
If we need to concatenate multiple strings, the behavior of the + operator gets unwieldy:
fn main() {
    let s1 = String::from("tic");
    let s2 = String::from("tac");
    let s3 = String::from("toe");

    let s = s1 + "-" + &s2 + "-" + &s3;
}
此时,s 将是 tic-tac-toe。由于所有的 + 和 " 字符,很难看出发生了什么。为了以更复杂的方式组合字符串,我们可以改用 format! 宏:
At this point, s will be tic-tac-toe. With all of the + and " characters, it’s difficult to see what’s going on. For combining strings in more complicated ways, we can instead use the format! macro:
fn main() {
    let s1 = String::from("tic");
    let s2 = String::from("tac");
    let s3 = String::from("toe");

    let s = format!("{s1}-{s2}-{s3}");
}
这段代码也将 s 设置为 tic-tac-toe。format! 宏的工作原理类似于 println!,但它不是将输出打印到屏幕上,而是返回一个包含内容的 String。使用 format! 的版本代码更容易阅读,并且 format! 宏生成的代码使用引用,因此该调用不会获取其任何参数的所有权。
This code also sets s to tic-tac-toe. The format! macro works like println!, but instead of printing the output to the screen, it returns a String with the contents. The version of the code using format! is much easier to read, and the code generated by the format! macro uses references so that this call doesn’t take ownership of any of its parameters.
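One way to check that format! leaves its arguments intact is to keep using one of them afterward. The sketch below does that with a made-up function, `format_and_keep`, which returns both the formatted result and the original first argument.

```rust
// Hypothetical helper: formats three strings and then hands the first
// one back to the caller, which only compiles because format! borrowed it.
fn format_and_keep(s1: String, s2: String, s3: String) -> (String, String) {
    let joined = format!("{s1}-{s2}-{s3}");
    // s1 is still owned here, so it can be returned.
    (joined, s1)
}

fn main() {
    let (joined, first) = format_and_keep(
        String::from("tic"),
        String::from("tac"),
        String::from("toe"),
    );
    println!("{joined} starts with {first}");
}
```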
字符串索引
Indexing into Strings
在许多其他编程语言中,通过索引引用访问字符串中的单个字符是一种有效且常见的操作。但是,如果你尝试在 Rust 中使用索引语法访问 String 的某些部分,你将得到一个错误。考虑示例 8-19 中的无效代码。
In many other programming languages, accessing individual characters in a string by referencing them by index is a valid and common operation. However, if you try to access parts of a String using indexing syntax in Rust, you’ll get an error. Consider the invalid code in Listing 8-19.
fn main() {
    let s1 = String::from("hi");
    let h = s1[0];
}
这段代码将导致以下错误:
$ cargo run
   Compiling collections v0.1.0 (file:///projects/collections)
error[E0277]: the type `str` cannot be indexed by `{integer}`
 --> src/main.rs:3:16
  |
3 |     let h = s1[0];
  |                ^ string indices are ranges of `usize`
  |
  = help: the trait `SliceIndex<str>` is not implemented for `{integer}`
  = note: you can use `.chars().nth()` or `.bytes().nth()`
          for more information, see chapter 8 in The Book: <https://doc.rust-lang.org/book/ch08-02-strings.html#indexing-into-strings>
  = help: the following other types implement trait `SliceIndex<T>`:
            `usize` implements `SliceIndex<ByteStr>`
            `usize` implements `SliceIndex<[T]>`
  = note: required for `String` to implement `Index<{integer}>`

For more information about this error, try `rustc --explain E0277`.
error: could not compile `collections` (bin "collections") due to 1 previous error
错误说明了一切:Rust 字符串不支持索引。但为什么不支持呢?为了回答这个问题,我们需要讨论 Rust 如何在内存中存储字符串。
The error tells the story: Rust strings don’t support indexing. But why not? To answer that question, we need to discuss how Rust stores strings in memory.
内部表示
Internal Representation
String 是对 Vec<u8> 的包装。让我们看看示例 8-14 中一些正确编码的 UTF-8 示例字符串。首先,看这个:
A String is a wrapper over a Vec<u8>. Let’s look at some of our properly encoded UTF-8 example strings from Listing 8-14. First, this one:
fn main() {
    let hello = String::from("Hola");
    let len = hello.len();
}
在这种情况下,len 将是 4,这意味着存储字符串 "Hola" 的 vector 长度为 4 字节。当采用 UTF-8 编码时,这些字母中的每一个都占用 1 字节。然而,下面这一行可能会让你感到惊讶(请注意,这个字符串以大写的西里尔字母 Ze 开头,而不是数字 3):
In this case, len will be 4, which means the vector storing the string "Hola" is 4 bytes long. Each of these letters takes 1 byte when encoded in UTF-8. The following line, however, may surprise you (note that this string begins with the capital Cyrillic letter Ze, not the number 3):
fn main() {
    let hello = String::from("Здравствуйте");
}
如果问你这个字符串有多长,你可能会说是 12。事实上,Rust 的答案是 24:这是在 UTF-8 中编码“Здравствуйте”所需的字节数,因为该字符串中的每个 Unicode 标量值占用 2 字节的存储空间。因此,对字符串字节的索引并不总是对应于一个有效的 Unicode 标量值。为了演示,考虑这段无效的 Rust 代码:
If you were asked how long the string is, you might say 12. In fact, Rust’s answer is 24: That’s the number of bytes it takes to encode “Здравствуйте” in UTF-8, because each Unicode scalar value in that string takes 2 bytes of storage. Therefore, an index into the string’s bytes will not always correlate to a valid Unicode scalar value. To demonstrate, consider this invalid Rust code:
let hello = "Здравствуйте";
let answer = &hello[0];
你已经知道 answer 不会是 З(第一个字母)。当以 UTF-8 编码时,З 的第一个字节是 208,第二个字节是 151,所以看起来 answer 实际上应该是 208,但 208 本身并不是一个有效的字符。如果用户请求该字符串的第一个字母,返回 208 可能不是他们想要的;然而,这是 Rust 在字节索引 0 处拥有的唯一数据。即使字符串仅包含拉丁字母,用户通常也不希望返回字节值:如果 &"hi"[0] 是返回字节值的有效代码,它将返回 104,而不是 h。
You already know that answer will not be З, the first letter. When encoded in UTF-8, the first byte of З is 208 and the second is 151, so it would seem that answer should in fact be 208, but 208 is not a valid character on its own. Returning 208 is likely not what a user would want if they asked for the first letter of this string; however, that’s the only data that Rust has at byte index 0. Users generally don’t want the byte value returned, even if the string contains only Latin letters: If &"hi"[0] were valid code that returned the byte value, it would return 104, not h.
因此,为了避免返回意外值并导致可能不会立即发现的错误,Rust 根本不编译这段代码,并在开发过程的早期就防止了误解。
The answer, then, is that to avoid returning an unexpected value and causing bugs that might not be discovered immediately, Rust doesn’t compile this code at all and prevents misunderstandings early in the development process.
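Although direct indexing is rejected, the standard library lets a program ask explicitly for bytes or for scalar values. The following is a small sketch of that idea; the function name `first_byte_and_char` is invented for this example.

```rust
// Hypothetical helper: returns the first byte and the first Unicode
// scalar value of a string, or None for each if the string is empty.
fn first_byte_and_char(text: &str) -> (Option<u8>, Option<char>) {
    (text.bytes().next(), text.chars().next())
}

fn main() {
    let hello = "Здравствуйте";

    // len counts bytes; chars().count() counts Unicode scalar values.
    println!("{} bytes, {} chars", hello.len(), hello.chars().count());

    let (byte, ch) = first_byte_and_char(hello);
    println!("first byte: {byte:?}, first char: {ch:?}");
}
```

Both accessors return an Option, so an empty string is handled without a panic.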
字节、标量值和字形集
Bytes, Scalar Values, and Grapheme Clusters
关于 UTF-8 的另一点是,从 Rust 的角度来看,实际上有三种相关的方法可以查看字符串:作为字节、标量值和字形集(最接近我们称之为 字母 的东西)。
Another point about UTF-8 is that there are actually three relevant ways to look at strings from Rust’s perspective: as bytes, scalar values, and grapheme clusters (the closest thing to what we would call letters).
如果我们看用天城体书写的印地语单词“नमस्ते”,它被存储为一个 u8 值的 vector,看起来像这样:
If we look at the Hindi word “नमस्ते” written in the Devanagari script, it is stored as a vector of u8 values that looks like this:
[224, 164, 168, 224, 164, 174, 224, 164, 184, 224, 165, 141, 224, 164, 164,
224, 165, 135]
那是 18 个字节,这是计算机最终存储此数据的方式。如果我们把它们看作 Unicode 标量值(即 Rust 的 char 类型),那些字节看起来像这样:
That’s 18 bytes and is how computers ultimately store this data. If we look at them as Unicode scalar values, which are what Rust’s char type is, those bytes look like this:
['न', 'म', 'स', '्', 'त', 'े']
这里有六个 char 值,但第四个和第六个不是字母:它们是变音符号,单独存在没有意义。最后,如果我们把它们看作字形集,我们会得到人类所说的构成该印地语单词的四个字母:
There are six char values here, but the fourth and sixth are not letters: They’re diacritics that don’t make sense on their own. Finally, if we look at them as grapheme clusters, we’d get what a person would call the four letters that make up the Hindi word:
["न", "म", "स्", "ते"]
Rust 提供了不同的方式来解释计算机存储的原始字符串数据,以便每个程序可以选择它所需的解释方式,无论数据使用的是哪种人类语言。
Rust provides different ways of interpreting the raw string data that computers store so that each program can choose the interpretation it needs, no matter what human language the data is in.
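The byte view and the scalar-value view can be compared using only the standard library, as in this sketch. The function name `byte_and_char_counts` is hypothetical. Grapheme clusters are left out because the standard library does not provide them.

```rust
// Hypothetical helper: returns (number of bytes, number of scalar values).
fn byte_and_char_counts(text: &str) -> (usize, usize) {
    (text.len(), text.chars().count())
}

fn main() {
    let word = "नमस्ते";

    let (bytes, chars) = byte_and_char_counts(word);
    println!("{bytes} bytes, {chars} scalar values");

    // Print the code point of each scalar value, including the two
    // diacritics that are not letters on their own.
    for c in word.chars() {
        println!("U+{:04X}", c as u32);
    }
}
```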
Rust 不允许我们通过索引 String 来获取字符的最后一个原因是索引操作被期望始终花费常数时间 (O(1))。但在 String 上无法保证该性能,因为 Rust 必须从头开始遍历内容到索引处,以确定有多少个有效的字符。
A final reason Rust doesn’t allow us to index into a String to get a character is that indexing operations are expected to always take constant time (O(1)). But it isn’t possible to guarantee that performance with a String, because Rust would have to walk through the contents from the beginning to the index to determine how many valid characters there were.
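The linear walk mentioned above can be made visible with char_indices, which yields each scalar value together with the byte offset where it starts. This is only a sketch, and `byte_offset_of_char` is a name chosen for illustration.

```rust
// Hypothetical helper: finds the byte offset at which the n-th scalar
// value starts. The iterator has to step through every earlier character,
// so the cost grows with n rather than being constant.
fn byte_offset_of_char(text: &str, n: usize) -> Option<usize> {
    text.char_indices().nth(n).map(|(offset, _)| offset)
}

fn main() {
    let hello = "Здравствуйте";
    if let Some(offset) = byte_offset_of_char(hello, 2) {
        println!("the third scalar value starts at byte {offset}");
    }
}
```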
字符串切片
Slicing Strings
对字符串进行索引通常是一个坏主意,因为不清楚字符串索引操作的返回类型应该是什么:字节值、字符、字形集或字符串切片。因此,如果你确实需要使用索引来创建字符串切片,Rust 会要求你更加明确。
Indexing into a string is often a bad idea because it’s not clear what the return type of the string-indexing operation should be: a byte value, a character, a grapheme cluster, or a string slice. If you really need to use indices to create string slices, therefore, Rust asks you to be more specific.
你可以使用带有范围的 [] 来创建一个包含特定字节的字符串切片,而不是使用带有单个数字的 [] 进行索引:
Rather than indexing using [] with a single number, you can use [] with a range to create a string slice containing particular bytes:
#![allow(unused)]
fn main() {
    let hello = "Здравствуйте";

    let s = &hello[0..4];
}
在这里,s 将是一个包含字符串前 4 个字节的 &str。前面我们提到这些字符每个都是 2 字节,这意味着 s 将是 Зд。
Here, s will be a &str that contains the first 4 bytes of the string. Earlier, we mentioned that each of these characters was 2 bytes, which means s will be Зд.
如果我们尝试使用类似 &hello[0..1] 的方式仅切分字符的部分字节,Rust 在运行时会发生恐慌,就像在访问 vector 中的无效索引一样:
If we were to try to slice only part of a character’s bytes with something like &hello[0..1], Rust would panic at runtime in the same way as if an invalid index were accessed in a vector:
$ cargo run
   Compiling collections v0.1.0 (file:///projects/collections)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.43s
     Running `target/debug/collections`
thread 'main' panicked at src/main.rs:4:19:
byte index 1 is not a char boundary; it is inside 'З' (bytes 0..2) of `Здравствуйте`
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
使用范围创建字符串切片时应格外小心,因为这样做可能会导致程序崩溃。
You should use caution when creating string slices with ranges, because doing so can crash your program.
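When the byte range is not known to be valid in advance, one cautious option is the get method on str, which returns None instead of panicking. The sketch below also uses is_char_boundary to test a single offset. The function name `safe_prefix` is hypothetical.

```rust
// Hypothetical helper: returns the first `end` bytes as a string slice,
// or None if `end` is out of range or falls inside a character.
fn safe_prefix(text: &str, end: usize) -> Option<&str> {
    text.get(0..end)
}

fn main() {
    let hello = "Здравствуйте";

    println!("{:?}", safe_prefix(hello, 4)); // a valid boundary
    println!("{:?}", safe_prefix(hello, 1)); // inside the first character

    // is_char_boundary reports whether a byte offset is safe to slice at.
    println!("{}", hello.is_char_boundary(1));
}
```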
遍历字符串
Iterating Over Strings
对字符串片段进行操作的最佳方式是明确你是想要字符还是字节。对于单个 Unicode 标量值,使用 chars 方法。在“Зд”上调用 chars 会分离并返回两个 char 类型的值,你可以遍历结果以访问每个元素:
The best way to operate on pieces of strings is to be explicit about whether you want characters or bytes. For individual Unicode scalar values, use the chars method. Calling chars on “Зд” separates out and returns two values of type char, and you can iterate over the result to access each element:
#![allow(unused)]
fn main() {
    for c in "Зд".chars() {
        println!("{c}");
    }
}
这段代码将打印以下内容:
This code will print the following:
З
д
或者,bytes 方法返回每个原始字节,这可能适合你的领域需求:
Alternatively, the bytes method returns each raw byte, which might be appropriate for your domain:
#![allow(unused)]
fn main() {
    for b in "Зд".bytes() {
        println!("{b}");
    }
}
这段代码将打印构成该字符串的 4 个字节:
This code will print the 4 bytes that make up this string:
208
151
208
180
但请务必记住,有效的 Unicode 标量值可能由 1 个以上的字节组成。
But be sure to remember that valid Unicode scalar values may be made up of more than 1 byte.
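To see how many bytes each scalar value occupies, a program can pair every char with the result of len_utf8. This is a minimal sketch; `utf8_widths` is a made-up name, and the sample string is chosen only to mix 1-, 2-, and 3-byte characters.

```rust
// Hypothetical helper: pairs each scalar value with the number of bytes
// it needs when encoded as UTF-8.
fn utf8_widths(text: &str) -> Vec<(char, usize)> {
    text.chars().map(|c| (c, c.len_utf8())).collect()
}

fn main() {
    for (c, width) in utf8_widths("aЗ你") {
        println!("{c} takes {width} byte(s)");
    }
}
```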
从字符串中获取字形集(如天城体脚本)非常复杂,因此标准库不提供此功能。如果你需要此功能,可以在 crates.io 上找到相关的 crate。
Getting grapheme clusters from strings, as with the Devanagari script, is complex, so this functionality is not provided by the standard library. Crates are available on crates.io if this is the functionality you need.
处理字符串的复杂性
Handling the Complexities of Strings
总而言之,字符串很复杂。不同的编程语言对于如何向程序员呈现这种复杂性做出了不同的选择。Rust 选择将正确处理 String 数据作为所有 Rust 程序的默认行为,这意味着程序员必须预先花更多心思处理 UTF-8 数据。这种权衡比其他编程语言显现出了更多的字符串复杂性,但它可以防止你在开发生命周期的后期不得不处理涉及非 ASCII 字符的错误。
To summarize, strings are complicated. Different programming languages make different choices about how to present this complexity to the programmer. Rust has chosen to make the correct handling of String data the default behavior for all Rust programs, which means programmers have to put more thought into handling UTF-8 data up front. This trade-off exposes more of the complexity of strings than is apparent in other programming languages, but it prevents you from having to handle errors involving non-ASCII characters later in your development life cycle.
好消息是,标准库提供了许多基于 String 和 &str 类型构建的功能,以帮助正确处理这些复杂情况。请务必查看文档中非常有用的方法,例如用于在字符串中搜索的 contains 和用于将字符串的一部分替换为另一个字符串的 replace。
The good news is that the standard library offers a lot of functionality built off the String and &str types to help handle these complex situations correctly. Be sure to check out the documentation for useful methods like contains for searching in a string and replace for substituting parts of a string with another string.
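As a brief illustration of the two methods just named, the sketch below searches a string with contains and builds a modified copy with replace. The function name `swap_greeting` and the sample text are invented for this example.

```rust
// Hypothetical helper: returns a new String with every "Hello" replaced.
// replace does not modify the original; it allocates a new String.
fn swap_greeting(text: &str) -> String {
    text.replace("Hello", "Goodbye")
}

fn main() {
    let s = String::from("Hello, world!");

    // contains returns true if the pattern occurs anywhere in the string.
    println!("{}", s.contains("world"));

    println!("{}", swap_greeting(&s));
    println!("{s}"); // the original is unchanged
}
```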
让我们切换到稍微简单一点的东西:哈希映射(hash map)!
Let’s switch to something a bit less complex: hash maps!