在哈希映射中存储键和关联的值
Storing Keys with Associated Values in Hash Maps
我们要看的最后一种常用集合是哈希映射(hash map)。HashMap<K, V> 类型通过一个哈希函数(hashing function)存储类型为 K 的键到类型为 V 的值的映射,该函数决定了它如何将这些键和值放入内存。许多编程语言都支持这种数据结构,但它们通常使用不同的名称,例如 hash、map、object、hash table(哈希表)、dictionary(字典)或 associative array(关联数组),这里仅列举一部分。
The last of our common collections is the hash map. The type HashMap<K, V> stores a mapping of keys of type K to values of type V using a hashing function, which determines how it places these keys and values into memory. Many programming languages support this kind of data structure, but they often use a different name, such as hash, map, object, hash table, dictionary, or associative array, just to name a few.
当你不想像在 vector 中那样使用索引,而是想通过可以是任何类型的键来查找数据时,哈希映射非常有用。例如,在一个游戏中,你可以使用哈希映射来跟踪每个队伍的分数,其中每个键是队伍的名称,值是每个队伍的分数。给定一个队伍名称,你就可以检索到它的分数。
Hash maps are useful when you want to look up data not by using an index, as you can with vectors, but by using a key that can be of any type. For example, in a game, you could keep track of each team’s score in a hash map in which each key is a team’s name and the values are each team’s score. Given a team name, you can retrieve its score.
在本节中,我们将介绍哈希映射的基本 API,但在标准库为 HashMap<K, V> 定义的函数中还隐藏着更多好东西。一如既往,请查看标准库文档以获取更多信息。
We’ll go over the basic API of hash maps in this section, but many more goodies are hiding in the functions defined on HashMap<K, V> by the standard library. As always, check the standard library documentation for more information.
创建一个新的哈希映射
Creating a New Hash Map
创建一个空哈希映射的一种方法是使用 new,并使用 insert 添加元素。在示例 8-20 中,我们正在跟踪两个队伍的分数,队伍名称分别为 Blue(蓝队)和 Yellow(黄队)。蓝队初始分数为 10 分,黄队初始分数为 50 分。
One way to create an empty hash map is to use new and to add elements with insert. In Listing 8-20, we’re keeping track of the scores of two teams whose names are Blue and Yellow. The Blue team starts with 10 points, and the Yellow team starts with 50.
fn main() {
use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);
}
请注意,我们首先需要从标准库的集合部分 use(导入)HashMap。在我们这三种常用集合中,这一种是最不常用的,因此它没有包含在 prelude(预导入模块)自动带入作用域的功能中。哈希映射得到的标准库支持也较少;例如,没有内置的宏来构建它们。
Note that we need to first use the HashMap from the collections portion of the standard library. Of our three common collections, this one is the least often used, so it’s not included in the features brought into scope automatically in the prelude. Hash maps also have less support from the standard library; there’s no built-in macro to construct them, for example.
就像 vector 一样,哈希映射将数据存储在堆上。这个 HashMap 的键类型是 String,值类型是 i32。与 vector 类似,哈希映射是同质的:所有的键必须具有相同的类型,所有的值也必须具有相同的类型。
Just like vectors, hash maps store their data on the heap. This HashMap has keys of type String and values of type i32. Like vectors, hash maps are homogeneous: All of the keys must have the same type, and all of the values must have the same type.
访问哈希映射中的值
Accessing Values in a Hash Map
我们可以通过将键提供给 get 方法来从哈希映射中获取值,如示例 8-21 所示。
We can get a value out of the hash map by providing its key to the get method, as shown in Listing 8-21.
fn main() {
use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);
let team_name = String::from("Blue");
let score = scores.get(&team_name).copied().unwrap_or(0);
}
在这里,score 将拥有与蓝队关联的值,结果将是 10。get 方法返回一个 Option<&V>;如果哈希映射中没有该键的值,get 将返回 None。此程序通过调用 copied 来获取 Option<i32> 而不是 Option<&i32>,然后调用 unwrap_or 在 scores 没有该键的条目时将 score 设置为零,以此来处理 Option。
Here, score will have the value that’s associated with the Blue team, and the result will be 10. The get method returns an Option<&V>; if there’s no value for that key in the hash map, get will return None. This program handles the Option by calling copied to get an Option<i32> rather than an Option<&i32>, then unwrap_or to set score to zero if scores doesn’t have an entry for the key.
我们可以使用 for 循环以类似于遍历 vector 的方式遍历哈希映射中的每个键值对:
We can iterate over each key-value pair in a hash map in a similar manner as we do with vectors, using a for loop:
fn main() {
use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);
for (key, value) in &scores {
println!("{key}: {value}");
}
}
这段代码将以任意顺序打印每一对:
This code will print each pair in an arbitrary order:
Yellow: 50
Blue: 10
哈希映射中的所有权管理
Managing Ownership in Hash Maps
对于实现了 Copy trait 的类型(如 i32),值会被复制到哈希映射中。对于拥有所有权的值(如 String),值将被移动(move),而哈希映射将成为这些值的所有者,如示例 8-22 所示。
For types that implement the Copy trait, like i32, the values are copied into the hash map. For owned values like String, the values will be moved and the hash map will be the owner of those values, as demonstrated in Listing 8-22.
fn main() {
use std::collections::HashMap;
let field_name = String::from("Favorite color");
let field_value = String::from("Blue");
let mut map = HashMap::new();
map.insert(field_name, field_value);
// field_name and field_value are invalid at this point, try using them and
// see what compiler error you get!
}
在调用 insert 将变量 field_name 和 field_value 移动到哈希映射之后,我们就无法再使用它们了。
We aren’t able to use the variables field_name and field_value after they’ve been moved into the hash map with the call to insert.
如果我们将值的引用插入哈希映射,值将不会被移动到哈希映射中。引用指向的值必须至少在哈希映射有效期间保持有效。我们将在第 10 章的“使用生命周期验证引用”中详细讨论这些问题。
If we insert references to values into the hash map, the values won’t be moved into the hash map. The values that the references point to must be valid for at least as long as the hash map is valid. We’ll talk more about these issues in “Validating References with Lifetimes” in Chapter 10.
更新哈希映射
Updating a Hash Map
虽然键值对的数量是可增长的,但每个唯一的键在同一时间只能关联一个值(但反之则不然:例如,蓝队和黄队都可以在 scores 哈希映射中存储值 10)。
Although the number of key and value pairs is growable, each unique key can only have one value associated with it at a time (but not vice versa: For example, both the Blue team and the Yellow team could have the value 10 stored in the scores hash map).
当你想要更改哈希映射中的数据时,必须决定如何处理键已经分配了值的情况。你可以用新值替换旧值,完全忽略旧值。你可以保留旧值并忽略新值,只有当键不存在值时才添加新值。或者你可以将旧值和新值结合起来。让我们看看如何实现其中的每一种!
When you want to change the data in a hash map, you have to decide how to handle the case when a key already has a value assigned. You could replace the old value with the new value, completely disregarding the old value. You could keep the old value and ignore the new value, only adding the new value if the key doesn’t already have a value. Or you could combine the old value and the new value. Let’s look at how to do each of these!
覆盖一个值
Overwriting a Value
如果我们向哈希映射插入一个键和一个值,然后再次插入具有不同值的相同键,则与该键关联的值将被替换。尽管示例 8-23 中的代码调用了两次 insert,但哈希映射将只包含一个键值对,因为我们两次都在插入蓝队键的值。
If we insert a key and a value into a hash map and then insert that same key with a different value, the value associated with that key will be replaced. Even though the code in Listing 8-23 calls insert twice, the hash map will only contain one key-value pair because we’re inserting the value for the Blue team’s key both times.
fn main() {
use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Blue"), 25);
println!("{scores:?}");
}
这段代码将打印 {"Blue": 25}。原始值 10 已被覆盖。
This code will print {"Blue": 25}. The original value of 10 has been overwritten.
仅在键不存在时添加键和值
Adding a Key and Value Only If a Key Isn’t Present
通常需要检查哈希映射中是否已存在某个特定键及其值,然后采取以下操作:如果该键确实存在于哈希映射中,则现有值应保持原样;如果该键不存在,则插入它及其值。
It’s common to check whether a particular key already exists in the hash map with a value and then to take the following actions: If the key does exist in the hash map, the existing value should remain the way it is; if the key doesn’t exist, insert it and a value for it.
哈希映射为此提供了一个名为 entry 的特殊 API,它将你想要检查的键作为参数。entry 方法的返回值是一个名为 Entry 的枚举,代表一个可能存在也可能不存在的值。假设我们要检查黄队的键是否有与之关联的值。如果没有,我们要插入值 50,蓝队也是如此。使用 entry API,代码如示例 8-24 所示。
Hash maps have a special API for this called entry that takes the key you want to check as a parameter. The return value of the entry method is an enum called Entry that represents a value that might or might not exist. Let’s say we want to check whether the key for the Yellow team has a value associated with it. If it doesn’t, we want to insert the value 50, and the same for the Blue team. Using the entry API, the code looks like Listing 8-24.
fn main() {
use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.entry(String::from("Yellow")).or_insert(50);
scores.entry(String::from("Blue")).or_insert(50);
println!("{scores:?}");
}
Entry 上的 or_insert 方法被定义为:如果对应的 Entry 键存在,则返回指向该值的可变引用;如果不存在,则将参数插入为该键的新值,并返回指向新值的可变引用。这种技术比我们自己编写逻辑要简洁得多,而且与借用检查器结合得更好。
The or_insert method on Entry is defined to return a mutable reference to the value for the corresponding Entry key if that key exists, and if not, it inserts the parameter as the new value for this key and returns a mutable reference to the new value. This technique is much cleaner than writing the logic ourselves and, in addition, plays more nicely with the borrow checker.
运行示例 8-24 中的代码将打印 {"Yellow": 50, "Blue": 10}。第一次调用 entry 将为黄队插入键和值 50,因为黄队之前没有值。第二次调用 entry 不会改变哈希映射,因为蓝队已经有了值 10。
Running the code in Listing 8-24 will print {"Yellow": 50, "Blue": 10}. The first call to entry will insert the key for the Yellow team with the value 50 because the Yellow team doesn’t have a value already. The second call to entry will not change the hash map, because the Blue team already has the value 10.
根据旧值更新值
Updating a Value Based on the Old Value
哈希映射的另一个常见用例是查找键的值,然后根据旧值对其进行更新。例如,示例 8-25 展示了统计一段文本中每个单词出现次数的代码。我们使用一个以单词为键的哈希映射,并递增其值以跟踪我们看到该单词的次数。如果这是我们第一次看到某个单词,我们将首先插入值 0。
Another common use case for hash maps is to look up a key’s value and then update it based on the old value. For instance, Listing 8-25 shows code that counts how many times each word appears in some text. We use a hash map with the words as keys and increment the value to keep track of how many times we’ve seen that word. If it’s the first time we’ve seen a word, we’ll first insert the value 0.
fn main() {
use std::collections::HashMap;
let text = "hello world wonderful world";
let mut map = HashMap::new();
for word in text.split_whitespace() {
let count = map.entry(word).or_insert(0);
*count += 1;
}
println!("{map:?}");
}
这段代码将打印 {"world": 2, "hello": 1, "wonderful": 1}。你可能会看到相同的键值对以不同的顺序打印出来:回想一下“访问哈希映射中的值”部分,遍历哈希映射是以任意顺序进行的。
This code will print {"world": 2, "hello": 1, "wonderful": 1}. You might see the same key-value pairs printed in a different order: Recall from “Accessing Values in a Hash Map” that iterating over a hash map happens in an arbitrary order.
split_whitespace 方法返回一个迭代器,遍历 text 中的值按空白字符分隔出的子切片。or_insert 方法返回指向指定键的值的可变引用(&mut V)。在这里,我们将该可变引用存储在 count 变量中,因此为了给该值赋值,我们必须首先使用星号(*)对 count 进行解引用。可变引用在 for 循环结束时超出作用域,因此所有这些更改都是安全的,并且符合借用规则。
The split_whitespace method returns an iterator over subslices, separated by whitespace, of the value in text. The or_insert method returns a mutable reference (&mut V) to the value for the specified key. Here, we store that mutable reference in the count variable, so in order to assign to that value, we must first dereference count using the asterisk (*). The mutable reference goes out of scope at the end of the for loop, so all of these changes are safe and allowed by the borrowing rules.
哈希函数
Hashing Functions
默认情况下,HashMap 使用一种名为 SipHash 的哈希函数,它可以抵抗涉及哈希表的拒绝服务(DoS)攻击1。这不是目前最快的哈希算法,但为了降低性能而换取更好的安全性是值得的。如果你对代码进行性能分析并发现默认哈希函数对你的目的而言太慢,你可以通过指定不同的 hasher(哈希器)来切换到另一个函数。hasher 是一个实现了 BuildHasher trait 的类型。我们将在第 10 章讨论 trait 及其实现方法。你不一定非要从头开始实现自己的 hasher;crates.io 上有其他 Rust 用户分享的库,提供了实现许多常见哈希算法的 hasher。
By default, HashMap uses a hashing function called SipHash that can provide resistance to denial-of-service (DoS) attacks involving hash tables1. This is not the fastest hashing algorithm available, but the trade-off for better security that comes with the drop in performance is worth it. If you profile your code and find that the default hash function is too slow for your purposes, you can switch to another function by specifying a different hasher. A hasher is a type that implements the BuildHasher trait. We’ll talk about traits and how to implement them in Chapter 10. You don’t necessarily have to implement your own hasher from scratch; crates.io has libraries shared by other Rust users that provide hashers implementing many common hashing algorithms.
总结
Summary
当需要存储、访问和修改数据时,vector、string 和哈希映射将提供程序所需的大量功能。以下是你现在应该有能力解决的一些练习:
Vectors, strings, and hash maps will provide a large amount of functionality necessary in programs when you need to store, access, and modify data. Here are some exercises you should now be equipped to solve:
-
给定一个整数列表,使用 vector 并返回列表的中位数(排序后位于中间位置的值)和众数(出现次数最多的值;这里哈希映射会很有帮助)。
-
Given a list of integers, use a vector and return the median (when sorted, the value in the middle position) and mode (the value that occurs most often; a hash map will be helpful here) of the list.
-
将字符串转换为猪拉丁语(Pig Latin)。每个单词的第一个辅音字母移到单词末尾并加上 ay,例如 first 变成 irst-fay。以元音字母开头的单词则在末尾加上 hay(apple 变成 apple-hay)。请牢记有关 UTF-8 编码的细节!
-
Convert strings to Pig Latin. The first consonant of each word is moved to the end of the word and ay is added, so first becomes irst-fay. Words that start with a vowel have hay added to the end instead (apple becomes apple-hay). Keep in mind the details about UTF-8 encoding!
-
使用哈希映射和 vector,创建一个文本界面,允许用户将员工姓名添加到公司的部门中;例如,“Add Sally to Engineering”(将 Sally 添加到工程部)或 “Add Amir to Sales”(将 Amir 添加到销售部)。然后,让用户检索按字母顺序排序的特定部门的所有人员列表,或按部门检索公司所有人员的列表。
-
Using a hash map and vectors, create a text interface to allow a user to add employee names to a department in a company; for example, “Add Sally to Engineering” or “Add Amir to Sales.” Then, let the user retrieve a list of all people in a department or all people in the company by department, sorted alphabetically.
标准库 API 文档描述了 vector、string 和哈希映射所具有的方法,这些方法对完成这些练习很有帮助!
The standard library API documentation describes methods that vectors, strings, and hash maps have that will be helpful for these exercises!
我们正在进入更复杂的程序,其中的操作可能会失败,所以现在是讨论错误处理的最佳时机。我们接下来就开始讨论!
We’re getting into more complex programs in which operations can fail, so it’s a perfect time to discuss error handling. We’ll do that next!