数据类型
Data Types
Rust 中的每一个值都有其特定的数据类型(data type),这告诉 Rust 它被指定为什么样的数据,以便它知道如何处理这些数据。我们将研究两种数据类型子集:标量类型和复合类型。
Every value in Rust is of a certain data type, which tells Rust what kind of data is being specified so that it knows how to work with that data. We’ll look at two data type subsets: scalar and compound.
请记住,Rust 是一种静态类型(statically typed)语言,这意味着它必须在编译时知道所有变量的类型。编译器通常可以根据值及其使用方式推断出我们想要使用的类型。在可能有多种类型的情况下,例如在第 2 章的“比较猜测数字与秘密数字”部分中,我们使用 parse 将 String 转换为数值类型时,我们必须添加类型标注,如下所示:
Keep in mind that Rust is a statically typed language, which means that it
must know the types of all variables at compile time. The compiler can usually
infer what type we want to use based on the value and how we use it. In cases
when many types are possible, such as when we converted a String to a numeric
type using parse in the “Comparing the Guess to the Secret
Number” section in
Chapter 2, we must add a type annotation, like this:
#![allow(unused)]
fn main() {
    let guess: u32 = "42".parse().expect("Not a number!");
}
如果我们不添加上面代码中显示的 : u32 类型标注,Rust 将显示以下错误,这意味着编译器需要我们提供更多信息才能知道我们要使用哪种类型:
If we don’t add the : u32 type annotation shown in the preceding code, Rust
will display the following error, which means the compiler needs more
information from us to know which type we want to use:
$ cargo build
   Compiling no_type_annotations v0.1.0 (file:///projects/no_type_annotations)
error[E0284]: type annotations needed
 --> src/main.rs:2:9
  |
2 |     let guess = "42".parse().expect("Not a number!");
  |         ^^^^^        ----- type must be known at this point
  |
  = note: cannot satisfy `<_ as FromStr>::Err == _`
help: consider giving `guess` an explicit type
  |
2 |     let guess: /* Type */ = "42".parse().expect("Not a number!");
  |              ++++++++++++

For more information about this error, try `rustc --explain E0284`.
error: could not compile `no_type_annotations` (bin "no_type_annotations") due to 1 previous error
你会看到其他数据类型的不同类型标注。
You’ll see different type annotations for other data types.
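As a minimal sketch of the two common ways to give `parse` the type it needs, the example below annotates the binding in one place and names the type on the call in another. The helper `parse_guess` is a hypothetical name introduced here for illustration; it is not part of the chapter's code.

```rust
/// Hypothetical helper: parse user input into a `u32`, or `None` if it is not one.
fn parse_guess(input: &str) -> Option<u32> {
    // The declared return type is what tells `parse` which numeric type to produce.
    input.trim().parse().ok()
}

fn main() {
    // Style 1: annotate the binding.
    let a: u32 = "42".parse().expect("Not a number!");
    // Style 2: name the target type on the call itself (the "turbofish").
    let b = "42".parse::<u32>().expect("Not a number!");
    assert_eq!(a, b);

    println!("{:?}", parse_guess(" 7 ")); // Some(7)
    println!("{:?}", parse_guess("-1")); // None: a u32 cannot be negative
}
```

Either style gives the compiler the missing information; which one reads better tends to depend on whether the binding or the call is the clearer place to state the type.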
标量类型
Scalar Types
标量(scalar)类型代表单个值。Rust 有四种主要的标量类型:整数、浮点数、布尔值和字符。你可能从其他编程语言中认识这些类型。让我们跳入它们在 Rust 中是如何工作的。
A scalar type represents a single value. Rust has four primary scalar types: integers, floating-point numbers, Booleans, and characters. You may recognize these from other programming languages. Let’s jump into how they work in Rust.
整数类型
Integer Types
整数(integer)是一个没有小数部分的数字。我们在第 2 章中使用过一种整数类型,即 u32 类型。这个类型声明表明它关联的值应该是一个无符号整数(有符号整数类型以 i 而不是 u 开头),占用 32 位的空间。表 3-1 显示了 Rust 中内置的整数类型。我们可以使用这些变体中的任何一种来声明整数值的类型。
An integer is a number without a fractional component. We used one integer
type in Chapter 2, the u32 type. This type declaration indicates that the
value it’s associated with should be an unsigned integer (signed integer types
start with i instead of u) that takes up 32 bits of space. Table 3-1 shows
the built-in integer types in Rust. We can use any of these variants to declare
the type of an integer value.
表 3-1:Rust 中的整数类型 Table 3-1: Integer Types in Rust
| Length | Signed | Unsigned |
|---|---|---|
| 8-bit | i8 | u8 |
| 16-bit | i16 | u16 |
| 32-bit | i32 | u32 |
| 64-bit | i64 | u64 |
| 128-bit | i128 | u128 |
| Architecture-dependent | isize | usize |
每个变体都可以是有符号的或无符号的,并且具有明确的大小。有符号(signed)和无符号(unsigned)是指数字是否可能为负数——换句话说,数字是否需要带有符号(有符号),或者它是否永远为正数因此可以在没有符号的情况下表示(无符号)。这就像在纸上写数字一样:当符号很重要时,数字会显示加号或减号;但是,当可以安全地假设数字为正数时,它就不显示符号。有符号数使用二进制补码表示法存储。
Each variant can be either signed or unsigned and has an explicit size. Signed and unsigned refer to whether it’s possible for the number to be negative—in other words, whether the number needs to have a sign with it (signed) or whether it will only ever be positive and can therefore be represented without a sign (unsigned). It’s like writing numbers on paper: When the sign matters, a number is shown with a plus sign or a minus sign; however, when it’s safe to assume the number is positive, it’s shown with no sign. Signed numbers are stored using two’s complement representation.
每个有符号变体可以存储从 $-(2^{n-1})$ 到 $2^{n-1} - 1$(包含端点)的数字,其中 $n$ 是该变体使用的位数。因此,i8 可以存储从 $-(2^7)$ 到 $2^7 - 1$ 的数字,即 -128 到 127。无符号变体可以存储从 0 到 $2^n - 1$ 的数字,因此 u8 可以存储从 0 到 $2^8 - 1$ 的数字,即 0 到 255。
Each signed variant can store numbers from $-(2^{n-1})$ to $2^{n-1} - 1$ inclusive, where $n$ is the number of bits that variant uses. So, an i8 can store numbers from $-(2^7)$ to $2^7 - 1$, which equals −128 to 127. Unsigned variants can store numbers from 0 to $2^n - 1$, so a u8 can store numbers from 0 to $2^8 - 1$, which equals 0 to 255.
此外,isize 和 usize 类型取决于程序运行所在的计算机架构:如果你在 64 位架构上,则为 64 位;如果你在 32 位架构上,则为 32 位。
Additionally, the isize and usize types depend on the architecture of the
computer your program is running on: 64 bits if you’re on a 64-bit architecture
and 32 bits if you’re on a 32-bit architecture.
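The range formulas can be checked against the constants the standard library exposes. This is a small sketch: `signed_range` and `unsigned_max` are hypothetical helpers written for this example, and both assume a bit width below 128 so the intermediate shifts fit.

```rust
/// Hypothetical helper: (min, max) for an n-bit signed integer, assuming 1 <= bits < 128.
fn signed_range(bits: u32) -> (i128, i128) {
    let half = 1i128 << (bits - 1); // 2^(n-1)
    (-half, half - 1)
}

/// Hypothetical helper: max for an n-bit unsigned integer, assuming bits < 128.
fn unsigned_max(bits: u32) -> u128 {
    (1u128 << bits) - 1 // 2^n - 1
}

fn main() {
    // The formulas agree with the built-in MIN/MAX constants.
    assert_eq!(signed_range(8), (i8::MIN as i128, i8::MAX as i128));
    assert_eq!(unsigned_max(8), u8::MAX as u128);

    // The width of isize/usize depends on the compilation target.
    println!("isize is {} bits on this target", isize::BITS);
}
```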
你可以按照表 3-2 中所示的任何形式编写整型字面量。请注意,可以是多种数值类型的数字字面量允许使用类型后缀(例如 57u8)来指定类型。数字字面量还可以使用 _ 作为视觉分隔符,使数字更易读,例如 1_000,其值与你指定 1000 时的值相同。
You can write integer literals in any of the forms shown in Table 3-2. Note
that number literals that can be multiple numeric types allow a type suffix,
such as 57u8, to designate the type. Number literals can also use _ as a
visual separator to make the number easier to read, such as 1_000, which will
have the same value as if you had specified 1000.
表 3-2:Rust 中的整型字面量 Table 3-2: Integer Literals in Rust
| Number literals | Example |
|---|---|
| Decimal | 98_222 |
| Hex | 0xff |
| Octal | 0o77 |
| Binary | 0b1111_0000 |
| Byte (u8 only) | b'A' |
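To make the literal forms concrete, the sketch below collects one literal of each form and compares them with their decimal values. The function name `literal_values` is an assumption made for this example only.

```rust
/// Hypothetical helper: one literal per row of the table, widened to u32.
fn literal_values() -> [u32; 5] {
    [98_222, 0xff, 0o77, 0b1111_0000, b'A' as u32]
}

fn main() {
    // Separators and radix prefixes change the spelling, not the value.
    assert_eq!(literal_values(), [98222, 255, 63, 240, 65]);

    // A type suffix picks the type; without one, an integer literal defaults to i32.
    let small = 57u8;
    let inferred = 57;
    assert_eq!(std::mem::size_of_val(&small), 1);
    assert_eq!(std::mem::size_of_val(&inferred), 4);
}
```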
那么你如何知道该使用哪种类型的整数呢?如果你不确定,Rust 的默认值通常是很好的起点:整数类型默认为 i32。使用 isize 或 usize 的主要场景是在对某种集合进行索引时。
So how do you know which type of integer to use? If you’re unsure, Rust’s
defaults are generally good places to start: Integer types default to i32.
The primary situation in which you’d use isize or usize is when indexing
some sort of collection.
整数溢出
Integer Overflow
假设你有一个 `u8` 类型的变量,它可以持有 0 到 255 之间的值。如果你尝试将变量更改为该范围之外的值(例如 256),则会发生整数溢出(integer overflow),这可能导致两种行为之一。当你以调试模式编译时,Rust 会包含整数溢出检查,如果发生这种行为,会导致程序在运行时恐慌(panic)。当程序带着错误退出时,Rust 使用“恐慌”这个术语;我们将在第 9 章的“使用 `panic!` 处理不可恢复的错误”部分中更深入地讨论恐慌。
Let’s say you have a variable of type `u8` that can hold values between 0 and 255. If you try to change the variable to a value outside that range, such as 256, integer overflow will occur, which can result in one of two behaviors. When you’re compiling in debug mode, Rust includes checks for integer overflow that cause your program to panic at runtime if this behavior occurs. Rust uses the term panicking when a program exits with an error; we’ll discuss panics in more depth in the “Unrecoverable Errors with `panic!`” section in Chapter 9.
当你使用 `--release` 标志以发布模式编译时,Rust 不包含会导致恐慌的整数溢出检查。相反,如果发生溢出,Rust 会执行二进制补码回绕(two’s complement wrapping)。简而言之,大于该类型所能持有的最大值的数值会“回绕”到该类型所能持有的最小值。在 `u8` 的情况下,值 256 变为 0,值 257 变为 1,依此类推。程序不会恐慌,但变量的值可能不是你预期的值。依赖整数溢出的回绕行为被认为是一个错误。
When you’re compiling in release mode with the `--release` flag, Rust does not include checks for integer overflow that cause panics. Instead, if overflow occurs, Rust performs two’s complement wrapping. In short, values greater than the maximum value the type can hold “wrap around” to the minimum of the values the type can hold. In the case of a `u8`, the value 256 becomes 0, the value 257 becomes 1, and so on. The program won’t panic, but the variable will have a value that probably isn’t what you were expecting it to have. Relying on integer overflow’s wrapping behavior is considered an error.
为了显式地处理可能发生的溢出,你可以使用标准库为原始数值类型提供的这些方法系列:
To explicitly handle the possibility of overflow, you can use these families of methods provided by the standard library for primitive numeric types:
- 使用 `wrapping_*` 方法在所有模式下进行回绕,例如 `wrapping_add`。
- Wrap in all modes with the `wrapping_*` methods, such as `wrapping_add`.
- 如果发生溢出,使用 `checked_*` 方法返回 `None` 值。
- Return the `None` value if there is overflow with the `checked_*` methods.
- 使用 `overflowing_*` 方法返回该值和一个指示是否发生溢出的布尔值。
- Return the value and a Boolean indicating whether there was overflow with the `overflowing_*` methods.
- 使用 `saturating_*` 方法使数值饱和在值的最小值或最大值处。
- Saturate at the value’s minimum or maximum values with the `saturating_*` methods.
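The four method families listed above can be compared side by side on the same inputs. The sketch below uses the `u8` addition variants from the standard library; `add_report` is a hypothetical helper introduced only to bundle the four results together.

```rust
/// Hypothetical helper: the result of adding two u8 values under each overflow strategy,
/// in the order (wrapping, checked, overflowing, saturating).
fn add_report(a: u8, b: u8) -> (u8, Option<u8>, (u8, bool), u8) {
    (
        a.wrapping_add(b),    // wraps around in every build mode
        a.checked_add(b),     // None when the sum does not fit
        a.overflowing_add(b), // wrapped value plus an overflow flag
        a.saturating_add(b),  // clamps at u8::MAX
    )
}

fn main() {
    // 255 + 1 does not fit in a u8, so each strategy reacts differently.
    assert_eq!(add_report(255, 1), (0, None, (0, true), 255));
    // 1 + 2 fits, so all four agree on 3.
    assert_eq!(add_report(1, 2), (3, Some(3), (3, false), 3));
}
```

Because these methods state the intended behavior explicitly, they behave the same way in debug and release builds.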
浮点类型
Floating-Point Types
Rust 还有两种用于浮点数(floating-point numbers)的原始类型,即带小数点的数字。Rust 的浮点类型是 f32 和 f64,其大小分别为 32 位和 64 位。默认类型是 f64,因为在现代 CPU 上,它的速度与 f32 大致相同,但精度更高。所有浮点类型都是有符号的。
Rust also has two primitive types for floating-point numbers, which are
numbers with decimal points. Rust’s floating-point types are f32 and f64,
which are 32 bits and 64 bits in size, respectively. The default type is f64
because on modern CPUs, it’s roughly the same speed as f32 but is capable of
more precision. All floating-point types are signed.
这是一个展示浮点数实际应用的例子:
Here’s an example that shows floating-point numbers in action:
文件名:src/main.rs Filename: src/main.rs
fn main() {
    let x = 2.0; // f64
    let y: f32 = 3.0; // f32
}
浮点数是根据 IEEE-754 标准表示的。
Floating-point numbers are represented according to the IEEE-754 standard.
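One practical consequence of the IEEE-754 representation is that some decimal fractions cannot be stored exactly, so equality checks on computed floats are often replaced by a tolerance check. The sketch below illustrates this with `f64`; `approx_eq` is a hypothetical helper, and the tolerance is an arbitrary choice for the example.

```rust
/// Hypothetical helper: true when `a` and `b` differ by less than `tol`.
fn approx_eq(a: f64, b: f64, tol: f64) -> bool {
    (a - b).abs() < tol
}

fn main() {
    let sum = 0.1_f64 + 0.2_f64;
    // The sum is not bit-for-bit equal to 0.3 because of rounding...
    assert!(sum != 0.3);
    // ...but it is within a small tolerance of it.
    assert!(approx_eq(sum, 0.3, 1e-12));

    // f32 and f64 occupy 4 and 8 bytes respectively.
    assert_eq!(std::mem::size_of::<f32>(), 4);
    assert_eq!(std::mem::size_of::<f64>(), 8);
}
```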
数值运算
Numeric Operations
Rust 支持你对所有数字类型所期望的基本数学运算:加法、减法、乘法、除法和取余。整数除法会向零截断到最近的整数。以下代码显示了你如何在 let 语句中使用各种数值运算:
Rust supports the basic mathematical operations you’d expect for all the number
types: addition, subtraction, multiplication, division, and remainder. Integer
division truncates toward zero to the nearest integer. The following code shows
how you’d use each numeric operation in a let statement:
文件名:src/main.rs Filename: src/main.rs
fn main() {
    // addition
    let sum = 5 + 10;

    // subtraction
    let difference = 95.5 - 4.3;

    // multiplication
    let product = 4 * 30;

    // division
    let quotient = 56.7 / 32.2;
    let truncated = -5 / 3; // Results in -1

    // remainder
    let remainder = 43 % 5;
}
这些语句中的每个表达式都使用了一个数学运算符,并求值为一个单独的值,然后将其绑定到一个变量。附录 B包含了 Rust 提供的所有运算符的列表。
Each expression in these statements uses a mathematical operator and evaluates to a single value, which is then bound to a variable. Appendix B contains a list of all operators that Rust provides.
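Truncation toward zero also determines the sign of the remainder, which can be surprising with negative operands. The following sketch shows the pairing; `div_rem` is a hypothetical helper and assumes a nonzero divisor, since integer division by zero panics.

```rust
/// Hypothetical helper: quotient and remainder, assuming `b != 0`.
fn div_rem(a: i32, b: i32) -> (i32, i32) {
    (a / b, a % b)
}

fn main() {
    // 43 = 8 * 5 + 3
    assert_eq!(div_rem(43, 5), (8, 3));
    // -5 / 3 truncates toward zero to -1, so the remainder is -2: -5 = (-1) * 3 + (-2)
    assert_eq!(div_rem(-5, 3), (-1, -2));
    // When a non-negative remainder is wanted, the standard library offers rem_euclid.
    assert_eq!((-5_i32).rem_euclid(3), 1);
}
```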
布尔类型
The Boolean Type
与大多数其他编程语言一样,Rust 中的布尔类型有两个可能的值:true 和 false。布尔值的大小为一字节。Rust 中的布尔类型使用 bool 指定。例如:
As in most other programming languages, a Boolean type in Rust has two possible
values: true and false. Booleans are one byte in size. The Boolean type in
Rust is specified using bool. For example:
文件名:src/main.rs Filename: src/main.rs
fn main() {
    let t = true;
    let f: bool = false; // with explicit type annotation
}
使用布尔值的主要方式是通过条件判断,例如 if 表达式。我们将在“控制流”部分介绍 if 表达式在 Rust 中是如何工作的。
The main way to use Boolean values is through conditionals, such as an if
expression. We’ll cover how if expressions work in Rust in the “Control
Flow” section.
字符类型
The Character Type
Rust 的 char 类型是该语言最原始的字母类型。以下是一些声明 char 值的例子:
Rust’s char type is the language’s most primitive alphabetic type. Here are
some examples of declaring char values:
文件名:src/main.rs Filename: src/main.rs
fn main() {
    let c = 'z';
    let z: char = 'ℤ'; // with explicit type annotation
    let heart_eyed_cat = '😻';
}
注意,我们使用单引号指定 char 字面量,这与使用双引号的字符串字面量不同。Rust 的 char 类型大小为 4 字节,代表一个 Unicode 标量值,这意味着它可以代表比 ASCII 多得多的内容。重音字母、中文、日文和韩文文本、emoji 以及零宽空格在 Rust 中都是有效的 char 值。Unicode 标量值的范围从 U+0000 到 U+D7FF 以及 U+E000 到 U+10FFFF(包含端点)。然而,“字符”在 Unicode 中并不是一个真正的概念,所以你对什么是“字符”的直觉可能与 Rust 中的 char 是什么不匹配。我们将在第 8 章的“使用字符串存储 UTF-8 编码的文本”中详细讨论这个话题。
Note that we specify char literals with single quotation marks, as opposed to
string literals, which use double quotation marks. Rust’s char type is 4
bytes in size and represents a Unicode scalar value, which means it can
represent a lot more than just ASCII. Accented letters; Chinese, Japanese, and
Korean characters; emojis; and zero-width spaces are all valid char values in
Rust. Unicode scalar values range from U+0000 to U+D7FF and U+E000 to
U+10FFFF inclusive. However, a “character” isn’t really a concept in Unicode,
so your human intuition for what a “character” is may not match up with what a
char is in Rust. We’ll discuss this topic in detail in “Storing UTF-8
Encoded Text with Strings” in Chapter 8.
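The size and range claims about `char` can be checked directly. In this sketch, `utf8_len` is a hypothetical wrapper around the standard `char::len_utf8` method; it shows that a `char` always occupies 4 bytes in memory even though its UTF-8 encoding inside a string may take fewer.

```rust
/// Hypothetical helper: number of bytes `c` needs when encoded as UTF-8.
fn utf8_len(c: char) -> usize {
    c.len_utf8()
}

fn main() {
    // Every char value is 4 bytes in memory, regardless of the character.
    assert_eq!(std::mem::size_of::<char>(), 4);

    // Inside a UTF-8 string the same characters take 1 to 4 bytes.
    assert_eq!(utf8_len('z'), 1);
    assert_eq!(utf8_len('ℤ'), 3);
    assert_eq!(utf8_len('😻'), 4);

    // Code points in the gap U+D800..=U+DFFF are not Unicode scalar values.
    assert!(char::from_u32(0xD7FF).is_some());
    assert!(char::from_u32(0xD800).is_none());
    assert!(char::from_u32(0x10FFFF).is_some());
}
```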
复合类型
Compound Types
复合类型(compound types)可以将多个值组合成一个类型。Rust 有两种原始复合类型:元组(tuple)和数组(array)。
Compound types can group multiple values into one type. Rust has two primitive compound types: tuples and arrays.
元组类型
The Tuple Type
元组(tuple)是将多种类型的多个值组合成一个复合类型的通用方法。元组具有固定长度:一旦声明,它们的大小就不能增长或缩小。
A tuple is a general way of grouping together a number of values with a variety of types into one compound type. Tuples have a fixed length: Once declared, they cannot grow or shrink in size.
我们通过在圆括号内编写以逗号分隔的值列表来创建元组。元组中的每个位置都有一个类型,元组中不同值之间的类型不必相同。在这个例子中,我们添加了可选的类型标注:
We create a tuple by writing a comma-separated list of values inside parentheses. Each position in the tuple has a type, and the types of the different values in the tuple don’t have to be the same. We’ve added optional type annotations in this example:
文件名:src/main.rs Filename: src/main.rs
fn main() {
    let tup: (i32, f64, u8) = (500, 6.4, 1);
}
变量 tup 绑定到整个元组,因为元组被认为是一个单独的复合元素。要从元组中获取单个值,我们可以使用模式匹配来解构元组值,如下所示:
The variable tup binds to the entire tuple because a tuple is considered a
single compound element. To get the individual values out of a tuple, we can
use pattern matching to destructure a tuple value, like this:
文件名:src/main.rs Filename: src/main.rs
fn main() {
    let tup = (500, 6.4, 1);

    let (x, y, z) = tup;

    println!("The value of y is: {y}");
}
该程序首先创建一个元组并将其绑定到变量 tup。然后,它使用带有 let 的模式将 tup 拆分为三个独立的变量 x、y 和 z。这被称为解构(destructuring),因为它将单个元组拆分为三个部分。最后,程序打印 y 的值,即 6.4。
This program first creates a tuple and binds it to the variable tup. It then
uses a pattern with let to take tup and turn it into three separate
variables, x, y, and z. This is called destructuring because it breaks
the single tuple into three parts. Finally, the program prints the value of
y, which is 6.4.
我们还可以通过使用点号(.)后跟我们要访问的值的索引来直接访问元组元素。例如:
We can also access a tuple element directly by using a period (.) followed by
the index of the value we want to access. For example:
文件名:src/main.rs Filename: src/main.rs
fn main() {
    let x: (i32, f64, u8) = (500, 6.4, 1);

    let five_hundred = x.0;

    let six_point_four = x.1;

    let one = x.2;
}
该程序创建元组 x,然后使用各自的索引访问元组的每个元素。与大多数编程语言一样,元组中的第一个索引是 0。
This program creates the tuple x and then accesses each element of the tuple
using their respective indices. As with most programming languages, the first
index in a tuple is 0.
没有任何值的元组有一个特殊的名称:单元类型(unit)。该值及其相应的类型都写作 (),代表一个空值或空返回类型。如果表达式不返回任何其他值,则它们会隐式返回单元值。
The tuple without any values has a special name, unit. This value and its
corresponding type are both written () and represent an empty value or an
empty return type. Expressions implicitly return the unit value if they don’t
return any other value.
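As a brief illustration of destructuring and of the unit value, the sketch below passes a tuple through a function and captures the result of a function that returns nothing. Both `swap` and `log_it` are hypothetical names used only for this example.

```rust
/// Hypothetical helper: destructure a pair and return it in reverse order.
fn swap(pair: (i32, f64)) -> (f64, i32) {
    let (a, b) = pair;
    (b, a)
}

/// Hypothetical helper with no declared return type, so it returns the unit value `()`.
fn log_it(msg: &str) {
    println!("{msg}");
}

fn main() {
    assert_eq!(swap((500, 6.4)), (6.4, 500));

    // The result of a function without a return value can still be bound; its type is ().
    let unit: () = log_it("hello");
    assert_eq!(unit, ());
}
```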
数组类型
The Array Type
拥有多个值集合的另一种方法是使用数组(array)。与元组不同,数组的每个元素都必须具有相同的类型。与某些其他语言中的数组不同,Rust 中的数组具有固定长度。
Another way to have a collection of multiple values is with an array. Unlike a tuple, every element of an array must have the same type. Unlike arrays in some other languages, arrays in Rust have a fixed length.
我们将数组中的值写在方括号内,并以逗号分隔:
We write the values in an array as a comma-separated list inside square brackets:
文件名:src/main.rs Filename: src/main.rs
fn main() {
    let a = [1, 2, 3, 4, 5];
}
当你希望数据分配在栈上(与我们目前看到的其他类型相同)而不是堆上(我们将在第 4 章中更多地讨论栈和堆),或者当你希望确保始终拥有固定数量的元素时,数组非常有用。不过,数组不像 vector 类型那样灵活。vector 是由标准库提供的一种类似的集合类型,它被允许增长或缩小,因为其内容存储在堆上。如果你不确定是使用数组还是 vector,那么很可能你应该使用 vector。 第 8 章更详细地讨论了 vector。
Arrays are useful when you want your data allocated on the stack, the same as the other types we have seen so far, rather than the heap (we will discuss the stack and the heap more in Chapter 4), or when you want to ensure that you always have a fixed number of elements. An array isn’t as flexible as the vector type, though. A vector is a similar collection type provided by the standard library that is allowed to grow or shrink in size because its contents live on the heap. If you’re unsure whether to use an array or a vector, chances are you should use a vector. Chapter 8 discusses vectors in more detail.
但是,当你确定元素数量不需要更改时,数组会更有用。例如,如果你在程序中使用月份的名称,你可能会使用数组而不是 vector,因为你知道它将始终包含 12 个元素:
However, arrays are more useful when you know the number of elements will not need to change. For example, if you were using the names of the months in a program, you would probably use an array rather than a vector because you know it will always contain 12 elements:
#![allow(unused)]
fn main() {
    let months = ["January", "February", "March", "April", "May", "June", "July",
                  "August", "September", "October", "November", "December"];
}
你可以使用方括号编写数组类型,其中包含每个元素的类型、分号,然后是数组中的元素数量,如下所示:
You write an array’s type using square brackets with the type of each element, a semicolon, and then the number of elements in the array, like so:
#![allow(unused)]
fn main() {
    let a: [i32; 5] = [1, 2, 3, 4, 5];
}
在这里,i32 是每个元素的类型。分号之后,数字 5 表示该数组包含五个元素。
Here, i32 is the type of each element. After the semicolon, the number 5
indicates the array contains five elements.
你还可以通过指定初始值,后跟分号,然后在方括号中指定数组长度,来初始化一个每个元素都包含相同值的数组,如下所示:
You can also initialize an array to contain the same value for each element by specifying the initial value, followed by a semicolon, and then the length of the array in square brackets, as shown here:
#![allow(unused)]
fn main() {
    let a = [3; 5];
}
名为 a 的数组将包含 5 个元素,这些元素最初都将被设置为值 3。这与编写 let a = [3, 3, 3, 3, 3]; 相同,但方式更简洁。
The array named a will contain 5 elements that will all be set to the value
3 initially. This is the same as writing let a = [3, 3, 3, 3, 3]; but in a
more concise way.
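The equivalence of the two spellings, and the contrast with a growable vector, can be sketched as follows. The function `filled` is a hypothetical helper for this example; `to_vec` and `push` are standard library methods.

```rust
/// Hypothetical helper: an array of five elements, each initialized to 3.
fn filled() -> [i32; 5] {
    [3; 5]
}

fn main() {
    let a = filled();
    // The repeat form builds the same array as writing every element out.
    assert_eq!(a, [3, 3, 3, 3, 3]);
    // The length is part of the array's type and cannot change.
    assert_eq!(a.len(), 5);

    // A vector copied from the array can grow, because its contents live on the heap.
    let mut v = a.to_vec();
    v.push(4);
    assert_eq!(v.len(), 6);
}
```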
访问数组元素
Array Element Access
数组是分配在栈上的、已知固定大小的单块内存。你可以使用索引访问数组的元素,如下所示:
An array is a single chunk of memory of a known, fixed size that can be allocated on the stack. You can access elements of an array using indexing, like this:
文件名:src/main.rs Filename: src/main.rs
fn main() {
    let a = [1, 2, 3, 4, 5];

    let first = a[0];
    let second = a[1];
}
在这个例子中,名为 first 的变量将获得值 1,因为那是数组中索引 [0] 处的值。名为 second 的变量将从数组的索引 [1] 处获得值 2。
In this example, the variable named first will get the value 1 because that
is the value at index [0] in the array. The variable named second will get
the value 2 from index [1] in the array.
无效的数组元素访问
Invalid Array Element Access
让我们看看如果你尝试访问数组末尾之后的数组元素会发生什么。假设你运行这段代码(类似于第 2 章中的猜数字游戏),以从用户那里获取数组索引:
Let’s see what happens if you try to access an element of an array that is past the end of the array. Say you run this code, similar to the guessing game in Chapter 2, to get an array index from the user:
文件名:src/main.rs Filename: src/main.rs
use std::io;

fn main() {
    let a = [1, 2, 3, 4, 5];

    println!("Please enter an array index.");

    let mut index = String::new();

    io::stdin()
        .read_line(&mut index)
        .expect("Failed to read line");

    let index: usize = index
        .trim()
        .parse()
        .expect("Index entered was not a number");

    let element = a[index];

    println!("The value of the element at index {index} is: {element}");
}
这段代码可以成功编译。如果你使用 cargo run 运行此代码并输入 0、1、2、3 或 4,程序将打印出数组中该索引对应的相应值。如果你改为输入一个超过数组末尾的数字(例如 10),你将看到如下输出:
This code compiles successfully. If you run this code using cargo run and
enter 0, 1, 2, 3, or 4, the program will print out the corresponding
value at that index in the array. If you instead enter a number past the end of
the array, such as 10, you’ll see output like this:
thread 'main' panicked at src/main.rs:19:19:
index out of bounds: the len is 5 but the index is 10
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
程序在索引操作中使用无效值的地方导致了运行时错误。程序以一条错误消息退出,并没有执行最后的 println! 语句。当你尝试使用索引访问元素时,Rust 将检查你指定的索引是否小于数组长度。如果索引大于或等于长度,Rust 会发生恐慌。这种检查必须在运行时发生,尤其是在这种情况下,因为编译器不可能知道用户以后运行代码时会输入什么值。
The program resulted in a runtime error at the point of using an invalid
value in the indexing operation. The program exited with an error message and
didn’t execute the final println! statement. When you attempt to access an
element using indexing, Rust will check that the index you’ve specified is less
than the array length. If the index is greater than or equal to the length,
Rust will panic. This check has to happen at runtime, especially in this case,
because the compiler can’t possibly know what value a user will enter when they
run the code later.
这是 Rust 内存安全原则的一个实际应用。在许多底层语言中,不会进行这种检查,当你提供不正确的索引时,可能会访问到无效内存。Rust 通过立即退出而不是允许内存访问并继续运行来保护你免受此类错误的影响。第 9 章讨论了 Rust 的更多错误处理方式,以及如何编写既不会发生恐慌也不允许无效内存访问的可读、安全的代码。
This is an example of Rust’s memory safety principles in action. In many low-level languages, this kind of check is not done, and when you provide an incorrect index, invalid memory can be accessed. Rust protects you against this kind of error by immediately exiting instead of allowing the memory access and continuing. Chapter 9 discusses more of Rust’s error handling and how you can write readable, safe code that neither panics nor allows invalid memory access.
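When a panic is not the desired outcome, one option is the standard slice method `get`, which returns an `Option` instead of indexing directly. The sketch below is one possible approach rather than the chapter's own code; `element_at` is a hypothetical helper name.

```rust
/// Hypothetical helper: the element at `index`, or `None` if the index is out of bounds.
fn element_at(a: &[i32], index: usize) -> Option<i32> {
    // `get` performs the same bounds check as `a[index]` but reports failure as None.
    a.get(index).copied()
}

fn main() {
    let a = [1, 2, 3, 4, 5];

    match element_at(&a, 10) {
        Some(element) => println!("The value of the element is: {element}"),
        None => println!("Index 10 is out of bounds for an array of length {}", a.len()),
    }

    assert_eq!(element_at(&a, 0), Some(1));
    assert_eq!(element_at(&a, 10), None);
}
```

The bounds check still happens at runtime; the difference is that the caller decides how to respond to an invalid index.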