Generic Data Types
We use generics to create definitions for items like function signatures or structs, which we can then use with many different concrete data types. Let’s first look at how to define functions, structs, enums, and methods using generics. Then, we’ll discuss how generics affect code performance.
In Function Definitions
When defining a function that uses generics, we place the generics in the signature of the function where we would usually specify the data types of the parameters and return value. Doing so makes our code more flexible and provides more functionality to callers of our function while preventing code duplication.
Continuing with our largest function, Listing 10-4 shows two functions that
both find the largest value in a slice. We’ll then combine these into a single
function that uses generics.
fn largest_i32(list: &[i32]) -> &i32 {
    let mut largest = &list[0];

    for item in list {
        if item > largest {
            largest = item;
        }
    }

    largest
}

fn largest_char(list: &[char]) -> &char {
    let mut largest = &list[0];

    for item in list {
        if item > largest {
            largest = item;
        }
    }

    largest
}

fn main() {
    let number_list = vec![34, 50, 25, 100, 65];

    let result = largest_i32(&number_list);
    println!("The largest number is {result}");
    assert_eq!(*result, 100);

    let char_list = vec!['y', 'm', 'a', 'q'];

    let result = largest_char(&char_list);
    println!("The largest char is {result}");
    assert_eq!(*result, 'y');
}
The largest_i32 function is the one we extracted in Listing 10-3 that finds
the largest i32 in a slice. The largest_char function finds the largest
char in a slice. The function bodies have the same code, so let’s eliminate
the duplication by introducing a generic type parameter in a single function.
To parameterize the types in a new single function, we need to name the type
parameter, just as we do for the value parameters to a function. You can use
any identifier as a type parameter name. But we’ll use T because, by
convention, type parameter names in Rust are short, often just one letter, and
Rust’s type-naming convention is UpperCamelCase. Short for type, T is the
default choice of most Rust programmers.
When we use a parameter in the body of the function, we have to declare the
parameter name in the signature so that the compiler knows what that name
means. Similarly, when we use a type parameter name in a function signature, we
have to declare the type parameter name before we use it. To define the generic
largest function, we place type name declarations inside angle brackets,
<>, between the name of the function and the parameter list, like this:
fn largest<T>(list: &[T]) -> &T {
We read this definition as “The function largest is generic over some type
T.” This function has one parameter named list, which is a slice of values
of type T. The largest function will return a reference to a value of the
same type T.
Listing 10-5 shows the combined largest function definition using the generic
data type in its signature. The listing also shows how we can call the function
with either a slice of i32 values or char values. Note that this code won’t
compile yet.
fn largest<T>(list: &[T]) -> &T {
    let mut largest = &list[0];

    for item in list {
        if item > largest {
            largest = item;
        }
    }

    largest
}

fn main() {
    let number_list = vec![34, 50, 25, 100, 65];

    let result = largest(&number_list);
    println!("The largest number is {result}");

    let char_list = vec!['y', 'm', 'a', 'q'];

    let result = largest(&char_list);
    println!("The largest char is {result}");
}
If we compile this code right now, we’ll get this error:
$ cargo run
   Compiling chapter10 v0.1.0 (file:///projects/chapter10)
error[E0369]: binary operation `>` cannot be applied to type `&T`
 --> src/main.rs:5:17
  |
5 |         if item > largest {
  |            ---- ^ ------- &T
  |            |
  |            &T
  |
help: consider restricting type parameter `T` with trait `PartialOrd`
  |
1 | fn largest<T: std::cmp::PartialOrd>(list: &[T]) -> &T {
  |             ++++++++++++++++++++++

For more information about this error, try `rustc --explain E0369`.
error: could not compile `chapter10` (bin "chapter10") due to 1 previous error
The help text mentions std::cmp::PartialOrd, which is a trait, and we’re
going to talk about traits in the next section. For now, know that this error
states that the body of largest won’t work for all possible types that T
could be. Because we want to compare values of type T in the body, we can
only use types whose values can be ordered. To enable comparisons, the standard
library has the std::cmp::PartialOrd trait that you can implement on types
(see Appendix C for more on this trait). To fix Listing 10-5, we can follow the
help text’s suggestion and restrict the types valid for T to only those that
implement PartialOrd. The listing will then compile, because the standard
library implements PartialOrd on both i32 and char.
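As a minimal sketch of that fix, the version below applies the bound suggested by the compiler's help text, written in the shorter form `T: PartialOrd` (the trait is in the prelude, so the full `std::cmp::` path is optional). Trait bounds are covered properly in the next section; this only shows that the restricted signature compiles for both call sites.

```rust
// Generic over any type T whose values can be compared with `>`.
fn largest<T: PartialOrd>(list: &[T]) -> &T {
    // Like the earlier listings, this panics if the slice is empty.
    let mut largest = &list[0];

    for item in list {
        if item > largest {
            largest = item;
        }
    }

    largest
}

fn main() {
    let number_list = vec![34, 50, 25, 100, 65];
    println!("The largest number is {}", largest(&number_list));

    let char_list = vec!['y', 'm', 'a', 'q'];
    println!("The largest char is {}", largest(&char_list));
}
```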
In Struct Definitions
We can also define structs to use a generic type parameter in one or more
fields using the <> syntax. Listing 10-6 defines a Point<T> struct to hold
x and y coordinate values of any type.
struct Point<T> {
    x: T,
    y: T,
}

fn main() {
    let integer = Point { x: 5, y: 10 };
    let float = Point { x: 1.0, y: 4.0 };
}
The syntax for using generics in struct definitions is similar to that used in function definitions. First, we declare the name of the type parameter inside angle brackets just after the name of the struct. Then, we use the generic type in the struct definition where we would otherwise specify concrete data types.
Note that because we’ve used only one generic type to define Point<T>, this
definition says that the Point<T> struct is generic over some type T, and
the fields x and y are both that same type, whatever that type may be. If
we create an instance of a Point<T> that has values of different types, as in
Listing 10-7, our code won’t compile.
struct Point<T> {
    x: T,
    y: T,
}

fn main() {
    let wont_work = Point { x: 5, y: 4.0 };
}
In this example, when we assign the integer value 5 to x, we let the
compiler know that the generic type T will be an integer for this instance of
Point<T>. Then, when we specify 4.0 for y, which we’ve defined to have
the same type as x, we’ll get a type mismatch error like this:
$ cargo run
   Compiling chapter10 v0.1.0 (file:///projects/chapter10)
error[E0308]: mismatched types
 --> src/main.rs:7:38
  |
7 |     let wont_work = Point { x: 5, y: 4.0 };
  |                                      ^^^ expected integer, found floating-point number

For more information about this error, try `rustc --explain E0308`.
error: could not compile `chapter10` (bin "chapter10") due to 1 previous error
To define a Point struct where x and y are both generics but could have
different types, we can use multiple generic type parameters. For example, in
Listing 10-8, we change the definition of Point to be generic over types T
and U where x is of type T and y is of type U.
struct Point<T, U> {
    x: T,
    y: U,
}

fn main() {
    let both_integer = Point { x: 5, y: 10 };
    let both_float = Point { x: 1.0, y: 4.0 };
    let integer_and_float = Point { x: 5, y: 4.0 };
}
Now all the instances of Point shown are allowed! You can use as many generic
type parameters in a definition as you want, but using more than a few makes
your code hard to read. If you’re finding you need lots of generic types in
your code, it could indicate that your code needs restructuring into smaller
pieces.
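The instances above rely on inference to pick T and U. As a small sketch, the same parameters can also be written out explicitly, either on the binding or on the struct expression itself. The variable names `mixed` and `small` are made up for this illustration.

```rust
struct Point<T, U> {
    x: T,
    y: U,
}

fn main() {
    // Annotating the binding spells out what the compiler would otherwise infer.
    let mixed: Point<i32, f64> = Point { x: 5, y: 4.0 };

    // The `::<...>` form fixes the parameters on the expression itself,
    // here choosing types other than the integer and float defaults.
    let small = Point::<u8, f32> { x: 5, y: 4.0 };

    println!("mixed = ({}, {})", mixed.x, mixed.y);
    println!("small = ({}, {})", small.x, small.y);
}
```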
In Enum Definitions
As we did with structs, we can define enums to hold generic data types in their
variants. Let’s take another look at the Option<T> enum that the standard
library provides, which we used in Chapter 6:
enum Option<T> {
    Some(T),
    None,
}
This definition should now make more sense to you. As you can see, the
Option<T> enum is generic over type T and has two variants: Some, which
holds one value of type T, and a None variant that doesn’t hold any value.
By using the Option<T> enum, we can express the abstract concept of an
optional value, and because Option<T> is generic, we can use this abstraction
no matter what the type of the optional value is.
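To make that concrete, here is a small sketch that uses the standard library's Option<T> with several element types. The helper `last_item` is hypothetical, written only for this illustration (slices already provide a `last` method that does the same job).

```rust
// Hypothetical helper: returns a reference to the last element, if there is one.
// The same Option abstraction works whatever T turns out to be.
fn last_item<T>(items: &[T]) -> Option<&T> {
    if items.is_empty() {
        None
    } else {
        Some(&items[items.len() - 1])
    }
}

fn main() {
    let numbers = [1, 2, 3];
    let letters = ['a', 'b'];
    let empty: [f64; 0] = [];

    let number: Option<&i32> = last_item(&numbers);
    let letter: Option<&char> = last_item(&letters);
    let nothing: Option<&f64> = last_item(&empty);

    println!("{number:?} {letter:?} {nothing:?}");
}
```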
Enums can use multiple generic types as well. The definition of the Result
enum that we used in Chapter 9 is one example:
enum Result<T, E> {
    Ok(T),
    Err(E),
}
The Result enum is generic over two types, T and E, and has two variants:
Ok, which holds a value of type T, and Err, which holds a value of type
E. This definition makes it convenient to use the Result enum anywhere we
have an operation that might succeed (return a value of some type T) or fail
(return an error of some type E). In fact, this is what we used to open a
file in Listing 9-3, where T was filled in with the type std::fs::File when
the file was opened successfully and E was filled in with the type
std::io::Error when there were problems opening the file.
When you recognize situations in your code with multiple struct or enum definitions that differ only in the types of the values they hold, you can avoid duplication by using generic types instead.
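A brief sketch of that kind of refactoring follows. The types `IntReading`, `FloatReading`, and `Reading` are invented for this example. The two commented-out definitions differ only in the type of `value`, so a single generic struct can stand in for both.

```rust
// Before: two hypothetical definitions that differ only in one field's type.
// struct IntReading { label: String, value: i32 }
// struct FloatReading { label: String, value: f64 }

// After: one generic definition covers both cases.
struct Reading<T> {
    label: String,
    value: T,
}

fn main() {
    let count = Reading { label: String::from("count"), value: 3 };
    let ratio = Reading { label: String::from("ratio"), value: 0.75 };

    println!("{} = {}", count.label, count.value);
    println!("{} = {}", ratio.label, ratio.value);
}
```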
In Method Definitions
We can implement methods on structs and enums (as we did in Chapter 5) and use
generic types in their definitions too. Listing 10-9 shows the Point<T>
struct we defined in Listing 10-6 with a method named x implemented on it.
struct Point<T> {
    x: T,
    y: T,
}

impl<T> Point<T> {
    fn x(&self) -> &T {
        &self.x
    }
}

fn main() {
    let p = Point { x: 5, y: 10 };

    println!("p.x = {}", p.x());
}
Here, we’ve defined a method named x on Point<T> that returns a reference
to the data in the field x.
Note that we have to declare T just after impl so that we can use T to
specify that we’re implementing methods on the type Point<T>. By declaring
T as a generic type after impl, Rust can identify that the type in the
angle brackets in Point is a generic type rather than a concrete type. We
could have chosen a different name for this generic parameter than the generic
parameter declared in the struct definition, but using the same name is
conventional. If you write a method within an impl that declares a generic
type, that method will be defined on any instance of the type, no matter what
concrete type ends up substituting for the generic type.
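As a sketch of the naming point, the `impl` block below declares its parameter as `U` even though the struct definition calls it `T`. This compiles, although reusing `T` remains the conventional choice. The method `y` is added here purely for illustration.

```rust
struct Point<T> {
    x: T,
    y: T,
}

// The name declared after `impl` is independent of the name used in the
// struct definition; `U` here plays the role that `T` plays above.
impl<U> Point<U> {
    fn y(&self) -> &U {
        &self.y
    }
}

fn main() {
    // The method is available whatever concrete type fills in the parameter.
    let ints = Point { x: 5, y: 10 };
    let strs = Point { x: "left", y: "right" };

    println!("ints = ({}, {})", ints.x, ints.y());
    println!("strs = ({}, {})", strs.x, strs.y());
}
```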
We can also specify constraints on generic types when defining methods on the
type. We could, for example, implement methods only on Point<f32> instances
rather than on Point<T> instances with any generic type. In Listing 10-10, we
use the concrete type f32, meaning we don’t declare any types after impl.
struct Point<T> {
    x: T,
    y: T,
}

impl<T> Point<T> {
    fn x(&self) -> &T {
        &self.x
    }
}

impl Point<f32> {
    fn distance_from_origin(&self) -> f32 {
        (self.x.powi(2) + self.y.powi(2)).sqrt()
    }
}

fn main() {
    let p = Point { x: 5, y: 10 };

    println!("p.x = {}", p.x());
}
This code means the type Point<f32> will have a distance_from_origin
method; other instances of Point<T> where T is not of type f32 will not
have this method defined. The method measures how far our point is from the
point at coordinates (0.0, 0.0) and uses mathematical operations that are
available only for floating-point types.
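The following sketch exercises both `impl` blocks. The `_f32` suffixes are an assumption made for clarity, to leave no doubt that the first point is a Point<f32>. The commented-out call shows the kind of line the compiler would reject, because no `distance_from_origin` method exists for Point<i32>.

```rust
struct Point<T> {
    x: T,
    y: T,
}

impl<T> Point<T> {
    fn x(&self) -> &T {
        &self.x
    }
}

impl Point<f32> {
    fn distance_from_origin(&self) -> f32 {
        (self.x.powi(2) + self.y.powi(2)).sqrt()
    }
}

fn main() {
    // A Point<f32> has both `x` and `distance_from_origin`.
    let float_point = Point { x: 3.0_f32, y: 4.0_f32 };
    println!("distance = {}", float_point.distance_from_origin());

    // A Point<i32> only has the methods from the generic impl block.
    let int_point = Point { x: 5, y: 10 };
    println!("int_point.x = {}", int_point.x());
    // int_point.distance_from_origin(); // would not compile for Point<i32>
}
```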
Generic type parameters in a struct definition aren’t always the same as those
you use in that same struct’s method signatures. Listing 10-11 uses the generic
types X1 and Y1 for the Point struct and X2 and Y2 for the mixup
method signature to make the example clearer. The method creates a new Point
instance with the x value from the self Point (of type X1) and the y
value from the passed-in Point (of type Y2).
struct Point<X1, Y1> {
    x: X1,
    y: Y1,
}

impl<X1, Y1> Point<X1, Y1> {
    fn mixup<X2, Y2>(self, other: Point<X2, Y2>) -> Point<X1, Y2> {
        Point {
            x: self.x,
            y: other.y,
        }
    }
}

fn main() {
    let p1 = Point { x: 5, y: 10.4 };
    let p2 = Point { x: "Hello", y: 'c' };

    let p3 = p1.mixup(p2);

    println!("p3.x = {}, p3.y = {}", p3.x, p3.y);
}
In main, we’ve defined a Point that has an i32 for x (with value 5)
and an f64 for y (with value 10.4). The p2 variable is a Point struct
that has a string slice for x (with value "Hello") and a char for y
(with value c). Calling mixup on p1 with the argument p2 gives us p3,
which will have an i32 for x because x came from p1. The p3 variable
will have a char for y because y came from p2. The println! macro
call will print p3.x = 5, p3.y = c.
The purpose of this example is to demonstrate a situation in which some generic
parameters are declared with impl and some are declared with the method
definition. Here, the generic parameters X1 and Y1 are declared after
impl because they go with the struct definition. The generic parameters X2
and Y2 are declared after fn mixup because they’re only relevant to the
method.
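One way to see the difference between the two sets of parameters is to call `mixup` twice with receivers of the same type but arguments of different types. In this sketch (the bindings `a` and `b` are made up for the illustration), X1 and Y1 are fixed by the receiver, while X2 and Y2 are chosen anew for each call.

```rust
struct Point<X1, Y1> {
    x: X1,
    y: Y1,
}

impl<X1, Y1> Point<X1, Y1> {
    fn mixup<X2, Y2>(self, other: Point<X2, Y2>) -> Point<X1, Y2> {
        Point {
            x: self.x,
            y: other.y,
        }
    }
}

fn main() {
    // Receiver: X1 = i32, Y1 = f64. Argument: X2 = &str, Y2 = char.
    let a = Point { x: 5, y: 10.4 }.mixup(Point { x: "Hello", y: 'c' });

    // Same receiver types, but this call picks X2 = bool, Y2 = String.
    let b = Point { x: 5, y: 10.4 }.mixup(Point { x: true, y: String::from("end") });

    println!("a = ({}, {})", a.x, a.y);
    println!("b = ({}, {})", b.x, b.y);
}
```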
Performance of Code Using Generics
You might be wondering whether there is a runtime cost when using generic type parameters. The good news is that using generic types won’t make your program run any slower than it would with concrete types.
Rust accomplishes this by performing monomorphization of the code using generics at compile time. Monomorphization is the process of turning generic code into specific code by filling in the concrete types that are used when compiled. In this process, the compiler does the opposite of the steps we used to create the generic function in Listing 10-5: The compiler looks at all the places where generic code is called and generates code for the concrete types the generic code is called with.
Let’s look at how this works by using the standard library’s generic
Option<T> enum:
#![allow(unused)]
fn main() {
    let integer = Some(5);
    let float = Some(5.0);
}
When Rust compiles this code, it performs monomorphization. During that
process, the compiler reads the values that have been used in Option<T>
instances and identifies two kinds of Option<T>: one holds an i32 and the
other holds an f64. As such, it expands the generic definition of Option<T>
into two definitions specialized to i32 and f64, thereby replacing the generic
definition with the specific ones.
The monomorphized version of the code looks similar to the following (the compiler uses different names than what we’re using here for illustration):
enum Option_i32 {
    Some(i32),
    None,
}

enum Option_f64 {
    Some(f64),
    None,
}

fn main() {
    let integer = Option_i32::Some(5);
    let float = Option_f64::Some(5.0);
}
The generic Option<T> is replaced with the specific definitions created by
the compiler. Because Rust compiles generic code into code that specifies the
type in each instance, we pay no runtime cost for using generics. When the code
runs, it performs just as it would if we had duplicated each definition by
hand. The process of monomorphization makes Rust’s generics extremely efficient
at runtime.
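The same idea applies to generic functions. The sketch below is only an illustration: `pair`, `pair_i32`, and `pair_f64` are hypothetical names, and the compiler's real output uses its own internal symbols rather than anything written in source. It places a generic function next to the hand-duplicated versions that monomorphization effectively produces for the two call sites.

```rust
// A generic function, called below with an i32 and with an f64.
fn pair<T: Copy>(value: T) -> (T, T) {
    (value, value)
}

// Roughly what the compiler generates for those two calls
// (illustrative names, written by hand here).
fn pair_i32(value: i32) -> (i32, i32) {
    (value, value)
}

fn pair_f64(value: f64) -> (f64, f64) {
    (value, value)
}

fn main() {
    // Each generic call behaves like its specialized counterpart.
    println!("{:?} {:?}", pair(5), pair_i32(5));
    println!("{:?} {:?}", pair(5.0), pair_f64(5.0));
}
```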