Unsafe Rust

All the code we’ve discussed so far has had Rust’s memory safety guarantees enforced at compile time. However, Rust has a second language hidden inside it that doesn’t enforce these memory safety guarantees: It’s called unsafe Rust and works just like regular Rust but gives us extra superpowers.

Unsafe Rust exists because, by nature, static analysis is conservative. When the compiler tries to determine whether or not code upholds the guarantees, it’s better for it to reject some valid programs than to accept some invalid programs. Although the code might be okay, if the Rust compiler doesn’t have enough information to be confident, it will reject the code. In these cases, you can use unsafe code to tell the compiler, “Trust me, I know what I’m doing.” Be warned, however, that you use unsafe Rust at your own risk: If you use unsafe code incorrectly, problems can occur due to memory unsafety, such as null pointer dereferencing.

Another reason Rust has an unsafe alter ego is that the underlying computer hardware is inherently unsafe. If Rust didn’t let you do unsafe operations, you couldn’t do certain tasks. Rust needs to allow you to do low-level systems programming, such as directly interacting with the operating system or even writing your own operating system. Working with low-level systems programming is one of the goals of the language. Let’s explore what we can do with unsafe Rust and how to do it.

Performing Unsafe Superpowers

To switch to unsafe Rust, use the unsafe keyword and then start a new block that holds the unsafe code. You can take five actions in unsafe Rust that you can’t in safe Rust, which we call unsafe superpowers. Those superpowers include the ability to:

  1. Dereference a raw pointer.

  2. Call an unsafe function or method.

  3. Access or modify a mutable static variable.

  4. Implement an unsafe trait.

  5. Access fields of unions.

It’s important to understand that unsafe doesn’t turn off the borrow checker or disable any of Rust’s other safety checks: If you use a reference in unsafe code, it will still be checked. The unsafe keyword only gives you access to these five features that are then not checked by the compiler for memory safety. You’ll still get some degree of safety inside an unsafe block.

In addition, unsafe does not mean the code inside the block is necessarily dangerous or that it will definitely have memory safety problems: The intent is that as the programmer, you’ll ensure that the code inside an unsafe block will access memory in a valid way.

People are fallible and mistakes will happen, but by requiring these five unsafe operations to be inside blocks annotated with unsafe, you’ll know that any errors related to memory safety must be within an unsafe block. Keep unsafe blocks small; you’ll be thankful later when you investigate memory bugs.

To isolate unsafe code as much as possible, it’s best to enclose such code within a safe abstraction and provide a safe API, which we’ll discuss later in the chapter when we examine unsafe functions and methods. Parts of the standard library are implemented as safe abstractions over unsafe code that has been audited. Wrapping unsafe code in a safe abstraction prevents uses of unsafe from leaking out into all the places that you or your users might want to use the functionality implemented with unsafe code, because using a safe abstraction is safe.

Let’s look at each of the five unsafe superpowers in turn. We’ll also look at some abstractions that provide a safe interface to unsafe code.

Dereferencing a Raw Pointer

In Chapter 4, in the “Dangling References” section, we mentioned that the compiler ensures that references are always valid. Unsafe Rust has two new types called raw pointers that are similar to references. As with references, raw pointers can be immutable or mutable and are written as *const T and *mut T, respectively. The asterisk isn’t the dereference operator; it’s part of the type name. In the context of raw pointers, immutable means that the pointer can’t be directly assigned to after being dereferenced.

Different from references and smart pointers, raw pointers:

  • Are allowed to ignore the borrowing rules by having both immutable and mutable pointers or multiple mutable pointers to the same location

  • Aren’t guaranteed to point to valid memory

  • Are allowed to be null

  • Don’t implement any automatic cleanup

By opting out of having Rust enforce these guarantees, you can give up guaranteed safety in exchange for greater performance or the ability to interface with another language or hardware where Rust’s guarantees don’t apply.
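
To make the "allowed to be null" point concrete, here is a minimal sketch that checks for null before reading through a raw pointer. The helper name read_or_default is made up for this illustration; it relies on the standard as_ref method on raw pointers, which returns None for a null pointer. The function is still marked unsafe because a non-null pointer could be dangling, and nothing in the type system rules that out.

```rust
use std::ptr;

/// SAFETY: `p` must be either null or a valid, aligned pointer to a live `i32`.
/// (Hypothetical helper, written only for this illustration.)
unsafe fn read_or_default(p: *const i32, default: i32) -> i32 {
    // SAFETY: the caller promises that `p` is null or valid for reads.
    match unsafe { p.as_ref() } {
        Some(value) => *value,
        None => default, // a null pointer yields `None`, so we never dereference it
    }
}

fn main() {
    let num = 7;

    // SAFETY: the first pointer comes from a live local; the second is null.
    unsafe {
        assert_eq!(read_or_default(&raw const num, 0), 7);
        assert_eq!(read_or_default(ptr::null(), 0), 0);
    }
}
```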

Listing 20-1 shows how to create an immutable and a mutable raw pointer.

fn main() {
    let mut num = 5;

    let r1 = &raw const num;
    let r2 = &raw mut num;
}

Notice that we don’t include the unsafe keyword in this code. We can create raw pointers in safe code; we just can’t dereference raw pointers outside an unsafe block, as you’ll see in a bit.

We’ve created raw pointers by using the raw borrow operators: &raw const num creates a *const i32 immutable raw pointer, and &raw mut num creates a *mut i32 mutable raw pointer. Because we created them directly from a local variable, we know these particular raw pointers are valid, but we can’t make that assumption about just any raw pointer.

To demonstrate this, next we’ll create a raw pointer whose validity we can’t be so certain of, using the keyword as to cast a value instead of using the raw borrow operator. Listing 20-2 shows how to create a raw pointer to an arbitrary location in memory. Trying to use arbitrary memory is undefined: There might be data at that address or there might not, the compiler might optimize the code so that there is no memory access, or the program might terminate with a segmentation fault. Usually, there is no good reason to write code like this, especially in cases where you can use a raw borrow operator instead, but it is possible.

fn main() {
    let address = 0x012345usize;
    let r = address as *const i32;
}

Recall that we can create raw pointers in safe code, but we can’t dereference raw pointers and read the data being pointed to. In Listing 20-3, we use the dereference operator * on a raw pointer, which requires an unsafe block.

fn main() {
    let mut num = 5;

    let r1 = &raw const num;
    let r2 = &raw mut num;

    unsafe {
        println!("r1 is: {}", *r1);
        println!("r2 is: {}", *r2);
    }
}

Creating a pointer does no harm; it’s only when we try to access the value that it points at that we might end up dealing with an invalid value.

Note also that in Listings 20-1 and 20-3, we created *const i32 and *mut i32 raw pointers that both pointed to the same memory location, where num is stored. If we instead tried to create an immutable and a mutable reference to num, the code would not have compiled because Rust’s ownership rules don’t allow a mutable reference at the same time as any immutable references. With raw pointers, we can create a mutable pointer and an immutable pointer to the same location and change data through the mutable pointer, potentially creating a data race. Be careful!
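
The following sketch shows data changing through a mutable raw pointer while a second, immutable raw pointer to the same location observes the change. The function name bump_through_alias is hypothetical. As a conservative choice that is not taken from the listings above, the const pointer is derived from the mutable one with cast_const rather than created by a second raw borrow, so both pointers clearly share the same origin.

```rust
/// Hypothetical helper: increments a local through one raw pointer and reads
/// the result back through an aliasing raw pointer.
fn bump_through_alias(start: i32) -> i32 {
    let mut num = start;

    let r2 = &raw mut num; // *mut i32
    let r1 = r2.cast_const(); // *const i32 to the same location

    // SAFETY: both pointers refer to the live local `num`, no references to
    // `num` are in use, and everything happens on a single thread.
    unsafe {
        *r2 += 1;
        *r1
    }
}

fn main() {
    assert_eq!(bump_through_alias(5), 6);
}
```

Nothing here would compile with a shared reference and a mutable reference held at the same time, which is why the responsibility for avoiding conflicting access moves to the programmer.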

With all of these dangers, why would you ever use raw pointers? One major use case is when interfacing with C code, as you’ll see in the next section. Another case is when building up safe abstractions that the borrow checker doesn’t understand. We’ll introduce unsafe functions and then look at an example of a safe abstraction that uses unsafe code.

Calling an Unsafe Function or Method

The second type of operation you can perform in an unsafe block is calling unsafe functions. Unsafe functions and methods look exactly like regular functions and methods, but they have an extra unsafe before the rest of the definition. The unsafe keyword in this context indicates the function has requirements we need to uphold when we call this function, because Rust can’t guarantee we’ve met these requirements. By calling an unsafe function within an unsafe block, we’re saying that we’ve read this function’s documentation and we take responsibility for upholding the function’s contracts.

Here is an unsafe function named dangerous that doesn’t do anything in its body:

fn main() {
    unsafe fn dangerous() {}

    unsafe {
        dangerous();
    }
}

We must call the dangerous function within a separate unsafe block. If we try to call dangerous without the unsafe block, we’ll get an error:

$ cargo run
   Compiling unsafe-example v0.1.0 (file:///projects/unsafe-example)
error[E0133]: call to unsafe function `dangerous` is unsafe and requires unsafe block
 --> src/main.rs:4:5
  |
4 |     dangerous();
  |     ^^^^^^^^^^^ call to unsafe function
  |
  = note: consult the function's documentation for information on how to avoid undefined behavior

For more information about this error, try `rustc --explain E0133`.
error: could not compile `unsafe-example` (bin "unsafe-example") due to 1 previous error

With the unsafe block, we’re asserting to Rust that we’ve read the function’s documentation, we understand how to use it properly, and we’ve verified that we’re fulfilling the contract of the function.

To perform unsafe operations in the body of an unsafe function, you still need to use an unsafe block, just as within a regular function, and the compiler will warn you if you forget. This helps us keep unsafe blocks as small as possible, as unsafe operations may not be needed across the whole function body.
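
As a small sketch of that rule, the hypothetical function read_doubled below is an unsafe function whose body wraps only the single unsafe operation in an unsafe block, leaving the rest of the body as ordinary safe code. The name and the contract in its doc comment are assumptions made for this example.

```rust
/// SAFETY: `ptr` must be non-null, aligned, and point to an initialized `i32`
/// that is valid for reads for the duration of the call.
unsafe fn read_doubled(ptr: *const i32) -> i32 {
    // Ordinary safe code needs no `unsafe` block, even inside an unsafe fn.
    let factor = 2;

    // SAFETY: the caller upholds the contract documented above.
    let value = unsafe { *ptr };

    value * factor
}

fn main() {
    let num = 21;

    // SAFETY: the pointer comes from a live local variable.
    let doubled = unsafe { read_doubled(&raw const num) };
    assert_eq!(doubled, 42);
}
```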

Creating a Safe Abstraction over Unsafe Code

Just because a function contains unsafe code doesn’t mean we need to mark the entire function as unsafe. In fact, wrapping unsafe code in a safe function is a common abstraction. As an example, let’s study the split_at_mut function from the standard library, which requires some unsafe code. We’ll explore how we might implement it. This safe method is defined on mutable slices: It takes one slice and makes it two by splitting the slice at the index given as an argument. Listing 20-4 shows how to use split_at_mut.

fn main() {
    let mut v = vec![1, 2, 3, 4, 5, 6];

    let r = &mut v[..];

    let (a, b) = r.split_at_mut(3);

    assert_eq!(a, &mut [1, 2, 3]);
    assert_eq!(b, &mut [4, 5, 6]);
}

We can’t implement this function using only safe Rust. An attempt might look something like Listing 20-5, which won’t compile. For simplicity, we’ll implement split_at_mut as a function rather than a method and only for slices of i32 values rather than for a generic type T.

fn split_at_mut(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = values.len();

    assert!(mid <= len);

    (&mut values[..mid], &mut values[mid..])
}

fn main() {
    let mut vector = vec![1, 2, 3, 4, 5, 6];
    let (left, right) = split_at_mut(&mut vector, 3);
}

This function first gets the total length of the slice. Then, it asserts that the index given as a parameter is within the slice by checking whether it’s less than or equal to the length. The assertion means that if we pass an index that is greater than the length to split the slice at, the function will panic before it attempts to use that index.

Then, we return two mutable slices in a tuple: one from the start of the original slice to the mid index and another from mid to the end of the slice.

When we try to compile the code in Listing 20-5, we’ll get an error:

$ cargo run
   Compiling unsafe-example v0.1.0 (file:///projects/unsafe-example)
error[E0499]: cannot borrow `*values` as mutable more than once at a time
 --> src/main.rs:6:31
  |
1 | fn split_at_mut(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
  |                         - let's call the lifetime of this reference `'1`
...
6 |     (&mut values[..mid], &mut values[mid..])
  |     --------------------------^^^^^^--------
  |     |     |                   |
  |     |     |                   second mutable borrow occurs here
  |     |     first mutable borrow occurs here
  |     returning this value requires that `*values` is borrowed for `'1`
  |
  = help: use `.split_at_mut(position)` to obtain two mutable non-overlapping sub-slices

For more information about this error, try `rustc --explain E0499`.
error: could not compile `unsafe-example` (bin "unsafe-example") due to 1 previous error

Rust’s borrow checker can’t understand that we’re borrowing different parts of the slice; it only knows that we’re borrowing from the same slice twice. Borrowing different parts of a slice is fundamentally okay because the two slices aren’t overlapping, but Rust isn’t smart enough to know this. When we know code is okay, but Rust doesn’t, it’s time to reach for unsafe code.

Listing 20-6 shows how to use an unsafe block, a raw pointer, and some calls to unsafe functions to make the implementation of split_at_mut work.

use std::slice;

fn split_at_mut(values: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = values.len();
    let ptr = values.as_mut_ptr();

    assert!(mid <= len);

    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut vector = vec![1, 2, 3, 4, 5, 6];
    let (left, right) = split_at_mut(&mut vector, 3);
}

Recall from “The Slice Type” section in Chapter 4 that a slice is a pointer to some data and the length of the slice. We use the len method to get the length of a slice and the as_mut_ptr method to access the raw pointer of a slice. In this case, because we have a mutable slice to i32 values, as_mut_ptr returns a raw pointer with the type *mut i32, which we’ve stored in the variable ptr.

We keep the assertion that the mid index is within the slice. Then, we get to the unsafe code: The slice::from_raw_parts_mut function takes a raw pointer and a length, and it creates a slice. We use this function to create a slice that starts from ptr and is mid items long. Then, we call the add method on ptr with mid as an argument to get a raw pointer that starts at mid, and we create a slice using that pointer and the remaining number of items after mid as the length.

The function slice::from_raw_parts_mut is unsafe because it takes a raw pointer and must trust that this pointer is valid. The add method on raw pointers is also unsafe because it must trust that the offset location is also a valid pointer. Therefore, we had to put an unsafe block around our calls to slice::from_raw_parts_mut and add so that we could call them. By looking at the code and by adding the assertion that mid must be less than or equal to len, we can tell that all the raw pointers used within the unsafe block will be valid pointers to data within the slice. This is an acceptable and appropriate use of unsafe.

Note that we don’t need to mark the resultant split_at_mut function as unsafe, and we can call this function from safe Rust. We’ve created a safe abstraction to the unsafe code with an implementation of the function that uses unsafe code in a safe way, because it creates only valid pointers from the data this function has access to.
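
To see the safe abstraction pay off for callers, the sketch below generalizes the same approach to any element type T and then mutates both halves from entirely safe code. The name split_at_mut_generic is hypothetical, and this is only an illustration of the technique, not the standard library’s actual implementation.

```rust
use std::slice;

/// Hypothetical generic variant of the function above.
fn split_at_mut_generic<T>(values: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
    let len = values.len();
    let ptr = values.as_mut_ptr();

    assert!(mid <= len);

    // SAFETY: `mid <= len`, so both ranges lie inside the original slice, and
    // they do not overlap: the first covers [0, mid), the second [mid, len).
    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut data = vec![1, 2, 3, 4, 5, 6];

    // No `unsafe` is needed at the call site.
    let (left, right) = split_at_mut_generic(&mut data, 2);
    left[0] = 10;
    right[0] = 30;

    assert_eq!(data, [10, 2, 30, 4, 5, 6]);
}
```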

In contrast, the use of slice::from_raw_parts_mut in Listing 20-7 would likely crash when the slice is used. This code takes an arbitrary memory location and creates a slice 10,000 items long.

fn main() {
    use std::slice;

    let address = 0x01234usize;
    let r = address as *mut i32;

    let values: &[i32] = unsafe { slice::from_raw_parts_mut(r, 10000) };
}

We don’t own the memory at this arbitrary location, and there is no guarantee that the slice this code creates contains valid i32 values. Attempting to use values as though it’s a valid slice results in undefined behavior.

Using extern Functions to Call External Code

Sometimes your Rust code might need to interact with code written in another language. For this, Rust has the keyword extern that facilitates the creation and use of a Foreign Function Interface (FFI), which is a way for a programming language to define functions and enable a different (foreign) programming language to call those functions.

Listing 20-8 demonstrates how to set up an integration with the abs function from the C standard library. Functions declared within extern blocks are generally unsafe to call from Rust code, so extern blocks must also be marked unsafe. The reason is that other languages don’t enforce Rust’s rules and guarantees, and Rust can’t check them, so responsibility falls on the programmer to ensure safety.

unsafe extern "C" {
    fn abs(input: i32) -> i32;
}

fn main() {
    unsafe {
        println!("Absolute value of -3 according to C: {}", abs(-3));
    }
}

Within the unsafe extern "C" block, we list the names and signatures of external functions from another language we want to call. The "C" part defines which application binary interface (ABI) the external function uses: The ABI defines how to call the function at the assembly level. The "C" ABI is the most common and follows the C programming language’s ABI. Information about all the ABIs Rust supports is available in the Rust Reference.

Every item declared within an unsafe extern block is implicitly unsafe. However, some FFI functions are safe to call. For example, the abs function from C’s standard library does not have any memory safety considerations, and we know it can be called with any i32. In cases like this, we can use the safe keyword to say that this specific function is safe to call even though it is in an unsafe extern block. Once we make that change, calling it no longer requires an unsafe block, as shown in Listing 20-9.

unsafe extern "C" {
    safe fn abs(input: i32) -> i32;
}

fn main() {
    println!("Absolute value of -3 according to C: {}", abs(-3));
}

Marking a function as safe does not inherently make it safe! Instead, it is like a promise you are making to Rust that it is safe. It is still your responsibility to make sure that promise is kept!
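
By contrast, many C functions cannot honestly be marked safe. The sketch below declares C’s strlen, which reads memory through a pointer and therefore stays unsafe, and wraps it in a safe Rust function. The wrapper name c_string_len is hypothetical, and the declaration assumes that C’s size_t has the same size as Rust’s usize, which holds on common platforms.

```rust
use std::ffi::{c_char, CStr, CString};

unsafe extern "C" {
    // C declares this as `size_t strlen(const char *s)`.
    // Assumption: `usize` matches `size_t` on the target platform.
    fn strlen(s: *const c_char) -> usize;
}

/// Hypothetical safe wrapper: `CStr` already guarantees what `strlen` needs.
fn c_string_len(s: &CStr) -> usize {
    // SAFETY: a `CStr` is a valid, NUL-terminated string, and the borrow keeps
    // it alive for the duration of the call.
    unsafe { strlen(s.as_ptr()) }
}

fn main() {
    let greeting = CString::new("hello").expect("no interior NUL bytes");
    assert_eq!(c_string_len(&greeting), 5);
}
```

Here the safety argument lives in the wrapper’s parameter type instead of in a safe marker on the declaration, so callers cannot pass an invalid pointer in the first place.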

Calling Rust Functions from Other Languages

We can also use extern to create an interface that allows other languages to call Rust functions. Instead of creating a whole extern block, we add the extern keyword and specify the ABI to use just before the fn keyword for the relevant function. We also need to add an #[unsafe(no_mangle)] annotation to tell the Rust compiler not to mangle the name of this function. Mangling is when a compiler changes the name we’ve given a function to a different name that contains more information for other parts of the compilation process to consume but is less human readable. Every programming language compiler mangles names slightly differently, so for a Rust function to be nameable by other languages, we must disable the Rust compiler’s name mangling. This is unsafe because there might be name collisions across libraries without the built-in mangling, so it is our responsibility to make sure the name we choose is safe to export without mangling.

In the following example, we make the call_from_c function accessible from C code, after it’s compiled to a shared library and linked from C:

#[unsafe(no_mangle)]
pub extern "C" fn call_from_c() {
    println!("Just called a Rust function from C!");
}

This usage of extern requires unsafe only in the attribute, not on the extern block.
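
Exported functions usually take arguments and return values, so here is a slightly larger sketch. The function name rust_add_i32 is made up for this example, and the prefix is one possible way to lower the risk of the name collisions mentioned above. The function remains an ordinary Rust function, so Rust code can call it directly as well.

```rust
// Exported under the exact symbol name `rust_add_i32` (hypothetical name).
#[unsafe(no_mangle)]
pub extern "C" fn rust_add_i32(a: i32, b: i32) -> i32 {
    // `wrapping_add` never panics on overflow; panicking inside a function
    // that C calls is best avoided.
    a.wrapping_add(b)
}

fn main() {
    // Calling it from Rust needs no `unsafe` block.
    assert_eq!(rust_add_i32(2, 3), 5);
    assert_eq!(rust_add_i32(i32::MAX, 1), i32::MIN);
}
```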

Accessing or Modifying a Mutable Static Variable

In this book, we’ve not yet talked about global variables, which Rust does support but which can be problematic with Rust’s ownership rules. If two threads are accessing the same mutable global variable, it can cause a data race.

In Rust, global variables are called static variables. Listing 20-10 shows an example declaration and use of a static variable with a string slice as a value.

static HELLO_WORLD: &str = "Hello, world!";

fn main() {
    println!("value is: {HELLO_WORLD}");
}

Static variables are similar to constants, which we discussed in the “Declaring Constants” section in Chapter 3. The names of static variables are in SCREAMING_SNAKE_CASE by convention. Static variables can only store references with the 'static lifetime, which means the Rust compiler can figure out the lifetime and we aren’t required to annotate it explicitly. Accessing an immutable static variable is safe.

A subtle difference between constants and immutable static variables is that values in a static variable have a fixed address in memory. Using the value will always access the same data. Constants, on the other hand, are allowed to duplicate their data whenever they’re used. Another difference is that static variables can be mutable. Accessing and modifying mutable static variables is unsafe. Listing 20-11 shows how to declare, access, and modify a mutable static variable named COUNTER.

static mut COUNTER: u32 = 0;

/// SAFETY: Calling this from more than a single thread at a time is undefined
/// behavior, so you *must* guarantee you only call it from a single thread at
/// a time.
unsafe fn add_to_count(inc: u32) {
    unsafe {
        COUNTER += inc;
    }
}

fn main() {
    unsafe {
        // SAFETY: This is only called from a single thread in `main`.
        add_to_count(3);
        println!("COUNTER: {}", *(&raw const COUNTER));
    }
}

As with regular variables, we specify mutability using the mut keyword. Any code that reads or writes from COUNTER must be within an unsafe block. The code in Listing 20-11 compiles and prints COUNTER: 3 as we would expect because it’s single threaded. Having multiple threads access COUNTER would likely result in data races, so it is undefined behavior. Therefore, we need to mark the entire function as unsafe and document the safety limitation so that anyone calling the function knows what they are and are not allowed to do safely.

Whenever we write an unsafe function, it is idiomatic to write a comment starting with SAFETY and explaining what the caller needs to do to call the function safely. Likewise, whenever we perform an unsafe operation, it is idiomatic to write a comment starting with SAFETY to explain how the safety rules are upheld.

Additionally, a compiler lint denies by default any attempt to create a reference to a mutable static variable. You must either explicitly opt out of that lint’s protections by adding an #[allow(static_mut_refs)] annotation or access the mutable static variable via a raw pointer created with one of the raw borrow operators. That includes cases where the reference is created invisibly, as when it is used in the println! in this code listing. Requiring that mutable static variables be accessed via raw pointers helps make the safety requirements for using them more obvious.

With mutable data that is globally accessible, it’s difficult to ensure that there are no data races, which is why Rust considers mutable static variables to be unsafe. Where possible, it’s preferable to use the concurrency techniques and thread-safe smart pointers we discussed in Chapter 16 so that the compiler checks that data access from different threads is done safely.
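
As one possible illustration of that advice, the counter can be rewritten with an atomic integer from the standard library, so that no unsafe code is needed and several threads can update it. The choice of AtomicU32 is an assumption made for this sketch; a Mutex would be another reasonable option.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

// An immutable static holding an atomic: no `static mut`, no `unsafe`.
static COUNTER: AtomicU32 = AtomicU32::new(0);

fn add_to_count(inc: u32) {
    // `fetch_add` is one atomic read-modify-write, so concurrent calls
    // cannot cause a data race.
    COUNTER.fetch_add(inc, Ordering::SeqCst);
}

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|_| thread::spawn(|| add_to_count(3)))
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Four threads each added 3.
    assert_eq!(COUNTER.load(Ordering::SeqCst), 12);
}
```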

Implementing an Unsafe Trait

We can use unsafe to implement an unsafe trait. A trait is unsafe when at least one of its methods has some invariant that the compiler can’t verify. We declare that a trait is unsafe by adding the unsafe keyword before trait and marking the implementation of the trait as unsafe too, as shown in Listing 20-12.

unsafe trait Foo {
    // methods go here
}

unsafe impl Foo for i32 {
    // method implementations go here
}

fn main() {}

By using unsafe impl, we’re promising that we’ll uphold the invariants that the compiler can’t verify.

As an example, recall the Send and Sync marker traits we discussed in the “Extensible Concurrency with Send and Sync” section in Chapter 16: The compiler implements these traits automatically if our types are composed entirely of other types that implement Send and Sync. If we implement a type that contains a type that does not implement Send or Sync, such as raw pointers, and we want to mark that type as Send or Sync, we must use unsafe. Rust can’t verify that our type upholds the guarantees that it can be safely sent across threads or accessed from multiple threads; therefore, we need to do those checks manually and indicate as such with unsafe.
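
A minimal sketch of such a manual implementation follows. The type OwnedInt is hypothetical: it holds a raw pointer to a heap allocation that it uniquely owns, which is the justification offered here for marking it Send. Whether that reasoning holds for a real type has to be checked case by case.

```rust
use std::thread;

/// Hypothetical type that owns a heap-allocated `i32` through a raw pointer.
struct OwnedInt {
    ptr: *mut i32,
}

impl OwnedInt {
    fn new(value: i32) -> Self {
        OwnedInt { ptr: Box::into_raw(Box::new(value)) }
    }

    fn get(&self) -> i32 {
        // SAFETY: `ptr` came from `Box::into_raw` in `new` and is freed only in `drop`.
        unsafe { *self.ptr }
    }
}

impl Drop for OwnedInt {
    fn drop(&mut self) {
        // SAFETY: `ptr` was produced by `Box::into_raw` and is freed exactly once, here.
        unsafe { drop(Box::from_raw(self.ptr)) };
    }
}

// SAFETY: `OwnedInt` is the sole owner of its allocation and shares the
// pointer with no one, so moving it to another thread moves the only access.
unsafe impl Send for OwnedInt {}

fn main() {
    let value = OwnedInt::new(42);
    let handle = thread::spawn(move || value.get());
    assert_eq!(handle.join().unwrap(), 42);
}
```

Without the unsafe impl line, the thread::spawn call is rejected, because raw pointers are not Send and the compiler therefore does not implement Send for OwnedInt automatically.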

访问联合体的字段

Accessing Fields of a Union

最后一个仅在 unsafe 下工作的操作是访问联合体(union)的字段。联合体 (union) 类似于 struct,但在特定实例中一次只使用一个声明的字段。联合体主要用于与 C 代码中的联合体接口。访问联合体字段是不安全的,因为 Rust 无法保证当前存储在联合体实例中的数据的类型。你可以在 Rust 参考手册中了解更多关于联合体的信息。

The final action that works only with unsafe is accessing fields of a union. A union is similar to a struct, but only one declared field is used in a particular instance at one time. Unions are primarily used to interface with unions in C code. Accessing union fields is unsafe because Rust can’t guarantee the type of the data currently being stored in the union instance. You can learn more about unions in the Rust Reference.
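
Here is a short sketch of a union in use. The `IntOrFloat` type and the `float_bits` function are names chosen for this example. Writing a field is safe, but reading one requires `unsafe` because only we know which field was last stored.

```rust
// `#[repr(C)]` gives the union the layout a C compiler would use.
#[repr(C)]
union IntOrFloat {
    i: u32,
    f: f32,
}

// Stores an `f32` and reads the same four bytes back as a `u32`.
fn float_bits(value: f32) -> u32 {
    // Creating a union and writing a field is safe.
    let u = IntOrFloat { f: value };

    // SAFETY: both fields are four bytes of plain data, and every bit
    // pattern is a valid `u32`.
    unsafe { u.i }
}

fn main() {
    let bits = float_bits(1.0);
    println!("{bits:#x}");
}
```

Reading `u.f` after storing `i` would be equally acceptable here, since every bit pattern is also a valid `f32`. A union with a `bool` field, for example, would need more care, because most bit patterns are not valid `bool` values.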

使用 Miri 检查不安全代码

Using Miri to Check Unsafe Code

在编写不安全代码时,你可能想要检查所编写的内容是否确实安全且正确。最好的方法之一是使用 Miri,这是一个用于检测未定义行为的官方 Rust 工具。借用检查器是一个在编译时工作的静态 (static) 工具,而 Miri 是一个在运行时工作的动态 (dynamic) 工具。它通过运行你的程序(或其测试套件)来检查你的代码,并检测你何时违反了它所理解的 Rust 工作规则。

When writing unsafe code, you might want to check that what you have written actually is safe and correct. One of the best ways to do that is to use Miri, an official Rust tool for detecting undefined behavior. Whereas the borrow checker is a static tool that works at compile time, Miri is a dynamic tool that works at runtime. It checks your code by running your program, or its test suite, and detecting when you violate the rules it understands about how Rust should work.

使用 Miri 需要 Rust 的 nightly 版本(我们将在附录 G:Rust 是如何开发的以及“Nightly Rust”中详细讨论)。你可以通过输入 rustup +nightly component add miri 同时安装 nightly 版 Rust 和 Miri 工具。这不会改变你项目使用的 Rust 版本;它只是将工具添加到你的系统中,以便你可以随时使用它。你可以通过输入 cargo +nightly miri run 或 cargo +nightly miri test 在项目上运行 Miri。

Using Miri requires a nightly build of Rust (which we talk about more in Appendix G: How Rust is Made and “Nightly Rust”). You can install both a nightly version of Rust and the Miri tool by typing rustup +nightly component add miri. This does not change what version of Rust your project uses; it only adds the tool to your system so you can use it when you want to. You can run Miri on a project by typing cargo +nightly miri run or cargo +nightly miri test.

为了展示这有多大帮助,看看我们对示例 20-7 运行 Miri 时会发生什么。

For an example of how helpful this can be, consider what happens when we run it against Listing 20-7.

$ cargo +nightly miri run
   Compiling unsafe-example v0.1.0 (file:///projects/unsafe-example)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.01s
     Running `file:///home/.rustup/toolchains/nightly/bin/cargo-miri runner target/miri/debug/unsafe-example`
warning: integer-to-pointer cast
 --> src/main.rs:5:13
  |
5 |     let r = address as *mut i32;
  |             ^^^^^^^^^^^^^^^^^^^ integer-to-pointer cast
  |
  = help: this program is using integer-to-pointer casts or (equivalently) `ptr::with_exposed_provenance`, which means that Miri might miss pointer bugs in this program
  = help: see https://doc.rust-lang.org/nightly/std/ptr/fn.with_exposed_provenance.html for more details on that operation
  = help: to ensure that Miri does not miss bugs in your program, use Strict Provenance APIs (https://doc.rust-lang.org/nightly/std/ptr/index.html#strict-provenance, https://crates.io/crates/sptr) instead
  = help: you can then set `MIRIFLAGS=-Zmiri-strict-provenance` to ensure you are not relying on `with_exposed_provenance` semantics
  = help: alternatively, `MIRIFLAGS=-Zmiri-permissive-provenance` disables this warning
  = note: BACKTRACE:
  = note: inside `main` at src/main.rs:5:13: 5:32

error: Undefined Behavior: pointer not dereferenceable: pointer must be dereferenceable for 40000 bytes, but got 0x1234[noalloc] which is a dangling pointer (it has no provenance)
 --> src/main.rs:7:35
  |
7 |     let values: &[i32] = unsafe { slice::from_raw_parts_mut(r, 10000) };
  |                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Undefined Behavior occurred here
  |
  = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
  = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
  = note: BACKTRACE:
  = note: inside `main` at src/main.rs:7:35: 7:70

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to 1 previous error; 1 warning emitted

Miri 正确地警告我们,我们正在将整数转换为指针,这可能是一个问题,但 Miri 无法确定是否存在问题,因为它不知道指针是如何产生的。然后,由于我们在示例 20-7 中有一个悬垂指针导致了未定义行为,Miri 返回了一个错误。感谢 Miri,我们现在知道存在未定义行为的风险,并且可以思考如何使代码安全。在某些情况下,Miri 甚至可以就如何修复错误提供建议。

Miri correctly warns us that we’re casting an integer to a pointer, which might be a problem, but Miri can’t determine whether a problem exists because it doesn’t know how the pointer originated. Then, Miri returns an error where Listing 20-7 has undefined behavior because we have a dangling pointer. Thanks to Miri, we now know there is a risk of undefined behavior, and we can think about how to make the code safe. In some cases, Miri can even make recommendations about how to fix errors.
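
One possible direction for a fix, sketched under the assumption that we really do want `slice::from_raw_parts_mut`, is to derive the pointer from memory we actually own instead of from an arbitrary integer. The `double_in_place` function is a hypothetical example, not the book's listing.

```rust
use std::slice;

// Doubles every element through a slice rebuilt from a raw pointer.
fn double_in_place(data: &mut Vec<i32>) {
    let len = data.len();
    // The pointer comes from a live allocation, so it is not dangling.
    let ptr = data.as_mut_ptr();

    // SAFETY: `ptr` is valid for `len` initialized elements, and `data`
    // is not accessed through any other path while `values` is in use.
    let values: &mut [i32] = unsafe { slice::from_raw_parts_mut(ptr, len) };

    for value in values.iter_mut() {
        *value *= 2;
    }
}

fn main() {
    let mut data = vec![1, 2, 3];
    double_in_place(&mut data);
    println!("{data:?}");
}
```

In real code, a plain `&mut data[..]` does the same job with no `unsafe` at all. The point of the sketch is only that the pointer and length now describe memory the program owns.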

Miri 并不能捕捉到你在编写不安全代码时可能犯下的所有错误。Miri 是一个动态分析工具,因此它只能捕捉到确实运行的代码中的问题。这意味着你需要结合良好的测试技术来使用它,以增加对所编写的不安全代码的信心。Miri 也不涵盖你的代码可能不健全的每一种可能方式。

Miri doesn’t catch everything you might get wrong when writing unsafe code. Miri is a dynamic analysis tool, so it only catches problems with code that actually gets run. That means you will need to use it in conjunction with good testing techniques to increase your confidence about the unsafe code you have written. Miri also does not cover every possible way your code can be unsound.
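
To illustrate the "code that actually gets run" limitation, consider this sketch. The `nth` function is a hypothetical helper with a latent bug: it never checks its index. Because `main` only calls it with an index that is in bounds, a Miri run of this program has no undefined behavior to observe.

```rust
// Hypothetical helper with a latent bug: `index` is never checked,
// so this safe-looking function is unsound.
fn nth(values: &[i32], index: usize) -> i32 {
    // Undefined behavior whenever `index >= values.len()`.
    unsafe { *values.get_unchecked(index) }
}

fn main() {
    let values = [10, 20, 30];

    // This index is in bounds, so this particular run is well defined.
    println!("{}", nth(&values, 1));

    // A call such as `nth(&values, 3)` would be undefined behavior, but
    // Miri can report it only if some run or test actually executes it.
}
```

A test that exercises an out-of-bounds index would give Miri the chance to flag the problem, which is why test coverage matters so much for unsafe code.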

换句话说:如果 Miri 确实 捕捉到了问题,你就知道存在 bug;但仅仅因为 Miri 没有 捕捉到 bug 并不意味着没有问题。不过,它能捕捉到很多问题。尝试在本章的其他不安全代码示例上运行它,看看它会说什么!

Put another way: If Miri does catch a problem, you know there’s a bug, but just because Miri doesn’t catch a bug doesn’t mean there isn’t a problem. It can catch a lot, though. Try running it on the other examples of unsafe code in this chapter and see what it says!

你可以在 Miri 的 GitHub 仓库中了解更多信息。

You can learn more about Miri at its GitHub repository.

正确地使用不安全代码

Using Unsafe Code Correctly

使用 unsafe 来行使刚才讨论的五种超能力之一并不是错误的,甚至不被反对,但要确保 unsafe 代码正确是比较困难的,因为编译器无法帮助维护内存安全。当你有理由使用 unsafe 代码时,你可以这样做,并且显式的 unsafe 注解使得在发生问题时更容易追踪问题的根源。每当你编写不安全代码时,你可以使用 Miri 来帮助你更有信心地确保你编写的代码遵守了 Rust 的规则。

Using unsafe to use one of the five superpowers just discussed isn’t wrong or even frowned upon, but it is trickier to get unsafe code correct because the compiler can’t help uphold memory safety. When you have a reason to use unsafe code, you can do so, and having the explicit unsafe annotation makes it easier to track down the source of problems when they occur. Whenever you write unsafe code, you can use Miri to help you be more confident that the code you have written upholds Rust’s rules.

为了更深入地探索如何有效地使用不安全 Rust,请阅读 Rust 的官方 unsafe 指南:The Rustonomicon(Rust 死灵书)。

For a much deeper exploration of how to work effectively with unsafe Rust, read Rust’s official guide for unsafe, The Rustonomicon.