This is where some might say rust feels like you are dealing with an escaped asylum patience. So buckle up, this is going to get weird.
The most common smart pointers in the rust std are: Box, Rc, RefCell, Weak
Box is used for allocating values on the heap. Nothing more and nothing less. This is usefull if…
Using a Box is very easy:
fn main() {
let x = Box::new(5);
println!("x: {:?}", x);
}
This is not very useful yet, but just goes to show that Box is very simple in its essence.
Here is a more realistic and applicable use case:
#[derive(Debug)]
struct Node {
value: i32,
next: Option<Node>
}
fn main() {
let list = Node {
value: 3,
next: None
};
println!("list: {:?}", list);
}
This is a recursive type. That is because you can nest the type in itself (f.e. nested arrays). When you try and run this code, you get the following error:
error[E0072]: recursive type `Node` has infinite size
--> src/main.rs:2:1
|
2 | struct Node {
| ^^^^^^^^^^^ recursive type has infinite size
3 | value: i32,
4 | next: Option<Node>
| ------------ recursive without indirection
|
help: insert some indirection (e.g., a `Box`, `Rc`, or `&`) to make `Node` representable
|
4 | next: Box<Option<Node>>
| ++++ +
For more information about this error, try `rustc --explain E0072`.
error: could not compile `rust` due to previous error
"recursive type has inifite size"
is the key sentence here. Thankfully the compiler
already tells us what to do. Wrap the recursive type in a Box<…>. So we get the following code:
#[derive(Debug)]
struct Node {
value: i32,
next: Box<Option<Node>> // <-- box it and ship it
}
fn main() {
let list = Node {
value: 3,
next: Box::new(None)
};
println!("list: {:?}", list);
}
Rc
, short for Reference counted
, is used when you want to enable multiple ownership of a shared
object in memory. At each instance where a new reference is created, the count goes up, and the object
is only cleaned up, if the count goes to 0.
Let’s stick to the JsonValue example
#[derive(Debug)]
struct Node {
value: i32,
next: Box<Option<Node>>
}
fn main() {
let next = Box::new(Some(Node {
value: 5,
next: Box::new(None)
}));
let list = Node {
value: 3,
next: next
};
let list2 = Node {
value: 4,
next: next
};
println!("list: {:?}", list);
println!("list2: {:?}", list2);
}
Here we get this error:
error[E0382]: use of moved value: `next`
--> src/main.rs:19:15
|
8 | let next = Box::new(Some(Node {
| ---- move occurs because `next` has type `Box<Option<Node>>`, which does not implement the `Copy` trait
...
15 | next: next
| ---- value moved here
...
19 | next: next
| ^^^^ value used here after move
For more information about this error, try `rustc --explain E0382`.
error: could not compile `rust` due to previous error
Basically it complaints that we gave up ownership of arr in line 15, so we can not use it again in line 19.
Let’s modify our code to use an Rc.
use std::rc::Rc;
#[derive(Debug)]
struct Node {
value: i32,
next: Rc<Option<Node>>
}
fn main() {
let next = Rc::new(Some(Node {
value: 5,
next: Rc::new(None)
}));
let list = Node {
value: 3,
next: Rc::clone(&next)
};
let list2 = Node {
value: 4,
next: Rc::clone(&next)
};
println!("list: {:?}", list);
println!("list2: {:?}", list2);
}
Now both lists share a reference to next
. next
will not be freed from memory, unless all references
to it (here: next, list, list2) give up their ownership (f.e. by going out of scope, or being freed manually).
You can even print out the reference count.
use std::rc::Rc;
#[derive(Debug)]
struct Node {
value: i32,
next: Rc<Option<Node>>
}
fn main() {
let next = Rc::new(Some(Node {
value: 5,
next: Rc::new(None)
}));
println!("count next = {}", Rc::strong_count(&next));
let list = Node {
value: 3,
next: Rc::clone(&next)
};
println!("count next = {}", Rc::strong_count(&next));
let list2 = Node {
value: 4,
next: Rc::clone(&next)
};
println!("count next = {}", Rc::strong_count(&next));
println!("list: {:?}", list);
println!("list2: {:?}", list2);
}
This gives us
count next = 1
count next = 2
count next = 3
list: Node { value: 3, next: Some(Node { value: 5, next: None }) }
list2: Node { value: 4, next: Some(Node { value: 5, next: None }) }
And here is an example, when the reference count goes down:
use std::rc::Rc;
#[derive(Debug)]
struct Node {
value: i32,
next: Rc<Option<Node>>
}
fn main() {
let next = Rc::new(Some(Node {
value: 5,
next: Rc::new(None)
}));
println!("count next = {}", Rc::strong_count(&next));
let list = Node {
value: 3,
next: Rc::clone(&next)
};
println!("count next = {}", Rc::strong_count(&next));
{
let list2 = Node {
value: 4,
next: Rc::clone(&next)
}; // ref count is increased
println!("count next = {}", Rc::strong_count(&next));
} // list2 ref to next goes out of scope, so the ref count is decreased
println!("count next = {}", Rc::strong_count(&next));
}
count next = 1
count next = 2
count next = 3
count next = 2
So when would I use a Rc? You use it to share heap allocated data in an immutable way. But you can not share data among threads with Rc. Rc is only meant for single-threaded scenarios.
If you now want to share data among multiple references (in a single-threaded scenario again), but still want to be able to mutate the data, RefCell might just be the smart pointer for you.
Usually the borrowing rules that the compiler enforces prevent you from mutating data, that has been shared with another reference to avoid things such as data races and null pointers.
The way RefCell manages to avoid the borrowing rules is by enforcing them at runtime, rather than at compile time. So your code will compile and run, but panic and exit if you dare to break the borrowing rules at any time of the computation.
While it is generally not advised to use unsafe
code and “cheat” the borrowing
rules, one must not forget that the rust compiler is very conservative and tries
to catch as much predictably memory unsafe code as possible at compile time.
But there are certain memory safe operations, that can not be statically analysed,
so rust always gives us the option to opt out of the borrowing rules with unsafe
code and the RefCell pointer.
It is also important to know, that the RefCell allows for interior mutability. Since the pointer is a wrapper for a value, it is possible for the pointer to be immutable, but for the wrapped value to be mutable.
fn main() {
let x = 5;
let mut y = &mut x;
}
This code will not compile, because x is immutable, but y wants a mutable reference.
The way we get around this is by doing the following:
use std::cell::RefCell;
fn main() {
let x = RefCell::new(5);
let mut y = x.borrow_mut();
*y += 1;
// gives us: x / y: 6
println!("x / y: {:?}", y);
}
A practical example for a RefCell is a scenario where you want to mock an object (struct). This is a common practice for testing your structs without having to implement most of the details of the thing you want to test. Mocking is commonly used to abstract network and IO operations. So if you had logic that relies on an API, you can test that logic by mocking the API and immitating the requests to make your tests more modular and faster.
Here we have some code that does this with a Limit Tracker
. This would be useful in a scenario
where you want to limit the number of operations a user is able to perform (such as calls
to your API).
This is the naive implementation, without a RefCell.
pub trait Messenger {
fn send(&self, msg: &str);
}
pub struct LimitTracker<'a, T: Messenger> {
messenger: &'a T,
value: usize,
max: usize,
}
impl<'a, T> LimitTracker<'a, T>
where
T: Messenger,
{
pub fn new(messenger: &T, max: usize) -> LimitTracker<T> {
LimitTracker {
messenger,
value: 0,
max,
}
}
pub fn set_value(&mut self, value: usize) {
self.value = value;
let percentage_of_max = self.value as f64 / self.max as f64;
if percentage_of_max >= 1.0 {
self.messenger.send("Error: You are over your quota!");
} else if percentage_of_max >= 0.9 {
self.messenger
.send("Urgent warning: You've used up over 90% of your quota!");
} else if percentage_of_max >= 0.75 {
self.messenger
.send("Warning: You've used up over 75% of your quota!");
}
}
}
#[cfg(test)]
mod tests {
use super::*;
struct MockMessenger {
sent_messages: Vec<String>,
}
impl MockMessenger {
fn new() -> MockMessenger {
MockMessenger {
sent_messages: vec![],
}
}
}
impl Messenger for MockMessenger {
fn send(&self, message: &str) {
// We are modifying the sent_messages vector
self.sent_messages.push(String::from(message));
}
}
#[test]
fn it_sends_an_over_75_percent_warning_message() {
let mock_messenger = MockMessenger::new();
let mut limit_tracker = LimitTracker::new(&mock_messenger, 100);
limit_tracker.set_value(80);
assert_eq!(mock_messenger.sent_messages.len(), 1);
}
}
Since we are trying to modify the send_messages vector without a &mut
mutable
reference, we get the following error
error[E0596]: cannot borrow `self.sent_messages` as mutable, as it is behind a `&` reference
--> src/lib.rs:57:13
|
2 | fn send(&self, msg: &str);
| ----- help: consider changing that to be a mutable reference: `&mut self`
...
57 | self.sent_messages.push(String::from(message));
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `self` is a `&` reference, so the data it refers to cannot be borrowed as mutable
For more information about this error, try `rustc --explain E0596`.
error: could not compile `rust` due to previous error
Even if we try to do what the compiler suggests (changing the &self
to &mut self
) we will
get an error, since it does not match the signature from the Message trait
. You could change that aswell
and keep listening to the compiler until it compiles and all of your messenger references are
mutable, but I will explain later why this is not a good idea.
To get the RefCell involved, we would rewrite the code in the following way
pub trait Messenger {
fn send(&self, msg: &str);
}
pub struct LimitTracker<'a, T: Messenger> {
messenger: &'a T,
value: usize,
max: usize,
}
impl<'a, T> LimitTracker<'a, T>
where
T: Messenger,
{
pub fn new(messenger: &T, max: usize) -> LimitTracker<T> {
LimitTracker {
messenger,
value: 0,
max,
}
}
pub fn set_value(&mut self, value: usize) {
self.value = value;
let percentage_of_max = self.value as f64 / self.max as f64;
if percentage_of_max >= 1.0 {
self.messenger.send("Error: You are over your quota!");
} else if percentage_of_max >= 0.9 {
self.messenger
.send("Urgent warning: You've used up over 90% of your quota!");
} else if percentage_of_max >= 0.75 {
self.messenger
.send("Warning: You've used up over 75% of your quota!");
}
}
}
#[cfg(test)]
mod tests {
use super::*;
use std::cell::RefCell;
struct MockMessenger {
// Now we use the RefCell to wrap our sent_messages
sent_messages: RefCell<Vec<String>>,
}
impl MockMessenger {
fn new() -> MockMessenger {
MockMessenger {
// Creating a new RefCell
sent_messages: RefCell::new(vec![]),
}
}
}
impl Messenger for MockMessenger {
fn send(&self, message: &str) {
// We borrow a mutable reference to modify the interior
self.sent_messages.borrow_mut().push(String::from(message));
}
}
#[test]
fn it_sends_an_over_75_percent_warning_message() {
let mock_messenger = MockMessenger::new();
let mut limit_tracker = LimitTracker::new(&mock_messenger, 100);
limit_tracker.set_value(80);
// We borrow an immutable reference, since we only read a value
assert_eq!(mock_messenger.sent_messages.borrow().len(), 1);
}
}
So the RefCell allowed us to keep an immutable reference to the messenger, but allows us to mutate our mocked object anyways.
As stated earlier, you might be asking yourself why we would not just use a &mut
everywhere instead of using a seemingly complicated RefCell smart pointer.
The most obvious one is that we are only using the RefCell (which is technically unsafe and can not guarantee that our program will not fail at runtime) in a testing environment, where different rules apply. Since the use case for our limit tracker would realistically rather be used in combination with an API call or a database, we would not have any problem with having to modify values of our Messenger implementation. But in a testing environment we want to avoid exactly that. You do not want any network calls or database operations. You want to test everything as local as possible.
And since our unsafe code would not run in production, it is ok to use a RefCell for tests. This does not mean you may only use RefCell for other purposes. A great example, which we will cover next, is using a RefCell for doubly linkedin lists to avoid cyclic dependencies.
When using Rc or RefCell there is a way to create cyclic dependencies, where two references reference each other, thereby making it impossible for the reference count to reach 0, thus creating a memory leak.
Let’s look at how this would look in practice. First we create a List enum that we can nest (lisp style).
use crate::List::{Cons, Nil};
use std::cell::RefCell;
use std::rc::Rc;
#[derive(Debug)]
enum List {
Cons(i32, RefCell<Rc<List>>),
Nil,
}
impl List {
fn tail(&self) -> Option<&RefCell<Rc<List>>> {
match self {
Cons(_, item) => Some(item),
Nil => None,
}
}
}
fn main() { }
Now, let’s create a cycle
fn main() {
let a = Rc::new(Cons(3, RefCell::new(Rc::new(Nil))));
let b = Rc::new(Cons(5, RefCell::new(Rc::clone(&a))));
if let Some(link) = a.tail() {
*link.borrow_mut() = Rc::clone(&b);
}
println!("tail a: {:?}", a.tail());
}
You can visualize the cycle like this:
As you can see, our call to a’s tail gives us a reference to b, which in turn has a reference to a, and back to b and a and we are left with a stack overflow.
Let’s imagine we have a tree structure, like this
This is how we could implement it in rust
use std::cell::RefCell;
use std::rc::Rc;
#[derive(Debug)]
struct Node {
value: i32,
children: RefCell<Vec<Rc<Node>>>,
}
This way we can work our way down a tree. But there is no way for us to get from a leaf to a branch (children -> parent). But we have to be careful, since this could easily turn into another cyclic dependency:
So to get around this, we can use the Weak
pointer. A Weak
pointer is similar to
a Rc
pointer, but different in that the reference count does not have to be 0 for
the reference to be invalid.
So we can create the following struct:
use std::cell::RefCell;
use std::rc::{Rc, Weak};
#[derive(Debug)]
struct Node {
value: i32,
parent: RefCell<Weak<Node>>,
children: RefCell<Vec<Rc<Node>>>,
}
And how do we use it? Since our parent reference is not guaranteed to be valid, it
will return an Option<Rc<T>>
to us by calling the .upgrade()
method on a Weak
pointer.
fn main() {
let leaf = Rc::new(Node {
value: 3,
parent: RefCell::new(Weak::new()),
children: RefCell::new(vec![]),
});
println!("leaf parent = {:?}", leaf.parent.borrow().upgrade());
let branch = Rc::new(Node {
value: 5,
parent: RefCell::new(Weak::new()),
children: RefCell::new(vec![Rc::clone(&leaf)]),
});
// turn our Rc into a Weak pointer
*leaf.parent.borrow_mut() = Rc::downgrade(&branch);
// upgrade turns our Weak pointer into a Rc
println!("leaf parent = {:?}", leaf.parent.borrow().upgrade());
}
Smart Pointer | Immutable borrow | Mutable borrow | checked at | safe*? | express ownership |
---|---|---|---|---|---|
Box | Yes | Yes | compile time | safe | yes |
Rc | Yes | No | compile time | safe | yes |
Weak | Yes | No | compile time | safe | no |
RefCell | Yes | Yes | runtime | unsafe | yes |
*safe in that the program will not panic at runtime.
Navigating the intricates of memory management is definitely not unavoidable. It is a super important topic, and it is important for a programming language to give us the right tools to express those flows.