home  >  software  >  rusts-complexity  >  smart-pointers

Exploring Rust’s Complexity

Smart Pointers

This is where some might say rust feels like you are dealing with an escaped asylum patience. So buckle up, this is going to get weird.

The most common smart pointers in the rust std are: Box, Rc, RefCell, Weak

Box

Box is used for allocating values on the heap. Nothing more and nothing less. This is usefull if…

Using a Box is very easy:

fn main() {
    let x = Box::new(5);
    println!("x: {:?}", x);
}

This is not very useful yet, but just goes to show that Box is very simple in its essence.

Here is a more realistic and applicable use case:

#[derive(Debug)]
struct Node {
    value: i32,
    next: Option<Node>
}

fn main() {
    let list = Node {
        value: 3,
        next: None
    };
    println!("list: {:?}", list);
}

This is a recursive type. That is because you can nest the type in itself (f.e. nested arrays). When you try and run this code, you get the following error:

error[E0072]: recursive type `Node` has infinite size
 --> src/main.rs:2:1
  |
2 | struct Node {
  | ^^^^^^^^^^^ recursive type has infinite size
3 |     value: i32,
4 |     next: Option<Node>
  |           ------------ recursive without indirection
  |
help: insert some indirection (e.g., a `Box`, `Rc`, or `&`) to make `Node` representable
  |
4 |     next: Box<Option<Node>>
  |           ++++            +

For more information about this error, try `rustc --explain E0072`.
error: could not compile `rust` due to previous error

"recursive type has inifite size" is the key sentence here. Thankfully the compiler already tells us what to do. Wrap the recursive type in a Box<…>. So we get the following code:

#[derive(Debug)]
struct Node {
    value: i32,
    next: Box<Option<Node>> // <-- box it and ship it
}

fn main() {
    let list = Node {
        value: 3,
        next: Box::new(None)
    };
    println!("list: {:?}", list);
}

Rc

Rc, short for Reference counted, is used when you want to enable multiple ownership of a shared object in memory. At each instance where a new reference is created, the count goes up, and the object is only cleaned up, if the count goes to 0.

Let’s stick to the JsonValue example

#[derive(Debug)]
struct Node {
    value: i32,
    next: Box<Option<Node>>
}

fn main() {
    let next = Box::new(Some(Node {
        value: 5,
        next: Box::new(None)
    }));

    let list = Node {
        value: 3,
        next: next
    };
    let list2 = Node {
        value: 4,
        next: next
    };
    println!("list: {:?}", list);
    println!("list2: {:?}", list2);
}

Here we get this error:

error[E0382]: use of moved value: `next`
  --> src/main.rs:19:15
   |
8  |     let next = Box::new(Some(Node {
   |         ---- move occurs because `next` has type `Box<Option<Node>>`, which does not implement the `Copy` trait
...
15 |         next: next
   |               ---- value moved here
...
19 |         next: next
   |               ^^^^ value used here after move

For more information about this error, try `rustc --explain E0382`.
error: could not compile `rust` due to previous error

Basically it complaints that we gave up ownership of arr in line 15, so we can not use it again in line 19.

Let’s modify our code to use an Rc.

use std::rc::Rc;

#[derive(Debug)]
struct Node {
    value: i32,
    next: Rc<Option<Node>>
}

fn main() {
    let next = Rc::new(Some(Node {
        value: 5,
        next: Rc::new(None)
    }));

    let list = Node {
        value: 3,
        next: Rc::clone(&next)
    };
    let list2 = Node {
        value: 4,
        next: Rc::clone(&next)
    };
    println!("list: {:?}", list);
    println!("list2: {:?}", list2);
}

Now both lists share a reference to next. next will not be freed from memory, unless all references to it (here: next, list, list2) give up their ownership (f.e. by going out of scope, or being freed manually).

You can even print out the reference count.

use std::rc::Rc;

#[derive(Debug)]
struct Node {
    value: i32,
    next: Rc<Option<Node>>
}

fn main() {
    let next = Rc::new(Some(Node {
        value: 5,
        next: Rc::new(None)
    }));
    println!("count next = {}", Rc::strong_count(&next));

    let list = Node {
        value: 3,
        next: Rc::clone(&next)
    };
    println!("count next = {}", Rc::strong_count(&next));
    let list2 = Node {
        value: 4,
        next: Rc::clone(&next)
    };
    println!("count next = {}", Rc::strong_count(&next));
    println!("list: {:?}", list);
    println!("list2: {:?}", list2);
}

This gives us

count next = 1
count next = 2
count next = 3
list: Node { value: 3, next: Some(Node { value: 5, next: None }) }
list2: Node { value: 4, next: Some(Node { value: 5, next: None }) }

And here is an example, when the reference count goes down:

use std::rc::Rc;

#[derive(Debug)]
struct Node {
    value: i32,
    next: Rc<Option<Node>>
}

fn main() {
    let next = Rc::new(Some(Node {
        value: 5,
        next: Rc::new(None)
    }));
    println!("count next = {}", Rc::strong_count(&next));

    let list = Node {
        value: 3,
        next: Rc::clone(&next)
    };
    println!("count next = {}", Rc::strong_count(&next));
    {
        let list2 = Node {
            value: 4,
            next: Rc::clone(&next)
        }; // ref count is increased
        println!("count next = {}", Rc::strong_count(&next));
    } // list2 ref to next goes out of scope, so the ref count is decreased
    println!("count next = {}", Rc::strong_count(&next));
}
count next = 1
count next = 2
count next = 3
count next = 2

So when would I use a Rc? You use it to share heap allocated data in an immutable way. But you can not share data among threads with Rc. Rc is only meant for single-threaded scenarios.

RefCell

If you now want to share data among multiple references (in a single-threaded scenario again), but still want to be able to mutate the data, RefCell might just be the smart pointer for you.

Usually the borrowing rules that the compiler enforces prevent you from mutating data, that has been shared with another reference to avoid things such as data races and null pointers.

The way RefCell manages to avoid the borrowing rules is by enforcing them at runtime, rather than at compile time. So your code will compile and run, but panic and exit if you dare to break the borrowing rules at any time of the computation.

While it is generally not advised to use unsafe code and “cheat” the borrowing rules, one must not forget that the rust compiler is very conservative and tries to catch as much predictably memory unsafe code as possible at compile time. But there are certain memory safe operations, that can not be statically analysed, so rust always gives us the option to opt out of the borrowing rules with unsafe code and the RefCell pointer.

It is also important to know, that the RefCell allows for interior mutability. Since the pointer is a wrapper for a value, it is possible for the pointer to be immutable, but for the wrapped value to be mutable.

fn main() {
    let x = 5;
    let mut y = &mut x;
}

This code will not compile, because x is immutable, but y wants a mutable reference.

The way we get around this is by doing the following:

use std::cell::RefCell;

fn main() {
    let x = RefCell::new(5);
    let mut y = x.borrow_mut();
    *y += 1;

    // gives us: x / y: 6
    println!("x / y: {:?}", y);
}

A practical example for a RefCell is a scenario where you want to mock an object (struct). This is a common practice for testing your structs without having to implement most of the details of the thing you want to test. Mocking is commonly used to abstract network and IO operations. So if you had logic that relies on an API, you can test that logic by mocking the API and immitating the requests to make your tests more modular and faster.

Here we have some code that does this with a Limit Tracker. This would be useful in a scenario where you want to limit the number of operations a user is able to perform (such as calls to your API).

This is the naive implementation, without a RefCell.

pub trait Messenger {
    fn send(&self, msg: &str);
}

pub struct LimitTracker<'a, T: Messenger> {
    messenger: &'a T,
    value: usize,
    max: usize,
}

impl<'a, T> LimitTracker<'a, T>
where
    T: Messenger,
{
    pub fn new(messenger: &T, max: usize) -> LimitTracker<T> {
        LimitTracker {
            messenger,
            value: 0,
            max,
        }
    }

    pub fn set_value(&mut self, value: usize) {
        self.value = value;

        let percentage_of_max = self.value as f64 / self.max as f64;

        if percentage_of_max >= 1.0 {
            self.messenger.send("Error: You are over your quota!");
        } else if percentage_of_max >= 0.9 {
            self.messenger
                .send("Urgent warning: You've used up over 90% of your quota!");
        } else if percentage_of_max >= 0.75 {
            self.messenger
                .send("Warning: You've used up over 75% of your quota!");
        }
    }
}
#[cfg(test)]
mod tests {
    use super::*;

    struct MockMessenger {
        sent_messages: Vec<String>,
    }

    impl MockMessenger {
        fn new() -> MockMessenger {
            MockMessenger {
                sent_messages: vec![],
            }
        }
    }

    impl Messenger for MockMessenger {
        fn send(&self, message: &str) {
            // We are modifying the sent_messages vector
            self.sent_messages.push(String::from(message));
        }
    }

    #[test]
    fn it_sends_an_over_75_percent_warning_message() {
        let mock_messenger = MockMessenger::new();
        let mut limit_tracker = LimitTracker::new(&mock_messenger, 100);

        limit_tracker.set_value(80);

        assert_eq!(mock_messenger.sent_messages.len(), 1);
    }
}

Since we are trying to modify the send_messages vector without a &mut mutable reference, we get the following error

error[E0596]: cannot borrow `self.sent_messages` as mutable, as it is behind a `&` reference
  --> src/lib.rs:57:13
   |
2  |     fn send(&self, msg: &str);
   |             ----- help: consider changing that to be a mutable reference: `&mut self`
...
57 |             self.sent_messages.push(String::from(message));
   |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `self` is a `&` reference, so the data it refers to cannot be borrowed as mutable

For more information about this error, try `rustc --explain E0596`.
error: could not compile `rust` due to previous error

Even if we try to do what the compiler suggests (changing the &self to &mut self) we will get an error, since it does not match the signature from the Message trait. You could change that aswell and keep listening to the compiler until it compiles and all of your messenger references are mutable, but I will explain later why this is not a good idea.

To get the RefCell involved, we would rewrite the code in the following way

pub trait Messenger {
    fn send(&self, msg: &str);
}

pub struct LimitTracker<'a, T: Messenger> {
    messenger: &'a T,
    value: usize,
    max: usize,
}

impl<'a, T> LimitTracker<'a, T>
where
    T: Messenger,
{
    pub fn new(messenger: &T, max: usize) -> LimitTracker<T> {
        LimitTracker {
            messenger,
            value: 0,
            max,
        }
    }

    pub fn set_value(&mut self, value: usize) {
        self.value = value;

        let percentage_of_max = self.value as f64 / self.max as f64;

        if percentage_of_max >= 1.0 {
            self.messenger.send("Error: You are over your quota!");
        } else if percentage_of_max >= 0.9 {
            self.messenger
                .send("Urgent warning: You've used up over 90% of your quota!");
        } else if percentage_of_max >= 0.75 {
            self.messenger
                .send("Warning: You've used up over 75% of your quota!");
        }
    }
}
#[cfg(test)]
mod tests {
    use super::*;
    use std::cell::RefCell;

    struct MockMessenger {
        // Now we use the RefCell to wrap our sent_messages
        sent_messages: RefCell<Vec<String>>,
    }

    impl MockMessenger {
        fn new() -> MockMessenger {
            MockMessenger {
                // Creating a new RefCell
                sent_messages: RefCell::new(vec![]),
            }
        }
    }

    impl Messenger for MockMessenger {
        fn send(&self, message: &str) {
            // We borrow a mutable reference to modify the interior
            self.sent_messages.borrow_mut().push(String::from(message));
        }
    }

    #[test]
    fn it_sends_an_over_75_percent_warning_message() {
        let mock_messenger = MockMessenger::new();
        let mut limit_tracker = LimitTracker::new(&mock_messenger, 100);

        limit_tracker.set_value(80);

        // We borrow an immutable reference, since we only read a value
        assert_eq!(mock_messenger.sent_messages.borrow().len(), 1);
    }
}

So the RefCell allowed us to keep an immutable reference to the messenger, but allows us to mutate our mocked object anyways.

As stated earlier, you might be asking yourself why we would not just use a &mut everywhere instead of using a seemingly complicated RefCell smart pointer.

The most obvious one is that we are only using the RefCell (which is technically unsafe and can not guarantee that our program will not fail at runtime) in a testing environment, where different rules apply. Since the use case for our limit tracker would realistically rather be used in combination with an API call or a database, we would not have any problem with having to modify values of our Messenger implementation. But in a testing environment we want to avoid exactly that. You do not want any network calls or database operations. You want to test everything as local as possible.

And since our unsafe code would not run in production, it is ok to use a RefCell for tests. This does not mean you may only use RefCell for other purposes. A great example, which we will cover next, is using a RefCell for doubly linkedin lists to avoid cyclic dependencies.

A note on cyclic dependencies

When using Rc or RefCell there is a way to create cyclic dependencies, where two references reference each other, thereby making it impossible for the reference count to reach 0, thus creating a memory leak.

Let’s look at how this would look in practice. First we create a List enum that we can nest (lisp style).

use crate::List::{Cons, Nil};
use std::cell::RefCell;
use std::rc::Rc;

#[derive(Debug)]
enum List {
    Cons(i32, RefCell<Rc<List>>),
    Nil,
}

impl List {
    fn tail(&self) -> Option<&RefCell<Rc<List>>> {
        match self {
            Cons(_, item) => Some(item),
            Nil => None,
        }
    }
}

fn main() { }

Now, let’s create a cycle

fn main() {
    let a = Rc::new(Cons(3, RefCell::new(Rc::new(Nil))));
    let b = Rc::new(Cons(5, RefCell::new(Rc::clone(&a))));

    if let Some(link) = a.tail() {
        *link.borrow_mut() = Rc::clone(&b);
    }

    println!("tail a: {:?}", a.tail());
}

You can visualize the cycle like this:

flowchart LR subgraph a aa[3] ab[tail] end subgraph b ba[5] bb[tail] end ab --> b bb --> a

As you can see, our call to a’s tail gives us a reference to b, which in turn has a reference to a, and back to b and a and we are left with a stack overflow.

Weak

Let’s imagine we have a tree structure, like this

flowchart LR A -- children --> B A -- children --> C

This is how we could implement it in rust

use std::cell::RefCell;
use std::rc::Rc;

#[derive(Debug)]
struct Node {
    value: i32,
    children: RefCell<Vec<Rc<Node>>>,
}

This way we can work our way down a tree. But there is no way for us to get from a leaf to a branch (children -> parent). But we have to be careful, since this could easily turn into another cyclic dependency:

flowchart LR A -- children --> B A -- children --> C B -- parent --> A C -- parent --> A

So to get around this, we can use the Weak pointer. A Weak pointer is similar to a Rc pointer, but different in that the reference count does not have to be 0 for the reference to be invalid.

So we can create the following struct:

use std::cell::RefCell;
use std::rc::{Rc, Weak};

#[derive(Debug)]
struct Node {
    value: i32,
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

And how do we use it? Since our parent reference is not guaranteed to be valid, it will return an Option<Rc<T>> to us by calling the .upgrade() method on a Weak pointer.

fn main() {
    let leaf = Rc::new(Node {
        value: 3,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![]),
    });

    println!("leaf parent = {:?}", leaf.parent.borrow().upgrade());

    let branch = Rc::new(Node {
        value: 5,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![Rc::clone(&leaf)]),
    });

    // turn our Rc into a Weak pointer
    *leaf.parent.borrow_mut() = Rc::downgrade(&branch);

    // upgrade turns our Weak pointer into a Rc
    println!("leaf parent = {:?}", leaf.parent.borrow().upgrade());
}

Recap

Smart Pointer Immutable borrow Mutable borrow checked at safe*? express ownership
Box Yes Yes compile time safe yes
Rc Yes No compile time safe yes
Weak Yes No compile time safe no
RefCell Yes Yes runtime unsafe yes

*safe in that the program will not panic at runtime.

Accidental Or Unavoidable?

Navigating the intricates of memory management is definitely not unavoidable. It is a super important topic, and it is important for a programming language to give us the right tools to express those flows.