If you have stumbled upon this post, you have either already used Rust or are interested in how different programming languages handle concurrency. Rust is different: it has no garbage collector, yet it promises memory safety. Those are big promises, but Rust delivers on them at compile time by making sure that every piece of data has an owner and is destroyed when its owner's scope ends, unless ownership has been transferred first. And believe me, this is what guarantees memory safety.
Rust is designed around two problems:
- How to make programs memory safe
- How to make concurrency smooth without thinking much about abstraction
Both problems come down to the same thing: code accessing data when it should not. Memory safety bugs and concurrency bugs usually arise when multiple parts of a program access the same data. Rust’s secret weapon here is ownership: data can be owned by only a single scope at any given time. Ownership can be transferred at any time, but once it is, the scope that gave it away can no longer use the data. This is a discipline that systems programmers mostly follow by hand, but Rust’s compiler checks it statically for you.
This means that a programmer can work without a garbage collector and without fear of segfaults, because those mistakes are caught at compile time by the Rust compiler, rustc.
For concurrency, this means you can choose from a wide variety of paradigms, be it message passing, shared state, lock-free, or purely functional, and Rust will help you avoid common pitfalls.
Concurrency in Rust
In Rust, when you send a message over a channel, ownership of the message is transferred along with it, so you do not have to worry about the message still being read or mutated by the sending thread. Rust’s channels thereby enforce thread isolation.
Apart from ownership, Rust also has the concept of borrowing, which comes in two forms: borrow-to-read-only and borrow-to-mutate. A piece of data can be borrowed read-only by multiple threads simultaneously, but then it cannot be mutated. In the case of borrow-to-mutate, Rust guarantees that only one borrow is active at any given time for the given data.
A lock knows what data it protects, and Rust guarantees that the data can only be accessed when the lock is held. State is never accidentally shared. “Lock data, not code” is strictly enforced in Rust.
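To make "lock data, not code" concrete, here is a minimal sketch using the standard library's Mutex and Arc. The helper name count_with_lock is made up for this illustration; the point is only that the counter lives inside the lock and cannot be reached without holding it.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical helper (not from the original post): spawns `workers`
// threads that each add 1 to a counter protected by a Mutex.
fn count_with_lock(workers: usize) -> usize {
    // The Mutex owns the counter; the only way to reach it is `lock()`.
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                // `lock()` returns a guard; the data is reachable only through it.
                let mut guard = counter.lock().unwrap();
                *guard += 1;
                // The guard is dropped here, which releases the lock.
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", count_with_lock(4));
}
```

There is no way to write to the counter while forgetting to take the lock: the integer is simply not accessible outside the guard.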
Rust enforces safe usage of data types: every data type knows whether it can safely be sent to or shared between multiple threads. This guarantees that there are no data races, even for data that is not protected by a lock. Thread safety isn’t just documentation; it’s law.
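As a rough sketch of sharing without a lock, the snippet below shares a read-only vector between threads through the standard library's Arc. The helper name shared_max is an assumption made for this example. Swapping Arc for the single-threaded Rc would be rejected by the compiler, because Rc is not marked as safe to send between threads.

```rust
use std::sync::Arc;
use std::thread;

// Hypothetical helper (not from the original post): every reader thread
// computes the maximum of the same shared, read-only vector.
fn shared_max(data: Vec<i32>, readers: usize) -> Vec<Option<i32>> {
    // Arc is a thread-safe reference-counted pointer; the data behind it
    // is immutable, so no lock is needed for reading.
    let shared = Arc::new(data);

    let handles: Vec<_> = (0..readers)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || shared.iter().copied().max())
        })
        .collect();

    // Collect each thread's result in spawn order.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    println!("{:?}", shared_max(vec![3, 7, 5], 2));
}
```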
Rust allows sharing even stack frames between threads, and Rust will statically ensure that the frames remain active while other threads are using them. Even the most daring forms of sharing are guaranteed safe in Rust.
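One way to try this in current Rust is std::thread::scope, available in the standard library since Rust 1.63. The sketch below is only an illustration (the helper name parallel_sum is made up): two threads borrow slices that live in the caller's stack frame, and the compiler accepts it because the scope cannot return before both threads have finished.

```rust
use std::thread;

// Hypothetical helper (not from the original post): sums the two halves
// of a slice on two threads that borrow directly from the caller.
fn parallel_sum(data: &[i32]) -> i32 {
    let (left, right) = data.split_at(data.len() / 2);

    // `thread::scope` joins every spawned thread before it returns,
    // so the borrows of `left` and `right` cannot outlive this function.
    thread::scope(|s| {
        let a = s.spawn(|| left.iter().sum::<i32>());
        let b = s.spawn(|| right.iter().sum::<i32>());
        a.join().unwrap() + b.join().unwrap()
    })
}

fn main() {
    let numbers = [1, 2, 3, 4, 5];
    println!("{}", parallel_sum(&numbers));
}
```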
None of these paradigms (locks, channels, lock-free data structures) are defined by the Rust language itself; they live in libraries, which means Rust is open to other paradigms too. These benefits come from Rust’s ownership model, which is at the core of Rust.
Ownership
In Rust, every piece of data is owned by a scope, and passing or returning data means transferring ownership to a new scope. Data that is still owned when a scope ends is automatically destroyed/de-allocated at that point.
Let’s look at a simple example:
fn make_arr() {
    let mut arr = Vec::new(); // it's now owned by make_arr's scope
    arr.push(0);
    arr.push(1);
    // `arr` is de-allocated/destroyed at the end of the scope
}
The scope that creates data also initially owns it. In this case, the body of make_arr owns arr. The owner can do anything it likes with arr, including mutating it by pushing. At the end of the scope, arr is still owned, so it is automatically deallocated. To hand the vector over instead, the function returns it, which transfers ownership to the caller.
fn make_arr() -> Vec<i32> {
    let mut vec = Vec::new();
    vec.push(0);
    vec.push(1);
    vec // this transfers ownership to the caller
}
fn print_arr(vec: Vec<i32>) {
    // the `vec` parameter is part of this scope, so it's owned by `print_arr`
    for i in vec.iter() {
        println!("{}", i);
    }
    // since `vec` is not returned, it is de-allocated
}
fn use_vec() {
    let vec = make_arr(); // take ownership of the vector
    print_arr(vec); // pass ownership to `print_arr`
}
So when make_arr is called and its scope ends, vec is moved to use_vec by being returned; it is not de-allocated. Then the ownership of vec is transferred to print_arr. Since print_arr does not transfer ownership any further, vec is de-allocated at the end of the scope of print_arr.
Once ownership has been given away, a value can no longer be used. For example, consider the following variant of use_vec:
fn use_vec() {
    let vec = make_arr(); // take ownership of the vector
    print_arr(vec); // pass ownership to `print_arr`
    for i in vec.iter() { // continue using `vec`
        println!("{}", i * 2);
    }
}
This does not compile, because vec has already been transferred to print_arr and was not transferred back to use_vec when print_arr's scope ended. The result is a compile error:
error: use of moved value: `vec`
for i in vec.iter() {
^~~
And that’s very good, because the vector has already been deallocated at this point! Disaster averted.
Borrowing/Lending
We may not have intended print_arr to destroy vec. What we really wanted was to give print_arr temporary access to vec and then continue using it afterwards. This is where Rust’s concept of borrowing/lending comes into the picture. If you own a piece of data, you can lend access to it to the functions you call.
To borrow a value, you make a reference to it, using the & operator:
fn use_vec() {
    let vec = make_arr(); // take ownership of the vector
    print_arr(&vec); // lend access to `print_arr`
    for i in vec.iter() { // continue using `vec`
        println!("{}", i * 2);
    }
    // vec is destroyed here
}
Now print_arr takes a reference to a vec (its parameter type becomes &Vec<i32> instead of Vec<i32>), and use_vec lends out the vec by writing &vec. Since borrows are temporary, use_vec retains ownership of the vec; it can continue using it after the call to print_arr returns. The lease expires once the lessee returns.
By default, this kind of lending is immutable, meaning that the lessee cannot mutate the data in question; it can only read it. A variable holding data can be leased out to many scopes simultaneously, since each lease is read-only.
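Putting the pieces together, here is a small self-contained sketch of read-only lending. It reuses the names make_arr, print_arr and use_vec from above, but the return values are additions made for this illustration so that the effect can be checked.

```rust
fn make_arr() -> Vec<i32> {
    vec![0, 1]
}

// Takes a shared (read-only) reference, so the caller keeps ownership.
// Returning the number of printed items is an addition for this sketch.
fn print_arr(vec: &Vec<i32>) -> usize {
    for i in vec.iter() {
        println!("{}", i);
    }
    vec.len()
}

fn use_vec() -> Vec<i32> {
    let vec = make_arr(); // take ownership of the vector
    let printed = print_arr(&vec); // lend read-only access

    // Several read-only borrows may be alive at the same time.
    let a = &vec;
    let b = &vec;
    assert_eq!(a.len() + b.len(), 2 * printed);

    // `vec` is still owned here, so it can keep being used.
    vec.iter().map(|i| i * 2).collect()
}

fn main() {
    println!("{:?}", use_vec());
}
```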
Rust also allows lending that permits mutation (mutable references), but only under the following conditions:
- There can be only one active mutable reference at any given time
- No other references, not even read-only ones, are allowed at the same time
Rust checks these rules at compile time; borrowing has no runtime overhead. To lease a variable with mutation allowed, we use &mut T. To see why Rust does not allow any other reference alongside a mutable one, consider the following example:
fn push_all(from: &Vec<i32>, to: &mut Vec<i32>) {
    for i in from.iter() {
        to.push(*i);
    }
}
This function iterates over each element of one vector, pushing it onto another. The iterator keeps a pointer into the vector at the current and final positions, stepping one toward the other. What if someone called this function with the same vector for both arguments?
push_all(&vec, &mut vec)
This would spell disaster! As we’re pushing elements onto the vector, it will occasionally need to resize, allocating a new hunk of memory and copying its elements over to it. The iterator would be left with a dangling pointer into the old memory, leading to memory unsafety (with attendant segfaults or worse).
Fortunately, Rust ensures that whenever a mutable borrow is active, no other borrows of the object are active, producing the message:
error: cannot borrow `vec` as mutable because it is also borrowed as immutable
push_all(&vec, &mut vec);
^~~
Disaster averted.
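For completeness, here is a sketch of how the same function can be called in ways the compiler accepts. The helper append_to_self is an assumption made for this example; it shows one possible workaround (copying first) when you really do want to append a vector to itself.

```rust
fn push_all(from: &Vec<i32>, to: &mut Vec<i32>) {
    for i in from.iter() {
        to.push(*i);
    }
}

// Hypothetical helper (not from the original post): appends a vector to
// itself by iterating over a copy, so the read-only borrow and the
// mutable borrow point at two different vectors.
fn append_to_self(vec: &mut Vec<i32>) {
    let snapshot = vec.clone();
    push_all(&snapshot, vec);
}

fn main() {
    // Two distinct vectors: one read-only borrow, one mutable borrow.
    let source = vec![1, 2];
    let mut target = vec![9];
    push_all(&source, &mut target);
    println!("{:?}", target);

    let mut doubled = vec![1, 2];
    append_to_self(&mut doubled);
    println!("{:?}", doubled);
}
```

The copy costs an allocation, which is the price of keeping the iterator away from memory that may be reallocated.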
Message passing
Now that we’ve covered the basic ownership model in Rust, let’s see what it means for concurrency.
Concurrent programming comes in many styles, but a particularly simple one is message passing, where threads or actors or coroutines communicate by sending each other messages. Proponents of the style emphasize the way that it ties together sharing and communication, often summed up by the slogan from Effective Go: "Do not communicate by sharing memory; instead, share memory by communicating."
Rust’s ownership makes it easy to turn that advice into a compiler-checked rule. Consider the following example, which uses the channels from Rust’s standard library:
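use std::sync::mpsc::channel;
use std::thread;

fn main() {
    // Create a simple streaming channel
    let (tx, rx) = channel();
    // `move` transfers ownership of `tx` into the spawned thread
    thread::spawn(move || {
        tx.send(10).unwrap();
    });
    // Block until the value sent by the other thread arrives
    assert_eq!(rx.recv().unwrap(), 10);
}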
Channels are generic over the type of data they transmit. For a value to be sent through a channel, its type must be considered safe to send between threads (in Rust terms, it must implement the Send trait); the compiler rejects anything else.
As always in Rust, passing in a T to the send function means transferring ownership of it. This fact has profound consequences: it means that code like the following will generate a compiler error.
let mut vec = Vec::new();
// do some computation
let (tx, rx) = channel();
tx.send(vec); // ownership of `vec` moves into the channel here
print_arr(&vec); // error: `vec` has already been moved
Here, the thread creates a vector, sends it to another thread, and then continues using it. The thread receiving the vector could mutate it as this thread continues running, so the call to print_arr could lead to a race condition or, for that matter, a use-after-free bug. Instead, the Rust compiler produces an error message on the call to print_arr:
Error: use of moved value `vec`
Disaster averted.
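A version the compiler does accept simply lets the receiving thread do the work on the vector it now owns. The sketch below assumes a made-up helper, send_and_sum, and uses the standard library's mpsc channel.

```rust
use std::sync::mpsc::channel;
use std::thread;

// Hypothetical helper (not from the original post): hands a vector to a
// worker thread over a channel and returns the sum the worker computed.
fn send_and_sum(vec: Vec<i32>) -> i32 {
    let (tx, rx) = channel();

    let worker = thread::spawn(move || {
        // The worker becomes the sole owner of the received vector.
        let received: Vec<i32> = rx.recv().unwrap();
        received.iter().sum::<i32>()
    });

    // Ownership of `vec` moves into the channel; this thread can no
    // longer touch it, so there is nothing left to race on.
    tx.send(vec).unwrap();

    worker.join().unwrap()
}

fn main() {
    println!("{}", send_and_sum(vec![1, 2, 3]));
}
```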
In today’s Rust, concurrency is entirely a library matter; everything described in this post is defined in the standard library, and could be defined in an external library as well.
About The Author
I am Pankaj Baagwan, a System Design Architect. A Computer Scientist by heart, process enthusiast, and open source author/contributor/writer. Advocates Karma. Love working with cutting edge, fascinating, open source technologies.
To consult Pankaj Bagwan on System Design, Cyber Security and Application Development, SEO and SMO, please reach out at me[at]bagwanpankaj[dot]com
Stay tuned <3. Signing off for RAAM