RUST for Java Developers
1. Introduction
Rust is a relatively new system programming language with a lot of hypes around it. Some people love Rust, others hate it, and some do these two at the same time. There is one thing you must not do: ignore it. Even if you identify yourself as a Java developer, you should know a little what Rust is.
In this article I will summarize what the most important features of Rust from a Java developer’s point of view. I will touch only lightly the obvious differences to Java, like natively versus virtual machine compiled. My focus will be more on the features that are special to Rust and are hard to grasp for a Java developer.
There is nothing in this article you cannot find somewhere else. Rust has good documentation, if you know Rust, you will find nothing special here. Reading this article, however, may save some time to get an overall understanding of what Rust is and why it is important.
2. Disclaimer
I am a Java developer, and I know Java fairly well, but I am not a Rust expert. I have never participated in a commercial Rust project. I have created a simple interpreter skeleton to learn the language, but that is all. A bit more than a hello world.
3. Low Level Programming Language
Rust is a low-level programming language. It is designed to write code for tools, operating systems and typically not for business applications. From this perspective it is very much like C.
On the other hand, the language is fairly abstract. While C can be considered as a portable readable assembly language, Rust cannot. Rust has very high-level coding abstractions which all compile to low-level code. Rust calls it zero-cost abstraction. Any abstract code has a clear way to translate into machine code, and the developer has the possibility to opt out of the abstraction.
In other words, zero-cost abstraction aims to generate as performant code as C providing programming efficiency through abstract coding structures. In my opinion, Rust fulfills this promise.
4. Compiled
Rust is compiled using the LLVM compiler tool. The generated code is directly executable on the target machine without any supporting run-time.
It means you need different versions of the compiled packages for different architectures, and the compile time may be significantly more than in the case of Java.
On the other hand, the code starts up fast; there is nothing like the JVM warm-up time.
5. No null
pointer
In Rust there is no null pointer. The language is designed in a way that it is not possible to have a pointer that points to nowhere.
Null pointers have caused a lot of issues in modern programming languages since 1965. It is called the billion-dollar mistake.
Note
|
This term was coined by Tony Hoare, a British computer scientist who introduced the null reference in 1965 while designing the ALGOL W programming language. |
In my opinion, computing power and compiler technology were not ready for languages like Rust to avoid null pointers from the start. So, I think it was not a mistake per se. Compiler technology and CPU speed developed by time, so today it is possible to create compilers that can compile languages without null pointers.
Note
|
At the time Algol was developed, there was another programming language in the works. It was FORTRAN. Algol was a high-abstraction language, while FORTRAN focused on practicalities. FORTRAN still exists and is used in many scientific projects. Algol, on the other hand, is not widely used. You may correct me, but I do not know any practical application. It lives on in the ideas that appear in different other programming languages. Algol was too early, even permitting null pointers. |
Eliminating null pointers and handling them properly is a continuous effort.
Java introduced Optional
, Kotlin the ?
postfix operator.
Rust it is part of the language, and the design is coherent with the other language constructs and the run-time.
You will never get a NPE in Rust unless you explicitly escape safety using safe
blocks.
6. Preprocessor, Macro
Rust, similar to C, has a preprocessing step.
A preprocessor is a processing step that transforms some source code to another. Usually it precedes lexical and syntax analysis.
The C preprocessor works on a character level before the lexical analysis. This is again something that comes from the early CPU limitations. With the available CPU power, it was a good choice. It can be implemented independently of the compiler, executed as a separate process. The integration on the file level. The preprocessor reads the source code as a file and writes the preprocessed output to be consumed by the compiler.
Half-century later, we can see that a character level preprocessor can be abused. There are a lot of examples that use the preprocessor not well.
Rust decided to apply the preprocessor at a different stage in the compilation. It reads the source code, does lexical analysis, and then starts the macro processing. The processing is not a separate process. It is part of the compiler itself. That way the integration is tighter, there is a higher cohesion between the compiler and the preprocessor. You define the preprocessing using a "macro" language that works with lexical units instead of characters.
Note
|
An interesting, and for me personally appealing twist in the preprocessor that it works on lexical trees. The result of the lexical analysis is a list of tokens. The syntax analyzer usually works on this list. Rust inserts an analysis step before it, finding all opening and closing braces, brackets, and so on in pairs. This is a simple syntax analysis already. The result is a lexical tree that contains the bracketed syntax elements as nodes. |
The use of macros is a powerful tool, helping Rust to push some coding error detection to the compilation phase.
For example, using the print!
macro the compiler can detect if the arguments do not match the formatting string.
The built-in macro language is one of two ways to define macros. The Rust compiler can start code written in Rust during the preprocessing phase. That way you can write your preprocessor working on, reading and writing, altering the lexical tree in Rust. This is somewhat similar to Java annotation processing. Java annotation processor is started after the syntax analysis, and it is not supposed to modify the abstract syntax tree (AST).
Note
|
Lombok does modify the AST, but it does so using non-documented and hence non-guaranteed APIs. It may not work with a new release of Java or with just any JDK. A JDK implementing Java in a way incompatible with Lombok is possible and still a valid Java. |
I see the modification of the lexical tree problematic. It opens up the language and makes a way to have different flavors. For example, at the moment we have Java and Java with Lombok as a flavor. If the number of flavors gets feral, it may have an adverse effect on the language.
7. Object-Oriented Programming
Rust is not object-oriented, but it contains a lot of OOP traits (sic). OOP is usually characterized as something supporting
-
Encapsulation
-
Abstraction
-
Inheritance, and
-
Polymorphism
7.1. Encapsulation
Encapsulation is the bundling of data (fields) and methods (functions) that operate on the data within a single unit, usually a class, and are controlling access to the data via access modifiers.
Rust achieves encapsulation using struct
and impl
blocks.
A struct
can define a data structure, like in C.
The impl
block defines methods that can work on that data structure.
This is similar to classes in Java.
If you know modern Java and are familiar with the project Valhalla, you can say a struct
is similar to a value object.
7.2. Abstraction
Abstraction involves hiding implementation details and exposing only essential functionalities.
Rust has access modifiers, and you can also define traits, which are somewhat similar to Java interfaces.
7.3. Inheritance
Inheritance allows a class to inherit properties and behavior from another class.
Rust does not support classical inheritance (where a class extends another class). Instead, it relies on composition and trait bounds to achieve similar functionality.
This is very similar to how Go approaches the same issue.
7.4. Polymorphism
This is the interesting part. Rust does support polymorphism, but different traits can implement the same method differently. A trait is something like an interface in Java. A trait can declare and define methods. When you implement the methods for the struct you have to define which trait you implement. As a simple example:
trait A {
fn a(&self) -> i32 {
1
}
}
trait B {
fn a(&self) -> i32 {
2
}
}
struct C {}
impl A for C {}
impl B for C {}
In this example we have two traits, and in this case the traits contain the definition of the method a
.
If we do NOT implement the trait B
for the struct C
, then we can write:
let f = C {};
println!("{}", f.a());
This, however, does not work if both A
and B
are implemented for the struct.
In that case we have to specify which method to invoke.
In this case the call can be written as
let f = C {};
println!("{}", B::a(&f));
to call the metgod defined in trait B
.
This syntax also shows that the f
struct is simply a receiver.
Although not well known, the "receiver" is also a terminology in Java and is sources from Smalltalk OOP where methods are considered as messages sent to objects.
Another possibility is to declare the variable f
as a trait reference:
let f : dyn B = C {};
println!("{}", f.a());
In this case there is no ambiguity. The compiler knows from the type of the variable which method to invoke.
The keyword dyn
is a hint that this reference or pointer is a dynamic one.
8. Dynamic Pointers
In Java, Python, and in many other programming languages there is only one reference type. The runtime will know the referenced objects actual type from the object header. This object header is there in front of every object in the memory.
It consumes a lot of memory, and it is superfluous many times.
When we have an Integer
variable then it can only refence an Integer
object.
The extra object header brings no extra information to the table.
The object header is needed when we have a reference of type that is not final and can be extended.
In that case the type of the reference does not fully define, only constraints the object type.
A language implementing OO using encapsulation rather than inheritance like Golang or Rust may not need object headers. Every pointer that has a type of some struct certainly points to a struct. Since a struct cannot be extended, it certainly is a given type. The type defines the pointed object’s type already during compile time.
If a pointer has a type which is not a struct, rather a trait (or interface in Golang), then the compiler cannot know the exact type. All it knows is the pointer points to a struct that implemented the trait. In this case the pointer has two memory addresses. One points to the actual struct, and another that defines the actual type.
It is automatic in Golang, the compiler will decide for you which pointer is static having a struct type and which is dynamic able to point to different structs.
Rust is more explicit.
It requires you to use the dyn
modifier even though the compiler could add it.
It wants no mistake.
You have to declare that you really want the pointer be dynamic.
9. Memory Management and Garbage Collection
Rust memory management is unique.
It does not have a garbage collector.
It allocates and frees memory very much like C, except you do not need to manually allocate and deallocate anything yourself.
Rust will allocate the memory, just like any other language when you create a new "object".
In the example above we created the object C
, in other words we allocated the memory for the struct C
.
The code does not contain any releasing code.
There is no call to free
like in the language C.
The compiler will recognize the points in the code when an object is not used any more and generates the code freeing the memory automatically.
In an arbitrary code this analysis would be too complex, probably requiring computational power, which is exponentially increases in regard to the source code size.
To mitigate this, Rust introduces the borrow checker into the compilation process and imposes certain rules the code has to follow. Rust says a pointer can not only point to a struct, but it can also own it. Eventually, there can only be one owner and there can be many other pointers that only borrow the object.
When you assign a struct to a variable or pass it to a method, then you also pass the ownership.
The reference losing ownership cannot be used later to access the object.
If you pass the "address", like in C language using the &
character in front of it, then you borrow the struct.
The code cannot access the referenced struct while it is borrowed.
This is simple when you pass &something
to a method, because the caller is not running until the called method returns, at which time the borrowing also ceases.
It is more tricky in the case of multithreaded programming, though.
To make things even more complex, you can borrow something immutable and mutable.
There can be many immutable borrows at the same time, but a mutable one must be exclusive also excluding other immutable borrows.
Rust explicitly requires that you use the mut
keyword both at the place giving and accepting the value.
It has to be used in the declaration of the variable or function argument as well as where the struct is referenced.
A borrowed reference must not live longer than the reference owning the struct.
When the owning reference gets out of scope, the struct is automatically released.
The compiler generates code that frees the structs and all other structs referenced by this struct via owning references.
The struct can also implement the Drop
trait containing the function drop
, which will be invoked when the struct is dropped/deallocated.
Theis method can implement special actions when a struct goes out of scope.
The language also makes it possible to implement the deref
function of the trait Deref
used every time the *
dereference operator is used.
10. Learning Curve
Comments
Please leave your comments using Disqus, or just press one of the happy faces. If for any reason you do not want to leave a comment here, you can still create a Github ticket.