- Miniboxing (OOPSLA 2013) - the first miniboxing paper, describing the representation,
- Late Data Layout (OOPSLA 2014) - the foundation of miniboxing (also described in detail here) and
- Miniboxing the Linked List (tech report) - describes the miniboxing transformation for Scala collections.
This document presents only a few of the benchmarks:
- macrobenchmarks - large and complex programs
- microbenchmarks - small and focused benchmarks
- bytecode size comparisons
To evaluate the miniboxing plugin, we implemented a mock-up of the Scala collections linked list and benchmarked its performance. The result: a 1.5x-4x speedup just by adding the @miniboxed annotation. It is worth pointing out that our mock-up included all the common patterns found in the library: Seq, closures, tuples, etc.
The benchmark we ran is fitting a linear curve to a given set of points using the Least Squares method. Basically, we made a custom library and benchmarked this code:
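As a minimal sketch of this kind of workload (not the benchmarked code itself), the following assumes a hypothetical hand-written list `MList` and a hypothetical helper `LeastSquares.fit`. With the miniboxing plugin enabled, the type parameter would carry the @miniboxed annotation; it is left plain here so the sketch compiles without the plugin.

```scala
import scala.annotation.tailrec

// Hypothetical mock-up of a linked list; with the plugin, `T` would be `@miniboxed T`.
sealed trait MList[T]
final case class MCons[T](head: T, tail: MList[T]) extends MList[T]
final case class MNil[T]() extends MList[T]

object MList {
  // Builds a list that preserves the order of `xs`.
  def fromSeq[T](xs: Seq[T]): MList[T] =
    xs.foldRight(MNil[T](): MList[T])((x, acc) => MCons(x, acc))

  // Tail-recursive left fold, so long lists do not overflow the stack.
  @tailrec
  def foldLeft[T, B](xs: MList[T], z: B)(f: (B, T) => B): B = xs match {
    case MCons(h, t) => foldLeft(t, f(z, h))(f)
    case MNil()      => z
  }
}

object LeastSquares {
  // Fits y = slope * x + intercept to the points; returns (slope, intercept).
  // Assumes at least two points with distinct x values.
  def fit(points: MList[(Double, Double)]): (Double, Double) = {
    val (n, sx, sy, sxx, sxy) =
      MList.foldLeft(points, (0.0, 0.0, 0.0, 0.0, 0.0)) {
        case ((n, sx, sy, sxx, sxy), (x, y)) =>
          (n + 1, sx + x, sy + y, sxx + x * x, sxy + x * y)
      }
    val slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    val intercept = (sy - slope * sx) / n
    (slope, intercept)
  }
}
```

For points lying on y = 2x + 1, such as (0, 1), (1, 3) and (2, 5), `LeastSquares.fit` should return a slope close to 2 and an intercept close to 1. The tuples and the closure passed to `foldLeft` are the kind of generic patterns where boxing of primitive values normally occurs.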
With infinite heap memory (no garbage collection overhead)
We ran this code with two versions of the linked list: one with the plugin activated and one with generic classes. The numbers were obtained on an i7 server machine with 32GB of RAM, and we made sure no garbage collections occurred.
This shows that miniboxed linked lists are 1.5x to 2x faster than generic collections, even though linked lists are not contiguous in memory, which reduces the benefits of miniboxing. We also tested specialization, but it ran out of memory and we were unable to get any garbage-collection-free runs above 1500000 elements (we suspect this is due to bug SI-3585, "Specialized class should not have duplicate fields", but haven’t examined it in depth).
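One way to confirm that a run was free of garbage collections is to compare the JVM's collection counters before and after it. The sketch below is an assumption about how such a check could be done, not the setup used for the numbers above; `GcCheck` and its methods are hypothetical names, while `GarbageCollectorMXBean` is the standard `java.lang.management` interface.

```scala
import java.lang.management.ManagementFactory

object GcCheck {
  // Sum of the collection counts reported by all garbage collectors so far.
  // getCollectionCount returns -1 when a collector does not report a count.
  def totalCollections(): Long = {
    val it = ManagementFactory.getGarbageCollectorMXBeans.iterator()
    var total = 0L
    while (it.hasNext) {
      val count = it.next().getCollectionCount
      if (count >= 0) total += count
    }
    total
  }

  // Runs `body` and returns its result together with the number of
  // collections that were observed while it ran.
  def countingGcs[A](body: => A): (A, Long) = {
    val before = totalCollections()
    val result = body
    (result, totalCollections() - before)
  }
}
```

A benchmark harness could wrap each measured run in `GcCheck.countingGcs { ... }` and discard any run for which the second component is not zero.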
With limited heap memory
We also wanted to test how miniboxing copes with garbage collection cycles compared to the generic library. To do so, we limited the heap size to 2G.
To summarize, on linked lists, we can expect speedups between 1.5x and 4x, despite the non-contiguous nature of the linked list.
The full description of this experiment is available here.
A separate article describes the performance of the spire numeric abstraction library when using miniboxing.
Another important benchmark is ArrayBuffer.reverse. This is the most difficult benchmark to get right, since the miniboxing transformation interacts with the Java Virtual Machine optimization heuristics and, if the transformation is not done correctly, miniboxing can actually hurt performance (more details in the OOPSLA paper).
These are our current results:
- generic is the generic code
- miniboxing is the code generated by our plugin
- specialization is the code generated by the @specialized annotation in Scala
- monomorphic is the code specialized by hand
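To make the two ends of this comparison concrete, here is a minimal sketch of a reverse operation written once generically and once by hand for Int. The classes `GenericBuf` and `IntBuf` are hypothetical illustrations, not the benchmark sources. The miniboxing and specialization variants would start from the generic version and annotate its type parameter with @miniboxed or @specialized respectively.

```scala
// generic: T is erased, so every Int passed in or out is boxed.
final class GenericBuf[T](val size: Int) {
  private val data = new Array[AnyRef](size)
  def apply(i: Int): T = data(i).asInstanceOf[T]
  def update(i: Int, x: T): Unit = { data(i) = x.asInstanceOf[AnyRef] }

  // In-place reverse written against apply/update.
  def reverse(): Unit = {
    var i = 0
    var j = size - 1
    while (i < j) {
      val tmp = this(i)
      this(i) = this(j)
      this(j) = tmp
      i += 1
      j -= 1
    }
  }
}

// monomorphic: the same code specialized by hand for Int, with no boxing.
final class IntBuf(val size: Int) {
  private val data = new Array[Int](size)
  def apply(i: Int): Int = data(i)
  def update(i: Int, x: Int): Unit = { data(i) = x }

  def reverse(): Unit = {
    var i = 0
    var j = size - 1
    while (i < j) {
      val tmp = this(i)
      this(i) = this(j)
      this(j) = tmp
      i += 1
      j -= 1
    }
  }
}
```

Both classes behave identically from the caller's point of view. The difference lies in how elements are represented, which is what the benchmark measures.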
These benchmarks are further described in the Miniboxing OOPSLA’13 paper.
When comparing the total bytecode size for spire we see a 4.5x bytecode reduction:
The OOPSLA’13 paper presents several other benchmarks:
- performance microbenchmarks
  - on the HotSpot JVM with the Server compiler
  - on the HotSpot JVM with the Graal compiler
- interpreter performance microbenchmarks
- bytecode size
- classloader overhead
  - performance impact
  - heap consumption
The SCALA’14 paper presents:
- a high-level overview of patterns in Scala
- benchmarks for a mock-up of the Scala collection linked list
In short, miniboxed code:
- is up to 22x faster than generic code
- surpasses the performance of specialization
- is marginally slower than monomorphic code, with an overhead of about 10%
Comments are always welcome! But to make the best use of them, please consider this:
- If you have questions or feedback regarding the content of this page, please leave us a comment!
- If you have general questions about the miniboxing plugin, please ask on the mailing list.
- If you found a bug, please let us know on the GitHub issue tracker.
Thanks! Looking forward to your messages!