Learn all about miniboxing in a 5 minute demo.


MbArray Tutorial

MbArray is an indexed sequence container that matches the performance of raw arrays when used in miniboxed contexts. Also, unlike arrays, MbArray creation does not require the presence of a ClassTag, which makes it more versatile. This page will describe the motivation behind MbArray and will show how to use it through examples.

Motivation

Raw arrays offer the best access performance for primitive types, but they can only be instantiated in generic contexts if a ClassTag is present – which is not always possible. Consider typing the following code in the REPL :

scala> class A[T](len: Int) {
     |   val array = new Array[T](len)
     | }

Pressing enter will result in the following error :

<console>:8: error: cannot find class tag for element type T
       val array = new Array[T](len)

This could be addressed, at the expense of performance, by letting the scala runtime have access to type informations about T using a ClassTag from scala.reflect :

scala> import scala.reflect._
import scala.reflect._

scala> class B[T : ClassTag](len: Int) {
     |   val array = new Array[T](len)
     | }
defined class B

While this works, it may not be sufficiently fast for algorithms where high performance is expected. Instead, MbArray combined with the miniboxing transformation offers performances that are similar to those of raw arrays without having to carry around a ClassTag. The following code will work without requiring any condition on T. On top of that, any read or write to the array will perform better.

scala> class C[@miniboxed T](len :Int) {
     |   val array = MbArray.empty[T](len)
     | }
Specializing class C...

  ... (The miniboxing transformation operates)

defined class C

MbArrays are included in the runtime support package for the miniboxing transformation. To see how to add miniboxing to your project, please see this page.

Usage

Let’s take a closer look at how exactly a program can be transformed to take full advantage of the miniboxing transformation and MbArrays. Consider a classic implementation of the merge sort algorithm using a raw Array with an implicit ClassTag :

import scala.reflect._
import scala.util._

object MergeSort {
  def mergeSort[T : ClassTag](ary: Array[T], comp: (T, T) => Boolean): Array[T] = {
    def merge(a: Array[T], b: Array[T]): Array[T] = {
      val res = new Array[T](a.length + b.length)
      var ai = 0
      var bi = 0
      while (ai < a.length && bi < b.length) {
        if (comp(a(ai), b(bi))) {
          res(ai + bi) = a(ai)
          ai += 1
	} else {
          res(ai + bi) = b(bi)
          bi += 1
	}
      }
      while (ai < a.length) {
        res(ai + bi) = a(ai)
        ai += 1
      }
      while (bi < b.length) {
        res(ai + bi) = b(bi)
        bi += 1
      }
      res
    }
    val len = ary.length
    if (len <= 1) ary
    else {
      val mid = len / 2
      val a = new Array[T](mid)
      val b = new Array[T](len - mid)
	  
      var i = 0
      while (i < mid) {
        a(i) = ary(i)
        i += 1
      }
      while (i < len) {
        b(i - mid) = ary(i)
        i += 1
      }
	  
      merge(mergeSort(a, comp), mergeSort(b, comp))
    }
  }
  
  final val len = 50
  
  def main(args: Array[String]) = {
    val ary = new Array[Int](len)
    val rnd = new Random
    for (i <- 0 until len) {
      ary(i) = rnd.nextInt(len)
    }
    val sorted = mergeSort(ary, (a: Int, b: Int) => a < b)
  }
}
  

The transformation

Now let’s transform the code above such that it uses miniboxing and MbArrays.

  1. Let’s first add the line import MbArray._.
  2. Then, replace all occurences of the type Array[T] by MbArray[T], and all the array instantiations new Array[T](...) by MbArray.empty[T](...).
  3. Finally, remove the ClassTag bound on the type parameter T.

Compiling at this point will yield the following output :

[warn] (...) The method MergeSort.mergeSort would benefit from miniboxing type parameter T, 
since it is instantiated by a primitive type.
[warn]     val sorted = mergeSort(ary, (a: Int, b: Int) => a < b)
[warn]                  ^
[warn] (...) The following code instantiating an `MbArray` object cannot be optimized since the 
type argument is not a primitive type (like Int), a miniboxed type parameter or a subtype of 
AnyRef. This means that primitive types could end up boxed:
[warn]    val res = MbArray.empty[T](a.length + b.length)
[warn]                      ^
[warn] (...) The following code instantiating an `MbArray` object cannot be optimized since the 
type argument is not a primitive type (like Int), a miniboxed type parameter or a subtype of 
AnyRef. This means that primitive types could end up boxed:
[warn]    val a = MbArray.empty[T](mid)
[warn]                    ^
[warn] (...) The following code instantiating an `MbArray` object cannot be optimized since the 
type argument is not a primitive type (like Int), a miniboxed type parameter or a subtype of 
AnyRef. This means that primitive types could end up boxed:
[warn]    val b = MbArray.empty[T](len - mid)
[warn]                    ^
[warn] 5 warnings found

The miniboxing plugin informs us that code is suboptimal and could get faster if we were to use the @miniboxed annotation on the type parameter T. After proceeding and compiling again, we observe that there are no more warnings and our code has been successfully optimized by the miniboxing transformation. The final transformation looks like this :

import scala.reflect._
import scala.util._

object MergeSort {
  def mergeSort[@miniboxed T](ary: MbArray[T], comp: (T, T) => Boolean): MbArray[T] = {
    def merge(a: MbArray[T], b: MbArray[T]): MbArray[T] = {
      val res = MbArray.empty[T](a.length + b.length)
      var ai = 0
      var bi = 0
      while (ai < a.length && bi < b.length) {
        if (comp(a(ai), b(bi))) {
          res(ai + bi) = a(ai)
          ai += 1
        } else {
          res(ai + bi) = b(bi)
          bi += 1
        }
      }
      while (ai < a.length) {
          res(ai + bi) = a(ai)
          ai += 1
      }
      while (bi < b.length) {
          res(ai + bi) = b(bi)
          bi += 1
      }
      res
    }
    val len = ary.length
    if (len <= 1) ary
    else {
      val mid = len / 2
      val a = MbArray.empty[T](mid)
      val b = MbArray.empty[T](len - mid)
      
      var i = 0
      while (i < mid) {
        a(i) = ary(i)
        i += 1
      }
      while (i < len) {
        b(i - mid) = ary(i)
        i += 1
      }
      
      merge(mergeSort(a, comp), mergeSort(b, comp))
    }
  }
  
  final val len = 50
  
  def main(args: Array[String]) = {
    val ary = MbArray.empty[Int](len)
    val rnd = new Random
    for (i <- 0 until len) {
      ary(i) = rnd.nextInt(len)
    }
    val sorted = mergeSort(ary, (a: Int, b: Int) => a < b)
  }
}
  

Benchmarks

We benchmarked the merge sort algorithm implementation above with different sizes of array and ended up with the following numbers (in milliseconds) :

Array Size Array with ClassTag MbArray Improvement
500’000 713 528 35%
1’000’000 1536 1163 32%
3’000’000 5311 3545 50%

We can observe an average speedup of approximately 40%. Note that the Array with ClassTag version is compiled without the miniboxing plugin. You can try it yourself by downloading the benchmarks here.

Conclusion

MbArray is therefore a great choice of the underlying container for any custom collection, as it does not impose additional conditions on the type parameter(s) of the collection, without compromising the performances.

Please keep in mind that the miniboxing plugin is a beta release, and you may encounter occasional hickups as we are working towards a feature-complete and super-stable implementation.

We are doing our best to make miniboxing a stable transformation, with nightly builds and hundreds of test cases running every night. Yet, there are bugs we haven't fixed yet, so don't be surprised if the plugin fails on some programs. But please do file bugs for such failures, so we can fix them asap! (on average, we fixed a bug every 3 days for the last two months!)

The miniboxing release currently supports two specific versions of the Scala compiler: paper

  • 2.11.7, the current release in the 2.11 series and
  • 2.10.6, the current release in the 2.10 series

Due to compiler bugs (for the 2.10 series) and binary incompatibility of the compiler API (for the 2.11 series), we do not currently support the other versions. We are planning, however, to fully cross-compile against all 2.10 and 2.11 versions and to provide error messages for the earlier and unsupported versions of the Scala compiler. We're sorry for this, but the engineering effort in supporting even two versions of the compiler is significant!

Comments

Comments are always welcome! But to make the best use of them, please consider this:

Thanks! Looking forward to your messages!


comments powered by Disqus