Random forest benchmarks

ALGLIB numerical library includes parallel large-scale implementation of the decision forest algorithm available in C++, C#, Python and Delphi/FreePascal. In this article we compare the performance of ALGLIB decision forests with other random forests implementations. A general overview of decision forests is provided in another article.

Libraries and test problems

The following random forest libraries are compared: ALGLIB (C++/C#/Python/Delphi), Ranger (C++), scikit-learn (Python) and Accord.NET (C#).

The following synthetic problem was used for testing: 64 variables, 10240 samples, 2 classes. Each algorithm was configured to build a forest of 100 trees, using sqrt(N) variables per split. All libraries in the list support parallel forest construction, but only one CPU core was used for this test.
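The benchmark setup can be sketched with scikit-learn, one of the tested libraries. This is a sketch on randomly generated stand-in data of the same shape, not the original benchmark dataset, so the timings and accuracies will not match the table exactly:

```python
import time

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Shape of the benchmark problem: 10240 samples, 64 variables, 2 classes.
# The data itself is random stand-in data, not the original dataset.
rng = np.random.default_rng(0)
X = rng.standard_normal((10240, 64))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# 100 trees, sqrt(N) candidate variables per split, one CPU core.
model = RandomForestClassifier(
    n_estimators=100, max_features="sqrt", n_jobs=1, random_state=0
)
t0 = time.perf_counter()
model.fit(X, y)
print(f"build time: {time.perf_counter() - t0:.1f} s")
```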

In addition to the running time, we also compare model accuracies (out-of-bag error estimates, as reported by each algorithm) and model sizes (after serialization).

Results and discussion

The results are as follows:

Random forests benchmark (single-threaded)

Library              Build time   OOB err   Model size
ALGLIB (compressed)       5 s       0.25      1469 KB
ALGLIB (default)          5 s       0.25      7317 KB
Ranger                    9 s       0.25      6957 KB
scikit-learn             10 s       0.25     17377 KB
Accord.NET             9421 s       n/a      19822 KB

First, one may see that all random forest implementations achieve nearly the same accuracy; after all, they are all implementations of the same core algorithm. The only exception is Accord.NET, which does not report out-of-bag estimates, but we have no reason to suspect that it is better or worse than its competitors in this respect.
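In scikit-learn, for instance, the out-of-bag estimate shown in the table must be requested explicitly (again a sketch on random stand-in data):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 64))
y = (X[:, 0] > 0).astype(int)

# oob_score=True makes the forest score each sample using only the trees
# that did not see it during bootstrap sampling (requires bootstrap=True).
model = RandomForestClassifier(
    n_estimators=100, max_features="sqrt",
    oob_score=True, bootstrap=True, random_state=0,
).fit(X, y)
oob_error = 1.0 - model.oob_score_  # OOB error, comparable to the table
print(f"OOB error: {oob_error:.3f}")
```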

The running times, however, differ significantly. The ALGLIB library clearly outperforms its closest competitors (Ranger and scikit-learn) by roughly a factor of 2. This behavior is roughly scale-independent and is explained by a better-optimized implementation of the same ideas. Accord.NET, on the other hand, is a clear outlier: its running time on this problem is roughly 1800 times longer than ALGLIB's.

Finally, we move to the memory footprint of the random forests. Generally, all random forest implementations share the same upper bound on the model size, O(NTrees·M), where NTrees is the number of trees in the forest and M is the number of samples in the dataset. However, depending on the implementation details (whether it is a dynamic language like Python, a managed .NET environment or a native C/C++ application), different constant factors can hide behind the big-O notation.
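The linear growth in NTrees can be observed directly by serializing a scikit-learn forest at several sizes. This is a rough sketch: pickle size is only a proxy for the in-memory footprint, and the absolute numbers depend on the dataset:

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((1024, 16))
y = (X[:, 0] > 0).astype(int)

# Serialized size grows roughly linearly with the number of trees,
# matching the O(NTrees * M) bound discussed above.
sizes_kb = []
for n_trees in (10, 50, 100):
    model = RandomForestClassifier(
        n_estimators=n_trees, max_features="sqrt", random_state=0
    ).fit(X, y)
    sizes_kb.append(len(pickle.dumps(model)) / 1024)
    print(f"{n_trees:3d} trees: {sizes_kb[-1]:.1f} KB")
```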

By default, ALGLIB produces uncompressed decision trees with roughly the same memory usage as Ranger (C++), while the Python (scikit-learn) and C# (Accord.NET) implementations are roughly 2.5x larger than either. The situation changes when binary model compression (the dfbinarycompression function) is activated: compressed decision forests are 5x smaller than those produced by Ranger, 12x smaller than those produced by scikit-learn and 14x smaller than those produced by Accord.NET.
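ALGLIB's dfbinarycompression uses a specialized binary encoding of the trees; as a rough illustration of why such compression pays off, note that serialized tree models carry a lot of byte-level redundancy, so even generic zlib compression of a pickled scikit-learn forest shrinks it noticeably. This is only an analogy, not a substitute for ALGLIB's format:

```python
import pickle
import zlib

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.standard_normal((2048, 16))
y = (X[:, 0] > 0).astype(int)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
raw = pickle.dumps(model)
packed = zlib.compress(raw, level=9)  # generic byte-level compression
print(f"raw: {len(raw) / 1024:.0f} KB, compressed: {len(packed) / 1024:.0f} KB")
```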

Conclusions

ALGLIB is roughly 2x faster than its closest competitors (Ranger, scikit-learn) and produces compact models even in its default configuration (no binary compression). With compression enabled, ALGLIB produces by far the most compact models in this comparison, which makes it a good choice for extremely large machine learning problems.

This article is licensed for personal use only.
