Compressed decision forests

ALGLIB numerical library includes parallel large-scale implementation of the decision forest algorithm available in C++, C#, Python and Delphi/FreePascal. This article describes the binary compression of ALGLIB decision forests - a recently introduced feature that allows significant (up to 6x) savings in the memory footprint, when compared with baseline random forests.

A general overview of the decision forests is provided in another article. You can also be interested in benchmarks, that compares ALGLIB decision forests with other random forest implementations.

Binary compression of random decision forests

Random forests are known to have impressive memory footprint. For a problem with N variables and M samples in the dataset an individual decision tree may need up to O(M) memory, and usually we have hundreds of trees in the forest. Thus, an upper bound on the memory consumption is O(NTrees·M) bytes.

Another point to consider is the efficiency of the underlying data structures, which controls the multiplier, hidden behind the Big-O notation. A decision tree node needs two values for the split (variable index and split value) and two pointers to the child nodes. Most modern C/C++ programs use 64-bit types, so we need at least 32 bytes to represent a split. If you use managed languages like C# or dynamic languages like Python, this value is likely to be much larger.

By default, ALGLIB decision forests use 64-bit types for their internal structures. However, it is possible to activate optional binary compression (dfbinarycompression function), which utilizes several tricks to compress the forest:

Even without binary compression, ALGLIB performs on par with the best random forests implementations. When the binary compression is activated, it results in the impressive 4x-6x reduction of the memory usage. See our benchmarks page for the detailed comparison. The only downside of the binary compression is that it makes inference ~1.5x slower due to the necessity of additional dynamic decoding.

This article is licensed for personal use only.

Download ALGLIB for C++ / C# / Java / Python / ...

ALGLIB Project offers you two editions of ALGLIB:

ALGLIB Free Edition:
+delivered for free
+offers full set of numerical functionality
+extensive algorithmic optimizations
-no multithreading
-non-commercial license

ALGLIB Commercial Edition:
+flexible pricing
+offers full set of numerical functionality
+extensive algorithmic optimizations
+high performance (SMP, SIMD)
+commercial license with support plan

Links to download sections for Free and Commercial editions can be found below:

ALGLIB 4.03.0 for C++

C++ library.
Delivered with sources.
Monolithic design.
Extreme portability.
Editions:   FREE   COMMERCIAL

ALGLIB 4.03.0 for C#

C# library with native kernels.
Delivered with sources.
VB.NET and IronPython wrappers.
Extreme portability.
Editions:   FREE   COMMERCIAL

ALGLIB 4.03.0 for Java

Java wrapper around HPC core.
Delivered with sources.
Seamless integration with Java.
Editions:   FREE   COMMERCIAL

ALGLIB 4.03.0 for Delphi

Delphi wrapper around C core.
Delivered as precompiled binary.
Compatible with FreePascal.
Editions:   FREE   COMMERCIAL

ALGLIB 4.03.0 for CPython

CPython wrapper around C core.
Delivered as precompiled binary.
Editions:   FREE   COMMERCIAL