Welcome to PyDL8.5’s documentation!

Decision Trees (DTs) are machine learning models used for classification and other prediction tasks. They perform prediction by means of simple decision rules inferred from data.

Traditional algorithms for learning decision trees, such as CART and C4.5, are heuristic in nature. As a result, the trees they learn may be more complex than necessary, and hence less interpretable.

This repository contains an implementation of DL8.5, an algorithm for finding optimal decision trees under formal constraints on the accuracy, support and depth of the trees. The algorithm is described in [ANS20a] and [ANS20b]. Its key idea is to combine a cache of itemsets with branch-and-bound search; this new type of cache also stores results for parts of the search space that have been traversed only partially. The experimental comparison with other methods in [ANS20a] reports that DL8.5 performs substantially better than competing methods.
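To make the caching idea concrete, here is a minimal pure-Python sketch of an exhaustive depth-bounded tree search over binary features that memoizes subproblems by itemset. It is not the DL8.5 implementation and does not use the PyDL8.5 API: the names `leaf_error`, `optimal_tree` and `predict` are hypothetical, and the pruning is deliberately simpler than in [ANS20a]. In particular, this sketch caches only fully solved subproblems, whereas DL8.5 also caches partially explored ones.

```python
from collections import Counter


def leaf_error(labels):
    """Return (errors, majority_label) for a leaf predicting the majority."""
    if not labels:
        return 0, None
    label, count = Counter(labels).most_common(1)[0]
    return len(labels) - count, label


def optimal_tree(rows, labels, max_depth):
    """Return (error, tree) with minimal training error and depth <= max_depth.

    rows is a list of tuples of 0/1 features. An itemset is the set of
    (feature, value) tests on the path from the root to a node.
    """
    n_features = len(rows[0])
    cache = {}  # itemset -> (error, tree); entries are always exact optima

    def search(itemset, indices):
        # Two paths that apply the same tests in a different order reach
        # the same itemset, so the subproblem is solved only once.
        if itemset in cache:
            return cache[itemset]
        err, label = leaf_error([labels[i] for i in indices])
        best = (err, ("leaf", label))
        used = {f for f, _ in itemset}
        if err > 0 and len(itemset) < max_depth:
            for f in range(n_features):
                if f in used:
                    continue
                left = [i for i in indices if rows[i][f] == 0]
                right = [i for i in indices if rows[i][f] == 1]
                if not left or not right:
                    continue  # this test separates nothing
                l_err, l_tree = search(itemset | {(f, 0)}, left)
                if l_err >= best[0]:
                    continue  # bound: this split cannot beat the best so far
                r_err, r_tree = search(itemset | {(f, 1)}, right)
                if l_err + r_err < best[0]:
                    best = (l_err + r_err, ("split", f, l_tree, r_tree))
                if best[0] == 0:
                    break  # a perfect subtree cannot be improved
        cache[itemset] = best
        return best

    return search(frozenset(), list(range(len(rows))))


def predict(tree, row):
    """Follow the tests in the tree down to a leaf and return its label."""
    while tree[0] == "split":
        _, f, left, right = tree
        tree = right if row[f] else left
    return tree[1]


# XOR data: no single test reduces the error, but a depth-2 tree is perfect.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]
print(optimal_tree(rows, labels, max_depth=1)[0])  # 2
print(optimal_tree(rows, labels, max_depth=2)[0])  # 0
```

The XOR data shows why an exhaustive search can help: no single split improves on a plain leaf, so a method that only looks one split ahead has no reason to prefer any feature, while the depth-2 search finds a tree with zero training error.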

This implementation is scikit-learn compatible, so it can be combined with other scikit-learn components.

Getting started: User guide

This is a small tutorial that explains how to use DL8.5.

API Documentation

This is the API documentation of PyDL8.5.

Examples

These examples illustrate further how DL8.5 can be used; for more detailed information, please consult the User Guide.

References

ANS20a

Gaël Aglin, Siegfried Nijssen, and Pierre Schaus. Learning optimal decision trees using caching branch-and-bound search. In AAAI. 2020.

ANS20b

Gaël Aglin, Siegfried Nijssen, and Pierre Schaus. PyDL8.5: a library for learning optimal decision trees (demonstration). In IJCAI. 2020.