dl85.unsupervised.clustering.DL85Cluster

class dl85.unsupervised.clustering.DL85Cluster(max_depth=1, min_sup=1, error_function=None, max_error=0, stop_after_better=False, time_limit=0, verbose=False, desc=False, asc=False, repeat_sort=False, leaf_value_function=None, print_output=False)

An optimal binary decision tree for clustering.

Parameters
max_depth : int, default=1

Maximum depth of the tree to be found.

min_sup : int, default=1

Minimum number of examples per leaf.

max_error : int, default=0

Maximum allowed error. The default value (0) means no bound. If no tree strictly better than this bound can be found, the model remains empty.

stop_after_better : bool, default=False

Whether the search stops as soon as it finds a tree better than max_error.

time_limit : int, default=0

Time allocated to the search, in seconds. The default value (0) means no limit. When the limit is reached, the best tree found so far is stored, provided it is better than max_error.

verbose : bool, default=False

Whether to print information about what happens during the search.

desc : bool, default=False

Whether items are sorted in descending order of information gain.

asc : bool, default=False

Whether items are sorted in ascending order of information gain.

repeat_sort : bool, default=False

Whether items are re-sorted at each level of the lattice (True) or only once, before the search (False).

print_output : bool, default=False

Whether the search output is printed.
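The desc, asc and repeat_sort options control the order in which items are considered during the search. The sketch below shows one way to rank items by information gain, assuming items are binary features (0/1 columns) and that a label vector is available to compute the gain; the target DL8.5 uses for the gain in the clustering setting is not documented here, so the labels are simply taken as input. The function names are hypothetical and this is not DL8.5's internal code.

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (base 2) of a sequence of labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column: Sequence[int], labels: Sequence[int]) -> float:
    """Entropy reduction obtained by splitting `labels` on a binary column."""
    n = len(labels)
    left = [y for x, y in zip(column, labels) if x == 0]
    right = [y for x, y in zip(column, labels) if x == 1]
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def sort_items(X: List[List[int]], labels: Sequence[int],
               desc: bool = False, asc: bool = False) -> List[int]:
    """Return feature indices in the order the search would consider them.

    Mirrors the desc/asc flags: with neither set, the original order is kept.
    In this sketch, desc takes precedence if both are set.
    """
    n_features = len(X[0])
    order = list(range(n_features))
    if not desc and not asc:
        return order
    gains = [information_gain([row[j] for row in X], labels) for j in order]
    return sorted(order, key=lambda j: gains[j], reverse=desc)
```

With repeat_sort=True, a ranking like this would be recomputed on the examples covered at each level of the lattice instead of once on the full training set.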

Attributes
tree_ : str

The learned tree in serialized form; remains empty until a model is learned.

size_ : int

Size of the learned tree.

depth_ : int

Depth of the learned tree.

error_ : float

Error of the learned tree.

accuracy_ : float

Accuracy of the learned tree on the training set.

lattice_size_ : int

Number of lattice nodes explored before the optimal tree was found.

runtime_ : float

Duration of the optimal decision tree search.

timeout_ : bool

Whether the search reached the time limit.

classes_ : ndarray, shape (n_classes,)

The classes seen at fit().
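The exact counting conventions behind size_ and depth_ are not stated here. The sketch below assumes that size counts every node (tests and leaves) and that depth counts the tests on the longest root-to-leaf path. It also uses a hypothetical nested-dict encoding of a tree; this is not the documented serialization format of tree_.

```python
from typing import Any, Dict

# Hypothetical encoding (NOT the format of tree_):
# internal node -> {"feat": j, "left": subtree, "right": subtree}
# leaf          -> {"value": label}
Tree = Dict[str, Any]


def tree_size(node: Tree) -> int:
    """Count all nodes: internal test nodes plus leaves."""
    if "value" in node:
        return 1
    return 1 + tree_size(node["left"]) + tree_size(node["right"])


def tree_depth(node: Tree) -> int:
    """Number of tests on the longest root-to-leaf path (a lone leaf has depth 0)."""
    if "value" in node:
        return 0
    return 1 + max(tree_depth(node["left"]), tree_depth(node["right"]))


example = {
    "feat": 0,
    "left": {"value": 0},
    "right": {"feat": 1, "left": {"value": 1}, "right": {"value": 2}},
}
```

For `example`, this convention gives a size of 5 and a depth of 2.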

fit(X, X_error=None)

Implements the standard fitting function for a DL8.5 clustering tree.

Parameters
X : array-like, shape (n_samples, n_features)

The training input samples. If X_error is provided, X is the explanation input, i.e. the data whose features are used in the tree's tests, while X_error is used to compute the error.

X_error : array-like, shape (n_samples, n_features_1), default=None

The training input used to compute the error. If it is not provided, X is used to compute the error.

Returns
self : object

Returns self.
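To make the roles of X and X_error concrete, here is a brute-force sketch of a depth-1 clustering tree: candidate splits are the binary features of X, while the error of each leaf is measured on X_error (or on X when X_error is None), and each leaf must cover at least min_sup examples. The sum-of-squared-distances error, the function names and the exhaustive loop are assumptions for illustration; DL8.5 itself searches deeper trees over a lattice, and its default error function may differ.

```python
from typing import List, Optional, Sequence, Tuple


def sse(points: List[Sequence[float]]) -> float:
    """Sum of squared Euclidean distances of `points` to their centroid."""
    if not points:
        return 0.0
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    return sum(sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in points)


def best_depth1_split(X: List[Sequence[int]],
                      X_error: Optional[List[Sequence[float]]] = None,
                      min_sup: int = 1) -> Tuple[Optional[int], float]:
    """Return (feature index, total error) of the best single split.

    Splits are tested on the binary features of X; errors are computed on
    X_error, falling back to X when X_error is None. A feature of None means
    that keeping a single leaf (one cluster) is best.
    """
    E = X if X_error is None else X_error
    best_feat, best_err = None, sse(list(E))  # baseline: no split at all
    for j in range(len(X[0])):
        left = [e for x, e in zip(X, E) if x[j] == 0]
        right = [e for x, e in zip(X, E) if x[j] == 1]
        if len(left) < min_sup or len(right) < min_sup:
            continue  # a leaf would cover fewer than min_sup examples
        err = sse(left) + sse(right)
        if err < best_err:
            best_feat, best_err = j, err
    return best_feat, best_err
```

For example, with X = [[0, 1], [0, 0], [1, 1], [1, 0]] and X_error = [[0.0], [0.1], [5.0], [5.1]], splitting on feature 0 groups the two small and the two large values, which gives a much lower error than splitting on feature 1.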

fit_predict(X, y=None)

Perform clustering on X and return cluster labels.

Parameters
X : array-like of shape (n_samples, n_features)

Input data.

y : Ignored

Not used, present for API consistency by convention.

Returns
labels : ndarray of shape (n_samples,), dtype=np.int64

Cluster labels.
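Assuming DL85Cluster follows the usual scikit-learn clustering contract, fit_predict(X) returns the same labels as fitting on X and then predicting X, with y ignored. The toy class below is hypothetical: it is not DL85Cluster, performs no search, and only illustrates that contract using the index of the reached leaf as the cluster label.

```python
from typing import List, Optional, Sequence


class Depth1Clusterer:
    """Hypothetical toy estimator that splits on one fixed binary feature."""

    def __init__(self, feature: int = 0):
        self.feature = feature

    def fit(self, X: List[Sequence[int]], y: Optional[object] = None) -> "Depth1Clusterer":
        # A real estimator would search for the best tree here; the toy keeps `feature`.
        self.n_features_in_ = len(X[0])
        return self

    def predict(self, X: List[Sequence[int]]) -> List[int]:
        # Cluster label = leaf reached: 0 if the feature is off, 1 if it is on.
        return [int(row[self.feature] == 1) for row in X]

    def fit_predict(self, X: List[Sequence[int]], y: Optional[object] = None) -> List[int]:
        # y is ignored, as in the documented signature.
        return self.fit(X).predict(X)
```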

get_params(deep=True)

Get parameters for this estimator.

Parameters
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params : dict

Parameter names mapped to their values.
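With deep=True, the parameters of nested estimators are reported under <component>__<parameter> keys. The class below is a hypothetical, simplified stand-in for scikit-learn's BaseEstimator (which introspects the __init__ signature instead of storing a dict); it only sketches that flattening.

```python
from typing import Any, Dict


class GetParamsSketch:
    """Hypothetical stand-in showing how get_params(deep=True) flattens nesting."""

    def __init__(self, **params: Any):
        self._params = dict(params)

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        out = dict(self._params)
        if deep:
            for name, value in self._params.items():
                # Nested estimators contribute "<component>__<parameter>" entries.
                if hasattr(value, "get_params"):
                    for sub_name, sub_value in value.get_params(deep=True).items():
                        out[f"{name}__{sub_name}"] = sub_value
        return out
```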

predict(X)

Implements the standard predict function for a DL8.5 clustering tree.

Parameters
X : array-like, shape (n_samples, n_features)

The input samples.

Returns
y : ndarray, shape (n_samples,)

The label of each sample is the label of the leaf it reaches in the learned tree.
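A decision tree labels a sample by following its tests from the root down to a leaf. The sketch below assumes a hypothetical nested-dict encoding (not the documented format of tree_) and the convention that a feature value of 1 goes right and 0 goes left.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical encoding: {"feat": j, "left": ..., "right": ...} or {"value": label}.
Tree = Dict[str, Any]


def predict_one(node: Tree, x: Sequence[int]) -> Any:
    """Walk down the tree until a leaf is reached and return its label."""
    while "value" not in node:
        # Assumed convention: feature value 1 -> right branch, 0 -> left branch.
        node = node["right"] if x[node["feat"]] == 1 else node["left"]
    return node["value"]


def predict(tree: Tree, X: List[Sequence[int]]) -> List[Any]:
    """Label every row of X with the label of the leaf it reaches."""
    return [predict_one(tree, x) for x in X]


example = {
    "feat": 0,
    "left": {"value": 0},
    "right": {"feat": 1, "left": {"value": 1}, "right": {"value": 2}},
}
```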

predict_proba(X)

Implements the standard predict_proba function for a DL8.5 estimator.

Parameters
X : array-like, shape (n_samples, n_features)

The input samples.

Returns
y : ndarray, shape (n_samples,)

The label for each sample is the label of the closest sample seen during fit.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters
**params : dict

Estimator parameters.

Returns
self : estimator instance

Estimator instance.
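The <component>__<parameter> routing described above can be sketched as follows. SetParamsSketch is a hypothetical stand-in, not scikit-learn's implementation: a key is split on its first double underscore and the remainder is forwarded to the nested estimator, so deeper keys are handled recursively.

```python
from typing import Any


class SetParamsSketch:
    """Hypothetical stand-in showing how set_params routes nested keys."""

    def __init__(self, **params: Any):
        self.__dict__.update(params)
        self._param_names = list(params)

    def set_params(self, **params: Any) -> "SetParamsSketch":
        for key, value in params.items():
            name, sep, sub_key = key.partition("__")
            if sep:
                # "<component>__<parameter>": forward to the nested estimator.
                getattr(self, name).set_params(**{sub_key: value})
            elif name in self._param_names:
                setattr(self, name, value)
            else:
                raise ValueError(f"Invalid parameter {name!r}")
        return self  # returning self allows chaining, as documented
```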