dl85.unsupervised.clustering.DL85Cluster

class dl85.unsupervised.clustering.DL85Cluster(max_depth=1, min_sup=1, error_function=None, max_error=0, stop_after_better=False, time_limit=0, verbose=False, desc=False, asc=False, repeat_sort=False, leaf_value_function=None, print_output=False)

An optimal binary decision tree for clustering.

Parameters

max_depth (int, default=1): Maximum depth of the tree to be found.
min_sup (int, default=1): Minimum number of examples per leaf.
max_error (int, default=0): Maximum allowed error. The default value means no bound. If no tree strictly better than this bound can be found, the model remains empty.
stop_after_better (bool, default=False): Whether the search stops as soon as it finds a tree better than max_error.
time_limit (int, default=0): Time allocated to the search, in seconds. The default value means no limit. The best tree found within the time limit is stored, provided it is better than max_error.
verbose (bool, default=False): Whether to print what happens during the search.
desc (bool, default=False): Whether the items are sorted in descending order of information gain.
asc (bool, default=False): Whether the items are sorted in ascending order of information gain.
repeat_sort (bool, default=False): Whether the items are sorted at each level of the lattice or only once before the search.
print_output (bool, default=False): Whether to print the search output.
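
As an illustration of how these parameters combine, the sketch below collects a time-bounded configuration in a plain dictionary and passes it to the constructor. The values are arbitrary assumptions, the import path is taken from the class name above, and the `try`/`except` guard is only there so the snippet still runs where the package is not installed.

```python
# Keyword arguments named after the parameters documented above.
# The values are illustrative assumptions, not recommended settings.
search_config = {
    "max_depth": 2,         # look for a tree of depth at most 2
    "min_sup": 5,           # every leaf must cover at least 5 examples
    "time_limit": 60,       # stop after 60 seconds; 0 would mean no limit
    "max_error": 0,         # 0 means no bound on the error
    "print_output": False,  # keep the search output quiet
}

try:
    from dl85.unsupervised.clustering import DL85Cluster
except ImportError:
    DL85Cluster = None  # package not installed; nothing is instantiated

model = DL85Cluster(**search_config) if DL85Cluster is not None else None
```

Keeping the configuration in a dictionary makes it easy to log or reuse the same settings across several runs.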

Attributes

tree_ (str): Output tree in serialized form; remains empty as long as no model is learned.
size_ (int): Size of the output tree.
depth_ (int): Depth of the tree found.
error_ (float): Error of the tree found.
accuracy_ (float): Accuracy of the tree found on the training set.
lattice_size_ (int): Number of nodes explored before finding the optimal tree.
runtime_ (float): Duration of the optimal decision tree search.
timeout_ (bool): Whether the search reached the time limit.
classes_ (ndarray, shape (n_classes,)): The classes seen at fit().

fit(X, X_error=None)

Implements the standard fitting function for a DL8.5 classifier.

Parameters

X (array-like, shape (n_samples, n_features)): The training input samples. If X_error is provided, X is used as the explanation input.
X_error (array-like, shape (n_samples, n_features_1)): The training input used to calculate the error. If it is not provided, X is used to calculate the error.

Returns

self (object): Returns self.
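
The split between X and X_error can be sketched as follows. This is a minimal example under stated assumptions: it assumes the explanation input should be binary (the `binarize_by_median` helper is hypothetical and not part of the library), the toy data is made up, and the fitting step is skipped when `dl85` or NumPy is not installed.

```python
from statistics import median

def binarize_by_median(rows):
    """Return a 0/1 matrix: 1 where a value is above its column median."""
    columns = list(zip(*rows))
    medians = [median(col) for col in columns]
    return [[int(v > m) for v, m in zip(row, medians)] for row in rows]

# Toy numeric data: two loose groups of points (made up for illustration).
raw = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1],
       [8.0, 9.0], [8.2, 9.1], [7.9, 8.8]]
binary = binarize_by_median(raw)

try:
    import numpy as np
    from dl85.unsupervised.clustering import DL85Cluster
except ImportError:
    DL85Cluster = None  # skip fitting when the dependencies are missing

if DL85Cluster is not None:
    model = DL85Cluster(max_depth=1, time_limit=10)
    # Binary features act as the explanation input; raw values drive the error.
    model.fit(np.array(binary), X_error=np.array(raw))
    # Attributes documented above, available once a model has been learned.
    print(model.tree_, model.error_, model.timeout_)
```

Calling `fit` with X alone uses the same matrix both for the explanation and for the error computation.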

fit_predict(X, y=None)

Performs clustering on X and returns cluster labels.

Parameters

X (array-like of shape (n_samples, n_features)): Input data.
y (Ignored): Not used, present for API consistency by convention.

Returns

labels (ndarray of shape (n_samples,), dtype=np.int64): Cluster labels.
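
A label vector of this shape is often easier to inspect once the sample indices are grouped by cluster. The helper below is a hypothetical post-processing sketch, not part of the library, and `example_labels` is a made-up stand-in for the array returned by `fit_predict`.

```python
from collections import defaultdict

def group_indices_by_label(labels):
    """Map each cluster label to the list of sample indices carrying it."""
    clusters = defaultdict(list)
    for index, label in enumerate(labels):
        # int() also converts NumPy integer labels to plain Python ints.
        clusters[int(label)].append(index)
    return dict(clusters)

# Hypothetical labels for five samples, standing in for fit_predict(X).
example_labels = [0, 0, 1, 0, 1]
clusters = group_indices_by_label(example_labels)
```

With real data, the same call would be `group_indices_by_label(model.fit_predict(X))`, where `model` is an estimator instance.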

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True): If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params (dict): Parameter names mapped to their values.

predict(X)

Implements the standard predict function for a DL8.5 classifier.

Parameters

X (array-like, shape (n_samples, n_features)): The input samples.

Returns

y (ndarray, shape (n_samples,)): The label for each sample is the label of the closest sample seen during fit.

predict_proba(X)

Implements the standard predict function for a DL8.5 classifier.

Parameters

X (array-like, shape (n_samples, n_features)): The input samples.

Returns

y (ndarray, shape (n_samples,)): The label for each sample is the label of the closest sample seen during fit.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict): Estimator parameters.

Returns

self (estimator instance): Estimator instance.
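
The `<component>__<parameter>` naming convention can be illustrated with a small pure-Python sketch. It only shows how such keys separate into a component name and a parameter name; it is not the actual implementation of `set_params`, and the component name `tree` is hypothetical.

```python
def split_nested_params(params):
    """Separate top-level parameters from '<component>__<parameter>' ones."""
    top_level, nested = {}, {}
    for key, value in params.items():
        # Split on the first double underscore only; any remainder would be
        # passed on to the component unchanged.
        component, separator, sub_key = key.partition("__")
        if separator:
            nested.setdefault(component, {})[sub_key] = value
        else:
            top_level[key] = value
    return top_level, nested

# "tree" is a hypothetical step name, as it might appear in a Pipeline.
top, nested = split_nested_params(
    {"max_depth": 3, "tree__min_sup": 5, "tree__time_limit": 60}
)
```

Here `max_depth` would be set on the object itself, while `min_sup` and `time_limit` would be forwarded to the component named `tree`.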