dl85.supervised.classifiers.DL85Booster

class dl85.supervised.classifiers.DL85Booster(base_estimator=None, max_depth=1, min_sup=1, max_iterations=0, model=2, gamma=None, error_function=None, fast_error_function=None, opti_gap=0.01, max_error=0, regulator=-1, stop_after_better=False, time_limit=0, verbose=False, desc=False, asc=False, repeat_sort=False, print_output=False, quiet=True)[source]

An optimal boosting classifier. The base classifiers (by default, DL8.5 optimal decision trees) are combined into an ensemble whose weights are obtained by solving a column generation model.

Parameters
base_estimator : classifier, default=None

The base classifier to boost

max_depth : int, default=1

Maximum depth of the tree to be found

min_sup : int, default=1

Minimum number of examples per leaf

max_iterations : int, default=0

The maximum number of iterations after which the search is stopped. The default value (0) means that the number of iterations is not limited.

model : int, default=MODEL_LP_DEMIRIZ

The column generation model to solve

gamma : str, default=None

Variance matrix parameter for MDBoost

error_function : function, default=None

User-specific error function based on transactions

fast_error_function : function, default=None

User-specific error function based on supports per class

opti_gap : float, default=0.01

Tolerance used to stop the column generation before strict optimality. This works around the slow final convergence typical of column generation approaches.

max_error : int, default=0

Maximum allowed error. Default value stands for no bound. If no tree can be found that is strictly better, the model remains empty.

stop_after_better : bool, default=False

A parameter used to indicate if the search will stop after finding a tree better than max_error

regulator : float, default=-1

This is the regularization parameter of column generation models.

time_limit : int, default=0

Allocated time in seconds for the search. The default value (0) stands for no limit. The best tree found within the time limit is stored, if it is better than max_error.

verbose : bool, default=False

A parameter used to switch on/off the print of what happens during the search

desc : bool, default=False

A parameter used to indicate if the sorting of the items is done in descending order of information gain

asc : bool, default=False

A parameter used to indicate if the sorting of the items is done in ascending order of information gain

repeat_sort : bool, default=False

A parameter used to indicate whether the sorting of items is done at each level of the lattice or only before the search

quiet : bool, default=True

Whether or not to print the column generation details

print_output : bool, default=False

A parameter used to indicate if the search output will be printed or not
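
A minimal usage sketch is shown below. The synthetic binary dataset and the chosen values of max_depth, min_sup, regulator and time_limit are illustrative only; the example assumes the sklearn-style fit/predict interface documented on this page.

import numpy as np
from dl85.supervised.classifiers import DL85Booster

# Illustrative dataset: 100 samples, 10 binary features, 2 classes.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 10))
y = rng.integers(0, 2, size=100)

# Hypothetical settings: depth-2 trees as weak learners, a regularization
# strength of 10 for the column generation model, and a 60 s time budget.
clf = DL85Booster(max_depth=2, min_sup=1, regulator=10, time_limit=60)
clf.fit(X, y)
print(clf.predict(X[:5]))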

Attributes
estimators_ : list

The list of estimators in the final ensemble.

estimator_weights_ : list

The weight of each estimator.

n_estimators_ : int

Total number of estimators

n_iterations_ : int

Total number of iterations needed to find the optimal ensemble.

objective_ : float

The objective value reached by the ensemble.

accuracy_ : float

Accuracy of the found ensemble on the training set

margins_ : list

The list of margins of the found ensemble on the training set

margins_norm_ : list

Same as margins_ but normalized; each value lies between -1 and 1.

duration_ : float

Runtime of the search for the optimal ensemble

optimal_ : bool

Whether the ensemble is optimal or not

classes_ : ndarray, shape (n_classes,)

The classes seen at fit().
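
Once fitted, the ensemble can be inspected through the attributes above. The following sketch reuses the illustrative synthetic data and hypothetical settings of the earlier example.

import numpy as np
from dl85.supervised.classifiers import DL85Booster

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 10))  # binary features
y = rng.integers(0, 2, size=100)

clf = DL85Booster(max_depth=2, regulator=10)  # hypothetical settings
clf.fit(X, y)

# Inspect the learned ensemble through the documented attributes.
print("number of trees   :", clf.n_estimators_)
print("tree weights      :", clf.estimator_weights_)
print("training accuracy :", clf.accuracy_)
print("objective value   :", clf.objective_)
print("search time (s)   :", clf.duration_)
print("proved optimal?   :", clf.optimal_)
print("classes at fit()  :", clf.classes_)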

get_params(deep=True)

Get parameters for this estimator.

Parameters
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params : dict

Parameter names mapped to their values.
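
For example (a sketch; the returned dictionary simply maps the constructor parameters documented above to their current values):

from dl85.supervised.classifiers import DL85Booster

clf = DL85Booster(max_depth=2, regulator=10)
params = clf.get_params()
print(params["max_depth"], params["regulator"])  # -> 2 10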

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns
score : float

Mean accuracy of self.predict(X) w.r.t. y.
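
For example (a sketch on synthetic binary data; the accuracy obtained here is meaningless and only illustrates the call):

import numpy as np
from dl85.supervised.classifiers import DL85Booster

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 10))  # binary features
y = rng.integers(0, 2, size=200)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

clf = DL85Booster(max_depth=2, regulator=10)  # hypothetical settings
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # mean accuracy of clf.predict(X_test) w.r.t. y_test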

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters
**params : dict

Estimator parameters.

Returns
self : estimator instance

Estimator instance.
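
For example (a sketch; set_params returns the estimator itself, so the call can be chained if desired):

from dl85.supervised.classifiers import DL85Booster

clf = DL85Booster()
clf.set_params(max_depth=3, regulator=5)  # returns clf itself
print(clf.get_params()["max_depth"])      # -> 3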

softmax(X, copy=True)[source]

Calculate the softmax function.

The softmax is computed as np.exp(X) / np.sum(np.exp(X), axis=1). Exponentiating large values can overflow, so the largest value of each row is subtracted from every entry of that row before exponentiation.

Parameters
X : array-like of float of shape (M, N)

Argument to the softmax function.

copy : bool, default=True

Copy X or not.

Returns
out : ndarray of shape (M, N)

Softmax function evaluated at every point in X.
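
The numerically stable computation described above can be sketched in plain NumPy as follows (an illustration of the formula, not the method's actual source):

import numpy as np

def stable_softmax(X):
    # Subtract the row-wise maximum before exponentiating so that
    # np.exp never receives arbitrarily large values.
    X = np.asarray(X, dtype=float)
    shifted = X - X.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

print(stable_softmax([[1000.0, 1001.0], [0.0, 1.0]]))  # each row sums to 1, no overflow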