dl85.supervised.classifiers.DL85Booster

class dl85.supervised.classifiers.DL85Booster(base_estimator=None, max_depth=1, min_sup=1, max_iterations=0, model=2, gamma=None, error_function=None, fast_error_function=None, opti_gap=0.01, max_error=0, regulator=-1, stop_after_better=False, time_limit=0, verbose=False, desc=False, asc=False, repeat_sort=False, print_output=False, quiet=True)[source]

An optimal boosting classifier. The base classifiers (by default, DL8.5 optimal decision trees) are combined into an ensemble whose weights are obtained by solving a column generation model.

Parameters
base_estimator : classifier, default=None

The base classifier to boost

max_depth : int, default=1

Maximum depth of the tree to be found

min_sup : int, default=1

Minimum number of examples per leaf

max_iterations : int, default=0

The maximum number of iterations after which the search is stopped. The default value (0) means that the number of iterations is not limited.

model : int, default=MODEL_LP_DEMIRIZ

The column generation model to solve

gamma : str, default=None

Variance matrix parameter for MDBoost

error_function : function, default=None

User-specific error function based on transactions

fast_error_function : function, default=None

User-specific error function based on supports per class

opti_gap : float, default=0.01

Tolerance used to stop the column generation before strict optimality. This works around the slow final convergence typical of column generation approaches.

max_error : int, default=0

Maximum allowed error. Default value stands for no bound. If no tree can be found that is strictly better, the model remains empty.

stop_after_better : bool, default=False

A parameter used to indicate if the search will stop after finding a tree better than max_error

regulator : float, default=-1

This is the regularization parameter of column generation models.

time_limit : int, default=0

Allocated time in seconds for the search. The default value (0) stands for no limit. The best tree found within the time limit is stored, if it is better than max_error.

verbose : bool, default=False

A parameter used to switch on/off the print of what happens during the search

desc : bool, default=False

A parameter used to indicate if the sorting of the items is done in descending order of information gain

asc : bool, default=False

A parameter used to indicate if the sorting of the items is done in ascending order of information gain

repeat_sort : bool, default=False

A parameter used to indicate whether the sorting of items is done at each level of the lattice or only before the search

quiet : bool, default=True

Whether or not to print the column generation details

print_output : bool, default=False

A parameter used to indicate if the search output will be printed or not
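
A minimal usage sketch is shown below. The synthetic binary dataset and the chosen values of max_depth, min_sup, regulator and time_limit are illustrative only; the example assumes the sklearn-style fit/predict interface documented on this page.

import numpy as np
from dl85.supervised.classifiers import DL85Booster

# Illustrative dataset: 100 samples, 10 binary features, 2 classes.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 10))
y = rng.integers(0, 2, size=100)

# Hypothetical settings: depth-2 trees as weak learners, a regularization
# strength of 10 for the column generation model, and a 60 s time budget.
clf = DL85Booster(max_depth=2, min_sup=1, regulator=10, time_limit=60)
clf.fit(X, y)
print(clf.predict(X[:5]))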

Attributes
estimators_ : list

The list of estimators in the final ensemble.

estimator_weights_ : list

The weight of each estimator.

n_estimators_ : int

Total number of estimators

n_iterations_ : int

Total number of iterations needed to find the optimal ensemble.

objective_ : float

The objective value reached by the ensemble.

accuracy_ : float

Accuracy of the found ensemble on the training set

margins_ : list

The list of margins of the found ensemble on the training set

margins_norm_ : list

Same as margins_ but normalized; each value lies between -1 and 1.

duration_ : float

Runtime of the search for the optimal ensemble

optimal_ : bool

Whether the ensemble is optimal or not

classes_ : ndarray, shape (n_classes,)

The classes seen at fit().
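
Once fitted, the ensemble can be inspected through the attributes above. The following sketch reuses the illustrative synthetic data and hypothetical settings of the earlier example.

import numpy as np
from dl85.supervised.classifiers import DL85Booster

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 10))  # binary features
y = rng.integers(0, 2, size=100)

clf = DL85Booster(max_depth=2, regulator=10)  # hypothetical settings
clf.fit(X, y)

# Inspect the learned ensemble through the documented attributes.
print("number of trees   :", clf.n_estimators_)
print("tree weights      :", clf.estimator_weights_)
print("training accuracy :", clf.accuracy_)
print("objective value   :", clf.objective_)
print("search time (s)   :", clf.duration_)
print("proved optimal?   :", clf.optimal_)
print("classes at fit()  :", clf.classes_)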

get_params(deep=True)

Get parameters for this estimator.

Parameters
deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params : dict

Parameter names mapped to their values.
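
For example (a sketch; the returned dictionary simply maps the constructor parameters documented above to their current values):

from dl85.supervised.classifiers import DL85Booster

clf = DL85Booster(max_depth=2, regulator=10)
params = clf.get_params()
print(params["max_depth"], params["regulator"])  # -> 2 10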

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters
X : array-like of shape (n_samples, n_features)

Test samples.

y : array-like of shape (n_samples,) or (n_samples, n_outputs)

True labels for X.

sample_weight : array-like of shape (n_samples,), default=None

Sample weights.

Returns
score : float

Mean accuracy of self.predict(X) w.r.t. y.
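
For example (a sketch on synthetic binary data; the accuracy obtained here is meaningless and only illustrates the call):

import numpy as np
from dl85.supervised.classifiers import DL85Booster

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 10))  # binary features
y = rng.integers(0, 2, size=200)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

clf = DL85Booster(max_depth=2, regulator=10)  # hypothetical settings
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # mean accuracy of clf.predict(X_test) w.r.t. y_test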

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters
**params : dict

Estimator parameters.

Returns
self : estimator instance

Estimator instance.
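
For example (a sketch; set_params returns the estimator itself, so the call can be chained if desired):

from dl85.supervised.classifiers import DL85Booster

clf = DL85Booster()
clf.set_params(max_depth=3, regulator=5)  # returns clf itself
print(clf.get_params()["max_depth"])      # -> 3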

softmax(X, copy=True)[source]

Calculate the softmax function.

The softmax is computed as np.exp(X) / np.sum(np.exp(X), axis=1). Exponentiating large values can overflow, so the largest value of each row is subtracted from every entry of that row before exponentiation.

Parameters
X : array-like of float of shape (M, N)

Argument to the softmax function.

copy : bool, default=True

Copy X or not.

Returns
out : ndarray of shape (M, N)

Softmax function evaluated at every point in X.
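
The numerically stable computation described above can be sketched in plain NumPy as follows (an illustration of the formula, not the method's actual source):

import numpy as np

def stable_softmax(X):
    # Subtract the row-wise maximum before exponentiating so that
    # np.exp never receives arbitrarily large values.
    X = np.asarray(X, dtype=float)
    shifted = X - X.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

print(stable_softmax([[1000.0, 1001.0], [0.0, 1.0]]))  # each row sums to 1, no overflow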