dl85.supervised.classifiers.DL85Booster
- class dl85.supervised.classifiers.DL85Booster(base_estimator=None, max_depth=1, min_sup=1, max_iterations=0, model=2, gamma=None, error_function=None, fast_error_function=None, opti_gap=0.01, max_error=0, regulator=-1, stop_after_better=False, time_limit=0, verbose=False, desc=False, asc=False, repeat_sort=False, print_output=False, quiet=True)
An optimal binary decision tree classifier.
- Parameters
- base_estimator : classifier, default=None
The base classifier to boost.
- max_depth : int, default=1
Maximum depth of the tree to be found.
- min_sup : int, default=1
Minimum number of examples per leaf.
- max_iterations : int, default=0
Maximum number of iterations after which the search is stopped. The default value means no limit on iterations.
- model : int, default=MODEL_LP_DEMIRIZ
The column generation model to solve.
- gamma : str, default=None
Variance matrix parameter for MDBoost.
- error_function : function, default=None
User-specified error function based on transactions.
- fast_error_function : function, default=None
User-specified error function based on supports per class.
- opti_gap : float, default=0.01
Tolerance used to stop the column generation before optimality is reached. It addresses the convergence problem of column generation approaches.
- max_error : int, default=0
Maximum allowed error. The default value stands for no bound. If no tree can be found that is strictly better, the model remains empty.
- stop_after_better : bool, default=False
Whether the search stops after finding a tree better than max_error.
- regulator : float, default=-1
The regularization parameter of the column generation models.
- time_limit : int, default=0
Time allocated to the search, in seconds. The default value stands for no limit. The best tree found within the time limit is stored if it is better than max_error.
- verbose : bool, default=False
Whether to print what happens during the search.
- desc : bool, default=False
Whether the items are sorted in descending order of information gain.
- asc : bool, default=False
Whether the items are sorted in ascending order of information gain.
- repeat_sort : bool, default=False
Whether the items are sorted at each level of the lattice or only once before the search.
- quiet : bool, default=True
Whether or not to print the column generation details.
- print_output : bool, default=False
Whether the search output is printed.
- Attributes
- estimators_ : list
The list of estimators in the final ensemble.
- estimator_weights_ : list
The weight of each estimator.
- n_estimators_ : int
Total number of estimators.
- n_iterations_ : int
Total number of iterations needed to find the optimal ensemble.
- objective_ : float
The objective value reached by the ensemble.
- accuracy_ : float
Accuracy of the found tree on the training set.
- margins_ : list
The list of margins of the found ensemble on the training set.
- margins_norm_ : list
The same margins as margins_, normalized. Each value is between -1 and 1.
- duration_ : float
Duration of the optimal forest search.
- optimal_ : bool
Whether the ensemble is optimal or not.
- classes_ : ndarray, shape (n_classes,)
The classes seen at fit().
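The page does not say how margins_ and margins_norm_ are computed. The sketch below uses the usual boosting definition as an assumption: with labels in {-1, +1}, the margin of a sample is its label times the weighted vote of the estimators, and the normalized margin divides that by the sum of the weights, which keeps it in [-1, 1]. The function name `ensemble_margins` and the toy data are hypothetical and are not part of dl85.

```python
def ensemble_margins(y, predictions, weights):
    """Return (raw, normalized) margins under the assumed definition.

    y           -- true labels, each -1 or +1
    predictions -- predictions[t][i] is the -1/+1 output of estimator t on sample i
    weights     -- non-negative weight of each estimator
    """
    raw = [
        y_i * sum(w * preds[i] for w, preds in zip(weights, predictions))
        for i, y_i in enumerate(y)
    ]
    total = sum(weights)
    # Dividing by the total weight bounds every margin by -1 and 1.
    normalized = [m / total for m in raw]
    return raw, normalized


# Toy ensemble of two estimators on three samples (hypothetical data).
y = [1, -1, 1]
predictions = [[1, -1, -1], [1, 1, 1]]
weights = [1.5, 0.5]
raw, normalized = ensemble_margins(y, predictions, weights)
```

Under this definition a negative margin marks a sample the weighted vote misclassifies, as with the third sample here.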
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
- deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- params : dict
Parameter names mapped to their values.
- score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that the whole label set of each sample be correctly predicted.
- Parameters
- X : array-like of shape (n_samples, n_features)
Test samples.
- y : array-like of shape (n_samples,) or (n_samples, n_outputs)
True labels for X.
- sample_weight : array-like of shape (n_samples,), default=None
Sample weights.
- Returns
- score : float
Mean accuracy of self.predict(X) w.r.t. y.
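The mean accuracy described above can be sketched in plain Python. This is an illustration of the metric, not the library's code: `mean_accuracy` is a hypothetical helper, and the weighted form (weight of correctly predicted samples divided by total weight) is the assumed meaning of sample_weight.

```python
def mean_accuracy(y_true, y_pred, sample_weight=None):
    """Fraction of samples, optionally weighted, whose prediction matches."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    correct = sum(
        w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p
    )
    return correct / sum(sample_weight)


y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]
unweighted = mean_accuracy(y_true, y_pred)
weighted = mean_accuracy(y_true, y_pred, sample_weight=[1, 1, 2, 1])

# Multi-label case: a sample counts only if its whole label set matches.
subset = mean_accuracy([(1, 0), (0, 1)], [(1, 0), (1, 1)])
```

Doubling the weight of the misclassified third sample lowers the score from 0.75 to 0.6, and the multi-label call shows why subset accuracy is harsh: one wrong label in a set discards the whole sample.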
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters
- **params : dict
Estimator parameters.
- Returns
- self : estimator instance
Estimator instance.
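The <component>__<parameter> convention can be illustrated with a small stand-in. The classes `ParamsMixin`, `Tree` and `Booster` below are hypothetical and only mimic the behaviour described above; they are neither scikit-learn's nor dl85's implementation.

```python
class ParamsMixin:
    """Hypothetical stand-in for the get_params/set_params convention."""

    def get_params(self, deep=True):
        params = dict(vars(self))
        if deep:
            # Expose the parameters of nested estimators as name__subname.
            for name, value in vars(self).items():
                if hasattr(value, "get_params"):
                    for sub, sub_value in value.get_params(deep=True).items():
                        params[f"{name}__{sub}"] = sub_value
        return params

    def set_params(self, **params):
        for key, value in params.items():
            name, sep, sub_key = key.partition("__")
            if name not in vars(self):
                raise ValueError(f"Invalid parameter {name!r}")
            if sep:
                # Forward the remainder of the key to the nested component.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                setattr(self, name, value)
        return self


class Tree(ParamsMixin):
    def __init__(self, max_depth=1):
        self.max_depth = max_depth


class Booster(ParamsMixin):
    def __init__(self, base_estimator=None, regulator=-1):
        self.base_estimator = base_estimator
        self.regulator = regulator


booster = Booster(base_estimator=Tree())
returned = booster.set_params(regulator=5, base_estimator__max_depth=3)
params = booster.get_params()
```

Because set_params returns the estimator itself, calls can be chained, and get_params(deep=True) reports the nested value under the same double-underscore key used to set it.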
- softmax(X, copy=True)
Calculate the softmax function.
The softmax function is calculated by np.exp(X) / np.sum(np.exp(X), axis=1). This will cause overflow when large values are exponentiated. Hence the largest value in each row is subtracted from each data point to prevent this.
- Parameters
- X : array-like of float of shape (M, N)
Argument to the logistic function.
- copy : bool, default=True
Copy X or not.
- Returns
- out : ndarray of shape (M, N)
Softmax function evaluated at every point in X.
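The row-maximum subtraction described above can be sketched with the standard library alone. `softmax_rows` is a hypothetical helper that works on lists of lists instead of arrays; it illustrates the technique and is not the method's actual code.

```python
import math


def softmax_rows(rows):
    """Row-wise softmax of a list of lists of floats."""
    result = []
    for row in rows:
        # Subtracting the row maximum leaves the result unchanged but keeps
        # every exponent at or below zero, so math.exp cannot overflow.
        shift = max(row)
        exps = [math.exp(v - shift) for v in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


# The second row would overflow without the shift: math.exp(1000.0) raises
# OverflowError.
probs = softmax_rows([[1.0, 2.0, 3.0], [1000.0, 1000.0, 1000.0]])
```

Each output row sums to 1, and the row of identical large values yields a uniform distribution instead of an overflow error.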