GROOT forest

The GrootRandomForestClassifier class uses bootstrap aggregation and partially random feature selection to train an ensemble of GrootTreeClassifiers. On datasets with many features, a GrootRandomForestClassifier might perform better than a GrootTreeClassifier as it is not limited in the number of features it can use by a maximum size.

Example:

from sklearn.datasets import make_moons
X, y = make_moons(random_state=1)

from groot.model import GrootRandomForestClassifier
forest = GrootRandomForestClassifier(attack_model=[0.1, 0.1], random_state=1)
forest.fit(X, y)
print(forest.score(X, y))

1.0

`groot.model.GrootRandomForestClassifier (BaseGrootRandomForest, ClassifierMixin)`

A robust random forest for binary classification.

`init(self, n_estimators=100, max_depth=None, max_features='sqrt', min_samples_split=2, min_samples_leaf=1, robust_weight=1.0, attack_model=None, one_adversarial_class=False, verbose=False, chen_heuristic=False, max_samples=None, n_jobs=None, compile=True, random_state=None)` `special`

Parameters:

Name	Type	Description	Default
`n_estimators`	`int`	The number of decision trees to fit in the forest.	`100`
`max_depth`	`int`	The maximum depth for the decision trees once fitted.	`None`
`max_features`	`int or {"sqrt", "log2", None}`	The number of features to consider while making each split, if None then all features are considered.	`'sqrt'`
`min_samples_split`	`int`	The minimum number of samples required to split a tree node.	`2`
`min_samples_leaf`	`int`	The minimum number of samples required to make a tree leaf.	`1`
`robust_weight`	`float`	The ratio of samples that are actually moved by an adversary.	`1.0`
`attack_model`	`array-like of shape (n_features,)`	Attacker capabilities for perturbing X. The attack model needs to describe for every feature in which way it can be perturbed.	`None`
`one_adversarial_class`	`bool`	Whether one class (malicious, 1) perturbs their samples or if both classes (benign and malicious, 0 and 1) do so.	`False`
`verbose`	`bool`	Whether to print fitting progress on screen.	`False`
`chen_heuristic`	`bool`	Whether to use the heuristic for the adversarial Gini impurity from Chen et al. (2019) instead of GROOT's adversarial Gini impurity.	`False`
`max_samples`	`float`	The fraction of samples to draw from X to train each decision tree. If None (default), then draw X.shape[0] samples.	`None`
`n_jobs`	`int`	The number of jobs to run in parallel when fitting trees. See joblib.	`None`
`compile`	`bool`	Whether to compile decision trees for faster predictions.	`True`
`random_state`	`int`	Controls the sampling of the features to consider when looking for the best split at each node.	`None`

Attributes:

Name	Type	Description
`estimators_`	`list of GrootTree`	The collection of fitted sub-estimators.
`n_samples_`	`int`	The number of samples when `fit` is performed.
`n_features_`	`int`	The number of features when `fit` is performed.

`predict(self, X)`

Predict the classes of the input samples X.

The predicted class is the rounded average of the class labels in each predicted leaf.

Parameters:

Name	Type	Description	Default
`X`	`array-like of shape (n_samples, n_features)`	The input samples to predict.	required

Returns:

Type	Description
`array-like of shape (n_samples,)`	The predicted class labels

`predict_proba(self, X)`

Predict class probabilities of the input samples X.

The class probability is the average of the probabilities predicted by each decision tree. The probability prediction of each tree is the fraction of samples of the same class in the leaf.

Parameters:

Name	Type	Description	Default
`X`	`array-like of shape (n_samples, n_features)`	The input samples to predict.	required

Returns:

Type	Description
`array of shape (n_samples,)`	The probability for each input sample of being malicious.

GROOT forest

groot.model.GrootRandomForestClassifier (BaseGrootRandomForest, ClassifierMixin)

__init__(self, n_estimators=100, max_depth=None, max_features='sqrt', min_samples_split=2, min_samples_leaf=1, robust_weight=1.0, attack_model=None, one_adversarial_class=False, verbose=False, chen_heuristic=False, max_samples=None, n_jobs=None, compile=True, random_state=None) special

predict(self, X)

predict_proba(self, X)

`groot.model.GrootRandomForestClassifier (BaseGrootRandomForest, ClassifierMixin)`

`init(self, n_estimators=100, max_depth=None, max_features='sqrt', min_samples_split=2, min_samples_leaf=1, robust_weight=1.0, attack_model=None, one_adversarial_class=False, verbose=False, chen_heuristic=False, max_samples=None, n_jobs=None, compile=True, random_state=None)` `special`

`predict(self, X)`

`predict_proba(self, X)`