# Toolbox

The `groot.toolbox` package exposes the `Model` class, which makes it easy to load, convert and attack decision tree ensembles in different formats. The `Model` class can load tree ensembles from the following formats:

- Scikit-learn: `Model.from_sklearn`
- JSON file: `Model.from_json_file`
- GROOT: `Model.from_groot`
- TREANT: `Model.from_treant`
- Provably robust boosting: `Model.from_provably_robust_boosting`
After loading a model you can easily determine metrics such as accuracy and adversarial accuracy (against a given perturbation radius epsilon). It is also possible to get more detailed information about adversarial robustness than a single metric. The `Model` class has three methods for this:

- `attack_feasibility`: Compute for each sample whether an adversarial example exists within a radius epsilon around it.
- `attack_distance`: Compute for each sample the distance it needs to move to turn into an adversarial example.
- `adversarial_examples`: Generate an adversarial example for each input sample.

These three methods are listed in order of increasing complexity. That means that when you only need e.g. `attack_feasibility` and not `attack_distance`, calling only the first method can be much faster than calling the second and computing feasibility from the distances. For example, for the default `'milp'` attack, `attack_feasibility` is orders of magnitude faster than `attack_distance` and `adversarial_examples`.
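The relationship between these methods can be sketched in plain numpy: given per-sample attack distances (the values below are made up for illustration), feasibility within a radius epsilon and the resulting adversarial accuracy follow directly.

```python
import numpy as np

# Hypothetical per-sample attack distances, e.g. as returned by
# Model.attack_distance (these values are made up for illustration).
distances = np.array([0.05, 0.2, 0.8, np.inf])

epsilon = 0.3

# An attack is feasible for a sample when its distance to the decision
# boundary is at most epsilon...
feasible = distances <= epsilon

# ...and adversarial accuracy is the fraction of samples that cannot be
# attacked within radius epsilon.
adversarial_accuracy = 1.0 - feasible.mean()

print(feasible)              # [ True  True False False]
print(adversarial_accuracy)  # 0.5
```

Going the other way is not possible: feasibility alone does not tell you the exact distances, which is why the cheaper method can stop its search early and run faster.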
## Example

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

from groot.toolbox import Model

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(max_depth=3)
tree.fit(X, y)

model = Model.from_sklearn(tree)
print("Accuracy:", model.accuracy(X, y))

epsilon = 0.3
print("Adversarial accuracy:", model.adversarial_accuracy(X, y, epsilon=epsilon))

X_adv = model.adversarial_examples(X, y)
print("Adversarial examples:")
print(X_adv)
```
## Code reference

### `groot.toolbox.Model`

#### `__init__(self, json_model, n_classes)` (special)

General model class that exposes a common API for evaluating decision tree (ensemble) models. Usually you won't have to call this constructor manually; instead use `from_json_file`, `from_sklearn`, `from_treant`, `from_provably_robust_boosting` or `from_groot`.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `json_model` | list of dicts | List of decision trees encoded as dicts. See the XGBoost JSON format. | required |
| `n_classes` | int | Number of classes that this model predicts. | required |
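For illustration, here is a sketch of what a single tree in `json_model` might look like, using field names from the XGBoost JSON dump format (the stump itself is made up, and the small evaluator is a hypothetical helper written for this example, not part of the toolbox):

```python
# A decision stump encoded as a dict following the XGBoost JSON dump format
# (this particular model is invented for illustration).
stump = {
    "nodeid": 0,
    "split": "f0",           # split on feature index 0
    "split_condition": 0.5,  # take the "yes" branch when x[0] < 0.5
    "yes": 1,
    "no": 2,
    "children": [
        {"nodeid": 1, "leaf": -0.4},
        {"nodeid": 2, "leaf": 0.4},
    ],
}

def evaluate(node, x):
    """Walk a tree dict recursively and return the leaf value for sample x."""
    if "leaf" in node:
        return node["leaf"]
    feature = int(node["split"].lstrip("f"))
    target = node["yes"] if x[feature] < node["split_condition"] else node["no"]
    child = next(c for c in node["children"] if c["nodeid"] == target)
    return evaluate(child, x)

print(evaluate(stump, [0.2]))  # -0.4
print(evaluate(stump, [0.9]))  # 0.4
```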
#### `accuracy(self, X, y)`

Determine the accuracy of the model on unperturbed samples.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `X` | array-like of shape (n_samples, n_features) | Input samples. | required |
| `y` | array-like of shape (n_samples,) | True labels. | required |

**Returns:**

| Type | Description |
|---|---|
| float | Accuracy on unperturbed samples. |
#### `adversarial_accuracy(self, X, y, attack='auto', order=inf, epsilon=0.0, options={})`

Determine the accuracy against adversarial examples within maximum perturbation radius epsilon.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `X` | array-like of shape (n_samples, n_features) | Samples to attack. | required |
| `y` | array-like of shape (n_samples,) | True labels for the samples. | required |
| `attack` | {"auto", "milp", "tree"} | The attack to use. If "auto", the attack is chosen automatically: "milp" for optimal attacks on tree ensembles using a Mixed-Integer Linear Programming formulation, "tree" for optimal attacks on single decision trees by enumerating all possible paths through the tree. | 'auto' |
| `order` | {0, 1, 2, inf} | L-norm order to use. See the numpy documentation for more explanation. | inf |
| `epsilon` | float | Maximum distance by which samples can move. | 0.0 |
| `options` | dict | Extra attack-specific options. | {} |

**Returns:**

| Type | Description |
|---|---|
| float | Adversarial accuracy given the maximum perturbation radius epsilon. |
#### `adversarial_examples(self, X, y, attack='auto', order=inf, options={})`

Create adversarial examples for each input sample.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `X` | array-like of shape (n_samples, n_features) | Samples to attack. | required |
| `y` | array-like of shape (n_samples,) | True labels for the samples. | required |
| `attack` | {"auto", "milp", "tree"} | The attack to use. If "auto", the attack is chosen automatically: "milp" for optimal attacks on tree ensembles using a Mixed-Integer Linear Programming formulation, "tree" for optimal attacks on single decision trees by enumerating all possible paths through the tree. | 'auto' |
| `order` | {0, 1, 2, inf} | L-norm order to use. See the numpy documentation for more explanation. | inf |
| `options` | dict | Extra attack-specific options. | {} |

**Returns:**

| Type | Description |
|---|---|
| ndarray of shape (n_samples, n_features) | Adversarial examples. |
#### `attack_distance(self, X, y, attack='auto', order=inf, options={})`

Determine for each sample the perturbation distance required to turn it into an adversarial example.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `X` | array-like of shape (n_samples, n_features) | Samples to attack. | required |
| `y` | array-like of shape (n_samples,) | True labels for the samples. | required |
| `attack` | {"auto", "milp", "tree"} | The attack to use. If "auto", the attack is chosen automatically: "milp" for optimal attacks on tree ensembles using a Mixed-Integer Linear Programming formulation, "tree" for optimal attacks on single decision trees by enumerating all possible paths through the tree. | 'auto' |
| `order` | {0, 1, 2, inf} | L-norm order to use. See the numpy documentation for more explanation. | inf |
| `options` | dict | Extra attack-specific options. | {} |

**Returns:**

| Type | Description |
|---|---|
| ndarray of shape (n_samples,) of floats | Distances required to create adversarial examples. |
#### `attack_feasibility(self, X, y, attack='auto', order=inf, epsilon=0.0, options={})`

Determine for each sample whether an adversarial example is feasible within the maximum perturbation radius epsilon.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `X` | array-like of shape (n_samples, n_features) | Samples to attack. | required |
| `y` | array-like of shape (n_samples,) | True labels for the samples. | required |
| `attack` | {"auto", "milp", "tree"} | The attack to use. If "auto", the attack is chosen automatically: "milp" for optimal attacks on tree ensembles using a Mixed-Integer Linear Programming formulation, "tree" for optimal attacks on single decision trees by enumerating all possible paths through the tree. | 'auto' |
| `order` | {0, 1, 2, inf} | L-norm order to use. See the numpy documentation for more explanation. | inf |
| `epsilon` | float | Maximum distance by which samples can move. | 0.0 |
| `options` | dict | Extra attack-specific options. | {} |

**Returns:**

| Type | Description |
|---|---|
| ndarray of shape (n_samples,) of booleans | Whether an adversarial example is feasible for each sample. |
#### `decision_function(self, X)`

Compute prediction values for some samples. These values are the sums of the leaf values in which the samples end up.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `X` | array-like of shape (n_samples, n_features) | Samples to predict. | required |

**Returns:**

| Type | Description |
|---|---|
| ndarray of shape (n_samples,) or (n_samples, n_classes) | Predicted values. Returns a 1-dimensional array if n_classes=2, else a 2-dimensional array. |
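For the multiclass case, turning such values into class labels by taking the per-row argmax is the usual convention; a minimal sketch with made-up decision values (an assumption for illustration, see `predict` for the method that produces labels on a `Model`):

```python
import numpy as np

# Hypothetical multiclass decision values for 3 samples and 3 classes
# (these numbers are invented for illustration).
values = np.array([
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.0, 0.1, 0.9],
])

# Pick the class with the highest value for each sample.
labels = values.argmax(axis=1)

print(labels)  # [1 0 2]
```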
#### `from_groot(classifier)` (staticmethod)

Create a Model instance from a GrootTree, GrootRandomForest or GROOT OneVsRestClassifier.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `classifier` | GrootTree, GrootRandomForest or OneVsRestClassifier (of GROOT models) | GROOT model to load. | required |

**Returns:**

| Type | Description |
|---|---|
| Model | Instantiated Model object. |
#### `from_json_file(filename, n_classes)` (staticmethod)

Create a Model instance from a JSON file.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `filename` | str | Path to a JSON file that contains a list of decision trees encoded as dicts. See the XGBoost JSON format. | required |
| `n_classes` | int | Number of classes that this model predicts. | required |

**Returns:**

| Type | Description |
|---|---|
| Model | Instantiated Model object. |
#### `from_provably_robust_boosting(classifier)` (staticmethod)

Create a Model instance from a Provably Robust Boosting TreeEnsemble.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `classifier` | groot.provably_robust_boosting.TreeEnsemble | Provably Robust Boosting model to load. | required |

**Returns:**

| Type | Description |
|---|---|
| Model | Instantiated Model object. |
#### `from_sklearn(classifier)` (staticmethod)

Create a Model instance from a Scikit-learn classifier.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `classifier` | DecisionTreeClassifier, RandomForestClassifier or GradientBoostingClassifier | Scikit-learn model to load. | required |

**Returns:**

| Type | Description |
|---|---|
| Model | Instantiated Model object. |
#### `from_treant(classifier)` (staticmethod)

Create a Model instance from a TREANT decision tree.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `classifier` | groot.treant.RobustDecisionTree | TREANT model to load. | required |

**Returns:**

| Type | Description |
|---|---|
| Model | Instantiated Model object. |
#### `predict(self, X)`

Predict classes for some samples. The raw prediction values are turned into class labels.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `X` | array-like of shape (n_samples, n_features) | Samples to predict. | required |

**Returns:**

| Type | Description |
|---|---|
| ndarray of shape (n_samples,) | Predicted class labels. |
#### `to_json(self, filename, indent=2)`

Export the model object to a JSON file.

**Parameters:**

| Name | Type | Description | Default |
|---|---|---|---|
| `filename` | str | Name of the JSON file to export to. | required |
| `indent` | int | Number of spaces to use for indentation in the JSON file. Can be reduced to save storage. | 2 |
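The effect of the `indent` setting on storage can be sketched with the standard `json` module (the toy tree list here is made up; `to_json` itself handles the actual serialization):

```python
import json
import os
import tempfile

# A toy list of tree dicts (invented for illustration).
trees = [{"nodeid": 0, "leaf": 0.5}]

with tempfile.TemporaryDirectory() as tmp:
    path_pretty = os.path.join(tmp, "pretty.json")
    path_compact = os.path.join(tmp, "compact.json")

    # Indented output is easier to read...
    with open(path_pretty, "w") as f:
        json.dump(trees, f, indent=2)

    # ...while omitting indentation produces a smaller file.
    with open(path_compact, "w") as f:
        json.dump(trees, f, indent=None)

    sizes = (os.path.getsize(path_pretty), os.path.getsize(path_compact))

print(sizes)
```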