niteshade.attack.BrewPoison
- class niteshade.attack.BrewPoison(target, M=10, aggressiveness=0.1, alpha=0.8, start_ep=10, total_eps=20, one_hot=False)
Bases:
niteshade.attack.PerturbPointsAttacker
Perturb points while minimising detectability.
Given a batch of input data and corresponding labels, the user chooses which label to target. Let's take the example of MNIST, and say the user targets the label 1. Then, all points in the batch with label 1 are identified. The aggressiveness parameter determines the maximum number of points that can be perturbed, i.e. the poison_budget. So, poison_budget points are selected from the set of points with label 1, as in the sketch below.
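As a rough illustration, the selection step might look like the following sketch. The variable names are hypothetical and the exact budget computation is an assumption, not the library's source:

```python
import torch

# Hypothetical sketch of the selection step (not the library's exact code).
X = torch.rand(32, 1, 28, 28) * 255   # a batch of MNIST-like images
y = torch.randint(0, 10, (32,))       # integer labels

target = 1
aggressiveness = 0.1

# Indices of all points in the batch carrying the target label.
target_idx = [i for i, label in enumerate(y) if int(label) == target]

# Assumption: poison_budget is the aggressiveness fraction of the targeted points.
poison_budget = int(len(target_idx) * aggressiveness)

# Points that will actually be perturbed.
selected_X = [X[i] for i in target_idx[:poison_budget]]
```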
A random perturbation is initialised in the range (0, 1). However, the data is probably not normalised to this range; image data, for instance, typically lies in the range (0, 255). So, after initialising a perturbation, it is multiplied by the maximum of the input data to scale it up. The perturbation is applied to the datapoints that are to be poisoned. Then, using the model, a prediction is made. If the perturbed points cause a misclassification, i.e. the model predicts the label to not be 1, then the infinity norm of the perturbation is calculated, and a new, 'smaller' perturbation is initialised by sampling in (0, alpha*inf_norm), where inf_norm is the infinity norm of the previous perturbation. This new perturbation is applied to the original points to be poisoned, so we now have a set of perturbed points that is more similar to the unperturbed points, and the model predicts again.
If instead there is no misclassification, i.e. the predicted label is still 1, then the unperturbed set, or the previously successful perturbed set that did cause a misclassification, is returned.
This is repeated for at most M optimization steps, or until the perturbation is unable to cause a misclassification. The perturbed points then replace the original points in the batch, as sketched below.
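Putting these steps together, the optimisation loop can be sketched roughly as follows. This is a hedged reconstruction from the description above, not the library's source; in particular, requiring every selected point to misclassify is an assumption:

```python
import torch

def refine_perturbation(model, selected_X, target, M=10, alpha=0.8, data_max=255.0):
    """Hedged reconstruction of the loop described above, not the library's code."""
    best = selected_X  # fall back to the unperturbed points
    # Initial perturbation in (0, 1), scaled up to the data range.
    pert = torch.rand_like(selected_X[0]) * data_max
    for _ in range(M):
        perturbed = [x + pert for x in selected_X]
        preds = model(torch.stack(perturbed)).argmax(dim=1)
        if (preds != target).all():  # assumption: every selected point must misclassify
            # Success: keep this set and try a smaller perturbation.
            best = perturbed
            inf_norm = pert.abs().max()
            pert = torch.rand_like(pert) * alpha * inf_norm
        else:
            # The smaller perturbation failed; keep the last successful set.
            break
    return best
```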
Since this attacker makes use of a model and its predictions to poison, it makes most sense with a model that has already been pre-trained. The user may nonetheless use a pretrained or an untrained model. In the case of an untrained model (or otherwise), the user can delay BrewPoison so as to allow the model to train for a few episodes without the attacker intervening, thus simulating a pretrained model. This is done by passing in the total_eps and start_ep parameters: for a 20-episode run where the attacker should poison in the last 10 episodes, the user should set total_eps=20 and start_ep=10, as sketched below.
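The delay itself reduces to a simple gate on the episode counter; a minimal sketch of the idea (assumed logic, not the library's source):

```python
# Minimal sketch of the poisoning window (assumed logic, not the library's source).
def should_poison(curr_ep: int, start_ep: int) -> bool:
    # Leave the model to train undisturbed until start_ep episodes have passed.
    return curr_ep >= start_ep
```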
This strategy is not a direct implementation of, but is inspired by, the following paper: "Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching", https://arxiv.org/abs/2009.02276.
- Parameters
target (label) – label to use as a target for misclassification
M (int) – number of optimization steps for the perturbation
aggressiveness (float) – determines the maximum number of points to poison
alpha (float) – perturbation reduction parameter
start_ep (int) – episode number after which the attacker will begin poisoning
total_eps (int) – total number of episodes in the simulation
one_hot (bool) – whether the labels are one-hot encoded
- __init__(target, M=10, aggressiveness=0.1, alpha=0.8, start_ep=10, total_eps=20, one_hot=False)
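A hypothetical instantiation for the 20-episode scenario described above might look like:

```python
from niteshade.attack import BrewPoison

# Target label 1, poisoning only in the last 10 episodes of a 20-episode run.
attacker = BrewPoison(target=1, M=10, aggressiveness=0.1, alpha=0.8,
                      start_ep=10, total_eps=20, one_hot=False)
```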
Methods
__init__(target[, M, aggressiveness, alpha, ...])
apply_pert(selected_X, pert): Apply the perturbation to a list of inputs.
attack(X, y, model): Attacks a batch of input data by perturbing it.
get_new_pert(pert, alpha, X): Initialise a new perturbation using the previous perturbation.
inc_reset_ep(curr_ep, total_eps): Increase the current episode number or reset it back to 0.
- apply_pert(selected_X, pert)
Apply the perturbation to a list of inputs.
- Parameters
selected_X (list) – list of tensors to perturb
pert (torch.tensor) – tensor used to perturb
- Returns
list of perturbed tensors
- Return type
perturbed_X (list)
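Behaviour consistent with this description could be sketched as follows (an assumption, not the library's source):

```python
import torch

# Sketch of behaviour consistent with apply_pert (an assumption, not the source).
def apply_pert_sketch(selected_X, pert):
    # Add the same perturbation tensor to every selected input.
    return [x + pert for x in selected_X]

xs = [torch.zeros(1, 28, 28) for _ in range(3)]
pert = torch.rand(1, 28, 28)
perturbed_X = apply_pert_sketch(xs, pert)
```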
- attack(X, y, model)
Attacks a batch of input data by perturbing it.
- Parameters
X (array) – data
y (array/list) – labels
model – model used to check whether the perturbation causes a misclassification
- Returns
perturbed data and the corresponding labels
- Return type
X (array), y (array/list)
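For example, inside a training loop one might call attack like this (a hedged sketch; the nn.Sequential classifier is a stand-in for any torch model):

```python
import torch
import torch.nn as nn
from niteshade.attack import BrewPoison

# Stand-in classifier for illustration only.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

X = torch.rand(32, 1, 28, 28) * 255
y = torch.randint(0, 10, (32,))

# The attacker returns the batch with the selected points perturbed.
attacker = BrewPoison(target=1)
X_poisoned, y_poisoned = attacker.attack(X, y, model)
```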
- get_new_pert(pert, alpha, X)
Initialise a new perturbation using the previous perturbation.
Given a perturbation, calculate its infinity norm, then sample a new perturbation whose maximum value is alpha times that infinity norm.
- Parameters
pert (tensor) – previous perturbation, whose infinity norm is calculated
alpha (float) – scaling factor limiting the maximum value of new_pert
X (tensor) – tensor used to shape the new perturbation
- Returns
new perturbation tensor, bounded by alpha times the infinity norm of pert
- Return type
new_pert (tensor)
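A minimal sketch of that sampling, assuming a uniform draw bounded by alpha times the infinity norm (not the library's source):

```python
import torch

# Sketch of the sampling described above (assumed, not the library's source).
def get_new_pert_sketch(pert, alpha, X):
    # Infinity norm = largest absolute entry of the previous perturbation.
    inf_norm = pert.abs().max()
    # New perturbation shaped like X, bounded above by alpha * inf_norm.
    return torch.rand_like(X) * alpha * inf_norm
```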
- inc_reset_ep(curr_ep, total_eps)
Increase the current episode number or reset it back to 0.
The reset is needed because the attacker is initialised only once, so the curr_ep attribute would otherwise carry over from one simulation to the next. When running two simulations, this function resets the attribute to 0 before the second simulation starts.
- Parameters
curr_ep (int) – current episode number
total_eps (int) – total number of episodes
- Returns
current episode number
- Return type
curr_ep (int)
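The counter logic amounts to something like this sketch (the exact reset condition is an assumption):

```python
# Sketch of the counter logic (the exact reset condition is an assumption).
def inc_reset_ep_sketch(curr_ep: int, total_eps: int) -> int:
    # Reset once the final episode is reached, so a re-used attacker
    # starts the next simulation with a fresh count; otherwise increment.
    if curr_ep >= total_eps - 1:
        return 0
    return curr_ep + 1
```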