niteshade.postprocessing.PostProcessor

class niteshade.postprocessing.PostProcessor(simulators: dict)

Bases: object

Class providing post-processing functionality for assessing and visualising the effectiveness of different attack and defense strategies when simulating data poisoning attacks during online learning.

Parameters
  • simulators (dict) – Dictionary with descriptive labels for each Simulator as keys and the simulator objects (presumably after making use of their .run() method) as values.

__init__(simulators: dict) → None

Methods

__init__(simulators)

compute_online_learning_metrics(X_test, y_test)

Returns a dictionary of lists with metrics.

evaluate_simulators_metrics(X_test, y_test)

Returns a dictionary of lists with metrics.

get_data_modifications()

Retrieves for each simulation the following: a) number of poisoned points, b) number of points that were unaugmented by the attacker, c) number of points correctly rejected by the defender, d) number of points that were incorrectly rejected by the defender, e) total number of points that were originally available for training, f) number of points that the model was actually trained on.

plot_decision_boundaries(X_test, y_test[, ...])

Plot the decision boundaries of the final models inside all the run Simulator objects passed to the constructor of the PostProcessor.

plot_online_learning_metrics(metrics[, ...])

Plots the online learning metrics.

compute_online_learning_metrics(X_test, y_test)

Returns a dictionary of lists with metrics.

Requirement: the model must have an evaluate method that takes X_test, y_test and self.batch_sizes[simulation_label] as input and returns a metric.

Parameters
  • X_test (np.ndarray) – NumPy array containing features.

  • y_test (np.ndarray) – NumPy array containing labels.

Returns

Dictionary where each key is a simulator and each value is a list of the corresponding metrics throughout the simulation (each list element corresponds to a single timestep of the simulation).

Return type

metrics (dict)
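The evaluate contract and the shape of the returned dictionary can be sketched as follows. This is a minimal, self-contained illustration and not niteshade's implementation: ThresholdModel, the per-timestep snapshots layout, and the labels are hypothetical stand-ins. Only the evaluate(X_test, y_test, batch_size) call shape and the dictionary-of-lists result are taken from the description above.

```python
class ThresholdModel:
    """Hypothetical stand-in for the model state at one timestep."""

    def __init__(self, threshold):
        self.threshold = threshold

    def evaluate(self, X_test, y_test, batch_size):
        # Accuracy accumulated batch by batch; batch_size only controls chunking.
        correct = 0
        for start in range(0, len(X_test), batch_size):
            xs = X_test[start:start + batch_size]
            ys = y_test[start:start + batch_size]
            correct += sum(int(x > self.threshold) == y for x, y in zip(xs, ys))
        return correct / len(X_test)


X_test = [0.1, 0.4, 0.6, 0.9]
y_test = [0, 0, 1, 1]

# Assumed layout: one model snapshot per timestep for each labelled simulation.
snapshots = {
    "regular": [ThresholdModel(0.95), ThresholdModel(0.7), ThresholdModel(0.5)],
    "poisoned": [ThresholdModel(0.95), ThresholdModel(0.95), ThresholdModel(0.7)],
}
batch_sizes = {"regular": 2, "poisoned": 2}

# One list of metrics per simulation label, one entry per timestep.
metrics = {
    label: [m.evaluate(X_test, y_test, batch_sizes[label]) for m in models]
    for label, models in snapshots.items()
}
print(metrics)
```

Each list has one entry per timestep, so the two simulations can be compared point by point over the course of training.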

evaluate_simulators_metrics(X_test, y_test)

Returns a dictionary of lists with metrics. Requires the model to have an .evaluate() method with arguments (X_test, y_test) that returns any given metric that the user deems appropriate for the task at hand.

Parameters
  • X_test (np.ndarray) – NumPy array containing features.

  • y_test (np.ndarray) – NumPy array containing labels.

Returns

Dictionary where each key is a simulator and each value is a final evaluation metric.

Return type

metrics (dict)
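As a rough sketch of the final-metric case, the example below uses a model whose .evaluate(X_test, y_test) returns mean absolute error. ConstantRegressor and StubSimulator are hypothetical stand-ins, and the assumption that each simulator exposes its final model as a .model attribute is made here for illustration only.

```python
class ConstantRegressor:
    """Hypothetical model that always predicts the same value."""

    def __init__(self, value):
        self.value = value

    def evaluate(self, X_test, y_test):
        # Mean absolute error; any metric suited to the task could be returned.
        return sum(abs(self.value - y) for y in y_test) / len(y_test)


class StubSimulator:
    """Hypothetical stand-in exposing only a final model (assumed attribute)."""

    def __init__(self, model):
        self.model = model


simulators = {
    "regular": StubSimulator(ConstantRegressor(2.0)),
    "poisoned": StubSimulator(ConstantRegressor(4.0)),
}
X_test = [[0.0], [1.0], [2.0]]
y_test = [1.0, 2.0, 3.0]

# One final metric per simulator label.
final_metrics = {
    label: sim.model.evaluate(X_test, y_test) for label, sim in simulators.items()
}
print(final_metrics)
```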

get_data_modifications()

Retrieves for each simulation the following: a) number of poisoned points, b) number of points that were unaugmented by the attacker, c) number of points correctly rejected by the defender, d) number of points that were incorrectly rejected by the defender, e) total number of points that were originally available for training, f) number of points that the model was actually trained on.

Returns

Dictionary with keys corresponding to simulator names and values corresponding to dicts with keys a, b, c, d per above.

Return type

results (pd.core.frame.DataFrame)
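How the six quantities relate to one another can be illustrated with a toy tally. This is only a sketch under stated assumptions: PointRecord and count_modifications are hypothetical names, each original point is assumed to be either poisoned or left untouched by the attacker, and every point the defender does not reject is assumed to reach the model. The library's own bookkeeping may differ.

```python
from dataclasses import dataclass


@dataclass
class PointRecord:
    """Hypothetical per-point log entry; not a niteshade type."""
    poisoned: bool  # modified by the attacker
    rejected: bool  # removed by the defender


def count_modifications(records):
    """Tally quantities a) to f) for one simulation from per-point flags."""
    total = len(records)
    poisoned = sum(r.poisoned for r in records)
    correctly_rejected = sum(r.poisoned and r.rejected for r in records)
    incorrectly_rejected = sum((not r.poisoned) and r.rejected for r in records)
    return {
        "poisoned": poisoned,                          # a)
        "not_poisoned": total - poisoned,              # b)
        "correctly_rejected": correctly_rejected,      # c)
        "incorrectly_rejected": incorrectly_rejected,  # d)
        "original_points": total,                      # e)
        "training_points": total - correctly_rejected - incorrectly_rejected,  # f)
    }


records = [
    PointRecord(poisoned=True, rejected=True),
    PointRecord(poisoned=True, rejected=False),
    PointRecord(poisoned=False, rejected=True),
    PointRecord(poisoned=False, rejected=False),
    PointRecord(poisoned=False, rejected=False),
]
results = {"attack_and_defense": count_modifications(records)}
print(results)
```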

plot_decision_boundaries(X_test, y_test, num_points=500, perplexity=50, n_iter=2000, C=10, kernel='poly', degree=3, figsize=(20, 20), fontsize=10, markersize=20, resolution=0.2, save=False, show_plot=True)

Plot the decision boundaries of the final models inside all the run Simulator objects passed in the constructor method of the PostProcessor. This method uses sklearn.manifold.TSNE to reduce the dimensionality of X_test to 2D for visualisation purposes. An sklearn C-Support Vector Classifier is then trained using the points in the smaller feature space with the predicted labels of each model to show their decision boundaries in 2D.

Parameters
  • X_test (np.ndarray, torch.Tensor) – Test input data.

  • y_test (np.ndarray, torch.Tensor) – Test labels.

  • num_points (int) – Number of points within X_test/y_test to plot in the figure. Consider selecting a value between 300 and 1000. Default = 500.

  • perplexity (int) – The perplexity is related to the number of nearest neighbors that is used in other manifold learning algorithms. Larger datasets usually require a larger perplexity. Consider selecting a value between 5 and 50. Different values can result in significantly different results. Default = 50.

  • n_iter (int) – Maximum number of iterations for the optimization. Should be at least 250. Default = 2000.

  • C (float) – Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared L2 penalty. Default = 10.

  • kernel (str) – {‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’} or callable. Specifies the kernel type to be used in the algorithm. If a callable is given it is used to pre-compute the kernel matrix from data matrices; that matrix should be an array of shape (n_samples, n_samples). Default = ‘poly’.

  • degree (int) – Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels. Default = 3.

  • figsize (tuple) – Tuple (W,H) indicating the size of the figures to plot. Default = (20,20).

  • fontsize (int) – Size of the font in the plots. Default = 10.

  • markersize (int) – Size of the markers representing individual points in X_test and y_test. Default = 20.

  • resolution (float) – Size of the “steps” used to create the meshgrid upon which the predictions are made to plot the decision boundaries of the model. The smaller the value, the more computationally expensive the process becomes. Values < 0.1 are not recommended for this reason. Default = 0.2.

  • save (bool) – Boolean indicating whether to save the plots or not. Default = False.

  • show_plot (bool) – Boolean indicating whether the plot should be shown. Default = True.
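To see why small resolution values become costly, the following sketch counts how many grid points a 2D meshgrid contains for a given step size. grid_points is a hypothetical helper written for this illustration; it assumes both endpoints of each axis are sampled and that each span is a whole multiple of the step, which may not match how the method builds its grid internally.

```python
def grid_points(x_min, x_max, y_min, y_max, resolution):
    """Count grid points when each axis is sampled every `resolution` units.

    Assumes both endpoints are included and each span is a whole multiple
    of `resolution`.
    """
    n_x = round((x_max - x_min) / resolution) + 1
    n_y = round((y_max - y_min) / resolution) + 1
    return n_x * n_y


# Number of predictions needed over a 10 x 10 region of the embedded space.
for resolution in (0.2, 0.1, 0.05):
    print(resolution, grid_points(0.0, 10.0, 0.0, 10.0, resolution))
```

Halving the step roughly quadruples the number of points on which the classifier must predict, which is consistent with the recommendation against values below 0.1.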

plot_online_learning_metrics(metrics, show_plot=True, save=True, plotname=None, set_plot_title=True)

Plots the online learning metrics, optionally showing and saving the figure. Supports supervised learning only.

Parameters
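  • metrics (np.ndarray) – Array of metrics of length equal to the number of episodes in a simulation.

  • show_plot (bool) – Whether to show the plot. Default = True.

  • save (bool) – Whether to save the plot. Default = True.

  • plotname (str) – File name for the saved plot. If set to None, the file name is set to the current timestamp.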
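The plotname fallback can be pictured with a small helper. This is a hedged sketch: resolve_plot_name, its now argument, and the timestamp format are assumptions made for illustration, since the exact format used by the library is not stated here.

```python
from datetime import datetime


def resolve_plot_name(plotname=None, now=None):
    """Return plotname unchanged, or a timestamp string when it is None.

    Hypothetical helper; the timestamp format is an assumption.
    """
    if plotname is not None:
        return plotname
    now = now or datetime.now()
    return now.strftime("%Y-%m-%d_%H-%M-%S")


print(resolve_plot_name("metrics_run1"))
print(resolve_plot_name(None))
```

Passing an explicit plotname is the safer choice when several plots are saved in quick succession, because timestamp-based names generated within the same second would collide under this scheme.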