Usage
=====

Below are some simple example uses of the various functions and classes in
niteshade. For a more comprehensive overview of niteshade's functionality,
please refer to the :doc:`api` section.

.. _getting_started:

Getting Started
---------------

Before we begin, many of the following sections will use various functions
and classes from ``PyTorch``, so let's go ahead and import ``PyTorch`` so we
can focus exclusively on niteshade imports from here on out:

>>> import torch
>>> import torch.nn as nn

Note that in many examples we use ``torch.randn()`` to generate example data
tensors. The dimensions are completely arbitrary.

.. _setting_up_an_online_data_pipeline:

Setting Up an Online Data Pipeline
----------------------------------

niteshade makes setting up an online data pipeline easy, thanks to its
bespoke data loader class designed specifically for online learning:
``niteshade.data.DataLoader``.

>>> from niteshade.data import DataLoader

A ``DataLoader`` may be instantiated with a particular set of features (X)
and labels (y):

>>> X = torch.randn(100, 3, 32, 32)
>>> y = torch.randn(100)
>>> pipeline = DataLoader(X, y, batch_size=8, shuffle=True)

Alternatively, data may be added by calling the ``.add_to_cache()`` method:

>>> X_more = torch.randn(50, 3, 32, 32)
>>> y_more = torch.randn(50)
>>> pipeline.add_to_cache(X_more, y_more)

``DataLoader`` instances have cache and queue attributes, which together help
ensure that data is batched and loaded consistently. When data is added to a
``DataLoader``, either during instantiation or by calling the
``.add_to_cache()`` method, it is added to the cache, then automatically
grouped into batches of the provided batch size and moved to the queue. Any
remaining points which do not "fit" into a batch are kept in the cache, where
they remain until enough new datapoints are added to form a complete batch.
For example, in the above case, a total of 150 datapoints have been added to
a ``DataLoader`` with a batch size of 8. This results in 18 batches of 8
datapoints (144 datapoints total) in the queue and 6 points in the cache.

>>> len(pipeline)
18

``DataLoader`` instances are iterators; the queue can be iterated over and
depleted in a for loop:

>>> for batch in pipeline:
...     pass
...
>>> len(pipeline)
0

Note that after executing the above for loop there would still be 6 points in
the cache. If we add 2 additional points to the cache, we can form a complete
batch of 8, which will be added to the queue.

>>> X_last = torch.randn(2, 3, 32, 32)
>>> y_last = torch.randn(2)
>>> pipeline.add_to_cache(X_last, y_last)
>>> len(pipeline)
1

The cache is now empty.
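As a quick sanity check, we can retrieve the remaining batch and inspect its
dimensions. This is a minimal sketch, assuming (as suggested by the loops in
this section) that each batch is yielded as an ``(X, y)`` pair of tensors:

.. code-block:: python

    # Minimal sketch: retrieve the one remaining batch and inspect its shape.
    # Assumes each batch is yielded as an (X, y) pair of tensors.
    for batch in pipeline:
        X_batch, y_batch = batch
        print(X_batch.shape)  # expected: torch.Size([8, 3, 32, 32])
        print(y_batch.shape)  # expected: torch.Size([8])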
.. _managing_pipeline_asynchronicity:

Managing Pipeline Asynchronicity
--------------------------------

In many scenarios, data generation and learning are asynchronous. For
example, if data is generated in batches of 10 datapoints (let's call these
episodes for notational clarity), but the model wants to learn on batches of
size 16, then the model will only be able to do an incremental learning step
every 1.6 episodes on average. To complicate matters, if we deploy a
poisoning attack and implement a defence strategy that rejects suspicious
datapoints, the pipeline becomes even more asynchronous (episodes may now
consist of fewer than 10 datapoints if the defence strategy rejects points).

To address this asynchronicity, niteshade workflows generally involve
separate generation and learning loops, each with their own ``DataLoader``
(leveraging the cache and queue to ensure consistent episode and batch
sizes). Below is a very simple example (model, attack and defence strategies
not specified):

.. code-block:: python

    import torch

    from niteshade.data import DataLoader

    X = torch.randn(100, 5)
    y = torch.randn(100)

    episodes = DataLoader(X, y, batch_size=10)
    batches = DataLoader(batch_size=16)

    for episode in episodes:
        # Attack strategy deployed (may change shape of episode)
        ...
        # Defence strategy deployed (may change shape of episode)
        ...
        batches.add_to_cache(episode)
        for batch in batches:
            # Incremental learning update
            ...

Note that the inner loop (learning loop) will only execute if the batch
``DataLoader`` contains sufficient datapoints to form a complete batch.
Otherwise, its queue attribute will be empty and iterating over it will do
nothing.

.. _importing_a_model:

Setting Up a Victim Model
-------------------------

Setting up a victim model (an online learning model which will be the subject
of a data poisoning attack) can be done in two different ways. The simplest
way is to use one of niteshade's out-of-the-box model classes, e.g.
``niteshade.models.IrisClassifier`` (designed specifically for the Iris
dataset), ``niteshade.models.MNISTClassifier`` (designed specifically for
MNIST), or ``niteshade.models.CifarClassifier`` (designed specifically for
CIFAR-10), for example:

>>> from niteshade.models import IrisClassifier
>>> model = IrisClassifier(optimizer="adam", loss_func="cross_entropy", lr=1e-3)

However, most users will prefer to create a custom model class. Custom model
classes can be easily created by inheriting from the
``niteshade.models.BaseModel`` superclass, providing it the necessary
arguments in the constructor, and filling in the ``.forward()`` and
``.evaluate()`` methods. Below is an example of a simple multi-layer
perceptron regressor:

.. code-block:: python

    import torch
    import torch.nn as nn

    from niteshade.models import BaseModel

    class MLPRegressor(BaseModel):
        """ Simple MLP regressor class. """

        def __init__(self, optimizer="adam", loss_func="mse", lr=1e-3):
            """ Specify architecture, optimizer, loss and learning rate. """
            architecture = [nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1)]
            super().__init__(architecture, optimizer, loss_func, lr)

        def forward(self, x):
            """ Execute the forward pass. """
            return self.network(x)

        def evaluate(self, X_test, y_test):
            """ Evaluate the model predictions. """
            self.eval()
            with torch.no_grad():
                y_pred = self.forward(X_test)
            accuracy = 1 - (y_pred - y_test).square().mean().sqrt()
            return accuracy

In the constructor (``.__init__()`` method), the model architecture must be
defined as a list of PyTorch building blocks (layers, activations etc.), then
passed to the ``BaseModel`` superclass along with the desired optimiser, loss
function and learning rate (see the :doc:`api` section for possible values).
The ``BaseModel`` class has a ``.device`` attribute which is automatically
set to "cuda" or "cpu" depending on whether a GPU is available, and a
``.network`` attribute which assembles the provided architecture as a
callable that passes inputs through the layers and activations in sequence.
The ``.network`` attribute is used in the ``.forward()`` method, which
implements the forward pass. Finally, the ``.evaluate()`` method computes
whichever performance metric we are interested in analysing during the
simulation (an RMSE-based accuracy score, in this case).

All niteshade models (out-of-the-box and custom) perform incremental learning
updates using the ``.step()`` method, which is inherited from ``BaseModel``.
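For example, an incremental update on a single batch with the custom
regressor above might look as follows. This is a minimal sketch, assuming
``.step()`` takes a batch of features and labels (see the :doc:`api` section
for the exact signature):

.. code-block:: python

    import torch

    # Minimal sketch of an incremental learning update, assuming a
    # .step(X, y) signature. Dimensions match the MLPRegressor above.
    model = MLPRegressor()
    X_batch = torch.randn(16, 4)  # 16 datapoints, 4 features (nn.Linear(4, 16))
    y_batch = torch.randn(16, 1)  # 16 regression targets
    model.step(X_batch, y_batch)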
.. _defining_an_attack_strategy:

Defining an Attack Strategy
---------------------------

niteshade's attack module (``niteshade.attack``) includes several
out-of-the-box classes based on some of the most commonly encountered data
poisoning attack strategies, e.g. ``LabelFlipperAttacker`` (which, as the
name suggests, flips training labels) and ``AddLabeledPointsAttacker`` (which
injects fake datapoints into the learning pipeline).

>>> from niteshade.attack import AddLabeledPointsAttacker
>>> attacker = AddLabeledPointsAttacker(aggressiveness=0.5, label=1)

An attack can be deployed against a batch of datapoints by calling the
``.attack()`` method:

>>> X = torch.randn(10, 5)
>>> y = torch.randn(10)
>>> X_attacked, y_attacked = attacker.attack(X, y)

Custom attack strategies may also be defined following niteshade's attack
class hierarchy by inheriting from the relevant superclass and filling in the
``.attack()`` method. At the top of the hierarchy is the ``Attacker`` class,
which is a general abstract base class for all attack strategies. The next
tier in the hierarchy is comprised of general categories of attack
strategies, namely ``AddPointsAttacker`` (for strategies which involve
injecting *fake* datapoints into the learning pipeline),
``PerturbPointsAttacker`` (for strategies which involve perturbing *real*
datapoints in the learning pipeline) and ``ChangeLabelAttacker`` (for
strategies which involve altering training data labels). Below is an example
of a very simple custom attack strategy which involves appending zeros to the
end of training batches:

.. code-block:: python

    import torch

    from niteshade.attack import AddPointsAttacker

    class AppendZerosAttacker(AddPointsAttacker):
        """ Append zeros attack strategy class. """

        def __init__(self, aggressiveness):
            """ Set the aggressiveness. """
            super().__init__(aggressiveness)

        def attack(self, X, y):
            """ Define the attack strategy. """
            num_to_add = super().num_pts_to_add(X)
            X_fake = torch.zeros(num_to_add, *X.shape[1:])
            y_fake = torch.zeros(num_to_add, *y.shape[1:])
            return (torch.cat((X, X_fake)), torch.cat((y, y_fake)))

This simple (and ineffective) strategy involves injecting fake datapoints, so
the class inherits from ``AddPointsAttacker`` and passes ``aggressiveness``
to its constructor. The ``aggressiveness`` attribute is a float between 0.0
and 1.0 which determines the proportion of points the attacker is allowed to
attack (or append, in this case). The ``.attack()`` method defines the attack
strategy, which in this case is very straightforward. The
``AddPointsAttacker`` superclass has a method ``.num_pts_to_add()`` which
uses ``aggressiveness`` to determine the (integer) number of points to add.
Note that if the attack strategy we wish to define doesn't fit into any of
the aforementioned categories, we can simply inherit from ``Attacker``.
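Once defined, the custom attacker can be deployed against a batch of
datapoints just like the out-of-the-box classes, via its ``.attack()``
method:

.. code-block:: python

    import torch

    # Deploy the custom attacker defined above on a batch of datapoints.
    attacker = AppendZerosAttacker(aggressiveness=0.2)
    X, y = torch.randn(10, 5), torch.randn(10)
    X_attacked, y_attacked = attacker.attack(X, y)
    print(len(X_attacked))  # 10 original points plus the appended fake points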
.. _defining_a_defence_strategy:

Defining a Defence Strategy
---------------------------

Similarly to the attack module, niteshade's defence module
(``niteshade.defence``) includes several out-of-the-box classes based on some
of the most well-known defence strategies against data poisoning attacks,
e.g. ``FeasibleSetDefender`` (which functions as an outlier detector based on
a "clean" set of feasible points), ``KNN_Defender`` (which adjusts labels
based on the consensus of neighbouring points) and ``SoftmaxDefender`` (which
rejects points based on a softmax threshold).

>>> from niteshade.defence import SoftmaxDefender
>>> defender = SoftmaxDefender(threshold=0.1)

After an attack has been deployed on a batch of datapoints, a defence can be
implemented to minimise the damage by calling the ``.defend()`` method:

>>> X_attacked = torch.randn(10, 5)
>>> y_attacked = torch.randn(10)
>>> X_defended, y_defended = defender.defend(X_attacked, y_attacked)

Custom defence strategies may also be defined following niteshade's defence
class hierarchy by inheriting from the relevant superclass and filling in the
``.defend()`` method. At the top of the hierarchy is the ``Defender`` class,
which is a general abstract base class for all defence strategies. The next
tier in the hierarchy is comprised of general categories of defence
strategies, namely ``OutlierDefender`` (for strategies which involve
filtering outliers), ``ModelDefender`` (for strategies which require access
to the model and its parameters) and ``PointModifierDefender`` (for
strategies which modify datapoints). Below is an example of a very simple
custom defence strategy which involves removing points which have
even-valued labels:

.. code-block:: python

    from niteshade.defence import Defender

    class EvenLabelDefender(Defender):
        """ Even-valued label filtering defence strategy. """

        def __init__(self):
            """ Constructor. """
            super().__init__()

        def defend(self, X, y):
            """ Define the defence strategy. """
            return (X[y % 2 != 0], y[y % 2 != 0])

Although this simple (and ineffective) strategy resembles an
``OutlierDefender``-type strategy, it doesn't require a clean feasible set
for outlier detection, and thus we have just inherited from ``Defender``.
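As with the custom attacker, the custom defender is deployed via its
``.defend()`` method:

.. code-block:: python

    import torch

    # Deploy the custom defender defined above on a batch with integer labels.
    defender = EvenLabelDefender()
    X = torch.randn(6, 5)
    y = torch.tensor([0, 1, 2, 3, 4, 5])
    X_defended, y_defended = defender.defend(X, y)
    print(y_defended)  # tensor([1, 3, 5]) -- the even-valued labels are removed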
.. _running_a_simulation:

Running a Simulation
--------------------

Once a model has been set up and attack and defence strategies have been
defined, simulating an attack against online learning is very
straightforward. niteshade's simulation module (``niteshade.simulation``)
contains a ``Simulator`` class which sets up and executes the adversarial
online learning pipeline (the asynchronous double-loop pipeline shown
previously):

>>> from niteshade.models import MNISTClassifier
>>> from niteshade.attack import LabelFlipperAttacker
>>> from niteshade.defence import KNN_Defender
>>> from niteshade.simulation import Simulator
>>> from niteshade.utils import train_test_MNIST
>>>
>>> X_train, y_train, X_test, y_test = train_test_MNIST()
>>> model = MNISTClassifier()
>>> attacker = LabelFlipperAttacker(aggressiveness=1, label_flips_dict={1:9, 9:1})
>>> defender = KNN_Defender(X_train, y_train, nearest_neighbours=3, confidence_threshold=0.5)
>>> batch_size = 128
>>> num_eps = 50
>>> simulator = Simulator(X_train, y_train, model, attacker, defender, batch_size, num_eps)

In the above example, we are simulating a digit classification model trained
on MNIST subject to a label-flipping attack (specifically, one which flips
1's and 9's with 100% aggressiveness) with a k-nearest neighbours defence
(k=3, 50% consensus). We use a helper function from ``niteshade.utils`` to
load in the MNIST dataset and specify that the online data pipeline should
split the dataset into 50 sequential episodes. Finally, we set the training
batch size to 128 and pass all the above information to the ``Simulator``
class before running the simulation by calling the ``.run()`` method:

>>> simulator.run()

The ``Simulator`` class has a ``.results`` attribute which stores snapshots
of the model's state dictionary at each episode, as well as datapoint
tracking information to monitor the effects of the attack and defence
strategies. Note that the attacker and defender arguments in ``Simulator``
are optional and default to None; simulations can be run without any attack
or defence strategy in place, with just an attack strategy, with just a
defence strategy, or with both. If custom model, attack or defence classes
have been created, they can be passed as arguments to the ``Simulator`` class
exactly as shown above.
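Since ``.results`` is an ordinary dictionary, its contents can also be
inspected directly before any postprocessing (its exact keys and value types
are described in the :doc:`api` section):

.. code-block:: python

    # Inspect the top-level structure of the populated results dictionary.
    for key, value in simulator.results.items():
        print(key, type(value))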
.. _postprocessing_results:

Postprocessing Results
----------------------

niteshade's postprocessing module (``niteshade.postprocessing``) contains
several useful tools for analysing and visualising results. Once a simulation
has been run (by calling ``Simulator.run()``, which populates the
``.results`` attribute), it may be passed to the ``PostProcessor`` class in a
dictionary keyed by the name of the simulation. Building off the previous
example:

>>> from niteshade.models import MNISTClassifier
>>> from niteshade.attack import LabelFlipperAttacker
>>> from niteshade.defence import KNN_Defender
>>> from niteshade.simulation import Simulator
>>> from niteshade.postprocessing import PostProcessor
>>> from niteshade.utils import train_test_MNIST
>>>
>>> X_train, y_train, X_test, y_test = train_test_MNIST()
>>> model = MNISTClassifier()
>>> attacker = LabelFlipperAttacker(1, {1:9, 9:1})
>>> defender = KNN_Defender(X_train, y_train, 3, 0.5)
>>> batch_size = 128
>>> num_eps = 50
>>> simulator = Simulator(X_train, y_train, model, attacker, defender, batch_size, num_eps)
>>> simulator.run()
>>> simulation_dict = {"example_name": simulator}
>>> postprocessor = PostProcessor(simulation_dict)

We can also run multiple simulations and pass them to ``PostProcessor``:

>>> model1 = MNISTClassifier()
>>> model2 = MNISTClassifier()
>>> model3 = MNISTClassifier()
>>> s1 = Simulator(X_train, y_train, model1, None, None, batch_size, num_eps)
>>> s2 = Simulator(X_train, y_train, model2, attacker, None, batch_size, num_eps)
>>> s3 = Simulator(X_train, y_train, model3, attacker, defender, batch_size, num_eps)
>>> s1.run()
>>> s2.run()
>>> s3.run()
>>> simulation_dict = {"baseline": s1, "attack": s2, "attack_and_defence": s3}
>>> postprocessor = PostProcessor(simulation_dict)

This is useful because the impact of an attack or defence strategy is usually
relative to some baseline case. For example, it may be of interest to compare
the attacked and un-attacked learning scenarios to isolate the effect of the
attack. Similarly, comparing the scenario in which both attack and defence
strategies are implemented to the case in which only the attack strategy is
implemented can isolate the effect of the defence. Notice that we create 3
separate model instances, as we want the models to be independent between the
simulations.

``PostProcessor`` can then be used to compute and plot the model's
performance over the course of the simulation:

>>> metrics = postprocessor.compute_online_learning_metrics(X_test, y_test)
>>> postprocessor.plot_online_learning_metrics(metrics, show_plot=True)

.. image:: _figures/metrics.png

The performance metric that ``PostProcessor`` computes and plots on the
y-axis is whatever is written in the model's ``.evaluate()`` method
(predictive accuracy for ``MNISTClassifier``). We can see that in the
baseline case, the model achieves a predictive accuracy across all classes of
~0.95 after 50 episodes. When the model is subjected to the label-flipping
attack, it is only able to achieve a predictive accuracy of ~0.75 (the
specific accuracy for 1's and 9's is likely to be even lower). When the kNN
defence strategy is deployed against the label-flipping attack, the model
learns more slowly but is able to achieve a final predictive accuracy of
~0.95 again, meaning the defence strategy is very effective against this
particular attack.

``PostProcessor`` also has a ``.get_data_modifications()`` method which
creates a table (a pandas ``DataFrame`` object) summarising the simulation
outcomes in terms of the numbers of datapoints which have been poisoned and
defended:

>>> data_modifications = postprocessor.get_data_modifications()
>>> print(data_modifications)
                       baseline  attack  attack_and_defence
poisoned                      0   12691               12691
not_poisoned              60000   47309               47309
correctly_defended            0       0               12677
incorrectly_defended          0       0                 916
original_points_total     60000   60000               60000
training_points_total     60000   60000               60000

In the above table:

- poisoned: datapoints perturbed or injected by the attacker
- not_poisoned: datapoints not perturbed or injected by the attacker
- correctly_defended: poisoned points correctly removed or modified by the defender
- incorrectly_defended: clean datapoints incorrectly removed or modified by the defender
- original_points_total: total datapoints in the original training dataset
- training_points_total: datapoints the model actually gets to train on (certain attack/defence strategies remove datapoints from the learning pipeline)
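Since ``.get_data_modifications()`` returns a pandas ``DataFrame``, further
quantities can be derived with ordinary pandas operations. For example, a
simple defence precision (the fraction of defended points that were actually
poisoned) might be computed as follows. This is a minimal sketch, assuming
the row and column labels shown above:

.. code-block:: python

    # Minimal sketch: derive a defence precision from the summary table above.
    # Assumes the DataFrame is indexed by the row labels shown in the printout.
    defended = (data_modifications.loc["correctly_defended"]
                + data_modifications.loc["incorrectly_defended"])
    precision = data_modifications.loc["correctly_defended"] / defended
    print(precision)  # NaN for columns in which nothing was defended (0 / 0)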
``niteshade.postprocessing`` also contains a ``PDF`` class, which can
generate a summary report of the simulation(s). Adding tables and figures to
the report is easy, as shown below. In this case, our summary report will
contain a single table and plot (the one shown above). If we generated
additional plots and saved them to the ``output`` directory, they would also
be included in the report.

>>> from niteshade.postprocessing import PDF
>>> header_title = "Example Report"
>>> pdf = PDF()
>>> pdf.set_title(header_title)
>>> pdf.add_table(data_modifications, "Datapoint Summary")
>>> pdf.add_all_charts_from_directory("output")
>>> pdf.output("example_report.pdf", "F")

Here, we have saved the report to our current working directory:

.. code-block:: console

    $ export REPORT=example_report.pdf
    $ test -f $REPORT && echo "$REPORT exists :)"
    example_report.pdf exists :)

.. _end_to_end_example:

End-To-End Example
------------------

To wrap things up, here is an end-to-end example of a niteshade workflow
using out-of-the-box model, attack and defence classes:

.. code-block:: python

    # Imports & dependencies
    from niteshade.models import MNISTClassifier
    from niteshade.attack import LabelFlipperAttacker
    from niteshade.defence import KNN_Defender
    from niteshade.simulation import Simulator
    from niteshade.postprocessing import PostProcessor, PDF
    from niteshade.utils import train_test_MNIST

    # Get MNIST training and test datasets
    X_train, y_train, X_test, y_test = train_test_MNIST()

    # Instantiate out-of-the-box MNIST classifiers
    model1 = MNISTClassifier()
    model2 = MNISTClassifier()
    model3 = MNISTClassifier()

    # Specify attack and defence strategies
    attacker = LabelFlipperAttacker(aggressiveness=1, label_flips_dict={1:9, 9:1})
    defender = KNN_Defender(X_train, y_train, nearest_neighbours=3, confidence_threshold=0.5)

    # Set batch size and number of episodes
    batch_size = 128
    num_eps = 50

    # Instantiate simulations
    s1 = Simulator(X_train, y_train, model1, None, None, batch_size, num_eps)
    s2 = Simulator(X_train, y_train, model2, attacker, None, batch_size, num_eps)
    s3 = Simulator(X_train, y_train, model3, attacker, defender, batch_size, num_eps)

    # Run simulations (may take a few minutes)
    s1.run()
    s2.run()
    s3.run()

    # Postprocess simulation results
    simulation_dict = {"baseline": s1, "attack": s2, "attack_and_defence": s3}
    postprocessor = PostProcessor(simulation_dict)
    metrics = postprocessor.compute_online_learning_metrics(X_test, y_test)
    data_modifications = postprocessor.get_data_modifications()
    postprocessor.plot_online_learning_metrics(metrics, show_plot=False, save=True)

    # Create summary report
    header_title = "Example Report"
    pdf = PDF()
    pdf.set_title(header_title)
    pdf.add_table(data_modifications, "Datapoint Summary")
    pdf.add_all_charts_from_directory("output")
    pdf.output("example_report.pdf", "F")

This is a relatively simple workflow. For advanced users desiring more
customised workflows, consider the following options:

- Writing custom model, attack and defence classes following niteshade's class hierarchy
- Writing custom online learning pipelines using ``DataLoader`` objects rather than ``Simulator``
- Writing custom postprocessing functions and plots for the ``.results`` dictionary