niteshade.data.DataLoader

class niteshade.data.DataLoader(X=None, y=None, batch_size=1, shuffle=False, seed=69)

Bases: object

DataLoader class.

Contains a cache and a queue. Features (X) and labels (y) can be added to the cache either by passing them as inputs in the constructor, or by calling the method add_to_cache(X, y).

When data is added to the cache, the points are automatically grouped into batches of length batch_size, removed from the cache and added to the queue. For example, if a dataset of 10 points is added to the cache with a batch size of 3, three batches of length 3 are created, removed from the cache and added to the queue, leaving the queue with three batches and the cache with a single datapoint. If more (X, y) values are then added by calling add_to_cache(X, y), they are appended to the cache and the batching and queuing process repeats, with the leftover datapoint at the front of the cache (first in line to be batched and added to the queue).
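The batching arithmetic described above can be sketched in plain Python. This is a minimal illustration, not niteshade's implementation: the helper name cache_to_queue, the use of a list for the cache and a deque for the queue, and the (x, y) tuple layout are all assumptions made for the example.

```python
from collections import deque


def cache_to_queue(cache, queue, batch_size):
    """Move as many full batches as possible from cache to queue.

    Hypothetical helper for illustration only: `cache` is a list of
    (x, y) pairs and `queue` is a deque of batches.
    """
    while len(cache) >= batch_size:
        batch = cache[:batch_size]   # oldest points are batched first
        del cache[:batch_size]       # remove them from the cache
        queue.append(batch)          # enqueue the full batch


cache, queue = [], deque()

# Add 10 points with batch_size=3: three full batches, one point left over.
cache.extend((i, i % 2) for i in range(10))
cache_to_queue(cache, queue, batch_size=3)
print(len(queue), len(cache))   # 3 1

# Add 5 more points: the leftover point (9) leads the next batch.
cache.extend((i, i % 2) for i in range(10, 15))
cache_to_queue(cache, queue, batch_size=3)
print(len(queue), len(cache))   # 5 0
print(queue[3][0])              # (9, 1)
```

The leftover point is never dropped; it simply waits in the cache until enough new points arrive to complete a batch.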

The class is an iterator, and returns batches from the queue when iterated over until the queue is empty. If more (X, y) values get added to the cache, the class instance may be iterated over again and new batches will be produced. If the shuffle argument is set to True, any data added to the cache will be shuffled prior to batching.
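The iteration behaviour can be illustrated with a small stand-in class. ToyLoader below is hypothetical and uses plain lists rather than arrays or tensors; it only mimics the documented behaviour (drain the queue, stop when it is empty, allow a further pass once more data has been added) and makes no claim about how DataLoader is written internally or about the exact form of the batches it returns.

```python
from collections import deque


class ToyLoader:
    """Hypothetical stand-in mimicking the iteration behaviour described above."""

    def __init__(self, batch_size=1):
        self.batch_size = batch_size
        self.cache = []
        self.queue = deque()

    def add_to_cache(self, X, y):
        # Append new points, then move every full batch to the queue.
        self.cache.extend(zip(X, y))
        while len(self.cache) >= self.batch_size:
            self.queue.append(self.cache[:self.batch_size])
            del self.cache[:self.batch_size]

    def __iter__(self):
        return self

    def __next__(self):
        # Stop once the queue is drained; leftover cache points stay put.
        if not self.queue:
            raise StopIteration
        return self.queue.popleft()


loader = ToyLoader(batch_size=2)
loader.add_to_cache([1, 2, 3], [0, 1, 0])
first_pass = list(loader)      # one full batch; point 3 waits in the cache

loader.add_to_cache([4], [1])  # completes a second batch
second_pass = list(loader)     # iterating again yields only the new batch
```

Here first_pass holds the batch built from points 1 and 2, and second_pass holds the batch built from the leftover point 3 and the new point 4.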

__init__(X=None, y=None, batch_size=1, shuffle=False, seed=69)

Initialise the DataLoader.

Features (X) and labels (y) may be passed as inputs in the constructor, but this is not necessary. If they are set to their default values of None, the cache and queue will initially be empty.

Parameters
  • X (np.ndarray or torch.Tensor) – features (first dimension = N)

  • y (np.ndarray or torch.Tensor) – labels (first dimension = N)

  • batch_size (int) – size of the batches to generate

  • shuffle (bool) – whether or not to shuffle the datapoints before batching

  • seed (int) – seed for the random number generator

Methods

__init__([X, y, batch_size, shuffle, seed])

Initialise the DataLoader.

add_to_cache(X, y)

Add features (X) and labels (y) to the cache.

add_to_cache(X, y)

Add features (X) and labels (y) to the cache.

If shuffle was set to True in the constructor, the datapoints are shuffled before being added to the cache. After the points are added, the _cache_to_queue() method is called automatically; it creates as many batches as the batch size allows, removes them from the cache and adds them to the queue.

Parameters
  • X (np.ndarray or torch.Tensor) – features (first dimension = N)

  • y (np.ndarray or torch.Tensor) – labels (first dimension = N)
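Shuffling before caching only makes sense if features and labels are reordered together. The sketch below shows one way to do that with a single seeded permutation. The helper shuffle_together is hypothetical and uses Python's standard random module on plain lists; niteshade's own shuffling code, and the way it applies the seed across repeated calls, may differ.

```python
import random


def shuffle_together(X, y, seed):
    """Return X and y reordered by one shared random permutation.

    Hypothetical helper; niteshade's own shuffling code may differ.
    """
    rng = random.Random(seed)        # seeded for reproducibility
    order = list(range(len(X)))
    rng.shuffle(order)               # one permutation for both sequences
    return [X[i] for i in order], [y[i] for i in order]


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 1, 2, 3]

X_a, y_a = shuffle_together(X, y, seed=69)
X_b, y_b = shuffle_together(X, y, seed=69)

# Each feature row still sits next to its own label after shuffling.
print(all(x[0] == label for x, label in zip(X_a, y_a)))   # True
# The same seed reproduces the same order.
print((X_a, y_a) == (X_b, y_b))                           # True
```

Applying one index permutation to both sequences keeps every (X, y) pair intact, which is the property the batching step relies on.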