kaolin.io.dataset

API

class kaolin.io.dataset.Cache(func, cache_dir, cache_key)

Bases: object

Caches the results of a function to disk. If a result is already cached, it is read from disk; otherwise, the function is executed. Output tensors are always on the CPU device.

Deprecated since version 0.13.0: Cache is deprecated.

Parameters
  • func (Callable) – The function to cache.

  • cache_dir (str or Path) – Directory where objects will be cached.

  • cache_key (str) – The corresponding cache key for this function.

try_get(unique_id)

Reads the cache from disk. Raises an error if it is not found.

Parameters

unique_id (str) – The unique id with which to name the cached file.

Returns

The results of self.func, if they exist on disk.
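The caching behavior described above (read from disk if present, otherwise run the function and store its output) can be illustrated with a small, dependency-free sketch. `DiskCache` and its `__call__(unique_id, **kwargs)` calling convention are hypothetical, not kaolin's API; results are serialized with `pickle` as an assumption, and unlike the real class this sketch does not move tensors to the CPU.

```python
import pickle
import tempfile
from pathlib import Path


class DiskCache:
    """Hypothetical stand-in for the Cache pattern; not kaolin's implementation."""

    def __init__(self, func, cache_dir, cache_key):
        self.func = func
        # Keep each cache key in its own subdirectory so functions don't collide.
        self.cache_dir = Path(cache_dir) / cache_key
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, unique_id):
        return self.cache_dir / f"{unique_id}.pkl"

    def try_get(self, unique_id):
        # Read a previously cached result; raise if it was never written.
        path = self._path(unique_id)
        if not path.exists():
            raise FileNotFoundError(f"no cached entry for {unique_id!r}")
        with open(path, "rb") as f:
            return pickle.load(f)

    def __call__(self, unique_id, **kwargs):
        # Assumed calling convention: return the cached result if present,
        # otherwise run func, write its output to disk, and return it.
        try:
            return self.try_get(unique_id)
        except FileNotFoundError:
            result = self.func(**kwargs)
            with open(self._path(unique_id), "wb") as f:
                pickle.dump(result, f)
            return result


calls = []


def square(x):
    calls.append(x)
    return x * x


with tempfile.TemporaryDirectory() as tmp:
    cache = DiskCache(square, tmp, "square")
    first = cache("a", x=3)   # computed, then written to disk
    second = cache("a", x=3)  # read back from disk; square is not called again
```

Keying files by `unique_id` inside a per-`cache_key` directory mirrors the two identifiers in the documented signature; how kaolin actually lays out its cache files is not stated here.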

class kaolin.io.dataset.CachedDataset(dataset, cache_dir=None, save_on_disk=False, num_workers=0, force_overwrite=False, cache_at_runtime=False, progress_message=None, ignore_diff_error=False, transform=None)

Bases: Dataset

A wrapper dataset that caches the data to disk or RAM depending on save_on_disk.

For every dataset[i] with i from 0 to len(dataset) - 1, the output is stored in RAM or on disk depending on save_on_disk.

The base dataset (or the preprocessing_transform, if defined) should have a __getitem__(idx) method that returns a dictionary.

Note

If CUDA is used in preprocessing, num_workers must be set to 0.

Parameters
  • dataset (torch.utils.data.Dataset or Sequence) – The base dataset to use.

  • cache_dir (optional, str) – Path where the data must be saved. Must be given if save_on_disk is not False.

  • save_on_disk (optional, bool or Sequence[str]) – If True, all the preprocessed outputs are stored on disk; if False, all the preprocessed outputs are stored in RAM; if it’s a sequence of strings, the corresponding fields are stored on disk.

  • num_workers (optional, int) – Number of processes used in parallel for preprocessing. Default: 0 (run in main process).

  • force_overwrite (optional, bool) – If True, force overwriting on disk even if files already exist. Default: False.

  • cache_at_runtime (optional, bool) – If True, instead of preprocessing everything at construction of the dataset, each new __getitem__ will cache if necessary. Default: False.

  • progress_message (optional, str) – Message to be displayed during preprocessing. This is unused when cache_at_runtime=True. Default: don’t show any message.

  • transform (optional, Callable) – If defined, called on the data at __getitem__. The result of this function is not cached. Default: don’t apply any transform.
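The interplay of save_on_disk, cache_dir, force_overwrite, cache_at_runtime and transform can be sketched without kaolin or PyTorch. `SimpleCachedDataset` is a hypothetical illustration, not kaolin's implementation: it stores each on-disk field as a separate `pickle` file (an assumption), omits num_workers, progress_message and ignore_diff_error, and, as a simplification, calls the base dataset once per sample per process even if files from an earlier run already exist.

```python
import pickle
import tempfile
from pathlib import Path


class SimpleCachedDataset:
    """Hypothetical sketch of the CachedDataset behavior; not kaolin's implementation."""

    def __init__(self, dataset, cache_dir=None, save_on_disk=False,
                 force_overwrite=False, cache_at_runtime=False, transform=None):
        if save_on_disk is not False and cache_dir is None:
            raise ValueError("cache_dir must be given if save_on_disk is not False")
        self.dataset = dataset
        self.cache_dir = Path(cache_dir) if cache_dir is not None else None
        self.save_on_disk = save_on_disk
        self.force_overwrite = force_overwrite
        self.transform = transform
        self._ram = {}          # index -> fields kept in memory
        self._disk_fields = {}  # index -> names of fields stored on disk
        if not cache_at_runtime:
            # Preprocess every sample at construction time.
            for idx in range(len(dataset)):
                self._cache(idx)

    def _on_disk(self, field):
        # True/False applies to every field; a sequence selects fields by name.
        if isinstance(self.save_on_disk, bool):
            return self.save_on_disk
        return field in self.save_on_disk

    def _file(self, idx, field):
        return self.cache_dir / str(idx) / f"{field}.pkl"

    def _cache(self, idx):
        if idx in self._ram:
            return  # already cached in this process
        sample = self.dataset[idx]  # the base dataset must return a dict
        in_ram, on_disk = {}, []
        for field, value in sample.items():
            if self._on_disk(field):
                path = self._file(idx, field)
                if self.force_overwrite or not path.exists():
                    path.parent.mkdir(parents=True, exist_ok=True)
                    with open(path, "wb") as f:
                        pickle.dump(value, f)
                on_disk.append(field)
            else:
                in_ram[field] = value
        self._ram[idx] = in_ram
        self._disk_fields[idx] = on_disk

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        self._cache(idx)  # only does work on first access when cache_at_runtime=True
        out = dict(self._ram[idx])
        for field in self._disk_fields[idx]:
            with open(self._file(idx, field), "rb") as f:
                out[field] = pickle.load(f)
        # transform runs on every access and its result is never cached.
        return self.transform(out) if self.transform is not None else out


calls = []


class Squares:
    def __len__(self):
        return 3

    def __getitem__(self, idx):
        calls.append(idx)
        return {"x": idx, "sq": idx * idx}


with tempfile.TemporaryDirectory() as tmp:
    ds = SimpleCachedDataset(Squares(), cache_dir=tmp, save_on_disk=["sq"],
                             transform=lambda d: {**d, "neg": -d["x"]})
    item = ds[2]   # "x" comes from RAM, "sq" from disk, "neg" from transform
    again = ds[2]  # served from the cache; Squares is not called again
```

Keeping transform outside the cache means it can change between runs (for example, random augmentation) without invalidating stored samples.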

class kaolin.io.dataset.CombinationDataset(datasets)

Bases: KaolinDataset

Dataset combining a list of datasets into a unified dataset object.

Deprecated since version 0.13.0: CombinationDataset is deprecated. See ProcessedDatasetV2.

Useful when multiple output representations are needed from a common base representation (e.g., when a mesh is to be served as both a point cloud and a voxel grid).

The output of get_attributes will be a tuple of all the get_attributes of the dataset list.

The output of get_data will be a tuple of all the get_data of the dataset list.

If a dataset does not have get_data, __getitem__ will be used instead.

The output of get_cache_key will be the cache key of the first dataset. If that dataset does not provide get_cache_key, the index will be used instead.

Parameters

datasets – list or tuple of datasets
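The combining rules described above (tuples of per-dataset results, a __getitem__ fallback for data, and a cache key taken from the first dataset or else the index) can be expressed in a few lines. `CombinedSketch`, `Meshes` and `Points` are hypothetical names used only for illustration; this is not kaolin's implementation and does not subclass KaolinDataset.

```python
class CombinedSketch:
    """Hypothetical sketch of CombinationDataset's combining rules; not kaolin's code."""

    def __init__(self, datasets):
        self.datasets = list(datasets)

    def get_data(self, index):
        # Use get_data when a dataset defines it, otherwise fall back to __getitem__.
        return tuple(
            ds.get_data(index) if hasattr(ds, "get_data") else ds[index]
            for ds in self.datasets
        )

    def get_attributes(self, index):
        return tuple(ds.get_attributes(index) for ds in self.datasets)

    def get_cache_key(self, index):
        # Only the first dataset's key is used; fall back to the index.
        first = self.datasets[0]
        if hasattr(first, "get_cache_key"):
            return first.get_cache_key(index)
        return index


class Meshes:
    def get_data(self, index):
        return f"mesh{index}"

    def get_attributes(self, index):
        return {"name": f"obj{index}"}

    def get_cache_key(self, index):
        return f"obj{index}"


class Points:  # no get_data, so __getitem__ is used instead
    def __getitem__(self, index):
        return f"points{index}"

    def get_attributes(self, index):
        return {"kind": "pointcloud"}


combo = CombinedSketch([Meshes(), Points()])
```

Because only the first dataset supplies the cache key, ordering the list matters: putting a dataset without get_cache_key first makes the keys plain indices.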

get_attributes(index)

Returns the attributes at the given index. Attributes are usually not transformed by wrappers such as ProcessedDataset.

get_cache_key(index)
get_data(index)

Returns the data at the given index.

class kaolin.io.dataset.KaolinDataset(*args, **kwds)

Bases: Dataset

A dataset that separates data from attributes and combines them in its __getitem__. The return value of __getitem__ is a named tuple containing the return values of both get_data and get_attributes. The difference between get_data and get_attributes is that data can be transformed or preprocessed (for example, by ProcessedDataset), while attributes generally are not.

Deprecated since version 0.13.0: KaolinDataset is deprecated. Datasets should always output a dictionary to be compatible with ProcessedDataset.

abstract get_attributes(index)

Returns the attributes at the given index. Attributes are usually not transformed by wrappers such as ProcessedDataset.

abstract get_data(index)

Returns the data at the given index.

class kaolin.io.dataset.KaolinDatasetItem(data, attributes)

Bases: tuple

attributes

Alias for field number 1

data

Alias for field number 0
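The KaolinDataset pattern and the KaolinDatasetItem layout (data as field 0, attributes as field 1) can be sketched as follows. The real KaolinDataset subclasses torch.utils.data.Dataset; this sketch uses `abc` instead to stay dependency-free. `DatasetItem`, `DataAttributesDataset`, `Toy` and `transform_data_only` are hypothetical names, and `transform_data_only` only illustrates the data-versus-attributes distinction; it is not ProcessedDataset's API.

```python
from abc import ABC, abstractmethod
from collections import namedtuple

# Same layout as KaolinDatasetItem: data is field 0, attributes is field 1.
DatasetItem = namedtuple("DatasetItem", ["data", "attributes"])


class DataAttributesDataset(ABC):
    """Hypothetical stand-in for the KaolinDataset pattern; not kaolin's code."""

    @abstractmethod
    def get_data(self, index):
        """Return the (transformable) data at index."""

    @abstractmethod
    def get_attributes(self, index):
        """Return the (usually untransformed) attributes at index."""

    def __getitem__(self, index):
        # Combine both parts into one named tuple.
        return DatasetItem(data=self.get_data(index),
                           attributes=self.get_attributes(index))


def transform_data_only(item, fn):
    # A wrapper-style transform: replace the data, pass attributes through.
    return item._replace(data=fn(item.data))


class Toy(DataAttributesDataset):
    def get_data(self, index):
        return [index, index + 1]

    def get_attributes(self, index):
        return {"name": f"item{index}"}


item = Toy()[1]
doubled = transform_data_only(item, lambda d: [2 * v for v in d])
```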

class kaolin.io.dataset.ProcessedDataset(dataset, preprocessing_transform=None, cache_dir=None, num_workers=None, transform=None, no_progress=False)

Bases: KaolinDataset

get_attributes(index)

Returns the attributes at the given index. Attributes are usually not transformed by wrappers such as ProcessedDataset.

get_cache_key(index)
get_data(index)

Returns the data at the given index.