padl.transforms

The Transform class and its fundamental children.

Transforms should be created using the padl.transform wrap-function.

class padl.transforms.AtomicTransform(call: str, call_info: Optional[padl.dumptools.inspector.CallInfo] = None, pd_name: Optional[str] = None)

Base class for “atomic” transforms (transforms that are not made by combining other transforms - in contrast to Pipeline).

Examples of AtomicTransforms are ClassTransform and FunctionTransform.

Parameters
  • call – The transform’s call string.

  • call_info – A CallInfo object containing information about how the transform was created (needed for saving).

  • pd_name – The transform’s name.

class padl.transforms.Batchify(dim=0)

Mark end of preprocessing.

Batchify adds a batch dimension at dim. During inference, this unsqueezes tensors and, recursively, tuples thereof. Batchify also moves the input tensors to the device specified for the transform.

Parameters

dim – Batching dimension.
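The unsqueeze-and-recurse behaviour during inference can be sketched in plain Python (a stand-in for the real implementation, which operates on torch tensors; here a list plays the role of the added batch dimension):

```python
def batchify(x):
    # Stand-in for Batchify with dim=0: tuples are batchified element-wise,
    # anything else gets a leading singleton "batch" dimension (a list here),
    # mimicking tensor.unsqueeze(0) during inference.
    if isinstance(x, tuple):
        return tuple(batchify(el) for el in x)
    return [x]

batchify((1, 2))  # -> ([1], [2])
```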

class padl.transforms.BuiltinTransform(pd_name: Optional[str] = None, ignore_scope: bool = False, arguments: Optional[collections.OrderedDict] = None)

A builtin transform will simply always be imported, never fully dumped.

class padl.transforms.ClassTransform(pd_name: Optional[str] = None, ignore_scope: bool = False, arguments: Optional[collections.OrderedDict] = None)

Class Transform.

Do not use this directly; instead, use the transform decorator to wrap a class.

Parameters
  • pd_name – name of the transform

  • ignore_scope – Don’t try to determine the scope (use the toplevel scope instead).

  • arguments – ordered dictionary of initialization arguments to be used in printing

property source: str

The class source code.

class padl.transforms.Compose(transforms: Iterable[padl.transforms.Transform], call_info: Optional[padl.dumptools.inspector.CallInfo] = None, pd_name: Optional[str] = None, pd_group: bool = False)

Apply a series of transforms to the input.

Compose([t1, t2, t3])(x) = t3(t2(t1(x)))

Parameters
  • transforms – List of transforms to compose.

  • call_info – A CallInfo object containing information about how the transform was created (needed for saving).

  • pd_name – name of the Compose transform.

  • pd_group – If True, do not flatten this when used as child transform in a Pipeline.

Returns

Output of the series of transforms.
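The left-to-right application rule above can be sketched in plain Python (this is an illustration of the semantics, not the padl implementation, which also handles stages, devices, and saving):

```python
from functools import reduce

def compose(transforms):
    # Compose([t1, t2, t3])(x) == t3(t2(t1(x))): fold the input through
    # the transforms from left to right.
    return lambda x: reduce(lambda acc, t: t(acc), transforms, x)

double = lambda x: x * 2
inc = lambda x: x + 1
compose([double, inc])(3)  # inc(double(3)) -> 7
```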

class padl.transforms.FunctionTransform(function: Callable, call_info: padl.dumptools.inspector.CallInfo, pd_name: Optional[str] = None, call: Optional[str] = None, source: Optional[str] = None, wrap_type: str = 'decorator')

A transform that wraps a function.

Do not use this directly - rather, wrap a function using padl.transform:

as a decorator:

@transform
def f(x):
    ...

inline:

t = transform(f)

or with a lambda function:

t = transform(lambda x: x + 1)

Parameters
  • function – The wrapped function.

  • call_info – A CallInfo object containing information about how the transform was created (needed for saving).

  • pd_name – name of the transform

  • call – The call string (defaults to the function’s name).

  • source – The source code (optional).

  • wrap_type – One of {‘module’, ‘lambda’, ‘decorator’, ‘inline’} - specifying how the function was wrapped.

property source: str

The source of the wrapped function.

class padl.transforms.Identity

Do nothing. Just pass on.

class padl.transforms.Map(transform: padl.transforms.Transform, call_info: Optional[padl.dumptools.inspector.CallInfo] = None, pd_name: Optional[str] = None)

Apply one transform to each element of a list.

>>> from padl import identity
>>> t = identity
>>> x1, x2, x3 = 1, 2, 3
>>> Map(t)([x1, x2, x3]) == (t(x1), t(x2), t(x3))
True

Parameters
  • transform – Transform to be applied to a list of inputs.

  • call_info – A CallInfo object containing information about how the transform was created (needed for saving).

  • pd_name – name of the transform

class padl.transforms.Parallel(transforms, call_info=None, pd_name=None, pd_group=False)

Apply transforms in parallel to a tuple of inputs and get a tuple output.

Parallel([f1, f2, …])((x1, x2, …)) := (f1(x1), f2(x2), …)

Parameters
  • transforms – List of transforms to parallelize.

  • call_info – A CallInfo object containing information about how the transform was created (needed for saving).

  • pd_name – Name of the transform.

  • pd_group – If True, do not flatten this when used as child transform in a Pipeline.
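The element-wise application rule can be sketched in plain Python (a semantic stand-in, not padl's API):

```python
def parallel(transforms):
    # Parallel([f1, f2, ...])((x1, x2, ...)) == (f1(x1), f2(x2), ...):
    # the i-th transform is applied to the i-th element of the input tuple.
    return lambda xs: tuple(t(x) for t, x in zip(transforms, xs))

inc = lambda x: x + 1
double = lambda x: x * 2
parallel([inc, double])((1, 10))  # -> (2, 20)
```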

class padl.transforms.Pipeline(transforms, call_info=None, pd_name=None, pd_group=False)

Abstract base class for pipelines (transforms combining sub-transforms).

Parameters
  • transforms – List of sub-transforms.

  • call_info – A CallInfo object containing information about how the transform was created (needed for saving).

  • pd_name – Name of the Pipeline.

  • pd_group – If True, do not flatten this when used as child transform in a Pipeline.

grouped()

Return a grouped version of self.

pd_forward_device_check()

Check that all transforms in the forward part are on the correct device.

All transforms in the forward part need to be on the same device as specified for the whole Pipeline.

Returns

Bool (True if the check passes).

pd_to(device: str)

Set the transform’s device to device.

Parameters

device – Device to set the transform to: {‘cpu’, ‘cuda’, ‘cuda:N’}.

class padl.transforms.Rollout(transforms: Iterable[padl.transforms.Transform], call_info: Optional[padl.dumptools.inspector.CallInfo] = None, pd_name: Optional[str] = None, pd_group=False)

Apply a list of transforms to the same input and get a tuple output.

Rollout([t1, t2, …])(x) := (t1(x), t2(x), …)

Parameters
  • transforms – List of transforms to rollout.

  • call_info – A CallInfo object containing information about how the transform was created (needed for saving).

  • pd_name – Name of the transform.

  • pd_group – If True, do not flatten this when used as child transform in a Pipeline.
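The fan-out rule can be sketched in plain Python (a semantic stand-in, not padl's API):

```python
def rollout(transforms):
    # Rollout([t1, t2, ...])(x) == (t1(x), t2(x), ...): every transform
    # receives the same input.
    return lambda x: tuple(t(x) for t in transforms)

inc = lambda x: x + 1
double = lambda x: x * 2
rollout([inc, double])(3)  # -> (4, 6)
```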

class padl.transforms.TorchModuleTransform(pd_name: Optional[str] = None, ignore_scope: bool = False, arguments: Optional[collections.OrderedDict] = None)

Transform class for use with torch.nn.Module.

post_load(path, i)

Load the model’s parameters from a save-folder.

Parameters
  • path – The save-folder path.

  • i – Unique transform index, used to construct filenames.

pre_save(path: pathlib.Path, i: int)

Dump the model’s parameters to a save-folder.

Parameters
  • path – The save-folder path.

  • i – Unique transform index, used to construct filenames.

class padl.transforms.Transform(call_info: Optional[padl.dumptools.inspector.CallInfo] = None, pd_name: Optional[str] = None)

Transform base class.

Parameters
  • call_info – A CallInfo object containing information about how the transform was created (needed for saving).

  • pd_name – name of the transform.

eval_apply(inputs: Iterable, flatten: bool = False, **kwargs)

Call transform within the eval context.

This will use multiprocessing for the preprocessing part via DataLoader and turn off gradients for the forward part.

It expects an iterable input and returns a generator.

Parameters
  • inputs – The arguments - an iterable (e.g. list) of inputs.

  • kwargs – Keyword arguments to be passed on to the dataloader. These can be any that a torch.utils.data.DataLoader accepts.

  • flatten – If True, flatten the output.
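The iterable-in, generator-out contract (minus the DataLoader and gradient handling) can be sketched as follows; apply_lazily is a hypothetical helper, not part of padl:

```python
def apply_lazily(transform, inputs, flatten=False):
    # Lazily yield transform(x) for each input; with flatten=True,
    # iterable outputs are yielded item by item instead.
    for x in inputs:
        out = transform(x)
        if flatten:
            yield from out
        else:
            yield out

list(apply_lazily(lambda x: x + 1, [1, 2, 3]))  # -> [2, 3, 4]
```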

infer_apply(inputs)

Call the Transform within the infer context.

This expects a single argument and returns a single output.

Parameters

inputs – The input.

pd_call_in_mode(arg, mode: Literal['infer', 'eval', 'train'], ignore_grad=False)

Call the transform, with the possibility of passing multiple arguments.

Parameters
  • arg – Argument to call the transform with.

  • mode – The mode (“infer”, “eval”, “train”) to perform the call with.

  • ignore_grad – If True, gradient settings are ignored.

Returns

Whatever the transform returns.

property pd_device: str

Return the device (“cpu” / “cuda”) the Transform is on.

property pd_forward: padl.transforms.Transform

The forward part of the Transform (what is typically done on the GPU).

The device must be propagated from self.

pd_forward_device_check() bool

Check if all Transforms in the “forward” part are on the correct device.

All transforms in the “forward” part of a Pipeline need to be on the same device as specified for the whole Pipeline.

pd_get_loader(args, preprocess: padl.transforms.Transform, mode: str, **kwargs) torch.utils.data.dataloader.DataLoader

Get a pytorch data loader applying preprocess to args.

Parameters
  • args – A sequence of datapoints.

  • preprocess – Preprocessing Transform.

  • mode – PADL mode to call the preprocess Transform in.

  • kwargs – Keyword arguments passed to the data loader (see the pytorch DataLoader documentation for details).

property pd_layers: List[torch.nn.modules.module.Module]

Get a list with all pytorch layers in the Transform (including layers in sub-transforms).

property pd_name: Optional[str]

The “name” of the transform.

A transform can have a name. This is optional, but helps when inspecting complex transforms. Good transform names indicate what the transform does.

If a transform does not have an explicitly set name, the name will default to the name of the last variable the transform was assigned to.

pd_parameters() Iterator

Iterate over all (pytorch-) parameters in all layers contained in the transform.

pd_post_load(path: pathlib.Path, i: int)

Method that is called on each transform after loading.

This normally does nothing. Override to implement custom serialization.

Parameters
  • path – The load path.

  • i – Unique transform index, can be used to construct filenames.

property pd_postprocess: padl.transforms.Transform

The postprocessing part of the Transform.

The device must be propagated from self.

pd_pre_save(path: pathlib.Path, i: int)

Method that is called on each transform before saving.

This normally does nothing. Override to implement custom serialization.

Parameters
  • path – The save-folder path.

  • i – Unique transform index, can be used to construct filenames.

property pd_preprocess: padl.transforms.Transform

The preprocessing part of the Transform.

The device must be propagated from self.

pd_save(path: Union[pathlib.Path, str], force_overwrite: bool = False)

Save the transform to a folder at path.

The folder’s name should end with ‘.padl’. If no extension is given, it will be added automatically.

If the folder exists, call with force_overwrite = True to overwrite. Otherwise, this will raise a FileExistsError.

property pd_stages

Get a tuple of the pre-process, forward, and post-process stages.

pd_to(device: str) padl.transforms.Transform

Set the transform’s device to device.

Parameters

device – Device to set the transform to {‘cpu’, ‘cuda’, ‘cuda:N’}.

pd_varname(scope=None) Optional[str]

The name of the variable the transform was last assigned to.

Example:

>>> from padl import transform
>>> foo = transform(lambda x: x + 1)
>>> foo.pd_varname()  
'foo'

Parameters

scope – Scope to search

Returns

A string with the variable name or None if the transform has not been assigned to any variable.

pd_zip_save(path: Union[pathlib.Path, str], force_overwrite: bool = False)

Save the transform to a zip-file at path.

The file’s name should end with ‘.padl’. If no extension is given, it will be added automatically.

If the file exists, call with force_overwrite = True to overwrite. Otherwise, this will raise a FileExistsError.

train_apply(inputs: Iterable, flatten: bool = False, **kwargs)

Call transform within the train context.

This will use multiprocessing for the preprocessing part via DataLoader and turn on gradients for the forward part.

It expects an iterable input and returns a generator.

Parameters
  • inputs – The arguments - an iterable (e.g. list) of inputs.

  • kwargs – Keyword arguments to be passed on to the dataloader. These can be any that a torch.utils.data.DataLoader accepts.

  • flatten – If True, flatten the output.

class padl.transforms.Unbatchify(dim=0, cpu=True)

Mark start of postprocessing.

Unbatchify removes the batch dimension (the inverse of Batchify) and moves the input tensors to the cpu.

Parameters
  • dim – Batching dimension.

  • cpu – If True, moves the output to the cpu after removing the batch dimension.

padl.transforms.fulldump(transform_or_module)

Switch a Transform, module, or package to “fulldump” mode.

This means that the Transform, or any Transform from that module or package, will be fully dumped instead of just dumping the statement importing it.

Parameters

transform_or_module – A Transform, module or package for which to enable full dump. Can also be a string. In that case, will enable full dump for the module or package with matching name.

padl.transforms.group(transform: Union[padl.transforms.Rollout, padl.transforms.Parallel])

Group transforms. This prevents them from being flattened when used in a Pipeline.

Example:

When writing a Rollout as (a + (b + c)), this is automatically flattened to (a + b + c) - i.e. the resulting Rollout passes its input to a, b and c and returns a 3-tuple of their outputs. To prevent that, write (a + group(b + c)). The resulting Rollout returns a 2-tuple whose first item is the output of a and whose second item is the pair of outputs of b + c.
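The effect of grouping on the output structure can be sketched with plain Python functions (rollout here is a stand-in for padl's Rollout, not its API):

```python
def rollout(transforms):
    # Stand-in for Rollout: pass the same input to every transform.
    return lambda x: tuple(t(x) for t in transforms)

a = lambda x: x + 1
b = lambda x: x * 2
c = lambda x: x - 1

flat = rollout([a, b, c])                # like (a + b + c)
grouped = rollout([a, rollout([b, c])])  # like (a + group(b + c))

flat(3)     # -> (4, 6, 2)
grouped(3)  # -> (4, (6, 2))
```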

padl.transforms.importdump(transform_or_module)

Disable full dump (see padl.transforms.fulldump() for more).

padl.transforms.load(path)

Load a transform (as saved with padl.save) from path.

padl.transforms.save(transform: padl.transforms.Transform, path: Union[pathlib.Path, str], force_overwrite: bool = False, compress: bool = False)

Save the transform to a folder at path or a compressed (zip-)file of the same name if compress == True.

The folder’s name should end with ‘.padl’. If no extension is given, it will be added automatically.

If the folder exists, call with force_overwrite = True to overwrite. Otherwise, this will raise a FileExistsError.