padl.transforms

The Transform class and its fundamental children.

Transforms should be created using the padl.transform wrap-function.

class padl.transforms.AtomicTransform(call: str, call_info: Optional[padl.dumptools.inspector.CallInfo] = None, pd_name: Optional[str] = None)

Base class for “atomic” transforms (transforms that are not made by combining other transforms - in contrast to Pipeline).

Examples of AtomicTransforms are ClassTransform and FunctionTransform.

Parameters
  • call – The transform’s call string.

  • call_info – A CallInfo object containing information about how the transform was created (needed for saving).

  • pd_name – The transform’s name.

class padl.transforms.Batchify(dim=0)

Mark end of preprocessing.

Batchify adds a batch dimension at dim. During inference, this unsqueezes tensors and, recursively, tuples thereof. Batchify also moves the input tensors to the device specified for the transform.

Parameters

dim – Batching dimension.
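The unsqueeze-and-recurse behaviour during inference can be sketched in plain Python (a stand-in for the real implementation, which operates on torch tensors; here a list plays the role of the added batch dimension):

```python
def batchify(x):
    # Stand-in for Batchify with dim=0: tuples are batchified element-wise,
    # anything else gets a leading singleton "batch" dimension (a list here),
    # mimicking tensor.unsqueeze(0) during inference.
    if isinstance(x, tuple):
        return tuple(batchify(el) for el in x)
    return [x]

batchify((1, 2))  # -> ([1], [2])
```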

class padl.transforms.BuiltinTransform(pd_name: Optional[str] = None, ignore_scope: bool = False, arguments: Optional[collections.OrderedDict] = None)

A builtin transform will simply always be imported, never fully dumped.

class padl.transforms.ClassTransform(pd_name: Optional[str] = None, ignore_scope: bool = False, arguments: Optional[collections.OrderedDict] = None)

Class Transform.

Do not use this directly; instead, use the transform decorator to wrap a class.

Parameters
  • pd_name – name of the transform

  • ignore_scope – Don’t try to determine the scope (use the toplevel scope instead).

  • arguments – ordered dictionary of initialization arguments to be used in printing

property source: str

The class source code.

class padl.transforms.Compose(transforms: Iterable[padl.transforms.Transform], call_info: Optional[padl.dumptools.inspector.CallInfo] = None, pd_name: Optional[str] = None, pd_group: bool = False)

Apply a series of transforms to the input.

Compose([t1, t2, t3])(x) = t3(t2(t1(x)))

Parameters
  • transforms – List of transforms to compose.

  • call_info – A CallInfo object containing information about how the transform was created (needed for saving).

  • pd_name – name of the Compose transform.

  • pd_group – If True, do not flatten this when used as child transform in a Pipeline.

Returns

Output of the series of transforms.
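The left-to-right application rule above can be sketched in plain Python (this is an illustration of the semantics, not the padl implementation, which also handles stages, devices, and saving):

```python
from functools import reduce

def compose(transforms):
    # Compose([t1, t2, t3])(x) == t3(t2(t1(x))): fold the input through
    # the transforms from left to right.
    return lambda x: reduce(lambda acc, t: t(acc), transforms, x)

double = lambda x: x * 2
inc = lambda x: x + 1
compose([double, inc])(3)  # inc(double(3)) -> 7
```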

class padl.transforms.FunctionTransform(function: Callable, call_info: padl.dumptools.inspector.CallInfo, pd_name: Optional[str] = None, call: Optional[str] = None, source: Optional[str] = None, wrap_type: str = 'decorator')

A transform that wraps a function.

Do not use this directly - rather, wrap a function using padl.transform:

as a decorator:

@transform
def f(x):
    ...

inline:

t = transform(f)

or with a lambda function:

t = transform(lambda x: x + 1)

Parameters
  • function – The wrapped function.

  • call_info – A CallInfo object containing information about how the transform was created (needed for saving).

  • pd_name – name of the transform

  • call – The call string (defaults to the function’s name).

  • source – The source code (optional).

  • wrap_type – One of {‘module’, ‘lambda’, ‘decorator’, ‘inline’} - specifying how the function was wrapped.

property source: str

The source of the wrapped function.

class padl.transforms.Identity

Do nothing. Just pass on.

class padl.transforms.Map(transform: padl.transforms.Transform, call_info: Optional[padl.dumptools.inspector.CallInfo] = None, pd_name: Optional[str] = None)

Apply one transform to each element of a list.

>>> from padl import identity
>>> t = identity
>>> x1, x2, x3 = 1, 2, 3
>>> Map(t)([x1, x2, x3]) == (t(x1), t(x2), t(x3))
True

Parameters
  • transform – Transform to be applied to a list of inputs.

  • call_info – A CallInfo object containing information about how the transform was created (needed for saving).

  • pd_name – name of the transform

class padl.transforms.Parallel(transforms, call_info=None, pd_name=None, pd_group=False)

Apply transforms in parallel to a tuple of inputs and get a tuple output.

Parallel([f1, f2, …])((x1, x2, …)) := (f1(x1), f2(x2), …)

Parameters
  • transforms – List of transforms to parallelize.

  • call_info – A CallInfo object containing information about how the transform was created (needed for saving).

  • pd_name – Name of the transform.

  • pd_group – If True, do not flatten this when used as child transform in a Pipeline.
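The element-wise application rule can be sketched in plain Python (a semantic stand-in, not padl's API):

```python
def parallel(transforms):
    # Parallel([f1, f2, ...])((x1, x2, ...)) == (f1(x1), f2(x2), ...):
    # the i-th transform is applied to the i-th element of the input tuple.
    return lambda xs: tuple(t(x) for t, x in zip(transforms, xs))

inc = lambda x: x + 1
double = lambda x: x * 2
parallel([inc, double])((1, 10))  # -> (2, 20)
```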

class padl.transforms.Pipeline(transforms, call_info=None, pd_name=None, pd_group=False)

Abstract base class for pipelines (transforms combining sub-transforms).

Parameters
  • transforms – List of sub-transforms.

  • call_info – A CallInfo object containing information about how the transform was created (needed for saving).

  • pd_name – Name of the Pipeline.

  • pd_group – If True, do not flatten this when used as child transform in a Pipeline.

grouped()

Return a grouped version of self.

pd_forward_device_check()

Check that all transforms in the forward part are on the correct device.

All transforms in the forward part need to be on the same device as specified for the whole Pipeline.

Returns

Bool (True if the check passes).

pd_to(device: str)

Set the transform’s device to device.

Parameters

device – Device to set the transform to: {‘cpu’, ‘cuda’, ‘cuda:N’}.

class padl.transforms.Rollout(transforms: Iterable[padl.transforms.Transform], call_info: Optional[padl.dumptools.inspector.CallInfo] = None, pd_name: Optional[str] = None, pd_group=False)

Apply a list of transforms to the same input and get a tuple output.

Rollout([t1, t2, …])(x) := (t1(x), t2(x), …)

Parameters
  • transforms – List of transforms to rollout.

  • call_info – A CallInfo object containing information about how the transform was created (needed for saving).

  • pd_name – Name of the transform.

  • pd_group – If True, do not flatten this when used as child transform in a Pipeline.
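The fan-out rule can be sketched in plain Python (a semantic stand-in, not padl's API):

```python
def rollout(transforms):
    # Rollout([t1, t2, ...])(x) == (t1(x), t2(x), ...): every transform
    # receives the same input.
    return lambda x: tuple(t(x) for t in transforms)

inc = lambda x: x + 1
double = lambda x: x * 2
rollout([inc, double])(3)  # -> (4, 6)
```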

class padl.transforms.TorchModuleTransform(pd_name: Optional[str] = None, ignore_scope: bool = False, arguments: Optional[collections.OrderedDict] = None)

Transform class for use with torch.nn.Module.

post_load(path, i)

Load the model’s parameters from a save-folder.

Parameters
  • path – The save-folder path.

  • i – Unique transform index, used to construct filenames.

pre_save(path: pathlib.Path, i: int)

Dump the model’s parameters to a save-folder.

Parameters
  • path – The save-folder path.

  • i – Unique transform index, used to construct filenames.

class padl.transforms.Transform(call_info: Optional[padl.dumptools.inspector.CallInfo] = None, pd_name: Optional[str] = None)

Transform base class.

Parameters
  • call_info – A CallInfo object containing information about how the transform was created (needed for saving).

  • pd_name – name of the transform.

eval_apply(inputs: Iterable, flatten: bool = False, **kwargs)

Call transform within the eval context.

This will use multiprocessing for the preprocessing part via DataLoader and turn off gradients for the forward part.

It expects an iterable input and returns a generator.

Parameters
  • inputs – The arguments - an iterable (e.g. list) of inputs.

  • kwargs – Keyword arguments to be passed on to the dataloader. These can be any that a torch.utils.data.DataLoader accepts.

  • flatten – If True, flatten the output.
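The iterable-in, generator-out contract (minus the DataLoader and gradient handling) can be sketched as follows; apply_lazily is a hypothetical helper, not part of padl:

```python
def apply_lazily(transform, inputs, flatten=False):
    # Lazily yield transform(x) for each input; with flatten=True,
    # iterable outputs are yielded item by item instead.
    for x in inputs:
        out = transform(x)
        if flatten:
            yield from out
        else:
            yield out

list(apply_lazily(lambda x: x + 1, [1, 2, 3]))  # -> [2, 3, 4]
```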

infer_apply(inputs)

Call the Transform within the infer context.

This expects a single argument and returns a single output.

Parameters

inputs – The input.

pd_call_in_mode(arg, mode: Literal['infer', 'eval', 'train'], ignore_grad=False)

Call the transform, with the possibility of passing multiple arguments.

Parameters
  • arg – Argument to call the transform with.

  • mode – The mode (“infer”, “eval”, “train”) to perform the call with.

  • ignore_grad – If True, gradient settings are ignored.

Returns

Whatever the transform returns.

property pd_device: str

Return the device (“cpu” / “cuda”) the Transform is on.

property pd_forward: padl.transforms.Transform

The forward part of the Transform (what is typically done on the GPU).

The device must be propagated from self.

pd_forward_device_check() bool

Check if all Transforms in the “forward” part are on the correct device.

All transforms in the “forward” part of a Pipeline need to be on the same device as specified for the whole Pipeline.

pd_get_loader(args, preprocess: padl.transforms.Transform, mode: str, **kwargs) torch.utils.data.dataloader.DataLoader

Get a pytorch data loader applying preprocess to args.

Parameters
  • args – A sequence of datapoints.

  • preprocess – Preprocessing Transform.

  • mode – PADL mode to call the preprocess Transform in.

  • kwargs – Keyword arguments passed to the data loader (see the pytorch DataLoader documentation for details).

property pd_layers: List[torch.nn.modules.module.Module]

Get a list with all pytorch layers in the Transform (including layers in sub-transforms).

property pd_name: Optional[str]

The “name” of the transform.

A transform can have a name. This is optional, but helps when inspecting complex transforms. Good transform names indicate what the transform does.

If a transform does not have an explicitly set name, the name will default to the name of the last variable the transform was assigned to.

pd_parameters() Iterator

Iterate over all (pytorch-) parameters in all layers contained in the transform.

pd_post_load(path: pathlib.Path, i: int)

Method that is called on each transform after loading.

This normally does nothing. Override to implement custom serialization.

Parameters
  • path – The load path.

  • i – Unique transform index, can be used to construct filenames.

property pd_postprocess: padl.transforms.Transform

The postprocessing part of the Transform.

The device must be propagated from self.

pd_pre_save(path: pathlib.Path, i: int)

Method that is called on each transform before saving.

This normally does nothing. Override to implement custom serialization.

Parameters
  • path – The save-folder path.

  • i – Unique transform index, can be used to construct filenames.

property pd_preprocess: padl.transforms.Transform

The preprocessing part of the Transform.

The device must be propagated from self.

pd_save(path: Union[pathlib.Path, str], force_overwrite: bool = False)

Save the transform to a folder at path.

The folder’s name should end with ‘.padl’. If no extension is given, it will be added automatically.

If the folder exists, call with force_overwrite = True to overwrite. Otherwise, this will raise a FileExistsError.

property pd_stages

Get a tuple of the pre-process, forward, and post-process stages.

pd_to(device: str) padl.transforms.Transform

Set the transform’s device to device.

Parameters

device – Device to set the transform to {‘cpu’, ‘cuda’, ‘cuda:N’}.

pd_varname(scope=None) Optional[str]

The name of the variable the transform was last assigned to.

Example:

>>> from padl import transform
>>> foo = transform(lambda x: x + 1)
>>> foo.pd_varname()  
'foo'

Parameters

scope – Scope to search

Returns

A string with the variable name or None if the transform has not been assigned to any variable.

pd_zip_save(path: Union[pathlib.Path, str], force_overwrite: bool = False)

Save the transform to a zip-file at path.

The file’s name should end with ‘.padl’. If no extension is given, it will be added automatically.

If the file exists, call with force_overwrite = True to overwrite. Otherwise, this will raise a FileExistsError.

train_apply(inputs: Iterable, flatten: bool = False, **kwargs)

Call transform within the train context.

This will use multiprocessing for the preprocessing part via DataLoader and turn on gradients for the forward part.

It expects an iterable input and returns a generator.

Parameters
  • inputs – The arguments - an iterable (e.g. list) of inputs.

  • kwargs – Keyword arguments to be passed on to the dataloader. These can be any that a torch.utils.data.DataLoader accepts.

  • flatten – If True, flatten the output.

class padl.transforms.Unbatchify(dim=0, cpu=True)

Mark start of postprocessing.

Unbatchify removes the batch dimension (the inverse of Batchify) and moves the input tensors to the cpu.

Parameters
  • dim – Batching dimension.

  • cpu – If True, moves the output to the cpu after removing the batch dimension.

padl.transforms.fulldump(transform_or_module)

Switch a Transform, module, or package to “fulldump” mode.

This means that the Transform, or any Transform from that module or package, will be fully dumped instead of just dumping the statement importing it.

Parameters

transform_or_module – A Transform, module or package for which to enable full dump. Can also be a string. In that case, will enable full dump for the module or package with matching name.

padl.transforms.group(transform: Union[padl.transforms.Rollout, padl.transforms.Parallel])

Group transforms. This prevents them from being flattened when used in a Pipeline.

Example:

When writing a Rollout as (a + (b + c)), this is automatically flattened to (a + b + c) - i.e. the resulting Rollout passes its input to a, b and c and returns a 3-tuple of their outputs. To prevent that, write (a + group(b + c)). The resulting Rollout returns a 2-tuple whose first item is the output of a and whose second item is the pair of outputs of b + c.
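The effect of grouping on the output structure can be sketched with plain Python functions (rollout here is a stand-in for padl's Rollout, not its API):

```python
def rollout(transforms):
    # Stand-in for Rollout: pass the same input to every transform.
    return lambda x: tuple(t(x) for t in transforms)

a = lambda x: x + 1
b = lambda x: x * 2
c = lambda x: x - 1

flat = rollout([a, b, c])                # like (a + b + c)
grouped = rollout([a, rollout([b, c])])  # like (a + group(b + c))

flat(3)     # -> (4, 6, 2)
grouped(3)  # -> (4, (6, 2))
```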

padl.transforms.importdump(transform_or_module)

Disable full dump (see padl.transforms.fulldump() for more).

padl.transforms.load(path)

Load a transform (as saved with padl.save) from path.

padl.transforms.save(transform: padl.transforms.Transform, path: Union[pathlib.Path, str], force_overwrite: bool = False, compress: bool = False)

Save the transform to a folder at path or a compressed (zip-)file of the same name if compress == True.

The folder’s name should end with ‘.padl’. If no extension is given, it will be added automatically.

If the folder exists, call with force_overwrite = True to overwrite. Otherwise, this will raise a FileExistsError.