padl.transforms
The Transform class and its fundamental children.
Transforms should be created using the padl.transform wrap-function.
- class padl.transforms.AtomicTransform(call: str, call_info: Optional[padl.dumptools.inspector.CallInfo] = None, pd_name: Optional[str] = None)
Base class for “atomic” transforms (transforms that are not made by combining other transforms, in contrast to Pipeline). Examples of AtomicTransforms are ClassTransform and FunctionTransform.
- Parameters
call – The transform’s call string.
call_info – A CallInfo object containing information about how the transform was created (needed for saving).
pd_name – The transform’s name.
- class padl.transforms.Batchify(dim=0)
Mark end of preprocessing.
Batchify adds a batch dimension at dim. During inference, this unsqueezes tensors and, recursively, tuples thereof. Batchify also moves the input tensors to the device specified for the transform. A short usage sketch follows the parameter list.
- Parameters
dim – Batching dimension.
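For illustration, a minimal sketch of where Batchify typically sits in a pipeline (the helper transforms, the torch import and the example input are assumptions made for this sketch, not part of the API above):

import torch
from padl import transform
from padl.transforms import Batchify, Compose, Unbatchify

@transform
def to_tensor(x):
    # preprocessing: runs per datapoint, before Batchify
    return torch.tensor(x, dtype=torch.float32)

@transform
def double(x):
    # forward part: runs on batched tensors, after Batchify
    return x * 2

pipeline = Compose([to_tensor, Batchify(dim=0), double, Unbatchify(dim=0)])

# In the infer context a single input is unsqueezed to a batch of size one,
# passed through the forward part and squeezed again by Unbatchify.
print(pipeline.infer_apply([1.0, 2.0, 3.0]))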
- class padl.transforms.BuiltinTransform(pd_name: Optional[str] = None, ignore_scope: bool = False, arguments: Optional[collections.OrderedDict] = None)
A builtin transform will simply always be imported, never fully dumped.
- class padl.transforms.ClassTransform(pd_name: Optional[str] = None, ignore_scope: bool = False, arguments: Optional[collections.OrderedDict] = None)
Class Transform.
Do not use this directly; instead, use the transform decorator to wrap a class.
- Parameters
pd_name – name of the transform
ignore_scope – Don’t try to determine the scope (use the toplevel scope instead).
arguments – ordered dictionary of initialization arguments to be used in printing
- property source: str
The class source code.
- class padl.transforms.Compose(transforms: Iterable[padl.transforms.Transform], call_info: Optional[padl.dumptools.inspector.CallInfo] = None, pd_name: Optional[str] = None, pd_group: bool = False)
Apply a series of transforms to the input.
Compose([t1, t2, t3])(x) = t3(t2(t1(x)))
- Parameters
transforms – List of transforms to compose.
call_info – A CallInfo object containing information about how the transform was created (needed for saving).
pd_name – name of the Compose transform.
pd_group – If True, do not flatten this when used as child transform in a Pipeline.
- Returns
Output from the series of transforms.
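A small, hedged usage sketch (the wrapped functions are illustrative assumptions; they are combined with the explicit Compose constructor documented above):

from padl import transform
from padl.transforms import Compose

@transform
def add_one(x):
    return x + 1

@transform
def times_two(x):
    return x * 2

pipeline = Compose([add_one, times_two])
# Transforms are applied left to right: times_two(add_one(3)) -> 8
print(pipeline.infer_apply(3))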
- class padl.transforms.FunctionTransform(function: Callable, call_info: padl.dumptools.inspector.CallInfo, pd_name: Optional[str] = None, call: Optional[str] = None, source: Optional[str] = None, wrap_type: str = 'decorator')
A transform that wraps a function.
Do not use this directly - rather, wrap a function using padl.transform,
as a decorator:
@transform
def f(x):
    ...
inline:
t = transform(f)
or with a lambda function:
t = transform(lambda x: x + 1)
- Parameters
function – The wrapped function.
call_info – A CallInfo object containing information about how the transform was created (needed for saving).
pd_name – name of the transform
call – The call string (defaults to the function’s name).
source – The source code (optional).
wrap_type – One of {‘module’, ‘lambda’, ‘decorator’, ‘inline’}, specifying how the function was wrapped.
- property source: str
The source of the wrapped function.
- class padl.transforms.Identity
Do nothing. Just pass on.
- class padl.transforms.Map(transform: padl.transforms.Transform, call_info: Optional[padl.dumptools.inspector.CallInfo] = None, pd_name: Optional[str] = None)
Apply one transform to each element of a list.
>>> from padl import identity
>>> t = identity
>>> x1, x2, x3 = 1, 2, 3
>>> Map(t)([x1, x2, x3]) == (t(x1), t(x2), t(x3))
True
- Parameters
transform – Transform to be applied to a list of inputs.
call_info – A CallInfo object containing information about how the transform was created (needed for saving).
pd_name – Name of the transform.
- class padl.transforms.Parallel(transforms, call_info=None, pd_name=None, pd_group=False)
Apply transforms in parallel to a tuple of inputs and get a tuple output.
Parallel([f1, f2, …])((x1, x2, …)) := (f1(x1), f2(x2), …)
- Parameters
transforms – List of transforms to parallelize.
call_info – A CallInfo object containing information about how the transform was created (needed for saving).
pd_name – Name of the transform.
pd_group – If True, do not flatten this when used as child transform in a Pipeline.
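A hedged sketch of the tuple-in, tuple-out behaviour (the two wrapped functions are illustrative assumptions):

from padl import transform
from padl.transforms import Parallel

@transform
def add_one(x):
    return x + 1

@transform
def times_two(x):
    return x * 2

pair = Parallel([add_one, times_two])
# Each element of the input tuple is passed to the transform at the same position:
# (add_one(1), times_two(2)) -> (2, 4)
print(pair.infer_apply((1, 2)))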
- class padl.transforms.Pipeline(transforms, call_info=None, pd_name=None, pd_group=False)
Abstract base class for pipelines (transforms built by combining multiple sub-transforms).
- Parameters
transforms – List of sub-transforms.
call_info – A CallInfo object containing information about how the transform was created (needed for saving).
pd_name – Name of the Pipeline.
pd_group – If True, do not flatten this when used as child transform in a Pipeline.
- grouped()
Return a grouped version of self.
- pd_forward_device_check()
Check that all transforms in the forward part are on the correct device.
All transforms in the forward part need to be on the same device as specified for the whole Pipeline.
- Returns
Bool
- pd_to(device: str)
Set the transform’s device to device.
- Parameters
device – Device to send the transform to, one of {‘cpu’, ‘cuda’, ‘cuda:N’}.
- class padl.transforms.Rollout(transforms: Iterable[padl.transforms.Transform], call_info: Optional[padl.dumptools.inspector.CallInfo] = None, pd_name: Optional[str] = None, pd_group=False)
Apply a list of transforms to the same input and get a tuple output.
Rollout([t1, t2, …])(x) := (t1(x), t2(x), …)
- Parameters
transforms – List of transforms to rollout.
call_info – A CallInfo object containing information about how the transform was created (needed for saving).
pd_name – Name of the transform.
pd_group – If True, do not flatten this when used as child transform in a Pipeline.
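A hedged sketch of applying the same input to several transforms (the wrapped functions are illustrative assumptions):

from padl import transform
from padl.transforms import Rollout

@transform
def add_one(x):
    return x + 1

@transform
def times_two(x):
    return x * 2

both = Rollout([add_one, times_two])
# The same input is passed to every transform: (add_one(3), times_two(3)) -> (4, 6)
print(both.infer_apply(3))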
- class padl.transforms.TorchModuleTransform(pd_name: Optional[str] = None, ignore_scope: bool = False, arguments: Optional[collections.OrderedDict] = None)
Transform class for use with torch.nn.Module.
- post_load(path, i)
Load the model’s parameters from a save-folder.
- Parameters
path – The save-folder path.
i – Unique transform index, used to construct filenames.
- pre_save(path: pathlib.Path, i: int)
Dump the model’s parameters to a save-folder.
- Parameters
path – The save-folder path.
i – Unique transform index, used to construct filenames.
- class padl.transforms.Transform(call_info: Optional[padl.dumptools.inspector.CallInfo] = None, pd_name: Optional[str] = None)
Transform base class.
- Parameters
call_info – A CallInfo object containing information about how the transform was created (needed for saving).
pd_name – name of the transform.
- eval_apply(inputs: Iterable, flatten: bool = False, **kwargs)
Call transform within the eval context.
This will use multiprocessing for the preprocessing part via DataLoader and turn off gradients for the forward part.
It expects an iterable input and returns a generator.
- Parameters
inputs – The arguments - an iterable (e.g. list) of inputs.
kwargs – Keyword arguments to be passed on to the dataloader. These can be any that a torch.data.utils.DataLoader accepts.
flatten – If True, flatten the output.
- infer_apply(inputs)
Call the Transform within the infer context.
This expects a single argument and returns a single output.
- Parameters
inputs – The input.
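A minimal sketch of the single-input, single-output contract (the wrapped function is an illustrative assumption):

from padl import transform

@transform
def add_one(x):
    return x + 1

# infer_apply takes one datapoint and returns one result; no DataLoader is involved.
print(add_one.infer_apply(3))  # expected: 4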
- pd_call_in_mode(arg, mode: Literal['infer', 'eval', 'train'], ignore_grad=False)
Call the transform, with the possibility to pass multiple arguments.
- Parameters
arg – Argument to call the transform with.
mode – The mode (“infer”, “eval”, “train”) to perform the call with.
ignore_grad – If True, gradient settings are ignored.
- Returns
Whatever the transform returns.
- property pd_device: str
Return the device (“cpu” / “cuda”) the Transform is on.
- property pd_forward: padl.transforms.Transform
The forward part of the Transform (what is typically done on the GPU).
The device must be propagated from self.
- pd_forward_device_check() bool
Check if all Transforms in the “forward” part are on the correct device.
All transforms in the “forward” part of a Pipeline need to be on the same device as specified for the whole Pipeline.
- pd_get_loader(args, preprocess: padl.transforms.Transform, mode: str, **kwargs) torch.utils.data.dataloader.DataLoader
Get a pytorch data loader applying preprocess to args.
- Parameters
args – A sequence of datapoints.
preprocess – Preprocessing Transform.
mode – PADL mode to call the preprocess Transform in.
kwargs – Keyword arguments passed to the data loader (see the pytorch DataLoader documentation for details).
- property pd_layers: List[torch.nn.modules.module.Module]
Get a list with all pytorch layers in the Transform (including layers in sub-transforms).
- property pd_name: Optional[str]
The “name” of the transform.
A transform can have a name. This is optional, but helps when inspecting complex transforms. Good transform names indicate what the transform does.
If a transform does not have an explicitly set name, the name will default to the name of the last variable the transform was assigned to.
- pd_parameters() Iterator
Iterate over all (pytorch-) parameters in all layers contained in the transform.
- pd_post_load(path: pathlib.Path, i: int)
Method that is called on each transform after loading.
This normally does nothing. Override to implement custom serialization.
- Parameters
path – The load path.
i – Unique transform index, can be used to construct filenames.
- property pd_postprocess: padl.transforms.Transform
The postprocessing part of the Transform.
The device must be propagated from self.
- pd_pre_save(path: pathlib.Path, i: int)
Method that is called on each transform before saving.
This normally does nothing. Override to implement custom serialization.
- Parameters
path – The save-folder path.
i – Unique transform index, can be used to construct filenames.
- property pd_preprocess: padl.transforms.Transform
The preprocessing part of the Transform.
The device must be propagated from self.
- pd_save(path: Union[pathlib.Path, str], force_overwrite: bool = False)
Save the transform to a folder at path.
The folder’s name should end with ‘.padl’. If no extension is given, it will be added automatically.
If the folder exists, call with force_overwrite = True to overwrite. Otherwise, this will raise a FileExistsError.
- property pd_stages
Get a tuple of the pre-process, forward, and post-process stages.
- pd_to(device: str) padl.transforms.Transform
Set the transform’s device to device.
- Parameters
device – Device to set the transform to {‘cpu’, ‘cuda’, ‘cuda:N’}.
- pd_varname(scope=None) Optional[str]
The name of the variable the transform was last assigned to.
Example:
>>> from padl import transform
>>> foo = transform(lambda x: x + 1)
>>> foo.pd_varname()
'foo'
- Parameters
scope – Scope to search
- Returns
A string with the variable name or None if the transform has not been assigned to any variable.
- pd_zip_save(path: Union[pathlib.Path, str], force_overwrite: bool = False)
Save the transform to a zip-file at path.
The file’s name should end with ‘.padl’. If no extension is given, it will be added automatically.
If the file exists, call with force_overwrite = True to overwrite. Otherwise, this will raise a FileExistsError.
- train_apply(inputs: Iterable, flatten: bool = False, **kwargs)
Call transform within the train context.
This will use multiprocessing for the preprocessing part via DataLoader and turn on gradients for the forward part.
It expects an iterable input and returns a generator.
- Parameters
inputs – The arguments - an iterable (e.g. list) of inputs.
kwargs – Keyword arguments to be passed on to the dataloader. These can be any that a torch.data.utils.DataLoader accepts.
flatten – If True, flatten the output.
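A hedged sketch contrasting the batched apply methods (the pipeline below is an illustrative assumption; both methods return generators, so the results are iterated explicitly):

import torch
from padl import transform
from padl.transforms import Batchify, Compose, Unbatchify

@transform
def to_tensor(x):
    return torch.tensor([x], dtype=torch.float32)

@transform
def add_one(x):
    return x + 1

pipeline = Compose([to_tensor, Batchify(), add_one, Unbatchify()])

# Both methods take an iterable of datapoints, run the preprocessing part through
# a DataLoader and yield one output per input; keyword arguments such as
# batch_size are forwarded to the DataLoader.
for out in pipeline.eval_apply([1, 2, 3], batch_size=2):   # gradients disabled
    print(out)
for out in pipeline.train_apply([1, 2, 3], batch_size=2):  # gradients enabled
    print(out)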
- class padl.transforms.Unbatchify(dim=0, cpu=True)
Mark start of postprocessing.
Unbatchify removes the batch dimension (inverse of Batchify) and moves the input tensors to ‘cpu’.
- Parameters
dim – Batching dimension.
cpu – If True, moves output to cpu after unbatchify.
- padl.transforms.fulldump(transform_or_module)
Switch a Transform, module or package to the “fulldump” mode.
This means that the Transform or any Transform from that module or package will be fully dumped instead of just dumping the statement importing it.
- Parameters
transform_or_module – A Transform, module or package for which to enable full dump. Can also be a string. In that case, will enable full dump for the module or package with matching name.
- padl.transforms.group(transform: Union[padl.transforms.Rollout, padl.transforms.Parallel])
Group transforms. This prevents them from being flattened when combined with other transforms.
Example:
When writing a Rollout as (a + (b + c)), it is automatically flattened to (a + b + c), i.e. the resulting Rollout applies a, b and c to the same input and produces a flat 3-tuple of outputs. To prevent that, do (a + group(b + c)). The resulting Rollout produces a 2-tuple whose first item is the output of a and whose second item is the (nested) output of b + c.
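A hedged sketch of the example above (assuming group is importable from the padl top-level package and + builds a Rollout as described; otherwise padl.transforms.group can be used directly):

from padl import transform, group

@transform
def a(x):
    return x + 1

@transform
def b(x):
    return x * 2

@transform
def c(x):
    return x - 3

# Without group, a + (b + c) flattens into a three-way Rollout; with group,
# the inner pair stays nested and produces a nested tuple in the output.
t = a + group(b + c)
print(t.infer_apply(5))  # expected: something like (6, (10, 2))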
- padl.transforms.importdump(transform_or_module)
Disable full dump (see padl.transforms.fulldump() for more).
- padl.transforms.load(path)
Load a transform (as saved with padl.save) from path.
- padl.transforms.save(transform: padl.transforms.Transform, path: Union[pathlib.Path, str], force_overwrite: bool = False, compress: bool = False)
Save the transform to a folder at path or a compressed (zip-)file of the same name if compress == True.
The folder’s name should end with ‘.padl’. If no extension is given, it will be added automatically.
If the folder exists, call with force_overwrite = True to overwrite. Otherwise, this will raise a FileExistsError.
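A minimal round-trip sketch (the file name and the wrapped function are illustrative assumptions; the transform must be defined in a script whose source can be dumped):

from padl import transform
from padl.transforms import load, save

@transform
def add_one(x):
    return x + 1

save(add_one, 'add_one.padl')      # writes the folder 'add_one.padl'
restored = load('add_one.padl')    # rebuilds the transform from the dump
print(restored.infer_apply(3))     # expected: 4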