Affine

Targets:
image
mask
bboxes
keypoints
volume
mask3d
Image Types: uint8, float32

Augmentation to apply affine transformations to images.

Affine transformations involve:

  • Translation ("move" image on the x-/y-axis)
  • Rotation
  • Scaling ("zoom" in/out)
  • Shear (move one side of the image, turning a square into a parallelogram)

All such transformations can create "new" pixels in the image without defined content, e.g. if the image is translated to the left, pixels are created on the right. A method has to be defined to deal with these pixel values; the fill and fill_mask parameters of this class control this.
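As a minimal sketch of what "new" pixels mean (purely illustrative, not the library's implementation), translating a row of pixel values creates positions whose content must come from a constant fill value:

```python
def translate_row(row, shift, fill=0):
    """Shift a 1-D list of pixel values right by `shift` positions,
    filling the newly exposed positions on the left with `fill`.
    Illustrative only; real affine warps work on 2-D arrays."""
    if shift <= 0:
        return row[:]
    return [fill] * shift + row[:-shift]

print(translate_row([10, 20, 30, 40], 2))  # [0, 0, 10, 20]
```

The two leading zeros are the "new" pixels that `fill` supplies; with a non-constant border mode they would instead be taken from the image border.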

Some transformations involve interpolation between several pixels of the input image to generate output pixel values. The interpolation and mask_interpolation parameters control the interpolation method used for this.

Arguments
scale
tuple[float, float] | float | dict[str, float | tuple[float, float]]
[1,1]

Scaling factor to use, where 1.0 denotes "no change" and 0.5 is zoomed out to 50 percent of the original size.

  • If a single number, then that value will be used for all images.
  • If a tuple (a, b), then a value will be uniformly sampled per image from the interval [a, b]. The same range is used for both the x- and y-axis, but the values are sampled independently per axis. To keep the aspect ratio, set keep_ratio=True; the same sampled value will then be used for both axes.
  • If a dictionary, then it is expected to have the keys x and/or y. Each of these keys can have the same values as described above. Using a dictionary allows setting different ranges for the two axes; sampling then happens independently per axis, resulting in samples that differ between the axes. Note that when keep_ratio=True, the x- and y-axis ranges should be the same.
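The sampling rules above can be sketched as follows (a simplified illustration; the function name `sample_scale` and its structure are assumptions, not Albumentations' actual code):

```python
import random

def sample_scale(scale, keep_ratio=False):
    """Resolve a scalar / tuple / dict `scale` argument into per-axis
    scaling factors, mimicking the rules described in the docs."""
    if isinstance(scale, (int, float)):
        # A single number is used as-is for both axes.
        return {"x": float(scale), "y": float(scale)}
    if isinstance(scale, dict):
        # Missing axes default to no change; each axis samples independently.
        x = random.uniform(*scale.get("x", (1.0, 1.0)))
        y = random.uniform(*scale.get("y", (1.0, 1.0)))
        return {"x": x, "y": y}
    lo, hi = scale
    x = random.uniform(lo, hi)
    # Same range for both axes; keep_ratio reuses the same sampled value.
    y = x if keep_ratio else random.uniform(lo, hi)
    return {"x": x, "y": y}

random.seed(0)
print(sample_scale((0.8, 1.2), keep_ratio=True))  # same factor on both axes
```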
translate_percent
tuple[float, float] | float | dict[str, float | tuple[float, float]] | None

Translation as a fraction of the image height/width (x-translation, y-translation), where 0 denotes "no change" and 0.5 denotes "half of the axis size".

  • If None, then equivalent to 0.0 unless translate_px has a value other than None.
  • If a single number, then that value will be used for all images.
  • If a tuple (a, b), then a value will be uniformly sampled per image from the interval [a, b]. That sampled fraction will be used identically for both the x- and y-axis.
  • If a dictionary, then it is expected to have the keys x and/or y. Each of these keys can have the same values as described above. Using a dictionary allows setting different values for the two axes; sampling then happens independently per axis, resulting in samples that differ between the axes.

translate_px
tuple[int, int] | int | dict[str, int | tuple[int, int]] | None

Translation in pixels.

  • If None then equivalent to 0 unless translate_percent has a value other than None.
  • If a single int, then that value will be used for all images.
  • If a tuple (a, b), then a value will be uniformly sampled per image from the discrete interval [a..b]. That number will be used identically for both x- and y-axis.
  • If a dictionary, then it is expected to have the keys x and/or y. Each of these keys can have the same values as described above. Using a dictionary allows setting different values for the two axes; sampling then happens independently per axis, resulting in samples that differ between the axes.
rotate
tuple[float, float] | float
0

Rotation in degrees (NOT radians), i.e. expected value range is around [-360, 360]. Rotation happens around the center of the image, not the top-left corner as in some other frameworks.

  • If a number, then that value will be used for all images.
  • If a tuple (a, b), then a value will be uniformly sampled per image from the interval [a, b] and used as the rotation value.
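Rotation about the center can be expressed as a single 2x3 affine matrix, the same quantity cv2.getRotationMatrix2D computes. This illustrative sketch builds that matrix and verifies that the center maps to itself:

```python
import math

def rotation_about_center(deg, w, h):
    """2x3 affine matrix rotating by `deg` degrees about the image center
    (illustrative; mirrors the formula behind cv2.getRotationMatrix2D)."""
    cx, cy = (w - 1) / 2, (h - 1) / 2
    a = math.cos(math.radians(deg))
    b = math.sin(math.radians(deg))
    # "Translate center to origin, rotate, translate back" folded into one matrix.
    return [[a, b, (1 - a) * cx - b * cy],
            [-b, a, b * cx + (1 - a) * cy]]

# The center of a 101x101 image is (50, 50) and must map to itself.
M = rotation_about_center(90, 101, 101)
cx = cy = 50.0
x = M[0][0] * cx + M[0][1] * cy + M[0][2]
y = M[1][0] * cx + M[1][1] * cy + M[1][2]
print(round(x, 9), round(y, 9))  # 50.0 50.0
```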

shear
tuple[float, float] | float | dict[str, float | tuple[float, float]]
[0,0]

Shear in degrees (NOT radians), i.e. expected value range is around [-360, 360], with reasonable values being in the range of [-45, 45].

  • If a number, then that value will be used for all images as the shear on the x-axis (no shear on the y-axis will be done).
  • If a tuple (a, b), then two values will be uniformly sampled per image from the interval [a, b] and used as the x- and y-shear values.
  • If a dictionary, then it is expected to have the keys x and/or y. Each of these keys can have the same values as described above. Using a dictionary allows setting different values for the two axes; sampling then happens independently per axis, resulting in samples that differ between the axes.
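An x-shear by an angle corresponds to the 2x2 matrix [[1, tan(angle)], [0, 1]]. This small sketch (illustrative only) shows the top edge of a unit square sliding sideways while the bottom edge stays put:

```python
import math

def shear_x_matrix(deg):
    """Horizontal shear by `deg` degrees as a 2x2 matrix (illustrative)."""
    return [[1.0, math.tan(math.radians(deg))], [0.0, 1.0]]

def apply(m, pt):
    """Apply a 2x2 matrix to a point (x, y)."""
    x, y = pt
    return (m[0][0] * x + m[0][1] * y, m[1][0] * x + m[1][1] * y)

M = shear_x_matrix(45)  # tan(45 deg) = 1: top edge slides by one full unit
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print([apply(M, p) for p in square])  # the square becomes a parallelogram
```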
interpolation
0 | 1 | 2 | 3 | 4
1

OpenCV interpolation flag.

mask_interpolation
0 | 1 | 2 | 3 | 4
0

OpenCV interpolation flag.

fill
tuple[float, ...] | float
0

The constant value to use when filling in newly created pixels. (E.g. translating by 1px to the right will create a new 1px-wide column of pixels on the left of the image). The value is only used when border_mode=cv2.BORDER_CONSTANT. The expected value range is [0, 255] for uint8 images.

fill_mask
tuple[float, ...] | float | None

Same as fill but only for masks.

border_mode
0 | 1 | 2 | 3 | 4
0

OpenCV border flag.

fit_output
bool
false

If True, the image plane size and position will be adjusted to tightly capture the whole image after affine transformation (translate_percent and translate_px are ignored). Otherwise (False), parts of the transformed image may end up outside the image plane. Fitting the output shape can be useful to avoid corners of the image being outside the image plane after applying rotations. Default: False
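For a pure rotation, the tight output size that fit_output produces can be computed from the rotation angle alone; this sketch (an approximation of the geometry, not the library's code) shows the idea:

```python
import math

def fitted_size(w, h, deg):
    """Tight output size after rotating a w x h image by `deg` degrees,
    i.e. the bounding box of the rotated image rectangle (illustrative)."""
    c = abs(math.cos(math.radians(deg)))
    s = abs(math.sin(math.radians(deg)))
    # Project the rotated rectangle's extent onto each axis.
    return round(w * c + h * s), round(w * s + h * c)

print(fitted_size(100, 50, 90))   # (50, 100): the axes swap for a 90-degree turn
print(fitted_size(100, 100, 45))  # (141, 141): corners would otherwise be cut off
```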

keep_ratio
bool
true

When True, the original aspect ratio will be kept when the random scale is applied. Default: True.

rotate_method
largest_box | ellipse
largest_box

Rotation method used for the bounding boxes. Should be one of "largest_box" or "ellipse"[1]. Default: "largest_box"
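The "largest_box" strategy amounts to rotating the four corners of a box and taking their axis-aligned hull. This sketch (rotation about the origin for simplicity; not the library's code) illustrates why rotated boxes grow:

```python
import math

def largest_box_after_rotation(bbox, deg):
    """Axis-aligned box enclosing the four rotated corners of `bbox`
    (x1, y1, x2, y2), rotating about the origin. Illustrative only."""
    x1, y1, x2, y2 = bbox
    c = math.cos(math.radians(deg))
    s = math.sin(math.radians(deg))
    corners = [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]
    rotated = [(x * c - y * s, x * s + y * c) for x, y in corners]
    xs, ys = zip(*rotated)
    return (min(xs), min(ys), max(xs), max(ys))

# A 45-degree rotation inflates a 2x2 box: each side extends to +/- sqrt(2).
print(largest_box_after_rotation((-1, -1, 1, 1), 45))
```

The "ellipse" method instead inscribes an ellipse in the box and bounds its rotation, which inflates boxes less under large rotations.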

balanced_scale
bool
false

When True, scaling factors are chosen to be either entirely below or above 1, ensuring balanced scaling. Default: False.

This is important because without it, scaling tends to lean towards upscaling. For example, to let the image zoom in and out by up to 2x, we might sample from the interval [0.5, 2]. Since the sub-interval [0.5, 1] is half the length of [1, 2], values above 1 are picked twice as often when sampling directly from [0.5, 2]. With balanced_scale, the function ensures that half the time the scaling factor is picked from below 1 (zooming out), and the other half from above 1 (zooming in). This makes the zooming in and out process balanced.
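The bias described above is easy to verify empirically. This sketch (illustrative only) compares direct sampling from [0.5, 2] against a balanced scheme that first flips a coin between the zoom-out and zoom-in sub-intervals:

```python
import random

random.seed(0)
N = 100_000

# Naive: sample directly from [0.5, 2]; values above 1 occupy 2/3 of the range.
naive = sum(random.uniform(0.5, 2.0) > 1.0 for _ in range(N)) / N

# Balanced: coin flip, then sample either [1, 2] (zoom in) or [0.5, 1] (zoom out).
balanced = sum(
    (random.uniform(1.0, 2.0) if random.random() < 0.5
     else random.uniform(0.5, 1.0)) > 1.0
    for _ in range(N)
) / N

print(f"naive    P(scale > 1) ~ {naive:.3f}")    # ~ 0.667
print(f"balanced P(scale > 1) ~ {balanced:.3f}") # ~ 0.500
```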

p
float
0.5

Probability of applying the transform. Default: 0.5.

Examples
>>> import numpy as np
>>> import albumentations as A
>>> import cv2
>>>
>>> # Prepare sample data
>>> image = np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
>>> mask = np.random.randint(0, 2, (100, 100), dtype=np.uint8)
>>> bboxes = np.array([[10, 10, 50, 50], [40, 40, 80, 80]], dtype=np.float32)
>>> bbox_labels = [1, 2]
>>> keypoints = np.array([[20, 30], [60, 70]], dtype=np.float32)
>>> keypoint_labels = [0, 1]
>>>
>>> # Define transform with different parameter types
>>> transform = A.Compose([
...     A.Affine(
...         # Tuple for scale (will be used for both x and y)
...         scale=(0.8, 1.2),
...         # Dictionary with tuples for different x/y translations
...         translate_percent={"x": (-0.2, 0.2), "y": (-0.1, 0.1)},
...         # Tuple for rotation range
...         rotate=(-30, 30),
...         # Dictionary with tuples for different x/y shearing
...         shear={"x": (-10, 10), "y": (-5, 5)},
...         # Interpolation methods
...         interpolation=cv2.INTER_LINEAR,
...         mask_interpolation=cv2.INTER_NEAREST,
...         # Other parameters
...         fit_output=False,
...         keep_ratio=True,
...         rotate_method="largest_box",
...         balanced_scale=True,
...         border_mode=cv2.BORDER_CONSTANT,
...         fill=0,
...         fill_mask=0,
...         p=1.0
...     ),
>>> ... ], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['bbox_labels']),
>>> ...    keypoint_params=A.KeypointParams(format='xy', label_fields=['keypoint_labels']))
>>>
>>> # Apply the transform
>>> transformed = transform(
...     image=image,
...     mask=mask,
...     bboxes=bboxes,
...     bbox_labels=bbox_labels,
...     keypoints=keypoints,
...     keypoint_labels=keypoint_labels
... )
>>>
>>> # Get the transformed data
>>> transformed_image = transformed['image']      # Image with affine transforms applied
>>> transformed_mask = transformed['mask']        # Mask with affine transforms applied
>>> transformed_bboxes = transformed['bboxes']    # Bounding boxes with affine transforms applied
>>> transformed_bbox_labels = transformed['bbox_labels']  # Labels for transformed bboxes
>>> transformed_keypoints = transformed['keypoints']  # Keypoints with affine transforms applied
>>> transformed_keypoint_labels = transformed['keypoint_labels']  # Labels for transformed keypoints
>>>
>>> # Simpler example with only essential parameters
>>> simple_transform = A.Compose([
...     A.Affine(
...         scale=1.1,  # Single scalar value for scale
...         rotate=15,  # Single scalar value for rotation (degrees)
...         translate_px=30,  # Single scalar value for translation (pixels)
...         p=1.0
...     ),
... ])
>>> simple_result = simple_transform(image=image)
>>> simple_transformed = simple_result['image']
References